Merged
README.md: 35 changes (34 additions, 1 deletion)
@@ -1,3 +1,36 @@
-# B.I.O -- The Biological Input-Output library
+# B.I.O. – the Biological Input/Output library

B.I.O. is a C++ library for reading and writing files used in bioinformatics, in particular in sequence
analysis. It provides easy-to-use interfaces for the following formats:

* Plain I/O: plain-text, CSV, TSV, …
* Map I/O: SAM, BAM, …
* Seq I/O: FastA, FastQ, …
* Var I/O: VCF, BCF, …

The primary goal of this library is to offer higher-level abstractions than the C libraries typically used in this
domain (e.g. htslib) while still delivering excellent performance.
It aims to provide a modern, well-integrated design that covers most of the typical I/O use cases that bioinformaticians encounter.

The library relies strongly on *Modern C++* and plays well with other Modern C++ libraries.

Please see the [online documentation](TODO) for more details.

## Current state

The library is currently under heavy development. There is no release yet, and all interfaces are subject to change.

## Dependencies

| | requirement | version | comment |
|-------------------|-------------------------------------------|----------|---------------------------------------------|
|**compiler** | [GCC](https://gcc.gnu.org) | ≥ 10 | no other compiler is currently supported! |
|**required libs** | [SeqAn3](https://github.com/seqan/seqan3) | ≥ 3 | |
|**optional libs** | [zlib](https://github.com/madler/zlib) | ≥ 1.2 | required for `*.gz` and `*.bam` file support |
| | [bzip2](https://www.sourceware.org/bzip2) | ≥ 1.0 | required for `*.bz2` file support |

## Usage

* The library is header-only: using it requires no build steps, and it can be used as-is.
* A single-header version is available (TODO).
* CMake files are provided for easy integration into applications (and automatic detection/inclusion of dependencies).
include/bio/format/bcf_input_handler.hpp: 367 changes (173 additions, 194 deletions)

Large diffs are not rendered by default.

include/bio/format/bcf_output_handler.hpp: 23 changes (8 additions, 15 deletions)
@@ -555,13 +555,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
}

//!\brief Overload for n_fmt.
-void set_core_n_fmt(auto & field)
-{
-if constexpr (detail::genotypes_vcf_style_writer_concept<decltype(field)>)
-record_core.n_fmt = std::ranges::distance(detail::get_first(field));
-else
-record_core.n_fmt = detail::range_or_tuple_size(field);
-}
+void set_core_n_fmt(auto & field) { record_core.n_fmt = detail::range_or_tuple_size(field); }
//!\}

/*!\name Field writers
@@ -651,7 +645,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
// explicit integer width given in header
if (hdr_entry.other_fields.find("IntegerBits") != hdr_entry.other_fields.end())
{
-desc = detail::dynamic_type_id_2_type_descriptor(hdr_entry.type);
+desc = detail::value_type_id_2_type_descriptor(hdr_entry.type);
if (!detail::type_descriptor_is_int(desc)) // ignore header value if it isn't intX
desc = c_desc;
}
@@ -665,7 +659,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp

if (verify_header_types)
{
-detail::bcf_type_descriptor header_desc = detail::dynamic_type_id_2_type_descriptor(hdr_entry.type);
+detail::bcf_type_descriptor header_desc = detail::value_type_id_2_type_descriptor(hdr_entry.type);
if (desc != header_desc || !detail::type_descriptor_is_int(desc) ||
!detail::type_descriptor_is_int(header_desc))
{
@@ -707,7 +701,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
var_io::header::info_t const & info = header->infos.at(header->idx_to_info_pos().at(idx));

/* VALUE */
-if constexpr (detail::is_dynamic_type<value_t>)
+if constexpr (detail::is_info_element_value_type<value_t>)
{
auto func = [&](auto & param) { write_typed_data(param, get_desc(param, info)); };
std::visit(func, value);
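The INFO writer above and the genotype-element writer in the next hunk follow the same pattern: when the `if constexpr` check on `value_t` holds, the value is handed to `std::visit` so the typed writer sees the active alternative; otherwise the writer is called directly. The sketch below shows that pattern in isolation. It assumes the variant case can be detected with a simple trait (`is_variant_v`, hypothetical; the library's `is_info_element_value_type` may be defined differently) and replaces the real `write_typed_data` call with a hypothetical `type_code` function that returns the BCF2 type code (3 for int32, 5 for float, 7 for char).

```cpp
#include <cstdint>
#include <variant>

// Hypothetical trait: true only for specialisations of std::variant.
template <typename t>
inline constexpr bool is_variant_v = false;

template <typename... ts>
inline constexpr bool is_variant_v<std::variant<ts...>> = true;

// Stand-ins for the typed writer: return the BCF2 type code of the argument.
constexpr std::uint8_t type_code(std::int32_t) { return 3; }
constexpr std::uint8_t type_code(float) { return 5; }
constexpr std::uint8_t type_code(char) { return 7; }

// Unwrap a variant with std::visit, or call the writer directly for a plain value.
template <typename value_t>
constexpr std::uint8_t dispatch_type_code(value_t const & value)
{
    auto func = [](auto const & param) { return type_code(param); };

    if constexpr (is_variant_v<value_t>)
        return std::visit(func, value); // runtime dispatch on the active alternative
    else
        return func(value);             // type known at compile time, no dispatch
}
```

Branching on the type with `if constexpr`, rather than always wrapping values in a variant, lets plain values skip the `std::visit` dispatch entirely.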
@@ -950,15 +944,15 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
}
};

-if constexpr (detail::is_dynamic_vector_type<value_t>)
+if constexpr (detail::is_genotype_element_value_type<value_t>)
std::visit(func, value);
else
func(value);
}

-//!\brief Overload for GENOTYPES; genotypes_bcf_style.
+//!\brief Overload for GENOTYPES.
template <std::ranges::forward_range range_t>
-requires(detail::genotype_bcf_style_writer_concept<std::ranges::range_reference_t<range_t>>)
+requires(detail::genotype_writer_concept<std::ranges::range_reference_t<range_t>>)
void write_field(vtag_t<field::genotypes> /**/, range_t && range)
{
for (auto && genotype : range)
@@ -967,13 +961,12 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp

//!\brief Overload for GENOTYPES; tuple of pairs.
template <typename... elem_ts>
-requires(detail::genotype_bcf_style_writer_concept<elem_ts> &&...)
+requires(detail::genotype_writer_concept<elem_ts> &&...)
void write_field(vtag_t<field::genotypes> /**/, std::tuple<elem_ts...> & tup) // TODO add const version
{
auto func = [&](auto &... field) { (write_genotypes_element(field), ...); };
std::apply(func, tup);
}
-// TODO vcf-style
//!\}

//!\brief Write the header.