BCF reader partial unpacking for site-only queries #411

Open
@bguo068

Thank you for the wonderful binding to the htslib C library. It makes htslib so much easier to work with.

Is there a way to unpack only the site-level (shared) fields, so that site information can be read quickly when the BCF file contains 200k samples?

As an experiment, I cloned the repo and changed BCF_UN_ALL to BCF_UN_SHR in the call
htslib::bcf_unpack(record.inner_mut(), htslib::BCF_UN_ALL as i32); inside the reader's read method. It worked and ran very quickly.
However, it might be unsafe if genotype data is later accessed from the resulting record, since the FORMAT fields are no longer unpacked at read time.

Any suggestions or plans to provide a Rust interface for partial unpacking?

https://github.com/rust-bio/rust-htslib/blob/3008a131f241b423d041c756fc96410f6412e3d8/src/bcf/mod.rs#L210C6-L223

    fn read(&mut self, record: &mut record::Record) -> Option<Result<()>> {
        match unsafe { htslib::bcf_read(self.inner, self.header.inner, record.inner) } {
            0 => {
                unsafe {
                    // Always unpack record.
                    htslib::bcf_unpack(record.inner_mut(), htslib::BCF_UN_ALL as i32);
                }
                record.set_header(Rc::clone(&self.header));
                Some(Ok(()))
            }
            -1 => None,
            _ => Some(Err(Error::BcfInvalidRecord)),
        }
    }
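To make the safety concern concrete, here is a minimal sketch of one possible design: the reader unpacks only up to a caller-chosen level, and any accessor that needs more upgrades the record on demand. Everything in it is an assumption for illustration. `LazyRecord`, `read_with`, `ensure_unpacked`, `is_unpacked` and `unpack_calls` are hypothetical names, not part of rust-htslib, and the "unpacking" is mocked with a bitmask so the example runs without htslib. Only the `BCF_UN_*` values are taken from htslib's macros.

```rust
// Unpack levels; the numeric values match htslib's BCF_UN_* macros.
pub const BCF_UN_STR: u32 = 1; // ID, REF, ALT
pub const BCF_UN_FLT: u32 = 2; // FILTER
pub const BCF_UN_INFO: u32 = 4; // INFO
pub const BCF_UN_SHR: u32 = BCF_UN_STR | BCF_UN_FLT | BCF_UN_INFO; // all site-level fields
pub const BCF_UN_FMT: u32 = 8; // FORMAT and per-sample data
pub const BCF_UN_ALL: u32 = BCF_UN_SHR | BCF_UN_FMT;

/// Hypothetical stand-in for a BCF record (not the rust-htslib `Record`).
pub struct LazyRecord {
    unpacked: u32,       // bitmask of the parts decoded so far
    unpack_calls: usize, // how many times the mock decoder actually ran
}

impl LazyRecord {
    /// Simulates reading a record and unpacking it only up to `level`.
    pub fn read_with(level: u32) -> Self {
        let mut rec = LazyRecord { unpacked: 0, unpack_calls: 0 };
        rec.ensure_unpacked(level);
        rec
    }

    /// Decodes only the parts in `which` that are still missing.
    /// A real binding would call `bcf_unpack` at this point.
    pub fn ensure_unpacked(&mut self, which: u32) {
        let missing = which & !self.unpacked;
        if missing != 0 {
            self.unpacked |= missing;
            self.unpack_calls += 1;
        }
    }

    /// True if every part in `which` has already been decoded.
    pub fn is_unpacked(&self, which: u32) -> bool {
        (self.unpacked & which) == which
    }

    pub fn unpack_calls(&self) -> usize {
        self.unpack_calls
    }

    /// Site-level accessor: needs only the string fields (placeholder data).
    pub fn alleles(&mut self) -> Vec<&'static str> {
        self.ensure_unpacked(BCF_UN_STR);
        vec!["A", "T"]
    }

    /// Per-sample accessor: upgrades the record before touching FORMAT data.
    pub fn genotypes(&mut self) -> Vec<&'static str> {
        self.ensure_unpacked(BCF_UN_FMT);
        vec!["0/1", "1/1"]
    }
}
```

With this shape, a record created via `LazyRecord::read_with(BCF_UN_SHR)` serves `alleles()` without further decoding, while a later `genotypes()` call pays the per-sample cost only at that point. The trade-off in this sketch is that accessors take `&mut self` so they can upgrade the record; whether that fits the existing record API is an open question.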
