vcf_to_hdf5 and tetrad: 'str' object has no attribute 'decode' #451

Closed
casparbein opened this issue Jul 19, 2021 · 1 comment

@casparbein

Hi,

To run a tetrad analysis on several SNP datasets, I took the VCF files produced by an ipyrad assembly, filtered them with vcftools, and converted the filtered files to HDF5.
I used the following command from the ipyrad analysis toolkit cookbook:

import ipyrad.analysis as ipa

converter = ipa.vcf_to_hdf5(
    name="0miss",
    data="~/0miss.recode.vcf.gz")
converter.run()

This runs without a problem. However, when I try to use the resulting file in a tetrad analysis with the following command:

tet = ipa.tetrad(
    name="octo",
    data="~/analysis-vcf2hdf5/0miss.snps.hdf5",
    nquartets=1e6,
    nboots=16,
)

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-42-9df7aceb61a7> in <module>
      3     data="/data/home/wolfproj/wolfproj-03/analysis-vcf2hdf5/0miss.snps.hdf5",
      4     nquartets=1e6,
----> 5     nboots=16,
      6 )

/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in __init__(self, name, data, workdir, nquartets, nboots, save_invariants, seed, load, *args, **kwargs)
    176         else:
    177             # if self.kwargs["initarr"]:
--> 178             self._init_seqarray()
    179 
    180         # check input files

/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in _init_seqarray(self, quiet)
    334         assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
    335         io5 = h5py.File(self.files.data, 'r')
--> 336         names = [i.decode() for i in io5["snps"].attrs["names"]]
    337         self.samples = names
    338         ntaxa = len(names)

/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in <listcomp>(.0)
    334         assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
    335         io5 = h5py.File(self.files.data, 'r')
--> 336         names = [i.decode() for i in io5["snps"].attrs["names"]]
    337         self.samples = names
    338         ntaxa = len(names)

AttributeError: 'str' object has no attribute 'decode'

I tried prefiltering indels and multiallelic SNPs, but apparently there are no indels in the VCF. The error occurs with every converted file, both the heavily filtered ones and the original output VCF produced by ipyrad (once converted to HDF5). When I use the snps.hdf5 file from the ipyrad output directly, however, I get no error and tetrad runs smoothly:

tet = ipa.tetrad(
    name="octo",
    data="~/ipyrad_assemblies_start/exclude_outfiles/exclude.snps.hdf5",
    nquartets=1e6,
    nboots=16,
)

## no error

Any idea what I am doing wrong? I am running ipyrad v. 0.9.65 and Python v.3.7.10 on a remote machine.

Thanks a lot in advance,

Bernhard

@isaacovercast
Collaborator

Yes, hello Bernhard. First, let me say thank you for carefully including so much useful information in your issue; it's super helpful.

This is a known bug in the tetrad codebase. I have "fixed" it, but I don't have permissions on the repository to apply the fix. I was going to make a PR, but I didn't get around to it. I posted the diff that fixes this in the issue on the tetrad GitHub:

eaton-lab/tetrad#5 (comment)

If you can clone the tetrad repo and apply this diff, that'll be the fastest way to get you going. Otherwise, watch that tetrad issue for when the fix is merged in.

Closing this ticket as a dupe of the tetrad one.
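
For anyone landing here with the same traceback: the diff from the tetrad issue is not reproduced in this thread. The snippet below is only a rough sketch of the kind of change that avoids the error. It assumes that the "names" attribute can hold either bytes or str items, which is consistent with the two files behaving differently above. The helper name decode_names is hypothetical and is not part of tetrad or ipyrad.

```python
def decode_names(raw_names):
    """Return sample names as plain str, accepting bytes or str items.

    Hypothetical helper for illustration only. It stands in for the
    list comprehension on line 336 of tetrad.py in the traceback, which
    calls .decode() on every item and so fails when an item is already str.
    """
    return [
        name.decode() if isinstance(name, bytes) else str(name)
        for name in raw_names
    ]


# Mixed input: one bytes item and one str item.
print(decode_names([b"sample_A", "sample_B"]))
# → ['sample_A', 'sample_B']
```

In a patched local copy of tetrad, the failing line could then read names = decode_names(io5["snps"].attrs["names"]). Whether this matches the fix that was eventually merged is not confirmed here, so check the tetrad issue first.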
