In order to run a tetrad analysis on a couple of SNP datasets, I converted a bunch of vcf files into hdf5 files. The vcf files were originally produced as output of an ipyrad assembly and subsequently filtered with vcftools.
I used the conversion command from the ipyrad analysis toolkit cookbook, which runs without a problem. When I try to use the resulting file in a tetrad analysis, however, I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-42-9df7aceb61a7> in <module>
3 data="/data/home/wolfproj/wolfproj-03/analysis-vcf2hdf5/0miss.snps.hdf5",
4 nquartets=1e6,
----> 5 nboots=16,
6 )
/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in __init__(self, name, data, workdir, nquartets, nboots, save_invariants, seed, load, *args, **kwargs)
176 else:
177 # if self.kwargs["initarr"]:
--> 178 self._init_seqarray()
179
180 # check input files
/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in _init_seqarray(self, quiet)
334 assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
335 io5 = h5py.File(self.files.data, 'r')
--> 336 names = [i.decode() for i in io5["snps"].attrs["names"]]
337 self.samples = names
338 ntaxa = len(names)
/opt/miniconda3/envs/ipyrad/lib/python3.7/site-packages/tetrad/tetrad.py in <listcomp>(.0)
334 assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
335 io5 = h5py.File(self.files.data, 'r')
--> 336 names = [i.decode() for i in io5["snps"].attrs["names"]]
337 self.samples = names
338 ntaxa = len(names)
AttributeError: 'str' object has no attribute 'decode'
I tried prefiltering indels and multiallelic SNPs, but apparently there are no indels in the vcf, and the error occurs both with heavily filtered files and with the original, unfiltered vcf produced by ipyrad (after conversion to hdf5). When I use the snps.hdf5 file from the ipyrad output directly, however, I get no error and tetrad runs smoothly.
Yes, hello Bernhard. First, let me say thank you for carefully including so much useful information in your issue; it's super helpful.
This is a known bug in the tetrad codebase. I have "fixed" it, but I don't have permissions on the repository to apply the fix. I was going to make a PR, but I didn't get around to it. I posted the diff that fixes this in the issue on the tetrad GitHub.
If you can clone the tetrad repo and apply this diff, that'll be the fastest way to get you going. Otherwise, watch that tetrad issue for when the fix is merged in.
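The diff itself is not reproduced in this thread, so the following is only a minimal sketch of the general idea, not the actual patch. The traceback shows `.decode()` being called on a sample name that is already a `str`. Presumably the converted file stores the names as text and the ipyrad-written file stores them as bytes, but that is an assumption. A tolerant helper (the name `decode_names` is hypothetical, not part of tetrad) would accept both:

```python
def decode_names(raw_names):
    """Return sample names as plain str, whether stored as bytes or str.

    Hypothetical helper for illustration; not part of tetrad or ipyrad.
    """
    return [
        # bytes (including numpy.bytes_) need decoding; str is passed through
        name.decode("utf-8") if isinstance(name, bytes) else str(name)
        for name in raw_names
    ]


# Assumed usage at the failing line in tetrad.py (line 336 in the traceback):
#     names = decode_names(io5["snps"].attrs["names"])
print(decode_names([b"sample_A", "sample_B"]))
```

Checking the type of each name, rather than always calling `.decode()`, would let the same line read both kinds of file.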
Hi,
the conversion to hdf5 itself runs without a problem; the error only appears when I try to use the converted file in a tetrad analysis.
Any idea what I am doing wrong? I am running ipyrad v. 0.9.65 and Python v.3.7.10 on a remote machine.
Thanks a lot in advance,
Bernhard