AttributeError: 'str' object has no attribute 'decode' #5

Closed
joelnitta opened this issue Jun 24, 2021 · 7 comments

@joelnitta

joelnitta commented Jun 24, 2021

Hello!

I was wondering if I can get some help with the error, AttributeError: 'str' object has no attribute 'decode' when running tetrad on an hdf5 file from ipyrad:

(tetrad) root@29ea72f56332:/wd/intermediates/ipyrad/hymeno-v2_outfiles# tetrad -i hymeno-v2.snps.hdf5 -o outdir -n test

-------------------------------------------------------
tetrad [v.0.9.13]
Quartet inference from phylogenetic invariants
-------------------------------------------------------
tetrad instance: test
Traceback (most recent call last):
  File "/opt/conda/envs/tetrad/bin/tetrad", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/__main__.py", line 287, in main
    CLI()
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/__main__.py", line 159, in __init__
    self.get_data()
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/__main__.py", line 233, in get_data
    load=self.load,
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/tetrad.py", line 178, in __init__
    self._init_seqarray()
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/tetrad.py", line 336, in _init_seqarray
    names = [i.decode() for i in io5["snps"].attrs["names"]]
  File "/opt/conda/envs/tetrad/lib/python3.7/site-packages/tetrad/tetrad.py", line 336, in <listcomp>
    names = [i.decode() for i in io5["snps"].attrs["names"]]
AttributeError: 'str' object has no attribute 'decode'

Here is a link to the hdf5 file on Dropbox if it helps: https://www.dropbox.com/s/be0cdol47f4renx/hymeno-v2.snps.hdf5?dl=0

The snps.hdf5 file was generated with ipyrad v.0.9.65.

Thanks!

@isaacovercast
Collaborator

The decode error comes from the difference between binary (bytes) and plain-text (str) string representations. When I have seen this in the past, it was caused by running the ipyrad assembly with Python 2 and then trying to run downstream tools with ipyrad/tetrad under Python 3. Can you please try re-running step 7 with ipyrad after verifying that your Python version is 3.7 or greater? If you are going to re-run step 7, I would also suggest updating to the newest version of ipyrad (it never hurts). Good luck.
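
For readers unfamiliar with the distinction, here is a minimal sketch of the same error in plain Python 3. It does not involve tetrad or h5py, and the sample name is made up for illustration: `bytes` objects have a `.decode()` method, while `str` objects do not.

```python
# Hypothetical sample name, held once as bytes and once as text.
name_bytes = b"sample_A"
name_text = "sample_A"

# bytes -> str: decoding works in Python 3.
decoded = name_bytes.decode()

# str has no .decode() method in Python 3, so this raises AttributeError.
try:
    name_text.decode()
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(decoded)
print(error_message)
```

So the traceback means the names were already text by the time tetrad tried to decode them.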

@joelnitta
Author

Thanks for the suggestions.

I ran ipyrad in a docker container (tag 0.9.65--pyh3252c3a_0), which is based off the bioconda package (recipe here).

Running these commands in the container indicates it is using python3:

bash-4.2# which python
/usr/local/bin/python
bash-4.2# python --version
Python 3.7.9
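
The same check can also be made from inside the interpreter, which avoids any ambiguity about which `python` the shell resolves. This is a minimal sketch: `is_supported` is a hypothetical helper name, and the 3.7 minimum is taken from the suggestion above.

```python
import sys


def is_supported(version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report the running interpreter's version and whether it meets the minimum.
print(sys.version.split()[0], is_supported(sys.version_info))
```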

I tried steps 2-7 with the most recent version of ipyrad (tag 0.9.81--pyh5e36f6f_0), but got the same error from tetrad.

@isaacovercast
Collaborator

Ok. I downloaded and ran this docker image, and it seems fine. The docker container doesn't include tetrad, so how are you installing and running it?

@isaacovercast
Collaborator

Ok, well I fixed it but I don't have permissions on this repository. Here is the diff for the working version:

diff --git a/tetrad/tetrad.py b/tetrad/tetrad.py
index a0c70c3..9ed2e9a 100644
--- a/tetrad/tetrad.py
+++ b/tetrad/tetrad.py
@@ -316,7 +316,10 @@ class Tetrad(object):
         # reloading info from hdf5
         assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
         io5 = h5py.File(self.files.data, 'r')
-        names = [i.decode() for i in io5["snps"].attrs["names"]]
+        try:
+            names = [i.decode() for i in io5["snps"].attrs["names"]]
+        except AttributeError:
+            names = [i for i in io5["snps"].attrs["names"]]
         self.samples = names
 
 
@@ -333,7 +336,10 @@ class Tetrad(object):
         # get data shape from io5 input file       
         assert ".snps.hdf5" in self.files.data, "data file is not .snps.hdf5"
         io5 = h5py.File(self.files.data, 'r')
-        names = [i.decode() for i in io5["snps"].attrs["names"]]
+        try:
+            names = [i.decode() for i in io5["snps"].attrs["names"]]
+        except AttributeError:
+            names = [i for i in io5["snps"].attrs["names"]]
         self.samples = names
         ntaxa = len(names)
         nsnps = io5["snps"].shape[1]

You can pull the repo, apply the diff, and then run pip install -e . from the top-level tetrad repository directory, and it should work fine.
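
An alternative to the try/except in the diff is to check each element's type before decoding. The sketch below is only an illustration, not the code in tetrad: `to_text` and `normalize_names` are hypothetical helper names, and the list literal stands in for `io5["snps"].attrs["names"]`.

```python
def to_text(value):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode()
    return str(value)


def normalize_names(raw_names):
    """Convert an iterable of bytes and/or str names into a list of str."""
    return [to_text(name) for name in raw_names]


# Stand-in for the names attribute: one bytes entry and one str entry.
names = normalize_names([b"taxon_1", "taxon_2"])
print(names)
```

Checking per element also handles a mix of bytes and str entries, whereas the try/except assumes all names are of the same type.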

@joelnitta
Author

Thanks!

I was running tetrad in a custom docker image (which has since been updated to apply the patch).

However, I now have a different problem... I think this is just my incomplete understanding of how tetrad works. I was able to run it successfully, but it prints "0 bootstrap result trees already exist for test." and no other output. What do I need to do for tetrad to produce a tree?

(base) joelnitta@beyond:~/hymeno-migseq/test$ docker run --rm -v /home/joelnitta/hymeno-migseq/test:/wd -w /wd joelnitta/tetrad:0.9.14-patch tetrad -i hymeno-v2.snps.hdf5

-------------------------------------------------------
tetrad [v.0.9.14] 
Quartet inference from phylogenetic invariants
-------------------------------------------------------
tetrad instance: test
loading snps array [134 taxa x 206615 snps]
max unlinked SNPs per quartet [nloci]: 18785
quartet sampler [random, nsamples**2.8]: 903427 / 12840751
0 bootstrap result trees already exist for test.

@joelnitta
Author

Never mind... it was indeed just my usage of tetrad. Once I added -b 100, the analysis produced output as expected. (Suggestion: if tetrad -i input without additional arguments doesn't actually do anything, you may want to change that example in the README.)

I am leaving this open for now because it seems to me it shouldn't be considered resolved until the patch gets merged.

@isaacovercast
Collaborator

Good directions for installing the patch from @bmichanderson:

First, you may want to create a new environment without tetrad, or uninstall it from the current one, and make sure it isn't installed anywhere else (typing tetrad --version shouldn't give anything). Then you can install it as I mentioned. Where you clone the repository doesn't matter, as you can just delete it afterwards. The install puts the program and scripts in the right places.
If you are in your home directory (cd ~), you can run the commands I put above. First clone the repository, then change into the repository directory (tetrad). Now make a file called mydiff (or whatever you like) and paste in the text from the comment I linked. Save the file, then apply it with git apply mydiff (or git apply <your_file> if you used a different name). After it completes, while you are still in the tetrad directory, you can run python setup.py install.
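
To confirm which copy of a package Python will import after the install (or that none is left over beforehand), a check along these lines may help. This is a sketch using only the standard library: `package_location` is a hypothetical helper name, and it assumes the package is importable as `tetrad`, as the traceback paths earlier in this thread suggest.

```python
import importlib.util


def package_location(name):
    """Return the file path Python would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# None means the package is not importable in the current environment.
print(package_location("tetrad"))
```

Run it from outside the cloned repository directory, since a local tetrad folder in the current directory could otherwise be picked up instead of the installed copy.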
