annotate_data crashes on cell ontology with UnicodeDecodeError [Colab] #20
Comments
Please set ontology_folder in Process_Query to the ontology folder of the PopV GitHub repository, https://github.com/czbiohub/PopV/blob/main/ontology (it will otherwise fail for some organs, as our ontology is newer and contains recently added cell types). The remaining issue is on the obonet side: I don't think it can be fixed there, and it is an unusual setting for Python that the file you are trying to use is stored UTF-8 encoded while Python reads it with the platform default encoding.
Hi @cane11, thanks for the fast reply! I set the ontology_folder as per your recommendation and I am running into the same problem again. The obo file loads correctly if I read it before running annotate_data.
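One way to separate "the file is corrupt" from "the reader's default encoding is wrong" is to decode the raw bytes explicitly. This is a sketch; the helper name is hypothetical, and in the tutorial the real path is stored in adata.uns["_cl_obo_file"]:

```python
import pathlib

def check_utf8(path):
    """Return the file's decoded text, raising UnicodeDecodeError if the
    bytes are not valid UTF-8 (independent of the platform's default
    encoding, since we decode explicitly)."""
    return pathlib.Path(path).read_bytes().decode("utf-8")

# Usage sketch (path is hypothetical):
#   text = check_utf8("ontology/cl.obo")
```

If this succeeds but obonet still crashes, the problem is the default encoding used by the reader, not the file on disk.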
However, once I run annotate_data (which crashes with the Unicode error), the same read that worked before now crashes with the same Unicode error. So something odd happens while annotate_data is running? I am using a Colab Pro account with a standard GPU + high-RAM backend. I understand if this is still an external package (obonet) issue.
I assume you started in a fresh runtime? Can you print the output of adata.uns["_cl_obo_file"] before and after running annotate_data? Can you also check the md5sum of the cl_obo file before and after running the script? The script shouldn't change the downloaded ontology files, so I am indeed a bit puzzled. Can you replace the cl_obo file with a backup copy after annotate_data fails? The other thing that could cause the failure is that the default text encoding switches while the script is running; I need to figure out how to query the current encoding.
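A small stdlib helper is enough for the md5 check; running it before and after annotate_data and comparing the digests tells you whether the file on disk changed (the function name is my own, not part of PopV):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so large
    ontology files don't need to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage sketch: identical digests before/after mean the crash is in how
# the file is read, not in the file itself.
```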
Hi @cane11, yes, I started with a fresh runtime. I have narrowed the issue down to the scVI (and, as a consequence, scANVI) method ("knn_on_scvi"); the other methods ran without the error. In one instance, the obo file read correctly before training but failed with the Unicode error right after the scVI step. So my guess is that training the scVI model (on the GPU) somehow messes with the locale encoding? Any chance it has something to do with the pretrained model? I wondered if this was a GPU issue and tried forcing CPU execution as well.
Hope this helps with the diagnosis. Thanks again for the quick support!
I initially thought it is the ANSI encoding and it turns into UTF-8 after some import in scVI or during model loading. Can you verify whether importing scVI alone makes it fail? Otherwise, try setting mode='retrain' in Process_Query to train all classifiers from scratch. If that works, I will have a look at the pretrained models next week; for now the best workaround is to train from scratch (it should take about 40 minutes with a GPU enabled).
I tried debugging this. However, when I run PopV ten times, it happens maybe once, which makes it impossible to debug this way. I guess it is a Colab problem, and I'm not sure what is causing it. Is it more reproducible in your hands, @nagendraKU?
@cane11 Yes, it happens every time I choose scVI or scANVI. I also think this is a Colab issue. When I get time, I will try PopV in a local environment. Thanks for all your efforts!
It should be fixed in the newest version of PopV (as it is not fully reproducible for me, rerunning it on your side would be great). obonet released a new version that allows setting the text encoding, and the current master uses it.
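For older setups, a stdlib-only workaround sketch is to open the ontology with an explicit encoding so the parse no longer depends on the process-wide default; obonet's read_obo accepts an already-opened file handle as well as a path (the helper name here is my own):

```python
def open_utf8(path):
    """Open a text file with an explicit UTF-8 encoding, bypassing the
    locale.getpreferredencoding() default that open() would otherwise use.
    The returned handle can be passed to a reader such as obonet.read_obo."""
    return open(path, encoding="utf-8")

# Usage sketch (path hypothetical):
#   with open_utf8("ontology/cl.obo") as handle:
#       graph = obonet.read_obo(handle)
```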
Hi!
Thanks for the recent updates to the repo! I was able to get through the preprocessing steps, but I am running into an issue at the annotate_data step. I am running the updated Tabula Sapiens tutorial on Colab with a high-RAM backend.
I downloaded the ontology files from https://figshare.com/articles/dataset/OnClass_data_minimal/14776281.
Here's the preprocessing setup:
With the anndata from the above step, I run the following code:
Error:
Any help in fixing this error is appreciated!