Error in rule pangolearn after 04/09/2022 update #84
Comments
This looks like a duplicate of cov-lineages/pangolin#427. I've encountered this error before, too. Try reinstalling your environment. The warning hints at a possible reason: you may not be using the scikit-learn version that was expected. Setting up a fresh environment with pangolin should fix this. If reinstalling doesn't solve the problem, let me know and share the exact packages and versions installed in your environment.
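For sharing the exact package versions, something like the following sketch may help. It only uses the standard library (`importlib.metadata`, Python 3.8+). The distribution names in the list are assumptions and may need adjusting to what is actually installed in your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed distribution names; edit the list to match your environment.
    names = ["pangolin", "pangoLEARN", "scikit-learn", "joblib"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated conda environment so that it reports the versions pangolin actually sees.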
@aineniamh @corneliusroemer I think this issue deserves reopening. This error comes from the fact that apparently the most recent pangoLEARN models have been built using a more recent version of scikit-learn.
and though I have no idea whether the model would really be compromised that doesn't sound encouraging. Since dumped scikit-learn models are generally not guaranteed to be reloadable with different versions, I think the bioconda approach of pinning a given pangolin release to a specific version of scikit-learn is the right thing to do, but it requires that:
For 3.1.20 I'm not sure what should be done now. The fact is that models since 2022-04-09 won't work with fresh conda installs of pangolin 3.1.20, and there's no simple fix that I can see. The question is whether you'd want to switch back to building future models with scikit-learn 0.24 again, as you did previously. More importantly, however, the same logic holds for pangolin v4 and the pangoLEARN part of pangolin-data, too. Again, it would be good to have the scikit-learn version clearly stated and, most importantly, not changing unnecessarily.
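To check which versions are involved in a given run, the version pair can be pulled out of the scikit-learn warning text. This is a minimal sketch that assumes the message keeps the wording shown in the log in this issue; the function name and the regular expression are mine, not part of pangolin or scikit-learn.

```python
import re

# Pattern based on the UserWarning text quoted in this issue (an assumption
# about the wording; other scikit-learn releases may phrase it differently).
_UNPICKLE_WARNING = re.compile(
    r"Trying to unpickle estimator (?P<estimator>\w+) "
    r"from version (?P<trained>[\w.\-]+) "
    r"when using version (?P<installed>[\w.\-]+)\."
)


def parse_unpickle_warning(message):
    """Return estimator name, training version and installed version, or None."""
    match = _UNPICKLE_WARNING.search(message)
    return match.groupdict() if match else None
```

Applied to the warning in the original report, this yields `trained` as `1.0.1` and `installed` as `0.23.1`, which is the kind of information a pinned dependency would need to agree on.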
@egenomics a solution to fix your issue (without updating to pangolin 4) is to:
This will enable you to run recent pangoLEARN models with your pangolin. However, you'll see the UserWarning above when trying to run with older models.
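If you would rather have a pipeline stop on that UserWarning than continue with a possibly mismatched model, the warning can be turned into an error at load time. The sketch below is an assumption-laden illustration: `load_strict` and `ModelVersionMismatch` are hypothetical names, and the check relies on the warning text containing "Trying to unpickle estimator" as in the log in this issue.

```python
import warnings


class ModelVersionMismatch(RuntimeError):
    """Raised when a model was pickled with a different scikit-learn version."""


def load_strict(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) and fail on a version-mismatch warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        model = loader(*args, **kwargs)
    for w in caught:
        if "Trying to unpickle estimator" in str(w.message):
            raise ModelVersionMismatch(str(w.message))
    # Re-emit unrelated warnings so they are not silently swallowed.
    for w in caught:
        warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)
    return model


# Hypothetical usage, assuming joblib is installed and the path is adjusted:
#   import joblib
#   model = load_strict(joblib.load, "decisionTree_v1.joblib")
```

Passing the loader as a callable keeps the sketch independent of joblib, so the same wrapper works for any function that unpickles the model.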
I'd like to just give a warning that when we released pangolin 4.0, I intended to maintain pangoLEARN for a couple of months before phasing it out. This was just to give a buffer zone of time for people to update to pangolin 4.0. It's been about 5 weeks, so bear in mind that this repository won't be maintained much longer! I think this is a good point about scikit-learn versions though, as this is relevant to the random forest model too (you don't see the warnings in 4.0, but the same issue exists: people's local version of scikit-learn may differ from the one we trained on). We can specify a particular version of scikit-learn if this might be an issue, but I've never noticed the version of scikit-learn affecting the inference from the model.
Thanks @wm75 for investigating and giving such a detailed description of what's behind the error here and in cov-lineages/pangolin#427. The happy path is to use up to date If for reproducibility one needs to use an old @aineniamh Do I understand you correctly that
Yeah, as it's no longer needed in pangolin 4.0, I'll archive the repo at some point in the not too distant future.
Hi,
We have been using pangolin (through conda) for a while now. With the latest pangoLEARN update, our pipeline broke. We are using:
pangolin: 3.1.20
pangolearn: 2022-04-09
constellations: v0.1.7
scorpio: 0.3.16
pango-designation used by pangoLEARN/Usher: v1.3
pango-designation aliases: 1.6
We get the following error:
All dependencies satisfied.
The query file is:/datos/MiSeq/MICRO/COVID/analysis/2022_04_19_R2247/consensus/consensus.R2247.fna
** Running sequence QC **
Number of sequences detected: 48
Total passing QC: 44
Data files found:
Trained model: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/decisionTree_v1.joblib
Header file: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/decisionTreeHeaders_v1.joblib
Designated hash: /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangoLEARN/data/lineages.hash.csv
Job stats:
job                     count    min threads    max threads
add_failed_seqs             1              1              1
align_to_reference          1              1              1
all                         1              1              1
generate_report             1              1              1
get_constellations          1              1              1
hash_sequence_assign        1              1              1
pangolearn                  1              1              1
scorpio                     1              1              1
total                       8              1              1
loading model 04/19/2022, 14:24:50
/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.1 when using version 0.23.1. This might lead to breaking code or invalid results. Use at your own risk.
warnings.warn(
processing block of 44 sequences 04/19/2022, 14:24:51
[Tue Apr 19 14:24:52 2022]
Error in rule pangolearn:
jobid: 0
output: /tmp/tmpz2w2ggj4/lineage_report.pass_qc.csv
RuleException:
AttributeError in line 112 of /root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/scripts/pangolearn.smk:
'DecisionTreeClassifier' object has no attribute 'n_features_'
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/scripts/pangolearn.smk", line 112, in __rule_pangolearn
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/pangolin/pangolearn/pangolearn.py", line 170, in assign_lineage
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 922, in predict_proba
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 395, in _validate_X_predict
File "/root/miniconda3/envs/pangolin_test/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Exiting because a job execution failed. Look above for error message
Exiting because a job execution failed. Look above for error message
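The traceback shows the installed scikit-learn 0.23.1 code reading `n_features_` from the unpickled model, which the object does not have. My assumption (not confirmed in this thread) is that models pickled with scikit-learn 1.0.x store the feature count under `n_features_in_` instead. A small diagnostic sketch, with a hypothetical helper name, to see which attribute a loaded model carries:

```python
def feature_count_attribute(model):
    """Return (attribute name, value) for whichever feature-count field exists.

    Checks the newer name first, then the older one. Both attribute names are
    taken from scikit-learn; which one a given pickle carries is the assumption
    being tested here.
    """
    for attr in ("n_features_in_", "n_features_"):
        value = getattr(model, attr, None)
        if value is not None:
            return attr, value
    raise AttributeError("model has neither n_features_in_ nor n_features_")
```

This only helps to diagnose the mismatch. It does not make the model safe to use with a different scikit-learn version; matching the installed version to the one used for training remains the fix discussed in the comments.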