ValueError in notebooks/proteomics_data_relaxations.ipynb cell 17 line 1. #1

Closed
Soratake-HirotakaYajima opened this issue Oct 24, 2023 · 2 comments · Fixed by ginkgobioworks/geckopy#13

@Soratake-HirotakaYajima

Hello.

I got a ValueError in notebooks/proteomics_data_relaxations.ipynb, cell 17, line 1.

I checked the Excel file, and there is no MW row in sheet 'EV9-AbsoluteMassFractions-2' of msb20209536-sup-0010-datasetev9.xlsx.

Perhaps the MW data is missing.

Could you tell me what causes the error below?

Best regards.


ValueError Traceback (most recent call last)
/Users/userName/geckopy/simult_supp_material/notebooks/proteomics_data_relaxations.ipynb cell 17 line 1
----> 1 ev9_valid["MW"] = extract_proteins(None, prot_dict)["MW"]

File ~/.pyenv/versions/3.8.18/envs/geckopy/lib/python3.8/site-packages/geckopy/experimental/molecular_weights.py:194, in extract_proteins(model, all_proteins, key_fn)
192 df_prot["MW"] = 0
193 for index, row in df_prot.iterrows():
--> 194 df_prot.loc[index, "MW"] = _molecular_weight(row["Sequence"])
195 return df_prot[["uniprot", "protein_id", "MW", "Sequence"]]

File ~/.pyenv/versions/3.8.18/envs/geckopy/lib/python3.8/site-packages/geckopy/experimental/molecular_weights.py:95, in _molecular_weight(seq)
93 weight = sum(weight_table[x] for x in seq) - (len(seq) - 1) * water
94 except KeyError as e:
---> 95 raise ValueError(
96 f"'{e}' is not a valid unambiguous letter for proteins"
97 ) from None
99 return weight

ValueError: ''X'' is not a valid unambiguous letter for proteins

@carrascomj
Owner

Hi! Thank you for opening this issue. Unfortunately, I cannot reproduce this.

I set up a fresh virtual environment (Python 3.10), installed the latest master of geckopy,

pip install git+https://github.com/ginkgobioworks/geckopy.git

cloned this repository, and ran the notebook. The molecular weights are fetched properly, as before.

"I checked the Excel file, and there is no MW row in sheet 'EV9-AbsoluteMassFractions-2' of msb20209536-sup-0010-datasetev9.xlsx. Perhaps the MW data is missing."

The molecular weights are fetched from UniProt by the extract_proteins function, so the spreadsheet is not expected to have an MW column. The ValueError you are getting means that one of the fetched protein sequences contains an unknown amino acid "X", whose molecular weight cannot be determined. The extraction, however, runs for me without issues.
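To see which sequences would trigger this error, something like the following could help. It is only a sketch: the IDs and sequences are made up, `find_ambiguous` is a hypothetical helper and not part of geckopy, and it assumes that only the 20 standard one-letter codes count as unambiguous.

```python
# The 20 standard one-letter amino acid codes. Assumption: any other letter
# (for example "X") has no usable entry in the residue weight table.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_ambiguous(sequences):
    """Map each ID to the sorted non-standard letters found in its sequence."""
    flagged = {}
    for protein_id, seq in sequences.items():
        unknown = sorted(set(seq.upper()) - STANDARD_AMINO_ACIDS)
        if unknown:
            flagged[protein_id] = unknown
    return flagged


# Hypothetical IDs and sequences, for illustration only.
example = {"prot_ok": "MKTAYIAKQR", "prot_bad": "MKXAYIAKQR"}
print(find_ambiguous(example))  # {'prot_bad': ['X']}
```

Running this over the fetched sequences before computing weights shows which entries need a fallback.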

Possible solutions to your problem:

  1. Try reinstalling geckopy in a fresh environment and running the notebook there.
  2. Wait some time. UniProt may temporarily block an IP if there are too many requests. I don't think that's the issue here, because it should error out before returning a sequence, but this has happened to me before.
  3. I will rerun it with Python 3.8, as you did, and see whether that helps me reproduce your issue.

Let me know if there is any progress!

@carrascomj
Owner

Wait, I was on the wrong commit. I can reproduce this now.

I did not get the error on the earlier commit because it still had the MW fallback, which was later removed for some reason. I have opened a pull request for this on upstream geckopy.

Installing geckopy from the branch should work now:

pip install git+https://github.com/ginkgobioworks/geckopy.git@fix-re-mw-fallback
# cobrapy may cause some issues, so it is safer to downgrade it
pip install --upgrade "cobra==0.22.1"
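
For anyone wondering what an MW fallback means in practice, here is a rough sketch of the idea. It is not the code from the pull request: `weight_with_fallback` and its `reported_mw` parameter are hypothetical, and the weight table is deliberately tiny, with rounded average masses, just to keep the example self-contained.

```python
WATER = 18.02  # approximate average mass of water, g/mol
# Deliberately tiny table with rounded average masses (g/mol); illustrative only.
WEIGHTS = {"G": 75.07, "A": 89.09}


def weight_from_sequence(seq):
    """Sum residue masses and subtract one water per peptide bond."""
    try:
        return sum(WEIGHTS[x] for x in seq) - (len(seq) - 1) * WATER
    except KeyError as e:
        raise ValueError(
            f"{e} is not a valid unambiguous letter for proteins"
        ) from None


def weight_with_fallback(seq, reported_mw=None):
    """Use the sequence when possible, else a separately reported weight.

    reported_mw is a hypothetical argument standing in for any molecular
    weight obtained from another source.
    """
    try:
        return weight_from_sequence(seq)
    except ValueError:
        if reported_mw is None:
            raise  # nothing to fall back to, so surface the original error
        return reported_mw
```

With this shape, a sequence such as "GXA" no longer aborts the whole loop as long as some other weight is available for that protein.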

Thanks for spotting this!
