mangled gene names with matlab-incompatible characters #11

Open
exaexa opened this issue May 16, 2024 · 0 comments
exaexa commented May 16, 2024

We've now seen a case where model gene product IDs that contain "special" characters such as - or ( get mangled by cobratoolbox, which replaces each such character with its decimal ASCII code wrapped in double underscores (for example, - becomes __45__). As a result, a curation report showed the following differences between the gene product IDs in the report and those in the model:

julia> setdiff(genes(model), genes_report)       # gene IDs that are in fbc_curation_matlab report but not in the model
13-element Vector{Any}:
 "G_YBR058C__45__A"
 "G_YCL005W__45__A"
 "G_YCR024C__45__A"
 "G_YDR322C__45__A"
 "G_YEL017C__45__A"
 "G_YER060W__45__A"
 "G_YHR001W__45__A"
 "G_YHR039C__45__A"
 "G_YLL018C__45__A"
 "G_YML081C__45__A"
 "G_YOL077W__45__A"
 "G_YPL096C__45__A"
 "G_YPR170W__45__B"

julia> setdiff(genes_report, genes(model))   # gene IDs in the model that are not in the report
13-element Vector{Any}:
 "G_YBR058C-A"
 "G_YCR024C-A"
 "G_YDR322C-A"
 "G_YEL017C-A"
 "G_YER060W-A"
 "G_YHR001W-A"
 "G_YHR039C-A"
 "G_YML081C-A"
 "G_YCL005W-A"
 "G_YOL077W-A"
 "G_YLL018C-A"
 "G_YPL096C-A"
 "G_YPR170W-B"

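The two listings suggest that each - (ASCII code 45) in a model ID shows up as __45__ in the report. As a minimal sketch of undoing that, assuming the encoding is always "__<decimal ASCII code>__" (the function name demangle_gene_id is hypothetical and not part of any existing tool):

```python
import re

# Assumption: each special character is encoded as "__<decimal ASCII code>__",
# as in "G_YBR058C__45__A" for "G_YBR058C-A".
_MANGLED = re.compile(r"__(\d+)__")


def demangle_gene_id(gene_id: str) -> str:
    """Replace every "__NN__" group with the character whose code is NN."""
    return _MANGLED.sub(lambda m: chr(int(m.group(1))), gene_id)


mangled = ["G_YBR058C__45__A", "G_YPR170W__45__B"]
restored = [demangle_gene_id(g) for g in mangled]
print(restored)  # ['G_YBR058C-A', 'G_YPR170W-B']
```

Note that this blind decoding would also rewrite an ID that legitimately contains a "__NN__" sequence, so it is only safe when the model has no such IDs.
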
Technically this is an easy fix (the curators "just" walk through the output CSVs manually and replace the mangled representations with the original IDs), but it would be great to have an automated tool for this. Or at least to have a warning printed, so that the users know that either

  • their model should have the - characters removed to work perfectly
  • they need to fix the reports manually
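
Both the warning and the report fix-up could be driven from the model's own gene IDs. The following is a rough sketch in Python, not an existing tool: it assumes that only letters, digits and underscores survive unchanged and that everything else becomes "__<decimal ASCII code>__". The function names, the column names and the comma delimiter are all illustrative assumptions.

```python
import csv
import io
import re
import warnings

# Assumption: only letters, digits and underscores survive unchanged; anything
# else is rewritten as "__<decimal ASCII code>__" (e.g. "-" -> "__45__").
_UNSAFE = re.compile(r"[^A-Za-z0-9_]")


def mangle(gene_id):
    """Mimic the assumed encoding of a single gene ID."""
    return _UNSAFE.sub(lambda m: f"__{ord(m.group(0))}__", gene_id)


def build_fixup_table(model_gene_ids):
    """Map each mangled ID back to its original; warn if any ID is affected."""
    table = {mangle(g): g for g in model_gene_ids if _UNSAFE.search(g)}
    if table:
        warnings.warn(
            f"{len(table)} gene IDs contain characters that will be mangled "
            f"in the reports, e.g. {sorted(table.values())[0]!r}"
        )
    return table


def fix_report(csv_text, table, delimiter=","):
    """Return the report text with mangled cells replaced by the original IDs."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        writer.writerow([table.get(cell, cell) for cell in row])
    return out.getvalue()


# Hypothetical two-column report, for illustration only.
table = build_fixup_table(["G_YAL001C", "G_YBR058C-A"])
report = "gene,status\nG_YAL001C,optimal\nG_YBR058C__45__A,optimal\n"
print(fix_report(report, table))
```

Replacing only cells that exactly match a mangled form of a known model ID avoids touching unrelated values that happen to contain a "__NN__" sequence.
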

Thanks!

PS I think it would be better to fix this directly in cobratoolbox, but since they depend on this mangling because of their use of eval, I don't have much hope that a good solution exists there.

PPS The model is yeast-gem, specifically this instance: https://www.ebi.ac.uk/biomodels/MODEL2204280003#Files

cc @feiranl @rsmsheriff @ntung
