Searching for CO returns methanol. #41

Closed
PeterKraus opened this issue Mar 24, 2022 · 2 comments

@PeterKraus

What is the search string

>>> import chemicals
>>> chemicals.identifiers.search_chemical("CO")
<ChemicalMetadata, name=methanol, formula=CH4O, smiles=CO, MW=32.0419>

Which chemical in the database do you believe should be found?

>>> chemicals.identifiers.search_chemical("carbon monoxide")
<ChemicalMetadata, name=carbon monoxide, formula=CO, smiles=[C-]#[O+], MW=28.0101>

Perhaps a toggle to prefer searching by formulas over smiles first should be added?

@CalebBell
Owner

Hi Peter,
I always strongly recommend against searching by formula, because formulas are NOT unique. If you really want to find a chemical by formula, you can do so as follows:

import chemicals
chemicals.identifiers.pubchem_db.search_formula('CO', autoload=True)

Be aware that a different chemical could be returned for the same formula in the future, although I'm not aware of another chemical with that formula in this case.

For example, C12H26 has 347 compounds in thermo.
Sincerely,
Caleb
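A toy illustration of the non-uniqueness point above. The table is hand-written for this example and is not taken from the chemicals database; `FORMULA_TO_NAMES`, `candidates_for_formula`, and `unique_match` are hypothetical names. It is only a sketch of why a formula lookup needs to handle zero, one, or many matches:

```python
# Toy isomer table, hand-written for illustration (not the chemicals database).
FORMULA_TO_NAMES = {
    "C2H6O": ["ethanol", "dimethyl ether"],
    "C4H10": ["butane", "isobutane"],
    "CO": ["carbon monoxide"],
}


def candidates_for_formula(formula):
    """Return every name stored for a formula (hypothetical helper)."""
    return list(FORMULA_TO_NAMES.get(formula, []))


def unique_match(formula):
    """Return the single match, or raise if the formula is ambiguous or unknown."""
    names = candidates_for_formula(formula)
    if len(names) != 1:
        raise LookupError(f"{formula!r} matches {len(names)} entries: {names}")
    return names[0]
```

With this table, `unique_match("CO")` succeeds, while `unique_match("C2H6O")` raises because two isomers share the formula.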

@PeterKraus
Author

PeterKraus commented Mar 25, 2022

I completely understand the issues for higher alkanes etc. which have many isomers, and indeed there's no reasonable way of defining a cut-off.

I think it's worth considering whether the current priority of "formula after smiles" is a good one in general, especially given that one can already search by smiles explicitly using the smiles= prefix.

In my particular case (catalysis), nobody is going to call CO by the full name anywhere in their data tables. chemicals does a great job at disambiguating stuff like C3H6, propylene, propene to ensure it's the same molecule (smiles=CC=C); however, I'd argue that CO and methanol / MeOH should never both evaluate to smiles=CO, as it's super unexpected (although technically correct).

Cheers!
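The priority question raised above can be sketched with a toy resolver. This is not how chemicals implements `search_chemical`; `RECORDS`, `resolve`, and the `order` parameter are hypothetical, and the two records simply reuse the values quoted earlier in this thread. The sketch only shows how the order in which fields are tried decides what "CO" resolves to:

```python
# Two toy records, using the name/formula/SMILES values quoted in this thread.
RECORDS = [
    {"name": "methanol", "formula": "CH4O", "smiles": "CO"},
    {"name": "carbon monoxide", "formula": "CO", "smiles": "[C-]#[O+]"},
]


def resolve(query, order=("name", "smiles", "formula")):
    """Return the first record with a field equal to `query`.

    Fields are tried in the sequence given by `order` (a hypothetical
    parameter), so an earlier field wins over a later one.
    """
    for field in order:
        for record in RECORDS:
            if record[field] == query:
                return record
    raise LookupError(f"no match for {query!r}")


smiles_first = resolve("CO")  # SMILES is tried before formula
formula_first = resolve("CO", order=("name", "formula", "smiles"))
```

Here `smiles_first` is the methanol record and `formula_first` is the carbon monoxide record, which mirrors the surprise described in the issue.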
