Assigning OSM Codes to Chiral Compounds #172
Comments
mattodd added the question label Apr 7, 2014
1. I'd say yes.
2. Again, I'd say yes.
3. No, I think we decide on an ee threshold; anything below this is considered racemic.
4. Same as 3, I think, or it just becomes a minefield of numbers, and as has been shown with the PZQ project, ee is not directly proportional to activity.
cdsouthan commented Apr 7, 2014
Agreed, this is analogous to the PubChem rules. If you read our MIABE [...]
On Mon, Apr 7, 2014 at 10:26 AM, Murray Robertson [...]
So we use distinct IDs for enantiopure and rac. If a compound is suspected of being enantioenriched (because it has been prepared with an enantioselective reaction), do we assign the same number as the enantiopure material? If the enantiomeric excess has been measured, do we append a number, e.g. OSM-S-175-78? What about having OSM-S-175-E to indicate a likely but indeterminate enantiomeric excess?

No compound is ever enantiomerically pure, so shouldn't we always have some suffix describing enantiopurity, given how important this is for activity? Counter to this argument, the difference in biological activity between rac and enantiopure is rarely that significant unless one is unlucky (vs the activity of the two enantiomers), so we need to limit the effort here.

Batches are easily captured using the sample IDs used in the lab notebooks, i.e. with a second number that involves the chemist's initials. We ought to include such things in the biological screening data sheets as a separate column, but I think there's no need to make the compound ID more cumbersome with that.
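The suffix options floated in this comment can be made concrete with a short sketch. Only the `OSM-S-175-78` and `OSM-S-175-E` forms come from the comment itself; the function name `osm_code_with_ee`, its parameters, and the rounding to a whole percent are assumptions for illustration, not an agreed project convention.

```python
from typing import Optional


def osm_code_with_ee(base_code: str,
                     ee_percent: Optional[float] = None,
                     enantioselective_prep: bool = False) -> str:
    """Return a compound code with an enantiopurity suffix (hypothetical scheme).

    - measured ee            -> base code plus the ee rounded to a whole percent
    - unmeasured, asymmetric -> base code plus "E" (likely but indeterminate ee)
    - otherwise              -> base code unchanged
    """
    if ee_percent is not None:
        if not 0 <= ee_percent <= 100:
            raise ValueError("ee must be between 0 and 100 percent")
        return f"{base_code}-{round(ee_percent)}"
    if enantioselective_prep:
        return f"{base_code}-E"
    return base_code
```

For example, `osm_code_with_ee("OSM-S-175", 78)` gives `OSM-S-175-78`, and `osm_code_with_ee("OSM-S-175", enantioselective_prep=True)` gives `OSM-S-175-E`.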
I think, in terms of enantioenriched materials, we should only describe things as a single enantiomer if they are, say, >95% or >90% ee; otherwise we should view them as racemic, as we can't comment on them with any degree of accuracy. I'm not sure that adding another letter, as in 'OSM-S-175-E', is a good idea. I think it might confuse rather than clarify.
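As a sketch of the threshold idea: ee is the excess of the major enantiomer over the minor one, expressed as a percentage of the total, and a sample is described as a single enantiomer only when its measured ee exceeds a cut-off. The function names and the default cut-off of 95% are assumptions (the comment leaves the choice between >95% and >90% open). Unmeasured samples are treated as racemic here, following the reasoning in the comment.

```python
from typing import Optional


def ee_from_amounts(major: float, minor: float) -> float:
    """Enantiomeric excess in percent from the amounts of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * abs(major - minor) / total


def classify_by_ee(ee_percent: Optional[float], threshold: float = 95.0) -> str:
    """Label a sample using a strict ee cut-off (threshold value is an assumption)."""
    if ee_percent is not None and ee_percent > threshold:
        return "single enantiomer"
    # Below the cut-off, or never measured: treat as racemic.
    return "racemic"
```

With these definitions, a 99:1 mixture has an ee of 98% and is labelled a single enantiomer at either cut-off, while a sample made by an asymmetric route but never measured is still labelled racemic.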
Another point - am I right in thinking that SMILES for enantiomers are different, while InChI is the same? We want to avoid the situation where someone searches one of the project-related spreadsheets for a molecule with a defined stereocentre and misses the racemate.
I think it depends how they are generated. Both have the ability to assign stereocenters, but it's maybe not always the case that they do.
madgpap commented Apr 11, 2014
Both representations distinguish between enantiomers and non-chiral/racemate versions. See for example https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL175 vs. https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL521
madgpap closed this Apr 11, 2014
madgpap reopened this Apr 11, 2014
OK, so we need not worry about strings if the databases are happy with searches where the stereocentre configuration is variable (i.e. a search for a structure with an undefined stereocentre returns search results that include racemates and enantiopure compounds).
The scientifically appropriate thing to do with such compounds is to classify them as racemic (as Alice suggests), since highly enantioenriched compounds are special. But when one is searching for examples of enantioenriched compounds, one presumably wants to see all the examples where people have attempted to generate an ee as well (including examples where people obtain modest enantioenrichment), meaning it would be more useful to classify a "hopeful" ee along with the "known" enantioenriched samples by giving them the same codes.
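One way to get the stereo-agnostic search described at the start of this comment in a project spreadsheet, offered as a sketch and not as a description of how any database implements it: a standard InChIKey has hyphen-separated blocks, and the first 14-character block encodes the connectivity only, while stereochemistry goes into the second block. Enantiomers and the racemate therefore share the first block. The codes and keys below are invented placeholders, not real identifiers, and the function names are hypothetical.

```python
from collections import defaultdict


def skeleton_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def group_by_skeleton(records):
    """Group (code, inchikey) pairs by their shared connectivity block."""
    groups = defaultdict(list)
    for code, key in records:
        groups[skeleton_block(key)].append(code)
    return dict(groups)


# Placeholder records: codes and keys are made up for illustration only.
records = [
    ("CODE-RAC", "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),
    ("CODE-R", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("CODE-S", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("CODE-OTHER", "ZZZZZZZZZZZZZZ-UHFFFAOYSA-N"),
]
```

Here `group_by_skeleton(records)` puts `CODE-RAC`, `CODE-R` and `CODE-S` in one group, so a search keyed on the first block would not miss the racemate when given a defined stereocentre.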
mattodd referenced this issue Jun 29, 2014
Closed: Racemic synthesis of MMV669844 (4-(5-(2-(3,4-difluorophenyl)-2-methoxyethoxy)-[1,2,4]triazolo[4,3-a]pyrazin-3-yl)benzonitrile) #166
mattodd closed this Jun 29, 2014
mattodd reopened this Jun 29, 2014
Referenced in the agenda of the July 2014 meeting.

To my mind it still seems neatest to have a suffix for chiral compounds. This preserves differences of code for rac, scalemic and enantiopure but maintains an obvious connection between the samples. Thus we'd be able to have all these compounds grouped in the same compound page for OSM-S-208. Isn't that what you'd like to see when you go here: http://malaria.ourexperiment.org/osm_procedures/9907/OSMS208.html

Compounds synthesised non-racemically could be assigned the E code until they are measured, at which point they could be assigned the other codes. We could, if people wanted, assign a threshold of e.g. 80% ee that distinguishes racemic from enantiopure. That's less important in my view.

Would we be committing informatics-cide by having some codes longer than others, @cdsouthan?
cdsouthan commented Jun 29, 2014
Hmm, JFTR, I don't accept the mantle of "database policeman" (it's futile anyway), but I can merely state what empirically makes findability and searchability in support of this project easy or difficult from where I sit. I thus suggest that code-splitting/forking via suffixes or any other extension in your public identifiers is not a good idea (yet), since there is no precedent for e-codes. It is simpler just to stick to "flat" or R/S and E/Z resolved, as best represents and fits the analytical data. They would also get InChIs for what you have actually made (and someone else could make), and all the isomers are x-mapped via PubChem "same connectivity" anyway.

Internally (but still "open" in the Google sense) you can obviously do whatever has utility in your internal registration system, e.g. adding synthesis batch nos and ee numbers as code extensions. However, I suggest they only become valuable externally if they robustly split the bioassay data (e.g. your ee batches give significantly different IC50s). If this proves to be so, it does then raise the interesting precedent of splitting the external ID via suffix codes in the same way, but also complications for public assay result mapping with the "same" structures (but you could add the code in the SID records).
The only problem with the "E" suffix is: what if the compound has a double bond of unknown configuration, or a mixture of configurations? I accept this is not likely, but it is possible.
Perfectly reasonable point, yes, but the issue is whether suffices are [...]
Again, there will rarely be times when this is crucial, given the likely [...]
On 29 June 2014 13:29, Murray Robertson notifications@github.com wrote: [...]
MATTHEW TODD | Associate Professor THE UNIVERSITY OF SYDNEY CRICOS 00026A |
cdsouthan commented Jun 30, 2014
It's not really a cost issue; it's more that you just need to think out how your internal registration rules can mesh with the external chemistry rules (i.e. in PubChem). As said, you can internally suffix and fork off your "core" codes as much as you like, with as many layers as you like. It's just that when you go external I'd be circumspect about splitting database submissions with suffixes, since they will merge to the same CIDs by default (n.b. you'll have to fit ChEMBL rules first and then PubChem rules). JFTR, here is an example of antimalarial assay data being split by SID: https://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=52949030&loc=ec_rcs
OK, so wait - in ChEMBL rac and enantiopure (above a threshold of ee) have [...]
In OSM most things are external. We have codes for batches, from the lab [...]
On 30 June 2014 18:22, cdsouthan notifications@github.com wrote: [...]
mattodd commented Apr 7, 2014
We need to assign OSM codes to inherited compounds that currently only have MMV codes, since compounds in the project need to have their data collected together. I just did the first:
http://malaria.ourexperiment.org/osm_procedures/9557/Preparation_of_OSMS175.html
But in doing the second (MMV669844) I hit a snag. Raising these questions:
These questions are relevant here since MMV669844 was prepared (by a CRO) with an asymmetric reaction, so is shown with the expected stereocentre, but the ee was not measured. How do we number this? I'm assuming the answer is that we give unique numbers to enantiomers and racemates, and simply include asterisks when we're not sure of the ee, but I'd be interested to hear from @cdsouthan @madgpap and @murrayfold who will have dealt with this in the past.
This paper (http://www.jcheminf.com/content/4/1/11) adopts the molecule-substance-batch approach, so we could use OSM-S-XXX-Y-Z if we had to, but we've not been doing this to date.
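To illustrate what an OSM-S-XXX-Y-Z code would carry, here is a minimal parsing sketch. It assumes that XXX is the molecule number, Y the substance and Z the batch, that all three are plain integers, and that Y and Z are optional so existing codes such as OSM-S-175 still parse. None of this is an agreed format, and the names `OsmCode` and `parse_osm_code` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: OSM-S-<molecule>[-<substance>[-<batch>]], digits only.
CODE_RE = re.compile(r"^OSM-S-(\d+)(?:-(\d+))?(?:-(\d+))?$")


class OsmCode(NamedTuple):
    molecule: int
    substance: Optional[int]
    batch: Optional[int]


def parse_osm_code(code: str) -> OsmCode:
    """Split a code into molecule, substance and batch parts."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognised OSM code: {code!r}")
    molecule, substance, batch = match.groups()
    return OsmCode(
        int(molecule),
        int(substance) if substance is not None else None,
        int(batch) if batch is not None else None,
    )
```

Keeping the molecule number as the first field means all substances and batches can still be collected under the same parent code, e.g. `parse_osm_code("OSM-S-175-2-1").molecule` is `175`.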