Round 2 (#1) is complete! Thank you again to all who participated.
We are pleased to announce the winner of the competition is Giovanni Cincilla (@gcincilla; Molomics). A very close second place goes jointly to Willem van Hoorn (@wvanhoorn; Exscientia) and Ben Irwin/Mario Öeren/Tom Whitehead (@BenedictIrwin; Optibrium and Intellegens). We’re also adding Davy Guan (@IamDavyG; USyd) to the list of winners as the best non-company entrant.
Congratulations! We will be awarding prizes to the best model(s) from a company as well as one from the wider community. £100 each will go to Giovanni and Davy for the company and non-company winners. £50 will also go to Willem and Ben/Mario/Tom (combined) as the runners-up. The prizes will be presented at the upcoming meeting (see below). Huge thanks go to the other entrants (@mmgalushka, @jonjoncardoso, @holeung, @spadavec, @luiraym, @sladem-tox) – though you didn’t win, you’re still involved – see below!
What just happened?
The results from Round 1 may be found here. The models submitted for Round 2 are all here. These were evaluated against the actual potencies observed for these compounds, which are shown here and here. The analysis of the entries may be found here. This analysis was conducted by Ed Tse (USyd & OSM), Murray Robertson (Strathclyde), Robert Glen (Cambridge) and Mat Todd (UCL & OSM).
Analysis of the models was done as follows:

- All entries were collated into a table along with the experimental values
- For easier comparison between entries, the values were normalised between 0 and 1
- The predictions for each compound were plotted in separate column graphs to visually compare with the experimental results
- The predictions were then classified as ‘correct’ or ‘incorrect’ (and further separated into false positives and false negatives)
- Precision and recall were also calculated for each model as a further comparison
- Winners were determined based on the highest precision calculated for each model from the “Raw (>25)” analysis
n.b. The initial analysis was done with relatively tight classifications limiting the results to >2.5 uM. The following analysis was relaxed by increasing the limit to >25 uM. Both can be found in the respective tabs at the bottom of the spreadsheet.
It’s quite interesting how similar some of the training structures were to the test structures. Murray has done an analysis of this here. This is to some extent expected, given that this is a lead optimisation project, but worth noting in passing.
What’s next?
We are now moving on to the next phase of this project, which will involve:

1. The prediction of new active compounds to be synthesised here in the lab, i.e. the Validation Phase. We will be reaching out to the winners to use their models to predict 2 new compounds each. This will give us a total of 8 compounds to make (by yours truly) which will be tested for parasite killing before Christmas (fingers crossed). The idea is that these are predictions of molecules that are ideally structurally distinct from the known molecules (as much as possible), to try to identify a highly potent new active series. However, the ideal case would be for each of the 4 teams to predict one molecule that can be as similar as they like to a known, and one “moonshot” molecule that is as structurally distinct from the model as possible, while being predicted to be active according to the model.
2. Finishing off the paper describing this competition (here). All of the entrants from this round are asked to provide a brief summary of their methods (if possible) in a similar manner to what is currently in the paper for Round 1. Everyone please add your author details to the paper. This is a joint effort.
3. Running a one-day meeting on AI/ML in drug discovery, using this competition as the focus of the meeting. This will be in London in mid-January (e.g. week of Jan 20th), and details will come as soon as we’ve booked the space. Hopefully everyone can attend, and we have some money available to cover some travel expenses. More on this ASAP. If anyone would like to come but cannot make mid-January, please say.
So well done everyone, and we’re excited to synthesise the new predictions of novel actives!
Congratulations to the winners!
This type of competition / online collaboration has been very fruitful and stimulating for me. I'm learning all the different ways one could model chemical data and I'm looking forward to seeing the results from the Validation Phase!
There is a chance I might be in London at the end of January so I might get to meet some of you in person :)
Thanks everyone, this looks great! The breakdown is interesting, as some compounds were easy to predict by most models while others were more varied. I'm glad you came up with a way to compare the classification and regression models.
I'm also glad we built the model with the 'small set' only for comparison. There is clearly a lot to be learned from the master chemical list when comparing the two models.
Is there a deadline/timescale for the generative predictions yet?
I know some methods cannot generate compounds however well they can predict them.
Week of 20th Jan is looking free for the time being.
Congratulations to the winners and everyone who took part in this competition! Very interesting results, and not being a chemist, I would find it interesting to learn why some compounds were easier to predict than others.
It would be nice to meet all of you in London in mid-January. But it is difficult for me to confirm whether I would be there at this point.
Thank you Edwin, a wonderful learning experience! I can see that taking a simple approach was a gamble that didn't pay off. Congratulations to the winners! Slade…
Great fun! It's important that we also follow up with a discussion and analysis of why some strategies worked better than others, why some molecules were easier/harder than others to predict, retrospective thoughts, etc. This might work better as a separate paper, focused on computational and technical details. Please let me know what you think! For industry participants, will it be a problem discussing your strategies?
It seems we could reach useful predictive models, that's great! Congratulations to everybody!
Regarding the next steps, I reply to the main points of this thread:
Looking forward to your feedback!
Also, and this is my fault for not clarifying in my model, but it appears the wrong prediction was used in the evaluation--there is a "class" prediction that should have been used, not the raw IC50 predictions. I'm also having some confusion over the scaling method used--I was under the impression that the goal was simply to differentiate active (<1 uM) from inactive (>1 uM), and that the range of outputs wouldn't be scaled/altered. If a simple 1 uM cutoff is used for active/inactive, then you can assign classes to each of the predictions made by each model, and use the Matthews correlation coefficient (MCC; https://en.wikipedia.org/wiki/Matthews_correlation_coefficient) as a balanced measure of classification accuracy. When you do this, it appears Giovanni still did the best, but the numbers change a bit:
This was just my first pass at analyzing the data, so any clarification would be great!
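As a sketch of the MCC approach described above, the snippet below thresholds potencies at a 1 uM cutoff and computes MCC from the resulting confusion-matrix counts. The IC50 values are hypothetical, purely for illustration, not taken from any entrant's model:

```python
import numpy as np

def classify(ic50_uM, cutoff=1.0):
    """Active if the IC50 (uM) is below the cutoff."""
    return np.asarray(ic50_uM) < cutoff

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return num / den if den else 0.0

# Hypothetical experimental and predicted IC50 values (uM), illustration only
experimental = np.array([0.1, 0.4, 2.0, 30.0, 0.8, 12.0])
predicted = np.array([0.3, 1.5, 0.9, 25.0, 0.5, 8.0])

y_true, y_pred = classify(experimental), classify(predicted)
tp = int(np.sum(y_true & y_pred))
tn = int(np.sum(~y_true & ~y_pred))
fp = int(np.sum(~y_true & y_pred))
fn = int(np.sum(y_true & ~y_pred))
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")  # prints MCC = 0.333
```

(`sklearn.metrics.matthews_corrcoef` would give the same number directly from `y_true`/`y_pred`.)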
Hi @spadavec did you use the "Master List" set for our MCC value in that analysis? The small set was only for a model trained on the series 4 subset of the data. I'm surprised it looks so different to Willem's model when they were so close before.
I was under the impression it was a regression task to "predict" the activity, but I didn't read the extended discussion in much depth; I saw some content on classification in there.
Our model also has the added benefit of producing an error bar/confidence region, so some predictions will be more confident than others and we could focus in on the best predictions. I believe this will be of great importance when generating new compounds, as we can choose those most likely to succeed as a trade-off against purely high activity. Being a deep neural network, it is easy to stick a generative element on top for the next round. It was not clear to me at the outset of the project how the simpler models would generate any compounds. Now I realise it is a process of generating compounds manually and predicting their properties using the model.
We also made predictions for all of the assays individually, which I believe would be useful in selecting for activity against, say, the drug resistant strain.
It will be hard to find any one metric which shows all the pros and cons of each model.
One could plot the MCC for all (regression at least) models as a function of the cutoff value. Could also use other metrics like Cohen's kappa?
Some regression metrics such as R^2 (coefficient of determination), r^2 (Pearson) and RMSE might be interesting for further analysis on some of the models?
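A minimal sketch of the cutoff sweep and regression metrics suggested above, on hypothetical IC50 values (the data and helper names are illustrative, not from the actual analysis):

```python
import numpy as np

def mcc_at_cutoff(exp_uM, pred_uM, cutoff):
    """MCC of active/inactive calls when both sets are thresholded at `cutoff` (uM)."""
    t, p = np.asarray(exp_uM) < cutoff, np.asarray(pred_uM) < cutoff
    tp, tn = np.sum(t & p), np.sum(~t & ~p)
    fp, fn = np.sum(~t & p), np.sum(t & ~p)
    den = float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / den if den else 0.0

# Hypothetical IC50 values (uM) standing in for one model's predictions
exp_vals = np.array([0.2, 0.7, 1.5, 4.0, 10.0, 40.0])
pred_vals = np.array([0.5, 0.9, 0.8, 6.0, 7.0, 20.0])

# MCC as a function of the active/inactive cutoff
for cutoff in (0.3, 1.0, 2.5, 25.0):
    print(f"cutoff {cutoff:>5} uM: MCC = {mcc_at_cutoff(exp_vals, pred_vals, cutoff):+.2f}")

# Regression-style metrics on pIC50 (= -log10 of molar IC50)
p_exp, p_pred = -np.log10(exp_vals * 1e-6), -np.log10(pred_vals * 1e-6)
rmse = np.sqrt(np.mean((p_exp - p_pred) ** 2))
r_pearson = np.corrcoef(p_exp, p_pred)[0, 1]
print(f"RMSE = {rmse:.2f} pIC50 units, Pearson r = {r_pearson:.2f}")
```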
The proof of the pudding would be to use the model to generate a compound with suitable properties, I guess.
@holeung The normalised values were calculated (normalised value = (X-min)/(max-min), where X is the predicted value, min is 0 uM and max is 25 uM or 2.5 uM) to make it easier and quicker to compare each predicted value against the experimental. These normalised values are plotted for each compound in the separate tabs at the bottom of the spreadsheet file.
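The normalisation above can be sketched in a few lines; clipping values beyond the limits into [0, 1] is my assumption, not something stated in the analysis:

```python
def normalise(x_uM, max_uM=25.0, min_uM=0.0):
    """Min-max normalisation as described above.
    Clipping to [0, 1] is an assumption, not stated in the original analysis."""
    v = (x_uM - min_uM) / (max_uM - min_uM)
    return min(max(v, 0.0), 1.0)

# Hypothetical predicted potencies (uM) at the relaxed 25 uM limit
for x in (0.5, 2.5, 25.0, 40.0):
    print(x, "->", round(normalise(x), 3))
```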
@spadavec Apologies for the confusion about the predictions from your model. Since we are more interested in the ability to accurately predict the potency values, your IC50 predictions were used for the analysis. The tally does improve when using your class predictions, but I'm a bit confused about how these correlate to the IC50 predictions. Both class and value predictions were considered when doing the initial analysis (limiting things to <2.5 uM), but we considered the tolerance to be a bit tight, so the final analysis was relaxed to <25 uM.
I hope that clarifies things a bit.
I don't believe I did--it was a very quick and slapdash analysis, and I may have grabbed the wrong column for your entry (I remember there being 2).
I can easily do this--I'll do a work up later today with some more careful analyses (using different cutoffs, metrics, etc).
I'm not sure I see the value in this, given that the predictions could be compared directly against the experimental values--especially given that some models were classifiers, and others were linear regressions of IC50 directly. If we were comparing IC50 values directly, wouldn't we be more interested in a metric like MUE/MAE/RMSE on the unscaled predictions?
Basically, any ML model with moderate data for predicting an IC50/EC50 value will only be able to get to within ~6x potency difference on average between actual and predicted (e.g. +/- 0.7 pIC50)--even modern methods like FEP can only reach ~1 kcal/mol accuracy at best. The measurements alone in wet-lab experiments can swing as much as 5x, so given the inherent variability in potencies, I created a cutoff (I believe 300 nM) for active, with everything else inactive. In retrospect, had I known there would be different cutoffs for activity, I would definitely have taken a different approach. I was primarily interested in being able to distinguish active from inactive, and didn't care about the MUE/MAE of my pIC50 differences (which I now know matters quite a bit), especially for compounds far from the cutoff.
I can see some discussion about generating new compounds. I just want to make some comments regarding this topic, which someone may find useful. Since my knowledge of chemistry is quite limited, I apologize in advance if something I say does not make sense.
First, a little bit of history. A few years ago my company (Auromind) was involved in a project predicting properties of chemical compounds using SMILES. We were training a DNN to predict LogD. The way we approached this task was to create a variational autoencoder which can reconstruct SMILES. Then we isolated the encoder and attached an MLP instead of the decoder to learn either a classification or regression task—in our case, predicting LogD (a regression task). We trained the autoencoder and then the regressor using 1.7M SMILES from ChEMBL. The results were quite good; we wrote a paper draft but are still struggling to find time to finish it.
An autoencoder is quite an interesting DNN model. It uses an information bottleneck to compress the original SMILES (in our case to a vector of 1024 real values) and then uses this vector to reconstruct the original SMILES. This vector is often called a latent vector. In some sense, it is a chemical compound fingerprint (somewhat like those used for Tanimoto similarity).
The cool thing about the latent vector: if we add a small random "epsilon" to it (for some target SMILES), in theory the autoencoder should generate compounds very similar to the original one (since similar SMILES are closely grouped in the latent space). We didn't try this functionality, but I have seen several publications describing it. So this is something I think would be interesting to explore.
One more interesting thing we observed during our experiments: if we relax the encoder's weights during model training, it will regroup compounds (latent vectors) according to the target property, such as LogD. In my understanding, it tries to consider not only the SMILES but also the target property when adjusting the latent space. In practice, this means that if we know compound X is "active", we can use random "epsilon" perturbations to generate compounds Y1, Y2, Y3, ... which are also likely to be active. But this needs much more research to prove :)
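The "epsilon" idea can be sketched with NumPy. The encoder/decoder are deliberately omitted here; in a real pipeline each perturbed row would be passed through the trained decoder to reconstruct a SMILES string. The vector size (1024) and noise scale are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(latent, epsilon=0.05, n_samples=5, rng=rng):
    """Sample neighbours of a latent vector by adding small Gaussian noise.
    Each returned row would be fed to the trained decoder to get a SMILES
    back; the decoder itself is omitted from this sketch."""
    latent = np.asarray(latent, dtype=float)
    noise = rng.normal(scale=epsilon, size=(n_samples, latent.size))
    return latent + noise

# Hypothetical 1024-d latent vector for some target SMILES
z = rng.normal(size=1024)
neighbours = perturb(z)
print(neighbours.shape)  # prints (5, 1024)
# Distances from the original vector are ~ epsilon * sqrt(1024)
print(np.linalg.norm(neighbours - z, axis=1))
```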
We released this project on GitHub (https://github.com/mmgalushka/armchem) and also used the autoencoder to generate compound fingerprints in this competition.
As mentioned above, we launched an interactive and collaborative design approach (through Molomics technology) where anybody can participate in the optimization of OSM Series 4 compounds from anywhere, at any time.
Instructions are available in issue #24