Notes from Informatics Meeting Dec 2015 #364
Comments
A quick comment about Open Babel (full disclosure: I’m a contributor to Open Babel): it is released under the GPL license. Since you are simply calling it as a web service, I think this probably does not cause any issues, but I thought I’d mention it. There are many chemical drawing packages that now have both desktop and web flavours (even ChemDraw now has a mobile version), so I think users will expect to be able to use the same chemical drawing package for all their needs. ChemDoodle Web Components (https://web.chemdoodle.com/) are already moving along these lines and there is clear overlap with the sort of things Luc has so nicely demonstrated. Released under the GNU General Public License. Similarly, Marvin JS (https://www.chemaxon.com/products/marvin/marvin-js/) offers drawing and querying tools; it requires a commercial license. Elemental (http://www.dotmatics.com/products/elemental/) is another JavaScript-based chemical drawing application, free, but I'm not sure of the licensing. Ketcher (http://lifescience.opensource.epam.com/ketcher/) is a JavaScript-based drawing package, free and open source; I think it requires a server back end to provide some of the functionality. It uses the GNU Affero General Public License. JSME is a JavaScript port of JME. JSME is released under a BSD license, but I’m not sure about JME (which is only free for non-commercial applications) and I’m not sure about the situation with respect to open source. No desktop version.
JME does not matter: it is old Java stuff, so it became useless and was replaced by JSME. ChemDoodle Web Components: we had trouble dealing with the GPL license. The companies expect that everything that "touches" the web component has to be GPL. This means that if you include any of their components on a webpage, the server should be GPL, as well as the server to which you send the request. We have therefore rewritten the JCAMP converter and written a much better JCAMP visualizer. I do hope that making a call to a web server that runs Open Babel is not an issue. @drc007 maybe you could check this. If a webpage uses a web service that calls Open Babel, should the original server also be GPL? So it could not be Windows, for example? In general I'm quite tired of the GPL license; it seems to be a lawyer problem and I don't feel like losing any time with those things. This is the reason we do everything in MIT (or BSD). To draw online we will probably go towards openChemLib 👍 It can enforce stereochemistry! You need to specify whether it is a racemate, diastereoisomers, ... (molfile V3000 enhanced stereochemistry) and it is MIT.
@lpatiny I'm not a lawyer so I can't give a ruling; I just know that these sorts of things have been an issue in the past.
@lpatiny does openChemLib support query atoms etc.?
@lpatiny Excellent. I presume it is all written in JavaScript?
Well, technically not ... it is written in Java: https://github.com/Actelion/openchemlib
@mattodd - Thank you for the excellent meeting summary. I did go back to re-read the "How-To's" that were already available on the wiki on compound number registration. There was a single line that I must previously have missed that has made searching more productive: the instructions mention checking the "Use simple text search" box. By doing that I was able to get much more relevant and focused search results. While reaction searching is still not possible, running several compound searches using SMILES or InChI made it possible to find compounds of interest as either starting materials or products. By looking at the Master Sheet, a chemist could make educated guesses at the intermediates necessary to make those particular products and then search the notebooks for those intermediates to view the experimental conditions. What I am still a little unclear on is whether we should be reproducing our experimentals in the OSM LabTrove ELN or only maintaining them in the publicly available LabArchives ELN that we are currently using. Please let me know what is preferred by the group. @lpatiny - I also wanted to thank you for your demo of the cheminfo.org tools available for use in the OSM program. I was looking over the cheminfo.org site in Firefox instead of Chrome and I noticed that under the 'Chemistry' menu selection there was a subsection called 'Parsing Data', and under that there was a choice for 'SDF 3D Plot'. Would it be possible to add the 'SDF 3D Plot' as an option to the tools arrayed in the OSM project? I have been importing the data into another program to examine it graphically, but I think it would be advantageous to have it available on the web interface. Finally, @drc007 @cdsouthan @mattodd, I am more than willing to pitch in on the informatics grant process. Please let me know what might be needed or where you may want some help.
Hi all, looks very interesting. Also, it would be really helpful to have a 'how to' document to assist in the preparation of a 'model' open entry using all of the file types suggested in this thread and others. I'd be delighted to try out any and all suggestions and am of course open to feedback. Cheers Alice
Good meeting and write-up NOBA being wowed by the @lpatiny open toolbox. I will check in with the PubChem folk about piping results into BioAssay and auto-alerts along the lines of "a new CID within 0.85 Tanimoto of our lead OSMXXXX structure was submitted by source Y this week", and possibly also "tell me when new (but non-OSM) bioassay data was added to these CIDs that are within 0.90 Tanimoto of those OSM leads". The aspect of structural alerting (beyond designs and end products) to what synthetic steps global open teams are grappling with is, AWAK, more of a challenge. I know NextMove are working on this commercially (https://www.nextmovesoftware.com/hazelnut.html), but there seem to be some open standards evolving on reaction schema. The other angle is ChemSpider SyntheticPages (http://cssp.chemspider.com/), where groups can deposit and pick-up
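To make the alert idea above a little more concrete, here is a minimal sketch of a Tanimoto threshold check in Python. It assumes fingerprints are already available as sets of "on" bit positions; the function names, the toy fingerprints and the identifiers are illustrative assumptions, not part of PubChem or any existing OSM tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def new_neighbours(lead_fp, submissions, threshold=0.85):
    """Return (identifier, similarity) pairs at or above the threshold, best first."""
    hits = []
    for identifier, fp in submissions.items():
        score = tanimoto(lead_fp, fp)
        if score >= threshold:
            hits.append((identifier, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data only: invented bit positions and placeholder identifiers.
lead = {1, 4, 7, 9, 12, 15, 18, 21}
this_week = {
    "CID-A": {1, 4, 7, 9, 12, 15, 18},  # 7 shared bits, 8 in the union
    "CID-B": {1, 4, 30, 31},            # 2 shared bits, 10 in the union
}
alerts = new_neighbours(lead, this_week)
for identifier, score in alerts:
    print(f"{identifier} is within {score:.3f} Tanimoto of the lead")
```

In a real alert the fingerprints would come from a cheminformatics toolkit or from the database's own similarity search, and the threshold (0.85 or 0.90 in the examples above) would need tuning to keep the alerts specific.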
The Open Science Prize. There's a webinar on Dec 10th - see lower down here. It's 3am Sydney time. Anyone else able to go along and see how good a fit we are and pick up tips for an application?
@mattodd My apologies for missing the meeting on the 10th (I have been away for the past several weeks and did not have the means to attend on that particular date). I am back now and trying to play catch-up. I was able to follow a brief exchange on Twitter between @cdsouthan and @drc007 regarding @aclarkxyz and his open source reaction XMDS paradigm (http://cheminf20.org/2015/11/29/reactions-in-xmds-2/). I have had some exchanges with Alex Clark and I am a big fan of his mobile applications. Might this be a direction to take with the Open Science Prize, since there is already a working model of the searchable reaction information?
Hi Chase - sorry for the delay. I'm not qualified to comment on different
MATTHEW TODD | Associate Professor THE UNIVERSITY OF SYDNEY CRICOS 00026A |
@mattodd - I know that @lpatiny was already working on a more robust ELN, but since I did not know the exact status, I wanted to propose a possible Google Hangout Meeting to discuss the status. The reason that I bring this up, as was mentioned earlier, is the clock is ticking down on time to propose an entry for the Open Science Prize. I briefly discussed the reaction searching issue with Dr. Alex Clark who has already been developing some tools along this line through his company Molecular Materials Informatics. He seems interested in contributing to the proposal, so I thought it would be best to bring in some of the more knowledgeable members of the OSM to discuss.
Hi Chase. Sounds perfect, yes. You mean this time? If so, we should try to get people together online, yes. Would you be happy to host, Chase? Available @lpatiny @drc007 @cdsouthan ? I guess at this stage we are wanting to decide on the essential features of a striking proposal - something that delivers something new and useful for health research - and something for which we can deliver a prototype as part of the first stage of the competition. I'd want to avoid too many other hypotheticals for this call (i.e. all the stuff that we'd like for OSM but don't yet have). I do think that an ELN, or something equivalent, that can identify people working on related chemistry in real time would be new and useful. i.e. a system that knows what you're working on and links you with others currently working on that science. I'm thinking (as a model) of the adverts in Gmail, but related to chemistry and therefore not as sinister.
I'm happy to take part.
I've forwarded this thread to Alexander Savelyev, who might be able to provide insights into how the Indigo toolkit might also be useful for reaction enumeration and searching. http://lifescience.opensource.epam.com/indigo/
Looks good. I'd not seen Indigo.
@mattodd Yes, I meant to list the time that you posted (http://www.timeanddate.com/worldclock/meetingdetails.html?year=2016&month=1&day=18&hour=21&min=0&sec=0&p1=240&p2=43&p3=136). I will host on Google Hangouts as you suggested. I need to go back in to figure out how to send out the link, but I will do that in the next few days.
It is OK for me. I will give you an update on the ELN on Monday.
Hello everyone. I'm happy to join, but unfortunately the proposed time is too late for me: it will be 00:00 local time in Saint Petersburg (Russia). Please let me know if I can help by giving some descriptions and notes for the Indigo toolkit.
@MedChemProf Will you post the hangouts link on here? Or did I miss a message?
Here are the slides I will present: https://docs.google.com/presentation/d/1CMbBp9jti9qQ9hG9YsTZyygFt6ok7H8K0Pa2hBnxvcE/edit?usp=sharing
Here is the link: https://plus.google.com/events/ccqgie0kscqjan5v08b8dk00f7c
It looks like I am using Hangouts on Air, which might be different from Hangouts. I did not know another way to schedule the meeting in Hangouts. Please let me know if you can see the meeting.
I'm watching you, but not sure how to activate my mic :-)
OK. No idea anyone was on.
Chase Smith, PhD | 19 Foster Street | Worcester MA 01608
Try: https://hangouts.google.com/hangouts/_/mlxvyemnvezx6bwpavznaqwbsya
Link to his slides: https://docs.google.com/presentation/d/1CMbBp9jti9qQ9hG9YsTZyygFt6ok7H8K0Pa2hBnxvcE/edit?usp=sharing
A few thoughts for "Live Searching"
Yes indeed, great scenarios. Both useful to researchers and potentially commercially valuable. The thing I like about this system is that it does not depend on people spotting connections, or on text-heavy Q&A ("How do I purify this diamine?", as in known platforms like Stack Overflow), but on a machine-based method of spotting similarity in real time.
Agreed. Setting up auto-alert triggers that have useful specificity will be one of the challenges, but tacklable nonetheless. Note that we can already select the analogues via any similarity cut against vendor sources directly in PubChem and/or likely synthetic description in patents via SureChEMBL (n.b. ZINC has just refreshed to 23 million CIDs).
We are still improving the system. Now as described during the hangout we can add the chemical structure from the reagents table as well as create the product based on the reaction scheme.
Hi @mattoddchem - could this be a suitable TSP project?
Hmm - not sure, @alintheopen. I think we'd need some chemical experimental content for this to count as a TSP project. However, we should put out a call for community volunteers. @lpatiny - for that, would you be able to create a new Issue here on OSM and write a few lines about what you are looking for in the CSS formatting, as well as what you'd need (approximately) from ELN testers? I don't understand what you mean, so I can't explain it to anyone else (which is OK), but it's important that volunteers know the amount of work involved. We can then appeal to the community.
@mattodd @lpatiny @drc007 @alintheopen @cdsouthan @aclarkxyz I created a GoogleDrive folder with the following items included regarding the OSM submission to the Open Science Prize:
For all of those interested in contributing to the proposal and project I can open the folder up to you for editing. At the moment, the shared documents are housed in my GoogleDrive account with the edit setting to 'Shared with Specific People'. I would appreciate any suggestions on how best to share this with others or all appropriately. Should this just be opened up to all for editing? Or do we have a sub-set of users? Any help appreciated and then I will change things accordingly. I was not sure how things were set up or where documents were stored when you were writing drafts for publication. Thanks in advance.
Hi Chase - great to get this started. I would set this document up with the setting "anyone with the link can view" and post the link to a new Issue here on Github so that we can have a separate discussion thread purely on this. Then share with specific people, enabling those people to edit. Is that OK as a first step?
I don't have edit access to the document, but here is some information about the technical part: the new ELN will be built on open-source MIT or BSD projects. The main projects are CouchDB (http://couchdb.apache.org/) and the visualizer (https://github.com/npellet/visualizer).
Hangout (#363) took place. Thanks to everyone for giving up their time. Timezones are a challenge, but feel free to have franchised discussions separately - but please just report back here any thoughts/ideas for those who can't make them.
(I'm pasting notes here so that anyone can correct, or supplement, or add questions. If branching action items are needed, please create those separately. Once this post remains inactive for more than a week, it can be closed.)
ChemInfo: @lpatiny outlined the many capabilities of the Cheminfo system, and some future plans related to the development of an ELN incorporating many of these features. The overarching idea is that data are stored on a server, while calculations are carried out locally on the client computer - e.g. NMR data can be stored inside an ELN, but processing can be carried out live in the browser. Cheminfo can do a number of things of immediate use for the OSM project, all the way to allowing calculations of inter-atomic distances from rendered 3D models. There is an upper size limit for browser-based activities of 50K molecules, beyond which (e.g. for visualising ChEMBL) a web server can be used.
Q: Different Chemical Drawing Packages: what if we’re all using different drawing packages? Luc mentioned that there is already a comprehensive file converter (is this Open Babel?). The mol file is the recommended type, which can be read by everything. Can this be included in future ELNs? The idea is to make sure that chemical information, from anyone, can be read. ChemDraw is still heavily embedded in labs, and people feel happy using it to generate camera-ready figures.
Q: What is the primary identifier for molecules? @cdsouthan suggested this is the PubChem CID, since it can handle complexities such as salts and ee’s. A feature @mattodd wanted to have, which would help the bench chemist, would be for a system to spot when a molecule being worked on was already associated with a PubChem ID. Could this be captured by Luc’s system?
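As a rough illustration of the "is this molecule already in PubChem?" check, the sketch below looks up CIDs from an InChIKey through PubChem's PUG REST interface. The helper names are hypothetical, and the assumption that the ELN can supply an InChIKey for each drawn structure is mine; this is not a description of how Luc's system works.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey):
    """Build the PUG REST URL that lists CIDs for a given InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


def parse_cids(payload):
    """Extract the CID list from a PUG REST JSON reply; empty list if none present."""
    data = json.loads(payload)
    return data.get("IdentifierList", {}).get("CID", [])


def find_cids(inchikey, timeout=10.0):
    """Query PubChem (network access needed) and return any matching CIDs."""
    try:
        with urlopen(cid_lookup_url(inchikey), timeout=timeout) as response:
            return parse_cids(response.read().decode("utf-8"))
    except HTTPError as error:
        if error.code == 404:  # PubChem answers 404 when nothing matches
            return []
        raise


# Offline example using a reply of the shape PubChem returns for aspirin.
example_reply = '{"IdentifierList": {"CID": [2244]}}'
print(parse_cids(example_reply))
```

An ELN could call something like `find_cids` whenever a structure is saved and show the CID next to the molecule if one comes back. Salts and stereoisomers have different InChIKeys, so whether to also search on the first (connectivity) block of the key would be a design decision.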
@MedChemProf : The Workflow Problem. At the moment it's essentially impossible to find relevant synthetic chemistry that people have carried out. @mattodd: This is a recurrent and major problem in OSM. We urgently need an ELN that is substructure-searchable. That this might be close is the only reason we’ve not carried on with the manual collection of synthetic data, which was impossibly cumbersome (though very useful for writing up papers). @MedChemProf : We need clearly defined workflows nonetheless - when people make molecules, or want to make molecules, what are the steps? (Do I have this right, Chase? We should e.g. do a flowchart, as a how-to, but maybe somewhere like the wiki?)
Responsibility for data. The data generated by the OSM consortium are by default CC-BY. They can be anywhere, physically. However, who takes responsibility for maintaining the existence of the datasets? @lpatiny 's view is that the data are kept available through there being many copies. This is likely to be the most effective strategy. But in the case of a researcher working with OSM at University X, using public ELN Y, who accepts responsibility for making the data permanently available? Should this nontrivial problem fall to the parent institution (“As an institution carrying out primary research we guarantee the permanence of the research record”) or should OSM attempt to guarantee this (perhaps through Luc’s strategy of redundancy) by becoming a legal entity in itself and sourcing funds, then using e.g. CLOCKSS? To be discussed further. (Mat: There might be other reasons why OSM should become a non-profit in its own right (longevity, ability to raise funds from more diverse sources etc.), so we could talk about this later.)
Desirable extras for Luc’s system:
i) When writing a procedure in English, for the system to understand which molecules are being discussed, i.e. an active link between table of reagents and the procedure.
ii) Automatic import of safety-related information from reagent table into the ELN.
iii) Auto-search of other resources, e.g. Chemspider synthetic pages, for related chemistry.
iv) @drc007 emphasized that though consideration of data location and provenance was important, the most important thing was further development of the front end of the cheminfo system, along the lines that is currently taking place.
Open Science Prize:
The pre-application to the Wellcome Trust by Mat, Luc and others for the MolTrakr idea (#361) is for a major undertaking. The full proposal to the WT would be written if the pre-app is approved (people here are very welcome to be part of that if it's of interest, by the way). The Open Science Prize is smaller, and relies on developing a prototype in 2016. There is a heavy emphasis on proposals that "unleash the power of open content and data to advance biomedical research". A possibility discussed at the end of the meeting between Chris Swain, Chris Southan, Luc and Mat involved a variant of the “who’s making what and where?” problem. Could we develop a tool (as part of Luc’s system, or as a component of the MolTrakr platform) that, through “knowing” what chemistry is being undertaken in a med chem project, is able to connect researchers working on the same or related chemistry in real time? If, for example, a Friedel-Crafts reaction is being carried out in Sydney, a flag would go up to say that a chemist in Indiana, who also has an open lab notebook, was working on the same chemistry earlier that day. This would create a network that is social (because we want to see what other people are doing), but built on raw data captured in lab notebooks. There is also the novel emphasis on reactions that are underway, rather than final compounds. It's also unlikely that anything proprietary could be used for this, since openness provides a strength - more open lab notebooks = more people able to work with you to enhance your research. There is an obvious possible "ResearchGate" model here for monetisation, along the lines of "27 people are working on this chemistry today! Sign up for details.", though I would not want to pursue this as part of this prize. (Chris Southan said this reminded him of Biostars. Mat: not quite - that’s more of a Stackexchange site for expertise. We’d be talking about something different that helps you find people who are doing things related to what you’re doing, using passive, behind-the-scenes searching rather than Q&A. More like the ads that appear in Gmail, without that uneasy feeling.)
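One way to picture the passive matching described above is the sketch below. It assumes each open notebook entry carries a reaction-class label and a timestamp; the `NotebookEntry` fields, the `related_activity` function and all of the data are hypothetical, and a real system would more likely compare reaction structures than text labels.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class NotebookEntry:
    # All field names are hypothetical; a real ELN would define its own schema.
    researcher: str
    location: str
    reaction_class: str  # e.g. a label assigned by a reaction classifier
    started: datetime    # timezone-aware, so labs in different zones compare correctly


def related_activity(new_entry, recent_entries, window=timedelta(hours=24)):
    """Return other researchers' entries with the same reaction class that
    started within `window` of the new entry, most recent first."""
    matches = [
        entry
        for entry in recent_entries
        if entry.researcher != new_entry.researcher
        and entry.reaction_class == new_entry.reaction_class
        and abs(new_entry.started - entry.started) <= window
    ]
    return sorted(matches, key=lambda entry: entry.started, reverse=True)


# Toy data only: invented people, places and times.
utc = timezone.utc
sydney = NotebookEntry("chemist_a", "Sydney", "Friedel-Crafts acylation",
                       datetime(2016, 1, 19, 1, 0, tzinfo=utc))
indiana = NotebookEntry("chemist_b", "Indiana", "Friedel-Crafts acylation",
                        datetime(2016, 1, 18, 15, 0, tzinfo=utc))
other_chemistry = NotebookEntry("chemist_c", "London", "Suzuki coupling",
                                datetime(2016, 1, 18, 20, 0, tzinfo=utc))
too_old = NotebookEntry("chemist_d", "Paris", "Friedel-Crafts acylation",
                        datetime(2016, 1, 5, 9, 0, tzinfo=utc))

flags = related_activity(sydney, [indiana, other_chemistry, too_old])
for entry in flags:
    print(f"{entry.researcher} in {entry.location} ran a {entry.reaction_class} "
          f"at {entry.started:%Y-%m-%d %H:%M} UTC")
```

Only the Indiana entry is flagged here: the London entry is different chemistry and the Paris entry falls outside the 24-hour window. Swapping the exact label match for a similarity score on the reaction itself would cover the "related chemistry" case.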
Immediate ways forward: Luc’s ELN will be ready for testing at the end of 2015. However, we can already start posting data in the right format for automated indexing and extraction, via posting RXL, mol and JCAMP files. I'll make a separate issue on what’s needed from the bench chemist. Done: #365.
Not discussed in this meeting:
How best to store data in the Master Sheet? Strings are important for now, but hopefully generation of molecule-related data/strings is more automated in future.
The SGC has a ChemReg system. OSM have been offered a trial. Does anyone have any time to evaluate, as a place to store biological activity data for OSM compounds?
(Unrelated, but it occurred to me as we were talking.) The HRMS calculator Luc showed works out which molecular formulae best match the HRMS peak found. Could it be adapted into an elemental analysis calculator? Given the % found from elemental analysis, and a suspected molecular formula of the analyte, which solvents/salts/water, and in what ratio, would provide the best fit to the observed data? This would automate the (sometimes rather humiliating) manual shoehorning we sometimes have to do with elemental analysis data, in which we try to make the data fit some realistic combination of molecules.
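A minimal sketch of what such an elemental analysis fit could look like follows. It is a brute-force search over a short, assumed list of solvates in 0.25-equivalent steps, scored by the sum of squared deviations in the found percentages. The function names, the candidate list and the step size are all illustrative choices, and the "found" values in the example are simulated from the formula itself, not real data.

```python
# Standard atomic masses, rounded; extend as needed.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}

# Candidate co-crystallised species (an illustrative, not exhaustive, list).
SOLVATES = {
    "H2O": {"H": 2, "O": 1},
    "CH2Cl2": {"C": 1, "H": 2, "Cl": 2},
    "EtOAc": {"C": 4, "H": 8, "O": 2},
}


def percent_composition(formula):
    """Mass percentage of each element in a formula given as {element: count}."""
    total = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / total for el, n in formula.items()}


def with_solvate(formula, solvate, equivalents):
    """Formula of the analyte plus a (possibly fractional) amount of solvate."""
    combined = dict(formula)
    for el, n in solvate.items():
        combined[el] = combined.get(el, 0) + n * equivalents
    return combined


def misfit(formula, found):
    """Sum of squared differences between calculated and found percentages."""
    calc = percent_composition(formula)
    return sum((calc.get(el, 0.0) - pct) ** 2 for el, pct in found.items())


def best_fit(formula, found, max_equiv=2.0, step=0.25):
    """Return (misfit, solvate name, equivalents) for the best single-solvate fit."""
    best = (misfit(formula, found), "none", 0.0)
    n_steps = int(round(max_equiv / step))
    for name, solvate in SOLVATES.items():
        for i in range(1, n_steps + 1):
            equiv = i * step
            err = misfit(with_solvate(formula, solvate, equiv), found)
            if err < best[0]:
                best = (err, name, equiv)
    return best


# Simulated example: "found" values generated for caffeine hemihydrate.
caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
simulated = percent_composition(with_solvate(caffeine, SOLVATES["H2O"], 0.5))
found = {el: simulated[el] for el in ("C", "H", "N")}
error, species, equivalents = best_fit(caffeine, found)
print(species, equivalents)
```

A useful version would also need to handle mixtures of two species and counter-ions, and to report every combination within the usual acceptance tolerance, so that the chemist can choose the chemically sensible one.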