Project data is growing by the day and it's important that the team find a way to archive, share and visualise data efficiently - ideally without team members requiring additional software. See also #99
A couple of suggestions on this topic:
1) Using a Google Docs/Dropbox system to store relevant documents such as literature, protocols, summaries or spreadsheets. Within this directory there should be folders for easier browsing. Links to these files can be generated and posted on GitHub or the blog. Anyone outside the team can have view-only access, but only active members of the project are granted editing rights.
2) OSM numbering system. It is a bit tricky to keep track of the compounds made in-house IMO. A simple way of fixing this issue would be to register compounds in a master file. This could be done via Google Docs too. The spreadsheet would provide the OSM number, a batch number, project ID and chemist ID of every compound made by the group. One OSM number can have several batch numbers. Synthetic intermediates could be registered as well.
This could be an addition to the visualization options mentioned in #99
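To make suggestion 2 more concrete, here is a minimal sketch of what such a master file could look like if it were kept as a CSV export. The class names, the `OSM-X-001` style identifiers and the project/chemist IDs are all hypothetical placeholders, not the project's actual scheme; the only things taken from the suggestion above are the four columns and the rule that one OSM number can have several batches.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class BatchRecord:
    """One row of the master file (column names are illustrative)."""
    osm_number: str        # placeholder format, e.g. "OSM-X-001"
    batch: int             # 1, 2, 3... per OSM number
    project_id: str
    chemist_id: str
    is_intermediate: bool = False


class Registry:
    """In-memory stand-in for the shared spreadsheet."""

    def __init__(self):
        self._rows = []

    def register(self, osm_number, project_id, chemist_id, is_intermediate=False):
        # The next batch number is one more than the batches already
        # registered under the same OSM number.
        batch = 1 + sum(1 for r in self._rows if r.osm_number == osm_number)
        record = BatchRecord(osm_number, batch, project_id, chemist_id, is_intermediate)
        self._rows.append(record)
        return record

    def batches(self, osm_number):
        # All batches registered for one compound.
        return [r for r in self._rows if r.osm_number == osm_number]

    def to_csv(self):
        # Serialise the whole register so it can be shared as a flat file.
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BatchRecord)])
        writer.writeheader()
        for record in self._rows:
            writer.writerow(asdict(record))
        return buf.getvalue()
```

Registering the same OSM number twice gives batch 1 and batch 2, so a re-synthesis by a second chemist shows up as a new batch rather than a new compound.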
In the long run you are better off with a robust compound registration system e.g. http://www.chemaxon.com/products/compound-registration/
And @sabinllm - the current system is that each compound has an ID when it is synthesised. If two chemists make the same molecule then we may have one molecule with two IDs. However, as soon as a molecule is sent for biological evaluation we have one fixed OSM number, including a middle initial to denote the city in which it was first made. A Google sheet was found to be no good for this because we wanted structures in there and the sheet couldn't handle that many images. If there were a simple way to visualise the sd file, we would solve many of these issues, though the sd file does not capture synthesised compounds (those not yet evaluated). For that - we probably need to be ruthless with our pasting of SMILES/InChI data in the ELN, meaning all examples of those molecules could be found. But Sabin - what else is difficult about the OSM numbering system?
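As a rough illustration of why consistent InChI pasting would help: if every notebook entry carried its InChI string, molecules that ended up with two IDs could be found by a simple grouping. This is only a sketch; the notebook IDs below are made up and nothing here reflects how the ELN actually stores data.

```python
from collections import defaultdict


def group_by_inchi(entries):
    """Map each InChI string to the list of IDs recorded against it.

    `entries` is an iterable of (compound_id, inchi) pairs; this shape
    is an assumption made for the example.
    """
    groups = defaultdict(list)
    for compound_id, inchi in entries:
        groups[inchi.strip()].append(compound_id)
    return dict(groups)


def duplicates(entries):
    """Return only the structures that carry more than one ID."""
    return {inchi: ids for inchi, ids in group_by_inchi(entries).items() if len(ids) > 1}


# Hypothetical notebook IDs; the InChI strings are ethanol and methanol.
entries = [
    ("NOTEBOOK-A-12", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("NOTEBOOK-B-07", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("NOTEBOOK-A-13", "InChI=1S/CH4O/c1-2/h2H,1H3"),
]
print(duplicates(entries))
```

This works because the standard InChI string identifies the structure itself, independent of which chemist or notebook recorded it. It covers synthesised compounds too, whether or not they have been evaluated yet.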
On a related note about visualisation - Aaron Hart has been playing with the ChEMBL data @madgpap
@mattodd Right now there are MMV, OSM and personal lab notebook IDs. The purpose of the Google spreadsheet is to tie up all this information in one place, allowing the chemists to number their compounds in an orderly fashion, not to visualize them. This online spreadsheet is constantly up-to-date. I am aware that it is not an ideal system, but it allows everyone to input data and it's free ;) Forgive my ignorance if such a system is already in place.
@mattodd @cdsouthan The online visualization options mentioned so far, such as http://apps.ideaconsult.net:8080/ambit2/dataset/1898501?page=0&pagesize=100 or Chemicalizing this http://malaria.ourexperiment.org/osm_procedures/5402/OSM_Compound_List.html give a nice overview of the project to the general public, but do not allow for thorough data analysis and substructure searches.
In addition to these options, I would suggest placing an SDF file (managed by project administrator/s) in a shared folder which would allow everyone to download a local copy and scrutinize the database using appropriate software. Again, cloud storage might not be an ideal option, but just a temporary fix until a better online browsing alternative is found.
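For anyone without dedicated software, the data fields of a downloaded SDF can still be read with a few lines of plain Python. The sketch below relies only on the standard SDF layout (records end with a `$$$$` line, data items start with a `> <NAME>` header and end at a blank line). The field names and values in `SAMPLE` are invented for the example and are not taken from the project's file. It ignores the structure block entirely, so substructure searches would still need a proper cheminformatics toolkit.

```python
def parse_sdf_fields(text):
    """Return one dict of data fields per record in an SDF string."""
    records, fields, name = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: store what was collected and start afresh.
            records.append(fields)
            fields, name = {}, None
        elif line.startswith(">") and "<" in line:
            # Data header, e.g. "> <ID>": take the text between < and >.
            start = line.index("<") + 1
            name = line[start:line.index(">", start)]
            fields[name] = ""
        elif name is not None:
            if not line.strip():
                name = None  # a blank line closes the current data item
            else:
                # Multi-line values are joined with newlines.
                fields[name] = f"{fields[name]}\n{line}" if fields[name] else line
    return records


def find(records, field, value):
    """Records whose `field` equals `value` exactly."""
    return [r for r in records if r.get(field) == value]


# Made-up two-record file with empty structure blocks.
SAMPLE = """\
example-1
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
EX-1

> <SMILES>
CCO

$$$$
example-2
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ID>
EX-2

> <SMILES>
CO

$$$$
"""

records = parse_sdf_fields(SAMPLE)
print(find(records, "ID", "EX-2"))
```

For a real file, replace `SAMPLE` with the contents of the downloaded copy, e.g. read via `open(path).read()`.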
I've created a new repo for this project/discussion here.
Have added @egonw and @madgpap to https://github.com/OpenSourceMalaria/visualisation - @sabinllm, do you want to be added too? @miike - is that how this works, that I need to add people so that they see posts there?
Seems to me the public sdf, followed by methods of visualising it, is the way to go. It would also seem that this could take care of any compound numbering system we wished to employ.
@mattodd please do!
Closing issue because of action in more recent Issues linked above.