A crowdsourcing/expert curation platform for metadata categorization
The Metadata Categorization web application allows expert biological indexers to curate a sample set drawn from NCBI BioSample data. The purpose of the annotation tool is to identify terms in the user submission that can be correlated to controlled terms from the Cell Line Ontology. In the best case, the annotator can identify the correct cell line from submitted data and then additional information about the submitted data such as cell type, tissue, and disease can be populated from the Cell Line Ontology.
Solr-based Data Sources
This project depends on two Solr cores generated for this purpose which contain the relevant data from BioSample and the Cell Line Ontology. The Solr core generated from BioSample will store the user-submitted data extracted from BioSample as well as the annotations supplied by the curators.
The back-end uses Django, a high-level Python web framework, to handle URL routing, aggregating individual records from BioSample into summary records, as well as reading from and writing to Solr.
The front-end uses the free version of Handsontable, an Excel-like composite spreadsheet component. Handsontable provides built-in editing functionality for tabular data. Dialogs and certain other features use jQuery and jQuery UI.
- Issue 3: Incorporate a recommendations feature generated from minimal text processing to provide annotators with a "best guess" at the cell line to facilitate the annotation task.
- Issue 6: Incorporate a search feature into the web interface so annotators can look up controlled terms from the Cell Line Ontology.
- Issue 7: Add "Recently Used" shortcut buttons to the web interface so that annotators can quickly access frequently used annotation terms
Individual records for "HEK293T" source cell line, shown upon clicking the "+" button at left in the HEK293T summary record. Note the user-annotated value for "Tissue" field in green. Unannotated (i.e. source) values are displayed in red.