
arschat/dcp_to_tier1


DCP spreadsheet to Tier 1 metadata

This project converts metadata from the Human Cell Atlas Data Platform schema (formerly known as DCP, Data Coordination Platform) to the Human Cell Atlas Tier 1 schema.

The conversion runs in two steps. First, the DCP metadata spreadsheet is converted to an intermediate flat CSV file (stored in flat_dcp). Then the flat file is converted to Tier 1 using the field mapping specified in metadata_dict.py.
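As a rough illustration of the second step, the sketch below renames columns of a flat table using a dictionary. The mapping entries, the column names, and the flat_dcp_to_tier1 function are hypothetical; the real mapping lives in metadata_dict.py and may be structured differently. Only the standard library is used here to keep the example self-contained.

```python
import csv
import io

# Hypothetical mapping from flat DCP column names to Tier 1 field names.
# The real mapping is defined in metadata_dict.py; these entries are illustrative only.
DCP_TO_TIER1 = {
    "donor_organism.biomaterial_core.biomaterial_id": "donor_id",
    "donor_organism.sex": "sex",
    "specimen_from_organism.biomaterial_core.biomaterial_id": "sample_id",
}


def flat_dcp_to_tier1(rows, mapping):
    """Keep only mapped columns and rename them to their Tier 1 names.

    Mapped columns that are missing from a row are filled with an empty string.
    """
    return [
        {tier1: row.get(dcp, "") for dcp, tier1 in mapping.items()}
        for row in rows
    ]


# A tiny in-memory stand-in for a flat DCP CSV file.
flat_csv = io.StringIO(
    "donor_organism.biomaterial_core.biomaterial_id,donor_organism.sex,unmapped_column\n"
    "donor_1,female,ignored\n"
)
tier1_rows = flat_dcp_to_tier1(list(csv.DictReader(flat_csv)), DCP_TO_TIER1)
print(tier1_rows)
```

Columns without a mapping entry are dropped, and Tier 1 fields with no source column are left empty so they can be filled in by hand.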

Usage

To convert metadata, run the two notebooks in this sequence:

  1. flatten_dcp_metadata.ipynb creates a flattened CSV version of the dcp_spreadsheet at the cell_suspension/library level.
  2. flat_dcp_to_tier1.ipynb converts the DCP fields in the flattened CSV file to Tier 1 metadata fields (based on the field mapping specified in metadata_dict.py). It produces an Excel file ending in _Tier1.xlsx and two CSV files ending in _tier1_uns.csv and _tier1_obs.csv.
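To show what flattening to the cell suspension level means, here is a minimal sketch that joins three simplified tabs into one row per cell suspension. The tab contents, column names, and the flatten function are assumptions made for illustration; they are not taken from the notebook, and real DCP spreadsheets use longer, dotted column names.

```python
# Hypothetical, simplified stand-ins for three spreadsheet tabs.
donors = [{"donor_id": "donor_1", "sex": "female"}]
specimens = [{"specimen_id": "spec_1", "donor_id": "donor_1", "organ": "lung"}]
cell_suspensions = [
    {"cell_suspension_id": "cs_1", "specimen_id": "spec_1"},
    {"cell_suspension_id": "cs_2", "specimen_id": "spec_1"},
]


def flatten(cell_suspensions, specimens, donors):
    """Return one flat row per cell suspension with its upstream metadata attached."""
    specimen_by_id = {s["specimen_id"]: s for s in specimens}
    donor_by_id = {d["donor_id"]: d for d in donors}
    flat_rows = []
    for suspension in cell_suspensions:
        # Walk up the chain: cell suspension -> specimen -> donor.
        specimen = specimen_by_id[suspension["specimen_id"]]
        donor = donor_by_id[specimen["donor_id"]]
        flat_rows.append({**donor, **specimen, **suspension})
    return flat_rows


flat_rows = flatten(cell_suspensions, specimens, donors)
for row in flat_rows:
    print(row)
```

Each output row repeats the donor and specimen values, so two cell suspensions from the same specimen yield two rows that differ only in the suspension columns.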

In both notebooks, set file_name to the name of the DCP spreadsheet you want to convert. The spreadsheet must be located in the dcp_spreadsheets folder.
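The sketch below shows one way the three output names could be derived from file_name. The suffixes come from the description above, but the tier1_output_paths helper, the tier1_output folder, the example file name, and the assumption that the input file's stem is reused are all hypothetical; check the notebook for the actual behaviour.

```python
from pathlib import Path


def tier1_output_paths(file_name, out_dir="tier1_output"):
    """Build the three output paths for a given input spreadsheet name.

    out_dir is a hypothetical folder name, not one defined by this repository.
    """
    stem = Path(file_name).stem  # "my_project.xlsx" -> "my_project"
    out = Path(out_dir)
    return {
        "excel": out / f"{stem}_Tier1.xlsx",
        "uns": out / f"{stem}_tier1_uns.csv",
        "obs": out / f"{stem}_tier1_obs.csv",
    }


paths = tier1_output_paths("my_project.xlsx")
for kind, path in paths.items():
    print(kind, path.name)
```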

Requirements

The packages needed for these notebooks are listed in the requirements.txt file. To install via pip use:

pip install -r requirements.txt

Known limitations

  • flatten_dcp_metadata.ipynb
    • Tested only on a simple experimental design (Donor organism -> Specimen from organism/Sample -> Cell suspension/Library -> Analysis file & Sequence file)
    • No support for "Spatial transcriptomics" data
  • flat_dcp_to_tier1.ipynb
    • Will not populate Tier 1 fields at the cell level (cell type related fields)
    • Some automations that conditionally map DCP values to Tier 1 are not yet implemented:
      • institute
      • sample_collection_relative_time_point
      • cell_enrichment is not ontologised
      • sample_collection_year is not generalised to year
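As an example of what one of the missing automations might look like, the sketch below reduces a collection date to its year. It assumes the source value is an ISO-style date string (YYYY, YYYY-MM, or YYYY-MM-DD); that format, the collection_year function name, and the choice to return an empty string for anything else are assumptions, not behaviour of the notebook.

```python
import re

# Matches YYYY, YYYY-MM, or YYYY-MM-DD, with optional surrounding whitespace.
_DATE_PATTERN = re.compile(r"\s*(\d{4})(?:-\d{2}(?:-\d{2})?)?\s*$")


def collection_year(value):
    """Return the four-digit year from an ISO-style date, or "" if it does not match."""
    match = _DATE_PATTERN.match(str(value))
    return match.group(1) if match else ""


for raw in ["2019-05-17", "2019-05", "2019", "unknown"]:
    print(repr(raw), "->", repr(collection_year(raw)))
```

Returning an empty string for unrecognised values keeps the field visibly unfilled instead of guessing a year.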
