Update methods.md
ekiernan committed Mar 10, 2020
1 parent 98b5450 commit bc72e6d
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions content/releases/2020-mar/methods.md
@@ -5,15 +5,17 @@ title: "Methods"
draft: true
---

-# HCA March 2020 Release Methods
+# March 2020 Release Methods
## Overview
This document details the Human Cell Atlas (HCA) Data Coordination Platform’s (DCP) methods for cell clustering, differential expression analyses, and data visualization used in the HCA March 2020 Data Release. Overall, 13 individual DCP projects were stratified into 24 datasets by organ, developmental stage and sample processing technology. Unnormalized count matrices for each dataset were uploaded into the cloud-based platform [Terra](app.terra.bio) and analyzed using Cumulus (v0.13.0), a single-cell analysis workflow ([Li et al. 2019](https://www.biorxiv.org/content/10.1101/823682v1)). All Release files are available for download on the [main Data Release Page](data.humancellatlas.org/explore/releases/2020-mar)

A step-by-step tutorial for replicating the release analyses is provided in the [Release documentation](data.humancellatlas.org/releases/2020-mar/replicating-the-release-analysis). Additionally, you can find techniques for manipulating and interacting with release files in the [Working with Release Files guide](data.humancellatlas.org/releases/2020-mar/working-with-release-files).


## Project stratification and unnormalized count matrix preparation
-The March 2020 Release includes all human DCP projects that were processed with DCP standardized pipelines (Optimus or Smart-seq2). Each project was stratified into individual datasets by organ and when applicable, by developmental stage (adult or fetal) or by sample processing technology (10x or Smart-seq2). Unnormalized gene count matrices (loom files) for the stratified Release datasets were obtained by filtering projects on the [DCP Data Portal](data.humancellatlas.org/explore/) using the stratification criteria above. The metadata in each count matrix file was then modified to include new ontology labels and corrections to existing ontology labels. Additionally, due to a processing error, all EmptyDrops data was removed from files produced with the Optimus pipeline. All updated unnormalized matrix files (loom format) used for the March 2020 Release are available for download under the Release Files column of the [DCP Portal’s Data Release page](data.humancellatlas.org/explore/releases/2020-mar) (see the Pipeline Input files section).
+The March 2020 Release includes all human DCP projects that were processed with DCP standardized pipelines (Optimus or Smart-seq2). Each project was stratified into individual datasets by organ and when applicable, by developmental stage (adult or fetal) or by sample processing technology (10x or Smart-seq2). Unnormalized gene count matrices (loom files) for the stratified Release datasets were obtained by filtering projects on the [DCP Data Portal](data.humancellatlas.org/explore/) using the stratification criteria above.
+
+The metadata in each count matrix file was modified to include new ontology labels and corrections to existing ontology labels. Additionally, due to a processing error, all EmptyDrops data was removed from files produced with the Optimus pipeline. All updated unnormalized matrix files (loom format) used for the March 2020 Release are available for download under the Release Files column of the [DCP Portal’s Data Release page](data.humancellatlas.org/explore/releases/2020-mar) (see the Pipeline Input files section).

## Dataset IDs
Each dataset was given a unique ID with a “2020-Mar...” prefix. All dataset IDs are listed on the March 2020 Release page in the “Dataset” column. This Dataset ID was used to name all input and output files relevant to each dataset.
@@ -125,7 +127,7 @@ All final annotated expression matrices are available for download on the DCP [M


## Final March 2020 Release files
-The following table describes the final Release files available in the DCP Release page and in interactive portals. Files with a “.scp” demarcation are Single Cell Portal-specific files and are only available on the Single Cell Portal study page (see the [Single Cell Portal March 2020 Release Page](https://singlecell.broadinstitute.org/single_cell?scpbr=human-cell-atlas-march-2020-release). All other files, including the output loom, h5ad, and differential expression xlsx and CSV files are available for download on the dataset-specific pages listed on the [DCP Release page] (data.humancellatlas.org/explore/releases/2020-mar). All output file names start with the Dataset ID, the unique ID given to each release dataset and is listed on the DCP Release page.
+The following table describes the final Release files available in the DCP Release page and in interactive portals. Files with a “.scp” demarcation are Single Cell Portal-specific files and are only available on the Single Cell Portal study page (see the [Single Cell Portal March 2020 Release Page](https://singlecell.broadinstitute.org/single_cell?scpbr=human-cell-atlas-march-2020-release). All other files, including the output loom, h5ad, and differential expression xlsx and CSV files are available for download on the dataset-specific pages listed on the [DCP Release page](data.humancellatlas.org/explore/releases/2020-mar). All output file names start with the Dataset ID, the unique ID given to each release dataset and is listed on the DCP Release page.

| File name | File location: DCP) and/or SCP | Description | Format |
| --- | --- | --- | --- |
@@ -158,3 +160,4 @@ Techniques for uploading loom and h5ad files into common analysis software are d



