Projects are a basic unit of data organization in the HCA Data Coordination Platform (HCA DCP). Project contributors submit raw sequencing data and associated files, along with rich metadata describing:
- the origin and type of the cells used in the project
- the processes and protocols used to collect and process the cells prior to sequencing
- the sequencing methods used
- details about the project contributors and their institutions
Finding a Project of Interest
The HCA Data Explorer lists all projects on its home page along with key project metadata. The project list is filterable by the metadata values.
Viewing Project Details
Selecting a project title on the project list takes you to the project's detail page.
The project detail page contains:
- the project title and description
- contributor information, collaborating organizations, and project contacts
- any publications or accessions associated with the project
- project details such as species, organ and library construction method
- counts of input, analysis and matrix files
- a project metadata download
- a project expression matrix download (if available)
Downloading Project Metadata
For each project, the HCA DCP maintains a project-specific tsv file containing the full project metadata. The tsv contains one row for each file in the project and one column for each metadata property. The meanings of the metadata properties are listed in the HCA Metadata Dictionary.
The metadata tsv file gives a flattened representation of the project's metadata graph that can be sorted and filtered using standard spreadsheet or data manipulation tools.
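As an illustration of filtering and sorting the flattened metadata, here is a minimal sketch using only the Python standard library. The column names (`file_name`, `file_format`, `organ`) and all values are hypothetical placeholders; consult the HCA Metadata Dictionary for the actual property names in a downloaded file.

```python
import csv
import io

# Hypothetical excerpt of a project metadata tsv: one row per file,
# one column per metadata property. Column names and values are
# placeholders, not taken from a real project.
SAMPLE_TSV = (
    "file_name\tfile_format\torgan\n"
    "sample_b_R1.fastq.gz\tfastq.gz\tblood\n"
    "sample_a_R1.fastq.gz\tfastq.gz\tlung\n"
    "sample_a.bam\tbam\tlung\n"
)


def read_metadata(handle):
    """Read a tab-separated metadata file into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


rows = read_metadata(io.StringIO(SAMPLE_TSV))

# Filter to one file format, then sort the matching file names.
fastq_rows = filter_rows(rows, "file_format", "fastq.gz")
sorted_names = sorted(row["file_name"] for row in fastq_rows)
print(sorted_names)
```

To apply the same functions to a downloaded file, pass a handle such as `open(path, newline="", encoding="utf-8")` to `read_metadata` in place of the in-memory sample.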
The "Project Downloads" section of the project details page contains a link to download the project metadata file.
Metadata file sizes vary across projects but are generally between 1 and 100 megabytes.
The tsv file is named after the project and includes the date and time the file was created. For example:
CD4+ cytotoxic T lymphocytes 2019-07-19 19.09.tsv
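If you need to recover the project name and creation time from such a file name, the following sketch may help. It assumes the pattern `<project name> YYYY-MM-DD HH.MM.tsv`, inferred only from the single example above; the function name `parse_metadata_filename` is hypothetical, and the pattern may need adjusting for other files.

```python
import re
from datetime import datetime

# Assumed layout: "<project name> YYYY-MM-DD HH.MM.tsv"
# (inferred from one example file name, not from a specification).
FILENAME_PATTERN = re.compile(
    r"^(?P<project>.+) (?P<stamp>\d{4}-\d{2}-\d{2} \d{2}\.\d{2})\.tsv$"
)


def parse_metadata_filename(filename):
    """Split a metadata file name into (project name, creation datetime)."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected metadata file name: {filename!r}")
    created = datetime.strptime(match.group("stamp"), "%Y-%m-%d %H.%M")
    return match.group("project"), created


project, created = parse_metadata_filename(
    "CD4+ cytotoxic T lymphocytes 2019-07-19 19.09.tsv"
)
print(project, created.isoformat())
```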
A partial example of a tsv file is listed below:
Downloading Project Expression Matrices
For projects with supported library construction approaches, the project detail page will also contain a link to download expression matrices pre-generated for the project by the HCA Matrix Service.
Each row in the expression matrix represents a cell, and each column gives the expression values for one gene.
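To make the cell-by-gene orientation concrete, here is a small sketch that reads a toy matrix and looks up values by cell and gene. The comma-separated layout, the `cell_id` header, and the gene and cell names are all assumptions for illustration; the file formats actually produced by the HCA Matrix Service are not described here and may differ.

```python
import csv
import io

# Hypothetical cell-by-gene matrix: the first column identifies the cell,
# and each remaining column holds the expression values for one gene.
SAMPLE_MATRIX = (
    "cell_id,GENE_A,GENE_B,GENE_C\n"
    "cell_1,0,3,1\n"
    "cell_2,2,0,0\n"
)


def read_matrix(handle):
    """Return (gene names, {cell_id: {gene: value}}) from a cell-by-gene table."""
    reader = csv.reader(handle)
    header = next(reader)
    genes = header[1:]
    matrix = {}
    for record in reader:
        cell_id, values = record[0], record[1:]
        matrix[cell_id] = dict(zip(genes, (float(v) for v in values)))
    return genes, matrix


genes, matrix = read_matrix(io.StringIO(SAMPLE_MATRIX))

# Row lookup selects a cell; column lookup selects a gene within that cell.
value = matrix["cell_1"]["GENE_B"]

# Summing down a column gives the total expression of one gene across cells.
gene_totals = {gene: sum(row[gene] for row in matrix.values()) for gene in genes}
print(value, gene_totals)
```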