Transcriptional signatures of perturbation from LINCS L1000

Python analysis of the LINCS L1000 data.

The repository consists of Python notebooks, which are executed in the following order:

  1. api.ipynb retrieves metadata from the L1000 API. The retrieved data is converted into a dataframe and saved as a TSV file. Files are created for perturbations, signatures, cells, and probes.
  2. database.ipynb creates a SQLite database containing the metadata retrieved from the API. Data cleaning occurs here. The database resides at data/l1000.db but is not tracked in the repository because of its file size. However, the populated database is available on figshare.
  3. unichem.ipynb maps compounds to external databases and adds the mapping to the database. See this comment for more information.
  4. chemical-similarity.ipynb computes chemical similarities between compounds and adds these similarities to the database.
  5. consensi.ipynb computes consensus signatures for each perturbagen. The following consensus files are created:
  6. significance.ipynb converts consensus z-scores into significant up/down-regulation values. The following files are created:

See this comment for more information on steps 5 & 6.
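As a rough illustration of what steps 5 and 6 do conceptually, the sketch below combines replicate z-scores into one consensus z-score and then labels it as up- or down-regulated. This README does not spell out the method, so everything here is an assumption: the combination rule is Stouffer's method (named plainly as a stand-in, not necessarily what consensi.ipynb does), and the function names, the gene labels, and the threshold of 2.0 are hypothetical. The linked comment remains the authoritative description.

```python
import math


def stouffer_consensus(z_scores, weights=None):
    """Combine replicate z-scores into one consensus z-score.

    Uses Stouffer's method: sum(w * z) / sqrt(sum(w ** 2)).
    This is an assumed stand-in, not necessarily the notebooks' method.
    """
    if not z_scores:
        raise ValueError("at least one z-score is required")
    if weights is None:
        # Equal weighting when no replicate weights are supplied
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def call_direction(z, threshold=2.0):
    """Label a consensus z-score as 'up', 'down', or 'none'.

    The threshold is a hypothetical placeholder, not a value from this repository.
    """
    if z >= threshold:
        return "up"
    if z <= -threshold:
        return "down"
    return "none"


# Hypothetical replicate z-scores for two genes under one perturbagen
replicates = {
    "GENE_A": [2.1, 1.8, 2.5],
    "GENE_B": [-0.3, 0.4, 0.1],
}

consensus = {gene: stouffer_consensus(z) for gene, z in replicates.items()}
calls = {gene: call_direction(z) for gene, z in consensus.items()}
```

Under these assumptions, consistent replicates reinforce each other (GENE_A crosses the threshold) while noisy replicates near zero largely cancel (GENE_B does not).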

Note: This is not an official LINCS L1000 repository. Users are warned that our modifications may have introduced errors or removed signal that was present in the original data.

Inputs

This repository depends on modzs.gctx, a legacy probe × signature matrix of differential expression z-scores. Because of its large size (42.5 GB), this file is not uploaded to GitHub. To recreate this analysis rather than just use the results, users should retrieve modzs.gctx from figshare and place it in the download directory.
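After a download of this size, a quick sanity check can catch a missing or truncated file before any notebook is run. GCTX files are HDF5 containers, so the minimal sketch below checks whether a file begins with the standard 8-byte HDF5 signature. The helper name `looks_like_hdf5` is hypothetical, and the path assumes that the download directory sits at the repository root. The check also assumes the signature is at byte offset 0, which is the usual case but not guaranteed by the HDF5 format.

```python
from pathlib import Path

# Standard 8-byte signature that opens an HDF5 superblock
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 signature."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Assumed location, relative to the repository root
candidate = Path("download") / "modzs.gctx"
print(f"{candidate} looks like HDF5: {looks_like_hdf5(candidate)}")
```

This only inspects the first few bytes, so it does not confirm that the full 42.5 GB file arrived intact.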

Citation

See the Transcriptional signatures of perturbation from LINCS L1000 section of the Rephetio manuscript for the final description of this work. Citations related to this repository are below:

  1. Systematic integration of biomedical knowledge prioritizes drugs for repurposing
    Daniel Scott Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
    eLife (2017-09-22) https://doi.org/cdfk
    DOI: 10.7554/elife.26726 · PMID: 28936969 · PMCID: PMC5640425

  2. Consensus signatures for LINCS L1000 perturbations
    Daniel Himmelstein, Leo Brueggeman, Sergio Baranzini
    Figshare (2016-03-08) https://doi.org/f3mqvs
    DOI: 10.6084/m9.figshare.3085426.v1

  3. dhimmel/lincs v2.0: Refined Consensus Signatures From Lincs L1000
    Daniel Himmelstein, Leo Brueggeman, Sergio Baranzini
    Zenodo (2016-03-08) https://doi.org/f3mqvr
    DOI: 10.5281/zenodo.47223

  4. Computing consensus transcriptional profiles for LINCS L1000 perturbations
    Daniel Himmelstein, Caty Chung
    ThinkLab (2015-03-26) https://doi.org/f3mqwc
    DOI: 10.15363/thinklab.d43

Environment

Create the conda environment for this repository using:

conda env create --file environment.yml

License

All original content in this repository is released under CC0 1.0. LINCS data and derivatives are released under CC BY 4.0 — please refer to the LINCS data policy and attribute this repository and LINCS L1000.