# LINCS Data Portal 3.0 Documentation

## Table of contents
1. [About LDP3](#About-LDP3-)
2. [Computing the L1000 Signatures](#Computing-the-L1000-Signatures-)
3. [Benchmarking the L1000 Signatures](#Benchmarking-the-L1000-Signatures-)
4. [Web Interface](#Web-Interface-)<br>
    a. [Signature Search](#Signature-Search-)<br>
    b. [Metadata Search](#Metadata-Search-)<br>
    c. [Metadata Pages](#Metadata-Pages-)<br>
    d. [Download Page](#Download-Page-)<br>
    e. [API Page](#API-Page)<br>
5. [Using the APIs](#Using-the-APIs-)<br>
    a. [Full-text metadata search](#Full-text-metadata-search-)<br>
    b. [Filtering by metadata fields](#Filtering-by-metadata-fields-)<br>
    c. [Finding available fields](#Finding-available-fields-)<br>
    d. [Counting the results](#Counting-the-results-)<br>
    e. [Fetching the value counts](#Fetching-the-value-counts-)<br>
    f. [Performing Signature Search](#Performing-Signature-Search-)<br>
    g. [Listing available Data API Databases](#Listing-available-Data-API-Databases-)<br>
6. [References](#References-)
   

## About LDP3 <a name="about"></a>
The LINCS Data Portal 3.0 (LDP3) allows users to query the datasets and signatures generated by the LINCS consortium. LDP3 is built on the Signature Commons platform (Figure [1](#architecture)), which provides REST APIs and a dedicated website for exploring LINCS data. This documentation describes how the data was processed, ingested, and benchmarked, and provides tutorials on using the website and the APIs.

![sigcom_architecture.jpg](attachment:fig_2a_sigcom_architecture.jpg)
**Figure 1** Signature Commons Architecture <a name="architecture"></a>

## Computing the L1000 Signatures <a name="computing"></a>

Level 3 L1000 gene expression profile datasets were downloaded from CLUE.io (https://clue.io/) on June 2, 2021, and divided into batches based on the batch ID of each profile. For example, profiles corresponding to the IDs LJP008_NEU_24H_X1_B21:N15 and LJP008_NEU_24H_X2_B21:O06 would both belong to the batch LJP008_NEU_24H.
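
The batch assignment can be sketched as follows. This is a minimal illustration, assuming the batch ID is the first three underscore-separated fields of the profile ID, as in the example above; `batch_id` and `group_by_batch` are hypothetical helper names, not part of the LDP3 pipeline.

```python
from collections import defaultdict


def batch_id(profile_id: str) -> str:
    """Return the batch prefix of a profile ID.

    Assumption: the batch is the first three underscore-separated fields,
    e.g. LJP008_NEU_24H_X1_B21:N15 -> LJP008_NEU_24H.
    """
    return "_".join(profile_id.split("_")[:3])


def group_by_batch(profile_ids):
    """Group profile IDs into a dict keyed by batch ID."""
    batches = defaultdict(list)
    for pid in profile_ids:
        batches[batch_id(pid)].append(pid)
    return dict(batches)


profiles = ["LJP008_NEU_24H_X1_B21:N15", "LJP008_NEU_24H_X2_B21:O06"]
batches = group_by_batch(profiles)
```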

Using the Level 5 signature metadata file, also downloaded from CLUE.io on June 2, 2021, the replicate profile IDs corresponding to each signature were identified. Each signature was computed using the Characteristic Direction method [1], which compares the corresponding replicate profiles (the “perturbations”) to all other profiles in the batch to which the signature belongs (the “controls”). The resulting signature is a vector of coefficients representing the differential expression of each gene in the perturbation profiles compared to the control profiles.
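
As a rough guide to what the method computes, the characteristic direction can be written as a regularized linear discriminant direction. This is only our simplified reading of [1], not a description of the exact pipeline used here, and the notation is illustrative: $\mu_{\text{pert}}$ and $\mu_{\text{ctrl}}$ are the mean expression vectors of the perturbation and control profiles, $\Sigma$ is their covariance estimate, $\bar{\sigma}^2$ is the mean variance, and $\gamma$ is a shrinkage parameter.

$$
b = \frac{\Sigma_\gamma^{-1}\,(\mu_{\text{pert}} - \mu_{\text{ctrl}})}{\lVert \Sigma_\gamma^{-1}\,(\mu_{\text{pert}} - \mu_{\text{ctrl}}) \rVert},
\qquad
\Sigma_\gamma = \gamma\,\Sigma + (1 - \gamma)\,\bar{\sigma}^2 I
$$

Under this reading, each component of the unit vector $b$ is the coefficient of one gene, with positive values indicating higher expression in the perturbation profiles. See [1] for the exact formulation.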

All signatures may be accessed individually via the persistent_id field in the metadata for each signature. Compiled signatures by perturbation type can also be accessed from the Downloads page. 

## Benchmarking the L1000 Signatures <a name="benchmarking"></a>
The characteristic direction signatures were benchmarked against three other differential gene expression characterization methods: fold change, limma [2], and MODZ [3]. Six signatures corresponding to the same perturbation condition were chosen for benchmarking: 10 µM dexamethasone at 24 hours of exposure in the A549 cell line. Each signature was computed four times, once with each of the four differential gene expression characterization methods, including characteristic direction. The top up- and down-regulated genes were identified for each method applied to each signature, and then compared to the NR3C1/Glucocorticoid Receptor Target Gene Set from the ENCODE ChIP-seq library in ChEA3 [4], as well as to manually computed up and down gene sets from all dexamethasone-related GEO studies in CREEDS (Wang et al. 2016). Overlap between the signature-ranked genes and the gene sets was visualized in bridge plots.

The plots below show, for each method, the average over all six signatures, so that each line corresponds to a single differential gene expression characterization method. In general, the characteristic direction method performs better than the MODZ method, and is at least comparable to fold change and limma. Note that the L1000 assay only captures about 12,000 genes and produces expression values that differ from RNA-seq expression values even for the same gene.



![bechmarking.png](attachment:bechmarking.png)
**Figure 2** Benchmarking L1000 Signatures <a name="benchmark"></a>
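
To make the idea of a bridge plot concrete, here is a minimal sketch of a running-sum walk over a ranked gene list. The exact normalization used for Figure 2 is not stated in this document, so the step sizes below are an assumption (a common unweighted form that returns to zero), and `bridge_walk` is a hypothetical helper name.

```python
def bridge_walk(ranked_genes, gene_set):
    """Running sum over a ranked gene list.

    Steps up by 1/k at each gene in gene_set and down by 1/(n - k) otherwise,
    where n is the list length and k the number of hits, so the walk ends at 0.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, k = len(hits), sum(hits)
    if k == 0 or k == n:
        raise ValueError("gene_set must overlap part, but not all, of the ranking")
    up, down = 1.0 / k, 1.0 / (n - k)
    walk, level = [], 0.0
    for hit in hits:
        level += up if hit else -down
        walk.append(level)
    return walk


# Toy ranking: both members of the gene set sit at the top of the list.
walk = bridge_walk(["G1", "G2", "G3", "G4"], {"G1", "G2"})
```

A walk that rises early and peaks high indicates that the gene set is concentrated near the top of the ranking.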

## Web Interface <a name="interface"></a>
### Signature Search <a name="sigsearch"></a>
![signature_search.png](attachment:signature_search.png)
**Figure 3** Signature Search Workflow

LDP 3.0's signature search allows users to perform signature similarity search across more than 1.5 million L1000 signatures. Users can input up and down genes, which are validated against all the genes in the metadata database. This automated validation also provides synonym suggestions for non-standard gene names. Enrichment analysis is performed against seven LINCS L1000 datasets, with each dataset corresponding to a specific perturbation type. The top mimicker and reverser signatures are then displayed as bar plots.

<img src="img/fig_3a2_sigsearch_validation.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/>**Figure 4** Genes are validated automatically against the genes in the metadata database. Genes are labeled based on their validity, and suggestions are offered for non-standard gene names.

<img src="img/fig_3b_coexpression.png" style="float: right; width: 50%; margin-left: 25px"></img><br/><br/><br/><br/><br/><br/>**Figure 5**
Users can use a gene and (1) convert its co-expressed genes [5] to up and down input gene sets, (2) find signatures with the gene as its perturbagen, or (3) view signatures that maximally up or down regulate the gene ([more info](#metasearch)).

<img src="img/results.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>**Figure 6** The top mimicker and reverser signatures for each dataset are returned as bar charts. Hovering over a bar opens a tooltip that displays the signature name as well as the z-scores.

<img src="img/expanded.png" style="float: right; width: 50%; margin-left: 25px"></img><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>**Figure 7**
Each column from Figure 6 is expandable. Upon expansion, users can view the top mimickers and reversers in both bar and tabular formats. A dedicated search bar is also available for filtering the search results by term. Download buttons are provided for both tables.

<img src="img/scatterplot.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/><br/>**Figure 8** Users can alternatively view the results on a scatter plot with the z-up and z-down scores as the x and y axes, respectively. Signatures located in the first quadrant are the mimickers, while those in the third quadrant are the reversers. Hovering over a node shows the metadata of the signature as well as the z-scores. Users can change how the nodes are colored by clicking on the radio buttons at the right. Furthermore, users can perform signature search against signatures with a specific perturbagen by (1) clicking on the top perturbagen, (2) searching for a perturbagen using the text field, or (3) clicking on a node with the desired perturbagen.

### Metadata Search <a name="metasearch"></a>
<img src="img/metasearch.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/><br/><br/><br/><br/>**Figure 9** LDP 3.0 also includes metadata search functionality that allows users to search for datasets, signatures, or genes by terms. This can be an assay, a cell line, or even a gene name. The search results can be further refined by using the filters on the right.

<img src="img/options.png" style="float: right; width: 50%; margin-left: 25px"></img><br/><br/>**Figure 10**
For L1000 signature results, users have the option to download them as (1) a full-rank signature file, or (2) a GMT file containing the top up- and down-regulated genes. Furthermore, users have the option to use these top up- and down-regulated genes as input for signature search, or send them to Enrichr for enrichment analysis [6-8].
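
To illustrate what such a GMT file contains, here is a minimal sketch that turns a gene-to-coefficient mapping into two GMT lines. The cutoff `n`, the set naming, and the helper `signature_to_gmt` are assumptions for illustration; the number of top genes the portal actually exports is not stated here.

```python
def signature_to_gmt(name, coefficients, n=2):
    """Build two GMT lines (up and down) from a gene -> coefficient mapping.

    Each GMT line is: set name, description, then genes, all tab-separated.
    """
    # Rank genes from most up-regulated to most down-regulated.
    ranked = sorted(coefficients, key=coefficients.get, reverse=True)
    up, down = ranked[:n], ranked[::-1][:n]
    return [
        "\t".join([f"{name} up", ""] + up),
        "\t".join([f"{name} down", ""] + down),
    ]


# Made-up coefficients, for illustration only.
coeffs = {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": -0.2, "GENE_D": -0.8}
lines = signature_to_gmt("example_signature", coeffs)
```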

<img src="img/insignia.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/>**Figure 11** LINCS Datasets were assessed for FAIRness [9] and LDP 3.0 shows these assessments as FAIRshake insignias [10]. Download buttons are also provided to download the whole dataset.

### Metadata Pages <a name="metapage"></a>
When users click a result from the metadata search, they are brought to the metadata page of that entry. Depending on the model, this page may display the metadata of the entry as well as the entry's children (signatures of a dataset, genes of a signature, etc.).
#### Gene Pages <a name="genepage"></a>
<img src="img/gene.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/><br/>**Figure 12** Gene pages contain the signatures that maximally up- or down-regulate the gene. Results can be filtered by DSGC, dataset, cell line, or perturbagen.

#### Signature Pages <a name="sigpage"></a>
<img src="img/signature.png" style="float: right; width: 50%; margin-left: 25px"></img><br/><br/><br/><br/><br/><br/><br/><br/>**Figure 13**
Signature pages display the available metadata for the signature as well as the top up- and down-regulated genes.

#### Dataset Pages <a name="datasetpage"></a>
<img src="img/dataset.png" style="float: left; width: 50%; margin-right: 25px"></img><br/><br/><br/><br/><br/><br/>**Figure 14** Dataset pages show the metadata of the dataset along with the relevant download links, as well as the signatures (if any) that belong to that dataset.

#### DSGC Pages <a name="dsgcpage"></a>
<img src="img/dsgc.png" style="float: right; width: 50%; margin-left: 25px"></img><br/><br/><br/><br/><br/><br/><br/><br/>**Figure 15**
DSGC pages can be accessed from the DSGCs page and display the datasets that a DSGC has produced.

### Download Page <a name="download"></a>
The processed LINCS data can be downloaded from the download page. Users can download the coefficient tables, GMT files of the up and down gene sets, and the RNA-seq profiles predicted using CycleGAN.
![download.png](attachment:download.png)
**Figure 16** Download Page

### API Page<a name="api"></a>
LDP 3.0's REST APIs are documented with smartAPI [11]. This documentation can be viewed on the API page.
![api.png](attachment:api.png)
**Figure 17** API page

The next section provides examples on how to use the REST APIs.

## Using the APIs <a name="usage"></a>
In this section, we describe the two REST APIs provided by LDP 3.0 for users who want to access LINCS data programmatically. The metadata API (https://ldp3.cloud/metadata-api) provides fast full-text searches as well as queries for data aggregation. Metadata searches are structured using [LoopBack 4 queries](https://loopback.io/doc/en/lb4/Where-filter.html). The data API (https://ldp3.cloud/data-api), on the other hand, provides enrichment analysis against LINCS signatures.

Here, we explore several use cases and provide Python code that users can use as is or as a template for more complex queries. To start, make sure you have the [requests](http://docs.python-requests.org/en/latest/) library installed via pip. For more information on the LDP 3.0 API, see https://ldp3.cloud/#/API.

### Full-text metadata search <a name="fulltext"></a>

Users can use the full-text search capabilities of LDP 3.0 to search for datasets, signatures, or genes using the following URL: `https://ldp3.cloud/metadata-api/<model>/find`. Below are the mappings for the model:
- DSGCs: resources
- Datasets: libraries
- Signatures: signatures
- Genes: entities
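
The mapping above can be captured in a small lookup so that endpoint URLs are built consistently. This is a convenience sketch only; `MODELS` and `endpoint` are hypothetical names, not part of the LDP 3.0 API.

```python
METADATA_API = "https://ldp3.cloud/metadata-api"

# Display name -> model name, as listed above.
MODELS = {
    "DSGCs": "resources",
    "Datasets": "libraries",
    "Signatures": "signatures",
    "Genes": "entities",
}


def endpoint(kind: str, action: str = "find") -> str:
    """Return the metadata API URL for a display name and an action."""
    return f"{METADATA_API}/{MODELS[kind]}/{action}"


url = endpoint("Datasets")
```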

Suppose we want to search for datasets (libraries) that contain the word ***proteomics***. We structure our query like this:

In [1]:
import requests
import json

API_URL = "https://ldp3.cloud/metadata-api/libraries/find"

payload = {
    "filter": {
        "where": {
            "meta": {
                "fullTextSearch": "proteomics"
            }
        },
        "limit": 2
    }
}

res = requests.post(API_URL, json=payload)
results = res.json()

print(json.dumps(results, indent=2))

[
  {
    "$validator": "/dcic/signature-commons-schema/v5/core/library.json",
    "id": "b745f1cf-cd38-4c38-ace3-bdfa1e224586",
    "resource": "09d107aa-4273-43c4-8f03-f985ff9041d1",
    "dataset": "LINCS targeted proteomics signatures",
    "dataset_type": "rank_matrix",
    "meta": {
      "icon": "static/images/lincs/lincs-pccse.png",
      "libraryID": "LIB_9",
      "$validator": "/@dcic/signature-commons-schema/core/unknown.json",
      "libraryInfo": "Signatures of perturbations assayed by P100 against 96 phosphopeptide probes and GCP assay against ~60 probes that monitor combinations of post-translational modifications on histones. The data is generated by using mass spectrometry techniques to characterize proteome level molecular signatures of responses to small molecule and genetic perturbations in a number of different cell lines.",
      "libraryName": "LINCS targeted proteomics signatures",
      "sigidPrefix": "LINCSTP_",
      "displayOrder": 5,
      "signatureCount":

### Filtering by metadata fields <a name="filtering"></a>

LDP 3.0 stores its metadata as semi-structured, JSON-serialized entries. Because of this, we can also filter by metadata fields. Here we show how to find signatures whose perturbation type is ***CRISPR Knockdown***:

In [2]:
import requests
import json

API_URL = "https://ldp3.cloud/metadata-api/signatures/find"

payload = {
    "filter": {
        "where": {
            "meta.pert_type":  "CRISPR Knockdown"
        },
        "limit": 2
    }
}

res = requests.post(API_URL, json=payload)
results = res.json()

print(json.dumps(results, indent=2))

[
  {
    "$validator": "/dcic/signature-commons-schema/v5/core/signature.json",
    "id": "75165485-46d8-5940-8e96-b17bdb9a65a7",
    "library": "96c7b8c5-1eca-5764-88e4-e4ccaee6603f",
    "meta": {
      "md5": "4ad3feda54fccf953b6036c0dbdb279b",
      "sha256": "ad6d68ab31a99681b887d994a2470166c5e900f69d85495316b0b92ea83fdd90",
      "tissue": "prostate gland",
      "anatomy": "UBERON:0002367",
      "cmap_id": "XPR012_PC3.311B_96H:J07",
      "version": 1,
      "filename": "L1000_LINCS_DCIC_XPR012_PC3.311B_96H_J07_BFSP2.tsv",
      "local_id": "XPR012_PC3.311B_96H_J07_BFSP2",
      "cell_line": "PC3",
      "pert_name": "BFSP2",
      "pert_time": "96 h",
      "pert_type": "CRISPR Knockdown",
      "$validator": "https://raw.githubusercontent.com/MaayanLab/sigcom-lincs/main/validators/l1000_signatures.json",
      "data_level": 5,
      "creation_time": "2021-05-24",
      "persistent_id": "https://lincs-dcic.s3.amazonaws.com/LINCS-sigs-2021/cd/xpr/L1000_LINCS_DCIC_XPR012_PC3.31
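
Several fields can be combined in one query. The sketch below builds such a payload, assuming that multiple keys inside `where` are combined with a logical AND, as in LoopBack where filters; this has not been verified against LDP 3.0 here, and `meta_filter` is a hypothetical helper.

```python
def meta_filter(limit=10, **fields):
    """Build a find payload that matches every given meta field exactly."""
    where = {f"meta.{key}": value for key, value in fields.items()}
    return {"filter": {"where": where, "limit": limit}}


# Field names and values taken from the output above.
payload = meta_filter(limit=2, pert_type="CRISPR Knockdown", cell_line="PC3")
```

The resulting `payload` can be posted to the `signatures/find` endpoint in the same way as the query above.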

### Finding available fields <a name="keycount"></a>
As mentioned previously, LDP 3.0 stores metadata as semi-structured, JSON-serialized entries. Although JSON serialization allows diverse metadata fields, LDP 3.0 entries still follow certain structures, which are defined [here](https://github.com/MaayanLab/sigcom-lincs/tree/main/validators). These structures allow us to perform queries like the previous example because we can be sure that the field `meta.pert_type` exists, and is in fact [required](https://github.com/MaayanLab/sigcom-lincs/blob/a0e6403a668e8397f190fa4e9f398dae7485f0cb/validators/l1000_signatures.json#L94). An easy way to get the fields available for querying without going through the validators is to fetch them using `https://ldp3.cloud/metadata-api/<model>/key_count`. This returns the available keys and how many entries have each key. For example, suppose we want to view the available search keys for the genes (entities):

In [3]:
import requests
import json

API_URL = "https://ldp3.cloud/metadata-api/entities/key_count"

payload = {
    "limit": 15
}

res = requests.get(API_URL, params={"filter": json.dumps(payload)})
results = res.json()

print(json.dumps(list(results.keys()), indent=2))

[
  "meta.synonyms",
  "id",
  "uuid",
  "meta.$validator",
  "meta.geneid",
  "meta.maplocation",
  "meta.nomstatus",
  "meta.taxid",
  "meta.namenomauth",
  "meta.dbxrefs",
  "meta.description",
  "meta.locustag",
  "meta.symbol",
  "meta.chromosome",
  "meta.symbonomauth"
]


### Counting the results <a name="count"></a>

The count endpoint `https://ldp3.cloud/metadata-api/<model>/count` accepts GET requests to count the number of entries in a model. Users can also pass a `where` filter to get a filtered count. Here we show how to get the number of datasets (libraries) that use the ***L1000 mRNA profiling assay***.

In [4]:
import requests
import json

API_URL = "https://ldp3.cloud/metadata-api/libraries/count"

payload = {
    "meta.assay": "L1000 mRNA profiling assay"
}

res = requests.get(API_URL, params={"where": json.dumps(payload)})
results = res.json()

print(json.dumps(results, indent=2))

{
  "count": 19
}
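
Because the count is available up front, it can be used to plan paged retrieval. The following is a minimal sketch, assuming the `find` endpoint honors the LoopBack `skip` option alongside `limit` (not verified here); `page_filters` is a hypothetical helper.

```python
import math


def page_filters(total, page_size, where):
    """Yield one filter per page, using `skip` and `limit` to walk the results."""
    for page in range(math.ceil(total / page_size)):
        yield {"where": where, "skip": page * page_size, "limit": page_size}


where = {"meta.assay": "L1000 mRNA profiling assay"}
# 19 is the count returned above; a page size of 10 gives two pages.
filters = list(page_filters(total=19, page_size=10, where=where))
```

Each generated filter can be sent as the `filter` value of a `find` request, as in the full-text search example.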


### Fetching the value counts <a name="valuecount"></a>
The value count endpoint `https://ldp3.cloud/metadata-api/<model>/value_count` is used to count the values of a specific field. This is particularly useful for getting the top assays, cell lines, or perturbations of a model. Below we show how to get the top 25 cell lines of the signatures perturbed with ***dexamethasone***.

In [5]:
import requests
import json

API_URL = "https://ldp3.cloud/metadata-api/signatures/value_count"

payload = {
    "where": {
        "meta.pert_name":  "dexamethasone"
    },
    "fields": ["meta.cell_line"],
    "limit": 25
}

res = requests.get(API_URL, params={"filter": json.dumps(payload)})
results = res.json()

print(json.dumps(results, indent=2))

{
  "meta.cell_line": {
    "MCF7": 46,
    "A549": 39,
    "PC3": 34,
    "A375": 33,
    "HA1E": 27,
    "HT29": 23,
    "YAPC": 16,
    "HTB-22": 14,
    "VCAP": 14,
    "CRL-1619": 13,
    "CRL-1435": 13,
    "JURKAT": 12,
    "MDAMB231": 12,
    "MCF10A": 12,
    "HPTEC": 12,
    "HEK293": 12,
    "HELA": 12,
    "HCC515": 11,
    "U2OS": 11,
    "HTB-38": 10,
    "NPC": 10,
    "THP1": 8,
    "HEPG2": 8,
    "CCL-185": 7,
    "CCL-2": 6
  }
}
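
A response of this shape is easy to post-process. The sketch below ranks the values of one field and reports each value's share of the counted total; `top_values` is a hypothetical helper, and the sample holds only a few entries copied from the output above, so the shares are relative to that sample.

```python
def top_values(value_counts, field, n=3):
    """Return the n most frequent values of a field with their share of the total."""
    counts = value_counts[field]
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, count, round(count / total, 3)) for value, count in ranked[:n]]


# A few entries copied from the output above, for illustration only.
sample = {"meta.cell_line": {"MCF7": 46, "A549": 39, "PC3": 34, "A375": 33}}
top = top_values(sample, "meta.cell_line", n=2)
```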


### Performing Signature Search <a name="sigsearch-api"></a>
Because the data API is separate from the metadata API, signature search is a multi-step process:

#### Step 1: Convert gene names to UUIDs using metadata API

In [6]:
import requests
import json

METADATA_API = "https://ldp3.cloud/metadata-api/"
DATA_API = "https://ldp3.cloud/data-api/api/v1/"

input_gene_set = {
    "up_genes": ["TARBP1", "APP", "RAP1GAP", "UFM1", "DNAJA3", "PCBD1", "CSRP1"],
    "down_genes": ["CEBPA", "STAT5B", "DSE", "EIF4EBP1", "CARD8", "HLA-DMA", "SERPINE1"]
}

all_genes = input_gene_set["up_genes"] + input_gene_set["down_genes"]

payload = {
    "filter": {
        "where": {
            "meta.symbol": {
                "inq": all_genes
            }
        },
        "fields": ["id", "meta.symbol"]
    }
}
res = requests.post(METADATA_API + "entities/find", json=payload)
entities = res.json()

for_enrichment = {
    "up_entities": [],
    "down_entities": []
}

for e in entities:
    symbol = e["meta"]["symbol"]
    if symbol in input_gene_set["up_genes"]:
        for_enrichment["up_entities"].append(e["id"])
    elif symbol in input_gene_set["down_genes"]:
        for_enrichment["down_entities"].append(e["id"])

print(json.dumps(for_enrichment, indent=2))

{
  "up_entities": [
    "41f5b538-2b08-4286-86f3-3e88470ed141",
    "8088700c-6942-4c4b-8ee2-3dc67b7a1a45",
    "3d768c6e-b3fa-4dd2-b740-540e03e29384",
    "e9e7aaca-1b14-4f1b-af61-6deb85d90399",
    "c52c0f4f-4696-4d91-8c15-c9245546787a",
    "e91105d5-a9a7-4a39-bff4-a8cbe685bc08",
    "4c4bfc3f-5455-4327-bfbe-3dcd3cc8091b"
  ],
  "down_entities": [
    "566ef317-6636-4993-8a5b-341b00c70498",
    "d23d4236-6a13-4bda-9c53-f5b6e0f13263",
    "ca7488f8-3265-4872-8bea-67a571cfa23d",
    "a743b103-c306-48d5-a3cf-36a2fedd379e",
    "5fc61a62-f46d-4a15-953e-90daca7e73d2",
    "c70cac7b-35df-41b5-9295-fa6141071c7a",
    "16c838f0-f8f7-401f-919a-30b139a7d7cb"
  ]
}
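
Gene symbols that the metadata API does not recognize are silently absent from the response, so it can be useful to check for them before running enrichment. A minimal sketch, with `unresolved_symbols` as a hypothetical helper and a made-up response:

```python
def unresolved_symbols(input_genes, entities):
    """Return input symbols with no matching entity, preserving input order."""
    found = {e["meta"]["symbol"] for e in entities}
    return [g for g in input_genes if g not in found]


# Hypothetical response: only two of the three symbols were found.
sample_entities = [
    {"id": "id-1", "meta": {"symbol": "APP"}},
    {"id": "id-2", "meta": {"symbol": "UFM1"}},
]
missing = unresolved_symbols(["APP", "NOT_A_GENE", "UFM1"], sample_entities)
```

With the variables from the cell above, the same check would be `unresolved_symbols(all_genes, entities)`.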


#### Step 2: Perform signature search using the ranktwosided endpoint of the data API
For general use, the ranktwosided endpoint takes the up and down entities, a limit, and the database to use for enrichment. Getting the available databases is discussed in the next section. The data API returns the top matching signatures along with their enrichment scores, ranked by the absolute product of z-up and z-down. A positive z-up means that the genes in the up gene set are positioned at the top of the ranking, while a positive z-down means that the genes in the down gene set are positioned at the bottom of the ranking. We can optionally multiply z-down and direction-down by $-1$ to be consistent with the scatter plots on LDP3. Using this convention, reversers have negative z-up and positive z-down, while mimickers have positive z-up and negative z-down.

In [7]:
query = {
    **for_enrichment,
    "limit": 10,
    "database": "l1000_xpr"
}

res = requests.post(DATA_API + "enrich/ranktwosided", json=query)
results = res.json()

# Optional, multiply z-down and direction-down with -1
for i in results["results"]:
    i["z-down"] = -i["z-down"]
    i["direction-down"] = -i["direction-down"]
print(json.dumps(results, indent=2))

{
  "queryTimeSec": 0.357,
  "results": [
    {
      "direction-up": -1,
      "fdr-down": 0.08597436403632452,
      "logp-avg": 6.645764878704664,
      "logp-fisher": 13.298411272025652,
      "p-up": 0.0011918982073126916,
      "uuid": "cb03f2fd-2ce7-5ee7-85f1-dff255d942b0",
      "z-down": 3.1932109963550976,
      "fdr-up": 0.08631352888700067,
      "z-up": -3.2408415054818382,
      "p-down": 0.0014071299274773796,
      "p-down-bonferroni": 1,
      "p-up-bonferroni": 1,
      "direction-down": 1
    },
    {
      "direction-up": 1,
      "fdr-down": 0.08597436403632452,
      "logp-avg": 5.454303139526856,
      "logp-fisher": 13.42078558581136,
      "p-up": 0.008378598563641315,
      "uuid": "a1d5e0de-4a40-5523-998b-145a1406ff3f",
      "z-down": -3.7496799355724173,
      "fdr-up": 0.08631352888700067,
      "z-up": 2.6364231028355904,
      "p-down": 0.00017711506152195966,
      "p-down-bonferroni": 1,
      "p-up-bonferroni": 1,
      "direction-down": -1
    },
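
The sign convention described for this step can be applied programmatically. The sketch below labels a result whose z-down has already been multiplied by $-1$; `classify` and the `"mixed"` label for all other sign combinations are hypothetical, added for illustration.

```python
def classify(result):
    """Label a result using z-scores whose z-down has already been multiplied by -1."""
    z_up, z_down = result["z-up"], result["z-down"]
    if z_up > 0 and z_down < 0:
        return "mimicker"
    if z_up < 0 and z_down > 0:
        return "reverser"
    return "mixed"


# z-scores copied from the two results shown above (rounded).
examples = [
    {"z-up": -3.24, "z-down": 3.19},
    {"z-up": 2.64, "z-down": -3.75},
]
labels = [classify(r) for r in examples]
```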
   

#### Step 3: Resolve signature UUIDs using the metadata API
We can now resolve the UUIDs of the returned signatures using the metadata API.

In [8]:
sigids = {i["uuid"]: i for i in results["results"]}

payload = {
    "filter": {
        "where": {
            "id": {
                "inq": list(sigids.keys())
            }
        }
    }
}

res = requests.post(METADATA_API + "signatures/find", json=payload)
signatures = res.json()

# Merge the scores and the metadata
for sig in signatures:
    uid = sig["id"]
    scores = sigids[uid]
    scores.pop("uuid")
    sig["scores"] = scores

print(json.dumps(signatures, indent=2))

[
  {
    "$validator": "/dcic/signature-commons-schema/v5/core/signature.json",
    "id": "0d7566c1-d8b7-53e6-b987-b9dc58d2ea54",
    "library": "96c7b8c5-1eca-5764-88e4-e4ccaee6603f",
    "meta": {
      "md5": "412d5750f07e026a92db9c3ebc4ddd16",
      "sha256": "6c7b8ec5000429ce3b2572440a7343b928142ca1ac40edb47f1e6c2fc2a3ad3d",
      "tissue": "skin of body",
      "anatomy": "UBERON:0002097",
      "cmap_id": "XPR045_HS944T.311_96H:M15",
      "version": 1,
      "filename": "L1000_LINCS_DCIC_XPR045_HS944T.311_96H_M15_BRDN0001145659.tsv",
      "local_id": "XPR045_HS944T.311_96H_M15_BRDN0001145659",
      "cell_line": "HS944T",
      "pert_name": "BRDN0001145659",
      "pert_time": "96 h",
      "pert_type": "CRISPR Knockdown",
      "$validator": "https://raw.githubusercontent.com/MaayanLab/sigcom-lincs/main/validators/l1000_signatures.json",
      "data_level": 5,
      "creation_time": "2021-05-26",
      "persistent_id": "https://lincs-dcic.s3.amazonaws.com/LINCS-sigs-2021/cd/
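
The merged records are deeply nested, so flattening them can make the results easier to tabulate. A minimal sketch, with `to_rows` as a hypothetical helper; the sample record is shaped like the output above, but its scores are made up.

```python
def to_rows(signatures):
    """Flatten merged signature records into flat dicts for tabular display."""
    rows = []
    for sig in signatures:
        meta, scores = sig["meta"], sig["scores"]
        rows.append({
            "id": sig["id"],
            "pert_name": meta.get("pert_name"),
            "cell_line": meta.get("cell_line"),
            "z-up": scores["z-up"],
            "z-down": scores["z-down"],
        })
    return rows


# Hypothetical merged record shaped like the output above; the scores are made up.
sample = [{
    "id": "0d7566c1-d8b7-53e6-b987-b9dc58d2ea54",
    "meta": {"pert_name": "BRDN0001145659", "cell_line": "HS944T"},
    "scores": {"z-up": 1.0, "z-down": -1.0},
}]
rows = to_rows(sample)
```

The resulting list of dicts can be written out with `csv.DictWriter` from the standard library.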

### Listing available Data API Databases <a name="listdata"></a>
To view the available databases for signature search, users can use the `/listdata` endpoint.

In [9]:
import requests
import json

API_URL = "https://ldp3.cloud/data-api/api/v1/listdata"

res = requests.post(API_URL)
databases = res.json()

print(json.dumps(databases, indent=2))

{
  "repositories": [
    {
      "datatype": "rank_matrix",
      "uuid": "LINCS chemical perturbagen signatures"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "LINCS consensus gene (CGS) knockdown signatures"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_cp"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_aby"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_lig"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_xpr"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "LINCS gene overexpression signatures"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_oe"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_shRNA"
    },
    {
      "datatype": "rank_matrix",
      "uuid": "l1000_siRNA"
    }
  ]
}
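
The listing can be used to run the same query against several databases. The sketch below selects the entries whose `uuid` starts with `l1000_` and builds one ranktwosided request body per database. Treating the `l1000_` entries as the relevant ones is an assumption based on the `l1000_xpr` example in Step 2, and both helper names are hypothetical.

```python
def l1000_databases(listing):
    """Return the uuids in a /listdata response that start with 'l1000_'."""
    return [r["uuid"] for r in listing["repositories"] if r["uuid"].startswith("l1000_")]


def build_queries(up_entities, down_entities, databases, limit=10):
    """Build one ranktwosided request body per database."""
    return [
        {"up_entities": up_entities, "down_entities": down_entities,
         "limit": limit, "database": db}
        for db in databases
    ]


# Two entries copied from the output above; the entity IDs are placeholders.
listing = {"repositories": [
    {"datatype": "rank_matrix", "uuid": "LINCS chemical perturbagen signatures"},
    {"datatype": "rank_matrix", "uuid": "l1000_xpr"},
]}
queries = build_queries(["up-id"], ["down-id"], l1000_databases(listing))
```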


## References <a name="ref"></a>
[1] Clark NR, Hu KS, Feldmann AS, Kou Y, Chen EY, Duan Q, Ma'ayan A. The characteristic direction: a geometrical approach to identify differentially expressed genes. BMC Bioinformatics. 2014 Mar 21;15:79. doi: 10.1186/1471-2105-15-79. PMID: 24650281; PMCID: PMC4000056. 

[2] Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi: 10.1093/nar/gkv007. 

[3] Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK, Lahr DL, Hirschman JE, Liu Z, Donahue M, Julian B, Khan M, Wadden D, Smith IC, Lam D, Liberzon A, Toder C, Bagul M, Orzechowski M, Enache OM, Piccioni F, Johnson SA, Lyons NJ, Berger AH, Shamji AF, Brooks AN, Vrcic A, Flynn C, Rosains J, Takeda DY, Hu R, Davison D, Lamb J, Ardlie K, Hogstrom L, Greenside P, Gray NS, Clemons PA, Silver S, Wu X, Zhao WN, Read-Button W, Wu X, Haggarty SJ, Ronco LV, Boehm JS, Schreiber SL, Doench JG, Bittker JA, Root DE, Wong B, Golub TR. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017 Nov 30;171(6):1437-1452.e17. doi: 10.1016/j.cell.2017.10.049. PMID: 29195078; PMCID: PMC5990023. 

[4] Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz ML, Utti V, Jagodnik KM, Kropiwnicki E, Wang Z, Ma'ayan A. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Res. 2019 Jul 2;47(W1):W212-W224. doi: 10.1093/nar/gkz446. PMID: 31114921; PMCID: PMC6602523.

[5] Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma’ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 9. Article number: 1366 (2018), doi:10.1038/s41467-018-03751-6

[6] Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, Ma'ayan A. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128.

[7] Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, McDermott MG, Monteiro CD, Gundersen GW, Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research. 2016; gkw377.

[8] Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, Lachmann A, Wojciechowicz ML, Kropiwnicki E, Jagodnik KM, Jeon M, Ma’ayan A. Gene set knowledge discovery with Enrichr. Current Protocols, 1, e90. 2021. doi: 10.1002/cpz1.90

[9] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

[10] Clarke, D.J., Wang, L., Jones, A., Wojciechowicz, M.L., Torre, D., Jagodnik, K.M., Jenkins, S.L., McQuilton, P., Flamholz, Z., Silverstein, M.C. and Schilder, B.M., 2019. FAIRshake: Toolkit to evaluate the FAIRness of research digital resources. Cell systems, 9(5), pp.417-421.

[11] Zaveri, A., Dastgheib, S., Wu, C., Whetzel, T., Verborgh, R., Avillach, P., Korodi, G., Terryn, R., Jagodnik, K.M., Assis, P., & Dumontier, M. (2017). smartAPI: Towards a More Intelligent Network of Web APIs. ESWC.