Processing done for integration of BioXpress data into OncoMX and Glygen.
The final output from BioXpress v-5.0 is available on the OncoMX-tst server at the path: /software/pipeline/integrator/downloads/bioxpress/v-5.0/`
For OncoMX, the de_per_tissue.csv is used to report gene expression per tissue, however data.oncomx.org hosts both per tissue and per study datasets. The files are processed with the recipe pipeline. The recipes filter for all genes that are successfully mapped to uniprotkb accession IDs.
Recipes
human_cancer_mRNA_expression_per_study.json
human_cancer_mRNA_expression_per_tissue.json
The output is available on the OncoMX-tst server at the path: /software/pipeline/integrator/unreviewed
Final output files
human_cancer_mRNA_expression_per_study.csv
human_cancer_mRNA_expression_per_tissue.csv
The final output from BioXpress v-5.0 was modified to align with the previous input for cancer gene expression, and now includes the following columns:
- pmid
- sample_name
- same as DOID and name
- parent_doid
- same as DOID
- All DOIDs in v-5.0 are parent terms
- parent_doname
- same as DOID and name
- All DOIDs in v-5.0 are parent terms
- sample_id
- Taken from previous version, unclear on the origin of these numbers
The following mapping for the column sample_id was recovered from the previous version and mapped to DOIDs present in v-5.0
sample_name | sample_id |
---|---|
DOID:10283 / Prostate cancer [PCa] | 42 |
DOID:10534 / Stomach cancer [Stoca] | 19 |
DOID:11054 / Urinary bladder cancer [UBC] | 34 |
DOID:11934 / Head and neck cancer [H&NC] | 46 |
DOID:1612 / Breast cancer [BRCA] | 70 |
DOID:1781 / Thyroid cancer [Thyca] | 16 |
DOID:234 / Colon adenocarcinoma | 3 |
DOID:263 / Kidney cancer [Kidca] & Kidney renal cl ... | 61 |
DOID:3571 / Liver cancer [Livca] | 60 |
DOID:3907 / Lung squamous cell carcinoma | 33 |
DOID:3910 / Lung adenocarcinoma | 53 |
DOID:4465 / Papillary renal cell carcinoma | 57 |
DOID:4471 / Chromophobe adenocarcinoma | 23 |
DOID:5041 / Esophageal cancer [EC] | 32 |
The processed file for Glygen is available on the glygen-vm-dev server at /software/pipeline/integrator/downloads/bioxpress/August_2021/human_cancer_mRNA_expression_per_tissue_glygen.csv