# IL6 analysis for a particular sample

This example demonstrates how, given a sample ID (for example, INCOV002-BL, from a patient in the ICU), to extract the value of a feature (IL6) from the transcriptomics feature data file and calculate its mean.

To do:
1. Find the samples from patients in the ICU and from patients not in the ICU.
2. Perform a differential expression analysis between the two groups, testing for a statistical difference with, for example, a t-test.
3. Find the genes and proteins in the IL6 pathway in WikiPathways. How can they be retrieved? (Query the WikiPathways SPARQL endpoint to identify the genes of interest, i.e. those involved with IL6, and output a list of genes.)
4. Find the matching features for these genes and proteins in the transcriptomics and proteomics datasets. We use IL6 as an example; do we need other features?
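For the WikiPathways question in the to-do list, one possible starting point is a SPARQL query that lists the gene products belonging to a pathway. The sketch below only builds the query and the request URL; it does not contact the server. The endpoint URL, the pathway ID `WP364` (assumed here to be the human IL-6 signaling pathway) and the vocabulary terms are assumptions that should be checked against the WikiPathways documentation before use. The helper name `build_pathway_gene_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumptions (verify before use): public WikiPathways SPARQL endpoint and
# WP364 as the identifier of the human IL-6 signaling pathway.
ENDPOINT = "https://sparql.wikipathways.org/sparql"
PATHWAY_ID = "WP364"


def build_pathway_gene_query(pathway_id):
    """Return a SPARQL query listing gene product labels in one pathway."""
    return f"""
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?label WHERE {{
  ?pathway a wp:Pathway ;
           dcterms:identifier "{pathway_id}" .
  ?gene a wp:GeneProduct ;
        dcterms:isPartOf ?pathway ;
        rdfs:label ?label .
}}
ORDER BY ?label
""".strip()


query = build_pathway_gene_query(PATHWAY_ID)

# URL for an HTTP GET request; the "format" parameter is an assumption about
# what the endpoint accepts for JSON results.
request_url = ENDPOINT + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)
print(query)
```

The returned labels could then serve as the `feature_list` when subsetting the omics tables, provided the labels match the feature names used in the dataset.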


Note: the conda environment py36isa can be created as described here: https://gist.github.com/niehues/2e968e222489692f281d8361045766b0

In [1]:
# import functions from functions.py
import sys
sys.path.append(".")
from functions import *

## Read transcriptomics data file

In [2]:
transcriptomics_feature_data = "../../data/Su_2020_FAIR/transcriptomics/transcriptomics_Su_2020_feature-data.csv"
df = read_omics_data(transcriptomics_feature_data)
print(df.shape)
df.iloc[:4,:6]

(18318, 143)


feature.name,INCOV100-BL,INCOV101-BL,INCOV010-BL,INCOV102-BL,INCOV103-BL,INCOV104-BL
A1BG,451.835,883.444,4707.21,491.696,359.86,696.239
A1BG-AS1,33.4642,112.852,597.616,73.7626,22.4533,151.272
A2M,5.53521,0.0,28.607,0.0,6.58922,0.0
A2M-AS1,11.2035,74.8691,480.155,23.8664,6.21163,31.9432
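The implementation of `read_omics_data` lives in `functions.py` and is not shown here. As a rough sketch of what loading such a table can look like, assuming the CSV file has feature names in the first column and one column per sample, a small in-memory excerpt of the values above can be read with pandas. The function name `read_feature_table` is hypothetical.

```python
import io

import pandas as pd

# Miniature excerpt of the output above. The real file layout is an
# assumption: first column = feature names, remaining columns = samples.
csv_text = """feature.name,INCOV100-BL,INCOV101-BL,INCOV010-BL
A1BG,451.835,883.444,4707.21
A1BG-AS1,33.4642,112.852,597.616
A2M,5.53521,0.0,28.607
"""


def read_feature_table(source):
    """Load a features-by-samples table, indexed by feature name."""
    return pd.read_csv(source, index_col=0)


demo = read_feature_table(io.StringIO(csv_text))
print(demo.shape)
print(demo.loc["A2M", "INCOV010-BL"])
```

Indexing by feature name makes later lookups such as `demo.loc["A2M"]` straightforward.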


## Read transcriptomics feature metadata file

In [3]:
transcriptomics_feature_metadata_path = "../../data/Su_2020_FAIR/transcriptomics/transcriptomics_Su_2020_feature-metadata.csv"
transcriptomics_feature_metadata = read_omics_data(transcriptomics_feature_metadata_path)
print(transcriptomics_feature_metadata.shape)
transcriptomics_feature_metadata

(18318, 3)


feature.name,database.ID,database,database.version
A1BG,NM_130786,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
A1BG-AS1,NR_015380,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
A2M,NM_000014,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
A2M-AS1,NR_026971,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
A2ML1,NM_001282424,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
...,...,...,...
ZXDC,NM_001040653,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
ZYG11A,NM_001004339,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
ZYG11B,NM_024646,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213
ZYX,NM_001010972_2,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213


## Create column with feature PIDs

In [4]:
create_feature_PID(transcriptomics_feature_metadata)

feature.name,database.ID,database,database.version,PID
A1BG,NM_130786,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_130786
A1BG-AS1,NR_015380,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NR_015380
A2M,NM_000014,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_000014
A2M-AS1,NR_026971,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NR_026971
A2ML1,NM_001282424,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_001282424
...,...,...,...,...
ZXDC,NM_001040653,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_001040653
ZYG11A,NM_001004339,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_001004339
ZYG11B,NM_024646,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_024646
ZYX,NM_001010972_2,https://www.ncbi.nlm.nih.gov/nuccore/,RefSeq Release 213,https://www.ncbi.nlm.nih.gov/nuccore/NM_001010...
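Judging from the output, each PID is the `database` base URL followed by the `database.ID`. A minimal sketch of that concatenation, assuming this is all the step does (the real `create_feature_PID` in `functions.py` may differ), could look like this. The function name `add_pid_column` is hypothetical.

```python
import pandas as pd

# Two metadata rows copied from the table above.
meta = pd.DataFrame(
    {
        "database.ID": ["NM_130786", "NR_015380"],
        "database": ["https://www.ncbi.nlm.nih.gov/nuccore/"] * 2,
    },
    index=pd.Index(["A1BG", "A1BG-AS1"], name="feature.name"),
)


def add_pid_column(metadata):
    """Return a copy with a PID column: database base URL + database.ID."""
    out = metadata.copy()
    out["PID"] = out["database"] + out["database.ID"]
    return out


with_pid = add_pid_column(meta)
print(with_pid.loc["A1BG", "PID"])
```

Working on a copy leaves the input table unchanged, which helps when a notebook cell is re-run.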


## Subset omics data on features and/or sample IDs

In [5]:
features_list = ['IL6', 'IL6R', 'IL6R-AS1','IL6ST']
sample_list = ['INCOV002-BL']

df_subset = subset_omics_data(df, feature_list=features_list, sample_list=sample_list)
df_subset

feature.name,INCOV002-BL
IL6,136.53
IL6R,5428.38
IL6R-AS1,139.079
IL6ST,5535.38
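The subsetting step selects rows by feature name and columns by sample ID. The sketch below shows one way to do this with `DataFrame.loc`, assuming a features-by-samples table; it is not the code of `subset_omics_data`, and the name `subset_table` is hypothetical. The table values and sample IDs `S1` and `S2` are toy placeholders, not data from the study.

```python
import pandas as pd

# Toy table for illustration only; values and sample IDs are placeholders.
table = pd.DataFrame(
    {"S1": [1.0, 2.0, 3.0], "S2": [4.0, 5.0, 6.0]},
    index=pd.Index(["A1BG", "IL6", "IL6R"], name="feature.name"),
)


def subset_table(table, features=None, samples=None):
    """Keep the requested features (rows) and samples (columns).

    Names that are absent from the table are skipped silently, and the
    requested order is preserved. Passing None keeps everything.
    """
    rows = list(table.index) if features is None else [
        f for f in features if f in table.index
    ]
    cols = list(table.columns) if samples is None else [
        s for s in samples if s in table.columns
    ]
    return table.loc[rows, cols]


subset = subset_table(table, features=["IL6", "IL6R", "MISSING"], samples=["S2"])
print(subset)
```

Skipping missing names is a design choice made for this sketch; raising an error instead would make typos in feature names easier to spot.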


## Calculate mean for every feature

In [10]:
calculate_means(df_subset)

{'IL6': 136.53, 'IL6R': 5428.38, 'IL6R-AS1': 139.079, 'IL6ST': 5535.38}
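With a single sample, each mean equals the sample's own value. The means become informative once several samples per group are selected, as the to-do list plans for ICU versus non-ICU patients. As a sketch of that next step, the following computes group means and Welch's t statistic for one feature using only the standard library. The numbers are toy values, not data from the study, and the function name `welch_t` is hypothetical. A p-value would additionally need the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of the two group means.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy numbers for illustration only; they are not values from the dataset.
icu = [10.0, 12.0, 14.0]
non_icu = [4.0, 6.0, 8.0]

t_stat, dof = welch_t(icu, non_icu)
print(mean(icu), mean(non_icu))
print(round(t_stat, 4), dof)
```

Welch's variant is used here because it does not assume equal variances in the two groups; a different test may be more appropriate for the actual data.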