In [38]:
import numpy as np
import pandas as pd
from os import listdir
import shutil


Aging is the time-dependent weakening of the body. For example, aging of the pancreas can predispose people to diabetes (https://www.imrpress.com/journal/FBE/10/1/10.2741/e813). I will attempt to find therapeutic targets for pancreatic aging using the scRNA-seq data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE81547.

This involves identifying the genes and pathways whose transcription is most robustly associated with age.

# Data Preprocessing

The GEO series provides the following description of the data:

> As organisms age, cells accumulate genetic and epigenetic changes that eventually lead to impaired organ function or catastrophic failure such as cancer. Here we describe a single-cell transcriptome analysis of 2544 human pancreas cells from donors, spanning six decades of life. We find that islet cells from older donors have increased levels of disorder as measured both by noise in the transcriptome and by the number of cells which display inappropriate hormone expression, revealing a transcriptional instability associated with aging. By analyzing the spectrum of somatic mutations in single cells from previously-healthy donors, we find a specific age-dependent mutational signature characterized by C to A and C to G transversions, indicators of oxidative stress, which is absent in single cells from human brain tissue or in a tumor cell line. Cells carrying a high load of such mutations also express higher levels of stress and senescence markers, including FOS, JUN, and the cytoplasmic superoxide dismutase SOD1, markers previously linked to pancreatic diseases with substantial age-dependent risk, such as type 2 diabetes mellitus and adenocarcinoma. Thus, our single-cell approach unveils gene expression changes and somatic mutations acquired in aging human tissue, and identifies molecular pathways induced by these genetic changes that could influence human disease. Also, our results demonstrate the feasibility of using single-cell RNA-seq data from primary cells to derive meaningful insights into the genetic processes that operate on aging human tissue and to determine which molecular mechanisms are coordinated with these processes.

This dataset contains scRNA-seq data on the pancreas of humans of various ages.

In [39]:
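path = 'GSE81547'
# One tab-separated count file per cell; sort for a reproducible order
datasets = sorted([path + '/' + i for i in listdir(path)])

# Peek at the first file (it has no header row)
pd.read_csv(datasets[0], delimiter='\t', header=None)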

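|       | 0           | 1      |
|-------|-------------|--------|
| 0     | 1/2-SBSRNA4 | 0      |
| 1     | A1BG        | 0      |
| 2     | A1BG-AS1    | 0      |
| 3     | A1CF        | 0      |
| 4     | A2LD1       | 0      |
| ...   | ...         | ...    |
| 23460 | no_feature  | 407558 |
| 23461 | ambiguous   | 3754   |
| 23462 | too_low_aQual | 0    |
| 23463 | not_aligned | 0      |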


The files themselves don't contain any information about the samples. Column 0 contains the gene name and column 1 contains the transcript count. The last 5 rows are not genes: they hold counters from the RNA-seq preprocessing (for example, reads with no feature or an ambiguous assignment). First, let's use the series matrix metadata file to get the age and sex of each sample.
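Because each file mixes gene counts with trailing preprocessing counters, it can help to separate the two before any analysis. The sketch below is a hypothetical helper (`split_counts` is not part of the original notebook); it assumes the layout shown above: two tab-separated columns, no header, and the last `n_summary` rows holding counters rather than genes. The toy input is made up for illustration.

```python
import io

import pandas as pd


def split_counts(source, n_summary=5):
    """Split one per-cell count file into gene counts and summary counters.

    Hypothetical helper: assumes two tab-separated columns (gene, count),
    no header row, and that the final `n_summary` rows are preprocessing
    counters (e.g. no_feature, ambiguous) rather than genes.
    """
    # keep_default_na=False keeps gene symbols verbatim (no 'NA' -> NaN)
    df = pd.read_csv(source, sep='\t', header=None,
                     names=['gene', 'count'], keep_default_na=False)
    genes = df.iloc[:-n_summary].set_index('gene')['count']
    summary = df.iloc[-n_summary:].set_index('gene')['count']
    return genes, summary


# Made-up toy file with three genes and two trailing counters
toy = io.StringIO(
    "A1BG\t0\nA1CF\t3\nSOD1\t12\n"
    "no_feature\t40\nambiguous\t2\n"
)
genes, summary = split_counts(toy, n_summary=2)
print(genes.sum(), summary['no_feature'])  # 15 40
```

For the real files, `split_counts(datasets[0])` would use the default of five trailing rows described above; `n_summary` only needs changing if a file is laid out differently.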

In [40]:
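from os.path import basename

# Read sample accessions and titles from the GEO series matrix header
with open('GSE81547_series_matrix.txt') as f:
    for line in f:
        if line.startswith('!Series_sample_id'):
            # A single quoted, space-separated list of GSM accessions
            sample_ids = line.split('\t', 1)[1].strip().strip('"').split()
        if line.startswith('!Sample_title'):
            # One quoted title per sample: age precedes 'yr', sex follows the first '_'
            titles = [t.strip('"') for t in line.rstrip('\n').split('\t')[1:]]
            agesex = [(int(t[:t.find('yr')]), t[t.find('_') + 1]) for t in titles]
# Map each GSM accession to its (age, sex) tuple
identifier = dict(zip(sample_ids, agesex))
# File names look like 'GSE81547/<GSM>_...'; look up metadata by accession
sampleinfo = lambda filename: identifier[basename(filename).split('_')[0]]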
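The age/sex extraction above depends entirely on the layout of the sample titles. To make that rule easier to read and check, here is a minimal sketch that isolates it in two small functions. The function names and the example title line are hypothetical; the sketch only assumes titles of the form `<age>yr_<sex>...`, quoted and tab-separated as in a series matrix file. The real GSE81547 titles may carry additional suffixes, which this rule ignores.

```python
def parse_title(title):
    """Return (age, sex) from a sample title such as '21yr_male_...'.

    Same rule as the notebook: the age is the integer before 'yr',
    and the sex is the single character right after the first '_'.
    """
    title = title.strip().strip('"')  # series-matrix fields are quoted
    age = int(title[:title.find('yr')])
    sex = title[title.find('_') + 1]
    return age, sex


def parse_title_line(line):
    """Parse a whole '!Sample_title' line into a list of (age, sex) tuples."""
    fields = line.rstrip('\n').split('\t')[1:]  # drop the '!Sample_title' key
    return [parse_title(field) for field in fields]


# Hypothetical line in series-matrix style
example = '!Sample_title\t"21yr_male_A1"\t"54yr_female_B7"\n'
print(parse_title_line(example))  # [(21, 'm'), (54, 'f')]
```

Because the rule keeps only one character for sex, titles that spell out `male`/`female` and titles that use `m`/`f` both map to the same codes.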

Now we have a function, `sampleinfo`, that takes a file name and returns the donor's age and sex:

[(21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21, 'm'),
 (21