# Obtain gene lists

We're doing this in a separate notebook to keep things clean.

For peroxisome genes, we are going to use:

    1. Human Protein Atlas (queried for Peroxisome)
    2. Peroxisome Database
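
Since the peroxisome list draws on two sources, we will need to combine them at some point. Below is a minimal sketch of one way to do that, assuming we want the union of the two lists and that both use the same gene symbols. The gene names are placeholders, and `clean_symbols` is a hypothetical helper, not part of either resource.

```python
# Placeholder symbols; the real lists would come from HPA and the Peroxisome Database.
hpa_genes = ['GENE_A', 'GENE_B', 'GENE_C ']
pexdb_genes = ['GENE_B', 'GENE_C', 'GENE_D']

def clean_symbols(symbols):
    """Strip surrounding whitespace and drop empty entries, returning a set."""
    return {s.strip() for s in symbols if s and s.strip()}

hpa_set = clean_symbols(hpa_genes)
pexdb_set = clean_symbols(pexdb_genes)

# Union: every gene reported by at least one source
combined = sorted(hpa_set | pexdb_set)

# Intersection: genes both sources agree on, useful as a sanity check
shared = sorted(hpa_set & pexdb_set)

print(combined)
print(shared)
```

The intersection is not required for the final list, but a very small overlap would suggest the two sources use different naming conventions.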

For mitochondrial genes, we are going to use:

    1. Human Protein Atlas (queried for Mitochondria)

We exported custom TSVs from HPA that include extra columns to help us curate the gene lists further.
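
Because the curation depends on specific columns being present in the export, it can help to fail early if one is missing. The sketch below shows one possible check on a small in-memory stand-in for an export. `load_hpa_table` and `REQUIRED_COLUMNS` are hypothetical names introduced here, and the gene names are placeholders.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt standing in for an HPA export (placeholder genes).
toy_tsv = (
    "Gene\tRNA blood cell specificity\tRNA single cell type distribution\n"
    "GENE_A\tLow immune cell specificity\tDetected in all\n"
    "GENE_B\tNot detected in immune cells\tDetected in some\n"
)

# Columns the curation steps rely on (an assumption; adjust to the actual export).
REQUIRED_COLUMNS = [
    'Gene',
    'RNA blood cell specificity',
    'RNA single cell type distribution',
]

def load_hpa_table(source):
    """Read a tab-separated export and raise if a required column is missing."""
    table = pd.read_csv(source, sep='\t')
    missing = [c for c in REQUIRED_COLUMNS if c not in table.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return table

toy_table = load_hpa_table(io.StringIO(toy_tsv))
print(toy_table.shape)
```

The same function accepts a file path in place of the `StringIO` object, since `pd.read_csv` handles both.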

In [7]:
# Display the value of every top-level expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

import pandas as pd

In [9]:
pex_table = pd.read_csv('../data/hpa/Peroxisome.tsv', delimiter='\t')
pex_table.shape

mt_table = pd.read_csv('../data/hpa/Mitochondria.tsv', delimiter='\t')
mt_table.shape

(189, 41)

(1978, 41)

# Curate

We have several columns that will help us do the following:

    1. Remove genes that are not detected in immune cells (38 Pex; 336 Mt)
    2. 
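
The first curation step can be sketched as a boolean filter on the `RNA blood cell specificity` column. This is a minimal illustration on toy data with placeholder gene names, not necessarily the exact filter we end up using. `drop_not_detected` is a hypothetical helper name.

```python
import pandas as pd

# Toy stand-in for an HPA table; the real tables are read from the TSV exports.
toy = pd.DataFrame({
    'Gene': ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'],
    'RNA blood cell specificity': [
        'Low immune cell specificity',
        'Low immune cell specificity',
        'Not detected in immune cells',
        'Immune cell enriched',
    ],
})

def drop_not_detected(table, column='RNA blood cell specificity'):
    """Keep rows whose specificity label is not 'Not detected in immune cells'."""
    keep = table[column] != 'Not detected in immune cells'
    return table.loc[keep].reset_index(drop=True)

toy_immune = drop_not_detected(toy)
print(toy_immune['Gene'].tolist())
```

Applied to the real tables, and given the table sizes and counts reported in this notebook, this should leave 151 peroxisome rows (189 - 38) and 1642 mitochondria rows (1978 - 336).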

In [27]:
pex_table['RNA blood cell specificity'].value_counts()
mt_table['RNA blood cell specificity'].value_counts()

RNA blood cell specificity
Low immune cell specificity     116
Not detected in immune cells     38
Immune cell enhanced             16
Immune cell enriched             10
Group enriched                    9
Name: count, dtype: int64

RNA blood cell specificity
Low immune cell specificity     1315
Not detected in immune cells     336
Immune cell enhanced             189
Immune cell enriched              74
Group enriched                    64
Name: count, dtype: int64

['PEX5',
 'PEX19',
 'PEX16',
 'PEX1',
 'PEX12',
 'PEX14',
 'PEX26',
 'PEX6',
 'PEX7',
 'PPARD',
 'PEX13',
 'PEX10',
 'PEX11A',
 'PEX3',
 'LONP2',
 'PEX2',
 'PPARA',
 'ABCD3',
 'ABCD4',
 'PEX11B',
 'PEX11G',
 'ECH1',
 'TMEM135',
 'ABCD1',
 'ABCD2',
 'ACOT8',
 'CROT',
 'DNM1L',
 'FAM120B',
 'HACL1',
 'MFF',
 'PRDX5',
 'ACBD5',
 'FAR1',
 'LACC1',
 'PLAAT3',
 'ACOX1',
 'GSTK1',
 'SCP2',
 'ASXL1',
 'ASXL2',
 'CAT',
 'ENSG00000258465',
 'FIS1',
 'IDE',
 'IDH1',
 'MAVS',
 'MED1',
 'PIK3C3',
 'PJVK',
 'PRMT2',
 'SZT2',
 'TRIM37',
 'ACAA1',
 'ACOX3',
 'AMACR',
 'DECR2',
 'PECR',
 'ACOT4',
 'CD33',
 'CRAT',
 'EPHX2',
 'HSDL2',
 'ISOC1',
 'MPV17L',
 'MUL1',
 'NUDT17',
 'PIPOX',
 'PTGR3',
 'SLC27A2',
 'TYSND1',
 'VWA8',
 'AGPS',
 'HSD17B4',
 'ECI2',
 'PXMP4',
 'SLC25A17',
 'ACAD11',
 'ACSL1',
 'ACSL3',
 'ACSL4',
 'ACSL6',
 'ACTN4',
 'ALDH3A2',
 'ALOX15',
 'ALOX15B',
 'ANGPTL4',
 'AOC1',
 'ATAD1',
 'ATM',
 'BABAM2',
 'CITED2',
 'DDO',
 'DEPP1',
 'DHRS4',
 'DHRS4L2',
 'DHRS7B',
 'DUT',
 'EHHADH',
 '

['IMMT',
 'POLRMT',
 'OPA1',
 'BCL2L1',
 'BAK1',
 'MFF',
 'PPIF',
 'TRMT10C',
 'BNIP3',
 'GFM2',
 'TFAM',
 'AIFM1',
 'FIS1',
 'SLC25A4',
 'SUPV3L1',
 'TIMM13',
 'UCP2',
 'NDUFS2',
 'BID',
 'DNM1L',
 'MRPL27',
 'MRPS27',
 'MICU1',
 'NDUFS1',
 'NDUFV1',
 'STMP1',
 'MRPL32',
 'PINK1',
 'TIMM21',
 'NDUFAB1',
 'PNPT1',
 'TIMM44',
 'TOMM40',
 'IMMP2L',
 'LONP1',
 'MRPS22',
 'TIMM10',
 'TIMM50',
 'TOMM20',
 'GFM1',
 'OXA1L',
 'SLC25A5',
 'MICU2',
 'MTG1',
 'CKMT2',
 'MT-ND1',
 'HADHB',
 'NDUFS3',
 'ATP5F1B',
 'MIEF1',
 'MRPL23',
 'MRPS16',
 'MT-ND4',
 'MUL1',
 'SLC25A14',
 'TFB2M',
 'TIMM29',
 'TIMM8A',
 'TIMM9',
 'MTERF4',
 'SLC25A19',
 'BCS1L',
 'MRPL4',
 'MRPL55',
 'MRPS12',
 'MRPS18C',
 'MRPS33',
 'MT-ND6',
 'MTRF1',
 'SDHC',
 'TIMM10B',
 'TIMM8B',
 'TOMM7',
 'ATP5F1A',
 'MPC1',
 'TOMM70',
 'COX4I1',
 'COX5B',
 'MRPL44',
 'UQCC2',
 'ATP5F1D',
 'LRPPRC',
 'MRPL57',
 'NDUFB8',
 'PMPCB',
 'ABCB8',
 'COX7A2L',
 'PAM16',
 'FASTKD2',
 'MAIP1',
 'MRPL42',
 'MRPL58',
 'MRPS28',
 'MTERF1',
 'NDUFA

Index(['Gene', 'Gene synonym', 'Gene description', 'Uniprot',
       'Biological process', 'Molecular function', 'Disease involvement',
       'Evidence', 'UniProt evidence', 'RNA tissue specificity',
       'RNA tissue distribution', 'RNA single cell type specificity',
       'RNA single cell type distribution',
       'RNA single cell type specificity score',
       'RNA single cell type specific nTPM', 'RNA blood cell specificity',
       'RNA blood cell distribution', 'RNA blood cell specificity score',
       'RNA blood cell specific nTPM', 'Subcellular location',
       'Blood expression cluster', 'Interactions',
       'Blood RNA - basophil [nTPM]', 'Blood RNA - classical monocyte [nTPM]',
       'Blood RNA - eosinophil [nTPM]', 'Blood RNA - gdT-cell [nTPM]',
       'Blood RNA - intermediate monocyte [nTPM]',
       'Blood RNA - MAIT T-cell [nTPM]', 'Blood RNA - memory B-cell [nTPM]',
       'Blood RNA - memory CD4 T-cell [nTPM]',
       'Blood RNA - memory CD8 T-cell [nTPM]', '

RNA single cell type distribution
Detected in all     92
Detected in many    68
Detected in some    29
Name: count, dtype: int64
