# Lecture 11 - Proteomics

Practical exercise prepared by *Animesh Sharma*.

In this exercise you will analyse a proteomics dataset from a recent publication: *"HDACi mediate UNG2 depletion, dysregulated genomic uracil and altered expression of oncoproteins and tumor suppressors in B- and T-cell lines"* [(Iveland et al., 2020)](https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-020-02318-8).

_________________

## Instructions (Part 1):

This part of the analysis will process the raw proteomics data. Since it takes several hours to run, we will skip directly to **Part 2**, but you are welcome to try this part at home 🙂

 
### Step 1:

- Download and install [MaxQuant](https://maxquant.org/)

> Unfortunately, creating the parameter file (`mqpar.xml`) is officially supported only on Windows. If you have a Mac or Linux machine, please team up with a colleague to create one. Once the parameters are created, follow the [MaxQuant installation instructions for Linux](https://www.maxquant.org/static/main/pdf/Linux_MQ_Installation.pdf).
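
A parameter file created on a colleague's Windows machine still points to raw files and a FASTA file on *that* machine. Below is a minimal Python sketch that rewrites those paths for Linux. It assumes the file lists raw files as `<string>` children of a `<filePaths>` element directly under the root, and FASTA files in `<fastaFilePath>` elements; check this against your own `mqpar.xml`, since the layout may differ between MaxQuant versions. The function name `relocate_mqpar` is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath, PureWindowsPath


def relocate_mqpar(xml_text: str, new_dir: str) -> str:
    """Return mqpar.xml text whose raw-file and FASTA paths point into new_dir.

    Assumed layout (verify in your own file):
      <filePaths><string>C:\\...\\sample.raw</string></filePaths>
      <fastaFilePath>C:\\...\\human.fasta</fastaFilePath>
    """
    root = ET.fromstring(xml_text)
    targets = root.findall("./filePaths/string") + root.findall(".//fastaFilePath")
    for element in targets:
        if not element.text:
            continue
        # PureWindowsPath accepts both "\" and "/" separators, so this also
        # works on paths that were already edited by hand.
        file_name = PureWindowsPath(element.text.strip()).name
        # Keep only the file name and place it inside the Linux folder.
        element.text = str(PurePosixPath(new_dir) / file_name)
    return ET.tostring(root, encoding="unicode")
```

For example, `Path("mqpar_linux.xml").write_text(relocate_mqpar(Path("mqpar.xml").read_text(encoding="utf-8-sig"), "/data/mq"))` (with `from pathlib import Path`) writes an edited copy next to the original. ElementTree drops the `<?xml ...?>` declaration line, so add it back if your MaxQuant version expects it.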

### Step 2:

- Go to the [project page on PRIDE](https://www.ebi.ac.uk/pride/archive/projects/PXD008293)
- Download the following files:
  - [150820_JURKAT_SAHA_TECH4](http://ftp.ebi.ac.uk/pride-archive/2020/04/PXD008293/150820_JURKAT_SAHA_TECH4.raw)
  - [151124_JURKAT_SAHA_SILAC_Biol2_tech1](http://ftp.ebi.ac.uk/pride-archive/2020/04/PXD008293/151124_JURKAT_SAHA_SILAC_Biol2_tech1.raw)
  - [151124_JURKAT_SAHA_SILAC_Biol3_tech1](http://ftp.ebi.ac.uk/pride-archive/2020/04/PXD008293/151124_JURKAT_SAHA_SILAC_Biol3_tech1.raw)
- Download the [human proteome](https://www.uniprot.org/proteomes/UP000005640) from UniProt
    - Open the link above and select *Download* (choose the *"include isoforms"* option)
    - Alternatively, download from [this link](https://rest.uniprot.org/uniprotkb/stream?format=fasta&includeIsoform=true&query=%28%28proteome%3AUP000005640%29%29)
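
If you prefer scripting the downloads to clicking the links, here is a minimal Python sketch using only the standard library. The URLs are copied from the list above; the output folder `raw/`, the FASTA file name and the helper names are hypothetical choices. The raw files can be large, so `download_all` skips anything that already exists from a previous run.

```python
from pathlib import Path
from urllib.request import urlretrieve

# PRIDE folder for project PXD008293 (from the links above)
PRIDE_BASE = "http://ftp.ebi.ac.uk/pride-archive/2020/04/PXD008293"
RAW_FILES = [
    "150820_JURKAT_SAHA_TECH4.raw",
    "151124_JURKAT_SAHA_SILAC_Biol2_tech1.raw",
    "151124_JURKAT_SAHA_SILAC_Biol3_tech1.raw",
]
# UniProt REST query for the human proteome (UP000005640) including isoforms
FASTA_URL = (
    "https://rest.uniprot.org/uniprotkb/stream?format=fasta"
    "&includeIsoform=true&query=%28%28proteome%3AUP000005640%29%29"
)


def download_plan(out_dir="raw"):
    """Return (url, destination) pairs without downloading anything."""
    out = Path(out_dir)
    plan = [(f"{PRIDE_BASE}/{name}", out / name) for name in RAW_FILES]
    # Hypothetical local name for the FASTA file
    plan.append((FASTA_URL, out / "UP000005640_human_isoforms.fasta"))
    return plan


def download_all(out_dir="raw"):
    """Download every file in the plan, skipping files that already exist."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in download_plan(out_dir):
        if dest.exists():
            continue
        print(f"Downloading {url} -> {dest}")
        urlretrieve(url, dest)
```

Call `download_all()` from a notebook cell or a Python shell; splitting out `download_plan` makes it easy to check the URLs before starting a long download.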

### Step 3:

- Follow [this tutorial](https://fuzzylife.substack.com/p/reproducing-proteomics-results) 😇


-------------

## Part 2

We will use the pre-processed dataset generated in **Part 1**: the `proteinGroups.txt` table written by MaxQuant, with one row per protein group.

Here is a quick look at the file:

```python
import pandas as pd

# Load MaxQuant's protein-level output (tab-separated text file)
df = pd.read_csv('data/proteinGroups.txt', sep='\t')

df.sample(10)  # show 10 random entries
```

| | Protein IDs | Majority protein IDs | Peptide counts (all) | Peptide counts (razor+unique) | Peptide counts (unique) | Protein names | Gene names | Fasta headers | Number of proteins | Peptides | ... | Peptide IDs | Peptide is razor | Mod. peptide IDs | Evidence IDs | MS/MS IDs | Best MS/MS | Oxidation (M) site IDs | Oxidation (M) site positions | Taxonomy IDs | Taxonomy names |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3238 | Q9BW72 | Q9BW72 | 1 | 1 | 1 | HIG1 domain family member 2A, mitochondrial | HIGD2A | sp\|Q9BW72\|HIG2A_HUMAN HIG1 domain family membe... | 1 | 1 | ... | 2528 | True | 2539 | 6268;6269 | 10278;10279 | 10279 | | | -1 | |
| 3004 | Q96MW5 | Q96MW5 | 2 | 2 | 2 | Conserved oligomeric Golgi complex subunit 8 | COG8 | sp\|Q96MW5\|COG8_HUMAN Conserved oligomeric Golg... | 1 | 2 | ... | 15091;27488 | True;True | 15133;27568 | 38167;68346 | 62309;111346 | 62309;111346 | | | -1 | |
| 2971 | Q96IJ6 | Q96IJ6 | 5 | 5 | 5 | Mannose-1-phosphate guanyltransferase alpha | GMPPA | sp\|Q96IJ6\|GMPPA_HUMAN Mannose-1-phosphate guan... | 1 | 5 | ... | 12843;16504;20563;26765;27053 | True;True;True;True;True | 12882;16548;20625;26840;27129 | 32228;32229;42047;51556;51557;66315;67023;6702... | 52874;52875;52876;68628;83904;83905;108027;109... | 52874;68628;83904;108027;109172 | | | -1 | |
| 3280 | Q9BYJ9 | Q9BYJ9 | 3 | 2 | 2 | YTH domain-containing family protein 1 | YTHDF1 | sp\|Q9BYJ9\|YTHD1_HUMAN YTH domain-containing fa... | 1 | 3 | ... | 2018;11155;24200 | True;False;True | 2027;11189;24269 | 5001;5002;5003;27849;59850 | 8183;8184;8185;45447;96772 | 8184;45447;96772 | | | -1 | |
| 1866 | Q13409 | Q13409 | 6 | 6 | 6 | Cytoplasmic dynein 1 intermediate chain 2 | DYNC1I2 | sp\|Q13409\|DC1I2_HUMAN Cytoplasmic dynein 1 int... | 1 | 6 | ... | 4060;5281;11560;14840;20392;24735 | True;True;True;True;True;True | 4075;5301;11596;14882;20454;24806 | 9956;9957;9958;12988;12989;28954;28955;37591;3... | 16299;16300;16301;16302;21177;21178;21179;2118... | 16299;21178;47380;61449;83249;98841 | | | -1 | |
| 890 | P20338 | P20338 | 1 | 1 | 1 | Ras-related protein Rab-4A | RAB4A | sp\|P20338\|RAB4A_HUMAN Ras-related protein Rab-... | 1 | 1 | ... | 18897 | True | 18950 | 47647 | 77469;77470 | 77470 | | | -1 | |
| 2931 | Q96G01 | Q96G01 | 1 | 1 | 1 | Protein bicaudal D homolog 1 | BICD1 | sp\|Q96G01\|BICD1_HUMAN Protein bicaudal D homol... | 1 | 1 | ... | 2802 | True | 2814 | 7002 | 11501 | 11501 | | | -1 | |
| 695 | P08758 | P08758 | 11 | 11 | 11 | Annexin A5 | ANXA5 | sp\|P08758\|ANXA5_HUMAN Annexin A5 OS=Homo sapie... | 1 | 11 | ... | 1611;4180;7025;8003;9873;10552;18365;19781;230... | True;True;True;True;True;True;True;True;True;T... | 1617;4195;7050;8031;9905;10585;18411;19840;230... | 3987;10252;10253;10254;10255;17159;17160;17161... | 6483;16743;16744;16745;16746;16747;27743;27744... | 6483;16744;27749;31559;39739;42213;75623;80773... | | | -1 | |
| 1173 | P41252 | P41252 | 45 | 45 | 45 | Isoleucine--tRNA ligase, cytoplasmic | IARS1 | sp\|P41252\|SYIC_HUMAN Isoleucine--tRNA ligase, ... | 1 | 45 | ... | 2041;2740;3102;4697;4779;4903;5458;5567;5995;6... | True;True;True;True;True;True;True;True;True;T... | 2050;2752;3114;4715;4797;4921;5478;5588;6018;6... | 5066;5067;5068;5069;5070;5071;6850;6851;6852;7... | 8310;8311;8312;8313;8314;8315;8316;8317;8318;1... | 8314;11270;12623;18920;19224;19687;21873;22225... | | | -1 | |
| 4103 | Q9Y6K9 | Q9Y6K9 | 2 | 2 | 2 | NF-kappa-B essential modulator | IKBKG | sp\|Q9Y6K9\|NEMO_HUMAN NF-kappa-B essential modu... | 1 | 2 | ... | 2232;5212 | True;True | 2242;5230 | 5544;5545;5546;12800 | 9054;9055;9056;9057;20911 | 9056;20911 | | | -1 | |
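
Each row is one protein group, and several columns (`Peptide IDs`, `Peptide is razor`, `Evidence IDs`, ...) hold `;`-separated lists. The `...` inside cells is only pandas shortening long values for display; the file itself is complete. A common first step is to drop reverse (decoy) hits, potential contaminants and groups only identified by a modification site. The sketch below assumes those flag columns are named `Reverse`, `Potential contaminant` and `Only identified by site` and marked with `+` (older MaxQuant versions may use `Contaminant`). It also pairs `Peptide IDs` with `Peptide is razor` position by position. The helper names are hypothetical, and this is not necessarily the filtering the lecture will use.

```python
import pandas as pd

# Flag columns that MaxQuant marks with "+"; "Contaminant" is the assumed
# older name of "Potential contaminant". Missing columns are ignored.
FLAG_COLUMNS = ["Reverse", "Potential contaminant", "Contaminant", "Only identified by site"]


def filter_protein_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Drop decoy, contaminant and site-only protein groups."""
    keep = pd.Series(True, index=df.index)
    for col in FLAG_COLUMNS:
        if col in df.columns:
            # Empty cells are read as NaN/None; only "+" marks a flagged row
            keep &= df[col].astype(str).str.strip() != "+"
    return df[keep]


def razor_peptide_ids(row: pd.Series) -> list:
    """Return the peptide IDs whose matching 'Peptide is razor' flag is True."""
    ids = str(row["Peptide IDs"]).split(";")
    flags = str(row["Peptide is razor"]).split(";")
    return [int(pid) for pid, flag in zip(ids, flags) if flag == "True"]
```

For example, `clean = filter_protein_groups(df)` followed by `clean.apply(razor_peptide_ids, axis=1)`. For YTHDF1 in the sample above, the flags read `True;False;True`, so two of its three peptides are razor peptides, which lines up with its `Peptide counts (razor+unique)` of 2.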


The rest of the tutorial will be guided by the lecturer 😉