# Import phospho-data from Ochoa et al.

This notebook reads Supplementary Table 3 of Ochoa et al. (2020) and formats it for analysis with StructureMap.

https://www.nature.com/articles/s41587-019-0344-3

## Import libraries

In [1]:
import pandas as pd
import numpy as np
import re

## Phospho data

Here, we take all phosphosites that have a functional score annotated in Supplementary Table 3.

In [2]:
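# Load the tab-separated supplementary table and rename the UniProt accession column.
functional_phospho = pd.read_csv('../data/unformatted_ptm_data/41587_2019_344_MOESM5_ESM.tsv', sep='\t')
functional_phospho = functional_phospho.rename(columns={"uniprot": "protein_id"})
# Replace any decimal comma with a dot, then convert the scores to float.
functional_phospho['functional_score'] = functional_phospho['functional_score'].astype(str).str.replace(',', '.', regex=False).astype(float)
functional_phospho[0:3]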

Unnamed: 0,protein_id,position,functional_score
0,A0A075B6Q4,24,0.149257
1,A0A075B6Q4,68,0.119811
2,A0A075B6T3,24,0.477724
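
The comma-to-dot substitution above suggests that the raw table stores scores with a decimal comma. As an alternative sketch, `pd.read_csv` accepts a `decimal` argument that handles this at parse time. The toy table below is a made-up stand-in built from the three rows shown above; whether every score in the real file is formatted this way is an assumption.

```python
import io
import pandas as pd

# Toy stand-in for the supplementary table (assumed layout, not the real file).
toy_tsv = (
    "uniprot\tposition\tfunctional_score\n"
    "A0A075B6Q4\t24\t0,149257\n"
    "A0A075B6Q4\t68\t0,119811\n"
    "A0A075B6T3\t24\t0,477724\n"
)

# decimal=',' makes the parser treat the comma as the decimal separator.
toy = pd.read_csv(io.StringIO(toy_tsv), sep="\t", decimal=",")
toy = toy.rename(columns={"uniprot": "protein_id"})

print(toy["functional_score"].dtype)  # float64
```

If a file mixed both separators, the explicit string replacement used in the notebook would likely be the more forgiving choice.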


In [3]:
# Binary flags: 1 if the functional score passes the cutoff, 0 otherwise.
# p_functional_0 uses a strict ">" (any positive score); the other cutoffs are inclusive (">=").
scores = functional_phospho['functional_score']
functional_phospho['p_functional_0'] = (scores > 0).astype(int)
functional_phospho['p_functional_5'] = (scores >= 0.5).astype(int)
functional_phospho['p_functional_6'] = (scores >= 0.6).astype(int)
functional_phospho['p_functional_7'] = (scores >= 0.7).astype(int)
functional_phospho['p_functional_8'] = (scores >= 0.8).astype(int)
functional_phospho['p_functional_9'] = (scores >= 0.9).astype(int)
functional_phospho[0:3]

Unnamed: 0,protein_id,position,functional_score,p_functional_0,p_functional_5,p_functional_6,p_functional_7,p_functional_8,p_functional_9
0,A0A075B6Q4,24,0.149257,1,0,0,0,0,0
1,A0A075B6Q4,68,0.119811,1,0,0,0,0,0
2,A0A075B6T3,24,0.477724,1,0,0,0,0,0
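
To make the cutoff behaviour concrete, here is a minimal sketch on made-up scores. The helper name `add_threshold_flags` and the toy values are hypothetical. It mirrors the rules in the cell above: a strict `> 0` for `p_functional_0` and inclusive `>=` for the other cutoffs. It then counts how many sites pass each one.

```python
import pandas as pd

def add_threshold_flags(df, score_col="functional_score"):
    """Return a copy of df with binary p_functional_* columns added."""
    out = df.copy()
    scores = out[score_col]
    # Strict comparison: a score of exactly 0 is not flagged.
    out["p_functional_0"] = (scores > 0).astype(int)
    # Inclusive comparisons for the cutoffs 0.5, 0.6, ..., 0.9.
    for tenth in range(5, 10):
        out[f"p_functional_{tenth}"] = (scores >= tenth / 10).astype(int)
    return out

# Made-up scores chosen to hit the boundary cases.
toy = pd.DataFrame({"functional_score": [0.0, 0.149257, 0.5, 0.95]})
flagged = add_threshold_flags(toy)

flag_cols = [c for c in flagged.columns if c.startswith("p_functional_")]
counts = flagged[flag_cols].sum().to_dict()
print(counts)
```

Because the cutoffs are nested, the counts can only decrease or stay equal as the cutoff increases, which makes for a quick sanity check on the real table.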


In [4]:
functional_phospho.to_csv('../data/ptm_data/p_ochoa.csv', index=False)

Keep in mind that the sites from Ochoa et al. were not remapped to the most recent protein FASTA file, so some phosphosite positions may not match the current sequences. This is expected to affect only a minority of proteins and phosphosites.
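
One way to estimate how many sites are affected is to check whether the residue at each annotated position is a plausible phosphoacceptor in the sequence being used. The sketch below is illustrative only: the sequences, accessions, and helper names are made up, and it assumes 1-based positions and that sites are expected on serine, threonine, or tyrosine.

```python
import pandas as pd

# Assumed set of phosphorylatable residues (S/T/Y).
PHOSPHO_RESIDUES = {"S", "T", "Y"}

# Hypothetical toy sequences keyed by made-up accessions.
toy_sequences = {"P_TOY1": "MASTKY", "P_TOY2": "MKLS"}

def residue_at(sequence, position):
    """Return the residue at a 1-based position, or None if out of range."""
    position = int(position)
    if 1 <= position <= len(sequence):
        return sequence[position - 1]
    return None

def site_matches(protein_id, position, sequences):
    """True if the protein is known and the position holds S, T or Y."""
    sequence = sequences.get(protein_id)
    if sequence is None:
        return False
    return residue_at(sequence, position) in PHOSPHO_RESIDUES

toy_sites = pd.DataFrame({
    "protein_id": ["P_TOY1", "P_TOY1", "P_TOY2", "P_TOY2", "P_MISSING"],
    "position": [3, 5, 4, 9, 1],
})

toy_sites["residue_ok"] = [
    site_matches(pid, pos, toy_sequences)
    for pid, pos in zip(toy_sites["protein_id"], toy_sites["position"])
]
print(toy_sites)
```

Applied to the real table with sequences parsed from the FASTA file in use, the fraction of rows where `residue_ok` is False would give a rough upper bound on mismatched sites.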