# Features 

## Peptide
--- 

* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* UniProt - The UniProt ID code for the associated protein. There are often several peptides per protein.
* Peptide - The sequence of amino acids included in the peptide. See this table for the relevant codes. Some rare annotations may not be included in the table. The test set may include peptides not found in the train set.
* PeptideAbundance - The frequency of the amino acid in the sample.
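
The two notes above (several peptides per protein, and annotated sequences) can be checked with a short sketch. It uses three rows copied from the `train_peptides.csv` preview in this notebook and one annotated sequence taken from the pivoted column names. The `(UniMod_<number>)` pattern and the helper name `strip_modifications` are assumptions made for illustration, not part of the dataset documentation.

```python
import re

import pandas as pd

# Three rows copied from the train_peptides.csv preview shown in this notebook.
peptide_sample = pd.DataFrame({
    "visit_id": ["55_0", "55_0", "55_0"],
    "UniProt": ["O00391", "O00533", "O00533"],
    "Peptide": ["NEQEQPLGQWHLS", "GNPEPTFSWTK", "IEIPSSVQQVPTIIK"],
    "PeptideAbundance": [11254.3, 102060.0, 174185.0],
})

# Number of distinct peptides recorded for each protein within each visit.
peptides_per_protein = (
    peptide_sample.groupby(["visit_id", "UniProt"])["Peptide"].nunique()
)
print(peptides_per_protein)

# Assumption: modifications are written as "(UniMod_<number>)" after a residue.
UNIMOD_PATTERN = re.compile(r"\(UniMod_\d+\)")


def strip_modifications(sequence: str) -> str:
    """Hypothetical helper: drop UniMod annotations, keep the bare residues."""
    return UNIMOD_PATTERN.sub("", sequence)


print(strip_modifications("AAFTEC(UniMod_4)C(UniMod_4)QAADK"))
```

Stripping the annotations is only one option. Keeping them preserves the distinction between modified and unmodified forms of the same peptide, which the data reports as separate rows.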

## Protein (Aggregated from Peptide Level)
---

* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* UniProt - The UniProt ID code for the associated protein. There are often several peptides per protein. The test set may include proteins not found in the train set.
* NPX - Normalized protein expression. The frequency of the protein's occurrence in the sample. May not have a 1:1 relationship with the component peptides as some proteins contain repeated copies of a given peptide.
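
One way to see how NPX relates to the component peptides is to sum `PeptideAbundance` per visit and protein and compare it with NPX. The sketch below uses only rows copied from the previews in this notebook, so the peptide list for `O00533` is incomplete and its sum is expected to fall short. Whether NPX equals a plain sum on the full data is an assumption to verify, not something the description guarantees. The column names `peptide_sum` and `npx_minus_peptide_sum` are introduced here for illustration.

```python
import pandas as pd

# Rows copied from the train_peptides.csv preview (first three rows only).
peptide_sample = pd.DataFrame({
    "visit_id": ["55_0", "55_0", "55_0"],
    "UniProt": ["O00391", "O00533", "O00533"],
    "PeptideAbundance": [11254.3, 102060.0, 174185.0],
})

# Rows copied from the train_proteins.csv preview for the same two proteins.
protein_sample = pd.DataFrame({
    "visit_id": ["55_0", "55_0"],
    "UniProt": ["O00391", "O00533"],
    "NPX": [11254.3, 732430.0],
})

# Total peptide abundance per (visit, protein).
peptide_sum = (
    peptide_sample
    .groupby(["visit_id", "UniProt"], as_index=False)["PeptideAbundance"]
    .sum()
    .rename(columns={"PeptideAbundance": "peptide_sum"})
)

# A left join keeps every protein row, even if no peptide rows matched.
comparison = protein_sample.merge(
    peptide_sum, on=["visit_id", "UniProt"], how="left"
)
comparison["npx_minus_peptide_sum"] = (
    comparison["NPX"] - comparison["peptide_sum"]
)
print(comparison)
```

In this sample the single peptide of `O00391` matches its NPX exactly, while `O00533` shows a gap because only two of its peptides appear in the three-row preview.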

## TimeSeries Clinical Testing Results (Train/Test)
---
* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* updrs_[1-4] - The patient's score for part N of the Unified Parkinson's Disease Rating Scale. Higher numbers indicate more severe symptoms. Each sub-section covers a distinct category of symptoms, such as mood and behavior for Part 1 and motor functions for Part 3.
* upd23b_clinical_state_on_medication - Whether or not the patient was taking medication such as Levodopa during the UPDRS assessment. Expected to mainly affect the scores for Part 3 (motor function). These medications wear off fairly quickly (on the order of one day) so it's common for patients to take the motor function exam twice in a single month, both with and without medication.

In [1]:
# Import Dependencies 
import pandas as pd
import numpy as np 

In [28]:
peptide = pd.read_csv('train_peptides.csv')

peptide.head(3)

Unnamed: 0,visit_id,visit_month,patient_id,UniProt,Peptide,PeptideAbundance
0,55_0,0,55,O00391,NEQEQPLGQWHLS,11254.3
1,55_0,0,55,O00533,GNPEPTFSWTK,102060.0
2,55_0,0,55,O00533,IEIPSSVQQVPTIIK,174185.0


## Transforming Features

In [33]:
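# Pivot to one row per visit and one column per peptide.
# Keep every visit here; the preview is limited with .head() only when displaying.
transform = peptide.pivot(index = ['visit_id', 'visit_month', 'patient_id'], columns = 'Peptide', values = 'PeptideAbundance').reset_index()

In [37]:
transform.head()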

Peptide,visit_id,visit_month,patient_id,AADDTWEPFASGK,AAFGQGSGPIMLDEVQC(UniMod_4)TGTEASLADC(UniMod_4)K,AAFTEC(UniMod_4)C(UniMod_4)QAADK,AANEVSSADVK,AATGEC(UniMod_4)TATVGKR,AATVGSLAGQPLQER,AAVYHHFISDGVR,...,YSLTYIYTGLSK,YTTEIIK,YVGGQEHFAHLLILR,YVM(UniMod_35)LPVADQDQC(UniMod_4)IR,YVMLPVADQDQC(UniMod_4)IR,YVNKEIQNAVNGVK,YWGVASFLQK,YYC(UniMod_4)FQGNQFLR,YYTYLIMNK,YYWGGQYTWDMAK
0,10053_0,0,10053,6580710.0,31204.4,7735070.0,0.0,0.0,0.0,46620.3,...,202274.0,0.0,4401830.0,77482.6,583075.0,76705.7,104260.0,530223.0,0.0,7207.3
1,10053_12,12,10053,6333510.0,52277.6,5394390.0,0.0,0.0,0.0,57554.5,...,201009.0,0.0,5001750.0,36745.3,355643.0,92078.1,123254.0,453883.0,49281.9,25332.8
2,10053_18,18,10053,7129640.0,61522.0,7011920.0,35984.7,17188.0,19787.3,36029.4,...,220728.0,0.0,5424380.0,39016.0,496021.0,63203.6,128336.0,447505.0,52389.1,21235.7
3,10138_12,12,10138,7404780.0,46107.2,10610900.0,0.0,20910.2,66662.3,55253.9,...,188362.0,9433.71,3900280.0,48210.3,328482.0,89822.1,129964.0,552232.0,65657.8,9876.98
4,10138_24,24,10138,13788300.0,56910.3,6906160.0,13785.5,11004.2,63672.7,36819.8,...,206187.0,6365.15,3521800.0,69984.6,496737.0,80919.3,111799.0,0.0,56977.6,4903.09


In [34]:
transform.fillna(0, inplace = True)

In [36]:
transform[transform.duplicated(subset = 'visit_id', keep = False)]

Peptide,visit_id,visit_month,patient_id,AADDTWEPFASGK,AAFGQGSGPIMLDEVQC(UniMod_4)TGTEASLADC(UniMod_4)K,AAFTEC(UniMod_4)C(UniMod_4)QAADK,AANEVSSADVK,AATGEC(UniMod_4)TATVGKR,AATVGSLAGQPLQER,AAVYHHFISDGVR,...,YSLTYIYTGLSK,YTTEIIK,YVGGQEHFAHLLILR,YVM(UniMod_35)LPVADQDQC(UniMod_4)IR,YVMLPVADQDQC(UniMod_4)IR,YVNKEIQNAVNGVK,YWGVASFLQK,YYC(UniMod_4)FQGNQFLR,YYTYLIMNK,YYWGGQYTWDMAK
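
The empty result above means no `visit_id` appears twice in the wide table. A related check can be run on the long table before pivoting, because `DataFrame.pivot` raises a `ValueError` when the same (index, column) pair occurs more than once. The sketch below is one possible approach, not the notebook's method: it checks for such pairs and then uses `pivot_table`, which aggregates duplicates and can fill missing cells in one step. The three long-format rows are reconstructed from the wide preview above, assuming the `0.0` for `AANEVSSADVK` at visit `10053_0` came from a missing row.

```python
import pandas as pd

# Long-format rows reconstructed from the wide preview (assumption: the 0.0
# entries there were originally missing peptide rows).
long_df = pd.DataFrame({
    "visit_id": ["10053_0", "10053_18", "10053_18"],
    "visit_month": [0, 18, 18],
    "patient_id": [10053, 10053, 10053],
    "Peptide": ["AADDTWEPFASGK", "AADDTWEPFASGK", "AANEVSSADVK"],
    "PeptideAbundance": [6580710.0, 7129640.0, 35984.7],
})

key_columns = ["visit_id", "visit_month", "patient_id"]

# Rows that share the same visit keys and peptide would break DataFrame.pivot.
duplicate_rows = long_df[
    long_df.duplicated(subset=key_columns + ["Peptide"], keep=False)
]
print(f"duplicate (visit, peptide) rows: {len(duplicate_rows)}")

# pivot_table sums any duplicates and fills absent peptides with 0.
wide = long_df.pivot_table(
    index=key_columns,
    columns="Peptide",
    values="PeptideAbundance",
    aggfunc="sum",
    fill_value=0,
).reset_index()
wide.columns.name = None  # drop the leftover "Peptide" axis label
print(wide)
```

Filling with 0 treats an unobserved peptide as zero abundance. That is a modelling choice; leaving the cells as `NaN` keeps "not measured" distinct from "measured as zero".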


In [11]:
pd.DataFrame([[0, 1, 0]], columns = ['Amino_Acid_1','Amino_Acid_2', 'Amino_Acid_3' ])

Unnamed: 0,Amino_Acid_1,Amino_Acid_2,Amino_Acid_3
0,0,1,0


In [6]:
protein = pd.read_csv('train_proteins.csv')
protein.head()

Unnamed: 0,visit_id,visit_month,patient_id,UniProt,NPX
0,55_0,0,55,O00391,11254.3
1,55_0,0,55,O00533,732430.0
2,55_0,0,55,O00584,39585.8
3,55_0,0,55,O14498,41526.9
4,55_0,0,55,O14773,31238.0


In [25]:
clinical_Results_train = pd.read_csv('train_clinical_data.csv')

test = clinical_Results_train[clinical_Results_train['patient_id'] == 55]

In [26]:
test.shift(1).head()

Unnamed: 0,visit_id,patient_id,visit_month,updrs_1,updrs_2,updrs_3,updrs_4,upd23b_clinical_state_on_medication
0,,,,,,,,
1,55_0,55.0,0.0,10.0,6.0,15.0,,
2,55_3,55.0,3.0,10.0,7.0,25.0,,
3,55_6,55.0,6.0,8.0,10.0,34.0,,
4,55_9,55.0,9.0,8.0,9.0,30.0,0.0,On


In [21]:
test.head(10)

Unnamed: 0,visit_id,patient_id,visit_month,updrs_1,updrs_2,updrs_3,updrs_4,upd23b_clinical_state_on_medication
0,55_0,55,0,10.0,6.0,15.0,,
1,55_3,55,3,10.0,7.0,25.0,,
2,55_6,55,6,8.0,10.0,34.0,,
3,55_9,55,9,8.0,9.0,30.0,0.0,On
4,55_12,55,12,10.0,10.0,41.0,0.0,On
5,55_18,55,18,7.0,13.0,38.0,0.0,On
6,55_24,55,24,16.0,9.0,49.0,0.0,On
7,55_30,55,30,14.0,13.0,49.0,0.0,On
8,55_36,55,36,17.0,18.0,51.0,0.0,On
9,55_42,55,42,12.0,20.0,41.0,0.0,On
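
The `shift(1)` call above lags every column for a single patient. A minimal sketch of extending the same idea to the whole clinical table is to shift within each `patient_id` group, so that one patient's last visit never leaks into the next patient's first row. The rows below are the first four visits of patient 55 copied from the preview. The column names `updrs_3_prev` and `updrs_3_change` are hypothetical names chosen for this example.

```python
import pandas as pd

# First four visits of patient 55, copied from the preview above.
clinical_sample = pd.DataFrame({
    "visit_id": ["55_0", "55_3", "55_6", "55_9"],
    "patient_id": [55, 55, 55, 55],
    "visit_month": [0, 3, 6, 9],
    "updrs_3": [15.0, 25.0, 34.0, 30.0],
})

# Sort so that "previous row" really means "previous visit" for each patient.
clinical_sample = clinical_sample.sort_values(["patient_id", "visit_month"])

# Shift within each patient; the first visit of every patient becomes NaN.
clinical_sample["updrs_3_prev"] = (
    clinical_sample.groupby("patient_id")["updrs_3"].shift(1)
)

# Change in the Part 3 score since the previous visit.
clinical_sample["updrs_3_change"] = (
    clinical_sample["updrs_3"] - clinical_sample["updrs_3_prev"]
)
print(clinical_sample)
```

With a single patient the result matches a plain `shift(1)`; the grouping only matters once several patients are stacked in the same frame.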
