# Features 

## Peptide
--- 

* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* UniProt - The UniProt ID code for the associated protein. There are often several peptides per protein.
* Peptide - The sequence of amino acids included in the peptide. See this table for the relevant codes. Some rare annotations may not be included in the table. The test set may include peptides not found in the train set.
* PeptideAbundance - The abundance (frequency) of the peptide in the sample.
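
Some peptide strings carry modification annotations, such as `(UniMod_4)` in the sample rows of `train_peptides.csv`. The sketch below shows one possible way to strip those tags and count residues per peptide. It assumes annotations always appear as a parenthesised tag inside the sequence; the names `MOD_TAG` and `strip_modifications` are hypothetical helpers, not part of the dataset tooling.

```python
import re
from collections import Counter

import pandas as pd

# Peptide strings copied from the first rows of train_peptides.csv.
peptides = pd.Series([
    "NEQEQPLGQWHLS",
    "KPQSAVYSTGSNGILLC(UniMod_4)EAEGEPQPTIK",
])

# Assumption: every annotation is a parenthesised tag such as "(UniMod_4)".
MOD_TAG = re.compile(r"\([^)]*\)")


def strip_modifications(seq: str) -> str:
    """Return the amino acid sequence with parenthesised tags removed."""
    return MOD_TAG.sub("", seq)


# Bare sequences, one per peptide.
bare = peptides.map(strip_modifications)

# One Counter per peptide, mapping single-letter residue code -> occurrences.
residue_counts = [Counter(seq) for seq in bare]

print(bare.tolist())
print(residue_counts[0].most_common(3))
```

Dropping the tags discards the modification information, so whether this is appropriate depends on how the peptide features are used downstream.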

## Protein (Aggregated from Peptide Level)
---

* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* UniProt - The UniProt ID code for the associated protein. There are often several peptides per protein. The test set may include proteins not found in the train set.
* NPX - Normalized protein expression. The frequency of the protein's occurrence in the sample. May not have a 1:1 relationship with the component peptides as some proteins contain repeated copies of a given peptide.
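
To see how peptide rows relate to protein rows, the following minimal sketch groups peptides by visit and protein, then counts and sums them. It uses only the five rows printed by `peptide.head()` in this notebook, so the totals are partial and should not be expected to match NPX in general. The column names `n_peptides` and `abundance_sum` are illustrative choices.

```python
import pandas as pd

# First five rows of train_peptides.csv as displayed in this notebook.
peptide_sample = pd.DataFrame({
    "visit_id": ["55_0"] * 5,
    "UniProt": ["O00391", "O00533", "O00533", "O00533", "O00533"],
    "Peptide": [
        "NEQEQPLGQWHLS",
        "GNPEPTFSWTK",
        "IEIPSSVQQVPTIIK",
        "KPQSAVYSTGSNGILLC(UniMod_4)EAEGEPQPTIK",
        "SMEQNGPGLEYR",
    ],
    "PeptideAbundance": [11254.3, 102060.0, 174185.0, 27278.9, 30838.7],
})

# One row per (visit, protein): number of peptides and summed abundance.
per_protein = (
    peptide_sample
    .groupby(["visit_id", "UniProt"], as_index=False)
    .agg(
        n_peptides=("Peptide", "count"),
        abundance_sum=("PeptideAbundance", "sum"),
    )
)

print(per_protein)
```

On the full data, the same grouping could be merged with `train_proteins.csv` on `visit_id` and `UniProt` to compare the summed abundance with `NPX` row by row.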


In [1]:
# Import Dependencies 

import pandas as pd
import numpy as np 

In [2]:
! ls

Data Exploration.ipynb            supplemental_clinical_data.csv
README.md                         train_clinical_data.csv
amp_pd_peptide                    train_peptides.csv
example_test_files                train_proteins.csv
public_timeseries_testing_util.py


In [4]:
peptide = pd.read_csv('train_peptides.csv')

peptide.head()

Unnamed: 0,visit_id,visit_month,patient_id,UniProt,Peptide,PeptideAbundance
0,55_0,0,55,O00391,NEQEQPLGQWHLS,11254.3
1,55_0,0,55,O00533,GNPEPTFSWTK,102060.0
2,55_0,0,55,O00533,IEIPSSVQQVPTIIK,174185.0
3,55_0,0,55,O00533,KPQSAVYSTGSNGILLC(UniMod_4)EAEGEPQPTIK,27278.9
4,55_0,0,55,O00533,SMEQNGPGLEYR,30838.7


In [11]:
pd.DataFrame([[0, 1, 0]], columns=['Amino_Acid_1', 'Amino_Acid_2', 'Amino_Acid_3'])

Unnamed: 0,Amino_Acid_1,Amino_Acid_2,Amino_Acid_3
0,0,1,0


In [6]:
protein = pd.read_csv('train_proteins.csv')
protein.head()

Unnamed: 0,visit_id,visit_month,patient_id,UniProt,NPX
0,55_0,0,55,O00391,11254.3
1,55_0,0,55,O00533,732430.0
2,55_0,0,55,O00584,39585.8
3,55_0,0,55,O14498,41526.9
4,55_0,0,55,O14773,31238.0


## Patient Clinical Results 

* visit_id - ID code for the visit.
* visit_month - The month of the visit, relative to the first visit by the patient.
* patient_id - An ID code for the patient.
* updrs_[1-4] - The patient's score for part N of the Unified Parkinson's Disease Rating Scale. Higher numbers indicate more severe symptoms. Each sub-section covers a distinct category of symptoms, such as mood and behavior for Part 1 and motor functions for Part 3.
* upd23b_clinical_state_on_medication - Whether or not the patient was taking medication such as Levodopa during the UPDRS assessment. Expected to mainly affect the scores for Part 3 (motor function). These medications wear off fairly quickly (on the order of one day) so it's common for patients to take the motor function exam twice in a single month, both with and without medication.
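
The sketch below illustrates two small checks on the clinical columns, using three rows copied from `train_clinical_data.csv`. It assumes that `visit_id` is `patient_id` and `visit_month` joined by an underscore, which is inferred from the sample rows and not stated in the column descriptions. The `"Unknown"` label for a missing medication state is a hypothetical placeholder.

```python
import pandas as pd

# Three rows copied from train_clinical_data.csv (subset of columns).
clinical_sample = pd.DataFrame({
    "visit_id": ["55_0", "55_3", "55_9"],
    "patient_id": [55, 55, 55],
    "visit_month": [0, 3, 9],
    "updrs_3": [15.0, 25.0, 30.0],
    "upd23b_clinical_state_on_medication": [None, None, "On"],
})

# Assumption: visit_id == "<patient_id>_<visit_month>".
rebuilt_id = (
    clinical_sample["patient_id"].astype(str)
    + "_"
    + clinical_sample["visit_month"].astype(str)
)
ids_consistent = bool((rebuilt_id == clinical_sample["visit_id"]).all())

# Make the missing medication state explicit instead of leaving NaN.
med_state = clinical_sample["upd23b_clinical_state_on_medication"].fillna("Unknown")

print(ids_consistent)
print(med_state.value_counts())
```

Keeping the missing state as its own category avoids silently treating those visits as either "On" or "Off".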

In [13]:
Y = pd.read_csv('train_clinical_data.csv')

Y.head(10)

Unnamed: 0,visit_id,patient_id,visit_month,updrs_1,updrs_2,updrs_3,updrs_4,upd23b_clinical_state_on_medication
0,55_0,55,0,10.0,6.0,15.0,,
1,55_3,55,3,10.0,7.0,25.0,,
2,55_6,55,6,8.0,10.0,34.0,,
3,55_9,55,9,8.0,9.0,30.0,0.0,On
4,55_12,55,12,10.0,10.0,41.0,0.0,On
5,55_18,55,18,7.0,13.0,38.0,0.0,On
6,55_24,55,24,16.0,9.0,49.0,0.0,On
7,55_30,55,30,14.0,13.0,49.0,0.0,On
8,55_36,55,36,17.0,18.0,51.0,0.0,On
9,55_42,55,42,12.0,20.0,41.0,0.0,On


In [12]:
Y.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2615 entries, 0 to 2614
Data columns (total 8 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   visit_id                             2615 non-null   object 
 1   patient_id                           2615 non-null   int64  
 2   visit_month                          2615 non-null   int64  
 3   updrs_1                              2614 non-null   float64
 4   updrs_2                              2613 non-null   float64
 5   updrs_3                              2590 non-null   float64
 6   updrs_4                              1577 non-null   float64
 7   upd23b_clinical_state_on_medication  1288 non-null   object 
dtypes: float64(4), int64(2), object(2)
memory usage: 163.6+ KB
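
The non-null counts above can be turned into missing-value counts and percentages. This is a small sketch that re-enters the numbers printed by `Y.info()` by hand; on the actual DataFrame, `Y.isna().sum()` would give the same missing counts directly.

```python
import pandas as pd

# Values copied from the Y.info() output above.
total_rows = 2615
non_null = pd.Series({
    "visit_id": 2615,
    "patient_id": 2615,
    "visit_month": 2615,
    "updrs_1": 2614,
    "updrs_2": 2613,
    "updrs_3": 2590,
    "updrs_4": 1577,
    "upd23b_clinical_state_on_medication": 1288,
})

# Missing entries per column, and their share of all rows in percent.
missing = total_rows - non_null
missing_pct = (missing / total_rows * 100).round(1)

summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
print(summary)
```

`updrs_4` and `upd23b_clinical_state_on_medication` account for nearly all of the missing entries, which matters when choosing an imputation or filtering strategy.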
