# ICR - Identifying Age-Related Conditions

https://www.kaggle.com/competitions/icr-identify-age-related-conditions/overview

### Files and Field Descriptions

- **train.csv** - The training set.
  - **Id** Unique identifier for each observation.
  - **AB**-**GL** Fifty-six anonymized health characteristics. All are *numeric* except for **EJ**, which is *categorical*.
  - **Class** A binary target: 1 indicates the subject has been diagnosed with one of the three conditions, 0 indicates they have not.
- **test.csv** - The test set. Your goal is to predict the probability that a subject in this set belongs to each of the two classes.
- **greeks.csv** - Supplemental metadata, only available for the training set.
  - **Alpha** Identifies the type of age-related condition, if present.
    - **A** No age-related condition. Corresponds to class **0**.
    - **B**, **D**, **G** The three age-related conditions. Correspond to class **1**.
  - **Beta**, **Gamma**, **Delta** Three experimental characteristics.
  - **Epsilon** The date the data for this subject was collected. Note that all of the data in the test set was collected after the training set was collected.
- **sample_submission.csv** - A sample submission file in the correct format.
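
The relationship between **Alpha** and **Class** described above can be sketched in a few lines. This is only an illustration: the `greeks` frame below is a made-up stand-in for greeks.csv, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for greeks.csv; the values are illustrative only.
greeks = pd.DataFrame({"Alpha": ["A", "B", "A", "D", "G"]})

# "A" means no condition (class 0); "B", "D" and "G" are the three conditions (class 1).
greeks["Class"] = (greeks["Alpha"] != "A").astype(int)

print(greeks)
```

Applied to the real metadata, the derived column should agree with **Class** in train.csv if the description above holds.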


### Implementation

#### Install PyCaret

In [3]:
pip install pycaret

Note: you may need to restart the kernel to use updated packages.


In [4]:
from pycaret.utils import version
version()

'3.0.0.rc8'

#### Import Libraries

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
import pandas as pd
plt.rcParams['figure.figsize'] = (7,5)

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [6]:
print("Pandas version: ", pd.__version__)
print("Seaborn version: ", sns.__version__)

Pandas version:  1.4.4
Seaborn version:  0.11.2


#### Read Data

In [7]:
df = pd.read_csv('./train.csv')
df.head(25)

| | Id | AB | AF | AH | AM | AR | AX | AY | AZ | BC | ... | FL | FR | FS | GB | GE | GF | GH | GI | GL | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000ff2bfdfe9 | 0.209377 | 3109.03329 | 85.200147 | 22.394407 | 8.138688 | 0.699861 | 0.025578 | 9.812214 | 5.555634 | ... | 7.298162 | 1.73855 | 0.094822 | 11.339138 | 72.611063 | 2003.810319 | 22.136229 | 69.834944 | 0.120343 | 1 |
| 1 | 007255e47698 | 0.145282 | 978.76416 | 85.200147 | 36.968889 | 8.138688 | 3.63219 | 0.025578 | 13.51779 | 1.2299 | ... | 0.173229 | 0.49706 | 0.568932 | 9.292698 | 72.611063 | 27981.56275 | 29.13543 | 32.131996 | 21.978 | 0 |
| 2 | 013f2bd269f5 | 0.47003 | 2635.10654 | 85.200147 | 32.360553 | 8.138688 | 6.73284 | 0.025578 | 12.82457 | 1.2299 | ... | 7.70956 | 0.97556 | 1.198821 | 37.077772 | 88.609437 | 13676.95781 | 28.022851 | 35.192676 | 0.196941 | 0 |
| 3 | 043ac50845d5 | 0.252107 | 3819.65177 | 120.201618 | 77.112203 | 8.138688 | 3.685344 | 0.025578 | 11.053708 | 1.2299 | ... | 6.122162 | 0.49706 | 0.284466 | 18.529584 | 82.416803 | 2094.262452 | 39.948656 | 90.493248 | 0.155829 | 0 |
| 4 | 044fb8a146ec | 0.380297 | 3733.04844 | 85.200147 | 14.103738 | 8.138688 | 3.942255 | 0.05481 | 3.396778 | 102.15198 | ... | 8.153058 | 48.50134 | 0.121914 | 16.408728 | 146.109943 | 8524.370502 | 45.381316 | 36.262628 | 0.096614 | 1 |
| 5 | 04517a3c90bd | 0.209377 | 2615.8143 | 85.200147 | 8.541526 | 8.138688 | 4.013127 | 0.025578 | 12.547282 | 1.2299 | ... | 0.173229 | 0.49706 | 1.164956 | 21.915512 | 72.611063 | 24177.59555 | 28.525186 | 82.527764 | 21.978 | 0 |
| 6 | 049232ca8356 | 0.348249 | 1733.65412 | 85.200147 | 8.377385 | 15.31248 | 1.913544 | 0.025578 | 6.547778 | 1.2299 | ... | 4.408484 | 0.8613 | 0.467337 | 17.878444 | 192.453107 | 3332.467494 | 34.166222 | 100.086808 | 0.065096 | 0 |
| 7 | 057287f2da6d | 0.269199 | 966.45483 | 85.200147 | 21.174189 | 8.138688 | 4.987617 | 0.025578 | 9.408886 | 1.2299 | ... | 6.591896 | 0.49706 | 0.277693 | 18.445866 | 109.693986 | 21371.75985 | 35.208102 | 31.424696 | 0.092873 | 0 |
| 8 | 0594b00fb30a | 0.346113 | 3238.43674 | 85.200147 | 28.888816 | 8.138688 | 4.021986 | 0.025578 | 8.243016 | 3.626448 | ... | 4.762291 | 1.18262 | 0.06773 | 17.245908 | 147.21861 | 4589.611956 | 29.771721 | 54.675576 | 0.073416 | 0 |
| 9 | 05f2bc0155cd | 0.324748 | 5188.68207 | 85.200147 | 12.968687 | 8.138688 | 4.593392 | 0.025578 | 10.685041 | 1.2299 | ... | 0.173229 | 1.57151 | 0.318331 | 24.515421 | 98.929757 | 5563.130949 | 21.994831 | 33.30097 | 21.978 | 0 |


Read column **EJ**, the only categorical column.

In [11]:
df_ej = pd.read_csv("train.csv", usecols = ['EJ'])
df_ej.head(25)

| | EJ |
|---|---|
| 0 | B |
| 1 | A |
| 2 | B |
| 3 | B |
| 4 | B |
| 5 | A |
| 6 | B |
| 7 | B |
| 8 | B |
| 9 | A |
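
Because **EJ** holds string labels, many estimators would need it converted to a number before training. A minimal sketch, assuming `A` and `B` (the two labels visible in the rows above) are the only values in the column; the toy frame is illustrative, not the real data:

```python
import pandas as pd

# Toy frame mirroring the first few EJ values shown above.
df_ej = pd.DataFrame({"EJ": ["B", "A", "B", "B", "B", "A"]})

# Map the two labels to 0/1. Any label outside the mapping becomes NaN,
# which makes an unexpected category easy to spot.
df_ej["EJ_encoded"] = df_ej["EJ"].map({"A": 0, "B": 1})

print(df_ej["EJ_encoded"].tolist())
```

Checking `df_ej["EJ_encoded"].isna().sum()` afterwards would confirm whether the two-label assumption holds on the full column.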


In [12]:
len(df.index)

617

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 617 entries, 0 to 616
Data columns (total 58 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Id      617 non-null    object 
 1   AB      617 non-null    float64
 2   AF      617 non-null    float64
 3   AH      617 non-null    float64
 4   AM      617 non-null    float64
 5   AR      617 non-null    float64
 6   AX      617 non-null    float64
 7   AY      617 non-null    float64
 8   AZ      617 non-null    float64
 9   BC      617 non-null    float64
 10  BD      617 non-null    float64
 11  BN      617 non-null    float64
 12  BP      617 non-null    float64
 13  BQ      557 non-null    float64
 14  BR      617 non-null    float64
 15  BZ      617 non-null    float64
 16  CB      615 non-null    float64
 17  CC      614 non-null    float64
 18  CD      617 non-null    float64
 19  CF      617 non-null    float64
 20  CH      617 non-null    float64
 21  CL      617 non-null    float64
 22  CR

In [14]:
print("Number of duplicated rows is: ", df.duplicated().sum())

Number of duplicated rows is:  0


In [15]:
print("Number of rows with NaNs is: ", df.isna().any(axis=1).sum())

Number of rows with NaNs is:  69
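
The `df.info()` output above already hints at where the gaps are (for example, **BQ** has 557 non-null values out of 617). A small sketch of listing the per-column counts directly, shown on a made-up toy frame; on the real data the same two lines would be applied to `df`:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; the column names are borrowed from the dataset.
toy = pd.DataFrame({
    "BQ": [1.0, np.nan, 3.0, np.nan],
    "CB": [0.5, 0.7, np.nan, 0.9],
    "CD": [2.0, 2.1, 2.2, 2.3],
})

# Count missing values per column and keep only the columns that have any.
nan_counts = toy.isna().sum()
nan_counts = nan_counts[nan_counts > 0].sort_values(ascending=False)

print(nan_counts)
```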


**Remove rows with empty values.**

In [18]:
df = df.dropna()

In [20]:
len(df.index)

548
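
Dropping rows removes 69 of the 617 observations, roughly 11% of a small dataset. One possible alternative, not used in this notebook, is to fill numeric gaps with the column median. A hedged sketch on a toy frame with made-up values:

```python
import numpy as np
import pandas as pd

# Toy frame: one numeric column with a gap and one categorical column.
toy = pd.DataFrame({
    "BQ": [1.0, np.nan, 3.0, 5.0],
    "EJ": ["A", "B", "B", "A"],
})

# Fill numeric gaps with each column's median; non-numeric columns are left untouched.
filled = toy.fillna(toy.median(numeric_only=True))

# Compare how many rows survive each strategy.
print(len(toy.dropna()), len(filled))
```

Whether imputation actually helps here would need to be checked with cross-validation; the rest of this notebook continues with the rows that remain after `dropna()`.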

### Exploratory Data Analysis

#### Pairplot Analysis

In [None]:
sns.pairplot(df, hue='Class')
plt.show()
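
With fifty-five numeric features, the full pairplot produces a very large grid that is slow to render and hard to read. One option is to plot only a handful of columns. The sketch below picks the features most correlated with **Class**; the helper `top_correlated` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def top_correlated(frame, target="Class", k=5):
    """Return the k numeric columns with the largest absolute correlation to target."""
    # select_dtypes drops non-numeric columns such as Id and EJ before computing correlations.
    corr = frame.select_dtypes("number").corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


# Toy data: "strong" tracks the target closely, "weak" is pure noise.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"Class": rng.integers(0, 2, size=200)})
toy["strong"] = toy["Class"] + rng.normal(0, 0.1, size=200)
toy["weak"] = rng.normal(0, 1, size=200)
toy["Id"] = [f"row{i}" for i in range(200)]

selected = top_correlated(toy, k=1)
print(selected)
```

On the real data, the resulting list could be passed to `sns.pairplot(df, vars=selected, hue='Class')`. Linear correlation with a binary target is only a rough screening heuristic, so the selection is best treated as a starting point for the plot rather than as feature selection.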