# Identifying Age-Related Conditions Using Machine Learning Models
## Author: Boni M. Ale, MD, MSc, MPH
### Date: 08 June 2023

# 1. Introduction

Determining whether someone has certain age-related medical conditions usually requires a long and intrusive process of collecting information from patients. With predictive models, we can shorten this process and keep patient details private by collecting key characteristics related to the conditions and then encoding them.

In this project, I use machine learning to detect these conditions from measurements of anonymized characteristics. The general objective of the analysis is to predict whether a person has any of three medical conditions. To do so, I will train a model on anonymized measurements of health characteristics to predict whether a person has one or more of the three conditions (Class 1) or none of them (Class 0).



**Load Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

**Load Datasets**

In [None]:
train_raw = pd.read_csv('/Users/boniale/Desktop/Identifying-Age-Related-Conditions-using-ML-models/data/train.csv')
test = pd.read_csv('/Users/boniale/Desktop/Identifying-Age-Related-Conditions-using-ML-models/data/test.csv')
greeks_raw = pd.read_csv('/Users/boniale/Desktop/Identifying-Age-Related-Conditions-using-ML-models/data/greeks.csv')
sample_submission = pd.read_csv('/Users/boniale/Desktop/Identifying-Age-Related-Conditions-using-ML-models/data/sample_submission.csv')

# 2. Exploratory Data Analysis

## 2.1. Data Description

In [None]:
print("Raw Train Data Set's size: ", train_raw.shape)

print("Raw Greeks Data Set's size: ", greeks_raw.shape)

# Separate the variables into numeric and categorical data frames
numeric_data = train_raw.select_dtypes(include=[np.number])
cat_data = train_raw.select_dtypes(exclude=[np.number])
# 'Id' is an identifier rather than a feature, so drop it from the categorical set
cat_data = cat_data.drop(['Id'], axis=1)
print("There are {} numeric and {} categorical columns in train raw data".format(numeric_data.shape[1], cat_data.shape[1]))

These 56 numeric variables include our target, Class, which indicates whether the person has one or more of the three medical conditions (Class 1) or none of them (Class 0). Class is therefore really a categorical variable, even though it is stored as a number.
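The point about the target being counted as numeric can be illustrated with a minimal sketch. The frame `toy` and its columns `feat_1` and `cat_1` are made up for illustration and only stand in for `train_raw`; the names `numeric_cols`, `feature_cols` and `target` are likewise assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for train_raw (all values are made up).
toy = pd.DataFrame({
    "Id": ["a", "b", "c", "d"],
    "feat_1": [0.2, 0.5, 0.1, 0.9],
    "cat_1": ["A", "B", "A", "B"],
    "Class": [0, 1, 0, 0],
})

# Class is stored as integers, so select_dtypes counts it as numeric.
numeric_cols = toy.select_dtypes(include=[np.number]).columns.tolist()

# Separate the target from the numeric features.
feature_cols = [c for c in numeric_cols if c != "Class"]

# Make the categorical nature of the target explicit.
target = toy["Class"].astype("category")

print(numeric_cols)
print(feature_cols)
print(target.cat.categories.tolist())
```

Converting the target with `astype("category")` is only one possible way to mark it as categorical; keeping it as 0/1 integers is also common when fitting a classifier.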

## 2.2. Numerical Variables Exploration

In [None]:
numeric_data.head(5)

Because the target Class is categorical rather than truly numeric, we remove it from the set of numeric variables and look at the distribution of the remaining numeric variables.

In [None]:
# Numeric columns, excluding the target
num = [f for f in train_raw.columns if train_raw.dtypes[f] != 'object']
num.remove("Class")
# Reshape to long format: one row per (variable, value) pair
nd = pd.melt(train_raw, value_vars=num)
# One histogram per variable, each with its own x and y axes
barplot_train = sns.FacetGrid(nd, col='variable',
                              col_wrap=5,
                              sharex=False,
                              sharey=False)
barplot_train = barplot_train.map(sns.histplot, 'value')
plt.show()

We can see that several variables are not normally distributed. Let's now focus on our target and see how it behaves.
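Before moving on to the target, the visual impression of non-normality could be backed by a number such as sample skewness. The sketch below is only an illustration on synthetic data: the frame `toy`, its two columns, and the cutoff `SKEW_THRESHOLD = 1.0` (a common rule of thumb, not a value taken from this analysis) are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the numeric features (not the real dataset).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "symmetric": rng.normal(size=1000),        # roughly bell-shaped
    "right_skewed": rng.lognormal(size=1000),  # long right tail
})

# Sample skewness per column; values near 0 suggest a symmetric distribution.
skewness = toy.skew().sort_values(ascending=False)
print(skewness)

# Assumed rule-of-thumb cutoff for flagging strongly skewed columns.
SKEW_THRESHOLD = 1.0
skewed_cols = skewness[skewness.abs() > SKEW_THRESHOLD].index.tolist()
print(skewed_cols)
```

On the real data, the same call would be something like `train_raw[num].skew()`. Skewness alone does not establish or rule out normality; it is just a quick way to rank the variables seen in the histograms.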

## 2.3. Target Distribution

In [None]:
fig_targ = px.pie(train_raw, names=train_raw['Class'].map({1: 'has medical conditions', 0: 'no medical conditions'}),
                  height=400, width=600,
                  hole=0.7,
                  title='Target class Distribution',
                  color_discrete_sequence=['#4c78a8', '#72b7b2'])
fig_targ.update_traces(hovertemplate=None, textposition='outside', textinfo='percent+label', rotation=0)
fig_targ.update_layout(margin=dict(t=100, b=30, l=0, r=0), showlegend=False,
                       plot_bgcolor='#fafafa', paper_bgcolor='#fafafa',
                       title_font=dict(size=20, color='#555', family="Lato, sans-serif"),
                       font=dict(size=17, color='#8a8d93'),
                       hoverlabel=dict(bgcolor="#444", font_size=13, font_family="Lato, sans-serif"))
fig_targ.show()
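The shares shown in the donut chart can also be read off as a small table. The sketch below uses a made-up series `toy_class` in place of `train_raw['Class']`, so the printed numbers say nothing about the real class balance; the names `labels`, `counts`, `shares` and `summary` are assumptions for illustration.

```python
import pandas as pd

# Made-up target values standing in for train_raw['Class'].
toy_class = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="Class")

# Same label mapping as in the pie chart above.
labels = toy_class.map({1: "has medical conditions", 0: "no medical conditions"})

# Absolute counts and relative shares per class.
counts = labels.value_counts()
shares = labels.value_counts(normalize=True)

summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```

A table like this makes any class imbalance explicit, which is worth knowing before choosing an evaluation metric or a resampling strategy.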