# Recruitment Task: Classification of a Heart with Hypertrophic Cardiomyopathy

I'm going to prepare a solution to the classification problem of detecting hypertrophic cardiomyopathy from the provided features. The goal is to build a model that correctly distinguishes a healthy heart from a diseased one. I'll use:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier

## Dataset Attributes

- **Cardiomegaly** - Indicates whether the heart was diagnosed with cardiomegaly:
    - 1 – positive diagnosis (diseased heart)
    - 0 – negative diagnosis (healthy heart)

- **Heart width** - The maximum horizontal width of the heart.

- **Lung width** - The horizontal distance between the outermost points of the lungs.

- **Inertia tensors** - Metrics describing the distribution of heart and lung pixels relative to the coordinate axes, capturing the shape and orientation of the objects.
    - xx – distribution of pixels relative to the y-axis (elongation along x)
    - yy – distribution of pixels relative to the x-axis (elongation along y)
    - xy – distribution relative to both x and y axes (a high value indicates object rotation)
    - normalized_diff – a scalar value derived from the vector whose components are described above
    
- **Inscribed circle radius** - The radius of the largest circle that can be inscribed within the heart area, describing its symmetry and compactness.

- **Polygon area ratio** - The ratio of the area of the polygon enclosing the heart contour to the actual heart area.

- **Heart perimeter** - The length of the heart contour.

- **Heart area** - The area occupied by the heart.

- **Lung area** - The area occupied by the lungs.

- **CTR (cardiothoracic ratio)** - The ratio of heart width to lung width.
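As a quick sanity check on the derived features, the sketch below recomputes the cardiothoracic ratio from the two widths and tests one candidate formula for `normalized_diff`. The values are the first two rows printed by `df.head()` in the Data overview section. The formula `(xx - yy) / (xx + yy)` is an assumption: the task description does not state it, it merely agrees with the printed rows to the displayed precision.

```python
import numpy as np
import pandas as pd

# First two rows as printed by df.head() in the "Data overview" section.
# The CTR column is left out because it is recomputed here.
sample = pd.DataFrame(
    {
        "Heart width": [172, 159],
        "Lung width": [405, 391],
        "xx": [1682.360871, 1526.66096],
        "yy": [3153.67188, 5102.159054],
        "normalized_diff": [-0.304239, -0.539387],
    }
)

# CTR as defined above: heart width divided by lung width.
ctr = sample["Heart width"] / sample["Lung width"]

# Assumed formula for normalized_diff (not stated in the task description).
norm_diff_guess = (sample["xx"] - sample["yy"]) / (sample["xx"] + sample["yy"])

print(ctr.tolist())
print(np.allclose(norm_diff_guess, sample["normalized_diff"], atol=1e-5))
```

If the second check fails on the full dataset, the assumed formula should be dropped rather than the data corrected.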

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings("ignore")

## Data overview

In [2]:
# Load the data
df = pd.read_csv("task_data.csv")

df.head()

|   | ID | Cardiomegaly | Heart width | Lung width | CTR - Cardiothoracic Ratio | xx | yy | xy | normalized_diff | Inscribed circle radius | Polygon Area Ratio | Heart perimeter | Heart area | Lung area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 172 | 405 | 0,424691358 | 1682.360871 | 3153.67188 | -638.531109 | -0.304239 | 68,8186 | 0.213446 | 679,4873689 | 24898 | 75419 |
| 1 | 2 | 1 | 159 | 391 | 0,4066496164 | 1526.66096 | 5102.159054 | -889.678405 | -0.539387 | 73,92564 | 0.203652 | 788,6589419 | 29851 | 94494 |
| 2 | 5 | 0 | 208 | 400 | 0,52 | 2465.903392 | 5376.834707 | -1755.344699 | -0.371163 | 69,33974 | 0.320787 | 862,3229369 | 33653 | 66666 |
| 3 | 7 | 1 | 226 | 435 | 0,5195402299 | 2509.063593 | 6129.82127 | -1025.079806 | -0.419123 | 84,14868 | 0.317545 | 906,724959 | 42018 | 82596 |
| 4 | 8 | 1 | 211 | 420 | 0,5023809524 | 2368.770135 | 5441.767075 | -1493.040062 | -0.393442 | 73,78347 | 0.263542 | 864,2396777 | 35346 | 85631 |


In [3]:
# Display a summary of the DataFrame (dtypes and non-null counts)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   ID                          37 non-null     int64  
 1   Cardiomegaly                37 non-null     int64  
 2   Heart width                 37 non-null     int64  
 3   Lung width                  37 non-null     int64  
 4   CTR - Cardiothoracic Ratio  37 non-null     object 
 5   xx                          37 non-null     float64
 6   yy                          37 non-null     float64
 7   xy                          37 non-null     float64
 8   normalized_diff             37 non-null     float64
 9   Inscribed circle radius     37 non-null     object 
 10  Polygon Area Ratio          37 non-null     float64
 11  Heart perimeter             37 non-null     object 
 12  Heart area                  37 non-null     int64  
 13  Lung area                   37 non-nu

The columns `CTR - Cardiothoracic Ratio`, `Inscribed circle radius`, and `Heart perimeter` have the `object` dtype because their values use a decimal comma. They need to be converted to `float`.
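Rather than reading the affected columns off the `df.info()` output by hand, they can also be found programmatically. The sketch below is one way to do it, run on a hypothetical two-row stand-in for `df` (the frame `toy` and its values are made up for illustration): it lists the non-numeric columns and then keeps those whose values all look like decimal-comma numbers.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for df: one numeric column and one decimal-comma column.
toy = pd.DataFrame(
    {
        "Heart width": [172, 159],
        "Heart perimeter": ["1,5", "2,25"],  # made-up values, stored as text
    }
)

# Columns that pandas did not parse as numbers.
non_numeric = [col for col in toy.columns if not is_numeric_dtype(toy[col])]

# Of those, keep the columns where every value matches "digits,digits".
comma_cols = [
    col for col in non_numeric
    if toy[col].str.fullmatch(r"-?\d+,\d+").all()
]

print(non_numeric)
print(comma_cols)
```

A column that appears in `non_numeric` but not in `comma_cols` would contain something other than decimal-comma numbers and deserves a manual look before conversion.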

In [4]:
# Replace the decimal comma with a dot, then convert to float
cols = ["CTR - Cardiothoracic Ratio", "Inscribed circle radius", "Heart perimeter"]

df[cols] = (
    df[cols]
    .replace(",", ".", regex=True)  
    .astype(float)                    
)
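An alternative to converting after the fact is to let the parser handle the decimal comma at load time through the `decimal` parameter of `pd.read_csv`. The sketch below uses a hypothetical in-memory CSV instead of `task_data.csv`; it assumes the file is comma-separated with the decimal-comma values wrapped in quotes, which is not confirmed by the notebook.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for task_data.csv: comma-separated,
# with decimal-comma values wrapped in quotes (an assumption about the file).
csv_text = 'ID,Heart width,Heart perimeter\n1,172,"1,5"\n2,159,"2,25"\n'

# decimal="," tells the parser to read "1,5" as 1.5 while loading.
toy = pd.read_csv(io.StringIO(csv_text), decimal=",")

print(toy.dtypes)
```

With this approach the separate replace-and-cast step is unnecessary, but it only works when every decimal-comma column follows the same convention.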

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 14 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   ID                          37 non-null     int64  
 1   Cardiomegaly                37 non-null     int64  
 2   Heart width                 37 non-null     int64  
 3   Lung width                  37 non-null     int64  
 4   CTR - Cardiothoracic Ratio  37 non-null     float64
 5   xx                          37 non-null     float64
 6   yy                          37 non-null     float64
 7   xy                          37 non-null     float64
 8   normalized_diff             37 non-null     float64
 9   Inscribed circle radius     37 non-null     float64
 10  Polygon Area Ratio          37 non-null     float64
 11  Heart perimeter             37 non-null     float64
 12  Heart area                  37 non-null     int64  
 13  Lung area                   37 non-nu

Now every column has a numeric dtype.