## Import necessary libraries

In [None]:
# Imports.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Data Loading and Inspection

### Dataset information

NOTE:
1. This dataset doesn't come with column names. A list of column names is provided below; when you call the `read_csv` function, use the appropriate parameter to assign these names to the DataFrame's columns.

2. Besides regular null values, some columns contain the string '?' to mark missing data. `read_csv` can replace such strings with null values; check its parameters to find the one for this purpose.
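A minimal sketch of both hints, using three made-up rows read from an in-memory string (so nothing is downloaded) in the same headerless, comma-separated layout as the UCI file. Only four of the columns are used, and the variable names `raw_text`, `without` and `with_na` are illustrative. `names=` supplies the header and `na_values='?'` turns the '?' placeholders into `NaN`.

```python
import io

import pandas as pd

# Three made-up rows in the same layout as processed.cleveland.data:
# no header line, commas as separators, '?' marking a missing value.
# Only four columns are used here to keep the example short.
raw_text = (
    "63.0,1.0,0.0,6.0\n"
    "67.0,1.0,3.0,3.0\n"
    "58.0,0.0,?,?\n"
)
cols = ['age', 'sex', 'ca', 'thal']

# names= supplies the column labels, since the file has no header row.
without = pd.read_csv(io.StringIO(raw_text), names=cols)
# na_values='?' additionally tells the parser to read '?' as NaN.
with_na = pd.read_csv(io.StringIO(raw_text), names=cols, na_values='?')

# Without na_values, '?' is kept as text: no NaN is reported and
# 'ca'/'thal' become non-numeric columns that dropna() cannot clean.
print(without.isna().sum().sum(), without.dtypes.to_dict())
# With na_values, the '?' cells are NaN and 'ca'/'thal' are float64.
print(with_na.isna().sum().sum(), with_na.dtypes.to_dict())
```

This is why a later `dropna()` only removes the incomplete rows when `na_values='?'` was passed at load time.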

In [None]:
# Download the dataset and load it into a DataFrame.

# URL of the Heart Disease UCI dataset.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"

# Column names for the dataset.
column_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']

# Read the dataset into a pandas DataFrame; na_values='?' turns the '?' placeholders into NaN.
df = pd.read_csv(url, names=column_names, na_values='?')
df

In [None]:
# Drop the 'target' column
df = df.drop(columns=['target'])
df

In [None]:
# Display df info.
df.info()

## Data Cleaning

In [None]:
# Handle missing values.
# Remove rows with missing values.
df = df.dropna()
df

In [None]:
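# Check for and handle duplicates.

# Find the duplicate rows.
duplicates = df[df.duplicated()]
print(f"Number of duplicate rows: {len(duplicates)}")

# Remove the duplicates, keeping the first occurrence of each row.
df = df.drop_duplicates()

# Display the duplicate rows that were found.
duplicates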

In [None]:
# Display df info.
df.info()

## Data Manipulation

In [None]:
# Create 'AgeGroup' column.
# include_lowest=True keeps age 29 (the lowest bin edge) from becoming NaN.
bins = [29, 45, 60, 77]
labels = ['Young', 'Middle-aged', 'Senior']
df['AgeGroup'] = pd.cut(df['age'], bins=bins, labels=labels, include_lowest=True)
df
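By default `pd.cut` builds right-closed intervals, so with `bins = [29, 45, 60, 77]` the groups are (29, 45], (45, 60] and (60, 77], and an age of exactly 29 falls outside every interval. The sketch below checks this on a few made-up ages sitting on the bin edges; passing `include_lowest=True` closes the first interval on the left.

```python
import pandas as pd

bins = [29, 45, 60, 77]
labels = ['Young', 'Middle-aged', 'Senior']

# Made-up ages chosen to sit exactly on (or next to) the bin edges.
ages = pd.Series([29, 45, 46, 60, 77])

# Default: intervals are (29, 45], (45, 60], (60, 77], so 29 maps to NaN.
default_cut = pd.cut(ages, bins=bins, labels=labels)
# include_lowest=True makes the first interval include 29 as well.
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({'age': ages, 'default': default_cut, 'include_lowest': inclusive_cut}))
```

Note that 45 lands in 'Young' and 60 in 'Middle-aged': each upper edge belongs to the group on its left.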

In [None]:
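# Create a new feature 'HeartHealth' using the 'apply' function.
# Conditions: 'Good' if thalach > 150 and chol < 240, 'Poor' otherwise.
def heart_health(row):
    if row['thalach'] > 150 and row['chol'] < 240:
        return 'Good'
    else:
        return 'Poor'

# Apply the function row by row (axis=1) to build the new column.
df['HeartHealth'] = df.apply(heart_health, axis=1)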
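`DataFrame.apply` with `axis=1` calls `heart_health` once per row. As a swapped-in alternative (not the method this exercise asks for), the same rule can be written as a vectorized `np.where` over whole columns. The sketch below, on a few made-up rows including values exactly on the thresholds, checks that both give the same labels; because the comparisons are strict, `thalach == 150` or `chol == 240` counts as 'Poor'.

```python
import numpy as np
import pandas as pd

def heart_health(row):
    # 'Good' only if thalach > 150 and chol < 240 (both strict).
    if row['thalach'] > 150 and row['chol'] < 240:
        return 'Good'
    return 'Poor'

# Made-up rows, including values exactly on the thresholds.
sample = pd.DataFrame({
    'thalach': [160, 150, 172, 140],
    'chol':    [200, 200, 240, 180],
})

# Row-wise version via DataFrame.apply.
by_apply = sample.apply(heart_health, axis=1)

# Vectorized version: evaluate both conditions on entire columns at once.
by_where = pd.Series(
    np.where((sample['thalach'] > 150) & (sample['chol'] < 240), 'Good', 'Poor'),
    index=sample.index,
)

print(pd.DataFrame({'apply': by_apply, 'np.where': by_where}))
```

For a few hundred rows the difference is negligible; the vectorized form mainly pays off on larger tables.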

In [None]:
# Create 'RiskFactor' column
# formula: age * chol + thalach

# TODO
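A minimal sketch of the formula `age * chol + thalach` as a column expression: pandas applies the arithmetic element-wise, so no loop or `apply` is needed. The two rows below are illustrative values chosen to make the arithmetic easy to check; in the notebook, the same expression would be assigned to `df['RiskFactor']`.

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    'age':     [63, 41],
    'chol':    [233, 204],
    'thalach': [150, 172],
})

# Element-wise: RiskFactor = age * chol + thalach for every row.
sample['RiskFactor'] = sample['age'] * sample['chol'] + sample['thalach']
print(sample)
```

For the first row this gives 63 * 233 + 150 = 14829. Since `age * chol` is in the thousands while `thalach` is in the low hundreds, the product term dominates the score.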

In [None]:
df

## Data Visualization

### 1. Line Plot

This line plot shows the average Resting Blood Pressure (trestbps) at each age in the dataset.
- X-Axis: Age
- Y-Axis: Average Resting Blood Pressure

The line connects the average Resting Blood Pressure computed for each distinct age.

The 'o' markers show these per-age averages (one marker per age, not one per patient).

The plot provides insight into how the average Resting Blood Pressure changes with increasing age.

In [None]:
# Line plot - Trend of 'trestbps' over 'age'

# TODO
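One possible sketch: group by `age`, take the mean of `trestbps`, and draw it with `marker='o'`. The helper name `plot_trestbps_by_age`, the figure size and the title are hypothetical choices, and the demo rows are made up so the block runs on its own; in the notebook you would pass `df` instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_trestbps_by_age(data):
    """Line plot of the mean resting blood pressure at each age (hypothetical helper)."""
    # Average trestbps per distinct age; groupby sorts the ages ascending.
    mean_bp = data.groupby('age')['trestbps'].mean()

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(mean_bp.index, mean_bp.values, marker='o')  # one marker per age
    ax.set_xlabel('Age')
    ax.set_ylabel('Average Resting Blood Pressure')
    ax.set_title('Average Resting Blood Pressure by Age')
    return fig, ax

# Made-up rows for illustration; in the notebook call plot_trestbps_by_age(df).
demo = pd.DataFrame({'age': [40, 40, 50, 60, 60], 'trestbps': [120, 130, 140, 150, 130]})
fig, ax = plot_trestbps_by_age(demo)
```

In Jupyter the figure renders at the end of the cell; in a script, call `plt.show()`.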

### 2. Scatter Plot

This scatter plot visually explores the relationship between Age and Risk Factor.
- X-Axis: Age
- Y-Axis: Risk Factor

In [None]:
# Scatter plot - Relationship between Age and Risk Factor

# TODO
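A minimal sketch using `Axes.scatter`. To keep it independent of the RiskFactor cell, the hypothetical helper `plot_age_vs_risk` recomputes the score from the formula `age * chol + thalach`; the demo rows are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_age_vs_risk(data):
    """Scatter plot of Age against RiskFactor (hypothetical helper)."""
    # Recompute RiskFactor here so the function works even without that column.
    risk = data['age'] * data['chol'] + data['thalach']

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(data['age'], risk, alpha=0.6)  # transparency helps with overlapping points
    ax.set_xlabel('Age')
    ax.set_ylabel('Risk Factor')
    ax.set_title('Age vs. Risk Factor')
    return fig, ax

# Made-up rows for illustration; in the notebook call plot_age_vs_risk(df).
demo = pd.DataFrame({'age': [45, 55, 65], 'chol': [200, 250, 300], 'thalach': [170, 150, 130]})
fig, ax = plot_age_vs_risk(demo)
```

Because `age` is itself a factor in the formula, some upward trend with age is built into this plot by construction.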

### 3. Bar Plot

This stacked bar plot represents the distribution of HeartHealth (Good or Poor) within each Age group.
- X-Axis: Age Group
- Y-Axis: Count

In [None]:
# Bar plot - HeartHealth distribution within each Age group

# TODO
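A sketch assuming the `AgeGroup` and `HeartHealth` columns already exist: `pd.crosstab` counts patients per (age group, label) pair, and `DataFrame.plot(kind='bar', stacked=True)` stacks the 'Good' and 'Poor' counts in each bar. The helper name and demo rows are hypothetical; the demo uses a categorical `AgeGroup` (as `pd.cut` produces) so the bars keep the Young, Middle-aged, Senior order.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_heart_health_by_age_group(data):
    """Stacked bar chart of HeartHealth counts per AgeGroup (hypothetical helper)."""
    # Rows = age groups, columns = 'Good'/'Poor', cells = number of patients.
    counts = pd.crosstab(data['AgeGroup'], data['HeartHealth'])

    fig, ax = plt.subplots(figsize=(6, 4))
    counts.plot(kind='bar', stacked=True, ax=ax, rot=0)
    ax.set_xlabel('Age Group')
    ax.set_ylabel('Count')
    ax.set_title('HeartHealth Distribution within Each Age Group')
    return fig, ax, counts

# Made-up rows for illustration; in the notebook call it with df.
demo = pd.DataFrame({
    'AgeGroup': pd.Categorical(['Young', 'Young', 'Middle-aged', 'Senior', 'Senior'],
                               categories=['Young', 'Middle-aged', 'Senior']),
    'HeartHealth': ['Good', 'Poor', 'Poor', 'Poor', 'Good'],
})
fig, ax, counts = plot_heart_health_by_age_group(demo)
print(counts)
```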

### 4. Histogram Plot

This histogram visually displays the distribution of patient ages in the dataset.
- X-Axis: Age
- Y-Axis: Frequency (Number of individuals)

The bars are drawn in 'skyblue', and each bar covers one age interval (bin), not one of the 'AgeGroup' categories.

Using 20 bins sets the granularity: matplotlib splits the range from the youngest to the oldest age into 20 equal-width intervals.

In [None]:
# Histogram - Distribution of patient ages

# TODO
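A minimal sketch with `Axes.hist`, using the 20 bins and 'skyblue' color described above. The helper name, the black bar edges and the made-up demo ages are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_age_histogram(data, bins=20):
    """Histogram of patient ages (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 4))
    # Thin black edges separate adjacent bars visually.
    ax.hist(data['age'], bins=bins, color='skyblue', edgecolor='black')
    ax.set_xlabel('Age')
    ax.set_ylabel('Frequency (Number of individuals)')
    ax.set_title('Distribution of Patient Ages')
    return fig, ax

# Made-up ages for illustration; in the notebook call plot_age_histogram(df).
demo = pd.DataFrame({'age': [29, 35, 41, 44, 52, 54, 58, 60, 66, 77]})
fig, ax = plot_age_histogram(demo)
```

Each bar is one patch on the Axes, so the bar heights sum to the number of patients plotted.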

### 5. Pie Chart

This pie chart visually represents the percentage distribution of individuals across different age groups.

- Each slice of the pie corresponds to a specific age group (Young, Middle-aged, Senior).
- The colors 'skyblue', 'orange', and 'lightgreen' distinguish the three age groups.
- The percentage label on each slice shows that group's share of all individuals.
- `startangle=90` rotates the pie so the first slice starts at the top (90 degrees counterclockwise from the positive x-axis).

In [None]:
# Pie chart - Percentage distribution of age groups

# TODO
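A sketch with `Axes.pie`, using the colors and `startangle=90` listed above and `autopct='%1.1f%%'` for the percentage labels. It assumes `AgeGroup` is categorical (as `pd.cut` returns), so `value_counts(sort=False)` keeps the Young, Middle-aged, Senior order. The helper name and demo rows are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_age_group_pie(data):
    """Pie chart of the share of patients in each AgeGroup (hypothetical helper)."""
    # Count patients per group; sort=False keeps the category order.
    counts = data['AgeGroup'].value_counts(sort=False)

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.pie(
        counts.values,
        labels=list(counts.index),
        colors=['skyblue', 'orange', 'lightgreen'],
        autopct='%1.1f%%',  # each slice's share, one decimal place
        startangle=90,      # first slice starts at the top of the pie
    )
    ax.axis('equal')  # keep the pie circular
    ax.set_title('Percentage Distribution of Age Groups')
    return fig, ax, counts

# Made-up rows for illustration; in the notebook call it with df.
demo = pd.DataFrame({
    'AgeGroup': pd.Categorical(['Young', 'Middle-aged', 'Young', 'Senior'],
                               categories=['Young', 'Middle-aged', 'Senior']),
})
fig, ax, counts = plot_age_group_pie(demo)
```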

### 6. Box Plot

This box plot visually represents the distribution of Maximum Heart Rate (thalach) for different chest pain types.
- X-Axis: Chest Pain Type
- Y-Axis: Maximum Heart Rate

The box plot shows the central tendency and spread of Maximum Heart Rate for each chest pain type.

Chest pain types are labeled on the x-axis, and the y-axis represents Maximum Heart Rate values.

The boxes indicate the interquartile range (IQR), while the horizontal line inside the box represents the median.

In [None]:
# Box plot - Distribution of 'thalach' for different chest pain types

# TODO
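One way to sketch this with `Axes.boxplot`: collect the `thalach` values for each chest pain type, draw one box per type, and label the ticks manually (this avoids the `labels`/`tick_labels` keyword, whose name changed across matplotlib versions). The helper name and demo rows are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_thalach_by_cp(data):
    """Box plot of thalach for each chest pain type (hypothetical helper)."""
    # One array of thalach values per chest pain type, sorted by type code.
    cp_types = sorted(data['cp'].unique())
    groups = [data.loc[data['cp'] == cp, 'thalach'].values for cp in cp_types]

    fig, ax = plt.subplots(figsize=(8, 4))
    # Box = IQR (Q1 to Q3), inner line = median; by default the whiskers reach
    # the farthest points within 1.5 * IQR of the box.
    ax.boxplot(groups)
    # Boxes sit at x = 1, 2, ..., so label those positions with the type codes.
    ax.set_xticks(range(1, len(cp_types) + 1))
    ax.set_xticklabels([f'{float(cp):g}' for cp in cp_types])
    ax.set_xlabel('Chest Pain Type')
    ax.set_ylabel('Maximum Heart Rate')
    ax.set_title('Maximum Heart Rate by Chest Pain Type')
    return fig, ax

# Made-up rows for illustration; in the notebook call plot_thalach_by_cp(df).
demo = pd.DataFrame({
    'cp': [1, 1, 2, 2, 3, 3, 4, 4],
    'thalach': [150, 170, 160, 175, 155, 165, 120, 140],
})
fig, ax = plot_thalach_by_cp(demo)
```

The `:g` format prints a float code such as 1.0 as "1".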

### 7. Multiple Plots

In [None]:
# Data Visualization with fig, ax syntax

# TODO
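A possible layout for the "fig, ax" syntax: `plt.subplots(2, 2)` returns one Figure and a 2-D array of Axes, and each panel is drawn through its own Axes object. Which four plots to combine, the helper name `plot_overview`, the panel titles and the demo rows are all assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_overview(data):
    """2 x 2 grid of several of the plots above (hypothetical helper)."""
    # axes is a 2-D NumPy array of Axes objects, indexed as axes[row, col].
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Top-left: average trestbps per age.
    mean_bp = data.groupby('age')['trestbps'].mean()
    axes[0, 0].plot(mean_bp.index, mean_bp.values, marker='o')
    axes[0, 0].set(title='Average Resting BP by Age', xlabel='Age', ylabel='trestbps')

    # Top-right: age distribution with 20 bins.
    axes[0, 1].hist(data['age'], bins=20, color='skyblue', edgecolor='black')
    axes[0, 1].set(title='Age Distribution', xlabel='Age', ylabel='Frequency')

    # Bottom-left: age vs. RiskFactor, computed as age * chol + thalach.
    risk = data['age'] * data['chol'] + data['thalach']
    axes[1, 0].scatter(data['age'], risk, alpha=0.6)
    axes[1, 0].set(title='Age vs. Risk Factor', xlabel='Age', ylabel='Risk Factor')

    # Bottom-right: thalach per chest pain type.
    cp_types = sorted(data['cp'].unique())
    axes[1, 1].boxplot([data.loc[data['cp'] == cp, 'thalach'].values for cp in cp_types])
    axes[1, 1].set_xticks(range(1, len(cp_types) + 1))
    axes[1, 1].set_xticklabels([f'{float(cp):g}' for cp in cp_types])
    axes[1, 1].set(title='Max Heart Rate by Chest Pain Type',
                   xlabel='Chest Pain Type', ylabel='thalach')

    fig.tight_layout()  # keep titles and axis labels from overlapping
    return fig, axes

# Made-up rows for illustration; in the notebook call plot_overview(df).
demo = pd.DataFrame({
    'age':      [41, 45, 52, 56, 60, 63, 67, 70],
    'trestbps': [120, 130, 125, 140, 150, 145, 160, 135],
    'chol':     [200, 230, 250, 210, 280, 233, 260, 240],
    'thalach':  [170, 165, 160, 150, 140, 150, 120, 130],
    'cp':       [1, 2, 2, 3, 3, 4, 4, 1],
})
fig, axes = plot_overview(demo)
```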