# Workshop 1: Machine Learning for Mac
## Step 1: Installing Homebrew and Python
- Open Launchpad
- Search for "Terminal" and open it
- Copy and paste the following commands into the terminal one at a time, pressing Enter after each command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

brew install python@3.11

## Step 2: Setup Python Virtual Environment
- Within VS Code, open the terminal panel by moving your cursor to the bottom edge of the main panel until it changes into an up arrow, then dragging up
- Once this panel is revealed, click on the "Terminal" tab
- If a terminal is already open, type "exit", press Enter, then follow the instructions above to open a fresh one
- Now copy and paste the following commands into the terminal one at a time (click on the cell and press Cmd-C to copy it, then Cmd-V to paste it into the terminal):

python3.11 -m venv ml-venv

source ml-venv/bin/activate

pip install -r requirements.txt
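
To confirm that the activation worked, a quick check from Python can help. The following is a minimal sketch that uses only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of the workshop files.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Python version:", sys.version.split()[0])
print("Running inside a virtual environment:", in_virtualenv())
```

If the second line prints `False`, the `source ml-venv/bin/activate` command most likely has not been run in the current terminal.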

## Step 3: Test PyTorch
- Run the following code segment to verify that the virtual environment is working as intended
    - When doing this, VS Code will prompt you for a kernel at the top of the screen
    - Click on "Python Environments" then "ml-venv"
- If VS Code prompts you to install something when you run the following code block, press "Install"

In [1]:
import torch
import pandas as pd
import torch.nn as nn
import seaborn as sns
import matplotlib.pyplot as plt
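
Importing the packages only shows that they are installed. As an optional extra check, the sketch below runs one small tensor operation and reports whether Apple's Metal (MPS) backend is available. It assumes a PyTorch version that includes `torch.backends.mps` (1.12 or later), and the MPS result is informational only.

```python
import torch

# a tiny matrix multiplication confirms that tensor operations work
a = torch.ones(2, 3)
b = torch.ones(3, 2)
product = a @ b  # each entry is 1*1 + 1*1 + 1*1 = 3

print("PyTorch version:", torch.__version__)
print("Product:")
print(product)

# assumption: torch.backends.mps exists in the installed PyTorch version
mps_available = torch.backends.mps.is_available()
print("MPS (Apple GPU) available:", mps_available)
```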

# Step 4: Importing and Cleaning Data

In [None]:
# import CSV data located in the data folder through pandas
heartData = pd.read_csv('data/heart.csv')
o2Data = pd.read_csv('data/o2Saturation.csv', header=None)

# copy the O2 saturation data to the main pandas dataframe
heartData['o2Saturation'] = o2Data

# move the output column to the end of the dataframe
cols = list(heartData.columns)
cols.remove('output')
cols.append('output')
heartData = heartData[cols]

# normalize the dataset between 0 and 1
for col in heartData.columns:
    # these columns are already between 0 and 1 so we don't need to normalize them
    if col in ('output', 'sex', 'fbs', 'exng'):
        continue

    # normalize the data by subtracting the minimum value and dividing by the range
    heartData[col] = (heartData[col] - heartData[col].min()) / (heartData[col].max() - heartData[col].min())

# display the first 5 rows of the dataframe to give an idea of what the data looks like
heartData.head()
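
To see what the normalization formula does, it can help to apply it to a few numbers by hand. The sketch below uses a made-up three-row dataframe (the values are not taken from `heart.csv`) and scales one column with the same min-max formula, (x - min) / (max - min).

```python
import pandas as pd

# made-up values for illustration, not taken from heart.csv
toy = pd.DataFrame({"age": [30, 45, 60], "sex": [0, 1, 1]})

# min-max scaling: subtract the minimum, then divide by the range
col = "age"
toy[col] = (toy[col] - toy[col].min()) / (toy[col].max() - toy[col].min())

# age becomes 0.0, 0.5, 1.0 because (30-30)/30 = 0, (45-30)/30 = 0.5, (60-30)/30 = 1
print(toy)
```

Note that a column where every value is identical has a range of 0, so this formula would produce NaN for it.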

# Step 5: Split the Dataset

In [3]:
# the training set will be 80% of the data, selected at random
train = heartData.sample(frac=0.8, random_state=42)

# the validation and test set will each be 10% of the data
# after the training rows are removed from the data, the remaining data is split in half, one half for validation and the other for testing
# note: the training rows must be dropped using their original index, so the indexes are only reset at the end
validation = heartData.drop(train.index)
test = validation.sample(frac=0.5, random_state=42)
validation = validation.drop(test.index)

# reset the index of the training, validation, and test sets
train = train.reset_index(drop=True)
validation = validation.reset_index(drop=True)
test = test.reset_index(drop=True)
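
A split like this relies on row labels: `drop(some_frame.index)` removes rows by label, so the labels have to still refer to the original rows when the drop happens. The sketch below, which uses a made-up 100-row dataframe and hypothetical variable names, keeps the original labels until all drops are done and then checks that every row lands in exactly one split.

```python
import pandas as pd

# stand-in data: 100 rows with the default 0..99 index
toy = pd.DataFrame({"value": range(100)})

# sample 80% for training, keeping the original row labels for now
train_part = toy.sample(frac=0.8, random_state=42)

# drop by the original labels, so only the unsampled rows remain
rest = toy.drop(train_part.index)
test_part = rest.sample(frac=0.5, random_state=42)
val_part = rest.drop(test_part.index)

# every row label should appear in exactly one of the three splits
all_labels = sorted(
    list(train_part.index) + list(val_part.index) + list(test_part.index)
)
print(len(train_part), len(val_part), len(test_part))
print("each row used exactly once:", all_labels == list(range(100)))
```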

## Step 5.5: Visualizing Connections
- The following code displays the correlation coefficient between each of the 14 variables and the output variable
- As you may notice, none of the variables on their own can accurately predict whether the patient will experience a heart attack
- This is why a holistic analysis of the data using deep learning is advantageous

In [None]:
plt.figure(figsize=(12, 1))
sns.heatmap(train.corr()[['output']].T, annot=True, cmap='coolwarm', fmt='.2f', cbar=False)
plt.show()
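
The same correlations can also be read as a ranked list instead of a heatmap. The following sketch uses a small made-up dataframe (the column names `feature_a`, `feature_b`, and `feature_c` are hypothetical) and sorts the features by the absolute strength of their correlation with `output`.

```python
import pandas as pd

# made-up values for illustration only
toy = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [6, 5, 3, 4, 2, 1],
    "feature_c": [3, 1, 4, 2, 6, 5],
    "output":    [0, 0, 0, 1, 1, 1],
})

# Pearson correlation of every column with the output, excluding the output itself
corr_with_output = toy.corr()["output"].drop("output")

# order by absolute value so the strongest relationships come first,
# whether they are positive or negative
order = corr_with_output.abs().sort_values(ascending=False).index
ranked = corr_with_output.reindex(order)
print(ranked)
```

A strongly negative coefficient is as informative as a strongly positive one, which is why the ranking uses the absolute value.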

# Step 6: Define a Dataset Object
- To make the data easier to iterate over while training, PyTorch offers a Dataset class we can inherit from and implement for our own data

In [None]:
# this will inherit the torch Dataset class and override the __len__ and __getitem__ methods
class HeartDataset(torch.utils.data.Dataset):
    # the constructor will take a dataframe as an argument
    def __init__(self, df):
        self.df = df
    
    # the __len__ method will return the number of rows in the dataframe
    def __len__(self):
        return len(self.df)

    # every time we read an item from the dataset, this method will be called
    def __getitem__(self, idx):
        # get the row at the current index
        row = self.df.iloc[idx]
        
        # the input will be all the columns except the output column because the output column is the answer
        input = torch.tensor(row.drop('output').values, dtype=torch.float32)
        
        # the expected output will be the output column
        output = torch.tensor(row['output'], dtype=torch.float32)

        # we need to reshape the single value into a 1-element tensor so the answer can be lined up with the output of the model
        output = output.reshape(1)

        # return the input and expected output
        return input, output
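
A Dataset like this is usually wrapped in a `torch.utils.data.DataLoader`, which groups individual rows into batches. The sketch below shows the idea on made-up data; `ToyDataset` is a hypothetical stand-in that follows the same pattern as the class above, so that the example can run on its own.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in that follows the same pattern as HeartDataset."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        features = torch.tensor(row.drop("output").values, dtype=torch.float32)
        label = torch.tensor(row["output"], dtype=torch.float32).reshape(1)
        return features, label


# made-up data: 5 rows, 2 input columns, 1 output column
toy = pd.DataFrame({
    "x1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0],
    "output": [0, 1, 0, 1, 1],
})

# batch_size=2 groups the 5 rows into batches of 2, 2, and 1
loader = DataLoader(ToyDataset(toy), batch_size=2, shuffle=False)

for features, labels in loader:
    print(features.shape, labels.shape)
```

Each batch stacks the per-row tensors, so the inputs have shape (batch size, number of features) and the labels have shape (batch size, 1).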

# Step 7: Design the Neural Network

In [None]:
class HeartAttackModel(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x): pass
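
The class above is left as a skeleton to fill in. One possible shape for it, offered only as a sketch and not as the workshop's intended solution, is a small fully connected network. The class name `ExampleHeartModel`, the hidden layer size of 16, and the choice of ReLU and Sigmoid are all assumptions; the 14 inputs correspond to the 14 feature columns.

```python
import torch
import torch.nn as nn


class ExampleHeartModel(nn.Module):
    """Illustrative binary classifier; layer sizes are assumptions."""

    def __init__(self, num_features=14, hidden_size=16):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(num_features, hidden_size),  # inputs -> hidden layer
            nn.ReLU(),                             # non-linearity
            nn.Linear(hidden_size, 1),             # hidden layer -> single score
            nn.Sigmoid(),                          # squash the score into (0, 1)
        )

    def forward(self, x):
        return self.layers(x)


model = ExampleHeartModel()
batch = torch.rand(4, 14)   # 4 made-up patients with 14 features each
predictions = model(batch)
print(predictions.shape)
```

The Sigmoid at the end means each prediction can be read as a probability-like value between 0 and 1, which matches the 0/1 `output` column.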

# Step 8: Create Training Function

In [None]:
def train_model(model, train_loader, val_loader, epochs, learningRate): pass
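
The function above is also left to be filled in. As a rough sketch of what a training loop with this signature might look like, the example below uses binary cross-entropy loss and the Adam optimizer; both choices, the name `example_train_model`, and the random stand-in data are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def example_train_model(model, train_loader, val_loader, epochs, learning_rate):
    """Train the model and return the average validation loss per epoch."""
    loss_fn = nn.BCELoss()  # expects model outputs between 0 and 1
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    history = []

    for epoch in range(epochs):
        # training pass: update the weights one batch at a time
        model.train()
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

        # validation pass: measure the loss without updating the weights
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, targets in val_loader:
                val_loss += loss_fn(model(inputs), targets).item()
        history.append(val_loss / len(val_loader))

    return history


# usage on random stand-in data (not the heart dataset)
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(14, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
x = torch.rand(32, 14)
y = (x[:, :1] > 0.5).float()  # label depends on the first feature
loader = DataLoader(TensorDataset(x, y), batch_size=8)
val_history = example_train_model(toy_model, loader, loader, epochs=3, learning_rate=0.01)
print(val_history)
```

`nn.BCELoss` requires predictions in the range 0 to 1, so a model used with it needs a Sigmoid (or equivalent) as its last step.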

# Step 9: Initialize Objects and Train Model

# Step 10: Test Model Performance
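
One common way to test a binary classifier is to measure its accuracy on the held-out test set. The sketch below shows the idea under a few assumptions: the helper name `example_accuracy` and the 0.5 decision threshold are illustrative choices, and `FirstColumnModel` is a made-up model that simply returns the first input column so that the result is easy to verify by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def example_accuracy(model, loader, threshold=0.5):
    """Return the fraction of examples whose thresholded prediction matches the label."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in loader:
            # predictions at or above the threshold count as class 1
            predicted = (model(inputs) >= threshold).float()
            correct += (predicted == targets).sum().item()
            total += targets.numel()
    return correct / total


class FirstColumnModel(nn.Module):
    """Made-up model: uses the first feature directly as its prediction."""

    def forward(self, x):
        return x[:, :1]


# four made-up examples; the thresholded predictions are 1, 0, 1, 0
x = torch.tensor([[0.9, 0.0], [0.2, 0.0], [0.7, 0.0], [0.4, 0.0]])
y = torch.tensor([[1.0], [0.0], [0.0], [0.0]])
loader = DataLoader(TensorDataset(x, y), batch_size=2)

accuracy = example_accuracy(FirstColumnModel(), loader)
print(accuracy)  # 3 of the 4 predictions match the labels, so this prints 0.75
```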

Link to dataset: https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset