# AI/ML Introduction/Outline

* Suggested audience: Basic knowledge of Python, curious about ML

* Basic machine learning principles with theory

* MNIST (handwritten digits) is our example dataset for the walkthrough/demonstration; think of it as a simpler ImageNet

* NOT LLMs; those will be covered in a future presentation

# If you want to recreate this...

* https://github.com/chpc-uofu/...  

* Jupyter Python Version = CHPC Deep Learning 2024.1


# Machine Learning - Why?
<img src="../images/cactuscaptioned.jpg" width="200" height="200" align=right />
<img src="../images/neekacaptioned_resized.jpg" width="200" height="150" align=right />

* Image recognition

* Predict the future, e.g., model disease spread

* Find patterns behind the chaos, even if you cannot see them yourself

* Create and train a model with a dataset

# Types of Learning

* Supervised (labeled data) - classification, regression

* Unsupervised (unlabeled data) - used in the first stage of training LLMs; covered in future presentations

* There is an underlying assumption behind machine learning: the data we predict on in the future will look like the data we learned from

* Past performance does not guarantee future results

# Supervised Learning

* Supervised: you have a labeled target (the thing you are trying to predict)

* Remember to separate the target from the rest of your data!

* Another separation: train and test partitions. The test partition checks for overfitting (remember the past-performance warning)

```python
# pandas is the standard DataFrame (tabular data) library in Python
import pandas as pd
df = pd.read_csv('Tornado_Example_Data.csv')
df
```

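|   | Elevation | Humidity | Month | Tornado | Avg Temperature |
|---|---|---|---|---|---|
| 0 | 500 | 77 | July | 1 | 89 |
| 1 | 4300 | 18 | July | 0 | 95 |
| 2 | 7100 | 11 | July | 0 | 87 |
| 3 | 4300 | 88 | June | 1 | 89 |
| 4 | 500 | 34 | December | 0 | 50 |
| 5 | 300 | 55 | November | 0 | 58 |
| 6 | 300 | 77 | June | 1 | 91 |
| 7 | 400 | 66 | August | 1 | 95 |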


# Tornado - what kind of target? Classification or regression?

* Tornado is 0 or 1 (a class label), so predicting it is classification.

* Avg Temperature is a raw continuous value, so it could be a target for regression.

* Following convention, we will call our target y and the rest of the dataset (the features) X.

```python
y = df['Tornado']
# VERY IMPORTANT: we don't want the target in the features, or the model would just read the answer off its input.
# The goal is to predict unseen events.
X = df.drop(['Tornado'], axis=1)
X
```

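|   | Elevation | Humidity | Month | Avg Temperature |
|---|---|---|---|---|
| 0 | 500 | 77 | July | 89 |
| 1 | 4300 | 18 | July | 95 |
| 2 | 7100 | 11 | July | 87 |
| 3 | 4300 | 88 | June | 89 |
| 4 | 500 | 34 | December | 50 |
| 5 | 300 | 55 | November | 58 |
| 6 | 300 | 77 | June | 91 |
| 7 | 400 | 66 | August | 95 |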


# Important things to mention

* We need to do a train-test split of our dataset. This is how we judge how well the model works on unseen data, which is the entire point of machine learning.

* For classification, we generally want each partition to preserve the ratio of target classes found in the full dataset.

* If you assigned rows to the train or test partition strictly at random, you could end up with ratios that are not representative of your data.
* There may be times you want to manipulate the ratios, but at a minimum you need to be aware of this issue.
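To see why stratification matters, here is a minimal sketch on a hypothetical imbalanced label array (90 zeros, 10 ones) using scikit-learn's `train_test_split`. The `_demo` names are illustrative and chosen so they don't overwrite the notebook's `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 "0"s and 10 "1"s (10% positives)
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)   # placeholder single-feature column

# Purely random split: the share of 1s in the test set can drift away from 10%
_, _, _, y_test_random = train_test_split(X_demo, y_demo, test_size=0.2, random_state=3)

# Stratified split: both partitions keep the 10% share of 1s
_, _, y_train_strat, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=3
)

print("random test share of 1s:      ", y_test_random.mean())
print("stratified test share of 1s:  ", y_test_strat.mean())
print("stratified train share of 1s: ", y_train_strat.mean())
```

The random split's share depends on `random_state`; the stratified split lands on exactly 10% here because the class counts divide evenly into 80/20.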

```python
# We have equal class ratios here (4 tornadoes, 4 non-tornadoes) and we want to
# preserve that in both partitions, so we call train_test_split with these parameters:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    stratify=y,
                                                    test_size=0.25)

# stratify=y keeps the same percentage of 1s and 0s in each partition.
# Common train/test ratios are around 70-30, 80-20, or 75-25 (as done above).
# No random_state is set, so the rows chosen will differ between runs.
```

```python
# What do these look like?
print(X_train)
print(y_train)
```

```
   Elevation  Humidity     Month  Avg Temperature
5        300        55  November               58
1       4300        18      July               95
6        300        77      June               91
2       7100        11      July               87
7        400        66    August               95
0        500        77      July               89
5    0
1    0
6    1
2    0
7    1
0    1
Name: Tornado, dtype: int64
```
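One way to confirm the stratification worked is to compare class proportions in each partition with pandas' `value_counts(normalize=True)`. This sketch rebuilds the tornado table inline from the values shown above (so it runs without the CSV); the `_demo` names are hypothetical and avoid overwriting the notebook's variables.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same eight rows as Tornado_Example_Data.csv shown earlier
df_demo = pd.DataFrame({
    "Elevation": [500, 4300, 7100, 4300, 500, 300, 300, 400],
    "Humidity": [77, 18, 11, 88, 34, 55, 77, 66],
    "Month": ["July", "July", "July", "June", "December", "November", "June", "August"],
    "Tornado": [1, 0, 0, 1, 0, 0, 1, 1],
    "Avg Temperature": [89, 95, 87, 89, 50, 58, 91, 95],
})
y_demo = df_demo["Tornado"]
X_demo = df_demo.drop(["Tornado"], axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, stratify=y_demo, test_size=0.25)

# Fraction of each class per partition; with stratify both should be 50% / 50%
print(y_tr.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```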


# How does it learn?

* "Any sufficiently advanced technology is indistinguishable from magic" – Arthur C. Clarke

* Technology here = <s>magic</s> math

* Training: create a model, run data through it, check the error, adjust the model to reduce the error, repeat

* Four key ingredients: activation function, loss function, optimizer, metrics
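The create/run/check/repeat cycle can be sketched without any ML library. Below is a minimal, assumed example: plain NumPy gradient descent fitting a line to hypothetical toy data (roughly y = 2x + 1), with mean squared error as the loss. Frameworks automate the gradient and update steps, but the loop has the same shape.

```python
import numpy as np

# Hypothetical toy data: y_toy is roughly 2 * x_toy + 1 plus a little noise
rng = np.random.default_rng(0)
x_toy = rng.uniform(-1, 1, size=100)
y_toy = 2 * x_toy + 1 + rng.normal(0, 0.1, size=100)

# 1. Create the model: one weight and one bias, starting at zero
w, b = 0.0, 0.0
learning_rate = 0.1

for epoch in range(200):
    # 2. Run data through the model (forward pass)
    y_pred = w * x_toy + b
    # 3. Check the error with a loss function (mean squared error)
    error = y_pred - y_toy
    loss = np.mean(error ** 2)
    # 4. Nudge each parameter against its gradient to lower the loss
    grad_w = 2 * np.mean(error * x_toy)
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    # 5. Repeat

print(f"w={w:.2f}, b={b:.2f}, final loss={loss:.4f}")
```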

# Anatomy of a machine learning model

<img src="../images/Backprop.jpg" width="600" height="600" align=center/>
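The diagram's filename suggests it depicts backpropagation. As a hedged illustration of that idea (not a transcription of the figure), here is one sigmoid neuron with a squared-error loss: the forward pass computes a prediction, the backward pass applies the chain rule to get each parameter's gradient, and a finite-difference check confirms the result. All values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_loss(w, b, x, y):
    # Forward pass of one sigmoid neuron followed by a squared-error loss
    p = sigmoid(w * x + b)
    return (p - y) ** 2

x_ex, y_ex = 1.5, 1.0   # one hypothetical training example
w, b = 0.2, -0.1        # current (hypothetical) parameters

# Forward pass, keeping the intermediate values that backprop needs
z = w * x_ex + b
p = sigmoid(z)

# Backward pass: apply the chain rule from the loss back to each parameter
dloss_dp = 2 * (p - y_ex)
dp_dz = p * (1 - p)                  # derivative of the sigmoid
dloss_dw = dloss_dp * dp_dz * x_ex   # dz/dw = x
dloss_db = dloss_dp * dp_dz          # dz/db = 1

# Sanity check: compare against a numerical (central finite-difference) gradient
eps = 1e-6
num_dw = (neuron_loss(w + eps, b, x_ex, y_ex) - neuron_loss(w - eps, b, x_ex, y_ex)) / (2 * eps)
num_db = (neuron_loss(w, b + eps, x_ex, y_ex) - neuron_loss(w, b - eps, x_ex, y_ex)) / (2 * eps)
print(dloss_dw, num_dw)
print(dloss_db, num_db)
```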

# Activation function
<img src="../images/sigmoid.png" width="200" height="100" align=right />
<img src="../images/relu.png" width="200" height="100" align=right />

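* Activation functions introduce nonlinearity, which is vital for more complicated problems

* The function must be differentiable (ReLU is differentiable everywhere except at 0, which is handled by convention) so we can measure its rate of change during training

* ReLU has been the default in the field for several years now

* ReLU is quick to compute and works well for most tasks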
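A minimal NumPy sketch of the two activation functions pictured, plus their derivatives (the "rate of change" training relies on). Treating ReLU's slope at exactly 0 as 0 is a common convention assumed here.

```python
import numpy as np

def sigmoid(z):
    # Squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # Keeps positive values, zeroes out negatives
    return np.maximum(0.0, z)

def relu_grad(z):
    # Slope is 1 for z > 0 and 0 for z < 0; at exactly 0 we use 0 by convention
    return (z > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))          # [0. 0. 3.]
print(relu_grad(z))     # [0. 0. 1.]
print(sigmoid(z))
print(sigmoid_grad(z))
```

ReLU needs only a comparison, while sigmoid needs an exponential, which is part of why ReLU is cheap. Without any activation, two stacked linear layers W2(W1 x) collapse into a single linear layer (W2 W1) x, which is why the nonlinearity matters.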

# Loss function

* Measures the difference between our predictions and the truth

* Classification and regression use different loss functions - THIS IS IMPORTANT, and we will show what happens if you pick the wrong one

* Cross-entropy functions for classification

* MSE (mean squared error) or MAE (mean absolute error) for regression
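A hedged sketch of what these losses compute, in plain NumPy on hypothetical numbers (the true temperatures echo the tornado table; the predictions and class probabilities are made up). Frameworks such as Keras provide their own implementations; this only shows the arithmetic.

```python
import numpy as np

# Regression: predictions vs. true continuous values
y_true = np.array([89.0, 95.0, 50.0])
y_pred = np.array([87.0, 96.0, 55.0])
mse = np.mean((y_pred - y_true) ** 2)   # (4 + 1 + 25) / 3 = 10.0
mae = np.mean(np.abs(y_pred - y_true))  # (2 + 1 + 5) / 3, about 2.667

# Classification: predicted class probabilities vs. integer class labels
labels = np.array([0, 2, 1])                 # true class index for each sample
probs = np.array([[0.7, 0.2, 0.1],           # each row sums to 1
                  [0.1, 0.1, 0.8],
                  [0.3, 0.4, 0.3]])
# Cross-entropy: average negative log of the probability given to the true class
cross_entropy = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

print(mse, mae, cross_entropy)
```

MSE squares each error, so one big miss (the 5-degree one) dominates; MAE weights all errors linearly. Cross-entropy only looks at the probability assigned to the correct class and grows sharply as that probability approaches 0.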

# Loss function - sidebar

* We said before that we want to keep the ratio of targets the same in the train and test data

* Some data is not balanced: in earthquake prediction or anomaly detection, 99% of the labels might be "0"

* If you don't do anything about this, you could end up with a model that always predicts 0 and still gets 99% accuracy

* Writing a custom loss function is one way to deal with this: penalize the loss more when the model misses a 1 than when it misses a 0
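A minimal NumPy sketch of both points, on hypothetical labels (99 zeros, one 1). `weighted_bce` and its `pos_weight` parameter are illustrative names for a weighted binary cross-entropy, not a library API. Keras also offers a built-in alternative via the `class_weight` argument of `Model.fit`.

```python
import numpy as np

# Hypothetical imbalanced labels: 99 "0"s and a single "1"
y_true = np.array([0] * 99 + [1])

# A useless model that always predicts 0 still scores 99% accuracy
always_zero = np.zeros_like(y_true)
accuracy = np.mean(always_zero == y_true)
print("accuracy of always-0 model:", accuracy)

def weighted_bce(y_true, p, pos_weight):
    # Binary cross-entropy where errors on the rare "1" class count pos_weight times more
    p = np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0)
    per_sample = -(pos_weight * y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return np.mean(per_sample)

# A model that gives every sample a low (0.01) probability of being "1"
p = np.full(len(y_true), 0.01)
plain = weighted_bce(y_true, p, pos_weight=1.0)
weighted = weighted_bce(y_true, p, pos_weight=99.0)
print("plain BCE:   ", plain)
print("weighted BCE:", weighted)
```

Weighting the positive class by roughly the inverse of its frequency (99 here) makes missing the single "1" about as costly as all the "0" errors combined, so the always-0 strategy stops looking attractive during training.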

# Optimizer

* Updates the model parameters to minimize the loss

* Adam has been the standard for some time (for both classification and regression)

* SGD (Stochastic Gradient Descent) was the original

* AdamW is a modified Adam (it decouples weight decay from the gradient update), and Adam was itself a modified SGD

* All use a learning rate parameter (typically around 0.0001 to 0.01): a higher rate means quicker training, but you might overshoot your solution
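To see the learning-rate tradeoff, here is a hedged sketch of plain gradient descent (the basic update that SGD, Adam, and AdamW all build on) minimizing the toy loss w**2, whose minimum is at w = 0. The function name and learning rates are illustrative; the values that work for this toy loss are larger than the typical range above because the right range depends on the model and data.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    # Minimize loss(w) = w**2; its gradient is 2*w and the best w is 0
    w = w0
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad
    return w

for lr in [0.01, 0.1, 0.9, 1.1]:
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With lr=0.01 the weight barely moves in 20 steps, 0.1 gets close to 0, 0.9 jumps back and forth across 0 while still converging, and 1.1 overshoots further each step and diverges.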

# Metrics - How well did we do?

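* Whether the problem is classification or regression will influence this choice

* There may be other considerations, such as true positives (TP) and false positives (FP)

* As with loss functions, there are different accuracy metrics for different problems

* Today we will use sparse_categorical_accuracy (for integer class labels)

* Regression would continue to use MSE or MAE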
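A hedged NumPy sketch of what these metrics compute on hypothetical predictions. Sparse categorical accuracy, like Keras's `sparse_categorical_accuracy`, compares integer labels with the index of the highest-scoring class; the TP/FP counts use a separate binary example.

```python
import numpy as np

# Integer class labels ("sparse" = class indices, not one-hot vectors)
y_true = np.array([3, 1, 0, 7])
# Hypothetical model scores: one row per sample, 10 classes (like MNIST digits)
scores = np.zeros((4, 10))
scores[0, 3] = 0.9   # correct
scores[1, 7] = 0.8   # wrong: predicts 7, truth is 1
scores[2, 0] = 0.6   # correct
scores[3, 7] = 0.7   # correct

# Fraction of samples whose highest-scoring class matches the label
predicted = np.argmax(scores, axis=1)
sparse_cat_acc = np.mean(predicted == y_true)
print(predicted, sparse_cat_acc)   # [3 7 0 7] 0.75

# Binary case: count true positives and false positives for class 1
y_bin = np.array([1, 0, 1, 0, 1])
pred_bin = np.array([1, 1, 0, 0, 1])
tp = np.sum((pred_bin == 1) & (y_bin == 1))
fp = np.sum((pred_bin == 1) & (y_bin == 0))
print(tp, fp)   # 2 1
```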

# Units and Layers

* Brings everything together

* A machine learning model is made up of layers, which are made up of units

* LLMs have 1000s; our model today will have 3 - 4
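A minimal NumPy sketch of how units and layers fit together: each unit computes a weighted sum of its inputs plus a bias, and a layer is many units computed at once as a matrix product. The 784 -> 128 -> 10 sizes are hypothetical (784 = 28x28 MNIST pixels, 10 = digit classes), as are the random weights; in Keras a layer like the first one would roughly be `tf.keras.layers.Dense(128, activation="relu")`.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense_layer(x, weights, bias, activation=None):
    # Each column of `weights` is one unit: a weighted sum of all inputs plus a bias
    z = x @ weights + bias
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 784))   # 2 hypothetical flattened 28x28 images

# Hypothetical layer sizes: 784 inputs -> 128 hidden units -> 10 output units
w1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)
w2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)

hidden = dense_layer(x, w1, b1, activation=relu)
logits = dense_layer(hidden, w2, b2)

# Softmax turns the 10 output scores into class probabilities
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(hidden.shape, logits.shape, probs.sum(axis=1))
```

Training (loss, gradients, optimizer) would adjust `w1`, `b1`, `w2`, `b2`; this sketch shows only the forward pass through the layers.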

# Anatomy of a machine learning model

<img src="../images/Backprop.jpg" width="600" height="600" align=center/>

# END OF THIS NOTEBOOK - CODE TO FOLLOW