# Introduction

## Topics
- Pre-trained models
- Difference between pre-trained models and custom algorithms
- Publicly available pre-trained models
- BERT model architecture
    - How to pre-train the model (unsupervised step)
    - How to fine-tune it on your data
    - Pre-train
        - Hugging Face Model Hub
        - Custom model training loop
    - Profile the model
        - Deep learning models take time to train
        - Profiling gives visibility into model training
        - Quickly troubleshoot issues
        - SageMaker Debugger captures model training metrics and profiles system resource utilization
        

# Train and Debug a Custom Machine Learning Model

- Areas and toolbox
![image.png](attachment:image.png)

- Use case
    - Sentiment analysis: classifying customer reviews
- RoBERTa model


# Pre-trained Models

- Training a model, particularly a deep learning model, from scratch can take days or even weeks, depending on the available CPU or GPU resources
- Pre-trained models are available for many
    - use cases
    - datasets

- Built-in algorithms vs. pre-trained models
![image.png](attachment:image.png)

## Pre-trained BERT models available
![image.png](attachment:image.png)

- BERT is trained in two steps
    1. Pre-train on a large corpus of data (unsupervised)
    2. Train on the data you have (fine-tuning)
    - Fine-tuning is similar to transfer learning in NLP
    - Fine-tuning is much faster than pre-training

- Where to find pre-trained models?
    - Many machine learning frameworks have pre-trained model hubs
        - PyTorch Hub
        - Apache MXNet Gluon Model Zoo
        - TensorFlow Hub
        - Hugging Face Model Hub

    - Amazon SageMaker JumpStart gives access to pre-trained models
        - works with PyTorch Hub and TensorFlow Hub models
        - one-click deployment in the SageMaker environment
        - fine-tune or deploy the selected model
        - after you launch a model in JumpStart, it provides a notebook link where you can do the rest

# Pre-Trained BERT Models

- <u>How is BERT pre-trained?</u>
- Unsupervised learning
    - Generic training data
        - Masked Language Model (MLM): the model masks 15% of the words (tokens) in a sentence, then tries to predict the masked words.
        - Next Sentence Prediction (NSP): the model is given sentence pairs; in 50% of the pairs, the second sentence is replaced with a random sentence from elsewhere in the corpus. The model then predicts whether the second sentence really follows the first.
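The two pre-training tasks can be sketched in plain Python. This is a simplified illustration, not BERT's actual preprocessing code: whitespace tokens, the `[MASK]` string, and the helper names `mask_tokens` and `make_nsp_pair` are assumptions. Following the original BERT paper, roughly 15% of token positions are selected for prediction; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=random):
    """BERT-style masking sketch (hypothetical helper).

    Returns (masked_tokens, labels). labels[i] holds the original token at
    positions the model must predict and None everywhere else.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:           # select ~15% of positions
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK               # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token
    return masked, labels


def make_nsp_pair(sentences, idx, other_sentences, rng=random):
    """NSP sketch (hypothetical helper): return (sentence_a, sentence_b, is_next).

    With probability 0.5, sentence_b is the sentence that actually follows
    sentence_a; otherwise it is drawn from other_sentences (e.g. other documents).
    """
    sentence_a = sentences[idx]
    if rng.random() < 0.5:
        return sentence_a, sentences[idx + 1], True
    return sentence_a, rng.choice(other_sentences), False


rng = random.Random(0)
tokens = "the battery life of this phone is great".split()
print(mask_tokens(tokens, vocab=tokens, rng=rng))
print(make_nsp_pair(["I bought it.", "It works well."], 0, ["The sky is blue."], rng=rng))
```

Only the selected positions contribute to the MLM loss; keeping some selected tokens unchanged means the model cannot assume every prediction target is a `[MASK]`.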
    

# <u>Module 2: Training of a Custom Model with Amazon SageMaker</u>
# Train Custom Model with Amazon SageMaker

- Hugging Face RoBERTa model (the model we are going to use)
- Custom models in Amazon SageMaker
    - 3 ways to train
        - built-in algorithm
        - bring your own script
        - bring your own container
    - here we use the second option: bring your own script
    
    ![image.png](attachment:image.png)


- A training job needs
    - training data
    - compute resources (Amazon SageMaker)
    - an output location
    - a training code image

- Steps to perform
![image.png](attachment:image.png)

- Configure the training, validation, and test datasets
- Choose the evaluation metrics to capture
    - e.g., validation loss or validation accuracy

- Configure the model hyperparameters
    - epochs, learning rate, etc.

- Provide a custom training script to fit the model
<br>
<br>
- ## <u>Let's dive into the details</u>
- ## Configure the datasets and metrics
    - example code

In [None]:
from sagemaker.inputs import TrainingInput

s3_input_train_data = TrainingInput(s3_data="s3://...")
s3_input_validation_data = TrainingInput(s3_data="s3://...")
s3_input_test_data = TrainingInput(s3_data="s3://...")

- Regular expressions to capture the validation loss and accuracy from the training logs in Amazon CloudWatch

In [None]:
metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9\\.]+)'}
]
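To see what these patterns capture, the sketch below runs them with Python's `re` module against a made-up log line (the log format and the helper name `extract_metrics` are assumptions; your training script decides what it prints). SageMaker applies each `Regex` to the training job's log output and records the captured number as the metric value. Note the `+` quantifier: without it, `([0-9\\.])` captures only a single character.

```python
import re

metric_definitions = [
    {'Name': 'validation:loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
    {'Name': 'validation:accuracy', 'Regex': 'val_acc: ([0-9\\.]+)'},
]


def extract_metrics(log_line, definitions):
    """Return {metric_name: float} for every definition whose regex matches the line."""
    values = {}
    for d in definitions:
        match = re.search(d['Regex'], log_line)
        if match:
            values[d['Name']] = float(match.group(1))  # first capture group
    return values


# Hypothetical log line printed by the training script
line = "[step 50] val_loss: 0.42 val_acc: 0.81"
print(extract_metrics(line, metric_definitions))
# {'validation:loss': 0.42, 'validation:accuracy': 0.81}

# Without '+', only one character is captured
print(re.search('val_loss: ([0-9\\.])', line).group(1))  # 0
```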

## Configure the Hyper-parameters

In [None]:
hyperparameters = {
    'epochs': 3,
    'learning_rate': 2e-5,
    'train_batch_size': 256,
    'train_steps_per_epoch': 50,
    'validation_batch_size': 256,
    'validation_steps_per_epoch': 50,
    # ... (other hyperparameters)
    'max_seq_length': 128
}

- `max_seq_length` is the maximum number of tokens per input sequence: shorter sequences are padded, longer ones are truncated
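A minimal sketch of what a fixed `max_seq_length` means for a list of token IDs. The helper name and the `pad_id=0` default are assumptions for illustration; in practice the Hugging Face tokenizer handles padding and truncation with its own pad token. The attention mask marks real tokens with 1 and padding with 0, so the model can ignore the padded positions.

```python
def pad_or_truncate(token_ids, max_seq_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_seq_length long."""
    ids = token_ids[:max_seq_length]           # truncate long sequences
    mask = [1] * len(ids)                      # 1 = real token
    padding = max_seq_length - len(ids)
    ids = ids + [pad_id] * padding             # pad short sequences
    mask = mask + [0] * padding                # 0 = padding, ignored by attention
    return ids, mask


print(pad_or_truncate([5, 7, 8, 9], 6))      # ([5, 7, 8, 9, 0, 0], [1, 1, 1, 1, 0, 0])
print(pad_or_truncate(list(range(10)), 4))   # ([0, 1, 2, 3], [1, 1, 1, 1])
```

A fixed length lets sequences be stacked into rectangular batches; a smaller value such as 128 also reduces memory and compute per step.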

## Custom training script
- Start from a sample training script, then customize it

- Provide the training script (src/train.py)
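In script mode, SageMaker passes the `hyperparameters` given to the estimator to the entry point as command-line arguments (for example `--epochs 3`), so `train.py` typically starts by parsing them. A minimal sketch with `argparse`; the argument names mirror the hyperparameter dictionary, while the defaults and the `--model_dir`/`--train_data` argument names are assumptions. `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are environment variables set inside the SageMaker training container; the fallback paths are only for running the script elsewhere.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters, passed by SageMaker as --name value
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--learning_rate', type=float, default=2e-5)
    parser.add_argument('--train_batch_size', type=int, default=64)
    parser.add_argument('--train_steps_per_epoch', type=int, default=50)
    parser.add_argument('--validation_batch_size', type=int, default=64)
    parser.add_argument('--validation_steps_per_epoch', type=int, default=50)
    parser.add_argument('--max_seq_length', type=int, default=128)
    # Locations provided by the SageMaker training container
    parser.add_argument('--model_dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--train_data', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN', '/opt/ml/input/data/train'))
    # parse_known_args ignores unexpected arguments instead of failing
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args(['--epochs', '3', '--learning_rate', '2e-5'])
print(args.epochs, args.learning_rate)  # 3 2e-05
```

Inside the container you would call `parse_args()` with no argument so that `argparse` reads `sys.argv`.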

In [None]:
# Import the Hugging Face Transformers library (pip install transformers)
from transformers import RobertaModel, RobertaConfig
from transformers import RobertaForSequenceClassification

config = RobertaConfig.from_pretrained(
    'roberta-base',
    num_labels=3,
    id2label={
        0: -1,
        1: 0,
        2: 1
    },
    label2id={
        -1: 0,
        0: 1,
        1: 2
    }
)
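The config maps the sentiment values -1, 0, 1 (presumably negative, neutral, positive) to the class indices 0, 1, 2 that the classifier predicts. The mapping matters because PyTorch's `CrossEntropyLoss` expects class indices in the range `0..num_labels-1`. A small sketch of applying the same mappings outside the model, when preparing labels and when reading predictions; the helper names are hypothetical.

```python
# Same mappings as in the RobertaConfig
label2id = {-1: 0, 0: 1, 1: 2}
id2label = {v: k for k, v in label2id.items()}


def encode_labels(sentiments):
    """Sentiment values (-1, 0, 1) -> class indices (0, 1, 2) for the loss function."""
    return [label2id[s] for s in sentiments]


def decode_predictions(class_ids):
    """Predicted class indices -> sentiment values."""
    return [id2label[c] for c in class_ids]


print(encode_labels([1, -1, 0]))       # [2, 0, 1]
print(decode_predictions([0, 2, 2]))   # [-1, 1, 1]
```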

- Download the model config
    - with `RobertaConfig.from_pretrained`, specifying the model name, here `roberta-base`
    - we can re-configure the downloaded config with
        - the number of labels
        - the id2label and label2id mappings

- Download the RoBERTa model from Hugging Face with

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

model = RobertaForSequenceClassification.from_pretrained(
    'roberta-base',  # model name
    config=config    # custom configuration
)


# Fine-tune the model using PyTorch
def train_model(model, train_data_loader, df_train, val_data_loader, df_val, args):
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(
        params=model.parameters(),
        lr=args.learning_rate
    )

    # Training loop
    for epoch in range(args.epochs):  # loop through the epochs
        print('EPOCH -- {}'.format(epoch))
        for i, (sent, label) in enumerate(train_data_loader):
            if i < args.train_steps_per_epoch:
                model.train()  # put the model in training mode
                optimizer.zero_grad()  # clear gradients from the previous step
                sent = sent.squeeze(0)
                output = model(sent)[0]  # logits
                _, predicted = torch.max(output, 1)  # retrieve the model predictions
                loss = loss_function(output, label)  # calculate the loss
                loss.backward()  # compute the gradients through backpropagation
                optimizer.step()  # update the parameters (weights and biases)
        ...  # add the validation code here
        # print the validation metrics
    return model


model = train_model(
    model, ...
)  # fine-tune the model

# Fit the model
from sagemaker.pytorch import PyTorch as PyTorchEstimator

# role: the SageMaker execution role, e.g. from sagemaker.get_execution_role()
estimator = PyTorchEstimator(
    entry_point='train.py',
    source_dir='src',
    role=role,
    instance_count=1,
    instance_type='ml.c5.9xlarge',
    # ...
)

# You can also specify:
# framework_version (the PyTorch version you are using)
# hyperparameters=hyperparameters
# metric_definitions=metric_definitions

estimator.fit(...)  # start the model training
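The training loop leaves validation as a placeholder. Whatever the validation code does, it has to print lines that the metric regular expressions (`val_loss: ([0-9\\.]+)` and `val_acc: ([0-9\\.]+)`) can match, otherwise SageMaker records no metric. A framework-free sketch of that last step; the helper names and sample numbers are hypothetical, and in the real script the predictions and per-batch losses would come from running the model on `val_data_loader`.

```python
def validation_summary(predicted, labels, batch_losses):
    """Mean validation loss over batches and accuracy over all examples."""
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    accuracy = correct / len(labels)
    loss = sum(batch_losses) / len(batch_losses)
    return loss, accuracy


def format_validation_log(loss, accuracy):
    # Fixed-point format, so the regexes 'val_loss: ([0-9\\.]+)' and
    # 'val_acc: ([0-9\\.]+)' capture the full number
    return 'val_loss: {:.4f} val_acc: {:.4f}'.format(loss, accuracy)


loss, acc = validation_summary([2, 0, 1, 1], [2, 0, 2, 1], [0.5, 0.3])
print(format_validation_log(loss, acc))  # val_loss: 0.4000 val_acc: 0.7500
```

Using `{:.4f}` avoids scientific notation such as `1e-05`, which the `[0-9\\.]+` pattern would only partially capture.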

# Debug and Profile Models

- Debugging and profiling give insights into common training errors, such as
    - vanishing and exploding gradients
    - bad initialization (e.g., if all weights start with the same value, the model cannot learn; each activation function pairs with a suitable initialization method)
    - overfitting
- These issues are even harder to detect in distributed training
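Vanishing and exploding gradients show up as gradient norms that collapse toward zero or blow up. A minimal sketch of the kind of check a debugging rule performs, on plain lists of per-layer gradient values; the thresholds (`1e-7`, `1e3`) and the function name are assumptions for illustration, not SageMaker Debugger's actual rule parameters.

```python
import math


def gradient_health(layer_grads, vanish_threshold=1e-7, explode_threshold=1e3):
    """Classify each layer's gradient by its L2 norm.

    layer_grads: dict mapping layer name -> list of gradient values.
    Returns a dict mapping layer name -> 'vanishing', 'exploding' or 'ok'.
    """
    status = {}
    for name, grads in layer_grads.items():
        norm = math.sqrt(sum(g * g for g in grads))
        if norm < vanish_threshold:
            status[name] = 'vanishing'
        elif norm > explode_threshold:
            status[name] = 'exploding'
        else:
            status[name] = 'ok'
    return status


grads = {
    'layer1': [1e-9, -2e-9],   # tiny gradients: weights barely change
    'layer2': [0.01, -0.03],
    'layer3': [5e3, -1e4],     # huge gradients: unstable updates
}
print(gradient_health(grads))
# {'layer1': 'vanishing', 'layer2': 'ok', 'layer3': 'exploding'}
```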

- Another important area is system utilization
    - how much GPU, CPU, and other system resources the training is using
    - potential system bottlenecks
        - I/O bottlenecks when loading the data
        - CPU and memory bottlenecks when processing the data
        - GPU bottlenecks or underutilization during model training

- If we know the status of the model training, we can take corrective action
    - e.g., stop training when the model starts overfitting
    - send a notification via email/text when an issue is found
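Stopping when the model starts overfitting can be as simple as watching the validation loss: if it has not improved for a few consecutive checks, stop. A minimal patience-based early-stopping sketch; the class name, `patience`, and `min_delta` are assumptions. SageMaker Debugger's built-in rules, such as the overtraining rule, implement related checks with their own parameters.

```python
class EarlyStopper:
    """Signal a stop when validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float('inf')
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: remember it, reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1       # no improvement
        return self.bad_checks >= self.patience


# Validation loss falls, then rises as the model starts to overfit
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70, 0.75]
stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        print('stopping at epoch', epoch)  # stopping at epoch 5
        break
```

The patience window keeps a single noisy evaluation from ending training too early.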
    
- <b>SageMaker Debugger</b> helps you to debug and profile the model training

# Debug and Profile the Models with Amazon SageMaker Debugger

- Automatically captures metrics during training
    - training and validation loss
    - accuracy
    - learning rates, etc.

- Debugger data can be visualized in SageMaker Studio for easier understanding
- Debugger can give warnings and remediation advice when common problems are detected
- Automatically monitors system resources
    - GPU
    - CPU
    - memory, etc.
- Captures real-time model data during training
![image.png](attachment:image.png)

- The data includes
    - system metrics
        - hardware utilization data
        - network metrics
        - data input and output (I/O) metrics

    - framework metrics
        - convolutional operations in the forward pass
        - batch normalization operations in the backward pass
        - data loader processes between steps
        - gradient descent algorithm operations

    - output tensors
        - scalar values (accuracy and loss)
        - matrices (weights, gradients, input layers, and output layers)

- Debugger built-in rules

![image.png](attachment:image.png)

## Amazon SageMaker Debugger

![image.png](attachment:image.png)

In [None]:
# Code examples

# Debug model training: configuration

from sagemaker.debugger import Rule, rule_configs

rules = [
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    Rule.sagemaker(rule_configs.overtraining())
]  # select built-in rules to evaluate the training progress

from sagemaker.pytorch import PyTorch as PyTorchEstimator

estimator = PyTorchEstimator(
    entry_point='train.py',
    # ...
    rules=rules
)

# SageMaker runs one rule evaluation job per rule, in parallel with the training job


## Profile the training job: Configuration

- Select the rules to observe
- By default, the profiler collects system metrics every 500 ms

- <b>What is the difference between debugging rules and profiling rules?</b>

In [None]:
from sagemaker.debugger import ProfilerRule, rule_configs

rules = [
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
    # ...
]  # select the rules for profiling

from sagemaker.debugger import ProfilerConfig, FrameworkProfile

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,  # collect system metrics every 500 milliseconds
    framework_profile_params=FrameworkProfile(num_steps=10)  # profile framework operations for 10 steps
)

# pass the profiler config to the estimator
from sagemaker.pytorch import PyTorch as PyTorchEstimator

estimator = PyTorchEstimator(
    entry_point='train.py',
    # ...
    rules=rules,
    profiler_config=profiler_config  # pass the profiler config to the estimator
)

- All the rule results are aggregated into a profiler report
- Download the report from S3 while the training job is running or after it has finished
- Rule summary section
    - Debugger aggregates the results of all rules

- The profiler report contains
    - CPU utilization
    - network utilization

- It also creates a system utilization heat map
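The profiler samples system metrics at a fixed interval (500 ms in the configuration above). A rough sketch of how such samples can be summarized and flagged, in the spirit of the `LowGPUUtilization` rule; the percentile calculation, the 70% threshold, the sample values, and the function name are assumptions, not the rule's actual algorithm.

```python
import statistics


def summarize_utilization(samples, low_threshold=70.0):
    """Summarize GPU utilization samples (percent) and flag likely under-utilization."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]  # simple index-based 95th percentile
    summary = {
        'mean': statistics.mean(samples),
        'median': statistics.median(samples),
        'p95': p95,
    }
    # If even the busiest samples stay below the threshold, the GPU is mostly idle
    summary['low_utilization'] = p95 < low_threshold
    return summary


# Hypothetical samples, one every 500 ms: the GPU mostly waits (e.g. on data loading)
gpu_samples = [12, 15, 40, 10, 14, 11, 13, 45, 12, 10]
summary = summarize_utilization(gpu_samples)
print(summary['low_utilization'])  # True
```

Low GPU utilization often points to one of the bottlenecks listed earlier, such as slow data loading or CPU-bound preprocessing, rather than to the model itself.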

    


# Week 2: Optional references

- [PyTorch Hub](https://pytorch.org/hub/)

- [TensorFlow Hub](https://www.tensorflow.org/hub)

- [Hugging Face open-source NLP transformers library](https://github.com/huggingface/transformers)

- [RoBERTa model](https://arxiv.org/abs/1907.11692)

- [Amazon SageMaker Model Training (Developer Guide)](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html)

- [Amazon SageMaker Debugger: A system for real-time insights into machine learning model training](https://www.amazon.science/publications/amazon-sagemaker-debugger-a-system-for-real-time-insights-into-machine-learning-model-training)

- [The science behind SageMaker’s cost-saving Debugger](https://www.amazon.science/blog/the-science-behind-sagemakers-cost-saving-debugger)

- [Amazon SageMaker Debugger (Developer Guide)](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html)

- [Amazon SageMaker Debugger (GitHub)](https://github.com/awslabs/sagemaker-debugger)

# Assignment

## Train a review classifier with BERT and Amazon SageMaker
1. Configure dataset

2. Configure model hyper-parameters

3. Set up evaluation metrics, debugger and profiler

4. Train model

5. Analyze debugger results

6. Deploy and test the model



- Notebook: download it with `aws s3 cp --recursive s3://dlai-practical-data-science/labs/c2w2-824159/ ./`