# FAQ Bot - Q&A model, trained using pairs of questions and answers

Fine-tune a large language model with a list of questions and answers. This approach is called Closed Book Q&A because the model doesn't require context and can answer variations of the questions you provide in your dataset.

This is an evolution of classic chatbots because LLMs like T5 can disambiguate and generalize better than the older technologies behind those chatbot services.

For that purpose you'll use a **[T5 SMALL SSM ~80MParams](https://huggingface.co/google/t5-small-ssm)** model, accelerated by a trn1 instance ([AWS Trainium](https://aws.amazon.com/machine-learning/trainium/)), running on [Amazon SageMaker](https://aws.amazon.com/sagemaker/).

The dataset is the content of all **AWS FAQ** pages, downloaded from: https://aws.amazon.com/faqs/

## 1) Install some dependencies and check permissions
You need a recent version of the **SageMaker** Python SDK. After installing it, you'll need to restart the kernel.

In addition, you'll be using [SageMaker Studio Image Build](https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/) to build and push your image. If you have not used it before, follow that link and set up the required IAM permissions for your SageMaker role.

>**If you have not set up IAM permissions for the ECR repository in `image_uri`, do so now to prevent errors.**

>**If you have never run a SageMaker training job on Trn1, you'll need to request a service quota increase. This can take a few hours, so make the request early to avoid waiting.**

You can edit this URL to go directly to the page to request the increase:

`https://<region>.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas/L-79A1FE57`
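
As a small convenience, the URL above can be filled in programmatically. This is a minimal sketch: the helper name `quota_request_url` is hypothetical, and the quota code is simply the one shown in the URL above.

```python
def quota_request_url(region: str, quota_code: str = "L-79A1FE57") -> str:
    """Build the Service Quotas console URL for a SageMaker quota in a given region."""
    return (
        f"https://{region}.console.aws.amazon.com/servicequotas/home/"
        f"services/sagemaker/quotas/{quota_code}"
    )

# Example with the region used elsewhere in this notebook
print(quota_request_url("us-west-2"))
```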

In [None]:
%pip install sagemaker-studio-image-build
%pip install --upgrade sagemaker

In [None]:
import sagemaker
print(sagemaker.__version__)
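# Compare versions numerically; a plain string comparison would misorder e.g. "2.99.0" and "2.146.0"
from packaging.version import Version
if Version(sagemaker.__version__) < Version("2.146.0"):
    print("You need to upgrade or restart the kernel if you already upgraded")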

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()

# Make changes here to use an existing repository and/or tag
repo_name="sagemaker-studio-pytorch-training-neuron"
image_tag="1.13.1-neuron-py38-sdk2.9.0-ubuntu20.04"

image_name=f"{repo_name}:{image_tag}"
image_uri=f"{sess.account_id()}.dkr.ecr.{sess.boto_region_name}.amazonaws.com/{repo_name}:{image_tag}"
print(image_name)
print(image_uri)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {sess.boto_region_name}")
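
A note on the version check in this cell: version strings are best compared numerically rather than character by character. The sketch below uses only the standard library to illustrate the difference. The helper `version_tuple` is hypothetical and only handles simple `major.minor.patch` strings.

```python
import re

def version_tuple(version: str) -> tuple:
    """Turn '2.146.0' into (2, 146, 0), keeping the first three numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

# Lexicographic comparison looks at characters, so "9" beats "1" here:
print("2.99.0" >= "2.146.0")                                  # True, which is misleading
# Numeric comparison gives the intended answer:
print(version_tuple("2.99.0") >= version_tuple("2.146.0"))    # False
```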

## 2) Visualize and upload the dataset
Take note of the S3 URI printed below. If you get interrupted, you can reuse it later instead of uploading the dataset again.

In [5]:
import pandas as pd
df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')
df.head()

| Unnamed: 0 | service | question | answers |
|---|---|---|---|
| 0 | /ec2/autoscaling/faqs/ | What is Amazon EC2 Auto Scaling? | Amazon EC2 Auto Scaling is a fully managed ser... |
| 1 | /ec2/autoscaling/faqs/ | When should I use Amazon EC2 Auto Scaling vs. ... | You should use AWS Auto Scaling to manage scal... |
| 2 | /ec2/autoscaling/faqs/ | How is Predictive Scaling Policy different fro... | Predictive Scaling Policy brings the similar p... |
| 3 | /ec2/autoscaling/faqs/ | What are the benefits of using Amazon EC2 Auto... | Amazon EC2 Auto Scaling helps to maintain your... |
| 4 | /ec2/autoscaling/faqs/ | What is fleet management and how is it differe... | If your application runs on Amazon EC2 instanc... |


In [None]:
s3_uri = sess.upload_data(path='train.csv.gz', key_prefix='datasets/aws-faq/train')
print(s3_uri)
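
If you want to try this workflow with your own question/answer pairs, the file needs the same shape as `train.csv.gz`: a gzip-compressed CSV with `;` as the separator. The following is a minimal sketch using only the standard library. The sample row and the file name `sample.csv.gz` are made up for illustration, and the sketch omits the unnamed index column that appears in the original file.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical sample data; replace with your own question/answer pairs
rows = [
    {"service": "/example/faqs/", "question": "What is Example?",
     "answers": "Example is a placeholder service."},
]
columns = ["service", "question", "answers"]

# Write a gzip-compressed, semicolon-separated CSV
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns, delimiter=";")
    writer.writeheader()
    writer.writerows(rows)

# Read it back to confirm the format round-trips
with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f, delimiter=";"))
print(loaded[0]["question"])
```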

## 3) Build a custom container image with NeuronSDK 2.9+
NeuronSDK 2.9+ is required to train T5. We'll take a pre-existing container with NeuronSDK 2.8 and upgrade it. To build the Docker image and push it to ECR, we'll use **sagemaker-studio-image-build**.

In [7]:
import os
if not os.path.isdir('container'): os.mkdir('container')

In [10]:
%%writefile container/Dockerfile
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuron:1.13.0-neuron-py38-sdk2.8.0-ubuntu20.04

RUN apt update && apt install -y \
    aws-neuronx-dkms=2.* \
    aws-neuronx-tools=2.* \
    aws-neuronx-collectives=2.* \
    aws-neuronx-runtime-lib=2.* \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install -U pip
RUN pip3 install --force-reinstall neuronx-cc==2.* torch-neuronx torchvision==0.14.1 transformers==4.27.4

Overwriting container/Dockerfile


### Use sm-docker to build our image
Due to instance memory issues, BUILD_GENERAL1_MEDIUM seems to work best.
We'll use `image_name` for our repository (it will be created properly as long as permissions were set up according to the [documentation here](https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/)).


In [None]:
!sm-docker build container --compute-type BUILD_GENERAL1_MEDIUM --repository $image_name

## 4) Prepare the train/inference script

In [15]:
import os
if not os.path.isdir('src'): os.mkdir('src')

In [None]:
!pygmentize src/question_answering.py

## 5) Kick-off our fine tuning job on Amazon SageMaker
We need to create a SageMaker Estimator first and then invoke **.fit**. 

Note that we're passing the parameter **checkpoint_s3_uri**. This is important because NeuronSDK spends some time compiling the model before fine-tuning it. The compiler saves the compiled model to cache files and, with this parameter, those files are uploaded to **S3**. The next time we run a job, NeuronSDK can load the cache files back and start training immediately.

When training for the first time, the training job takes ~9 hours to process all 60 epochs on a **trn1.32xlarge**.


If you had to wait for a quota increase, as I did, run cell 2 when you come back to set up the SageMaker session, S3 URIs, and so on. Then run the cells below to get the process started.

In [None]:
# manually set S3 bucket:
# s3_uri = "s3://sagemaker-us-west-2-430432044279/datasets/aws-faq/train/train.csv.gz"
# s3_uri = "s3://sagemaker-<region>-<accountID>/datasets/aws-faq/train/train.csv.gz"
print(s3_uri)

# manually set to the image uri given by sm-docker
# image_uri = "430432044279.dkr.ecr.us-west-2.amazonaws.com/sagemaker-studio-d-io5zhcigtblg:neuron-labs-1677869849323"
# image_uri = "<accountID>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>"
print(image_uri)
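
When setting `image_uri` by hand, it is easy to leave a placeholder such as `<accountID>` in place. The sketch below checks that a string looks like an ECR image URI before you pass it to the estimator. The pattern and the helper `parse_image_uri` are assumptions written for this notebook, not an official validator, and the account ID in the example is a dummy value.

```python
import re

# Assumed shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[a-z0-9._/-]+):(?P<tag>[\w.-]+)$"
)

def parse_image_uri(uri: str) -> dict:
    """Split an ECR image URI into its parts, or raise ValueError if it does not match."""
    match = ECR_URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Not a complete ECR image URI: {uri}")
    return match.groupdict()

example = (
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/"
    "sagemaker-studio-pytorch-training-neuron:1.13.1-neuron-py38-sdk2.9.0-ubuntu20.04"
)
print(parse_image_uri(example))
```

A URI that still contains `<accountID>` or `<region>` raises `ValueError`, which surfaces the mistake before a training job is launched.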

In [13]:
# https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="question_answering.py", # Specify your train script
    source_dir="src",
    role=role,
    sagemaker_session=sess,
    instance_count=1,
    instance_type='ml.trn1.32xlarge',
    image_uri=image_uri,
    disable_profiler=True,
    output_path=f"s3://{bucket}/output",
    
    # Parameters required to enable checkpointing
    # This is necessary for caching XLA HLO files and reducing training time on subsequent runs
    checkpoint_s3_uri=f"s3://{bucket}/checkpoints",
    volume_size = 512,
    distribution={
        "torch_distributed": {
            "enabled": True
        }
    },
    hyperparameters={
        "model-name": "t5-small-ssm",
        "lr": 5e-5,
        "num-epochs": 60
    },
    metric_definitions=[
        {'Name': 'train:loss', 'Regex': r'loss:(\S+);'}  # raw string avoids an invalid "\S" escape
    ]
)
estimator.framework_version = '1.13.1' # workaround when using image_uri
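
To see what the `train:loss` metric definition captures, you can try the same regular expression locally. The log lines below are hypothetical: the real format is whatever `src/question_answering.py` prints, which is not shown here.

```python
import re

# Same pattern as in metric_definitions above
metric_regex = re.compile(r"loss:(\S+);")

# Hypothetical log line with whitespace after the semicolon
log_line = "epoch:3 loss:0.4312; lr:5e-05"
loss = float(metric_regex.search(log_line).group(1))
print(loss)

# Caveat: \S+ is greedy, so with no whitespace after the semicolon
# the capture runs on to the last semicolon of the token.
packed_line = "loss:0.4312;lr:5e-05;"
print(metric_regex.search(packed_line).group(1))
```

The first line yields `0.4312`, while the packed line captures `0.4312;lr:5e-05`, which would not parse as a number. If your training script logs fields without spaces, a tighter pattern such as `loss:([^;\s]+);` may be a safer choice.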

In [None]:
estimator.fit({"train": s3_uri})

## 6) Deploy our model to a SageMaker endpoint
Here, we're using a pre-defined HuggingFace model class and container to load our fine-tuned model on a CPU-based instance: c6i.4xlarge (an Intel Xeon based machine).

>If you're picking this up later, uncomment line 4 and fill in the path to your model artifacts, then comment out line 9 and uncomment line 10.

In [10]:
# Uncomment and modify this if you're picking this back up later and your training was successful.
# You'll need to get the model S3 URI from SageMaker -> Training -> Training Jobs -> <Your job name> -> Output -> S3 model artifact

# pre_trained_model = YOUR_S3_PATH
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=estimator.model_data,       # path to your model and script
   # model_data=pre_trained_model,       # path to your model and script
   role=role,                             # iam role with permissions to create an Endpoint
   transformers_version="4.26.0",         # transformers version used
   pytorch_version="1.13.1",              # pytorch version used
   py_version='py39',                     # python version used
   sagemaker_session=sess,
   
   # for production it is important to define vpc_config and use a vpc_endpoint
   #vpc_config={
   #    'Subnets': ['subnet-A-REPLACE', 'subnet-B-REPLACE'],
   #    'SecurityGroupIds': ['sg-A-REPLACE', 'sg-B-REPLACE']
   #}    
)
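
If you are resuming later, the model artifact URI usually follows a predictable layout under the estimator's `output_path`. The sketch below assumes the common SageMaker convention `<output_path>/<job name>/output/model.tar.gz`. The helper, bucket, and job name are hypothetical, so confirm the real URI in the console as described in the cell above.

```python
def model_artifact_uri(bucket: str, job_name: str, prefix: str = "output") -> str:
    """Build the S3 URI where a training job's model.tar.gz is typically stored."""
    return f"s3://{bucket}/{prefix}/{job_name}/output/model.tar.gz"

# Hypothetical bucket and training job name
pre_trained_model = model_artifact_uri("my-sagemaker-bucket", "my-training-job")
print(pre_trained_model)
```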

In [None]:
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c6i.4xlarge",
)

## 7) Run a quick test

In [17]:
%%time
questions = [
    "What is SageMaker?",
    "What is EC2 AutoScaling?",
    "What are the benefits of autoscaling?"
]
resp = predictor.predict({'inputs': questions})
# print(resp)
for q,a in zip(questions, resp['answer']):
    print(f"Q: {q}\nA: {a}\n")

Q: What is S3?
A: S3 is an Amazon of the S3 storage that provides a generic way to store and process data between a single file. You can use S3 in the cloud to process data in the cloud without worryinging your application or managing. You can use S3 in AWS services as well as custom-built

Q: What is EC2?
A: Amazon EC2 is a new type of computing Machine that can be deployed to web, deploy, and scale web web of your application. With EC2, you can run your applications without having to use the internet to manage your application or business intelligence.

Q: What are the benefits of using Application Load Balencer?
A: Application Load Balencer enabless applications the load and complexity of applications across multiple applications, allowing you to easily configure applications (e.g., network network, network network, etc.) using Network Load Balens.

Q: What is SageMaker JumpStart?
A: SageMaker JumpStart is a new ML capability tool that helps you quickly launch, deploy, and deploy yo
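
The request and response shapes used in the test cell can be sketched without an endpoint. This assumes the predictor serializes requests as JSON and that the inference script returns a dictionary with an `answer` list, as the loop over `resp['answer']` suggests. The response body here is a hand-written placeholder, not real model output.

```python
import json

questions = ["What is SageMaker?", "What is EC2 AutoScaling?"]

# Request body: a JSON object with the questions under "inputs"
payload = json.dumps({"inputs": questions})

# Placeholder response body with one answer per question (assumed shape)
response_body = json.dumps({"answer": ["<answer 1>", "<answer 2>"]})
resp = json.loads(response_body)

# Pair each question with its answer, as in the test cell
pairs = list(zip(questions, resp["answer"]))
for q, a in pairs:
    print(f"Q: {q}\nA: {a}\n")
```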

## 8) Clean up
This deletes the endpoint you created. Uncomment `predictor.delete_model()` to delete the model as well.

In [None]:
# predictor.delete_model()
predictor.delete_endpoint()