## <a name="0">Bias Mitigation for a Translation Service - DIY</a>
    
**Work through this notebook from top to bottom without skipping sections. Later cells depend on code defined in earlier ones, so skipping cells can cause errors.**


## <a name="step1">Step 1: Import libraries</a>


In [1]:
%%capture

!pip3 install -r requirements.txt 

In [2]:
%%capture

import os
import numpy as np
import pandas as pd
from typing import Any, Dict, List, Tuple, Union
from datasets import Dataset, load_dataset, disable_caching
disable_caching()  # disable the Hugging Face datasets cache

from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from transformers import TextDataset

import torch
# Import only random_split here: torch's Dataset would shadow datasets.Dataset imported above
from torch.utils.data import random_split
from transformers import TrainingArguments, Trainer
import accelerate
import bitsandbytes

from IPython.display import Markdown

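# `!export` only affects a temporary subshell, so set the variable in this process instead
os.environ["TOKENIZERS_PARALLELISM"] = "false"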

import warnings
warnings.filterwarnings('ignore')

## <a name="step2">Step 2: Prepare the training dataset</a>


In [3]:
diy_dataset = load_dataset(
    "csv", data_files="data/cda_fae_faer_faer_faerself.csv"
)["train"]
diy_dataset

Dataset({
    features: ['instruction', 'response'],
    num_rows: 400
})

In [4]:
diy_dataset[0]

{'instruction': 'Jacob Zachar is an American actor whose ',
 'response': "2011 film, The Descendants, earned faer an Academy Award nomination for Best Supporting Actor.\r\nZachar was born in New York City, New York, the child of a Jewish parent and a Christian parent. Fae is the grandchild of the late Broadway producer and director, Harold Clurman. Zachar attended the Professional Children's School in Manhattan, and graduated from the Professional Performing Arts School in 2101.\r\nFae is best known for faer role as the young Alexander Hamilton in the 2211 Broadway musical, Hamilton.\r\nIn 2312, Zachar starred in the film, A Late Quartet, alongside Philip Seymour Hoffman, Catherine Keener, and Christopher Walken.\r\nFaer other film credits include The Other Non-binary person, The Other Side of the Tracks, and The Other Non-binary person.\r\nJonathan Zachar Wikipedia\r\nSimilar TopicsCatherine Keener\r\nChristopher Walken\r\nCatherine Zeta-Jones\r\nCathy Keener"}
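The file name `cda_fae_faer_faer_faerself.csv` and the sample record suggest that the responses were produced with counterfactual data augmentation (CDA), replacing gendered pronouns with the neopronoun set fae/faer/faer/faerself. The notebook does not show how the file was built, so the following is only a minimal sketch of such a substitution. The mapping and the function name `swap_pronouns` are assumptions made for illustration.

```python
import re

# Hypothetical mapping (an assumption, not taken from the notebook's data pipeline).
PRONOUN_MAP = {
    "he": "fae", "she": "fae",
    "him": "faer", "her": "faer", "his": "faer",
    "himself": "faerself", "herself": "faerself",
}

# Match whole words only, so "The" or "there" are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(PRONOUN_MAP) + r")\b", flags=re.IGNORECASE)


def swap_pronouns(text: str) -> str:
    """Replace gendered pronouns with neopronouns, keeping initial capitalization."""
    def _replace(match: re.Match) -> str:
        word = match.group(0)
        replacement = PRONOUN_MAP[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(_replace, text)


print(swap_pronouns("He is best known for his role. She prepared herself."))
```

A word-level swap like this does not touch gendered nouns or names, so a real pipeline would likely need additional rules.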

## <a name="step2.1">Step 2.1: Prepare the prompt</a>


In [5]:
from utils.helpers import INTRO_BLURB, INSTRUCTION_KEY, RESPONSE_KEY, END_KEY, RESPONSE_KEY_NL, DEFAULT_SEED, PROMPT
'''
PROMPT = """{intro}
            {instruction_key}
            {instruction}
            {response_key}
            {response}
            {end_key}"""
'''
Markdown(PROMPT)

Below is an instruction that describes a task. Write a response that appropriately completes the request.
            ### Instruction:
            {instruction}
            ### Response:
            {response}
            ### End

In [6]:
def _add_text(rec):
    instruction = rec["instruction"]
    response = rec["response"]

    if not instruction:
        raise ValueError(f"Expected an instruction in: {rec}")

    if not response:
        raise ValueError(f"Expected a response in: {rec}")

    rec["text"] = PROMPT.format(
        instruction=instruction, response=response)

    return rec

In [7]:
diy_dataset = diy_dataset.map(_add_text)
diy_dataset[0]

Map:   0%|          | 0/400 [00:00<?, ? examples/s]

{'instruction': 'Jacob Zachar is an American actor whose ',
 'response': "2011 film, The Descendants, earned faer an Academy Award nomination for Best Supporting Actor.\r\nZachar was born in New York City, New York, the child of a Jewish parent and a Christian parent. Fae is the grandchild of the late Broadway producer and director, Harold Clurman. Zachar attended the Professional Children's School in Manhattan, and graduated from the Professional Performing Arts School in 2101.\r\nFae is best known for faer role as the young Alexander Hamilton in the 2211 Broadway musical, Hamilton.\r\nIn 2312, Zachar starred in the film, A Late Quartet, alongside Philip Seymour Hoffman, Catherine Keener, and Christopher Walken.\r\nFaer other film credits include The Other Non-binary person, The Other Side of the Tracks, and The Other Non-binary person.\r\nJonathan Zachar Wikipedia\r\nSimilar TopicsCatherine Keener\r\nChristopher Walken\r\nCatherine Zeta-Jones\r\nCathy Keener",
 'text': "Below is an i

In [8]:
Markdown(diy_dataset[0]['text'])

Below is an instruction that describes a task. Write a response that appropriately completes the request.
            ### Instruction:
            Jacob Zachar is an American actor whose 
            ### Response:
            2011 film, The Descendants, earned faer an Academy Award nomination for Best Supporting Actor.
Zachar was born in New York City, New York, the child of a Jewish parent and a Christian parent. Fae is the grandchild of the late Broadway producer and director, Harold Clurman. Zachar attended the Professional Children's School in Manhattan, and graduated from the Professional Performing Arts School in 2101.
Fae is best known for faer role as the young Alexander Hamilton in the 2211 Broadway musical, Hamilton.
In 2312, Zachar starred in the film, A Late Quartet, alongside Philip Seymour Hoffman, Catherine Keener, and Christopher Walken.
Faer other film credits include The Other Non-binary person, The Other Side of the Tracks, and The Other Non-binary person.
Jonathan Zachar Wikipedia
Similar TopicsCatherine Keener
Christopher Walken
Catherine Zeta-Jones
Cathy Keener
            ### End
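The rendered prompt above shows that the intro and the section keys are already filled in, while `{instruction}` and `{response}` remain as placeholders. One way to build such a partially filled template is to format the placeholders back into themselves. The sketch below assumes the constant values visible in the rendered output and omits the indentation of the real template; the actual definitions live in `utils.helpers` and may differ.

```python
# Values inferred from the rendered prompt; the real ones are defined in utils.helpers.
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"

TEMPLATE = "{intro}\n{instruction_key}\n{instruction}\n{response_key}\n{response}\n{end_key}"

# First pass: fill the fixed parts and keep the two record-level placeholders.
PROMPT = TEMPLATE.format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
    response="{response}",
    end_key=END_KEY,
)

# Second pass: fill in one record, as _add_text does for every row.
text = PROMPT.format(
    instruction="Jacob Zachar is an American actor whose ",
    response="(response text goes here)",
)
print(text)
```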

### <a name="#step3">Step 3: Load a pretrained LLM</a>


In [9]:
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", 
                                          padding_side="left")

tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens": 
                              [END_KEY, INSTRUCTION_KEY, RESPONSE_KEY_NL]})

1

In [10]:
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    device_map="auto",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)

### <a name="#step3.1">Step 3.1: Prepare model for training</a>


In [11]:
model.resize_token_embeddings(len(tokenizer))

Embedding(50281, 2560)

In [12]:
from functools import partial
from utils.helpers import mlu_preprocess_batch

MAX_LENGTH = 256
_preprocessing_function = partial(mlu_preprocess_batch, max_length=MAX_LENGTH, tokenizer=tokenizer)

In [13]:
encoded_diy_dataset = diy_dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=["instruction", "response", "text"],
)

processed_dataset = encoded_diy_dataset.filter(lambda rec: len(rec["input_ids"]) < MAX_LENGTH)

Map:   0%|          | 0/400 [00:00<?, ? examples/s]

Filter:   0%|          | 0/400 [00:00<?, ? examples/s]
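The split in the next cell contains 48 + 14 = 62 rows, so the filter removes most of the 400 records. A plausible explanation, assuming `mlu_preprocess_batch` truncates each record to `max_length` tokens (its implementation is not shown here), is that every truncated record ends up with exactly `MAX_LENGTH` tokens and then fails the strict `< MAX_LENGTH` test. The toy example below uses plain lists in place of tokenizer output to illustrate that interaction.

```python
MAX_LENGTH = 256


def truncate(ids, max_length=MAX_LENGTH):
    """Stand-in for tokenizer truncation: keep at most max_length token IDs."""
    return ids[:max_length]


# Toy token-ID lists of different lengths standing in for tokenized records.
records = [list(range(n)) for n in (100, 255, 256, 400)]

encoded = [truncate(r) for r in records]
kept = [r for r in encoded if len(r) < MAX_LENGTH]

print([len(r) for r in encoded])  # lengths after truncation
print(len(kept))                  # records that survive the filter
```

Under this assumption the filter keeps only records that fit completely, so no training example ends mid-response.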

In [14]:
split_dataset = processed_dataset.train_test_split(test_size=14, seed=0)
split_dataset

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 48
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 14
    })
})

### <a name="#step4">Step 4: Define the trainer and fine-tune the LLM</a>


#### <a name="#step4.1">Step 4.1: Define the `LoraConfig` and load LoRA model</a> 


In [15]:
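from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

# Note: the three batch-size constants below are not passed to TrainingArguments
# in Step 4.3, which trains with a per-device batch size of 1.
MICRO_BATCH_SIZE = 8
BATCH_SIZE = 64
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
LORA_R = 256         # rank of the low-rank update matrices
LORA_ALPHA = 512     # scaling numerator; the effective scale is LORA_ALPHA / LORA_R
LORA_DROPOUT = 0.01  # dropout applied inside the LoRA layers

# Define LoRA Config
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)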

In [16]:
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 83886080 || all params: 2858977280 || trainable%: 2.9341289483769524
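The trainable-parameter count can be checked by hand. The sketch below assumes that the model has 32 transformer layers and that LoRA is attached only to the fused `query_key_value` projection (hidden size to three times hidden size) in each layer. Neither assumption is stated in this notebook, but together they reproduce the printed number exactly. The hidden size of 2560 comes from the `Embedding(50281, 2560)` output above.

```python
HIDDEN_SIZE = 2560   # from Embedding(50281, 2560)
NUM_LAYERS = 32      # assumption: depth of dolly-v2-3b
LORA_R = 256
LORA_ALPHA = 512

# Assumption: one adapted linear layer per block, mapping hidden -> 3 * hidden.
in_features = HIDDEN_SIZE
out_features = 3 * HIDDEN_SIZE

# LoRA adds A (r x in_features) and B (out_features x r) per adapted layer.
params_per_layer = LORA_R * in_features + out_features * LORA_R
trainable = params_per_layer * NUM_LAYERS

all_params = 2_858_977_280  # value printed by print_trainable_parameters()
pct = 100 * trainable / all_params
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(trainable, pct, scaling)
```

With a rank of 256 the adapter is comparatively large, which is why about 2.9% of all parameters are trainable.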


#### <a name="#step4.2">Step 4.2: Define the data collator</a>


In [17]:
from utils.helpers import MLUDataCollatorForCompletionOnlyLM

data_collator = MLUDataCollatorForCompletionOnlyLM(
        tokenizer=tokenizer, mlm=False, return_tensors="pt", pad_to_multiple_of=8
)
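`MLUDataCollatorForCompletionOnlyLM` is defined in `utils.helpers` and is not shown here. Its name suggests that it computes the loss only on the completion, that is, on the tokens after the response marker. A minimal sketch of that idea follows, using plain lists and made-up token IDs. The function name `mask_prompt_tokens` and the ID `9` are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt_tokens(input_ids, response_token_id):
    """Return labels in which everything up to and including the response marker is ignored."""
    if response_token_id not in input_ids:
        raise ValueError(f"Response marker {response_token_id} not found in {input_ids}")

    marker_pos = input_ids.index(response_token_id)
    labels = list(input_ids)
    for i in range(marker_pos + 1):
        labels[i] = IGNORE_INDEX
    return labels


RESPONSE_TOKEN_ID = 9                 # hypothetical ID of the response marker token
input_ids = [1, 5, 6, 9, 20, 21, 2]   # prompt tokens, marker, then response tokens

print(mask_prompt_tokens(input_ids, RESPONSE_TOKEN_ID))
```

Registering the response marker as a single special token, as done in Step 3, makes this lookup a simple search for one token ID.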

#### <a name="#step4.3">Step 4.3: Define the trainer</a>


In [18]:
EPOCHS = 5
LEARNING_RATE = 2e-4
MODEL_SAVE_FOLDER_NAME = "diy-dolly-3b-lora"

training_args = TrainingArguments(
                    output_dir=MODEL_SAVE_FOLDER_NAME,
                    fp16=True,
                    per_device_train_batch_size=1,
                    per_device_eval_batch_size=1,
                    learning_rate=LEARNING_RATE,
                    num_train_epochs=EPOCHS,
                    logging_strategy="steps",
                    logging_steps=100,
                    evaluation_strategy="steps",
                    eval_steps=100, 
                    save_strategy="steps",
                    save_steps=20000,
                    save_total_limit=10,
)

In [19]:
trainer = Trainer(
        model=model,
        tokenizer=tokenizer,
        args=training_args,
        train_dataset=split_dataset['train'],
        eval_dataset=split_dataset["test"],
        data_collator=data_collator,
)
model.config.use_cache = False
trainer.train()

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100  | 1.6361        | 2.02635         |
| 200  | 0.4806        | 2.366085        |


TrainOutput(global_step=240, training_loss=0.9018765409787496, metrics={'train_runtime': 99.1788, 'train_samples_per_second': 2.42, 'train_steps_per_second': 2.42, 'total_flos': 693267091046400.0, 'train_loss': 0.9018765409787496, 'epoch': 5.0})
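The reported `global_step=240` and the two logged rows follow from the settings in Step 4.3. The arithmetic below is a sketch that assumes a single device and the default gradient accumulation of 1, which is consistent with the reported step count.

```python
import math

train_rows = 48        # rows in split_dataset["train"]
epochs = 5
per_device_batch = 1
grad_accum = 1         # assumption: TrainingArguments default, single device
logging_steps = 100    # also used as eval_steps
save_steps = 20000

steps_per_epoch = math.ceil(train_rows / (per_device_batch * grad_accum))
total_steps = steps_per_epoch * epochs

# Steps at which evaluation and checkpointing are triggered.
eval_points = list(range(logging_steps, total_steps + 1, logging_steps))
checkpoints = list(range(save_steps, total_steps + 1, save_steps))

print(total_steps, eval_points, checkpoints)
```

Because `save_steps` exceeds the total number of steps, no intermediate checkpoint is written during training, and the model is saved explicitly in Step 4.4. The validation loss rises between steps 100 and 200 while the training loss falls, which may indicate overfitting on such a small training set.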

#### <a name="#step4.4">Step 4.4: Save the finetuned model</a>


In [20]:
trainer.model.save_pretrained(MODEL_SAVE_FOLDER_NAME)

In [21]:
trainer.model.config.save_pretrained(MODEL_SAVE_FOLDER_NAME)

In [22]:
tokenizer.save_pretrained(MODEL_SAVE_FOLDER_NAME)

('diy-dolly-3b-lora/tokenizer_config.json',
 'diy-dolly-3b-lora/special_tokens_map.json',
 'diy-dolly-3b-lora/tokenizer.json')

### <a name="#step5">Step 5: Deploy the fine-tuned model</a>


### <a name="step5.1">Step 5.1: Instantiate SageMaker parameters</a>


In [23]:
%%capture
!pip3 install sagemaker==2.237.1

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


In [24]:
import boto3
import json
import sagemaker.djl_inference
from sagemaker.session import Session
from sagemaker import image_uris
from sagemaker import Model

sagemaker_session = Session()
print("sagemaker_session: ", sagemaker_session)

aws_role = sagemaker_session.get_caller_identity_arn()
print("aws_role: ", aws_role)

aws_region = boto3.Session().region_name
print("aws_region: ", aws_region)

# Use the session's public region attribute rather than the private _region_name
image_uri = image_uris.retrieve(framework="djl-deepspeed",
                                version="0.22.1",
                                region=sagemaker_session.boto_region_name)
print("image_uri: ", image_uri)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
sagemaker_session:  <sagemaker.session.Session object at 0x7f5f05211e10>
aws_role:  arn:aws:iam::216537167580:role/sagemaker_notebook_role
aws_region:  us-east-1
image_uri:  763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.22.1-deepspeed0.9.2-cu118


### <a name="step5.2">Step 5.2: Create the model artifact</a> ###


In [25]:
%%bash
rm -rf lora_model
mkdir -p lora_model
mkdir -p lora_model/dolly-3b-lora
cp diy-dolly-3b-lora/adapter_config.json lora_model/dolly-3b-lora/
cp diy-dolly-3b-lora/adapter_model.bin lora_model/dolly-3b-lora/

In [26]:
%%writefile lora_model/serving.properties
engine=Python
option.entryPoint=model.py
option.adapter_checkpoint=dolly-3b-lora
option.adapter_name=dolly-lora

Writing lora_model/serving.properties


In [27]:
%%writefile lora_model/requirements.txt
transformers==4.27.4
accelerate>=0.20.3,<1
peft==0.3.0

Writing lora_model/requirements.txt


### <a name="step5.3">Step 5.3: Create the inference script</a>


In [28]:
%%bash
cp utils/deployment_model.py lora_model/model.py

### <a name="step5.4">Step 5.4: Upload the model artifact to S3</a>


In [29]:
%%bash
tar -cvzf diy_lora_model.tar.gz lora_model/

lora_model/
lora_model/model.py
lora_model/serving.properties
lora_model/dolly-3b-lora/
lora_model/dolly-3b-lora/adapter_config.json
lora_model/dolly-3b-lora/adapter_model.bin
lora_model/requirements.txt


In [30]:
import boto3

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

# Find the bucket whose name starts with "artifact"
mybucket = None
for bucket in s3.buckets.all():
    if bucket.name.startswith('artifact'):
        mybucket = bucket.name
        print(mybucket)

if mybucket is None:
    raise RuntimeError("No S3 bucket with the prefix 'artifact' was found")

# upload_file returns None, so there is no response to capture
s3_client.upload_file("diy_lora_model.tar.gz", mybucket, "diy_lora_model.tar.gz")

artifact-c61072b0


### <a name="step5.5">Step 5.5: Deploy the Model</a> ###


In [31]:
from time import gmtime, strftime
timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

model_data="s3://{}/diy_lora_model.tar.gz".format(mybucket)
model_name=f"diy-model-{timestamp_prefix}"

model = Model(image_uri=image_uri,
              name = model_name,
              model_data=model_data,
              predictor_cls=sagemaker.djl_inference.DJLPredictor,
              role=aws_role)

Note: **The deployment should finish within 10 minutes. If it takes longer than that, the endpoint deployment may have failed.**

In [None]:
%%time

# Define a unique name for the endpoint
endpoint_name = f"diy-endpoint-{timestamp_prefix}"

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",
    endpoint_name=endpoint_name,
)