# Final Project
### Fine-tuning a Mistral 7B Model
---
Members:
- Bastian Castillo (C0872284)
- Fadernel Bedoya (C0872455)
- Marcelo Munoz (C0873813)
- Suyog Adhikari (C0880973)

**Goal**:

The goal of this project is to fine-tune a pretrained model so that it can solve basic tasks given as instructions, and to interact with it through a chatbot interface. The interface is built with the Gradio library and deployed on Hugging Face Spaces.

(*) Because of limited computing power and storage, the model is fine-tuned on only a small subset of the data (the first 10% of the Alpaca training split). We tried other datasets intended for the same purpose, but ran out of resources both during training and during deployment.

First of all, the required dependencies are installed in this notebook environment:

In [1]:
!python -m venv venv

In [1]:
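# Note: each `!` command runs in its own subshell, so this does not activate the venv for later cells;
# the cells below call venv/bin/pip and venv/bin/autotrain directly instead.
!source venv/bin/activate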

In [3]:
!venv/bin/pip install autotrain-advanced



In [16]:
!venv/bin/pip install datasets transformers



In [5]:
from datasets import load_dataset
import pandas as pd

# Load the first 10% of the Alpaca training split and convert it to a pandas DataFrame
train = load_dataset("tatsu-lab/alpaca", split='train[:10%]')
train = pd.DataFrame(train)

Using custom data configuration tatsu-lab--alpaca-2b32f0433506ef5f


Downloading and preparing dataset parquet/tatsu-lab--alpaca to /root/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-2b32f0433506ef5f/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...



Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-2b32f0433506ef5f/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.


In [6]:
train

| | instruction | input | output | text |
|---|---|---|---|---|
| 0 | Give three tips for staying healthy. | | 1.Eat a balanced diet and make sure to include... | Below is an instruction that describes a task.... |
| 1 | What are the three primary colors? | | The three primary colors are red, blue, and ye... | Below is an instruction that describes a task.... |
| 2 | Describe the structure of an atom. | | An atom is made up of a nucleus, which contain... | Below is an instruction that describes a task.... |
| 3 | How can we reduce air pollution? | | There are a number of ways to reduce air pollu... | Below is an instruction that describes a task.... |
| 4 | Describe a time when you had to make a difficu... | | I had to make a difficult decision when I was ... | Below is an instruction that describes a task.... |
| ... | ... | ... | ... | ... |
| 5195 | Write an example of an editorial that discusse... | | Online education is becoming increasingly popu... | Below is an instruction that describes a task.... |
| 5196 | Classify this scenario as an example of reinfo... | A teacher withholds a student's recess break i... | This scenario is an example of punishment. | Below is an instruction that describes a task,... |
| 5197 | Create a format for holding a virtual team mee... | | The format of a virtual team meeting should pr... | Below is an instruction that describes a task.... |
| 5198 | What would you do to improve the quality of cu... | | To improve the quality of customer service, I ... | Below is an instruction that describes a task.... |


In [9]:
!mkdir data

In [10]:
def text_formatting(data):

    # If the input column is not empty
    if data['input']:

        text = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{data["instruction"]} \n\n### Input:\n{data["input"]}\n\n### Response:\n{data["output"]}"""

    else:

        text = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{data["instruction"]}\n\n### Response:\n{data["output"]}"""

    return text

train['text'] = train.apply(text_formatting, axis =1)
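At inference time the model should see the same template, cut off right after `### Response:\n`, so that it only has to generate the answer. A minimal sketch of a prompt builder that mirrors the two templates in `text_formatting` (the names `build_alpaca_prompt` and `format_training_text` are our own, not part of any library):

```python
def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return the Alpaca-style prompt used in text_formatting, without the response."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            "the request.\n\n"
            f"### Instruction:\n{instruction} \n\n"  # keeps the trailing space used in training
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


def format_training_text(row: dict) -> str:
    """Training text = prompt followed by the reference output (same result as text_formatting)."""
    return build_alpaca_prompt(row["instruction"], row["input"]) + row["output"]


row = {"instruction": "What are the three primary colors?", "input": "", "output": "Red, blue, and yellow."}
print(format_training_text(row))
```

Passing `build_alpaca_prompt(instruction)` to the tokenizer, rather than the bare instruction, keeps the inference prompt consistent with what the model saw during training.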

In [11]:
train.to_csv('data/train.csv', index = False)

In [12]:
# Keep only instructions without an extra input, since the chat template below has no slot for it
train_chat = train[train['input'] == ''].reset_index(drop=True).copy()

In [13]:
def chat_formatting(data):
    # Wrap each instruction/response pair in Mistral's [INST] ... [/INST] chat template
    text = f"<s>[INST] {data['instruction']} [/INST] {data['output']} </s>"
    return text

train_chat['text'] = train_chat.apply(chat_formatting, axis=1)
train_chat.to_csv('data/train_chat.csv', index=False)

In [2]:
!venv/bin/autotrain setup

> INFO    Installing latest xformers
> INFO    Successfully installed latest xformers


In [3]:
project_name = 'my_autotrain_llm'
model_name = 'mistralai/Mistral-7B-Instruct-v0.1'

In [4]:
push_to_hub = True
hf_token = "HUGGING_FACE_ACCESS_TOKEN"  # placeholder: replace with your own Hugging Face access token
repo_id = "bascr/chatbot"

In [5]:
learning_rate = 2e-4
num_epochs = 4
batch_size = 1
block_size = 1024
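trainer = "sft"  # not exported or passed to the autotrain command below, so the run used trainer='default' (see the parameter log)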
warmup_ratio = 0.1
weight_decay = 0.01
gradient_accumulation = 4
use_fp16 = True
use_peft = True
use_int4 = True
lora_r = 16
lora_alpha = 32
lora_dropout = 0.045
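With these settings, gradients are accumulated over 4 micro-batches of size 1 before each optimizer step, and PEFT's LoRA layers scale the low-rank update by `lora_alpha / r`. A quick sanity check of the derived quantities. The value 596 is an assumption read from the `checkpoint-596` folder in the archive listing further down (the Hugging Face `Trainer` names checkpoints after the global optimizer step), and the warmup formula is the one recent `transformers` versions use for `warmup_ratio`:

```python
import math

batch_size = 1
gradient_accumulation = 4
num_epochs = 4
lora_r = 16
lora_alpha = 32
warmup_ratio = 0.1

# Samples that contribute to each optimizer step (per device)
effective_batch_size = batch_size * gradient_accumulation

# LoRA multiplies the low-rank update B @ A by alpha / r
lora_scaling = lora_alpha / lora_r

# Assumption: 596 is the final global step, taken from the checkpoint-596 folder name
total_steps = 596
steps_per_epoch = total_steps // num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size, lora_scaling, steps_per_epoch, warmup_steps)
# 4 2.0 149 60
```

So roughly the first 60 of the 596 optimizer steps ramp the learning rate up to `2e-4`, after which the linear scheduler decays it.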

In [6]:
import os
os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["USE_FP16"] = str(use_fp16)
os.environ["USE_PEFT"] = str(use_peft)
os.environ["USE_INT4"] = str(use_int4)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)

In [20]:
!venv/bin/autotrain llm \
--train \
--model $MODEL_NAME \
--project-name $PROJECT_NAME \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token $HF_TOKEN --repo-id $REPO_ID" ) \
--data-path "data/" \
--text-column "text" \
--lr $LEARNING_RATE \
--batch-size $BATCH_SIZE \
--epochs $NUM_EPOCHS \
--block-size $BLOCK_SIZE \
--warmup-ratio $WARMUP_RATIO \
--lora-r $LORA_R \
--lora-alpha $LORA_ALPHA \
--lora-dropout $LORA_DROPOUT \
--weight-decay $WEIGHT_DECAY \
--gradient-accumulation $GRADIENT_ACCUMULATION \
$( [[ "$USE_FP16" == "True" ]] && echo "--fp16" ) \
$( [[ "$USE_PEFT" == "True" ]] && echo "--use-peft" ) \
$( [[ "$USE_INT4" == "True" ]] && echo "--use-int4" )

> INFO    Running LLM
> INFO    Params: Namespace(version=False, train=True, deploy=False, inference=False, data_path='data/', train_split='train', valid_split=None, text_column='text', rejected_text_column='rejected', prompt_text_column='prompt', model='mistralai/Mistral-7B-Instruct-v0.1', model_ref=None, learning_rate=0.0002, num_train_epochs=4, train_batch_size=1, warmup_ratio=0.1, gradient_accumulation_steps=4, optimizer='adamw_torch', scheduler='linear', weight_decay=0.01, max_grad_norm=1.0, seed=42, add_eos_token=False, block_size=1024, use_peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.045, logging_steps=-1, project_name='my_autotrain_llm', evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, fp16=True, push_to_hub=True, use_int8=False, model_max_length=1024, repo_id='bascr/chatbot', use_int4=True, trainer='default', target_modules=None, merge_adapter=False, token='', backend='default', username=None, use_flash_atte
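The `$( [[ ... ]] && echo ... )` substitutions in the command above are bash syntax; if the shell behind `!` is a plain POSIX `sh`, `[[` fails and the optional flags are silently dropped. As a hedged alternative, the same argument list can be assembled in Python and passed to `subprocess.run`, which needs no shell at all. The helper name `build_autotrain_cmd` is our own; the flags are copied from the command above and nothing else is assumed about the AutoTrain CLI:

```python
def build_autotrain_cmd(env: dict) -> list[str]:
    """Build the `autotrain llm` argument list from the environment variables set above."""
    cmd = [
        "venv/bin/autotrain", "llm", "--train",
        "--model", env["MODEL_NAME"],
        "--project-name", env["PROJECT_NAME"],
        "--data-path", "data/",
        "--text-column", "text",
        "--lr", env["LEARNING_RATE"],
        "--batch-size", env["BATCH_SIZE"],
        "--epochs", env["NUM_EPOCHS"],
        "--block-size", env["BLOCK_SIZE"],
        "--warmup-ratio", env["WARMUP_RATIO"],
        "--lora-r", env["LORA_R"],
        "--lora-alpha", env["LORA_ALPHA"],
        "--lora-dropout", env["LORA_DROPOUT"],
        "--weight-decay", env["WEIGHT_DECAY"],
        "--gradient-accumulation", env["GRADIENT_ACCUMULATION"],
    ]
    # Same conditional flags as the bash version, decided in Python instead of with [[ ... ]]
    if env.get("PUSH_TO_HUB") == "True":
        cmd += ["--push-to-hub", "--token", env["HF_TOKEN"], "--repo-id", env["REPO_ID"]]
    if env.get("USE_FP16") == "True":
        cmd.append("--fp16")
    if env.get("USE_PEFT") == "True":
        cmd.append("--use-peft")
    if env.get("USE_INT4") == "True":
        cmd.append("--use-int4")
    return cmd
```

In the notebook this would be called as `subprocess.run(build_autotrain_cmd(dict(os.environ)), check=True)` once the environment variables are set. Passing a list also avoids quoting problems if a value ever contains spaces.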

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./my_autotrain_llm"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)


### Test the fine-tuned model

In [3]:
input_text = "Give three tips for staying healthy."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens = 200)
predicted_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(predicted_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Give three tips for staying healthy.

1. Eat a balanced diet: Make sure to include plenty of fruits, vegetables, whole grains, lean proteins, and healthy fats in your diet. This will help you get the nutrients you need to stay healthy and strong.

2. Exercise regularly: Aim for at least 30 minutes of moderate exercise every day. This can be anything from walking to swimming to weightlifting. Exercise helps to keep your body fit and healthy, and it also helps to reduce stress.

3. Get enough sleep: Make sure to get at least 7-8 hours of quality sleep every night. Sleep is essential for physical and mental health, and it helps to reduce the risk of chronic diseases. Make sure to create a comfortable sleep environment and stick to a regular sleep schedule. Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Generate a list
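The answer itself is reasonable, but generation does not stop after it: the model runs on into a new `Below is an instruction...` prompt until `max_new_tokens` is used up. One plausible cause is that training ran with `add_eos_token=False` (see the parameter log), so the model may rarely have learned to emit an end-of-sequence token after a response. (Separately, the attention-mask warning goes away if the input is built with `tokenizer(input_text, return_tensors="pt")` and passed to `generate` with `**`.) Besides retraining with an EOS token appended to each example, a simple workaround is to cut the decoded text at the first sign of a new prompt. A minimal sketch; the marker list and the function name are our own assumptions based on the template in `text_formatting`:

```python
# Markers that signal the model has started writing a new Alpaca-style example
STOP_MARKERS = (
    "Below is an instruction that describes a task",
    "### Instruction:",
)


def truncate_at_stop_markers(text: str, markers=STOP_MARKERS) -> str:
    """Return text up to (not including) the earliest stop marker, without trailing whitespace."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


generated = (
    "1. Eat a balanced diet.\n\n2. Exercise regularly.\n\n3. Get enough sleep. "
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\nGenerate a list"
)
print(truncate_at_stop_markers(generated))
```

Apply it only to the newly generated tokens (for example `tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)`); otherwise a prompt that itself contains the template would be cut at position 0.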


### Push model to Hugging Face repository

https://huggingface.co/bascr/chatbot/tree/main

In [5]:
!venv/bin/pip install ipywidgets



In [2]:
from huggingface_hub import notebook_login

notebook_login()


In [6]:
model.push_to_hub(repo_id)



CommitInfo(commit_url='https://huggingface.co/bascr/chatbot/commit/ec27cdc0e47a754a716667cbf55871a760ee4343', commit_message='Upload MistralForCausalLM', commit_description='', oid='ec27cdc0e47a754a716667cbf55871a760ee4343', pr_url=None, pr_revision=None, pr_num=None)

### Compress model folder to download

In [5]:
!tar -czvf model.tar.gz my_autotrain_llm

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


my_autotrain_llm/
my_autotrain_llm/tokenizer.model
my_autotrain_llm/README.md
my_autotrain_llm/training_args.bin
my_autotrain_llm/tokenizer.json
my_autotrain_llm/adapter_config.json
my_autotrain_llm/tokenizer_config.json
my_autotrain_llm/special_tokens_map.json
my_autotrain_llm/.ipynb_checkpoints/
my_autotrain_llm/.ipynb_checkpoints/tokenizer_config-checkpoint.json
my_autotrain_llm/checkpoint-596/
my_autotrain_llm/checkpoint-596/trainer_state.json
my_autotrain_llm/checkpoint-596/tokenizer.model
my_autotrain_llm/checkpoint-596/pytorch_model.bin
my_autotrain_llm/checkpoint-596/README.md
my_autotrain_llm/checkpoint-596/training_args.bin
my_autotrain_llm/checkpoint-596/adapter_model.bin
my_autotrain_llm/checkpoint-596/optimizer.pt
my_autotrain_llm/checkpoint-596/tokenizer.json
my_autotrain_llm/checkpoint-596/adapter_config.json
my_autotrain_llm/checkpoint-596/rng_state.pth
my_autotrain_llm/checkpoint-596/scheduler.pt
my_autotrain_llm/checkpoint-596/tokenizer_config.json
my_autotrain_llm/ch
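Besides the adapter and tokenizer files, the archive above also packs `.ipynb_checkpoints/` and the intermediate `checkpoint-596/` folder, which mostly holds state for resuming training (`optimizer.pt`, `scheduler.pt`, `rng_state.pth`). If only the final files are needed, a sketch using Python's standard `tarfile` module can skip those folders. Which folders to exclude is our assumption, and the helper names are hypothetical:

```python
import tarfile
from pathlib import Path

# Folder-name prefixes to leave out of the archive (assumption: not needed for inference)
EXCLUDED_PREFIXES = (".ipynb_checkpoints", "checkpoint-")


def _skip_training_state(info: tarfile.TarInfo):
    """tarfile filter: return None to drop a member, or the TarInfo to keep it."""
    parts = Path(info.name).parts
    # parts[0] is the top-level folder itself; only check what is inside it
    if any(part.startswith(EXCLUDED_PREFIXES) for part in parts[1:]):
        return None
    return info


def archive_model_dir(src: str = "my_autotrain_llm", dest: str = "model.tar.gz") -> None:
    """Create a gzip-compressed tarball of src without checkpoint folders."""
    with tarfile.open(dest, "w:gz") as tar:
        tar.add(src, arcname=Path(src).name, filter=_skip_training_state)
```

Calling `archive_model_dir()` from the notebook's working directory would produce `model.tar.gz` with the top-level files only.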

### Test Model as Chatbot UI with Gradio

In [1]:
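import os
# TOKENIZERS_PARALLELISM is an environment variable, not a from_pretrained() argument;
# set it before the tokenizer is used to silence the huggingface/tokenizers fork warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import gradio as gr

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_name = "bascr/chatbot"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
# Use half precision on GPU to reduce memory; keep float32 on CPU
dtype = torch.float16 if device != "cpu" else torch.float32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)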


In [3]:
message_list = []
response_list = []

def chat(message, history):
    message_list.append(message)
    # Tokenize with an attention mask and place the inputs on the same device as the model
    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=200)
    # Decode only the newly generated tokens so the reply does not echo the user's message
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    predicted_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    response_list.append(predicted_text)
    return predicted_text

demo_chatbot = gr.ChatInterface(chat, title="Instruction Chatbot", description="Enter an instruction to start chatting. Because of limited computing resources, it may take a significant amount of time to get a response.")

demo_chatbot.queue().launch(share=True)

Running on local URL:  http://127.0.0.1:7861


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Running on public URL: https://9d8ee626c12b849a74.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
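The `chat` function ignores `history`, so every message is answered without context. Mistral-7B-Instruct-v0.1 uses a multi-turn format in which earlier exchanges are wrapped as `[INST] ... [/INST] answer</s>` (the same layout as `chat_formatting`). A sketch that builds such a prompt from Gradio's history, assuming the tuple-style history (a list of `[user, assistant]` pairs) that older `gr.ChatInterface` versions pass by default; newer Gradio versions may pass a list of role/content dicts instead. The function name is hypothetical:

```python
def build_mistral_prompt(message: str, history: list[list[str]]) -> str:
    """Build a Mistral-Instruct v0.1 prompt from earlier (user, assistant) turns plus the new message."""
    prompt = "<s>"
    for user_turn, assistant_turn in history:
        # Each completed exchange is closed with the end-of-sequence token </s>
        prompt += f"[INST] {user_turn} [/INST] {assistant_turn}</s>"
    # The new message is left open so the model generates the answer
    prompt += f"[INST] {message} [/INST]"
    return prompt


history = [["What are the three primary colors?", "Red, blue, and yellow."]]
print(build_mistral_prompt("Which of them is warmest?", history))
```

Because the string already starts with `<s>`, it would be tokenized with `tokenizer(prompt, return_tensors="pt", add_special_tokens=False)` to avoid a duplicated BOS token.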




## Hugging Face Space link

The model was deployed on the following Hugging Face Space:
