In [1]:
from google.colab import drive
drive.mount('/content/drive')
%cd "/content/drive/MyDrive/685Project"

Mounted at /content/drive
/content/drive/MyDrive/685Project


To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + support us if you can!
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for llama.cpp).

In [2]:
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes etc
* And Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, all Llama, Mistral derived archs.
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* [**NEW**] We make Gemma 6 trillion tokens **2.5x faster**! See our [Gemma notebook](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/llama-2-13b-bnb-4bit",
    "unsloth/codellama-34b-bnb-4bit",
    "unsloth/tinyllama-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # New Google 6 trillion tokens model 2.5x faster!
    "unsloth/gemma-2b-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "codellama/CodeLlama-7b-Instruct-hf", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

    PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.2.1+cu121)
    Python  3.10.14 (you have 3.10.12)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


config.json:   0%|          | 0.00/646 [00:00<?, ?B/s]

==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = True.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

codellama/CodeLlama-7b-Instruct-hf does not have a padding token! Will use pad_token = <unk>.


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
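
The trainable-parameter count that the trainer reports in the training cell (39,976,960) can be reproduced by hand. A rank-`r` LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below assumes the usual Llama-7B shapes (hidden size 4096, MLP intermediate size 11008, 32 decoder layers), which this notebook does not print; the helper name `lora_params` is illustrative only.

```python
# Assumed Llama-7B shapes (not printed anywhere in this notebook)
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32
RANK = 16  # matches r = 16 in get_peft_model above

def lora_params(d_in, d_out, r=RANK):
    """Weights added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# (d_in, d_out) for each targeted projection inside one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} LoRA parameters per layer, {total:,} in total")
```

Under these assumptions the total matches the logged figure, which is well under 1% of a model with roughly 7 billion parameters.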


### Just Inference



In [None]:
# !pip install -q trl xformers wandb datasets einops gradio sentencepiece bitsandbytes

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging, TextStreamer
# #from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
# import os,torch, wandb, platform, gradio, warnings
# !pip install datasets
# from datasets import load_dataset, Dataset
# # from trl import SFTTrainer
# from huggingface_hub import login
# login(token="hf_...", add_to_git_credential=True)  # never commit a real access token

In [None]:
# instruct_model_name = "codellama/CodeLlama-7b-Instruct-hf"

In [None]:
# instruct_model = AutoModelForCausalLM.from_pretrained(instruct_model_name, use_auth_token=True, device_map = 'auto')

In [None]:
# instruct_tokenizer = AutoTokenizer.from_pretrained(instruct_model_name, trust_remote_code=True)
# EOS_TOKEN = instruct_tokenizer.eos_token

In [None]:
# question = "A car is being driven, in a straight line and at a uniform speed, towards the base of a vertical tower. The top of the tower is observed from the car and, in the process, it takes 10 minutes for the angle of elevation to change from 45° to 60°. After how much more time will this car reach the base of the tower?"
# options = [ "A)5(√3 + 1)", "B)6(√3 + √2)", "C)7(√3 – 1)", "D)8(√3 – 2)", "E)None of these" ]
# runtimeFlag = "cuda:0"
# prompt_prefix = "Calculate the answer for the following math problem and return the option that matches with the answer in the options list."
# # zero_shot_system_prompt = 'Given mathematical problem question and the options for the answer. Please provide the correct answer among the options A, B, C, D, or E.\n\n'
# # cot_system_prompt = 'Given mathematical problem question, rationale for the solution and the options for the answer. Please provide the correct answer among the options A, B, C, D, or E.\n\n'
# Q_INST, O_INST, R_INST,  C_INST = "Question: ", "Options: ", "### Rationale:\n", "### Correct Option:\n"

# prompt = f"{prompt_prefix}\n{Q_INST}{question.strip()}\n{O_INST}{options} \n"
# print(f'prompt is {prompt}')
# inputs = instruct_tokenizer([prompt], return_tensors="pt").to(runtimeFlag)

# generation_parameters = {
#     "max_length": 512,        # Corresponds to Output Length
#     "temperature": 0.7,       # Temperature for randomness
#     "top_p": 0.7,             # Nucleus sampling (Top-P)
#     "top_k": 50,              # Top-K sampling
#     "repetition_penalty": 1   # Repetition penalty (usually greater than 1 to reduce repetition)
# }
# streamer = TextStreamer(instruct_tokenizer, skip_prompt=True, skip_special_tokens=True)
# output = instruct_model.generate(**inputs, **generation_parameters)
# response = instruct_tokenizer.decode(output[0], skip_special_tokens=True)
# print(f'Model response is: {response} ')

### Inference with qlora model

In [None]:
# question = "A car is being driven, in a straight line and at a uniform speed, towards the base of a vertical tower. The top of the tower is observed from the car and, in the process, it takes 10 minutes for the angle of elevation to change from 45° to 60°. After how much more time will this car reach the base of the tower?"
# options = [ "A)5(√3 + 1)", "B)6(√3 + √2)", "C)7(√3 – 1)", "D)8(√3 – 2)", "E)None of these" ]
# runtimeFlag = "cuda:0"
# prompt_prefix = "Calculate the answer for the following math problem and return the option that matches with the answer in the options list."
# # zero_shot_system_prompt = 'Given mathematical problem question and the options for the answer. Please provide the correct answer among the options A, B, C, D, or E.\n\n'
# # cot_system_prompt = 'Given mathematical problem question, rationale for the solution and the options for the answer. Please provide the correct answer among the options A, B, C, D, or E.\n\n'
# Q_INST, O_INST, R_INST,  C_INST = "Question: ", "Options: ", "### Rationale:\n", "### Correct Option:\n"
from transformers import TextStreamer

runtimeFlag = "cuda:0"
llama_prompt1 = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Correct answer option: The correct answer option is {}"""

llama_prompt2 = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Rationale: The rationale is {}"""

question = "Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?", # question
options = "A)21, B)21.5, C)22, D)22.5, E)23", # input, # output - leave this blank for generation!
rationale = "If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	"


prompt = llama_prompt1.format(question, options, rationale)
print(f'prompt is {prompt}')
inputs = tokenizer([prompt], return_tensors="pt").to(runtimeFlag)

generation_parameters = {
    "max_length": 512,        # Maximum total length in tokens (prompt + generated text)
    "temperature": 0.7,       # Sampling temperature (only applies when do_sample=True)
    "top_p": 0.7,             # Nucleus sampling (Top-P); only applies when do_sample=True
    "top_k": 50,              # Top-K sampling; only applies when do_sample=True
    "repetition_penalty": 1   # Repetition penalty (1 = no penalty; values above 1 reduce repetition)
}
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, **generation_parameters)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Model response is: {response} ')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


prompt is You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. 

.
###Input
Question: ("Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?",)
Options: ('A)21, B)21.5, C)22, D)22.5, E)23',)

### Output
Correct answer option: The correct answer option is If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	
Model response is: You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that
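
In the printed prompt above, the question and options appear as `("...",)` and `('...',)`. That is how Python renders a one-element tuple, which is what a trailing comma after an assignment creates. A minimal sketch of the effect, using made-up short strings and a shortened template rather than the real question and prompt:

```python
template = "Question: {}\nOptions: {}"

# A trailing comma turns the right-hand side into a one-element tuple
question_tuple = "What is 2 + 2?",
options_tuple = "A)3, B)4",

# Without the trailing comma the values stay plain strings
question_str = "What is 2 + 2?"
options_str = "A)3, B)4"

print(template.format(question_tuple, options_tuple))  # tuple repr leaks into the prompt
print(template.format(question_str, options_str))      # clean prompt
```

The printed prompt also shows the rationale text following "The correct answer option is", because the rationale was passed as the third argument to a template whose last slot is the answer option.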

### Data Prep
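We use the AQuA-RAT dataset (`aqua_rat` on the Hugging Face Hub), a set of multiple-choice algebra word problems. Each example has a `question`, a list of `options`, a `rationale`, and the `correct` option letter. Every example is duplicated so that one copy is formatted with the answer prompt (`llama_prompt1`) and the other with the rationale prompt (`llama_prompt2`). You can replace this code section with your own data prep.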

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `ChatML` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing).

For text completions like novel writing, try this [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing).

In [5]:
import datasets
from datasets import load_dataset, Dataset
llama_prompt1 = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Correct answer option: The correct answer option is {}"""

llama_prompt2 = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Rationale: The rationale is {}"""

def preprocess_dataset(examples):
    # Duplicate every row so the two copies sit next to each other:
    # one copy is later formatted with llama_prompt1, the other with llama_prompt2.
    dataset_dict = examples.to_dict()
    columns = ['question', 'options', 'rationale', 'correct']
    doubled = {
        col: [value for value in dataset_dict[col] for _ in range(2)]
        for col in columns
    }
    return datasets.Dataset.from_dict(doubled)

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
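def formatting_prompts_func(examples):
    questions   = examples["question"]
    options     = examples["options"]
    rationales  = examples["rationale"]
    true_labels = examples["correct"]
    texts = []
    # Rows were duplicated by preprocess_dataset, so consecutive rows hold the same example.
    # `i` is the index within the batch; the default map batch size (1000) is even,
    # so the even/odd pairing lines up across batches.
    for i, (question, opts, label, rationale) in enumerate(zip(questions, options, true_labels, rationales)):
        if i % 2 == 0:
            # Even rows: train on the correct answer option
            text = llama_prompt1.format(question, opts, label) + EOS_TOKEN
        else:
            # Odd rows: train on the rationale
            text = llama_prompt2.format(question, opts, rationale) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}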

training_dataset = load_dataset("aqua_rat", split = "train") #TODO: change this when doing it for the final thing.
validation_dataset = load_dataset("aqua_rat", split="validation")
test_dataset = load_dataset("aqua_rat", split="test")

training_dataset = preprocess_dataset(training_dataset)
validation_dataset = preprocess_dataset(validation_dataset)
test_dataset = preprocess_dataset(test_dataset)

training_dataset = training_dataset.map(formatting_prompts_func, batched=True)
validation_dataset = validation_dataset.map(formatting_prompts_func, batched=True)
test_dataset = test_dataset.map(formatting_prompts_func, batched=True)

Downloading readme:   0%|          | 0.00/5.89k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/25.4M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/74.0k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/76.1k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/97467 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/254 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/254 [00:00<?, ? examples/s]

Map:   0%|          | 0/194934 [00:00<?, ? examples/s]

Map:   0%|          | 0/508 [00:00<?, ? examples/s]

Map:   0%|          | 0/508 [00:00<?, ? examples/s]
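
The progress bars above show 97,467 training rows becoming 194,934 after preprocessing: every row is duplicated, then even-indexed rows get the answer prompt and odd-indexed rows get the rationale prompt. A plain-Python sketch of that pairing, with toy rows and shortened templates (both hypothetical, not the real dataset or prompts):

```python
rows = [
    {"question": "Q1", "correct": "A", "rationale": "because 1"},
    {"question": "Q2", "correct": "B", "rationale": "because 2"},
]

# Duplicate each row so that the two copies are adjacent
doubled = [row for row in rows for _ in range(2)]

texts = []
for i, row in enumerate(doubled):
    if i % 2 == 0:
        # Even index: answer-style target
        texts.append(f"{row['question']} -> answer: {row['correct']}")
    else:
        # Odd index: rationale-style target
        texts.append(f"{row['question']} -> rationale: {row['rationale']}")

for text in texts:
    print(text)
```

The parity check relies on the two copies staying adjacent and on each batch starting at an even index, so shuffling before formatting or passing an odd `batch_size` to `map` would break the pairing.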

In [6]:
print(type(training_dataset))
i = 0
for batch in training_dataset:
    print(batch)
    i+=1
    if i>3:
        break

<class 'datasets.arrow_dataset.Dataset'>
{'question': "Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?", 'options': ['A)21', 'B)21.5', 'C)22', 'D)22.5', 'E)23'], 'rationale': 'If Q complete x kilometers, then P completes 1.15x kilometers.\nx + 1.15x = 43\n2.15x=43\nx = 43/2.15 = 20\nThen P will have have walked 1.15*20=23 km.\nThe answer is E.', 'correct': 'E', 'text': "You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.\n###Input\nQuestion: Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will F

In [7]:
dataset_size = len(training_dataset)
print(f'length of the dataset is {dataset_size}')
num_epochs = 5
batch_size = 10
grad_accumulation = 10
desired_checkpoints = 250

max_steps = (dataset_size // (batch_size*grad_accumulation))*num_epochs
save_interval = max_steps // desired_checkpoints

print(f'max steps is {max_steps}')
print(f'save interval is {save_interval}')

length of the dataset is 194934
max steps is 9745
save interval is 38
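
The numbers printed above follow from integer arithmetic on the effective batch size. The sketch below re-derives them step by step; `effective_batch`, `steps_per_epoch`, and `actual_checkpoints` are names introduced here for illustration.

```python
dataset_size = 194934        # printed above
batch_size = 10
grad_accumulation = 10
num_epochs = 5
desired_checkpoints = 250

# One optimizer step consumes batch_size * grad_accumulation examples
effective_batch = batch_size * grad_accumulation

# Floor division ignores the remainder of the dataset in each epoch
steps_per_epoch = dataset_size // effective_batch
leftover_examples = dataset_size % effective_batch

max_steps = steps_per_epoch * num_epochs
save_interval = max_steps // desired_checkpoints

# Because save_interval is rounded down, slightly more checkpoints are written
actual_checkpoints = max_steps // save_interval

print(effective_batch, steps_per_epoch, leftover_examples)
print(max_steps, save_interval, actual_checkpoints)
```

Rounding `save_interval` down means the run saves a few more checkpoints than `desired_checkpoints`, which is worth keeping in mind for disk usage on Drive.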


In [8]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
4.064 GB of memory reserved.


In [9]:
!pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=1ae10d2a8e46ccf498766f66fc22cc636987e92224adc29993d4073207f507eb
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


### Evaluation Metrics

In [10]:
from datasets import load_metric
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Load metrics
rouge = load_metric('rouge')
bleu = load_metric('bleu')
meteor = load_metric('meteor')

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    results = {
        "accuracy": 0, "f1": 0, "rouge1": 0, "rouge2": 0, "rougeL": 0, "bleu": 0, "meteor": 0
    }

    if not predictions or not labels:
        print("Empty predictions or labels.")
        return results

    # Decode tokens to strings
    # predictions = [tokenizer.decode(pred, skip_special_tokens=True) if pred else "" for pred in predictions]
    # labels = [tokenizer.decode(label, skip_special_tokens=True) if label else "" for label in labels]

    # ROUGE and METEOR expect raw strings; BLEU expects pre-tokenized input,
    # with a list of tokenized references for each prediction.
    pred_tokens = [pred.split() for pred in predictions]
    label_tokens = [[label.split()] for label in labels]

    # Compute each metric individually and catch exceptions to pinpoint any failure
    try:
        rouge_result = rouge.compute(predictions=predictions, references=labels)
        results.update({
            "rouge1": rouge_result['rouge1'].mid.fmeasure,
            "rouge2": rouge_result['rouge2'].mid.fmeasure,
            "rougeL": rouge_result['rougeL'].mid.fmeasure
        })
    except Exception as e:
        print(f"Error computing ROUGE: {e}")

    try:
        bleu_result = bleu.compute(predictions=pred_tokens, references=label_tokens)
        results['bleu'] = bleu_result['bleu']
    except Exception as e:
        print(f"Error computing BLEU: {e}")

    try:
        meteor_result = meteor.compute(predictions=predictions, references=labels)
        results['meteor'] = meteor_result['meteor']
    except Exception as e:
        print(f"Error computing METEOR: {e}")

    # try:
    #     # Calculate accuracy and F1 score
    #     pred_tokens = [tokenizer.encode(pred) for pred in predictions]
    #     label_tokens = [tokenizer.encode(label) for label in labels]

  ###TODO###
  #once the model is fully trained, we can write the custom logic to extract answer and use that to generate this. Current

    #     flat_pred_tokens = [item for sublist in pred_tokens for item in sublist]
    #     flat_label_tokens = [item for sublist in label_tokens for item in sublist]
    #     results['accuracy'] = accuracy_score(flat_label_tokens, flat_pred_tokens)
    #     results['f1'] = f1_score(flat_label_tokens, flat_pred_tokens, average='weighted')
    # except Exception as e:
    #     print(f"Error computing Accuracy or F1: {e}")

    return results

  rouge = load_metric('rouge')
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


Downloading builder script:   0%|          | 0.00/2.17k [00:00<?, ?B/s]

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


Downloading builder script:   0%|          | 0.00/2.48k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.55k [00:00<?, ?B/s]

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


Downloading builder script:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We train for the `max_steps` computed above (9,745 optimizer steps, roughly 5 epochs at an effective batch size of 100). Because `max_steps` is set, it overrides `num_train_epochs`. We also support TRL's `DPOTrainer`!

In [11]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = training_dataset,
    # eval_dataset = validation_dataset, #TODO: commenting this until the last stage.
    compute_metrics = compute_metrics,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = batch_size,
        gradient_accumulation_steps = grad_accumulation,
        warmup_steps = 5,
        max_steps = max_steps,
        save_steps = save_interval,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs_big",
        save_strategy = "steps",
        num_train_epochs = num_epochs,
        report_to="none",
    ),
)

  self.pid = os.fork()


Map (num_proc=2):   0%|          | 0/194934 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


In [12]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
4.064 GB of memory reserved.


In [13]:
trainer_stats = trainer.train(resume_from_checkpoint=True,)

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 194,934 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 10 | Gradient Accumulation steps = 10
\        /    Total batch size = 100 | Total steps = 9,745
 "-____-"     Number of trainable parameters = 39,976,960
	save_steps: 38 (from args) != 7 (from trainer_state.json)


Step,Training Loss
8968,0.2097
8969,0.218
8970,0.2138
8971,0.1805
8972,0.2291
8973,0.185
8974,0.2224
8975,0.2081
8976,0.2006
8977,0.1793




In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

108.1635 seconds used for training.
1.8 minutes used for training.
Peak reserved memory = 4.812 GB.
Peak reserved memory for training = 0.691 GB.
Peak reserved memory % of max memory = 21.707 %.
Peak reserved memory for training % of max memory = 3.117 %.


### Custom evaluation

In [None]:
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset

runtimeFlag = "cuda:0"
# Define prompt template and generation settings
llama_prompt = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Rationale:{}
Correct answer:{}"""

llama_prompt_inference = """You are a mathematical assistant that helps users with Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list.\n\n
### Input\nQuestion: {}\nOptions: {}\n
### Output"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
generation_parameters = {
    "max_length": 512,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "repetition_penalty": 1,
}

def formatting_prompts_func(examples):

    questions = examples["question"]
    options       = examples["options"]
    rationales = examples["rationale"]
    true_labels      = examples["correct"]

    inputs, texts = [], []
    for question, opts, rationale, output in zip(questions, options, rationales, true_labels):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        orig_text = llama_prompt.format(question, opts, rationale, output) + EOS_TOKEN
        inference_input = llama_prompt_inference.format(question, opts)
        # print(f'text is {text}')
        texts.append(orig_text)
        inputs.append(inference_input)
    return { "text" : texts, "input": inputs}

def custom_evaluate(dataset):
    # Tokenize prompts in batch
    inputs = tokenizer(dataset["input"], padding=True, truncation=True, return_tensors="pt", max_length=512).to(runtimeFlag)

    # Generate responses in batch
    outputs = model.generate(**inputs, **generation_parameters)

    # Decode all responses
    predictions = [tokenizer.decode(gen, skip_special_tokens=True) for gen in outputs]

    # Evaluating using custom compute_metrics
    eval_pred = (predictions, dataset["text"])  # Assuming 'text' includes the correct reference
    print(f'eval_pred is {eval_pred}')
    results = compute_metrics(eval_pred)
    return results

# Load and process dataset
training_dataset = load_dataset("aqua_rat", split="train[:10]")
training_dataset = training_dataset.map(formatting_prompts_func, batched=True)
evaluation_results = custom_evaluate(training_dataset)
print(evaluation_results)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


eval_pred is (["You are a mathematical assistant that helps users with Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list.\n\n\n### Input\nQuestion: Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?\nOptions: ['A)21', 'B)21.5', 'C)22', 'D)22.5', 'E)23']\n\n### Output Calculate the answer: C\nRationale: [21]\n\n### Output\nIn a recent trip, Friend and his wife, and he was driving with his wife, and he was a passenger.\n\n1. A man and his wife were sitting in the backseat, and he was driving with his wife, and he was driving with his wife, and he was driving with his wife, and he was driving with his wife, and he was driving with his wife, and he was driving with his wife, and he was dr

### Check for overfitting on a small subset

In [None]:
test_results = trainer.evaluate(training_dataset)
print(test_results)

AttributeError: 'NoneType' object has no attribute 'get'

<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

In [None]:
llama_inference_prompt = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Rationale:"""
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    llama_inference_prompt.format(
        "Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?", # question
        "A)21, B)21.5, C)22, D)22.5, E)23", # options; the prompt ends at "Rationale:" so the model generates the rest
        # "If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	"
    )
], return_tensors = "pt").to("cuda")
generation_parameters = {
    "max_length": 512,        # Cap on total length (prompt + output); max_new_tokens below takes precedence
    "temperature": 0.7,       # Sampling temperature (only used when sampling is enabled)
    "top_p": 0.7,             # Nucleus sampling (Top-P)
    "top_k": 50,              # Top-K sampling
    "repetition_penalty": 1   # 1 means no penalty; values above 1 reduce repetition
}
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, **generation_parameters, max_new_tokens = 64, use_cache = True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f'Model response is: {response} ')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=64) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Model response is: You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. 

.
###Input
Question: Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?
Options: A)21, B)21.5, C)22, D)22.5, E)23

### Output
Rationale:['If Q complete x kilometers, then P completes 1.15x kilometers.\nx + 1.15x = 43\n2.15x=43\nx = 43/2.15 = 20\nThen P will have 
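
The warning above appears because both `max_length` and `max_new_tokens` were passed to `generate`. Below is a minimal sketch of one way to build the keyword arguments so that only `max_new_tokens` bounds the output and the sampling knobs are included only when sampling is requested. The helper name `build_generation_kwargs` is hypothetical (it is not part of Unsloth or transformers); the keys it emits are standard `generate` arguments.

```python
def build_generation_kwargs(max_new_tokens=64, do_sample=False,
                            temperature=0.7, top_p=0.7, top_k=50,
                            repetition_penalty=1.0):
    """Return keyword arguments for model.generate (hypothetical helper)."""
    kwargs = {
        "max_new_tokens": max_new_tokens,          # bounds only the generated part
        "repetition_penalty": repetition_penalty,  # 1.0 means no penalty
        "use_cache": True,
        "do_sample": do_sample,
    }
    if do_sample:
        # Sampling knobs are only meaningful when sampling is enabled.
        kwargs.update(temperature=temperature, top_p=top_p, top_k=top_k)
    return kwargs

greedy_kwargs = build_generation_kwargs()
sampled_kwargs = build_generation_kwargs(do_sample=True)
print(greedy_kwargs)
print(sampled_kwargs)
```

In the notebook this could be used as `model.generate(**inputs, **sampled_kwargs)`, which should avoid the `max_length` conflict since that key is never set.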


You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting for the full output!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    llama_inference_prompt.format(
        "Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?", # question
        "A)21, B)21.5, C)22, D)22.5, E)23", # options; the prompt ends at "Rationale:" so the model generates the rest
        # "If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	",
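    ) + EOS_TOKEN # NOTE: this closes the prompt with the end-of-sequence token, which is likely why the output below drifts into unrelated text; omit it for normal inference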
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


<s> You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. 

.
###Input
Question: Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?
Options: A)21, B)21.5, C)22, D)22.5, E)23

### Output
Rationale:</s><s> #include "stdafx.h"
#include "CppUnitTest.h"
#include "../CppUnitTestFramework/MyMath.h"

using namespace Microsoft::VisualStudio::CppUnitTestFramework;

namespace MyMathTests
{
	TEST_CLASS(MyMathTests)
	{
	public:
		TEST_METHOD(Addition)
		{
			Assert::AreEqual(5, MyMath::Add(2, 3));
		}

		TEST_METHOD
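
The run above ends its prompt with `EOS_TOKEN`, and the model continues with unrelated text after `</s>`. A common convention is to append the end-of-sequence token only to training examples and to leave the inference prompt open. The sketch below shows that split with one shared template. `PROMPT_TEMPLATE`, `format_inference_prompt`, and `format_training_example` are hypothetical names, the template text is abbreviated for illustration, and `"</s>"` is an assumed stand-in for `tokenizer.eos_token`.

```python
# Abbreviated, illustrative template; copy the real training template instead.
PROMPT_TEMPLATE = (
    "### Input\n"
    "Question: {question}\n"
    "Options: {options}\n\n"
    "### Output\n"
    "Rationale:"
)

def format_inference_prompt(question, options):
    """Prompt that stops right where the model should start writing."""
    return PROMPT_TEMPLATE.format(question=question, options=options)

def format_training_example(question, options, rationale, eos_token):
    """Same prompt plus the target text, closed with the EOS token."""
    prompt = format_inference_prompt(question, options)
    return f"{prompt} {rationale}{eos_token}"

eos = "</s>"  # assumption: replace with tokenizer.eos_token
question = ("A train 280 m long passed a pole in 28 sec. "
            "How long will it take to pass a platform 660 m long?")
options = "A)52, B)94, C)71, D)68, E)88"
rationale = "Speed = 280/28 = 10 m/sec. Required time = (280 + 660)/10 = 94 sec."

inference_prompt = format_inference_prompt(question, options)
training_text = format_training_example(question, options, rationale, eos)
print(inference_prompt)
```

Keeping both formatters on one template also guards against the training and inference prompts drifting apart.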


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model_finetuned_4") # Local saving
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
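
The save cell above writes to `lora_model_finetuned_4`, while the loading cell further down reads `lora_model_finetuned_3`, so it can be worth checking that the directory you are about to load really holds saved adapters. PEFT's `save_pretrained` writes an `adapter_config.json` file next to the adapter weights. The sketch below checks for that file; `looks_like_lora_dir` is a hypothetical helper, and the temporary directory only simulates a real save.

```python
from pathlib import Path
import tempfile

def looks_like_lora_dir(path):
    """True if the directory contains a PEFT adapter config file."""
    return (Path(path) / "adapter_config.json").is_file()

with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "lora_model_finetuned_4"
    saved.mkdir()
    (saved / "adapter_config.json").write_text("{}")  # stand-in for a real save
    found_saved = looks_like_lora_dir(saved)
    found_other = looks_like_lora_dir(Path(tmp) / "lora_model_finetuned_3")

print(found_saved, found_other)
```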



To load the LoRA adapters we just saved for inference, make sure the condition below is `True`:

In [None]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model_finetuned_3", # Directory with the saved LoRA adapters; note that the cell above saved to "lora_model_finetuned_4"
        max_seq_length = max_seq_length,
        dtype = dtype,
        # load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# Prompt template: same layout as above, but the rationale is supplied and the model completes "Correct answer:"
llama_inference_prompt = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Rationale:{}
Correct answer:"""
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    llama_inference_prompt.format(
        "Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?", # question
        "A)21, B)21.5, C)22, D)22.5, E)23", # options; the rationale is supplied next and the model completes "Correct answer:"
        "If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	"
    )
], return_tensors = "pt").to("cuda")
generation_parameters = {
    "max_length": 512,        # Cap on total length (prompt + output); max_new_tokens below takes precedence
    "temperature": 0.7,       # Sampling temperature (only used when sampling is enabled)
    "top_p": 0.7,             # Nucleus sampling (Top-P)
    "top_k": 50,              # Top-K sampling
    "repetition_penalty": 1   # 1 means no penalty; values above 1 reduce repetition
}
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, **generation_parameters, max_new_tokens = 64, use_cache = True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f'response is {response}')
# print(f'Model response is: {response} ')
# inputs = tokenizer(
# [
#     alpaca_prompt.format(
#         "What is a famous tall tower in Paris?", # instruction
#         "", # input
#         "", # output - leave this blank for generation!
#     )
# ], return_tensors = "pt").to("cuda")

# from transformers import TextStreamer
# text_streamer = TextStreamer(tokenizer)
# _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)



==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = True.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=64) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


response is You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. 

.
###Input
Question: Two friends plan to walk along a 43-km trail, starting at opposite ends of the trail at the same time. If Friend P's rate is 15% faster than Friend Q's, how many kilometers will Friend P have walked when they pass each other?
Options: A)21, B)21.5, C)22, D)22.5, E)23

### Output
Rationale:If Q complete x kilometers, then P completes 1.15x kilometers. x + 1.15x = 43 2.15x=43 x = 43/2.15 = 20 Then P will have have walked 1.15*20=23 km. The answer is E.	
Correct answer:E

### Input
Question: A man can row 20 kmph in still water, but due to the stream, it takes him 2 hours to cover 60 km. What is the speed of the stream in kmph?
Options: A)12, B)13
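
In the output above the model answers (`Correct answer:E`) and then keeps going, inventing a new `### Input` block until the token budget runs out. One simple option is to cut the completion at the first such marker after decoding. This is a post-processing sketch only; `truncate_at_markers` is a hypothetical helper and the marker list is an assumption based on the prompt format used here.

```python
def truncate_at_markers(completion, markers=("### Input", "###Input")):
    """Return the text before the earliest marker, or all of it if none occurs."""
    cut = len(completion)
    for marker in markers:
        index = completion.find(marker)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut].rstrip()

# Generated part only (the text after "Correct answer:" in the output above).
completion = (
    "E\n\n"
    "### Input\n"
    "Question: A man can row 20 kmph in still water"
)
trimmed = truncate_at_markers(completion)
print(trimmed)
```

Apply this to the generated part only, since the prompt itself contains an input marker. In the notebook the generated part can be obtained by decoding `outputs[0][inputs["input_ids"].shape[1]:]` instead of `outputs[0]`.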


In [None]:
llama_inference_prompt = """You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. \n\n.
###Input
Question: {}
Options: {}

### Output
Correct answer:"""
inputs = tokenizer(
[
    llama_inference_prompt.format(
        "A train 280 m long passed a pole in 28 sec. How long will it take to pass a platform 660 m long?", # question
        "A)52, B)94, C)71, D)68, E)88", # options; the model completes "Correct answer:"
        # "	Speed = 280/28 = 10 m/sec. Required time = (280 + 660)/10 = 94 sec.	"
    )
], return_tensors = "pt").to("cuda")
generation_parameters = {
    "max_length": 512,        # Cap on total length (prompt + output); max_new_tokens below takes precedence
    "temperature": 0.7,       # Sampling temperature (only used when sampling is enabled)
    "top_p": 0.7,             # Nucleus sampling (Top-P)
    "top_k": 50,              # Top-K sampling
    "repetition_penalty": 1.5   # Values above 1 penalize repeated tokens
}
from transformers import TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**inputs, **generation_parameters, max_new_tokens = 64, use_cache = True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f'response is {response}')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Both `max_new_tokens` (=64) and `max_length`(=512) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


response is You are an mathematical assistant that helps users with the Algebra questions. Calculate the answer for the following math problem and return the rationale used for solving this and the option that matches with the answer in the options list. 

.
###Input
Question: A train 280 m long passed a pole in 28 sec. How long will it take to pass a platform 660 m long?
Options: A)52, B)94, C)71, D)68, E)88

### Output
Correct answer: Option (B)", Rs.(3*x+y)/(x-y)"Ratio of two numbers is defined as the first number divided by the second Number . If we have been given just one number and we want to find the other ,we need to use this ratio . But if we are given both


You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Not recommended - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Evaluation

In [None]:
test_results = trainer.evaluate(test_dataset)
print(test_results)
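
By default `trainer.evaluate` reports the evaluation loss (plus whatever `compute_metrics` returns); for a multiple-choice task it can also help to score the generated answers directly. Below is a minimal sketch, assuming each decoded completion contains `Correct answer:` followed by an option letter A to E, as in the outputs above. `extract_option` and `accuracy` are hypothetical helpers, not part of Unsloth or TRL.

```python
import re

# Accepts "Correct answer:E" as well as "Correct answer: Option (B)".
ANSWER_PATTERN = re.compile(r"Correct answer:\s*(?:Option\s*)?\(?([A-E])\b")

def extract_option(completion):
    """Return the option letter after 'Correct answer:', or None if absent."""
    match = ANSWER_PATTERN.search(completion)
    return match.group(1) if match else None

def accuracy(completions, gold_letters):
    """Fraction of completions whose extracted letter equals the gold letter."""
    if not completions:
        return 0.0
    hits = sum(extract_option(c) == g for c, g in zip(completions, gold_letters))
    return hits / len(completions)

completions = [
    "Rationale: The answer is E.\nCorrect answer:E",
    "Correct answer: Option (B)",
    "Correct answer: unclear",
]
score = accuracy(completions, ["E", "B", "C"])
print(score)
```

Completions without a recognizable letter count as wrong, which keeps the score conservative when the model rambles instead of answering.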

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. Other methods such as `q4_k_m` are also allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or a UI-based system like `GPT4All`. You can install GPT4All by going [here](https://gpt4all.io/index.html).

And we're done! If you have any questions about Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! If you find any bugs, need help, or want to keep up with the latest LLM news or join projects, feel free to join our Discord!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Mistral 7b 2x faster [free Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 Hugging Face, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. Gemma 6 trillion tokens is 2.5x faster! [free Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>