<a href="https://colab.research.google.com/github/azhar3339/Data-Analytics-Udacity/blob/master/22_llama_3_2_fine_tuning_with_torchtune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi

Sat Dec  7 19:07:06 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq torchtune==0.3.1 --progress-bar off
!pip install -qqq torchao==0.6.1 --progress-bar off
!pip install -qqq transformers==4.46.1 --progress-bar off

  Preparing metadata (setup.py) ... [?25l[?25hdone
  Building wheel for antlr4-python3-runtime (setup.py) ... [?25l[?25hdone
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.[0m[31m
[0m

In [3]:
import json
import re
from pathlib import Path
from typing import List

import numpy as np
import pandas as pd
import torch
from google.colab import userdata
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

%matplotlib inline
%config InlineBackend.figure_format='retina'


RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)

## Dataset

In [4]:
df = pd.read_csv("finance_training_data_v1.csv")

In [5]:
df.head()

Unnamed: 0,Question,Answer
0,Get average contribution by gender,"{'name': 'get_metric', 'parameters': {'metric'..."
1,Get average premium by gender,"{'name': 'get_metric', 'parameters': {'metric'..."
2,Get burning cost by gender,"{'name': 'get_metric', 'parameters': {'metric'..."
3,Get claims by gender,"{'name': 'get_metric', 'parameters': {'metric'..."
4,Get exposure by gender,"{'name': 'get_metric', 'parameters': {'metric'..."


In [6]:
df.shape

(190, 2)

In [7]:
def create_prompt(question: str):
    prompt = """
You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the functions can be used, point it out. If the given question lacks the parameters required by the function, also point it out. You should only return the function call in the tools call sections.
Avoid making guesses and providing speculative information, or giving incomplete responses.
All the parameters are optional, if any parameter is not required to answer the user's query, leave it empty.
If you decide to invoke any of the function(s), you MUST put it in json format like below:
{
    "name": "get_metric",
    "parameters": {
        "metric": "get_unit_cost",
        "filters": {
            "povider_network": "NW6"
        },
        "groupby": "gender"
    }
}
You SHOULD NOT include any other text in the response.

Here is a list of functions in JSON format that you can invoke:

[
    {
        "name": "get_metric",
        "description": "Retrieve specific metrics based on filters and groupings.",
        "parameters": {
            "type": "dict",
            "required": [],
            "properties": {
                "metric": {
                    "type": "string",
                    "enum": [
                        "get_premium",
                        "get_avg_contribution",
                        "get_unit_cost"
                    ]
                },
                "filters": {
                    "type": "object",
                    "properties": {
                        "stating_period": {
                            "type": "string",
                            "description": "Starting period in YYYYMM format. Defaults to 2024 if year is not mentioned."
                        },
                        "ending_period": {
                            "type": "string",
                            "description": "Ending period in YYYYMM format. Defaults to same value as starting period if not mentioned."
                        },
                        "quarter": {
                            "type": "string",
                            "enum": ["Q1", "Q2", "Q3", "Q4"]
                        },
                        "age": {
                            "type": "integer",
                            "description": "Age of the members."
                        },
                        "gender": {
                            "type": "string",
                            "enum": ["M", "F", ""]
                        },
                        "membership_type": {
                            "type": "string",
                            "enum": ["employee", "wife", "parent", "dependent", "others", ""]
                        },
                        "plan_network": {
                            "type": "string",
                            "enum": ["NW1", "NW2", "NW3", "NW4", "NW5", "NW6", "NW7", "NWS", ""]
                        },
                        "povider_network": {
                            "type": "string",
                            "enum": ["NW1", "NW2", "NW3", "NW4", "NW5", "NW6", "NW7", "NWS", ""]
                        },
                        "povider_name": {
                            "type": "string",
                            "description": "Name of the provider."
                        },
                        "provide_region": {
                            "type": "string",
                            "enum": ["eastern", "western", "central", "northern", "southern", ""]
                        },
                        "benefit_type": {
                            "type": "string",
                            "enum": ["chronic", "standard", "dental", ""]
                        },
                        "claims_type": {
                            "type": "string",
                            "enum": ["I", "O", ""],
                            "description": "I for Inpatient claims, O for Outpatient claims."
                        }
                    }
                },
                "groupby": {
                    "type": "string",
                    "description": "One of the above filters to group the results by."
                }
            }
        }
    }
]

Question:
""".strip()
    return prompt + question

In [8]:
def create_dataset(df):
    rows = []
    for _, row in tqdm(df.iterrows()):
        rows.append(
            {
                "input": create_prompt(row.Question),
                "output": row.Answer.lower(),
            }
        )
    return rows

In [9]:
train_rows = create_dataset(df)

190it [00:00, 8869.33it/s]


In [10]:
Path("train_data.json").write_text(json.dumps(train_rows))

936936

## Download Model

In [11]:
hf_token = userdata.get("HF_TOKEN")

In [12]:
!tune download "meta-llama/Llama-3.2-1B-Instruct" \
  --output-dir "./Llama-3.2-1B-Instruct" \
  --hf-token "{hf_token}" \
  --ignore-patterns "[]"

Ignoring files matching the following patterns: []
Fetching 13 files:   0% 0/13 [00:00<?, ?it/s]
model.safetensors:   0% 0.00/2.47G [00:00<?, ?B/s][A

config.json: 100% 877/877 [00:00<00:00, 5.01MB/s]


USE_POLICY.md: 100% 6.02k/6.02k [00:00<00:00, 29.2MB/s]


README.md:   0% 0.00/41.7k [00:00<?, ?B/s][A[A


LICENSE.txt: 100% 7.71k/7.71k [00:00<00:00, 40.5MB/s]
README.md: 100% 41.7k/41.7k [00:00<00:00, 5.16MB/s]


generation_config.json: 100% 189/189 [00:00<00:00, 1.27MB/s]


.gitattributes: 100% 1.52k/1.52k [00:00<00:00, 10.5MB/s]
Fetching 13 files:   8% 1/13 [00:00<00:02,  4.03it/s]

consolidated.00.pth:   0% 0.00/2.47G [00:00<?, ?B/s][A[A


tokenizer_config.json: 100% 54.5k/54.5k [00:00<00:00, 6.85MB/s]



special_tokens_map.json: 100% 296/296 [00:00<00:00, 2.23MB/s]



tokenizer.json:   0% 0.00/9.09M [00:00<?, ?B/s][A[A[A



tokenizer.model:   0% 0.00/2.18M [00:00<?, ?B/s][A[A[A[A




original/params.json: 100% 220/220 [00:00<00:00, 1.33MB/s]

model.safetensors:   0% 10

## Fine-tuning

In [13]:
config = """
# Model Arguments
model:
  _component_: torchtune.models.llama3_2.lora_llama3_2_1b
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  apply_lora_to_output: False
  lora_rank: 64
  lora_alpha: 128
  lora_dropout: 0.0

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: ./Llama-3.2-1B-Instruct/original/tokenizer.model
  max_seq_len: null

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./Llama-3.2-1B-Instruct
  checkpoint_files: [
    model.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ./checkpoints
  model_type: LLAMA3_2
resume_from_checkpoint: False
save_adapter_weights_only: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.instruct_dataset
  data_files: ./train_data.json
  source: json
  split: train
seed: 42
shuffle: True
batch_size: 1
#batch_size: 4

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  weight_decay: 0.01
  lr: 3e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss

# Training
epochs: 4
max_steps_per_epoch: null
gradient_accumulation_steps: 4
compile: False # set it to True for better memory and performance
# compile: True # set it to True for better memory and performance

# Logging
output_dir: ./logs
metric_logger:
  _component_: torchtune.training.metric_logging.TensorBoardLogger
  log_dir: ${output_dir}
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16

# Activations Memory
enable_activation_checkpointing: False
enable_activation_offloading: False

# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  #Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  #`torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  #trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 5
  active_steps: 2
  num_cycles: 1
"""

In [14]:
Path("custom_config.yaml").write_text(config)

2342

In [15]:
!mkdir checkpoints

In [16]:
!tune run lora_finetune_single_device --config "custom_config.yaml" epochs=4

INFO:torchtune.utils._logging:Running LoRAFinetuneRecipeSingleDevice with resolved config:

batch_size: 1
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./Llama-3.2-1B-Instruct
  checkpoint_files:
  - model.safetensors
  model_type: LLAMA3_2
  output_dir: ./checkpoints
  recipe_checkpoint: null
compile: false
dataset:
  _component_: torchtune.datasets.instruct_dataset
  data_files: ./train_data.json
  source: json
  split: train
device: cuda
dtype: bf16
enable_activation_checkpointing: false
enable_activation_offloading: false
epochs: 4
gradient_accumulation_steps: 4
log_every_n_steps: 1
log_peak_memory_stats: false
loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.training.metric_logging.TensorBoardLogger
  log_dir: ./logs
model:
  _component_: torchtune.mode

## Upload model

In [17]:
!mkdir hf_repo

In [18]:
!cp "checkpoints/hf_model_0001_3.pt" "hf_repo/pytorch_model.bin"
!cp "checkpoints/config.json" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/original/tokenizer.model" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/generation_config.json" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/tokenizer.json" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/tokenizer_config.json" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/special_tokens_map.json" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/LICENSE.txt" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/README.md" "hf_repo/"
!cp "Llama-3.2-1B-Instruct/USE_POLICY.md" "hf_repo/"

In [19]:
MODEL_PATH = "hf_repo"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")

## Evaluation

In [20]:
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    temperature=0.000001,
    return_full_text=False,
    model_kwargs={"torch_dtype": torch.bfloat16},
    pad_token_id=tokenizer.eos_token_id,
    device="cuda",
)

In [21]:
question = "What is the total premium for march 2023"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]

"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {'starting_period': '202403', 'ending_period': '202403'}, 'groupby': none}}"

In [None]:
# was wrong

In [22]:
question = "What is the total premium by gender"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]

"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {}, 'groupby': 'gender'}}"

In [None]:
# was correct

In [23]:
question = "What is the total premium?"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]

"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {}, 'groupby': none}}"

In [24]:
question = "getme premium by membership type for first quarter"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]

"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {'quarter': 'q1'}, 'groupby':'membership_type'}}"

In [27]:
question = "get me premium by membership type for female dependents for jan 2024"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]

"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {'starting_period': '202401', 'ending_period': '202401','membership_type': 'f', 'plan_network': 'nw6'}, 'groupby':'membership_type'}}"

In [26]:


question = "What is the premium from jan 2023 to may 2023 for female dependents by network type"
user_question = create_prompt(question)
prompt = [
    {"role": "user", "content": user_question},
]
output = pipe(prompt)
output[0]["generated_text"]


"{'name': 'get_metric', 'parameters': {'metric': 'get_premium', 'filters': {'starting_period': '202301', 'ending_period': '202305', 'gender': 'f','membership_type': 'dependents'}, 'groupby': 'plan_network'}}"

In [None]:
# was wrong

In [None]:
test_data = json.loads(Path("test_data.json").read_text())

In [None]:
prompt = test_data[4]["input"]
print(prompt)

Classify the text for one of the categories:

<text>
Has anyone been able to get on disability due to just your mental illnesses? I’ve tried three times, sent in all required paperwork and have been denied every time. I have severe depression, severe social and generalized anxiety/slight agoraphobia, bipolar, autism, ptsd and ADHD. I can’t work and it’s ruining my life
</text>

Choose from one of the category:
normal, depression, suicidal, anxiety, bipolar, stress, personality disorder
Only choose one category, the most appropriate one. Reply only with the category.


In [None]:
predictions = []
for example in tqdm(test_data):
    statement = example["input"]
    prompt = [
        {"role": "user", "content": statement},
    ]
    output = pipe(prompt)
    predictions.append(output[0]["generated_text"])

100%|██████████| 5052/5052 [10:29<00:00,  8.03it/s]


In [None]:
pd.Series(predictions).value_counts()

Unnamed: 0,count
suicidal,1017
depression,968
normal,968
anxiety,821
stress,549
bipolar,537
personality disorder,192


In [None]:
eval_df = pd.DataFrame.from_dict({"label": true_values, "prediction": predictions})
eval_df.head()

Unnamed: 0,label,prediction
0,depression,suicidal
1,depression,depression
2,suicidal,suicidal
3,anxiety,anxiety
4,bipolar,bipolar


In [None]:
accuracy_score(true_values, predictions)

0.8489707046714172

In [None]:
print(classification_report(eval_df.label, eval_df.prediction))

                      precision    recall  f1-score   support

             anxiety       0.88      0.92      0.90       787
             bipolar       0.93      0.88      0.91       565
          depression       0.76      0.74      0.75       988
              normal       0.95      0.94      0.94       981
personality disorder       0.84      0.81      0.82       200
              stress       0.84      0.84      0.84       552
            suicidal       0.78      0.81      0.79       979

            accuracy                           0.85      5052
           macro avg       0.85      0.85      0.85      5052
        weighted avg       0.85      0.85      0.85      5052



## References

- [Trained model on HF](https://huggingface.co/curiousily/Llama-3.2-1B-Mental-Health-Sentiment)
- [Sentiment analysis for mental health dataset](https://huggingface.co/datasets/btwitssayan/sentiment-analysis-for-mental-health)