# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [2]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
!pip install huggingface_hub
!pip install flask



In [5]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)
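The cells below depend on the versions pinned in the pip cell. Newer trl releases moved or renamed some of the `SFTTrainer` arguments used here, such as `dataset_text_field` and `tokenizer`. A small standard-library check can confirm which versions are actually installed. This is a minimal sketch, and `check_pins` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip cell of Step 1
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "bitsandbytes": "0.40.2",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}

def check_pins(pins):
    """Return {package: (expected, installed)} for every package whose installed
    version differs from its pin; `installed` is None if the package is missing."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches

print(check_pins(PINNED) or "All pinned versions are installed.")
```

An empty result means the installed distributions match the pins. It does not detect modules that were already imported before the install, so restarting the runtime is still the safe option.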

## Step 2: Loading the model

In [None]:
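# Load the model with 4-bit NF4 quantization to reduce GPU memory; matrix multiplications run in float16.
llama_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2",
                                                   quantization_config = BitsAndBytesConfig(load_in_4bit = True,
                                                                                            bnb_4bit_compute_dtype = torch.float16,
                                                                                            bnb_4bit_quant_type = "nf4"))
llama_model.config.use_cache = False   # the KV cache speeds up generation but is not needed for training
llama_model.config.pretraining_tp = 1  # use the standard (non-sliced) linear-layer computation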



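When this cell runs, it downloads two checkpoint shards of about 9.98 GB and 3.50 GB. Assuming those shards store fp16 weights (2 bytes per parameter), a back-of-the-envelope calculation shows roughly how many parameters the model has and how much 4-bit loading saves. The estimate ignores quantization constants and the layers bitsandbytes leaves unquantized (such as the embeddings and output head), so treat it as a rough figure:

```python
# Shard sizes shown while downloading the checkpoint (decimal gigabytes)
shard_sizes_gb = [9.98, 3.50]

BYTES_PER_PARAM_FP16 = 2   # assumption: the checkpoint stores fp16 weights
BYTES_PER_PARAM_NF4 = 0.5  # 4 bits per weight, ignoring quantization constants

checkpoint_bytes = sum(shard_sizes_gb) * 1e9
num_params = checkpoint_bytes / BYTES_PER_PARAM_FP16
nf4_gb = num_params * BYTES_PER_PARAM_NF4 / 1e9

print(f"~{num_params / 1e9:.2f}B parameters")
print(f"fp16 weights: ~{checkpoint_bytes / 1e9:.2f} GB, 4-bit NF4 weights: ~{nf4_gb:.2f} GB")
```

About 6.74B parameters is consistent with a Llama 2 7B base, and the weights shrink from roughly 13.5 GB to roughly 3.4 GB.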

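## Step 3: Loading the tokenizer

In [None]:
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2")
llama_tokenizer.pad_token = llama_tokenizer.eos_token  # Llama 2 defines no pad token, so reuse EOS for padding
llama_tokenizer.padding_side = "right"                 # pad on the right for causal-LM fine-tuning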

## Step 4: Setting the training arguments

In [None]:
training_arguments = TrainingArguments(output_dir = "./results", per_device_train_batch_size = 4, max_steps = 100)
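Because `max_steps` is set, the Trainer ignores `num_train_epochs` and stops after 100 optimizer steps regardless of dataset size. The following calculation shows how many training examples that covers. It assumes a single GPU and the default `gradient_accumulation_steps = 1`, and `epochs_covered` is a hypothetical helper:

```python
# Values from the TrainingArguments cell
max_steps = 100
per_device_train_batch_size = 4
gradient_accumulation_steps = 1  # TrainingArguments default
num_devices = 1                  # assumption: a single Colab GPU

# Each optimizer step consumes one effective batch
examples_seen = max_steps * per_device_train_batch_size * gradient_accumulation_steps * num_devices

def epochs_covered(dataset_rows):
    """Approximate number of passes over a dataset with `dataset_rows` rows."""
    return examples_seen / dataset_rows

print(examples_seen)  # 400
```

Calling `epochs_covered(len(dataset))` on the loaded training split gives the approximate number of passes that 100 steps correspond to.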

## Step 5: Creating the Supervised Fine-Tuning trainer

In [None]:
llama_sft_trainer = SFTTrainer(model = llama_model,
                               args = training_arguments,
                               train_dataset = load_dataset(path = "vivekdugale/llama2_filtered_dataset_amod_helios_450", split = "train"),
                               tokenizer = llama_tokenizer,
                               peft_config = LoraConfig(task_type = "CAUSAL_LM", r = 64, lora_alpha = 16, lora_dropout = 0.1),
                               dataset_text_field = "text")
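This `LoraConfig` sets only the rank, scaling, and dropout. PEFT should therefore fall back to its default target modules for Llama models, which to my knowledge are the attention `q_proj` and `v_proj` projections. Assuming Llama 2 7B dimensions (hidden size 4096, 32 decoder layers), the number of trainable adapter parameters and the LoRA scaling factor work out as follows:

```python
# Assumed Llama 2 7B dimensions
hidden_size = 4096
num_layers = 32
target_modules = ["q_proj", "v_proj"]  # assumption: PEFT's default LoRA targets for Llama

# From the LoraConfig in the SFTTrainer cell
r = 64
lora_alpha = 16

def lora_params(d_in, d_out, rank):
    """A LoRA adapter adds A (rank x d_in) and B (d_out x rank) next to a frozen weight."""
    return rank * d_in + d_out * rank

# q_proj and v_proj are both hidden_size x hidden_size in Llama 2 7B
per_layer = sum(lora_params(hidden_size, hidden_size, r) for _ in target_modules)
trainable = per_layer * num_layers
scaling = lora_alpha / r  # the low-rank update B @ A is multiplied by this factor

print(f"trainable LoRA parameters: {trainable:,}")  # 33,554,432
print(f"scaling alpha/r = {scaling}")               # 0.25
```

About 33.5M trainable parameters is well under 1% of the roughly 6.7B frozen weights. PEFT's own count, via `print_trainable_parameters()` on the wrapped model, should confirm the figure.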



## Step 6: Training the model

In [None]:
llama_sft_trainer.train()

## Step 7: Chatting with the model

In [None]:
user_prompt = "I'm feeling down at work nowadays, and no one values me. I'm at a difficult place & I'm feeling depressed"
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 1000)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])




<s>[INST] I'm feeling down at work nowadays, and no one values me. I'm at a difficult place & I'm feeling depressed [/INST]  Sorry to hear that you're feeling down and unvalued at work. It can be really tough to feel like you're not appreciated or recognized for your efforts, especially when you're putting in a lot of hard work. Here are some suggestions that may help you cope with these feelings:

1. Talk to someone: It can be helpful to talk to a trusted friend, family member, or mental health professional about how you're feeling. They can offer emotional support, provide new perspectives, or help you develop a plan to address the issues at work.
2. Identify the source of the problem: Try to pinpoint the reasons why you feel unvalued at work. Is it a specific person or situation, or is it a more general feeling? Once you understand the source of the problem, you may be able to come up with ways to address it.
3. Focus on your strengths: Rather than dwelling on your perceived shortco

In [None]:
user_prompt = "Talking about your second point, it is more of a general feeling of being underappreciated & overworked."
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 1000)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

<s>[INST] Talking about your second point, it is more of a general feeling of being underappreciated & overworked. [/INST]  Ah, I see. Yes, feeling underappreciated and overworked can be a common experience for many people, regardless of their job or profession. It can be especially challenging when you feel like you're putting in a lot of effort and time into your work, but not getting the recognition or appreciation you feel you deserve.

Here are some additional thoughts on this topic:

1. It's important to communicate your concerns: If you're feeling underappreciated and overworked, it's important to communicate your concerns with your supervisor or manager. They may not be aware of how you're feeling, and addressing the issue directly can help to find a solution.
2. Set boundaries: It's important to set boundaries around your work and personal life to help manage your workload and prevent burnout. This may involve prioritizing your tasks, learning to say no to additional responsib

In [None]:
user_prompt = "I have tried communicating with my peers and superiors at work, and honestly it hasn't gotten me anywhere, which is why I'm coming to you with this. I would like it for you to try & identify my issue & suggest some measure to overcome it."
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 1000)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

<s>[INST] I have tried communicating with my peers and superiors at work, and honestly it hasn't gotten me anywhere, which is why I'm coming to you with this. I would like it for you to try & identify my issue & suggest some measure to overcome it. [/INST]  Thank you for reaching out and sharing your concerns. I'm here to help you identify the issue and suggest some measures to overcome it.

To better understand your situation, could you please provide some more details about your workplace communication challenges? For example, what kind of communication are you trying to achieve (e.g., resolving conflicts, building relationships, sharing ideas, etc.)? Are there any specific situations or people that are causing the issue? Have you tried different communication strategies or approaches in the past?

Once I have a better understanding of your situation, I can help you identify the root cause of the issue and suggest some potential solutions. These might include improving your communica

In [None]:
user_prompt = "I have tried to build relationships with my coworkers, and I do try to avoid conflicts & resolve them without escalation. But in terms of ideas, mine are always undervalued. I have tried my best to bring changes within myself but it still seems like a losing battle. Can you please help me?"
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 1000)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])



<s>[INST] I have tried to build relationships with my coworkers, and I do try to avoid conflicts & resolve them without escalation. But in terms of ideas, mine are always undervalued. I have tried my best to bring changes within myself but it still seems like a losing battle. Can you please help me? [/INST]  It sounds like you are facing a common challenge in the workplace: feeling undervalued and unheard when it comes to your ideas and contributions. Here are some suggestions that may help you address this issue:

1. Communicate your ideas clearly and assertively: When you have an idea, make sure to communicate it clearly and confidently. Use "I" statements to express your thoughts and avoid blaming or attacking language. For example, instead of saying "You never listen to me," say "I have an idea that I think could improve our team's performance."
2. Seek feedback and support from your coworkers: Ask your coworkers for their thoughts on your ideas and seek their support. This can hel
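The follow-up prompts refer to earlier answers (for example "your second point"). However, each call wraps only the newest message in `[INST] ... [/INST]`, so the model answers without seeing the earlier turns. The sketch below folds the conversation history into a multi-turn prompt. It assumes the fine-tuned model follows the standard Llama 2 chat layout of one `<s>[INST] user [/INST] reply </s>` block per completed turn, and `build_llama2_prompt` is a hypothetical helper, not part of transformers:

```python
def build_llama2_prompt(history, new_message):
    """Build a multi-turn Llama 2 prompt.

    history: list of (user_message, model_reply) pairs from earlier turns.
    new_message: the user message the model should answer next.
    """
    prompt = ""
    for user_message, model_reply in history:
        # Each completed turn is wrapped as <s>[INST] user [/INST] reply </s>
        prompt += f"<s>[INST] {user_message} [/INST] {model_reply} </s>"
    # The open turn ends after [/INST] so the model continues with its reply
    prompt += f"<s>[INST] {new_message} [/INST]"
    return prompt

history = [("first user message", "first model reply")]
print(build_llama2_prompt(history, "follow-up message"))

# Hypothetical usage with the pipeline created above; return_full_text=False keeps only the new reply:
# reply = text_generation_pipeline(build_llama2_prompt(history, user_prompt), return_full_text=False)[0]["generated_text"]
# history.append((user_prompt, reply))
```

Note that `max_length = 1000` in the pipeline counts prompt tokens as well. A growing history therefore leaves less room for the reply, whereas `max_new_tokens` bounds only the generated part.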

## Sentiment analysis of the user prompt

In [9]:
import torch
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# RoBERTa-base model fine-tuned on tweets for 3-class sentiment (0 = negative, 1 = neutral, 2 = positive).
# Separate variable names keep the Llama tokenizer and model from being overwritten.
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
sentiment_tokenizer = AutoTokenizer.from_pretrained(MODEL)
sentiment_model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def polarity_scores_roberta(example):
    """Return the softmax probabilities of the negative, neutral and positive classes."""
    encoded_text = sentiment_tokenizer(example, return_tensors='pt')
    with torch.no_grad():  # inference only, so skip gradient tracking
        output = sentiment_model(**encoded_text)
    scores = softmax(output.logits[0].numpy())
    return {
        'roberta_neg' : scores[0],
        'roberta_neu' : scores[1],
        'roberta_pos' : scores[2]
    }

example = "I'm feeling down at work nowadays, and no one values me. I'm at a difficult place & I'm feeling depressed"

print(polarity_scores_roberta(example))

{'roberta_neg': 0.9519865, 'roberta_neu': 0.044013172, 'roberta_pos': 0.0040003955}
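Softmax turns the three logits into probabilities that sum to 1, so collapsing the result into a single label is just an argmax over the dictionary. The sketch below reuses the scores printed above with a hypothetical `top_label` helper:

```python
# Scores printed above for the first chat prompt (softmax probabilities)
scores = {'roberta_neg': 0.9519865, 'roberta_neu': 0.044013172, 'roberta_pos': 0.0040003955}

def top_label(scores_dict):
    """Return the highest-probability class name without the 'roberta_' prefix."""
    best = max(scores_dict, key=scores_dict.get)
    return best.removeprefix("roberta_")  # str.removeprefix needs Python 3.9+

print(top_label(scores))               # neg
print(round(sum(scores.values()), 6))  # 1.0
```

The same helper can be applied to the output of `polarity_scores_roberta` for each prompt in the chat section.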
