#### 7.3 Open Models from the Hugging Face Model Hub

* In the following task, we are going to work with Llama, a Large Language Model provided by Meta. We want to use it to classify the same 30 texts as you and LEIA did already to compare sentiment classification performance.
* [Hugging Face](https://huggingface.co/) is a platform where different machine learning models are distributed. We are going to use the Llama Model via this page.
* Since language models can get quite big, it makes sense to use these with a GPU for faster loading and inference. Therefore, we need a provider of GPU access. Options are either [GoogleColab](https://colab.research.google.com) or [Kaggle](https://www.kaggle.com/).
* This guide is going to outline the use via Kaggle, but you are free to use other services for GPU access.



1) Create an account and sign into [Kaggle](https://www.kaggle.com/account/login)
2) Create an account and sign into [Hugging Face](https://huggingface.co/)
    * Here, you need to generate an [API Key on Hugging Face](https://huggingface.co/settings/tokens) in order to be able to access the models (see screenshot below)
    * Save the access token in a .txt file 

3) We are going to use the Llama Model in Version 3.1 with 8 billion parameters. To do so, you need to request the access from Meta via the [models hugging face page](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). On the linked page, you should find a box telling you that you need to request model access. To be granted access should happen within a couple of minutes.

4) To use the model, we now need to move to Kaggle. After you logged in, check out the sidebar on the left hand, where you can find a menu option `Code`. Select it and click on `+ New Notebook`. After the notebook got opened, navigate to the top left, select `File` and `Import Notebook`. Upload this notebook.
 * On Kaggle, you have 30h of GPU use per 
 * To start GPU usage, navigate to the right. In the `Session options` under `Accelerator` you can use select a GPU to use. To save compute time, make sure to turn this usage off, after you are done (but be aware that turning on and off stops and restarts your session (kernel restart)).
 * To later download your notebook, again use `File` and `Download Notebook`.

6) Next, we need to install the hugging face package for python to be able to load the model. To get the access, we have to authenticate via the API key, we generated on Hugging Face.
* Code to install the packages is provided below.



In [1]:
# install packages

!pip install huggingface_hub
!pip install torch 
!pip install accelerate 
!pip install transformers 
!pip install bitsandbytes
!pip install -U transformers



* Kaggle offers a way to load your API Key in a save way. To do so, navigate to the top and `Add-ons` where you select `Secrets`. In the panel, which opens on the right, you can add a new secret token. Name it i.e. 'Hugging Face' and enter your API key. Kaggle provides you the code snippet to load this token in the notebook. Depending on the name you assigned to the key, it should look like this:

In [2]:
# load API key in a save way

from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
HF_key = user_secrets.get_secret("Hugging Face")

* To log into Hugging Face from this Kaggle page, we use the log in client from the `huggingface_hub` package

In [3]:
# log in

from huggingface_hub import login
login(token = HF_key)

7) The last thing is that we have to import the datasets, which we need to use. To do so, navigate to the right. In the section `Inputs`, you select `Upload` and `New Dataset`. Select the csv's you want to upload, assign a name and upload.
* To import the data, Kaggle offers an option to directly copy the file path to the clipbord (hover over the dataset's name and you see an option `copy file path` on the right hand side)


In [7]:
# imports 

from transformers import pipeline
import os
import transformers
from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          BitsAndBytesConfig)
import torch
import pandas as pd
import bitsandbytes as bnb

In [9]:
# empty the memory and check if the GPU is available

torch.cuda.empty_cache()
torch.cuda.is_available()

True

In [None]:
# import data

df = pd.read_csv("/kaggle/input/data-with-labels/data_with_labels.csv")

df = df.sample(frac=1, random_state=42).reset_index(drop=True)
df.head(2)

sents = df['text'].values

sample_text = sents[25]
sample_text

8. Now, we can finally load and use the model

In [10]:
# specify the models name
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# add the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# quantization options to compress the model to that it fits with the memory
quantization_config = BitsAndBytesConfig(
    load_in_4bit = True, 
    bnb_4bit_quant_type = 'nf4', 
    bnb_4bit_use_double_quant = True,
    bnb_4bit_compute_dtype = torch.bfloat16
)

# load the model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config = quantization_config # with quantization
)

# instantiate a pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True
)


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/184 [00:00<?, ?B/s]

Device set to use cuda:0


In [19]:
# here, we specify the input consisting of a system prompt (which gived the model general instructions on how to behave)
# some user prompts with assistant return which serve as examples
# and the final user input with the current text to classify into sentiments.

messages = [
    # system prompt
    {"role": "system", "content": """You are now Emotbot. Emotbot answers with only one of the following words: Sadness, Affection, Fear, Happiness, Anger. 
                                    If Emotbot answers with anything else they have failed. Emotbot cannot fail. Emotbot will be provided with short pieces of text from the website 'Vent'.
                                    'Vent' is a website where users can share their feelings to an anonymous audience.
                                    When writing the 'Vent' text, the authors selected 1 of the following emotions to represent what they were feeling: 'Sadness', 'Affection', 'Fear', 'Happiness', or 'Anger'.
                                    Emotbot will read and analyse the text and predict which of the 5 feelings the author had selected."""},
    
    # example 1
    {"role": "user", "content": "I hate fuckin every single person on this fuckin planet. Someone kill me pls"},
    {"role": "assistant", "content": "Sadeness"},

    # example 2
    {"role": "user", "content": "Best day I have had in a long time :)"},
    {"role": "assistant", "content": "Happiness"},

    # example 3
    {"role": "user", "content": "boy I like called me princess He’s so precious"},
    {"role": "assistant", "content": "Affection"},

    # actual text to classify
    {"role": "user", "content": sample_text},
    
    # no assistant role
    
]

In [20]:
# generate the putput
outputs = pipeline(
    messages,
    max_new_tokens=10,
)

print(outputs)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[{'generated_text': [{'role': 'system', 'content': "You are now Emotbot. Emotbot answers with only one of the following words: Sadness, Affection, Fear, Happiness, Anger. \n    If Emotbot answers with anything else they have failed. Emotbot cannot fail.\n    Emotbot will be provided with short pieces of text from the website 'Vent'.\n    'Vent' is a website where users can share their feelings to an anonymous audience.\n    When writing the 'Vent' text, the authors selected 1 of the following emotions to represent what they were feeling: 'Sadness', 'Affection', 'Fear', 'Happiness', or 'Anger'.\n    Emotbot will read and analyse the text and predict which of the 5 feelings the author had selected."}, {'role': 'user', 'content': 'I hate fuckin every single person on this fuckin planet. Someone kill me pls'}, {'role': 'assistant', 'content': 'Sadeness'}, {'role': 'user', 'content': 'Best day I have had in a long time :)'}, {'role': 'assistant', 'content': 'Happiness'}, {'role': 'user', 'c

In [28]:
label = outputs[0]['generated_text'][-1]['content']
print(f"The predicted label for '{sample_text}' is: {label}")

The predicted label for 'Great now my head hurts too' is: Anger


#### 7.4 Comparison
* Compare the performance of the two models (Leia and LLama), with each other, as well as with the quality of your annotation using the metrics introduced in part 1 (accuracy, precision, recall, f1 score) or other metrics you find interesting. Create informative visualizations to aid the comparison.
* Discuss your results. 
* Are the models accurately predicting human emotions?
* Which approach seems to work better? Why?