# load data

In [None]:
!pip install -U accelerate
!pip install -U transformers


In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In this notebook, you will work with a Large Language Model (LLM) and explore how it can help you solve various problems.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
import string
import math

In [None]:
train_df = pd.read_json("/content/drive/MyDrive/imdb/train_imdb.jsonl", lines=True)

In [None]:
test_df = pd.read_json("/content/drive/MyDrive/imdb/test_imdb.jsonl", lines=True)

In [None]:
unlabeled_df = pd.read_json("/content/drive/MyDrive/imdb/aug_imdb_unlabeled.jsonl", lines=True)

# Loading Model

We will be using Phi-3 as our LLM.

In [None]:
MODEL_ARGS = {
    'Name': 'microsoft/Phi-3-mini-128k-instruct',
    'DType': torch.bfloat16,  # a torch.dtype object, passed to from_pretrained as torch_dtype
}
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
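def load_model(model_args):
    """Load the causal LM and its tokenizer described by `model_args`."""
    model = AutoModelForCausalLM.from_pretrained(
        model_args['Name'],
        trust_remote_code=True,  # the model repository ships its own modeling code
        torch_dtype=model_args['DType'],  # already a torch.dtype, so no "torch." prefix is needed
        low_cpu_mem_usage=True,
        device_map={"": device},  # place the whole model on the selected device
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_args['Name'],
        trust_remote_code=True,
    )

    return model, tokenizer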

In [None]:
model, tokenizer = load_model(MODEL_ARGS)



# LLM label generation

In [None]:
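def generate_text(model, tokenizer, prompt, max_new_tokens=100, do_sample=True, temperature=0.5):
    """Generate a continuation of `prompt` and return only the newly generated text."""
    # Tokenize the prompt and move the token ids to the model's device.
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
    if do_sample:
        # Sampling: temperature rescales the token distribution before drawing.
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=True, temperature=temperature)
    else:
        # Deterministic decoding: temperature is not used, so it is not passed on.
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=do_sample)

    # The decoded sequence contains the prompt followed by the continuation.
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Strip the prompt by character count; this assumes the decoded text
    # reproduces the prompt verbatim.
    return output_text[len(prompt):]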

Let's break down this function:

**Arguments**:

* **model**: The language model used for text generation.
* **tokenizer**: The tokenizer that converts text to tokens and vice versa.
* **prompt**: The initial text input that the model will build upon.
* **max_new_tokens**: The maximum number of new tokens to generate.
* **do_sample**: Whether to sample the next token or use deterministic decoding.
* **temperature**: Controls the randomness of sampling; higher values produce more diverse outputs (more model "creativity").

**Functionality**:

The `generate_text` function continues a given starting prompt using a language model and its tokenizer. It first converts the prompt into tokens (the integer ids the model understands), then generates additional tokens to continue the text. Depending on `do_sample`, the next token is either sampled at random or chosen deterministically. Finally, it converts the tokens back into readable text and returns only the part that extends beyond the original prompt.
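To make the effect of `do_sample` and `temperature` more concrete, here is a minimal sketch of next-token selection on made-up scores. The helper names (`softmax_with_temperature`, `pick_next_token`) and the toy logits are assumptions for illustration only; the real logic lives inside `model.generate` and is more involved.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_token(logits, do_sample=True, temperature=0.5, rng=random):
    """Choose a token index from hypothetical next-token scores."""
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    # Sampling: draw one index according to the rescaled probabilities.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a 3-token vocabulary
greedy_choice = pick_next_token(toy_logits, do_sample=False)
sharp = softmax_with_temperature(toy_logits, 0.5)  # low temperature
flat = softmax_with_temperature(toy_logits, 2.0)   # high temperature
print(greedy_choice, sharp, flat)
```

With a low temperature most of the probability mass sits on the top-scoring token, so sampled outputs stay close to the greedy choice; with a high temperature the distribution flattens and outputs become more varied.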

LLMs can learn from their prompts: you can give the model examples, or guide it and teach it how to solve the problem.

## zero-shot

In [None]:
text = ""
templete_promt = f"""Question: Is sentiment of this sentence positive if answer is yes say 1 and if is no sasy 0,{text}?
Answer:"""

In [None]:
templete_promt

'Question: Is sentiment of this sentence positive if answer is yes say 1 and if is no sasy 0,?\nAnswer:'

In [None]:

# prompt = """Question: Is sentiment of this sentence positive if answer is yes say 1 and if is no sasy 0,it was last time i saw this movie?
# Answer:"""

generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
)

' 1\n\nSentence: This product seems very useful and its performance is good.'

In [None]:
train_df.head(10)

Unnamed: 0,text,label,embedding,cleaned
0,fairly good romantic comedy in which i do not ...,1,"[-0.0167805497, -0.0395836979, 0.1233159453000...",fairly good romantic comedy think see meg look...
1,"""dressed to kill"", is one of the best thriller...",1,"[-0.12526972590000002, 0.10147688540000001, 0....",""" dressed kill "" , good thriller . dealing sex..."
2,i'm glad that users (as of this date) who like...,1,"[0.1312361956, 0.0294876788, 0.2328549027, -0....",glad user ( date ) like movie come forward . u...
3,needed an excuse to get out of the house while...,0,"[0.1387384981, 0.0460377187, 0.3447172046, -0....",need excuse house paint dry - leave movie hour...
4,john candy is performance in once upon a crime...,1,"[0.1606466323, -0.1768193543, 0.35633808370000...",john candy performance crime possibly good . f...
5,"this movie maybe really bad, but it is alot of...",1,"[0.0058481544000000005, -0.1326265633, 0.18759...","movie maybe bad , alot fun . bad acting poor d..."
6,"besides being boring, the scenes were oppressi...",0,"[0.1397939026, -0.021955709900000002, 0.169891...","boring , scene oppressive dark . movie try por..."
7,this is exactly the sort of saturday matinee s...,0,"[0.0405324325, 0.0174655784, 0.1514206231, 0.0...",exactly sort saturday matinee serial love worl...
8,"very slick, very pre-hays code, and still very...",1,"[0.039149038500000004, -0.0511390641, 0.348277...","slick , pre - hay code , sassy . highly recomm..."
9,i like this film a lot. it has a wonderful che...,1,"[-0.1012975574, 0.12617483740000002, 0.0768608...",like film lot . wonderful chemistry actor tell...


In [None]:
from tqdm import tqdm

In [None]:
sentiments_by_llm_zero_shot = []
label = []
for i,j in tqdm(train_df[:].iterrows()):
  text = (j["text"])

  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negatice,{text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_zero_shot.append(sentiment_result)
  label.append(int(j["label"]))  # select the label by column name instead of by position



150it [03:16,  1.31s/it]
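The loop above treats every response that does not contain the lowercase word "positive" as negative, so a capitalized or ambiguous answer is silently counted as 0. A stricter parser could look like the sketch below. `parse_sentiment` is a hypothetical helper, not part of the notebook, and switching to it may change the scores reported here.

```python
import re

def parse_sentiment(response):
    """Map a raw model response to 1 (positive), 0 (negative) or None (unclear)."""
    # Look only at the first non-empty line so that any extra text the model
    # keeps generating after its answer is ignored.
    lines = [ln for ln in response.strip().splitlines() if ln.strip()]
    first_line = lines[0].lower() if lines else ""
    has_pos = re.search(r"\bpositive\b", first_line) is not None
    has_neg = re.search(r"\bnegative\b", first_line) is not None
    if has_pos and not has_neg:
        return 1
    if has_neg and not has_pos:
        return 0
    return None  # empty, off-topic, or contradictory answer

print(parse_sentiment(" Positive\n\nSentence: this was a negative experience"))
print(parse_sentiment("negative"))
print(parse_sentiment(""))
```

Returning `None` for unclear answers makes it possible to count or re-prompt them instead of folding them into the negative class.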


In [None]:
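# Print every review whose predicted label differs from the true label.
for k, (i, j) in enumerate(zip(label, sentiments_by_llm_zero_shot)):
  if(i != j):
    print(train_df.iloc[k]["text"])
    print(i,j)  # true label, predicted label
    print("\n\n\n\n")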

whenever people ask me to name the scariest movie i've ever seen, i invariably reply "black noon" and to this day nobody is ever heard of it.i watched it alone some 30 years ago at the tender age of 13 when my parents had gone out for the evening. as far as i know its only ever been shown once in the uk and sadly is unavailable on dvd or vhs.if anyone can trace a copy please let me know.if i watched it again now it would probably be a big disappointment but it has always stuck in my memory as a particularly disturbing little film!
1 0





for those who commented on the patriot as being accurate, (which basically satanised the english), it was interesting to see this film. by all accounts this was the bloodiest war that americans have ever been involved in, and they were the only nationality present. it was therefore very refreshing to see something resembling historical accuracy coming from that side of the atlantic that did not paint america as either martyrs or saviours. all in all 

In [None]:
from sklearn.metrics import classification_report
print(classification_report(label, sentiments_by_llm_zero_shot))

              precision    recall  f1-score   support

           0       0.89      0.97      0.93        67
           1       0.97      0.90      0.94        83

    accuracy                           0.93       150
   macro avg       0.93      0.94      0.93       150
weighted avg       0.94      0.93      0.93       150
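As a reminder of how the per-class numbers in such a report arise, the sketch below computes precision, recall and F1 by hand on a small made-up example. `binary_report` and the toy lists are assumptions for illustration, not data from this notebook. scikit-learn's `classification_report` takes the true labels first and the predictions second; the last line shows what happens when the two are swapped.

```python
def binary_report(y_true, y_pred, positive=1):
    """Precision, recall, F1 and support for one class, computed by counting."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Support is the number of true members of the class.
    return {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

toy_true = [1, 1, 1, 0, 0, 1, 0, 1]  # made-up ground truth
toy_pred = [1, 0, 0, 0, 1, 1, 0, 1]  # made-up predictions

report_pos = binary_report(toy_true, toy_pred)
report_swapped = binary_report(toy_pred, toy_true)  # arguments in the wrong order
print(report_pos)
print(report_swapped)
```

Swapping the arguments exchanges precision and recall and makes the support column count predicted rather than true labels, while F1 for the class and the overall accuracy stay the same.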



In [None]:
train_df.describe()

Unnamed: 0,label
count,150.0
mean,0.553333
std,0.498813
min,0.0
25%,0.0
50%,1.0
75%,1.0
max,1.0


In [None]:
sentiments_by_llm_zero_shot = []
label = []
for i,j in tqdm(test_df[:].iterrows()):
  text = (j["text"])

  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negatice,{text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_zero_shot.append(sentiment_result)
  label.append(int(j["label"]))  # select the label by column name instead of by position



In [None]:

print(classification_report(label, sentiments_by_llm_zero_shot))

              precision    recall  f1-score   support

           0       0.94      0.96      0.95        78
           1       0.96      0.93      0.94        72

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150



In [None]:
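# Start from an empty list: the loop appends to this name, and it is the list
# that gets pickled afterwards, so it must not still hold the test predictions.
sentiments_by_llm_zero_shot = []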
for i,j in tqdm(unlabeled_df[:].iterrows()):
  text = (j["text"])

  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negatice,{text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_zero_shot.append(sentiment_result)



In [None]:
import pickle

In [None]:
def save_data(data):
    # Pickle the object passed in rather than a global variable.
    with open("sb_llm_0shot.dat", "wb") as f:
        pickle.dump(data, f)

In [None]:
with open('sb_llm_0shot.pkl', 'wb') as file:

    # A new file will be created
    pickle.dump(sentiments_by_llm_zero_shot, file)

In [None]:
with open('sb_llm_0shot.pkl', 'rb') as file:

    # Call the load method to deserialize the saved list
    labels_aug_df = pickle.load(file)

    print(labels_aug_df)
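The save and load steps can also be wrapped in two small helpers so that the file path and the data are always passed explicitly. This is only a sketch: `save_labels`, `load_labels`, the temporary file name and the toy list are hypothetical and do not appear elsewhere in the notebook.

```python
import os
import pickle
import tempfile

def save_labels(labels, path):
    """Pickle the list of labels to `path`."""
    with open(path, "wb") as f:
        pickle.dump(labels, f)

def load_labels(path):
    """Load a list of labels previously written by save_labels."""
    with open(path, "rb") as f:
        return pickle.load(f)

toy_labels = [1, 0, 1, 1]  # made-up predictions, not the notebook's output
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_labels.pkl")
    save_labels(toy_labels, toy_path)
    restored = load_labels(toy_path)

print(restored)
```

Passing the list as an argument avoids accidentally pickling a different global variable than the one intended.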

## one-shot

In [None]:
# Collect the one-shot predictions in their own list.
sentiments_by_llm_one_shot_unlabeld = []
for i,j in tqdm(unlabeled_df[:].iterrows()):
  text = (j["text"])

  # One labelled example precedes the target sentence.
  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negative
                        i will say an example
                        example is i like this movie and result is positive,
                        target sentence is {text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_one_shot_unlabeld.append(sentiment_result)



## few-shot

Here we take, for each label, the sentence closest to the average and write a chain of thought for it.

In [None]:
sentiments_by_llm_few_shot_unlabeld = []
label = []
for i,j in tqdm(train_df[:].iterrows()):
  text = (j["text"])

  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negative
                        i will say two examples
                        first example is i like this movie sentence result is  and result is positive
                        second example is i hate this movie sentence result is  and result is negative ,
                        target sentence is{text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_few_shot_unlabeld.append(sentiment_result)
  label.append(int(j["label"]))  # select the label by column name instead of by position



In [None]:

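# Note: classification_report expects (y_true, y_pred). The predictions are
# passed first here, so compared with the zero-shot reports precision and
# recall are swapped and "support" counts predicted labels.
print(classification_report(sentiments_by_llm_few_shot_unlabeld, label))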

              precision    recall  f1-score   support

           0       0.96      0.90      0.93        71
           1       0.92      0.96      0.94        79

    accuracy                           0.93       150
   macro avg       0.94      0.93      0.93       150
weighted avg       0.93      0.93      0.93       150



In [None]:
# Reset both lists so that the report in the next cell is computed on the
# test set instead of reusing the train predictions and labels.
sentiments_by_llm_few_shot_unlabeld = []
label = []
for i,j in tqdm(test_df[:].iterrows()):
  text = (j["text"])

  templete_promt = f"""Question: Is sentiment of this sentence positive or negative just answer in one word positive or negative
                        i will say two examples
                        first example is i like this movie sentence result is  and result is positive
                        second example is i hate this movie sentence result is  and result is negative ,
                        target sentence is{text}
  Answer:"""
  res = generate_text(
    model=model,
    tokenizer=tokenizer,
    prompt=templete_promt,
    max_new_tokens=70,
    do_sample=False,
    temperature=0.0,
  )

  if("positive" in res):
    sentiment_result = 1
  else:
    sentiment_result = 0
  sentiments_by_llm_few_shot_unlabeld.append(sentiment_result)
  label.append(int(j["label"]))



In [None]:

print(classification_report(sentiments_by_llm_few_shot_unlabeld, label))




***As the classification reports show, zero-shot and few-shot prompting both reached 95% accuracy on the test dataframe and 93% on the train dataframe. We therefore chose the zero-shot prompt, because it is faster than the others while being just as accurate.***

In the context of large language models (LLMs), zero-shot, one-shot, and few-shot prompting refer to different ways of providing context or examples to the model to help it generate the desired response. These methods differ in the amount of example data provided to the model for understanding the task at hand.

### Zero-Shot Prompting

**Zero-shot prompting** involves asking the model to perform a task without giving any examples. The model relies solely on its pre-trained knowledge to understand and generate a response based on the task described in the prompt.

**Example:**
- **Prompt:** "Translate the following English sentence to French: 'How are you?'"
- **Response:** "Comment ça va?"

In this case, the model generates the translation based purely on its training without seeing any specific examples of translations in the prompt.

### One-Shot Prompting

**One-shot prompting** involves providing the model with one example to help it understand the task before generating a response.

**Example:**
- **Prompt:** "Translate the following English sentence to French. Example: 'Good morning.' -> 'Bonjour.' Now, translate: 'How are you?'"
- **Response:** "Comment ça va?"

Here, the single example "Good morning." -> "Bonjour." helps the model understand that it should translate the sentence from English to French.

### Few-Shot Prompting

**Few-shot prompting** involves giving the model a few examples (typically 2-5) to illustrate the task. This provides a clearer pattern for the model to follow.

**Example:**
- **Prompt:** "Translate the following English sentences to French. Examples: 'Good morning.' -> 'Bonjour.' 'Good night.' -> 'Bonne nuit.' Now, translate: 'How are you?'"
- **Response:** "Comment ça va?"

In this case, the multiple examples help reinforce the pattern of translating English sentences to French, making it more likely that the model will generate the correct translation.

### Comparison

- **Zero-shot prompting**: Relies on the model's inherent knowledge without any specific examples.
- **One-shot prompting**: Provides one example to guide the model.
- **Few-shot prompting**: Provides a few examples to establish a clear pattern for the model.
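
The three styles differ only in how many worked examples precede the target input, which a single prompt builder can express. The sketch below is one possible layout; `build_prompt`, its wording and the sample sentences are assumptions and not the prompts used in this notebook's experiments.

```python
def build_prompt(text, examples=()):
    """Build a sentiment prompt; `examples` is a sequence of (sentence, label) pairs."""
    parts = [
        "Classify the sentiment of the review as positive or negative. "
        "Reply with one word."
    ]
    # Each example is shown in the same format the model is asked to complete.
    for sentence, label in examples:
        parts.append(f"Review: {sentence}\nAnswer: {label}")
    # The target review ends with an open "Answer:" for the model to fill in.
    parts.append(f"Review: {text}\nAnswer:")
    return "\n\n".join(parts)

target = "the plot was dull"
zero_shot = build_prompt(target)
one_shot = build_prompt(target, [("i like this movie", "positive")])
few_shot = build_prompt(
    target,
    [("i like this movie", "positive"), ("i hate this movie", "negative")],
)
print(few_shot)
```

Keeping the examples in the same "Review / Answer" layout as the target makes the expected output format explicit, at the cost of a longer prompt and therefore slower generation.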

These techniques are particularly useful when fine-tuning the model on a specific task is not feasible: the model adapts to a wide range of tasks at inference time, based only on the instructions and examples provided in the prompt.