<a href="https://colab.research.google.com/github/Harooniqbal4879/AgenticAI/blob/main/LLM_movie_review_writer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we use the 3.8B-parameter [Phi-3 model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) to write movie reviews.

We will also take a closer look at the inputs and outputs of the tokenizer and the LLM, to understand what happens under the hood when we call an LLM text generation API.

We also look at the effect of `temperature`, a common parameter for controlling the randomness of LLM text generation. The model's next-token scores (logits) are divided by the temperature before a softmax turns them into probabilities. A higher temperature therefore flattens the distribution and gives more random, "creative" outputs, while a lower temperature (close to zero) sharpens it and gives less varied outputs. There is also a "greedy" generation strategy: the vocabulary token with the highest next-token probability is always selected, which makes generation deterministic.
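A minimal pure-Python sketch of these ideas, using a hypothetical 4-token vocabulary with made-up logits. In the notebook itself, `model.generate` does this internally; passing `do_sample=False` to it selects greedy decoding instead of sampling. The helper names below are illustrative, not part of any library.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores (logits) into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, temperature, rng):
    """Sample a token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def greedy_next_token(logits):
    """Greedy decoding: always pick the highest-scoring token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `t = 0.1` almost all probability mass lands on the top token, while at `t = 10.0` the four probabilities end up close to each other, which is why high temperatures produce more varied text.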



In [None]:
import torch
import transformers

transformers.utils.logging.set_verbosity_error()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
#@title Download the Phi-3 model and its tokenizer from Hugging Face

from transformers import AutoModelForCausalLM, AutoTokenizer

phi3_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",       # use the dtype recorded in the model's config/checkpoint
    trust_remote_code=True,   # allow the repo's custom Phi-3 code (modeling_phi3.py) to run
)
phi3_tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
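# Our text generation "API".
def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  # Convert the prompt string into token ids (plus an attention mask) as PyTorch tensors.
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  # The model and its inputs must live on the same device.
  model = model.to(device)

  # Sample up to max_new_tokens new tokens; the result holds the prompt ids followed by the new ids.
  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,
      temperature=temperature,
      max_new_tokens=max_new_tokens,
  )[0]  # batch of one, so take the first (only) sequence

  # Convert the ids back to text (the prompt is included).
  print(tokenizer.decode(predicted_token_ids))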

### Generate a movie review using the Phi-3 model

* We ask the model to write a movie review starting with "I".
* The Phi-3 model expects chat-formatted input. We use `tokenizer.apply_chat_template` to produce that format for us.

It's easy to see that Phi-3's movie reviews are vastly better than those generated by our previous bigram and 4-gram models. Importantly, we did not need to train Phi-3 at all: its prior training is sufficient to satisfy our request.
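To make the chat format less magical, here is a hand-rolled approximation of what `apply_chat_template` produces for Phi-3. It is an assumption inferred from the prompt printed by the next cell (role tag on its own line, content closed by `<|end|>`, then an open `<|assistant|>` turn); the authoritative template ships with the tokenizer, so use `apply_chat_template` in real code. `format_phi3_chat` is a hypothetical helper.

```python
def format_phi3_chat(messages, add_generation_prompt=True):
    """Hypothetical re-creation of the Phi-3 chat format (assumed, not the official template)."""
    parts = []
    for message in messages:
        # Each turn: role tag on its own line, then the content, closed by <|end|>.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|assistant|>\n")
    return "".join(parts)

example_messages = [{
    "role": "user",
    "content": 'Write a movie review. The movie should be an actual movie. '
               'The review should start with the word "I".',
}]
print(format_phi3_chat(example_messages))
```

Under this assumption, the printed string should match the prompt shown below the next cell.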

In [None]:
prompt = 'Write a movie review. The movie should be an actual movie. The review should start with the word "I".'

messages = [{"role": "user", "content": prompt}]
phi3_prompt = phi3_tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(phi3_prompt)

llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt)

<|user|>
Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|>
<|assistant|>





<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently had the opportunity to watch the captivating and thought-provoking film, "Parasite," directed by Bong Joon-ho. This South Korean masterpiece is not only a visual spectacle but also a gripping narrative that explores the intricate dynamics between different socio-economic classes.

The movie opens in the affluent neighborhood of Gangnam, where we meet the impoverished Kim family. They come across a unique opportunity when they notice the Parks, a wealthy family living in a beautiful mansion. The Kims, keen to improve their lives, devises a plan to infiltrate the Parks' household by posing as a well-educated and seemingly accomplished group of professionals.

Throughout the film, we see the Kims' audacious scheme unfold as they create elaborate deceptions and roles to blend in with the Parks' privileged family. Initially, this subterfuge appears ha

### Now let's look at the `llm_generate_text` function line by line.
```
def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  model = model.to(device)

  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,
      temperature=temperature,
      max_new_tokens=max_new_tokens,
  )[0]

  print(tokenizer.decode(predicted_token_ids))
```

In [None]:
#@title Tokenize text input
print(f'prompt = {phi3_prompt}')
print()

print('Tokenizing the prompt......')
phi3_tokens = phi3_tokenizer(phi3_prompt, return_tensors="pt")
print(phi3_tokens)

prompt = <|user|>
Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|>
<|assistant|>


Tokenizing the prompt......
{'input_ids': tensor([[32010, 14350,   263, 14064,  9076, 29889,   450, 14064,   881,   367,
           385,  3935, 14064, 29889,   450,  9076,   881,  1369,   411,   278,
          1734,   376, 29902,  1642, 32007, 32001]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]])}
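Every entry of `attention_mask` is 1 here because a single prompt needs no padding. The mask matters when several prompts of different lengths are batched: shorter sequences are padded, and the pad positions are marked 0 so the model ignores them. A minimal pure-Python sketch of that idea; the pad id `0`, the left-padding default, and the `pad_batch` helper are illustrative assumptions (in practice the tokenizer builds this for you when called with `padding=True`).

```python
def pad_batch(sequences, pad_id=0, left=True):
    """Hypothetical helper: pad token-id lists to equal length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        pad = [pad_id] * n_pad
        mask_pad = [0] * n_pad
        if left:
            # Left padding keeps each prompt's last token at the end, where generation continues.
            input_ids.append(pad + seq)
            attention_mask.append(mask_pad + [1] * len(seq))
        else:
            input_ids.append(seq + pad)
            attention_mask.append([1] * len(seq) + mask_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two short prompts built from ids seen above; the second one is shorter and gets padded.
batch = pad_batch([[32010, 14350, 263], [32010, 14350]])
print(batch)
```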


In [None]:
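#@title The LLM generates token ids
# Move the tokenized prompt and the model to the same device.
phi3_tokens = {k: v.to(device) for k, v in phi3_tokens.items()}
phi3_model = phi3_model.to(device)

# Sample up to 16 new tokens; the output contains the prompt ids followed by the new ids.
predicted_phi3_token_ids = phi3_model.generate(
    **phi3_tokens,
    do_sample=True,
    temperature=0.9,
    max_new_tokens=16,
)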
predicted_phi3_token_ids

tensor([[32010, 14350,   263, 14064,  9076, 29889,   450, 14064,   881,   367,
           385,  3935, 14064, 29889,   450,  9076,   881,  1369,   411,   278,
          1734,   376, 29902,  1642, 32007, 32001,   306, 10325, 24774,   525,
          1576, 28548,   845,   804,  4367,   331,   683,   742,   263,  5835,
         12343,   346]], device='cuda:0')
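The output holds the 26 prompt ids followed by 16 new ids (42 in total). Conceptually, generation is an autoregressive loop: score the next token given everything so far, pick one, append it, and repeat. The sketch below mimics that loop with a hypothetical `toy_next_token_logits` function standing in for the model's forward pass. It is an illustration only, not how transformers implements `generate` (no KV cache, stopping criteria, or logits processors).

```python
import math
import random

def toy_next_token_logits(token_ids, vocab_size=5):
    """Hypothetical stand-in for a model forward pass: favors (last_id + 1) % vocab_size."""
    favored = (token_ids[-1] + 1) % vocab_size
    return [3.0 if i == favored else 0.0 for i in range(vocab_size)]

def toy_generate(prompt_ids, max_new_tokens, temperature=1.0, do_sample=True, seed=0):
    """Hypothetical autoregressive loop: append one token per step."""
    rng = random.Random(seed)
    token_ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = toy_next_token_logits(token_ids)
        if do_sample:
            # Temperature-scaled softmax weights (unnormalized is fine for rng.choices).
            scaled = [x / temperature for x in logits]
            m = max(scaled)
            weights = [math.exp(x - m) for x in scaled]
            next_id = rng.choices(range(len(logits)), weights=weights, k=1)[0]
        else:
            # Greedy: take the highest-scoring token.
            next_id = max(range(len(logits)), key=lambda i: logits[i])
        token_ids.append(next_id)
    # Like model.generate, return the prompt ids followed by the new ids.
    return token_ids

out = toy_generate([0, 1], max_new_tokens=4, do_sample=False)
print(out)
```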

In [None]:
#@title Translate generated token ids back to text
phi3_tokenizer.decode(predicted_phi3_token_ids[0])

'<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently viewed \'The Shawshank Redemption\', a masterpiece'
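The decoded string repeats the prompt because the generated ids start with the prompt ids. To keep only the model's reply, one option is to slice off the first `input_ids.shape[1]` ids before decoding, e.g. `phi3_tokenizer.decode(predicted_phi3_token_ids[0][phi3_tokens["input_ids"].shape[1]:], skip_special_tokens=True)`. The same slicing, checked with plain Python lists on the token ids printed above (`new_token_ids` is a hypothetical helper):

```python
# Token ids copied from the outputs above: the tokenized prompt, then prompt + generated ids.
prompt_ids = [32010, 14350, 263, 14064, 9076, 29889, 450, 14064, 881, 367,
              385, 3935, 14064, 29889, 450, 9076, 881, 1369, 411, 278,
              1734, 376, 29902, 1642, 32007, 32001]
output_ids = [32010, 14350, 263, 14064, 9076, 29889, 450, 14064, 881, 367,
              385, 3935, 14064, 29889, 450, 9076, 881, 1369, 411, 278,
              1734, 376, 29902, 1642, 32007, 32001, 306, 10325, 24774, 525,
              1576, 28548, 845, 804, 4367, 331, 683, 742, 263, 5835,
              12343, 346]

def new_token_ids(output_ids, prompt_ids):
    """Return only the ids generated after the prompt."""
    # The generated sequence echoes the prompt first, so the reply starts at len(prompt_ids).
    if output_ids[:len(prompt_ids)] != prompt_ids:
        raise ValueError("output_ids should start with the prompt ids")
    return output_ids[len(prompt_ids):]

continuation = new_token_ids(output_ids, prompt_ids)
print(len(prompt_ids), len(output_ids), len(continuation))
```

The continuation has exactly `max_new_tokens=16` ids, matching the setting in the generation cell.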

In [None]:
#@title Controlling text generation randomness with temperature
for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 1e-9')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=1e-9, max_new_tokens=16)
print('###################################################################')

for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 1.0')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=1.0, max_new_tokens=16)
print('###################################################################')

for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 10.0')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=10.0, max_new_tokens=16)
print('###################################################################')

-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently watched the movie "The Shawshank Redemption," directed by
-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently had the pleasure of watching "The Shawshank Redemption,"
-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently watched the movie "The Shawshank Redemption," directed by
###################################################################
-------------------------------------------------------------------
temp