
In this notebook, we use the 3.8B parameter [Phi-3 model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) to write movie reviews.

We will also take a closer look at the inputs and outputs of the tokenizer and the LLM, to understand what is happening under the hood when we call an LLM text generation API.

We also look at the effect of `temperature`, a common parameter that controls the randomness of LLM text generation. Higher temperatures give more random (or "creative") outputs, while temperatures near zero give less varied outputs. There is also a "greedy" text generation strategy, which always selects the vocabulary token with the highest next-token probability, making generation deterministic.
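
As a rough illustration of how temperature reshapes the next-token distribution, the sketch below divides a set of logits by the temperature before applying softmax, and contrasts that with a greedy pick. The three-token vocabulary and the logit values are made up for illustration; they are not taken from Phi-3, and the helper names are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits for a toy three-token vocabulary.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")

print("greedy choice:", greedy_pick(toy_logits))
```

At low temperature almost all probability mass lands on the top-scoring token, so sampling behaves much like the greedy pick; at high temperature the probabilities flatten toward uniform.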



In [17]:
import torch
import transformers

transformers.utils.logging.set_verbosity_error()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [2]:
!pip install ipywidgets



In [15]:
#@title Download the Phi-3 model and its tokenizer from Hugging Face

from transformers import AutoModelForCausalLM, AutoTokenizer

phi3_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="eager",
)
phi3_tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")


In [19]:
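# Our text generation "API".
import torch

def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  """Sample a continuation of `prompt` from `model` and print the decoded text."""
  device = 'cuda' if torch.cuda.is_available() else 'cpu'

  # Tokenize the prompt and move the resulting tensors to the target device.
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  model = model.to(device)

  # generate() returns a batch of sequences; [0] selects the single sequence.
  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,  # sample from the distribution instead of decoding greedily
      temperature=temperature,
      max_new_tokens=max_new_tokens,
      use_cache=False  # disable the key/value cache
  )[0]

  # The decoded text contains the prompt followed by the generated tokens.
  print(tokenizer.decode(predicted_token_ids))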

### Generate movie review using phi3 model

* We ask the model to write a movie review starting with "I".
* The phi3 model expects chat formatted input. We use `tokenizer.apply_chat_template` to easily do this.
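
To make the chat format concrete, here is a hand-written sketch that mimics the prompt string this notebook prints for a single user message. `format_phi3_user_prompt` is a hypothetical helper, not part of `transformers`, and its layout is inferred from that printed output. The real template ships with the tokenizer, so `apply_chat_template` remains the reliable way to build the prompt.

```python
def format_phi3_user_prompt(user_message):
    """Hypothetical helper: wrap one user message in Phi-3 style chat markers."""
    # Layout inferred from the prompt printed in this notebook, not from the
    # tokenizer's actual template definition.
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

example = format_phi3_user_prompt("Write a movie review.")
print(example)
```

The trailing `<|assistant|>` marker plays the role of `add_generation_prompt=True`: it signals that the next tokens should be the assistant's reply.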

It's easy to see that phi3's movie reviews are vastly better than those from our previous bi-gram and 4-gram models. Importantly, we did not need to train the phi3 model ourselves: its prior training is sufficient to satisfy our request.

In [12]:
!pip install --upgrade transformers



In [20]:
prompt = 'Write a movie review. The movie should be an actual movie. The review should start with the word "I".'

messages = [{"role": "user", "content": prompt}]
phi3_prompt = phi3_tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(phi3_prompt)

llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt)

<|user|>
Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|>
<|assistant|>





<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently watched “The Shawshank Redemption,” and I cannot say it enough that it is an exceptional film. The movie, directed by Frank Darabont, is based on Stephen King's novella and showcases an incredible performance by Tim Robbins as Andy Dufresne. Together with Morgan Freeman's portrayal of Ellis "Red" Redding, the film narrates the story of two inmates at Shawshank State Penitentiary.

What makes this film so captivating is its exploration of hope. Andy, a skilled banker, is wrongly accused of murdering his wife and her lover and is sentenced to two life terms at Shawshank. Despite the degrading environment, Andy maintains his dignity and hope, setting a remarkable example for Red and other inmates. The characters are complex and deeply humanized, allowing audiences to connect with their struggles and triumphs.

The film is also well-crafted in its st

### Now let's look at the `llm_generate_text` function line by line.
```
def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  device = 'cuda' if torch.cuda.is_available() else 'cpu'
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  model = model.to(device)

  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,
      temperature=temperature,
      max_new_tokens=max_new_tokens,
      use_cache=False
  )[0]

  print(tokenizer.decode(predicted_token_ids))
```
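
Conceptually, the loop that `model.generate` runs can be pictured as: predict a distribution over the next token, sample one token, append it, and repeat. The toy sketch below mimics that loop with a hard-coded lookup table standing in for the model. `TOY_NEXT_TOKEN_PROBS` and `toy_generate` are invented for illustration, and real generation in `transformers` involves considerably more machinery than this.

```python
import random

# Hypothetical stand-in for a model: maps the last token to next-token probabilities.
TOY_NEXT_TOKEN_PROBS = {
    "I": {"recently": 0.7, "loved": 0.3},
    "recently": {"watched": 1.0},
    "loved": {"this": 1.0},
    "watched": {"this": 1.0},
    "this": {"movie": 1.0},
    "movie": {"<end>": 1.0},
}

def toy_generate(prompt_tokens, max_new_tokens, rng):
    """Sample one token at a time, feeding each choice back in as context."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = TOY_NEXT_TOKEN_PROBS.get(tokens[-1])
        if probs is None:  # no known continuation: stop early
            break
        candidates = list(probs.keys())
        weights = list(probs.values())
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == "<end>":  # end-of-sequence marker: stop generating
            break
        tokens.append(next_token)
    return tokens

print(toy_generate(["I"], max_new_tokens=4, rng=random.Random(0)))
```

As with `model.generate`, the returned sequence begins with the prompt tokens, and `max_new_tokens` caps how many tokens are appended.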

In [21]:
#@title Tokenize text input
print(f'prompt = {phi3_prompt}')
print()

print('Tokenizing the prompt......')
phi3_tokens = phi3_tokenizer(phi3_prompt, return_tensors="pt")
print(phi3_tokens)

prompt = <|user|>
Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|>
<|assistant|>


Tokenizing the prompt......
{'input_ids': tensor([[32010, 14350,   263, 14064,  9076, 29889,   450, 14064,   881,   367,
           385,  3935, 14064, 29889,   450,  9076,   881,  1369,   411,   278,
          1734,   376, 29902,  1642, 32007, 32001]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1]])}


In [23]:
#@title The LLM generates token ids
phi3_tokens = {k: v.to(device) for k, v in phi3_tokens.items()}
phi3_model = phi3_model.to(device)

predicted_phi3_token_ids = phi3_model.generate(
      **phi3_tokens,
      do_sample=True,
      temperature=0.9,
      max_new_tokens=16,
      use_cache=False  # disable the key/value cache
  )
predicted_phi3_token_ids

tensor([[32010, 14350,   263, 14064,  9076, 29889,   450, 14064,   881,   367,
           385,  3935, 14064, 29889,   450,  9076,   881,  1369,   411,   278,
          1734,   376, 29902,  1642, 32007, 32001,   306, 10325, 20654,   376,
          1576, 28548,   845,   804,  4367,   331,   683,  1699,   322,   306,
         29915, 29885]], device='cuda:0')

In [24]:
#@title Translate generated token ids back to text
phi3_tokenizer.decode(predicted_phi3_token_ids[0])

'<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently watched "The Shawshank Redemption," and I\'m'
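
The decoded string above still contains the prompt, because `generate` returns the prompt token ids followed by the newly generated ones. One common way to isolate the new tokens is to slice off the first `len(prompt_ids)` entries. The sketch below does this with plain Python lists copied from the tensors printed above, so it runs without the model; with real tensors, the same slice would be applied to `predicted_phi3_token_ids[0]`. The variable names are chosen here for illustration.

```python
# Token ids copied from the notebook output above: the prompt alone,
# then the full sequence returned by generate (prompt + new tokens).
prompt_ids = [32010, 14350, 263, 14064, 9076, 29889, 450, 14064, 881, 367,
              385, 3935, 14064, 29889, 450, 9076, 881, 1369, 411, 278,
              1734, 376, 29902, 1642, 32007, 32001]
generated_ids = [32010, 14350, 263, 14064, 9076, 29889, 450, 14064, 881, 367,
                 385, 3935, 14064, 29889, 450, 9076, 881, 1369, 411, 278,
                 1734, 376, 29902, 1642, 32007, 32001, 306, 10325, 20654, 376,
                 1576, 28548, 845, 804, 4367, 331, 683, 1699, 322, 306,
                 29915, 29885]

# The output starts with the prompt, so everything after it is new.
new_token_ids = generated_ids[len(prompt_ids):]
print(len(new_token_ids), new_token_ids)
```

The number of new ids matches `max_new_tokens=16` from the generation cell. Passing only the sliced ids to `phi3_tokenizer.decode` would yield the model's reply without the prompt.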

In [25]:
#@title Controlling text generation randomness with temperature
for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 1e-9')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=1e-9, max_new_tokens=16)
print('###################################################################')

for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 1.0')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=1.0, max_new_tokens=16)
print('###################################################################')

for _ in range(3):
  print('-------------------------------------------------------------------')
  print(f'temperature = 10.0')
  llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=10.0, max_new_tokens=16)
print('###################################################################')

-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently had the pleasure of watching the film "The Shawshank Redem
-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently had the pleasure of watching the film "The Shawshank Redem
-------------------------------------------------------------------
temperature = 1e-9
<|user|> Write a movie review. The movie should be an actual movie. The review should start with the word "I".<|end|><|assistant|> I recently watched the movie "The Shawshank Redemption," directed by
###################################################################
-------------------------------------------------------------------
t