<a href="https://colab.research.google.com/github/Shrivastav-Gaurav/GenAI-ML-Notebook/blob/main/LLM_movie_review_writer%5B1%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we use the 3.8B-parameter [Phi-3 model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) to write movie reviews.

We will also take a closer look at the inputs and outputs of the tokenizer and the LLM, to understand what happens under the hood when we call an LLM text generation API.

We also look at the effect of `temperature`, a common parameter that controls the randomness of LLM text generation. A higher temperature gives more random (or "creative") outputs, while a lower temperature, near zero, gives less varied outputs. There is also a "greedy" text generation strategy: the vocabulary token with the highest next-token probability is always selected, which makes generation deterministic.
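To make the role of temperature concrete, here is a minimal sketch in plain Python. It assumes the usual formulation in which the model's raw next-token scores (logits) are divided by the temperature before a softmax turns them into probabilities. The names `softmax_with_temperature`, `greedy_pick`, and `toy_logits` are made up for this illustration and are not part of the `transformers` API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy strategy: always return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical scores for a vocabulary of three tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature = {t}: {[round(p, 3) for p in probs]}")

print("greedy pick:", greedy_pick(toy_logits))
```

At a low temperature almost all probability mass lands on the top-scoring token, so sampling behaves nearly like greedy selection. At a high temperature the distribution flattens and every token becomes a plausible pick.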



In [None]:
import torch
import transformers

transformers.utils.logging.set_verbosity_error()
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
#@title Download the Phi-3 model and its tokenizer from Hugging Face

from transformers import AutoModelForCausalLM, AutoTokenizer

phi3_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct",
                                             torch_dtype="auto",
                                             trust_remote_code=True
                                             )
phi3_tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

In [None]:
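# Our text generation "API".
def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  # Convert the prompt text into tensors of token ids and move them to the target device.
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  model = model.to(device)

  # Sample up to max_new_tokens tokens; [0] selects the only sequence in the batch.
  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,
      temperature=temperature,
      max_new_tokens=max_new_tokens,
  )[0]

  # Decode the token ids (prompt plus generated tokens) back into text.
  print(tokenizer.decode(predicted_token_ids))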

### Generate a movie review using the phi3 model

* We ask the model to write a movie review starting with "I".
* The phi3 model expects chat-formatted input. We use `tokenizer.apply_chat_template` to build it from a list of messages.
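To give a feel for what "chat-formatted" means, the sketch below wraps each message in role markers and then appends a marker that cues the model to answer. The markers `<|user|>`, `<|end|>`, and `<|assistant|>` are an assumption for illustration; the real template ships with the tokenizer and may differ, so the output of `print(phi3_prompt)` is the authoritative version. The function name `format_chat_sketch` is hypothetical.

```python
def format_chat_sketch(messages, add_generation_prompt=True):
    """Illustrative stand-in for a chat template (markers are assumed, not authoritative)."""
    parts = []
    for message in messages:
        # Wrap every message in a role marker and an end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model that the assistant's turn starts here.
        parts.append("<|assistant|>\n")
    return "".join(parts)

example_messages = [{"role": "user", "content": "Write a movie review."}]
print(format_chat_sketch(example_messages))
```

In practice we rely on `apply_chat_template` rather than hand-written formatting, since each model family defines its own markers.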

The movie reviews written by phi3 are vastly better than those from our previous bigram and 4-gram models. Importantly, we did not need to train the phi3 model ourselves: it has already gone through enough prior training to satisfy our request.

In [None]:
prompt = 'Write a movie review. The movie should be an actual movie. The review should start with the word "I".'

messages = [{"role": "user", "content": prompt}]
phi3_prompt = phi3_tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(phi3_prompt)

llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt)

### Now let's look at the `llm_generate_text` function line by line.
```
def llm_generate_text(model, tokenizer, prompt, temperature=0.8, max_new_tokens=256):
  tokenized_inputs = tokenizer(prompt, return_tensors="pt")
  tokenized_inputs = {k: v.to(device) for k, v in tokenized_inputs.items()}

  model = model.to(device)

  predicted_token_ids = model.generate(
      **tokenized_inputs,
      do_sample=True,
      temperature=temperature,
      max_new_tokens=max_new_tokens,
  )[0]

  print(tokenizer.decode(predicted_token_ids))
```

In [None]:
#@title Tokenize text input
print(f'prompt = {phi3_prompt}')
print()

print('Tokenizing the prompt......')
phi3_tokens = phi3_tokenizer(phi3_prompt, return_tensors="pt")
print(phi3_tokens)
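The printed result is typically a dictionary-like object holding `input_ids` (one integer per token) and an `attention_mask`. As a rough mental model, here is a toy word-level tokenizer that produces the same shape of output. It is only a sketch: the class `ToyTokenizer` is hypothetical, and the real Phi-3 tokenizer works on subword pieces with a far larger vocabulary.

```python
class ToyTokenizer:
    """Toy word-level tokenizer: one integer id per distinct whitespace-separated word."""

    def __init__(self, corpus):
        words = sorted(set(corpus.split()))
        self.word_to_id = {word: i for i, word in enumerate(words)}
        self.id_to_word = {i: word for word, i in self.word_to_id.items()}

    def encode(self, text):
        ids = [self.word_to_id[word] for word in text.split()]
        # A mask of 1s marks every position as a real token (no padding here).
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}

    def decode(self, ids):
        return " ".join(self.id_to_word[i] for i in ids)

toy_tokenizer = ToyTokenizer("I loved this movie and I loved the cast")
encoded = toy_tokenizer.encode("I loved the movie")
print(encoded)
print(toy_tokenizer.decode(encoded["input_ids"]))
```

The model itself only ever sees the integer ids; the tokenizer is what maps between text and ids in both directions.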

In [None]:
#@title The LLM generates token ids
phi3_tokens = {k: v.to(device) for k, v in phi3_tokens.items()}
phi3_model = phi3_model.to(device)

predicted_phi3_token_ids = phi3_model.generate(
      **phi3_tokens,
      do_sample=True,
      temperature=0.9,
      max_new_tokens=16,
  )
predicted_phi3_token_ids

In [None]:
#@title Translate generated token ids back to text
phi3_tokenizer.decode(predicted_phi3_token_ids[0])
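Note that for a decoder-only model like this one, `generate` returns the prompt token ids followed by the newly generated ids, so the decoded text repeats the prompt. The sketch below shows one way to separate the two by slicing at the prompt length. The helper name `split_prompt_and_completion` is hypothetical, and plain lists stand in for tensors so the example runs without the model.

```python
def split_prompt_and_completion(output_ids, prompt_length):
    """Split a generated sequence into (prompt ids, newly generated ids)."""
    return output_ids[:prompt_length], output_ids[prompt_length:]

# Toy ids standing in for a real sequence: 4 prompt tokens, then 3 generated tokens.
toy_output_ids = [1, 450, 9763, 338, 29902, 18012, 445]
prompt_ids, new_ids = split_prompt_and_completion(toy_output_ids, prompt_length=4)
print("prompt ids:   ", prompt_ids)
print("generated ids:", new_ids)

# With the notebook's variables, the same idea would look like this (not run here):
#   prompt_length = phi3_tokens["input_ids"].shape[1]
#   new_ids = predicted_phi3_token_ids[0][prompt_length:]
#   phi3_tokenizer.decode(new_ids, skip_special_tokens=True)
```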

In [None]:
#@title Controlling text generation randomness with temperature
# Generate three samples at each temperature to compare how much the outputs vary.
for temperature in (1e-9, 1.0, 10.0):
  for _ in range(3):
    print('-------------------------------------------------------------------')
    print(f'temperature = {temperature}')
    llm_generate_text(phi3_model, phi3_tokenizer, phi3_prompt, temperature=temperature, max_new_tokens=16)
  print('###################################################################')
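The cell above approximates greedy generation with a near-zero temperature. A more direct route is to turn sampling off with `do_sample=False`, which `generate` accepts. The helper below is a minimal sketch of that choice; `build_generation_kwargs` is a hypothetical name, and treating a temperature of `None` or `0` as "use greedy" is a convention adopted only for this example.

```python
def build_generation_kwargs(temperature=None, max_new_tokens=16):
    """Build keyword arguments for model.generate: greedy if no temperature is given."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature is None or temperature == 0:
        # Greedy decoding: always take the most likely next token.
        kwargs["do_sample"] = False
    else:
        # Sampling: draw the next token from the temperature-scaled distribution.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs

print(build_generation_kwargs())
print(build_generation_kwargs(temperature=1.0))

# With the notebook's variables, a greedy call would look like this (not run here):
#   phi3_model.generate(**phi3_tokens, **build_generation_kwargs())
```

With greedy decoding, repeated calls on the same prompt should produce the same tokens, which is the deterministic behavior described at the top of the notebook.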