# Large Language Models (LLMs)
In this tutorial, we'll learn how to use Large Language Models (LLMs) to generate desired outputs. We use the Llama-2-7B model throughout and implement several different use cases with it.

## Step # 01
- Check your GPU memory usage with `nvidia-smi` and free up memory before running this project.
- Install the libraries required to run this project.


In [None]:
!nvidia-smi

In [None]:
#Installing necessary libraries
%pip install torch==2.1.1                       #Deep learning framework for training and inference
%pip install transformers==4.35.2               #Hugging Face library to load, train, and fine-tune LLMs
%pip install accelerate==0.25.0                 #Optimized training and inference across multiple GPUs, TPUs, and CPUs
%pip install bitsandbytes==0.41.3               #4-bit and 8-bit quantization to reduce memory usage
%pip install huggingface_hub                    #to log in to a Hugging Face account
%pip install ipywidgets                         #to create interactive widgets in Jupyter Notebook
%pip install numpy
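
Before loading anything, it can help to estimate how much GPU memory the model weights alone will need. The sketch below is a back-of-the-envelope calculation, assuming "7B" means roughly 7 billion parameters; the helper `weight_memory_gib` is a hypothetical name introduced here, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights only, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


NUM_PARAMS = 7e9  # assumption: "7B" taken as 7 billion parameters

# Compare full precision, half precision, and the quantized formats
for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Compare these figures with the free memory reported by `nvidia-smi` to decide whether fp16 is feasible or whether quantization is needed.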

## Step # 02
Log in to your Hugging Face account to download and use the Llama-2-7B model for inference.

In [None]:
#Log in to your Hugging Face account
!huggingface-cli login

## Step # 03
Load the Llama-2-7B model from Hugging Face, together with its tokenizer.

In [None]:
#Load the LLM tokenizer from Hugging Face
from transformers import AutoTokenizer
Model_name="meta-llama/Llama-2-7b-hf"
tokenizer=AutoTokenizer.from_pretrained(Model_name)

In [4]:
#Tokenize the Input Prompt
text="The most important person in Deep Learning is"
encoding=tokenizer(text, return_tensors="pt")

In [None]:
encoding.keys()

In [None]:
encoding.input_ids

In [7]:
input_ids= encoding.input_ids[0]

In [None]:
tokenizer.decode(input_ids)

In [None]:
tokenizer.convert_ids_to_tokens(input_ids)
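
To make the relationship between text, tokens, and `input_ids` concrete, here is a toy sketch of the encode/decode round trip. It is not the Llama-2 tokenizer: the ten-entry vocabulary and the greedy longest-match rule are simplifying assumptions made for illustration, whereas the real tokenizer uses a 32,000-entry SentencePiece vocabulary.

```python
# Toy vocabulary (hypothetical). "▁" marks the start of a word, SentencePiece-style.
VOCAB = ["<s>", "▁The", "▁most", "▁import", "ant", "▁person",
         "▁in", "▁Deep", "▁Learning", "▁is"]
TOKEN_TO_ID = {token: idx for idx, token in enumerate(VOCAB)}


def encode(text):
    """Map text to token ids using greedy longest-match (a simplification)."""
    marked = "▁" + text.replace(" ", "▁")
    ids = [TOKEN_TO_ID["<s>"]]  # begin-of-sequence token comes first
    pos = 0
    while pos < len(marked):
        candidates = [t for t in VOCAB[1:] if marked.startswith(t, pos)]
        if not candidates:
            raise ValueError(f"no token matches at position {pos}")
        best = max(candidates, key=len)
        ids.append(TOKEN_TO_ID[best])
        pos += len(best)
    return ids


def convert_ids_to_tokens(ids):
    """Look up the token string for each id."""
    return [VOCAB[i] for i in ids]


def decode(ids):
    """Join the tokens back into text, skipping the special <s> token."""
    pieces = [VOCAB[i] for i in ids if VOCAB[i] != "<s>"]
    return "".join(pieces).replace("▁", " ").strip()


text = "The most important person in Deep Learning is"
ids = encode(text)
print(ids)
print(convert_ids_to_tokens(ids))
print(decode(ids))
```

Note how "important" is split into two sub-word tokens, which is why the number of tokens usually exceeds the number of words.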

In [None]:
#Load the Llama-2-7B base model
#AutoModelForCausalLM: loads a causal language model from Hugging Face
#GenerationConfig: stores the configuration settings for text generation

import torch
from transformers import AutoModelForCausalLM, GenerationConfig

model=AutoModelForCausalLM.from_pretrained(Model_name,device_map="auto",torch_dtype=torch.float16)  #uses fp16 to save memory and speedup inference

generation_config=GenerationConfig.from_pretrained(Model_name)
generation_config.max_new_tokens = 128             # maximum number of new tokens to generate (excluding the prompt)
generation_config.repetition_penalty =  1.18       # penalizes already generated tokens to encourage diverse output. Values >1 reduce repetition (1.18 is a moderate penalty).
generation_config.temperature = 0.0000001          # controls randomness when sampling. Near 0: almost deterministic. Above 1: more random/creative responses
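
The effect of `temperature` and `repetition_penalty` can be illustrated on a handful of made-up logits. This is a minimal pure-Python sketch, assuming the commonly used rules (divide logits by the temperature before the softmax; divide positive logits of already generated tokens by the penalty and multiply negative ones by it). The function names and the example logits are hypothetical, and temperature only matters when the model samples instead of decoding greedily.

```python
import math


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make tokens that were already generated less likely."""
    out = list(logits)
    for token_id in set(generated_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty   # shrink positive scores
        else:
            out[token_id] *= penalty   # push negative scores further down
    return out


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after scaling by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # made-up scores for a 3-token vocabulary
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2], penalty=1.18)

for temperature in (0.0000001, 1.0, 2.0):
    probs = softmax_with_temperature(penalized, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

With a temperature close to zero, almost all probability mass lands on the highest-scoring token, which is why such a setting behaves nearly deterministically.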

In [None]:
!nvidia-smi

## Step # 04
Generate a simple response from the loaded Llama-2-7B model.

In [None]:
%%time
#%%time measures the execution time of the whole cell; it must be the first line of the cell
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"         #Disabling parallelism to avoid deadlocks
#Generate Simple Prediction from LLM
#Load encoding to Model's Device
encoding=encoding.to(model.device)

with torch.no_grad():   #disabling gradient calculation
    output=model.generate(     #generating text from LLM
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
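
Outside a notebook, or when only part of a cell should be timed, execution time can also be measured in plain Python. The sketch below uses `time.perf_counter` from the standard library; the `Timer` class is a hypothetical helper, and the summation is a stand-in workload for the `model.generate` call.

```python
import time


class Timer:
    """Context manager that records the wall-clock time of the enclosed block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not swallow exceptions


with Timer() as timer:
    total = sum(i * i for i in range(100_000))  # stand-in for model.generate(...)

print(f"elapsed: {timer.elapsed:.4f} s")
```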

In [None]:
output.shape

In [None]:
output

In [None]:
#Convert output to a human-readable string
output_text=tokenizer.decode(output[0],skip_special_tokens=True)  #skip special tokens such as <s>, </s>, <pad>
print(output_text)
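
Conceptually, generation is a loop: score every candidate next token, pick one, append it, and repeat until an end-of-sequence token or the `max_new_tokens` limit is reached. The sketch below shows the greedy variant of that loop with a toy scoring function. `greedy_generate` and `toy_scores` are hypothetical names, and a real model computes the scores with a neural network.

```python
def greedy_generate(next_token_scores, input_ids, max_new_tokens, eos_token_id):
    """Append the highest-scoring token until EOS or the token budget is hit."""
    ids = list(input_ids)  # the output starts with the prompt tokens
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_token_id:
            break
    return ids


VOCAB_SIZE = 5
EOS_ID = 4


def toy_scores(ids):
    """Toy 'model': always prefers the token that follows the last one."""
    preferred = (ids[-1] + 1) % VOCAB_SIZE
    return [1.0 if token == preferred else 0.0 for token in range(VOCAB_SIZE)]


print(greedy_generate(toy_scores, [0], max_new_tokens=128, eos_token_id=EOS_ID))
print(greedy_generate(toy_scores, [0], max_new_tokens=2, eos_token_id=EOS_ID))
```

This also explains the shape of `output`: the returned sequence contains the prompt tokens followed by the newly generated ones.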

## Step # 05: Instruction-Tuned (Chat) Model
As you can see, the generated output is not the response we want. A base model only continues the prompt text rather than following instructions. To get more accurate and useful answers, we'll load the instruction-tuned (chat) variant of the model.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
model_name="meta-llama/Llama-2-7b-chat-hf"   #instruction-tuned (chat) checkpoint

#Load tokenizer
tokenizer=AutoTokenizer.from_pretrained(model_name)
#Load chat model
model= AutoModelForCausalLM.from_pretrained(model_name, device_map="auto",torch_dtype=torch.float16)

generation_config=GenerationConfig.from_pretrained(model_name)
generation_config.max_new_tokens= 512
generation_config.repetition_penalty=1.18
generation_config.temperature=0.0000001

## Step # 07: Text Generation Pipeline
Now we'll create a text-generation pipeline that prints responses as they are generated.
- TextStreamer: streams the model output token by token instead of waiting for the full response (useful for real-time applications such as chatbots and live AI interactions)
- pipeline: high-level API for setting up text-generation models easily

In [17]:
from transformers import TextStreamer, pipeline
#creating a text streamer: generates text in real-time rather than waiting for full output
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
#Text-Generation Pipeline
llm = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    generation_config=generation_config,
    num_return_sequences=1,                 #return only one response
    streamer=streamer                       #Streams the generated text live instead of returning it all at once.
)
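
The idea behind streaming can be sketched without a model: a producer yields pieces of text one at a time, and a consumer prints each piece as soon as it arrives. `PrintStreamer` and `fake_token_source` below are hypothetical stand-ins written for illustration; they are not the `TextStreamer` implementation, although the `put`/`end` method names follow the same pattern.

```python
import sys
import time


def fake_token_source(text, delay=0.0):
    """Stand-in for a model: yields one word at a time, optionally with a delay."""
    for word in text.split(" "):
        time.sleep(delay)
        yield word + " "


class PrintStreamer:
    """Prints each piece immediately instead of waiting for the full text."""

    def __init__(self):
        self.pieces = []

    def put(self, piece):
        self.pieces.append(piece)
        sys.stdout.write(piece)
        sys.stdout.flush()  # make the piece visible right away

    def end(self):
        sys.stdout.write("\n")


reply = "streamed text appears piece by piece"
streamer = PrintStreamer()
for piece in fake_token_source(reply):
    streamer.put(piece)
streamer.end()
```

Setting `delay` to a small positive value (for example `0.2`) makes the piece-by-piece output visible to the eye.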

In [None]:
output=llm("Who is the most important person in Deep Learning?")

In [None]:
output[0].keys()
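
A text-generation pipeline returns a list with one dictionary per returned sequence, and the text sits under the `generated_text` key. Because `return_full_text=True` keeps the prompt in that text, the reply can be separated by stripping the prompt prefix. The sketch below works on a mocked result with a placeholder reply, so it runs without a model; `extract_reply` is a hypothetical helper.

```python
from typing import Dict, List


def extract_reply(output: List[Dict[str, str]], prompt: str) -> str:
    """Return only the newly generated part of the first returned sequence."""
    full_text = output[0]["generated_text"]
    if full_text.startswith(prompt):
        full_text = full_text[len(prompt):]
    return full_text.strip()


prompt = "Who is the most important person in Deep Learning?"
# Mocked pipeline result; the reply is a placeholder, not real model output
mock_output = [{"generated_text": prompt + " <placeholder model reply>"}]

print(list(mock_output[0].keys()))
print(extract_reply(mock_output, prompt))
```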

## Step # 08: Prompt Format
We will prepare a structured chat prompt for the Llama-2-7B LLM using the Hugging Face `tokenizer.apply_chat_template()` function.
- The model will be instructed to always reply using Ludacris-style slang (i.e. energetic street slang with a hip-hop tone)
- This controls the AI's behavior, making it sound less formal and more like the rapper Ludacris

In [20]:
#system prompt: Instruction to AI
system_prompt="Act and always reply using slang that Ludacris uses"

In [None]:
#Full Chat History --> Message List
messages = [{"role":"system", "content": system_prompt},
            {"role":"user", "content":"Who is the most important person in AI?"},]   

#tokenize=False: returns a readable text string instead of token ids
#add_generation_prompt=True: ends the prompt so that the model knows it should generate the assistant's reply
prompt=tokenizer.apply_chat_template(messages,tokenize=False,add_generation_prompt=True)
print(prompt)
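
To see what the chat template produces, the Llama-2 chat format can be approximated by hand for the single-turn case: the system prompt is wrapped in `<<SYS>>` markers and merged into the user message, which is wrapped in `[INST] ... [/INST]`. The function `format_llama2_prompt` is a hypothetical sketch, assuming one optional system message followed by exactly one user message; the tokenizer's own template remains the authoritative source.

```python
def format_llama2_prompt(messages, bos_token="<s>"):
    """Build a single-turn Llama-2 chat prompt from a message list (sketch)."""
    system = None
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
        messages = messages[1:]
    if len(messages) != 1 or messages[0]["role"] != "user":
        raise ValueError("this sketch supports exactly one user message")
    content = messages[0]["content"].strip()
    if system is not None:
        # The system prompt is embedded at the start of the first user turn
        content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
    return f"{bos_token}[INST] {content} [/INST]"


example_messages = [
    {"role": "system", "content": "Act and always reply using slang that Ludacris uses"},
    {"role": "user", "content": "Who is the most important person in AI?"},
]
print(format_llama2_prompt(example_messages))
```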

In [None]:
output=llm(prompt)

## Step # 09: Content Generation
Now we'll generate content by combining a user prompt with a system prompt.

In [23]:
from typing import Optional

def predict(prompt: str, system_prompt: Optional[str] = None):
    messages = [
        {
            "role": "user",
            "content": prompt,
        }
    ]
    if system_prompt:
        messages.insert(0, {"role": "system", "content": system_prompt})
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return llm(prompt)

In [None]:
system_prompt="""
You're an expert weight-loss coach with 20+ years of experience training people
with all types of diet plans and weight-loss exercises
"""

prompt = """
Outline 3 most important tips for quick weight loss
""".strip()

output=predict(prompt,system_prompt)

In [None]:
system_prompt = """
You're a Slavic connoisseur. You love everything Slavic and understand
that it is superior (jokingly) to anything else.
"""

prompt = """
What is the most iconic dish that is prepared by Slavic grandmothers?
""".strip()

output = predict(prompt, system_prompt)

## Step # 10: Coding
Now we'll ask the LLM to write some code, again using a prompt and a system prompt.

In [None]:
%%time
system_prompt = """
You're an experienced Python developer that writes efficient and readable code.
You always strive to use built-in libraries.
"""

prompt = """
Write a function that calculates the square sum of two numbers and divides it by 42
""".strip()

output = predict(prompt, system_prompt)
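
To judge the model's answer, it helps to have a hand-written reference to compare against. The sketch below is one possible reference, assuming that "square sum" means the sum of the squares, a² + b²; the other reading, (a + b)², is shown alongside because the prompt is ambiguous. Both function names are hypothetical.

```python
def sum_of_squares_div_42(a: float, b: float) -> float:
    """Interpretation 1: (a^2 + b^2) / 42."""
    return (a**2 + b**2) / 42


def square_of_sum_div_42(a: float, b: float) -> float:
    """Interpretation 2: (a + b)^2 / 42."""
    return (a + b) ** 2 / 42


print(sum_of_squares_div_42(3, 4))
print(square_of_sum_div_42(3, 4))
```

Running the model's function on the same inputs shows which interpretation it chose and whether its arithmetic is correct.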


In [None]:
%%time
prompt="""
Write a function that fetches the daily prices of Tesla stock for the last week
""".strip()

output=predict(prompt,system_prompt)

## Step # 11: Analyze Tweets
Now we'll ask the LLM to do sentiment analysis on a tweet, using a prompt and a system prompt.

In [None]:
%%time
system_prompt="""
You're an expert social media analyst. When analyzing text, you always take into
account the content and put heavy importance on the author
"""

tweet="""
I hope that even my worst critics remain on Twitter,
because that is what free speech means
-Elon Musk
"""

prompt=f"""
What is the meaning of this tweet? Do sentiment analysis. 
Rewrite it in words of Putin.
```
{tweet}
```
""".strip()

output=predict(prompt, system_prompt)
