<div class="alert alert-block alert-info">
⚠️ We will use ml.g5.2xlarge to run this notebook
</div>

In [None]:
import gc
from threading import Thread
import torch
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM,
    LlamaTokenizer,
    BitsAndBytesConfig
)
from transformers import (
    GenerationConfig, 
    TextIteratorStreamer
)
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain

In [3]:
MODEL_ID = "Secbone/llama-2-13B-instructed"
# MODEL_ID = "tiiuae/falcon-40b-instruct"
# MODEL_ID = "wordcab/llama-natural-instructions-13b"

In [4]:
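# quantization configs using bitsandbytes (BnB)
# only the 8-bit config is passed to from_pretrained below; the 4-bit NF4 config is kept as an alternative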
bnb4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

bnb8_config = BitsAndBytesConfig(
    load_in_8bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, 
    # load_in_8bit=True, 
    quantization_config=bnb8_config,
    torch_dtype=torch.float16,
    # trust_remote_code=True,
    device_map="auto"
)

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|██████████| 3/3 [00:16<00:00,  5.38s/it]
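
A back-of-the-envelope estimate shows why the model is loaded quantized. The sketch below is only an approximation: it assumes roughly 13 billion parameters (taken from the model name) and a single GPU with 24 GiB of memory on the ml.g5.2xlarge instance, and it counts weights only, ignoring activations, the KV cache, and quantization overhead.

```python
# Rough weight-memory estimate for a ~13B-parameter model at different precisions.
# Assumptions: 13e9 parameters, one 24 GiB GPU, weights only.
PARAMS = 13e9
GPU_MEMORY_GIB = 24

# Approximate storage cost per parameter for each format
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_memory_gib(num_params, bytes_per_param):
    """Return the approximate size of the weights in GiB."""
    return num_params * bytes_per_param / 2**30


estimates = {
    name: weight_memory_gib(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gib in estimates.items():
    verdict = "fits" if gib < GPU_MEMORY_GIB else "does not fit"
    print(f"{name}: ~{gib:.1f} GiB of weights, {verdict} in {GPU_MEMORY_GIB} GiB")
```

Under these assumptions the half-precision weights alone exceed the GPU memory, while the 8-bit and 4-bit variants leave headroom for generation.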


In [5]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

## Prompt Engineering with Local LLMs and LangChain

### Zero Shot Prompting

In [13]:
# let's make the model's output close to deterministic by using a low temperature
# note: temperature and top_p only take effect when sampling is enabled (do_sample=True)
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 0.1
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.top_p = 0.95
# generation_config.top_k=10
# generation_config.do_sample=False
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
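
To make the effect of `temperature` and `top_p` concrete, here is a minimal pure-Python sketch of temperature scaling and nucleus (top-p) filtering. The logits are made up for illustration and the helper names are hypothetical; this shows the general technique, not the exact code path inside `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, 0.0]  # made-up scores for four candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # low temperature: sharply peaked
hot = softmax_with_temperature(logits, 1.2)   # high temperature: flatter

print("T=0.1:", [round(p, 4) for p in cold], "kept:", sorted(top_p_filter(cold, 0.95)))
print("T=1.2:", [round(p, 4) for p in hot], "kept:", sorted(top_p_filter(hot, 0.95)))
```

With a low temperature almost all probability mass sits on the top token, so sampling behaves nearly like greedy decoding; with a higher temperature more candidates survive the top-p cut.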

In [14]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [15]:
template = """
### Instruction
Classify the text into neutral, negative or positive

### Context
Text: {query}

### Answer
Sentiment:
"""

prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [16]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [17]:
response = llm_chain.run(
    query="I am ok talking to John"
)
print(response.strip().upper())

NEUTRAL


In [18]:
response = llm_chain.run(
    query="I think the sandwich was fantastic!"
)
print(response.strip().upper())

POSITIVE


In [19]:
response = llm_chain.run(
    query="I think the sandwich was awful!"
)
print(response.strip().upper())

NEGATIVE
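
The three runs above happen to return a clean label, but a completion can also contain extra text after the label. A small post-processing helper, sketched below, maps a raw completion to one of the three labels from the prompt. `parse_sentiment` is a hypothetical helper, not part of LangChain or `transformers`.

```python
LABELS = ("neutral", "negative", "positive")


def parse_sentiment(raw, labels=LABELS):
    """Return the label that appears earliest in the completion, or None if none is found."""
    text = raw.lower()
    hits = [(text.find(label), label) for label in labels if label in text]
    return min(hits)[1] if hits else None


print(parse_sentiment(" Positive\n\n### Instruction"))
print(parse_sentiment("NEUTRAL"))
print(parse_sentiment("I cannot tell"))
```

Returning `None` for unrecognized completions keeps malformed answers visible instead of silently assigning them a label.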


### Few Shot Prompting

In [20]:
# let's make the model's responses more creative: a higher temperature flattens the token distribution
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 1.2
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.top_p = 0.95
# generation_config.top_k=10
# generation_config.do_sample=False
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [21]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [26]:
template = """
### Instruction
The following are excerpts from conversations with an AI assistant. The assistant is typically sarcastic and witty, producing creative and funny responses to the user's questions.

### Context
Here are some examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

### Answer
User: {query}
AI:
"""

prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [27]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [30]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [31]:
print(response)

What is the meaning of life?

It depends on who you ask, and where they are in their journey of discovery. Some people find purpose in family, friends, work or hobbies; others search for spiritual enlightenment or self-actualization. Still, others might question if there even is such a thing as'meaning' at all!

At its core, however, many would agree that having a goal to strive towards – whether it be personal growth, helping others, making the world a better place etc. – adds both direction and fulfillment to our lives. Ultimately, only each individual can determine what gives them a sense of significance...and maybe even joy!


### LangChain based FewShot Prompting

In [40]:
from langchain import FewShotPromptTemplate

In [41]:
# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

In [42]:
# create an example template
example_template = """
User: {query}
AI: {answer}
"""

In [43]:
# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

In [44]:
# now break our previous prompt into a prefix and a suffix
# the prefix holds our instructions
prefix = """
The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples: 
"""
# and the suffix holds the user input and the output indicator
suffix = """
User: {query}
AI: """

In [45]:
# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

In [46]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))


The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples: 



User: How are you?
AI: I can't complain but sometimes I still do.



User: What time is it?
AI: It's time to get a watch.



User: What is the meaning of life?
AI: 
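
The printed prompt above is the prefix, the rendered examples, and the suffix joined by the separator. The plain-Python sketch below approximates that assembly without LangChain; `format_few_shot` is a hypothetical helper and the shortened prefix is only for illustration.

```python
def format_few_shot(examples, example_template, prefix, suffix, separator="\n\n", **inputs):
    """Render each example, then join prefix, examples, and suffix with the separator."""
    rendered = [example_template.format(**example) for example in examples]
    pieces = [prefix.format(**inputs), *rendered, suffix.format(**inputs)]
    return separator.join(pieces)


demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
demo_example_template = "\nUser: {query}\nAI: {answer}\n"
demo_prefix = "\nHere are some examples: \n"  # shortened prefix for the sketch
demo_suffix = "\nUser: {query}\nAI: "

prompt = format_few_shot(
    demo_examples,
    demo_example_template,
    demo_prefix,
    demo_suffix,
    query="What is the meaning of life?",
)
print(prompt)
```

Because the example template, the prefix, and the separator each contribute newlines, the rendered blocks end up three blank lines apart, as in the printed prompt. Trimming the leading and trailing newlines from the templates would tighten the prompt.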


In [96]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=few_shot_prompt_template
)

In [97]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [98]:
print(response.strip())

42. But seriously folks... Life has no single universal meaning. It Meanings vary depending on culture. purpose


### Chain of Thought Prompting

In [207]:
# use a low temperature so the step-by-step reasoning stays focused and repeatable
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 0.1
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 128
generation_config.use_cache = False
generation_config.top_p = 0.95
generation_config.top_k = 25
generation_config.do_sample = True
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [208]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [209]:
template = """
### Instruction
As a seasoned film connoisseur, I need your advice to expand my movie collection.
Can you recommend a film that shares similarities with my favourite movies listed below and select the appropriate option with reference to the question.

### Context

Human: Find a movie similar to Star Wars Episode IV - A New Hope, Indiana Jones and the Last Crusade, Star Wars Episode V - The Empire Strikes Back, The Big Lebowski:
Options:
(A) Tetsuo
(B) the Ironman
(C) The Princess Bride
(D) The Barkley Marathons The Race That Eats Its Young
(E) Bug

Assistant:
Let's think step by step.
- Star Wars Episode IV - A New Hope (action, adventure, fantasy; 1977)
- Indiana Jones and the Last Crusade (action, adventure; 1989)
- Star Wars Episode V - The Empire Strikes Back (action, adventure, fantasy; 1980)
- The Big Lebowski (action, drama, comedy; 1998)
These are all famous classic American movies produced before 2000. Amongst all the options, the only movie similar to these ones seems to be The Princess Bride (1987). So the answer is (C).

Human: Find a movie similar to Twister, The Silence of the Lambs, Independence Day, Braveheart:
Options:
(A) They Shoot Horses
(B) Don't They
(C) Forrest Gump
(D) The Salton Sea
(E) Extreme Days

Assistant:
Let's think step by step.
- Twister (action, adventure, thriller; 1996)
- The Silence of the Lambs (crime, drama, thriller; 1991)
- Independence Day (action, science-fiction, drama; 1996)
- Braveheart (biography, drama, epic; 1995)
These are all famous Hollywood movies produced around the 1990s. Amongst all the options, the only movie similar to these ones seems to be Forrest Gump (comedy, drama, romance; 1994). So the answer is (C).

{query}

### Answer
Assistant:

"""



prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [210]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [211]:
response = llm_chain.run(
    query="""
Human: Find a movie similar to Minority Report, Total Recall, Inside Out, Forrest Gump:
Options:
(A) Phenomena
(B) Lilting
(C) Catwoman
(D) Edge of Tomorrow
"""
)

In [None]:
print(response)

Edge of Tomorrow (2014)

> Genres: Action, Science Fiction, Thriller

Similar Movies:
Minority Report (2002), Total Recall (1990), Inception (2010), Looper (2012)
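
The exemplars in the prompt end with "So the answer is (X)", but the completion above names the film without that phrase. When the chosen option has to be read programmatically, one approach is to look for the answer phrase first and fall back to matching option titles. The sketch below is a minimal illustration; both helper names are hypothetical.

```python
import re

# Matches the closing phrase used by the exemplars, e.g. "So the answer is (C)."
ANSWER_PATTERN = re.compile(r"answer is \(([A-E])\)")


def extract_choice(completion):
    """Return the last option letter stated as the answer, or None if the phrase is absent."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def match_option_by_title(completion, options):
    """Fallback: return the letter of the option title mentioned earliest in the completion."""
    text = completion.lower()
    hits = [
        (text.find(title.lower()), letter)
        for letter, title in options.items()
        if title.lower() in text
    ]
    return min(hits)[1] if hits else None


options = {"A": "Phenomena", "B": "Lilting", "C": "Catwoman", "D": "Edge of Tomorrow"}
completion = "Edge of Tomorrow (2014)\n\n> Genres: Action, Science Fiction, Thriller"

choice = extract_choice(completion) or match_option_by_title(completion, options)
print(choice)
```

Title matching is a heuristic: it can pick the wrong option when the reasoning mentions several option titles before settling on one, so the explicit answer phrase is checked first.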


### References

Reference: https://www.pinecone.io/learn/series/langchain/langchain-prompt-templates/
Reference: https://www.promptingguide.ai/techniques/cot
Reference: https://github.com/FranxYao/chain-of-thought-hub/blob/main/BBH/lib_prompt_multiround_claude_instant/movie_recommendation.txt