## Prompt Engineering

> Through prompt engineering, we can design these prompts in a way that enhances the quality of the generated text.

In [4]:
#### Load a text generation model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline



In [5]:
### load model and tokenizer

model_name = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(model_name,
                                            device_map="cuda",
                                            torch_dtype="auto",
                                            trust_remote_code=True)

In [6]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [9]:
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                return_full_text=False,
                max_new_tokens=500,
                do_sample=False)

Device set to use cuda


In [10]:
###   prompt

messages = [
    {"role":"user",
    "content":"write a joke about chickens"}
]

### Generate the output

output = pipe(messages)
print(output[0]["generated_text"])


The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
You are not running the flash-attention implementation, expect numerical differences.


 Why did the chicken join the band? Because it had the drumsticks!


### Chat Templates
> The Transformers pipeline first converts our messages into a model-specific chat template.


In [12]:
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)

<|user|>
write a joke about chickens<|end|>
<|endoftext|>
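
To make the template step concrete, here is a minimal pure-Python sketch that rebuilds the string printed above. `render_phi3_chat` is a hypothetical helper written only for illustration. The real tokenizer renders a template shipped with the model, so the token layout here is an assumption read off the printed prompt, not the library's implementation.

```python
def render_phi3_chat(messages):
    """Approximate the chat layout seen in the printed prompt (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|role|>, newline, content, <|end|>, newline
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # The printed prompt closes with the end-of-text token
    parts.append("<|endoftext|>")
    return "".join(parts)


messages = [{"role": "user", "content": "write a joke about chickens"}]
print(render_phi3_chat(messages))
```

The special tokens mark where each turn starts and ends, which is how the model tells user text apart from its own replies.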


In [15]:
output = pipe(messages, do_sample=True, temperature=1)
print(output[0]["generated_text"])

### Note that every time you rerun this piece of code, the output will change! With do_sample=True the model selects tokens at random, and temperature controls how random that selection is.

 Why did the chicken go to the séance? To get to the next level of egg-sistance!
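
The effect of temperature can be sketched without a model. The snippet below assumes the usual formulation in which raw token scores (logits) are divided by the temperature before the softmax. The logits are made-up numbers and `softmax_with_temperature` is a hypothetical helper, not part of Transformers.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaling by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)
warm = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=5.0)
print(cold, warm, hot)
```

A low temperature concentrates almost all probability on the top token, while a high temperature flattens the distribution so less likely tokens are sampled more often.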


In [16]:
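### top_p (nucleus sampling) limits sampling to the smallest set of tokens whose cumulative probability reaches top_p, while the top_k parameter controls exactly how many tokens the LLM can consider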

In [17]:
output = pipe(messages, do_sample=True, top_p=1)
print(output[0]["generated_text"])

 Why don't secret agents get chickens for spies?


Because even a hen needs her feathers, and spies know that no operation can go chicken if you're spying "eggs-ample"!
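
The difference between the two filters is easier to see on a toy distribution. This is a sketch of the idea only: the probabilities are invented, `top_k_filter` and `top_p_filter` are hypothetical helpers, and the library's own implementation may differ in details such as tie handling.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"egg": 0.5, "hen": 0.3, "coop": 0.15, "spy": 0.05}  # made-up distribution
print(top_k_filter(probs, k=2))        # only the two most likely tokens survive
print(top_p_filter(probs, top_p=0.9))  # tokens are kept until 90% of the mass is covered
print(top_p_filter(probs, top_p=1.0))  # nothing is filtered out
```

With `top_p=1`, as in the cell above, every token stays in the candidate pool, so the setting does not restrict sampling at all.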


### Use cases and values for temperature and top_p

| Example use case | Temperature | top_p | Description |
|:--------:|:--------:|:--------:|:--------:|
| Brainstorming session | High | High | High randomness with a large pool of potential tokens. The results will be highly diverse, often leading to very creative and unexpected outputs. |
| Email generation | Low | Low | Deterministic output with highly probable predicted tokens. This results in predictable, focused, and conservative outputs. |
| Creative writing | High | Low | High randomness with a small pool of potential tokens. This combination produces creative outputs that still remain coherent. |
| Translation | Low | High | Deterministic output with highly probable predicted tokens. Produces coherent output with a wider range of vocabulary, leading to outputs with linguistic variety. |

### Intro to prompt engineering

#### Instruction-Based Prompting
> Prompting is often used to have the LLM answer a specific question or resolve a certain task. This is referred to as instruction-based prompting.

#### Components of a good prompt
#### Basic components:
> 1. Specificity: Accurately describe what you want to achieve. Instead of asking the LLM to “Write a description for a product,” ask it to “Write a description for a product in less than two sentences and use a formal tone.”

> 2. Hallucination: LLMs may confidently generate incorrect information, which is referred to as hallucination. To reduce its impact, we can ask the LLM to only generate an answer if it knows the answer. If it does not know the answer, it can respond with “I don’t know.”

> 3. Order: Either begin or end your prompt with the instruction. Especially with long prompts, information in the middle is often forgotten. LLMs tend to focus on information either at the beginning of a prompt (primacy effect) or the end of a prompt (recency effect).
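
The three basic components can be combined in one small helper. This is only one possible arrangement: `build_grounded_prompt` is a hypothetical function, the product text is a placeholder, and repeating the instruction at the end is an illustrative choice rather than a rule.

```python
def build_grounded_prompt(instruction, document):
    """Put the instruction at both ends of the prompt and allow an "I don't know" answer."""
    guard = 'If you do not know the answer, respond with "I don\'t know."'
    return (
        f"{instruction}\n{guard}\n\n"
        f"Text:\n{document}\n\n"
        f"Reminder: {instruction}"
    )


messages = [{
    "role": "user",
    "content": build_grounded_prompt(
        "Write a description for the product in less than two sentences and use a formal tone.",
        "<product details go here>",  # placeholder input, not real data
    ),
}]
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the prompts passed to `pipe` elsewhere in this notebook.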


####  Advanced Prompt Engineering

> 1. Persona: Describe what role the LLM should take on. For example, use “You are an expert in astrophysics” if you want to ask a question about astrophysics.
> 2. Instruction: The task itself. Make sure this is as specific as possible. We do not want to leave much room for interpretation.
> 3. Context: Additional information describing the context of the problem or task. It answers questions like “What is the reason for the instruction?”
> 4. Format: The format the LLM should use to output the generated text. Without it, the LLM will come up with a format itself, which is troublesome in automated systems.
> 5. Audience: The target of the generated text. This also describes the level of the generated output. For education purposes, it is often helpful to use ELI5 (“Explain it like I’m 5”).
> 6. Tone: The tone of voice the LLM should use in the generated text. If you are writing a formal email to your boss, you might not want to use an informal tone of voice.
> 7. Data: The main data related to the task itself.


##### Prompt component examples:
> 1. persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries."
> 2. instruction = "Summarize the key findings of the paper provided."
> 3. context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper."
> 4. data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results."
> 5. audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models."
> 6. tone = "The tone should be professional and clear."
> 7. data = f"Text to summarize: 
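
One way to use these components is to join them into a single user message. The sketch below assumes a variable `text` that holds the paper to summarize. It is a hypothetical placeholder here, and joining with newlines is an illustrative choice.

```python
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries."
instruction = "Summarize the key findings of the paper provided."
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper."
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results."
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models."
tone = "The tone should be professional and clear."

text = "<paper text goes here>"  # hypothetical placeholder for the real input
data = f"Text to summarize: {text}"

# Join the components into one prompt, one component per line
components = [persona, instruction, context, data_format, audience, tone, data]
query = "\n".join(components)

messages = [{"role": "user", "content": query}]
print(messages[0]["content"])
```

Keeping each component in its own variable makes it easy to drop or swap one and observe how the generated output changes.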


#### In-Context Learning: Providing Examples

> In-context learning comes in a number of forms depending on how many examples you show the LLM:
> 1. Zero-shot prompts do not use examples,
> 2. one-shot prompts use a single example, and
> 3. few-shot prompts use two or more examples.

In [20]:
one_shot_prompt = [
    {"role": "user",
     "content": "A 'Gigamuru' is a type of Japanese musical  instrument. An example of a sentence that uses the word Gigamuru is:"},
    {"role": "assistant",
     "content": "I have a Gigamuru that my uncle gave me as a  gift. I love to play it at home."},
    {"role": "user",
     "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"}
]
print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))

<|user|>
A 'Gigamuru' is a type of Japanese musical  instrument. An example of a sentence that uses the word Gigamuru is:<|end|>
<|assistant|>
I have a Gigamuru that my uncle gave me as a  gift. I love to play it at home.<|end|>
<|user|>
To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>
<|endoftext|>
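
The same message layout extends to any number of examples. Below is a small sketch of a hypothetical helper, `build_few_shot_messages`, that turns example pairs into alternating user and assistant turns followed by the real query. It only builds the message list and does not call the model.

```python
def build_few_shot_messages(examples, query):
    """Turn (prompt, answer) example pairs plus a final query into chat messages."""
    messages = []
    for prompt, answer in examples:
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": answer})
    # The final user turn is the one the model is expected to answer
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("A 'Gigamuru' is a type of Japanese musical instrument. "
     "An example of a sentence that uses the word Gigamuru is:",
     "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."),
]
query = ("To 'screeg' something is to swing a sword at it. "
         "An example of a sentence that uses the word screeg is:")

zero_shot = build_few_shot_messages([], query)       # no examples
one_shot = build_few_shot_messages(examples, query)  # a single example
print(len(zero_shot), len(one_shot))
```

Passing an empty list gives a zero-shot prompt, and passing two or more pairs gives a few-shot prompt with the same structure.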


In [22]:
## Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])


####  Chain Prompting: Breaking up the Problem.

> Let us say we want to use an LLM to create a product name, slogan, and sales pitch based on a number of product features. Instead of asking for everything in one prompt, we can break the task into smaller steps.

In [24]:
 # Create name and slogan for a product

product_prompt = [ {"role": "user",
                    "content": "Create a name and slogan for a  chatbot that leverages LLMs."} ]
outputs = pipe(product_prompt)
product_description = outputs[0]["generated_text"]
print(product_description)

 Name: ChatSage
Slogan: "Unleashing the power of AI to enhance your conversations."


> We can then use the generated output as input for the LLM to generate a sales pitch.

In [26]:
 # Based on a name and slogan for a product, generate a sales  pitch
sales_prompt = [ {"role": "user", "content": f"Generate a very short sales pitch for the following product: ' { product_description }'"}]
outputs = pipe(sales_prompt)
sales_pitch = outputs[0]["generated_text"]
print(sales_pitch)

 Introducing ChatSage, the ultimate AI companion that revolutionizes your conversations. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging and meaningful. Experience the future of communication with ChatSage today!
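
The two steps above follow a pattern that can be written once and reused. The sketch below is a hedged illustration: `run_chain` and the `{previous}` placeholder are hypothetical names, and `fake_generate` stands in for the model so the example runs without a GPU.

```python
def run_chain(generate, templates, initial_input):
    """Feed each step's output into the next prompt template."""
    results = []
    current = initial_input
    for template in templates:
        # Insert the previous step's output into this step's prompt
        prompt = template.format(previous=current)
        current = generate(prompt)
        results.append(current)
    return results


def fake_generate(prompt):
    # Stand-in for the model call, used only to show the data flow
    return f"[response to: {prompt}]"


templates = [
    "Create a name and slogan for {previous}.",
    "Generate a very short sales pitch for the following product: '{previous}'",
]
steps = run_chain(fake_generate, templates, "a chatbot that leverages LLMs")
for step in steps:
    print(step)
```

To run the chain against the model loaded earlier, `generate` could wrap the pipeline call used in this notebook, for example a function that returns `pipe([{"role": "user", "content": prompt}])[0]["generated_text"]`.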


####  Reasoning with Generative Models