## Intro to Prompt Engineering

**yt link** - `https://youtu.be/je6AlRP2RjI`

In [1]:
!pip install torch transformers>=4.40.1 datasets>=2.18.0 langchain>=0.1.17

### importing necessary libraries

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [3]:
## loading the llm and tokenizer

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map = 'cuda',
    torch_dtype = 'auto',
    trust_remote_code = True
)

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct"
)

## create a pipeline to use the model
pipe = pipeline(
    "text-generation",
    model = model,
    tokenizer = tokenizer,
    return_full_text = False,
    max_new_tokens = 500,
    do_sample = False
)

config.json:   0%|          | 0.00/967 [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/11.2k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.2k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.5k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.44k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.94M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

In [4]:
## prompt
messages = [
    {
        "role" : "user",
        "content" : "Write a joke about the black color!"
    }
]

## generate the text
response = pipe(messages)
print(response[0]["generated_text"])

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


 Why don't black cats play poker?

Because they're afraid of getting caught with a "paws"!


In [5]:
## Apply Prompt Template
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize = False)
print(prompt)

<|user|>
Write a joke about the black color!<|end|>
<|endoftext|>


In [6]:
## Using a high temperature
output = pipe(messages, do_sample = True, temperature = 1)
print(output[0]["generated_text"])

 Imagine someone says, "I can't stand the color black, it's like wearing a perpetual storm cloud around your eyes." and I reply, "Don't worry, you're not in the dark. Just embrace the storm. After all, black is the only color that's both a nightmare and a reality check!"


In [7]:
## Using high p
output = pipe(messages, do_sample = True, top_p = 1)
print(output[0]["generated_text"])

 Why don't black cats ever play hide and seek?

Because good luck trying to find them if they choose not to participate in their own form of camouflage called "melanin magic."


## Advanced Prompt Engineering

In [8]:
# Text to summarize which we stole from https://jalammar.github.io/illustrated-transformer/ ;)
text = """In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions.
The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand to people without in-depth knowledge of the subject matter.
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
Popping open that Optimus Prime goodness, we see an encoding component, a decoding component, and connections between them.
The encoding component is a stack of encoders (the paper stacks six of them on top of each other – there’s nothing magical about the number six, one can definitely experiment with other arrangements). The decoding component is a stack of decoders of the same number.
The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:
The encoder’s inputs first flow through a self-attention layer – a layer that helps the encoder look at other words in the input sentence as it encodes a specific word. We’ll look closer at self-attention later in the post.
The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position.
The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in seq2seq models).
Now that we’ve seen the major components of the model, let’s start to look at the various vectors/tensors and how they flow between these components to turn the input of a trained model into an output.
As is the case in NLP applications in general, we begin by turning each input word into a vector using an embedding algorithm.
Each word is embedded into a vector of size 512. We'll represent those vectors with these simple boxes.
The embedding only happens in the bottom-most encoder. The abstraction that is common to all the encoders is that they receive a list of vectors each of the size 512 – In the bottom encoder that would be the word embeddings, but in other encoders, it would be the output of the encoder that’s directly below. The size of this list is hyperparameter we can set – basically it would be the length of the longest sentence in our training dataset.
After embedding the words in our input sequence, each of them flows through each of the two layers of the encoder.
Here we begin to see one key property of the Transformer, which is that the word in each position flows through its own path in the encoder. There are dependencies between these paths in the self-attention layer. The feed-forward layer does not have those dependencies, however, and thus the various paths can be executed in parallel while flowing through the feed-forward layer.
Next, we’ll switch up the example to a shorter sentence and we’ll look at what happens in each sub-layer of the encoder.
Now We’re Encoding!
As we’ve mentioned already, an encoder receives a list of vectors as input. It processes this list by passing these vectors into a ‘self-attention’ layer, then into a feed-forward neural network, then sends out the output upwards to the next encoder.
"""

# Prompt components
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n"
instruction = "Summarize the key findings of the paper provided.\n"
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n"
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n"
audience = "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n"
tone = "The tone should be professional and clear.\n"
#text = "MY TEXT TO SUMMARIZE"  # Replace with your own text to summarize
data = f"Text to summarize: {text}"

# The full prompt - remove and add pieces to view its impact on the generated output
query = persona + instruction + data_format + audience + tone + data

In [9]:
messages = [
    {"role": "user", "content": query}
]
print(tokenizer.apply_chat_template(messages, tokenize=False))

<|user|>
You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.
Summarize the key findings of the paper provided.
Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.
The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.
The tone should be professional and clear.
Text to summarize: In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelizati

In [10]:
# Generate the output
outputs = pipe(messages)
print(outputs[0]["generated_text"])

 **Method Summary:**

- The Transformer model uses attention mechanisms to improve the training speed of deep learning models.
- It consists of an encoding component and a decoding component, each with multiple identical sub-layers.
- Encoders process input vectors through self-attention and feed-forward neural networks.
- Decoders use self-attention and focus on relevant parts of the input sentence.
- Embedding transforms input words into vectors, which are then processed by the encoders.
- The model allows for parallel processing in the feed-forward layers, enhancing training efficiency.

**Main Results:**

The Transformer model, as detailed in the paper "Attention is All You Need," represents a significant advancement in the field of deep learning, particularly in the realm of neural machine translation. By leveraging attention mechanisms, the model achieves superior performance compared to the Google Neural Machine Translation model for specific tasks. A key feature of the Transfor

## In Context Prompt Engineering

In [11]:
## Example of One Shot Learning [We give the model an example to learn from and then expect it to perform accroding to that]

messages = [
    {
        "role" : "user",
        "content" : "We have to find the next word of the sequence : FAG, GAF, HAI, IAH, ____. No Explanation is Required."},

    {
        "role" : "assistant",
        "content" : "Next word will be JAK."
    },

    {
        "role" : "user",
        "content" : "Find the missing word : SCD, TEF, UGH, VIJ, ___. No Explanation is Required."
    }
]

## checking whether all info all clearly stored
print(tokenizer.apply_chat_template(messages, tokenize = False))

<|user|>
We have to find the next word of the sequence : FAG, GAF, HAI, IAH, ____. No Explanation is Required.<|end|>
<|assistant|>
Next word will be JAK.<|end|>
<|user|>
Find the missing word : SCD, TEF, UGH, VIJ, ___. No Explanation is Required.<|end|>
<|endoftext|>


In [12]:
## Generate the Output
output = pipe(messages)
print(output[0]["generated_text"])

 The missing word is WKL.


## Chain Prompting - Breaking the problem into smaller chunks

Task - Suggest name of the product on the basis of features and suggest slogan of the campaign. And Generate a short sales pitch for the same product.

In [13]:
## Suggest Name and Slogan for the product

messages = [
    {
        "role" : "user",
        "content" : "Suggest a creative name for a chatbot that can used to learn Laws and Financial Literature and also suggest a slogan for the same. \
        Result should look like \
        Name : \
        Slogan : "
    }
]

## Looking into the results
product = pipe(messages, temperature = 1)
print(product[0]["generated_text"])

 Name: LegalEagleLearn

Slogan: "Empowering your financial and legal knowledge with precision and ease."


In [14]:
## Suggesting Sales Pitch for the same product
sales_messages = [
    {
        "role" : "user",
        "content" : f"Write a sales pitch in short for the product in not more than 200 words and key points to be in bullet points, '{product}'"
    }
]

## looking into the sales pitch
sales_pitch = pipe(sales_messages)
print(sales_pitch[0]["generated_text"])

 Introducing LegalEagleLearn, the ultimate online platform designed to empower your financial and legal knowledge with precision and ease. Our comprehensive courses cover a wide range of topics, from basic legal principles to advanced financial strategies, ensuring you have the expertise to navigate the complex world of law and finance.

Key Points:
- Accessible and affordable: LegalEagleLearn offers flexible learning options, allowing you to study at your own pace and on your own schedule.
- Expert-led courses: Our courses are taught by experienced professionals who have years of experience in the legal and financial fields.
- Interactive learning: Engage with our interactive learning tools, including quizzes, case studies, and real-life scenarios to enhance your understanding.
- Certification: Upon completion of a course, you will receive a certificate of achievement, which can be added to your resume or LinkedIn profile.
- Community support: Join our vibrant community of learners an

## Reasoning with Generative Models


----

*  LLMS are tend to think before answering anything

------------------------------------------------------

### 1. Chain of Thoughts Prompting [Prompts for Reasoning]

In [15]:
## Chain of Thoughts Problem

cot_problem = [
    {
        "role" : "user",
        "content" : "Rodger has 10 balls. And he bought 5 sets of balls and each set contains 3 balls. Find out how many balls Rodger may have now."
    },
    {
        "role" : "assistant",
        "content" : "Let's think step by steps. Rodger first has 10 balls and then he bought 5 sets of balls and each set contains 3 balls. So Total balls from the sets, 5 * 3 = 15. So Rodger finally have 15 + 10 = 25 balls."
    },
    {
        "role" : "user",
        "content" : "XYZ restaurants have 90 eggs for the day. And during an order preparation, they have utilized half of them and restaurant owner bought 5 times of eggs from the market. Then How many eggs will the restaurants have at the end of day?"
    }
]

In [16]:
## finiding the answer

cot_response = pipe(
    cot_problem
)

print(cot_response[0]["generated_text"])

 Let's think step by step. XYZ restaurants initially have 90 eggs. During the order preparation, they utilized half of them, so they used 90 / 2 = 45 eggs. This leaves them with 90 - 45 = 45 eggs. Then the restaurant owner bought 5 times the number of eggs they had left, so they bought 5 * 45 = 225 eggs. At the end of the day, the restaurants will have the remaining eggs from the initial stock plus the newly bought eggs, which is 45 + 225 = 270 eggs.


### 2. Zero-Shot Chain of Thought

* Simply asking to think step by step.

In [17]:
## Implementation of Zero Shot Chain of Thoughts

zero_shot_problem = [
    {
        "role" : "user",
        "content" : "You are good at mathematics calculation."
    },

    {
        "role" : "assistant",
        "content" : "Think step by step"
    },

    {
        "role" : "user",
        "content" : "XYZ restaurants have 89 eggs for the day. And during an order preparation, they have utilized half of them and restaurant owner bought 5 times of eggs from the market. Then How many eggs will the restaurants have at the end of day? And make sure that final no of eggs will be integer"
    }
]

## Final Output
zero_shot_response = pipe(
    zero_shot_problem
)

print(zero_shot_response[0]["generated_text"])

 Let's break down the problem step by step:

1. XYZ restaurants start with 89 eggs.
2. They utilize half of them for order preparation. So, they use 89 / 2 = 44.5 eggs. Since we need an integer value, we'll round up to 45 eggs (as they can't use half an egg).
3. After using 45 eggs, they are left with 89 - 45 = 44 eggs.
4. The restaurant owner buys 5 times the remaining eggs from the market. So, they buy 5 * 44 = 220 eggs.
5. Finally, we add the eggs bought from the market to the remaining eggs in the restaurant: 44 + 220 = 264 eggs.

At the end of the day, XYZ restaurants will have 264 eggs.


## Tree of Thoughts

In [18]:
# Zero-shot Chain-of-Thought
zeroshot_tot_prompt = [
    {"role": "user", "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."}
]

In [19]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])

 Expert 1:
Step 1: Start with the initial number of apples, which is 25.

Expert 2:
Step 1: Subtract the number of apples used for lunch, which is 20.
Step 2: Add the number of apples bought, which is 6.

Expert 3:
Step 1: Start with the initial number of apples, which is 25.
Step 2: Subtract the number of apples used for lunch, which is 20.
Step 3: Add the number of apples bought, which is 6.

Results:
All three experts arrived at the same answer. The cafeteria has 11 apples left (25 - 20 + 6 = 11).


#### Solving Complex Problems with `Tree-of-Thoughts`

In [20]:
zeroshot_tot_prompt = [
    {
        "role": "user",
        "content": (
            "Imagine three different experts are working together to solve this problem. "
            "The goal is to combine the numbers 4, 9, 10, and 13 using arithmetic operations (+, -, *, /) to get exactly 24 And a number can be used only once."
            "All experts will write down one step of their reasoning at a time and share it with the group. "
            "If any expert realizes their reasoning is incorrect at any point, they will stop and leave. "
            "The group will continue step by step until they either reach the solution or determine it is impossible. "
            "Make sure the experts discuss and evaluate each result before proceeding to the next step."
        )
    }
]


In [21]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


 Expert 1: Let's start by looking at the numbers we have: 4, 9, 10, and 13. We need to combine them using arithmetic operations to get exactly 24.

Expert 2: I think we should try to use multiplication and addition since they can help us reach higher numbers more easily.

Expert 3: I agree. Let's see if we can find a combination that works.

Expert 1: I'll start by trying to multiply two of the numbers together. If I multiply 9 and 4, I get 36.

Expert 2: That's a good start. Now we need to subtract or add something to get to 24.

Expert 3: If we subtract 12 from 36, we get 24. So, we can use the equation (9 * 4) - 12 = 24.

Expert 1: That's correct. We have successfully combined the numbers 4, 9, 10, and 13 using arithmetic operations to get exactly 24.

Expert 2: Great job, team! We found a solution using multiplication and subtraction.

Expert 3: Yes, and we used each number only once. Well done!
