<a href="https://colab.research.google.com/github/PRASAD212019/Generative-AI/blob/main/prompt_engineering_Llama2_Chain_of_Thought.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Llama 2 Prompt Engineering Experiments to Extract Information
This notebook includes experiments with Llama 2 prompt engineering. By sharing it, I hope to contribute to the community.
The experiments cover the following prompt engineering topics:
- The importance of system messages
- Outputting valid JSON
- Chain-of-thought prompting
- One-to-Many Shot Learning


In [1]:
import requests, time, os, json
from IPython.display import display, HTML

In [25]:
# Using Llama 2 70B through Hugging Face requires an authentication token and a Hugging Face Pro account, which costs $9 a month.
# To learn more, see
# - https://huggingface.co/meta-llama/Llama-2-70b-chat-hf?inference_api=true
# - https://huggingface.co/pricing

# Read the token from an environment variable instead of hard-coding a secret in the notebook
token = os.environ["HF_TOKEN"]

### General methods and classes used in the experiments below

In [26]:
# Object to represent an answer from Llama
class Answer:
    def __init__(self, answer, elapse):
        self.answer = answer
        self.elapse = elapse


In [27]:
# Function that calls Llama on HF and returns the answer as text
def generate(prompt: str) -> str:

    API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-70b-chat-hf"
    headers = {
        "Authorization": f"Bearer {token}",
        "content-type": "application/json",
    }

    options = {"use_cache": False}

    parameters = {
        "max_length": 4000,
        "max_new_tokens": 1000,
        "top_k": 10,
        "return_full_text": False,
        "do_sample": True,
        "num_return_sequences": 1,
        "temperature": 0.8,
        "repetition_penalty": 1.0,
        "length_penalty": 1.0,
    }

    payload = {"inputs": prompt, "parameters": parameters, "options": options}

    response = requests.post(API_URL, headers=headers, json=payload)
    if response.status_code != 200:
        return f"Error code {response.status_code}. Message {response.content}"
    else:
        results = response.json()
        answer = results[0]['generated_text']
        return answer

In [28]:
# Wrapper function that runs generate and returns an Answer
def run_prompt(prompt: str) -> Answer:
    start_time = time.time()
    answer = generate(prompt)
    end_time = time.time()
    elapse = round(end_time - start_time)
    return Answer(answer, elapse)

In [29]:
import html

# Display an Answer object in HTML
def display_answer(answer: Answer, header=''):
    answer_html_template = """<h3>{HEADER} Answer - Time to Generate: {ELAPSE} seconds</h3>
    <textarea cols='100' rows={NUM_ROWS}>{ANSWER}</textarea>"""

    # Roughly one row per 10 words; the HTML rows attribute must be an integer
    number_rows = len(answer.answer.split(' ')) // 10
    if number_rows < 3:
        number_rows = 5

    # Escape the model output so characters such as < or & don't break the markup
    answer_html = answer_html_template.format(ANSWER=html.escape(answer.answer), ELAPSE=answer.elapse, HEADER=header, NUM_ROWS=number_rows)
    display(HTML(answer_html))

## The Experiments Use Case
The experiments focus on extracting information from a [blog post by Doug McCracken and Joshua Lu of Andreessen Horowitz (a16z)](https://a16z.com/mobile-game-soft-launch/) about soft launches for mobile games. Because Llama 2's context window is limited to 4,096 tokens, only part of the post is used.

In [30]:
# The article text used in the experiments below
text = """
Play to Win: Mobile Game Soft Launch Best Practices
Doug McCracken and Joshua Lu
We hear a lot of discussion around the best practices for launching a mobile game, and one particular topic that often comes up is: whether or not to do a soft launch. And if you do soft launch, how can you tell if your game will be successful? Much of the mobile games industry has taken the tactic of soft launching seriously – in September alone, there were more than 100 games soft launched on the Apple App Store and over 600 on the Google Play Store.
This blog post will explore what a soft launch is and why it can be beneficial for game studios. We’ll also dispel some of the myths that are commonly associated with soft launches and help you figure out if this strategy is right for your game. We’ll also address why there are so many more games soft launched on Google Play (hint: they have more options to support a variety of soft launch strategies).
What’s a Soft Launch?
A soft launch (or sometimes “geo test” or “beta”) is when a game is released to a limited number of players before it is released worldwide, usually by releasing only to specific countries. This allows developers to test different aspects of the game, such as the gameplay, graphics, and economy in a real-world environment without the pressure of a full release. Soft launch enables player feedback and metrics which can be used to compare to benchmarks that both Apple’s iOS App Store and Google Play offer in their dashboards.
There are typically two stages of soft launch: alpha and beta.
Alpha is the earlier stage and is often used to test whether the core of the game is working, including the technology and core gameplay loop.
Beta is later and is used to test the meta gameplay loop, marketing acquisition, server scalability, and monetization.
Why is soft launch important to consider? Because you only get one shot at a first impression with most of your potential players and you want to make sure the game has the best possible shot at being successful when you launch globally. Mobile gamers are spoiled by choice and have a low propensity to return post-churn. We’ve seen that typically, it is between 5-10x more expensive to acquire a player who already quit than to acquire a new player.
Additionally, distribution is one of the biggest challenges in mobile games – if your users don’t retain or monetize, you’ll have trouble scaling your game via organic growth or paid acquisition. By releasing the game in stages, developers can get valuable feedback from players that can help improve the game before it is released to everyone.
Myths About Soft Launch
Before we share more perspective on soft launch strategy, we wanted to dispel some common myths:
The country or market chosen for the soft launch will accurately represent how the game will perform worldwide
While it’s important to choose a market that closely aligns with your target demographics, it’s also important to keep in mind that every market is unique and may have different reactions to the game. For example, many developers choose Canada to test how a US launch would perform. In our experience, there are too many variables to accurately simulate a launch in another country, however, retention and monetization metrics compared to other regions can give you some directional understanding of performance.
Soft launch can only be done on mobile platforms like iOS and Android
Though the focus of this post is mobile, a soft launch strategy can also be applied to PC and console games. This can be especially useful for cross platform games to understand the dynamics of players on different platforms, and improve gameplay and performance.
Soft launches are expensive
The investment required for your soft launch is dependent on your goals for soft launch. If you have a strong understanding of your audience, you will be much more likely to target them via organic or paid means in a cost effective way.
You need to soft launch a game for it to succeed
Soft launch is a way to solicit player feedback before a broader global launch, but isn’t necessarily needed to be successful. We believe it is usually better to soft launch, but there are plenty of successful games that went straight to global launch.
Resetting player progress after a soft launch will anger the community
While some players will respond negatively if you reset their progress, many understand that this is the “price of admission” for accessing a game early. As long as you’re upfront with players that the test is limited to a certain number of days, they will understand when the test ends. For example, Supercell recently launched a “Limited Open Beta” to test their game Flood Rush (ultimately based on soft launch Supercell decided not to release globally). If you’re resetting player progress, it is a best practice to refund players who spend during soft launch (or at a minimum, give them the equivalent value of in-game currency to spend later).
What Can you Learn from Soft Launch?
There are a few questions you should seek to answer with a soft launch, though it’s difficult to answer all of these at a high level of fidelity. If you’re resource constrained, you should think about prioritizing your goals based on what’s most important to the team.
Is the game stable/performant?
Building a bug-free game is hard, and while your QA team is heroically finding and triaging bugs, there’s nothing like a soft launch to show you where the holes are. Acquiring users at scale from lower-cost geographies is a great way to stress test servers and also get performance data from long-tail devices. There are a TON of lower-spec Android phones that your QA team won’t have and should probably not be purchasing. Some metrics to keep in mind for this sort of testing include: FPS, crash rates/DAU, crashes/crasher, and latency.
"""

## Experiment 1: Simple Prompts to Summarize an Article
The first experiment asks Llama to summarize the article in bullet points. The results were not great. First, Llama didn't respond with bullet points. Some sentences summarized key points, but others were repetitive filler.




In [31]:
p1 =  """Write a concise summary of the main ideas in article below in bullet-points. article: {BODY}""".format(BODY=text)

display_answer(run_prompt(p1))

Let's see if we can get better results by telling Llama we want a TL;DR (too long; didn't read) summary and telling it not to repeat ideas.


In [32]:
p2 =  """Write a concise TL;DR summary in bullet-points for the following article, don't repeat ideas. article: {BODY}""".format(BODY=text)

display_answer(run_prompt(p2))

Not much difference: the answer is still too long and has no bullet points. What happens if we ask Llama to limit the number of bullet points?

In [None]:
p3 =  """Write a concise TL;DR summary in numeric bullet-points for the following article,
don't repeat ideas in bullet points. Limit the number of bullet points to 5. article: {BODY}""".format(BODY=text)

display_answer(run_prompt(p3))

Llama keeps ignoring my requests :(
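Judging by eye whether a reply obeys "limit the number of bullet points to 5" gets tedious across reruns. A minimal sketch of an objective check, assuming bullets start a line with `-`, `*`, `•`, or a number followed by `.` or `)`; `count_bullets` and `respects_limit` are hypothetical helpers, not part of the notebook:

```python
import re

# A bullet line starts (after optional spaces/tabs) with "-", "*", "•",
# or a number followed by "." or ")", e.g. "1. Soft launch ..." or "2) ..."
BULLET_RE = re.compile(r"^[ \t]*(?:[-*•]|\d+[.)])[ \t]+", re.MULTILINE)

def count_bullets(reply: str) -> int:
    """Return the number of bullet-point lines in a model reply."""
    return len(BULLET_RE.findall(reply))

def respects_limit(reply: str, limit: int = 5) -> bool:
    """True if the reply uses bullets at all and stays within the limit."""
    n = count_bullets(reply)
    return 0 < n <= limit

sample = ("Summary:\n1. Soft launch tests a game in a few countries.\n"
          "2. Alpha tests the core loop.\n3. Beta tests monetization.")
print(count_bullets(sample), respects_limit(sample))  # 3 True
```

In the notebook the check would be applied to something like `run_prompt(p3).answer`, which makes it easy to compare p1 to p3 side by side.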

Moving on, I want to see whether adding a system message improves the results.

## Experiment 2: System Prompts

The Llama 2 paper describes the system prompt, which sets the context and persona for the model. The Llama 2 chat prompt template and its default system prompt are described [in this Hugging Face blog](https://huggingface.co/blog/llama2).
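The single-turn layout from that blog post can be captured in a small helper, so that no prompt accidentally drops a tag such as the closing `[/INST]`. This is a sketch based on the template shown in the Hugging Face post; `build_llama2_prompt` is a hypothetical name, not part of any library:

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap one system message and one user message in the Llama 2 chat template."""
    # Layout: <s>[INST] <<SYS>>\n system \n<</SYS>>\n\n user [/INST]
    return f"<s>[INST] <<SYS>>\n{system.strip()}\n<</SYS>>\n\n{user.strip()} [/INST]"

# Braces inside the values (e.g. a JSON format hint) need no {{ }} escaping,
# because they are inserted as data rather than parsed as a format string.
prompt = build_llama2_prompt(
    "You are a researcher tasked with summarizing articles.",
    'Output the answer in JSON using the format {"summary": summary}.',
)
print(prompt)
```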

In the following test, the prompt from p3 is reused together with the default system prompt.


In [None]:
p4 = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
<</SYS>>
Write a concise TL;DR summary in numeric bullet-points for the following article,
don't repeat ideas in bullet points. Limit the number of bullet-point to 5. article: {BODY}
[/INST]""".format(BODY=text)

p4_answer = run_prompt(p4)
display_answer(p4_answer, "P4 - With System Message")

Adding the system prompt improved the answer somewhat out of the box, but it's still not good. The answer is still too long and lacks a concise article overview.

Next, let's customize the system prompt.


## Experiment 3: Modify the System Prompt
The system prompt was modified to fit the task better and to define the persona and context that Llama should assume.

Changes applied to the original system prompt:
- Use a researcher persona and state the task: summarizing articles.
- Remove most of the safety instructions; they are unnecessary because we ask Llama to stay truthful to the article.


In [None]:
p5 = """<s>[INST] <<SYS>>
You are a researcher task in summarizing and writing concise brief of articles.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
Write a concise TL;DR summary in numeric bullet-points for the following article,
don't repeat ideas in bullet points. Limit the number of bullet-point to 5. article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p5))

I am impressed with the answer for p5. It is the best one so far. Llama finally respected my request for bullet points and limited them to 5, while also providing a concise summary of the article. I also noticed it took Llama a third of the time to respond compared to earlier experiments. Time is money!

What a drastic difference after adjusting the system prompt!


## Experiment 4: Asking Questions

Now we're going to shift gears and see how Llama answers questions about the article.
The following prompt asks Llama to explain what problems "mobile game soft launch" aims to solve.
The answer is pretty good.



In [None]:
p6 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
What problems does mobile game soft launch solves, according to the article? Limited the answer to 20 words.
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p6))

This is pretty good. Let's see if we can take it to the next level by asking Llama what the article is about and then using the answer to ask additional questions.

p7 asks Llama what the article is about, and its answer is then used to build p8, which asks a second question.

To make it easy to programmatically use the answer, I asked Llama to output the answer in JSON.

### Lessons from asking for JSON Output
Through experiments that I didn't include here, I discovered that Llama needs a template and must be told to output only valid JSON.
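Even with a template, a reply can carry a sentence before or after the JSON, and then `json.loads(a7.answer)` raises. A more forgiving parser is sketched below using the standard library's `json.JSONDecoder.raw_decode`; `extract_json` is a hypothetical helper, not something the notebook defines, and it assumes the first complete JSON value in the reply is the answer:

```python
import json

def extract_json(reply: str):
    """Return the first JSON object or array found in `reply`.

    Scans for each '{' or '[' and lets JSONDecoder.raw_decode try to parse
    one complete value starting there. Raises ValueError if none parses.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(reply):
        if ch in "{[":
            try:
                value, _ = decoder.raw_decode(reply, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON starting here; keep scanning
    raise ValueError("no valid JSON found in reply")

reply = 'Sure! Here is the answer:\n{"article_is_about": "mobile game soft launch"}\nHope this helps.'
print(extract_json(reply))  # {'article_is_about': 'mobile game soft launch'}
```

The trade-off is leniency: this accepts replies that ignore "only output JSON", which is convenient for extraction but hides whether the prompt itself is working.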

In [None]:
p7 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
Tell me what is the article about in one to three words?
Output the answer in JSON in the following format {{"article_is_about": [answer_in_few_words]}}. Only output JSON
Article: {BODY}
[/INST]""".format(BODY=text)

a7 = run_prompt(p7)
json_a7 = json.loads(a7.answer)
print(f"JSON Answer:\n {json_a7}")
about = json_a7['article_is_about']


p8 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
According to the article, what problems does {ABOUT} solve? Limit the answer to 20 words.
Article: {BODY}
[/INST]""".format(BODY=text, ABOUT=about)

display_answer(run_prompt(p8),'Prompt 8')


JSON Answer:
 {'article_is_about': 'mobile game soft launch best practices'}


Wow, this is awesome. We can feed answers into new prompts to refine the information we are trying to extract.
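The two-step flow of p7 and p8 generalizes to any pair of prompts. A minimal sketch, assuming the first reply is valid JSON and the second prompt marks its slot with a literal `{ABOUT}`; `chain_prompts` and `fake_ask` are hypothetical names, and `fake_ask` stands in for `lambda p: run_prompt(p).answer` so the example runs without calling Hugging Face:

```python
import json
from typing import Callable

def chain_prompts(ask: Callable[[str], str], first_prompt: str,
                  key: str, second_template: str) -> str:
    """Ask `first_prompt`, read `key` from its JSON reply, and insert that
    value into `second_template` at the literal placeholder {ABOUT}."""
    first_reply = json.loads(ask(first_prompt))
    about = first_reply[key]
    if isinstance(about, list):  # the p7 template allows a list of words
        about = " ".join(about)
    # str.replace (not str.format) so other braces in the template are left alone
    return ask(second_template.replace("{ABOUT}", about))

# Stand-in for the real model call, so the example runs offline
def fake_ask(prompt: str) -> str:
    if "one to three words" in prompt:
        return '{"article_is_about": "mobile game soft launch"}'
    return "Answer to: " + prompt

result = chain_prompts(
    fake_ask,
    "Tell me what the article is about in one to three words.",
    "article_is_about",
    "According to the article, what problems does {ABOUT} solve?",
)
print(result)
```

Using `str.replace` for the second step sidesteps the brace-escaping pitfall of `.format`, where a placeholder written as `{{ABOUT}}` is emitted literally instead of being substituted.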

Let's try a different question: what industry is the article about?

In [None]:
p9 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
Name the industry the article is focus on? Output only the industry name.
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p9))

Notice that I asked for only the industry name, but Llama disregarded my request and wrote a sentence.
Let's see if I can get better results by asking for the answer in JSON. It worked!!!

Note: I needed to add "Include only valid JSON" to stop Llama from adding an explanation.

In [None]:
p10 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
Name industry the article is focus on? Output only the industry name. Output the answer in JSON, using format {{"industry": industry}}.
Include only valid JSON.
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p10))

I ran p10 multiple times, and Llama still sometimes added an explanation as raw text, invalidating the JSON. The fix was to add an explanation field to the JSON; notice how I asked Llama to keep the explanation under ten words, inside square brackets.

In [None]:
p11 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
Name industry the article is focus on? Output only the industry name. Output the answer in JSON, using format {{"industry": industry, "explanation": [explanation less then 10 words]}}.
Include only valid JSON.
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p11))

Let's try to trick Llama by asking what sport the article focuses on.

In [None]:
p12 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article. All output must be in valid JSON. Don't add explanation beyond the JSON.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.
<</SYS>>
What sport is the article about? Output must be in valid JSON like the following example {{"sport": sport, "explanation": [in_less_than_ten_words]}}. Output must include only JSON.
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p12))

Pretty good, but there is one problem: Llama disregards my request for valid JSON because it is eager to explain why sport is null. So I added an explanation field to the JSON, and that did the trick.

To review: to get Llama to output JSON, I needed to explicitly tell it to "only include JSON", add an explanation field to the JSON object format, and change "write the answer" to "output the answer".
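Since p10 only sometimes returned valid JSON, code that consumes the answers still has to check each reply. A minimal sketch, assuming that simply re-sampling the same prompt is acceptable (the notebook uses `do_sample=True`, so replies differ between runs); `ask_for_json` is a hypothetical helper, and the scripted replies below stand in for real model output:

```python
import json
from typing import Callable, Iterable, Optional

def ask_for_json(ask: Callable[[str], str], prompt: str,
                 required_keys: Iterable[str], max_attempts: int = 3) -> Optional[dict]:
    """Re-run `prompt` until the reply is a JSON object containing every
    key in `required_keys`; return None if all attempts fail."""
    required = set(required_keys)
    for _ in range(max_attempts):
        reply = ask(prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # prose mixed into the reply; sample again
        if isinstance(data, dict) and required.issubset(data):
            return data
    return None

# Scripted replies in place of run_prompt(p11).answer
replies = iter([
    'Sure! {"industry": "gaming"} is my answer.',  # extra prose: rejected
    '{"industry": "Mobile gaming"}',              # missing "explanation": rejected
    '{"industry": "Mobile gaming", "explanation": "Article covers mobile game launches"}',
])
result = ask_for_json(lambda p: next(replies), "p11 prompt", ["industry", "explanation"])
print(result["industry"])  # Mobile gaming
```

Each retry is another slow, paid call to the 70B model, so a small `max_attempts` keeps the cost bounded.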

## Experiment 5: One-to-Many Shot Learning | Teach Llama Using Examples

In this last experiment, I want Llama to identify general topics in the text without asking specific questions such as: which industry, sport, and company are mentioned in the article? Asking specific questions is hard to scale: an article can cover an unlimited number of topics and ideas, and trying to predict them all defeats the purpose of using AI. Instead, I am going to use a technique called One-to-Many Shot Learning.

One-to-Many Shot Learning is a term that refers to a type of machine learning problem where the goal is to learn to recognize many different classes of objects from only one or a few examples of each class. For example, if you have only one image of a cat and one image of a dog, can you train a model to distinguish between cats and dogs in new images? This is a challenging problem because the model has to generalize well from minimal data. ([source](https://machinelearningmastery.com/one-shot-learning-with-siamese-networks-contrastive-and-triplet-loss-for-face-recognition/)) In this notebook the idea is applied through prompting rather than training: the examples are placed in the prompt itself (often called few-shot or in-context prompting), and the model's weights are not updated.

The following prompt gives Llama examples of the type of topic I am looking for and asks it to find a similar subject in the article.


In [None]:
p13 = """<s>[INST] <<SYS>>
You are a researcher task with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.

Output answer in JSON using the following format: {{"name": name, "type": type, "explanation": explanation}}
<</SYS>>

What nouns mentioned in the article that can generalize the topic? [/INST]
[
{{"name": "semiconductor", "type": "industry", "explanation": "Companies engaged in the design and fabrication of semiconductors and semiconductor devices"}},
{{"name": "NBA", "type": "sport league", "explanation": "NBA is the national basketball league"}},
{{"name": "Ford F150", "type": "vehicle", "explanation": "Article talks about the Ford F150 truck"}}
] </s>

<s>[INST]
What nouns mentioned in the article that can generalize the topic? Output answer in JSON using the following format: {{"name": name, "type": type, "explanation": explanation}}
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p13))

In [None]:
# General layout of the few-shot prompts p13 and p14 (placeholders in capitals):
# system message, an example question and answer, then the real question.
few_shot_layout = """<s>[INST] <<SYS>>
SYSTEM MESSAGE
<</SYS>>

EXAMPLE QUESTION [/INST]
EXAMPLE ANSWER(S)
</s>

<s>[INST]
QUESTION
[/INST]"""
print(few_shot_layout)

I am surprised how easy it was to have Llama identify those topics by giving it examples. Still, Llama missed Supercell (a company). I tried fixing this by adding an example of a company. Running p14 multiple times returned mixed results: sometimes Supercell was mentioned, sometimes it wasn't.
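Because `do_sample=True` makes each run different, "sometimes it was mentioned" can be turned into a number by repeating the prompt. A minimal sketch; `hit_rate` is a hypothetical helper, and in the notebook `ask` would be something like `lambda p: run_prompt(p).answer` (each call is a slow request to the 70B model, so keep `runs` small):

```python
from typing import Callable

def hit_rate(ask: Callable[[str], str], prompt: str, needle: str, runs: int = 10) -> float:
    """Fraction of `runs` sampled replies that mention `needle` (case-insensitive)."""
    hits = sum(needle.lower() in ask(prompt).lower() for _ in range(runs))
    return hits / runs

# Scripted replies stand in for repeated run_prompt(p14).answer calls
replies = iter([
    '[{"name": "Supercell", "type": "company"}]',
    '[{"name": "Google Play", "type": "company"}]',
] * 2)
print(hit_rate(lambda p: next(replies), "p14", "Supercell", runs=4))  # 0.5
```

Comparing the rate for p13 and p14 over the same number of runs gives a less anecdotal view of whether the extra company example helps.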

In [None]:
p14 = """<s>[INST] <<SYS>>
You are a researcher tasked with answering questions about an article.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer, please don't share false information.

Output answer in JSON using the following format: {{"name": name, "type": type, "explanation": explanation}}
<</SYS>>

What nouns mentioned in the article that can generalize the topic? [/INST]
[
{{"name": "semiconductor", "type": "industry", "explanation": "Companies engaged in the design and fabrication of semiconductors and semiconductor devices"}},
{{"name": "NBA", "type": "sport league", "explanation": "NBA is the national basketball league"}},
{{"name": "Ford F150", "type": "vehicle", "explanation": "Article talks about the Ford F150 truck"}},
{{"name": "Ford", "type": "company", "explanation": "Ford is a company that built vehicles"}},
{{"name": "John Smith", "type": "person", "explanation": "Mentioned in the article"}}
] </s>

<s>[INST]
What nouns are mentioned in the article that can generalize the topic? Output answer in JSON using the following format: {{"name": name, "type": type, "explanation": [short_explanation]}}
Article: {BODY}
[/INST]""".format(BODY=text)

display_answer(run_prompt(p14))

That is better: Llama finds more noun topics as a result of the additional examples.
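Writing examples by hand inside a `.format` string means doubling every brace, and a stray trailing comma turns the demonstration into invalid JSON. A sketch of building the same kind of few-shot prompt in code, assuming the multi-turn layout from the Hugging Face Llama 2 blog post (`question [/INST] answer </s><s>[INST] next question [/INST]`); `build_few_shot_prompt` is a hypothetical helper, not part of the notebook:

```python
import json

def build_few_shot_prompt(system: str, example_question: str,
                          example_answers: list, question: str) -> str:
    """Build a Llama 2 chat prompt with one example turn before the real question.

    The example answers are serialized with json.dumps, so the demonstration
    Llama imitates is always valid JSON (no trailing commas, no {{ }} escaping).
    """
    example_json = json.dumps(example_answers, indent=1)
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{example_question} [/INST] {example_json} </s>"
        f"<s>[INST] {question} [/INST]"
    )

# Examples taken from p14; adding a new topic type means appending one dict
examples = [
    {"name": "NBA", "type": "sport league", "explanation": "NBA is the national basketball league"},
    {"name": "Ford", "type": "company", "explanation": "Ford is a company that built vehicles"},
    {"name": "John Smith", "type": "person", "explanation": "Mentioned in the article"},
]
question = "What nouns are mentioned in the article that can generalize the topic?"
article = "<article text goes here>"  # placeholder for `text` in the notebook
prompt = build_few_shot_prompt(
    "You are a researcher tasked with answering questions about an article.",
    question,
    examples,
    question + "\nArticle: " + article,
)
print(prompt)
```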

## Conclusion
Prompt engineering is a combination of art and science. It requires creativity and perseverance to get it right. The lessons learned from the experiments are as follows:

1. Customizing the system message to the task is crucial; it is worth experimenting with it and adjusting it to fit the task.
2. One-to-many shot learning should be one of the first things to try to improve the response (after the system message).
3. Chain-of-thought prompting, implemented here by feeding one answer into the next prompt (p7 and p8), is a powerful technique for uncovering better responses.

I hope these examples gave you ideas and inspiration.
