<a href="https://colab.research.google.com/github/Pattiecodes/DataCamp_As.AIEng/blob/main/Module_7_Working_with_Llama3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 7 Starts here

# Loading and using Llama 3
You are tasked with testing and evaluating the quality of the new Llama model that your company wants to use.

To run these tests, you need code that loads the model and then generates completions with the llama-cpp-python library. Whenever you interact with an LLM application, a good starting point is to use the Llama class with a model of your choice to generate text.

As a check to make sure that the loading script works, you want to say "Hello" to the model and be able to see its reply.

Instructions
Import the Llama class.
Instantiate the Llama class, passing it the file path stored in path_to_model.
Run a completion on the model using the instance of the Llama class in llm with the prompt "Hello".

In [None]:
# Load the correct class from the library
from llama_cpp import Llama

# Instantiate the model class
llm = Llama(model_path=path_to_model, n_gpu_layers=-1)

# Call the model with the prompt "Hello"
output = llm("Hello", max_tokens=32, stop=["Q: ", "\n"])
print(output)

# Parsing Llama 3 completion outputs
Your company wants to use the Llama models in its Bronx Zoo question-answering bot for the animal exhibits.

Your task is to extract the model's completion from the result stored in output, which contains the completion along with other metadata. An early step in evaluating the model is to ask Llama 3 a question and figure out how to parse its output. You are given a Llama model preloaded in llm and a prompt that asks it to name five foods that llamas eat, with the result stored in output.

Parse the result in output, retrieve only the completion string, and store it in completion_string.
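
To make the parsing step concrete, here is a minimal sketch that works on a hand-written dictionary shaped like a completion result. The values in sample_output are invented for illustration, and extract_completion is a hypothetical helper name, not part of llama-cpp-python.

```python
# Hand-written stand-in for a completion result (values are made up).
sample_output = {
    "id": "cmpl-example",
    "object": "text_completion",
    "created": 0,
    "model": "path/to/model.gguf",
    "choices": [
        {
            "text": " Grass, hay, leaves, shrubs, and grains.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 11, "total_tokens": 25},
}


def extract_completion(output):
    """Return the text of the first choice, or an empty string if there is none."""
    choices = output.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("text", "").strip()


print(extract_completion(sample_output))
```

Everything except choices[0]['text'] is metadata, such as token counts and the reason generation stopped.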

Instructions
Limit the number of tokens generated to a max of 20 tokens.
Stop the generation if the completion produces a line break, i.e., '\n'.
Parse the output variable and store the completion string in a new variable, completion_string.

In [None]:
output = llm(
    "Q: Name 5 foods that llamas eat? A: ",
    # Restrict the completion to 20 tokens
    max_tokens=20,
    # Stop at the next question or at a line break
    stop=["Q:", "\n"],
)
# Retrieve the completion text and store in completion_string
completion_string = output['choices'][0]['text'] if output['choices'] else ""
print(completion_string)

# More creative Llama completions
You are a software developer working on integrating Llama in your company's chatbot pipelines. Unfortunately, the current Llama model you are using produces repetitive completions and often produces exactly the same results if you ask it the same question, which makes the bot feel less personable to your users.

You decide to debug this issue by looking through the completion code and modifying it so that the responses are more varied. The model is already instantiated using llama_cpp and is stored in llm.

Instructions
Add the parameter and a corresponding value to the completion code so the model considers a wider variety of words during generation.
Add the parameter to the completion code which penalizes the model for repeating the same words often.
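
As a rough illustration of what these two parameters do to the next-token distribution, the sketch below applies temperature scaling and a simplified repetition penalty to a toy list of logits. The function names and the toy logits are assumptions for illustration; the library's actual sampler is more involved.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_repeat_penalty(logits, seen_token_ids, penalty):
    """Simplified penalty: make already-generated tokens less likely."""
    penalized = list(logits)
    for i in set(seen_token_ids):
        if penalized[i] > 0:
            penalized[i] /= penalty
        else:
            penalized[i] *= penalty
    return penalized


logits = [2.0, 1.0, 0.5]  # toy scores for three candidate tokens
low_temp = softmax(logits, temperature=0.5)
high_temp = softmax(logits, temperature=1.5)
after_penalty = softmax(apply_repeat_penalty(logits, seen_token_ids=[0], penalty=1.8))

print("temperature 0.5:", low_temp)
print("temperature 1.5:", high_temp)
print("token 0 penalized:", after_penalty)
```

A higher temperature flattens the distribution, so less likely tokens are sampled more often, while the penalty lowers the score of tokens that have already appeared.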

In [None]:
output = llm(
    "Q: Give me directions from grand central station to the Empire State building. A: ",
    # Raise the temperature so the model samples from a wider variety of words
    temperature=1.5,
    # Penalize tokens that have already appeared to reduce repetition
    repeat_penalty=1.8,
    max_tokens=15,
    stop=["Q:", "\n"],
    echo=False,
)

print(output['choices'][0]['text'])

# Make a philosophy chatbot
You are a tester at a company building AI personas, and your task is to evaluate how well the new Llama models are able to generate completions in certain voices and styles.

You will make a chatbot that thinks it's a philosopher and answers questions by pretending it is Plato. You are given a partially completed create_chat_completion call, which you will modify to make the chatbot respond to a user's question as if it was Plato himself.

Instructions
Fill in the dictionary in the first index of the messages list with the instructions to make the model respond as if it is the Greek philosopher Plato and the appropriate role.
Fill in the dictionary in the second index of messages with the prompting question from the user and the appropriate role.
Ensure that both your instruction, and the user's question are correctly passed to the function call.

In [None]:
history = [
    # Instruct the model to behave like Plato
    {"role": "system", "content": "You are the Greek philosopher Plato. Answer every question using his voice."},
    # Pass the user's question
    {
        "role": "user",
        "content": "Can any shape that exists in the real world be perfect and why?"
    }
]
# Pass in conversation context to the completion call
result = llm.create_chat_completion(messages=history, max_tokens=20)
print(result)
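
Printing result shows the whole response dictionary. A minimal sketch of pulling out only the assistant's reply follows, using a hand-written dictionary shaped like a chat completion result. The reply text is invented, and extract_reply is a hypothetical helper name.

```python
# Hand-written stand-in for a chat completion result (values are made up).
sample_result = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "No shape in the visible world is perfect; it only imitates its Form.",
            },
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 40, "completion_tokens": 20, "total_tokens": 60},
}


def extract_reply(result):
    """Return the assistant's message text from the first choice."""
    return result["choices"][0]["message"]["content"]


print(extract_reply(sample_result))
```

Unlike plain completions, chat completions store the generated text under message and content rather than under text.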

# Make Llama speak like a pirate
Your task is to create a prompt for a Llama model to serve as the language backend for an interactive pirate robot at Disney World. Ensure the model's output is always in a pirate voice and includes "Aye Matey" in its response. Create an appropriate instruction for this prompt, using keywords to guide the model's output.

The Llama class has already been instantiated in the llm variable and the code to call the completion is provided.

Instructions
Include the appropriate keywords in the prompt in the correct locations: Instruction:, Question:, and Response: and ensure the instruction includes some directive on including "Aye Matey" in the model response and to make the model have a pirate voice.
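
One way to keep the three keywords in the right places is to assemble the prompt with a small helper. This is only a sketch; build_prompt is a hypothetical function name, not part of llama-cpp-python.

```python
def build_prompt(instruction, question):
    """Place the instruction and question under their keywords and leave Response: open."""
    return f"Instruction: {instruction}\nQuestion: {question}\nResponse:"


prompt = build_prompt(
    'You are an assistant who only speaks like a pirate. Always include "Aye Matey" in your response.',
    "How long does it take to go around the Earth once?",
)
print(prompt)
```

Ending the prompt with the Response: keyword signals to the model where its answer should begin.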

In [None]:
# Write the keywords and instructions in the correct locations in the following prompt
text="""Instruction: You are an assistant who only speaks like a pirate. Always include "Aye Matey" in your response.
Question: How long does it take to go around the Earth once?
Response:
"""

output = llm(
      text,
      max_tokens=15,
      stop=["Q:", "\n"],
)

print(output['choices'][0]['text'])

# 3-shot prompting with Llama
You work at a food delivery company as a data analyst, and you are investigating the sentiment (positive or negative) people have about your company from reviews on Google and Yelp.

Since you don't want to train a classification model from scratch to identify the reviews as positive or negative, you decide to create a prompt that you will feed to your instance of Llama 3. You decide to use few-shot learning by writing three examples, each with a review and its sentiment, and to use the model to identify the sentiment of the fourth example, which you will replace with each review you collected.

Instructions
Create a prompt using a few-shot prompting template with 3 examples.
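
Because the fourth review will be swapped out for each collected review, it can help to generate the prompt from a list of examples. The sketch below shows one possible layout. The helper name build_few_shot_prompt and the example reviews are assumptions for illustration.

```python
def build_few_shot_prompt(examples, new_review):
    """Number each labeled example, then append the unlabeled review to classify."""
    lines = []
    for i, (review, sentiment) in enumerate(examples, start=1):
        lines.append(f"Review {i}: {review}")
        lines.append(f"Sentiment {i}: {sentiment}")
    n = len(examples) + 1
    lines.append(f"Review {n}: {new_review}")
    lines.append(f"Sentiment {n}:")
    return "\n".join(lines)


examples = [
    ("The food arrived cold and late.", "Negative"),
    ("Fast delivery and great flavor!", "Positive"),
    ("My order was wrong twice in a row.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "Delicious food and excellent customer service!")
print(prompt)
```

Keeping the labels short and consistent across the examples makes it more likely that the model answers the final line in the same format.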

In [None]:
# Fill in the 3-shot prompt (you can use multiple lines)
text="""
Review 1: This food is not as good as the pictures...
Sentiment 1: This is a negative review.
Review 2: Wow! Astonishing flavor. Plus the chef and waiters are hot too!
Sentiment 2: A positive response, complimenting the food, as well as the looks of the staff.
Review 3: I wish I can take home the hot waiter, but I guess the good food will do.
Sentiment 3: Positive response. Complimenting excellent food, as well as wanting the hot male waiter.
Review 4: Delicious food, and excellent customer service! I wish I can eat the waiter too though...
Sentiment 4:"""

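# Stop before the model starts writing another review ("Q:" never appears in this prompt format)
output = llm(text, max_tokens=32, stop=["Review"])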

print(output['choices'][0]['text'])

# Creating a JSON inventory list
You are asked to use an LLM to produce a structured JSON with a list of items and their count to help a supermarket automate their inventory process.

The model takes a text description of the inventory as input and produces the JSON as output. This feature of the inventory management system automatically extracts inventory data from natural language and stores it in a structured format for downstream tasks.

You are provided with the llm class instance with a Llama model pre-loaded and the system prompt to get you started.

Instructions
Specify the parameter in create_chat_completion that lets you generate responses in JSON format.

In [None]:
output = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful assistant processing lists from text to JSON format: you extract item counts from text and output it in JSON with the item name as the key and the number of that item as the value",},
            {"role": "user", "content": "I have fifteen apples, thirty-three oranges, and five thousand fifty-two potatoes."},
        ],
        # Specify output format to JSON
        response_format={"type": "json_object"},
)

print(output['choices'][0]['message']['content'])
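
For downstream tasks, the returned content string still has to be parsed into a Python object. A minimal sketch using the standard json module follows. The sample_content string is a hypothetical model output, and parse_inventory is an illustrative helper name.

```python
import json

# Hypothetical content string, standing in for output['choices'][0]['message']['content'].
sample_content = '{"apples": 15, "oranges": 33, "potatoes": 5052}'


def parse_inventory(content):
    """Parse the model's JSON text into a dict, or return None if it is not a JSON object."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


inventory = parse_inventory(sample_content)
print(inventory)
```

Returning None on malformed text lets the calling code decide whether to retry the request or skip the record.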

# Generating answers with a JSON schema
You are part of a team working on an online education platform. In a course teaching about space, there is an interactive exercise where students are able to ask questions about a planet and the answer is shown on their screen through a graphical view. This question-answering feature is powered by an LLM, but the graphical view requires JSON input with the fields Question and Answer to correctly show the question and answer.

You believe that using the new Llama models and llama-cpp-python, you can get the LLM to produce the answer and format it into the correct JSON schema in one step.

Instructions
Add the field to specify a JSON schema in response_format and the properties it may have.
Specify the Question and Answer fields in the schema with the string type.
Specify the required fields in the schema.

In [None]:
output = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions about space. You return your results in a JSON format with the Question and Answer fields.",},
            {"role": "user", "content": "How old is the Milky Way Galaxy?"},
        ],
        response_format={
            "type": "json_object",
            # Set the keyword that lets you specify a schema
            "schema": {
                "type": "object",
                # Set the properties of the JSON fields and their data types
                "properties": {"Question": {"type": "string"}, "Answer": {"type": "string"}},
                # Declare the required JSON fields here
                "required": ["Question", "Answer"],
            },
        },
)

print(output['choices'][0]['message']['content'])
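
Before handing the result to the graphical view, it may be worth checking that the required fields are present and are strings. The sketch below does this by hand with the standard json module. The sample string is a hypothetical model output, and validate_answer is an illustrative helper name.

```python
import json


def validate_answer(content, required=("Question", "Answer")):
    """Parse the JSON text and check that each required field is a string."""
    data = json.loads(content)
    for field in required:
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], str):
            raise ValueError(f"field is not a string: {field}")
    return data


# Hypothetical content string returned by the model.
sample_content = '{"Question": "How old is the Milky Way Galaxy?", "Answer": "About 13.6 billion years."}'
record = validate_answer(sample_content)
print(record["Answer"])
```

A check like this catches responses that were cut off by a token limit before the closing fields were written.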

# Making safe responses
Your team is working on a chatbot powered by a Llama 3 model that will be used in a critical insurance system that medical staff will interact with to consult which medications they can provide based on the patient's policy.

Your system is subject to audits and as a requirement the system also has to be deterministic, meaning your language model's outputs need to be consistent and predictable. So, if the same text is entered twice, the model will produce the exact same results each time.

You have been provided the Llama class instance in the llm variable and the code to call the completion. You are also given a sample prompt to test with.

Instructions
Modify the completion code so that at most 10 tokens are generated.
Restrict the completion decoding so that it only ever chooses between the two most likely tokens at each completion step.
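
To illustrate what restricting decoding to the most likely tokens means, here is a toy sketch of top-k filtering over a made-up probability list. The function name top_k_filter and the probabilities are assumptions for illustration, not the library's internals.

```python
def top_k_filter(probs, k):
    """Keep the k most likely token ids and renormalize their probabilities."""
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top_ids)
    return {i: probs[i] / total for i in top_ids}


probs = [0.4, 0.3, 0.2, 0.1]  # toy next-token probabilities
top_two = top_k_filter(probs, k=2)
greedy = top_k_filter(probs, k=1)
print(top_two)
print(greedy)
```

With k=2 the sampler still chooses between two candidates at each step, whereas k=1 always picks the single most likely token.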

In [None]:
output1 = llm(
    "What are the symptoms of strep throat?",
    # Set the maximum number of tokens
    max_tokens=10,
    # Restrict decoding to choose between the top two tokens
    top_k=2,
)

print(output1['choices'][0]['text'])

# Making a creative chatbot
You are building a chatbot to help customers brainstorm new ideas and overcome writer's block. To that end, your bot needs to be creative and able to produce a diverse set of responses when it is asked the same query more than once.

You have been provided the Llama class instance in the llm variable and the code to call the completion. You are also given a sample prompt to test with.

Instructions
Complete the code to run a completion, and adjust the top-p decoding parameter so that it considers the top 80% cumulative probable words in the token vocabulary.
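
The cumulative-probability idea behind top-p decoding can be sketched on a toy distribution as follows. The function name top_p_filter and the probabilities are assumptions for illustration, not the library's internals.

```python
def top_p_filter(probs, p):
    """Keep the most likely token ids until their cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.4, 0.3, 0.2, 0.1]  # toy next-token probabilities
wide = top_p_filter(probs, p=0.8)
narrow = top_p_filter(probs, p=0.07)
print(wide)
print(narrow)
```

A value of 0.8 keeps several candidates in play, while a very small value such as 0.07 leaves only the single most likely token and removes the variety.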



In [None]:
output1 = llm(
    "What are three names you could give a pirate ship whose crew is looking for an elusive treasure known as the One Piece?",
    max_tokens=15,
    # Consider only the tokens that make up the top 80% of cumulative probability
    top_p=0.8,
)

print(output1['choices'][0]['text'])

# Personal shopping agent

You are tasked with creating a fashion recommendation bot that helps people find new outfits to wear at different events. Your team is developing a bot for shopping recommendations using an Agent class to encapsulate the LLM details. They have tasked you with creating a good system prompt that will help the agent behave like a fashion expert with the name 'Ivy Verlaine'.

You are given a pre-loaded Llama model instantiated in llm and the Agent class to get you started. The Agent class is instantiated with an LLM, a system prompt, and conversation history.
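
The Agent class itself is not shown in these notes. The following is a guess at what a minimal version could look like, based only on the description above: it wraps an LLM, a system prompt, and a conversation history. The default values, the history handling, and the EchoLLM stand-in are all assumptions, and the real class may differ.

```python
class Agent:
    """Hypothetical sketch of an agent that wraps an LLM and keeps the conversation history."""

    def __init__(self, llm, system_prompt="You are a helpful assistant.", history=None):
        self.llm = llm
        self.system_prompt = system_prompt
        # The system prompt always comes first, followed by any earlier messages.
        self.history = [{"role": "system", "content": system_prompt}]
        if history:
            self.history.extend(history)

    def create_completion(self, user_prompt, max_tokens=100):
        """Send the user prompt with the full history and return the reply text."""
        self.history.append({"role": "user", "content": user_prompt})
        output = self.llm.create_chat_completion(messages=self.history, max_tokens=max_tokens)
        reply = output["choices"][0]["message"]["content"]
        self.history.append({"role": "assistant", "content": reply})
        return reply


class EchoLLM:
    """Stand-in for a loaded model so the sketch runs without llama-cpp-python."""

    def create_chat_completion(self, messages, max_tokens=100):
        text = f"(reply to: {messages[-1]['content']})"
        return {"choices": [{"message": {"role": "assistant", "content": text}}]}


agent = Agent(EchoLLM(), system_prompt="You are a fashion expert.")
reply = agent.create_completion("What should I wear to a wedding?")
print(reply)
```

Storing each reply in the history is one way to give the model context in later turns; swapping EchoLLM for a real Llama instance should only require that it exposes create_chat_completion.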

Instructions
Create a system prompt that instructs the model to behave like a fashion expert.
Instantiate an Agent class in fashion_agent with your system prompt and the pre-loaded llm.

In [None]:
# Write the instruction to the system
instruction = "You are Ivy Verlaine, a fashion expert. Use Anna Wintour's voice and judgment."

# Instantiate the Agent class with the llm and the system prompt
fashion_agent = Agent(llm, system_prompt=instruction)

result = fashion_agent.create_completion("I'm going to a wedding, what should I wear?")
print(result)

# Multi-agent conversations
You are an LLM researcher trying to optimize how you create prompts to instruct a model used in a tutoring bot. To make it easier to iterate through many prompts for your bot's LLM, you've decided to use two instances of the Agent class, teacher_agent and student_agent.

teacher_agent is used to generate instructions, which will become the instructions that form part of the student_agent's prompt.

The student_agent is the actual agent that you want to use in your tutoring bot, i.e., it has to behave like a tutor.

You want to use the teacher_agent to quickly iterate over many system prompt variations for your student_agent so that you use the best possible system prompt for the student_agent.

You are given a pre-loaded Llama model instantiated in llm and the Agent class.

Instructions
Create a user prompt for teacher_agent to help it generate a completion that instructs how to be a good tutor so that it can be used as the system prompt for the student_agent.
Set an appropriate token limit to the instructions generated by teacher_agent.
Instantiate the student_agent with the generated prompt from teacher_agent and run the test completion.
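
Iterating over several teacher prompts could be organized as a small loop like the one below. This is a sketch only: try_teacher_prompts, fake_teacher, and fake_student_factory are hypothetical names, and the two stand-in functions replace real agent calls so the example runs on its own.

```python
def try_teacher_prompts(teacher_generate, make_student, teacher_prompts, test_question):
    """For each teacher prompt, build a student from the generated instruction and test it."""
    results = []
    for prompt in teacher_prompts:
        system_prompt = teacher_generate(prompt)      # teacher writes the instruction
        student_generate = make_student(system_prompt)  # student uses it as its system prompt
        results.append({
            "teacher_prompt": prompt,
            "system_prompt": system_prompt,
            "answer": student_generate(test_question),
        })
    return results


# Stand-ins for the two agents (assumptions, not real model calls).
def fake_teacher(prompt):
    return f"Tutor rules based on: {prompt}"


def fake_student_factory(system_prompt):
    return lambda question: f"[{system_prompt}] answering: {question}"


results = try_teacher_prompts(
    fake_teacher,
    fake_student_factory,
    ["Be concise.", "Use worked examples."],
    "Can you explain to me how differentiation works?",
)
for row in results:
    print(row["teacher_prompt"], "->", row["answer"])
```

Collecting the teacher prompt, the generated system prompt, and the student's answer side by side makes it easier to compare variations.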

In [None]:
teacher_agent = Agent(llm, system_prompt="You provide instruction, concisely and step-by-step, on how to be a good tutor for any high school subject.")
instruction = teacher_agent.create_completion(
    # Add a user prompt
    user_prompt="Write concise instructions that tell a tutor how to teach a high school student well.",
    # Set token limit
    max_tokens=20,
)

# Use the completion from teacher_agent as the system prompt
student_agent = Agent(llm, system_prompt=instruction)
response = student_agent.create_completion("Can you explain to me how differentiation works?", max_tokens=100)
print(response)