# **Chapter 2: Question & Answer**

---
**Lesson:**

One of the most common use cases for LLMs is answering questions and chatting with users. Because they can understand and generate human-like responses, LLMs are well suited to building powerful chatbots, and Mistral's models are optimized for low latency, high throughput, and cost efficiency. This notebook focuses on Mistral 7B, but all of the concepts apply to Mixtral 8x7B as well.

> *Note: The examples and exercises below were derived from the [bedrock-mistral-prompting-examples](https://github.com/aws-samples/bedrock-mistral-prompting-examples/tree/main) repository.*
> 
---

First, we will set up our dependencies.

In [None]:
%%capture
#Install dependencies
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

# Import libraries and set up the Bedrock client
import json
import os
import sys

import boto3
import botocore.exceptions

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

modelId = "mistral.mistral-7b-instruct-v0:2" # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'
outputText = "\n"

def chat_history_to_string(memory):
    history_str = ""
    for chat_item in memory:
        role = chat_item.get("role", "")
        content = chat_item.get("content", "")
        history_str += f"{role}: {content}\n\n"
    return history_str.strip()

def format_conversation(user_input, memory):
    
    history = chat_history_to_string(memory)
    
    prompt = f"""
    <s>[INST] You are a knowledgeable helpful AWS customer service assistant. You are helpful and provide general guidance from the context.[/INST]
    {history} 
    <s>[INST] {user_input} [/INST]
    """
    return prompt

def invoke_model_and_get_response(prompt_data, memory): 
    
    # The Mistral AI models accept the following inference parameters:
    # temperature - Tunes the degree of randomness in generation. Lower temperatures produce less random output.
    # top_p - If set to a float less than 1, only the smallest set of most probable tokens whose probabilities add up to top_p or higher are kept for generation.
    # top_k - Limits sampling to the k most probable next tokens. Lower values make the output more focused; higher values allow more variety.
    # max_tokens - Maximum number of tokens to generate. Responses are not guaranteed to reach this length.
    # stop - A list of sequences that, when generated, stop the model from generating further tokens.
    body = json.dumps({ 
        'prompt': format_conversation(prompt_data, memory),
        'max_tokens': 400,
        'top_p': 0.7,
        'top_k': 50,
        'temperature': 0.7,
        "stop": ["</s>"]
    })

    try:
        response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
        response_body = json.loads(response.get('body').read().decode('utf-8'))
        # The response holds a list of outputs; join their text into a single string
        outputs = response_body["outputs"]
        outputText = "".join(output["text"] for output in outputs)

        # Save the question and answer in the chat history as plain strings
        memory.append({"role": "customer", "content": prompt_data})
        memory.append({"role": "assistant", "content": outputText})

        return outputText

    except botocore.exceptions.ClientError as error:
        if error.response['Error']['Code'] == 'AccessDeniedException':
            return (
                f"\x1b[41m{error.response['Error']['Message']}"
                "\nTo troubleshoot this issue, please refer to the following resources."
                "\nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html"
                "\nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n"
            )
        else:
            raise
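`format_conversation` flattens the history into `role: content` lines and starts each instruction with its own `<s>`. Mistral AI documents a different multi-turn template for its instruct models: a single leading `<s>`, then each past exchange as `[INST] question [/INST] answer</s>`, ending with the new `[INST] question [/INST]`. The cell below is a minimal sketch of rendering the same `memory` list in that template. `build_mistral_prompt` is a hypothetical helper (not part of `utils` or the Bedrock API); it assumes the `customer`/`assistant` roles used in this notebook and folds the system-style instruction into the first `[INST]` block, since this template has no separate system slot.

```python
def build_mistral_prompt(system_instruction, memory, user_input):
    """Hypothetical helper: render chat memory in the Mistral instruct template."""
    # Pair each customer message with the assistant reply that follows it.
    turns = []
    pending_question = None
    for item in memory:
        role, content = item.get("role"), item.get("content", "")
        if isinstance(content, list):
            # Replies may have been stored as a list of strings; join them.
            content = "".join(content)
        if role == "customer":
            pending_question = content
        elif role == "assistant" and pending_question is not None:
            turns.append((pending_question, content))
            pending_question = None

    prompt = "<s>"
    for i, (question, answer) in enumerate(turns):
        if i == 0:
            # No system slot in this template: prepend the instruction to the first question.
            question = f"{system_instruction}\n\n{question.strip()}"
        prompt += f"[INST] {question.strip()} [/INST] {answer.strip()}</s>"

    if not turns:
        # First turn of a conversation: the instruction goes with the new question.
        user_input = f"{system_instruction}\n\n{user_input.strip()}"
    prompt += f"[INST] {user_input.strip()} [/INST]"
    return prompt


# Sample history; the assistant reply is a placeholder, not real model output.
sample_memory = [
    {"role": "customer", "content": "How to select an EC2 instance type?"},
    {"role": "assistant", "content": "Start from your workload's CPU, memory, and storage needs."},
]
print(build_mistral_prompt(
    "You are a knowledgeable, helpful AWS customer service assistant.",
    sample_memory,
    "Cool. Will that work for my Linux workload?",
))
```

If you want to experiment, you could pass the returned string as the `prompt` field in place of `format_conversation(prompt_data, memory)`; whether it improves answers for these examples is something to test rather than assume.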

# **Examples:**

Let's look at how Mistral responds to some basic questions, using the current question together with the chat history. Run the example cells below to get responses from Mistral 7B.
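Each call to `invoke_model_and_get_response` sends the flattened history plus the new question, then appends both sides of the exchange to `memory`. The sketch below mimics that loop offline so you can watch the history grow without calling Bedrock. `fake_model`, `ask`, `seen_prompts`, and `demo_memory` are hypothetical names for illustration; `fake_model` returns canned text instead of calling `boto3_bedrock.invoke_model`, and the prompt uses a simplified `[INST]` wrapper rather than the full `format_conversation` template.

```python
seen_prompts = []

def fake_model(prompt):
    # Hypothetical stand-in for the Bedrock call: records the prompt it was
    # given and returns a canned reply, so this runs without AWS access.
    seen_prompts.append(prompt)
    return f"Canned reply #{len(seen_prompts)}"

def ask(question, memory, model=fake_model):
    # Flatten earlier turns as "role: content" blocks, like chat_history_to_string.
    history = "\n\n".join(f"{m['role']}: {m['content']}" for m in memory)
    prompt = f"{history}\n\n[INST] {question} [/INST]".lstrip()
    answer = model(prompt)
    # Record both sides of the exchange so the next call can see them.
    memory.append({"role": "customer", "content": question})
    memory.append({"role": "assistant", "content": answer})
    return answer

demo_memory = []
ask("How to select an EC2 instance type?", demo_memory)
ask("Will that work for my Linux workload?", demo_memory)
print(len(demo_memory))  # 4: two entries per turn
print(seen_prompts[1])   # the second prompt already contains the first exchange
```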

**Example 2.1 - How to select an EC2 instance type?**

In [None]:
memory = []

# create the prompt
prompt_data = "How to select an EC2 instance type?"

response = invoke_model_and_get_response(prompt_data, memory)

print(response)

**Example 2.2 - Using chat history to create a conversation**

In [None]:
# Review the current chat history. It gives the LLM context for the user's follow-up questions
print(f"Current chat history: {memory}")

# create the prompt
prompt_data = "Cool. Will that work for my Linux workload?"

response = invoke_model_and_get_response(prompt_data, memory)

# Notice how the LLM's response builds on your previous question
print(response)

# **Exercises**

The following exercise requires you to modify the prompt to get the desired output.

**Exercise 2.1 - Checking our memory**

Using proper formatting, fix the final prompt in the cell below to get Mistral 7B to correctly answer questions about a conversation it's had.
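To judge whether the model's answer to the final prompt is right, you first need the ground truth: the order in which the questions were asked. A minimal sketch, using a hypothetical `customer_turns` helper, lists the customer messages from a memory list in order. `exercise_memory` is a hand-written sample shaped like the list the exercise builds, with placeholder assistant replies; after running the exercise cell you can call `customer_turns(memory)` on the real history instead.

```python
def customer_turns(memory):
    """Hypothetical helper: return the customer's messages in the order they were asked."""
    return [m["content"].strip() for m in memory if m.get("role") == "customer"]

# Sample shaped like the exercise's memory; assistant replies are placeholders.
exercise_memory = [
    {"role": "customer", "content": "\nWhich AWS service has 11 nines of durability?\nTell me which region it's available in.\n"},
    {"role": "assistant", "content": "(placeholder reply)"},
    {"role": "customer", "content": "\nWhat is an AWS Lambda function?\nHow do I deploy this service?\n"},
    {"role": "assistant", "content": "(placeholder reply)"},
    {"role": "customer", "content": "\nWhat is an EC2?\nCan I deploy my Windows workload to it?\n"},
    {"role": "assistant", "content": "(placeholder reply)"},
]

for i, question in enumerate(customer_turns(exercise_memory), start=1):
    # Only the first line of each question is printed to keep the list short.
    print(f"Turn {i}: {question.splitlines()[0]}")
```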

In [None]:
memory=[]

# create the first question prompt
prompt_data = """
Which AWS service has 11 nines of durability? 
Tell me which region it's available in.  
"""

response = invoke_model_and_get_response(prompt_data, memory)

print(response)

# create the second question prompt
prompt_data = """
What is an AWS Lambda function?
How do I deploy this service?
"""

response = invoke_model_and_get_response(prompt_data, memory)

print(response)

# create the third question prompt
prompt_data = """
What is an EC2?
Can I deploy my Windows workload to it?
"""

response = invoke_model_and_get_response(prompt_data, memory)

print(response)



#Final Prompt
prompt_data = """
What's the second AWS service we talked about?
"""

response = invoke_model_and_get_response(prompt_data, memory)

print(response)

# Chapter 2 - END.