# Mistral's Models on Amazon Bedrock

Welcome to our workshop introducing Mistral's models on Amazon Bedrock. Both Mistral Small and Mistral Large 2 are available on Amazon Bedrock. In this workshop we demonstrate the capabilities of Mistral Large 2, including tool and function use, multilingual abilities, long context windows, and data generation.

Let's dive in!


## What is Mistral Large 2?

Mistral Large 2 (24.07) is a state-of-the-art large language model featuring:

- **123 Billion Parameters:** Enhancing its ability to understand and generate complex language structures.
- **128k Context Window:** Allowing it to process and generate responses based on very long inputs.
- **Multilingual Proficiency:** Supporting dozens of languages, including French, German, Spanish, Italian, Arabic, Hindi, Japanese, and more.
- **Coding Language Support:** Understanding and generating code in over 80 programming languages.
- **Improved Instruction Following:** Better adherence to user instructions and tasks.
- **Enhanced Conversational Abilities:** More natural and context-aware interactions.
- **Tool Use:** Ability to call user-defined tools and functions during a conversation.

## Model Details

* **Available Regions**: `us-west-2`
* **Model ID**: `mistral.mistral-large-2407-v1:0`
* **Context Window**: 128,000 tokens
* **Maximum Tokens per Response**: 8,192

In this notebook, we'll guide you through the process of using Mistral Large 2 to summarize a PDF document, demonstrating its capacity to handle long contexts and generate detailed summaries.
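
Before sending a long document, it can help to turn the limits listed above into a rough pre-flight check. The sketch below is only an estimate: the helper names `estimate_tokens` and `fits_in_context` are hypothetical, the figure of about 4 characters per token is a common rule of thumb rather than the model's actual tokenizer, and it assumes the response budget counts against the same 128,000-token window.

```python
import math

# Limits taken from the Model Details list above.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_RESPONSE_TOKENS = 8_192

# Assumption: roughly 4 characters per token for English prose.
# This is a rule of thumb, not the model's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(prompt_text: str, max_tokens: int = 1000) -> bool:
    """Check whether the prompt plus the requested response budget fits."""
    if max_tokens > MAX_RESPONSE_TOKENS:
        raise ValueError(f"max_tokens must be at most {MAX_RESPONSE_TOKENS}")
    return estimate_tokens(prompt_text) + max_tokens <= CONTEXT_WINDOW_TOKENS
```

If the check fails, the document is probably too long to send in one request, although the only authoritative answer comes from the service itself.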


## Importing Necessary Libraries

To interact with Amazon Bedrock and process PDF files, we need to import the following libraries:

* **`boto3`**: AWS SDK for Python, used to interact with Amazon Bedrock.
* **`botocore.config.Config`**: Allows configuration of AWS clients, such as setting timeouts.
* **`PyPDF2`**: A library for reading and extracting text from PDF files.

In [3]:
import boto3
from botocore.config import Config
import PyPDF2

### Initialize the Bedrock Client

The `initialize_bedrock_client` function sets up the client for interacting with Amazon Bedrock.

### Converse with the Model

The `converse` function sends a prompt to the Mistral Large 2 model and retrieves the response.

### Extract Text from PDF

The `extract_text_from_pdf` function reads a PDF file and extracts all the text content.

In [8]:
# Utility Functions
def initialize_bedrock_client(region_name="us-west-2", read_timeout=2000):
    # read_timeout is in seconds; a generous value leaves room for long-document requests
    config = Config(read_timeout=read_timeout)
    return boto3.client(
        service_name='bedrock-runtime',
        region_name=region_name,
        config=config
    )

def converse(
    system_prompt='',
    task_instructions='',
    context='',
    max_tokens=1000,
    temperature=0.1,
    top_p=0.9,
    model_id='mistral.mistral-large-2407-v1:0',
    bedrock_client=None
):
    if bedrock_client is None:
        bedrock_client = initialize_bedrock_client()
    # Construct the system prompt
    system = [{"text": system_prompt}] if system_prompt else []

    # Construct the user message
    user_content = '\n'.join(filter(None, [task_instructions, context]))

    messages = [{
        "role": "user",
        "content": [{"text": user_content.strip()}]
    }]

    try:
        # Make the converse API call
        response = bedrock_client.converse(
            modelId=model_id,
            messages=messages,
            system=system,
            inferenceConfig={
                "maxTokens": max_tokens,
                "temperature": temperature,
                "topP": top_p
            }
        )

        # Extract and return the assistant's response
        assistant_response = response["output"]["message"]["content"][0]["text"]
        return assistant_response.strip()

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

def extract_text_from_pdf(pdf_path):
    try:
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            # Use a generator expression to extract text from all pages
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
            return text
    except FileNotFoundError:
        print(f"Error: The file '{pdf_path}' was not found. Please check the file path.")
    except Exception as e:
        print(f"An error occurred while reading the PDF file: {e}")
    return ""
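
The `converse` function above returns only the first text block of the reply. A Converse response also carries a `stopReason` and a `usage` section, which can show whether a summary was cut off by the `maxTokens` limit. The following is a minimal sketch of reading those fields; `inspect_converse_response` is a hypothetical helper, and the sample dictionary is hand-written for illustration, not real model output.

```python
def inspect_converse_response(response: dict) -> dict:
    """Pull the text, stop reason, and token usage out of a Converse response."""
    content_blocks = response.get("output", {}).get("message", {}).get("content", [])
    # Join every text block instead of assuming the first block holds the text.
    text = "".join(block["text"] for block in content_blocks if "text" in block)
    usage = response.get("usage", {})
    stop_reason = response.get("stopReason")
    return {
        "text": text.strip(),
        "stop_reason": stop_reason,
        # "max_tokens" means generation stopped because the token limit was reached.
        "truncated": stop_reason == "max_tokens",
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }


# Hand-written sample in the shape of a Converse response (illustrative only).
sample_response = {
    "output": {"message": {"role": "assistant", "content": [{"text": " A short summary "}]}},
    "stopReason": "max_tokens",
    "usage": {"inputTokens": 1200, "outputTokens": 1000, "totalTokens": 2200},
}

info = inspect_converse_response(sample_response)
if info["truncated"]:
    print("The response hit the token limit; consider raising max_tokens.")
```

When `truncated` is true, raising `max_tokens` (up to the model's per-response maximum) is one way to get a complete summary.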


### Summarize Document

The `summarize_document` function extracts text from a PDF and generates a summary using the Mistral Large 2 model.

In [9]:
def summarize_document(pdf_path, system_prompt, task_instructions, max_tokens=1000):
    # Extract text from the PDF
    document_text = extract_text_from_pdf(pdf_path)

    # Check if the document was loaded successfully before proceeding
    if document_text:
        # Call the converse function to summarize the document
        response = converse(
            system_prompt=system_prompt,
            task_instructions=task_instructions,
            context=document_text,
            max_tokens=max_tokens,
            temperature=0.1,
            top_p=0.9
        )
        return response
    else:
        print("Cannot proceed with summarization due to issues with loading the document.")
        return None
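
Text extracted from PDFs often contains runs of spaces and stacked blank lines, which add tokens without adding meaning. As an optional step before passing `document_text` to the model, a light cleanup like the sketch below may help. `clean_extracted_text` is a hypothetical helper, not part of the original notebook, and it only normalizes whitespace.

```python
import re


def clean_extracted_text(text: str) -> str:
    """Normalize whitespace in text extracted from a PDF."""
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop a single space left on either side of a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

The cleaned string could be used in place of `document_text` when calling `converse`, leaving the rest of `summarize_document` unchanged.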

In this section, we set up our prompts and run the summarization. You can define Mistral Large 2's persona in the system prompt and give the specific task instructions in the user message. Feel free to change the persona and adjust the task instructions. The basis for our summary will be the latest State of Gen AI report from Deloitte - which is ~

In [11]:
# Main Execution
if __name__ == "__main__":
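    # Initialize Bedrock client for direct converse() calls.
    # Note: summarize_document() does not take a client, so converse() creates its own.
    bedrock_client = initialize_bedrock_client()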

    # Define prompts
    system_prompt = "You are a polite research assistant who is always helpful, cheerful, pragmatic, and extremely detail oriented"
    task_instructions = """
    Please provide a comprehensive summary of the document, including the following sections:
    1. **Overview**
   - A very brief introduction to the main topic and objectives of the paper in about ten sentences.

2. **Key Insights**
   - Detailed insights and findings presented in the paper.
   - Highlight any opportunities identified by the authors.

3. **Key Challenges**
   - Outline the main challenges or obstacles discussed.
   - Discuss any limitations or areas that require further research.

4. **Conclusion**
   - A very concise wrap-up of the overall significance of the findings in a few sentences.
   
Ensure that each section is clearly labeled and that the information is presented in a clear and organized manner.
    """

    # Summarize the document
    pdf_path = 'us-state-of-gen-ai-q3.pdf'
    summary = summarize_document(pdf_path, system_prompt, task_instructions)

    # Print the summarized response
    if summary:
        print("### Summary of the Document ###\n")
        print(summary)


### Summary of the Document ###

### Overview

The Deloitte report, "State of Generative AI in the Enterprise: Quarter Three Report, August 2024," focuses on the transition from the potential of Generative AI (GenAI) to its practical performance within organizations. The report highlights the increasing pressure on organizations to demonstrate significant and sustained value from their GenAI initiatives. As C-suites and boards begin to look for returns on investment, the report emphasizes the need for patience and perseverance to unlock the transformational potential of GenAI. Key areas explored include data and governance, risk and compliance, and measuring value. The report also discusses the shift from large language models (LLMs) to small language models (SLMs) and the rise of AI agents, which offer new avenues for automation and personalization. Regulatory considerations and the importance of human-centered change are also addressed.

### Key Insights

1. **Initial Success and Inv