# Agentic RAG application using the Mistral Large 2 Model and LlamaIndex

## Introduction

Agentic RAG (Retrieval-Augmented Generation) applications represent an advanced approach in AI that integrates large language models (LLMs) with external knowledge retrieval and autonomous agent capabilities. These systems dynamically access and process information, break down complex tasks, utilise external tools, apply reasoning, and adapt to various contexts. They go beyond simple question-answering by performing multi-step processes, making decisions, and generating complex outputs.

In this notebook, we demonstrate an example of building an agentic RAG application using the LlamaIndex framework. This application serves as a technology discovery and research tool, using the Mistral Large 2 model via Bedrock Converse as the LLM to orchestrate agent flow and generate responses. It interacts with well-known websites, such as arXiv, GitHub, and TechCrunch, and can access knowledge bases containing documentation and internal knowledge.

This application can be further expanded to accommodate broader use cases requiring dynamic interaction with internal and external APIs, as well as the integration of internal knowledge bases to provide more context-aware responses to user queries.

---

## Prerequisites

- At the time of writing this notebook, the Mistral Large 2 model is only available in the `us-west-2` region.
- Create a SageMaker notebook instance and select `ml.t3.medium` as the instance type.
- Create a new SageMaker execution role and grant it Bedrock full access.

---

## Architecture

This solution uses the LlamaIndex framework to build an agent flow with two main components: [AgentRunner and AgentWorker](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/agent_runner/). The AgentRunner serves as an orchestrator that manages conversation history, creates and maintains tasks, executes task steps, and provides a user-friendly interface for interactions. The AgentWorker handles the step-by-step reasoning and task execution.
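
To make this split of responsibilities concrete, here is a toy sketch in plain Python. `ToyWorker`, `ToyRunner`, and `StepOutput` are hypothetical names invented for illustration; they are not LlamaIndex classes and they do not call an LLM. The worker executes one step at a time, while the runner keeps the conversation history and loops until the worker reports that the task is finished.

```python
from dataclasses import dataclass, field


@dataclass
class StepOutput:
    """Result of one reasoning/execution step."""
    message: str
    is_last: bool


class ToyWorker:
    """Hypothetical worker: decides what to do in a single step."""

    def __init__(self, tools: dict):
        self.tools = tools  # maps tool name -> callable

    def run_step(self, task: str, step_index: int) -> StepOutput:
        # A real worker would ask the LLM which tool to call next.
        # Here we simply call each tool once, in order, then finish.
        names = list(self.tools)
        if step_index < len(names):
            name = names[step_index]
            return StepOutput(f"{name}: {self.tools[name](task)}", is_last=False)
        return StepOutput("done", is_last=True)


@dataclass
class ToyRunner:
    """Hypothetical orchestrator: owns the history and drives the step loop."""
    worker: ToyWorker
    history: list = field(default_factory=list)

    def chat(self, user_message: str) -> str:
        self.history.append(("user", user_message))
        step_index = 0
        observations = []
        while True:
            output = self.worker.run_step(user_message, step_index)
            if output.is_last:
                break
            observations.append(output.message)
            step_index += 1
        answer = "; ".join(observations)
        self.history.append(("assistant", answer))
        return answer


runner = ToyRunner(ToyWorker({"upper": str.upper, "length": len}))
print(runner.chat("bedrock"))
```

In the real framework the worker's decision comes from the model's function-calling output rather than a fixed order, but the division of labour is the same: the runner owns state, the worker owns a single step.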

For reasoning and task planning, we use the Mistral Large 2 model from Amazon Bedrock. The agent integrates with GitHub, arXiv, and TechCrunch APIs, while also accessing internal knowledge through Bedrock Knowledge Bases and Amazon OpenSearch Serverless to provide context-aware answers.


<img src="imgs/llamaindex-agentic-rag-mistral-large2-arch.png" width="600" alt="architecture">


## Install Packages 

Install the following Python packages:
- **llama-index**: An open-source framework for building applications with LLMs.
- **llama-index-llms-bedrock-converse**: Bedrock Converse integration with LlamaIndex.
- **llama-index-retrievers-bedrock**: Bedrock Knowledge Bases integration with LlamaIndex.
- **llama-index-tools-arxiv**: A prebuilt tool to query arxiv.org.
- **feedparser**: A Python library for downloading and parsing syndicated feeds, including RSS, Atom, and RDF feeds.

In [None]:
%pip install llama-index -q
%pip install llama-index-llms-bedrock-converse -q
%pip install llama-index-retrievers-bedrock -q
%pip install llama-index-tools-arxiv -q

In [None]:
!conda install feedparser -y

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
# Initialize and configure the BedrockConverse LLM with the Mistral Large 2 model and set it as the default in Settings

from llama_index.llms.bedrock_converse import BedrockConverse
from llama_index.core.agent import FunctionCallingAgent
from llama_index.core.tools import FunctionTool

from llama_index.core import Settings

llm = BedrockConverse(model="mistral.mistral-large-2407-v1:0", max_tokens=2048)
Settings.llm = llm  # reuse the same instance as the default LLM


## API tools integration 

We implement two functions to interact with the GitHub and TechCrunch APIs. So that the LLM can understand what each tool does and how to call it, we follow Python function best practices, including:
- Type hints for parameter and return value validation
- Detailed docstrings explaining function purpose, parameters, and expected returns
- Clear function descriptions
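
These practices matter because a tool-calling framework can only describe a function to the model using what the function exposes about itself: its name, signature, and docstring. The sketch below uses only the standard library to show the kind of metadata that can be derived. `describe_tool` and `example_search` are hypothetical helpers written for this illustration; this is not the code LlamaIndex runs internally.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Build a simple tool description from a function's own metadata."""
    hints = get_type_hints(fn)
    signature = inspect.signature(fn)
    parameters = {}
    for name, param in signature.parameters.items():
        parameters[name] = {
            # Fall back to "any" when a parameter has no type hint.
            "type": getattr(hints.get(name), "__name__", "any"),
            "required": param.default is inspect.Parameter.empty,
        }
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": parameters,
        "returns": getattr(hints.get("return"), "__name__", "any"),
    }


def example_search(topic: str, num_results: int = 3) -> list:
    """Search a hypothetical catalogue for a topic."""
    return [topic] * num_results


print(describe_tool(example_search))
```

A function without type hints or a docstring would produce `"any"` types and an empty description here, which gives a model very little to go on when deciding whether and how to call it.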

For arXiv integration, we leverage LlamaIndex's pre-built tool instead of creating a custom function. You can explore other available pre-built tools in the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/) to avoid duplicating existing solutions. 

In [None]:
# Define a function to search GitHub repositories by topic, sorting by stars or update date, and return top results

import requests

def github_search(topic: str, num_results: int = 3, sort_by: str = "stars") -> list:
    """
    Retrieve a specified number of GitHub repositories based on a given topic, 
    ranked by the specified criteria.

    This function uses the GitHub API to search for repositories related to a 
    specific topic or keyword. The results can be sorted by the number of stars 
    (popularity) or the most recent update, with the most relevant repositories 
    appearing first according to the chosen sorting method.

    Parameters:
    -----------
    topic : str
        The topic or keyword to search for in GitHub repositories.
        The topic cannot contain blank spaces.
    num_results : int, optional
        The number of repository results to retrieve. Defaults to 3.
    sort_by : str, optional
        The criterion for sorting the results. Options include:
        - 'stars': Sort by the number of stars (popularity).
        - 'updated': Sort by the date of the last update (most recent first).
        Defaults to 'stars'.

    Returns:
    --------
    list
        A list of dictionaries, where each dictionary contains information 
        about a repository. Each dictionary includes:
        - 'html_url': The URL of the repository.
        - 'description': A brief description of the repository.
        - 'stargazers_count': The number of stars (popularity) the repository has.
    """
    

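    # Pass the query via `params` so the topic is URL-encoded safely.
    url = "https://api.github.com/search/repositories"
    params = {"q": f"topic:{topic}", "sort": sort_by, "order": "desc"}

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # surface HTTP errors (e.g. rate limiting) early
    items = response.json().get('items', [])

    code_repos = [
        {
            'html_url': item['html_url'],
            'description': item['description'],
            'stargazers_count': item['stargazers_count'],
            # 'topics': item['topics']
        }
        for item in items[:num_results]
    ]

    return code_repos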

github_tool = FunctionTool.from_defaults(fn=github_search)

In [None]:
# Define a function to search for TechCrunch news articles by topic and return details for a specified number of results

import feedparser
    
def news_search(topic: str, num_results: int = 3) -> list:
    """
    Retrieve a specified number of news articles from TechCrunch based on a given topic.

    This function queries the TechCrunch RSS feed to search for news articles related to the 
    provided topic and returns a list of the most relevant articles. Each article includes 
    details such as the title, link, publication date, and a summary or description.

    Parameters:
    -----------
    topic : str
        The keyword or subject to search for in the TechCrunch news feed.
        The topic cannot contain blank spaces.
        If multiple words are needed, connect them with "+" (e.g., artificial+intelligence).
    num_results : int, optional
        The number of articles to retrieve from the search results. Defaults to 3.

    Returns:
    --------
    list
        A list of dictionaries, where each dictionary contains information about a retrieved 
        news article. Each dictionary includes:
        - 'title': The title of the article.
        - 'link': The URL to the article.
        - 'published': The publication date of the article.
        - 'summary': A brief summary or description of the article, if available.
    """
    

    url = f"https://techcrunch.com/tag/{topic}/feed/"
    feed = feedparser.parse(url)
    
    news = []
    
    # Loop through the top num_results articles
    for entry in feed.entries[:num_results]:
        # Create a dictionary for each article
        article = {
            'title': entry.title,
            'link': entry.link,
            'published': entry.published,
            'summary': entry.summary if hasattr(entry, 'summary') else entry.description if hasattr(entry, 'description') else None
        }

        # Add the article dictionary to the list
        news.append(article)
    
    return news

news_tool = FunctionTool.from_defaults(fn=news_search)

In [None]:
# Import and configure the ArxivToolSpec from LlamaIndex prebuilt tools

from llama_index.tools.arxiv import ArxivToolSpec

arxiv_tool = ArxivToolSpec()
api_tools = arxiv_tool.to_tool_list()

# Consolidate all tools into one list. 
api_tools.extend([news_tool, github_tool])

In [None]:
# Set up an agent with access to GitHub, arXiv, and TechCrunch APIs, using a system prompt to guide interactions.

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

import time
current_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))

system_prompt = f"""
You are a technology expert with access to the GitHub API, arXiv API, and TechCrunch API. 
You can search for the latest code repositories, papers, and news related to technology.
Always try to use the tools available to you. 
If you don’t know the answer, do not make up any information; simply say: Sorry, I don’t know.

Current time is: {current_time}
"""

agent_worker = FunctionCallingAgentWorker.from_tools(
    api_tools, 
    llm=llm, 
    verbose=False, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)

In [None]:
response = agent.chat("Can you give me top 2 papers about GenAI, and recent news about bedrock")
print(str(response))

In [None]:
# Simple chatbot UI. Enter "exit" to quit. 

while True:
    text_input = input("User: ")
    if text_input == "exit":
        break
    response = agent.chat(text_input)
    print(f"Agent: {response}")
    print("-" * 120)
    print(" New question: ")

In [None]:
# agent.memory.get() # retrieve conversation history
# agent.memory.reset() # clear the chat memory

In [None]:
# test questions: 
# 1. any news about GenAI
# 2. can you give me the top 5 GitHub code repos related to bedrock
# 3. can you show me the top 3 papers related to GenAI

## Document RAG integration with Bedrock Knowledge Bases

Below, we download two PDF decision guides from the AWS website. They provide recommendations for selecting GenAI and ML services in different scenarios and outline what to evaluate during the decision-making process. You can replace these with your own internal business documents at this step.

We use Amazon Bedrock Knowledge Bases to build the RAG framework. You can create a Bedrock Knowledge Base from the [AWS console](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-create.html) or follow this [notebook example](https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/02_KnowledgeBases_and_RAG/0_create_ingest_documents_test_kb.ipynb) to create it programmatically. 

Download files using the commands below, then upload them to the S3 bucket you created for the Knowledge Base. You can select different embedding models and chunking strategies that work better for your data. 
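
Bedrock Knowledge Bases chunks documents for you during ingestion, so you do not write this code yourself. As a rough illustration of what a fixed-size strategy with overlap does, the following sketch splits text into word-based chunks. `chunk_words` and its parameters are hypothetical, and counting words is a simplification chosen for readability rather than a description of how the managed service measures chunk size.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Larger chunks keep more context together in each retrieved passage, while smaller chunks make retrieval more targeted; the overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.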


In [None]:
# Download the test documents from the links below

!wget -O genai_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/generative-ai-on-aws-how-to-choose/generative-ai-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"
!wget -O ml_on_aws.pdf "https://docs.aws.amazon.com/pdfs/decision-guides/latest/machine-learning-on-aws-how-to-choose/machine-learning-on-aws-how-to-choose.pdf?did=wp_card&trk=wp_card#guide"

- Upload the test documents to the S3 bucket that was added as a data source to the Knowledge Base you created. Then sync the data.
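
The upload and sync can also be scripted. The sketch below assumes that boto3 clients for `s3` and `bedrock-agent` are passed in, and it relies on the boto3 calls `upload_file` and `start_ingestion_job`. `upload_and_sync` is a hypothetical helper, and the bucket name, knowledge base ID, and data source ID in the commented usage are placeholders to replace with your own values.

```python
import os


def upload_and_sync(s3_client, bedrock_agent_client, file_paths, bucket,
                    knowledge_base_id, data_source_id):
    """Upload local files to S3, then start a Knowledge Base ingestion job."""
    for path in file_paths:
        # Use the file name as the S3 object key.
        s3_client.upload_file(path, bucket, os.path.basename(path))
    response = bedrock_agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return response["ingestionJob"]["ingestionJobId"]


# Example usage (requires boto3 and AWS credentials; all IDs are placeholders):
# import boto3
# job_id = upload_and_sync(
#     boto3.client("s3"),
#     boto3.client("bedrock-agent"),
#     ["genai_on_aws.pdf", "ml_on_aws.pdf"],
#     bucket="[BUCKET_NAME]",
#     knowledge_base_id="[KNOWLEDGE_BASE_ID]",
#     data_source_id="[DATA_SOURCE_ID]",
# )
```

Passing the clients in as arguments keeps the helper easy to test with stand-in objects and leaves region and credential configuration to the caller.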

In [None]:
# After you create the knowledge base, provide Bedrock Knowledge Base ID 
knowledge_base_id = "[KNOWLEDGE_BASE_ID]" 

In [None]:
# Configure a knowledge base retriever using AmazonKnowledgeBasesRetriever

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.retrievers.bedrock import AmazonKnowledgeBasesRetriever

# maximum number of relevant text chunks that will be retrieved
# If you need quick, focused answers: lower numbers (1-3)
# If you need detailed, comprehensive answers: higher numbers (5-10)
top_k = 3

# search mode options: HYBRID, SEMANTIC
# HYBRID search combines the strengths of semantic search and keyword search 
# Balances semantic understanding with exact matching
search_mode = "HYBRID"

kb_retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
        }
    },
)
kb_engine = RetrieverQueryEngine(retriever=kb_retriever)


In [None]:
# Create a query tool for Bedrock Knowledge Base

kb_tool = QueryEngineTool(
        query_engine=kb_engine,
        metadata=ToolMetadata(
            name="guide_tool",
            description="""
            These decision guides help users select appropriate AWS machine learning and generative AI services based on specific needs. 
            They cover pre-built solutions, customizable platforms, and infrastructure options for ML workflows, 
            while outlining how generative AI can automate processes, personalize content, augment data, reduce costs, 
            and enable faster experimentation in various business contexts.""",
        ),
    )

In [None]:
import time
current_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))

system_prompt = f"""
You are a technology expert with access to the GitHub API, arXiv API, and TechCrunch API. 
You can search for the latest code repositories, research papers, and news related to technology.
You have access to the Amazon Bedrock user guide, which provides information about services offered by Bedrock, 
such as Agents, Knowledge Bases, Guardrails, Model Evaluation, and Model Fine-Tuning. 
It also covers the third-party foundation models and Amazon LLMs available through the Bedrock platform.
Always utilise the tools at your disposal.
If you don’t know the answer, do not make up any information; simply say: Sorry, I don’t know.

Current time is: {current_time}
"""

# Update the agent to include all API tools and the Knowledge Base tool.

all_tools = api_tools +[kb_tool]

agent_worker = FunctionCallingAgentWorker.from_tools(
    all_tools, 
    llm=llm, 
    verbose=True, # Set verbose=True to display the full trace of steps. 
    system_prompt = system_prompt,
    # allow_parallel_tool_calls = True  # Uncomment this line to allow multiple tool invocations
)
agent = AgentRunner(agent_worker)

In [None]:
response = agent.chat("I don't have many ML experts, but I want to build a GenAI application. Which AWS service should I choose?")
print(str(response))

In [None]:
# Simple chatbot UI. Enter "exit" to quit. 

while True:
    text_input = input("User: ")
    if text_input == "exit":
        break
    response = agent.chat(text_input)
    print(f"Agent: {response}")
    print("-" * 120)
    print(" New question: ")
    # what services bedrock platform is offering
    #  what are the LLM models available from bedrock


In [None]:
# agent.memory.reset() # clear the chat memory
# agent.memory.get() # retrieve conversation history

In [None]:
# Test questions:
# 1. I don't have many ML experts, but I want to build a GenAI application. Which AWS service should I choose?
# 2. what are the benefits of using the bedrock service
# 3. can you give me the top 5 git repos related to bedrock

## Conclusion

This notebook shows how to combine an LLM (Mistral Large 2), web search tools, and knowledge bases to build an intelligent research assistant. The solution works well for finding and understanding technical information, and it can be extended with additional data sources and features.