Amazon Bedrock / Llama2 / Falcon with Serverless RAG on Amazon OpenSearch Serverless vector DB

Overview

A new wave of widespread AI adoption is on the way with generative AI, which has the potential to reinvent every aspect of customer experiences and applications. Generative AI is powered by very large machine learning models that are pre-trained on vast amounts of data, commonly referred to as foundation models (FMs). Large language models (LLMs) are a subset of FMs that are trained on trillions of words; they learn the patterns in language, allowing them to generate human-like responses to the queries we give them. However, foundation models are trained on very general domain corpora, which makes them less effective for domain-specific tasks. This is where Retrieval Augmented Generation (RAG) comes in: you can use RAG to retrieve data from outside a foundation model and augment your prompts by adding the relevant retrieved data as context.

Text generation using RAG with LLMs enables you to generate domain-specific text outputs by supplying specific external data as part of the context fed to the LLM. With RAG, the external data used to augment your prompts can come from multiple data sources, such as document repositories, databases, or APIs. The first step is to convert your documents and any user queries into a compatible format so that a relevancy search can be performed. To make the formats compatible, a document collection (or knowledge library) and user-submitted queries are converted to numerical representations using embedding language models. Embedding is the process by which text is given a numerical representation in a vector space. RAG model architectures compare the embeddings of user queries with the vectors of the knowledge library. The original user prompt is then augmented with relevant context from similar documents within the knowledge library, and this augmented prompt is sent to the foundation model. You can update knowledge libraries and their embeddings asynchronously.
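The flow described above (embed, compare, augment) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the function names, sample documents, and prompt template are all hypothetical rather than taken from this repository.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts stand in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, library: list[str], top_k: int = 1) -> list[str]:
    """Rank the knowledge library by similarity to the query and keep the best matches."""
    query_vec = embed(query)
    ranked = sorted(
        library,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def augment_prompt(query: str, context_docs: list[str]) -> str:
    """Prepend the retrieved context to the user's question (hypothetical template)."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge library, for illustration only.
knowledge_library = [
    "The warranty on the X100 router covers hardware faults for two years.",
    "Our office cafeteria serves lunch between noon and 2 pm.",
    "Firmware updates for the X100 router are released every quarter.",
]

question = "How long is the warranty on the X100 router?"
top_docs = retrieve(question, knowledge_library, top_k=1)
augmented = augment_prompt(question, top_docs)
print(augmented)  # this augmented prompt is what would be sent to the foundation model
```

In a real deployment the embedding step would call an embedding model and the ranking would be delegated to the vector store; only the overall shape of the pipeline carries over.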

Amazon OpenSearch Serverless offers a vector engine to store embeddings for faster similarity searches. The vector engine provides a simple, scalable, and high-performing similarity search capability in Amazon OpenSearch Serverless that makes it easy for you to build generative artificial intelligence (AI) applications without having to manage the underlying vector database infrastructure.
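As a rough illustration of what storing and searching embeddings looks like, the sketch below builds the request bodies for an OpenSearch k-NN index and a k-NN query as plain Python dictionaries. The `knn_vector` field type and the `knn` query clause are standard OpenSearch constructs; the field names `embedding` and `text`, the tiny dimension, and the helper function names are assumptions made for this example and may not match what this stack actually creates.

```python
import json

# Assumption: a tiny dimension keeps the example readable; real embedding
# models emit vectors with hundreds or thousands of dimensions.
EMBEDDING_DIMENSION = 4


def build_index_body(dimension: int) -> dict:
    """Index definition with one k-NN vector field and the original text."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": dimension},
                "text": {"type": "text"},
            }
        },
    }


def build_knn_query(query_vector: list[float], k: int = 3) -> dict:
    """Search body that asks for the k nearest neighbours of query_vector."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}},
        "_source": ["text"],
    }


index_body = build_index_body(EMBEDDING_DIMENSION)
query_body = build_knn_query([0.1, 0.2, 0.3, 0.4], k=2)
print(json.dumps(index_body, indent=2))
print(json.dumps(query_body, indent=2))
```

These bodies would be sent to the collection endpoint with a signed HTTP client; authentication and the client itself are left out of this sketch.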

Project Updates

(05-Apr-2024):

You can now index PDF/JSON/CSV/TXT files into Amazon OpenSearch Serverless (AOSS).
You can now optionally augment your prompts with knowledge stored in AOSS.
Lambda size reduced to 3 GB so newer AWS accounts can deploy this stack

(27-Mar-2024):

Function calling support with Anthropic's Claude 3
Weather agent with two functions that look up the latitude/longitude and the weather data of a particular place through function calls
Hotel-booking agent that books a room (by calling functions) using prompt engineering on Claude 3

(16-Mar-2024):

Multi-modal support with Claude-3 Haiku and Sonnet.
Compare two or more images, and analyze PDF/TXT/JSON files with Claude-3
Optional deployment of AOSS
Faster chat conversations

(14-Mar-2024):

Anthropic Claude-3 Haiku Text based support

(11-Mar-2024):

Anthropic Claude-3 Sonnet Text based support

(13-Dec-2023):

Support Meta Llama2 models on Amazon Bedrock. Support for Anthropic's latest Claude 2.1 model (200K context length).

(09-Nov-2023):

Support conversations with OpenSearch Serverless (BETA)

(27-Oct-2023):

Improve UI

(18-Oct-2023):

Support French/German for Anthropic Claude with Amazon Bedrock
Support for the redaction feature
Built-in text chunking feature with RecursiveTextSplitter from LangChain

(03-Oct-2023): Support for Amazon Bedrock

Anthropic Claude V1/V2/Instant support over Amazon Bedrock
Support for Streaming ingestion with Anthropic Claude Models
Faster Stack Deployments
New Functionality (PII/Sentiment/Translations) added on the UI

(14-Sept-2023): Support for new LLMs

Llama2-7B (Existing G5.2xlarge)
Llama2-13B (G5.12xlarge)
Llama2-70B (G5.48xlarge)
Falcon-7B (G5.2xlarge)
Falcon-40B (G5.12xlarge)
Falcon-180B (p4de.24xlarge)

New UX/UI (13-Sept-2023): Index Sample Data across different domains. Support multiple-assistant behaviours (Normal/Pirate/Jarvis Assistant modes)

Available Features

Multi-Modal support with Claude-3 Models

Multi-lingual Support

Sentiment Analysis

PII Data Detection

PII Data Redaction

Bedrock RAG Demo

Bedrock RAG Demo Video

Translations / Sentiment Analysis / PII Identification and Redaction

github-final.mov

Llama2 RAG Demo

ImprovedVectorDB.mov

This solution demonstrates building a RAG (Retrieval Augmented Generation) solution with the Amazon OpenSearch Serverless vector DB and Amazon Bedrock, the Llama2 LLM, or the Falcon LLM.

Prerequisites

Familiarity with the services below

For Llama2/Falcon models deployed on Amazon SageMaker

Amazon SageMaker
GPU instance of type ml.g5.2xlarge for endpoint usage
Supported Llama2 regions (us-east-1, us-east-2, us-west-2, eu-west-1, and ap-southeast-1)
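Before deploying a Llama2 model, it can help to check the target region against the list above. The snippet below is a minimal sketch of such a pre-flight check; the region set is copied from the prerequisites (reading the third entry as us-west-2), and the function name is hypothetical.

```python
# Regions copied from the prerequisites list above.
SUPPORTED_LLAMA2_REGIONS = frozenset(
    {"us-east-1", "us-east-2", "us-west-2", "eu-west-1", "ap-southeast-1"}
)


def is_supported_llama2_region(region: str) -> bool:
    """Return True if the given AWS region is in the supported Llama2 list."""
    return region.strip().lower() in SUPPORTED_LLAMA2_REGIONS


for candidate in ("us-east-1", "eu-central-1"):
    status = "supported" if is_supported_llama2_region(candidate) else "not supported"
    print(f"{candidate}: {status}")
```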

Architecture

Deploying the Solution to your AWS account with AWS CloudShell

Create an Admin role to deploy this stack

Section 1 - Create an IAM role with Administrator permissions (OPTIONAL: if you already use an Admin role, you may skip this step)

Search for the IAM service on the AWS Console, go to the IAM Dashboard, click the "Roles" tab under "Access Management", and click "Create Role"

Select "AWS Account" and click "Next"

Under permissions, select Administrator access

Give the role a name and create the role
You can now assume this role and proceed to deploy the stack. Click on Switch-Role

Switch role

Proceed to the CloudShell step

Deploy the RAG based Solution (Total deployment time 40 minutes)

Section 2 - Deploy this RAG based Solution (The below commands should be executed in the region of deployment)

Switch to the Admin role. Search for the CloudShell service on the AWS Console and follow the steps below to clone the GitHub repository

Clone the serverless-rag-demo repository from aws-samples

 git clone https://github.com/aws-samples/serverless-rag-demo.git

Go to the directory where we have the downloaded files.
```
  cd serverless-rag-demo
```
Run the bash script that creates the RAG based solution. Pass the environment and region for deployment. The environment can be dev, qa, or sandbox. See Prerequisites to pick a supported region.
```
  sh creator.sh
```
Select the LLM you want to deploy when prompted by creator.sh. Select Option 1 for the Amazon Bedrock service.
When selecting Amazon Bedrock (Option 1), you should specify an API key. The key should be at least 20 characters long.
Press Enter to proceed with deployment of the stack, or Ctrl+C to exit.
Total deployment takes around 40 minutes. Once the deployment is complete, head to API Gateway. Search for the API named rag-llm-api-{env_name} and get its invoke URL.
Open the API Gateway URL with /rag appended (api-gateway-url/rag); it loads an HTML page for testing the RAG based solution.
- Do not forget to append "rag" at the end of the API Gateway URL
e.g. https://xxxxxxx.execute-api.us-east-1.amazonaws.com/dev/rag

Enter the API key you specified during the Amazon Bedrock stack deployment to proceed with the demo
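The last few steps involve two easy-to-miss details: the API key must be at least 20 characters long, and the test page lives at the invoke URL with "rag" appended. The helpers below sketch both checks; the function names are hypothetical, and the placeholder URL is the example from the steps above.

```python
MIN_API_KEY_LENGTH = 20  # from the deployment steps: at least 20 characters


def validate_api_key(api_key: str) -> str:
    """Return the key unchanged, or raise if it is shorter than the minimum length."""
    if len(api_key) < MIN_API_KEY_LENGTH:
        raise ValueError(
            f"API key must be at least {MIN_API_KEY_LENGTH} characters, got {len(api_key)}"
        )
    return api_key


def api_name(env_name: str) -> str:
    """Name of the API to search for in API Gateway, e.g. rag-llm-api-dev."""
    return f"rag-llm-api-{env_name}"


def rag_test_url(invoke_url: str) -> str:
    """Append 'rag' to the invoke URL, tolerating a trailing slash."""
    return invoke_url.rstrip("/") + "/rag"


print(api_name("dev"))
print(rag_test_url("https://xxxxxxx.execute-api.us-east-1.amazonaws.com/dev"))
```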
