## [GenAI applications on enterprise data with Amazon Kendra, LangChain and LLMs](https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/)

In this tutorial, we will demonstrate how to implement [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) (RAG) workflows by combining [Amazon Kendra](https://aws.amazon.com/kendra/), [🦜️🔗 LangChain](https://python.langchain.com/en/latest/index.html) and state-of-the-art [Large Language Models](https://docs.cohere.com/docs/introduction-to-large-language-models) (LLMs) to provide a conversational experience backed by data.

> For the latest news on GenAI, please visit the [Generative AI on AWS](https://aws.amazon.com/generative-ai/) landing page.

### Architecture

The diagram below shows the architecture of a GenAI application with a RAG approach:

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/05/02/ML-13807-image001-new.png" width="30%"/>

We use an [Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/hiw-index.html) to hold large quantities of unstructured data from multiple [data sources](https://docs.aws.amazon.com/kendra/latest/dg/hiw-data-source.html):

* Wiki pages
* [MS SharePoint sites](https://docs.aws.amazon.com/kendra/latest/dg/data-source-sharepoint.html)
* Document repositories e.g. [Amazon S3](https://docs.aws.amazon.com/kendra/latest/dg/data-source-s3.html)
* etc.

Each time the user interacts with the GenAI app, the following happens:

1. The user makes a request to the GenAI app
2. The app issues a [search query](https://docs.aws.amazon.com/kendra/latest/dg/searching-example.html) to the Amazon Kendra index based on the user request
3. The index returns search results with excerpts of relevant documents from the ingested data
4. The app sends the user request along with the data retrieved from the index as context in the LLM prompt
5. The LLM returns a succinct response to the user request based on the retrieved data
6. The response from the LLM is sent back to the user
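
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by the sample application: `build_prompt`, `answer`, `fake_retrieve` and `fake_generate` are hypothetical names, and the prompt wording is an assumption. The retriever and the LLM are passed in as plain functions so the sketch runs without any AWS resources.

```python
from typing import Callable, List


def build_prompt(question: str, excerpts: List[str]) -> str:
    """Combine retrieved excerpts and the user question into one LLM prompt."""
    context = "\n".join(f"- {excerpt}" for excerpt in excerpts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str]) -> str:
    """Steps 2-5: search the index, build the prompt, ask the LLM."""
    excerpts = retrieve(question)              # steps 2-3: query the index
    prompt = build_prompt(question, excerpts)  # step 4: add context to the prompt
    return generate(prompt)                    # step 5: get the LLM response


# Hypothetical stand-ins for the Kendra index and the LLM endpoint
def fake_retrieve(query: str) -> List[str]:
    return ["Amazon Kendra is an intelligent search service."]


def fake_generate(prompt: str) -> str:
    return f"(LLM response to a {len(prompt)}-character prompt)"


print(answer("What is Amazon Kendra?", fake_retrieve, fake_generate))
```

In the real application, `retrieve` would call the Amazon Kendra index and `generate` would call the deployed LLM.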

### Prerequisites

> **Note:** This notebook was tested with [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) with the [Base Python 3.0 [`sagemaker-base-python-310`] image](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-images.html) on a `ml.t3.medium` (2 vCPU + 4 GiB) instance

We will need a Python version compatible with [🦜️🔗 LangChain](https://pypi.org/project/langchain/) (`>=3.8.1, <4.0`)

In [None]:
import sys
!{sys.executable} -V
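
The version can also be checked programmatically. The sketch below compares the running interpreter against the range quoted above (`>=3.8.1, <4.0`); `is_supported` is a hypothetical helper name, not part of LangChain.

```python
import sys

# Supported range taken from the requirement quoted above: >=3.8.1, <4.0
MIN_VERSION = (3, 8, 1)
MAX_VERSION_EXCLUSIVE = (4, 0)


def is_supported(version=None) -> bool:
    """Return True if the (major, minor, micro) version is in the supported range."""
    current = tuple(version or sys.version_info[:3])
    return MIN_VERSION <= current < MAX_VERSION_EXCLUSIVE


print(f"Python {sys.version.split()[0]} supported: {is_supported()}")
```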

**(optional)** [AWS CLI](https://aws.amazon.com/cli/) `v2` to create the Kendra stack

> For more information on how to upgrade the AWS CLI, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)

> Make sure you give enough permissions to the [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) when running this notebook through Amazon SageMaker

In [None]:
!aws --version

**(optional)** a recent version of the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) (`>=2.154.0`) containing the [SageMaker JumpStart SDK](https://github.com/aws/sagemaker-python-sdk/releases/tag/v2.154.0) to deploy the LLM to a SageMaker Endpoint

In [None]:
%env PIP_DISABLE_PIP_VERSION_CHECK True
%env PIP_ROOT_USER_ACTION ignore

In [None]:
!{sys.executable} -m pip install -qU "sagemaker>=2.154.0"
!{sys.executable} -c "import sagemaker; print(sagemaker.__version__)"

Edit and run the cell below to skip certain steps

In [None]:
%reload_ext skip_kernel_extension

# Whether to skip the Kendra index deployment
SKIP_KENDRA_DEPLOYMENT = False

# Stack name for the Kendra index deployment
KENDRA_STACK_NAME = "genai-kendra-langchain"

# Whether to skip the quota increase request
SKIP_QUOTA_INCREASE = True

# Whether Streamlit should be installed
SKIP_STREAMLIT_INSTALL = False

### Implement a RAG Workflow

Let's start by cloning the [AWS LangChain](https://github.com/aws-samples/amazon-kendra-langchain-extensions) repo

In [None]:
!git clone https://github.com/aws-samples/amazon-kendra-langchain-extensions

This repo contains a set of utility classes to work with LangChain, including a retriever class `KendraIndexRetriever` for working with a Kendra index, and sample scripts to execute the Q&A chain for SageMaker, OpenAI and Anthropic providers.
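
To give an idea of what a Kendra retriever does, here is a rough sketch that calls the Kendra `Query` API through `boto3` and turns the result items into text snippets. It is not the repo's `KendraIndexRetriever` implementation: the function names, the `top_k` parameter and the output format are assumptions made for illustration, and the sample response is hand-written.

```python
from typing import Any, Dict, List


def excerpts_from_response(response: Dict[str, Any], top_k: int = 3) -> List[str]:
    """Pull title, excerpt and source URI out of a Kendra Query API response."""
    docs = []
    for item in response.get("ResultItems", [])[:top_k]:
        title = item.get("DocumentTitle", {}).get("Text", "")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        uri = item.get("DocumentURI", "")
        docs.append(
            f"Document Title: {title}\nDocument Excerpt: {excerpt}\nSource: {uri}"
        )
    return docs


def query_kendra(index_id: str, query: str, top_k: int = 3) -> List[str]:
    """Run a search against a Kendra index (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    kendra = boto3.client("kendra")
    response = kendra.query(IndexId=index_id, QueryText=query)
    return excerpts_from_response(response, top_k)


# Hand-written response fragment, for illustration only
sample = {
    "ResultItems": [
        {
            "DocumentTitle": {"Text": "What is Amazon Kendra?"},
            "DocumentExcerpt": {"Text": "Amazon Kendra is a search service."},
            "DocumentURI": "https://example.com/kendra",
        }
    ]
}
print(excerpts_from_response(sample)[0])
```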

**Optional:** Deploy the provided AWS CloudFormation template ([`samples/kendra-docs-index.yaml`](https://github.com/aws-samples/amazon-kendra-langchain-extensions/blob/main/samples/kendra-docs-index.yaml)) to create a new Kendra index

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT
!aws cloudformation deploy --stack-name $KENDRA_STACK_NAME --template-file "amazon-kendra-langchain-extensions/samples/kendra-docs-index.yaml" --capabilities CAPABILITY_NAMED_IAM

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT
!aws cloudformation describe-stacks --stack-name $KENDRA_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`KendraIndexID`].OutputValue' --output text
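
The same lookup can be done from Python, which makes it easier to reuse the index ID later in the notebook. This is a sketch assuming `boto3` is installed and the stack exposes the `KendraIndexID` output used in the CLI query above; `stack_output` and `kendra_index_id` are hypothetical helper names.

```python
from typing import Any, Dict, Optional


def stack_output(response: Dict[str, Any], key: str) -> Optional[str]:
    """Return the value of a named output from a describe_stacks response."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None  # the stack has no output with that key


def kendra_index_id(stack_name: str) -> Optional[str]:
    """Fetch the Kendra index ID from CloudFormation (needs AWS credentials)."""
    import boto3  # imported lazily so stack_output works without boto3

    cfn = boto3.client("cloudformation")
    response = cfn.describe_stacks(StackName=stack_name)
    return stack_output(response, "KendraIndexID")


# Hand-written response fragment with a made-up index ID, for illustration only
fake_response = {
    "Stacks": [
        {"Outputs": [{"OutputKey": "KendraIndexID", "OutputValue": "example-index-id"}]}
    ]
}
print(stack_output(fake_response, "KendraIndexID"))
```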

**Optional:** For a better experience, request a larger document excerpt to be returned via [AWS Service Quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html)

In [None]:
%%skip $SKIP_QUOTA_INCREASE
# Request a quota increase for the maximum number of characters displayed in the Document Excerpt of a Document type result in the Query API
# https://docs.aws.amazon.com/kendra/latest/APIReference/API_Query.html
!aws service-quotas request-service-quota-increase --service-code kendra --quota-code "L-196E775D" --desired-value 1000

**Optional:** Install [Streamlit](https://streamlit.io/)

In [None]:
%%skip $SKIP_STREAMLIT_INSTALL

# Install streamlit
# https://docs.streamlit.io/library/get-started/installation
!{sys.executable} -m pip install -qU streamlit

# Debug installation
# https://docs.streamlit.io/knowledge-base/using-streamlit/sanity-checks
!streamlit version

Install [LangChain 🦜️🔗](https://github.com/hwchase17/langchain)

In [None]:
# Install LangChain
# https://python.langchain.com/en/latest/getting_started/getting_started.html
!{sys.executable} -m pip install -qU "langchain==0.0.137"

# Debug installation
!{sys.executable} -c "import langchain; print(langchain.__version__)"

Next, we need an LLM to process requests. 

Models like [Flan-T5-XL](https://huggingface.co/google/flan-t5-xl) and [Flan-T5-XXL](https://huggingface.co/google/flan-t5-xxl) can be deployed in just a few lines of code via [Amazon SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/)

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

# Select model
# https://aws.amazon.com/sagemaker/jumpstart/getting-started
model_id = input("Model ID:") or "huggingface-text2text-flan-t5-xl"

# Deploy model
model = JumpStartModel(model_id=model_id)
predictor = model.deploy()

In [None]:
# Test model
predictor.predict("Hey there! How are you?")

**Optional:** If you want to work with [Anthropic's `Claude-V1`](https://www.anthropic.com/index/introducing-claude) or OpenAI's `text-davinci-003`, get an API key and run the cell below

In [None]:
import os
from getpass import getpass

"""
OpenAI
https://python.langchain.com/en/latest/modules/models/llms/integrations/openai.html
"""

# Get an API key from
# https://platform.openai.com/account/api-keys
OPENAI_API_KEY = getpass("OPENAI_API_KEY:")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

"""
Anthropic
https://python.langchain.com/en/latest/modules/models/chat/integrations/anthropic.html
"""

# Get an API key from
# https://www.anthropic.com/product
ANTHROPIC_API_KEY = getpass("ANTHROPIC_API_KEY:")
os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY

Install the `KendraIndexRetriever` interface and sample applications

In [None]:
# Install classes
!{sys.executable} -m pip install -qU ./amazon-kendra-langchain-extensions

Before running the sample application, we need to set up the environment variables with the Amazon Kendra index details and the SageMaker endpoints for the `FLAN-T5-*` models

In [None]:
import re

# Set Kendra index ID
os.environ['KENDRA_INDEX_ID'] = input('KENDRA_INDEX_ID:')

# Set endpoint name
# https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text2text-generation-flan-t5.ipynb
if re.search("flan-t5-xl", model_id):
    os.environ['FLAN_XL_ENDPOINT'] = predictor.endpoint_name
elif re.search("flan-t5-xxl", model_id):
    os.environ['FLAN_XXL_ENDPOINT'] = predictor.endpoint_name
elif "OPENAI_API_KEY" in os.environ or "ANTHROPIC_API_KEY" in os.environ:
    print("Using external API key")
else:
    print("⚠️ The SageMaker Endpoint environment variable is not set")
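
Before launching the sample, it can help to confirm that the expected variables are actually set. The sketch below assumes the Flan-T5-XL sample needs `KENDRA_INDEX_ID` and `FLAN_XL_ENDPOINT`, the two names set in the cell above; `missing_vars` is a hypothetical helper, and the list should be adjusted for other providers.

```python
import os
from typing import Iterable, List, Mapping


def missing_vars(required: Iterable[str],
                 env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Assumed requirements for the Flan-T5-XL sample; adjust for other providers
required = ["KENDRA_INDEX_ID", "FLAN_XL_ENDPOINT"]
missing = missing_vars(required)
print("All set" if not missing else f"Missing: {', '.join(missing)}")
```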

Now, we can finally start the application:

In [None]:
# Python
!cd amazon-kendra-langchain-extensions/samples && python kendra_chat_flan_xl.py

In [None]:
# Streamlit
!cd amazon-kendra-langchain-extensions/samples && streamlit run app.py flanxl

> **Note:** As of May 2023, Amazon SageMaker Studio doesn't allow apps to run through Jupyter Server Proxy on a Kernel Gateway. The best option is to use the [SageMaker SSH Helper](https://github.com/aws-samples/sagemaker-ssh-helper) library to forward the Streamlit `server.port` (`8501` by default). See [Local IDE integration with SageMaker Studio over SSH for PyCharm / VSCode](https://github.com/aws-samples/sagemaker-ssh-helper#local-ide-integration-with-sagemaker-studio-over-ssh-for-pycharm--vscode) for more information.

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/04/25/ML-13807-image005.jpg" width="30%"/>

### Cleanup

Delete SageMaker Endpoint

In [None]:
predictor.delete_endpoint()

Delete Kendra stack

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT
!aws cloudformation delete-stack --stack-name $KENDRA_STACK_NAME

### References 📚

* AWS ML Blog: [Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models](https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/)
* AWS ML Blog: [Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/)
* AWS ML Blog: [Dive deep into Amazon SageMaker Studio Notebooks architecture](https://aws.amazon.com/blogs/machine-learning/dive-deep-into-amazon-sagemaker-studio-notebook-architecture/)