## [GenAI applications on enterprise data with Amazon Kendra, 🦜️🔗 LangChain and LLMs](https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/)

In this tutorial, we will demonstrate how to implement [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) (RAG) workflows with [Amazon Kendra](https://aws.amazon.com/kendra/), [🦜️🔗 LangChain](https://python.langchain.com/en/latest/index.html) and state-of-the-art [Large Language Models](https://docs.cohere.com/docs/introduction-to-large-language-models) (LLMs) to provide a conversational experience backed by enterprise data.

> Visit the [Generative AI on AWS](https://aws.amazon.com/generative-ai/) landing page for the latest news on generative AI (GenAI) and learn how AWS is helping reinvent customer experiences and applications.

### Architecture

The diagram below shows the architecture of a GenAI application with a RAG approach:

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/05/02/ML-13807-image001-new.png" width="30%"/>

We use the [Amazon Kendra index](https://docs.aws.amazon.com/kendra/latest/dg/hiw-index.html) to hold large quantities of unstructured data from multiple [data sources](https://docs.aws.amazon.com/kendra/latest/dg/hiw-data-source.html), including:

* Wiki pages
* [MS SharePoint sites](https://docs.aws.amazon.com/kendra/latest/dg/data-source-sharepoint.html)
* Document repositories like [Amazon S3](https://docs.aws.amazon.com/kendra/latest/dg/data-source-s3.html)
* ... *and much, much more!*

Each time a user interacts with the GenAI app, the following happens:

1. The user makes a request to the GenAI app
2. The app issues a [search query](https://docs.aws.amazon.com/kendra/latest/dg/searching-example.html) to the Amazon Kendra index based on the user request
3. The index returns search results with excerpts of relevant documents from the ingested data
4. The app sends the user request along with the data retrieved from the index as context in the LLM prompt
5. The LLM returns a succinct response to the user request based on the retrieved data
6. The response from the LLM is sent back to the user
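
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the sample application's code: `build_prompt`, `answer`, the prompt wording, and the two `fake_*` functions are hypothetical names introduced here. In a real setup, `retrieve` would wrap a query against the Amazon Kendra index and `generate` would call the deployed LLM.

```python
from typing import Callable, List

# Hypothetical prompt wording; the sample application may phrase it differently.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, excerpts: List[str]) -> str:
    """Step 4: combine the user request with the retrieved excerpts."""
    context = "\n---\n".join(excerpts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Steps 2-6: retrieve excerpts, build the prompt, return the LLM reply."""
    excerpts = retrieve(question)              # steps 2-3: query the index
    prompt = build_prompt(question, excerpts)  # step 4: add the context
    return generate(prompt).strip()            # steps 5-6: reply to the user


# Hypothetical stand-ins so the sketch runs without AWS credentials.
def fake_retrieve(query: str) -> List[str]:
    return ["Amazon Kendra is an intelligent search service."]


def fake_generate(prompt: str) -> str:
    return " Kendra is a search service. "


print(answer("What is Amazon Kendra?", fake_retrieve, fake_generate))
```

Passing the retriever and the generator in as callables keeps the flow independent of any particular index or model, which is the property the RAG approach relies on.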

### Prerequisites

> **Note:** Tested with [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) on a `ml.t3.medium` (2 vCPU + 4 GiB) instance with the [Base Python 3.0 [`sagemaker-base-python-310`]](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-images.html) image

For this demo, we will need a Python version compatible with [🦜️🔗 LangChain](https://pypi.org/project/langchain/) (`>=3.8.1, <4.0`)

In [None]:
import sys
!{sys.executable} -V

a recent version of the [AWS Python SDK](https://pypi.org/project/boto3/) (`>=1.26.159`)

In [None]:
# Set pip options
%env PIP_DISABLE_PIP_VERSION_CHECK True
%env PIP_ROOT_USER_ACTION ignore

# Install/update boto3
!{sys.executable} -m pip install -qU "boto3>=1.26.159"

and a recent version of the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) (`>=2.154.0`), containing the [SageMaker JumpStart SDK](https://github.com/aws/sagemaker-python-sdk/releases/tag/v2.154.0), to deploy the LLM to a SageMaker Endpoint.

In [None]:
# Install/update SageMaker Python SDK
!{sys.executable} -m pip install -qU "sagemaker>=2.154.0"
!{sys.executable} -c "import sagemaker; print(sagemaker.__version__)"
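
To confirm all three requirements in one go, a check along the following lines may help. It is a rough sketch: `parse_version` and `meets_minimum` are hypothetical helpers that only understand plain dotted numbers (pre-release suffixes are ignored), so a dedicated version-parsing library would be more robust.

```python
import sys
from importlib import metadata
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.26.159' into (1, 26, 159); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.154' compares equal to '2.154.0'
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions quoted in this notebook
REQUIREMENTS = {"boto3": "1.26.159", "sagemaker": "2.154.0"}

python_ok = (3, 8, 1) <= sys.version_info[:3] < (4, 0, 0)
print(f"python {sys.version.split()[0]}: {'ok' if python_ok else 'unsupported'}")

for package, minimum in REQUIREMENTS.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    print(f"{package} {installed}: {status}")
```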

**Optional:** we will also need the [AWS CLI](https://aws.amazon.com/cli/) (`v2`) to create the Kendra index

> For more information on how to upgrade the AWS CLI, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)

> When running this notebook through Amazon SageMaker, make sure the [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has enough permissions to run the commands

In [None]:
!aws --version

The variables below can be used to bypass **Optional** steps.

In [None]:
%load_ext skip_kernel_extension

# Whether to skip the Kendra index deployment
SKIP_KENDRA_DEPLOYMENT = False

# Stack name for the Kendra index deployment
KENDRA_STACK_NAME = "genai-kendra-langchain"

# Whether to skip the quota increase request
SKIP_QUOTA_INCREASE = True

# Whether to skip the Streamlit installation
SKIP_STREAMLIT_INSTALL = False
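
For readers wondering how a `%%skip` cell magic can work, here is a minimal sketch of such an extension. It is an assumption about the general mechanism, not a copy of the `skip_kernel_extension` module that ships with the samples, which may differ. `make_skip_magic` and `FakeShell` are hypothetical names; `FakeShell` only exists so the sketch runs outside a notebook.

```python
def make_skip_magic(shell):
    """Build a `skip` magic bound to the given IPython shell."""

    def skip(line, cell=None):
        # `line` is the text after %%skip, e.g. "True" once IPython has
        # expanded $SKIP_KENDRA_DEPLOYMENT to the variable's value.
        if line.strip().lower() in ("true", "1", "yes"):
            return  # flag is set: do not run the cell body
        if cell is not None:
            shell.run_cell(cell)

    return skip


def load_ipython_extension(shell):
    """Hook called by `%load_ext`; registers the line/cell magic."""
    shell.register_magic_function(make_skip_magic(shell), "line_cell", "skip")


class FakeShell:
    """Hypothetical stand-in for the IPython shell, used only for this demo."""

    def __init__(self):
        self.executed = []
        self.magics = {}

    def run_cell(self, cell):
        self.executed.append(cell)

    def register_magic_function(self, func, magic_kind="line", magic_name=None):
        self.magics[magic_name or func.__name__] = func


shell = FakeShell()
load_ipython_extension(shell)
shell.magics["skip"]("True", "print('skipped')")  # flag set: nothing runs
shell.magics["skip"]("False", "print('runs')")    # flag unset: cell runs
print(shell.executed)
```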

### Implement a RAG Workflow

**Optional:** deploy the provided AWS CloudFormation template ([`samples/kendra-docs-index.yaml`](https://github.com/aws-samples/amazon-kendra-langchain-extensions/blob/main/samples/kendra-docs-index.yaml)) to create a new Kendra index

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT
!aws cloudformation deploy --stack-name $KENDRA_STACK_NAME --template-file "kendra-docs-index.yaml" --capabilities CAPABILITY_NAMED_IAM

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT

# The stack output holds the Kendra index ID (not the stack ID)
kendra_index_id = !(aws cloudformation describe-stacks --stack-name $KENDRA_STACK_NAME --query 'Stacks[0].Outputs[?OutputKey==`KendraIndexID`].OutputValue' --output text)
kendra_index_id = ''.join(kendra_index_id)

%env KENDRA_INDEX_ID $kendra_index_id
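
The same lookup can also be done from Python with boto3 instead of the AWS CLI. The sketch below is one possible way to do it: `find_stack_output` and `get_kendra_index_id` are hypothetical helper names, and the sample response is hand-written with a made-up placeholder ID so that the parsing logic can be tried without AWS credentials.

```python
from typing import Optional


def find_stack_output(response: dict, output_key: str) -> Optional[str]:
    """Return the value of `output_key` from a describe_stacks response."""
    for stack in response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


def get_kendra_index_id(stack_name: str) -> Optional[str]:
    """Look up the index ID with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the helper above works offline

    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stacks(StackName=stack_name)
    return find_stack_output(response, "KendraIndexID")


# Offline demo with a hand-written response shaped like the API output;
# the index ID below is a made-up placeholder.
sample = {
    "Stacks": [
        {"Outputs": [{"OutputKey": "KendraIndexID", "OutputValue": "example-index-id"}]}
    ]
}
print(find_stack_output(sample, "KendraIndexID"))
```

With credentials in place, `get_kendra_index_id(KENDRA_STACK_NAME)` would return the value that the CLI command writes to `KENDRA_INDEX_ID`.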

**Optional:** consider requesting a quota increase via [AWS Service Quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) on the size of the document excerpts returned by Amazon Kendra for a better experience

In [None]:
%%skip $SKIP_QUOTA_INCREASE
# Request a quota increase for the maximum number of characters displayed in the Document Excerpt of a Document type result in the Query API
# https://docs.aws.amazon.com/kendra/latest/APIReference/API_Query.html
!aws service-quotas request-service-quota-increase --service-code kendra --quota-code "L-196E775D" --desired-value 1000

**Optional:** Install Streamlit

> [Streamlit](https://streamlit.io/) is an open source framework for building and sharing data apps. 
>
> 💡 For a quick demo, try out the [Knowledge base > Tutorials](https://docs.streamlit.io/knowledge-base/tutorials)

In [None]:
%%skip $SKIP_STREAMLIT_INSTALL

# Install streamlit
# https://docs.streamlit.io/library/get-started/installation
!{sys.executable} -m pip install -qU streamlit

# Debug installation
# https://docs.streamlit.io/knowledge-base/using-streamlit/sanity-checks
!streamlit version

Install 🦜️🔗 LangChain

> [LangChain](https://github.com/hwchase17/langchain) is an open-source framework for building *agentic* and *data-aware* applications powered by language models.
>
> 💡 For a quick intro, check out [Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications](https://towardsdatascience.com/getting-started-with-langchain-a-beginners-guide-to-building-llm-powered-applications-95fc8898732c)

In [None]:
# Install LangChain
# https://python.langchain.com/en/latest/getting_started/getting_started.html
# Quote the requirement so the shell does not treat ">" as an output redirect
!{sys.executable} -m pip install -qU "langchain>=0.0.219"

# Debug installation
!{sys.executable} -c "import langchain; print(langchain.__version__)"

Now we need an LLM to handle user queries. Models like [Flan-T5-XL](https://huggingface.co/google/flan-t5-xl) and [Flan-T5-XXL](https://huggingface.co/google/flan-t5-xxl), which are available on [Hugging Face Transformers](https://huggingface.co/docs/transformers/model_doc/flan-t5), can be deployed via [Amazon SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart/) in a matter of minutes with just a few lines of code.

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/04/25/ML-13807-image003.jpg" width="50%"/>

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

# Select model
# https://aws.amazon.com/sagemaker/jumpstart/getting-started
model_ids = [
    "huggingface-text2text-flan-t5-xl",
    "huggingface-text2text-flan-t5-xxl",
]
model_id = str(input("Model ID:") or model_ids[0])
assert model_id in model_ids, f"❌ Model '{model_id}' is not supported!"

# Deploy model
model = JumpStartModel(model_id=model_id)
print(f"Deploying model '{model_id}'")
predictor = model.deploy()

In [None]:
# Test model
predictor.predict("Hey there! How are you?")
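
When calling the endpoint from an application rather than through the predictor, the request and response bodies have to be serialized by hand. The sketch below assumes the JSON shape commonly used by JumpStart text2text models (a `text_inputs` field in the request, a `generated_texts` list in the response); this is an assumption worth verifying against the model's example notebook. `encode_request` and `decode_response` are hypothetical helper names.

```python
import json
from typing import Any, Dict


def encode_request(prompt: str, **model_kwargs: Any) -> bytes:
    """Serialize a prompt plus generation parameters as a JSON request body."""
    # Assumed request field name: "text_inputs"
    return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")


def decode_response(body: bytes) -> str:
    """Extract the first generated text from a JSON response body."""
    payload: Dict[str, Any] = json.loads(body.decode("utf-8"))
    # Assumed response field name: "generated_texts"
    return payload["generated_texts"][0]


request = encode_request("Hey there! How are you?", max_length=50, temperature=0.1)
print(request.decode("utf-8"))

# Hand-written response used only to exercise the decoder.
fake_response = json.dumps({"generated_texts": ["I am fine."]}).encode("utf-8")
print(decode_response(fake_response))
```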

**Optional:** if you want to work with [Anthropic's `Claude-V1`](https://www.anthropic.com/index/introducing-claude) or OpenAI's `text-davinci-003`, get the corresponding API key(s) and run the cell below.

In [None]:
import os
from getpass import getpass

"""
OpenAI
https://python.langchain.com/en/latest/modules/models/llms/integrations/openai.html
"""

# Get an API key from
# https://platform.openai.com/account/api-keys
OPENAI_API_KEY = getpass("OPENAI_API_KEY:")
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

"""
Anthropic
https://python.langchain.com/en/latest/modules/models/chat/integrations/anthropic.html
"""

# Get an API key from
# https://www.anthropic.com/product
ANTHROPIC_API_KEY = getpass("ANTHROPIC_API_KEY:")
os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY

Before running the sample application, we need to set up the environment variables with the Amazon Kendra index details (`KENDRA_INDEX_ID`) and the SageMaker Endpoints for the `FLAN-T5-*` models (`FLAN_*_ENDPOINT`)

In [None]:
import os
import re

# Set Kendra index ID
os.environ['KENDRA_INDEX_ID'] = str(input('KENDRA_INDEX_ID:') or os.environ['KENDRA_INDEX_ID'])

# Set endpoint name
# https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/text2text-generation-flan-t5.ipynb
if 'model_id' in vars() and re.search("flan-t5-xl", model_id):
    os.environ['FLAN_XL_ENDPOINT'] = predictor.endpoint_name
elif 'model_id' in vars() and re.search("flan-t5-xxl", model_id):
    os.environ['FLAN_XXL_ENDPOINT'] = predictor.endpoint_name
elif "OPENAI_API_KEY" in os.environ or "ANTHROPIC_API_KEY" in os.environ:
    print("Using external API key")
else:
    print("⚠️ The SageMaker Endpoint environment variable is not set!")

Finally, let's start the application 😊

In [None]:
# Python
if 'model_id' in vars() and re.search("flan-t5-xl", model_id):
    %run kendra_chat_flan_xl_nb.py
elif 'model_id' in vars() and re.search("flan-t5-xxl", model_id):
    %run kendra_chat_flan_xxl_nb.py
elif "ANTHROPIC_API_KEY" in os.environ:
    %run kendra_chat_anthropic_nb.py
elif "OPENAI_API_KEY" in os.environ:
    %run kendra_chat_openai_nb.py
else:
    print("⚠️ Please choose a supported model and/or model provider!")

In [None]:
# Streamlit
if 'model_id' in vars() and re.search("flan-t5-xl", model_id):
    !streamlit run app.py flanxl
elif 'model_id' in vars() and re.search("flan-t5-xxl", model_id):
    !streamlit run app.py flanxxl
elif "ANTHROPIC_API_KEY" in os.environ:
    !streamlit run app.py anthropic
elif "OPENAI_API_KEY" in os.environ:
    !streamlit run app.py openai
else:
    print("⚠️ Please choose a supported model and/or model provider!")
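
The same branching appears in both cells above. As a possible refactoring, it could be captured once in a small helper that returns the provider name passed to `app.py`. `select_provider` is a hypothetical function, not part of the samples; it mirrors the branch order of the cells, so a deployed Flan-T5 model takes precedence over API keys.

```python
import re
from typing import Mapping, Optional


def select_provider(model_id: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Mirror the branch order used in the cells above."""
    if model_id and re.search("flan-t5-xl", model_id):
        return "flanxl"
    if model_id and re.search("flan-t5-xxl", model_id):
        return "flanxxl"
    if "ANTHROPIC_API_KEY" in env:
        return "anthropic"
    if "OPENAI_API_KEY" in env:
        return "openai"
    return None  # no supported model or provider found


print(select_provider("huggingface-text2text-flan-t5-xxl", {}))
print(select_provider(None, {"OPENAI_API_KEY": "placeholder"}))
```

Note that the pattern `flan-t5-xl` does not match an ID containing `flan-t5-xxl`, so the order of the first two checks does not matter.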

> **Note:** As of May 2023, Amazon SageMaker Studio doesn't allow apps to run through Jupyter Server Proxy on a Kernel Gateway. The best option is to use the [SageMaker SSH Helper](https://github.com/aws-samples/sagemaker-ssh-helper) library to forward the port set in `server.port` (defaults to `8501`). See [Local IDE integration with SageMaker Studio over SSH for PyCharm / VSCode](https://github.com/aws-samples/sagemaker-ssh-helper#local-ide-integration-with-sagemaker-studio-over-ssh-for-pycharm--vscode) for more information.

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/04/25/ML-13807-image005.jpg" width="30%"/>

### Cleanup

Don't forget to delete the SageMaker Endpoint

In [None]:
predictor.delete_endpoint()

and the Kendra index

In [None]:
%%skip $SKIP_KENDRA_DEPLOYMENT
!aws cloudformation delete-stack --stack-name $KENDRA_STACK_NAME

### References 📚

* AWS ML Blog: [Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models](https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/)
* AWS ML Blog: [Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart](https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/)
* AWS ML Blog: [Dive deep into Amazon SageMaker Studio Notebooks architecture](https://aws.amazon.com/blogs/machine-learning/dive-deep-into-amazon-sagemaker-studio-notebook-architecture/)