# Requirements

**Requirements:**
* `python3` - https://www.python.org/downloads/
* Hugging Face Account - https://huggingface.co/join
* AWS account - https://aws.amazon.com/resources/create-account/
* OpenAI Account - https://platform.openai.com/login?launch *(required only if you use ChatOpenAI, which is the default)*

You will also probably want an IDE: VS Code is recommended - https://code.visualstudio.com/download

Want to get started quickly? Recommended WordPress plugins can be found in the [last section](https://colab.research.google.com/drive/1PGw_QEjJFQ3vhVuSinCbWose5KNjk5wb?authuser=1#scrollTo=a4bfXcyh7p8i&line=1&uniqifier=1 ).

# Code with AI

In [None]:
#@markdown * Find the repo for Hugging Face Chat UI here - Section 0 allows a no-code deployment in Hugging Face Spaces with a prepopulated template and your choice of LLM: https://github.com/huggingface/chat-ui (Documentation: https://huggingface.co/docs/hub/spaces-sdks-docker-chatui#chatui-on-spaces)
#@markdown * `llm-vscode` is an extension you can install on VSCode (https://github.com/huggingface/llm-vscode)
#@markdown * It is built upon the StarCoder LLM (https://github.com/bigcode-project/starcoder) - find the playground Starchat here: https://huggingface.co/spaces/HuggingFaceH4/starchat-playground
#@markdown * https://sourcegraph.com/cody - Out of the box open source coding assistant that has a free tier and allows you to choose open source models for code completion and chat

# Data Preparation

In [None]:
#@markdown To use your own data from your WordPress site, start by exporting the `wp_posts` table from your database as a CSV. The easiest method is to use phpMyAdmin (https://docs.bitnami.com/aws/apps/dolibarr/administration/export-database/ - the documentation shows an SQL export, so make sure you choose CSV as the format instead).
#@markdown If you don't have sufficient data (say, at least 50-100 objects), you may choose to supplement it. A good directory of open datasets can be found at https://www.kaggle.com/datasets/

#@markdown You will need to decide on the foundational model you will be working with at this point as it may dictate the data structure you will need to follow.

# @markdown In this example, we're going to be using TinyLlama, a compact 1.1B-parameter model built on the Llama 2 architecture (Hugging Face repo: TinyLlama/TinyLlama-1.1B-Chat-v1.0).
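
As a rough sketch of the preparation step described above, the snippet below turns a `wp_posts` CSV export into rows with a single `text` column, which is the column the AutoTrain cell later in this notebook expects. It assumes the export includes a header row and uses the standard WordPress column names `post_title`, `post_content`, `post_status` and `post_type`; the helper names and the regex-based HTML stripping are illustrative choices, not part of the original workflow.

```python
import csv
import html
import io
import re

# Naive tag matcher: adequate for a first pass, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def clean_html(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", raw)
    return " ".join(html.unescape(without_tags).split())


def posts_to_training_rows(wp_posts_csv: str) -> list:
    """Keep published posts and merge title + body into one `text` field."""
    rows = []
    for row in csv.DictReader(io.StringIO(wp_posts_csv)):
        # Assumed WordPress columns: post_status, post_type, post_title, post_content
        if row.get("post_status") != "publish" or row.get("post_type") != "post":
            continue
        title = clean_html(row.get("post_title") or "")
        body = clean_html(row.get("post_content") or "")
        if body:
            rows.append({"text": f"{title}\n\n{body}"})
    return rows


def write_train_csv(rows: list, path: str) -> None:
    """Write the rows to a CSV file with a single `text` column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text"])
        writer.writeheader()
        writer.writerows(rows)


# Tiny made-up export, used only to illustrate the call
sample_export = (
    "post_title,post_content,post_status,post_type\n"
    'Sourdough basics,"<p>Feed the starter &amp; wait.</p>",publish,post\n'
    'Unfinished idea,"<p>todo</p>",draft,post\n'
)
training_rows = posts_to_training_rows(sample_export)
```

Calling `write_train_csv(training_rows, "data/train.csv")` would then write the file to the `data/` folder used in the next section. Drafts, pages and revisions are skipped here on the assumption that only published posts should shape the model.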

# Finetune your chosen foundational LLM

***NOTE: There is a 90min idle timeout - if finetuning is going to take longer than 90 minutes (it will be in the output below) make sure you remember to interact with this notebook - there is currently no known programmatic way to defeat this.***

In [None]:
#@title 🤗 AutoTrain Advanced
#@markdown In order to use this colab
#@markdown - upload train.csv to a folder named `data/`
#@markdown - train.csv must contain a `text` column
#@markdown - choose a project name if you wish
#@markdown - change model if you wish, you can use most of the text-generation models from Hugging Face Hub
#@markdown - add huggingface information (token and repo_id) if you wish to push trained model to huggingface hub - you do not need to create the repo in advance.
#@markdown - update hyperparameters if you wish, this is not necessary (hyperparameters matter very little).
#@markdown - click `Runtime > Run all` or run each cell individually
#@markdown - This code comes from the Hugging Face Autotrain-Advanced Repo: https://github.com/huggingface/autotrain-advanced/issues

import os
!pip install -U autotrain-advanced > install_logs.txt
!autotrain setup --colab > setup_logs.txt

In [None]:
#@markdown ---
#@markdown #### Project Config
#@markdown Note: if you are using a restricted/private model, you need to enter your Hugging Face token in the next step.
project_name = 'example-project' # @param {type:"string"}
model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0' # @param {type:"string"}

#@markdown ---
#@markdown #### Push to Hub?
#@markdown Use these only if you want to push your trained model to a private repo in your Hugging Face account.
#@markdown If you don't use these, the model will be saved in Google Colab and you will need to download it manually.
#@markdown Please enter your Hugging Face write token. The trained model will be saved to your Hugging Face account.
#@markdown You can find your token here: https://huggingface.co/settings/tokens
push_to_hub = True # @param ["False", "True"] {type:"raw"}
hf_token = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" #@param {type:"string"}
repo_id = "huggingface_username/repo_name" #@param {type:"string"}

#@markdown ---
#@markdown #### Hyperparameters
learning_rate = 2e-4 # @param {type:"number"}
num_epochs = 1 #@param {type:"number"}
batch_size = 1 # @param {type:"slider", min:1, max:32, step:1}
block_size = 512 # @param {type:"number"}
trainer = "sft" # @param ["default", "sft"] {type:"raw"}
warmup_ratio = 0.1 # @param {type:"number"}
weight_decay = 0.01 # @param {type:"number"}
gradient_accumulation = 4 # @param {type:"number"}
mixed_precision = "fp16" # @param ["fp16", "bf16", "none"] {type:"raw"}
peft = True # @param ["False", "True"] {type:"raw"}
quantization = "int4" # @param ["int4", "int8", "none"] {type:"raw"}
lora_r = 16 #@param {type:"number"}
lora_alpha = 32 #@param {type:"number"}
lora_dropout = 0.05 #@param {type:"number"}
is_decoder = True

os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["MIXED_PRECISION"] = str(mixed_precision)
os.environ["PEFT"] = str(peft)
os.environ["QUANTIZATION"] = str(quantization)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)

In [None]:
!autotrain llm \
--train \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--data-path data/ \
--text-column text \
--lr ${LEARNING_RATE} \
--batch-size ${BATCH_SIZE} \
--epochs ${NUM_EPOCHS} \
--block-size ${BLOCK_SIZE} \
--warmup-ratio ${WARMUP_RATIO} \
--lora-r ${LORA_R} \
--lora-alpha ${LORA_ALPHA} \
--lora-dropout ${LORA_DROPOUT} \
--weight-decay ${WEIGHT_DECAY} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--quantization ${QUANTIZATION} \
--mixed-precision ${MIXED_PRECISION} \
$( [[ "$PEFT" == "True" ]] && echo "--peft" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )
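
To get a feel for how the hyperparameters above interact, the following back-of-the-envelope sketch estimates the effective batch size and the number of optimizer and warmup steps. It assumes a single GPU and one training example per CSV row. The function name is hypothetical, and AutoTrain's own step count may differ because of how `block_size` and rounding are handled.

```python
import math


def estimate_schedule(num_examples, batch_size=1, gradient_accumulation=4,
                      num_epochs=1, warmup_ratio=0.1):
    """Rough estimate of optimizer steps (assumption: one GPU, one example per row)."""
    # Gradients from several small batches are accumulated before each update
    effective_batch_size = batch_size * gradient_accumulation
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * num_epochs
    # The learning rate ramps up over the first `warmup_ratio` share of the steps
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Defaults mirror the form values above; 100 rows is an arbitrary example size
schedule = estimate_schedule(num_examples=100)
```

With the defaults shown in the form, a batch size of 1 and a gradient accumulation of 4 behave like a batch of 4 for each weight update, which is a common way to fit training into limited GPU memory.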

# Deploy ChromaDB Vector DB to AWS

Create your AWS CloudFormation (https://aws.amazon.com/cloudformation/) stack for the ChromaDB Vector Database with the following JSON template. The default user is `ec2-user`. Copy and paste the template, save it as a JSON file, then use the "Upload a template file" option when creating a new stack:

In [None]:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Create a stack that runs Chroma hosted on a single dockerized instance with API key credentials",
  "Parameters": {
    "KeyName": {
      "Description": "Name of an existing EC2 KeyPair to enable SSH access to the instance",
      "Type": "String",
      "ConstraintDescription": "If present, must be the name of an existing EC2 KeyPair.",
      "Default": ""
    },
    "InstanceType": {
      "Description": "EC2 instance type (t3.small minimum)",
      "Type": "String",
      "Default": "t3.small"
    },
    "ChromaApiToken": {
      "Description": "API token used to connect with X-Chroma-Token header during connection",
      "Type": "String",
      "Default": "insert-your-unique-token-here"
    }
  },
  "Conditions": {
    "HasKeyName": {
      "Fn::Not": [
        {
          "Fn::Equals": [
            {
              "Ref": "KeyName"
            },
            ""
          ]
        }
      ]
    }
  },
  "Resources": {
    "ChromaInstance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "ImageId": {
          "Fn::FindInMap": [
            "Region2AMI",
            {
              "Ref": "AWS::Region"
            },
            "AMI"
          ]
        },
        "InstanceType": {
          "Ref": "InstanceType"
        },
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [
              "",
              [
                "Content-Type: multipart/mixed; boundary=\"//\"\n",
                "MIME-Version: 1.0\n",
                "\n",
                "--//\n",
                "Content-Type: text/cloud-config; charset=\"us-ascii\"\n",
                "MIME-Version: 1.0\n",
                "Content-Transfer-Encoding: 7bit\n",
                "Content-Disposition: attachment; filename=\"cloud-config.txt\"\n",
                "\n",
                "\n",
                "#cloud-config\n",
                "cloud_final_modules:\n",
                "- [scripts-user, always]\n",
                "\n",
                "\n",
                "--//\n",
                "Content-Type: text/x-shellscript; charset=\"us-ascii\"\n",
                "MIME-Version: 1.0\n",
                "Content-Transfer-Encoding: 7bit\n",
                "Content-Disposition: attachment; filename=\"userdata.txt\"\n",
                "\n",
                "\n",
                "#!/bin/bash\n",
                "# check output of userdata script with sudo tail -f /var/log/cloud-init-output.log\n",
                "yum install docker sqlite tree git -y\n",
                "usermod -a -G docker ec2-user\n",
                "systemctl enable docker\n",
                "systemctl start docker\n",
                "mkdir /home/ec2-user/chroma-storage\n",
                "git clone https://gist.github.com/dr-robert-li/c0524724a75954e44adf1be546fbfd2c /home/ec2-user/run-script/\n",
                "mv /home/ec2-user/run-script/run.sh /home/ec2-user/run.sh\n",
                "rm -rf /home/ec2-user/run-script/\n",
                "chown ec2-user:ec2-user /home/ec2-user/chroma-storage\n",
                "chmod +x /home/ec2-user/run.sh\n",
                "docker pull chromadb/chroma\n",
                "docker run -d -p 8000:8000 -e CHROMA_SERVER_AUTH_CREDENTIALS_PROVIDER=\"chromadb.auth.token.TokenConfigServerAuthCredentialsProvider\" -e CHROMA_SERVER_AUTH_PROVIDER=\"chromadb.auth.token.TokenAuthServerProvider\"",
                {
                  "Fn::Sub": " -e CHROMA_SERVER_AUTH_CREDENTIALS=\"${ChromaApiToken}\""
                },
                " -e CHROMA_SERVER_AUTH_TOKEN_TRANSPORT_HEADER=\"X_CHROMA_TOKEN\" -v /home/ec2-user/chroma-storage/:/chroma/chroma/ chromadb/chroma\n",
                "\n",
                "--//--\n"
              ]
            ]
          }
        },
        "SecurityGroupIds": [
          {
            "Ref": "ChromaInstanceSecurityGroup"
          }
        ],
        "KeyName": {
          "Fn::If": [
            "HasKeyName",
            {
              "Ref": "KeyName"
            },
            {
              "Ref": "AWS::NoValue"
            }
          ]
        },
        "BlockDeviceMappings": [
          {
            "DeviceName": {
              "Fn::FindInMap": [
                "Region2AMI",
                {
                  "Ref": "AWS::Region"
                },
                "RootDeviceName"
              ]
            },
            "Ebs": {
              "VolumeSize": 24
            }
          }
        ]
      }
    },
    "ChromaInstanceSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "GroupDescription": "Chroma Instance Security Group",
        "SecurityGroupIngress": [
          {
            "IpProtocol": "tcp",
            "FromPort": "22",
            "ToPort": "22",
            "CidrIp": "0.0.0.0/0"
          },
          {
            "IpProtocol": "tcp",
            "FromPort": "8000",
            "ToPort": "8000",
            "CidrIp": "0.0.0.0/0"
          }
        ]
      }
    }
  },
  "Outputs": {
    "ServerIp": {
      "Description": "IP address of the Chroma server",
      "Value": {
        "Fn::GetAtt": [
          "ChromaInstance",
          "PublicIp"
        ]
      }
    }
  },
  "Mappings": {
    "Region2AMI": {
      "ap-south-1": {
        "AMI": "ami-03cb1380eec7cc118",
        "RootDeviceName": "/dev/xvda"
      },
      "eu-north-1": {
        "AMI": "ami-078e13ebe3b027f1c",
        "RootDeviceName": "/dev/xvda"
      },
      "eu-west-3": {
        "AMI": "ami-00575c0cbc20caf50",
        "RootDeviceName": "/dev/xvda"
      },
      "eu-west-2": {
        "AMI": "ami-0b026d11830afcbac",
        "RootDeviceName": "/dev/xvda"
      },
      "eu-west-1": {
        "AMI": "ami-06e0ce9d3339cb039",
        "RootDeviceName": "/dev/xvda"
      },
      "ap-northeast-3": {
        "AMI": "ami-0171e161a6e0c595c",
        "RootDeviceName": "/dev/xvda"
      },
      "ap-northeast-2": {
        "AMI": "ami-0eb14fe5735c13eb5",
        "RootDeviceName": "/dev/xvda"
      },
      "ap-northeast-1": {
        "AMI": "ami-08a8688fb7eacb171",
        "RootDeviceName": "/dev/xvda"
      },
      "ca-central-1": {
        "AMI": "ami-0843f7c45354d48b5",
        "RootDeviceName": "/dev/xvda"
      },
      "sa-east-1": {
        "AMI": "ami-0344e5787e2e93144",
        "RootDeviceName": "/dev/xvda"
      },
      "ap-southeast-1": {
        "AMI": "ami-0753e0e42b20e96e3",
        "RootDeviceName": "/dev/xvda"
      },
      "ap-southeast-2": {
        "AMI": "ami-047dcdc46ac4f2e6b",
        "RootDeviceName": "/dev/xvda"
      },
      "eu-central-1": {
        "AMI": "ami-004359656ecac6a95",
        "RootDeviceName": "/dev/xvda"
      },
      "us-east-1": {
        "AMI": "ami-0ed9277fb7eb570c9",
        "RootDeviceName": "/dev/xvda"
      },
      "us-east-2": {
        "AMI": "ami-064ff912f78e3e561",
        "RootDeviceName": "/dev/xvda"
      },
      "us-west-1": {
        "AMI": "ami-0746394790be7162e",
        "RootDeviceName": "/dev/xvda"
      },
      "us-west-2": {
        "AMI": "ami-098e42ae54c764c35",
        "RootDeviceName": "/dev/xvda"
      }
    }
  }
}
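
Before uploading, it can help to sanity-check the saved file locally. The sketch below assumes only the structure shown in the template above (a `Region2AMI` mapping whose entries carry `AMI` and `RootDeviceName`); the function names are placeholders written for this example.

```python
import json

REQUIRED_KEYS = ("AMI", "RootDeviceName")


def load_template(path: str) -> dict:
    """Load the template; json.load raises an error if the file is not valid JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_incomplete_regions(template: dict) -> list:
    """Return the regions in Region2AMI that lack an AMI or a root device name."""
    regions = template.get("Mappings", {}).get("Region2AMI", {})
    return sorted(
        region
        for region, entry in regions.items()
        if any(not entry.get(key) for key in REQUIRED_KEYS)
    )


# Inline fragment standing in for the full template file
fragment = json.loads("""
{
  "Mappings": {
    "Region2AMI": {
      "us-east-1": {"AMI": "ami-0ed9277fb7eb570c9", "RootDeviceName": "/dev/xvda"},
      "xx-example-1": {"AMI": ""}
    }
  }
}
""")
incomplete = find_incomplete_regions(fragment)
```

Note that the template only covers the regions listed in its `Mappings` section, so the `Fn::FindInMap` lookup has nothing to resolve if the stack is created in any other region.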

If you would like to restart and reset the ChromaDB API key and/or the ChromaDB storage location, simply invoke the `run.sh` script in the `/home/ec2-user/` folder.


# Embed Data for RAG

The following steps will allow you to take your data in `csv` format, tokenize it and embed it into the previously created ChromaDB Vector database using open source models where possible.

Start by installing dependencies:

In [None]:
pip install pandas langchain chromadb requests langchain_community openai tiktoken huggingface_hub

Make sure to modify the parameters in the Python scripts below.

In [None]:
# Importing Modules and Dependencies
import os
import pandas as pd
import chromadb
import tiktoken
# tiktoken is required for generating OpenAI Embeddings and OpenAI powered text splitting

from langchain.schema import Document
from langchain.document_loaders import DataFrameLoader
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.llms import HuggingFaceEndpoint
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# @markdown Enter your API Keys

# @markdown When using HuggingFace Hub make sure to use the WRITE API Token
hugging_face_key = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# @markdown Unset the below if you don't want to use OpenAI/ChatOpenAI (currently Llama2 Chat is set behind an approval process) - ChatOpenAI is the default.
openai_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# Set env API Key
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hugging_face_key
os.environ["OPENAI_API_KEY"] = openai_key

In [None]:
# @markdown Upload your cleaned data csv and indicate the location of the file below.

# Creating pandas dataframe from file
data_file = '/path/to/data/file.csv' # @param {type:"string"}
df = pd.read_csv(data_file)

# Doublecheck your data
df.head()

In [None]:
# Formatting data using langchain loader
loader = DataFrameLoader(df, page_content_column="text")
# Set page_content_column to the main text of the document - usually you should name this column "text" (case sensitive) for interoperability

# Defining document data
data = loader.load()
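
For intuition, here is a dependency-free sketch of what the loader step produces: one document per row, with the chosen column as the page content and the remaining columns kept as metadata. `SimpleDocument` and `rows_to_documents` are stand-ins written for this illustration, not LangChain classes.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Minimal stand-in for a document: main text plus a metadata dictionary."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(rows, page_content_column="text"):
    """Turn row dictionaries into documents, keeping other columns as metadata."""
    documents = []
    for row in rows:
        metadata = {key: value for key, value in row.items()
                    if key != page_content_column}
        documents.append(
            SimpleDocument(page_content=row[page_content_column], metadata=metadata)
        )
    return documents


# Made-up rows, shaped like a dataframe with "text" and "title" columns
rows = [
    {"text": "Feed the starter daily.", "title": "Sourdough basics"},
    {"text": "Chill the butter first.", "title": "Flaky pastry"},
]
docs = rows_to_documents(rows)
```

Keeping the extra columns as metadata means they travel with each embedded document and come back alongside the text when it is retrieved later.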

In [None]:
# Defining Embedding Function
model = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceHubEmbeddings(
    model=model,
)
# embeddings = OpenAIEmbeddings() # To use OpenAI instead, uncomment this line and comment out the "model" and "embeddings" variables above

In [None]:
# @markdown Define your ChromaDB client/server settings (also can be used to test along with the commands below):
chroma_hostip = '0.0.0.0' # @param {type:"string"}
chroma_hostport = '8000' # @param {type:"string"}
chroma_api_token = '0123456789' # @param {type:"string"}

client = chromadb.HttpClient(
    host=chroma_hostip,
    port=chroma_hostport,
    headers={'X-Chroma-Token': chroma_api_token}
)
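
One way to confirm that the server is reachable before going further is a plain HTTP request carrying the same token header. This is a sketch using only the standard library. The `/api/v1/heartbeat` path is an assumption that matches Chroma 0.4.x-era servers and may differ on other versions, and the host shown is a documentation placeholder.

```python
import json
import urllib.request


def build_heartbeat_request(host, port, api_token):
    """Build (but do not send) a heartbeat request with the Chroma token header."""
    url = f"http://{host}:{port}/api/v1/heartbeat"  # assumed path, see note above
    return urllib.request.Request(url, headers={"X-Chroma-Token": api_token})


def check_heartbeat(request, timeout=5.0):
    """Send the request and return the decoded JSON body (raises on HTTP errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder host and token; substitute the ServerIp output of your stack
request = build_heartbeat_request("203.0.113.10", "8000", "0123456789")
# check_heartbeat(request)  # uncomment once the stack is running
```

The `chromadb` client also exposes `client.heartbeat()`, which performs a similar check through the client defined above.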

In [None]:
# @markdown Define your ChromaDB Collection Name.
vector_collection_name = 'name-of-collection' # @param {type:"string"}

# Uncomment the next line to create the collection if it does not exist yet
# collection = client.get_or_create_collection(name=vector_collection_name)
# get_collection only fetches an existing collection (it raises an error if the name is not found), so it doubles as a test query to the ChromaDB instance
collection = client.get_collection(name=vector_collection_name)

In [None]:
# Defining the vector store: which embedding function to use, which ChromaDB client to send data to (refer above), and which collection to use
# vectordb = Chroma.from_documents(documents=data, embedding=embeddings, client=client, collection_name=vector_collection_name) # Use this to embed 'data' and store it
vectordb = Chroma(embedding_function=embeddings, client=client, collection_name=vector_collection_name) # Use this to reference existing embeddings

In [None]:
# List all ChromaDB collections on ChromaDB server to check
client.list_collections()

In [None]:
# Use this command if you wish to delete a found collection
# client.delete_collection(name="collection-name")

In [None]:
# Validate the data has been stored on the ChromaDB instance in the right collection
collection.count() # Use this command to check the number of items in the collection
# collection.peek() # Use this command if you want to check the first 10 items in the collection

In [None]:
# Use this command if you want to delete items in the instantiated collection
# collection.delete(
#     ids = [ 'id1', 'id2', 'id3' ]
#     )
# collection.modify(name="insert-new-name") # Use this command to modify the name of the instantiated collection.

In [None]:
# Defining document retriever to provide context to eventual QA
# k = the max number of results to return; if omitted, LangChain's default (4) is used
# search_type defines the type of search, similarity (default) or mmr are accepted
# (MMR refers to maximal marginal relevance - iteratively searching for dissimilar documents: https://docs.llamaindex.ai/en/latest/examples/vector_stores/SimpleIndexDemoMMR.html)
# retriever = vectordb.as_retriever(search_kwargs={"k": 10}, search_type="similarity") # Example with settings
retriever = vectordb.as_retriever()
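
To make the difference between the two search types concrete, the sketch below implements a small maximal marginal relevance selection in plain Python. It is a standalone illustration of the idea rather than LangChain's implementation; the vectors are made up, and `lambda_mult` simply borrows the name LangChain uses for the relevance/diversity trade-off.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k=2, lambda_mult=0.5):
    """Pick k candidate indices, trading relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(index):
        relevance = cosine(query, candidates[index])
        # Penalize similarity to anything that has already been picked
        redundancy = max(
            (cosine(candidates[index], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-D "embeddings": the first two candidates are near-duplicates
query = [1.0, 0.3]
candidates = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.5)
similar = mmr_select(query, candidates, k=2, lambda_mult=1.0)
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the selection reduces to plain similarity ranking, so the two near-duplicates are returned together. At `0.5` the second pick is the less similar but more distinct candidate.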

In [None]:
# @markdown Test the retriever with a prompt:
retriever_prompt = 'What is the most common thing in the collection?' # @param {type:"string"}
retrieved_docs = retriever.invoke(retriever_prompt)
print(retrieved_docs[0].page_content)

Once data has been embedded, let's create a Retrieval Augmented Generation (RAG) Chain to ask a question with context.

You can modify the `template = """ """` below to change the context and test the RAG Chain to see what output you get.

You can also modify which LLM model to use.

There are two options below. OpenAI's GPT models are easier to use and have much higher max token limits (16k for GPT-3.5 and up to 32k for GPT-4 as of Jan 2024) than most open source models, but if you wish to use an open source model, the second cell below allows for that.

Keep in mind that because Option 2 uses a summarization chain, ***prompt detail will be lost.*** The output will attempt to compensate using the LLM's wider training data.

You can choose to use an Open Source LLM that has 16k or 32k context limits such as `TheBloke/Llama-2-13B-chat-GPTQ` or `NurtureAI/OpenHermes-2.5-Mistral-7B-16k` which would allow a higher `max_token_limit` parameter and more iterations. Keep in mind this will require a large amount of memory in your GPU instance.

***You cannot use both at the same time.***

In [None]:
# @markdown ###Option 1: ChatOpenAI OpenAI model

# @markdown No additional input required.
# Define LLM Model to use for later QA text generation
llm = ChatOpenAI() # Set if you want to use ChatOpenAI

template = """
Answer the question based only on the following context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template) # set if using ChatOpenAI Model

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


In [None]:
# @markdown ###Option 2: Open Source LLM

# @markdown This will use a Hugging Face Hosted Model e.g. `mistralai/Mistral-7B-Instruct-v0.1`

# @markdown This will require a hosted Inference Endpoint (https://ui.endpoints.huggingface.co/).
# @markdown There may be max token limits on the model (multiply `num_iterations` by `model_token_limit` to determine the total number of tokens being used), and each model's inference endpoint differs in usage. Make sure to read the instructions.
# @markdown Below is only a suggested workflow. Make sure to modify the prompting structure.

# @markdown Make sure to retain the ("[URI]") structure for the Hugging Face Endpoint.

# Define LLM Model to use for later QA text generation
endpoint_url = "(\"https://xxxxxxxxxxxxxxxx.us-east-1.aws.endpoints.huggingface.cloud\")" # @param {type:"string"}
llm = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    huggingfacehub_api_token=hugging_face_key,
    task='text-generation'
)

model_token_limit = 1024 # @param {type:"number"}
num_iterations = 2 # @param {type:"number"}

# @markdown The included prompting template is designed to work with the `mistralai/Mistral` family of LLMs.
template = """
<s>
"[INST] You are an expert baker always ready to provide tips, tricks and recipes as well as tasting notes![/INST]"
"[INST] Based on the following Question: {question} - Can you provide a summary of the context provided: {context}?[/INST]"
</s>
"""

# Text Splitter and Summary Chain to ensure prompts don't exceed token limit of model, make sure to modify chunk_size to reflect max tokens of model
# (very important for open source models with limited max token limits)
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=model_token_limit, chunk_overlap=100
)
template_split = text_splitter.split_text(template)

# Formatting template and loading into dataframe.
template_df = pd.DataFrame(
    {"text": template_split}
)
template_loader = DataFrameLoader(template_df, page_content_column="text")
template_data = template_loader.load()

# Summarizing chain. Choose from refine or map_reduce. Turn verbose to True if you wish to see what it's doing.
# chain = load_summarize_chain(llm, chain_type="refine", verbose=False)
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=False)

# Invoke summarizing chain and load into template
template_summ = chain.run(template_data[:num_iterations])

template_final = """
You are an expert baker always ready to provide tips, tricks and recipes as well as tasting notes!
Answer the question based only on the following context: """ + template_summ + """
Question: {question}
If the context isn't sufficient answer as best as possible ignoring context.
"""
prompt = ChatPromptTemplate.from_template(template_final)

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])
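
The token arithmetic behind `model_token_limit`, `num_iterations` and `chunk_overlap` can be sketched without any model. The example below uses whitespace-separated words as stand-in tokens and a simple sliding window. It is a simplified illustration, not how `CharacterTextSplitter` or tiktoken split text, and both function names are hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
        start += step
    return chunks


def token_budget(model_token_limit, num_iterations):
    """Upper bound on the tokens sent when only the first chunks are summarized."""
    return model_token_limit * num_iterations


# 2,500 stand-in tokens split with the settings used in the cell above
words = ("token " * 2500).split()
chunks = chunk_tokens(words, chunk_size=1024, chunk_overlap=100)
budget = token_budget(model_token_limit=1024, num_iterations=2)
```

Because only the first `num_iterations` chunks are passed to the summarizing chain, anything beyond that budget never reaches the model, which is one reason detail can be lost with this option.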


In [None]:
# This is the full chain with commands piped into each other.
# We're reusing the retriever variable set above.
# The llm used will depend on the template cell chosen above.

QAchain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# @markdown Test the RAG Chain to see that it worked

chain_prompt = "Give me some information that can be inferred from the dataset but isn't actually in it." # @param {type:"string"}
QAchain.invoke(chain_prompt)
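
The piped chain above can be read as a sequence of ordinary function calls. The sketch below spells that out with stand-in components so it runs without a vector store or an API key; `fake_retriever` and `fake_llm` are placeholders invented for this example, and the template mirrors the one in Option 1.

```python
TEMPLATE = (
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def fake_retriever(question):
    """Stand-in for the retriever: returns canned passages for any question."""
    return ["Sourdough needs a mature starter.", "Proof the dough overnight."]


def format_docs(passages):
    """Join the retrieved passages into one context string."""
    return "\n\n".join(passages)


def build_prompt(question):
    """Equivalent of the dict step followed by `| prompt`."""
    context = format_docs(fake_retriever(question))
    return TEMPLATE.format(context=context, question=question)


def fake_llm(prompt_text):
    """Stand-in for the model: only reports how much text it received."""
    return f"(model saw {len(prompt_text)} characters)"


def run_chain(question):
    """Equivalent of `| llm | StrOutputParser()` applied to the filled prompt."""
    return fake_llm(build_prompt(question))


filled_prompt = build_prompt("How long should I proof the dough?")
answer = run_chain("How long should I proof the dough?")
```

The question is used twice: once to fetch context and once, unchanged, in the prompt itself. That second use is the role `RunnablePassthrough()` plays in the real chain.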


# Use LangServe to Create FastAPI Endpoints


To use the RAG Chain in a production setting, you will need to deploy it on the cloud.

Start by using LangServe (using FastAPI) to create the API endpoints that will allow external applications to interface with it. The LangServe application will retrieve embeddings from the ChromaDB Vector DB (from above) and run them through a similar RAG chain as above.

Install dependencies

In [None]:
pip install langchain openai tiktoken chromadb langserve sse_starlette fastapi[all] uvicorn mangum

Create a `main.py` file (for easier deployment later) with the following code and run it on your server host environment. The default address is `localhost:8000`.

This comes from the LangServe Examples repo where there are many other examples and templates you can start from: https://github.com/langchain-ai/langserve/tree/main/examples/conversational_retrieval_chain

As above, there are two Python scripts below. Option 1 is to use OpenAI for ease of use. Option 2 is to use an open source LLM via HuggingFace Endpoints (https://ui.endpoints.huggingface.co/) - ***Please make sure to read the notes above on choosing an appropriate model to deploy for Option 2***.

Again, ***you cannot use both.***

In [None]:
# @markdown ###Option 1 - OpenAI

from operator import itemgetter
from typing import List, Tuple
import chromadb
import os
import langserve
import mangum
from mangum import Mangum

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.llms import HuggingFaceEndpoint
from langchain.prompts import ChatPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import format_document
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap, RunnablePassthrough
from langchain.vectorstores import Chroma
from langserve import RemoteRunnable

from langserve import add_routes
from langserve.pydantic_v1 import BaseModel, Field

# @markdown Enter your API Keys

# @markdown When using HuggingFace Hub make sure to use the WRITE API Token
hugging_face_key = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# @markdown Unset the below if you don't want to use OpenAI/ChatOpenAI (currently Llama2 Chat is set behind an approval process) - ChatOpenAI is the default.
openai_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# Set env API Key
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hugging_face_key
os.environ["OPENAI_API_KEY"] = openai_key

# Create a template where chat history provides additional dynamic context for next answer
_TEMPLATE = """Given the following conversation and a follow up question, rephrase the
follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_TEMPLATE)

# Create a standard Q&A template to provide additional persistent context for the next answer
ANSWER_TEMPLATE = """You are a world-class baker and pastry chef.
I will ask you a question and you will provide an answer based on the most relevant recipe you know, as well as commentary on best practices, tips, and any related baked goods that should be considered.

You will follow ALL the rules below:

1. You will use recipes that are in the database.
2. If you cannot find a relevant recipe in the database you will come up with a relevant recipe in the same style as the recipes in the database.

Here is the most relevant recipe in the database:
{context}

This is the question you are being asked: {question}

Please answer with the above context.
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(ANSWER_TEMPLATE)

DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

# Formatting functions
def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    """Combine documents into a single string."""
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

def _format_chat_history(chat_history: List[Tuple]) -> str:
    """Format chat history into a string."""
    buffer = ""
    for dialogue_turn in chat_history:
        human = "Human: " + dialogue_turn[0]
        ai = "Assistant: " + dialogue_turn[1]
        buffer += "\n" + "\n".join([human, ai])
    return buffer

# Defining Embedding Function
model = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceHubEmbeddings(
                model=model,
            )

# @markdown Define your ChromaDB client/server settings
chroma_hostip = '0.0.0.0' # @param {type:"string"}
chroma_hostport = '8000' # @param {type:"string"}
chroma_api_token = '0123456789' # @param {type:"string"}

client = chromadb.HttpClient(
    host=chroma_hostip,
    port=chroma_hostport,
    headers={'X-Chroma-Token': chroma_api_token}
)

# @markdown Define your ChromaDB Collection Name - it will create one if it does not exist.
vector_collection_name = 'name-of-collection' # @param

vectorstore = Chroma(
    client=client,
    collection_name=vector_collection_name,
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

# Chaining all input functions together
_inputs = RunnableMap(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: _format_chat_history(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | ChatOpenAI(temperature=0)
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}


# User input store
class ChatHistory(BaseModel):
    """Chat history with the bot."""

    chat_history: List[Tuple[str, str]] = Field(
        ...,
        extra={"widget": {"type": "chat", "input": "question"}},
    )
    question: str

# Define LLM to use for chat output
llm = ChatOpenAI() # Using OpenAI

# Putting the conversation chain together
conversational_qa_chain = (
    _inputs | _context | ANSWER_PROMPT | llm | StrOutputParser()
)
chain = conversational_qa_chain.with_types(input_type=ChatHistory)

# Defining FastAPI app server
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Spin up a simple api server using Langchain's Runnable interfaces",
)

# Adds FastAPI CORS to allow for applications to request externally
origins = [
    "*"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Adds FastAPI routes to the app for using the chain under:
# /invoke
# /batch
# /stream

add_routes(app, chain, enable_feedback_endpoint=True)

# @markdown Create remote runnable to allow for use in other langchains (Optional)
remote_runnable_host = "http://0.0.0.0:8000/" # @param {type:"string"}
remote_runnable = RemoteRunnable(remote_runnable_host)

# Start Server
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

# Adding Mangum Handler for AWS Lambda
handler = Mangum(app)


In [None]:
# @markdown ###Option 2 - Open Source

from operator import itemgetter
from typing import List, Tuple
import chromadb
import os
import langserve
import mangum

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain.llms import HuggingFaceEndpoint
from langchain.prompts import ChatPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import format_document
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap, RunnablePassthrough
from langchain.vectorstores import Chroma
from langserve import RemoteRunnable
from mangum import Mangum

from langserve import add_routes
from langserve.pydantic_v1 import BaseModel, Field

# @markdown Enter your API Keys

# @markdown When using HuggingFace Hub make sure to use the WRITE API Token
hugging_face_key = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# @markdown Unset the below if you don't want to use OpenAI/ChatOpenAI (currently Llama2 Chat is set behind an approval process) - ChatOpenAI is the default.
openai_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # @param {type:"string"}

# Set env API Key
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hugging_face_key
os.environ["OPENAI_API_KEY"] = openai_key

# Create a template where chat history provides additional dynamic context for next answer
_TEMPLATE = """Given the following conversation and a follow up question, rephrase the
follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_TEMPLATE)

# Create a standard Q&A template to provide additional persistent context for the next answer
ANSWER_TEMPLATE = """You are a world-class baker and pastry chef.
I will ask you a question and you will provide an answer based on the most relevant recipe you know, as well as commentary on best practices, tips, and any related baked goods that should be considered.

You will follow ALL the rules below:

1. You will use recipes that are in the database.
2. If you cannot find a relevant recipe in the database you will come up with a relevant recipe in the same style as the recipes in the database.

Here is the most relevant recipe in the database:
{context}

This is the question you are being asked: {question}

Please answer with the above context.
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(ANSWER_TEMPLATE)

DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

# Formatting functions
def _combine_documents(
    docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
    """Combine documents into a single string."""
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)

def _format_chat_history(chat_history: List[Tuple]) -> str:
    """Format chat history into a string."""
    buffer = ""
    for dialogue_turn in chat_history:
        human = "Human: " + dialogue_turn[0]
        ai = "Assistant: " + dialogue_turn[1]
        buffer += "\n" + "\n".join([human, ai])
    return buffer

# Defining Embedding Function
model = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceHubEmbeddings(model=model)

# @markdown Define your ChromaDB client/server settings
chroma_hostip = '0.0.0.0' # @param {type:"string"}
chroma_hostport = '8000' # @param {type:"string"}
chroma_api_token = '0123456789' # @param {type:"string"}

client = chromadb.HttpClient(
    host=chroma_hostip,
    port=chroma_hostport,
    headers={'X-Chroma-Token': chroma_api_token}
)

# @markdown Define your ChromaDB Collection Name - it will create one if it does not exist.
vector_collection_name = 'name-of-collection' # @param {type:"string"}

vectorstore = Chroma(
    client=client,
    collection_name=vector_collection_name,
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

# @markdown Define LLM to use for chat and parameters - in order to retain the chat history information
# @markdown it is advised to choose an open source LLM that provides 16k or 32k context token allowance - keep in mind this will require a large amount of memory in your GPU instance. For example: NurtureAI/OpenHermes-2.5-Mistral-7B-16k
endpoint_url = "https://endpoint-url.us-east-1.aws.endpoints.huggingface.cloud" # @param {type:"string"}
llm = HuggingFaceEndpoint(
    endpoint_url=endpoint_url,
    huggingfacehub_api_token=hugging_face_key,
    task='text-generation'
)

# Chaining all input functions together
_inputs = RunnableMap(
    standalone_question=RunnablePassthrough.assign(
        chat_history=lambda x: _format_chat_history(x["chat_history"])
    )
    | CONDENSE_QUESTION_PROMPT
    | llm
    | StrOutputParser(),
)
_context = {
    "context": itemgetter("standalone_question") | retriever | _combine_documents,
    "question": lambda x: x["standalone_question"],
}


# User input store
class ChatHistory(BaseModel):
    """Chat history with the bot."""

    chat_history: List[Tuple[str, str]] = Field(
        ...,
        extra={"widget": {"type": "chat", "input": "question"}},
    )
    question: str

# Putting the conversation chain together
conversational_qa_chain = (
    _inputs | _context | ANSWER_PROMPT | llm | StrOutputParser()
)
chain = conversational_qa_chain.with_types(input_type=ChatHistory)

# Defining FastAPI app server
app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="Spin up a simple api server using Langchain's Runnable interfaces",
)

# Adds FastAPI CORS to allow for applications to request externally
origins = [
    "*"
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Adds FastAPI routes to the app for using the chain under:
# /invoke
# /batch
# /stream

add_routes(app, chain, enable_feedback_endpoint=True)

# @markdown Create remote runnable to allow for use in other langchains
remote_runnable_host = "http://0.0.0.0:8000/" # @param
remote_runnable = RemoteRunnable(remote_runnable_host)

# Start Server
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

# Adding Mangum Handler for AWS Lambda
handler = Mangum(app)
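
To make the first stage of the chain more concrete, here is a minimal standalone sketch (standard library only, no LangChain) of what `_format_chat_history` produces and how that string is slotted into the condense-question template. The function is re-implemented here for illustration and the sample conversation is made up.

```python
from typing import List, Tuple

# Same wording as _TEMPLATE in main.py above.
TEMPLATE = """Given the following conversation and a follow up question, rephrase the
follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""


def format_chat_history(chat_history: List[Tuple[str, str]]) -> str:
    """Flatten (human, ai) turns into the text block the prompt expects."""
    buffer = ""
    for human, ai in chat_history:
        buffer += "\n" + "\n".join(["Human: " + human, "Assistant: " + ai])
    return buffer


# Hypothetical conversation, for illustration only.
history = [("How long should I proof sourdough?", "It depends on the room temperature.")]
formatted = format_chat_history(history)
prompt = TEMPLATE.format(chat_history=formatted, question="And in the fridge?")
print(prompt)
```

The printed prompt is what the LLM receives when it is asked to rewrite the follow-up as a standalone question, which is then passed to the retriever.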

Execute the `main.py` file directly and it should start the local LangServe FastAPI server.

Test that the server returns an output:

In [None]:
import requests

inputs = {"input": {"question": "What happens if you don't have high-quality ingredients?", "chat_history": []}}

# @markdown Input host server IP and port
host_server = "http://localhost:8000" # @param

response = requests.post(host_server+"/invoke", json=inputs)

response.json()

You can also test by going to the `/playground/` page of the host server (default is `localhost:8000/playground`).
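
The request above sends an empty `chat_history`. For follow-up questions, the earlier turns need to be sent back with each request. The sketch below shows one way to build those payloads. It assumes the server's JSON reply carries the answer under an `"output"` key, and it uses a simulated reply so it runs without a server. The helper names `build_invoke_payload` and `record_turn` are hypothetical.

```python
from typing import Dict, List, Tuple

History = List[Tuple[str, str]]


def build_invoke_payload(question: str, chat_history: History) -> Dict:
    """Shape the JSON body the same way as the `inputs` dict above."""
    # Tuples are sent as JSON arrays of [human, ai] strings.
    return {
        "input": {
            "question": question,
            "chat_history": [list(turn) for turn in chat_history],
        }
    }


def record_turn(chat_history: History, question: str, response_json: Dict) -> History:
    """Return a new history that includes this question and its answer.

    Assumption: the reply carries the answer under the "output" key.
    """
    answer = response_json["output"]
    return chat_history + [(question, answer)]


# Simulated reply so the sketch runs without a server.
fake_reply = {"output": "Use the freshest butter you can find."}

history: History = []
first = build_invoke_payload("Which butter works best for croissants?", history)
history = record_turn(history, "Which butter works best for croissants?", fake_reply)
second = build_invoke_payload("Can I substitute margarine?", history)
print(second)
```

With a running server, `requests.post(host_server + "/invoke", json=second).json()` would send the follow-up along with its history.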

# Deploy LangServe to AWS and connect your domain

To ready LangServe for deployment let's package it into a Docker container.

This follows the FastAPI documentation for Docker deployment: https://fastapi.tiangolo.com/deployment/docker/

Docker is required for this step: https://docs.docker.com/get-docker/

---

**Create the Custom LangServe Docker Image to be deployed**

---



Create a `requirements.txt` file in the same project folder as your `main.py` file.

Paste the following code into it:

In [None]:
chromadb-client==0.4.19.dev0
fastapi==0.108.0
langchain==0.0.353
langchain_core==0.1.4
langserve==0.0.37
mangum==0.17.0
uvicorn==0.25.0
openai==1.6.1
sse_starlette==1.8.2
huggingface_hub==0.19.4
opentelemetry-exporter-otlp-proto-grpc==1.22.0
opentelemetry-sdk==1.22.0
opentelemetry-api==1.22.0

Create a `Dockerfile` file in the same directory with the following instructions within:

In [None]:
# Pull from official python image as base
FROM python:3.11

# Set working dir
WORKDIR /code

# Copy python reqs to working dir
COPY ./requirements.txt /code/requirements.txt

# Install python reqs
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir --upgrade -r /code/requirements.txt

# Copy python code to working dir
COPY ./app /code/app

# start the server
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Create a `/app/` folder and move the `main.py` file into it. Create a blank `__init__.py` file inside the `/app/` folder as well.

The folder structure within the project folder should look something like this:

In [None]:
./
├── Dockerfile
├── app
│    ├── __init__.py
│    └── main.py
└── requirements.txt
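
If you prefer to script this step, the layout can be created and checked with a few lines of Python. This is only a sketch: the helper names `scaffold` and `missing_files` are made up for illustration, and the demo runs in a temporary folder so it does not touch your real project.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def scaffold(project_dir: Path) -> None:
    """Create app/ with a blank __init__.py and move main.py into it if present."""
    app_dir = project_dir / "app"
    app_dir.mkdir(parents=True, exist_ok=True)
    (app_dir / "__init__.py").touch()
    main_src = project_dir / "main.py"
    if main_src.exists():
        shutil.move(str(main_src), str(app_dir / "main.py"))


def missing_files(project_dir: Path) -> List[str]:
    """List the expected files that are not in place yet."""
    expected = ["Dockerfile", "requirements.txt", "app/__init__.py", "app/main.py"]
    return [name for name in expected if not (project_dir / name).exists()]


# Demo in a throwaway folder containing only a placeholder main.py.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.py").write_text("# placeholder\n")
    scaffold(root)
    print("Still missing:", missing_files(root))
```

Point `scaffold` at your real project folder (for example `Path(".")`) once you are happy with what it does.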

Finally you will need to make some changes to the `main.py` file as we will be starting LangServe using Docker `CMD` instead of within the Python Script itself.

Comment out the following code blocks (we're also commenting out `remote_runnable_host` as we're not going to be adding this to any additional Langchains once deployed):

In [None]:
# @markdown [OPTIONAL] Create remote runnable to allow for use in other langchains
# remote_runnable_host = "http://0.0.0.0:8000/" # @param
# remote_runnable = RemoteRunnable(remote_runnable_host)

# Start Server
# if __name__ == "__main__":
#     import uvicorn

#     uvicorn.run(app, host="0.0.0.0", port=8000)

Ensure that you are in the root of the project folder where the `Dockerfile` is located. You can check by running `ls`. Build your Docker image by running the following command (replace `name-of-image` with a name of your choosing).

In [None]:
docker build -t name-of-image .

If you are on an Apple Silicon Mac (M1 to M3, ARM CPU), you will *need* to specify a build for the `linux/amd64` architecture; otherwise the build defaults to `arm64` and the image will not run on `amd64` hosts. This requires Docker's experimental BuildKit `buildx` commands.

Start by going to your Docker Desktop Dashboard, click the gear icon in the top right corner to access settings and select "Docker Engine". You will need to modify your settings to look something like this (then click "Apply & restart"):

In [None]:
{
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB",
      "enabled": true
    }
  },
  "experimental": true,
  "features": {
    "buildkit": true
  }
}

To build for `amd64` architecture you're going to need to use the `buildx` commands and push directly to ECR as part of the build (this takes a *long* time).

Create your ECR Repo first (refer below) and replace the `[AWS-account-number], [repo-region], [Repository-name]` variables without the square brackets in the code snippet below.

Again, reminder, you will need to be in the same project folder where your `Dockerfile` exists:

In [None]:
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 --push -t [AWS-account-number].dkr.ecr.[repo-region].amazonaws.com/[Repository-name] .

Once done, test that the Docker image runs correctly with the command below (note that the container is mapped to port `8000`). Then open the address printed in the output in your browser and go to the `/docs/` page, where the FastAPI Swagger UI is shown (`0.0.0.0` is the default; this will be a different public IP once deployed):

`INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)`

In [None]:
docker run -p 8000:8000 name-of-image
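
The container can take a moment to start accepting connections. As an alternative to refreshing the browser, the sketch below polls the `/docs` page until it answers. It uses only the standard library. The function name `wait_for_server` and the timeout values are arbitrary choices for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Not accepting connections yet, or a non-200 reply.
        time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_server("http://localhost:8000/docs", timeout=3.0)
    print("server ready" if ready else "server not reachable")
```

The same check can be pointed at the public address after deployment by changing the URL.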

---

**Setting Up AWS Credentials and push Docker Image to AWS ECR**

---

From here you can deploy the Docker image to the container host of your choice. To keep the image secure I recommend a private image repo such as AWS ECR (https://aws.amazon.com/ecr/).

To do so, you will need to ensure you have AWS CLI installed before beginning: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

AWS Documentation: https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html

You can then configure the AWS CLI so it can log in easily, using the command below (to get your Access Key and Secret Key *you will need to create an IAM User Group and User*).

AWS Documentation: https://docs.aws.amazon.com/cli/latest/userguide/cli-authentication-user.html

In [None]:
aws configure

If you don't remember your credentials from above or have misplaced them you can find them in `~/.aws/credentials` on Linux and macOS systems, and `%USERPROFILE%\.aws\credentials` on Windows.
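
The credentials file is an INI-style file with one section per profile, so it can also be read programmatically. The sketch below parses a sample with placeholder values (not real keys) and masks the key before printing. The helpers `load_profile` and `mask` are hypothetical names used for illustration.

```python
import configparser
from typing import Dict


def load_profile(text: str, profile: str = "default") -> Dict[str, str]:
    """Parse INI-style credentials text and return one profile's settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    return dict(parser[profile])


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters before printing."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Placeholder values, not real credentials.
sample = """[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"""
creds = load_profile(sample)
print(mask(creds["aws_access_key_id"]))

# To read the real file instead (path from the paragraph above):
# from pathlib import Path
# creds = load_profile((Path.home() / ".aws" / "credentials").read_text())
```

Masking before printing keeps the secret out of terminal history and notebook output.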

If you want to create new ones type this into your terminal (and replace `[IAM-user]` with your IAM username - string expected):

In [None]:
aws iam create-access-key --user-name [IAM-user]

The easiest way to push to AWS ECR is to create a repo; within the created repo there will be a button to "View push commands".

There should be 4 commands. Copy and paste these commands in the terminal while you are still located in the project folder which contains the `Dockerfile`.

This will log you into AWS via the command line with a retrieved token, rebuild the Docker image (make sure you have commented out the blocks of code above), tag it to your created ECR repo and push the image.
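
For reference, the four commands follow roughly the pattern below. This sketch only builds the command strings so you can see how your account number, region and repository name fit together. It does not run anything, and the exact text shown in the ECR console may differ. The sample account number and the repository name `langserve-app` are placeholders.

```python
from typing import List


def ecr_push_commands(account_id: str, region: str, repo: str, tag: str = "latest") -> List[str]:
    """Return login, build, tag and push commands for a private ECR repo."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = f"{registry}/{repo}:{tag}"
    return [
        # 1. Log Docker in to the registry with a temporary token.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # 2. Build the image from the Dockerfile in the current folder.
        f"docker build -t {repo}:{tag} .",
        # 3. Tag the local image with the full repository URI.
        f"docker tag {repo}:{tag} {image}",
        # 4. Push the tagged image.
        f"docker push {image}",
    ]


# Placeholder values for illustration only.
for command in ecr_push_commands("123456789012", "us-east-1", "langserve-app"):
    print(command)
```

The full image URI from the last command is the value to supply as `EcrImageUri` when deploying.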

---

**Deploying to EC2**

---

***Pre-requisites: You will need a Domain Name. This demo does not cover registering or managing one; however, it is recommended that you purchase, register and manage your Domain Name using Cloudflare: https://developers.cloudflare.com/registrar/get-started/register-domain/***

---

Create your AWS CloudFormation (https://aws.amazon.com/cloudformation/) stack for your LangServe FastAPI Server on EC2 with the following YAML template (copy, paste and save as a YAML file then use the "Upload a template file" option when creating a new stack).

The choice of EC2 is made for this demo to persist the `/playground/` chat UI.

All parameters are *mandatory and must be filled out.*

*NOTE: This template assumes the `AWSRegion` resource is the same for both the EC2 instance AND the ECR Repo and that you will set your `VpcId` resource to be the same as the one set for the ChromaDB Vector DB already created above (you can create a new one prior if you wish)*

This YAML template automates the setup and deployment of the Dockerized LangServe FastAPI server on an EC2 instance. It updates system packages, installs the required dependencies, configures the AWS CLI, pulls and runs your LangServe Docker image from ECR, sets up nginx as a reverse proxy to the container with an auto-renewing SSL certificate for a custom domain [YOUR_DOMAIN_NAME], and pulls helper files from GitHub and sets up a cron job to ensure the Docker container and SSL certificate stay updated after deployment.

**Please enter your custom domain name below without the HTTPS/HTTP prefix e.g. `api.robs.kitchen`**

In [None]:
AWSTemplateFormatVersion: '2010-09-09'
Description: Creates an EC2 instance with custom security group and docker container

Parameters:
  InstanceType:
    Type: String
    AllowedValues:
       - c5.xlarge
       - c5.2xlarge
       - c5.4xlarge
       - c5.9xlarge
       - c5.18xlarge
       - m5.xlarge
       - m5.2xlarge
       - m5.4xlarge
       - m5.12xlarge
       - m5.24xlarge
       - t2.nano
       - t2.micro
       - t2.small
       - t2.medium
       - t2.large
       - t2.xlarge
       - t2.2xlarge
       - t3.nano
       - t3.micro
       - t3.small
       - t3.medium
       - t3.large
       - t3.xlarge
       - t3.2xlarge
    Description: Amazon EC2 instance type

  AWSRegion:
    Description: AWS region where resources will be created
    Type: String
    AllowedValues:
      - ap-southeast-2
      - ap-southeast-1
      - ap-south-1
      - eu-north-1
      - eu-west-3
      - eu-west-2
      - eu-west-1
      - ap-northeast-3
      - ap-northeast-2
      - ap-northeast-1
      - ca-central-1
      - sa-east-1
      - eu-central-1
      - us-east-1
      - us-east-2
      - us-west-1
      - us-west-2

  EcrImageUri:
    Type: String
    Description: ECR Image URI for the docker container

  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC Id for the security group

  IAMAccessKey:
    Type: String
    Description: IAM Access Key to authenticate ECR Pull
  IAMSecretKey:
    Type: String
    Description: IAM Secret Key to authenticate ECR Pull

Resources:

  EC2ContainerServiceRole:
    Type: AWS::IAM::Role
    Properties:
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ec2.amazonaws.com]
            Action: ['sts:AssumeRole']
      Policies:
        - PolicyName: ecr-permissions
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "ecr:BatchGetImage"
                  - "ecr:GetDownloadUrlForLayer"
                  - "ecr:GetAuthorizationToken"
                Resource: "*"

  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref EC2ContainerServiceRole

  EC2Instance:
    Type: 'AWS::EC2::Instance'
    Properties:
      ImageId: !FindInMap [Region2AMI, !Ref 'AWSRegion', AMI]
      InstanceType: !Ref InstanceType
      IamInstanceProfile: !Ref EC2InstanceProfile
      SecurityGroups:
        - !Ref InstanceSecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          sudo yum check-update
          sudo yum update -y
          sudo amazon-linux-extras install docker epel -y
          sudo amazon-linux-extras enable docker
          sudo yum-config-manager --enable epel
          sudo yum install amazon-ecr-credential-helper git nginx openssl certbot python-certbot-nginx -y
          sudo service docker start
          aws configure set output json
          aws configure set region ${AWSRegion}
          aws configure set aws_access_key_id ${IAMAccessKey}
          aws configure set aws_secret_access_key ${IAMSecretKey}
          aws ecr get-login-password --region ${AWSRegion} | sudo docker login --username AWS --password-stdin $(echo ${EcrImageUri} | cut -d'/' -f1)
          sudo docker pull ${EcrImageUri}
          sudo docker run -dit --restart unless-stopped -p 8000:8000 ${EcrImageUri}
          sudo git clone https://github.com/dr-robert-li/docker_boot.git /home/ec2-user/docker_boot/
          sudo chmod +x /home/ec2-user/docker_boot/run.sh
          sudo cp -v /home/ec2-user/docker_boot/docker_boot.service /etc/systemd/system
          sudo systemctl enable docker_boot.service
          sudo systemctl status docker_boot.service
          export PUBLICIP=$(TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"); curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/public-ipv4;)
          export YOUR_DOMAIN_NAME="api.example.com" # @param {type:"string"}
          git clone https://gist.github.com/dr-robert-li/f53a0fc6f6a5e4a961b849f15bd5c0f0 /etc/nginx/templates.d
          sudo mv /etc/nginx/nginx.conf /etc/nginx/nginx.conf.original
          envsubst < /etc/nginx/templates.d/nginx.conf.template > /etc/nginx/nginx.conf
          # The "!" stops Fn::Sub from treating this shell variable as a template parameter
          sudo certbot --nginx -d ${!YOUR_DOMAIN_NAME} --register-unsafely-without-email --agree-tos
          sudo service nginx start
          sudo service nginx reload
          (crontab -l ; echo "00 03 * * * certbot renew --agree-tos") | crontab -
  InstanceSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Enable SSH, HTTP and HTTPS access on ports 22, 8000, 80 and 443
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          FromPort: 8000
          ToPort: 8000
          IpProtocol: tcp
        - CidrIpv6: ::/0
          FromPort: 8000
          ToPort: 8000
          IpProtocol: tcp
        - CidrIp: 0.0.0.0/0
          FromPort: 443
          ToPort: 443
          IpProtocol: tcp
        - CidrIpv6: ::/0
          FromPort: 443
          ToPort: 443
          IpProtocol: tcp
        - CidrIp: 0.0.0.0/0
          FromPort: 22
          ToPort: 22
          IpProtocol: tcp
        - CidrIp: 0.0.0.0/0
          FromPort: 80
          ToPort: 80
          IpProtocol: tcp
        - CidrIpv6: ::/0
          FromPort: 80
          ToPort: 80
          IpProtocol: tcp
      SecurityGroupEgress:
        - CidrIp: 0.0.0.0/0
          FromPort: 443
          ToPort: 443
          IpProtocol: tcp
        - CidrIp: 0.0.0.0/0
          FromPort: 80
          ToPort: 80
          IpProtocol: tcp
        - CidrIpv6: ::/0
          FromPort: 80
          ToPort: 80
          IpProtocol: tcp

Mappings:
  Region2AMI:
    ap-south-1:
      AMI: ami-03cb1380eec7cc118
    eu-north-1:
      AMI: ami-078e13ebe3b027f1c
    eu-west-3:
      AMI: ami-00575c0cbc20caf50
    eu-west-2:
      AMI: ami-0b026d11830afcbac
    eu-west-1:
      AMI: ami-06e0ce9d3339cb039
    ap-northeast-3:
      AMI: ami-0171e161a6e0c595c
    ap-northeast-2:
      AMI: ami-0eb14fe5735c13eb5
    ap-northeast-1:
      AMI: ami-08a8688fb7eacb171
    ca-central-1:
      AMI: ami-0843f7c45354d48b5
    sa-east-1:
      AMI: ami-0344e5787e2e93144
    ap-southeast-1:
      AMI: ami-0753e0e42b20e96e3
    ap-southeast-2:
      AMI: ami-047dcdc46ac4f2e6b
    eu-central-1:
      AMI: ami-004359656ecac6a95
    us-east-1:
      AMI: ami-0ed9277fb7eb570c9
    us-east-2:
      AMI: ami-064ff912f78e3e561
    us-west-1:
      AMI: ami-0746394790be7162e
    us-west-2:
      AMI: ami-098e42ae54c764c35

Outputs:
  InstanceId:
    Value: !Ref EC2Instance
    Description: Instance Id of newly created EC2 instance
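
The `YOUR_DOMAIN_NAME` value in the template is expected without an `http://` or `https://` prefix. If the value comes from user input or a copied URL, a small check like the sketch below can strip the prefix before you paste it in. The helper name `normalize_domain` is hypothetical and the rule it applies (a host name containing at least one dot) is a simplifying assumption.

```python
from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Return a bare, lower-case host name such as 'api.example.com'."""
    value = value.strip()
    # urlsplit only fills in the host when a "//" separator is present.
    host = urlsplit(value if "://" in value else "//" + value).hostname
    if not host or "." not in host:
        raise ValueError(f"not a usable domain name: {value!r}")
    return host


print(normalize_domain("https://api.example.com/"))
print(normalize_domain("api.example.com"))
```

Both calls print the same bare host name, which is the form the `certbot` step expects.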

The LangServe server will take a few minutes to come up *after* the instance is stood up and passes checks. To see how far along it is, connect to the instance and run `sudo cat /var/log/cloud-init-output.log`.

Just like in the locally hosted version, you can check it's working by going to the `[public-ipv4]:8000/docs` and `[public-ipv4]:8000/playground` pages (substitute `[public-ipv4]` with the Public IPv4 address shown on the EC2 Instance Summary page). Use this route if you do NOT have a domain name; it bypasses `nginx` altogether (`nginx` will not work in this case).

If you do have a domain name and have applied it correctly then you can access the same at `[YOUR_DOMAIN_NAME]/docs` and `[YOUR_DOMAIN_NAME]/playground`, respectively.

For documentation on how to correctly point your A record to the `public-ipv4` address on Cloudflare: https://developers.cloudflare.com/dns/manage-dns-records/how-to/create-dns-records/

This will vary depending on your chosen DNS host.
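
Once the record is saved, you can check whether the domain already resolves to the instance. The sketch below is one way to do that with the standard library. The function names are hypothetical, and the demo uses a stub resolver and a documentation-only IP address so that it runs offline. Note that if the record is proxied through your DNS provider, the lookup may return the provider's addresses rather than the instance's.

```python
import socket
from typing import Callable, List


def resolve_ipv4(domain: str) -> List[str]:
    """Look up the IPv4 addresses the local resolver returns for `domain`."""
    infos = socket.getaddrinfo(domain, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def points_to(domain: str, expected_ip: str,
              resolver: Callable[[str], List[str]] = resolve_ipv4) -> bool:
    """True if `expected_ip` is among the addresses `domain` resolves to."""
    try:
        return expected_ip in resolver(domain)
    except socket.gaierror:
        return False  # The name does not resolve (yet).


# Stub resolver so the sketch runs offline; 203.0.113.10 is a documentation address.
def fake_resolver(name: str) -> List[str]:
    return ["203.0.113.10"]


print(points_to("api.example.com", "203.0.113.10", resolver=fake_resolver))
```

Drop the `resolver` argument to query real DNS, passing your own domain and the instance's Public IPv4 address.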

***NOTE: The LangServe service in this demo provides limited resiliency and security measures. In a production setting you may want to apply additional security measures, such as blocking unrequired ports (including port 8000) and forcing traffic through port 443 using TLS only. You may also want to deploy to an ECS cluster instead for better resiliency.***

FastAPI Security Docs: https://fastapi.tiangolo.com/tutorial/security/first-steps/

# Make Production Ready and Integrate

You're now ready to accept API requests to your LLM!

Here is a WordPress plugin that will create a Gutenberg block to interface with the `/invoke/` endpoint of your created API server: https://github.com/dr-robert-li/recipe-bot-langserve-invoke - check the `README.md` file for instructions.

Alternatively, you can simply add the URL to the navigation within Appearance options in your Page/Site Editor for a quick and dirty playground.

Otherwise, API requests will allow you to integrate the Embedded LLM with almost anything. Try it out on your `/docs/` Swagger page.

# Recommended WordPress Plugins

https://aipower.org/

  * AI Toolkit for WordPress that provides SEO and content writing functionality, image generation, fine tuning, embedding and a choice of foundational model.
  * More information can be found in the docs: https://docs.aipower.org/docs/category/introduction

https://meowapps.com/ai-engine/

* Simpler to use AI toolkit that also provides for content generation, embeddings, forms and a chatbot.