In [2]:
!pip install langchain
!pip install openai
!pip install python-dotenv
!pip install faiss-cpu

Defaulting to user installation because normal site-packages is not writeable


In [3]:
from dotenv import load_dotenv
import os

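# Read variables from a local .env file into the process environment
load_dotenv()
# The OpenAI key is expected under the name API_KEY
API_KEY = os.environ.get("API_KEY")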
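
Because `os.environ.get` returns `None` when a variable is missing, a typo in the `.env` file would only surface later as an authentication error. A minimal sketch of failing early instead; `require_env` and `DEMO_KEY` are hypothetical names used only for illustration and are not part of the notebook:

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder value so the sketch runs without a real key
os.environ["DEMO_KEY"] = "placeholder"
print(require_env("DEMO_KEY"))
```

In the notebook itself the same check could be applied to `API_KEY` right after `load_dotenv()`.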


# Loaders
To use your own data with an LLM, the documents must end up in a vector database. The first step is to load them into memory with a loader.

In [5]:
from langchain.document_loaders import DirectoryLoader, TextLoader

# Recursively load every .txt file below ./FAQ
loader = DirectoryLoader(
    "./FAQ", glob="**/*.txt", loader_cls=TextLoader, show_progress=True
)
docs = loader.load()

100%|██████████| 1/1 [00:00<00:00, 1634.57it/s]


# Text splitter

Texts are not stored in the database as a whole but in pieces, so-called "chunks". You can define the chunk size and the overlap between neighboring chunks.

In [6]:

from langchain.text_splitter import RecursiveCharacterTextSplitter

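text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # maximum characters per chunk
    chunk_overlap=100,  # characters shared between neighboring chunks
)

documents = text_splitter.split_documents(docs)
documents[0]  # inspect the first chunk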

Document(page_content='General\nQ: What is Amazon EC2 Auto Scaling?', metadata={'source': 'FAQ/ec3_autoscaling.txt'})
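
To make chunk size and overlap concrete, here is a simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to cut at separators such as paragraph breaks); `split_with_overlap` is a hypothetical helper that only illustrates the two parameters:

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbors share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters of toy text
chunks = split_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.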

# Embeddings

Texts are not stored as plain text in the database, but as vector representations. An embedding maps a piece of text to a vector so that texts with similar meaning lie close together in the vector space.

In [7]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)
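
The idea that similar texts lie close together can be sketched with hand-made vectors. The three-dimensional values below are invented for illustration (real embeddings have many more dimensions), and cosine similarity is only one common closeness measure; the FAISS index used later may rely on a different distance:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy vectors standing in for real embeddings
toy_vectors = {
    "scale out": [0.9, 0.1, 0.0],
    "add instances": [0.8, 0.2, 0.1],
    "billing": [0.0, 0.1, 0.9],
}

query_vector = toy_vectors["scale out"]
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(query_vector, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```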

# Loading Vectors into VectorDB (FAISS)
The vectors created by OpenAIEmbeddings can now be stored in the database. The database can be saved to disk as a .pkl file.

In [8]:
from langchain.vectorstores.faiss import FAISS
import pickle

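# Embed every chunk and build the FAISS index
vectorstore = FAISS.from_documents(documents, embeddings)

# Persist the store so the embeddings do not have to be recomputed
with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)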

# Loading the database
Before the database can be used in a later session, it must be loaded from disk again.

In [9]:
with open("vectorstore.pkl", "rb") as f:
    vectorstore = pickle.load(f)

# Prompts
With an LLM you can set an identity before the conversation starts and define what the question and answer should look like.

In [10]:
from langchain.prompts import PromptTemplate

prompt_template = """You are a helpful assistant for our cloud provider.

{context}

Question: {question}
Answer here:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
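
What filling the template amounts to can be shown with plain `str.format`, used here as a stand-in for `PromptTemplate` since both rely on `{name}` placeholders. The context sentence is made up for the example:

```python
template = """You are a helpful assistant for our cloud provider.

{context}

Question: {question}
Answer here:"""

# Made-up context standing in for retrieved FAQ text
filled = template.format(
    context="Auto Scaling adjusts the number of instances automatically.",
    question="how to ensure security?",
)
print(filled)

# Leaving out a placeholder fails loudly instead of sending a broken prompt
try:
    template.format(context="only context")
except KeyError as missing:
    print(f"missing variable: {missing}")
```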

# Chains
Chain classes combine the retriever, the prompt, and the LLM into a single call and let you control how the LLM is used.

In [11]:
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

chain_type_kwargs = {"prompt": PROMPT}

llm = OpenAI(openai_api_key=API_KEY)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # put all retrieved chunks into one prompt
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs=chain_type_kwargs,
)

query = "how to ensure security?"
qa.run(query)

'\n\nYou can ensure security with Amazon EC2 Auto Scaling by using AWS Identity and Access Management (IAM) to control which users and services have access to your EC2 Auto Scaling resources. You can also use encryption and other security measures such as network access control lists (ACLs) to protect your resources. Additionally, you can use security scanning tools and vulnerability assessments to identify potential security risks and take steps to mitigate them.'
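
Conceptually, the "stuff" chain type places all retrieved chunks into the `{context}` slot of a single prompt. A rough sketch of that step, assuming two made-up chunks; `stuff_prompt` is a hypothetical helper and not a LangChain function:

```python
TEMPLATE = "{context}\n\nQuestion: {question}\nAnswer here:"


def stuff_prompt(chunks, question, template=TEMPLATE):
    """Join all retrieved chunks and insert them into one prompt."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Made-up chunks standing in for what the retriever would return
retrieved = [
    "Chunk one about IAM policies.",
    "Chunk two about security groups.",
]
prompt = stuff_prompt(retrieved, "how to ensure security?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit into the model's context window.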

# Memory
In the example just shown, each request stands alone. A great strength of an LLM, however, is that it can take the entire chat history into account when responding. To do so, the chat history must be built up from the successive questions and answers. LangChain's memory classes make this easy.

In [12]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True, output_key="answer"
)
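
The idea behind a buffer memory is simply an append-only list of turns that is replayed with each new question. The class below is a hypothetical stand-in to show the principle and is not the implementation of `ConversationBufferMemory`:

```python
class BufferMemory:
    """Keep every question and answer of a conversation in order."""

    def __init__(self):
        self.messages = []

    def save_turn(self, question, answer):
        # Store both sides of the exchange so later prompts can see them
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self):
        # Render the history in a form that can be placed into a prompt
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


history = BufferMemory()
history.save_turn("How long is the turnaround time?", "Within minutes.")
history.save_turn("What does it depend on?", "Mostly on the AMI boot time.")
print(history.as_text())
```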

# Use Memory in Chains
The memory object can now be passed to a chain. You can tell that it works when a follow-up question uses a pronoun such as "it" and the bot still understands what "it" refers to from the earlier turns of the conversation.

In [13]:
from langchain.chains import ConversationalRetrievalChain

qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(
        model_name="text-davinci-003",
        temperature=0.7,
        openai_api_key=API_KEY,
    ),
    memory=memory,
    retriever=vectorstore.as_retriever(),
    combine_docs_chain_kwargs={"prompt": PROMPT},
)


query = "How long is the turnaround time?"
qa({"question": query})
qa({"question": "What does it depend on?"})

{'question': 'What does it depend on?',
 'chat_history': [HumanMessage(content='How long is the turnaround time?', additional_kwargs={}, example=False),
  AIMessage(content=' The turnaround time is within minutes. The majority of replacements happen within less than 5 minutes, and on average it is significantly less than 5 minutes. It depends on a variety of factors, including how long it takes to boot up the AMI of your instance.', additional_kwargs={}, example=False),
  HumanMessage(content='What does it depend on?', additional_kwargs={}, example=False),
  AIMessage(content=' The turnaround time for replacements is affected by the size of the instance, the type of instance, the availability of resources, and the current demand for those resources. Additionally, the time it takes to boot up the AMI of the instance can also affect the turnaround time.', additional_kwargs={}, example=False)],
 'answer': ' The turnaround time for replacements is affected by the size of the instance, the 