In [1]:
# install langchain (version 0.0.316)
!pip install langchain==0.0.316
# install chromadb
!pip install chromadb==0.3.29
# install tiktoken
!pip install tiktoken
# install beautifulsoup4
!pip install beautifulsoup4


# Task 1: Load Data

To be able to embed and store data, we need to provide LangChain with Documents. This is easy to achieve in LangChain thanks to Document Loaders. In our case, we're targeting documentation built with Read the Docs, for which LangChain provides the `ReadTheDocsLoader`. The folder `rtdocs` contains all the HTML files from the [LangChain documentation](https://python.langchain.com/en/latest/index.html); it can be recreated by crawling the site with `wget`:

```bash
wget -r -A.html -P rtdocs https://python.langchain.com/en/latest/
```

If you were given the pages as a `contents.zip` archive instead, extract it in a bash console:
```bash
unzip contents.zip
```
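Before loading, it can help to confirm that the folder actually contains HTML pages. The following is a minimal standard-library sketch; the `count_html_files` helper is hypothetical, and the path assumes the `rtdocs/python.langchain.com/en/latest` layout used in the next task:

```python
from pathlib import Path


def count_html_files(root: str) -> int:
    """Count .html files anywhere under `root` (0 if the folder is missing)."""
    base = Path(root)
    if not base.is_dir():
        return 0
    # rglob walks every subdirectory, matching the nested layout wget -r creates.
    return sum(1 for path in base.rglob("*.html") if path.is_file())


print(count_html_files("rtdocs/python.langchain.com/en/latest"))
```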

Our first task is to load these HTML files as documents that LangChain can use, with the `ReadTheDocsLoader`. It reads every HTML file in the directory, removes the HTML tags to keep only the text, and returns that text as a `Document` object.

At the end of this task, the variable `raw_documents` will hold a list of `Document` objects: one per HTML file.
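To make the tag-stripping step concrete, here is a minimal standard-library sketch. It is not how `ReadTheDocsLoader` works internally (the loader relies on BeautifulSoup, which is why the cell below passes `features="html.parser"`); the `html_to_text` helper is hypothetical:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a script/style element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


print(html_to_text("<h1>Wikipedia</h1><script>var x = 1;</script><p>Load wiki pages.</p>"))
# → Wikipedia
#   Load wiki pages.
```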

In [2]:
# Import ReadTheDocsLoader
from langchain.document_loaders import ReadTheDocsLoader

# Create a loader for the `rtdocs/python.langchain.com/en/latest` folder
loader = ReadTheDocsLoader("rtdocs/python.langchain.com/en/latest", features="html.parser", encoding='utf-8')

# Load the data
raw_documents = loader.load()

In [3]:
print("Size raw documents: ",len(raw_documents))

Size raw documents:  999


In [4]:
print(raw_documents[0])

page_content='.rst\n.pdf\nDeploying LLMs in Production\n Contents \nOutline\nDesigning a Robust LLM Application Service\nMonitoring\nFault tolerance\nZero down time upgrade\nLoad balancing\nMaintaining Cost-Efficiency and Scalability\nSelf-hosting models\nResource Management and Auto-Scaling\nUtilizing Spot Instances\nIndependent Scaling\nBatching requests\nEnsuring Rapid Iteration\nModel composition\nCloud providers\nInfrastructure as Code (IaC)\nCI/CD\nDeploying LLMs in Production#\nIn today’s fast-paced technological landscape, the use of Large Language Models (LLMs) is rapidly expanding. As a result, it’s crucial for developers to understand how to effectively deploy these models in production environments. LLM interfaces typically fall into two categories:\nCase 1: Utilizing External LLM Providers (OpenAI, Anthropic, etc.)In this scenario, most of the computational burden is handled by the LLM providers, while LangChain simplifies the implementation of business logic around these 

# Task 2: Slice the documents into smaller chunks

We have now turned each HTML file into a Document. Some of these documents are very long, potentially too long to embed in one piece. It's also good practice to avoid embedding large documents:
- long documents often contain several concepts. Retrieval will be easier if each concept is indexed separately;
- retrieved documents will be injected into a prompt, so keeping them short keeps the prompt small.

LangChain has a collection of tools to do this:
[Text Splitters](https://python.langchain.com/en/latest/modules/indexes/text_splitters.html).

We'll use the simplest and most straightforward one:
the [Recursive Character Text Splitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html).

*The `recursive text splitter` recursively reduces the input by splitting it on paragraphs (`\n\n`), then lines (`\n`), then words (spaces), and finally single characters, until each chunk is no longer than `chunk_size` (measured in characters by default).*
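The recursion can be sketched in plain Python. This is a simplified version written for illustration, not LangChain's implementation: it ignores `chunk_overlap`, measures length in characters, and the `recursive_split` helper is hypothetical. It tries the coarsest separator first and only falls back to finer ones for pieces that are still too long:

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split `text` into pieces of at most `chunk_size` characters (no overlap)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    # An empty separator means "split into single characters".
    parts = list(text) if sep == "" else text.split(sep)
    chunks: list[str] = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # This piece alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(part, chunk_size, tuple(finer)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = "para one.\n\npara two is longer than ten."
print(recursive_split(text, chunk_size=20))
# → ['para one.', 'para two is longer', 'than ten.']
```

The paragraph break is tried first, so "para one." stays whole; only the second paragraph, which is too long on its own, falls back to splitting on spaces.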

In [5]:
# Import RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create the text splitter: chunks of at most 1000 characters,
# with up to 200 characters shared between consecutive chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

# Split the documents
documents = splitter.split_documents(raw_documents)

In [6]:
print("Size documents: ",len(documents))

Size documents:  7538
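The `chunk_overlap=200` argument makes consecutive chunks share some text, so content near a chunk boundary keeps part of its surrounding context. The splitter applies the overlap at separator boundaries; the sketch below uses a plain fixed-size character window instead, only to show the arithmetic (each window starts `chunk_size - chunk_overlap` characters after the previous one). The `sliding_window_chunks` helper is hypothetical:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; consecutive windows share `chunk_overlap` chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```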


In [7]:
documents[0]

Document(page_content='.rst\n.pdf\nDeploying LLMs in Production\n Contents \nOutline\nDesigning a Robust LLM Application Service\nMonitoring\nFault tolerance\nZero down time upgrade\nLoad balancing\nMaintaining Cost-Efficiency and Scalability\nSelf-hosting models\nResource Management and Auto-Scaling\nUtilizing Spot Instances\nIndependent Scaling\nBatching requests\nEnsuring Rapid Iteration\nModel composition\nCloud providers\nInfrastructure as Code (IaC)\nCI/CD\nDeploying LLMs in Production#\nIn today’s fast-paced technological landscape, the use of Large Language Models (LLMs) is rapidly expanding. As a result, it’s crucial for developers to understand how to effectively deploy these models in production environments. LLM interfaces typically fall into two categories:', metadata={'source': 'rtdocs\\python.langchain.com\\en\\latest\\additional_resources\\deploy_llms.html'})

# Task 3: Count tokens and estimate the cost of embedding

We're ready to embed our documents. Before we do so, we'd like to get an idea of how large they are and how much embedding them will cost. To do so, we'll use the [`tiktoken`](https://github.com/openai/tiktoken) library, which encodes strings of text into tokens and decodes tokens back into text. In our case, we're mostly interested in how many tokens our documents translate to.

> 💡 To better understand what a token is to GPT, head to [OpenAI's Tokenizer page](https://platform.openai.com/tokenizer) where you can see how a text translates to tokens.

Prices for different models in OpenAI can be found on their [pricing page](https://openai.com/pricing).

Prices for different models in Azure OpenAI can be found on their [pricing page](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/).

In [8]:
# Import tiktoken
import tiktoken

# Create an encoder
encoder = tiktoken.encoding_for_model("text-embedding-ada-002")

# Count tokens in each document
doc_tokens = [len(encoder.encode(doc.page_content)) for doc in documents]

# Calculate the sum of all token counts
total_tokens = sum(doc_tokens)

# Calculate a cost estimate (text-embedding-ada-002 at $0.0004 per 1K tokens at the time of writing)
cost = (total_tokens/1000) * 0.0004
print(f"Total tokens: {total_tokens} - cost: ${cost:.2f}")

Total tokens: 1530817 - cost: $0.61
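The same arithmetic as a small function: total tokens divided by 1,000, times the price per 1,000 tokens. The $0.0004 rate is the one hard-coded in the cell above; embedding prices change over time, so check the pricing pages for the current rate. The `estimate_embedding_cost` helper is hypothetical:

```python
def estimate_embedding_cost(token_counts, price_per_1k_tokens: float) -> float:
    """Estimated cost in dollars for embedding texts with the given token counts."""
    total_tokens = sum(token_counts)
    return total_tokens / 1000 * price_per_1k_tokens


# Reproduces the estimate above: 1,530,817 tokens at $0.0004 per 1K tokens.
cost = estimate_embedding_cost([1530817], price_per_1k_tokens=0.0004)
print(f"${cost:.2f}")  # → $0.61
```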


# Task 4: Embed the documents and store the embeddings in the vector database

We'll want to save the embeddings into a database. LangChain can take care of all that using a [Vector Store](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html).

There are plenty of vector stores to choose from (see the [full list](https://python.langchain.com/en/latest/modules/indexes/vectorstores.html)). Today we'll use [Chroma](https://docs.trychroma.com/), but you could use any other, since they share the same interface in LangChain. Once again, you'll need to try several of them to see which best fits your use case: some vector stores have specific features (like multimodality or multilingual support), so be sure to check them out.

Chroma is simple to use and can be persisted to disk. If you do not wish to embed the full set of documents yourself, feel free to skip this step and use the provided folder `chroma-data-langchain-docs`: we've already embedded all documents and persisted them in this folder.
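Persisting means the stored texts, embeddings and metadata outlive the Python process, so they can be reloaded later without calling the embedding API again. Chroma manages its own on-disk format; the sketch below only illustrates the idea with a JSON file, and the `save_store` / `load_store` helpers are hypothetical:

```python
import json
from pathlib import Path


def save_store(path: str, texts: list[str], embeddings: list[list[float]],
               metadatas: list[dict]) -> None:
    """Write texts, their embedding vectors and metadata to a JSON file."""
    if not (len(texts) == len(embeddings) == len(metadatas)):
        raise ValueError("texts, embeddings and metadatas must have the same length")
    records = [
        {"text": t, "embedding": e, "metadata": m}
        for t, e, m in zip(texts, embeddings, metadatas)
    ]
    Path(path).write_text(json.dumps(records), encoding="utf-8")


def load_store(path: str) -> tuple[list[str], list[list[float]], list[dict]]:
    """Read back what save_store wrote, without recomputing any embedding."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    texts = [r["text"] for r in records]
    embeddings = [r["embedding"] for r in records]
    metadatas = [r["metadata"] for r in records]
    return texts, embeddings, metadatas


save_store("toy-store.json", ["hello"], [[0.1, 0.2]], [{"source": "a.html"}])
print(load_store("toy-store.json"))  # → (['hello'], [[0.1, 0.2]], [{'source': 'a.html'}])
```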

In [9]:
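# Set the environment variables the openai package needs to reach Azure OpenAI
import os

# Never hard-code a real API key in a notebook: read it from the environment
# (or paste your own key locally, without committing it)
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<your-azure-openai-api-key>")
OPENAI_API_BASE = "https://clasebi.openai.azure.com/"  # replace with your Azure OpenAI endpoint
OPENAI_API_VERSION = "2023-03-15-preview"
OPENAI_API_TYPE = "azure"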

os.environ["OPENAI_API_TYPE"] = OPENAI_API_TYPE
os.environ["OPENAI_API_BASE"] = OPENAI_API_BASE
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
os.environ["OPENAI_API_VERSION"] = OPENAI_API_VERSION
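If one of these four settings is missing or empty, the problem usually only shows up later, when the first request fails. A small check can catch it before any request is sent. This is a hypothetical helper: it only inspects the values and does not contact Azure:

```python
import os
from collections.abc import Mapping

REQUIRED_AZURE_VARS = ("OPENAI_API_TYPE", "OPENAI_API_BASE", "OPENAI_API_KEY", "OPENAI_API_VERSION")


def missing_settings(env: Mapping[str, str], required=REQUIRED_AZURE_VARS) -> list[str]:
    """Names from `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing Azure OpenAI settings:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```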

In [10]:
# LangChain 0.0.316 predates openai 1.0, so pin the 0.28 client
!pip install openai==0.28.1



In [11]:
# Import chroma
from langchain.vectorstores import Chroma

# Import OpenAIEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings

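# Create the embedding function. chunk_size=1 sends one text per request,
# a conservative batch size for Azure OpenAI embedding deployments
embedding_function = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1)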

In [12]:
# Testing the embedding function

input_text = "This is for demonstration."
outcome = embedding_function.embed_query(input_text)
print(outcome)
print(len(outcome))

[-0.012424956126922663, 0.010575136213262644, 0.0013741046479578078, -0.0091363871845665, -0.008970632254700215, 0.01422836451120851, -0.008141860399336523, 0.0015614072390755828, -0.006974948598804315, -0.023987988108656766, 0.00870542511197222, 0.008930851742084564, -0.013817293626244634, -0.003368131131740689, -0.003232212517658721, 0.004899702520693243, 0.016243938423412235, -0.016601968624888573, 0.01416206272552651, -0.029650161164692993, -0.012710053525958484, 0.013452633339332352, 0.005370445105903176, 0.012729944713588888, -0.029411476226353924, -0.0038587644389197367, 0.020275088110464846, -0.02686548616604905, 0.02290063919600102, -0.020885065283797292, 0.0019989991642751603, 0.005768255819995167, 0.013671429884008752, -0.04102754982289814, -0.011151962028092805, -0.01131108594120057, 0.012716683611394426, -0.022410005423160684, -0.0032869114209971764, 0.004617919698714105, -0.011748678099230792, 0.017397590053072558, 0.006364972356794444, -0.027077652625289512, -0.012100077

In [13]:
# Create a database from the first 10 chunks only (to keep this demo cheap) and the embedding function
db = Chroma.from_documents(documents=documents[:10], embedding=embedding_function, persist_directory="my-embeddings")



In [14]:
# Persist the data to disk
db.persist()

In [15]:
db.get().keys()

dict_keys(['ids', 'embeddings', 'documents', 'metadatas'])

In [16]:
db.get()['documents'][0]

'gpu="A10G",\n           python_version="python3.8",\n           python_packages=[\n               "diffusers[torch]>=0.10",\n               "transformers",\n               "torch",\n               "pillow",\n               "accelerate",\n               "safetensors",\n               "xformers",],\n           max_length="50",\n           verbose=False)\nllm._deploy()\nresponse = llm._call("Running machine learning on a remote GPU")\nprint(response)\nprevious\nBanana\nnext\nBedrock\nBy Harrison Chase\n    \n      © Copyright 2023, Harrison Chase.\n      \n  Last updated on Jun 06, 2023.'

## Alternative: use the provided embeddings

We have already run the step above on all the documents and stored the result in the `chroma-data-langchain-docs` folder. Instead of embedding all the documents yourself, you can load these embeddings at no cost.

Loading this folder gives the same kind of database as the step above, but for the full set of documents and without calling the OpenAI API to embed them. The embedding function is still required, because each query must be embedded with the same model when you search.

In [17]:
# Import chroma
from langchain.vectorstores import Chroma

# Import OpenAIEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings

# Create the embedding function
embedding = OpenAIEmbeddings(deployment="text-embedding-ada-002",chunk_size = 1)

# Load the database from existing embeddings
db = Chroma(persist_directory="chroma-data-langchain-docs", embedding_function=embedding)

# Task 5: Query the vector database

Now that we have a vector database, we can query it. A vector database stores embeddings (vectors) and lets us search through them using a k-nearest neighbors (k-NN) algorithm or an approximate variant of it. When we query it, the following steps take place:
1. Embed the text query to obtain a vector. It is crucial that this embedding is made with the same embedding model that was used to embed the documents;
2. Calculate the distance (or similarity) between the query vector and the stored vectors;
3. Sort the results by similarity;
4. Return the most similar documents.
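Steps 2 to 4 can be sketched as a brute-force search in plain Python. This assumes the query was already embedded (step 1) with the same model as the stored vectors; Chroma itself uses an approximate nearest-neighbor index (HNSW) rather than comparing the query with every vector. The `knn_search` helper and the toy 2-D vectors are hypothetical:

```python
import math


def knn_search(query_vector, stored, k=4):
    """Return the k (text, distance) pairs closest to query_vector by L2 distance.

    `stored` is a list of (text, vector) pairs; lower distance means more similar.
    """
    results = []
    for text, vector in stored:
        if len(vector) != len(query_vector):
            # A dimension mismatch means the vectors cannot come from the same model.
            raise ValueError("query and stored vectors have different dimensions")
        distance = math.dist(query_vector, vector)  # step 2: Euclidean distance
        results.append((text, distance))
    results.sort(key=lambda pair: pair[1])  # step 3: closest first
    return results[:k]  # step 4: keep the k most similar


stored = [("wikipedia loader", [0.9, 0.1]), ("memory", [0.1, 0.9]), ("chains", [0.5, 0.5])]
print(knn_search([1.0, 0.0], stored, k=2))
```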

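To do this with LangChain, we can use the database's `.similarity_search_with_score()` method. By default it returns the 4 closest documents, each paired with a score; with Chroma, the score is a distance, so a lower score means a more similar document.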
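Because a lower distance means a closer match, one way to drop weak matches is a cut-off on that distance. The sketch below works on plain `(document, distance)` pairs; the `keep_close_matches` helper, the document names and the 0.35 threshold are hypothetical, and a real threshold would need tuning on your own queries:

```python
def keep_close_matches(results, max_distance):
    """Keep (document, distance) pairs with distance <= max_distance, closest first."""
    close = [(doc, dist) for doc, dist in results if dist <= max_distance]
    return sorted(close, key=lambda pair: pair[1])


# Toy pairs shaped like the output of similarity_search_with_score (values illustrative).
results = [("doc A", 0.289), ("doc B", 0.52), ("doc C", 0.3245)]
print(keep_close_matches(results, max_distance=0.35))
# → [('doc A', 0.289), ('doc C', 0.3245)]
```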

In [18]:
db.get()['documents'][0]

'previous\nIntegrations\nnext\nAleph Alpha\nBy Harrison Chase\n    \n      © Copyright 2023, Harrison Chase.\n      \n  Last updated on Jun 06, 2023.'

In [19]:
# Call the `similarity_search_with_score` method on `db`
results = db.similarity_search_with_score("how do i load data from wikipedia?")

In [20]:
# Print only the closest result
for (doc, score) in results:
    print('score', score)
    print(doc.page_content)
    print('-----------------')
    break

score 0.2894218862056732
.ipynb
.pdf
Wikipedia
 Contents 
Installation
Examples
Wikipedia#
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
This notebook shows how to load wiki pages from wikipedia.org into the Document format that we use downstream.
Installation#
First, you need to install wikipedia python package.
#!pip install wikipedia
Examples#
WikipediaLoader has these arguments:
query: free text which used to find documents in Wikipedia
optional lang: default=”en”. Use it to search in a specific language part of Wikipedia
optional load_max_docs: default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments. There is a hard limit of 300 for now.
-----------------


In [21]:
# Print the results
for (doc, score) in results:
    print('score', score)
    print(doc.page_content)
    print('-----------------')

score 0.2894218862056732
.ipynb
.pdf
Wikipedia
 Contents 
Installation
Examples
Wikipedia#
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.
This notebook shows how to load wiki pages from wikipedia.org into the Document format that we use downstream.
Installation#
First, you need to install wikipedia python package.
#!pip install wikipedia
Examples#
WikipediaLoader has these arguments:
query: free text which used to find documents in Wikipedia
optional lang: default=”en”. Use it to search in a specific language part of Wikipedia
optional load_max_docs: default=100. Use it to limit number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments. There is a hard limit of 300 for now.
-----------------
score 0.3245

# Task 6: Create a QA chain

Let's put it all together into a chat-like application. We want the user to ask a question, then search for relevant documents. We'll then create a prompt that includes the documents and the question so GPT can answer it (if possible).

First, we'll query the database in a similar manner to the previous step, this time with `.similarity_search()`, which returns the documents without their scores:

```python
question = "show an example of adding memory to a chain"
context_docs = db.similarity_search(question)
```

Next, we will create a prompt that contains the question and the relevant documents:

> You can think of a PromptTemplate as an f-string in Python: values in curly braces are placeholders and will be replaced by the values we pass when running the chain.
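The f-string analogy can be checked with plain Python: `str.format` fills named `{placeholders}` the same way. This is only an analogy; the shortened template text and the placeholder values below are illustrative:

```python
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Each {name} is replaced by the keyword argument of the same name.
prompt_text = template.format(
    context="first retrieved chunk\nsecond retrieved chunk",
    question="show an example of adding memory to a chain",
)
print(prompt_text)
```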

```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="""Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
Question: {question}
Helpful Answer:""",
    input_variables=["context", "question"]
)
```

To call the LLM with this prompt, we need to create an `LLMChain` and pass it an LLM and the prompt:

```python
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)
qa_chain = LLMChain(llm=llm, prompt=prompt)
```

We can now call our chain like so:

```python
qa_chain({"context": "<the context>", "question": "<the question>"})
```

This returns a dict containing the input variables plus a `text` key holding the LLM response.
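Conceptually, calling the chain formats the prompt with the inputs, sends it to the LLM, and returns the inputs together with the generated text under `text`. Here is a simplified, hypothetical model of that flow (not LangChain code), with a stand-in function instead of a real LLM so it runs offline:

```python
from collections.abc import Callable


def run_llm_chain(llm: Callable[[str], str], template: str, inputs: dict) -> dict:
    """Simplified model of an LLM chain call: format, generate, return inputs + 'text'."""
    prompt = template.format(**inputs)
    return {**inputs, "text": llm(prompt)}


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: reports the prompt length instead of answering.
    return f"(received {len(prompt)} characters)"


result = run_llm_chain(
    fake_llm,
    "Context: {context}\nQuestion: {question}\nAnswer:",
    {"context": "<the context>", "question": "<the question>"},
)
print(result["text"])
```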

In [24]:
# Imports
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chat_models import AzureChatOpenAI

# Set the question variable
question = "show an example of adding memory to a chain"

# Query the database and store the results as `context_docs`
context_docs = db.similarity_search(question)

# Create a prompt with 2 variables: `context` and `question`
prompt = PromptTemplate(
    template="""Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

<context>
{context}
</context>

Question: {question}
Helpful Answer, formatted in markdown:""",
    input_variables=["context", "question"]
)

# Create an LLM with AzureChatOpenAI, pointing at the gpt-35-turbo deployment
llm = AzureChatOpenAI(
    openai_api_base=OPENAI_API_BASE,
    openai_api_version=OPENAI_API_VERSION,
    deployment_name="gpt-35-turbo",
    openai_api_key=OPENAI_API_KEY,
    openai_api_type=OPENAI_API_TYPE,
)

In [25]:
# Create the chain
qa_chain = LLMChain(llm=llm, prompt=prompt)

# Call the chain
result = qa_chain({
    "question": question,
    "context": "\n".join([doc.page_content for doc in context_docs])
})

# Print the result
print(result["text"])

To add memory to a chain, you can use the `ConversationChain` class from the `langchain.chains` module. In the given example, the memory is added to a conversation chain using the `ConversationBufferMemory` class from the `langchain.memory` module:

```
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=chat,
    memory=ConversationBufferMemory()
)
```

This creates a `ConversationChain` object called `conversation` with the specified language model (`llm`) and memory (`memory`). The `ConversationBufferMemory` class allows the chain to persist data across multiple calls.

You can then use the `run` method of the conversation chain to interact with it and access the memory:

```
conversation.run("Answer briefly. What are the first 3 colors of a rainbow?")
conversation.run("And the next 4?")
```

In the above example, the conversation chain is asked two questions. The first question is about the 