In [1]:
%%capture
!pip install langchain==0.1.1 openai==1.8.0 langchain-openai tiktoken faiss-cpu

In [2]:
import os
import getpass

In [3]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key:")

Enter Your OpenAI API Key:··········


In [4]:
!wget -O "golden_hymns_of_epictetus.txt" https://www.gutenberg.org/cache/epub/871/pg871.txt

--2024-01-26 04:19:40--  https://www.gutenberg.org/cache/epub/871/pg871.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 152365 (149K) [text/plain]
Saving to: ‘golden_hymns_of_epictetus.txt’


2024-01-26 04:19:41 (816 KB/s) - ‘golden_hymns_of_epictetus.txt’ saved [152365/152365]



In [5]:
filename = "/content/golden_hymns_of_epictetus.txt"

start_marker = "Are these the only works of Providence within us?"
end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK THE GOLDEN SAYINGS OF EPICTETUS, WITH THE HYMN OF CLEANTHES ***"

start_saving = False
lines_to_save = []

# Keep everything from the start marker up to (not including) the Gutenberg end marker
with open(filename, 'r', encoding='utf-8') as file:
    for line in file:
        if start_marker in line:
            start_saving = True
        if end_marker in line:
            break
        if start_saving:
            lines_to_save.append(line)

# Overwrite the file with only the stored lines
with open(filename, 'w', encoding='utf-8') as file:
    file.writelines(lines_to_save)

In [6]:
word_count = 0

with open(filename, 'r') as file:
    for line in file:
        words = line.split()
        word_count += len(words)

print(f"The total number of words in the file is: {word_count}")

The total number of words in the file is: 23503


# 🔍 **Retrieval in LangChain Explained**

<img src="https://python.langchain.com/assets/images/data_connection-95ff2033a8faa5f3ba41376c0f6dd32a.jpg">

### 🌐 **Basic Concept**

Retrieval is like gathering sources before writing an essay: it gives a language model access to up-to-date, relevant information beyond what it learned during training.

💡 **Advantages**:
   - Adds new, fresh information.
   - Makes responses more relevant and informed.

📚 **Document Loaders**:
   - Function as "specialized librarians."
   - Load content from many kinds of sources (text files, PDFs, web pages, and more) into LangChain `Document` objects that language models can work with.

📄 **Text Loader Fundamentals**:
   - `TextLoader` reads a plain-text file and returns a list holding one `Document`, whose `page_content` is the file's text and whose `metadata` records the `source` path.
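
To make the `TextLoader` description concrete, here is a minimal plain-Python sketch of the shape that `load()` returns. The `Document` dataclass and `load_text` function are hypothetical stand-ins written for illustration, not LangChain's own classes; the sketch only assumes a UTF-8 text file on disk.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list[Document]:
    """Read an entire text file into a one-element list of Documents,
    mirroring the list-of-one-Document shape that TextLoader.load() returns."""
    text = Path(path).read_text(encoding=encoding)
    return [Document(page_content=text, metadata={"source": path})]
```

Calling `load_text("/content/golden_hymns_of_epictetus.txt")` would give a list with a single `Document`, the same `list` and `Document` shape that the `type(...)` cells below report for the real loader.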

In [7]:
from langchain_community.document_loaders import TextLoader
loader = TextLoader("/content/golden_hymns_of_epictetus.txt")
golden_sayings = loader.load()

In [8]:
type(golden_sayings)

list

In [9]:
type(golden_sayings[0])

langchain_core.documents.base.Document

# 🔄 **Document Loaders in LangChain**

📋 **Wide Selection**: LangChain offers numerous document loaders. Check the [documentation](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/document_loaders) for a full list.

👣 **Usage Steps**:
   1. Choose a Document Loader from LangChain.
   2. Create an instance of the Document Loader.
   3. Employ its `load()` method to convert files into LangChain documents.

### 🛠️ **Role of Document Transformers**

📐 **Customization for Models**: Reshape documents to suit your model's requirements, for example by splitting long texts into chunks that fit its context window.

### ✂️ **Understanding Text Splitters**

🔢 **Function**: Divide long texts into smaller, coherent segments.

🔗 **Goal**: Keep related text together, fitting within the model's capacity.

### 🧩 **Using `RecursiveCharacterTextSplitter`**

🔄 **Methodology**:
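   - Tries a list of separators in order: by default paragraph breaks (`"\n\n"`), then line breaks (`"\n"`), then spaces (`" "`), then individual characters (`""`).

   - Recursively re-splits any segment that is still larger than `chunk_size`, using the next separator in the list.

   - Merges adjacent small pieces back together so each chunk stays within `chunk_size`.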
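
The recursive strategy can be sketched in a few lines of plain Python. `recursive_split` is a hypothetical, simplified helper, not LangChain's implementation: it uses the same default separator order, splits on the first separator present in the text, re-splits oversized pieces with the next separator, and greedily merges small neighbors up to `chunk_size`. It deliberately omits overlap and start indices, which the real splitter also handles.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list[str]:
    """Simplified recursive character splitting (no overlap, no start indices)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Use the first separator that occurs in the text; "" (single characters) always works.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            finer = separators[i + 1:]
            break
    else:
        sep, finer = "", ()

    pieces = list(text) if sep == "" else text.split(sep)
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Too big on its own: flush the current chunk, then re-split with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Greedily merge neighboring pieces while the result still fits.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))
# ['aaa bbb', 'ccc ddd', 'eee']
```

The paragraph break is tried first, so `"aaa bbb"` survives whole; only the oversized second paragraph falls back to splitting on spaces.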

### 🌟 **Key Aspects of Splitting**

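   - Uses the first separator in its list that occurs in the text, so paragraphs and lines stay intact where possible.

   - Keeps splitting any chunk that is still too large.

   - Measures chunk size with `length_function`: `len` counts characters, while a tokenizer-based function counts tokens.

   - Shares up to `chunk_overlap` units of text (as measured by `length_function`) between neighboring chunks, so context isn't lost at the boundaries.

   - Records each chunk's starting offset in `metadata["start_index"]` when `add_start_index=True`.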
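
Overlap and start indices are easiest to see with plain fixed-size windows. The hypothetical `window_chunks` helper below is deliberately simpler than `RecursiveCharacterTextSplitter` (it cuts at fixed character offsets instead of at separators), but it shows what `chunk_overlap` and `add_start_index=True` produce: neighboring chunks share `chunk_overlap` characters, and each chunk records the offset where it begins.

```python
def window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Fixed-size character windows; each window starts chunk_size - chunk_overlap later."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "page_content": text[start:start + chunk_size],
            "metadata": {"start_index": start},  # analogous to add_start_index=True
        })
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for chunk in window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
# {'page_content': 'abcd', 'metadata': {'start_index': 0}}
# {'page_content': 'defg', 'metadata': {'start_index': 3}}
# {'page_content': 'ghij', 'metadata': {'start_index': 6}}
```

Each chunk repeats the last character of its predecessor (`d`, then `g`), which is exactly the overlap of 1.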

In [10]:
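from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # maximum chunk length, measured by length_function
    chunk_overlap=50,      # characters shared between neighboring chunks
    length_function=len,   # count characters (a tokenizer-based function would count tokens)
    add_start_index=True,  # store each chunk's offset in metadata["start_index"]
)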

In [11]:
texts = text_splitter.split_documents(golden_sayings)

In [12]:
print(texts[0])
print(texts[1])

page_content='Are these the only works of Providence within us? What words suffice to\npraise or set them forth? Had we but understanding, should we ever\ncease hymning and blessing the Divine Power, both openly and in secret,\nand telling of His gracious gifts? Whether digging or ploughing or\neating, should we not sing the hymn to God:—\n\n_Great is God_, for that He hath given us such instruments to till the\nground withal:\n\n\n_Great is God_, for that He hath given us hands and the power of\nswallowing and digesting; of unconsciously growing and breathing while\nwe sleep!\n\n\nThus should we ever have sung; yea and this, the grandest and divinest\nhymn of all:—\n\n_Great is God_, for that He hath given us a mind to apprehend these\nthings, and duly to use them!' metadata={'source': '/content/golden_hymns_of_epictetus.txt', 'start_index': 0}
page_content='What then! seeing that most of you are blinded, should there not be\nsome one to fill this place, and sing the hymn to God on be

# 🌐 **Text Embeddings Overview**

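🔢 **Functionality**: An embedding model converts each text into a numerical vector. In LangChain, embedding classes such as `OpenAIEmbeddings` provide `embed_documents` (for many texts) and `embed_query` (for a single query).

🤝 **Similarity Measure**: Vectors that lie closer together (higher cosine similarity or smaller Euclidean distance) indicate more similar texts.

🔍 **Application**: Quickly identify documents with similar topics or content.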
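
Here is what "closer" means numerically, as a small plain-Python sketch of cosine similarity. The vectors are invented two-dimensional toys; real embedding vectors have far more dimensions (OpenAI's `text-embedding-ada-002` model, for example, returns 1536), but the arithmetic is the same. Cosine similarity is one common measure; vector stores such as FAISS can also rank by Euclidean distance, which gives the same ordering when all vectors have unit length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration, not real embeddings.
anger = [0.9, 0.1]
wrath = [0.8, 0.2]
harvest = [0.1, 0.9]
print(cosine_similarity(anger, wrath) > cosine_similarity(anger, harvest))  # True
```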

In [13]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# 🛠️ **Creating a Vector Store Retriever**

1. **Load Documents**: Use a document loader to read the source files into `Document` objects.

2. **Split Texts**: Break down documents into smaller sections with a text splitter.

3. **Embedding Conversion**: Apply an embedding model to transform text chunks into vectors.

4. **Vector Store Creation**: Compile these vectors into a vector store.

🔍 **Outcome**: Your vector store is now set up to search and retrieve texts by content.
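
The four steps can be sketched end to end without an API key. Everything here is hypothetical: `toy_embed` is a hashed bag-of-words stand-in for `OpenAIEmbeddings` (it captures shared words, not meaning), the splitter is a crude blank-line split, and the "store" is just a list of `(text, vector)` pairs rather than a FAISS index.

```python
import math
import re
import zlib


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical embedding: hash each word into one of `dim` buckets, then scale to unit length."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z']+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_vector_store(document: str, chunk_size: int = 200) -> list[tuple[str, list[float]]]:
    # Step 1 (assumed done): `document` is text a loader has already read from disk.
    # Step 2: split on blank lines, hard-cutting any paragraph longer than chunk_size.
    chunks = []
    for para in document.split("\n\n"):
        para = para.strip()
        chunks.extend(para[i:i + chunk_size] for i in range(0, len(para), chunk_size))
    # Steps 3 and 4: embed each chunk and keep (text, vector) pairs as the "store".
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


store = build_vector_store("Great is God.\n\nCount the days when you were not angry.")
print(len(store), len(store[0][1]))  # 2 64
```

The vectors are scaled to unit length so that a plain dot product can later serve as cosine similarity during search.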

In [14]:
from langchain_community.vectorstores import FAISS

# Embed every chunk with the embeddings model created above and index the vectors in FAISS
vectorstore = FAISS.from_documents(documents=texts, embedding=embeddings)

# 🔎 **Vector Store as a Retriever**

1. **Search Engine Role**: The vector store functions like a document search engine.

2. **Similarity Searches**: Find documents similar to your provided text.

3. **Customization Options**: Choose how many top results to return (`k`) and, optionally, a minimum similarity score a match must reach.

✨ **Functionality**: Use `similarity_search` to pinpoint documents that closely match your text, or call `as_retriever()` to wrap the same search in a retriever that chains can use.
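
`k` and a score threshold can be illustrated with a small top-k search over pre-embedded chunks. `top_k_search` is a hypothetical helper, not the FAISS or LangChain API, and it assumes unit-length vectors so that the dot product equals cosine similarity. In LangChain itself, the corresponding settings go to `as_retriever()`, for example `search_kwargs={"k": 4}`, or `search_type="similarity_score_threshold"` together with `search_kwargs={"score_threshold": ...}`.

```python
def top_k_search(query_vec, index, k=4, score_threshold=None):
    """Return up to k (text, score) pairs, best first; drop matches below score_threshold."""
    # Dot product == cosine similarity when all vectors have unit length (assumed here).
    scored = [(sum(q * v for q, v in zip(query_vec, vec)), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if score_threshold is not None:
        scored = [pair for pair in scored if pair[0] >= score_threshold]
    return [(text, score) for score, text in scored[:k]]


# Hand-made unit vectors standing in for embedded chunks.
index = [("fever", [1.0, 0.0]), ("anger", [0.6, 0.8]), ("hymn", [0.0, 1.0])]
print(top_k_search([0.0, 1.0], index, k=2))                  # [('hymn', 1.0), ('anger', 0.8)]
print(top_k_search([0.0, 1.0], index, score_threshold=0.9))  # [('hymn', 1.0)]
```

A larger `k` trades prompt length for recall; a threshold trims weak matches even when fewer than `k` remain.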

In [15]:
query = "How can I practice mindfulness if I am always so busy and distracted?"

vectorstore.similarity_search(query)

[Document(page_content='One who has had fever, even when it has left him, is not in the same\ncondition of health as before, unless indeed his cure is complete.\nSomething of the same sort is true also of diseases of the mind.\nBehind, there remains a legacy of traces and blisters: and unless these\nare effectually erased, subsequent blows on the same spot will produce\nno longer mere blisters, but sores. If you do not wish to be prone to\nanger, do not feed the habit; give it nothing which may tend its\nincrease. At first, keep quiet and count the days when you were not\nangry: “I used to be angry every day, then every other day: next every\ntwo, next every three days!” and if you succeed in passing thirty days,\nsacrifice to the Gods in thanksgiving.\n\nLXXVI\n\nHow then may this be attained?—Resolve, now if never before, to approve\nthyself to thyself; resolve to show thyself fair in God’s sight; long\nto be pure with thine own pure self and God!\n\nLXXVII', metadata={'source': '/co

# Generate

In [18]:
from langchain.chains import RetrievalQA

from langchain.prompts import PromptTemplate

from langchain_openai import ChatOpenAI

template = """

Use the following pieces of context to answer the question at the end.

If you don't know the answer, just say 'Ah snap homie, I ain't gonna front. I don't know.' Don't try to make up an answer.

Use three sentences maximum, relevant analogies, and keep the answer as concise as possible.

Use the active voice, and speak directly to the reader using concise language.
{context}

Question: {question}

Helpful Answer:

"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)


query = "What do grief, fear, envy, and desire stem from?"


result = qa_chain.invoke({"query": query})

result["result"]

"Grief, fear, envy, and desire stem from the mind's focus on external factors and the lack of inner tranquility. They are considered 'monsters' of the mind that can only be cast out by focusing on God, or a higher power, and following His commands. These emotions are also strengthened by corresponding acts, forming habits that can lead to mental distress if not checked by reason and self-discipline."

# 🛠️ **Using LCEL for Retrieval**

1. **Integrate Context and Question**: The prompt template includes placeholders for context and question.

2. **Preliminary Setup**
   - Reuse the FAISS vector store built above (it lives in memory) as the retriever for document retrieval.

   - Runnable components can be chained or run separately.

3. **RunnableParallel for Input Preparation**

   - Use `RunnableParallel` to combine document search results and the user's question.

   - `RunnablePassthrough` passes the user's question unchanged.

4. **Workflow Steps**

   - **Step 1**: Create `RunnableParallel` with two entries: 'context' (document results) and 'question' (user's original query).

   - **Step 2**: Feed the dictionary to the prompt component, which constructs a prompt using the user's question and retrieved documents.

   - **Step 3**: The model component (`ChatOpenAI`) evaluates the prompt.

   - **Step 4**: `output_parser` (a `StrOutputParser`) converts the chat model's message into a plain Python string.

🔄 **End-to-End Process**: A single `chain.invoke(query)` call runs document retrieval, prompt construction, model evaluation, and output parsing in sequence.
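
The data flow of these steps can be mimicked in plain Python to show what the `|` operator does. `Step`, `parallel`, `passthrough`, and the fake retriever, prompt, model, and parser below are all hypothetical stand-ins, not LangChain classes: `parallel` plays the role of `RunnableParallel`, `passthrough` that of `RunnablePassthrough`, and each `|` feeds one step's output into the next.

```python
class Step:
    """Hypothetical mini-runnable: wraps a function; `a | b` feeds a's output into b."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))


def parallel(branches: dict) -> Step:
    """Like RunnableParallel: run every branch on the same input, collect results in a dict."""
    return Step(lambda value: {key: step.invoke(value) for key, step in branches.items()})


passthrough = Step(lambda value: value)  # like RunnablePassthrough: input goes through unchanged

# Hypothetical stand-ins for the retriever, prompt, chat model and output parser.
fake_retriever = Step(lambda question: ["chunk A", "chunk B"])
prompt = Step(lambda d: "\n\n".join(d["context"]) + "\n\nQuestion: " + d["question"])
fake_model = Step(lambda text: {"content": "ECHO " + text.splitlines()[-1]})
parser = Step(lambda message: message["content"])  # message -> plain string

chain = parallel({"context": fake_retriever, "question": passthrough}) | prompt | fake_model | parser
print(chain.invoke("What do grief and fear stem from?"))
# ECHO Question: What do grief and fear stem from?
```

One difference worth noting: this sketch joins the retrieved texts with blank lines, whereas the LCEL cell below passes the retriever's list of `Document` objects straight into `{context}`, so the prompt receives their string representation (metadata included).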

In [20]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
    {"context": vectorstore.as_retriever(), "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | QA_CHAIN_PROMPT | llm | output_parser

chain.invoke(query)

'Grief, fear, envy, and desire stem from the mind. They are considered "monsters" of the mind that can only be cast out by focusing on God, according to the text. These emotions are also strengthened by corresponding acts, forming habits that can lead to mental distress if not checked by reason.'