Lesson 2

---

# First example of LangChain 

<h4>Focus: Why 🦜🔗 LangChain</h4>

In [1]:
from PyPDF2 import PdfReader
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain

from IPython.display import display, HTML

from util import local_settings

display(HTML(f"""
✅ Libraries loaded successfully <br>
✅ OpenAI Key loaded (...{local_settings.OPENAI_API_KEY[10:-15]}...)
"""))


## 🦜🔗 **Why do we need LangChain?**

- Most LLMs (GPT-3.5, AI21Labs, LLaMA, …) are not up to date
- They are **not** good at **Domain Knowledge** and **fail** when working with **Proprietary Data**
- Working with different LLMs may become a tedious task

**Notes**:
- LLMs don't always produce the same results. The results you see in this notebook may differ from the results you see in class.

## For example, ask a simple question about a fictional character

In [2]:
llm = OpenAI(model_name="text-davinci-003")

person = "Ferdinando Langchain"

prompt = f"Who is {person}? How many prizes he won? Only respond if you effectively find information about this person. Otherwise, respond that you are sorry and do not have information about this person."

response = llm(prompt, temperature = 0)

print("----- response -----")
print(response.replace("Langchain", ""))

----- response -----


Sorry, I do not have information about Ferdinando .


<font color="orange"><h3> 👉 Because Ferdinando Langchain is a fictional character with a distinctive name, the model has no information about him.</h3></font>


As an illustration, consider loading a PDF file that contains details about Ferdinando Langchain and storing its text in a vector database. Stored this way, the passages relevant to a question can be retrieved efficiently and handed to the model.

### Load the text of a PDF file

In [7]:
pdf_data = PdfReader("./context/biographies.pdf")

pdf_text = ""

for page in pdf_data.pages:
    # extract_text() may yield nothing for pages without extractable text (e.g. scanned images)
    text = page.extract_text()
    if text:
        pdf_text += text

print(len(pdf_text))

16005


### Split the text into chunks

In [8]:
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 400,
    chunk_overlap = 100
)

final_data = text_splitter.split_text(pdf_text)

print(f"""
    # of Chunks: {len(final_data)}
    Chunk 0: {final_data[0]}
    Chunk 1: {final_data[1]}
""")


    # of Chunks: 59
    Chunk 0: Ferdinando Langchain: Pioneering AI Engineer and Visionary  
 
Ferdinando Langchain was born on a brisk winter day in 1985 in Milan, Italy. From an early 
age, he displayed an insatiable curiosity about technology and a keen interest in 
understanding the mysteries of artificial intelligence. His journey into AI began when he
    Chunk 1: understanding the mysteries of artificial intelligence. His journey into AI began when he 
stumbled upon an old computer in his father's study, sparking a fascination that would 
shape his future.  
 
After completing his undergraduate studies in computer science at the University of Milan, 
Ferdinando ventured to the United States to pursue a Ph.D. in Artificial Intelligence at the
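
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of a greedy splitter. It is a simplified illustration and not LangChain's actual implementation; the function name `split_with_overlap` and the sample lines are hypothetical.

```python
def split_with_overlap(text, separator="\n", chunk_size=400, chunk_overlap=100):
    """Pack separator-delimited pieces into chunks of at most chunk_size
    characters, carrying up to chunk_overlap characters into the next chunk."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it
            chunks.append(separator.join(current))
            # Keep only a tail that fits the overlap budget and leaves room for the new piece
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Hypothetical sample: ten lines of 30 characters each
sample = "\n".join(f"line{i:02d}" + "x" * 24 for i in range(10))
chunks = split_with_overlap(sample, chunk_size=100, chunk_overlap=35)

for i, chunk in enumerate(chunks):
    lines = chunk.split("\n")
    print(f"Chunk {i}: {len(chunk)} chars, from {lines[0][:6]} to {lines[-1][:6]}")
```

Each chunk starts with the last line of the previous one, which mirrors how Chunk 1 above repeats the end of Chunk 0. The overlap helps keep a sentence that straddles a chunk boundary retrievable from at least one chunk.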



### Generate Embeddings and store the texts and embeddings

In [11]:
embeddings = OpenAIEmbeddings()
document_searcher = FAISS.from_texts(final_data, embeddings)
chain = load_qa_chain(OpenAI(), chain_type="stuff")

prompt

'Who is Ferdinando Langchain? How many prizes he won? Only respond if you effectively find information about this person. Otherwise, respond that you are sorry and do not have information about this person.'

### What happened?

<img src="./img/llm_vector-db_bio.png" width="700px"></img>
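
The idea behind the vector store can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model such as `OpenAIEmbeddings`, and `ToyVectorStore` is a hypothetical stand-in for FAISS that ranks texts by cosine similarity with a linear scan.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each text next to its vector."""

    def __init__(self, texts):
        self.entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=2):
        # Embed the query, then return the k texts whose vectors are closest to it
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "Ferdinando Langchain was born in Milan",
    "Mortimer Quicksilver is a steampunk inventor",
    "Zephyra Blazeheart studies astrophysics",
])
top = store.similarity_search("Who is Ferdinando Langchain", k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, and FAISS uses indexes built for fast nearest-neighbour search, but the flow is the same: embed the texts once, embed the question, and return the closest texts.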

### 🧠 Now, you just need to ask

To get an answer grounded in the PDF, first search the vector store for the documents most similar to the question. Then pass both the retrieved documents and the prompt to the model through a chain.

In [None]:
person = "Ferdinando Langchain"

prompt = f"Who is {person}? How many prizes he won? Only respond if you effectively find information about this person. Otherwise, respond that you are sorry and do not have information about this person."

docs =  document_searcher.similarity_search(prompt)
result = chain.run(input_documents=docs, question=prompt)

print("--- 🤖 RESULT ---")
print(result)

--- 🤖 RESULT ---

Ferdinando Langchain is a pioneering AI engineer and visionary who was born in 1985 in Milan, Italy. He won three prestigious international prizes for his contributions to the field of artificial intelligence.
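
The `"stuff"` chain type used above places all retrieved documents into a single prompt together with the question. A minimal sketch of that idea follows; the function `build_stuffed_prompt` and the template wording are hypothetical and do not reproduce LangChain's internal prompt.

```python
def build_stuffed_prompt(documents, question):
    """Join all retrieved texts into one context block and append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved texts, paraphrased from the biography chunks shown earlier
retrieved = [
    "Ferdinando Langchain was born in 1985 in Milan, Italy.",
    "He studied computer science at the University of Milan.",
]
stuffed = build_stuffed_prompt(retrieved, "Who is Ferdinando Langchain?")
print(stuffed)
```

Because every retrieved document ends up in the same prompt, this approach only works while the combined text fits in the model's context window.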


<font color="yellow">
<h1>👇 look here</h1>
Note that the following cell prints only the documents (texts) that are closely related to the question. Only these documents are passed to the model, which considerably reduces the number of tokens sent. This matters because it lowers the cost of each API call.
</font>


In [None]:
print(docs)

[Document(page_content='Ferdinando Langchain: Pioneering AI Engineer and Visionary  \n \nFerdinando Langchain was born on a brisk winter day in 1985 in Milan, Italy. From an early \nage, he displayed an insatiable curiosity about technology and a keen interest in \nunderstanding the mysteries of artificial intelligence. His journey into AI began when he'), Document(page_content='As Ferdinando Langchain continued his journey in the evolving landscape of artificial \nintelligence, his impact on the field and society at large became increasingly profound. His \nvision of creating a beneficial and inclusive AGI echoed through the corridor s of innovation, \nleaving an indelible mark on the future of artificial intelligence.  \n \nMortimer Quicksilver - Steampunk Inventor'), Document(page_content="models that could generalize across a wide range of tasks earned him widespread accl aim. \nNotably, his research contributed to advancements in natural language processing, \ncomputer vision, and
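
The token saving mentioned in the note can be estimated with a back-of-the-envelope calculation. The figures below rest on two assumptions that are not from the notebook: a rough rule of thumb of about 4 characters per token for English text, and 4 retrieved chunks per query. The document length (16005 characters) and the chunk size (400) come from the cells above.

```python
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not an exact tokenizer count


def estimate_tokens(num_chars):
    """Very rough token estimate from a character count."""
    return num_chars // CHARS_PER_TOKEN


full_document_chars = 16005  # len(pdf_text) printed earlier in the notebook
chunk_size = 400             # chunk_size passed to CharacterTextSplitter
retrieved_chunks = 4         # assumption: number of chunks returned per query

full_tokens = estimate_tokens(full_document_chars)
# Upper bound: each chunk holds at most chunk_size characters
retrieved_tokens = estimate_tokens(chunk_size * retrieved_chunks)

print(f"Whole document:   ~{full_tokens} tokens")
print(f"Retrieved chunks: ~{retrieved_tokens} tokens at most")
print(f"Reduction factor: ~{full_tokens / retrieved_tokens:.0f}x")
```

Under these assumptions, sending only the retrieved chunks uses roughly a tenth of the tokens that sending the whole document would. For exact counts, a real tokenizer would be needed.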

In [None]:
person = "Zephyra Blazeheart"
prompt = f"Who is {person}? How many prizes he won? Only respond if you effectively find information about this person. Otherwise, respond that you are sorry and do not have information about this person."

docs =  document_searcher.similarity_search(prompt)
result = chain.run(input_documents=docs, question=prompt)

print("--- 🤖 RESULT ---")
print(result)

--- 🤖 RESULT ---

Zephyra Blazeheart is a scientist who was known for her work in astrophysics and research into the depths of the universe. She won the prestigious Astral Pioneer Award in recognition of her intellectual prowess and contributions to humankind's exploration of space.


In [None]:
print(docs)

[Document(page_content='prestigious Astral Pioneer Award. This recognition not only celebrated her intellectual \nprowess but also acknowledged her role in propelling humanity into a new era of \nastrophysica l discovery. Zephyra Blazeheart became a beacon of inspiration for aspiring \nscientists and stargazers alike, proving that even the most elusive cosmic mysteries could be'), Document(page_content="propelling her into the forefront of astrophysical exploration.  \nZephyra Blazeheart's journey into the vast cosmos began under the radiant skies of \nLuminara, a town with an otherworldly charm that seemed to inspire a sense of wonder in \nthose who called it home. From a young age, Zephyra was insatiable and curious  about the \nnight sky's celestial wonders ."), Document(page_content="across the cosmos.  \n As she continued her cosmic odyssey, Zephyra Blazeheart's name became synonymous with \nthe spirit of exploration and the relentless pursuit of knowledge. Her research not only \

<h3><font color="Yellow">👋 the end </font></h3>