# WELCOME

This notebook guides you through two increasingly important applications of Generative AI: RAG (Retrieval-Augmented Generation) chatbots and summarization of long texts.

Through two distinct projects, you will explore these technologies and enhance your skills. Detailed descriptions of the projects are provided below.

## Project 1: Building a Chatbot with a PDF Document (RAG)

In this project, you will develop a chatbot around a PDF document downloaded from a web page. You will use the LangChain framework together with a large language model (LLM) such as GPT or Gemini. The chatbot uses the Retrieval-Augmented Generation (RAG) technique: it retrieves the passages of the document that are relevant to a user's question and passes them to the LLM as context, so that answers are grounded in the document.

### **Project Steps:**

- **1. PDF Document Upload:** Load the provided PDF document from the web (https://aclanthology.org/N19-1423.pdf), *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*.

- **2. Chunking:** Divide the PDF document into smaller segments (chunks). Smaller chunks let the retriever return only the passages relevant to a question and keep the context passed to the LLM short.

- **3. ChromaDB Setup:**
  - Save ChromaDB to your Google Drive.

  - Retrieve ChromaDB from your Drive to begin using it in your project.

  - ChromaDB serves as a vector database to store embedding vectors generated from your document.

- **4. Embedding Vectors Creation:**
  - Convert the chunked document into embedding vectors. You can use either GPT or Gemini embedding models for this purpose.

  - If you choose the Gemini embedding model, set "task_type" to "retrieval_document" when converting the chunked document.

- **5. Chatbot Development:**
  - Use the **load_qa_chain** function from the LangChain library to build the chatbot.

  - This function will interpret user queries, retrieve relevant information from **ChromaDB**, and generate responses accordingly.



### Install Libraries

In [None]:
!pip install -qU langchain-community

In [None]:
!pip install -qU langchain-google-community


In [None]:
!pip install -qU langchain-openai


In [None]:
!pip install -qU langchain-chroma


In [None]:
!pip install -qU pypdfium2


### Access Google Drive
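Step 3 asks for the Chroma database to be saved to Google Drive, but the cells below persist it to `./vectorstore` on the Colab VM's local disk, which is discarded when the runtime is reset. Here is a minimal sketch of picking a Drive-backed directory. It assumes the notebook runs in Google Colab, where `google.colab.drive.mount` is available; the helper `get_persist_directory` and the folder name `MyDrive/vectorstore` are hypothetical. Outside Colab, or with `use_drive=False`, it falls back to the local path.

```python
import os


def get_persist_directory(use_drive: bool = True,
                          mount_point: str = "/content/drive",
                          subdir: str = "MyDrive/vectorstore") -> str:
    """Return a directory to use as Chroma's persist_directory.

    `subdir` is a hypothetical folder name; change it to suit your Drive.
    """
    if use_drive:
        try:
            # This module only exists inside Google Colab.
            from google.colab import drive
        except ImportError:
            drive = None
        if drive is not None:
            drive.mount(mount_point)  # asks you to authorize Drive access
            path = os.path.join(mount_point, subdir)
            os.makedirs(path, exist_ok=True)
            return path
    # Fallback: local folder on the VM, lost when the runtime is reset.
    return "./vectorstore"
```

You could then pass `persist_directory=get_persist_directory()` to both `Chroma.from_documents` and `Chroma(...)`, so the store is written to and reloaded from the same Drive folder.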

### Entering Your OpenAI or Google Gemini API Key

In [None]:
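import os
from google.colab import userdata

# Read the key stored in Colab's "Secrets" panel and expose it as the
# environment variable that langchain_openai looks for.
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')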

### Loading PDF Document

In [None]:
import requests
from langchain_community.document_loaders import PyPDFium2Loader

def read_doc_from_url(url, temp_path="temp_downloaded.pdf"):
    # Download the PDF file (time out instead of hanging on a stalled connection)
    response = requests.get(url, timeout=60)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download PDF: HTTP {response.status_code}")

    # Save it temporarily so the loader can open it from disk
    with open(temp_path, "wb") as f:
        f.write(response.content)

    # Load with PyPDFium2Loader, which returns one Document per page
    loader = PyPDFium2Loader(temp_path)
    pdf_documents = loader.load()
    return pdf_documents
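`read_doc_from_url` writes every download to the same fixed file name and does not check that the server actually returned a PDF (an HTML error page served with status 200 would be saved as-is). Below is a small standard-library sketch of both checks. The helper names are hypothetical, and the check relies on PDF files beginning with the `%PDF-` signature.

```python
import os
import tempfile


def is_pdf_bytes(content: bytes) -> bool:
    """PDF files start with the ASCII signature b"%PDF-"."""
    return content[:5] == b"%PDF-"


def save_pdf_to_tempfile(content: bytes) -> str:
    """Write downloaded bytes to a unique temporary .pdf file and return its path."""
    if not is_pdf_bytes(content):
        raise ValueError("Downloaded content does not look like a PDF")
    # delete=False keeps the file on disk so a loader can open it by path later.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(content)
        return tmp.name
```

Inside `read_doc_from_url`, `temp_path = save_pdf_to_tempfile(response.content)` could replace the fixed `"temp_downloaded.pdf"`. The `source` metadata of each page would then show the temporary path, and you may want to `os.remove` the file after loading.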

In [None]:
# # create a pdf reader function
# from langchain_community.document_loaders import PyPDFium2Loader

# def read_doc(directory):
#     file_loader=PyPDFium2Loader(directory)
#     pdf_documents=file_loader.load()  # PyPDFium2Loader reads page by page
#     return pdf_documents

In [None]:
# Usage
url = "https://aclanthology.org/N19-1423.pdf"
pdf = read_doc_from_url(url)

In [None]:
#pdf = read_doc('/content/attention is all you need.pdf')

print("=" * 55)

print(f"The BERT paper (N19-1423.pdf) has {len(pdf)} pages.")

# The document consists of 16 pages

The BERT paper (N19-1423.pdf) has 16 pages.


In [None]:
pdf[1]

Document(metadata={'producer': 'pdfTeX-1.40.18', 'creator': 'LaTeX with hyperref package', 'creationdate': '2019-04-29T17:36:03+00:00', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'subject': 'N19-1 2019', 'keywords': '', 'moddate': '2019-04-29T17:36:03+00:00', 'source': 'temp_downloaded.pdf', 'total_pages': 16, 'page': 1}, page_content='4172\nword based only on its context. Unlike left-to\x02right language model pre-training, the MLM ob\x02jective enables the representation to fuse the left\nand the right context, which allows us to pre\x02train a deep bidirectional Transformer. In addi\x02tion to the masked language model, we also use\na “next sentence prediction” task that jointly pre\x02trains text-pair representations. The contributions\nof our paper are as follows:\n• We demonstrate the importance of bidirectional\npre-training for language representations.

### Document Splitter

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


# Split documents into chunks with the RecursiveCharacterTextSplitter class from the LangChain library.
def chunk_data(docs, chunk_size=1000, chunk_overlap=200):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,        # each chunk is at most 1000 characters long
                                                   chunk_overlap=chunk_overlap)  # consecutive chunks share up to 200 characters to preserve context
    chunks = text_splitter.split_documents(docs)
    return chunks
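With `chunk_size=1000` and `chunk_overlap=200`, consecutive windows start 1000 - 200 = 800 characters apart, so each chunk repeats the tail of the previous one. The sketch below illustrates only that arithmetic and is not how `RecursiveCharacterTextSplitter` works internally: the LangChain splitter first tries to cut at its separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`), so its chunks can be shorter than 1000 characters and the overlap is at most 200. It also splits each page `Document` separately, which is why the real chunk count (83 here) cannot be predicted from the total text length alone. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000,
                          chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size, overlapping character windows.

    Simplified illustration: ignores paragraph/line/word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # 800 characters with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,600-character text this yields three windows starting at 0, 800 and 1600.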

In [None]:
pdf_doc = chunk_data(docs=pdf)

len(pdf_doc)

# divided into 83 chunks

83

In [None]:
pdf_doc[25:27]

### 1. Creating an Embedding Model


In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large",
                              dimensions=3072)  # dimensions=256, 1024, 3072
embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7e9853ec30b0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7e9853459970>, model='text-embedding-3-large', dimensions=3072, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [None]:
text = "This is a test document."

In [None]:
doc_result = embeddings.embed_documents([text])

# Convert the text into an embedding vector (one vector per input string)

In [None]:
doc_result[0][:2]
# First 2 elements of the 3072-dimensional embedding vector

[-0.014371714554727077, -0.027211420238018036]

In [None]:
len(doc_result[0])

3072
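`text-embedding-3-large` returns 3072 numbers per text, and the `dimensions` argument asks the API for a shorter vector (for example 256 or 1024). As we understand OpenAI's embeddings guide, a comparable result can be obtained client-side by keeping the leading components and rescaling the vector to unit length. The following is a minimal sketch under that assumption; `shorten_embedding` is a hypothetical helper you could apply to `doc_result[0]`.

```python
import math


def shorten_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale them to unit L2 norm."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]
```

Shorter vectors make the vector store smaller and searches cheaper, at some cost in retrieval quality.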

### 2. Convert Each Chunk of the Split Document to Embedding Vectors
### 3. Store the Embedding Vectors in the Vectorstore
### 4. Save the Vectorstore to Your Drive


In [None]:
from langchain_chroma import Chroma

# Embed every chunk and store the vectors in a Chroma collection persisted to disk
index = Chroma.from_documents(documents=pdf_doc,
                              embedding=embeddings,
                              persist_directory="./vectorstore")  # directory where the database files are written

# Create a retriever object from our Chroma vector store (index).
retriever = index.as_retriever()  # returns the 4 most similar chunks by default
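When no `ids` are given, each run of `Chroma.from_documents` with the same `persist_directory` adds the chunks again under new random IDs, so re-running the cell can leave duplicate chunks that then crowd the top-k results. One way to avoid this, sketched here as an option rather than the notebook's method, is to derive a deterministic ID for each chunk and pass the list via the `ids` argument, e.g. `ids=[stable_chunk_id(d.page_content, d.metadata) for d in pdf_doc]`. Whether a repeated ID overwrites or is skipped depends on the Chroma/LangChain version, but either way it is not stored twice. This assumes no two chunks share both text and metadata. `stable_chunk_id` is a hypothetical helper.

```python
import hashlib
import json


def stable_chunk_id(page_content: str, metadata: dict) -> str:
    """Derive a deterministic ID from a chunk's text and metadata."""
    # sort_keys makes the ID independent of the metadata's key order;
    # default=str handles values that are not JSON-serializable.
    payload = json.dumps({"text": page_content, "meta": metadata},
                         sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```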

In [None]:
retriever = index.as_retriever(search_kwargs={"k": 4})

In [None]:
retriever.invoke("Can you explain me BERT relation with TPU?")


[Document(id='79bc2178-bfc3-4f61-aec2-413b8a3f4c00', metadata={'source': 'temp_downloaded.pdf', 'creator': 'LaTeX with hyperref package', 'subject': 'N19-1 2019', 'producer': 'pdfTeX-1.40.18', 'page': 2, 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'total_pages': 16, 'keywords': '', 'moddate': '2019-04-29T17:36:03+00:00', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'creationdate': '2019-04-29T17:36:03+00:00'}, page_content='framework: pre-training and fine-tuning. Dur\x02ing pre-training, the model is trained on unlabeled\ndata over different pre-training tasks. For fine\x02tuning, the BERT model is first initialized with\nthe pre-trained parameters, and all of the param\x02eters are fine-tuned using labeled data from the\ndownstream tasks. Each downstream task has sep\x02arate fine-tuned models, even though they are ini\x02tialized with the same pre-trained parameters. The\nquestion-answering example in F

### Load the Vectorstore (Index) from the Persist Directory

In [None]:
# Instantiate the Chroma class on the saved directory to reopen our vector database.

loaded_index = Chroma(persist_directory="./vectorstore",
                      embedding_function=embeddings)

In [None]:
loaded_retriever = loaded_index.as_retriever(search_kwargs={"k": 4})

### Retrieve the k Chunks Most Similar to the User Query from the Document

In [None]:
def retrieve_query(query, k=4):
    # Query the index built above; swap in loaded_index to query the store reopened from disk
    retriever = index.as_retriever(search_kwargs={"k": k})
    return retriever.invoke(query)
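Conceptually, `retriever.invoke(query)` embeds the query with the same embedding model and returns the k stored chunks whose vectors are closest to it. Chroma does this with an approximate nearest-neighbour (HNSW) index and, by default, L2 distance. The brute-force cosine version below is a simplified sketch of the same idea on plain lists of floats (such as the output of `embeddings.embed_query(...)` and `embeddings.embed_documents(...)`). The optional `score_threshold` is an illustrative addition that shows how weak matches could be dropped instead of always returning k chunks.

```python
import heapq
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], chunk_vecs: List[List[float]], k: int = 4,
          score_threshold: Optional[float] = None) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = ((i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs))
    if score_threshold is not None:
        scored = ((i, s) for i, s in scored if s >= score_threshold)
    # nlargest avoids sorting the whole collection when k is small
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```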


In [None]:
our_query = "Can you explain me BERT relation with TPU?"

doc_search = retrieve_query(our_query, k=4)  # the four most similar chunks are returned
doc_search

[Document(id='79bc2178-bfc3-4f61-aec2-413b8a3f4c00', metadata={'total_pages': 16, 'source': 'temp_downloaded.pdf', 'subject': 'N19-1 2019', 'producer': 'pdfTeX-1.40.18', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'keywords': '', 'moddate': '2019-04-29T17:36:03+00:00', 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'creator': 'LaTeX with hyperref package', 'page': 2, 'creationdate': '2019-04-29T17:36:03+00:00'}, page_content='framework: pre-training and fine-tuning. Dur\x02ing pre-training, the model is trained on unlabeled\ndata over different pre-training tasks. For fine\x02tuning, the BERT model is first initialized with\nthe pre-trained parameters, and all of the param\x02eters are fine-tuned using labeled data from the\ndownstream tasks. Each downstream task has sep\x02arate fine-tuned models, even though they are ini\x02tialized with the same pre-trained parameters. The\nquestion-answering example in F

In [None]:
# from IPython.display import Markdown

# Markdown(doc_search[0].page_content)  # doc_search is a list; this shows the first chunk

### Generating an Answer Based on The Similar Chunks

In [None]:
from langchain.prompts import PromptTemplate

template="""Use the following pieces of context to answer the user's question of "{question}".
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
"{context}" """

prompt_template = PromptTemplate(
    input_variables=['question','context'],
    template=template
    )

In [None]:
our_query = "Can you explain me BERT relation with TPU?"

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm=ChatOpenAI(model_name="gpt-4o-mini",
               temperature=0,
               top_p=1)

chain = prompt_template | llm | StrOutputParser()

output = chain.invoke({"question": our_query, "context": doc_search})  # doc_search holds the 4 chunks retrieved above
output

'BERT (Bidirectional Encoder Representations from Transformers) is a model architecture that utilizes a multi-layer bidirectional Transformer encoder. The training of BERTBASE and BERTLARGE models was performed on Cloud TPUs (Tensor Processing Units). Specifically, BERTBASE was trained on 4 Cloud TPUs (16 TPU chips total), while BERTLARGE was trained on 16 Cloud TPUs (64 TPU chips total). The use of TPUs significantly accelerated the training process, allowing the researchers to complete the pre-training in just 4 days. \n\nIn summary, the relationship between BERT and TPU is that TPUs were used as the hardware for training the BERT models, enabling efficient processing and faster training times.'
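In the cell above, `context` receives the list of `Document` objects itself, so the prompt is filled with their Python representation, metadata included. Below is a sketch of passing only the chunk text, tagged with its page index, into the same template. `Doc` is a hypothetical stand-in so the snippet runs without LangChain; real `Document` objects expose the same `page_content` and `metadata` attributes. The page values in this PDF's metadata are 0-based. With the chain above, you could pass `format_docs(doc_search)` as `context`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Same text as the notebook's template above.
TEMPLATE = """Use the following pieces of context to answer the user's question of "{question}".
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
"{context}" """


def format_docs(docs, separator="\n\n"):
    """Join chunk texts, tagging each with its (0-based) page index."""
    parts = []
    for doc in docs:
        page = doc.metadata.get("page", "?")
        parts.append(f"[page {page}] {doc.page_content}")
    return separator.join(parts)


def build_prompt(question, docs):
    # PromptTemplate's default "f-string" format fills the same placeholders.
    return TEMPLATE.format(question=question, context=format_docs(docs))
```

Keeping metadata out of the prompt saves tokens, while the page tag still lets the model (and you) trace an answer back to its source.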

In [None]:
from IPython.display import Markdown

Markdown(output)

BERT (Bidirectional Encoder Representations from Transformers) is a model architecture that utilizes a multi-layer bidirectional Transformer encoder. The training of BERTBASE and BERTLARGE models was performed on Cloud TPUs (Tensor Processing Units). Specifically, BERTBASE was trained on 4 Cloud TPUs (16 TPU chips total), while BERTLARGE was trained on 16 Cloud TPUs (64 TPU chips total). The use of TPUs significantly accelerated the training process, allowing the researchers to complete the pre-training in just 4 days. 

In summary, the relationship between BERT and TPU is that TPUs were used as the hardware for training the BERT models, enabling efficient processing and faster training times.

In [None]:
#This function represents the complete workflow from receiving a user's question to generating a final, coherent answer.
def get_answers(query, k=4):
    from langchain_openai import ChatOpenAI
    from langchain_core.output_parsers import StrOutputParser
    from langchain.prompts import PromptTemplate
    from IPython.display import Markdown

    doc_search = retrieve_query(query, k=k)  # the k most similar chunks are returned


    template="""Use the following pieces of context to answer the user's question of {question}.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    ----------------
    {context}"""

    prompt_template = PromptTemplate(input_variables =['question', 'context'],
                                     template=template)


    llm=ChatOpenAI(model_name="gpt-4o-mini",
                   temperature=0,
                   top_p=1)

    chain = prompt_template | llm | StrOutputParser()

    output = chain.invoke({"question": query, "context": doc_search})  # answer grounded in the retrieved chunks
    return Markdown(output)

In [None]:
our_query = "Can you explain me BERT relation with TPU?"

answer = get_answers(our_query, k=5)
answer

BERT (Bidirectional Encoder Representations from Transformers) is a model architecture designed for natural language processing tasks, and it was trained using Cloud TPUs (Tensor Processing Units). Specifically, the training of BERTBASE was performed on 4 Cloud TPUs in a Pod configuration, totaling 16 TPU chips, while BERTLARGE was trained on 16 Cloud TPUs, amounting to 64 TPU chips. The use of TPUs allowed for efficient training of the model, which took 4 days to complete for each version. The document does not provide further details on the specific relationship between BERT and TPUs beyond their use in the training process.

In [None]:
our_query = "Can you explain me BERT relation with GPU?"

answer = get_answers(our_query, k=8)
answer

I don't know.

### Pipeline For RAG

In [None]:
# #This is a crucial step if you plan on using agents that rely on external search tools.
# #Here it is not used.

# import os
# from google.colab import userdata

# os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')

In [None]:
# #The following code snippet is setting up an external search tool for a LangChain agent.
# from langchain_community.tools.tavily_search import TavilySearchResults

# search_tool = TavilySearchResults(max_results=1,
#                                   search_depth="basic", #advanced
#                                   include_answer=True,
#                                   include_raw_content=True
#                                   )

In [None]:
from langchain_chroma import Chroma

# index=Chroma.from_documents(documents=pdf_doc,
#                            embedding=embeddings,
#                            persist_directory="./vectorstore") # persist_directory, saves in the directory

retriever_new = index.as_retriever(search_kwargs={"k": 5})

In [None]:
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(retriever=retriever_new,  # the retriever object the tool will use
                                       # a unique, descriptive name for the tool
                                       name="BERT_doc_inhalt_search",
                                       # the description tells the agent when it should use this tool
                                       description=("Search for information about BERT. For any question about BERT "
                                                    "or the article 'BERT: Pre-training of Deep Bidirectional Transformers "
                                                    "for Language Understanding', you must use this tool! If you don't know "
                                                    "the answer, just say that you don't know; don't try to make up an answer."))

In [None]:
retriever_tool.name

'BERT_doc_inhalt_search'

In [None]:
!pip install -qU langgraph

In [None]:
user_input = "Can you explain me BERT relation with TPU?"

In [None]:
from langchain_core.messages import SystemMessage, HumanMessage
from langgraph.prebuilt import create_react_agent

# Instantiate the gpt-4o-mini language model, which will act as the "brain" of our agent.
llm = ChatOpenAI(temperature=0.0,
                 model="gpt-4o-mini",
                 top_p=1)

# Define a system-level prompt that guides the agent's behavior.
prompt = "Make sure to use the BERT_doc_inhalt_search tool for questions, \
and if you don't know the answer, just say that you don't know; \
don't try to make up an answer."  # optionally: "use the tavily_search_results_json tool for questions you don't know about."

# Create a list of the tools that the agent has access to.
tools = [retriever_tool]  # [retriever_tool, search_tool]

# Bind the list of tools to the LLM, making the LLM aware of what tools it can use and how to call them.
model_with_tools = llm.bind_tools(tools)

# Create the actual agent executor.
agent_executor = create_react_agent(model_with_tools, tools)  # if you pass a system message with the input, you don't need to define state_modifier

# This is where the agent is actually invoked with a user's question (user_input).
# The agent will receive the question, decide whether to use one of its tools or generate a direct response, and then return the result.
response = agent_executor.invoke({"messages": [HumanMessage(content=user_input)]})  # prepend SystemMessage(content=prompt) to the list to apply the system prompt

response["messages"]

[HumanMessage(content='Can you explain me BERT relation with TPU?', additional_kwargs={}, response_metadata={}, id='8c4d2e6a-b46d-405c-88ad-ca2d86152c17'),
 AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_o2Qg78hzhFaVi2qVsBNcVNCI', 'function': {'arguments': '{"query":"BERT TPU"}', 'name': 'BERT_doc_inhalt_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 125, 'total_tokens': 145, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_51db84afab', 'id': 'chatcmpl-CBLFUVKeCvX0rCvbsalZlKUitbsQN', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--c337885b-1e7d-417c-b6b7-a48eca97c5eb-0', tool_calls=[{'name': 'BERT_doc_inhalt_search', 'args': {'qu

In [None]:
for chunk in agent_executor.stream(
    {"messages": [("human", user_input)]}
):
    print(chunk)
    print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_fyHR77jcs78QGrNZUFgVyuXc', 'function': {'arguments': '{"query":"BERT TPU"}', 'name': 'BERT_doc_inhalt_search'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 20, 'prompt_tokens': 125, 'total_tokens': 145, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_62a23a81ef', 'id': 'chatcmpl-CBLFhWgU6LYgqvWmdGwgeltbZVhqV', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--130c31fd-f2d7-4562-93fa-e0b755546a41-0', tool_calls=[{'name': 'BERT_doc_inhalt_search', 'args': {'query': 'BERT TPU'}, 'id': 'call_fyHR77jcs78QGrNZUFgVyuXc', 'type': 'tool_call'}], usage_metadata={'input_tokens': 125, 'output_tokens'
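The streamed output shows the loop that `create_react_agent` runs: the `agent` node (the LLM) emits a tool call, the `tools` node executes `BERT_doc_inhalt_search`, and the agent is called again with the tool result until it replies without a tool call. Below is a framework-free sketch of that control flow, where `decide` stands in for the LLM. The function, the action tuples and the message tuples are hypothetical and are not LangGraph's API.

```python
def run_tool_agent(question, decide, tools, max_steps=5):
    """Minimal tool-calling loop for illustration.

    `decide(messages)` plays the LLM and returns either
    ("tool", tool_name, kwargs) or ("final", answer_text).
    """
    messages = [("human", question)]
    for _ in range(max_steps):
        action = decide(messages)
        if action[0] == "final":
            messages.append(("ai", action[1]))  # answer without a tool call ends the loop
            return messages
        _, name, args = action
        messages.append(("ai_tool_call", name, args))
        result = tools[name](**args)  # execute the requested tool
        messages.append(("tool", name, result))  # feed the result back to the "LLM"
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` guard plays the role of a recursion limit, stopping a model that keeps requesting tools.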

In [None]:
response["messages"][-1].content

"BERT (Bidirectional Encoder Representations from Transformers) is a powerful model for natural language processing that was trained using Cloud TPUs (Tensor Processing Units). Specifically, the training of BERTBASE was performed on 4 Cloud TPUs in a Pod configuration, which consists of 16 TPU chips in total. The larger model, BERTLARGE, was trained on 16 Cloud TPUs, totaling 64 TPU chips. Each pre-training session took about 4 days to complete.\n\nThe use of TPUs is significant because they are designed to accelerate machine learning workloads, particularly those involving large-scale neural networks like BERT. The training process involves handling large amounts of data and performing complex computations, which TPUs are optimized for, allowing for faster training times compared to traditional GPUs.\n\nIn summary, BERT's relationship with TPUs lies in the efficient training of the model on these specialized hardware units, which enables the handling of extensive datasets and complex 