___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" alt="CLRSWY"></p>

___

# WELCOME

This notebook guides you through two increasingly important applications of Generative AI: RAG (Retrieval Augmented Generation) chatbots and summarization of long documents.

Through two distinct projects, you will explore these technologies and enhance your skills. Detailed descriptions of the projects are provided below.

## Project 1: Building a Chatbot with a PDF Document (RAG)

In this project, you will develop a chatbot that answers questions about a PDF document downloaded from the web. You will use the LangChain framework together with a large language model (LLM) such as GPT or Gemini. The chatbot uses the Retrieval Augmented Generation (RAG) technique: it retrieves the passages of the document most relevant to a user's query and passes them to the LLM so that its answer is grounded in the document.

### **Project Steps:**

- **1. PDF Document Upload:** Download the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" from its web page (https://aclanthology.org/N19-1423.pdf) and upload the PDF.

- **2. Chunking:** Divide the uploaded PDF document into smaller segments (chunks), so that each chunk fits the embedding model's input limit and retrieval can return focused passages to the LLM.

- **3. ChromaDB Setup:**
  - Save (persist) the ChromaDB vector store to your Google Drive.

  - Load the ChromaDB store back from your Drive to use it in your project.

  - ChromaDB serves as a vector database that stores the embedding vectors generated from your document.

- **4. Embedding Vectors Creation:**
  - Convert the document chunks into embedding vectors. You can use either an OpenAI (GPT) or a Gemini embedding model for this purpose.

  - If you choose the Gemini embedding model, set "task_type" to "retrieval_document" when embedding the chunks.

- **5. Chatbot Development:**
  - Use the **load_qa_chain** function from the LangChain library to build the chatbot.

  - The chatbot interprets user queries, retrieves relevant information from **ChromaDB**, and generates responses accordingly.



### Install Libraries

In [1]:
!pip install -qU langchain-google-community

In [2]:
!pip install -qU langchain-community

In [3]:
!pip install -qU langchain-openai

In [4]:
!pip install -qU langchain-chroma

In [5]:
!pip install -qU pypdfium2

### Access Google Drive

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Entering Your OpenAI or Google Gemini API Key

In [11]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY']=userdata.get('OPENAI_API_KEY')

### Loading PDF Document

In [8]:
# create a PDF reader function
from langchain_community.document_loaders import PyPDFium2Loader

def read_doc(file_path):
    file_loader=PyPDFium2Loader(file_path)
    pdf_documents=file_loader.load() # returns one Document per PDF page
    return pdf_documents

In [9]:
pdf=read_doc('/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/N19-1423.pdf')
len(pdf)



16

### Document Splitter

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


def chunk_data(docs, chunk_size=1000, chunk_overlap=200):
    # split each page into ~chunk_size-character chunks that overlap by chunk_overlap characters
    text_splitter=RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                                 chunk_overlap=chunk_overlap)
    chunks=text_splitter.split_documents(docs)
    return chunks


In [11]:
pdf_doc=chunk_data(docs=pdf)
len(pdf_doc)

84
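To make `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size character windows that overlap. It is a simplification for illustration, not LangChain's implementation: `RecursiveCharacterTextSplitter` additionally tries to cut at paragraph, line and word boundaries, so its chunk boundaries (and the 84 chunks above) will differ. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: RecursiveCharacterTextSplitter also
    prefers to cut at separators such as blank lines, newlines and spaces.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


sample = "abcdefghij" * 30  # 300 characters of toy text
pieces = sliding_window_chunks(sample, chunk_size=100, chunk_overlap=20)
print(len(pieces))                         # 4
print(pieces[0][-20:] == pieces[1][:20])   # True: neighbours share 20 characters
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, at the cost of storing some text twice.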

### 1. Creating an Embedding Model
### 2. Converting Each Chunk of the Split Document to Embedding Vectors
### 3. Storing the Embedding Vectors in a Vectorstore
### 4. Saving the Vectorstore to Your Drive

In [12]:
from langchain_openai import OpenAIEmbeddings

embeddings=OpenAIEmbeddings(model="text-embedding-3-large",
                            dimensions=3072) # 3072 is the model's full size; smaller values such as 256 or 1024 shorten the vectors
embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7bd12c1e5d80>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7bd12c20af50>, model='text-embedding-3-large', dimensions=3072, deployment='text-embedding-ada-002', openai_api_version='', openai_api_base=None, openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)
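The `dimensions` argument trades vector size for some retrieval quality. As we understand OpenAI's embedding guide, a text-embedding-3 vector can also be shortened after the fact by keeping its first components and re-normalizing to unit length. The sketch below shows only that arithmetic on a made-up 4-dimensional vector; `shorten_embedding` is a hypothetical helper and no API is called.

```python
import math


def shorten_embedding(vector, dims):
    """Keep the first `dims` components and rescale them to unit (L2) length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in head]


full = [0.6, 0.0, 0.8, 0.0]  # toy unit vector, not a real embedding
short = shorten_embedding(full, 2)
print(short)  # [1.0, 0.0]
```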

In [16]:
from langchain_chroma import Chroma

index=Chroma.from_documents(documents=pdf_doc,
                            embedding=embeddings,
                            persist_directory="/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/vectorstore") # persist_directory: where Chroma writes the vectorstore on disk

retriever=index.as_retriever()

In [17]:
retriever=index.as_retriever(search_kwargs={"k": 5})

In [18]:
retriever.invoke("What is BERT?")

[Document(metadata={'page': 0, 'source': '/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/N19-1423.pdf'}, page_content='to create state-of-the-art models for a wide\r\nrange of tasks, such as question answering and\r\nlanguage inference, without substantial task\x02specific architecture modifications.\r\nBERT is conceptually simple and empirically\r\npowerful. It obtains new state-of-the-art re\x02sults on eleven natural language processing\r\ntasks, including pushing the GLUE score to\r\n80.5% (7.7% point absolute improvement),\r\nMultiNLI accuracy to 86.7% (4.6% absolute\r\nimprovement), SQuAD v1.1 question answer\x02ing Test F1 to 93.2 (1.5 point absolute im\x02provement) and SQuAD v2.0 Test F1 to 83.1\r\n(5.1 point absolute improvement).\r\n1 Introduction\r\nLanguage model pre-training has been shown to\r\nbe effective for improving many natural language\r\nprocessing tasks (Dai and Le, 2015; Peters et al.,\r\n2018a; Radford et al., 2018; Howard and Ruder,\r\n2018). These inclu
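Conceptually, `as_retriever(search_kwargs={"k": 5})` embeds the query and returns the five stored chunks closest to it. The sketch below ranks a toy in-memory store by cosine similarity; the 2-dimensional vectors and the helper names are made up for illustration. Chroma's default distance metric is, to our knowledge, L2 rather than cosine, but for unit-length vectors the two give the same ranking because ||a - b||^2 = 2 - 2 cos(a, b).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, store, k=5):
    """Return the k (text, score) pairs whose vectors are most similar to query_vec."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones come from the embedding model.
store = [
    ("BERT overview", [1.0, 0.0]),
    ("GLUE results", [0.0, 1.0]),
    ("Fine-tuning cost", [0.7, 0.7]),
]
print([text for text, _ in top_k([0.9, 0.1], store, k=2)])
# ['BERT overview', 'Fine-tuning cost']
```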

### Load Vectorstore(index) From Your Drive

In [19]:
loaded_index=Chroma(persist_directory="/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/vectorstore",
                    embedding_function=embeddings)
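Roughly speaking, the persisted vectorstore holds the chunk texts, their metadata and their vectors; new queries still have to be embedded with the same model, which is why `embedding_function=embeddings` is passed again when reloading. The toy sketch below mimics that save/reload round trip with a JSON file in a temporary directory. Chroma's real on-disk format is different, and `save_store`/`load_store` are hypothetical helpers.

```python
import json
import os
import tempfile


def save_store(store, directory):
    """Write (text, vector) pairs to <directory>/store.json."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "store.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"text": t, "vector": v} for t, v in store], f)
    return path


def load_store(directory):
    """Read the (text, vector) pairs back from <directory>/store.json."""
    with open(os.path.join(directory, "store.json"), encoding="utf-8") as f:
        return [(row["text"], row["vector"]) for row in json.load(f)]


store = [("BERT overview", [1.0, 0.0]), ("GLUE results", [0.0, 1.0])]
with tempfile.TemporaryDirectory() as tmp:
    save_store(store, tmp)
    reloaded = load_store(tmp)
print(reloaded == store)  # True
```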

### Retrieving the 5 Chunks Most Similar to the User Query from the Document

In [23]:
def retrieve_query(query,k=5):
    retriever=index.as_retriever(search_kwargs={"k": k}) # use loaded_index instead of index to query the copy reloaded from Drive
    return retriever.invoke(query)


In [25]:
our_query = "What is fine-tuning?"

doc_search=retrieve_query(our_query, k=5) # the five most similar chunks are returned
doc_search

[Document(metadata={'page': 4, 'source': '/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/N19-1423.pdf'}, page_content='answering, and the [CLS] representation is fed\r\ninto an output layer for classification, such as en\x02tailment or sentiment analysis.\r\nCompared to pre-training, fine-tuning is rela\x02tively inexpensive. All of the results in the pa\x02per can be replicated in at most 1 hour on a sin\x02gle Cloud TPU, or a few hours on a GPU, starting\r\nfrom the exact same pre-trained model.7 We de\x02scribe the task-specific details in the correspond\x02ing subsections of Section 4. More details can be\r\nfound in Appendix A.5.\r\n4 Experiments\r\nIn this section, we present BERT fine-tuning re\x02sults on 11 NLP tasks.\r\n4.1 GLUE\r\nThe General Language Understanding Evaluation\r\n(GLUE) benchmark (Wang et al., 2018a) is a col\x02lection of diverse natural language understanding\r\ntasks. Detailed descriptions of GLUE datasets are\r\nincluded in Appendix B.1.\r\nTo fine-t

### Generating an Answer Based on The Similar Chunks

In [26]:
from langchain.prompts import PromptTemplate

template="""Use the following pieces of context to answer the user's question of {question}.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}"""

prompt_template = PromptTemplate(
    input_variables =['question','context'],
    template = template
)
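Passing `doc_search` (a list of `Document` objects) straight into `{context}` works, but the prompt then contains the Python repr of each object, including metadata and escaped `\r\n` sequences. A common refinement, sketched below under the assumption that only `page_content` and `metadata` are needed, is to join the chunk texts first. `Doc` is a stand-in dataclass so the example runs without LangChain, and `format_docs` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document: page_content plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """Use the following pieces of context to answer the user's question of {question}.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}"""


def format_docs(docs):
    """Join chunk texts with blank lines, tagging each with its page number."""
    return "\n\n".join(
        f"[page {d.metadata.get('page', '?')}] {d.page_content}" for d in docs
    )


docs = [Doc("Fine-tuning is relatively inexpensive.", {"page": 4}),
        Doc("BERT is conceptually simple.", {"page": 0})]
filled_prompt = TEMPLATE.format(question="What is fine-tuning?", context=format_docs(docs))
print(filled_prompt)
```

Since `format_docs` only reads `page_content` and `metadata`, it should also accept LangChain `Document` objects, e.g. `chain.invoke({"question": our_query, "context": format_docs(doc_search)})`.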

In [27]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm=ChatOpenAI(model_name="gpt-4o-mini",
               temperature=0,
               top_p=1)

chain = prompt_template | llm | StrOutputParser()

output= chain.invoke({"question":our_query, "context":doc_search}) # the five retrieved chunks are passed as context
output

"Fine-tuning is a process in machine learning where a pre-trained model is further trained on a specific task or dataset to improve its performance on that task. It is relatively inexpensive compared to pre-training and typically involves adjusting a few parameters, such as the classification layer weights, while keeping most of the model's hyperparameters the same as in pre-training. Fine-tuning allows the model to adapt to the nuances of the new task while leveraging the knowledge it gained during pre-training."

In [28]:
from IPython.display import Markdown

Markdown(output)

Fine-tuning is a process in machine learning where a pre-trained model is further trained on a specific task or dataset to improve its performance on that task. It is relatively inexpensive compared to pre-training and typically involves adjusting a few parameters, such as the classification layer weights, while keeping most of the model's hyperparameters the same as in pre-training. Fine-tuning allows the model to adapt to the nuances of the new task while leveraging the knowledge it gained during pre-training.

### RAG Pipeline (optionally, you can use the gemini-1.5-pro model instead)

In [29]:
def get_answers(query):
    from langchain_openai import ChatOpenAI
    from langchain_core.output_parsers import StrOutputParser
    from langchain.prompts import PromptTemplate
    from IPython.display import Markdown

    doc_search=retrieve_query(query) # the k=5 most similar chunks are returned

    # keep the template flush-left so no indentation leaks into the prompt
    template="""Use the following pieces of context to answer the user's question of {question}.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}"""

    prompt_template = PromptTemplate(
        input_variables=['question','context'],
        template=template)

    llm=ChatOpenAI(model_name="gpt-4o-mini",
                   temperature=0,
                   top_p=1)

    chain = prompt_template | llm | StrOutputParser()

    output= chain.invoke({"question":query, "context":doc_search}) # answer generated from the retrieved chunks
    return Markdown(output)
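`prompt_template | llm | StrOutputParser()` composes three steps so that each one's output feeds the next. The toy `Step` class below imitates that left-to-right composition with plain functions; it is not how LangChain's Runnables are implemented, and the prompt, LLM and parser stages are hypothetical stubs so no API call is made.

```python
class Step:
    """Toy pipeline stage that wraps a function and supports `a | b` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose left to right: self's output becomes other's input.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


# Hypothetical stand-ins for the prompt template, the LLM and the output parser.
toy_prompt = Step(lambda d: f"Question: {d['question']}\nContext: {d['context']}")
fake_llm = Step(lambda text: {"content": f"(answer drafted from {len(text)} prompt characters)"})
to_text = Step(lambda message: message["content"])

toy_chain = toy_prompt | fake_llm | to_text
inputs = {"question": "What is BERT?", "context": "BERT is a Transformer encoder."}
print(toy_chain.invoke(inputs))
```

One design note: `get_answers` rebuilds the prompt, the LLM client and the chain on every call; building them once outside the function would avoid that repeated setup without changing the answers.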

In [30]:
our_query = "What is BERT?"
answer = get_answers(our_query)
answer

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation model introduced by Google AI Language. It is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. This allows BERT to achieve state-of-the-art results on various natural language processing tasks, such as question answering and language inference, without requiring substantial task-specific architecture modifications. BERT uses a "masked language model" pre-training objective, which involves randomly masking some tokens in the input and predicting the original tokens. Its architecture is based on a multi-layer bidirectional Transformer encoder.

In [31]:
our_query = "What is fine-tuning?"
answer = get_answers(our_query)
answer

Fine-tuning is a process in machine learning where a pre-trained model is further trained on a specific task or dataset. This involves adjusting the model's parameters to improve its performance on that particular task, often by adding a task-specific output layer. Fine-tuning is generally less resource-intensive than the initial pre-training phase and can be completed relatively quickly, allowing the model to adapt to new tasks while leveraging the knowledge it gained during pre-training.

## Project 2: Generating PDF Document Summaries

In this project, you will explore various methods for creating summaries of the provided PDF document, experimenting with the different summarization chain types offered by the LangChain library.

### **Project Steps:**
- **1. PDF Document Upload and Chunking:** As in the first project, upload the PDF document and divide it into smaller chunks. Consider splitting it by half-page or by page.

- **2. Summarization Techniques:**

  - **Summary of the First 5 Pages (Stuff Chain):** Use the load_summarize_chain function with chain_type="stuff" to generate a concise summary of the first 5 pages of the PDF document.

  - **Short Summary of the Entire Document (Map Reduce and Refine Chains):** Use chain_type="map_reduce" and chain_type="refine" to create a brief summary of the entire document. Map reduce summarizes each chunk independently and then combines those partial summaries into a final summary; refine summarizes the first chunk and then updates that running summary with each subsequent chunk.

  - **Detailed Summary with Bullet Points (Map Reduce Chain):** Use chain_type="map_reduce" to generate a detailed summary of at least 1000 tokens. Instruct the LLM to "Summarize with 1000 tokens" and set the max_tokens parameter to a value greater than 1000. Add a title to the summary and present the key points as bullet points.
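Because `max_tokens` caps the length of the model's reply, a target of "at least 1000 tokens" with `max_tokens` only slightly above 1000 leaves little headroom, and a reply that hits the cap is cut off. The sketch below checks a summary against both limits using the rough rule of thumb of about 4 characters per English token; the heuristic and the helper names are assumptions, and an exact count would need the model's tokenizer (for example via `tiktoken`).

```python
def approx_tokens(text, chars_per_token=4.0):
    """Rough token estimate: about 4 characters per token for English text (heuristic)."""
    return round(len(text) / chars_per_token)


def check_length(summary, min_tokens=1000, max_tokens=1024):
    """Return (ok, estimate): ok if the estimate reaches min_tokens but stays below the cap.

    An estimate at or above max_tokens suggests the reply may have been truncated.
    """
    estimate = approx_tokens(summary)
    return min_tokens <= estimate < max_tokens, estimate


draft = "word " * 900  # 4,500 characters of placeholder text
print(check_length(draft))                   # (False, 1125): estimate exceeds the 1024 cap
print(check_length(draft, max_tokens=2048))  # (True, 1125)
```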

### Important Notes:

- Models such as GPT-4 and Gemini Pro may follow token-count instructions more reliably when generating summaries, so consider prioritizing them.

- For comprehensive information on LangChain and the LLMs, refer to their respective documentation.

Best of luck!

### Install Libraries

In [2]:
!pip install -qU langchain-openai

In [3]:
!pip install -qU langchain-community

In [4]:
!pip install -qU pypdfium2

### Loading PDF Document

In [5]:
from langchain_community.document_loaders import PyPDFium2Loader

def read_doc(file_path):
    file_loader=PyPDFium2Loader(file_path)
    pdf_documents=file_loader.load() # returns one Document per PDF page
    return pdf_documents

In [9]:
pdf=read_doc('/content/drive/MyDrive/Colab Notebooks/Capstone_NLP/N19-1423.pdf')
len(pdf)



16

### Summarizing the First 5 Pages of the Document with chain_type='stuff'

In [12]:
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(temperature=0,
                 model_name='gpt-4o-mini',
                 max_tokens=1024)

In [13]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff'
)
output_summary = chain.invoke(pdf[0:5])['output_text']

In [14]:
from IPython.display import Markdown
Markdown(output_summary)

The paper introduces BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model developed by Google AI Language. BERT pre-trains deep bidirectional representations from unlabeled text by jointly considering both left and right contexts, overcoming limitations of previous unidirectional models. It employs a masked language model (MLM) and a next sentence prediction (NSP) task during pre-training, allowing it to achieve state-of-the-art results on eleven natural language processing tasks, including question answering and language inference. BERT's architecture is unified across tasks, requiring minimal task-specific modifications during fine-tuning. The model demonstrates significant improvements over existing approaches, achieving notable performance metrics on benchmarks like GLUE and SQuAD. The code and pre-trained models are publicly available for further research and application.
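The "stuff" strategy simply concatenates all selected pages into one prompt and makes a single LLM call, so it only works while the combined text fits in the model's context window. The sketch below builds such a prompt, modelled on LangChain's default "Write a concise summary of the following" prompt with pages joined by blank lines, and performs a rough fit check. The 128,000-token window (gpt-4o-mini's advertised context length, to our knowledge) and the 4-characters-per-token estimate are assumptions.

```python
def stuff_prompt(pages):
    """'Stuff' strategy: place every page into one prompt for a single LLM call."""
    joined = "\n\n".join(pages)
    return f'Write a concise summary of the following:\n\n\n"{joined}"\n\n\nCONCISE SUMMARY:'


def fits_context(prompt, context_tokens=128_000, reserved_output=1024, chars_per_token=4.0):
    """Rough check that the prompt plus the reserved reply fit into the context window."""
    return len(prompt) / chars_per_token + reserved_output <= context_tokens


pages = ["Page one text.", "Page two text."]
stuffed = stuff_prompt(pages)
print(fits_context(stuffed))          # True
print(fits_context("x" * 600_000))    # False: ~150,000 estimated tokens is too long
```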

### Document Splitter

In [25]:
from langchain.text_splitter import RecursiveCharacterTextSplitter


def chunk_data(docs, chunk_size=10000, chunk_overlap=200):
    text_splitter=RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                                 chunk_overlap=chunk_overlap)
    chunks=text_splitter.split_documents(docs)
    return chunks

In [26]:
chunks=chunk_data(docs=pdf)

In [27]:
len(chunks)

16
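`split_documents` splits each page on its own, so any page shorter than `chunk_size=10000` characters comes back as a single chunk; that is consistent with the 16 pages above yielding 16 chunks. The sketch below estimates the count under that assumption, treating longer pages as overlapping fixed windows (a simplification of the separator-aware splitter). The page length of 5,500 characters is a made-up placeholder, not measured from the PDF.

```python
import math


def estimated_chunks(page_lengths, chunk_size=10_000, chunk_overlap=200):
    """Approximate the chunk count when every page is split on its own.

    Pages no longer than chunk_size stay whole; longer pages are treated as
    overlapping fixed windows (a simplification of RecursiveCharacterTextSplitter).
    """
    step = chunk_size - chunk_overlap
    total = 0
    for n in page_lengths:
        if n <= chunk_size:
            total += 1
        else:
            # smallest k with (k - 1) * step + chunk_size >= n
            total += math.ceil((n - chunk_overlap) / step)
    return total


hypothetical_page_lengths = [5_500] * 16  # placeholder lengths, not measured from the PDF
print(estimated_chunks(hypothetical_page_lengths))  # 16
```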

### Making a Brief Summary of the Entire Document with chain_type "map_reduce" and "refine"

#### map_reduce

In [28]:
llm = ChatOpenAI(temperature=0,
                 model_name='gpt-4o-mini',
                 max_tokens=1024)

In [29]:
chain = load_summarize_chain(llm,
                             chain_type="map_reduce")


output_summary = chain.invoke(chunks)["output_text"]
Markdown(output_summary)

The paper introduces BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model that pre-trains deep bidirectional representations from unlabeled text by conditioning on both left and right context in all layers. BERT can be fine-tuned with an additional output layer to achieve state-of-the-art results across various NLP tasks, such as question answering and language inference, without significant task-specific modifications. BERT's effectiveness is demonstrated by achieving new benchmarks on eleven NLP tasks, including substantial improvements in GLUE score, MultiNLI accuracy, and SQuAD question answering metrics. The model employs a masked language model pre-training objective, enhancing its ability to incorporate context from both directions.

BERT's architecture is a multi-layer bidirectional Transformer encoder, which allows it to consider context from both directions. It uses special tokens ([CLS] and [SEP]) and WordPiece embeddings with a 30,000 token vocabulary. Pre-training involves masked language modeling and next sentence prediction tasks, while fine-tuning adapts the pre-trained model to specific tasks using labeled data. BERT comes in two sizes, BERTBASE and BERTLARGE, differing in the number of layers, hidden size, and attention heads.

The paper highlights BERT's superior performance on various benchmarks, including GLUE, SQuAD, and SWAG, significantly outperforming previous models like OpenAI GPT and ELMo. The study underscores the importance of deep bidirectionality and sufficient pre-training for achieving high performance in NLP tasks. BERT's code and pre-trained models are available online, facilitating further research and application in the field.
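The map_reduce chain first summarizes every chunk independently (the "map" step, which can run in parallel) and then summarizes the concatenated partial summaries (the "reduce" step). The sketch below reproduces that control flow with a hypothetical `fake_summarize` stub in place of the LLM, so it makes len(chunks) + 1 calls. LangChain's reduce step can also collapse the partial summaries in several rounds when they are too long, which the sketch omits.

```python
def map_reduce_summarize(chunks, summarize):
    """Map: summarize each chunk on its own. Reduce: summarize the joined partial summaries."""
    partials = [summarize(chunk) for chunk in chunks]  # independent calls, parallelisable
    return summarize("\n\n".join(partials))            # one final call over all partials


calls = []


def fake_summarize(text):
    """Hypothetical stand-in for an LLM call: records the call and keeps the first line."""
    calls.append(text)
    return text.splitlines()[0]


toy_chunks = ["BERT is bidirectional.\nDetails follow.",
              "BERT tops GLUE.\nMore details."]
result = map_reduce_summarize(toy_chunks, fake_summarize)
print(result)      # BERT is bidirectional.
print(len(calls))  # 3 = 2 map calls + 1 reduce call
```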

#### refine

In [30]:
chain = load_summarize_chain(llm,
                             chain_type="refine")

output_summary = chain.invoke(chunks)["output_text"]
Markdown(output_summary)

The paper introduces BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model designed to pre-train deep bidirectional representations from unlabeled text by conditioning on both left and right context in all layers. Unlike previous models, BERT can be fine-tuned with minimal task-specific modifications to achieve state-of-the-art results across various natural language processing tasks, including question answering and language inference. BERT's effectiveness is demonstrated by its superior performance on eleven NLP tasks, significantly improving benchmarks like the GLUE score, MultiNLI accuracy, and SQuAD question answering tests. The model addresses limitations of prior unidirectional approaches by employing a masked language model (MLM) pre-training objective, enhancing its ability to incorporate context from both directions. Additionally, BERT uses a "next sentence prediction" task to jointly pre-train text-pair representations, further improving its performance. The paper highlights the importance of bidirectional pre-training and shows that BERT reduces the need for heavily-engineered task-specific architectures, setting new standards in NLP. The code and pre-trained models are available at https://github.com/google-research/bert.

BERT's architecture is a multi-layer bidirectional Transformer encoder, based on the original implementation described by Vaswani et al. (2017). The model comes in two sizes: BERTBASE with 12 layers, 768 hidden units, and 12 self-attention heads, totaling 110 million parameters, and BERTLARGE with 24 layers, 1024 hidden units, and 16 self-attention heads, totaling 340 million parameters. The bidirectional self-attention mechanism in BERT allows it to attend to both left and right context, unlike the unidirectional self-attention used in models like OpenAI's GPT. This unified architecture across different tasks minimizes the differences between the pre-trained and fine-tuned models, making BERT highly versatile and effective for a wide range of NLP applications.

To handle a variety of downstream tasks, BERT's input representation can unambiguously represent both single sentences and pairs of sentences in one token sequence. The model uses WordPiece embeddings with a 30,000 token vocabulary. Each sequence starts with a special classification token ([CLS]), and sentence pairs are separated by a special token ([SEP]). A learned embedding is added to each token to indicate sentence membership. BERT is pre-trained using two unsupervised tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM involves masking a percentage of input tokens at random and predicting them, while NSP involves predicting whether one sentence follows another in a text. These tasks enable BERT to understand context and relationships between sentences, enhancing its performance on tasks like QA and NLI.

The pre-training data for BERT includes the BooksCorpus (800M words) and English Wikipedia (2,500M words), focusing on document-level corpora to extract long contiguous sequences. Fine-tuning BERT is straightforward due to the self-attention mechanism, which allows the model to handle various downstream tasks by simply swapping out the appropriate inputs and outputs. For text pair tasks, BERT uses self-attention to encode concatenated text pairs, effectively including bidirectional cross attention. Fine-tuning is relatively inexpensive, with all results replicable in at most 1 hour on a single Cloud TPU or a few hours on a GPU. The paper presents fine-tuning results on 11 NLP tasks, including the GLUE benchmark, demonstrating BERT's superior performance.

BERTBASE and BERTLARGE outperform previous models on all GLUE tasks by a substantial margin, with BERTLARGE achieving an average accuracy improvement of 7.0% over the prior state of the art. BERT also excels in the SQuAD v1.1 question answering task, outperforming top leaderboard systems. Fine-tuning BERT on small datasets can be unstable, but using random restarts and data shuffling helps achieve the best performance. Overall, BERT sets new benchmarks in NLP, demonstrating the effectiveness of bidirectional pre-training and the versatility of its architecture.

BERT's performance on the SQuAD 1.1 and SQuAD 2.0 datasets further underscores its capabilities. In SQuAD 1.1, BERTLARGE (Single) achieves an F1 score of 90.9, while BERTLARGE (Ensemble) reaches 91.8, surpassing previous top systems. For SQuAD 2.0, BERTLARGE (Single) achieves an F1 score of 83.1, outperforming the previous best system by 5.1 F1 points. Additionally, BERTLARGE outperforms other models on the SWAG dataset, achieving an accuracy of 86.3 on the test set, which is 8.3% higher than OpenAI GPT. These results highlight BERT's robustness and adaptability across different NLP tasks.

Ablation studies reveal the importance of BERT's pre-training tasks. Removing the Next Sentence Prediction task
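The refine chain works sequentially: it summarizes the first chunk, then repeatedly asks the LLM to update the running summary with the next chunk. The sketch below shows that loop with hypothetical stub functions instead of LLM calls. Because every chunk gets a chance to extend the running summary, this is one plausible reason the refine output above is much longer than the map_reduce one.

```python
def refine_summarize(chunks, summarize, refine):
    """Summarize the first chunk, then update the running summary with each later chunk."""
    summary = summarize(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)  # sequential: each call needs the previous summary
    return summary


def first_line(text):
    """Hypothetical stand-in for the initial LLM summary."""
    return text.splitlines()[0]


def append_first_line(summary, chunk):
    """Hypothetical stand-in for the LLM refine step: extend the running summary."""
    return summary + " " + first_line(chunk)


toy_chunks = ["BERT is bidirectional.\nDetails follow.",
              "BERT tops GLUE.\nMore details.",
              "Ablations matter.\nEven more details."]
refined = refine_summarize(toy_chunks, first_line, append_first_line)
print(refined)  # BERT is bidirectional. BERT tops GLUE. Ablations matter.
```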

### Generating a Detailed Summary of the Entire Document (At Least 1000 Tokens) with a Title and Bullet-Point Key Points, Using chain_type "map_reduce"

In [31]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce'
)
chain

MapReduceDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['text'], template='Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7b079c23b520>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7b079c273550>, root_client=<openai.OpenAI object at 0x7b079c23a620>, root_async_client=<openai.AsyncOpenAI object at 0x7b079c2397e0>, model_name='gpt-4o', temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='', max_tokens=1024)), reduce_documents_chain=ReduceDocumentsChain(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['text'], template='Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7b079c23b520>, async_client=<openai.resources.chat.completions.AsyncCo

In [35]:
from langchain.prompts import PromptTemplate

chunks_prompt="""
Please summarize the below text:
text:'{text}'
summary:
"""
map_prompt_template=PromptTemplate(input_variables=['text'],
                                   template=chunks_prompt)

In [36]:
from langchain.prompts import PromptTemplate

final_combine_prompt="""
Provide a final summary of the entire text with at least 1000 Tokens with important points.
Add a Generic  Title,
Start the precise summary with an introduction and provide Key Points Using Bullet Points
text: '{text}'
summary:
"""
final_combine_prompt_template=PromptTemplate(input_variables=['text'],
                                             template=final_combine_prompt)
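Before spending API calls, it can help to preview what the two custom prompts look like once filled in. The sketch below uses abbreviated copies of the notebook's templates with Python's `str.format` (`PromptTemplate.format(text=...)` fills `{text}` the same way) and assumes, as in LangChain's default stuff-combine step, that the partial summaries are joined with blank lines before being inserted into the combine prompt. The page texts and partial summaries are made up.

```python
# Abbreviated copies of the notebook's map and combine prompts (same {text} placeholder).
map_template = "Please summarize the below text:\ntext:'{text}'\nsummary:"
combine_template = ("Provide a final summary of the entire text with at least 1000 Tokens "
                    "with important points.\ntext: '{text}'\nsummary:")

toy_pages = ["BERT uses masked language modelling.", "BERT is fine-tuned per task."]
map_prompts = [map_template.format(text=page) for page in toy_pages]

# Made-up partial summaries standing in for the map step's LLM replies.
partial_summaries = ["MLM pre-training.", "Task-specific fine-tuning."]
combine_preview = combine_template.format(text="\n\n".join(partial_summaries))

print(map_prompts[0])
print(combine_preview)
```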

In [37]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=final_combine_prompt_template
)
chain

MapReduceDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['text'], template="\nPlease summarize the below text:\ntext:'{text}'\nsummary:\n"), llm=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7b079c23b520>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7b079c273550>, root_client=<openai.OpenAI object at 0x7b079c23a620>, root_async_client=<openai.AsyncOpenAI object at 0x7b079c2397e0>, model_name='gpt-4o', temperature=0.0, openai_api_key=SecretStr('**********'), openai_proxy='', max_tokens=1024)), reduce_documents_chain=ReduceDocumentsChain(combine_documents_chain=StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['text'], template="\nProvide a final summary of the entire text with at least 1000 Tokens with important points.\nAdd a Generic  Title,\nStart the precise summary with an introduction and provide Key Points Using Bullet Points\ntext: '{text}'\nsummary:\n"), llm=ChatO

In [38]:
output_summary = chain.invoke(chunks)["output_text"]
output_summary

"# Comprehensive Overview of BERT: Revolutionizing Language Representation Models\n\n## Introduction\nBERT (Bidirectional Encoder Representations from Transformers) is a transformative language representation model developed by Google AI Language. It has set new benchmarks in natural language processing (NLP) by leveraging bidirectional context in all layers, allowing it to achieve state-of-the-art results across various NLP tasks. This summary provides an in-depth overview of BERT's architecture, pre-training and fine-tuning processes, performance on benchmarks, and comparisons with other models. Additionally, it delves into the intricacies of BERT's pre-training process, examining the effects of the number of training steps and various masking procedures.\n\n## Key Points\n\n### BERT's Innovative Approach\n- **Bidirectional Context**: BERT captures context from both directions using a masked language model (MLM) objective, unlike previous models that used unidirectional or shallow bi

In [39]:
from IPython.display import Markdown

Markdown(output_summary)

# Comprehensive Overview of BERT: Revolutionizing Language Representation Models

## Introduction
BERT (Bidirectional Encoder Representations from Transformers) is a transformative language representation model developed by Google AI Language. It has set new benchmarks in natural language processing (NLP) by leveraging bidirectional context in all layers, allowing it to achieve state-of-the-art results across various NLP tasks. This summary provides an in-depth overview of BERT's architecture, pre-training and fine-tuning processes, performance on benchmarks, and comparisons with other models. Additionally, it delves into the intricacies of BERT's pre-training process, examining the effects of the number of training steps and various masking procedures.

## Key Points

### BERT's Innovative Approach
- **Bidirectional Context**: BERT captures context from both directions using a masked language model (MLM) objective, unlike previous models that used unidirectional or shallow bidirectional methods.
- **Next Sentence Prediction (NSP)**: BERT uses NSP to pre-train text-pair representations, enhancing its ability to understand sentence relationships and context.
- **Unified Architecture**: BERT employs a multi-layer bidirectional Transformer encoder, pre-trained on unlabeled data and fine-tuned on labeled data for various downstream tasks.
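NSP training data consists of sentence pairs labeled `IsNext` or `NotNext`. In the paper, when sentences A and B are chosen for an example, 50% of the time B is the actual next sentence and 50% of the time it is a random sentence from the corpus. The following is a minimal sketch that assumes a tiny in-memory list of consecutive sentences. `make_nsp_example` is a hypothetical helper, not part of any library, and a real pipeline would draw the random sentence from a different document.

```python
import random
from typing import List, Tuple

def make_nsp_example(sentences: List[str], idx: int, rng: random.Random) -> Tuple[str, str, str]:
    """Build one (A, B, label) pair for Next Sentence Prediction; requires idx < len(sentences) - 1."""
    sentence_a = sentences[idx]
    if rng.random() < 0.5:
        # Positive example: B really follows A.
        return sentence_a, sentences[idx + 1], "IsNext"
    # Negative example: B is a random sentence that is neither A nor its true successor.
    candidates = [s for j, s in enumerate(sentences) if j not in (idx, idx + 1)]
    return sentence_a, rng.choice(candidates), "NotNext"

corpus = [
    "the man went to the store .",
    "he bought a gallon of milk .",
    "penguins are flightless birds .",
    "they live in the southern hemisphere .",
]
rng = random.Random(0)
for i in range(3):
    print(make_nsp_example(corpus, i, rng))
```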

### Pre-Training and Fine-Tuning
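- **Input Representations**: BERT can represent both a single sentence and a pair of sentences in one token sequence, using WordPiece embeddings with a 30,000-token vocabulary. The special token [CLS] starts every sequence and its final hidden state is used for classification, while [SEP] separates the two sentences. A learned segment embedding also marks whether each token belongs to sentence A or sentence B.
- **Pre-Training Tasks**:
  - **Masked Language Modeling (MLM)**: 15% of the WordPiece tokens are chosen at random and the model predicts the original tokens. A chosen token is replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time. This lets the model use both left and right context.
  - **Next Sentence Prediction (NSP)**: Given sentences A and B, the model predicts whether B actually follows A (50% of the time it does; 50% of the time B is a random sentence from the corpus). This benefits tasks such as Question Answering and Natural Language Inference.
- **Fine-Tuning**: Task-specific inputs and outputs are plugged into BERT and all parameters are fine-tuned end-to-end, so the pre-trained architecture and the final downstream architecture differ only minimally. Fine-tuning is inexpensive: every result in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU.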
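The input format and the MLM corruption rule can be sketched together. The paper packs a pair as `[CLS] A [SEP] B [SEP]` with segment ids 0 for A and 1 for B. It then selects about 15% of the token positions for prediction and applies the 80/10/10 replacement rule to each selected position. In this sketch, whitespace splitting stands in for WordPiece, and the tiny `VOCAB` list is a hypothetical placeholder. `pack_pair` and `mask_tokens` are illustrative helpers, not library functions.

```python
import random
from typing import List, Optional, Tuple

VOCAB = ["the", "man", "went", "to", "store", "he", "bought", "milk"]  # hypothetical toy vocabulary

def pack_pair(a: List[str], b: List[str]) -> Tuple[List[str], List[int]]:
    """Build [CLS] A [SEP] B [SEP] and the matching segment ids (0 = sentence A, 1 = sentence B)."""
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids

def mask_tokens(tokens: List[str], rng: random.Random, mask_prob: float = 0.15):
    """Apply the 80/10/10 MLM rule; labels hold the original token at each selected position."""
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    # Special tokens are never selected for prediction.
    candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
    num_to_mask = max(1, round(len(candidates) * mask_prob))
    for i in rng.sample(candidates, num_to_mask):
        labels[i] = tokens[i]  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"  # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(VOCAB)  # 10%: replace with a random token
        # Remaining 10%: keep the original token unchanged.
    return corrupted, labels

tokens, segments = pack_pair("the man went to the store".split(), "he bought milk".split())
corrupted, labels = mask_tokens(tokens, random.Random(42))
print(tokens)
print(segments)
print(corrupted)
```

Keeping 10% of the selected tokens unchanged (and randomizing another 10%) reduces the mismatch between pre-training, where [MASK] appears, and fine-tuning, where it never does.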

### Performance on Benchmarks
- **GLUE Benchmark**:
  - BERT<sub>BASE</sub> and BERT<sub>LARGE</sub> outperform all previous systems on all tasks by a substantial margin.
  - BERT<sub>BASE</sub> and BERT<sub>LARGE</sub> obtain average accuracy improvements of 4.5% and 7.0%, respectively, over the prior state of the art. BERT<sub>LARGE</sub> significantly outperforms BERT<sub>BASE</sub>, especially on tasks with very little training data.
  - On MNLI, the largest GLUE task, BERT achieves a 4.6% absolute accuracy improvement.
  - BERT<sub>LARGE</sub> scores 80.5 on the official GLUE leaderboard, compared with 72.8 for OpenAI GPT.
- **SQuAD v1.1**:
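  - For answer-span prediction, BERT is fine-tuned by introducing only a start vector S and an end vector E. The probability of each token being the start (or end) of the answer is a softmax over the dot products between S (or E) and the final hidden vectors of the paragraph tokens.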
  - BERT outperforms top leaderboard systems, achieving +1.5 F1 in ensembling and +1.3 F1 as a single system.
  - The single BERT model surpasses the top ensemble system in F1 score.
- **SQuAD 2.0 and SWAG**:
  - **SQuAD 2.0**: BERT<sub>LARGE</sub> (single) achieves 78.7 EM / 81.9 F1 on the Dev set and 80.0 EM / 83.1 F1 on the Test set, outperforming the previous systems that do not use BERT.
  - **SWAG**: BERT<sub>LARGE</sub> achieves 86.6% accuracy on the Dev set and 86.3% on the Test set, outperforming ESIM+ELMo by +27.1% and OpenAI GPT by +8.3%.
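The SQuAD v1.1 fine-tuning method summarized above adds only a start vector $S$ and an end vector $E$. For each final hidden vector $T_i$, the paper computes the start probability as a softmax over $S \cdot T_i$ (and likewise for $E$). It scores a candidate span from position $i$ to $j$ as $S \cdot T_i + E \cdot T_j$ and predicts the highest-scoring span with $j \ge i$. The sketch below uses tiny hand-written vectors in place of real BERT outputs. `best_span` is a hypothetical helper, and practical implementations also cap the maximum answer length.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(S: Vector, E: Vector, T: List[Vector]) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing S·T_i + E·T_j subject to j >= i."""
    start_scores = [dot(S, t) for t in T]
    end_scores = [dot(E, t) for t in T]
    best = (0, 0, float("-inf"))
    for i in range(len(T)):
        for j in range(i, len(T)):  # the end may not come before the start
            score = start_scores[i] + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy 2-d "hidden states" for a 4-token passage.
T = [[0.1, 0.0], [2.0, 0.1], [0.3, 0.2], [0.0, 1.5]]
S = [1.0, 0.0]  # start vector: responds to the first dimension
E = [0.0, 1.0]  # end vector: responds to the second dimension
start_probs = softmax([dot(S, t) for t in T])
print(start_probs)
print(best_span(S, E, T))
```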

### Impact of Pre-Training Tasks and Model Size
- **Pre-Training Tasks**:
  - Removing the NSP task hurts performance significantly on QNLI, MNLI, and SQuAD 1.1.
  - Training a left-to-right (LTR) language model without NSP significantly degrades performance, especially on MRPC and SQuAD tasks.
  - The importance of bidirectional context is highlighted for tasks like QA, where token predictions benefit from both left and right context.
- **Model Size**:
  - Larger BERT models consistently show better accuracy across various tasks, even with small datasets like MRPC.
  - BERT<sub>BASE</sub> has 110M parameters and BERT<sub>LARGE</sub> has 340M. Scaling up improves results on large-scale tasks and, provided the model has been sufficiently pre-trained, on very small-scale tasks as well.
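The parameter counts follow from the architecture sizes in the paper. BERT<sub>BASE</sub> uses L=12 layers and hidden size H=768, BERT<sub>LARGE</sub> uses L=24 and H=1024, and the feed-forward size is 4H. The number of attention heads does not change the count. The rough tally below assumes the released checkpoints' 30,522-entry WordPiece vocabulary, 512 positions, 2 segment types and a pooler layer. These are assumptions taken from the public checkpoints rather than from the summary, so treat the result as an approximation.

```python
def bert_param_count(layers: int, hidden: int, vocab: int = 30522,
                     max_positions: int = 512, type_vocab: int = 2) -> int:
    """Approximate parameter count of a BERT encoder (weights + biases)."""
    ffn = 4 * hidden  # feed-forward inner size is 4H
    # Token, position and segment embeddings plus the embedding LayerNorm (gamma, beta).
    embeddings = (vocab + max_positions + type_vocab) * hidden + 2 * hidden
    # Q, K, V and output projections, each H x H with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers, H -> 4H and 4H -> H, each with a bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer (after attention and after the feed-forward block).
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    pooler = hidden * hidden + hidden  # dense layer applied to the [CLS] vector
    return embeddings + layers * per_layer + pooler

base = bert_param_count(layers=12, hidden=768)
large = bert_param_count(layers=24, hidden=1024)
print(f"BERT-Base  ~{base / 1e6:.1f}M parameters")
print(f"BERT-Large ~{large / 1e6:.1f}M parameters")
```

This gives about 109.5M and 335.1M parameters, consistent with the rounded 110M and 340M figures quoted in the paper. Almost all of the growth comes from the per-layer weights, which scale with $L \cdot H^2$.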

### Comparisons with Other Models
- **BERT vs. OpenAI GPT and ELMo**:
  - BERT uses a bidirectional Transformer, OpenAI GPT uses a left-to-right Transformer, and ELMo uses concatenated left-to-right and right-to-left LSTMs.
  - BERT and OpenAI GPT are fine-tuning approaches, while ELMo is feature-based.
  - BERT's improvements are attributed to its bidirectionality and pre-training tasks.
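The architectural difference in this comparison comes down to which positions each token may attend to. A bidirectional Transformer encoder (BERT) lets every token attend to every other token. A left-to-right Transformer (OpenAI GPT) restricts each token to itself and the tokens on its left. As the paper notes, training a bidirectional model with a standard language-modeling objective would let each word indirectly "see itself", which is why BERT uses MLM instead. The sketch builds both visibility masks as nested lists (1 = may attend). It illustrates the constraint only and is not code from either model.

```python
from typing import List

def bidirectional_mask(n: int) -> List[List[int]]:
    """BERT-style: every position can attend to every position."""
    return [[1] * n for _ in range(n)]

def left_to_right_mask(n: int) -> List[List[int]]:
    """GPT-style: position i can attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for name, mask in [("bidirectional", bidirectional_mask(4)), ("left-to-right", left_to_right_mask(4))]:
    print(name)
    for row in mask:
        print(" ".join(map(str, row)))
```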

### Fine-Tuning and Hyperparameters
- **Hyperparameters

# No agent tools were used, so that the GPT model does not go beyond the document (that is, beyond the information contained in it).
