___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" alt="CLRSWY"></p>

___

# WELCOME

This notebook guides you through two increasingly important applications of Generative AI: Retrieval-Augmented Generation (RAG) chatbots and summarization of long documents.

Through two distinct projects, you will explore these technologies and enhance your skills. Detailed descriptions of the projects are provided below.

## Project 1: Building a Chatbot with a PDF Document (RAG)

In this project, you will develop a chatbot over a PDF document downloaded from a web page. You will use the LangChain framework together with a large language model (LLM) such as GPT or Gemini. The chatbot uses the Retrieval-Augmented Generation (RAG) technique to ground its answers in the document's content when responding to user queries.

### **Project Steps:**

- **1.PDF Document Upload:** Download the PDF document from https://aclanthology.org/N19-1423.pdf (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding) and upload it to your Colab session.

- **2.Chunking:** Divide the uploaded PDF document into smaller segments (chunks). This facilitates more efficient information processing by the LLM.

- **3.ChromaDB Setup:**
  - Save ChromaDB to your Google Drive.

  - Retrieve ChromaDB from your Drive to begin using it in your project.

  - ChromaDB serves as a vector database to store embedding vectors generated from your document.

- **4.Embedding Vectors Creation:**
  - Convert the chunked document into embedding vectors. You can use either GPT or Gemini embedding models for this purpose.

  - If you choose the Gemini embedding model, set "task_type" to "retrieval_document" when converting the chunked document.

- **5.Chatbot Development:**
  - Utilize the **load_qa_chain** function from the Langchain library to build the chatbot.

  - This function will interpret user queries, retrieve relevant information from **ChromaDB**, and generate responses accordingly.



### Install Libraries

In [1]:
!pip install -qU langchain-google-community

In [2]:
!pip install -qU langchain-community

In [3]:
!pip install -qU langchain-openai

In [4]:
!pip install -qU langchain-chroma

In [5]:
!pip install -qU pypdfium2

### Access Google Drive

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Entering Your OpenAI or Google Gemini API Key

In [7]:
import os
from google.colab import userdata
os.environ['OPENAI_API_KEY']=userdata.get('OPENAI_API_KEY')

### Loading PDF Document

In [8]:
# create a pdf reader function
from langchain_community.document_loaders import PyPDFium2Loader

def read_doc(directory):
    file_loader=PyPDFium2Loader(directory)
    pdf_documents=file_loader.load() # PyPDFium2Loader reads page by page
    return pdf_documents

In [9]:
pdf=read_doc('/content/N19-1423_BERT.pdf')
len(pdf)

# The document consists of 16 pages



16

In [10]:
pdf

[Document(metadata={'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'subject': 'N19-1 2019', 'keywords': '', 'creator': 'LaTeX with hyperref package', 'producer': 'pdfTeX-1.40.18', 'creationdate': '2019-04-29T17:36:03+00:00', 'moddate': '2019-04-29T17:36:03+00:00', 'source': '/content/N19-1423_BERT.pdf', 'total_pages': 16, 'page': 0}, page_content='Proceedings of NAACL-HLT 2019, pages 4171–4186\nMinneapolis, Minnesota, June 2 - June 7, 2019. \nc 2019 Association for Computational Linguistics\n4171\nBERT: Pre-training of Deep Bidirectional Transformers for\nLanguage Understanding\nJacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\nGoogle AI Language\n{jacobdevlin,mingweichang,kentonl,kristout}@google.com\nAbstract\nWe introduce a new language representa\x02tion model called BERT, which stands for\nBidirectional Encoder Representations from\nTransformers. Unli

### Document Splitter

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter


def chunk_data(docs, chunk_size=1000, chunk_overlap=200):
    text_splitter=RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                                 chunk_overlap=chunk_overlap)
    pdf=text_splitter.split_documents(docs)
    return pdf

In [12]:
pdf_doc=chunk_data(docs=pdf)
len(pdf_doc) # 83 chunks in total

83
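
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-window chunking with overlap. It is a simplification and an assumption for illustration only: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so its chunk boundaries will differ. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

Each window repeats the last 4 characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.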

### 1. Creating an Embedding Model
### 2. Converting Each Chunk of the Split Document to Embedding Vectors
### 3. Storing the Embedding Vectors in a Vectorstore
### 4. Saving the Vectorstore to Your Drive

In [13]:
from langchain_openai import OpenAIEmbeddings

embeddings=OpenAIEmbeddings(model="text-embedding-3-large",
                            dimensions=3072) #dimensions=256, 1024, 3072
embeddings

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x7ae8d12bcad0>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x7ae8d12d57d0>, model='text-embedding-3-large', dimensions=3072, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [14]:
text = "This is a test document."

In [15]:
doc_result = embeddings.embed_documents([text])

In [16]:
doc_result[0][:5]

[-0.014380057342350483,
 -0.027191713452339172,
 -0.020042717456817627,
 0.05730138346552849,
 -0.02226766012609005]

In [17]:
len(doc_result[0])

3072
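
The `dimensions` argument above asks the API for a shorter vector directly. As a rough illustration of what a shorter embedding means, the sketch below truncates a vector and re-normalizes it to unit length, which is the manual approach OpenAI's embeddings guide describes for the `text-embedding-3` models. The helper name `shorten_embedding` and the toy vector are assumptions for illustration, not part of the notebook.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale the result to unit length."""
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        return truncated  # nothing to rescale for an all-zero vector
    return [x / norm for x in truncated]


toy_vector = [3.0, 4.0, 12.0, 0.5]  # toy values, not a real embedding
short = shorten_embedding(toy_vector, dimensions=2)
print(short)
```

Passing `dimensions` when creating the embedding, as the notebook does, is the simpler route; manual truncation is only relevant for vectors that were already generated at full size.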

### Save and Load the Vectorstore (Index)

In [18]:
from langchain_chroma import Chroma

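# persist_directory is the folder where Chroma writes the index on disk.
# "./vectorstore" is relative to the working directory; point it to a folder
# under /content/drive/MyDrive/ to keep the index on Google Drive.
index=Chroma.from_documents(documents=pdf_doc,
                            embedding=embeddings,
                            persist_directory="./vectorstore")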

retriever=index.as_retriever()

In [19]:
retriever=index.as_retriever(search_kwargs={"k": 4})

In [20]:
loaded_index=Chroma(persist_directory="./vectorstore",
                    embedding_function=embeddings)

In [21]:
load_retriever=loaded_index.as_retriever(search_kwargs={"k": 4})

### Retrieving the 5 Chunks That Are Most Similar to The User Query from The Document

In [22]:
def retrieve_query(query,k=5):
    retriever=index.as_retriever(search_kwargs={"k": k}) #loaded_index
    return retriever.invoke(query)


In [23]:
our_query = "What are transformers?"

doc_search=retrieve_query(our_query, k=5) # the 5 most similar chunks are returned
doc_search

[Document(id='9a55ec59-047b-433d-81c0-c1d135caa4c1', metadata={'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'creationdate': '2019-04-29T17:36:03+00:00', 'creator': 'LaTeX with hyperref package', 'keywords': '', 'moddate': '2019-04-29T17:36:03+00:00', 'page': 2, 'producer': 'pdfTeX-1.40.18', 'source': '/content/N19-1423_BERT.pdf', 'subject': 'N19-1 2019', 'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'total_pages': 16}, page_content='the tensor2tensor library.1 Because the use\nof Transformers has become common and our im\x02plementation is almost identical to the original,\nwe will omit an exhaustive background descrip\x02tion of the model architecture and refer readers to\nVaswani et al. (2017) as well as excellent guides\nsuch as “The Annotated Transformer.”2\nIn this work, we denote the number of layers\n(i.e., Transformer blocks) as L, the hidden size as\nH, and the number of self-attention heads as A.\n
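
Conceptually, the retriever scores every stored chunk vector against the query vector and returns the `k` best matches. The sketch below shows that idea with cosine similarity on toy 2-dimensional vectors. It is an assumption for illustration: the distance metric a vector store actually uses depends on its configuration, and the names `cosine_similarity`, `top_k`, and `toy_vectors` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float],
          chunk_vectors: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Return the k chunk ids with the highest similarity to the query."""
    scored = [(chunk_id, cosine_similarity(query_vector, vector))
              for chunk_id, vector in chunk_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


toy_vectors = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.0, 1.0],
    "chunk-c": [1.0, 1.0],
}
results = top_k([1.0, 0.1], toy_vectors, k=2)
print(results)
```

A real vector database avoids scoring every vector by using an approximate nearest-neighbor index, but the ranking idea is the same.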

### Generating an Answer Based on The Similar Chunks

In [24]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate

template="""Use the following pieces of context to answer the user's question of "{question}".
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
"{context}" """

prompt_template = PromptTemplate(
    input_variables =['question','context'],
    template = template
)

In [25]:
our_query = "What are transformers?"

In [26]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

llm=ChatOpenAI(model_name="gpt-4o-mini",
               temperature=0,
               top_p=1)

chain = prompt_template | llm | StrOutputParser()

output= chain.invoke({"question":our_query, "context":doc_search}) # first 5 most similar texts are returned
output

'Transformers are a type of model architecture used in natural language processing (NLP) that utilize mechanisms such as self-attention to process and generate text. They were introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. Transformers are characterized by their ability to handle sequences of data (like sentences) in parallel, making them efficient for training on large datasets. They consist of layers (or blocks) that include self-attention heads and feed-forward networks, allowing them to capture complex relationships in the data. Notably, models like BERT (Bidirectional Encoder Representations from Transformers) leverage bidirectional self-attention, enabling them to consider context from both directions in a sentence.'

In [27]:
from IPython.display import Markdown

Markdown(output)

Transformers are a type of model architecture used in natural language processing (NLP) that utilize mechanisms such as self-attention to process and generate text. They were introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. Transformers are characterized by their ability to handle sequences of data (like sentences) in parallel, making them efficient for training on large datasets. They consist of layers (or blocks) that include self-attention heads and feed-forward networks, allowing them to capture complex relationships in the data. Notably, models like BERT (Bidirectional Encoder Representations from Transformers) leverage bidirectional self-attention, enabling them to consider context from both directions in a sentence.
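
In the cell that builds the chain, the list of retrieved `Document` objects is passed directly as `context`, so the prompt most likely receives their printed form, metadata included. A common alternative is to join only the `page_content` of each chunk before filling the prompt. The sketch below shows that step with a hypothetical `ToyDocument` stand-in, so it runs without LangChain; `format_docs` is an illustrative helper name, not a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join only the chunk text, separated by blank lines, dropping metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


toy_docs = [
    ToyDocument("BERT uses a Transformer encoder.", {"page": 2}),
    ToyDocument("Self-attention relates tokens.", {"page": 3}),
]
context = format_docs(toy_docs)
print(context)
```

With the real retriever output, the same helper could be applied to `doc_search` before it is passed as `context`, which keeps the prompt shorter and free of metadata noise.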

### Pipeline For RAG

In [28]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_core.language_models import BaseLanguageModel
from langchain_core.vectorstores import VectorStore

In [29]:
def ask_with_rag(question: str, llm: BaseLanguageModel, vectorstore: VectorStore) -> str:
    """
    Answers a question using a Retrieval-Augmented Generation (RAG) approach.

    Args:
        question (str): The question to answer.
        llm (BaseLanguageModel): The language model to use.
        vectorstore (VectorStore): The vector store for document retrieval.

    Returns:
        str: The generated answer.
    """
    # Create a retriever from the vectorstore
    retriever = vectorstore.as_retriever()

    # Define the prompt template
    prompt_template = PromptTemplate.from_template("""
    Answer the question based only on the following context:
    {context}

    Question: {input}
    """)

    # Create the document combination chain
    combine_docs_chain = create_stuff_documents_chain(llm, prompt_template)

    # Create the retrieval chain
    retrieval_chain = create_retrieval_chain(retriever, combine_docs_chain)

    # Invoke the chain with the input question
    response = retrieval_chain.invoke({"input": question})

    return response["answer"]


In [30]:
# Assuming llm and vectorstore are already initialized
question = "What are the key benefits of using BERT in NLP tasks?"
answer = ask_with_rag(question, llm, loaded_index)
print("Answer:", answer)

Answer: The key benefits of using BERT in NLP tasks include:

1. **State-of-the-Art Performance**: BERT achieves new state-of-the-art results on eleven natural language processing tasks, significantly improving scores on benchmarks such as GLUE, MultiNLI, and SQuAD.

2. **Bidirectional Context**: BERT utilizes a masked language model (MLM) objective that allows it to fuse both left and right context, enabling the creation of deep bidirectional representations, which enhances understanding of language.

3. **Task Flexibility**: BERT can be applied to a wide range of tasks, such as question answering and language inference, without requiring substantial modifications to the architecture for specific tasks.

4. **Effective Pre-training**: The use of pre-trained representations reduces the need for extensive task-specific training, making it easier to adapt to various NLP tasks.

5. **Joint Pre-training**: BERT incorporates a "next sentence prediction" task that helps in jointly pre-traini

In [31]:
Markdown(answer)

The key benefits of using BERT in NLP tasks include:

1. **State-of-the-Art Performance**: BERT achieves new state-of-the-art results on eleven natural language processing tasks, significantly improving scores on benchmarks such as GLUE, MultiNLI, and SQuAD.

2. **Bidirectional Context**: BERT utilizes a masked language model (MLM) objective that allows it to fuse both left and right context, enabling the creation of deep bidirectional representations, which enhances understanding of language.

3. **Task Flexibility**: BERT can be applied to a wide range of tasks, such as question answering and language inference, without requiring substantial modifications to the architecture for specific tasks.

4. **Effective Pre-training**: The use of pre-trained representations reduces the need for extensive task-specific training, making it easier to adapt to various NLP tasks.

5. **Joint Pre-training**: BERT incorporates a "next sentence prediction" task that helps in jointly pre-training text-pair representations, further improving its performance on tasks that involve understanding relationships between sentences.

## Project 2: Generating PDF Document Summaries

In this project, you will explore various methods for creating summaries from the provided PDF document. You will experiment with different chaining functions offered by the Langchain library to achieve this.

### **Project Steps:**
- **1.PDF Document Upload and Chunking:** As in the first project, upload the PDF document and divide it into smaller chunks. Consider splitting it by half-page or page.

- **2.Summarization Techniques:**

  - **Summary of the First 5 Pages (Stuff Chain):** Utilize the load_summarize_chain function with the parameter chain_type="stuff" to generate a concise summary of the first 5 pages of the PDF document.

  - **Short Summary of the Entire Document (Map Reduce and Refine Chains):** Use chain_type="map_reduce" and chain_type="refine" to create a brief summary of the entire document. Map-reduce generates an individual summary for each chunk and then combines them into a final summary; refine builds the summary incrementally, updating it with each new chunk.

  - **Detailed Summary with Bullet Points (Map Reduce Chain):** Use chain_type="map_reduce" to generate a detailed summary of at least 1000 tokens. Provide the LLM with the prompt "Summarize with 1000 tokens" and set the max_tokens parameter to a value greater than 1000. Add a title to the summary and present key points using bullet points.
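
The map-reduce idea described in the steps above can be sketched without an LLM: one function summarizes each chunk (map), and a second function combines the partial summaries (reduce). The two toy functions below are placeholders for LLM calls and are assumptions for illustration; in the notebook their roles are played by the map prompt and the combine prompt.

```python
from typing import Callable


def map_reduce_summarize(chunks: list[str],
                         map_fn: Callable[[str], str],
                         reduce_fn: Callable[[str], str]) -> str:
    """Summarize each chunk independently, then combine the partial summaries."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]  # map step
    return reduce_fn("\n".join(partial_summaries))           # reduce step


def first_sentence(text: str) -> str:
    """Toy map step: keep the text up to the first period."""
    return text.split(".")[0].strip() + "."


def join_lines(text: str) -> str:
    """Toy reduce step: merge the partial summaries into one paragraph."""
    return " ".join(text.splitlines())


toy_chunks = [
    "BERT is bidirectional. It uses masked language modeling.",
    "Fine-tuning is cheap. Only one output layer is added.",
]
summary = map_reduce_summarize(toy_chunks, first_sentence, join_lines)
print(summary)
```

Because the map calls do not depend on each other, they can in principle run in parallel, which is one reason this strategy scales to long documents.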

### Important Notes:

- Models such as GPT-4o and Gemini Pro may follow token-count instructions in summaries more reliably. Consider prioritizing them.

- For comprehensive information on Langchain and LLMs, refer to their respective documentation.

Best of luck!

### Install Libraries

In [32]:
!pip install -qU langchain-openai

In [33]:
!pip install -qU langchain-community

In [34]:
!pip install -qU pypdfium2

### Loading PDF Document

In [35]:
from langchain_community.document_loaders import PyPDFium2Loader

def read_doc(directory):
    file_loader=PyPDFium2Loader(directory)
    pdf_documents=file_loader.load()
    return pdf_documents

In [36]:
pdf=read_doc('/content/N19-1423_BERT.pdf')
len(pdf)



16

### Summarizing the First 5 Pages of The Document With Chain_Type 'stuff'

In [37]:
from langchain_openai import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(temperature=0,
                 model_name='gpt-4o-mini',
                 max_tokens=1024)

In [38]:
chain = load_summarize_chain(
    llm,
    chain_type='stuff'
)
output_summary = chain.invoke(pdf[:5])["output_text"] # pass only the first 5 pages
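
The "stuff" strategy places every supplied document into a single prompt, so the list that is passed in decides what gets summarized. The sketch below imitates this with plain strings. The prompt wording mirrors the default template printed later in this notebook; `build_stuff_prompt` and `toy_pages` are hypothetical names used only for illustration.

```python
def build_stuff_prompt(pages: list[str]) -> str:
    """Concatenate all pages and wrap them in a single summarization prompt."""
    body = "\n\n".join(pages)
    return f'Write a concise summary of the following:\n\n\n"{body}"\n\n\nCONCISE SUMMARY:'


toy_pages = ["Page one text.", "Page two text.", "Page three text."]

# Slicing the list limits the summary to the first two pages only.
prompt = build_stuff_prompt(toy_pages[:2])
print(prompt)
```

Since everything goes into one request, this approach only works while the combined text fits in the model's context window; longer inputs call for map-reduce or refine.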

### Document Splitter

In [39]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter


def chunk_data(docs, chunk_size=1000, chunk_overlap=200):
    text_splitter=RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                                 chunk_overlap=chunk_overlap)
    pdf=text_splitter.split_documents(docs)
    return pdf

In [40]:
pdf_doc=chunk_data(docs=pdf)
len(pdf_doc)

83

In [41]:
pdf_doc[:5] # the first 5 chunks (not pages)

[Document(metadata={'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'subject': 'N19-1 2019', 'keywords': '', 'creator': 'LaTeX with hyperref package', 'producer': 'pdfTeX-1.40.18', 'creationdate': '2019-04-29T17:36:03+00:00', 'moddate': '2019-04-29T17:36:03+00:00', 'source': '/content/N19-1423_BERT.pdf', 'total_pages': 16, 'page': 0}, page_content='Proceedings of NAACL-HLT 2019, pages 4171–4186\nMinneapolis, Minnesota, June 2 - June 7, 2019. \nc 2019 Association for Computational Linguistics\n4171\nBERT: Pre-training of Deep Bidirectional Transformers for\nLanguage Understanding\nJacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\nGoogle AI Language\n{jacobdevlin,mingweichang,kentonl,kristout}@google.com\nAbstract\nWe introduce a new language representa\x02tion model called BERT, which stands for\nBidirectional Encoder Representations from\nTransformers. Unli

### Make A Brief Summary of The Entire Document With Chain_Types "map_reduce" and "refine"

In [42]:
from langchain.chains.summarize import load_summarize_chain

llm = ChatOpenAI(temperature=0,
                 model_name='gpt-4o-mini',
                 max_tokens=1024)

In [43]:
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=0)
chunks = text_splitter.split_documents(pdf)

In [44]:
len(chunks)

16

In [45]:
chunks[0]

Document(metadata={'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding', 'author': 'Jacob Devlin ; Ming-Wei Chang ; Kenton Lee ; Kristina Toutanova', 'subject': 'N19-1 2019', 'keywords': '', 'creator': 'LaTeX with hyperref package', 'producer': 'pdfTeX-1.40.18', 'creationdate': '2019-04-29T17:36:03+00:00', 'moddate': '2019-04-29T17:36:03+00:00', 'source': '/content/N19-1423_BERT.pdf', 'total_pages': 16, 'page': 0}, page_content='Proceedings of NAACL-HLT 2019, pages 4171–4186\nMinneapolis, Minnesota, June 2 - June 7, 2019. \nc 2019 Association for Computational Linguistics\n4171\nBERT: Pre-training of Deep Bidirectional Transformers for\nLanguage Understanding\nJacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\nGoogle AI Language\n{jacobdevlin,mingweichang,kentonl,kristout}@google.com\nAbstract\nWe introduce a new language representa\x02tion model called BERT, which stands for\nBidirectional Encoder Representations from\nTransformers. Unlik

In [46]:
%%time
chain = load_summarize_chain(llm,
                             chain_type="map_reduce")


output_summary = chain.invoke(chunks)["output_text"]
Markdown(output_summary)

CPU times: user 900 ms, sys: 95.6 ms, total: 995 ms
Wall time: 57.5 s


The paper presents BERT (Bidirectional Encoder Representations from Transformers), a groundbreaking language representation model developed by Google AI Language. Unlike previous unidirectional models, BERT utilizes a masked language model (MLM) and next sentence prediction (NSP) for pre-training, allowing it to consider both left and right contexts. This bidirectional approach leads to state-of-the-art performance on various natural language processing (NLP) tasks, including question answering and language inference, with minimal task-specific modifications. BERT significantly outperforms prior models on benchmarks like GLUE and SQuAD, demonstrating its effectiveness in enhancing language understanding.

The model consists of two main phases: pre-training on unlabeled data and fine-tuning on labeled data for specific tasks. BERT's architecture includes a multi-layer bidirectional Transformer encoder and employs a unique input representation that accommodates both single and paired sentences. The pre-training tasks enhance its performance across multiple NLP tasks, and the model has been shown to benefit from increased size and effective fine-tuning strategies.

Results indicate that BERT models, particularly BERTLARGE, achieve substantial improvements over previous state-of-the-art systems in various benchmarks, including SQuAD 1.1 and 2.0. The paper also discusses the importance of pre-training tasks, model size, and the effectiveness of both fine-tuning and feature-based approaches in leveraging BERT for diverse NLP applications. Overall, BERT represents a significant advancement in the field of NLP, showcasing the power of bidirectional context in language representation.

In [47]:
chain

MapReduceDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'), llm=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7ae8cb91e490>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7ae8cb926810>, root_client=<openai.OpenAI object at 0x7ae8d2bcb0d0>, root_async_client=<openai.AsyncOpenAI object at 0x7ae8cb91e650>, model_name='gpt-4o-mini', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'), max_tokens=1024), output_parser=StrOutputParser(), llm_kwargs={}), reduce_documents_chain=ReduceDocumentsChain(verbose=False, combine_documents_chain=StuffDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='Writ

In [48]:
%%time
chain = load_summarize_chain(llm,
                             chain_type="refine")

output_summary = chain.invoke(chunks)["output_text"]

CPU times: user 1.14 s, sys: 144 ms, total: 1.28 s
Wall time: 3min 9s


In [49]:
chain

RefineDocumentsChain(verbose=False, initial_llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'), llm=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7ae8cb91e490>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7ae8cb926810>, root_client=<openai.OpenAI object at 0x7ae8d2bcb0d0>, root_async_client=<openai.AsyncOpenAI object at 0x7ae8cb91e650>, model_name='gpt-4o-mini', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'), max_tokens=1024), output_parser=StrOutputParser(), llm_kwargs={}), refine_llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['existing_answer', 'text'], input_types={}, partial_variables={}, template="Your job is to produce a final summary.\nWe have provided an existing summary up to a certa
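
The refine strategy summarizes the first chunk, then repeatedly updates that summary with each following chunk, matching the `existing_answer` and `text` variables in the prompt above. The sketch below shows the loop with toy functions standing in for the LLM calls; all names are assumptions for illustration. Because every step needs the previous summary, the calls are inherently sequential, which is one plausible contributor to the longer wall time measured above.

```python
from typing import Callable


def refine_summarize(chunks: list[str],
                     initial_fn: Callable[[str], str],
                     refine_fn: Callable[[str, str], str]) -> str:
    """Build a summary from the first chunk, then update it chunk by chunk."""
    if not chunks:
        return ""
    summary = initial_fn(chunks[0])
    for chunk in chunks[1:]:
        summary = refine_fn(summary, chunk)  # depends on the previous summary
    return summary


def toy_initial(text: str) -> str:
    """Toy initial step: keep the text up to the first period."""
    return text.split(".")[0].strip() + "."


def toy_refine(existing_answer: str, text: str) -> str:
    """Toy refine step: append the first sentence of the new chunk."""
    return existing_answer + " " + toy_initial(text)


toy_chunks = [
    "BERT is bidirectional. It masks tokens.",
    "It is pre-trained on unlabeled text. Then fine-tuned.",
    "Results improve on GLUE. Also on SQuAD.",
]
final_summary = refine_summarize(toy_chunks, toy_initial, toy_refine)
print(final_summary)
```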

### Generate A Detailed Summary of The Entire Document With At Least 1000 Tokens. Also, Add A Title To The Summary And Present Key Points Using Bullet Points With Chain_Type of "map_reduce".

In [50]:
%%time
chain = load_summarize_chain(llm,
                             chain_type="map_reduce")


output_summary = chain.invoke(chunks)["output_text"]
Markdown(output_summary)

CPU times: user 397 ms, sys: 48.1 ms, total: 445 ms
Wall time: 49.4 s


The paper presents BERT (Bidirectional Encoder Representations from Transformers), a groundbreaking language representation model developed by Google AI Language. Unlike previous unidirectional models, BERT utilizes a masked language model (MLM) and next sentence prediction (NSP) for pre-training, allowing it to consider both left and right contexts. This bidirectional approach leads to significant improvements in various natural language processing (NLP) tasks, achieving state-of-the-art results on benchmarks like GLUE, MultiNLI, and SQuAD with minimal task-specific modifications. BERT's architecture includes a multi-layer bidirectional Transformer encoder and employs a unique input representation for handling single and paired sentences. The model is available in two sizes, BERTBASE and BERTLARGE, and demonstrates superior performance across multiple tasks, particularly in scenarios with limited training data. The paper also discusses the impact of pre-training tasks, model size, and fine-tuning strategies, highlighting BERT's effectiveness in enhancing language understanding and its competitive edge over previous models like ELMo and OpenAI GPT.

In [51]:
from langchain_core.prompts import PromptTemplate

chunks_prompt="""
Summarize the following portion of a document:

{text}

Keep it concise and capture the most important points.
"""
map_prompt_template=PromptTemplate(input_variables=['text'],
                                   template=chunks_prompt)

In [52]:
# prompt for combined summaries
final_combine_prompt="""
Based on the following summarized sections:

{text}

Please generate:
- A clear and informative **title**
- A **detailed summary** at least 1000 tokens
- Key information in **bullet points**

Structure your response with the title at the top, then the summary, then bullet points.
"""
final_combine_prompt_template=PromptTemplate(input_variables=['text'],
                                             template=final_combine_prompt)

In [53]:
chain = load_summarize_chain(
                            llm=llm,
                            chain_type='map_reduce',
                            map_prompt=map_prompt_template,  # applied to every chunk (map step)
                            combine_prompt=final_combine_prompt_template  # applied to the combined chunk summaries (reduce step)
)
chain

MapReduceDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['text'], input_types={}, partial_variables={}, template='\nSummarize the following portion of a document:\n\n{text}\n\nKeep it concise and capture the most important points.\n'), llm=ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x7ae8cb91e490>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x7ae8cb926810>, root_client=<openai.OpenAI object at 0x7ae8d2bcb0d0>, root_async_client=<openai.AsyncOpenAI object at 0x7ae8cb91e650>, model_name='gpt-4o-mini', temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'), max_tokens=1024), output_parser=StrOutputParser(), llm_kwargs={}), reduce_documents_chain=ReduceDocumentsChain(verbose=False, combine_documents_chain=StuffDocumentsChain(verbose=False, llm_chain=LLMChain(verbose=False, prompt=PromptTemplate(input_variables=['text'], input_types

In [54]:
output_summary = chain.invoke(chunks)["output_text"]
output_summary

"# BERT and Hyperparameter Tuning: Transforming Natural Language Processing\n\n## Summary\n\nThe documents provide an in-depth exploration of BERT (Bidirectional Encoder Representations from Transformers), a groundbreaking language representation model developed by Google AI Language, and the associated hyperparameter tuning and fine-tuning strategies for advanced language models, including BERT, ELMo, and OpenAI GPT. BERT has revolutionized the field of natural language processing (NLP) by introducing a unique approach to pre-training deep bidirectional representations from unlabeled text, which significantly enhances the model's understanding of language and context.\n\nBERT's architecture is built on a multi-layer bidirectional Transformer encoder, which allows it to process input sequences while capturing the full context of words in a sentence. This is achieved through a masked language model (MLM) approach, where 15% of the input tokens are randomly masked, and the model is train

In [55]:
Markdown(output_summary)

# BERT and Hyperparameter Tuning: Transforming Natural Language Processing

## Summary

The documents provide an in-depth exploration of BERT (Bidirectional Encoder Representations from Transformers), a groundbreaking language representation model developed by Google AI Language, and the associated hyperparameter tuning and fine-tuning strategies for advanced language models, including BERT, ELMo, and OpenAI GPT. BERT has revolutionized the field of natural language processing (NLP) by introducing a unique approach to pre-training deep bidirectional representations from unlabeled text, which significantly enhances the model's understanding of language and context.

BERT's architecture is built on a multi-layer bidirectional Transformer encoder, which allows it to process input sequences while capturing the full context of words in a sentence. This is achieved through a masked language model (MLM) approach, where 15% of the input tokens are randomly masked, and the model is trained to predict these masked tokens. This innovative method enables BERT to develop a deeper understanding of language, making it particularly effective for token-level tasks such as question answering.

The BERT framework consists of two main steps: pre-training and fine-tuning. During the pre-training phase, BERT is trained on a large corpus of unlabeled data, including BooksCorpus and English Wikipedia, using both the MLM and another task called Next Sentence Prediction (NSP). The NSP task trains the model to understand the relationship between sentences by predicting whether a second sentence follows the first. This dual-task approach allows BERT to learn rich contextual representations that are beneficial for downstream tasks.

Fine-tuning BERT is a straightforward process that requires minimal architectural changes. The model can be initialized with pre-trained parameters and adjusted using labeled data for specific tasks. This flexibility allows BERT to achieve state-of-the-art results on eleven NLP tasks, including GLUE, MultiNLI, and SQuAD benchmarks. Notably, BERT's performance is particularly impressive on tasks with limited training data, showcasing its ability to generalize well from pre-trained representations.

The document also discusses the input/output representations used in BERT, which combine token, segment, and position embeddings. Each input sequence begins with a special classification token ([CLS]), and sentence pairs are differentiated using a separator token ([SEP]). The use of WordPiece embeddings with a 30,000 token vocabulary further enhances BERT's ability to handle diverse language inputs.

Results from the GLUE benchmark demonstrate BERT's superiority over previous state-of-the-art models, with BERTLARGE achieving an impressive score of 80.5, significantly higher than OpenAI GPT's 72.8. In the SQuAD v1.1 benchmark, BERTLARGE outperformed the top leaderboard system by +1.5 F1 in ensembling and +1.3 F1 as a single system. The adaptation of BERT for SQuAD 2.0, which accounts for questions without answers, resulted in a notable 5.1 F1 improvement over previous best systems.

The findings emphasize the importance of pre-training tasks and model size on BERT's performance. Removing the NSP task significantly degrades performance, particularly on tasks like QNLI, MNLI, and SQuAD. Additionally, larger BERT models consistently show improved accuracy across tasks, indicating that scaling model size leads to significant performance gains. However, the results also highlight mixed outcomes regarding the size of pre-trained bidirectional language models on downstream tasks. While larger hidden dimensions can improve performance, further increases may not yield additional benefits. The authors suggest that fine-tuning models directly on downstream tasks with a small number of additional parameters can leverage larger pre-trained representations, even with limited data.

A comparison of fine-tuning and feature-based approaches using BERT on the CoNLL-2003 Named Entity Recognition (NER) task reveals that BERTLARGE performs competitively with state-of-the-art methods. The best feature-based method, which concatenates representations from the top four hidden layers, is only slightly less effective than full fine-tuning.

The second document focuses on hyperparameter tuning and fine-tuning strategies for BERT, ELMo, and OpenAI GPT. It emphasizes the significance of hyperparameter optimization in the context of model performance, particularly when working with large datasets. The analysis reveals that models trained on datasets exceeding 100,000 labeled examples exhibit reduced sensitivity to hyperparameter choices, suggesting that the scale of data can mitigate the impact of suboptimal settings. This finding underscores the importance of conducting exhaustive hyperparameter searches, especially given the rapid fine-tuning capabilities of these models.

A critical comparison between BERT and GPT highlights their architectural differences, which play a pivotal role in their respective performances. BERT's bi-directional architecture and the implementation of two distinct pre-training tasks—masked language modeling (MLM) and next sentence prediction (NSP)—contribute significantly to its superior performance in various NLP tasks. The document outlines key distinctions between the two models, including the nature of their training data, the introduction of special tokens, batch size configurations, and learning
