In [1]:
from langchain.docstore.document import Document
from PyPDF2 import PdfReader
import os

In [2]:
base_path = 'F:/Super AI SS4/AI-builder/Large-Language-Models-(LLMs)/try-it/Text-Summarization/Dataset/'
pdf_files = ['document1.pdf', 'document2.pdf', 'document3.pdf']

In [3]:
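def read_pdf(file_path):
    """Return the text of every page in the PDF at file_path, concatenated."""
    with open(file_path, 'rb') as file:
        pdf = PdfReader(file)
        text = ''
        for page in pdf.pages:
            # Fall back to '' if a page yields no extractable text
            text += page.extract_text() or ''
    return text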

In [4]:
documents = []
for pdf_file in pdf_files:
    file_path = os.path.join(base_path, pdf_file)
    text = read_pdf(file_path)
    documents.append(Document(page_content=text, metadata={"source": pdf_file}))

In [5]:
documents

[Document(page_content="Climate change is one of the most pressing global challenges of our time. It refers to \nlong-term shifts in temperature and weather patterns across the Earth. While these \nchanges can occur naturally, the rapid warming we've seen over the past century is \nprimarily due to human activities.  \n \nThe main driver of current climate change is the burning of fossil fuels like coal, oil, \nand natural gas. These activities release greenhouse gases, primarily carbon dioxide \n(CO2), into the atmosphere. Greenhouse gases trap heat from the sun, causing the \nEarth's average temperature to rise – a phenomenon known as global warming.  \n \nThe evidence for climate change is unequivocal. Global temperature records show \nthat the Earth has warmed by approximately 1°C since pre -industrial times. This may \nseem small, but it represents a significant change in the Earth's energy balance and \nhas far-reaching consequences.  \n \nOne of the most visible effects of clima

In [6]:
first_file_path = os.path.join(base_path, pdf_files[0])
text = read_pdf(first_file_path)
print(text)

Climate change is one of the most pressing global challenges of our time. It refers to 
long-term shifts in temperature and weather patterns across the Earth. While these 
changes can occur naturally, the rapid warming we've seen over the past century is 
primarily due to human activities.  
 
The main driver of current climate change is the burning of fossil fuels like coal, oil, 
and natural gas. These activities release greenhouse gases, primarily carbon dioxide 
(CO2), into the atmosphere. Greenhouse gases trap heat from the sun, causing the 
Earth's average temperature to rise – a phenomenon known as global warming.  
 
The evidence for climate change is unequivocal. Global temperature records show 
that the Earth has warmed by approximately 1°C since pre -industrial times. This may 
seem small, but it represents a significant change in the Earth's energy balance and 
has far-reaching consequences.  
 
One of the most visible effects of climate change is the melting of polar ice a
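
The extracted text above keeps the PDF's hard line wraps and trailing spaces. If those matter downstream, a minimal cleanup sketch might look like the following. The helper name `unwrap_extracted_text` is hypothetical and not part of the notebook, and it assumes that paragraphs are separated by a blank (or whitespace-only) line, as in the output shown.

```python
import re


def unwrap_extracted_text(text: str) -> str:
    """Join hard-wrapped lines within each paragraph (hypothetical helper).

    Assumes paragraphs are separated by a blank or whitespace-only line.
    """
    # Split wherever two newlines are separated only by whitespace.
    paragraphs = re.split(r"\n\s*\n", text)
    # str.split() with no arguments collapses every run of whitespace,
    # including the newlines left over from line wrapping.
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(paragraph for paragraph in cleaned if paragraph)


sample = "long-term shifts \nin temperature.  \n \nThe main driver \nis fossil fuels."
unwrapped = unwrap_extracted_text(sample)
print(unwrapped)
```

Whether to apply such a step before chunking is a judgment call; the notebook itself passes the raw extracted text through unchanged.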

In [15]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [16]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

# Reuse the Document objects built in cell 4 instead of re-reading each PDF;
# split_documents copies each document's metadata onto its chunks.
all_chunks = text_splitter.split_documents(documents)
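
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sketch using fixed-width character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line boundaries); the function `split_with_overlap` is a hypothetical illustration of the overlap idea only.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window repeats the last chunk_overlap characters of the previous one.
    Hypothetical helper for illustration; not the LangChain algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the next chunk.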

In [17]:
print(f"Number of all chunks = {len(all_chunks)}")
all_chunks

Number of all chunks = 18


[Document(page_content="Climate change is one of the most pressing global challenges of our time. It refers to \nlong-term shifts in temperature and weather patterns across the Earth. While these \nchanges can occur naturally, the rapid warming we've seen over the past century is \nprimarily due to human activities.  \n \nThe main driver of current climate change is the burning of fossil fuels like coal, oil, \nand natural gas. These activities release greenhouse gases, primarily carbon dioxide \n(CO2), into the atmosphere. Greenhouse gases trap heat from the sun, causing the \nEarth's average temperature to rise – a phenomenon known as global warming.  \n \nThe evidence for climate change is unequivocal. Global temperature records show \nthat the Earth has warmed by approximately 1°C since pre -industrial times. This may \nseem small, but it represents a significant change in the Earth's energy balance and \nhas far-reaching consequences.", metadata={'source': 'document1.pdf'}),
 Docu

In [18]:
embeddings = OllamaEmbeddings(model="nomic-embed-text", show_progress=True)
vector_db = Chroma.from_documents(
    documents=all_chunks,
    embedding=embeddings,
    collection_name="multi-doc-summary"
)

OllamaEmbeddings: 100%|██████████| 18/18 [00:40<00:00,  2.23s/it]
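
The cell above embeds each chunk and stores the vectors; the cells shown here do not go on to query the store. As a rough illustration of what a vector store does at lookup time, the sketch below ranks toy vectors by cosine similarity using only the standard library. The vectors, labels, and the `top_k` helper are all made up for the example, and cosine similarity is just one common measure, not a statement about which distance metric Chroma uses.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the labels of the k indexed vectors most similar to the query."""
    scored = [(cosine_similarity(query, vector), label) for label, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
toy_index = [
    ("sea level", [0.9, 0.1, 0.0]),
    ("fossil fuels", [0.1, 0.9, 0.0]),
    ("green economy", [0.0, 0.2, 0.9]),
]
nearest = top_k([0.8, 0.2, 0.0], toy_index, k=2)
print(nearest)
# ['sea level', 'fossil fuels']
```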


In [19]:
from langchain_community.llms import Ollama
from langchain.chains.summarize import load_summarize_chain

In [20]:
llm = Ollama(model="llama3")
chain = load_summarize_chain(llm, chain_type="map_reduce")
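
The `map_reduce` chain type summarizes each chunk separately and then combines the partial summaries into one. The sketch below shows that pattern with plain functions standing in for the LLM calls. The names `map_reduce_summarize`, `first_sentence`, and `join_summaries` are hypothetical, and the sketch omits anything the real chain may do when the combined summaries are too long for a single prompt.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    """Summarize each chunk on its own (map), then merge the results (reduce)."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    return reduce_fn(partial_summaries)


def first_sentence(text: str) -> str:
    """Stand-in for an LLM 'summarize this chunk' call: keep the first sentence."""
    head, sep, _ = text.partition(".")
    return (head + sep).strip()


def join_summaries(summaries: List[str]) -> str:
    """Stand-in for an LLM 'combine these summaries' call: concatenate them."""
    return " ".join(summaries)


toy_chunks = ["Ice is melting. Glaciers retreat.", "Seas are rising. Coasts flood."]
toy_summary = map_reduce_summarize(toy_chunks, first_sentence, join_summaries)
print(toy_summary)
# Ice is melting. Seas are rising.
```

Because each chunk is summarized independently, the map step needs one model call per chunk, plus at least one more for the reduce step.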

In [21]:
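# chain.run is deprecated in recent LangChain releases; invoke returns a dict
summary = chain.invoke({"input_documents": all_chunks})["output_text"]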

print(summary)

Here is a concise summary:

Climate change refers to long-term shifts in temperature and weather patterns caused by human activities, primarily the burning of fossil fuels. It has far-reaching consequences, including melting polar ice, rising sea levels, altered precipitation patterns, and increased frequency and severity of extreme weather events.

The impacts are widespread, affecting ecosystems, biodiversity, agriculture, energy, and economies worldwide. Climate change is causing:

* Ecosystem disruptions and biodiversity loss
* Increased risk of natural disasters and economic losses
* Changes in global food systems and water availability
* Threats to human settlements and infrastructure
* Displacement and migration

To address climate change, we must:

* Mitigate its effects through reduced emissions and cleaner energy
* Develop strategies for natural systems to adapt to existing changes
* Transition to a green economy with opportunities for innovation, job creation, and new market