This project is a Python-based Retrieval-Augmented Generation (RAG) pipeline that intelligently summarizes `.pdf` and `.txt` documents using a lightweight sentence-embedding model and a locally hosted LLM (Ollama). It leverages semantic search over chunked document text to extract the most relevant context before querying the language model.
## Features

- ✅ Supports both `.txt` and `.pdf` inputs
- ✅ Automatically filters out irrelevant sections such as References or Bibliography
- ✅ Splits large documents into context-friendly chunks
- ✅ Uses sentence-transformers (`all-MiniLM-L6-v2`) for vector similarity search
- ✅ Summarizes based on the top-k relevant chunks using Ollama + Gemma3:1b
- ✅ Outputs concise, context-aware answers to a given question
- ✅ Saves results as `.txt`
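Two of the features above, back-matter filtering and chunking, can be sketched with the standard library alone. The function names and the word-window parameters below are illustrative, not the project's actual API:

```python
import re

def strip_back_matter(text: str) -> str:
    """Drop everything from a References/Bibliography heading onward."""
    # Heuristic: cut at the first line consisting only of the heading word.
    match = re.search(r"(?im)^\s*(references|bibliography)\s*$", text)
    return text[: match.start()] if match else text

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows that fit an LLM context."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), step)]
```

Overlapping windows keep sentences that straddle a chunk boundary retrievable from at least one chunk.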
## Tech Stack

- Python
- SentenceTransformers
- PyPDF2
- Ollama
- NumPy
- Pandas
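As a sketch of how this stack handles input, a `read_document` helper (a hypothetical name, not necessarily the script's real function) can use PyPDF2 for `.pdf` files and fall back to plain-text reading for `.txt`:

```python
from pathlib import Path

def read_document(path: str) -> str:
    """Return the raw text of a .pdf or .txt document."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        # Third-party dependency; imported lazily so .txt reading works without it.
        from PyPDF2 import PdfReader
        reader = PdfReader(str(p))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    return p.read_text(encoding="utf-8")
```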
## Pipeline

- File Reading
- Preprocessing
- Chunking
- Embedding
- Retrieval
- RAG Summarization
- Output
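The Embedding and Retrieval steps reduce to a cosine-similarity ranking once the chunks are embedded. A minimal NumPy sketch, assuming the embeddings were already produced by `all-MiniLM-L6-v2` (the `top_k_chunks` name is illustrative):

```python
import numpy as np

def top_k_chunks(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k chunks most cosine-similar to the query embedding."""
    # Normalize so a plain dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    sims = c @ q
    # Highest similarity first.
    return np.argsort(sims)[::-1][:k].tolist()
```

The selected chunks are then concatenated into the prompt sent to the LLM for summarization.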
## Usage

```bash
pip install -r requirements.txt
ollama run gemma3:1b
python pdf-summaizer.py
```
Don't forget to star this repo on GitHub and follow me! Thanks :)