📚 DocuChat - Chat with Your PDF Documents

Upload any PDF document and ask questions about it using AI. Get instant answers backed by the content of your documents!

🎯 What is DocuChat?

DocuChat is an intelligent document assistant that lets you have conversations with your PDF files. Instead of reading through hundreds of pages, simply upload your PDF and ask questions in plain English!

Example:

Upload a research paper → Ask "What are the main findings?"
Upload a manual → Ask "How do I reset the device?"
Upload a contract → Ask "What are the payment terms?"

✨ Features

📄 Upload PDF Files - Supports text-based PDF documents up to 10MB
💬 Ask Questions - Get instant answers from your documents
🎯 Accurate Responses - AI answers based only on your document content
📊 Source Citations - See which parts of the document were used to answer
🚀 Fast & Private - Everything runs on your computer
💾 No Cloud Required - Your documents stay on your machine

🎬 How It Works (Simple Explanation)

You upload a PDF → DocuChat reads and understands it
You ask a question → DocuChat searches for relevant information
AI generates an answer → Based only on your document's content
You get the answer → With references to where it found the information

Think of it as having a super-smart assistant who has read your entire document and can instantly answer any question about it!

📋 What You Need Before Starting

Required Software

Node.js - The engine that runs DocuChat
- Download from: https://nodejs.org/
- Choose the "LTS" (Long Term Support) version
- Version needed: 18 or higher
Ollama - The AI brain that understands and answers questions
- Download from: https://ollama.ai/
- It's free and runs on your computer

System Requirements

Operating System: Windows, Mac, or Linux
RAM: At least 8GB (16GB recommended)
Storage: 5GB free space
Internet: Only needed for initial setup

🚀 Installation Guide (Step-by-Step)

Step 1: Install Node.js

Go to https://nodejs.org/
Click the big green button that says "LTS"
Download and run the installer
Follow the installation wizard (click "Next" through the steps)
Verify installation:
- Open Terminal (Mac/Linux) or Command Prompt (Windows)
- Type: node --version
- You should see something like v20.x.x
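
The same check can be scripted. The sketch below is illustrative and not part of DocuChat; the helper name `meetsMinimum` is made up for this example, and it only compares the major version against the minimum of 18 stated above.

```javascript
// Minimum major version stated in this README.
const MIN_NODE_MAJOR = 18;

// Returns true when a version string such as "v20.11.1" or "20.11.1"
// has a major version of at least `minMajor`.
function meetsMinimum(version, minMajor = MIN_NODE_MAJOR) {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.version holds the running Node.js version, e.g. "v20.11.1".
console.log(
  meetsMinimum(process.version)
    ? `Node.js ${process.version} is new enough`
    : `Node.js ${process.version} is too old; install ${MIN_NODE_MAJOR} or higher`
);
```

Save it as a file of your choice (for example check-node.js) and run it with node check-node.js.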

Step 2: Install Ollama

Go to https://ollama.ai/
Click "Download"
Choose your operating system (Mac/Windows/Linux)
Install the application
Ollama will start automatically after installation

Step 3: Download AI Models

These are the AI "brains" DocuChat needs to work:

Open Terminal/Command Prompt and run:

```
# This downloads the model that understands document content
ollama pull nomic-embed-text

# This downloads the model that answers questions
ollama pull llama3.2:3b
```

Wait time: Each download takes 5-10 minutes depending on your internet speed.

Step 4: Download DocuChat

Option A - If you have Git:

```
git clone https://github.com/your-repo/docuchat.git
cd docuchat
```

Option B - Download ZIP:

Download the ZIP file from the repository
Extract it to a folder (like Documents/docuchat)
Open Terminal/Command Prompt
Navigate to that folder:
```
cd path/to/docuchat
```

Step 5: Install DocuChat Dependencies

In the Terminal/Command Prompt (make sure you're in the docuchat folder):

```
npm install
```

This will take 2-3 minutes. You'll see lots of text scrolling - that's normal!

Step 6: Create Upload Folder

```
mkdir uploads
```

This creates a folder where uploaded PDFs are temporarily stored.

▶️ How to Start DocuChat

Starting the Application

Make sure Ollama is running (it usually runs in the background)
Open Terminal/Command Prompt
Navigate to DocuChat folder:
```
cd path/to/docuchat
```
Start DocuChat:
```
npm start
```

Look for this message:

🚀 DocuChat server running on http://localhost:3000

Open your web browser and go to:
```
http://localhost:3000
```

You should see the DocuChat interface! 🎉

📖 How to Use DocuChat

Uploading Your First PDF

Click the "Choose File" or "Upload PDF" button
Select a PDF from your computer (max 10MB)
Click "Upload"
Wait for processing (you'll see a progress indicator)
Success! Your document is now ready for questions

Processing Time:

Small PDF (< 10 pages): 10-30 seconds
Medium PDF (10-50 pages): 30-90 seconds
Large PDF (50+ pages): 1-3 minutes

Asking Questions

Type your question in the chat box
Click "Send" or press Enter
Wait for the answer (usually 5-15 seconds)
Read the response with source references

Good Question Examples:

"What is the main topic of this document?"
"Can you summarize the key points?"
"What does it say about [specific topic]?"
"Who are the authors?"
"What are the conclusions?"

Tips for Better Answers:

✅ Be specific in your questions
✅ Ask one thing at a time
✅ Use keywords from the document
❌ Don't ask about things not in the document
❌ Avoid overly broad questions

Understanding Responses

Each answer includes:

The Answer - AI-generated response based on your document
Sources - Snippets from the document that were used
Page Numbers - Where the information was found
Confidence - How relevant each source is

🎯 Real-World Examples

Example 1: Research Paper

Document: Scientific study on climate change

Questions you can ask:

"What was the research methodology?"
"What are the main findings?"
"What data was collected?"
"What are the limitations of this study?"

Example 2: Product Manual

Document: Smartphone user guide

Questions you can ask:

"How do I take a screenshot?"
"What are the technical specifications?"
"How do I reset the device?"
"What's the battery life?"

Example 3: Legal Document

Document: Service agreement

Questions you can ask:

"What are the payment terms?"
"What is the cancellation policy?"
"What are my obligations?"
"What is the contract duration?"

🛠️ Troubleshooting

Problem: "Cannot connect to Ollama"

Solution:

Make sure Ollama is installed
Check if Ollama is running (look for Ollama icon in system tray)
Try restarting Ollama
On Mac/Linux, run: ollama serve

Problem: "Models not found"

Solution: Run these commands again:

```
ollama pull nomic-embed-text
ollama pull llama3.2:3b
```
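
To see which models are missing before pulling again, you can query Ollama directly. The following is a rough sketch, not DocuChat code: it assumes Ollama is listening on its default address http://localhost:11434 and uses Ollama's GET /api/tags endpoint, which lists installed models. The helper names `withTag`, `missingModels`, and `checkOllama` are invented for this example.

```javascript
// Models this README asks you to pull.
const REQUIRED_MODELS = ["nomic-embed-text", "llama3.2:3b"];

// Ollama lists untagged pulls as "<name>:latest", so add that tag when absent.
function withTag(name) {
  return name.includes(":") ? name : `${name}:latest`;
}

// Given the parsed JSON from GET /api/tags, return the required models
// that are not installed.
function missingModels(tags, required = REQUIRED_MODELS) {
  const installed = new Set((tags.models ?? []).map((m) => withTag(m.name)));
  return required.filter((name) => !installed.has(withTag(name)));
}

// Queries a local Ollama server (default address assumed) and reports gaps.
async function checkOllama(baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return missingModels(await res.json());
}
```

Calling checkOllama() resolves to an empty array when both models are present. A rejected promise usually means Ollama is not running, which points back to the "Cannot connect to Ollama" problem above.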

Problem: "Port 3000 already in use"

Solution: Something else is using port 3000. Either:

Stop the other application
Or change DocuChat's port:
```
PORT=3001 npm start
```
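Then access at: http://localhost:3001
The command above uses Mac/Linux shell syntax. In Windows Command Prompt, run set PORT=3001 first, then npm start.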
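
For readers curious how a PORT override typically works in a Node.js server, here is a minimal sketch. How DocuChat itself reads the variable is not shown in this README, so treat `resolvePort` and `DEFAULT_PORT` as hypothetical names that illustrate a common pattern rather than the project's code.

```javascript
// Default port shown in this README.
const DEFAULT_PORT = 3000;

// Picks the port from an environment object, falling back to the default
// when PORT is unset or is not a valid TCP port number.
function resolvePort(env = process.env) {
  const port = Number(env.PORT);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : DEFAULT_PORT;
}

console.log(`Would listen on http://localhost:${resolvePort()}`);
```

Falling back to the default on invalid input keeps a typo in PORT from stopping the server at startup.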

Problem: "File too large" when uploading PDF

Solution: The PDF is over 10MB. Try:

Compress the PDF using online tools
Split the PDF into smaller parts
Or contact support to increase the limit
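
You can check a file's size before uploading. This is an illustrative sketch only: it assumes the 10MB limit means 10 × 1024 × 1024 bytes, which this README does not specify, and `checkUploadSize` is a made-up helper rather than DocuChat's own validation.

```javascript
// Assumption: 1MB is counted as 1024 * 1024 bytes.
const BYTES_PER_MB = 1024 * 1024;
const MAX_UPLOAD_BYTES = 10 * BYTES_PER_MB;

// Reports whether a file of `sizeBytes` fits under the upload limit.
function checkUploadSize(sizeBytes, limit = MAX_UPLOAD_BYTES) {
  if (sizeBytes <= limit) return { ok: true, message: "OK" };
  const sizeMb = (sizeBytes / BYTES_PER_MB).toFixed(1);
  const limitMb = (limit / BYTES_PER_MB).toFixed(0);
  return { ok: false, message: `File too large: ${sizeMb}MB exceeds the ${limitMb}MB limit` };
}

console.log(checkUploadSize(12 * BYTES_PER_MB).message);
```

To check a real file, pass it the size reported by Node's fs.statSync(path).size.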

Problem: Answers are not accurate

Possible Reasons:

Question too vague - Be more specific
Information not in document - AI can only use what's in the PDF
Complex document - Try asking about smaller sections
Poor PDF quality - Scanned images don't work well

Solutions:

Rephrase your question
Ask about specific sections
Make sure PDF has selectable text (not scanned images)

Problem: Server won't start

Error: "node: command not found"

Solution: Install Node.js (see Step 1)

Error: "npm: command not found"

Solution: Reinstall Node.js (npm comes with it)

Error: Module not found

Solution: Run npm install again

💡 Tips for Best Results

PDF Quality

✅ Good PDFs:

Text-based PDFs (you can select and copy text)
Well-formatted documents
Clear structure with headings

❌ Problematic PDFs:

Scanned images (unless OCR processed)
Password-protected files
Corrupted files
PDFs with lots of images and little text

Question Types

✅ Questions that work well:

Factual questions ("What is...?", "Who...?", "When...?")
Summary requests ("Summarize...", "What are the key points...?")
Specific lookups ("What does it say about X?")
Comparisons ("What's the difference between...?")

❌ Questions that don't work well:

Questions about things not in the document
Requests for opinions or predictions
Math calculations (unless explicitly in the document)
Questions requiring external knowledge

🔒 Privacy & Security

Your Data is Safe

✅ Everything runs locally on your computer
✅ No cloud uploads - PDFs never leave your machine
✅ No tracking - We don't collect any data
✅ Temporary storage - Files are deleted after processing
✅ Open source - You can review all code

What Gets Stored

Temporary: Uploaded PDFs (deleted after processing)
In Memory: Document chunks and AI embeddings (cleared when you close the app)
Never Stored: Your questions, answers, or any personal data

Network Usage

During Setup: Downloads AI models from Ollama
During Use: Zero internet needed (everything is local)

📊 System Monitoring

Checking If Everything Is Running

1. Check Ollama:

Look for Ollama icon in system tray (Windows/Mac)
Or run: ollama list (should show your models)

2. Check DocuChat:

Open http://localhost:3000/api/health
Should see: {"status":"ok","message":"DocuChat API is running"}

3. Check Document Count:

Open http://localhost:3000/api/chat/stats
Shows how many documents are loaded
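
The DocuChat health check can also be done from a script. The sketch below relies only on the /api/health response shown above (a JSON body with status "ok"); the function names are hypothetical, and it uses the fetch function built into Node.js 18 and later.

```javascript
// Returns true when the parsed /api/health body reports status "ok",
// the value this README shows for a running server.
function isHealthy(body) {
  return body !== null && typeof body === "object" && body.status === "ok";
}

// Fetches the health endpoint; resolves to false if the server is unreachable.
async function checkHealth(baseUrl = "http://localhost:3000") {
  try {
    const res = await fetch(`${baseUrl}/api/health`);
    return res.ok && isHealthy(await res.json());
  } catch {
    return false;
  }
}
```

For example, checkHealth().then((up) => console.log(up ? "DocuChat is running" : "DocuChat is not reachable")).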

Performance Tips

If DocuChat is slow:

Close other applications - AI needs RAM
Use smaller PDFs - Break large documents into sections
Restart DocuChat - Clears memory
Upgrade RAM - 16GB+ recommended for large documents

📱 Advanced Usage (Optional)

Using from Command Line

You can test DocuChat using command-line tools:

Check if server is running:

```
curl http://localhost:3000/api/health
```

Upload a PDF:

```
curl -X POST http://localhost:3000/api/upload \
  -F "pdf=@/path/to/your/document.pdf"
```
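
The same upload can be sketched in Node.js 18 or later, which provides fetch, FormData, and Blob as globals. The form field name pdf and the /api/upload path come from the curl command above; the function names are invented, and treating the response as JSON is an assumption, since the response format is not documented here.

```javascript
// Builds the multipart form that mirrors `-F "pdf=@..."`: one field named "pdf".
function buildUploadForm(bytes, filename) {
  const form = new FormData();
  form.append("pdf", new Blob([bytes], { type: "application/pdf" }), filename);
  return form;
}

// Reads a PDF from disk and posts it to the upload endpoint.
async function uploadPdf(path, baseUrl = "http://localhost:3000") {
  const { readFile } = await import("node:fs/promises");
  const { basename } = await import("node:path");
  const form = buildUploadForm(await readFile(path), basename(path));
  const res = await fetch(`${baseUrl}/api/upload`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return res.json(); // assumption: the server replies with JSON
}
```

The Content-Type header is deliberately left unset so that fetch can add the multipart boundary itself.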

Ask a question:

```
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"question": "What is this document about?"}'
```
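
A Node.js equivalent of the question request might look like the sketch below. It sends the same JSON body as the curl command (a single question field); `buildChatRequest` and `ask` are hypothetical helpers, and the fields of the response are not documented in this README, so inspect the result before relying on any of them.

```javascript
// Builds the same request as the curl command: a JSON body with a "question" field.
function buildChatRequest(question, baseUrl = "http://localhost:3000") {
  return {
    url: `${baseUrl}/api/chat`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    },
  };
}

// Sends the question and returns the parsed response body.
async function ask(question) {
  const { url, options } = buildChatRequest(question);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Chat request failed with HTTP ${res.status}`);
  return res.json(); // response shape is an assumption; log it to see the actual fields
}
```

Using JSON.stringify avoids the shell quoting problems that hand-written JSON in a curl command can cause.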

Customization Options

Change AI Model: Edit backend/services/chatService.js and change:

```
this.chatModel = "llama3.2:3b"; // Try other Ollama models
```

Change Chunk Size: Edit backend/services/pdfProcessor.js and modify:

```
chunkSize: 800,      // Make larger for more context
chunkOverlap: 200,   // Adjust overlap
```
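
To get a feel for what these two numbers do, here is a simplified sketch of overlapping chunks. It is a plain fixed-window splitter written for illustration, not the LangChain RecursiveCharacterTextSplitter that DocuChat uses (which also tries to break on paragraph and sentence boundaries); `chunkText` is a made-up name.

```javascript
// Splits text into fixed-size windows where each window starts
// (chunkSize - chunkOverlap) characters after the previous one.
function chunkText(text, chunkSize = 800, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Small values make the overlap visible: each chunk repeats the last
// two characters of the previous one.
console.log(chunkText("abcdefghij", 4, 2)); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

With the defaults of 800 and 200, a 2,000-character text yields three chunks starting at characters 0, 600, and 1,200. A larger overlap repeats more text across chunks, which helps when an answer straddles a chunk boundary but increases the number of chunks to embed.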

Change Number of Results: Edit backend/services/chatService.js:

```
this.topK = 3; // Increase to retrieve more context
```

🏗️ Technical Architecture (For Developers)

Technology Stack

Backend:

Node.js + Express.js
LangChain (RAG framework)
Ollama (AI models)
pdf-parse (PDF processing)

AI Models:

nomic-embed-text - Text embeddings (768 dimensions)
llama3.2:3b - Language model for answers

Components:

PDF Processor - LangChain PDFLoader + RecursiveCharacterTextSplitter
Embedding Service - Converts text to vectors
Vector Store - In-memory similarity search
Chat Service - RAG pipeline with prompt templates

How RAG Works

```
PDF Upload → Text Extraction → Chunking → Embedding → Vector Store
                                                            ↓
User Question → Embedding → Similarity Search → Top K Chunks
                                                            ↓
                                    Chunks + Question → LLM → Answer
```
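
The "Similarity Search → Top K Chunks" step can be sketched as follows. This is a toy illustration that assumes cosine similarity as the ranking measure; the README does not state which measure the in-memory vector store uses, and the function names and sample data are invented for the example.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Returns the k stored chunks whose embeddings are most similar to the query.
function topKChunks(queryEmbedding, store, k = 3) {
  return store
    .map((entry) => ({
      text: entry.text,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 2-dimensional embeddings; real nomic-embed-text vectors have 768 dimensions.
const store = [
  { text: "Payment is due in 30 days.", embedding: [1, 0] },
  { text: "Either party may cancel.", embedding: [0, 1] },
  { text: "Late payments incur a fee.", embedding: [0.9, 0.1] },
];
const results = topKChunks([1, 0], store, 2);
console.log(results.map((r) => r.text));
```

Here the query vector points in the same direction as the first chunk, so the two payment-related chunks are returned and the cancellation chunk is left out. In the full pipeline, the selected chunks are placed into the prompt together with the question before the LLM is called.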

API Endpoints

See backend/API_DOCUMENTATION.md for complete API reference.

❓ Frequently Asked Questions

General Questions

Q: Is DocuChat free? A: Yes! It's completely free and open-source.

Q: Do I need internet? A: Only for initial setup (downloading models). After that, it works offline.

Q: What PDF size is supported? A: Up to 10MB by default. Can be configured for larger files.

Q: Can I upload multiple PDFs? A: Yes! Upload them one at a time. They'll all be searchable together.

Q: Does it work with scanned PDFs? A: Only if the PDF has been OCR-processed (text is selectable).

Technical Questions

Q: What language is it written in? A: JavaScript (Node.js for backend, vanilla JS for frontend).

Q: Can I use different AI models? A: Yes! Any Ollama-compatible model works.

Q: Is my data encrypted? A: Everything runs locally, so data never leaves your computer.

Q: Can I deploy this to a server? A: Yes! See deployment documentation (for technical users).

Usage Questions

Q: Why does the first question take longer? A: Ollama loads the AI models into memory on first use, so the first answer is slower than the ones that follow.

Q: Can I ask follow-up questions? A: Currently, each question is independent. Conversation memory is a planned feature.

Q: What if my document is in another language? A: Many Ollama models handle multiple languages, but answer quality varies by model and language. Upload a document and try it.

Q: Can I chat with images in PDFs? A: Currently only text is processed. Image recognition is a future feature.

🆘 Getting Help

Need Support?

Check Troubleshooting section above
Read API Documentation in backend/API_DOCUMENTATION.md
Check LangChain Integration guide in LANGCHAIN_INTEGRATION.md
Search existing issues on GitHub
Ask the community in Discussions
Report a bug via GitHub Issues

Reporting Issues

When reporting a problem, include:

Operating system (Windows/Mac/Linux)
Node.js version (node --version)
Ollama version (ollama --version)
Error messages (copy the exact text)
Steps to reproduce the problem
Screenshots (if helpful)

🗺️ Roadmap (Future Features)

Coming Soon

Conversation memory (remember previous questions)
Support for Word documents (.docx)
Support for Excel files (.xlsx)
Multiple language support
Dark mode
Export chat history

Planned Features

Your Ideas?

We'd love to hear your suggestions! Open a discussion on GitHub.

🙏 Credits & Acknowledgments

DocuChat is built with amazing open-source technologies:

LangChain - RAG framework
Ollama - Local AI models
Express.js - Web server
pdf-parse - PDF processing
Node.js - Runtime environment

Special thanks to:

The Ollama team for making AI accessible
The LangChain community
All our contributors

📄 License

This project is licensed under the ISC License.

🚀 Quick Start Checklist

Use this checklist for first-time setup:

Node.js 18 or higher installed (node --version)
Ollama installed and running
AI models downloaded (ollama pull nomic-embed-text and ollama pull llama3.2:3b)
DocuChat downloaded and dependencies installed (npm install)
uploads folder created (mkdir uploads)
Server started (npm start) and http://localhost:3000 opens in your browser

📚 Additional Documentation

API Documentation: backend/API_DOCUMENTATION.md
cURL Examples: backend/CURL_EXAMPLES.md
LangChain Integration: LANGCHAIN_INTEGRATION.md

Made with ❤️ for everyone who hates reading long documents

DocuChat - Chat with your PDFs, not through them!
