
HavelCS/DocuLens


🧠 DocuLens

A complete AI-powered document analysis platform with a web interface and CLI tools

Python 3.7+ · Flask · LLMware · License: MIT

Transform your document collection into an intelligent, searchable knowledge base using AI. Upload documents, get instant analysis, and query your content with natural language.


🚀 Quick Start

Web Application (Recommended)

# 1. Install dependencies
pip install -r requirements.txt

# 2. Start the web server
python app.py

# 3. Open your browser (`open` is macOS-only; use `xdg-open` on Linux or `start` on Windows)
open http://localhost:3000

Command Line Interface

# Run the CLI analyzer
python document_analyzer.py

✨ Features

🌐 Web Application

  • 📁 Document Upload: Drag & drop interface for .txt, .md, and .pdf files (up to 16MB)
  • 📊 Real-time Analysis: Instant document statistics and keyword extraction
  • 🔍 AI-Powered Search: Natural-language querying of your documents, powered by LLMware
  • 📈 Analytics Dashboard: Comprehensive document library statistics
  • 🎨 Modern UI: Responsive Bootstrap-based interface with dark mode support
  • 🤖 Model Integration: Access to LLMware's model catalog
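The upload rules above (three accepted extensions, a 16MB cap) can be expressed as a small validation helper. This is a minimal sketch, not the code in app.py: the function name `is_allowed_upload` is hypothetical, and it assumes "16MB" means 16 × 1024 × 1024 bytes, the common Flask idiom.

```python
from pathlib import Path

# Assumed limits, mirroring the feature list: .txt, .md and .pdf, up to 16MB.
ALLOWED_EXTENSIONS = {".txt", ".md", ".pdf"}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # assumption: 16 MiB


def is_allowed_upload(filename: str, size_bytes: int) -> bool:
    """Hypothetical helper: accept a file only if its extension and size are allowed."""
    suffix = Path(filename).suffix.lower()  # ".MD" and ".md" are treated the same
    return suffix in ALLOWED_EXTENSIONS and 0 <= size_bytes <= MAX_UPLOAD_BYTES
```

In a Flask app, the request-size part of this check is usually enforced with the `MAX_CONTENT_LENGTH` config key; whether app.py does so is not stated in this README.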

💻 Command Line Interface

  • Document Parsing: Parse and analyze text documents
  • Batch Processing: Handle multiple documents simultaneously
  • Query Engine: Search through document collections
  • Statistics Export: Generate detailed analysis reports
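As an illustration of batch processing combined with statistics export, the sketch below summarises a set of documents and serialises the result as JSON. The function name, the report fields, and the JSON format are all assumptions for illustration; the README does not specify what the CLI's reports contain.

```python
import json
from typing import Dict


def export_statistics(documents: Dict[str, str]) -> str:
    """Hypothetical helper: summarise {filename: text} documents as a JSON report."""
    word_counts = {name: len(text.split()) for name, text in documents.items()}
    total = sum(word_counts.values())
    report = {
        "document_count": len(word_counts),
        "total_words": total,
        # Guard against an empty batch to avoid dividing by zero.
        "average_words": round(total / len(word_counts), 1) if word_counts else 0.0,
        "largest_document": max(word_counts, key=word_counts.get) if word_counts else None,
        "word_counts": word_counts,
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

Returning a string keeps the function easy to test; writing it to a file is a one-line `Path(...).write_text(...)` on top.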

🧠 AI Capabilities

  • Natural Language Processing: Understand context and meaning
  • Semantic Search: Find relevant content beyond keyword matching
  • Document Chunking: Intelligent text segmentation
  • Keyword Extraction: Automatic identification of important terms
  • Statistics Generation: Generate summary statistics for document libraries
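To make the idea of document chunking concrete, here is a plain sliding-window chunker over words. This is a swapped-in technique for illustration only: LLMware performs its own segmentation during parsing, and the function name and the `chunk_size`/`overlap` defaults below are hypothetical.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated text.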

📁 Project Structure

llmware-project/
├── 🌐 Web Application
│   ├── app.py                    # Flask web server
│   ├── templates/
│   │   ├── base.html            # Base template
│   │   ├── index.html           # Homepage
│   │   ├── upload.html          # File upload
│   │   ├── analyze.html         # Analysis view
│   │   └── query.html           # Search interface
│   └── static/
│       ├── css/style.css        # Custom styling
│       └── js/main.js           # JavaScript utilities
├── 💻 Command Line Tools
│   └── document_analyzer.py     # CLI analyzer
├── 📄 Sample Documents
│   ├── ai_research.txt          # AI research paper
│   └── tech_trends.txt          # Tech trends report
├── ⚙️ Configuration
│   ├── requirements.txt         # Python dependencies
│   ├── README.md               # This file
│   └── WEB_APP_README.md       # Detailed web app docs
└── 📋 Other
    ├── uploads/                 # File upload storage
    └── test_app.py              # Web app tests

Installation

Install the required dependencies:

pip install -r requirements.txt

Usage

Run the document analyzer:

python document_analyzer.py

The script will:

  1. Create a new LLMware library
  2. Add all .txt files from the documents/ folder
  3. Parse and analyze each document
  4. Perform sample queries on the document collection
  5. Display library statistics
  6. List available LLMware models
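The CLI performs steps 2 and 4 through LLMware's library and query features. The sketch below shows the same flow with only the standard library, reading .txt files from documents/ and running a case-insensitive substring search, so it can be followed without LLMware installed. It is not the script's implementation; `load_documents` and `naive_query` are hypothetical names, and substring matching stands in for LLMware's query engine.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_documents(folder: str = "documents") -> Dict[str, str]:
    """Read every .txt file directly inside `folder` into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }


def naive_query(documents: Dict[str, str], term: str) -> List[Tuple[str, str]]:
    """Return (filename, line) pairs for every line containing `term`, ignoring case."""
    needle = term.lower()
    hits = []
    for name, text in documents.items():
        for line in text.splitlines():
            if needle in line.lower():
                hits.append((name, line.strip()))
    return hits
```

For example, `naive_query(load_documents(), "machine learning")` lists every matching line across the collection.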

Sample Output

The script provides detailed analysis including:

  • Word and character counts for each document
  • Keyword detection (AI, technology, computing, machine learning)
  • Document previews
  • Query results for various search terms
  • Library statistics
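The per-document part of this output (counts, keyword detection, preview) can be sketched as follows. The keyword list is taken from the README; the function name, the 100-character preview length, and the whole-word matching rule are assumptions, chosen so that "AI" is not reported for words such as "maintain".

```python
import re
from typing import Dict, List

# Keywords listed in the README's sample output description.
KEYWORDS = ["AI", "technology", "computing", "machine learning"]


def analyze_text(text: str, preview_chars: int = 100) -> Dict[str, object]:
    """Hypothetical helper: word/character counts, detected keywords and a preview."""
    lowered = text.lower()
    # Whole-word, case-insensitive matching (an assumption, not confirmed by the README).
    found: List[str] = [
        kw for kw in KEYWORDS
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    ]
    return {
        "words": len(text.split()),
        "characters": len(text),
        "keywords": found,
        "preview": text[:preview_chars],
    }
```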

Adding Your Own Documents

To analyze your own documents:

  1. Place .txt files in the documents/ directory
  2. Run the script again

LLMware Features Demonstrated

  • Library Management: Creating and managing document collections
  • Document Parsing: Processing text files and extracting content
  • Text Querying: Searching through document collections
  • Model Catalog: Accessing available LLMware models

Requirements

  • Python 3.7+
  • LLMware 0.4.2+
  • Dependencies listed in requirements.txt

License

This project is for educational and demonstration purposes.
