An intelligent document assembly system that combines targeted web crawling, AI-driven research compilation, and smart template-based document generation. DocAssembler helps users create comprehensive documents by automatically gathering, analyzing, and synthesizing information from various web sources.
- Crawls entire website directories to gather documentation
- Focuses on process documentation, API specs, and software instructions
- Supports various documentation types:
  - Technical documentation
  - API documentation
  - Process instructions
  - Wiki pages
  - Social media profiles
  - Knowledge bases
- Tag-based research gathering:
  - Accepts user summaries and topic keywords
  - Performs deep web searches on individual tags
  - Analyzes tag relationships and domain contexts
  - Generates comprehensive research reports in PDF or Markdown
  - Similar to OpenAI's Deep Research functionality
- Semi-automated document completion:
  - User provides partial information
  - AI completes missing sections
- Supported templates include:
  - Software Requirements Specification (SRS)
  - Executive Summaries
  - CREST Data Problem Reports
  - RFP Proposals
  - Report Abstracts
  - Plot Summaries
  - Research Synopses
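The tag-relationship analysis mentioned in the research-gathering feature above can be sketched as a simple co-occurrence count over retrieved text snippets. This is only an illustration of the idea; the function name and inputs are hypothetical, not the project's actual API.

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch: score how strongly two tags are related by counting
# how often they co-occur in retrieved snippets. Hypothetical helper, not
# the airesearch package's real interface.
def tag_relationships(snippets, tags):
    pair_counts = Counter()
    for text in snippets:
        present = [t for t in tags if t.lower() in text.lower()]
        for a, b in combinations(sorted(present), 2):
            pair_counts[(a, b)] += 1
    return pair_counts
```

A real implementation would weight by source quality and snippet length rather than raw counts, but the correlation structure is the same.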
- Web Crawler (`packages/webcrawler`):
  - Intelligent web crawling with domain/subdomain support
  - Respects robots.txt and implements rate limiting
  - Concurrent crawling with proper session management
  - Content extraction and relationship mapping
- Documentation Generator (`packages/docgen`):
  - Markdown and HTML processing
  - Table of contents generation
  - PDF output support
  - Template-based document generation
  - Metadata handling
- Web Interface (`services/web`):
  - React/Vite-based web application
  - Real-time processing feedback
  - Document preview and editing
  - Configuration management
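The crawler's politeness layer (robots.txt compliance plus per-domain rate limiting) can be sketched with Python's standard library. This is a minimal illustration, not the webcrawler package's actual implementation; the class and method names are hypothetical.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit

# Hypothetical sketch of the politeness layer described above: check
# robots.txt before fetching and enforce a minimum delay per domain.
class PoliteFetcher:
    def __init__(self, user_agent="DocAssembler", delay=1.0):
        self.user_agent = user_agent
        self.delay = delay          # minimum seconds between requests per domain
        self.parsers = {}           # cached robots.txt parser per domain
        self.last_request = {}      # last request timestamp per domain

    def allowed(self, url):
        # Fetch and cache robots.txt for the url's domain, then consult it.
        domain = "{0.scheme}://{0.netloc}".format(urlsplit(url))
        if domain not in self.parsers:
            rp = urllib.robotparser.RobotFileParser(domain + "/robots.txt")
            try:
                rp.read()
            except OSError:
                rp.parse([])  # robots.txt unreachable: treat as no rules
            self.parsers[domain] = rp
        return self.parsers[domain].can_fetch(self.user_agent, url)

    def wait_turn(self, url):
        # Sleep just long enough to honor the per-domain delay.
        domain = urlsplit(url).netloc
        elapsed = time.monotonic() - self.last_request.get(domain, 0.0)
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self.last_request[domain] = time.monotonic()
```

A production crawler would also track concurrent sessions and honor `Crawl-delay` directives, as the component list above implies.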
DocAssembler stores raw text in a traditional relational database and vectors in a vector database. By default we recommend MySQL for document metadata and ChromaDB for vector search. Example setup scripts are located in `scripts/databases/`.
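The dual-store layout can be illustrated with a small self-contained sketch. Here stdlib `sqlite3` stands in for MySQL and a brute-force cosine search stands in for ChromaDB; the function names are hypothetical and only show how the two stores relate.

```python
import math
import sqlite3

# Illustrative sketch of the dual-store layout: relational metadata plus a
# separate vector index. sqlite3 substitutes for MySQL and a linear cosine
# scan substitutes for ChromaDB purely for demonstration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, source_url TEXT)")

vectors = {}  # doc id -> embedding (would live in the vector database)

def add_document(doc_id, title, url, embedding):
    db.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, title, url))
    vectors[doc_id] = embedding

def search(query_vec, k=3):
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    ranked = sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)
    return [db.execute("SELECT title FROM documents WHERE id = ?", (d,)).fetchone()[0]
            for d in ranked[:k]]
```

The point of the split is that full-text metadata queries and nearest-neighbor vector queries have different access patterns, so each goes to the store built for it.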
- Python 3.12+
- Node.js 20+
- Docker (optional)
- Clone the repository:

  ```shell
  git clone https://github.com/cloudcurio/doc_assembler_web.git
  cd doc_assembler_web
  ```

- Set up Python packages:

  ```shell
  # Install Poetry
  curl -sSL https://install.python-poetry.org | python3 -

  # Install webcrawler package
  cd packages/webcrawler
  poetry install

  # Install docgen package
  cd ../docgen
  poetry install
  ```

- Set up the web interface:

  ```shell
  cd ../../services/web
  npm install
  ```

- Install pre-commit hooks:

  ```shell
  pip install pre-commit
  pre-commit install
  ```

- Configure the environment:

  ```shell
  cp .env.example .env
  # Edit .env with your settings
  ```
```shell
# Run webcrawler tests
cd packages/webcrawler
poetry run pytest

# Run docgen tests
cd ../docgen
poetry run pytest

# Run web interface tests
cd ../../services/web
npm test
```
```python
import asyncio

from webcrawler.core.config import CrawlerConfig
from webcrawler.core.crawler import Crawler

# Configure and run the documentation crawler
config = CrawlerConfig(
    start_url="https://docs.example.com",
    doc_types=["api", "wiki", "technical"],
    content_filters=["documentation", "guide", "manual"],
)

async def main():
    async with Crawler(config) as crawler:
        docs = await crawler.gather_documentation()
        print(f"Found {len(docs)} documentation pages")

asyncio.run(main())
```
```python
import asyncio

from airesearch.core.researcher import Researcher
from airesearch.models.topic import ResearchTopic

# Configure research parameters
topic = ResearchTopic(
    tags=["kubernetes", "service mesh", "istio"],
    context="cloud native architecture",
    depth="technical",
)

async def main():
    # Generate a research report
    researcher = Researcher()
    report = await researcher.compile_research(
        topic=topic,
        output_format="pdf",
        include_citations=True,
    )

asyncio.run(main())
```
```python
import asyncio

from docgen.core.assembler import DocumentAssembler
from docgen.models.template import Template

async def main():
    # Create an SRS document from a template
    assembler = DocumentAssembler()
    srs_doc = await assembler.create_document(
        template=Template.SRS,
        initial_content={
            "project_name": "MyProject",
            "project_scope": "Cloud-based service...",
        },
        auto_complete=True,
    )

asyncio.run(main())
```
Build and run using Docker:

```shell
# Build images
docker compose build

# Run services
docker compose up -d
```
Crawler settings:

- `doc_types`: Types of documentation to gather (api, wiki, technical, process, social)
- `content_filters`: Content type filters
- `depth`: Crawling depth configuration
- `extract_assets`: Include images and diagrams
- `rate_limits`: Domain-specific rate limiting

Research settings:

- `search_depth`: Research depth level
- `tag_relationships`: Tag correlation settings
- `source_quality`: Source validation rules
- `citation_style`: Citation format
- `analysis_level`: Research analysis depth
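As a rough illustration of how the crawler settings above might be grouped and validated, here is a hypothetical dataclass; the field names mirror the option list, but the real `CrawlerConfig` may be shaped differently.

```python
from dataclasses import dataclass, field

# Assumed set of valid documentation types, taken from the option list above.
VALID_DOC_TYPES = {"api", "wiki", "technical", "process", "social"}

# Hypothetical settings container; not the project's actual CrawlerConfig.
@dataclass
class CrawlerSettings:
    doc_types: list = field(default_factory=lambda: ["technical"])
    content_filters: list = field(default_factory=list)
    depth: int = 3                                   # crawl depth limit
    extract_assets: bool = False                     # include images/diagrams
    rate_limits: dict = field(default_factory=dict)  # domain -> requests/sec

    def __post_init__(self):
        # Reject unknown documentation types early, before crawling starts.
        bad = set(self.doc_types) - VALID_DOC_TYPES
        if bad:
            raise ValueError(f"unknown doc_types: {sorted(bad)}")
```

Validating at construction time keeps bad configurations from surfacing mid-crawl.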
- `templates/`: Customizable document templates
  - SRS Template
  - Executive Summary Template
  - RFP Template
  - Research Report Template
- `completion_rules/`: AI completion guidelines
- `style_guides/`: Document styling rules
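The interplay between templates and AI completion can be sketched with stdlib `string.Template`: user-supplied fields are filled directly, and any field left blank goes through a completion step. The completion pass is stubbed out here, and the field names are only illustrative of the real `docgen` templates.

```python
import re
from string import Template

# Illustrative SRS-style template; the real templates/ directory holds
# richer documents with many more fields.
SRS_TEMPLATE = Template(
    "Software Requirements Specification\n"
    "Project: $project_name\n"
    "Scope: $project_scope\n"
)

def complete_missing(field_name):
    # Stand-in for the AI completion pass that fills sections the user left out.
    return f"[to be completed: {field_name}]"

def render(template, provided):
    # Fill provided fields; route every remaining placeholder to completion.
    values = dict(provided)
    for name in re.findall(r"\$(\w+)", template.template):
        values.setdefault(name, complete_missing(name))
    return template.substitute(values)
```

In the real system the completion step would consult `completion_rules/` and `style_guides/` instead of emitting a placeholder.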
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Website: cloudcurio.cc
- Email: dev@cloudcurio.cc