Generate high-quality articles in any author's writing style
TextScript analyzes the writing style of web articles and uses AI to generate new content that closely matches that style. Ideal for content creators, marketers, and writers who want to maintain a consistent voice across their content.
- Fetches & Analyzes: Reads articles from URLs you provide
- Learns Style: Uses LLM to understand the author's unique writing patterns
- Researches Topics (optional): Gathers current facts, stats, and quotes via Perplexity Sonar
- Generates Content: Creates new articles on your topic in that exact style
- Caches Profiles: Saves style profiles to save time and API costs
```bash
poetry install
cp .env.example .env
# Edit .env and add your OpenRouter API key
```

Create `links.txt` with URLs (one per line):
```
https://example.com/article1
https://example.com/article2
```
Create topic.txt with your topic:
```
The future of artificial intelligence
```
```bash
poetry run python -m src.ugly_script
```

The generated article will be:
- Printed to stdout with beautiful colored output
- Automatically saved to `output.txt`
- 🎨 Style Analysis: Extracts and analyzes writing style from any web articles
- ✍️ Article Generation: Creates original content matching the analyzed style
- 🔬 Research Enhancement (NEW): Optional research stage using Perplexity Sonar for current facts & data
- ⚡ Smart Caching: Saves style profiles to avoid redundant API calls (saves tokens!)
- 📝 Custom Prompts: Full control over prompt templates in the `prompts/` folder
- ⚙️ Configurable Limits: Fine-tune content length, URL count, and timeouts
- Single or Multiple URLs: Works with 1 to 10+ source articles
- Automatic Truncation: Intelligently limits content to fit LLM context windows
- Beautiful Logging: Colored console output with Loguru
- Error Handling: Graceful handling of network errors, timeouts, and invalid URLs
- Unicode Support: Full support for non-ASCII characters in articles
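The URL fetching, tag filtering, and truncation described above can be sketched roughly like this (an illustration with assumed function names and the default limits, not the project's actual `url_fetcher.py` code):

```python
import requests
from bs4 import BeautifulSoup

MAX_CONTENT_PER_URL = 5000  # default per-URL limit
URL_FETCH_TIMEOUT = 30      # seconds

def extract_clean_text(html: str, limit: int = MAX_CONTENT_PER_URL) -> str:
    """Strip script/style/nav tags and return truncated plain text."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove non-content elements so only article prose remains
    for tag in soup(["script", "style", "nav"]):
        tag.decompose()
    text = soup.get_text(separator=" ", strip=True)
    return text[:limit]  # simple slice truncation, per the design notes

def fetch_clean_text(url: str) -> str:
    """Fetch a page and return its cleaned, truncated text."""
    response = requests.get(url, timeout=URL_FETCH_TIMEOUT)
    response.raise_for_status()
    return extract_clean_text(response.text)
```

Separating extraction from fetching keeps the HTML-cleaning logic testable without network access.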
```
Input Files → URL Fetching → Text Extraction → Style Analysis → Caching
                                                     ↓
                                               Style Profile
                                                     ↓
                               [Optional: Research Enhancement]
                               Style Hints → Perplexity Research
                                                     ↓
Output Files ← Article Generation ← Custom Prompts + Research Data
```
All major architectural choices are documented with AICODE-NOTE comments:
- requests vs urllib: Chose requests for cleaner error handling
- MD5 for cache keys: Fast; accidental collisions are negligible for cache lookups, and no cryptographic security is needed
- python-dotenv: Automatic .env parsing with environment variable support
- Truncation strategy: Simple slice truncation for MVP speed
- BeautifulSoup filtering: Removes script/style/nav for clean content
- OpenRouter API: Single endpoint for multiple LLM providers
- Loguru formatting: Color-coded, time-stamped CLI output
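For illustration, the cache-key idea might look like this (a sketch; the real project may hash different inputs, and the helper name is hypothetical):

```python
import hashlib

def style_cache_key(urls: list) -> str:
    """Derive a deterministic cache filename from the source URLs.

    MD5 is used only as a fast fingerprint for cache lookups,
    never for any security purpose.
    """
    joined = "\n".join(sorted(urls))  # sort so URL order doesn't change the key
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

key = style_cache_key(["https://example.com/a", "https://example.com/b"])
# The profile would then be stored at style_profiles/{key}.txt
```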
Input (Paul Graham essays):
```
https://www.paulgraham.com/think.html
https://www.paulgraham.com/avg.html
```
Topic: "The Future of AI in Software Development"
Output: 826-word article matching Paul Graham's conversational, question-driven style with phrases like "Think about it...", "Imagine...", "Let's reflect..."
Performance:
- First run: ~27 seconds (fetch + analysis + generation)
- Second run: ~17 seconds (cached style = 37% faster!)
Same Input (Paul Graham essays + same topic)
With RESEARCH_ENABLED=true:
Research Stage Output:
- Style hints extracted: balanced, mixed, varied
- 5 facts gathered from current sources
- 4 expert quotes with attribution
- Research time: ~19 seconds (Perplexity Sonar)
Enhanced Article Output (786 words):
"...by 2025, 90% of software professionals have integrated AI into their
daily workflows, marking a 14% increase from the previous year..."
"...the 2025 DORA Report reveals that 41% of all code was AI-generated
in 2024, which amounts to an astonishing 256 billion lines of code..."
"...analysts from a16z suggest that AI tools can increase developer
productivity by approximately 20%, while best-in-class AI integration
may contribute up to $3 trillion annually to the global economy..."
"As Casey Ciniello from Infragistics puts it, 'AI is accelerating
innovation, but that innovation must be accompanied by governance,
testing, and ethical considerations.'"
Key Improvements:
- ✅ Concrete data: 6+ specific statistics with numbers
- ✅ Current information: 2024-2025 data points
- ✅ Expert attribution: Named sources and studies
- ✅ Style preserved: Still matches Paul Graham's reflective style
- ✅ Fresh facts: Research NOT cached (always current)
Performance:
- First run with research: ~61 seconds (fetch + analysis + research + generation)
- Second run with research: ~51 seconds (cached style, fresh research = 16% faster)
- Research overhead: +19 seconds for significantly richer content
# Fetches HTML from URLs
# Extracts clean text with BeautifulSoup
# Applies configurable limits:
# - MAX_URLS: 10 (default)
# - MAX_CONTENT_PER_URL: 5000 chars
# - MAX_TOTAL_CONTENT: 8000 chars

# Sends content to LLM with style_analysis.txt prompt
# LLM analyzes:
# - Tone and voice
# - Sentence structure
# - Vocabulary choices
# - Rhetorical devices
# Result cached to style_profiles/{hash}.txt

# IF RESEARCH_ENABLED=true:
#
# Step 3a: Extract Style Hints
# - Analyzes style profile with LLM
# - Extracts content preferences:
# - content_depth: descriptive/concrete/balanced
# - technical_level: technical/simple/mixed
# - preferred_sources: academic/practical/varied
# - focus_areas: key topics to emphasize
#
# Step 3b: Perform Research
# - Uses Perplexity Sonar (RESEARCH_MODEL) for real-time search
# - Gathers current facts, statistics, expert quotes
# - Tailors research to match extracted style hints
# - Parses response into structured data:
# - facts_and_stats: list of concrete data points
# - quotes_and_sources: expert quotes with attribution
# - full_research_text: comprehensive synthesis
#
# NOTE: Research is NOT cached (always fresh!)

# Loads article_generation.txt prompt
# Combines: topic + style profile [+ research data if enabled]
# LLM generates article matching style
# IF research enabled:
# - Integrates facts naturally into narrative
# - Adds expert quotes with attribution
# - Includes current statistics (2024-2025)
# Saves to output.txt

NEW: Enrich your articles with real-time facts, statistics, and expert quotes!
When RESEARCH_ENABLED=true, TextScript adds an optional research stage that:
- Extracts Style Hints: Analyzes your style profile to understand content preferences
  - Content depth: descriptive vs concrete
  - Technical level: simple vs technical
  - Preferred sources: academic vs practical
  - Focus areas: key topics to emphasize
- Performs Research: Uses Perplexity Sonar to gather current information
  - Facts and statistics with numbers/data
  - Expert quotes and sources with attribution
  - Full research synthesis tailored to style
- Enriches Articles: Integrates research seamlessly into generated content
  - Facts woven naturally into narrative
  - Quotes attributed properly
  - Current, up-to-date information (2023-2025)
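Conceptually, the generation prompt is assembled from the topic, the style profile, and (when enabled) the research data. A rough sketch with hypothetical placeholder names; the real `article_generation.txt` template may use different markers:

```python
from typing import Optional

def build_generation_prompt(template: str, topic: str, style_profile: str,
                            research: Optional[str] = None) -> str:
    """Fill a generation template with all available context.

    The {topic}/{style_profile} placeholders are illustrative assumptions.
    """
    prompt = template.replace("{topic}", topic)
    prompt = prompt.replace("{style_profile}", style_profile)
    # Research data is appended only when RESEARCH_ENABLED=true
    if research:
        prompt += "\n\nWeave these researched facts and quotes in naturally:\n" + research
    return prompt
```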
Edit `.env`:

```
RESEARCH_ENABLED=true
RESEARCH_MODEL=perplexity/sonar-pro  # or perplexity/sonar for faster/cheaper
```

Topic: "The Impact of AI on Software Development"
Without Research (old behavior):
- Article based purely on style analysis
- No specific facts or data points
- Generic content
With Research (new feature):
- ✅ "AI coding assistants increased developer productivity by 55% (GitHub, 2024)"
- ✅ "According to Stanford researchers, AI reduces debugging time by 40%"
- ✅ Current industry statistics and expert quotes
- ✅ Specific examples and case studies
- Real-time search: Access to current web information
- Specialized: Optimized for research and fact-finding
- Cost-effective: Cheaper than GPT-4 for research tasks
- Attribution: Returns sources for credibility
- Research results are NOT cached (always fresh data)
- Style profiles are still cached (saves API costs)
- Research adds time to each generation (~19 seconds in the example run above)
- Requires separate API costs (Sonar queries via OpenRouter)
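As a sketch, a Sonar research query goes through the same OpenRouter chat-completions endpoint as any other model; only the model name changes. The request body follows the standard OpenAI-style schema, but the wrapper below is illustrative, not the project's `research_client.py`:

```python
import json
import urllib.request

def build_research_payload(question: str,
                           model: str = "perplexity/sonar-pro") -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

def research_query(api_key: str, question: str) -> str:
    """Send a research question to Perplexity Sonar via OpenRouter."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_research_payload(question)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```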
See .env.example for all available options:
- MAX_URLS: Maximum number of URLs to process (default: 10)
- MAX_CONTENT_LENGTH_PER_URL: Max characters per URL (default: 5000)
- MAX_TOTAL_CONTENT_LENGTH: Max total content length (default: 8000)
- URL_FETCH_TIMEOUT: Timeout for URL fetching in seconds (default: 30)
- RESEARCH_ENABLED: Enable research enhancement stage (default: false)
- RESEARCH_MODEL: Model for research (default: perplexity/sonar-pro)
  - Options: `perplexity/sonar-pro` (best), `perplexity/sonar` (faster/cheaper), `openai/gpt-4o-mini` (fallback)
```
text-script/
├── src/
│   ├── ugly_script.py            # Main entry point
│   ├── config.py                 # Configuration management
│   ├── url_fetcher.py            # URL fetching and content extraction
│   ├── prompt_manager.py         # Prompt template management
│   ├── style_cache.py            # Style profile caching
│   ├── llm_client.py             # OpenRouter API client
│   ├── style_hints_extractor.py  # Extract style hints (NEW)
│   ├── research_client.py        # Perplexity research (NEW)
│   └── models.py                 # Data models
├── tests/                        # Test suite (121 tests)
├── prompts/                      # Customizable prompt templates
│   ├── style_analysis.txt
│   ├── article_generation.txt
│   ├── research_style_hints.txt  # NEW
│   └── research.txt              # NEW
├── links.txt                     # Input: URLs to analyze
└── topic.txt                     # Input: Article topic
```
Comprehensive test suite with 121 tests covering all functionality:
```bash
# Run all tests
poetry run pytest
# ====== 121 passed in 1.19s ======

# Run with coverage
poetry run pytest --cov=src --cov-report=term-missing

# Run specific test file
poetry run pytest tests/test_style_cache.py -v
```

| Module | Coverage | Tests | Status |
|---|---|---|---|
| config.py | 100% | 18 | ✅ |
| llm_client.py | 100% | 6 | ✅ |
| style_hints_extractor.py | 100% | 11 | ✅ (NEW) |
| research_client.py | 100% | 14 | ✅ (NEW) |
| url_fetcher.py | 90% | 33 | ✅ |
| prompt_manager.py | 90% | 16 | ✅ |
| models.py | 88% | 11 | ✅ |
| style_cache.py | 86% | 15 | ✅ |
| Overall | ~70% | 121 | ✅ |
- Unit Tests: Configuration, models, utilities
- Integration Tests: File I/O, caching, URL fetching
- API Tests: LLM client with mocked responses
- End-to-End: Complete workflow simulation
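An API test with mocked responses might look like the sketch below. The `complete` wrapper is a hypothetical stand-in for the project's `llm_client`; the point is that `requests.post` is patched so no real LLM call is made:

```python
from unittest.mock import MagicMock, patch

def complete(prompt: str) -> str:
    """Hypothetical stand-in for the project's LLM call."""
    import requests
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}]},
    )
    return resp.json()["choices"][0]["message"]["content"]

def test_complete_returns_message_content():
    # Fake the HTTP response so the test runs offline and deterministically
    fake = MagicMock()
    fake.json.return_value = {"choices": [{"message": {"content": "stubbed article"}}]}
    with patch("requests.post", return_value=fake):
        assert complete("write something") == "stubbed article"
```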
- Total Tasks: 112/112 completed (100%) ✅
- User Stories: 7/7 delivered (100%) ✅
- Development Time: ~6 hours
- Commits: 17 feature commits
- Lines of Code: ~2,400 (src + tests)
- Test-Driven Development: All features have tests
- Test Coverage: ~70% with 121 comprehensive tests
- ✅ US1: Generate articles in author's style (MVP)
- ✅ US2: Save generated content to file
- ✅ US3: Handle single URL input
- ✅ US4: Customize LLM prompts
- ✅ US5: Configure content limits
- ✅ US6: Reuse cached style profiles
- ✅ US7: Research-enhanced article generation (NEW)
"links.txt not found"

```bash
# Create the file in project root
echo "https://example.com/article" > links.txt
```

"topic.txt not found"

```bash
# Create the file in project root
echo "Your topic here" > topic.txt
```

"OPENAI_API_KEY is required"

```bash
# Add your API key to .env
echo "OPENAI_API_KEY=your_key_here" >> .env
```

"Timeout fetching URL"

```bash
# Increase timeout in .env
echo "URL_FETCH_TIMEOUT=60" >> .env
```

- Check the test files for usage examples
- Review AICODE comments for design decisions
- See tasks.md for implementation details
This project was developed using:
- Test-Driven Development (TDD): Tests written before implementation
- AICODE Documentation: All design decisions documented in code
- Speckit Workflow: Specification → Planning → Tasks → Implementation
MIT
Built with ❤️ using Claude Code