Generate high-quality articles in any author's writing style
TextScript analyzes the writing style of web articles and uses AI to generate new content that closely matches that style. Ideal for content creators, marketers, and writers who want to maintain a consistent voice across their content.
- Fetches & Analyzes: Reads articles from URLs you provide
- Learns Style: Uses LLM to understand the author's unique writing patterns
- Researches Topics (optional): Gathers current facts, stats, and quotes via Perplexity Sonar
- Generates Content: Creates new articles on your topic in that exact style
- Caches Profiles: Saves style profiles to save time and API costs
```bash
poetry install
cp .env.example .env
# Edit .env and add your OpenRouter API key
```

Create `links.txt` with URLs (one per line):
```
https://example.com/article1
https://example.com/article2
```
Create topic.txt with your topic:
```
The future of artificial intelligence
```
```bash
poetry run python -m src.ugly_script
```

The generated article will be:
- Printed to stdout with beautiful colored output
- Automatically saved to `output.txt`
- 🎨 Style Analysis: Extracts and analyzes writing style from any web articles
- ✍️ Article Generation: Creates original content matching the analyzed style
- 🔬 Research Enhancement (NEW): Optional research stage using Perplexity Sonar for current facts & data
- ⚡ Smart Caching: Saves style profiles to avoid redundant API calls (saves tokens!)
- 📝 Custom Prompts: Full control over prompt templates in the `prompts/` folder
- ⚙️ Configurable Limits: Fine-tune content length, URL count, and timeouts
- Single or Multiple URLs: Works with 1 to 10+ source articles
- Automatic Truncation: Intelligently limits content to fit LLM context windows
- Beautiful Logging: Colored console output with Loguru
- Error Handling: Graceful handling of network errors, timeouts, and invalid URLs
- Unicode Support: Full support for non-ASCII characters in articles
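The URL fetching, tag filtering, and truncation described above can be sketched roughly like this (an illustration with assumed function names and the default limits, not the project's actual `url_fetcher.py` code):

```python
import requests
from bs4 import BeautifulSoup

MAX_CONTENT_PER_URL = 5000  # default per-URL limit
URL_FETCH_TIMEOUT = 30      # seconds

def extract_clean_text(html: str, limit: int = MAX_CONTENT_PER_URL) -> str:
    """Strip script/style/nav tags and return truncated plain text."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove non-content elements so only article prose remains
    for tag in soup(["script", "style", "nav"]):
        tag.decompose()
    text = soup.get_text(separator=" ", strip=True)
    return text[:limit]  # simple slice truncation, per the design notes

def fetch_clean_text(url: str) -> str:
    """Fetch a page and return its cleaned, truncated text."""
    response = requests.get(url, timeout=URL_FETCH_TIMEOUT)
    response.raise_for_status()
    return extract_clean_text(response.text)
```

Separating extraction from fetching keeps the HTML-cleaning logic testable without network access.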
```
Input Files → URL Fetching → Text Extraction → Style Analysis → Caching
                                                     ↓
                                               Style Profile
                                                     ↓
                               [Optional: Research Enhancement]
                               Style Hints → Perplexity Research
                                                     ↓
Output Files ← Article Generation ← Custom Prompts + Research Data
```
All major architectural choices are documented with AICODE-NOTE comments:
- requests vs urllib: Chose requests for cleaner error handling
- MD5 for cache keys: Fast; accidental collisions are negligible for cache lookups, and no cryptographic security is needed
- python-dotenv: Automatic .env parsing with environment variable support
- Truncation strategy: Simple slice truncation for MVP speed
- BeautifulSoup filtering: Removes script/style/nav for clean content
- OpenRouter API: Single endpoint for multiple LLM providers
- Loguru formatting: Color-coded, time-stamped CLI output
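For illustration, the cache-key idea might look like this (a sketch; the real project may hash different inputs, and the helper name is hypothetical):

```python
import hashlib

def style_cache_key(urls: list) -> str:
    """Derive a deterministic cache filename from the source URLs.

    MD5 is used only as a fast fingerprint for cache lookups,
    never for any security purpose.
    """
    joined = "\n".join(sorted(urls))  # sort so URL order doesn't change the key
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

key = style_cache_key(["https://example.com/a", "https://example.com/b"])
# The profile would then be stored at style_profiles/{key}.txt
```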
Input (Paul Graham essays):
```
https://www.paulgraham.com/think.html
https://www.paulgraham.com/avg.html
```
Topic: "The Future of AI in Software Development"
Output: 826-word article matching Paul Graham's conversational, question-driven style with phrases like "Think about it...", "Imagine...", "Let's reflect..."
Performance:
- First run: ~27 seconds (fetch + analysis + generation)
- Second run: ~17 seconds (cached style = 37% faster!)
Same Input (Paul Graham essays + same topic)
With RESEARCH_ENABLED=true:
Research Stage Output:
- Style hints extracted: balanced, mixed, varied
- 5 facts gathered from current sources
- 4 expert quotes with attribution
- Research time: ~19 seconds (Perplexity Sonar)
Enhanced Article Output (786 words):
"...by 2025, 90% of software professionals have integrated AI into their
daily workflows, marking a 14% increase from the previous year..."
"...the 2025 DORA Report reveals that 41% of all code was AI-generated
in 2024, which amounts to an astonishing 256 billion lines of code..."
"...analysts from a16z suggest that AI tools can increase developer
productivity by approximately 20%, while best-in-class AI integration
may contribute up to $3 trillion annually to the global economy..."
"As Casey Ciniello from Infragistics puts it, 'AI is accelerating
innovation, but that innovation must be accompanied by governance,
testing, and ethical considerations.'"
Key Improvements:
- ✅ Concrete data: 6+ specific statistics with numbers
- ✅ Current information: 2024-2025 data points
- ✅ Expert attribution: Named sources and studies
- ✅ Style preserved: Still matches Paul Graham's reflective style
- ✅ Fresh facts: Research NOT cached (always current)
Performance:
- First run with research: ~61 seconds (fetch + analysis + research + generation)
- Second run with research: ~51 seconds (cached style, fresh research = 16% faster)
- Research overhead: +19 seconds for significantly richer content
# Fetches HTML from URLs
# Extracts clean text with BeautifulSoup
# Applies configurable limits:
# - MAX_URLS: 10 (default)
# - MAX_CONTENT_PER_URL: 5000 chars
# - MAX_TOTAL_CONTENT: 8000 chars

# Sends content to LLM with style_analysis.txt prompt
# LLM analyzes:
# - Tone and voice
# - Sentence structure
# - Vocabulary choices
# - Rhetorical devices
# Result cached to style_profiles/{hash}.txt

# IF RESEARCH_ENABLED=true:
#
# Step 3a: Extract Style Hints
# - Analyzes style profile with LLM
# - Extracts content preferences:
# - content_depth: descriptive/concrete/balanced
# - technical_level: technical/simple/mixed
# - preferred_sources: academic/practical/varied
# - focus_areas: key topics to emphasize
#
# Step 3b: Perform Research
# - Uses Perplexity Sonar (RESEARCH_MODEL) for real-time search
# - Gathers current facts, statistics, expert quotes
# - Tailors research to match extracted style hints
# - Parses response into structured data:
# - facts_and_stats: list of concrete data points
# - quotes_and_sources: expert quotes with attribution
# - full_research_text: comprehensive synthesis
#
# NOTE: Research is NOT cached (always fresh!)

# Loads article_generation.txt prompt
# Combines: topic + style profile [+ research data if enabled]
# LLM generates article matching style
# IF research enabled:
# - Integrates facts naturally into narrative
# - Adds expert quotes with attribution
# - Includes current statistics (2024-2025)
# Saves to output.txt

NEW: Enrich your articles with real-time facts, statistics, and expert quotes!
When RESEARCH_ENABLED=true, TextScript adds an optional research stage that:
- Extracts Style Hints: Analyzes your style profile to understand content preferences
  - Content depth: descriptive vs concrete
  - Technical level: simple vs technical
  - Preferred sources: academic vs practical
  - Focus areas: key topics to emphasize
- Performs Research: Uses Perplexity Sonar to gather current information
  - Facts and statistics with numbers/data
  - Expert quotes and sources with attribution
  - Full research synthesis tailored to style
- Enriches Articles: Integrates research seamlessly into generated content
  - Facts woven naturally into narrative
  - Quotes attributed properly
  - Current, up-to-date information (2023-2025)
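Conceptually, the generation prompt is assembled from the topic, the style profile, and (when enabled) the research data. A rough sketch with hypothetical placeholder names; the real `article_generation.txt` template may use different markers:

```python
from typing import Optional

def build_generation_prompt(template: str, topic: str, style_profile: str,
                            research: Optional[str] = None) -> str:
    """Fill a generation template with all available context.

    The {topic}/{style_profile} placeholders are illustrative assumptions.
    """
    prompt = template.replace("{topic}", topic)
    prompt = prompt.replace("{style_profile}", style_profile)
    # Research data is appended only when RESEARCH_ENABLED=true
    if research:
        prompt += "\n\nWeave these researched facts and quotes in naturally:\n" + research
    return prompt
```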
Edit `.env`:

```
RESEARCH_ENABLED=true
RESEARCH_MODEL=perplexity/sonar-pro  # or perplexity/sonar for faster/cheaper
```

Topic: "The Impact of AI on Software Development"
Without Research (old behavior):
- Article based purely on style analysis
- No specific facts or data points
- Generic content
With Research (new feature):
- ✅ "AI coding assistants increased developer productivity by 55% (GitHub, 2024)"
- ✅ "According to Stanford researchers, AI reduces debugging time by 40%"
- ✅ Current industry statistics and expert quotes
- ✅ Specific examples and case studies
- Real-time search: Access to current web information
- Specialized: Optimized for research and fact-finding
- Cost-effective: Cheaper than GPT-4 for research tasks
- Attribution: Returns sources for credibility
- Research results are NOT cached (always fresh data)
- Style profiles are still cached (saves API costs)
- Research adds time to each generation (~19 seconds in the example run above)
- Requires separate API costs (Sonar queries via OpenRouter)
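As a sketch, a Sonar research query goes through the same OpenRouter chat-completions endpoint as any other model; only the model name changes. The request body follows the standard OpenAI-style schema, but the wrapper below is illustrative, not the project's `research_client.py`:

```python
import json
import urllib.request

def build_research_payload(question: str,
                           model: str = "perplexity/sonar-pro") -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

def research_query(api_key: str, question: str) -> str:
    """Send a research question to Perplexity Sonar via OpenRouter."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_research_payload(question)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```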
See .env.example for all available options:
- MAX_URLS: Maximum number of URLs to process (default: 10)
- MAX_CONTENT_LENGTH_PER_URL: Max characters per URL (default: 5000)
- MAX_TOTAL_CONTENT_LENGTH: Max total content length (default: 8000)
- URL_FETCH_TIMEOUT: Timeout for URL fetching in seconds (default: 30)
- RESEARCH_ENABLED: Enable research enhancement stage (default: false)
- RESEARCH_MODEL: Model for research (default: perplexity/sonar-pro)
  - Options: `perplexity/sonar-pro` (best), `perplexity/sonar` (faster/cheaper), `openai/gpt-4o-mini` (fallback)
```
text-script/
├── src/
│   ├── ugly_script.py            # Main entry point
│   ├── config.py                 # Configuration management
│   ├── url_fetcher.py            # URL fetching and content extraction
│   ├── prompt_manager.py         # Prompt template management
│   ├── style_cache.py            # Style profile caching
│   ├── llm_client.py             # OpenRouter API client
│   ├── style_hints_extractor.py  # Extract style hints (NEW)
│   ├── research_client.py        # Perplexity research (NEW)
│   └── models.py                 # Data models
├── tests/                        # Test suite (121 tests)
├── prompts/                      # Customizable prompt templates
│   ├── style_analysis.txt
│   ├── article_generation.txt
│   ├── research_style_hints.txt  # NEW
│   └── research.txt              # NEW
├── links.txt                     # Input: URLs to analyze
└── topic.txt                     # Input: Article topic
```
Comprehensive test suite with 121 tests covering all functionality:
```bash
# Run all tests
poetry run pytest
# ====== 121 passed in 1.19s ======

# Run with coverage
poetry run pytest --cov=src --cov-report=term-missing

# Run specific test file
poetry run pytest tests/test_style_cache.py -v
```

| Module | Coverage | Tests | Status |
|---|---|---|---|
| config.py | 100% | 18 | ✅ |
| llm_client.py | 100% | 6 | ✅ |
| style_hints_extractor.py | 100% | 11 | ✅ (NEW) |
| research_client.py | 100% | 14 | ✅ (NEW) |
| url_fetcher.py | 90% | 33 | ✅ |
| prompt_manager.py | 90% | 16 | ✅ |
| models.py | 88% | 11 | ✅ |
| style_cache.py | 86% | 15 | ✅ |
| Overall | ~70% | 121 | ✅ |
- Unit Tests: Configuration, models, utilities
- Integration Tests: File I/O, caching, URL fetching
- API Tests: LLM client with mocked responses
- End-to-End: Complete workflow simulation
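An API test with mocked responses might look like the sketch below. The `complete` wrapper is a hypothetical stand-in for the project's `llm_client`; the point is that `requests.post` is patched so no real LLM call is made:

```python
from unittest.mock import MagicMock, patch

def complete(prompt: str) -> str:
    """Hypothetical stand-in for the project's LLM call."""
    import requests
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}]},
    )
    return resp.json()["choices"][0]["message"]["content"]

def test_complete_returns_message_content():
    # Fake the HTTP response so the test runs offline and deterministically
    fake = MagicMock()
    fake.json.return_value = {"choices": [{"message": {"content": "stubbed article"}}]}
    with patch("requests.post", return_value=fake):
        assert complete("write something") == "stubbed article"
```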
- Total Tasks: 112/112 completed (100%) ✅
- User Stories: 7/7 delivered (100%) ✅
- Development Time: ~6 hours
- Commits: 17 feature commits
- Lines of Code: ~2,400 (src + tests)
- Test-Driven Development: All features have tests
- Test Coverage: ~70% with 121 comprehensive tests
- ✅ US1: Generate articles in author's style (MVP)
- ✅ US2: Save generated content to file
- ✅ US3: Handle single URL input
- ✅ US4: Customize LLM prompts
- ✅ US5: Configure content limits
- ✅ US6: Reuse cached style profiles
- ✅ US7: Research-enhanced article generation (NEW)
"links.txt not found"

```bash
# Create the file in project root
echo "https://example.com/article" > links.txt
```

"topic.txt not found"

```bash
# Create the file in project root
echo "Your topic here" > topic.txt
```

"OPENAI_API_KEY is required"

```bash
# Add your API key to .env
echo "OPENAI_API_KEY=your_key_here" >> .env
```

"Timeout fetching URL"

```bash
# Increase timeout in .env
echo "URL_FETCH_TIMEOUT=60" >> .env
```

- Check the test files for usage examples
- Review AICODE comments for design decisions
- See tasks.md for implementation details
This project was developed using:
- Test-Driven Development (TDD): Tests written before implementation
- AICODE Documentation: All design decisions documented in code
- Speckit Workflow: Specification → Planning → Tasks → Implementation
MIT
Built with ❤️ using Claude Code