An AI-powered, agentic content creation pipeline that transforms topic lists into fully researched, SEO-optimized, human-readable articles, with Human-in-the-Loop (HITL) approval built in. It generates a 1,500-word SEO article in approximately 90 seconds, depending on LLM latency.
- Overview
- Key Features
- Architecture
- Project Structure
- Tech Stack
- Prerequisites
- Installation
- Configuration
- Running the App
- How to Use
- Pipeline Flow
- Contributing
- License
Full Content Machine 2026 is a fully automated, multi-agent RAG (Retrieval-Augmented Generation) system built with LangGraph, LangChain, ChromaDB, and Streamlit. You upload a simple CSV/Excel content calendar, and the machine produces polished, SEO-optimized articles one at a time, with a human review checkpoint at the outline stage.
It's designed for content teams, SEO agencies, and solo creators who want to scale output without sacrificing quality.
- CSV/Excel Content Calendar: Bulk import topics & keywords
- Multi-Agent RAG Pipeline: Separate agents for query generation, retrieval, outlining, drafting, and editing
- Human-in-the-Loop (HITL): Review and approve or regenerate outlines before drafting begins
- Knowledge Base Ingestion: Upload PDF/TXT files into ChromaDB collections for contextual research
- AI Writing & Editing: GPT-powered drafting + editorial refinement pass
- HTML Export: Articles exported as clean, publish-ready HTML files
- LangGraph Orchestration: Stateful, inspectable graph-based agent workflow
- Streamlit UI: Clean, interactive web interface; no code required to run
Content Calendar (CSV/Excel)
          │
          ▼
┌───────────────────┐
│  Query Generator  │  ← Generates search queries from topic + keyword
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│     Retriever     │  ← Fetches context from ChromaDB vector store
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Outline Generator │  ← Creates structured article outline (GPT-4o)
└─────────┬─────────┘
          │
    [HITL Review]  ← Human approves or rejects outline
          │
          ▼
┌───────────────────┐
│  Article Writer   │  ← Drafts the full article (async, GPT-4o)
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Article Editor   │  ← Refines tone, clarity, SEO
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│   HTML Exporter   │  ← Saves final article to /exports
└───────────────────┘
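The diagram can be read as one shared state flowing through a chain of nodes, with the HITL review acting as a gate between outlining and drafting. The following is a dependency-free sketch of that control flow, not the project's code: the real pipeline is defined with LangGraph in src/graph.py, and every name here (ArticleState, run_pipeline, the node functions) is hypothetical, with stub bodies standing in for the LLM and ChromaDB calls.

```python
from typing import Callable, TypedDict


class ArticleState(TypedDict, total=False):
    """Hypothetical shared state passed from node to node."""
    topic: str
    keyword: str
    queries: list[str]
    context: list[str]
    outline: str
    approved: bool
    draft: str
    final: str


def generate_queries(state: ArticleState) -> ArticleState:
    # Stub for the Query Generator: template queries instead of an LLM call.
    topic, kw = state["topic"], state["keyword"]
    return {**state, "queries": [topic, f"{kw} guide", f"{kw} common mistakes"]}


def retrieve(state: ArticleState) -> ArticleState:
    # Stub for the Retriever: a real one would query a ChromaDB collection.
    return {**state, "context": [f"snippet about {q}" for q in state["queries"]]}


def make_outline(state: ArticleState) -> ArticleState:
    # Stub for the Outline Generator (GPT-4o in the project).
    outline = f"H1: {state['topic']}\nH2: Why {state['keyword']} matters"
    return {**state, "outline": outline}


def write_draft(state: ArticleState) -> ArticleState:
    # Stub for the Article Writer: expands the approved outline.
    body = f"\n(body grounded in {len(state['context'])} snippets)"
    return {**state, "draft": state["outline"] + body}


def edit_draft(state: ArticleState) -> ArticleState:
    # Stub for the Article Editor: a real pass would refine tone, clarity and SEO.
    return {**state, "final": state["draft"].strip()}


def run_pipeline(state: ArticleState, approve: Callable[[str], bool]) -> ArticleState:
    """Run the pre-review nodes, gate on human approval, then draft and edit."""
    for node in (generate_queries, retrieve, make_outline):
        state = node(state)
    # HITL checkpoint: nothing is drafted unless a human approves the outline.
    state = {**state, "approved": approve(state["outline"])}
    if not state["approved"]:
        return state
    for node in (write_draft, edit_draft):
        state = node(state)
    return state  # HTML export would follow here
```

For example, `run_pipeline({"topic": "How to Start a Podcast", "keyword": "start a podcast"}, approve=lambda outline: True)` runs every stub node, while an `approve` callback that returns False stops the run before any draft exists. In LangGraph the same gate would be expressed as graph edges and a pause for human input rather than a Python callback.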
Content Machine/
├── app.py                 # Main Streamlit application entry point
├── requirements.txt       # Python dependencies (pinned versions)
├── .env                   # API keys and secrets (NOT committed to git)
├── .gitignore             # Files excluded from version control
├── README.md              # This file
│
├── src/                   # Core application modules
│   ├── __init__.py
│   ├── config.py          # App-wide configuration & ChromaDB collection names
│   ├── db.py              # ChromaDB initialization & collection management
│   ├── ingestion.py       # PDF/TXT ingestion into vector store
│   ├── query_generator.py # Converts CSV rows into search queries
│   ├── retriever.py       # RAG retrieval from ChromaDB
│   ├── graph.py           # LangGraph pipeline definition & state management
│   ├── writer.py          # AI article drafting agent
│   ├── editor.py          # AI article editing/refinement agent
│   └── exporter.py        # HTML export utilities
│
├── chroma_db/             # Local ChromaDB vector store (auto-generated, git-ignored)
└── exports/               # Generated HTML articles (git-ignored)
| Layer | Technology |
|---|---|
| UI | Streamlit |
| Orchestration | LangGraph |
| LLM Framework | LangChain |
| LLM Provider | OpenAI GPT-4o |
| Vector Database | ChromaDB |
| Embeddings | OpenAI text-embedding-ada-002 |
| Data Processing | Pandas |
| PDF Parsing | pypdf, pypdfium2 |
| Async Runtime | asyncio + nest_asyncio |
| Language | Python 3.11+ |
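The Embeddings and Vector Database rows work together: documents and queries are both turned into vectors, and retrieval returns the stored chunks whose vectors are closest to the query's. The toy sketch below illustrates that idea in plain Python with hand-made 3-dimensional vectors standing in for real embeddings; it ranks by cosine similarity, whereas the metric ChromaDB actually uses depends on how a collection is configured, and the project delegates this search to ChromaDB rather than doing it by hand. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored chunk texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda text: cosine(query, store[text]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; text-embedding-ada-002 vectors have 1,536 dimensions.
store = {
    "podcast microphones": [0.9, 0.1, 0.0],
    "running shoe sizing": [0.0, 0.2, 0.9],
    "podcast hosting": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.2, 0.0], store))  # ['podcast microphones', 'podcast hosting']
```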
- Python 3.11 or higher
- An OpenAI API key
- pip and a virtual environment tool (venv, conda, etc.)
- Git (for version control)
git clone https://github.com/YOUR_USERNAME/content-machine-2026.git
cd content-machine-2026
python -m venv .myenv
source .myenv/bin/activate      # macOS / Linux
# .myenv\Scripts\activate       # Windows
pip install -r requirements.txt
Create a .env file in the project root (it is git-ignored for security):
cp .env.example .env   # if an example file exists; otherwise create .env manually
Add your secrets to .env:
OPENAI_API_KEY=sk-...your-key-here...
# Add any other API keys or config values here
⚠️ Never commit your .env file. It is already listed in .gitignore.
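The README does not show how app.py reads .env. A common choice is python-dotenv's `load_dotenv()`, but that is an assumption, since it is not listed in the tech stack. As a stdlib-only sketch of the same idea, the hypothetical helpers below parse simple `KEY=VALUE` lines and copy them into `os.environ` without overwriting variables that are already set.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (no multiline values, no ${VAR} interpolation)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def load_env(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Load a .env file into os.environ; a missing file is treated as empty."""
    env_file = Path(path)
    values = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

After `load_env()`, code can read the key with `os.environ["OPENAI_API_KEY"]` (the LangChain OpenAI integrations also pick up that variable from the environment). Failing loudly when the key is missing is usually better than letting the first API call error out mid-pipeline.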
# Make sure your virtual environment is active
source .myenv/bin/activate
# Launch the Streamlit app
streamlit run app.py
The app will open at http://localhost:8501 (Streamlit's default port) in your browser.
Use the sidebar to upload PDF or TXT files into a ChromaDB collection. This gives the AI contextual research material to draw from when writing articles.
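Documents are generally split into smaller chunks before they are embedded and stored, so retrieval can return focused passages instead of whole files. The README does not say how src/ingestion.py splits text, so this is a hypothetical fixed-size splitter with overlap; the 1,000-character chunk size and 200-character overlap are illustrative assumptions, and the project may well use a LangChain text splitter instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new chunk's start moves forward
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # ignore whitespace-only chunks
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

For a PDF, the text would first be extracted (pypdf's `PdfReader(path).pages[i].extract_text()` is one way), and the chunks then stored with ChromaDB's `collection.add(ids=..., documents=...)`, where every chunk needs a unique id such as the file name plus the chunk index. The overlap keeps a sentence that straddles a boundary retrievable from either side.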
Upload a CSV or Excel file with at least two columns:
- Topic β The article title or subject
- Keyword β The primary SEO keyword to target
Example CSV:
Topic,Keyword,Search Intent
How to Start a Podcast,start a podcast,Informational
Best Running Shoes 2026,best running shoes,Commercial
Select which columns correspond to Topic and Keyword using the dropdowns.
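The app reads the calendar with Pandas, per the tech stack; to keep this illustration dependency-free, the sketch uses the standard-library csv module instead. It mirrors the column-selection step: the caller names which headers hold the topic and the keyword, and this sketch skips rows where either is empty. `load_calendar` is a hypothetical helper, not a function from src/. For Excel input, `pandas.read_excel` (which needs openpyxl for .xlsx files) would take the place of the CSV reader.

```python
import csv
import io


def load_calendar(
    csv_text: str, topic_col: str = "Topic", keyword_col: str = "Keyword"
) -> list[tuple[str, str]]:
    """Return (topic, keyword) pairs from CSV text, using the chosen column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {topic_col, keyword_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"calendar is missing column(s): {sorted(missing)}")
    pairs: list[tuple[str, str]] = []
    for row in reader:
        # Short rows yield None for absent fields, so normalise before stripping.
        topic = (row.get(topic_col) or "").strip()
        keyword = (row.get(keyword_col) or "").strip()
        if topic and keyword:  # skip rows with an empty topic or keyword
            pairs.append((topic, keyword))
    return pairs


calendar = """Topic,Keyword,Search Intent
How to Start a Podcast,start a podcast,Informational
Best Running Shoes 2026,best running shoes,Commercial
"""
print(load_calendar(calendar))
# [('How to Start a Podcast', 'start a podcast'), ('Best Running Shoes 2026', 'best running shoes')]
```

Extra columns such as Search Intent are simply ignored here; a fuller version could pass them along to the query generator.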
Click "Run Content Machine". For each topic, the pipeline will:
- Generate research queries
- Retrieve relevant context from the vector store
- Generate an article outline, then pause for your review:
  - Approve: continue to drafting, editing, and export
  - Reject & Regenerate: add feedback and get a revised outline
Finished articles are saved as .html files in the /exports folder.
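The README states where finished articles land but not how the HTML is built. A minimal stdlib sketch under stated assumptions: the edited article arrives as plain-text paragraphs, file names are slugs of the title, and `slugify`, `to_html` and `export_article` are hypothetical names (the project's exporter.py may differ). `html.escape` keeps characters such as `&` and `<` from breaking the markup.

```python
import html
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lower-case the title and join ASCII letters/digits with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "article"  # fall back when nothing usable remains


def to_html(title: str, paragraphs: list[str]) -> str:
    """Wrap an escaped title and paragraphs in a minimal HTML5 document."""
    safe_title = html.escape(title)
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        '<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{safe_title}</title>\n</head>\n<body>\n"
        f"<h1>{safe_title}</h1>\n{body}\n</body>\n</html>\n"
    )


def export_article(title: str, paragraphs: list[str], out_dir: str = "exports") -> Path:
    """Write the article to out_dir/<slug>.html and return the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{slugify(title)}.html"
    path.write_text(to_html(title, paragraphs), encoding="utf-8")
    return path
```

With this naming scheme, "Best Running Shoes 2026" would be saved as exports/best-running-shoes-2026.html. Two titles that slugify identically would overwrite each other, so a real exporter might append a date or counter.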
Upload Calendar → Query Gen → RAG Retrieval → Outline
                                                 │
                                          [Human Review]
                                            │           │
                                         Approve     Reject + Feedback
                                            │           │
                                      Draft Article  Regenerate Outline
                                            │
                                      Edit Article
                                            │
                                       Export HTML
                                            │
                                   Next Topic in Queue
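The Approve / Reject + Feedback branch in this flow is effectively a loop: each rejection's feedback is fed back into outline generation until a reviewer approves. Here is a minimal, hypothetical sketch of that control flow in plain Python. In the app, the review is driven by the Approve and Reject & Regenerate controls rather than a blocking loop, and the `max_rounds` cap is an added assumption, since the README does not say regeneration is limited.

```python
from typing import Callable, Optional

# (approved?, feedback to use for the next attempt)
Review = tuple[bool, str]


def review_loop(
    make_outline: Callable[[Optional[str]], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Optional[str]:
    """Regenerate the outline with reviewer feedback until it is approved.

    Returns the approved outline, or None if all max_rounds attempts are rejected.
    """
    feedback: Optional[str] = None  # the first attempt has no feedback
    for _ in range(max_rounds):
        outline = make_outline(feedback)
        approved, feedback = review(outline)
        if approved:
            return outline  # proceed to drafting, editing and export
    return None  # leave this topic for manual follow-up


outline = review_loop(
    make_outline=lambda fb: "v1" if fb is None else f"v2 (revised: {fb})",
    review=lambda o: (o.startswith("v2"), "add an FAQ section"),
)
print(outline)  # v2 (revised: add an FAQ section)
```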
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
- Fork the repo
- Create your feature branch: git checkout -b feature/your-feature
- Commit your changes: git commit -m 'Add some feature'
- Push to the branch: git push origin feature/your-feature
- Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
Built with ❤️ using LangGraph, LangChain, OpenAI, ChromaDB & Streamlit.