@stayboba/autoblog

Automated blog content pipeline that writes, optimizes, and publishes SEO-compliant blog posts on autopilot.

You describe your product, define your topic areas, and set a schedule. The pipeline discovers trending topics, researches real keyword data, writes full blog posts, removes AI writing patterns, generates cover images, and translates to multiple languages — all from a single config file.

Built for SaaS companies, content teams, and agencies that want a repeatable, quality-controlled blog pipeline running via GitHub Actions or manual execution.




📋 What You Need Before Starting

Accounts and Keys

| What | Where to get it | Required? | Cost |
| --- | --- | --- | --- |
| Gemini API key | aistudio.google.com/apikey | Yes | Free tier available; paid for heavy usage |
| DataForSEO account | app.dataforseo.com/register | Optional | ~$0.04-0.20 per post for real keyword data |

Without DataForSEO: The pipeline still works. Gemini runs an intelligent keyword strategy — analyzing your topic against existing blog posts to find keyword gaps and suggest seed keywords. You get gap-aware keyword guidance, just without search volume numbers. You can add DataForSEO later for data-driven enrichment.

Technical Requirements

| What | Minimum version |
| --- | --- |
| Node.js | v20.0.0+ |
| npm | v9+ |
| Git | Any recent version |
| A website framework that renders markdown | Next.js, Hugo, Jekyll, Gatsby, Astro, etc. |

Cost Per Blog Post

| Pipeline mode | What's included | Cost |
| --- | --- | --- |
| Full pipeline (6 translations) | Research + keyword strategy + DataForSEO + write + humanize + image + 6 translations | ~$0.12-0.17 |
| English + polished | Research + keyword strategy + write + humanize + image | ~$0.05 |
| Budget mode | Research + write + image only | ~$0.02-0.04 |

🚀 Setup (5 Minutes)

Step 1 — Install

npm install @stayboba/autoblog

Or clone the repo directly:

git clone https://github.com/arul-buk/autoblog.git
cd autoblog
npm install

Step 2 — Create your config

cp autoblog.config.example.mjs autoblog.config.mjs

Open autoblog.config.mjs and fill in three sections (explained in detail in the next section):

  1. Your product — name, URL, description, features
  2. Your authors — who writes the blog (can be fictional personas)
  3. Your topics — what the blog covers

Step 3 — Set your API key

Create a .env file in your project root (loaded automatically):

GEMINI_API_KEY=your-gemini-api-key

Or export directly:

export GEMINI_API_KEY="your-gemini-api-key"

Step 4 — Run it

npx autoblog --help        # see all options
npx autoblog --dry-run     # preview without saving files (recommended first time)
npx autoblog               # generate and save one blog post
npx autoblog --batch 5     # generate 5 posts at once (for seeding a new blog)

What you get

After running, you'll find:

_posts/
├── your-topic-slug.md              # Full blog post (frontmatter + body)
├── es/your-topic-slug.md           # Spanish translation
├── fr/your-topic-slug.md           # French translation
├── ...                             # Other configured languages

public/images/blog/
└── your-topic-slug.png             # AI-generated cover image (16:9)

πŸ“ Controlling What the Blog Writes About

This is the most important section. Everything the pipeline writes — topics, tone, product mentions, sources, audience — is controlled through autoblog.config.mjs. Here's exactly where each decision lives.


What is the blog about? → product section

This is the foundation. The pipeline injects your product context into every prompt, so the LLM knows what it's writing for.

product: {
  // Your product/brand name — appears in blog posts
  name: 'AcmeSaaS',

  // Your website — used for links in the content
  url: 'https://acme.com',

  // One-sentence description — tells the LLM what category you're in
  description: 'Project management tool for remote engineering teams',

  // Features the LLM can reference in articles
  // Be specific — the LLM weaves these into posts contextually
  features: [
    'Async Standups — automated daily standups across time zones',
    'Sprint Analytics — track velocity without manual calculation',
    'Slack Integration — manage tasks without leaving Slack',
  ],

  // Tone instruction — one sentence that shapes the voice
  tone: 'Technical but approachable. Write for engineering managers, not executives.',
}

What this controls: Every blog post will reference your product naturally, mention relevant features in context, and link back to your site. The tone instruction affects vocabulary, complexity, and style across all generated content.


What topics does it cover? → topics.clusters section

Topic clusters define the content pillars your blog writes about. Each cluster has a name and a list of Google Search queries the pipeline uses to find trending topics.

topics: {
  clusters: [
    {
      name: 'Remote Work',              // Category name (matches authors)
      queries: [                        // Search queries for topic discovery
        'remote team management challenges 2026',
        'async communication best practices',
        'remote engineering team productivity',
      ],
    },
    {
      name: 'Competitor',
      queries: [
        'Jira alternative for remote teams',
        'Linear vs Asana comparison',
        'best project management tool review 2026',
      ],
    },
    {
      name: 'Engineering Culture',
      queries: [
        'sprint retrospective techniques',
        'engineering team burnout prevention',
        'developer experience metrics',
      ],
    },
  ],
}

Tips for writing good search queries:

  • Include the current year for recency: "remote work trends 2026"
  • Mix broad and specific: "project management" (broad) + "async standup tools for distributed teams" (specific)
  • Include competitor names if you want comparison content: "Jira vs Linear", "Asana alternative"
  • Include question formats: "how to run sprint retros remotely"
  • Aim for 3-8 queries per cluster, 3-6 clusters total

What this controls: The pipeline searches Google for these queries, finds trending articles and discussions, and generates blog post ideas from the results. It will never write about topics outside your clusters unless you add them.


What geographic angles should it cover? → topics.regionalContexts

Optional. Adds geographic diversity to topic research.

topics: {
  regionalContexts: [
    { region: 'United States', focus: 'tech layoffs, return-to-office mandates' },
    { region: 'Europe', focus: 'GDPR implications for project management tools' },
    { region: 'India', focus: 'growing IT outsourcing market, time zone challenges' },
  ],
}

What this controls: The pipeline considers these regional angles when generating topics, producing content relevant to different markets.


Who writes the posts? → authors section

Define author personas. The pipeline automatically picks the best author for each topic based on category matching.

authors: [
  {
    name: 'Alex Rivera',
    role: 'Engineering Lead',
    image: '/images/authors/alex.png',
    categories: ['Remote Work', 'Engineering Culture'],  // writes about these topics
  },
  {
    name: 'Priya Sharma',
    role: 'Product Analyst',
    image: '/images/authors/priya.png',
    categories: ['Competitor', 'Product'],
  },
],
fallbackAuthor: 'Alex Rivera',  // used when no category match

What this controls: Each post gets an appropriate byline. Author names, roles, and images appear in the frontmatter. The pipeline won't randomly assign authors — it matches by category.
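The category-matching behavior described above can be sketched like this. This is an illustrative sketch (the real selection logic lives inside the pipeline and may differ); the author data mirrors the example config:

```javascript
// Sketch of category-based author selection, using the example personas above.
const authors = [
  { name: 'Alex Rivera', categories: ['Remote Work', 'Engineering Culture'] },
  { name: 'Priya Sharma', categories: ['Competitor', 'Product'] },
];
const fallbackAuthor = 'Alex Rivera';

function pickAuthor(postCategory) {
  // First author whose category list contains the post's category wins;
  // otherwise fall back to the configured fallbackAuthor.
  const match = authors.find((a) => a.categories.includes(postCategory));
  return match ? match.name : fallbackAuthor;
}
```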


What NOT to include → topics.clusters (by exclusion) + product.tone

The pipeline only writes about topics that match your search queries. If you don't include queries about a subject, it won't write about it.

To explicitly steer away from certain content:

product: {
  // The tone instruction can include "don't" guidance
  tone: 'Technical but approachable. Never write about pricing. ' +
        'Avoid mentioning specific customer names. ' +
        'Do not compare on price — compare on features only.',
}

You can also add negative guidance through the content calendar:

schedule: {
  calendar: [
    {
      date: '2026-06-01',
      topic: 'Linear vs AcmeSaaS Feature Comparison',
      notes: 'Do NOT mention pricing. Focus only on feature differences. ' +
             'Acknowledge Linear strengths honestly. Do not bash competitors.',
    },
  ],
}

How do I match a specific writing style? → product.styleGuide

If you have a human-written blog post that captures your ideal tone and style, or brand voice rules you want every post to follow, you can feed these into the pipeline.

product: {
  // ... name, url, features, etc.

  styleGuide: {
    // Option 1: Inline brand voice rules
    voice: `
      Write in second person ("you", not "parents").
      Short paragraphs — 2-3 sentences max.
      No jargon. If you must use a technical term, define it immediately.
      Start sections with a direct answer, then elaborate.
      Use "but" and "and" to start sentences occasionally.
    `,

    // Option 2: Load from a file
    // voiceFile: './style/voice-rules.md',

    // Option 3: Provide a reference post to match
    // referencePost: '...full text of a blog post...',
    // referencePostFile: './style/reference-post.md',
  },
}

Two types of guidance:

| Type | What it does | When to use |
| --- | --- | --- |
| voice | Prescriptive rules the writer follows | You can articulate your style as rules ("short paragraphs", "second person", "no jargon") |
| referencePost | The pipeline studies a sample post and matches its rhythm, vocabulary, and structure | You have a post that "sounds right" but can't articulate why |

How it works in the pipeline:

  1. Writer step — style guide is injected into the generation prompt, so the initial draft is already closer to your target style
  2. Humanizer step — after removing AI patterns, the humanizer applies a second transformation toward your reference style. Temperature increases from 0.3 → 0.7 to allow more creative rewriting

File vs inline: For both voice and referencePost, you can provide content inline in the config or point to a file. File takes precedence when both are set. Use files when the content is long or shared across projects.

Omitting styleGuide: Zero behavior change. The pipeline runs exactly as before.
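The precedence rules above (file wins over inline, omitting the section is a no-op) can be sketched as a small resolver. This is a hypothetical helper, not the actual code in lib/style-guide.mjs; the file reader is injected so the sketch stays self-contained:

```javascript
// Sketch of voice resolution order: file > inline > nothing (illustrative).
function resolveVoice(styleGuide, readFile) {
  if (!styleGuide) return null;                          // omitted => zero behavior change
  if (styleGuide.voiceFile) return readFile(styleGuide.voiceFile); // file takes precedence
  return styleGuide.voice ?? null;                       // otherwise inline rules
}
```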


What sources should it use? → Built into the research step

The pipeline uses Gemini with Google Search grounding for topic research. This means:

  • It searches real, current Google results (not just LLM training data)
  • It finds recent news articles, blog posts, and social media discussions
  • It prioritizes content from the last 7 days (configurable via topics.recencyDays)

You control source quality through your search queries:

  • Broad queries β†’ pulls from mainstream tech publications, news sites
  • Specific queries β†’ pulls from niche blogs, industry reports, Reddit/Twitter discussions
  • Academic queries β†’ add terms like "study", "research", "data" to your queries

The writer prompt instructs the LLM to attribute all statistics with source and year inline. Vague attribution ("studies show", "experts say") is explicitly prohibited.

topics: {
  recencyDays: 7,       // only consider sources from last N days
  maxCandidates: 5,     // how many topic ideas to generate before deduplication
}

What SEO keywords should it target? → seo section

Without DataForSEO (seo.enabled: false): Gemini still runs an intelligent keyword strategy — analyzing your topic against all existing blog posts to find keyword gaps and suggest seed keywords. The writer gets gap-aware keyword guidance, just without volume/difficulty numbers.

With DataForSEO (seo.enabled: true): The pipeline gets real search volume, keyword difficulty, related terms, and People Also Ask questions before writing. This data is injected into the writer prompt.

seo: {
  enabled: true,
  apiLogin: process.env.DATAFORSEO_LOGIN,
  apiPassword: process.env.DATAFORSEO_PASSWORD,
  location: 2840,            // US search data (see config example for other country codes)
  maxDifficulty: 60,         // skip keywords harder than this (0-100 scale)
  minSearchVolume: 100,      // ignore keywords with fewer monthly searches
  maxRelatedKeywords: 10,    // how many secondary keywords to pass to the writer
}

What this controls: The blog targets real search queries with known volume. Posts include primary and secondary keywords naturally, and FAQ sections use real "People Also Ask" questions.
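The maxDifficulty / minSearchVolume / maxRelatedKeywords thresholds act as a filter over enriched keyword candidates. A minimal sketch of that filtering (illustrative only; the real logic is inside the keyword-research step):

```javascript
// Sketch: keep keywords within difficulty/volume bounds, highest volume first,
// capped at maxRelatedKeywords (illustrative, not the pipeline's actual code).
function filterKeywords(candidates, { maxDifficulty, minSearchVolume, maxRelatedKeywords }) {
  return candidates
    .filter((k) => k.difficulty <= maxDifficulty && k.volume >= minSearchVolume)
    .sort((a, b) => b.volume - a.volume)
    .slice(0, maxRelatedKeywords);
}
```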


When should it publish? → schedule section

Two levels of control:

Level 1 — How often (cron expression):

schedule: {
  cron: '17 8 */3 * *',   // every 3 days at 8:17 UTC
  postsPerRun: 1,          // posts per execution
}

Level 2 — What to write on specific days (content calendar):

schedule: {
  calendar: [
    // Just steer toward a category — research finds the specific topic
    { date: '2026-06-01', category: 'Remote Work' },

    // Specify an exact topic — skips research entirely
    { date: '2026-06-04', topic: 'How to Run Async Standups That Actually Work' },

    // Full editorial control
    {
      date: '2026-06-08',
      topic: 'Linear vs AcmeSaaS: 2026 Feature Comparison',
      category: 'Competitor',
      keywords: ['linear alternative', 'best project management tool'],
      notes: 'Objective comparison. Acknowledge Linear strengths.',
      priority: 'high',  // skip deduplication (intentional overlap with existing post)
    },
  ],
}

Days without calendar entries → the pipeline discovers trending topics automatically.
Days with calendar entries → the pipeline follows your instructions.
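That resolution logic can be sketched roughly as follows. This is an illustrative sketch of the decision, not the actual code in lib/scheduler.mjs:

```javascript
// Sketch: how a run decides between exact topic, steered research, and discovery.
function resolveForDate(calendar, isoDate) {
  const entry = calendar.find((e) => e.date === isoDate);
  if (!entry) return { mode: 'discover' };              // no entry => trending research
  if (entry.topic) return { mode: 'exact', ...entry };  // exact topic => skip research
  return { mode: 'steer', ...entry };                   // category only => guided research
}
```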


βš™οΈ How the Pipeline Works

9 steps, executed in sequence. Each step can be turned on or off independently.

┌──────────┐   ┌──────────┐   ┌─────────┐   ┌──────────────┐   ┌─────────┐
│ Schedule │──>│ Research │──>│ Dedupe  │──>│   Keywords   │──>│  Write  │
│(calendar)│   │ (Gemini+ │   │ (Gemini │   │(Gemini+DFSEO)│   │(Gemini) │
│          │   │  Google) │   │semantic)│   │              │   │         │
└──────────┘   └──────────┘   └─────────┘   └──────────────┘   └────┬────┘
                                                                    │
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌───────────┐          │
│Translate │<──│  Image   │<──│ Validate │<──│ Humanize  │<─────────┘
│(Gemini×N)│   │ (Gemini) │   │ (local)  │   │ (Gemini)  │
└──────────┘   └──────────┘   └──────────┘   └───────────┘
| # | Step | What happens | API calls | Toggle |
| --- | --- | --- | --- | --- |
| 1 | Schedule | Checks content calendar for today. Uses calendar entry if found, otherwise proceeds to research. | 0 | steps.calendar |
| 2 | Research | Searches Google (via Gemini grounding) for trending topics matching your clusters. Returns 10-15 candidates ranked by recency. | 1 | steps.research |
| 3 | Dedupe | Sends candidates + all existing post titles to Gemini. Catches semantic duplicates even with different wording. | 1 | steps.dedupe |
| 4 | Keywords | Gemini analyzes topic + existing blog content to find keyword gaps, then calls DataForSEO for volumes, difficulty, related terms, SERP competitors, PAA. When DataForSEO is unavailable, Gemini-only keyword guidance is still provided to the writer. | 1 Gemini + 4 DataForSEO | steps.keywordResearch |
| 5 | Write | Gemini generates the full post: YAML frontmatter + HTML/markdown body. Includes product context, keyword data, GEO/AEO rules. | 1 | Always on |
| 6 | Humanize | Second Gemini pass removes AI writing patterns (significance inflation, promotional language, filler, structural tells). | 1 | steps.humanize |
| 7 | Validate | Local quality check: word count, frontmatter fields, readability score, GEO/AEO compliance score. Zero API calls. | 0 | steps.validate |
| 8 | Image | Gemini generates a 16:9 conceptual cover illustration. Saves as PNG. | 1 | steps.image |
| 9 | Translate | Translates to each configured language. Brand names preserved. Partial success: saves what succeeds. | N | steps.translate |

How the Keyword Step Works

The keyword step has two layers:

Layer 1 — Gemini Keyword Strategy (always runs)

Gemini receives the selected topic + metadata from all existing blog posts (titles, keywords) + your SEO constraints from config. It returns:

  • Primary seed keywords β€” 3-5 specific phrases to research
  • Question keywords β€” 2-3 question-format keywords for FAQ sections
  • Gap analysis β€” what keyword gap this post fills vs existing content
  • Avoid keywords β€” keywords the blog already covers (prevents cannibalization)

This replaces naive string-splitting of the topic title. The output is gap-aware and considers your entire blog history.

Layer 2 — DataForSEO Enrichment (only when seo.enabled: true)

The Gemini-selected seeds are sent to DataForSEO for real search volume, difficulty scores, related terms, SERP competitors, and People Also Ask questions. This data-driven layer adds volume/difficulty numbers to guide the writer's keyword density and FAQ section.

When DataForSEO is unavailable: The writer still gets the Gemini keyword strategy output — gap-aware keyword guidance with null volume/difficulty. The writer uses these keywords naturally without density targets.
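The two-layer shape of the data handed to the writer can be sketched as a merge: Gemini seeds are always present, and DataForSEO numbers fill in only when enrichment ran. This is an illustrative sketch, not the pipeline's actual data structure:

```javascript
// Sketch: merge Gemini seed keywords with optional DataForSEO enrichment.
// Without enrichment, volume/difficulty stay null (as described above).
function mergeKeywordData(seeds, enrichment) {
  return seeds.map((kw) => ({
    keyword: kw,
    volume: enrichment?.[kw]?.volume ?? null,
    difficulty: enrichment?.[kw]?.difficulty ?? null,
  }));
}
```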


πŸ” Content Quality and SEO Compliance

Humanization — Removing AI Writing Patterns

Every post passes through an AI pattern removal step based on Wikipedia's "Signs of AI writing". The humanizer targets:

| Pattern | Examples removed |
| --- | --- |
| Significance inflation | "pivotal", "testament", "key turning point", "evolving landscape" |
| Promotional language | "boasts", "vibrant", "nestled", "breathtaking", "renowned" |
| AI vocabulary (Tier 1) | "delve", "tapestry", "leverage", "paradigm shift", "myriad" |
| AI vocabulary (Tier 2) | "robust", "seamless", "cutting-edge", "transformative", "bolster" |
| Structural tells | Uniform section lengths, repeated "takeaway" patterns, rule-of-three overuse |
| Hedging and filler | "It's important to note that...", "In order to", "Due to the fact that" |
| Generic conclusions | "The future looks bright", "Exciting times ahead", "Only time will tell" |

GEO/AEO Compliance — Optimized for AI Search Engines

Every post is structured for both traditional Google search and AI-powered search (Google AI Overviews, ChatGPT, Perplexity, Bing Copilot).

| Structure | Why it matters | How it's checked |
| --- | --- | --- |
| TL;DR section | AI overviews extract and cite summary blocks | Validator checks for presence |
| Key Takeaways (4-6 bullets) | Perplexity and ChatGPT cite bullet lists | Validator checks for presence |
| Question-based headings (3+) | Matches People Also Ask queries | Validator counts question headings |
| Direct-answer paragraphs | AI engines extract the first sentence after headings | Validator detects filler openers |
| FAQ section | Powers FAQ rich snippets and AI answer boxes | Validator checks for presence |
| Entity definitions | AI engines extract definitions | Enforced in writer prompt |
| Attributed statistics | AI engines penalize unattributed claims | Enforced in writer prompt |
| Schema-ready frontmatter | BlogPosting + FAQPage JSON-LD data | Validator checks for schema and qa fields |

The validator outputs a GEO/AEO score (0-100):

Step 7/9: Validating post quality...
  Readability: Grade 7.2 (1,247 words)
  GEO/AEO score: 86/100
  ✓ Validation passed
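One of the checks feeding that score (counting question-based headings) is simple enough to sketch. This is an illustrative approximation, not the validator's actual implementation:

```javascript
// Sketch: count markdown h2/h3 headings that end in a question mark,
// one of the GEO/AEO structural checks described above (illustrative).
function countQuestionHeadings(markdown) {
  return markdown
    .split('\n')
    .filter((line) => /^#{2,3}\s/.test(line) && line.trim().endsWith('?'))
    .length;
}
```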

Schema Markup

The pipeline generates schema-ready frontmatter — your website renders it as JSON-LD:

# Generated by autoblog in frontmatter:
schema:
  type: "BlogPosting"
  headline: "How to Run Async Standups That Actually Work"
  description: "Async standups eliminate timezone pain..."
  wordCount: 1247
  keywords: "async standups, remote standup tool"
qa:
  - question: "How do async standups work?"
    answer: "Team members post updates at any time during their workday..."
  - question: "Are async standups better than live standups?"
    answer: "For distributed teams across 3+ time zones, async standups..."

Your website template converts schema → BlogPosting JSON-LD and qa → FAQPage JSON-LD. Autoblog does not inject <script> tags into the markdown — that's your website's responsibility.
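For the qa side, the template conversion is a straightforward mapping to the schema.org FAQPage shape. A minimal sketch of what your site template might do (your actual template and field names may differ):

```javascript
// Sketch: map the qa frontmatter array to schema.org FAQPage JSON-LD.
function qaToFaqJsonLd(qa) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: qa.map(({ question, answer }) => ({
      '@type': 'Question',
      name: question,
      acceptedAnswer: { '@type': 'Answer', text: answer },
    })),
  };
}
```

Your page would then serialize this object into a `<script type="application/ld+json">` tag.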

Readability Scoring

Flesch-Kincaid grade level, calculated locally (zero API calls):

readability: {
  targetGrade: { min: 6, max: 10 },  // 6th-10th grade reading level
  warnOnly: true,                     // warn but don't block
}

| Grade range | Audience |
| --- | --- |
| 5-6 | Broad consumer, very simple |
| 7-8 | Marketing content (recommended default) |
| 9-10 | Informed consumers, some technical depth |
| 11-12 | Professional/technical audience |
| 13+ | Academic, B2B enterprise |
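The Flesch-Kincaid grade formula itself is standard: 0.39 × (words/sentences) + 11.8 × (syllables/word) − 15.59. A rough sketch of a local implementation, with a naive vowel-group syllable heuristic (the pipeline's readability module may count syllables more carefully):

```javascript
// Sketch: Flesch-Kincaid grade level, computed locally with zero API calls.
function syllables(word) {
  // Naive heuristic: each run of vowels counts as one syllable.
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

function fleschKincaidGrade(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim()).length || 1;
  const words = text.split(/\s+/).filter(Boolean);
  const sylls = words.reduce((n, w) => n + syllables(w), 0);
  return 0.39 * (words.length / sentences) + 11.8 * (sylls / words.length) - 15.59;
}
```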

🔧 Tech Stack and Architecture

Runtime Dependency

| Package | Purpose |
| --- | --- |
| @google/generative-ai (^0.21.0) | Gemini API client for text, image, and Google Search grounding |

That's it. One dependency. Everything else uses Node.js built-ins (fetch, fs, path, url).

External APIs

| API | Auth method | What it provides |
| --- | --- | --- |
| Gemini API | API key via SDK | Topic research, keyword strategy, writing, humanization, translation, image generation |
| DataForSEO REST API | Basic auth (login:password) | Keyword volumes, difficulty, related keywords, SERP competitors, PAA questions |

Models Used

| Model | Used for | Configurable via |
| --- | --- | --- |
| gemini-2.5-flash | All text generation | config.models.text |
| gemini-2.5-flash-image | Cover image generation | config.models.image |

Module Map

bin/autoblog.mjs ─── CLI entry point (--dry-run, --batch, --config, --init-strategy)
        │
        ▼
lib/pipeline.mjs ─── Orchestrates 16 steps in sequence
        │
        ├── lib/config.mjs ──────────── Loads config + .autoblog-strategy.json, merges defaults
        ├── lib/prompts.mjs ─────────── All Gemini prompts consolidated (12 prompt builders)
        ├── lib/retry.mjs ───────────── Exponential backoff (rate_limit / network / bad_output / fatal)
        ├── lib/scheduler.mjs ───────── Content calendar resolution
        ├── lib/strategy-balancer.mjs ─ Content diversity gap analysis + balancing directive
        ├── lib/strategy-wizard.mjs ─── Interactive --init-strategy wizard (readline + Gemini)
        ├── lib/local-content.mjs ───── Template-based geo page generation
        ├── lib/gsc.mjs ─────────────── Google Search Console mining + schedule frequency
        ├── lib/context.mjs ─────────── Context persistence + performance feedback loop + GA4
        ├── lib/topics.mjs ──────────── Gemini + Google Search topic discovery
        ├── lib/deduper.mjs ─────────── Semantic deduplication via Gemini
        ├── lib/keyword-research.mjs ── Intelligent keyword strategy (Gemini) + DataForSEO
        ├── lib/linker.mjs ──────────── Internal linking (keyword-to-slug index)
        ├── lib/writer.mjs ──────────── Blog post generation with GEO/AEO rules
        ├── lib/style-guide.mjs ─────── Style guide resolver (voice + reference post)
        ├── lib/humanizer.mjs ───────── AI pattern removal + style matching
        ├── lib/meta-optimizer.mjs ──── CTR-optimized titles (optional)
        ├── lib/cross-reviewer.mjs ──── Cross-model quality review (optional)
        ├── lib/validator.mjs ───────── Quality gate + GEO/AEO scoring (0 API calls)
        ├── lib/readability.mjs ─────── Flesch-Kincaid grade level (0 API calls)
        ├── lib/schema-embedder.mjs ─── JSON-LD BlogPosting + FAQPage embedding (optional)
        ├── lib/image-generator.mjs ─── Cover image via Gemini image model
        ├── lib/translator.mjs ──────── Multi-language with brand name preservation
        └── lib/publisher.mjs ───────── CMS publishing — 5 adapters (optional)

DataForSEO Endpoints

| Endpoint | What it returns | Approx. cost |
| --- | --- | --- |
| /dataforseo_labs/google/keyword_overview/live | Search volume + difficulty for seed keywords | ~$0.01 |
| /dataforseo_labs/google/related_keywords/live | Expanded related terms | ~$0.05 |
| /dataforseo_labs/google/serp_competitors/live | Top-ranking domains | ~$0.05 |
| /dataforseo_labs/google/keyword_suggestions/live | Question-format keywords for FAQ | ~$0.05 |

Prompt Architecture

All 9 Gemini prompt builders are consolidated in a single file: lib/prompts.mjs. This is a pure-function module with zero imports from other lib files — each function takes a destructured object and returns a string.

| Function | Used by | Purpose |
| --- | --- | --- |
| buildResearchPrompt | topics.mjs | Topic discovery via Google Search grounding |
| buildDedupePrompt | deduper.mjs | Semantic deduplication against existing posts |
| buildKeywordStrategyPrompt | keyword-research.mjs | Intelligent seed keyword selection + gap analysis |
| buildWriterPrompt | writer.mjs | Full blog post generation with GEO/AEO rules |
| buildStyleGuideBlock | (used by buildWriterPrompt) | Style guide injection into writer prompt |
| buildHumanizationPrompt | humanizer.mjs | AI pattern removal system instruction |
| buildHumanizationUserPrompt | humanizer.mjs | Humanization user message with content |
| buildTranslationPrompt | translator.mjs | Multi-language translation with brand preservation |
| buildImagePrompt | image-generator.mjs | Cover image generation prompt |

To review or update any prompt, edit lib/prompts.mjs — no need to search across module files.
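The pure-function shape described above (destructured object in, string out, no imports) looks roughly like this. The builder below is hypothetical, shown only to illustrate the pattern; the real signatures live in lib/prompts.mjs:

```javascript
// Illustrative prompt builder in the style of lib/prompts.mjs:
// a pure function from a destructured options object to a prompt string.
function buildExamplePrompt({ productName, topic, tone }) {
  return [
    `You are writing a blog post for ${productName}.`,
    `Topic: ${topic}`,
    `Tone: ${tone}`,
  ].join('\n');
}
```

Because builders are pure, they can be unit-tested and iterated on without touching the modules that call them.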

Design Principles

  • Config-driven: All project-specific content lives in one config file. No hardcoded product names, URLs, or topic areas in source code.
  • Prompts in one file: All Gemini prompts consolidated in lib/prompts.mjs for easy review and iteration.
  • Stateless: Reads from disk, writes to disk, exits. No database, no API server. Git is the state store.
  • Partial success: If 5/6 translations succeed, saves those 5 and reports the failure. If image generation fails, post continues without an image.
  • Retry-aware: Every API call is wrapped in exponential backoff. Rate limits (429) get longer delays. Fatal errors (401/403) are not retried.

📖 Configuration Reference

Full configuration with every option: autoblog.config.example.mjs

Quick reference of all config sections

| Section | What it controls | Required? |
| --- | --- | --- |
| product | Product name, URL, description, features, tone, brand names | Yes |
| authors | Author roster with name, role, image, category assignments | Yes |
| topics | Topic clusters (search queries), regional contexts, recency | Yes |
| output | Post/image directories, body format (html/md/mdx), frontmatter schema, word count, CTA markers | Has defaults |
| translation | Enabled flag, language codes, rate limiting | Has defaults (disabled) |
| models | Gemini model names for text and image | Has defaults |
| steps | Toggle each pipeline step on/off | Has defaults (all on) |
| notifications | Telegram/Slack config | Optional |
| retry | Max attempts, base delay for exponential backoff | Has defaults |
| seo | DataForSEO credentials, location, difficulty/volume thresholds | Has defaults (disabled) |
| schedule | Cron expression, posts per run, content calendar | Has defaults |
| readability | Target Flesch-Kincaid grade range, warn vs. fail | Has defaults |
| gsc | GSC property URL, lookback days, schedule frequency | Optional |
| context | Enable performance feedback loop, file path | Optional |
| analytics | GA4 property ID for pageview/engagement tracking | Optional |
| contentStrategy | Intent mix, format mix, category weights, local content (via --init-strategy or .autoblog-strategy.json) | Optional |

Body format options

| Format | Config value | Output | Best for |
| --- | --- | --- | --- |
| HTML | 'html' | `<article><section><h2><p>` | Next.js, custom rendering |
| Markdown | 'markdown' | `## Heading\n\nParagraph` | Hugo, Jekyll, Gatsby, Astro |
| MDX | 'mdx' | Markdown + JSX components | MDX-based sites |

Astro content collections

Autoblog generates standard .md files with YAML frontmatter — compatible with Astro's content collections. Set bodyFormat: 'markdown' and define a matching Zod schema in your src/content.config.ts:

import { defineCollection, z } from 'astro:content';
import { glob } from 'astro/loaders';

const blog = defineCollection({
  loader: glob({ pattern: '**/*.md', base: './src/content/blog' }),
  schema: z.object({
    title: z.string(),
    date: z.string(),
    excerpt: z.string(),
    coverImage: z.string(),
    author: z.string(),
    category: z.string(),
    tags: z.array(z.string()),
    seoKeywords: z.string(),  // always comma-separated string (normalized by pipeline)
    readingTime: z.string().optional(),
    relatedPosts: z.array(z.string()).default([]),  // empty array when no related posts
    qa: z.array(z.object({ question: z.string(), answer: z.string() })),
    schema: z.object({
      type: z.string(),
      headline: z.string(),
      description: z.string(),
      wordCount: z.number(),
      keywords: z.string().optional(),
    }),
  }),
});

export const collections = { blog };

Set output.postsDir to your Astro content directory (e.g., src/content/blog).


⚡ Optional Enhancements

All features below are opt-in. Each activates only when its config flag is enabled and/or API credentials are present. If credentials are missing or an API call fails, the feature is silently skipped and the pipeline continues normally.

GSC-Informed Topic Research

Mines Google Search Console data before trending research to find quick-win keywords (position 4-15), orphan queries (high impressions, no dedicated page), and declining pages that need refreshing.

// In autoblog.config.mjs
gsc: {
  enabled: true,
  propertyUrl: 'sc-domain:example.com',
}

Requires: GSC_SERVICE_ACCOUNT_JSON env var — supports both:

  • Service account key (JSON with client_email + private_key) — for autonomous GitHub Actions
  • OAuth user credentials (JSON with client_id + refresh_token) — for local development

For service accounts, verify domain ownership via DNS TXT record so the account gets siteOwner permission. Add gsc.quotaProject with your GCP project ID.

Search Intent Classification

Automatically classifies keywords as informational, commercial, transactional, or navigational. The writer then structures the post to match searcher expectations (how-to guide vs. comparison vs. product tutorial).

No config needed β€” automatically enhances existing keyword research when steps.keywordResearch: true.

Meta Tag Optimization

After writing, generates 3 optimized title variants using different hook strategies (curiosity, benefit, specificity) and picks the highest-scoring one. Also optimizes the meta description to 150-160 characters.

steps: { metaOptimize: true }

Cost: ~$0.001 per post (1 Gemini Flash call).

Cross-Model Quality Review

Sends the post to a stronger model (Gemini Pro) for quality scoring on factual accuracy, keyword naturalness, tone alignment, and structure. If the score is below threshold, automatically rewrites incorporating the feedback.

steps: { crossModelReview: true },
crossModel: {
  model: 'gemini-2.5-pro',
  qualityThreshold: 7,
}

Cost: ~$0.02-0.05 per post (1 Gemini Pro call, possibly 1 rewrite).

Embedded JSON-LD Schema

Generates BlogPosting and FAQPage JSON-LD <script> blocks from frontmatter and embeds them directly in the post body. Your site renders the post and gets schema markup for free.

steps: { embedSchema: true },
output: { siteUrl: 'https://example.com' }

Context Persistence + Performance Feedback Loop

Maintains a .autoblog-context.json file that tracks which topics were generated, what keywords were targeted, and (optionally) performance data from GSC and GA4. The feedback loop feeds this data back into topic research and keyword strategy — the pipeline avoids declining keyword angles and prioritizes categories that perform well.

context: { enabled: true },
// Optional: GA4 performance tracking
analytics: { enabled: true, propertyId: '123456789' }

When performance data is available, the pipeline:

  • Injects top-performing and underperforming categories into the research prompt
  • Flags declining keywords (position > 15) to prevent cannibalization
  • Computes trend per post (strong/moderate/weak based on position)
  • Ignores stale data (> 60 days old)

Requires: GA4_SERVICE_ACCOUNT_JSON env var β€” supports both service account keys and OAuth user credentials (same as GSC). For service accounts, grant Viewer access via the GA4 Admin API or link the GA4 property to your GCP project via BigQuery.
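
The context file itself is plain JSON checked into your repo. Its exact schema is internal to autoblog; a hypothetical sketch of the kind of data it tracks (field names here are illustrative, not verified):

```json
{
  "posts": [
    {
      "slug": "async-standups-guide",
      "category": "remote-work",
      "primaryKeyword": "async standup",
      "date": "2025-01-10",
      "performance": { "position": 12.4, "clicks": 38, "trend": "moderate" }
    }
  ],
  "topicBacklog": []
}
```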

GSC Schedule Frequency

Control how often GSC data is mined, independently of the pipeline cron. GSC data lags 2-3 days and rankings need 7-14 days to settle, so mining on every pipeline execution is wasteful.

gsc: {
  enabled: true,
  propertyUrl: 'sc-domain:example.com',
  schedule: {
    frequency: 'weekly',  // 'every-run' | 'weekly' | 'biweekly' | 'monthly' | number (days)
  },
}

Content Strategy + Self-Balancing (--init-strategy)

Interactive wizard that asks about your business goals, audience, and competitors, then uses Gemini to recommend an optimal content mix. The pipeline self-balances over time.

npx autoblog --init-strategy    # launches interactive wizard

The wizard asks 6 questions, then saves a .autoblog-strategy.json with:

  • Intent mix β€” target % for informational, commercial, transactional, navigational
  • Format mix β€” target % for how-to, comparison, listicle, news-analysis, tutorial, local-guide, case-study
  • Category weights β€” relative weight per topic cluster
  • Local content config β€” cities, templates, throttling

Each pipeline run compares actual content distribution against targets. When the mix drifts beyond tolerance (default 10%), the balancer injects a directive into research and keyword prompts (e.g., "prioritize commercial/comparison content").
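
A hypothetical example of the saved strategy file β€” the keys follow the bullet list above but are illustrative, not a verified schema; see the wizard's actual output for the real shape:

```json
{
  "intentMix": { "informational": 50, "commercial": 30, "transactional": 15, "navigational": 5 },
  "formatMix": { "how-to": 40, "comparison": 25, "listicle": 15, "tutorial": 20 },
  "categoryWeights": { "remote-work": 2, "engineering-culture": 1 },
  "localContent": { "enabled": false }
}
```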

Local Content Engine (Programmatic SEO)

Template-based generation of location-specific pages. Define cities and templates; the pipeline generates one page per run until every city/template combination has been covered.

// In .autoblog-strategy.json (generated by wizard) or inline in config
contentStrategy: {
  localContent: {
    enabled: true,
    locations: [
      { city: 'Melbourne', region: 'Victoria', country: 'AU' },
      { city: 'Sydney', region: 'New South Wales', country: 'AU' },
    ],
    templates: [
      'How to Find Verified Building Leads in {city} ({year})',
      'Best Contractors in {city}: What to Look For',
    ],
    maxPerWeek: 1,
  },
}

Enable with steps: { localContent: true }. The writer receives location-specific guidance (mention local industry, use city in headings, add local FAQ).
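
Conceptually, the {city} and {year} placeholders are filled per location. A minimal sketch of that substitution (fillTemplate is a hypothetical helper, not an exported autoblog function):

```javascript
// Sketch: replace {placeholders} in a title template with location fields plus the current year.
function fillTemplate(template, location) {
  const vars = { ...location, year: new Date().getFullYear() };
  return template.replace(/\{(\w+)\}/g, (_, key) => vars[key] ?? `{${key}}`);
}

fillTemplate('Best Contractors in {city}: What to Look For', { city: 'Melbourne' });
// β†’ 'Best Contractors in Melbourne: What to Look For'
```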

Topic Backlog

Research generates 5-10 candidate topics per run, but only 1 gets written. Previously the rest were discarded. Now:

  • Time-sensitive topics (relevanceScore >= 0.8, breaking news) are written immediately
  • Evergreen topics (score < 0.8, guides, comparisons) are saved to topicBacklog[] in the context file
  • The next run checks the backlog first β€” it picks the top topic and only does fresh Gemini research if the backlog is empty
  • Topics expire after 30 days, capped at 30 entries

Run 1: Research finds 5 topics β†’ writes #1 (breaking news) β†’ saves #2-#5 to backlog
Run 2: Checks backlog β†’ picks #2 β†’ writes it β†’ #3-#5 remain
Run 3: Checks backlog β†’ picks #3 β†’ writes it β†’ #4-#5 remain
Runs 4-5: Work through #4 and #5
Run 6: Backlog empty β†’ fresh Gemini research

Requires: context.enabled: true. No additional config needed.

CMS Direct Publishing

After saving files locally, also pushes to your CMS via REST API. Supports WordPress, Ghost, Webflow, Strapi, and Contentful.

publish: {
  cms: 'wordpress',  // or 'ghost', 'webflow', 'strapi', 'contentful'
  draft: true,       // publish as draft
}

Auth via env vars β€” see the secrets table in the GitHub Actions section below.
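
For reference, the WordPress adapter talks to the standard wp/v2 REST endpoint. A sketch of what a draft-publish request looks like β€” the builder function and post shape here are illustrative, not autoblog's internal adapter code:

```javascript
// Illustrative request builder for WordPress's REST API (POST /wp-json/wp/v2/posts).
function buildWordPressDraft({ endpoint, username, password, post }) {
  return {
    url: `${endpoint}/wp-json/wp/v2/posts`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Basic auth with a WordPress application password.
        Authorization: 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64'),
      },
      body: JSON.stringify({ title: post.title, content: post.body, status: 'draft' }),
    },
  };
}
```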


πŸ€– Running on Autopilot (GitHub Actions)

Setup

  1. Copy the workflow template:

    cp templates/github-workflow.yml .github/workflows/auto-blog.yml
  2. Edit the workflow β€” search for <!-- CHANGE --> comments and update:

    • Cron schedule (match your schedule.cron)
    • Git committer email (must be authorized by your deploy platform)
    • Deploy platform config (Vercel by default β€” swap for Netlify/Cloudflare/GitHub Pages)
    • Notification config (Telegram chat ID or remove)
  3. Add secrets to your GitHub repo (Settings β†’ Secrets β†’ Actions):

| Secret | Required |
| --- | --- |
| GEMINI_API_KEY | Yes |
| DATAFORSEO_LOGIN | If seo.enabled |
| DATAFORSEO_PASSWORD | If seo.enabled |
| GSC_SERVICE_ACCOUNT_JSON | If gsc.enabled (GSC topic mining) |
| GA4_SERVICE_ACCOUNT_JSON | If analytics.enabled (GA4 performance tracking) |
| CMS_ENDPOINT | If publish.cms is set |
| CMS_USERNAME / CMS_PASSWORD | WordPress publishing |
| CMS_ADMIN_API_KEY | Ghost publishing (id:secret format) |
| CMS_API_TOKEN | Webflow/Strapi/Contentful publishing |
| CMS_COLLECTION_ID | Webflow publishing |
| CMS_SPACE_ID | Contentful publishing |
| VERCEL_TOKEN | If deploying to Vercel |
| TELEGRAM_BOT_TOKEN | For notifications |
| TELEGRAM_CHAT_ID | For notifications |

Manual trigger

The workflow supports manual execution from the GitHub Actions UI with inputs for --batch count and --dry-run mode.

Batch mode for seeding

npx autoblog --batch 10

Generates 10 posts sequentially. Deduplication is cumulative (post 3 knows about posts 1 and 2). If post 5 fails, posts 1-4 are still saved and the pipeline continues to post 6.


πŸ€– For AI Agents β€” Setting Up Autoblog in a New Project

Copy the prompt below and give it to your AI coding agent (Claude Code, Cursor, Windsurf, Copilot, etc.) to set up autoblog in your project.

Setup Prompt

I want to set up @stayboba/autoblog β€” an automated blog content pipeline that uses Gemini AI
to research trending topics, write SEO-optimized blog posts, generate cover images, and
optionally translate to multiple languages.

Package: https://www.npmjs.com/package/@stayboba/autoblog
Docs: https://github.com/arul-buk/autoblog

Here's what I need you to do:

STEP 1: GATHER INFORMATION

Before doing anything, ask me ALL of the following questions at once (not one by one):

1. Product name β€” What is the product/brand name?
2. Product URL β€” What is the website URL?
3. Product description β€” One sentence describing what the product does and who it's for.
4. Key features β€” List 3-6 features the AI can reference in blog posts. Be specific
   (e.g., "Async Standups β€” automated daily standups across time zones" not just "standups").
5. Tone β€” How should the blog sound? (e.g., "Technical but approachable, write for
   engineering managers" or "Friendly and reassuring, write for non-technical parents")
6. Topic clusters β€” What 3-6 content pillars should the blog cover? For each, give me
   3-8 Google search queries the pipeline will use to find trending topics. Include the
   current year for recency.
7. Authors β€” 1-3 author personas (name, role, which topic clusters they cover). Can be fictional.
8. Blog post format β€” Does your site use HTML body (<article><section><h2><p>), markdown, or MDX?
9. Output directories β€” Where should posts be saved (e.g., _posts/, src/content/blog/)?
   Where should cover images go (e.g., public/images/blog/)?
10. Translations β€” Do you want posts translated? If yes, which languages?
    (supported: es, pt, fr, de, zh, ja, ko, ar, hi, etc.)
11. DataForSEO β€” Do you have a DataForSEO account for real keyword volume data?
    (Optional β€” the pipeline works without it using Gemini-only keyword strategy)
12. Image style β€” Any specific visual style for cover images? (e.g., "Swiss Brutalist with
    dark backgrounds", "watercolor illustrations", or leave blank for the default minimalist style)
13. GitHub Actions β€” Do you want this running automatically on a schedule? If yes, how often?
    (e.g., every 3 days, weekly)
14. Optional enhancements β€” Do you want any of these? (all are opt-in, all skip gracefully):
    a. GSC topic mining β€” Mine Google Search Console for quick-win keywords (needs service account)
    b. Meta optimization β€” CTR-optimize titles with 3 variants (~$0.001/post)
    c. Cross-model review β€” Quality check via Gemini Pro (~$0.02-0.05/post)
    d. Embedded JSON-LD β€” Embed BlogPosting + FAQPage schema in post body
    e. Context persistence β€” Track posts + performance across runs
    f. GA4 analytics β€” Pull pageview/engagement data (needs service account)
    g. CMS publishing β€” Push to WordPress, Ghost, Webflow, Strapi, or Contentful

STEP 2: INSTALL AND CONFIGURE

Once I answer the questions above:

1. Run: npm install @stayboba/autoblog
2. Copy the example config:
   cp node_modules/@stayboba/autoblog/autoblog.config.example.mjs autoblog.config.mjs
3. Edit autoblog.config.mjs with my answers β€” fill in product, authors, topics, output
   paths, and all settings. Refer to the example config for the full schema.
4. Create a .env file with:
   GEMINI_API_KEY=          (I'll fill in the key β€” get one at https://aistudio.google.com/apikey)
   DATAFORSEO_LOGIN=        (only if using DataForSEO)
   DATAFORSEO_PASSWORD=     (only if using DataForSEO)
5. Add .env and autoblog.config.mjs to .gitignore if not already there

STEP 3: TEST IT

1. Run: npx autoblog --dry-run
2. Show me the output and confirm it looks correct
3. If it works, run: npx autoblog (generates one real post)
4. Verify the post was saved in the correct directory with proper frontmatter

STEP 4: SET UP GITHUB ACTIONS (if requested)

1. Copy the workflow template:
   cp node_modules/@stayboba/autoblog/templates/github-workflow.yml .github/workflows/auto-blog.yml
2. Edit the workflow with the correct cron schedule, output paths, and deploy command
   for my hosting provider (Vercel, Netlify, etc.)
3. Tell me what GitHub repository secrets I need to add:
   - GEMINI_API_KEY (required)
   - DATAFORSEO_LOGIN, DATAFORSEO_PASSWORD (if using DataForSEO)
   - GSC_SERVICE_ACCOUNT_JSON (if using GSC topic mining)
   - GA4_SERVICE_ACCOUNT_JSON (if using GA4 analytics)
   - CMS_ENDPOINT, CMS_USERNAME, CMS_PASSWORD (if publishing to WordPress)
   - CMS_ENDPOINT, CMS_ADMIN_API_KEY (if publishing to Ghost)
   - CMS_API_TOKEN, CMS_COLLECTION_ID (if publishing to Webflow)
   - TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID (if you want Telegram notifications)

STEP 5: VERIFY SITE RENDERS THE POSTS

1. Check that my site's blog rendering (frontmatter schema, content format) is compatible
   with the generated output
2. If using Astro 6, set up the content collection config with the correct Zod schema
   (see the README for the full schema)
3. If using Next.js, verify the markdown/MDX rendering pipeline handles the frontmatter fields
4. Start the dev server and confirm the generated post renders correctly

IMPORTANT NOTES:
- The Gemini API key is required. Get one free at https://aistudio.google.com/apikey
- DataForSEO is optional but recommended for real keyword data (~$0.04-0.20 per post)
- Each post costs ~$0.02-0.17 depending on pipeline steps enabled
- The pipeline generates: frontmatter (title, excerpt, author, date, category, coverImage,
  seoKeywords, readingTime, relatedPosts) + full article body
- Cover images are 16:9 AI-generated PNGs
- Do not embed pipeline code in the project β€” keep autoblog as a package dependency
- Do not modify autoblog source files β€” everything is config-driven
- Start with budget mode (no DataForSEO, no translations), enable features incrementally

Architecture rules

  • Do not embed pipeline code in the consuming project. Keep autoblog as a separate package. The config file is the only project-specific artifact.
  • Do not modify autoblog source files to customize behavior. Everything is config-driven.
  • Schema markup is the website's job. The pipeline outputs frontmatter; the website renders JSON-LD.
  • Start with budget mode. Enable DataForSEO and translations after confirming the basic pipeline works.
  • The pipeline is stateless. Reads from disk, writes to disk, exits. No database. Git is the state store.

Frontmatter schema reference

The website needs to render JSON-LD schema from schema and qa frontmatter fields:

// BlogPosting schema
const blogPostingSchema = {
  "@context": "https://schema.org",
  "@type": frontmatter.schema.type,          // "BlogPosting"
  "headline": frontmatter.schema.headline,
  "description": frontmatter.schema.description,
  "wordCount": frontmatter.schema.wordCount,
  "datePublished": frontmatter.date,
  "dateModified": frontmatter.lastModified,
  "author": { "@type": "Person", "name": frontmatter.author },
  "image": `https://${siteUrl}${frontmatter.coverImage}`,
};

// FAQPage schema
const faqSchema = {
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": frontmatter.qa.map(item => ({
    "@type": "Question",
    "name": item.question,
    "acceptedAnswer": { "@type": "Answer", "text": item.answer },
  })),
};
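
Each object is then serialized into a <script type="application/ld+json"> tag in the page head or post body, e.g.:

```javascript
// Render a schema object as a JSON-LD script tag.
function toJsonLdTag(schema) {
  return `<script type="application/ld+json">${JSON.stringify(schema)}</script>`;
}

toJsonLdTag({ '@context': 'https://schema.org', '@type': 'FAQPage' });
// β†’ '<script type="application/ld+json">{"@context":"https://schema.org","@type":"FAQPage"}</script>'
```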

Troubleshooting

| Problem | Fix |
| --- | --- |
| Config file not found | Create autoblog.config.mjs or use --config path |
| GEMINI_API_KEY required | export GEMINI_API_KEY=your-key or add it to the .env file |
| seo.apiLogin required | Set the DataForSEO env vars or set seo.enabled: false |
| All candidate topics already covered | Add new queries to topics.clusters or use calendar with specific topics |
| GEO/AEO score below 50 | Usually improves on re-run. Try steps.humanize: false temporarily to isolate. |
| Image generation failed | Post saved without image; non-blocking. Re-run or generate manually. |
| Local topic deduped, pipeline exits | Fixed in 1.2.0 β€” now falls back to trending research automatically |
| DataForSEO returns no data for AU | Fixed in 1.2.0 β€” Gemini seed keywords used as fallback |
| primaryKeyword null in context | Fixed in 1.2.0 β€” seeds propagated when DataForSEO unavailable |

πŸ” Google Service Account Setup (GSC + GA4)

One service account handles both GSC and GA4 across all your sites.

Step 1 β€” Create GCP project + service account

gcloud projects create your-project-id
gcloud config set project your-project-id
gcloud services enable searchconsole.googleapis.com analyticsdata.googleapis.com analyticsadmin.googleapis.com siteverification.googleapis.com
gcloud iam service-accounts create autoblog-agent --display-name="Autoblog Pipeline Agent"
gcloud iam service-accounts keys create ~/autoblog-service-account.json \
  --iam-account=autoblog-agent@your-project-id.iam.gserviceaccount.com

Step 2 β€” Grant project-level permissions

gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:autoblog-agent@your-project-id.iam.gserviceaccount.com" \
  --role="roles/viewer"
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:autoblog-agent@your-project-id.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

Step 3 β€” Verify domains for GSC (DNS method)

# Get verification token for each domain
# (use the Site Verification API β€” see gsc.mjs for the JWT auth pattern)

# Add TXT record to your DNS (e.g., via Cloudflare API)
# Then verify via the Site Verification API

# After verification, add the site to GSC:
# PUT https://searchconsole.googleapis.com/webmasters/v3/sites/sc-domain%3Ayour-domain.com

The service account becomes siteOwner with full read access to search analytics.

Step 4 β€” Grant GA4 access

Create an OAuth Desktop client in your GCP project (Cloud Console β†’ APIs & Services β†’ Credentials β†’ OAuth client ID β†’ Desktop). Use it to call the GA4 Admin API once:

# POST https://analyticsadmin.googleapis.com/v1alpha/accounts/{ACCOUNT_ID}/accessBindings
# Body: { "user": "autoblog-agent@your-project-id.iam.gserviceaccount.com", "roles": ["predefinedRoles/viewer"] }
# Requires: analytics.manage.users OAuth scope

This is a one-time operation. The service account then has permanent autonomous access.
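
The commented request above can be issued with any HTTP client once you hold an OAuth access token carrying the analytics.manage.users scope. A sketch of assembling it (the builder function is illustrative; accountId and accessToken are placeholders you supply):

```javascript
// Sketch of the GA4 Admin API accessBindings request shown above.
function buildAccessBindingRequest({ accountId, serviceAccountEmail, accessToken }) {
  return {
    url: `https://analyticsadmin.googleapis.com/v1alpha/accounts/${accountId}/accessBindings`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        user: serviceAccountEmail,
        roles: ['predefinedRoles/viewer'],
      }),
    },
  };
}
```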

Step 5 β€” Configure

# In .env or shell profile
export GSC_SERVICE_ACCOUNT_JSON="$HOME/autoblog-service-account.json"
export GA4_SERVICE_ACCOUNT_JSON="$HOME/autoblog-service-account.json"

# In autoblog.config.mjs
gsc: {
  enabled: true,
  propertyUrl: 'sc-domain:your-domain.com',
  quotaProject: 'your-project-id',
  schedule: { frequency: 'weekly' },
},
analytics: {
  enabled: true,
  propertyId: '123456789',  // GA4 property ID (numeric)
},

Step 6 β€” GitHub Actions secrets

gh secret set GSC_SERVICE_ACCOUNT_JSON --repo your-org/your-repo < ~/autoblog-service-account.json
gh secret set GA4_SERVICE_ACCOUNT_JSON --repo your-org/your-repo < ~/autoblog-service-account.json

πŸ“¦ Updating

From npm

npm update @stayboba/autoblog

From GitHub source

npm install github:arul-buk/autoblog

What to check after updating

  1. Run tests β€” npx autoblog --dry-run to verify pipeline works with your config
  2. New config options β€” check autoblog.config.example.mjs for new sections (all optional, backward compatible)
  3. Context file β€” new fields are added automatically; old context files work without migration
  4. Strategy file β€” .autoblog-strategy.json is optional; re-run --init-strategy to regenerate with new format options

Version history

| Version | Changes |
| --- | --- |
| 1.2.0 | Context feedback loop, strategy balancer, local content engine, topic backlog, GSC schedule frequency, OAuth credential support, schema embedder fix, 135 tests |
| 1.1.0 | GSC mining, meta optimizer, cross-model review, schema embedder, context persistence, CMS publishing, intent classification |
| 1.0.1 | Fix bin path for npx resolution |
| 1.0.0 | Initial release β€” core pipeline with 9 steps |

Project Structure

autoblog/
β”œβ”€β”€ bin/
β”‚   └── autoblog.mjs              # CLI entry point
β”œβ”€β”€ lib/
β”‚   β”œβ”€β”€ config.mjs                 # Config loader + .autoblog-strategy.json merge
β”‚   β”œβ”€β”€ retry.mjs                  # Exponential backoff
β”‚   β”œβ”€β”€ scheduler.mjs              # Content calendar
β”‚   β”œβ”€β”€ strategy-balancer.mjs      # Content diversity gap analysis (new)
β”‚   β”œβ”€β”€ strategy-wizard.mjs        # Interactive --init-strategy wizard (new)
β”‚   β”œβ”€β”€ local-content.mjs          # Template-based geo pages (new)
β”‚   β”œβ”€β”€ gsc.mjs                    # GSC mining + schedule frequency (new)
β”‚   β”œβ”€β”€ context.mjs                # Context persistence + feedback loop + GA4
β”‚   β”œβ”€β”€ topics.mjs                 # Topic research (Gemini + Google)
β”‚   β”œβ”€β”€ deduper.mjs                # Semantic deduplication
β”‚   β”œβ”€β”€ keyword-research.mjs       # Intelligent keyword strategy + DataForSEO
β”‚   β”œβ”€β”€ prompts.mjs                # All Gemini prompt builders (single source of truth)
β”‚   β”œβ”€β”€ writer.mjs                 # Post generation (GEO/AEO compliant)
β”‚   β”œβ”€β”€ style-guide.mjs            # Style guide resolver
β”‚   β”œβ”€β”€ humanizer.mjs              # AI pattern removal + style matching
β”‚   β”œβ”€β”€ meta-optimizer.mjs         # CTR title optimization (optional)
β”‚   β”œβ”€β”€ cross-reviewer.mjs         # Cross-model quality review (optional)
β”‚   β”œβ”€β”€ validator.mjs              # Quality gate + GEO/AEO scoring
β”‚   β”œβ”€β”€ linker.mjs                 # Internal linking
β”‚   β”œβ”€β”€ readability.mjs            # Flesch-Kincaid scoring
β”‚   β”œβ”€β”€ schema-embedder.mjs        # JSON-LD embedding (optional)
β”‚   β”œβ”€β”€ translator.mjs             # Multi-language translation
β”‚   β”œβ”€β”€ image-generator.mjs        # Cover image generation
β”‚   β”œβ”€β”€ publisher.mjs              # CMS publishing β€” 5 adapters (optional)
β”‚   └── pipeline.mjs               # 16-step orchestrator
β”œβ”€β”€ test/
β”‚   β”œβ”€β”€ fixtures/                   # Mock data for tests
β”‚   β”œβ”€β”€ context-insights.test.mjs   # Context feedback loop (19 tests)
β”‚   β”œβ”€β”€ schema-embedder.test.mjs    # JSON-LD schema (13 tests)
β”‚   β”œβ”€β”€ strategy-balancer.test.mjs  # Strategy balancing (8 tests)
β”‚   β”œβ”€β”€ local-content.test.mjs      # Local content engine (10 tests)
β”‚   β”œβ”€β”€ validator.test.mjs          # Post validation + GEO/AEO (20 tests)
β”‚   β”œβ”€β”€ readability-scheduler-linker.test.mjs  # Readability + scheduler + linker (26 tests)
β”‚   β”œβ”€β”€ meta-optimizer.test.mjs     # Meta tag optimization (13 tests)
β”‚   β”œβ”€β”€ cross-reviewer.test.mjs     # Cross-model review (12 tests)
β”‚   β”œβ”€β”€ publisher.test.mjs          # CMS publishing (14 tests)
β”‚   └── simulate-context-diff.mjs   # Interactive scenario simulation tool
β”œβ”€β”€ autoblog.config.example.mjs    # Full config reference
└── package.json                   # npm test: 135 tests via node:test

πŸ§ͺ Testing

135 tests using Node.js built-in node:test (zero test dependencies).

npm test              # run all 135 tests

Scenario simulation

Visualize how context and strategy data changes prompts across different scenarios:

node test/simulate-context-diff.mjs --diff                    # all scenarios, diff only
node test/simulate-context-diff.mjs --prompt research          # research prompt only
node test/simulate-context-diff.mjs --scenario 1,4 --diff      # compare specific scenarios

License

MIT - see LICENSE
