Enterprise-grade, multi-threaded SEO Crawler, Rule Engine, and Link Graph Analyzer. Built in TypeScript for speed, compliance, and deep site health audits.
SEOCore is an enterprise-grade, high-performance SEO auditing and site crawling platform. It combines a concurrent crawler, Cheerio-based scrapers, a declarative Rules Engine, and Graph Theory to analyze link structures, calculate authority scores, track redirects, and score site health across multiple dimensions.
- Developers & Web Engineers: Run local audits, profile rendering pipelines, track performance budgets, and integrate SEO linting directly into CI/CD pipelines.
- SEO Specialists: Analyze canonicalization, crawl depth, HTTP redirect chains, structured data, robots.txt directives, and sitemap coverage.
- Site Administrators: Find broken links, orphan pages, crawl budget waste, and redirect loops.
- Runtime: Node.js (v20+) & TypeScript
- Monorepo Manager: Nx Monorepo
- Crawler: Custom HTTP engine powered by Bottleneck (rate-limiting) & p-queue (concurrency)
- Headless Browser: Playwright (optional, for client-side JavaScript rendering)
- HTML Parser: Cheerio (fast server-side DOM selection)
- Validation & CLI: Zod (configuration schema enforcement) & Commander.js
- Test Runner: Vitest
- Execution Tier System:
  - Tiers drive everything from crawl limits to rule selection and scoring behavior
  - Fast: Core rules only, 1 page, static HTML
  - Standard: + Performance, 100 pages, simulated CWV
  - Deep: + All modules, 500 pages, Playwright rendering
  - Enterprise: + Plugins, 5000 pages, Lighthouse sampling
- High-Performance Concurrent Crawler:
  - Built-in rate-limiting, custom backoff delays, retry policies, and timeout handlers.
  - Respects robots.txt directives and extracts URLs from `sitemap.xml` automatically.
- Path Filtering (Inclusions/Exclusions):
  - Restrict audits using wildcards (e.g. `/blog/*` or `*.html`).
  - Block admin sections or static resource patterns.
- Deep Redirect Hop & Loop Tracking:
  - Manual redirection handling intercepts 3xx responses.
  - Traces complete redirect chains (statusCode and hops) and catches circular redirect loops.
- Unified Structured Data & Entity Graph Auditor:
  - Compiles Schema.org JSON-LD, Microdata, RDFa elements from raw source HTML and Playwright rendering.
  - Stitches nodes into an Entity Graph, resolves referencing pointers deeply, and maps DAG layouts safely.
  - Evaluates E-E-A-T markers (sameAs links pointing to Wikipedia/Wikidata/LinkedIn).
  - Cross-checks schema values (price, title, canonical URL) against HTML headers, canonicals, and OpenGraph/Twitter card tags.
- Crawl Graph & Link Authority Analysis:
  - Computes in-degree, out-degree, and custom authority scores (PageRank style).
  - Flags orphan pages and structural dead ends.
- AI Visibility & LLM Crawler Directives Auditor:
  - Evaluates brand visibility and structured indexing across search engines, chatbots, and AI crawlers.
  - Strictly validates crawlability configurations (robots.txt, sitemaps).
  - Audits `llms.txt` and `/.well-known/llms.txt` rules for GPTBot, ClaudeBot, PerplexityBot, and Google-Extended.
- Mobile SEO Scorer & Evaluator:
  - Evaluates mobile usability (viewport meta, responsive layouts, navigation toggle detection, tap targets).
  - Scores mobile performance (simulated Core Web Vitals including throttled mobile LCP, mobile CLS, JS payload and requests).
  - Verifies responsive design quality (CSS media queries, fluid layouts, standard mobile breakpoints).
  - Audits mobile indexing readiness (content parity, structured data validity, mobile-first canonical configuration).
  - Enforces `isVerifiable()` guards and strict empty states (non-pass by default) under static crawls, capping unverified performance scores at 50 to ensure high scores require real runtime validation.
- E-E-A-T & Content Quality Analyzer:
  - Evaluates Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) pillars.
  - Scores content readability (Flesch Reading Ease, Flesch-Kincaid Grade Level).
  - Analyzes content structure, word count, and internal link density.
  - Extracts top keywords and checks for keyword stuffing.
  - Verifies AI citation readiness (structured data completeness, llms.txt presence, semantic HTML usage).
  - Provides actionable findings with severity levels.
  - Supports JSON and HTML report exports for documentation and CI/CD integration.
- Outbound Authority Links & Google Rank Checker:
  - Analyzes backlink domain metrics (authority counts, referring domains, spam scores).
  - Verifies keyword visibility in Google's top 10 search results via SerpAPI or headless browser automation.
- Competitive Site Comparer:
  - Compares health metrics, performance budgets, metadata, and link structures across two different URLs or exported JSON audits.
- Hreflang Validator:
  - Validates bidirectional hreflang links across pages.
  - Checks for consistent x-default configurations.
  - Validates language code formats.
  - Deep-crawls all hreflang-referenced pages with the `--deep` option.
  - Exports validation reports in terminal and JSON formats.
- Optional Headless Rendering:
  - Boot Playwright to parse single-page apps (SPAs) that require client-side execution.
- Visual Screenshot Capture:
  - Capture screenshots of your pages at different breakpoints (mobile, tablet, desktop).
  - Use Playwright device descriptors (e.g., "iPhone 15 Pro") for accurate mobile screenshots.
  - Capture full-page screenshots of your website.
  - Deep crawl to capture screenshots for all pages listed in your sitemap.
- Dedicated Image Audit (`images` command):
  - Audits page or site-wide images for SEO, performance, accessibility, and caching.
  - Discovers assets from `<img>`, `<picture>`, inline `background-image`, and `<link rel="preload" as="image">`.
  - Fetches metadata in parallel (size, format, cache headers, CDN signals) and decodes dimensions with `sharp`.
  - Optional Playwright mode for rendered vs natural size, viewport placement, and LCP image detection.
  - Rules cover payload weight, legacy formats, lazy-loading strategy, CLS risk, responsive `srcset`, alt text, and broken/mixed-content URLs.
  - Byte-weighted scoring, mobile payload budgets (1.5MB), and LCP image weight targets (100KB).
  - Exports terminal summary plus JSON or HTML reports with thumbnails and worst-offender tables.
- Evidence-based Technology Stack Detection (`technology` command):
  - Analyzes frontend frameworks, rendering strategies, CDN/edge delivery networks, backend servers, CMS packages, analytics trackers, UI systems, asset fonts, and third-party tools.
  - Suppresses low-confidence noise. Requires deterministic evidence weights before reporting.
  - Classifies page rendering strategies directly (Hybrid, SSR, CSR, or static HTML).
  - Exports findings in terminal tables, structured raw JSON, or clean HTML charts.
- JavaScript SEO Impact Report (`js-impact` command):
  - Compares raw HTML against rendered DOM to detect SEO-relevant changes caused by client-side JavaScript.
  - Flags metadata, heading, content, links, image, and structured-data parity issues between pre-render and post-render states.
  - Helps diagnose CSR / hydration problems that can hide content or links from crawlers.
  - Exports terminal, JSON, HTML, and Markdown reports for debugging and CI workflows.
- Business Directory Presence & NAP Consistency Audit (`directories` command):
  - Detects business listings across major directories and local citation sources.
  - Extracts source-site NAP data (name, phone, address, website) and compares it against candidate listings.
  - Classifies listings as `Issues not found`, `Wrong Phone Number`, `Wrong Business Name`, `No Phone Number`, `Not Present`, or `Search failed`.
  - Uses a resilient HTTP search cascade (`Bing -> Brave -> Mojeek -> DuckDuckGo`) with optional SerpAPI or Playwright fallback.
  - Outputs terminal tables or raw JSON for citation cleanup, local SEO audits, and missed-opportunity reporting.
- Audit Snapshots & Diff:
  - Save audit snapshots automatically with the `--save` flag
  - Compare current audit against previous snapshot with `--diff`
  - CI mode with regression detection (`--diff --ci`) fails only on regressions
  - Stores snapshots in the `./.seocore/history/<host>/` directory
- Explain & Dry-Run UX:
  - Preview audit configuration without crawling with `--dry-run`
  - Explains active tier, enabled modules, page budget, and active rules
  - `seocore rules explain <rule-id>` shows detailed rule information
  - `seocore tier explain <tier>` shows tier capabilities and configuration
- Schema Graph Explorer:
  - Analyze structured data entities and their relationships
  - Detects broken references, duplicate entities, and schema coverage gaps
  - Exports in terminal, JSON, HTML, and Mermaid diagram formats
- Internal Link Planner:
  - Generates actionable internal linking recommendations
  - Identifies orphan pages and low-authority priority pages
  - Suggests source/target page pairs with anchor text themes
  - Highlights high-leverage hub pages
- Search Opportunities Analyzer:
  - Combines crawl findings with optional GSC/CrUX data
  - Prioritizes opportunities by estimated business impact and ease-of-fix
  - Works without external providers using heuristic-based ranking
  - Identifies metadata, performance, indexing, internal links, schema, and content opportunities
- Production-Ready Reporting:
  - Real-time colored terminal logging via custom EventBus.
  - Exports rich, detailed audit logs in terminal, JSON, HTML, and SARIF formats.
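To make the redirect hop and loop tracking from the feature list more concrete, here is a minimal sketch of manual 3xx following with loop detection. It is not SEOCore's implementation: `traceRedirects`, `HopResponse`, the `lookup` stub, and the `maxHops` default are all hypothetical names chosen for illustration, and the stub stands in for real HTTP requests.

```typescript
// Hypothetical response shape: what a crawler might record for one request.
interface HopResponse {
  statusCode: number;
  location?: string; // Location header of a 3xx response, possibly relative
}

interface RedirectTrace {
  hops: { url: string; statusCode: number }[];
  finalUrl: string;
  loop: boolean;
}

// Follow 3xx responses by hand, recording each hop and stopping on a repeated URL.
function traceRedirects(
  start: string,
  lookup: (url: string) => HopResponse,
  maxHops = 10,
): RedirectTrace {
  const hops: { url: string; statusCode: number }[] = [];
  const seen = new Set<string>();
  let current = start;

  while (hops.length < maxHops) {
    if (seen.has(current)) {
      return { hops, finalUrl: current, loop: true };
    }
    seen.add(current);
    const res = lookup(current);
    hops.push({ url: current, statusCode: res.statusCode });
    const isRedirect = res.statusCode >= 300 && res.statusCode < 400;
    if (!isRedirect || !res.location) {
      return { hops, finalUrl: current, loop: false };
    }
    // Resolve relative Location headers against the current URL.
    current = new URL(res.location, current).toString();
  }
  // Hop budget exhausted without reaching a final response or a repeat.
  return { hops, finalUrl: current, loop: false };
}

// Stubbed responses standing in for real HTTP requests (illustrative data only).
const responses: Record<string, HopResponse> = {
  "https://example.com/a": { statusCode: 301, location: "/b" },
  "https://example.com/b": { statusCode: 302, location: "https://example.com/c" },
  "https://example.com/c": { statusCode: 200 },
  "https://example.com/x": { statusCode: 301, location: "/y" },
  "https://example.com/y": { statusCode: 301, location: "/x" },
};
const lookup = (url: string): HopResponse => responses[url] ?? { statusCode: 404 };

const chain = traceRedirects("https://example.com/a", lookup);
const looped = traceRedirects("https://example.com/x", lookup);
```

Here `chain` records three hops ending at `/c`, while `looped` stops with `loop: true` as soon as `/x` is reached a second time.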
The project is structured as a modular TypeScript monorepo managed with Nx:
packages/
├── cli/ # Command-line interface containing CLI commands
├── engine/ # Main orchestrator linking crawling, parsing, and scoring
├── crawler/ # HttpCrawler, PlaywrightCrawler, robots.txt & sitemap parser
├── analyzers/ # Fast cheerio scrapers and page normalizers
├── rules/ # Declarative SEO auditing rules and rule compiler
├── scoring/ # Crawl graph authority & category scoring engines
├── config/ # Config loading, default presets, and Zod schema validation
├── sdk/ # Shared interfaces, events, schemas, and common utilities
└── reporter/ # Exporters (TerminalReporter and JsonReporter)
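The crawl graph authority scoring that the `scoring` package is described as handling can be pictured with a textbook PageRank iteration over the internal link graph. This is a sketch under assumptions, not the project's actual algorithm: the `LinkGraph` type, the function names, the damping factor of 0.85, and the sample site are all illustrative.

```typescript
// Hypothetical graph shape: page path -> outbound internal links.
type LinkGraph = Map<string, string[]>;

// Count inbound internal links for every known page.
function inDegrees(graph: LinkGraph): Map<string, number> {
  const counts = new Map<string, number>();
  for (const page of graph.keys()) counts.set(page, 0);
  for (const targets of graph.values()) {
    for (const t of targets) {
      if (counts.has(t)) counts.set(t, counts.get(t)! + 1);
    }
  }
  return counts;
}

// Standard PageRank by power iteration; dead ends spread their rank evenly.
function pageRank(graph: LinkGraph, damping = 0.85, iterations = 50): Map<string, number> {
  const pages = [...graph.keys()];
  const n = pages.length;
  let rank = new Map(pages.map((p) => [p, 1 / n] as [string, number]));
  for (let i = 0; i < iterations; i++) {
    const next = new Map(pages.map((p) => [p, (1 - damping) / n] as [string, number]));
    for (const page of pages) {
      const targets = (graph.get(page) ?? []).filter((t) => graph.has(t));
      const share = rank.get(page)!;
      if (targets.length === 0) {
        for (const p of pages) next.set(p, next.get(p)! + (damping * share) / n);
      } else {
        for (const t of targets) {
          next.set(t, next.get(t)! + (damping * share) / targets.length);
        }
      }
    }
    rank = next;
  }
  return rank;
}

// Illustrative site: "/orphan" has no inbound links, "/terms" has no outbound links.
const graph: LinkGraph = new Map([
  ["/", ["/about", "/blog"]],
  ["/about", ["/"]],
  ["/blog", ["/", "/about", "/terms"]],
  ["/terms", []],
  ["/orphan", ["/"]],
]);

const orphans = [...inDegrees(graph)].filter(([, d]) => d === 0).map(([p]) => p);
const deadEnds = [...graph].filter(([, out]) => out.length === 0).map(([p]) => p);
const ranks = pageRank(graph);
```

In a real crawl the seed URL may legitimately have an in-degree of zero, so an orphan check would normally exclude it.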
- Node.js v20.0.0 or higher
- npm
Global install (lets you run `seocore` directly):
npm install -g seocore
seocore config init
Local project install (run with `npx`):
npm install seocore
npx seocore config init
One-off run (no install):
npx seocore@latest config init
- Clone repository:
  git clone https://github.com/codepurse/SEOCORE.git
  cd SEOCORE
- Install dependencies:
  npm install
- Build monorepo:
  npm run build
SEOCore can be executed via the CLI or imported directly as an SDK.
The core CLI executable is seocore.
If you installed the package locally with `npm install seocore`, run commands as `npx seocore ...`.
The examples below use the direct `seocore ...` form, which assumes a global install via `npm install -g seocore`.
- `audit`: Audit a website for SEO, speed, indexing, accessibility, and metadata
- `crawl`: Crawl a website and list discovered pages without scoring
- `compare`: Compare two websites or SEO audit reports
- `images`: Analyze images on a webpage or crawl an entire site for image issues
- `technology`: Detect website technology stack with evidence-based confidence scores
- `js-impact`: Compare raw HTML vs rendered DOM for JavaScript SEO impact
- `directories`: Check business directory presence and NAP consistency across citation sources
- `inspect`: Single-aspect probes (robots, sitemap, schema, hreflang, backlinks, rank, screenshot, llms-txt)
- `analyze`: Analyzer-driven deep dives (content, ai-visibility, schema-graph, link-plan, opportunities)
- `config`: Manage and validate SEO config
- `rules`: Manage and inspect SEO validation rules
- `tier`: Manage execution tiers
Generate a default `seocore.config.json` configuration file at your project root:
seocore config init
Show current config:
seocore config show
Validate config:
seocore config validate
Audit a website's landing page (default standard tier):
seocore audit https://example.com
Audit using specific tiers:
# Fast tier (core rules, 1 page, static HTML)
seocore audit https://example.com --tier fast
# Standard tier (core + performance, 100 pages, simulated CWV)
seocore audit https://example.com --tier standard
# Deep tier (all modules, 500 pages, Playwright rendering)
seocore audit https://example.com --tier deep
# Enterprise tier (all modules + plugins, 5000 pages, Lighthouse sampling)
seocore audit https://example.com --tier enterprise
Export audit as HTML report:
seocore audit https://example.com --format html --output ./seocore-report.html
Save audit snapshot for later comparison:
seocore audit https://example.com --save
Compare current audit against previous snapshot:
seocore audit https://example.com --diff
Save new snapshot and compare with previous:
seocore audit https://example.com --save --diff
CI mode (fail on regressions only):
seocore audit https://example.com --diff --ci
Dry run (preview what will be audited without crawling):
seocore audit https://example.com --dry-run
Audit Flags: `--save`, `--diff`, `--ci`, `--dry-run`, `--history-dir <path>` (custom snapshot directory)
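The README does not spell out what counts as a regression for `--diff --ci`. As a rough sketch, assuming a snapshot holds an overall score plus a list of finding IDs, a diff could treat a lower score or any newly introduced finding as a regression. The `AuditSnapshot` shape, the `rule-id:path` finding format, and the `broken-link` rule ID are assumptions made for this example.

```typescript
// Hypothetical snapshot shape; the real on-disk format is not documented here.
interface AuditSnapshot {
  score: number;
  findings: string[]; // assumed "rule-id:path" identifiers
}

interface SnapshotDiff {
  scoreDelta: number;
  newFindings: string[];
  resolvedFindings: string[];
  regressed: boolean;
}

// Compare two snapshots; "regressed" here means a lower score or any new finding.
function diffSnapshots(previous: AuditSnapshot, current: AuditSnapshot): SnapshotDiff {
  const before = new Set(previous.findings);
  const after = new Set(current.findings);
  const newFindings = current.findings.filter((f) => !before.has(f));
  const resolvedFindings = previous.findings.filter((f) => !after.has(f));
  const scoreDelta = current.score - previous.score;
  return {
    scoreDelta,
    newFindings,
    resolvedFindings,
    regressed: scoreDelta < 0 || newFindings.length > 0,
  };
}

const previousRun: AuditSnapshot = {
  score: 82,
  findings: ["missing-meta-description:/about", "duplicate-h1:/"],
};
const improvedRun: AuditSnapshot = { score: 88, findings: ["duplicate-h1:/"] };
const worseRun: AuditSnapshot = {
  score: 79,
  findings: ["duplicate-h1:/", "broken-link:/blog"],
};

const okDiff = diffSnapshots(previousRun, improvedRun);
const badDiff = diffSnapshots(previousRun, worseRun);

// A CI wrapper could map the result to a process exit code.
const ciExitCode = badDiff.regressed ? 1 : 0;
```

Under this definition an audit that only resolves findings passes CI, which matches the "fails only on regressions" wording.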
Map site structure and list HTTP responses without executing SEO rules or scoring:
seocore crawl https://example.com --depth 2 --max-pages 100
List all registered rules, severity levels, and category assignments:
seocore rules list
Describe a specific rule:
seocore rules describe <rule-id>
Explain a specific rule in detail:
seocore rules explain <rule-id>
List all available tiers, their capabilities, and configurations:
seocore tier list
Describe a specific tier:
seocore tier describe <tier-name>
Explain a specific tier in detail:
seocore tier explain <tier-name>
Evaluate search engine/chatbot discovery, metadata structure, citation readiness, and entity mapping:
seocore analyze ai-visibility https://example.com
Output results in raw JSON:
seocore analyze ai-visibility https://example.com --json
Evaluate Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T), content readability, structure, and AI citation readiness:
seocore analyze content https://example.com/blog/post
Export as JSON:
seocore analyze content https://example.com --json --output content-report.json
Export as HTML:
seocore analyze content https://example.com --format html --output content-report.html
CI mode with budgets:
seocore analyze content https://example.com --ci --budget-eeat 70 --budget-content 75
Explore structured data entities, relationships, and schema completeness:
seocore analyze schema-graph https://example.com
Export as Mermaid diagram:
seocore analyze schema-graph https://example.com --format mermaid
Export as JSON or HTML:
seocore analyze schema-graph https://example.com --format json
seocore analyze schema-graph https://example.com --format html --output schema-graph.html
Schema Graph Flags: `--format terminal|json|html|mermaid`, `-o <path>`
Generate actionable internal linking recommendations with ranked source → target suggestions, orphan page detection, and hub identification:
seocore analyze link-plan https://example.com
Show top N recommendations:
seocore analyze link-plan https://example.com --top 20
Export as JSON:
seocore analyze link-plan https://example.com --format json --output link-plan.json
Export as HTML report:
seocore analyze link-plan https://example.com --format html --output link-plan.html
Full site crawl with high-confidence filter:
seocore analyze link-plan https://example.com --full --min-confidence 60
Link Plan Flags:
- `--top <number>`: Limit suggestions displayed
- `--format terminal|json|html`: Output format (default: terminal)
- `-o, --output <path>`: Export file path
- `--full`: Crawl entire site (100 pages, depth 5)
- `-d, --depth <number>`: Crawl depth limit (default: 3)
- `-m, --max-pages <number>`: Maximum pages to crawl (default: 50)
- `--min-confidence <number>`: Minimum confidence threshold 0-100 (default: 0)
- `--max-suggestions-per-target <number>`: Max suggestions per target page (default: 5)
- `--verbose`: Show additional diagnostic details (scores, signals)
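One plausible reading of how `--min-confidence`, `--max-suggestions-per-target`, and `--top` interact is sketched below: filter by confidence, rank, cap per target, then truncate. The order of these steps, the `LinkSuggestion` shape, and the function name are assumptions; the actual planner may combine the flags differently.

```typescript
// Hypothetical suggestion shape for illustration.
interface LinkSuggestion {
  source: string;
  target: string;
  confidence: number; // 0-100
}

interface PlanOptions {
  minConfidence: number;
  maxSuggestionsPerTarget: number;
  top?: number;
}

// Filter by confidence, rank highest first, cap per target, then keep the top N.
function selectSuggestions(all: LinkSuggestion[], opts: PlanOptions): LinkSuggestion[] {
  const ranked = all
    .filter((s) => s.confidence >= opts.minConfidence)
    .sort((a, b) => b.confidence - a.confidence);

  const perTarget = new Map<string, number>();
  const kept: LinkSuggestion[] = [];
  for (const s of ranked) {
    const used = perTarget.get(s.target) ?? 0;
    if (used >= opts.maxSuggestionsPerTarget) continue;
    perTarget.set(s.target, used + 1);
    kept.push(s);
  }
  return opts.top === undefined ? kept : kept.slice(0, opts.top);
}

const candidates: LinkSuggestion[] = [
  { source: "/a", target: "/pricing", confidence: 90 },
  { source: "/b", target: "/pricing", confidence: 80 },
  { source: "/c", target: "/pricing", confidence: 70 },
  { source: "/a", target: "/docs", confidence: 65 },
  { source: "/b", target: "/docs", confidence: 40 },
];

const plan = selectSuggestions(candidates, {
  minConfidence: 60,
  maxSuggestionsPerTarget: 2,
  top: 3,
});
```

With these inputs the third `/pricing` suggestion is dropped by the per-target cap and the 40-confidence suggestion by the threshold, leaving two `/pricing` links and one `/docs` link.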
Identify high-impact, page-level organic search opportunities ranked by deterministic business impact and ease of fix:
seocore analyze opportunities https://example.com
Show only top opportunities:
seocore analyze opportunities https://example.com --top 25
Show only medium/high opportunities:
seocore analyze opportunities https://example.com --min-priority medium
Export as JSON:
seocore analyze opportunities https://example.com --format json --output opportunities.json
Export as HTML (with rich summary cards and action plan metrics):
seocore analyze opportunities https://example.com --format html --output opportunities.html
Enrich with Google Search Console or CrUX field performance data:
seocore analyze opportunities https://example.com --with-gsc --gsc-file ./gsc-pages.json --with-crux --crux-file ./crux-pages.json
Run deeper crawl with explicit limits:
seocore analyze opportunities https://example.com --full --depth 5 --max-pages 100
Show verbose ranking inputs and loader warnings:
seocore analyze opportunities https://example.com --verbose
Opportunities Flags:
- `-f, --format <terminal|json|html>`: Output format (default: terminal)
- `-o, --output <path>`: Export file path
- `--with-gsc`: Include GSC metrics
- `--gsc-file <path>`: GSC JSON export file path
- `--with-crux`: Include CrUX performance metrics
- `--crux-file <path>`: CrUX JSON export file path
- `--full`: Crawl the entire site using the command's larger default budget
- `-d, --depth <number>`: Override crawl depth limit
- `-m, --max-pages <number>`: Override maximum crawled pages
- `--top <n>`: Limit shown/exported top items
- `--min-priority <low|medium|high>`: Filter minimum priority to display
- `--verbose`: Show full scoring inputs and warnings
Notes:
- Works without external providers using crawl heuristics only.
- `--with-gsc` and `--with-crux` improve ranking quality but are optional.
- If `--gsc-file` or `--crux-file` is omitted, the command falls back to `./gsc-pages.json` and `./crux-pages.json`.
- Output is site-level analysis with page-level prioritized actions, not a full enterprise audit replacement.
The inspect command has subcommands for individual checks:
- robots: Verify robots.txt access rules, exclusions, and sitemap references
```bash
seocore inspect robots https://example.com
```
- sitemap: Analyze sitemap.xml and verify all linked URLs are reachable
```bash
seocore inspect sitemap https://example.com --check-links
```
- llms-txt: Verify `llms.txt` and `/.well-known/llms.txt` rules for AI crawlers like GPTBot, ClaudeBot, and PerplexityBot
```bash
seocore inspect llms-txt https://example.com
```
- schema: Validate Schema.org JSON-LD, Microdata, and RDFa structures
```bash
seocore inspect schema https://example.com
```
- hreflang: Validate a website's hreflang tags for bidirectional links, x-default consistency, and language code validity
```bash
seocore inspect hreflang https://example.com
```
- backlinks: Extract backlink profiles and analyze referring domain authority and spam scores
```bash
seocore inspect backlinks https://example.com
```
- keywords: Perform advanced SEO keyword intelligence, noise filtering, and topic clustering
```bash
seocore inspect keywords "behavioral health"
```
With deep expansions:
```bash
seocore inspect keywords "behavioral health" --expand
```
With noise filtering options:
```bash
seocore inspect keywords "behavioral health" --strict-noise-filter
```
- rank: Check if a target website ranks in Google's top 10 organic results for a given keyword
```bash
seocore inspect rank "seo crawler" https://example.com
```
- screenshot: Capture screenshots of a target page or entire website
```bash
seocore inspect screenshot https://example.com --breakpoints mobile,tablet,desktop
```
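The bidirectional-link and language-code checks that the hreflang probe is described as performing can be illustrated with a small sketch. It assumes hreflang annotations have already been extracted into a map; the `HreflangMap` type, the issue names, and the simplified code pattern (which checks shape only, not whether a code is registered) are all hypothetical.

```typescript
// Hypothetical input: page URL -> { hreflang value: alternate URL }.
type HreflangMap = Map<string, Record<string, string>>;

interface HreflangIssue {
  page: string;
  lang: string;
  type: "invalid-code" | "missing-return-link";
}

// Simplified shape check: language, optional script, optional region, or x-default.
const LANG_CODE = /^(x-default|[a-z]{2,3}(-[a-z]{4})?(-[a-z]{2})?)$/i;

function validateHreflang(pages: HreflangMap): HreflangIssue[] {
  const issues: HreflangIssue[] = [];
  for (const [page, alternates] of pages) {
    for (const [lang, target] of Object.entries(alternates)) {
      if (!LANG_CODE.test(lang)) {
        issues.push({ page, lang, type: "invalid-code" });
      }
      if (target === page) continue; // self-reference needs no return link
      const back = pages.get(target);
      const linksBack = back !== undefined && Object.values(back).includes(page);
      if (!linksBack) {
        issues.push({ page, lang, type: "missing-return-link" });
      }
    }
  }
  return issues;
}

const en = "https://example.com/en/";
const de = "https://example.com/de/";
const fr = "https://example.com/fr/";

const site: HreflangMap = new Map([
  [en, { en, de, fr }],
  [de, { de, en, fr }],
  [fr, { fr, en_US: en }], // no link back to the German page, malformed code
]);

const hreflangIssues = validateHreflang(site);
```

Here the German page is flagged because the French page never links back to it, and the French page is flagged for the malformed `en_US` code.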
Compare SEO health scores, metadata differences, and performance metrics across two websites or audit files:
seocore compare https://site-a.com https://site-b.com --focus technical
Audit images on a single page or across the site for weight, format, delivery, CLS, LCP, alt text, caching, and broken URLs. See docs/commands/images.md for the full rule catalog.
Single page (default):
seocore images https://example.com
Full site crawl (same origin, respects robots.txt; capped at ~100 pages and 500 unique images by default):
seocore images https://example.com --crawl
Playwright mode (rendered size, viewport, LCP element on the start URL):
seocore images https://example.com --playwright
Site crawl + Playwright + HTML report:
seocore images https://example.com --crawl --playwright -f html -o ./seocore-images-report.html
JSON export with custom thresholds:
seocore images https://example.com --crawl --max-images 200 --threshold-kb 150 -f json -o ./images-audit.json
Flags: `--crawl`, `--playwright`, `--threshold-kb` (default 100), `--concurrency` (default 10), `--max-images` (default 500), `--user-agent`, `--timeout` (default 30000ms), `-f json|html`, `-o <path>`.
Identify framework, CDN, hosting, CMS, libraries, analytics, fonts, and external APIs with confidence ratings:
seocore technology https://example.com
Show underlying signature evidence lines and raw scores:
seocore technology https://example.com --verbose
Export stack detection to structured JSON or standalone HTML:
seocore technology https://example.com --format html --output ./technology-report.html
Compare raw source HTML against rendered DOM to see what JavaScript changes for crawlers. See docs/js-impact.md for command details and output reference.
seocore js-impact https://example.com
Use safer wait modes for JS-heavy marketing sites that never go idle:
seocore js-impact https://example.com --wait-event load --timeout-ms 45000
Export machine-readable JSON or shareable HTML:
seocore js-impact https://example.com --output json --output-file ./js-impact-report.json
seocore js-impact https://example.com --output html --output-file ./js-impact-report.html
Flags: `--wait-event load|domcontentloaded|networkidle`, `--timeout-ms`, `--wait-extra-ms`, `-o terminal|json|html|markdown`, `--output-file <path>`.
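Conceptually, the raw-versus-rendered comparison behind `js-impact` boils down to extracting the same SEO fields twice and diffing them. The sketch below assumes the fields have already been extracted; the `SeoFields` shape, the chosen fields, and the detail strings are hypothetical and much simpler than a real parity report.

```typescript
// Hypothetical per-page fields; a real extractor would fill these from HTML.
interface SeoFields {
  title: string;
  canonical: string;
  h1: string[];
  links: string[];
}

interface ParityIssue {
  field: keyof SeoFields;
  detail: string;
}

// Items present only after rendering (added) or only in raw HTML (removed).
function listDiff(raw: string[], rendered: string[]): { added: string[]; removed: string[] } {
  const rawSet = new Set(raw);
  const renderedSet = new Set(rendered);
  return {
    added: rendered.filter((v) => !rawSet.has(v)),
    removed: raw.filter((v) => !renderedSet.has(v)),
  };
}

function compareRender(raw: SeoFields, rendered: SeoFields): ParityIssue[] {
  const issues: ParityIssue[] = [];
  for (const field of ["title", "canonical"] as const) {
    if (raw[field] !== rendered[field]) {
      issues.push({ field, detail: `"${raw[field]}" -> "${rendered[field]}"` });
    }
  }
  for (const field of ["h1", "links"] as const) {
    const { added, removed } = listDiff(raw[field], rendered[field]);
    if (added.length > 0 || removed.length > 0) {
      issues.push({ field, detail: `+${added.length} / -${removed.length} after rendering` });
    }
  }
  return issues;
}

// Illustrative CSR page: the raw HTML is a shell, content appears after rendering.
const rawHtml: SeoFields = {
  title: "Loading...",
  canonical: "https://example.com/pricing",
  h1: [],
  links: ["/"],
};
const renderedDom: SeoFields = {
  title: "Pricing | Example",
  canonical: "https://example.com/pricing",
  h1: ["Pricing"],
  links: ["/", "/pricing", "/docs"],
};

const parityIssues = compareRender(rawHtml, renderedDom);
```

For this sample the title, heading, and link fields all differ between the two states, which is the pattern a crawler without JavaScript rendering would miss.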
Check whether a business appears on key local/business directories and whether the listing NAP matches the source website:
seocore directories https://example.com
Force the multi-engine HTTP cascade search mode:
seocore directories https://example.com --provider cascade
Use live browser search when HTML search engines are blocked:
seocore directories https://example.com --provider playwright --show
Export citation results as JSON:
seocore directories https://example.com --json --output ./directories-report.json
Search providers:
- `auto`: Use `SERPAPI_KEY` if present, otherwise use the HTTP cascade and fall back to Playwright when needed.
- `serpapi`: Most reliable live-search mode when `SERPAPI_KEY` is configured.
- `cascade`: HTTP-first search chain using `Bing -> Brave -> Mojeek -> DuckDuckGo`.
- `duckduckgo`: Force DuckDuckGo HTML search only.
- `playwright`: Browser-driven live search for sites that block HTML endpoints.
Flags: `--provider auto|serpapi|cascade|duckduckgo|playwright`, `--show`, `--concurrency` (default 4), `--max-candidates` (default 3), `--json`, `-f terminal|json`, `-o <path>`.
Typical statuses: `Issues not found`, `Wrong Phone Number`, `Wrong Business Name`, `No Phone Number`, `Not Present`, `Search failed`.
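As an illustration of how a listing might be mapped to one of the statuses above, here is a minimal sketch. The status strings come from this README; everything else (the `Nap` shape, the normalization rules, comparing only the last ten phone digits, and the order of the checks) is an assumption rather than the command's documented behavior.

```typescript
type ListingStatus =
  | "Issues not found"
  | "Wrong Phone Number"
  | "Wrong Business Name"
  | "No Phone Number"
  | "Not Present"
  | "Search failed";

// Hypothetical, reduced NAP record (name and phone only).
interface Nap {
  name: string;
  phone?: string;
}

// Normalize names by case and punctuation; compare phones by their last 10 digits.
const normName = (name: string): string =>
  name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
const phoneKey = (phone: string): string => phone.replace(/\D/g, "").slice(-10);

function classifyListing(source: Nap, listing: Nap | null, searchOk = true): ListingStatus {
  if (!searchOk) return "Search failed";
  if (listing === null) return "Not Present";
  if (normName(listing.name) !== normName(source.name)) return "Wrong Business Name";
  if (!listing.phone) return "No Phone Number";
  if (source.phone && phoneKey(listing.phone) !== phoneKey(source.phone)) {
    return "Wrong Phone Number";
  }
  return "Issues not found";
}

// Illustrative business data.
const sourceNap: Nap = { name: "Acme Dental, LLC", phone: "(555) 010-2000" };

const statusMatch = classifyListing(sourceNap, { name: "ACME Dental LLC", phone: "+1 555-010-2000" });
const statusPhone = classifyListing(sourceNap, { name: "Acme Dental LLC", phone: "555-010-9999" });
const statusNoPhone = classifyListing(sourceNap, { name: "Acme Dental LLC" });
const statusMissing = classifyListing(sourceNap, null);
const statusFailed = classifyListing(sourceNap, null, false);
```

Normalizing before comparing avoids flagging listings that differ only in punctuation, casing, or a country-code prefix.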
Import SEOCore directly into your Node/TypeScript backend:
import { SeoEngine } from '@seocore/engine';
import { EventBus, ExecutionTier } from '@seocore/sdk';
// Initialize the real-time event bus
const eventBus = new EventBus();
eventBus.on('page:loaded', (data) => {
  console.log(`Crawled: ${data.url} | Status: ${data.statusCode}`);
});
// Run audit using a tier
const engine = new SeoEngine(eventBus);
const result = await engine.run(
  'https://example.com',
  { /* optional overrides here */ },
  ExecutionTier.STANDARD
);
console.log(`Overall Health Score: ${result.score}%`);
Custom audits are defined via `seocore.config.json` or inline overrides.
| Option | Type | Default | Description |
|---|---|---|---|
| `tier` | `"fast" \| "standard" \| "deep" \| "enterprise"` | `"standard"` | Execution tier driving crawl limits, rules, and scoring. Overrides `preset`. |
| `preset` | `"quick" \| "standard" \| "deep" \| "enterprise"` | `"standard"` | Scrape profile adjusting page and depth limits (legacy, use `tier`). |
| `concurrency` | `number` | `5` | Maximum simultaneous page crawl requests. |
| `maxDepth` | `number` | `3` | Maximum number of link hops allowed from the seed landing URL. |
| `maxPages` | `number` | `100` | Hard cap on total crawled pages per audit. |
| `rateLimitMs` | `number` | `100` | Delay between requests, in milliseconds. |
| `retryCount` | `number` | `2` | Number of crawl attempts on 5xx failures. |
| `playwrightEnabled` | `boolean` | `false` | Enable Playwright headless rendering for SPAs. |
| `excludePatterns` | `string[]` | `[]` | Glob/wildcard path patterns to skip. |
| `includePatterns` | `string[]` | `[]` | Glob/wildcard path patterns to which crawling is restricted. |
| `ruleOverrides` | `object` | `{}` | Disable rules or override their weight, severity, or findings. Supports `findingSeverityOverrides`. |
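The `includePatterns` and `excludePatterns` options could be interpreted along the following lines. This is a sketch under assumptions, not the crawler's actual matcher: it treats `*` as "any characters", matches against the URL path only, and lets exclusions win over inclusions. `wildcardToRegExp` and `shouldCrawl` are hypothetical helper names.

```typescript
// Convert a simple wildcard pattern into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*"); // "*" matches any run of characters
  return new RegExp(`^${escaped}$`);
}

interface PathFilters {
  includePatterns: string[];
  excludePatterns: string[];
}

// Assumed precedence: excluded paths are never crawled; an empty include list allows all.
function shouldCrawl(path: string, filters: PathFilters): boolean {
  const matches = (patterns: string[]): boolean =>
    patterns.some((p) => wildcardToRegExp(p).test(path));
  if (matches(filters.excludePatterns)) return false;
  if (filters.includePatterns.length === 0) return true;
  return matches(filters.includePatterns);
}

const filters: PathFilters = {
  includePatterns: ["/blog/*", "/products/*"],
  excludePatterns: ["/admin/*", "*/checkout/*", "*.pdf"],
};

const decisions = {
  blogPost: shouldCrawl("/blog/post-1", filters),
  blogPdf: shouldCrawl("/blog/guide.pdf", filters),
  admin: shouldCrawl("/admin/users", filters),
  checkout: shouldCrawl("/shop/checkout/step-1", filters),
  about: shouldCrawl("/about", filters),
  aboutNoInclude: shouldCrawl("/about", { includePatterns: [], excludePatterns: [] }),
};
```

With these filters only `/blog/post-1` is crawled; `/about` is skipped because it matches no include pattern, but it is allowed once the include list is empty.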
{
  "preset": "standard",
  "concurrency": 10,
  "maxPages": 500,
  "rateLimitMs": 50,
  "excludePatterns": [
    "/admin/*",
    "*/checkout/*",
    "*.pdf"
  ],
  "includePatterns": [
    "/blog/*",
    "/products/*"
  ],
  "ruleOverrides": {
    "missing-meta-description": {
      "severity": "error",
      "weight": 8
    },
    "duplicate-h1": {
      "enabled": false
    },
    "security-headers": {
      "severity": "warning",
      "findingSeverityOverrides": {
        "security-headers:missing-csp": "error"
      }
    }
  }
}
We welcome community contributions! Please read our guidelines to get started:
- Fork the repo and create your branch from `main`.
- Ensure you have Node 20+ installed.
- Write clean, modular TypeScript following existing package patterns.
- Run tests before submitting a pull request: `npm test`
- Submit detailed PR descriptions mapping features to technical specifications.
This project is licensed under the MIT License. See LICENSE for more details.