
feat(seo-client): fan-out queries across markets using site region#1522

Merged
ekremney merged 7 commits into main from feat/seo-client-fanout-markets on Apr 10, 2026


@ekremney ekremney commented Apr 8, 2026

Summary

The SEO provider requires a database (country) parameter on every API call — unlike the previous provider (Ahrefs), there is no "global" mode. Until now, all queries were hardcoded to database=us, which meant non-US domains (e.g. www.deceuninck.es) returned little to no data.

This PR makes the SEO client locale-aware by accepting the site's region (from Site.getRegion(), ISO 3166-1 alpha-2) and fanning out queries across multiple markets to build a global traffic picture.

Why fan-out instead of single-region queries?

The site's region tells us where the site primarily operates, but organic traffic is not confined to a single country. A .es domain may rank in es, fr, it, and br simultaneously. Querying only the site's region would undercount traffic just as badly as querying only us.

Smoke test data for adobe.com (region=US):

| Method | Single DB (us) | Fan-out (12 DBs) |
| --- | --- | --- |
| getTopPages top traffic | 72,331,413 | 72,331,413 |
| getMetrics org_traffic | ~48M | 153,060,534 |
| getMetrics org_keywords | ~11.5M | 26,101,114 |
| getPaidPages top_keyword_country | always US | UK, US, IN (actual source) |

For www.deceuninck.es (region=ES), the previous hardcoded database=us query returned zero results. With fan-out, the ES database returns 10 pages with traffic data, and the other markets gracefully return nothing.

What changed

New: fanOut(items, fn, operation) resilience primitive

A single batched fan-out method (batch size 10) that all multi-market methods now use — including getBrokenBacklinks, which previously had its own inline batching loop. Each call to fn(item) already has per-request retry with exponential backoff via sendRawRequest; fanOut adds:

  • Batched Promise.allSettled to respect rate limits
  • Consistent log.warn for items that fail after all retries
  • Fulfilled results collected with their key for downstream merge
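The three behaviours listed above can be sketched roughly as follows. This is a minimal illustration, not the client's actual code: the standalone function form and the `log` parameter are assumptions made so the example is self-contained, while the batch size of 10 and the `{ key, value }` result shape follow the PR description and the method's JSDoc.

```javascript
// Hypothetical sketch of a batched fan-out helper (not the client's real implementation).
const BATCH_SIZE = 10; // batch size stated in the PR description

async function fanOut(items, fn, operation, log = console) {
  const fulfilled = [];
  for (let i = 0; i < items.length; i += BATCH_SIZE) {
    const batch = items.slice(i, i + BATCH_SIZE);
    // allSettled never rejects, so one failing market cannot abort the whole batch.
    // Wrapping fn in an async arrow also turns synchronous throws into rejections.
    const settled = await Promise.allSettled(batch.map(async (item) => fn(item)));
    settled.forEach((outcome, idx) => {
      const key = batch[idx];
      if (outcome.status === 'fulfilled') {
        // Keep the key so the caller knows which item produced the value.
        fulfilled.push({ key, value: outcome.value });
      } else {
        log.warn(`${operation}: request for "${key}" failed: ${outcome.reason}`);
      }
    });
  }
  return fulfilled;
}
```

Batches run one after another, so at most 10 requests are in flight at a time; failed items are logged and simply absent from the returned array.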

New: getDatabases(region) helper

Builds the query set: BIG_MARKETS + site region if not already present.

BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au', 'in', 'jp', 'nl'] — 12 major SEO provider databases by search volume.

If the site's region is already in BIG_MARKETS (e.g. ES), no duplication. If not (e.g. CZ), it's appended as a 13th database.
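A minimal sketch of this helper, under two assumptions that the description does not spell out: the region arrives as an upper-case ISO 3166-1 alpha-2 code (or null), and database codes are its lower-case form. Note that the ISO code for the United Kingdom is GB while the database list uses uk; this sketch does no such mapping, and whether the real client does is not stated here.

```javascript
// BIG_MARKETS as listed in the PR description.
const BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au', 'in', 'jp', 'nl'];

// Hypothetical sketch: big markets plus the site's region when it is not already present.
function getDatabases(region) {
  // Assumption: database codes are the lower-cased region code.
  const db = typeof region === 'string' ? region.trim().toLowerCase() : '';
  if (!db || BIG_MARKETS.includes(db)) {
    return [...BIG_MARKETS]; // copy, so callers cannot mutate the constant
  }
  return [...BIG_MARKETS, db];
}

console.log(getDatabases('ES').length); // 12
console.log(getDatabases('CZ').length); // 13
console.log(getDatabases(null).length); // 12
```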

Refactored: positional params → options bags

All updated methods now use a clean options bag instead of positional parameters. Callers no longer need to pass undefined placeholders to reach later parameters:

| Method | Signature |
| --- | --- |
| getTopPages | (url, { limit, region }) |
| getPaidPages | (url, { date, limit, region }) |
| getMetrics | (url, { date, region }) |
| getOrganicTraffic | (url, { startDate, endDate, region }) |
| getBrokenBacklinks | No signature change (refactored to use fanOut) |
| getOrganicKeywords | No change needed (already uses options bag) |

Merge strategies

| Method | Merge strategy |
| --- | --- |
| getTopPages | Sum sum_traffic per URL across DBs, first keyword wins |
| getPaidPages | Sum traffic per URL, top_keyword_country reflects actual DB |
| getMetrics | Sum all numeric fields across DBs |
| getOrganicTraffic | Group by date, sum all fields across DBs |
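Two of these strategies can be sketched as below. The function names, the `url` and `top_keyword` field names, and the final sort are assumptions for illustration; only `sum_traffic`, `org_traffic`, `org_keywords`, and the `{ key, value }` input shape come from the PR text.

```javascript
// Hypothetical merge for getTopPages: sum sum_traffic per URL, first keyword wins.
function mergeTopPages(results) {
  const byUrl = new Map();
  for (const { value: pages } of results) {
    for (const page of pages) {
      const seen = byUrl.get(page.url);
      if (seen) {
        seen.sum_traffic += page.sum_traffic; // keep the first database's other fields
      } else {
        byUrl.set(page.url, { ...page }); // copy so the input is not mutated
      }
    }
  }
  // Assumption: callers want pages ordered by merged traffic, highest first.
  return [...byUrl.values()].sort((a, b) => b.sum_traffic - a.sum_traffic);
}

// Hypothetical merge for getMetrics: sum every numeric field across databases.
function mergeMetrics(results) {
  const totals = {};
  for (const { value: metrics } of results) {
    for (const [field, n] of Object.entries(metrics)) {
      if (typeof n === 'number') {
        totals[field] = (totals[field] ?? 0) + n;
      }
    }
  }
  return totals;
}
```

Because "first keyword wins", the keyword reported for a URL depends on the order in which databases are iterated, so a stable database order keeps the merged output deterministic.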

Fixed: lastMonthISO() default date

getMetrics and getPaidPages previously defaulted to todayISO(), but the SEO provider publishes monthly snapshots with a delay — the current month has no data yet. Changed default to lastMonthISO() (1st of previous month) so callers without an explicit date get the most recent available data.
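A minimal sketch of such a helper, assuming dates are computed in UTC and formatted as YYYY-MM-DD; the `now` parameter is added here only to make the example testable and is not part of the described API.

```javascript
// Hypothetical sketch: first day of the previous month as an ISO date string.
function lastMonthISO(now = new Date()) {
  // Date.UTC normalises month -1 to December of the previous year,
  // so January rolls back correctly.
  const first = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - 1, 1));
  return first.toISOString().slice(0, 10);
}

console.log(lastMonthISO(new Date('2026-04-08T12:00:00Z'))); // 2026-03-01
console.log(lastMonthISO(new Date('2026-01-15T00:00:00Z'))); // 2025-12-01
```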

How callers use the region

const site = await dataAccess.getSiteById(siteId);
const region = site.getRegion(); // ISO 3166-1 alpha-2, e.g. 'ES', 'CZ', or null

const topPages = await seoClient.getTopPages(url, { limit: 200, region });
const metrics = await seoClient.getMetrics(url, { region });
const traffic = await seoClient.getOrganicTraffic(url, { startDate, endDate, region });
const paid = await seoClient.getPaidPages(url, { limit: 200, region });

All methods are backward-compatible — the options bag is optional and defaults to querying only big markets.

Test plan

  • Unit tests: 140 passing, client.js at 100% lines/statements/functions, 97.5% branches
  • Smoke tested against live API with adobe.com (US) and www.deceuninck.es (ES)
  • Verify getMetrics/getPaidPages return data without explicit date (lastMonthISO fix)
  • Verify non-US domain returns data (was zero before)
  • Verify top_keyword_country in getPaidPages reflects actual market, not hardcoded US

🤖 Generated with Claude Code

ekremney and others added 2 commits April 8, 2026 15:06
The SEO provider requires a database (country) parameter on every call —
there is no global mode. Previously all queries were hardcoded to US,
yielding no data for non-US domains. This change makes the client
locale-aware by accepting the site's region and fanning out queries
across 12 major markets to build a global traffic picture.

- Add fanOut() resilience primitive with batching and error logging
- Add getDatabases(region) and BIG_MARKETS constant
- Update getTopPages, getPaidPages, getMetrics, getOrganicTraffic
  to fan out and merge results across all databases
- Refactor getBrokenBacklinks to use fanOut (was inline batching)
- Add lastMonthISO() default date (provider has no current-month data)
- getPaidPages top_keyword_country now reflects actual source market

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace awkward positional parameters with options bags for
getTopPages, getPaidPages, getMetrics, and getOrganicTraffic.
Callers no longer need to pass undefined placeholders to reach
later parameters like region.

Before: seoClient.getPaidPages(url, undefined, 200, 'prefix', { region })
After:  seoClient.getPaidPages(url, { limit: 200, region })

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 8, 2026

This PR will trigger a minor release when merged.

@ekremney ekremney requested review from buuhuu and herzog31 April 8, 2026 13:19
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@herzog31 herzog31 left a comment


Good work overall — the fan-out approach is well-motivated and the fanOut abstraction is clean. A few issues to address before merging, the most important being the type declarations and the silent breaking change for positional callers.

src/index.d.ts is not updated (not in diff — needs a separate change)

The type declarations still reflect the old signatures and are wrong for all four refactored methods:

  • getTopPages(url: string, limit?: number) → should be (url: string, options?: { limit?: number, region?: string })
  • getOrganicTraffic(url: string, startDate: string, endDate: string) → should be (url: string, options?: { startDate?: string, endDate?: string, region?: string })
  • getPaidPages(url: string, date?: string, limit?: number, mode?: ...) → should use options bag, drop mode
  • getMetrics(url: string, date?: string) → should add region
  • BIG_MARKETS, getDatabases, and lastMonthISO are missing from the exports section

}

- async getMetrics(url, date = todayISO()) {
+ async getMetrics(url, { date = lastMonthISO(), region } = {}) {

Silent breaking change: callers still using the old positional signature, e.g. getMetrics(url, '2025-03-01'), will have their date silently ignored. JS destructures the string argument, and '2025-03-01'.date is undefined, so date falls back to lastMonthISO() with no error. The same issue applies to getPaidPages. Consider validating that the second argument is not a string/number (or document this as a breaking change and update consuming repos).
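The destructuring behaviour described in this comment can be reproduced with a small stand-in. `getMetricsDate` is a hypothetical function that only mirrors the shape of the new signature and returns the date it would use; the `'DEFAULT'` placeholder stands in for lastMonthISO().

```javascript
// Hypothetical stand-in mirroring the new options-bag signature.
function getMetricsDate(url, { date = 'DEFAULT' } = {}) {
  return date;
}

// Old positional call: the string is destructured, .date is undefined, default applies.
console.log(getMetricsDate('example.com', '2025-03-01')); // DEFAULT

// New options-bag call: the date is honoured.
console.log(getMetricsDate('example.com', { date: '2025-03-01' })); // 2025-03-01
```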

* @returns {Promise<{result: {metrics: Array}, fullAuditRef: string}>}
*/
- async getOrganicTraffic(url, startDate, endDate) {
+ async getOrganicTraffic(url, { startDate, endDate, region } = {}) {

The JSDoc above this method (lines 434–436) still documents startDate and endDate as separate positional @param entries. Update to reflect the options bag:

 * @param {{ startDate?: string, endDate?: string, region?: string }} [options]

Also note: if called without dates (e.g. from a caller that didn't migrate), startDate and endDate are both undefined. The filter isoDate < undefined is always false, so all historical data is returned silently. Worth a comment documenting this behaviour (or a guard).
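A short illustration of why undefined bounds let every row through. The exact filter in the client is not shown in this thread, so the range check below and the `organic_traffic` field are assumed for the example.

```javascript
const rows = [
  { date: '2023-01-01', organic_traffic: 100 },
  { date: '2025-06-01', organic_traffic: 250 },
];

// Assumed filter shape: drop rows that fall outside [startDate, endDate].
function filterByRange(data, startDate, endDate) {
  // Comparing a string with undefined converts both sides to NaN,
  // so both comparisons are false and no row is ever dropped.
  return data.filter(({ date }) => !(date < startDate || date > endDate));
}

const bounded = filterByRange(rows, '2025-01-01', '2025-12-31');
const unbounded = filterByRange(rows, undefined, undefined);
console.log(bounded.length, unbounded.length); // 1 2
```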

* @returns {Promise<Array<{key: string, value: T}>>} Fulfilled results
* @template T
*/
async fanOut(items, fn, operation) {

fanOut is a public instance method with JSDoc, and the tests call it directly on client. If it's intended as an internal implementation detail, a brief note (e.g. // Internal — not part of the public API) would prevent downstream callers from depending on it. If it's intentionally public, no change needed — just flagging the intent.

- Add runtime guard for old positional callers: passing a string/number
  as the second argument now throws instead of silently ignoring it
- Validate startDate/endDate are required in getOrganicTraffic
- Update JSDoc for getOrganicTraffic to reflect options bag
- Mark fanOut as @private (internal implementation detail)
- Update index.d.ts: new options bag signatures for all four methods,
  add missing exports (BIG_MARKETS, getDatabases, lastMonthISO)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ekremney commented Apr 8, 2026

Addressing review feedback (1965b0e)

1. Silent breaking change for positional callers (@herzog31 on getMetrics line 366)

Added a runtime guard on all four refactored methods — passing a string/number as the second argument now throws immediately:

if (typeof opts !== 'object' || opts === null) {
  throw new Error('Second argument must be an options object, not a positional value');
}

Old callers like getMetrics(url, '2025-03-01') will get a clear error instead of silently falling back. Tests added for all four methods.

2. getOrganicTraffic JSDoc + undefined dates (@herzog31 on line 439)

  • Updated JSDoc to document the options bag with @param {object} options
  • Added validation: startDate and endDate are now required — calling without them throws 'startDate and endDate are required'
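Taken together, the runtime guard from item 1 and the date validation from item 2 might look like the sketch below. The helper names `assertOptionsBag` and `parseOrganicTrafficOptions` are hypothetical; the error messages are the ones quoted in this comment.

```javascript
// Hypothetical helper: reject old positional callers early.
function assertOptionsBag(opts) {
  if (typeof opts !== 'object' || opts === null) {
    throw new Error('Second argument must be an options object, not a positional value');
  }
}

// Hypothetical helper: validate the options accepted by getOrganicTraffic.
function parseOrganicTrafficOptions(opts = {}) {
  assertOptionsBag(opts);
  const { startDate, endDate, region } = opts;
  if (!startDate || !endDate) {
    throw new Error('startDate and endDate are required');
  }
  return { startDate, endDate, region };
}

try {
  parseOrganicTrafficOptions('2025-01-01'); // old positional style
} catch (e) {
  console.log(e.message); // Second argument must be an options object, not a positional value
}
```

Note that a `typeof` check of this form still accepts arrays, since `typeof []` is `'object'`; a stricter guard would be a separate design decision.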

3. fanOut public vs internal (@herzog31 on line 105)

Marked as @private in JSDoc. It's an internal implementation detail.

4. index.d.ts not updated (top-level review)

Updated all four method signatures to reflect options bags. Added missing exports: BIG_MARKETS, getDatabases, lastMonthISO.


@herzog31 herzog31 left a comment


All feedback addressed in the latest commit:

  • Runtime guard added to all four methods — passing a positional string/number now throws instead of silently misbehaving
  • getOrganicTraffic now validates that startDate and endDate are required
  • JSDoc updated to reflect the options bag
  • fanOut marked @private
  • index.d.ts fully updated: options bag signatures for all four methods, plus BIG_MARKETS, getDatabases, lastMonthISO exports added

ekremney and others added 3 commits April 10, 2026 16:22
Increases fan-out coverage from ~77% to ~92.8% of global organic traffic
(measured against adobe.com) while keeping API unit cost at 30 per call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eslint-config-helix 3.0.24 tightened curly from multi-line to all.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ekremney ekremney merged commit 8b2260c into main Apr 10, 2026
5 checks passed
@ekremney ekremney deleted the feat/seo-client-fanout-markets branch April 10, 2026 15:08
solaris007 pushed a commit that referenced this pull request Apr 10, 2026
## [@adobe/mysticat-shared-seo-client-v1.2.0](https://github.com/adobe/spacecat-shared/compare/@adobe/mysticat-shared-seo-client-v1.1.3...@adobe/mysticat-shared-seo-client-v1.2.0) (2026-04-10)

### Features

* **seo-client:** fan-out queries across markets using site region ([#1522](#1522)) ([8b2260c](8b2260c))
@solaris007

🎉 This PR is included in version @adobe/mysticat-shared-seo-client-v1.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
