feat(seo-client): fan-out queries across markets using site region #1522
The SEO provider requires a database (country) parameter on every call; there is no global mode. Previously all queries were hardcoded to US, yielding no data for non-US domains. This change makes the client locale-aware by accepting the site's region and fanning out queries across 12 major markets to build a global traffic picture.
- Add fanOut() resilience primitive with batching and error logging
- Add getDatabases(region) and BIG_MARKETS constant
- Update getTopPages, getPaidPages, getMetrics, getOrganicTraffic to fan out and merge results across all databases
- Refactor getBrokenBacklinks to use fanOut (was inline batching)
- Add lastMonthISO() default date (provider has no current-month data)
- getPaidPages top_keyword_country now reflects actual source market
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
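A minimal sketch of what the fan-out and merge steps in this commit might look like, not the PR's actual code. The `{ key, value }` result shape, the batch size of 10, and the "sum traffic per URL, first keyword wins" rule are taken from this thread; `mergeTopPages`, the injectable `log` parameter, the `url` and `top_keyword` field names, and the final sort are assumptions made for illustration.

```javascript
// Hypothetical sketch: only the fanOut name and its (items, fn, operation)
// parameters come from the PR; everything else is illustrative.
const BATCH_SIZE = 10; // batch size stated in the PR summary

// Run fn over items in batches; keep fulfilled results, warn on rejected ones.
async function fanOut(items, fn, operation, log = console) {
  const fulfilled = [];
  for (let i = 0; i < items.length; i += BATCH_SIZE) {
    const batch = items.slice(i, i + BATCH_SIZE);
    // allSettled never rejects, so one failing market cannot sink the batch.
    // The async wrapper also turns a synchronous throw in fn into a rejection.
    const settled = await Promise.allSettled(batch.map(async (item) => fn(item)));
    settled.forEach((outcome, idx) => {
      if (outcome.status === 'fulfilled') {
        fulfilled.push({ key: batch[idx], value: outcome.value });
      } else {
        log.warn(`${operation} failed for ${batch[idx]}: ${outcome.reason}`);
      }
    });
  }
  return fulfilled;
}

// Merge per-database page lists: sum traffic per URL, first keyword wins.
function mergeTopPages(results) {
  const byUrl = new Map();
  for (const { value: pages } of results) {
    for (const page of pages) {
      const seen = byUrl.get(page.url);
      if (seen) {
        seen.sum_traffic += page.sum_traffic; // later databases only add traffic
      } else {
        byUrl.set(page.url, { ...page }); // first occurrence keeps its keyword
      }
    }
  }
  // Sorting by traffic is an assumption, not something the PR states.
  return [...byUrl.values()].sort((a, b) => b.sum_traffic - a.sum_traffic);
}
```

Because rejected items are only logged, a market with no data or a failing request reduces coverage without failing the whole call.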
Replace awkward positional parameters with options bags for
getTopPages, getPaidPages, getMetrics, and getOrganicTraffic.
Callers no longer need to pass undefined placeholders to reach
later parameters like region.
Before: seoClient.getPaidPages(url, undefined, 200, 'prefix', { region })
After: seoClient.getPaidPages(url, { limit: 200, region })
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This PR will trigger a minor release when merged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
herzog31
left a comment
Good work overall — the fan-out approach is well-motivated and the fanOut abstraction is clean. A few issues to address before merging, the most important being the type declarations and the silent breaking change for positional callers.
src/index.d.ts is not updated (not in diff — needs a separate change)
The type declarations still reflect the old signatures and are wrong for all four refactored methods:
- getTopPages(url: string, limit?: number) → should be (url: string, options?: { limit?: number, region?: string })
- getOrganicTraffic(url: string, startDate: string, endDate: string) → should be (url: string, options?: { startDate?: string, endDate?: string, region?: string })
- getPaidPages(url: string, date?: string, limit?: number, mode?: ...) → should use options bag, drop mode
- getMetrics(url: string, date?: string) → should add region
- BIG_MARKETS, getDatabases, and lastMonthISO are missing from the exports section
Before: async getMetrics(url, date = todayISO()) {
After: async getMetrics(url, { date = lastMonthISO(), region } = {}) {
Silent breaking change: callers still using the old positional signature getMetrics(url, '2025-03-01') will have their date silently ignored. JS will destructure the string '2025-03-01' — .date is undefined — so it falls back to lastMonthISO() with no error. Same issue applies to getPaidPages. Consider validating that the second argument is not a string/number (or document this as a breaking change and update consuming repos).
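To make the failure mode concrete, here is a small stand-alone reproduction. It is a sketch: `getMetrics` below is a stub that only mirrors the new parameter shape, and `lastMonthISO` is a hypothetical stand-in (first day of the previous month, UTC) with a fixed "now" so the output is stable.

```javascript
// Hypothetical stand-in for the PR's helper: first day of the previous month (UTC).
function lastMonthISO(now = new Date()) {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() - 1, 1));
  return d.toISOString().slice(0, 10);
}

// Fixed reference date so the example does not depend on when it runs.
const FIXED_NOW = new Date('2025-04-15T00:00:00Z');

// Stub with the same parameter shape as the new getMetrics signature.
function getMetrics(url, { date = lastMonthISO(FIXED_NOW), region } = {}) {
  return { url, date, region };
}

// Old positional call: the string is destructured like an object,
// its .date property is undefined, so the default silently applies.
const legacy = getMetrics('https://example.com', '2025-01-01');

// Migrated call: the date is honoured.
const migrated = getMetrics('https://example.com', { date: '2025-01-01' });

console.log(legacy.date);   // 2025-03-01 (requested date was dropped)
console.log(migrated.date); // 2025-01-01
```

No exception is raised in the legacy case, which is why the change is easy to miss in consuming repos.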
 * @returns {Promise<{result: {metrics: Array}, fullAuditRef: string}>}
 */
Before: async getOrganicTraffic(url, startDate, endDate) {
After: async getOrganicTraffic(url, { startDate, endDate, region } = {}) {
The JSDoc above this method (lines 434–436) still documents startDate and endDate as separate positional @param entries. Update to reflect the options bag:
 * @param {{ startDate?: string, endDate?: string, region?: string }} [options]

Also note: if called without dates (e.g. from a caller that didn't migrate), startDate and endDate are both undefined. The filter isoDate < undefined is always false, so all historical data is returned silently. Worth a comment documenting this behaviour (or a guard).
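The comparison behaviour can be checked in isolation. The filter below is an assumed shape (drop points outside the requested range), not the method's real body; `filterByRange` and the sample points are made up for the example.

```javascript
// Hypothetical range filter in the spirit of getOrganicTraffic:
// drop points that fall before startDate or after endDate.
function filterByRange(points, startDate, endDate) {
  return points.filter(({ isoDate }) => !(isoDate < startDate || isoDate > endDate));
}

const points = [
  { isoDate: '2024-01-01' },
  { isoDate: '2025-01-01' },
  { isoDate: '2026-01-01' },
];

// With real bounds, only the in-range point survives.
console.log(filterByRange(points, '2024-06-01', '2025-06-01').length); // 1

// With undefined bounds, both comparisons evaluate to false (undefined
// coerces to NaN), so nothing is dropped and the full history comes back.
console.log(filterByRange(points, undefined, undefined).length); // 3
```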
 * @returns {Promise<Array<{key: string, value: T}>>} Fulfilled results
 * @template T
 */
async fanOut(items, fn, operation) {
fanOut is a public instance method with JSDoc, and the tests call it directly on client. If it's intended as an internal implementation detail, a brief note (e.g. // Internal — not part of the public API) would prevent downstream callers from depending on it. If it's intentionally public, no change needed — just flagging the intent.
- Add runtime guard for old positional callers: passing a string/number as the second argument now throws instead of silently ignoring it
- Validate startDate/endDate are required in getOrganicTraffic
- Update JSDoc for getOrganicTraffic to reflect options bag
- Mark fanOut as @private (internal implementation detail)
- Update index.d.ts: new options bag signatures for all four methods, add missing exports (BIG_MARKETS, getDatabases, lastMonthISO)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addressing review feedback (1965b0e)
1. Silent breaking change for positional callers (@herzog31 on
Added a runtime guard on all four refactored methods; passing a string/number as the second argument now throws immediately:
if (typeof opts !== 'object' || opts === null) {
  throw new Error('Second argument must be an options object, not a positional value');
}
Old callers like
2.
3. Marked as
4. Updated all four method signatures to reflect options bags. Added missing exports:
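For reference, the guard quoted in this reply can be exercised on its own. This is a sketch under assumptions: `assertOptionsBag` is a hypothetical helper name, and `getMetrics` is a stub with a hard-coded default date rather than the client's real method.

```javascript
// The check and message are quoted from the reply; the helper name is hypothetical.
function assertOptionsBag(opts) {
  if (typeof opts !== 'object' || opts === null) {
    throw new Error('Second argument must be an options object, not a positional value');
  }
}

// Stub method: validate first, destructure afterwards.
function getMetrics(url, opts = {}) {
  assertOptionsBag(opts);
  const { date = '2025-03-01', region } = opts; // placeholder default date
  return { url, date, region };
}

// Migrated and option-less callers keep working.
console.log(getMetrics('https://example.com', { region: 'ES' }).region); // ES
console.log(getMetrics('https://example.com').date); // 2025-03-01

// A legacy positional call now fails loudly instead of dropping the date.
try {
  getMetrics('https://example.com', '2025-01-01');
} catch (err) {
  console.log(err.message);
}
```

Taking the bag as a plain parameter and destructuring after the check is what makes the guard possible; destructuring in the parameter list would run before any validation.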
herzog31
left a comment
All feedback addressed in the latest commit:
- Runtime guard added to all four methods — passing a positional string/number now throws instead of silently misbehaving
- getOrganicTraffic now validates that startDate and endDate are required
- JSDoc updated to reflect the options bag
- fanOut marked @private
- index.d.ts fully updated: options bag signatures for all four methods, plus BIG_MARKETS, getDatabases, lastMonthISO exports added
Increases fan-out coverage from ~77% to ~92.8% of global organic traffic (measured against adobe.com) while keeping API unit cost at 30 per call. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eslint-config-helix 3.0.24 tightened curly from multi-line to all. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## [@adobe/mysticat-shared-seo-client-v1.2.0](https://github.com/adobe/spacecat-shared/compare/@adobe/mysticat-shared-seo-client-v1.1.3...@adobe/mysticat-shared-seo-client-v1.2.0) (2026-04-10)

### Features

* **seo-client:** fan-out queries across markets using site region ([#1522](#1522)) ([8b2260c](8b2260c))
🎉 This PR is included in version @adobe/mysticat-shared-seo-client-v1.2.0 🎉
The release is available on:
Your semantic-release bot 📦🚀
Summary
The SEO provider requires a database (country) parameter on every API call. Unlike the previous provider (Ahrefs), there is no "global" mode. Until now, all queries were hardcoded to database=us, which meant non-US domains (e.g. www.deceuninck.es) returned little to no data.

This PR makes the SEO client locale-aware by accepting the site's region (from Site.getRegion(), ISO 3166-1 alpha-2) and fanning out queries across multiple markets to build a global traffic picture.

Why fan-out instead of single-region queries?
The site's region tells us where the site primarily operates, but organic traffic is not confined to a single country. A .es domain may rank in es, fr, it, and br simultaneously. Querying only the site's region would undercount traffic just as badly as querying only us.

Smoke test data for adobe.com (region=US):
- getTopPages: top traffic
- getMetrics: org_traffic
- getMetrics: org_keywords
- getPaidPages: top_keyword_country, US → UK, US, IN (actual source)

For www.deceuninck.es (region=ES), the previous database=us returned zero results. With fan-out, the ES database returns 10 pages with traffic data, and the other markets gracefully return nothing.

What changed
New: fanOut(items, fn, operation) resilience primitive

A single batched fan-out method (batch size 10) that all multi-market methods now use, including getBrokenBacklinks, which previously had its own inline batching loop. Each call to fn(item) already has per-request retry with exponential backoff via sendRawRequest; fanOut adds:
- Promise.allSettled over each batch, to respect rate limits
- log.warn for items that fail after all retries

New: getDatabases(region) helper

Builds the query set: BIG_MARKETS + site region if not already present.

BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au', 'in', 'jp', 'nl'] (12 major SEO provider databases by search volume).

If the site's region is already in BIG_MARKETS (e.g. ES), no duplication. If not (e.g. CZ), it's appended as a 13th database.

Refactored: positional params → options bags
All updated methods now use a clean options bag instead of positional parameters. Callers no longer need to pass undefined placeholders to reach later parameters:
- getTopPages(url, { limit, region })
- getPaidPages(url, { date, limit, region })
- getMetrics(url, { date, region })
- getOrganicTraffic(url, { startDate, endDate, region })
- getBrokenBacklinks (now uses fanOut)
- getOrganicKeywords

Merge strategies
- getTopPages: sum_traffic per URL across DBs, first keyword wins
- getPaidPages: top_keyword_country reflects actual DB
- getMetrics
- getOrganicTraffic

Fixed: lastMonthISO() default date

getMetrics and getPaidPages previously defaulted to todayISO(), but the SEO provider publishes monthly snapshots with a delay, so the current month has no data yet. The default is now lastMonthISO() (1st of previous month), so callers without an explicit date get the most recent available data.

How callers use the region
All methods are backward-compatible — the options bag is optional and defaults to querying only big markets.
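As a rough illustration of how a region turns into a query set, the helper can be sketched as below. The market list is copied from this description; the body of `getDatabases`, including the lower-casing of the region and the behaviour when no region is passed, is an assumption rather than the PR's implementation.

```javascript
// Market list as given in the PR description.
const BIG_MARKETS = ['us', 'uk', 'de', 'fr', 'es', 'it', 'br', 'ca', 'au', 'in', 'jp', 'nl'];

// Hypothetical sketch: big markets plus the site's region when it is not already listed.
function getDatabases(region) {
  if (!region) {
    return [...BIG_MARKETS]; // no region supplied: query only the big markets
  }
  const db = region.toLowerCase(); // assumes regions arrive upper-case, e.g. 'ES'
  return BIG_MARKETS.includes(db) ? [...BIG_MARKETS] : [...BIG_MARKETS, db];
}

console.log(getDatabases('ES').length); // 12 (already a big market)
console.log(getDatabases('CZ').length); // 13 ('cz' appended)
console.log(getDatabases().length);     // 12
```

A caller would then pass the region through the options bag, for example seoClient.getTopPages(url, { limit: 200, region: site.getRegion() }), where seoClient and site are placeholder variable names.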
Test plan
- client.js at 100% lines/statements/functions, 97.5% branches
- adobe.com (US) and www.deceuninck.es (ES)
- getMetrics/getPaidPages return data without explicit date (lastMonthISO fix)
- top_keyword_country in getPaidPages reflects actual market, not hardcoded US

🤖 Generated with Claude Code