
refactor(fetch-all): reuse notion fetch pipeline #28

Merged
luandro merged 24 commits into main from fix-fetch-all-align-export-with-notion-page-scoping on Sep 27, 2025

Conversation

Contributor

luandro commented Sep 25, 2025

User description

Summary

  • refactor fetch-all CLI to share notion fetch pipeline and centralized graceful shutdown helpers
  • remove bespoke export and preview pipeline in favor of shared generateBlocks logic
  • update fetchAll helpers to filter via runFetchPipeline and retain preview/analysis features
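As a rough illustration of the refactor, fetch-all now hands a transform to the shared pipeline instead of running its own export loop. This is a minimal sketch with hypothetical shapes; the real runFetchPipeline/fetchAll signatures live in scripts/notion-fetch and may differ:

```typescript
// Hypothetical, simplified types standing in for the real pipeline contract.
type Page = { id: string; status?: string };

interface PipelineOptions {
  transform?: (pages: Page[]) => Page[];
  shouldGenerate?: boolean;
}

// Stand-in for the shared pipeline: fetch, then apply the caller's transform.
async function runFetchPipeline(
  fetchPages: () => Promise<Page[]>,
  opts: PipelineOptions
): Promise<{ data: Page[] }> {
  let data = await fetchPages();
  if (opts.transform) data = opts.transform(data);
  return { data };
}

// fetch-all only supplies filtering/limiting; the pipeline owns everything else.
async function fetchAll(
  fetchPages: () => Promise<Page[]>,
  maxPages: number
): Promise<{ data: Page[] }> {
  return runFetchPipeline(fetchPages, {
    transform: (pages) =>
      pages.filter((p) => p.status !== "Removed").slice(0, maxPages),
    shouldGenerate: false,
  });
}
```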

Testing

  • bunx eslint scripts/notion-fetch-all/index.ts scripts/notion-fetch-all/fetchAll.ts scripts/notion-fetch/runFetch.ts scripts/notion-fetch/index.ts
  • bunx vitest run scripts/notion-fetch-all/index.test.ts scripts/notion-fetch-all/fetchAll.test.ts

Fixes #27


PR Type

Enhancement, Tests, Bug fix


Description

  • Align fetch-all export with page scoping

  • Reuse shared generateBlocks pipeline

  • Harden image handling with cache/fallbacks

  • Add comprehensive CLI and pipeline tests


Diagram Walkthrough

flowchart LR
  fetchAll["fetch-all CLI"] -- "calls" --> runFetchPipeline["shared runFetch pipeline"]
  runFetchPipeline -- "uses" --> generateBlocks["generateBlocks (shared)"]
  generateBlocks -- "writes" --> docsOut["Markdown + images"]
  generateBlocks -- "updates" --> i18nCode["i18n code.json"]
  tests["New test suites"] -- "mock/verify" --> fetchAll

File Walkthrough

Relevant files
Enhancement (2 files)
  • generateBlocks.ts (+1065/-289): Robust image pipeline, locale scoping, safe frontmatter
  • index.ts (+265/-210): CLI uses shared runtime and pipeline with shutdown

Refactor (1 file)
  • generateBlocksForAll.ts (+1/-1081): Delegate to shared generateBlocks implementation

Tests (1 file)
  • notionFixtures.ts (+377/-0): Add reusable Notion mock fixtures for tests

Additional files (19 files)
  • TASK.md (+0/-31)
  • code.json (+120/-0)
  • code.json (+120/-0)
  • lefthook.yml (+5/-0)
  • package.json (+15/-11)
  • fetchNotionData.ts (+80/-5)
  • index.cli.test.ts (+156/-0)
  • fetchAll.ts (+159/-134)
  • downloadImage.test.ts (+500/-0)
  • integration.test.ts (+131/-0)
  • runFetchPipeline.test.ts (+367/-0)
  • generateBlocks.test.ts (+464/-3)
  • index.test.ts (+37/-51)
  • index.ts (+50/-142)
  • runFetch.ts (+99/-0)
  • runtime.ts (+98/-0)
  • helpers.ts (+39/-0)
  • index.ts (+3/-0)
  • mocks.ts (+75/-0)


github-actions bot commented Sep 25, 2025

PR Code Suggestions ✨

Latest suggestions up to acf5500


Previous suggestions

Suggestions up to commit 2043324
Possible issue
Preserve results on anomaly retry

On anomaly retry you discard the retried page results, potentially skipping data.
Merge retryResp.results (deduping by id) before advancing the cursor to avoid data
loss.

scripts/fetchNotionData.ts [66-85]

 const anomaly =
   hasMore &&
   (duplicateDetected ||
     !startCursor ||
     startCursor === prevCursor ||
     prevCount === 0);
 if (anomaly) {
-  // One retry attempt to recover from transient anomaly
   console.warn("Notion API pagination anomaly detected; retrying once...");
   const retryResp = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: prevCursor,
     page_size: 100,
   });
+  const retryResults = Array.isArray(retryResp.results) ? retryResp.results : [];
+  for (const r of retryResults) {
+    const id = (r as any)?.id;
+    if (!id || seenIds.has(id)) continue;
+    seenIds.add(id);
+    results.push(r);
+  }
   const retryCursor = retryResp.next_cursor ?? undefined;
   if (retryCursor && retryCursor !== prevCursor) {
     hasMore = Boolean(retryResp.has_more);
     startCursor = retryCursor;
     continue;
   }
   console.warn(
     "Anomaly persisted after retry; stopping early with partial results."
   );
   break;
 }
Suggestion importance[1-10]: 8


Why: Correctly points out that retry results are discarded during anomaly handling, risking data loss. Merging retryResp.results with deduplication aligns with the pagination robustness goal and provides a meaningful reliability improvement.

Impact: Medium
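The dedupe-by-id merge this suggestion proposes can be shown in isolation. The types here are simplified stand-ins, not the Notion SDK's actual result objects:

```typescript
// Simplified stand-in for a Notion query result row.
type Row = { id?: string };

// Merge retried page results into the accumulator, skipping ids already seen,
// so a pagination retry neither drops nor duplicates rows.
function mergeUnique(
  results: Row[],
  seenIds: Set<string>,
  retryResults: unknown
): Row[] {
  const rows = Array.isArray(retryResults) ? (retryResults as Row[]) : [];
  for (const r of rows) {
    const id = r?.id;
    if (!id || seenIds.has(id)) continue; // skip id-less and already-seen rows
    seenIds.add(id);
    results.push(r);
  }
  return results;
}
```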
Create missing i18n docs directories

Ensure the i18n content directories exist for all non-default locales, otherwise
writes later (e.g., to PATH from getI18NPath) can fail. Create those locale-specific
doc paths at startup, mirroring the images/docs directory setup.

scripts/notion-fetch/generateBlocks.ts [438-451]

 const getI18NPath = (locale: string) =>
   path.join(I18N_PATH, locale, "docusaurus-plugin-content-docs", "current");
 const locales = config.i18n.locales;
 const DEFAULT_LOCALE = config.i18n.defaultLocale;
 
 const LANGUAGE_NAME_TO_LOCALE: Record<string, string> = {
   English: "en",
   Spanish: "es",
   Portuguese: "pt",
 };
 
 const FALLBACK_TITLE_PREFIX = "untitled";
 
 // Ensure directories exist (preserve existing content)
 fs.mkdirSync(CONTENT_PATH, { recursive: true });
 fs.mkdirSync(IMAGES_PATH, { recursive: true });
+for (const locale of locales.filter((l: string) => l !== DEFAULT_LOCALE)) {
+  fs.mkdirSync(getI18NPath(locale), { recursive: true });
+}
Suggestion importance[1-10]: 7


Why: The code ensures CONTENT_PATH and IMAGES_PATH exist but not the i18n docs directories; later writes use getI18NPath which could fail. Adding directory creation for non-default locales is accurate and improves robustness, though not critical.

Impact: Medium
Guard against double shutdown calls

Protect against multiple shutdown invocations by de-duplicating exit flows. Wrap
gracefulShutdown with a guard to avoid concurrent calls when both catch paths fire,
preventing double cleanup or double exit.

scripts/notion-fetch-all/index.ts [672-685]

+let exiting = false;
 if (isDirectExec && process.env.NODE_ENV !== "test") {
   (async () => {
     try {
       await main();
     } catch (error) {
-      console.error(chalk.red("❌ Fatal error:"), error);
+      if (!exiting) {
+        exiting = true;
+        console.error(chalk.red("❌ Fatal error:"), error);
+        await gracefulShutdown(1);
+      }
+    }
+  })().catch(async (err) => {
+    if (!exiting) {
+      exiting = true;
+      console.error(chalk.red("❌ Unhandled fatal error:"), err);
       await gracefulShutdown(1);
     }
-  })().catch(async (err) => {
-    console.error(chalk.red("❌ Unhandled fatal error:"), err);
-    await gracefulShutdown(1);
   });
 }
Suggestion importance[1-10]: 6


Why: The improved code adds an exiting guard around gracefulShutdown to avoid duplicate shutdowns from both promise paths. It aligns with the new runtime-based shutdown flow and reduces risk of double cleanup; useful but not critical.

Impact: Low
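The guard amounts to a once-only wrapper; here is a generic sketch where cleanup is a stand-in for the PR's gracefulShutdown, not its actual signature:

```typescript
// Wrap a cleanup routine so that, however many error paths fire,
// it runs exactly once; later callers get the code back as a no-op.
function makeShutdown(cleanup: () => void): (code: number) => number {
  let exiting = false;
  return (code: number) => {
    if (exiting) return code; // already shutting down; do nothing
    exiting = true;
    cleanup();
    return code;
  };
}
```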
Restore test-observable exit behavior

Returning the shutdown result here changes previous behavior where tests expected an
exception via mocked process.exit. To avoid silent continuation in non-test contexts
and keep test contract, rethrow the exit code in test mode so callers can assert,
while preserving real exit in production.

scripts/notion-fetch/index.ts [37-51]

 if (!process.env.NOTION_API_KEY) {
   const msg = "Missing NOTION_API_KEY environment variable.";
-  // Keep concise output to avoid leaking sensitive context in logs
   console.error(chalk.bold.red(msg));
-  return await gracefulShutdown(1);
+  const code = await gracefulShutdown(1);
+  if (process.env.NODE_ENV === "test") throw new Error(`Process exit called with code: ${code}`);
+  return code;
 }
 
 if (!resolvedDatabaseId) {
   console.error(
     chalk.bold.red(
       "Error: DATABASE_ID (or NOTION_DATABASE_ID) is not defined in the environment variables."
     )
   );
-  return await gracefulShutdown(1);
+  const code = await gracefulShutdown(1);
+  if (process.env.NODE_ENV === "test") throw new Error(`Process exit called with code: ${code}`);
+  return code;
 }
Suggestion importance[1-10]: 2


Why: The PR intentionally changed shutdown semantics to return an exit code in tests via gracefulShutdown. Reintroducing throws contradicts the new tests expecting numeric returns, making this suggestion inaccurate in this context.

Impact: Low
General
Align hook runner and scope tests

Use the same package manager as the project scripts to avoid environment mismatch
and ensure non-interactive CI behavior with --run. Also, restrict the glob to test
files to prevent unnecessary runs on non-test changes.

lefthook.yml [26-29]

 - name: notion-fetch-tests
-  run: bunx vitest run scripts/notion-fetch/__tests__/
+  run: npx vitest --run scripts/notion-fetch/__tests__/
   stage_fixed: true
-  glob: "scripts/notion-fetch/**/*.{ts,js}"
+  glob: "scripts/notion-fetch/**/__tests__/**/*.{ts,js}"
Suggestion importance[1-10]: 7


Why: Switching from bunx to npx aligns with project scripts and --run makes CI non‑interactive; narrowing the glob to tests reduces unnecessary runs. The mapping to the new hunk is accurate (lines 26–29).

Impact: Medium
Make vitest runs fail-safe

Ensure vitest exits non-zero on failure in CI by adding the --run flag consistently
and avoid path typos that could silently pass. Also, quote paths to guard against
shell expansion issues on some environments.

package.json [37-40]

-"test:notion-fetch": "vitest run scripts/notion-fetch/__tests__/",
-"test:notion-cli": "vitest run scripts/notion-fetch-all/__tests__/",
-"test:notion-pipeline": "vitest run scripts/notion-fetch/__tests__/runFetchPipeline.test.ts",
-"test:notion-image": "vitest run scripts/notion-fetch/__tests__/downloadImage.test.ts",
+"test:notion-fetch": "vitest --run scripts/notion-fetch/__tests__/",
+"test:notion-cli": "vitest --run scripts/notion-fetch-all/__tests__/",
+"test:notion-pipeline": "vitest --run \"scripts/notion-fetch/__tests__/runFetchPipeline.test.ts\"",
+"test:notion-image": "vitest --run \"scripts/notion-fetch/__tests__/downloadImage.test.ts\"",
Suggestion importance[1-10]: 6


Why: Adding --run improves CI consistency and quoting specific test paths is harmless; however, vitest already exits non‑zero on failures in run mode and the change is minor. The existing_code lines map to new lines 37–40.

Impact: Low
Reduce false positives in URL sanitation

The regex for whitespace-in-URL can overmatch valid URLs with spaces encoded as %20
or with parentheses content. Tighten the pattern to ignore percent-encoded spaces
and allow balanced parentheses sequences to reduce false positives.

scripts/notion-fetch/generateBlocks.ts [249-280]

 function sanitizeMarkdownImages(content: string): string {
   if (!content || content.indexOf("![") === -1) return content;
-  // Cap processing size to avoid ReDoS on pathological inputs
   const MAX_LEN = 2_000_000;
   const text = content.length > MAX_LEN ? content.slice(0, MAX_LEN) : content;
   let sanitized = text;
 
-  // Pattern 1: Completely empty URLs
+  // Empty URLs
   sanitized = sanitized.replace(
     /!\[([^\]]*)\]\(\s*\)/g,
     "**[Image: $1]** *(Image URL was empty)*"
   );
-
-  // Pattern 2: Invalid literal placeholders
+  // Invalid placeholders
   sanitized = sanitized.replace(
     /!\[([^\]]*)\]\(\s*(?:undefined|null)\s*\)/g,
     "**[Image: $1]** *(Image URL was invalid)*"
   );
-
-  // Pattern 3: Unencoded whitespace inside URL (safe regex without nested greedy scans)
-  // Matches any whitespace in the URL excluding escaped closing parens and %20
+  // URLs containing raw whitespace (exclude %20 and allow escaped parens)
   sanitized = sanitized.replace(
-    /!\[([^\]]*)\]\(\s*([^()\s][^()\r\n]*?(?:\s+)[^()\r\n]*?)\s*\)/g,
-    "**[Image: $1]** *(Image URL contained whitespace)*"
+    /!\[([^\]]*)\]\(\s*((?:(?:\\\)|\\\(|[^()\s])|\([^)]*\))+)\s*\)/g,
+    (m, alt, url) => {
+      // If url has any raw whitespace, flag it; ignore percent-encoded ones
+      return /\s/.test(url) ? `**[Image: ${alt}]** *(Image URL contained whitespace)*` : m;
+    }
   );
 
-  // If we truncated, append a notice to avoid corrupting content silently
   if (text.length !== content.length) {
     return sanitized + "\n\n<!-- Content truncated for sanitation safety -->";
   }
   return sanitized;
 }
Suggestion importance[1-10]: 5


Why: The concern about overmatching whitespace is reasonable and the proposed change aims to be safer, but it introduces a more complex regex and runtime checking that may have edge cases; impact is moderate and correctness isn't fully verifiable from context.

Impact: Low
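The behaviour this suggestion is after, flagging raw whitespace while leaving %20 alone, can be demonstrated with a deliberately simplified pattern (not the PR's exact regex):

```typescript
// Replace markdown images whose URL contains raw (unencoded) whitespace with a
// readable placeholder; percent-encoded spaces (%20) pass through untouched.
function flagWhitespaceUrls(markdown: string): string {
  return markdown.replace(
    /!\[([^\]]*)\]\(\s*([^)\r\n]*?)\s*\)/g,
    (match, alt: string, url: string) =>
      /\s/.test(url)
        ? `**[Image: ${alt}]** *(Image URL contained whitespace)*`
        : match
  );
}
```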
Ensure sorting occurs post-transform

The transform is applied before sorting and without honoring the caller’s requested
sortBy/sortDirection. This can produce unexpected order in the result. Move sorting
to after transformation (and any limiting) to guarantee consistent ordering.

scripts/notion-fetch-all/fetchAll.ts [67-96]

 const { data: rawData = [], metrics } = await runFetchPipeline({
   filter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
+      // Only filter/limit here; sorting is handled after transform
       const transformed = applyFetchAllTransform(
         Array.isArray(pages) ? pages : [],
         {
           statusFilter,
           maxPages,
           includeRemoved,
         }
       );
       return Array.isArray(transformed) ? transformed : [];
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
+// ... later
+const pages = defensivelyFiltered.map((page) => transformPage(page));
+const sortedPages = sortPages(pages, sortBy, sortDirection);
Suggestion importance[1-10]: 5


Why: The suggestion highlights ordering concerns but the current code already sorts after transforming raw pages (sortPages after transformPage). The advice is reasonable but mostly reiterates existing behavior; impact is moderate at best.

Impact: Low
Suggestions up to commit 23b60c9
Possible issue
Use recorded match indices for replacements

Using indexOf(original) after prior mutations can match the wrong occurrence when
identical markdown appears multiple times. Compute and record match indices from the
initial scan (imgRegex.exec) and use those for replacements to avoid corrupt or
misplaced substitutions.

scripts/notion-fetch/generateBlocks.ts [1205-1374]

 const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
 const imageMatches: Array<{
   full: string;
   url: string;
   alt: string;
   idx: number;
+  start: number;
+  end: number;
 }> = [];
+let m: RegExpExecArray | null;
+let tmpIndex = 0;
+let safetyCounter = 0;
+const SAFETY_LIMIT = 500;
+
+while ((m = imgRegex.exec(sourceMarkdown)) !== null) {
+  if (++safetyCounter > SAFETY_LIMIT) break;
+  const start = m.index;
+  const full = m[0];
+  const end = start + full.length;
+  const rawUrl = m[2];
+  const unescapedUrl = rawUrl.replace(/\\\)/g, ")");
+  imageMatches.push({
+    full,
+    url: unescapedUrl,
+    alt: m[1],
+    idx: tmpIndex++,
+    start,
+    end,
+  });
+}
 ...
-const idx = markdownString.parent.indexOf(original);
-if (idx === -1) continue;
-const end = idx + original.length;
+const indexedReplacements: Array<{ start: number; end: number; text: string }> = [];
+for (const result of imageResults) {
+  if (result.status !== "fulfilled") { totalFailures++; continue; }
+  const processResult = result.value;
+  const match = imageMatches.find(im => im.idx === processResult.index);
+  if (!match) continue;
+  let replacementText: string;
+  if (processResult.success && processResult.newPath) {
+    replacementText = match.full.replace(processResult.imageUrl!, processResult.newPath);
+    totalSaved += processResult.savedBytes || 0;
+    successfulImages++;
+  } else {
+    replacementText = createFallbackImageMarkdown(match.full, match.url, match.idx);
+    totalFailures++;
+  }
+  indexedReplacements.push({ start: match.start, end: match.end, text: replacementText });
+}
+indexedReplacements.sort((a, b) => b.start - a.start);
+let processedMarkdown = markdownString.parent;
+for (const rep of indexedReplacements) {
+  processedMarkdown = processedMarkdown.slice(0, rep.start) + rep.text + processedMarkdown.slice(rep.end);
+}
Suggestion importance[1-10]: 8


Why: The PR already collects matches then later uses indexOf(original) on potentially mutated content, which can be incorrect for duplicate patterns. Recording start/end indices during the initial scan and using them for replacements improves correctness and robustness.

Impact: Medium
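The core of this fix, applying replacements at offsets recorded during the initial scan rather than re-searching with indexOf, can be sketched on its own (the Replacement shape is illustrative):

```typescript
interface Replacement {
  start: number; // offset recorded when the match was first scanned
  end: number;
  text: string;
}

// Apply replacements from the end of the string backwards so that earlier
// recorded offsets remain valid as the string shrinks or grows.
function applyReplacements(source: string, reps: Replacement[]): string {
  let out = source;
  for (const rep of [...reps].sort((a, b) => b.start - a.start)) {
    out = out.slice(0, rep.start) + rep.text + out.slice(rep.end);
  }
  return out;
}
```

An indexOf-based approach would hit the first "X" twice in a string like "aXbXc"; recorded offsets keep each replacement at its own occurrence.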
Robust spinner lifecycle guarding

Ensure spinners are always stopped even if ora throws or spinners aren't started, to
avoid hanging TTY state. Guard spinner starts and stops and null-check unregister
functions to prevent runtime errors in non-TTY/test environments.

scripts/notion-fetch/runFetch.ts [40-96]

-const fetchSpinner = ora(fetchSpinnerText).start();
-const unregisterFetchSpinner = trackSpinner(fetchSpinner);
+const fetchSpinner = ora(fetchSpinnerText);
+let unregisterFetchSpinner: (() => void) | undefined;
+try {
+  fetchSpinner.start();
+  unregisterFetchSpinner = trackSpinner(fetchSpinner);
 
-try {
   let data = await fetchNotionData(filter);
   data = Array.isArray(data) ? data : [];
 
   data = await sortAndExpandNotionData(data);
   data = Array.isArray(data) ? data : [];
 
   if (transform) {
     const transformed = await transform(data);
     data = Array.isArray(transformed) ? transformed : [];
   }
 
   if (fetchSpinner.isSpinning) {
     fetchSpinner.succeed(chalk.green("Data fetched successfully"));
   }
 
   if (!shouldGenerate) {
     return { data };
   }
 
-  const generateSpinner = ora(generateSpinnerText).start();
-  const unregisterGenerateSpinner = trackSpinner(generateSpinner);
+  const generateSpinner = ora(generateSpinnerText);
+  let unregisterGenerateSpinner: (() => void) | undefined;
+  try {
+    generateSpinner.start();
+    unregisterGenerateSpinner = trackSpinner(generateSpinner);
 
-  try {
     const metrics = await generateBlocks(data, (progress) => {
       if (generateSpinner.isSpinning) {
         generateSpinner.text = chalk.blue(
           `${generateSpinnerText}: ${progress.current}/${progress.total}`
         );
       }
       onProgress?.(progress);
     });
 
     if (generateSpinner.isSpinning) {
       generateSpinner.succeed(chalk.green("Blocks generated successfully"));
     }
 
     return { data, metrics };
   } catch (error) {
     if (generateSpinner.isSpinning) {
       generateSpinner.fail(chalk.red("Failed to generate blocks"));
     }
     throw error;
   } finally {
-    unregisterGenerateSpinner();
+    unregisterGenerateSpinner?.();
   }
 } catch (error) {
   if (fetchSpinner.isSpinning) {
     fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
   }
   throw error;
 } finally {
-  unregisterFetchSpinner();
+  unregisterFetchSpinner?.();
 }
Suggestion importance[1-10]: 6


Why: The change defensively guards spinner start/stop and unregister callbacks, which can help avoid edge-case errors in non-TTY/test environments. It's a moderate robustness improvement; existing code already tracks and stops spinners reliably, so impact is limited.

Impact: Low
Security
Sanitize cached image filenames

Avoid caching paths with directory traversal or unexpected separators. Sanitize
localPath before saving and when resolving on disk to prevent reading/writing
outside IMAGES_PATH if a malicious URL produces a crafted filename.

scripts/notion-fetch/generateBlocks.ts [294-361]

-class ImageCache {
-  private cacheFile: string;
-  private cache: Map<string, ImageCacheEntry>;
+private getAbsoluteImagePath(fileNameOrWebPath: string): string {
+  const baseName = path.basename(fileNameOrWebPath || "");
+  // reject suspicious names
+  if (!baseName || baseName.includes("..") || baseName.includes(path.sep)) {
+    return path.join(IMAGES_PATH, "_invalid-image-name_");
+  }
+  return path.join(IMAGES_PATH, baseName);
+}
 
-  constructor() {
-    this.cacheFile = path.join(process.cwd(), "image-cache.json");
-    this.cache = new Map();
-    this.loadCache();
-  }
-...
-  private getAbsoluteImagePath(fileNameOrWebPath: string): string {
-    const baseName = path.basename(fileNameOrWebPath);
-    return path.join(IMAGES_PATH, baseName);
-  }
-...
-  set(url: string, localPath: string, blockName: string): void {
-    // store only the basename to avoid mixing web and fs paths
-    const entry: ImageCacheEntry = {
-      url,
-      localPath: path.basename(localPath),
-      timestamp: new Date().toISOString(),
-      blockName,
-    };
-    this.cache.set(url, entry);
-    this.saveCache();
-  }
+set(url: string, localPath: string, blockName: string): void {
+  const safeBase = path.basename(localPath || "");
+  const entry: ImageCacheEntry = {
+    url,
+    localPath: safeBase,
+    timestamp: new Date().toISOString(),
+    blockName,
+  };
+  this.cache.set(url, entry);
+  this.saveCache();
+}
Suggestion importance[1-10]: 7


Why: The cache stores basenames and resolves paths under IMAGES_PATH; adding explicit basename/invalid-name checks marginally hardens against malformed inputs without changing behavior. It's a reasonable security hardening with low risk but not critical.

Impact: Medium
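The basename-only resolution this suggestion hardens can be shown as a small helper; imagesDir and the invalid-name sentinel are illustrative, not the PR's actual constants:

```typescript
import path from "node:path";

// Resolve a cached image reference to a file strictly inside imagesDir:
// keep only the basename and reject anything that still looks like traversal.
function toSafeImagePath(imagesDir: string, fileNameOrWebPath: string): string {
  const baseName = path.basename(fileNameOrWebPath || "");
  if (!baseName || baseName.includes("..")) {
    return path.join(imagesDir, "_invalid-image-name_");
  }
  return path.join(imagesDir, baseName);
}
```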
Harden output path containment check

The relative path check can be bypassed with path tricks on some platforms.
Normalize and ensure the resolved output stays within the project root using
path.resolve(projectRoot, rel) comparison, and reject paths containing null bytes or
control chars before writing.

scripts/notion-fetch-all/index.ts [428-447]

 const candidatePath = path.resolve(outputFile);
 const projectRoot = path.resolve(process.cwd());
 const rel = path.relative(projectRoot, candidatePath);
-if (rel.startsWith("..") || path.isAbsolute(rel)) {
-  throw new Error(
-    `Refusing to write outside project directory: ${candidatePath}`
-  );
+// basic invalid chars check
+if (/\0/.test(candidatePath)) {
+  throw new Error("Invalid output path");
 }
-outputPath = candidatePath;
+const safeResolved = path.resolve(projectRoot, rel);
+if (
+  !safeResolved.startsWith(projectRoot + path.sep) &&
+  safeResolved !== projectRoot // allow writing exactly in root
+) {
+  throw new Error(`Refusing to write outside project directory: ${candidatePath}`);
+}
+outputPath = safeResolved;
 const outDir = path.dirname(outputPath);
 fs.mkdirSync(outDir, { recursive: true });
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 6


Why: The PR already prevents writing outside the project via a relative check; normalizing and validating the resolved path adds defense-in-depth. It's a valid security improvement but incremental rather than fixing a demonstrated bug.

Impact: Low
General
Add retry for pagination anomalies

Avoid silently truncating data on transient pagination anomalies. Instead of
breaking early, attempt one retry page fetch, then continue if cursor advances; only
then break to prevent infinite loops.

scripts/fetchNotionData.ts [20-71]

 while (hasMore) {
   if (++safetyCounter > MAX_PAGES) {
-    console.warn(
-      "Pagination safety limit exceeded; returning partial results."
-    );
+    console.warn("Pagination safety limit exceeded; returning partial results.");
     break;
   }
 
   const response = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: startCursor,
     page_size: 100,
   });
 
   const pageResults = Array.isArray(response.results) ? response.results : [];
 
-  // Detect duplicate IDs to avoid stalling and data corruption
   let duplicateDetected = false;
   for (const r of pageResults) {
     const id = (r as any)?.id;
     if (id && seenIds.has(id)) {
       duplicateDetected = true;
       break;
     }
     if (id) seenIds.add(id);
   }
 
   results.push(...pageResults);
 
   const prevCursor = startCursor;
   const prevCount = pageResults.length;
   hasMore = Boolean(response.has_more);
   startCursor = response.next_cursor ?? undefined;
 
-  if (
-    hasMore &&
-    (duplicateDetected ||
-      !startCursor ||
-      startCursor === prevCursor ||
-      prevCount === 0)
-  ) {
-    console.warn(
-      "Notion API pagination anomaly detected; stopping early with partial results."
-    );
+  const anomaly = hasMore && (duplicateDetected || !startCursor || startCursor === prevCursor || prevCount === 0);
+  if (anomaly) {
+    // One retry attempt to recover from transient anomaly
+    console.warn("Notion API pagination anomaly detected; retrying once...");
+    const retryResp = await enhancedNotion.databasesQuery({
+      database_id: DATABASE_ID,
+      filter,
+      start_cursor: prevCursor,
+      page_size: 100,
+    });
+    const retryCursor = retryResp.next_cursor ?? undefined;
+    if (retryCursor && retryCursor !== prevCursor) {
+      hasMore = Boolean(retryResp.has_more);
+      startCursor = retryCursor;
+      continue;
+    }
+    console.warn("Anomaly persisted after retry; stopping early with partial results.");
     break;
   }
 }
Suggestion importance[1-10]: 7


Why: Introducing a single retry on cursor anomalies can prevent premature truncation of results, improving resilience without risking infinite loops. It aligns with the current pagination loop and safety checks; moderate but meaningful reliability gain.

Impact: Medium
Harden database ID validation

Also validate that resolvedDatabaseId is a non-empty string without whitespace to
avoid passing an invalid ID downstream. Trim and check length before proceeding to
prevent subtle runtime failures.

scripts/notion-fetch/index.ts [37-51]

 if (!process.env.NOTION_API_KEY) {
   const msg = "Missing NOTION_API_KEY environment variable.";
-  // Keep concise output to avoid leaking sensitive context in logs
   console.error(chalk.bold.red(msg));
   return await gracefulShutdown(1);
 }
 
-if (!resolvedDatabaseId) {
+const dbId = typeof resolvedDatabaseId === "string" ? resolvedDatabaseId.trim() : "";
+if (!dbId) {
   console.error(
     chalk.bold.red(
       "Error: DATABASE_ID (or NOTION_DATABASE_ID) is not defined in the environment variables."
     )
   );
   return await gracefulShutdown(1);
 }
Suggestion importance[1-10]: 5


Why: Trimming and validating the database ID avoids passing whitespace-only values, a minor correctness improvement. The existing check already handles undefined; this refines validation with low risk and modest impact.

Impact: Low
Ensure correct hook execution context

Use the correct Lefthook key stage_fixed only if supported; otherwise use
stage_fixed: true with run_from_repo_root: true to ensure paths resolve. Also quote
the glob consistently and ensure the test command exits non-zero on failures.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
+  run_from_repo_root: true
   stage_fixed: true
-  glob: "scripts/notion-fetch/**/*.{ts,js}"
+  glob: "**/scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 3


Why: The existing_code matches the new hunk lines, but the proposed changes (adding run_from_repo_root and altering the glob to "**/scripts...") are speculative and may not be necessary; they could even broaden the scope unintentionally. The advice about ensuring non-zero exits is generic and not directly implemented.

Impact: Low
Suggestions up to commit 8a93e0c
Possible issue
Fix output path safety check

This path check can misclassify root-level writes on Windows or if output equals the
cwd. Normalize paths and use a safer containment test to avoid false positives and
path traversal. Also ensure the parent directory exists before writing.

scripts/notion-fetch-all/index.ts [427-438]

 const candidatePath = path.resolve(outputFile);
-const cwd = process.cwd();
-if (!candidatePath.startsWith(path.resolve(cwd) + path.sep)) {
+const projectRoot = path.resolve(process.cwd());
+const rel = path.relative(projectRoot, candidatePath);
+if (rel.startsWith("..") || path.isAbsolute(rel)) {
   throw new Error(
     `Refusing to write outside project directory: ${candidatePath}`
   );
 }
 outputPath = candidatePath;
-// eslint-disable-next-line security/detect-non-literal-fs-filename
+const outDir = path.dirname(outputPath);
+fs.mkdirSync(outDir, { recursive: true });
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 8


Why: Using path.relative for containment and creating parent directories avoids false positives and ensures safe writes across platforms. This directly fixes a potential path check bug, with higher impact on correctness and security.

Impact: Medium
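The path.relative containment test this suggestion recommends can be isolated into a helper; the function name is illustrative, and behaviour is sketched for POSIX paths:

```typescript
import path from "node:path";

// True when candidate resolves to root itself or somewhere beneath it,
// using path.relative instead of a raw startsWith prefix check.
function isInsideProject(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(root, candidate));
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}
```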
General
Sanitize translation keys and values

Add a filesystem-safe guard for original keys to prevent huge or unsafe keys from
bloating or breaking code.json. Also ensure only string keys are stored and trim
excessively long strings. This hardens against malformed Notion content and reduces
the risk of corrupted translation files.

scripts/notion-fetch/generateBlocks.ts [911-946]

 function setTranslationString(
   lang: string,
   original: string,
   translated: string
 ) {
   const lPath = path.join(I18N_PATH, lang, "code.json");
   const dir = path.dirname(lPath);
-  // ensure directory exists
   fs.mkdirSync(dir, { recursive: true });
 
   let fileContents = "{}";
   try {
     const existing = fs.readFileSync(lPath, "utf8");
     if (typeof existing === "string" && existing.trim().length > 0) {
       fileContents = existing;
     }
   } catch {
     console.warn(
       chalk.yellow(
         `Translation file missing for ${lang}, creating a new one at ${lPath}`
       )
     );
   }
 
   let file: Record<string, any>;
   try {
     file = JSON.parse(fileContents);
   } catch (parseError) {
     console.warn(
       chalk.yellow(
         `Failed to parse translation file for ${lang}, resetting content`
       ),
       parseError
     );
     file = {};
   }
-  file[original] = { message: translated };
+
+  const safeKey =
+    typeof original === "string"
+      ? original.slice(0, 2000)
+      : String(original).slice(0, 2000);
+  const safeMessage =
+    typeof translated === "string"
+      ? translated.slice(0, 5000)
+      : String(translated).slice(0, 5000);
+
+  file[safeKey] = { message: safeMessage };
   fs.writeFileSync(lPath, JSON.stringify(file, null, 4));
 }
Suggestion importance[1-10]: 7


Why: The guard against excessively long or non-string keys/values is sensible and reduces risk of corrupting code.json. It's a moderate robustness improvement aligned with the new function. Not critical, but useful and accurate to the PR context.

Impact: Medium
Harden image failure logging

Ensure the log directory exists and guard against non-serializable entries to avoid
write failures. Also bound the size of individual log entries to keep the log file
performant and prevent accidental secrets leakage.

scripts/notion-fetch/generateBlocks.ts [168-223]

 async function logImageFailure(logEntry: any): Promise<void> {
   const logPath = path.join(process.cwd(), "image-failures.json");
+  const logDir = path.dirname(logPath);
   const tmpPath = `${logPath}.tmp`;
   const MAX_ENTRIES = 5000;
+  const MAX_FIELD_LEN = 2000;
+
+  const safeEntry = (() => {
+    try {
+      const clone: Record<string, unknown> = {};
+      for (const [k, v] of Object.entries(logEntry ?? {})) {
+        let val = v;
+        if (typeof val === "string") val = val.slice(0, MAX_FIELD_LEN);
+        else if (typeof val === "object") val = JSON.parse(JSON.stringify(val));
+        clone[k] = val;
+      }
+      return clone;
+    } catch {
+      return { message: "non-serializable log entry" };
+    }
+  })();
 
   imageLogWriting = imageLogWriting
     .then(async () => {
+      try {
+        fs.mkdirSync(logDir, { recursive: true });
+      } catch {
+        // ignore
+      }
+
       let existingLogs: any[] = [];
       try {
         if (fs.existsSync(logPath)) {
           const content = fs.readFileSync(logPath, "utf-8");
           const parsed = JSON.parse(content);
           existingLogs = Array.isArray(parsed) ? parsed : [];
         }
       } catch {
         existingLogs = [];
       }
-      existingLogs.push(logEntry);
-      // Prevent unbounded growth
+      existingLogs.push(safeEntry);
       if (existingLogs.length > MAX_ENTRIES) {
         existingLogs = existingLogs.slice(-MAX_ENTRIES);
       }
 
       const payload = JSON.stringify(existingLogs, null, 2);
       try {
         fs.writeFileSync(tmpPath, payload);
         fs.renameSync(tmpPath, logPath);
       } catch {
         console.warn(
           chalk.yellow("Failed to write image failure log atomically")
         );
         try {
           fs.writeFileSync(logPath, payload);
         } catch {
           console.warn(chalk.yellow("Failed to write image failure log"));
         }
       } finally {
         try {
           if (fs.existsSync(tmpPath)) fs.unlinkSync(tmpPath);
         } catch {
           // best-effort cleanup
         }
       }
       return undefined;
     })
     .catch((e) => {
-      // Reset mutex on failure to avoid blocking future writes
       console.warn(
         chalk.yellow("Image failure log write error; resetting queue"),
         e
       );
       return undefined;
     });
 
   await imageLogWriting;
 }
Suggestion importance[1-10]: 7


Why: Ensuring the log directory exists and serializing/trimming entries strengthens reliability without changing behavior. It fits the added logging and is correctly mapped. Impact is moderate on stability.

Medium
Honor statusFilter in final guard

Ensure the final defensive filter respects statusFilter as well as includeRemoved; without this, statusFilter could be bypassed if the upstream transform changes. Apply both filters before transforming to PageWithStatus.

scripts/notion-fetch-all/fetchAll.ts [98-105]

 const { data: rawData = [], metrics } = await runFetchPipeline({
   filter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
       const transformed = applyFetchAllTransform(
         Array.isArray(pages) ? pages : [],
         {
           statusFilter,
           maxPages,
           includeRemoved,
         }
       );
       return Array.isArray(transformed) ? transformed : [];
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
 
+// Apply defensive filters for both removal and explicit status
+const defensivelyFiltered = rawData.filter((p) => {
+  const status = getStatusFromRawPage(p);
+  if (!includeRemoved && status === "Remove") return false;
+  if (statusFilter && status !== statusFilter) return false;
+  return true;
+});
+
+const pages = defensivelyFiltered.map((page) => transformPage(page));
+
Suggestion importance[1-10]: 7


Why: Applying statusFilter in the final defensive filter is accurate and useful to ensure consistency if upstream transform is bypassed. It’s a reasonable enhancement with moderate impact on correctness.

Medium
Prevent double-ending spinners

Avoid calling both fail/succeed and stop on the same spinner to prevent
double-ending and potential inconsistent states. Only stop the spinner if it is
still spinning, and do not call stop after succeed/fail.

scripts/notion-fetch/runFetch.ts [40-55]

 const fetchSpinner = ora(fetchSpinnerText).start();
 const unregisterFetchSpinner = trackSpinner(fetchSpinner);
-...
+try {
+  let data = await fetchNotionData(filter);
+  data = Array.isArray(data) ? data : [];
+  data = await sortAndExpandNotionData(data);
+  data = Array.isArray(data) ? data : [];
+  if (transform) {
+    const transformed = await transform(data);
+    data = Array.isArray(transformed) ? transformed : [];
+  }
+  if (fetchSpinner.isSpinning) {
+    fetchSpinner.succeed(chalk.green("Data fetched successfully"));
+  }
+  if (!shouldGenerate) {
+    return { data };
+  }
+  const generateSpinner = ora(generateSpinnerText).start();
+  const unregisterGenerateSpinner = trackSpinner(generateSpinner);
+  try {
+    const metrics = await generateBlocks(data, (progress) => {
+      if (generateSpinner.isSpinning) {
+        generateSpinner.text = chalk.blue(
+          `${generateSpinnerText}: ${progress.current}/${progress.total}`
+        );
+      }
+      onProgress?.(progress);
+    });
+    if (generateSpinner.isSpinning) {
+      generateSpinner.succeed(chalk.green("Blocks generated successfully"));
+    }
+    return { data, metrics };
+  } catch (error) {
+    if (generateSpinner.isSpinning) {
+      generateSpinner.fail(chalk.red("Failed to generate blocks"));
+    }
+    throw error;
+  } finally {
+    unregisterGenerateSpinner();
+  }
 } catch (error) {
-  fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
+  if (fetchSpinner.isSpinning) {
+    fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
+  }
   throw error;
 } finally {
-  if (fetchSpinner.isSpinning) fetchSpinner.stop();
   unregisterFetchSpinner();
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly notes not to call stop after succeed/fail to avoid double-ending spinners, which aligns with the PR’s spinner handling pattern. It’s a maintainability/robustness improvement, not critical to functionality.

Low
Avoid conflicting file filters

Using both files and glob can cause Lefthook to ignore the glob filter, running
tests when unrelated files change. Constrain the hook to staged changes within the
intended path by removing files and enabling stage_fixed to pass only staged files.
This prevents unnecessary test runs and speeds up commits.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
+  stage_fixed: true
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 6


Why: The suggestion correctly targets the new hook lines (26-29) and highlights a plausible Lefthook config conflict where files may override glob, causing unnecessary runs. Replacing files with stage_fixed: true is a reasonable improvement for performance, though impact is moderate and depends on project workflow.

Low
Improve pagination anomaly diagnostics

If an anomaly triggers early break, attach context (counts and last cursor) to the
warning to aid debugging. This helps identify data loss or API issues without
changing logic.

scripts/fetchNotionData.ts [21-71]

 while (hasMore) {
   if (++safetyCounter > MAX_PAGES) {
     console.warn(
-      "Pagination safety limit exceeded; returning partial results."
+      `Pagination safety limit exceeded at ${safetyCounter} pages; returning ${results.length} items so far.`
     );
     break;
   }
 
   const response = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: startCursor,
     page_size: 100,
   });
 
   const pageResults = Array.isArray(response.results) ? response.results : [];
 
-  // Detect duplicate IDs to avoid stalling and data corruption
   let duplicateDetected = false;
   for (const r of pageResults) {
     const id = (r as any)?.id;
     if (id && seenIds.has(id)) {
       duplicateDetected = true;
       break;
     }
     if (id) seenIds.add(id);
   }
 
   results.push(...pageResults);
 
   const prevCursor = startCursor;
   const prevCount = pageResults.length;
   hasMore = Boolean(response.has_more);
   startCursor = response.next_cursor ?? undefined;
 
   if (
     hasMore &&
     (duplicateDetected ||
       !startCursor ||
       startCursor === prevCursor ||
       prevCount === 0)
   ) {
     console.warn(
-      "Notion API pagination anomaly detected; stopping early with partial results."
+      `Notion API pagination anomaly detected (dup=${duplicateDetected}, prevCount=${prevCount}, prevCursor=${prevCursor}, nextCursor=${startCursor}); returning ${results.length} items so far.`
     );
     break;
   }
 }
Suggestion importance[1-10]: 5


Why: Adding more detailed context to warnings is correct and can aid debugging without changing behavior. It’s helpful but minor, focusing on observability rather than core functionality.

Low
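The anomaly conditions discussed in the pagination suggestions can be factored into a small pure helper that also makes the diagnostics easy to unit-test. This is an illustrative sketch, not the repository's actual code; the function name and option shape are assumptions.

```typescript
// Hypothetical helper mirroring the anomaly conditions described above.
// Returns a human-readable reason when pagination should stop early, else null.
function paginationAnomaly(opts: {
  hasMore: boolean;
  duplicateDetected: boolean;
  prevCursor?: string;
  nextCursor?: string;
  pageCount: number;
}): string | null {
  const { hasMore, duplicateDetected, prevCursor, nextCursor, pageCount } = opts;
  if (!hasMore) return null; // nothing left to fetch; not an anomaly
  if (duplicateDetected) return "duplicate page id seen";
  if (!nextCursor) return "has_more set but no next_cursor";
  if (nextCursor === prevCursor) return "cursor did not advance";
  if (pageCount === 0) return "empty page returned";
  return null;
}
```

Returning the reason string (rather than a boolean) gives the warning message the context the suggestion asks for without changing control flow.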
Suggestions up to commit 8a93e0c
Category | Suggestion | Impact
Possible issue
Use regex match indices for replacements

Using indexOf on the original markdown later can target the wrong occurrence for
duplicate matches, causing mis-replacements. Capture match indices from regex
execution and use them to apply replacements deterministically.

scripts/notion-fetch/generateBlocks.ts [1174-1204]

 const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
-...
+const imageMatches: Array<{ full: string; url: string; alt: string; idx: number; start: number; end: number }> = [];
+let m: RegExpExecArray | null;
+let tmpIndex = 0;
+let safetyCounter = 0;
+const SAFETY_LIMIT = 500;
 while ((m = imgRegex.exec(sourceMarkdown)) !== null) {
   if (++safetyCounter > SAFETY_LIMIT) {
-    console.warn(
-      chalk.yellow(
-        `⚠️  Image match limit (${SAFETY_LIMIT}) reached; skipping remaining.`
-      )
-    );
+    console.warn(chalk.yellow(`⚠️  Image match limit (${SAFETY_LIMIT}) reached; skipping remaining.`));
     break;
   }
+  const start = m.index;
+  const full = m[0];
+  const end = start + full.length;
   const rawUrl = m[2];
   const unescapedUrl = rawUrl.replace(/\\\)/g, ")");
   imageMatches.push({
-    full: m[0],
+    full,
     url: unescapedUrl,
     alt: m[1],
     idx: tmpIndex++,
+    start,
+    end,
   });
 }
+...
+// When building replacements
+for (const match of imageMatches) {
+  // compute replacementText for this match...
+  indexedReplacements.push({ start: match.start, end: match.end, text: replacementText });
+}
Suggestion importance[1-10]: 8


Why: The PR later uses indexOf on the whole markdown which can mis-target duplicates; capturing start/end from regex and using them for replacements is a robust fix that prevents incorrect substitutions and aligns with the current image-processing flow.

Medium
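The index-based approach this suggestion describes boils down to recording each match's start/end and applying replacements right-to-left so earlier offsets stay valid. A minimal self-contained sketch (names are hypothetical, not the PR's actual identifiers):

```typescript
interface IndexedReplacement {
  start: number; // inclusive offset into the source string
  end: number;   // exclusive offset
  text: string;  // replacement text
}

// Apply replacements from highest offset to lowest so that earlier
// offsets are still valid after each splice.
function applyIndexedReplacements(
  source: string,
  repls: IndexedReplacement[]
): string {
  const sorted = [...repls].sort((a, b) => b.start - a.start);
  let out = source;
  for (const r of sorted) {
    out = out.slice(0, r.start) + r.text + out.slice(r.end);
  }
  return out;
}
```

Unlike `indexOf`-based replacement, duplicate image links resolve unambiguously because each replacement carries its own offsets.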
Prevent premature pagination termination

Avoid stopping early solely due to duplicate IDs within a page; this can occur
legitimately across pages. Instead, skip duplicates when pushing into results and
only early-exit on robust anomalies (no next_cursor with has_more or zero results
repeatedly). This prevents silent data loss.

scripts/fetchNotionData.ts [37-71]

 const pageResults = Array.isArray(response.results) ? response.results : [];
 
-// Detect duplicate IDs to avoid stalling and data corruption
-let duplicateDetected = false;
+// Append only unseen items; duplicates are skipped but do not trigger early stop
 for (const r of pageResults) {
   const id = (r as any)?.id;
-  if (id && seenIds.has(id)) {
-    duplicateDetected = true;
-    break;
+  if (id) {
+    if (seenIds.has(id)) continue;
+    seenIds.add(id);
   }
-  if (id) seenIds.add(id);
+  results.push(r as Record<string, unknown>);
 }
 
-results.push(...pageResults);
-
 const prevCursor = startCursor;
-const prevCount = pageResults.length;
+const gotCount = pageResults.length;
 hasMore = Boolean(response.has_more);
 startCursor = response.next_cursor ?? undefined;
 
+// Only treat as anomaly when Notion signals more but gives no cursor or no items,
+// or when cursor doesn't advance across iterations.
 if (
   hasMore &&
-  (duplicateDetected ||
-    !startCursor ||
-    startCursor === prevCursor ||
-    prevCount === 0)
+  (!startCursor || startCursor === prevCursor || gotCount === 0)
 ) {
   console.warn(
     "Notion API pagination anomaly detected; stopping early with partial results."
   );
   break;
 }
Suggestion importance[1-10]: 8


Why: Skipping duplicates instead of early exit avoids silent data loss and makes pagination more robust; this directly improves correctness in the new pagination logic.

Medium
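The skip-duplicates-instead-of-aborting behavior can be sketched as a small merge helper. This is illustrative only; the generic item shape and function name are assumptions:

```typescript
// Merge one page of results into the accumulator, skipping ids already seen.
// Duplicates across pages are dropped silently rather than aborting pagination.
function mergePage<T extends { id?: string }>(
  results: T[],
  seen: Set<string>,
  page: T[]
): number {
  let added = 0;
  for (const item of page) {
    const id = item?.id;
    if (id) {
      if (seen.has(id)) continue; // duplicate: skip, don't stop early
      seen.add(id);
    }
    results.push(item);
    added++;
  }
  return added; // callers can still warn when added === 0 with has_more set
}
```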
Consolidate filtering into API request

Avoid using the upstream filter when statusFilter is also provided, as the transform
will then apply a second filter that can over‑constrain results. Instead, build the
Notion API filter to reflect both includeRemoved and statusFilter, and keep the
transform focused on slicing (e.g., maxPages). This prevents silently dropping pages
due to double filtering.

scripts/notion-fetch-all/fetchAll.ts [67-96]

+const notionFilter = buildStatusFilter(includeRemoved, statusFilter);
 const { data: rawData = [], metrics } = await runFetchPipeline({
-  filter,
+  filter: notionFilter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
-      const transformed = applyFetchAllTransform(
-        Array.isArray(pages) ? pages : [],
-        {
-          statusFilter,
-          maxPages,
-          includeRemoved,
-        }
-      );
-      return Array.isArray(transformed) ? transformed : [];
+      // Only apply client-side constraints not expressible server-side (e.g., maxPages)
+      const transformed = typeof maxPages === "number" && maxPages > 0
+        ? (Array.isArray(pages) ? pages.slice(0, maxPages) : [])
+        : (Array.isArray(pages) ? pages : []);
+      return transformed;
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
Suggestion importance[1-10]: 7


Why: Pushing statusFilter into the API avoids double filtering and reduces data transfer; the change is reasonable and aligns with the code’s transform stage. However, it’s an optimization rather than a critical bug fix.

Medium
General
Use stable cache directory for logs

Writing logs into the current working directory can fail when the CWD is unwritable
and can scatter files. Use a stable, project-root-relative path (e.g., alongside
other caches) and ensure its directory exists. This prevents permission errors and
keeps artifacts organized.

scripts/notion-fetch/generateBlocks.ts [169-171]

-const logPath = path.join(process.cwd(), "image-failures.json");
+const logsDir = path.join(__dirname, "../../.cache");
+fs.mkdirSync(logsDir, { recursive: true });
+const logPath = path.join(logsDir, "image-failures.json");
 const tmpPath = `${logPath}.tmp`;
 const MAX_ENTRIES = 5000;
Suggestion importance[1-10]: 7


Why: Writing logs to process.cwd() can be brittle; using a stable cache dir and ensuring it exists improves reliability and organization. The improved code aligns with the intent and integrates cleanly with the PR’s logging feature.

Medium
Harden project-root path check

The path check can falsely reject valid paths on Windows (different separators) and
when the output file is exactly the CWD. Normalize both paths and handle equality to
avoid blocking legitimate writes. Then write the file.

scripts/notion-fetch-all/index.ts [428-439]

 const candidatePath = path.resolve(outputFile);
-const cwd = process.cwd();
-if (!candidatePath.startsWith(path.resolve(cwd) + path.sep)) {
-  throw new Error(
-    `Refusing to write outside project directory: ${candidatePath}`
-  );
+const projectRoot = path.resolve(process.cwd());
+const normalizedCandidate = path.normalize(candidatePath);
+const normalizedRoot = path.normalize(projectRoot);
+const isInside =
+  normalizedCandidate === normalizedRoot ||
+  normalizedCandidate.startsWith(normalizedRoot + path.sep);
+if (!isInside) {
+  throw new Error(`Refusing to write outside project directory: ${candidatePath}`);
 }
-outputPath = candidatePath;
-// eslint-disable-next-line security/detect-non-literal-fs-filename
+outputPath = normalizedCandidate;
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 7


Why: Normalizing paths and allowing equality prevents false negatives on Windows and when writing directly to the project root, improving cross-platform correctness without reducing safety.

Medium
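A `path.relative`-based containment check (which a later commit on this PR also adopts) sidesteps the separator and drive-letter pitfalls described above. A hedged sketch; the function name is hypothetical:

```typescript
import * as path from "node:path";

// A candidate is inside root when the relative path from root to candidate
// neither climbs out ("..") nor lands on a different drive (absolute on Windows).
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return true; // candidate IS the root
  return (
    rel !== ".." &&
    !rel.startsWith(".." + path.sep) &&
    !path.isAbsolute(rel)
  );
}
```

Checking `rel !== ".."` and the `".." + path.sep` prefix (rather than a bare `startsWith("..")`) avoids falsely rejecting siblings whose names merely begin with two dots, e.g. `..notes`.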
Push statusFilter into API query

Add optional support for a specific statusFilter to be pushed down to Notion,
avoiding client-side filtering inconsistencies. This ensures server-side filtering
is authoritative and reduces data transfer and post-processing.

scripts/notion-fetch-all/fetchAll.ts [117-134]

-function buildStatusFilter(includeRemoved: boolean) {
+function buildStatusFilter(includeRemoved: boolean, statusFilter?: string) {
+  // If a specific status is requested, prefer that exact match regardless of includeRemoved
+  if (typeof statusFilter === "string" && statusFilter.trim()) {
+    return {
+      property: NOTION_PROPERTIES.STATUS,
+      select: { equals: statusFilter.trim() },
+    };
+  }
   if (includeRemoved) {
     return undefined;
   }
-
   return {
     or: [
       {
         property: NOTION_PROPERTIES.STATUS,
         select: { is_empty: true },
       },
       {
         property: NOTION_PROPERTIES.STATUS,
         select: { does_not_equal: "Remove" },
       },
     ],
   };
 }
Suggestion importance[1-10]: 6


Why: Extending buildStatusFilter to accept statusFilter is consistent with the previous suggestion and improves efficiency, but it’s a modest improvement and not critical to correctness.

Low
Avoid conflicting file filters

Ensure the test step only runs when relevant staged files change; using both files
and glob may not filter as intended. Replace with a single glob or a proper files
pattern referencing staged files. This prevents unexpected execution and speeds up
pre-commit.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 6


Why: Using both files: git ls-files -m and a glob in lefthook can be conflicting or ineffective; simplifying to a single glob is a reasonable improvement for correctness and speed. The change matches the PR context, but impact is moderate and depends on how lefthook interprets these fields.

Low
Make test hook non-interactive, concise

Prepend the test command with the package runner to fail fast on missing bunx or
vitest and ensure non-interactive CI behavior. Add --run and --reporter=dot plus
--silent to reduce noisy output in hooks and enforce exit codes.

lefthook.yml [26-29]

 - name: notion-fetch-tests
-  run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
+  run: bunx vitest run --run --reporter=dot --silent scripts/notion-fetch/__tests__/
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 5


Why: Adding --run, --reporter=dot, and --silent can streamline output and ensure non-interactive behavior, but it's optional and environment-dependent; bunx already enforces exit codes. The suggestion is reasonable yet not critical, with modest impact.

Low

@luandro luandro marked this pull request as draft September 26, 2025 13:51
@socket-security
Copy link

socket-security bot commented Sep 26, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff — direct dependency updates:
- Updated @docusaurus/tsconfig 3.8.1 → 3.9.1
- Updated @docusaurus/preset-classic 3.8.1 → 3.9.1

View full report

@luandro luandro marked this pull request as ready for review September 26, 2025 16:52
@luandro luandro marked this pull request as draft September 26, 2025 16:56
@luandro luandro force-pushed the fix-fetch-all-align-export-with-notion-page-scoping branch from 6f70b58 to 5b2329a Compare September 26, 2025 17:30
@luandro luandro marked this pull request as ready for review September 26, 2025 18:48
…ilities

- Add notionFixtures.ts with reusable test data structures
- Enhance helpers.ts with proper axios mocking patterns
- Improve mocks.ts with better mock implementations
- Update index.ts to export new test utilities

These utilities provide foundation for reliable test data creation,
proper HTTP mocking, and hermetic test isolation across the test suite.
- Add downloadImage.test.ts with retry logic and error handling tests
- Add runFetchPipeline.test.ts with pipeline coordination tests
- Add integration.test.ts with test utility validation
- Fix generateBlocks.test.ts with proper test structure

Key improvements:
- Implements exponential backoff retry testing with fake timers
- Comprehensive error handling coverage (timeouts, DNS, HTTP errors)
- Proper test data structure with Sub-item relations for multi-language pages
- Full mocking of axios, SpinnerManager, and image processing pipeline
- Hermetic tests with proper fs mocking to prevent file system side effects

All tests now pass (30+ assertions) providing reliable regression protection
for the Notion fetch pipeline, replacing need for manual testing.
- Add index.cli.test.ts with comprehensive CLI component testing
- Fix static method mocking for PreviewGenerator, StatusAnalyzer, ComparisonEngine
- Correct test imports to work with actual static class methods
- Add proper environment variable validation testing

Key fixes:
- Mock static methods instead of instance methods (generatePreview,
  analyzePublicationStatus, compareWithPublished)
- Update test assertions to validate static class API instead of instance API
- Ensure proper CLI initialization and error handling coverage

All 7 CLI tests now pass, providing coverage for the notion-fetch-all
command-line interface and its core components.
- Fix existing generateBlocks.test.ts with proper runFetchPipeline import
- Update TASK.md to reflect all completed deliverables and blockers resolved
- Ensure test imports match actual function names in implementation

This completes the final pieces of the test reliability improvement task,
ensuring all existing tests work with the new comprehensive test suite.
All deliverables from the Notion fetch test suite stabilization task
have been successfully completed:

- ✅ Fixed axios mocking and runFetch typos
- ✅ Added comprehensive retry, error handling, and fallback behavior tests
- ✅ Implemented hermetic testing with proper fs mocking
- ✅ All 37 tests passing across notion-fetch and notion-fetch-all suites

The test suite is now reliable and trustworthy enough to replace
manual regression testing as originally requested.
…essing

- Add robust title fallback logic in resolvePageTitle() function
- Implement createStandalonePageGroup() for pages without sub-items
- Fix heading consumption bug in Toggle section processing
- Update test fixtures and enhance test coverage
- Achieve 100% test pass rate (13/13 tests passing)

Resolves all issues documented in TASK.md:
- Title property access with fallback to legacy "Title" property
- Pages without Sub-item relations now properly processed
- Heading titles correctly applied to subsequent Toggle sections
…vements

- Move spinner cleanup to finally blocks for proper resource management
- Replace top-level await with async IIFE for better compatibility
- Add test-environment-aware retry logic (10-50ms in tests vs 1000-4000ms in production)
- Implement pagination safety limits (10,000 pages max) to prevent infinite loops
- Enhance input validation for status extraction with type checking
- Improve optional chaining usage for cleaner code
- Add cursor validation to prevent infinite pagination loops

Test performance improvement: 19.76s → 1.52s (92% faster)
All 13 tests passing with proper error handling and robustness maintained.
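The test-environment-aware retry logic described in this commit can be sketched as a pure delay schedule. The 10-50 ms test bounds and 1000-4000 ms production bounds come from the commit message; the exponential base-2 growth is an assumption, and the function name is hypothetical:

```typescript
// Illustrative backoff schedule: NODE_ENV === "test" selects short delays so
// suites stay fast; bounds mirror the commit message (10-50ms test,
// 1000-4000ms production). Base-2 exponential growth is an assumption.
function retryDelayMs(attempt: number, nodeEnv: string | undefined): number {
  const isTest = nodeEnv === "test";
  const base = isTest ? 10 : 1000;
  const cap = isTest ? 50 : 4000;
  return Math.min(base * 2 ** attempt, cap);
}
```

Keeping the schedule pure (env passed in, no timers) is what makes it trivial to exercise with fake timers in vitest.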
- Add comprehensive URL validation to prevent empty/invalid image references
- Implement intelligent fallback system that replaces failed images with informative placeholders
- Add persistent image cache to prevent re-downloading and improve performance
- Create detailed failure logging system for manual recovery (image-failures.json)
- Enhance error handling with 5-phase processing: validate → download → cache → fallback → sanitize
- Add comprehensive statistics and progress reporting
- Ensure zero image loss - all images are either downloaded successfully or replaced with documented placeholders
- Prevent empty URLs in markdown that cause MDX compilation errors

Key improvements:
- URL validation catches empty, null, undefined, and malformed URLs before processing
- Caching system prevents re-downloading same images across runs
- Fallback placeholders preserve context with original URL for recovery
- Comprehensive logging enables manual recovery of failed images
- 100% protection against empty image references in final markdown
- Performance optimized with parallel processing and intelligent caching

All tests passing with 1.38s duration maintained.
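The URL-validation phase listed above (catching empty, null, undefined, and malformed URLs before processing) can be approximated by a small guard. A sketch under the assumption that only http(s) URLs are downloadable; the function name is hypothetical:

```typescript
// Pre-download guard: reject empty or non-string values and anything that
// does not parse as an http(s) URL.
function isDownloadableImageUrl(raw: unknown): boolean {
  if (typeof raw !== "string" || raw.trim().length === 0) return false;
  try {
    const url = new URL(raw.trim());
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // malformed URL
  }
}
```

Rejecting these values up front is what keeps empty image references out of the generated markdown, where they would otherwise break MDX compilation.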
@luandro luandro force-pushed the fix-fetch-all-align-export-with-notion-page-scoping branch from 57af855 to d0fb89e Compare September 26, 2025 21:46
@luandro luandro marked this pull request as draft September 26, 2025 21:47
@luandro luandro marked this pull request as ready for review September 26, 2025 21:48
@luandro luandro marked this pull request as draft September 26, 2025 21:51
- Fix image cache path normalization to prevent cache misses
- Ensure translation directory creation to prevent runtime errors
- Add test environment guard to graceful shutdown to prevent test runner exits
- Improve image failure logging atomicity with temp file strategy

All suggestions from PR #28 automated review have been applied.
Tests passing with enhanced error handling and robustness.
@luandro luandro marked this pull request as ready for review September 26, 2025 21:55
- Extended gracefulShutdown to return exit code when NODE_ENV=test
- Updated main() function to return exit code from gracefulShutdown
- Fixed all 11 failing test assertions to check returned exit codes
- Production behavior unchanged (still calls process.exit)
- All 15 integration tests now pass (previously 11 were failing)

Resolves issue where gracefulShutdown skipped process.exit in test mode,
leaving mocked process.exit unused and causing test expectation failures.
- Normalize cached image path format in downloadAndProcessImageWithCache
- Add safe error normalization in catch blocks for unknown error types
- Fix status comparison to use explicit string literal
- Add explicit pagination size to Notion API queries

These changes improve error handling robustness and ensure consistent
behavior across cached vs fresh image processing paths.
@luandro luandro marked this pull request as draft September 26, 2025 22:30
@luandro luandro marked this pull request as ready for review September 26, 2025 22:30
- Prevent URL mock overrides by implementing routing system in test mocks
- Fix brittle image regex matching by collecting matches before processing
- Avoid crashes when metrics are absent in index.ts
- Guard metrics when generation is skipped in fetchAll.ts
- Align image cache path resolution using consistent IMAGES_PATH constant
- Fix variable reference issue in image processing phase

These changes improve testing reliability, error handling robustness,
and image processing consistency based on pr-agent feedback.
@luandro luandro marked this pull request as draft September 27, 2025 03:51
@luandro luandro marked this pull request as ready for review September 27, 2025 03:51
@luandro luandro marked this pull request as draft September 27, 2025 04:22
- Add safety cap for image processing to prevent runaway loops (500 image limit)
- Await image failure logging promises to prevent unhandled rejections
- Add numeric guards for metrics logging to handle non-finite values
- Safely normalize content-type headers for robust format detection
- Add defensive checks for pipeline output validation with array guards
- Validate data array integrity across fetch pipeline transformations

These improvements enhance error handling, prevent resource exhaustion,
and ensure robust operation under various edge conditions.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:27
@luandro luandro marked this pull request as draft September 27, 2025 04:35
- Add pagination loop guard to prevent infinite loops on empty pages
- Fix image sanitization regex to avoid false positives on valid URLs
- Make logging atomic and resilient with proper temp file cleanup
- Add spinner state checks to prevent double-stopping issues
- Add defensive post-pipeline filtering for Remove status pages
- Ensure unhandled promise rejections are properly caught

These changes improve the reliability and robustness of the Notion
data fetching and processing pipeline based on PR review feedback.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:39
@luandro luandro marked this pull request as draft September 27, 2025 04:42
Implements several key improvements based on PR review:

- Use index-based safe replacements to avoid string replacement issues
- Harden pagination against duplicates and infinite loops
- Make logging mutex failure-safe with error recovery
- Safeguard spinner state checks for test environment compatibility
- Prevent regex backtracking and ReDoS attacks
- Sanitize error logging to avoid sensitive context leakage

These changes improve system reliability, security, and test compatibility
while maintaining existing functionality.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:47
@luandro luandro marked this pull request as draft September 27, 2025 04:50
- Validate unified database ID source to properly check resolvedDatabaseId instead of just process.env.DATABASE_ID
- Add output path validation for security to prevent writing outside project directory
- Make pagination more graceful by using warnings instead of throwing errors on anomalies
- Add error handling around transform step to prevent pipeline crashes
- Remove ineffective path normalization on image file basenames
- Update test expectations to match new error messages
@luandro luandro marked this pull request as ready for review September 27, 2025 04:54
@luandro luandro marked this pull request as draft September 27, 2025 05:02
@luandro luandro marked this pull request as ready for review September 27, 2025 05:02
@luandro luandro marked this pull request as draft September 27, 2025 05:05
- Fix output path safety check using path.relative for cross-platform compatibility
- Sanitize translation keys and values to prevent corrupted code.json files
- Harden image failure logging with serialization guards and directory creation
- Honor statusFilter in final defensive guard to ensure consistent filtering
- Prevent double-ending spinners by removing redundant stop() calls
- Fix conflicting lefthook file filters by using stage_fixed instead of files
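The cross-platform containment check can be sketched with `path.relative`; `isInsideRoot` is a hypothetical name and the real helper in the PR may differ:

```typescript
import path from "node:path";

// Sketch: a candidate path is inside the root iff the relative path from
// root to candidate neither climbs out ("..") nor lands elsewhere entirely
// (absolute, e.g. another drive on Windows). This avoids brittle
// string-prefix comparisons that break on separators and "..".
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}

// POSIX examples:
console.log(isInsideRoot("/proj", "/proj/docs/en/page.md")); // → true
console.log(isInsideRoot("/proj", "/proj/../etc/passwd"));   // → false
```

Resolving both paths first normalizes `..` segments, so a traversal attempt surfaces as a relative path starting with `..` rather than slipping past a naive `startsWith(root)` test.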
@luandro marked this pull request as ready for review September 27, 2025 08:09
@luandro marked this pull request as draft September 27, 2025 09:55
- Use recorded match indices for image replacements to prevent incorrect matches
- Sanitize cached image filenames to prevent directory traversal
- Harden output path containment check with better validation
- Add retry mechanism for pagination anomalies
- Improve spinner lifecycle guarding to prevent TTY issues

Addresses high and medium impact suggestions from PR review
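The cached-filename sanitization against directory traversal can be sketched like so; `sanitizeImageFilename` and the character allowlist are illustrative, not the PR's actual code:

```typescript
import path from "node:path";

// Sketch: reduce a cached image name to a safe basename. path.basename
// strips any directory components (including "../" sequences), and the
// character filter replaces anything outside a conservative allowlist.
function sanitizeImageFilename(name: string): string {
  const base = path.basename(name);
  const cleaned = base.replace(/[^a-zA-Z0-9._-]/g, "_");
  // A name that is only dots ("." or "..") would still act as a path token.
  return /^\.+$/.test(cleaned) ? "_" : cleaned;
}

console.log(sanitizeImageFilename("../../etc/passwd"));   // → passwd
console.log(sanitizeImageFilename("hero image (1).png")); // → hero_image__1_.png
```

Taking the basename first means a hostile name from the cache can only ever produce a file inside the images directory, never climb out of it.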
@luandro marked this pull request as ready for review September 27, 2025 10:01
@luandro marked this pull request as draft September 27, 2025 10:04
- Preserve retry results during pagination anomaly handling to prevent data loss
- Update Lefthook config to use npx and scope tests to test files only
- Add --run flag to vitest commands for consistent CI behavior
- Quote specific test file paths to prevent shell expansion issues

Addresses latest review suggestions for improved reliability
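The Lefthook change can be sketched as a `lefthook.yml` fragment; the command names and globs here are illustrative, not the repository's actual hook config. `stage_fixed: true` re-stages files a command auto-fixes, which removes the need for a second, potentially conflicting `files` filter:

```yaml
# Illustrative lefthook.yml fragment (not the repo's actual config).
pre-commit:
  commands:
    lint:
      glob: "*.{ts,tsx}"
      run: npx eslint --fix {staged_files}
      stage_fixed: true   # re-stage auto-fixed files instead of a second filter
    tests:
      glob: "*.test.ts"
      run: npx vitest --run {staged_files}
```

Scoping the test command's glob to `*.test.ts` keeps hooks fast, and `--run` forces vitest into single-pass mode so a pre-commit hook never hangs in watch mode.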
@luandro marked this pull request as ready for review September 27, 2025 10:06
@luandro marked this pull request as draft September 27, 2025 10:11
@luandro marked this pull request as ready for review September 27, 2025 10:11
@github-actions (Contributor) commented:

Failed to generate code suggestions for PR

@luandro merged commit 1b3b703 into main Sep 27, 2025
5 checks passed
@luandro deleted the fix-fetch-all-align-export-with-notion-page-scoping branch February 13, 2026 13:40
Linked issue (closed by this merge): fix(fetch-all): align export with notion page scoping
