
refactor(fetch-all): reuse notion fetch pipeline #28

Merged
luandro merged 24 commits into main from fix-fetch-all-align-export-with-notion-page-scoping on Sep 27, 2025

Conversation

Contributor

luandro commented Sep 25, 2025

User description

Summary

  • refactor fetch-all CLI to share notion fetch pipeline and centralized graceful shutdown helpers
  • remove bespoke export and preview pipeline in favor of shared generateBlocks logic
  • update fetchAll helpers to filter via runFetchPipeline and retain preview/analysis features
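As a rough illustration of the refactor, fetch-all now hands a transform to the shared pipeline instead of running its own export loop. This is a minimal sketch with hypothetical shapes; the real runFetchPipeline/fetchAll signatures live in scripts/notion-fetch and may differ:

```typescript
// Hypothetical, simplified types standing in for the real pipeline contract.
type Page = { id: string; status?: string };

interface PipelineOptions {
  transform?: (pages: Page[]) => Page[];
  shouldGenerate?: boolean;
}

// Stand-in for the shared pipeline: fetch, then apply the caller's transform.
async function runFetchPipeline(
  fetchPages: () => Promise<Page[]>,
  opts: PipelineOptions
): Promise<{ data: Page[] }> {
  let data = await fetchPages();
  if (opts.transform) data = opts.transform(data);
  return { data };
}

// fetch-all only supplies filtering/limiting; the pipeline owns everything else.
async function fetchAll(
  fetchPages: () => Promise<Page[]>,
  maxPages: number
): Promise<{ data: Page[] }> {
  return runFetchPipeline(fetchPages, {
    transform: (pages) =>
      pages.filter((p) => p.status !== "Removed").slice(0, maxPages),
    shouldGenerate: false,
  });
}
```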

Testing

  • bunx eslint scripts/notion-fetch-all/index.ts scripts/notion-fetch-all/fetchAll.ts scripts/notion-fetch/runFetch.ts scripts/notion-fetch/index.ts
  • bunx vitest run scripts/notion-fetch-all/index.test.ts scripts/notion-fetch-all/fetchAll.test.ts

Fixes #27


PR Type

Enhancement, Tests, Bug fix


Description

  • Align fetch-all export with page scoping

  • Reuse shared generateBlocks pipeline

  • Harden image handling with cache/fallbacks

  • Add comprehensive CLI and pipeline tests


Diagram Walkthrough

flowchart LR
  fetchAll["fetch-all CLI"] -- "calls" --> runFetchPipeline["shared runFetch pipeline"]
  runFetchPipeline -- "uses" --> generateBlocks["generateBlocks (shared)"]
  generateBlocks -- "writes" --> docsOut["Markdown + images"]
  generateBlocks -- "updates" --> i18nCode["i18n code.json"]
  tests["New test suites"] -- "mock/verify" --> fetchAll

File Walkthrough

Relevant files
Enhancement (2 files)
  • generateBlocks.ts (+1065/-289): Robust image pipeline, locale scoping, safe frontmatter
  • index.ts (+265/-210): CLI uses shared runtime and pipeline with shutdown

Refactor (1 file)
  • generateBlocksForAll.ts (+1/-1081): Delegate to shared generateBlocks implementation

Tests (1 file)
  • notionFixtures.ts (+377/-0): Add reusable Notion mock fixtures for tests

Additional files (19 files)
  • TASK.md (+0/-31)
  • code.json (+120/-0)
  • code.json (+120/-0)
  • lefthook.yml (+5/-0)
  • package.json (+15/-11)
  • fetchNotionData.ts (+80/-5)
  • index.cli.test.ts (+156/-0)
  • fetchAll.ts (+159/-134)
  • downloadImage.test.ts (+500/-0)
  • integration.test.ts (+131/-0)
  • runFetchPipeline.test.ts (+367/-0)
  • generateBlocks.test.ts (+464/-3)
  • index.test.ts (+37/-51)
  • index.ts (+50/-142)
  • runFetch.ts (+99/-0)
  • runtime.ts (+98/-0)
  • helpers.ts (+39/-0)
  • index.ts (+3/-0)
  • mocks.ts (+75/-0)


github-actions bot commented Sep 25, 2025

PR Code Suggestions ✨

Latest suggestions up to acf5500


Previous suggestions

Suggestions up to commit 2043324
Possible issue
Preserve results on anomaly retry

On anomaly retry you discard the retried page results, potentially skipping data.
Merge retryResp.results (deduping by id) before advancing the cursor to avoid data
loss.

scripts/fetchNotionData.ts [66-85]

 const anomaly =
   hasMore &&
   (duplicateDetected ||
     !startCursor ||
     startCursor === prevCursor ||
     prevCount === 0);
 if (anomaly) {
-  // One retry attempt to recover from transient anomaly
   console.warn("Notion API pagination anomaly detected; retrying once...");
   const retryResp = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: prevCursor,
     page_size: 100,
   });
+  const retryResults = Array.isArray(retryResp.results) ? retryResp.results : [];
+  for (const r of retryResults) {
+    const id = (r as any)?.id;
+    if (!id || seenIds.has(id)) continue;
+    seenIds.add(id);
+    results.push(r);
+  }
   const retryCursor = retryResp.next_cursor ?? undefined;
   if (retryCursor && retryCursor !== prevCursor) {
     hasMore = Boolean(retryResp.has_more);
     startCursor = retryCursor;
     continue;
   }
   console.warn(
     "Anomaly persisted after retry; stopping early with partial results."
   );
   break;
 }
Suggestion importance[1-10]: 8


Why: Correctly points out that retry results are discarded during anomaly handling, risking data loss. Merging retryResp.results with deduplication aligns with the pagination robustness goal and provides a meaningful reliability improvement.

Impact: Medium
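The dedupe-by-id merge this suggestion proposes can be shown in isolation. The types here are simplified stand-ins, not the Notion SDK's actual result objects:

```typescript
// Simplified stand-in for a Notion query result row.
type Row = { id?: string };

// Merge retried page results into the accumulator, skipping ids already seen,
// so a pagination retry neither drops nor duplicates rows.
function mergeUnique(
  results: Row[],
  seenIds: Set<string>,
  retryResults: unknown
): Row[] {
  const rows = Array.isArray(retryResults) ? (retryResults as Row[]) : [];
  for (const r of rows) {
    const id = r?.id;
    if (!id || seenIds.has(id)) continue; // skip id-less and already-seen rows
    seenIds.add(id);
    results.push(r);
  }
  return results;
}
```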
Create missing i18n docs directories

Ensure the i18n content directories exist for all non-default locales, otherwise
writes later (e.g., to PATH from getI18NPath) can fail. Create those locale-specific
doc paths at startup, mirroring the images/docs directory setup.

scripts/notion-fetch/generateBlocks.ts [438-451]

 const getI18NPath = (locale: string) =>
   path.join(I18N_PATH, locale, "docusaurus-plugin-content-docs", "current");
 const locales = config.i18n.locales;
 const DEFAULT_LOCALE = config.i18n.defaultLocale;
 
 const LANGUAGE_NAME_TO_LOCALE: Record<string, string> = {
   English: "en",
   Spanish: "es",
   Portuguese: "pt",
 };
 
 const FALLBACK_TITLE_PREFIX = "untitled";
 
 // Ensure directories exist (preserve existing content)
 fs.mkdirSync(CONTENT_PATH, { recursive: true });
 fs.mkdirSync(IMAGES_PATH, { recursive: true });
+for (const locale of locales.filter((l: string) => l !== DEFAULT_LOCALE)) {
+  fs.mkdirSync(getI18NPath(locale), { recursive: true });
+}
Suggestion importance[1-10]: 7


Why: The code ensures CONTENT_PATH and IMAGES_PATH exist but not the i18n docs directories; later writes use getI18NPath which could fail. Adding directory creation for non-default locales is accurate and improves robustness, though not critical.

Impact: Medium
Guard against double shutdown calls

Protect against multiple shutdown invocations by de-duplicating exit flows. Wrap
gracefulShutdown with a guard to avoid concurrent calls when both catch paths fire,
preventing double cleanup or double exit.

scripts/notion-fetch-all/index.ts [672-685]

+let exiting = false;
 if (isDirectExec && process.env.NODE_ENV !== "test") {
   (async () => {
     try {
       await main();
     } catch (error) {
-      console.error(chalk.red("❌ Fatal error:"), error);
+      if (!exiting) {
+        exiting = true;
+        console.error(chalk.red("❌ Fatal error:"), error);
+        await gracefulShutdown(1);
+      }
+    }
+  })().catch(async (err) => {
+    if (!exiting) {
+      exiting = true;
+      console.error(chalk.red("❌ Unhandled fatal error:"), err);
       await gracefulShutdown(1);
     }
-  })().catch(async (err) => {
-    console.error(chalk.red("❌ Unhandled fatal error:"), err);
-    await gracefulShutdown(1);
   });
 }
Suggestion importance[1-10]: 6


Why: The improved code adds an exiting guard around gracefulShutdown to avoid duplicate shutdowns from both promise paths. It aligns with the new runtime-based shutdown flow and reduces risk of double cleanup; useful but not critical.

Impact: Low
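The guard amounts to a once-only wrapper; here is a generic sketch where cleanup is a stand-in for the PR's gracefulShutdown, not its actual signature:

```typescript
// Wrap a cleanup routine so that, however many error paths fire,
// it runs exactly once; later callers get the code back as a no-op.
function makeShutdown(cleanup: () => void): (code: number) => number {
  let exiting = false;
  return (code: number) => {
    if (exiting) return code; // already shutting down; do nothing
    exiting = true;
    cleanup();
    return code;
  };
}
```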
Restore test-observable exit behavior

Returning the shutdown result here changes previous behavior where tests expected an
exception via mocked process.exit. To avoid silent continuation in non-test contexts
and keep test contract, rethrow the exit code in test mode so callers can assert,
while preserving real exit in production.

scripts/notion-fetch/index.ts [37-51]

 if (!process.env.NOTION_API_KEY) {
   const msg = "Missing NOTION_API_KEY environment variable.";
-  // Keep concise output to avoid leaking sensitive context in logs
   console.error(chalk.bold.red(msg));
-  return await gracefulShutdown(1);
+  const code = await gracefulShutdown(1);
+  if (process.env.NODE_ENV === "test") throw new Error(`Process exit called with code: ${code}`);
+  return code;
 }
 
 if (!resolvedDatabaseId) {
   console.error(
     chalk.bold.red(
       "Error: DATABASE_ID (or NOTION_DATABASE_ID) is not defined in the environment variables."
     )
   );
-  return await gracefulShutdown(1);
+  const code = await gracefulShutdown(1);
+  if (process.env.NODE_ENV === "test") throw new Error(`Process exit called with code: ${code}`);
+  return code;
 }
Suggestion importance[1-10]: 2


Why: The PR intentionally changed shutdown semantics to return an exit code in tests via gracefulShutdown. Reintroducing throws contradicts the new tests expecting numeric returns, making this suggestion inaccurate in this context.

Impact: Low
General
Align hook runner and scope tests

Use the same package manager as the project scripts to avoid environment mismatch
and ensure non-interactive CI behavior with --run. Also, restrict the glob to test
files to prevent unnecessary runs on non-test changes.

lefthook.yml [26-29]

 - name: notion-fetch-tests
-  run: bunx vitest run scripts/notion-fetch/__tests__/
+  run: npx vitest --run scripts/notion-fetch/__tests__/
   stage_fixed: true
-  glob: "scripts/notion-fetch/**/*.{ts,js}"
+  glob: "scripts/notion-fetch/**/__tests__/**/*.{ts,js}"
Suggestion importance[1-10]: 7


Why: Switching from bunx to npx aligns with project scripts and --run makes CI non‑interactive; narrowing the glob to tests reduces unnecessary runs. The mapping to the new hunk is accurate (lines 26–29).

Impact: Medium
Make vitest runs fail-safe

Ensure vitest exits non-zero on failure in CI by adding the --run flag consistently
and avoid path typos that could silently pass. Also, quote paths to guard against
shell expansion issues on some environments.

package.json [37-40]

-"test:notion-fetch": "vitest run scripts/notion-fetch/__tests__/",
-"test:notion-cli": "vitest run scripts/notion-fetch-all/__tests__/",
-"test:notion-pipeline": "vitest run scripts/notion-fetch/__tests__/runFetchPipeline.test.ts",
-"test:notion-image": "vitest run scripts/notion-fetch/__tests__/downloadImage.test.ts",
+"test:notion-fetch": "vitest --run scripts/notion-fetch/__tests__/",
+"test:notion-cli": "vitest --run scripts/notion-fetch-all/__tests__/",
+"test:notion-pipeline": "vitest --run \"scripts/notion-fetch/__tests__/runFetchPipeline.test.ts\"",
+"test:notion-image": "vitest --run \"scripts/notion-fetch/__tests__/downloadImage.test.ts\"",
Suggestion importance[1-10]: 6


Why: Adding --run improves CI consistency and quoting specific test paths is harmless; however, vitest already exits non‑zero on failures in run mode and the change is minor. The existing_code lines map to new lines 37–40.

Impact: Low
Reduce false positives in URL sanitation

The regex for whitespace-in-URL can overmatch valid URLs with spaces encoded as %20
or with parentheses content. Tighten the pattern to ignore percent-encoded spaces
and allow balanced parentheses sequences to reduce false positives.

scripts/notion-fetch/generateBlocks.ts [249-280]

 function sanitizeMarkdownImages(content: string): string {
   if (!content || content.indexOf("![") === -1) return content;
-  // Cap processing size to avoid ReDoS on pathological inputs
   const MAX_LEN = 2_000_000;
   const text = content.length > MAX_LEN ? content.slice(0, MAX_LEN) : content;
   let sanitized = text;
 
-  // Pattern 1: Completely empty URLs
+  // Empty URLs
   sanitized = sanitized.replace(
     /!\[([^\]]*)\]\(\s*\)/g,
     "**[Image: $1]** *(Image URL was empty)*"
   );
-
-  // Pattern 2: Invalid literal placeholders
+  // Invalid placeholders
   sanitized = sanitized.replace(
     /!\[([^\]]*)\]\(\s*(?:undefined|null)\s*\)/g,
     "**[Image: $1]** *(Image URL was invalid)*"
   );
-
-  // Pattern 3: Unencoded whitespace inside URL (safe regex without nested greedy scans)
-  // Matches any whitespace in the URL excluding escaped closing parens and %20
+  // URLs containing raw whitespace (exclude %20 and allow escaped parens)
   sanitized = sanitized.replace(
-    /!\[([^\]]*)\]\(\s*([^()\s][^()\r\n]*?(?:\s+)[^()\r\n]*?)\s*\)/g,
-    "**[Image: $1]** *(Image URL contained whitespace)*"
+    /!\[([^\]]*)\]\(\s*((?:(?:\\\)|\\\(|[^()\s])|\([^)]*\))+)\s*\)/g,
+    (m, alt, url) => {
+      // If url has any raw whitespace, flag it; ignore percent-encoded ones
+      return /\s/.test(url) ? `**[Image: ${alt}]** *(Image URL contained whitespace)*` : m;
+    }
   );
 
-  // If we truncated, append a notice to avoid corrupting content silently
   if (text.length !== content.length) {
     return sanitized + "\n\n<!-- Content truncated for sanitation safety -->";
   }
   return sanitized;
 }
Suggestion importance[1-10]: 5


Why: The concern about overmatching whitespace is reasonable and the proposed change aims to be safer, but it introduces a more complex regex and runtime checking that may have edge cases; impact is moderate and correctness isn't fully verifiable from context.

Impact: Low
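The behaviour this suggestion is after, flagging raw whitespace while leaving %20 alone, can be demonstrated with a deliberately simplified pattern (not the PR's exact regex):

```typescript
// Replace markdown images whose URL contains raw (unencoded) whitespace with a
// readable placeholder; percent-encoded spaces (%20) pass through untouched.
function flagWhitespaceUrls(markdown: string): string {
  return markdown.replace(
    /!\[([^\]]*)\]\(\s*([^)\r\n]*?)\s*\)/g,
    (match, alt: string, url: string) =>
      /\s/.test(url)
        ? `**[Image: ${alt}]** *(Image URL contained whitespace)*`
        : match
  );
}
```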
Ensure sorting occurs post-transform

The transform is applied before sorting and without honoring the caller’s requested
sortBy/sortDirection. This can produce unexpected order in the result. Move sorting
to after transformation (and any limiting) to guarantee consistent ordering.

scripts/notion-fetch-all/fetchAll.ts [67-96]

 const { data: rawData = [], metrics } = await runFetchPipeline({
   filter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
+      // Only filter/limit here; sorting is handled after transform
       const transformed = applyFetchAllTransform(
         Array.isArray(pages) ? pages : [],
         {
           statusFilter,
           maxPages,
           includeRemoved,
         }
       );
       return Array.isArray(transformed) ? transformed : [];
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
+// ... later
+const pages = defensivelyFiltered.map((page) => transformPage(page));
+const sortedPages = sortPages(pages, sortBy, sortDirection);
Suggestion importance[1-10]: 5


Why: The suggestion highlights ordering concerns but the current code already sorts after transforming raw pages (sortPages after transformPage). The advice is reasonable but mostly reiterates existing behavior; impact is moderate at best.

Impact: Low
Suggestions up to commit 23b60c9
Possible issue
Use recorded match indices for replacements

Using indexOf(original) after prior mutations can match the wrong occurrence when
identical markdown appears multiple times. Compute and record match indices from the
initial scan (imgRegex.exec) and use those for replacements to avoid corrupt or
misplaced substitutions.

scripts/notion-fetch/generateBlocks.ts [1205-1374]

 const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
 const imageMatches: Array<{
   full: string;
   url: string;
   alt: string;
   idx: number;
+  start: number;
+  end: number;
 }> = [];
+let m: RegExpExecArray | null;
+let tmpIndex = 0;
+let safetyCounter = 0;
+const SAFETY_LIMIT = 500;
+
+while ((m = imgRegex.exec(sourceMarkdown)) !== null) {
+  if (++safetyCounter > SAFETY_LIMIT) break;
+  const start = m.index;
+  const full = m[0];
+  const end = start + full.length;
+  const rawUrl = m[2];
+  const unescapedUrl = rawUrl.replace(/\\\)/g, ")");
+  imageMatches.push({
+    full,
+    url: unescapedUrl,
+    alt: m[1],
+    idx: tmpIndex++,
+    start,
+    end,
+  });
+}
 ...
-const idx = markdownString.parent.indexOf(original);
-if (idx === -1) continue;
-const end = idx + original.length;
+const indexedReplacements: Array<{ start: number; end: number; text: string }> = [];
+for (const result of imageResults) {
+  if (result.status !== "fulfilled") { totalFailures++; continue; }
+  const processResult = result.value;
+  const match = imageMatches.find(im => im.idx === processResult.index);
+  if (!match) continue;
+  let replacementText: string;
+  if (processResult.success && processResult.newPath) {
+    replacementText = match.full.replace(processResult.imageUrl!, processResult.newPath);
+    totalSaved += processResult.savedBytes || 0;
+    successfulImages++;
+  } else {
+    replacementText = createFallbackImageMarkdown(match.full, match.url, match.idx);
+    totalFailures++;
+  }
+  indexedReplacements.push({ start: match.start, end: match.end, text: replacementText });
+}
+indexedReplacements.sort((a, b) => b.start - a.start);
+let processedMarkdown = markdownString.parent;
+for (const rep of indexedReplacements) {
+  processedMarkdown = processedMarkdown.slice(0, rep.start) + rep.text + processedMarkdown.slice(rep.end);
+}
Suggestion importance[1-10]: 8


Why: The PR already collects matches then later uses indexOf(original) on potentially mutated content, which can be incorrect for duplicate patterns. Recording start/end indices during the initial scan and using them for replacements improves correctness and robustness.

Impact: Medium
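The core of this fix, applying replacements at offsets recorded during the initial scan rather than re-searching with indexOf, can be sketched on its own (the Replacement shape is illustrative):

```typescript
interface Replacement {
  start: number; // offset recorded when the match was first scanned
  end: number;
  text: string;
}

// Apply replacements from the end of the string backwards so that earlier
// recorded offsets remain valid as the string shrinks or grows.
function applyReplacements(source: string, reps: Replacement[]): string {
  let out = source;
  for (const rep of [...reps].sort((a, b) => b.start - a.start)) {
    out = out.slice(0, rep.start) + rep.text + out.slice(rep.end);
  }
  return out;
}
```

An indexOf-based approach would hit the first "X" twice in a string like "aXbXc"; recorded offsets keep each replacement at its own occurrence.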
Robust spinner lifecycle guarding

Ensure spinners are always stopped even if ora throws or spinners aren't started, to
avoid hanging TTY state. Guard spinner starts and stops and null-check unregister
functions to prevent runtime errors in non-TTY/test environments.

scripts/notion-fetch/runFetch.ts [40-96]

-const fetchSpinner = ora(fetchSpinnerText).start();
-const unregisterFetchSpinner = trackSpinner(fetchSpinner);
+const fetchSpinner = ora(fetchSpinnerText);
+let unregisterFetchSpinner: (() => void) | undefined;
+try {
+  fetchSpinner.start();
+  unregisterFetchSpinner = trackSpinner(fetchSpinner);
 
-try {
   let data = await fetchNotionData(filter);
   data = Array.isArray(data) ? data : [];
 
   data = await sortAndExpandNotionData(data);
   data = Array.isArray(data) ? data : [];
 
   if (transform) {
     const transformed = await transform(data);
     data = Array.isArray(transformed) ? transformed : [];
   }
 
   if (fetchSpinner.isSpinning) {
     fetchSpinner.succeed(chalk.green("Data fetched successfully"));
   }
 
   if (!shouldGenerate) {
     return { data };
   }
 
-  const generateSpinner = ora(generateSpinnerText).start();
-  const unregisterGenerateSpinner = trackSpinner(generateSpinner);
+  const generateSpinner = ora(generateSpinnerText);
+  let unregisterGenerateSpinner: (() => void) | undefined;
+  try {
+    generateSpinner.start();
+    unregisterGenerateSpinner = trackSpinner(generateSpinner);
 
-  try {
     const metrics = await generateBlocks(data, (progress) => {
       if (generateSpinner.isSpinning) {
         generateSpinner.text = chalk.blue(
           `${generateSpinnerText}: ${progress.current}/${progress.total}`
         );
       }
       onProgress?.(progress);
     });
 
     if (generateSpinner.isSpinning) {
       generateSpinner.succeed(chalk.green("Blocks generated successfully"));
     }
 
     return { data, metrics };
   } catch (error) {
     if (generateSpinner.isSpinning) {
       generateSpinner.fail(chalk.red("Failed to generate blocks"));
     }
     throw error;
   } finally {
-    unregisterGenerateSpinner();
+    unregisterGenerateSpinner?.();
   }
 } catch (error) {
   if (fetchSpinner.isSpinning) {
     fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
   }
   throw error;
 } finally {
-  unregisterFetchSpinner();
+  unregisterFetchSpinner?.();
 }
Suggestion importance[1-10]: 6


Why: The change defensively guards spinner start/stop and unregister callbacks, which can help avoid edge-case errors in non-TTY/test environments. It's a moderate robustness improvement; existing code already tracks and stops spinners reliably, so impact is limited.

Impact: Low
Security
Sanitize cached image filenames

Avoid caching paths with directory traversal or unexpected separators. Sanitize
localPath before saving and when resolving on disk to prevent reading/writing
outside IMAGES_PATH if a malicious URL produces a crafted filename.

scripts/notion-fetch/generateBlocks.ts [294-361]

-class ImageCache {
-  private cacheFile: string;
-  private cache: Map<string, ImageCacheEntry>;
+private getAbsoluteImagePath(fileNameOrWebPath: string): string {
+  const baseName = path.basename(fileNameOrWebPath || "");
+  // reject suspicious names
+  if (!baseName || baseName.includes("..") || baseName.includes(path.sep)) {
+    return path.join(IMAGES_PATH, "_invalid-image-name_");
+  }
+  return path.join(IMAGES_PATH, baseName);
+}
 
-  constructor() {
-    this.cacheFile = path.join(process.cwd(), "image-cache.json");
-    this.cache = new Map();
-    this.loadCache();
-  }
-...
-  private getAbsoluteImagePath(fileNameOrWebPath: string): string {
-    const baseName = path.basename(fileNameOrWebPath);
-    return path.join(IMAGES_PATH, baseName);
-  }
-...
-  set(url: string, localPath: string, blockName: string): void {
-    // store only the basename to avoid mixing web and fs paths
-    const entry: ImageCacheEntry = {
-      url,
-      localPath: path.basename(localPath),
-      timestamp: new Date().toISOString(),
-      blockName,
-    };
-    this.cache.set(url, entry);
-    this.saveCache();
-  }
+set(url: string, localPath: string, blockName: string): void {
+  const safeBase = path.basename(localPath || "");
+  const entry: ImageCacheEntry = {
+    url,
+    localPath: safeBase,
+    timestamp: new Date().toISOString(),
+    blockName,
+  };
+  this.cache.set(url, entry);
+  this.saveCache();
+}
Suggestion importance[1-10]: 7


Why: The cache stores basenames and resolves paths under IMAGES_PATH; adding explicit basename/invalid-name checks marginally hardens against malformed inputs without changing behavior. It's a reasonable security hardening with low risk but not critical.

Impact: Medium
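The basename-only resolution this suggestion hardens can be shown as a small helper; imagesDir and the invalid-name sentinel are illustrative, not the PR's actual constants:

```typescript
import path from "node:path";

// Resolve a cached image reference to a file strictly inside imagesDir:
// keep only the basename and reject anything that still looks like traversal.
function toSafeImagePath(imagesDir: string, fileNameOrWebPath: string): string {
  const baseName = path.basename(fileNameOrWebPath || "");
  if (!baseName || baseName.includes("..")) {
    return path.join(imagesDir, "_invalid-image-name_");
  }
  return path.join(imagesDir, baseName);
}
```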
Harden output path containment check

The relative path check can be bypassed with path tricks on some platforms.
Normalize and ensure the resolved output stays within the project root using
path.resolve(projectRoot, rel) comparison, and reject paths containing null bytes or
control chars before writing.

scripts/notion-fetch-all/index.ts [428-447]

 const candidatePath = path.resolve(outputFile);
 const projectRoot = path.resolve(process.cwd());
 const rel = path.relative(projectRoot, candidatePath);
-if (rel.startsWith("..") || path.isAbsolute(rel)) {
-  throw new Error(
-    `Refusing to write outside project directory: ${candidatePath}`
-  );
+// basic invalid chars check
+if (/\0/.test(candidatePath)) {
+  throw new Error("Invalid output path");
 }
-outputPath = candidatePath;
+const safeResolved = path.resolve(projectRoot, rel);
+if (
+  !safeResolved.startsWith(projectRoot + path.sep) &&
+  safeResolved !== projectRoot // allow writing exactly in root
+) {
+  throw new Error(`Refusing to write outside project directory: ${candidatePath}`);
+}
+outputPath = safeResolved;
 const outDir = path.dirname(outputPath);
 fs.mkdirSync(outDir, { recursive: true });
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 6


Why: The PR already prevents writing outside the project via a relative check; normalizing and validating the resolved path adds defense-in-depth. It's a valid security improvement but incremental rather than fixing a demonstrated bug.

Impact: Low
General
Add retry for pagination anomalies

Avoid silently truncating data on transient pagination anomalies. Instead of
breaking early, attempt one retry page fetch, then continue if cursor advances; only
then break to prevent infinite loops.

scripts/fetchNotionData.ts [20-71]

 while (hasMore) {
   if (++safetyCounter > MAX_PAGES) {
-    console.warn(
-      "Pagination safety limit exceeded; returning partial results."
-    );
+    console.warn("Pagination safety limit exceeded; returning partial results.");
     break;
   }
 
   const response = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: startCursor,
     page_size: 100,
   });
 
   const pageResults = Array.isArray(response.results) ? response.results : [];
 
-  // Detect duplicate IDs to avoid stalling and data corruption
   let duplicateDetected = false;
   for (const r of pageResults) {
     const id = (r as any)?.id;
     if (id && seenIds.has(id)) {
       duplicateDetected = true;
       break;
     }
     if (id) seenIds.add(id);
   }
 
   results.push(...pageResults);
 
   const prevCursor = startCursor;
   const prevCount = pageResults.length;
   hasMore = Boolean(response.has_more);
   startCursor = response.next_cursor ?? undefined;
 
-  if (
-    hasMore &&
-    (duplicateDetected ||
-      !startCursor ||
-      startCursor === prevCursor ||
-      prevCount === 0)
-  ) {
-    console.warn(
-      "Notion API pagination anomaly detected; stopping early with partial results."
-    );
+  const anomaly = hasMore && (duplicateDetected || !startCursor || startCursor === prevCursor || prevCount === 0);
+  if (anomaly) {
+    // One retry attempt to recover from transient anomaly
+    console.warn("Notion API pagination anomaly detected; retrying once...");
+    const retryResp = await enhancedNotion.databasesQuery({
+      database_id: DATABASE_ID,
+      filter,
+      start_cursor: prevCursor,
+      page_size: 100,
+    });
+    const retryCursor = retryResp.next_cursor ?? undefined;
+    if (retryCursor && retryCursor !== prevCursor) {
+      hasMore = Boolean(retryResp.has_more);
+      startCursor = retryCursor;
+      continue;
+    }
+    console.warn("Anomaly persisted after retry; stopping early with partial results.");
     break;
   }
 }
Suggestion importance[1-10]: 7


Why: Introducing a single retry on cursor anomalies can prevent premature truncation of results, improving resilience without risking infinite loops. It aligns with the current pagination loop and safety checks; moderate but meaningful reliability gain.

Impact: Medium
Harden database ID validation

Also validate that resolvedDatabaseId is a non-empty string without whitespace to
avoid passing an invalid ID downstream. Trim and check length before proceeding to
prevent subtle runtime failures.

scripts/notion-fetch/index.ts [37-51]

 if (!process.env.NOTION_API_KEY) {
   const msg = "Missing NOTION_API_KEY environment variable.";
-  // Keep concise output to avoid leaking sensitive context in logs
   console.error(chalk.bold.red(msg));
   return await gracefulShutdown(1);
 }
 
-if (!resolvedDatabaseId) {
+const dbId = typeof resolvedDatabaseId === "string" ? resolvedDatabaseId.trim() : "";
+if (!dbId) {
   console.error(
     chalk.bold.red(
       "Error: DATABASE_ID (or NOTION_DATABASE_ID) is not defined in the environment variables."
     )
   );
   return await gracefulShutdown(1);
 }
Suggestion importance[1-10]: 5


Why: Trimming and validating the database ID avoids passing whitespace-only values, a minor correctness improvement. The existing check already handles undefined; this refines validation with low risk and modest impact.

Impact: Low
Ensure correct hook execution context

Use the correct Lefthook key stage_fixed only if supported; otherwise use
stage_fixed: true with run_from_repo_root: true to ensure paths resolve. Also quote
the glob consistently and ensure the test command exits non-zero on failures.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
+  run_from_repo_root: true
   stage_fixed: true
-  glob: "scripts/notion-fetch/**/*.{ts,js}"
+  glob: "**/scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 3


Why: The existing_code matches the new hunk lines, but the proposed changes (adding run_from_repo_root and altering the glob to "**/scripts...") are speculative and may not be necessary; they could even broaden the scope unintentionally. The advice about ensuring non-zero exits is generic and not directly implemented.

Impact: Low
Suggestions up to commit 8a93e0c
Possible issue
Fix output path safety check

This path check can misclassify root-level writes on Windows or if output equals the
cwd. Normalize paths and use a safer containment test to avoid false positives and
path traversal. Also ensure the parent directory exists before writing.

scripts/notion-fetch-all/index.ts [427-438]

 const candidatePath = path.resolve(outputFile);
-const cwd = process.cwd();
-if (!candidatePath.startsWith(path.resolve(cwd) + path.sep)) {
+const projectRoot = path.resolve(process.cwd());
+const rel = path.relative(projectRoot, candidatePath);
+if (rel.startsWith("..") || path.isAbsolute(rel)) {
   throw new Error(
     `Refusing to write outside project directory: ${candidatePath}`
   );
 }
 outputPath = candidatePath;
-// eslint-disable-next-line security/detect-non-literal-fs-filename
+const outDir = path.dirname(outputPath);
+fs.mkdirSync(outDir, { recursive: true });
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 8


Why: Using path.relative for containment and creating parent directories avoids false positives and ensures safe writes across platforms. This directly fixes a potential path check bug, with higher impact on correctness and security.

Impact: Medium
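The path.relative containment test this suggestion recommends can be isolated into a helper; the function name is illustrative, and behaviour is sketched for POSIX paths:

```typescript
import path from "node:path";

// True when candidate resolves to root itself or somewhere beneath it,
// using path.relative instead of a raw startsWith prefix check.
function isInsideProject(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(root, candidate));
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}
```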
General
Sanitize translation keys and values

Add a filesystem-safe guard for original keys to prevent huge or unsafe keys from
bloating or breaking code.json. Also ensure only string keys are stored and trim
excessively long strings. This hardens against malformed Notion content and reduces
the risk of corrupted translation files.

scripts/notion-fetch/generateBlocks.ts [911-946]

 function setTranslationString(
   lang: string,
   original: string,
   translated: string
 ) {
   const lPath = path.join(I18N_PATH, lang, "code.json");
   const dir = path.dirname(lPath);
-  // ensure directory exists
   fs.mkdirSync(dir, { recursive: true });
 
   let fileContents = "{}";
   try {
     const existing = fs.readFileSync(lPath, "utf8");
     if (typeof existing === "string" && existing.trim().length > 0) {
       fileContents = existing;
     }
   } catch {
     console.warn(
       chalk.yellow(
         `Translation file missing for ${lang}, creating a new one at ${lPath}`
       )
     );
   }
 
   let file: Record<string, any>;
   try {
     file = JSON.parse(fileContents);
   } catch (parseError) {
     console.warn(
       chalk.yellow(
         `Failed to parse translation file for ${lang}, resetting content`
       ),
       parseError
     );
     file = {};
   }
-  file[original] = { message: translated };
+
+  const safeKey =
+    typeof original === "string"
+      ? original.slice(0, 2000)
+      : String(original).slice(0, 2000);
+  const safeMessage =
+    typeof translated === "string"
+      ? translated.slice(0, 5000)
+      : String(translated).slice(0, 5000);
+
+  file[safeKey] = { message: safeMessage };
   fs.writeFileSync(lPath, JSON.stringify(file, null, 4));
 }
Suggestion importance[1-10]: 7


Why: The guard against excessively long or non-string keys/values is sensible and reduces risk of corrupting code.json. It's a moderate robustness improvement aligned with the new function. Not critical, but useful and accurate to the PR context.

Impact: Medium
Harden image failure logging

Ensure the log directory exists and guard against non-serializable entries to avoid
write failures. Also bound the size of individual log entries to keep the log file
performant and prevent accidental secrets leakage.

scripts/notion-fetch/generateBlocks.ts [168-223]

 async function logImageFailure(logEntry: any): Promise<void> {
   const logPath = path.join(process.cwd(), "image-failures.json");
+  const logDir = path.dirname(logPath);
   const tmpPath = `${logPath}.tmp`;
   const MAX_ENTRIES = 5000;
+  const MAX_FIELD_LEN = 2000;
+
+  const safeEntry = (() => {
+    try {
+      const clone: Record<string, unknown> = {};
+      for (const [k, v] of Object.entries(logEntry ?? {})) {
+        let val = v;
+        if (typeof val === "string") val = val.slice(0, MAX_FIELD_LEN);
+        else if (typeof val === "object") val = JSON.parse(JSON.stringify(val));
+        clone[k] = val;
+      }
+      return clone;
+    } catch {
+      return { message: "non-serializable log entry" };
+    }
+  })();
 
   imageLogWriting = imageLogWriting
     .then(async () => {
+      try {
+        fs.mkdirSync(logDir, { recursive: true });
+      } catch {
+        // ignore
+      }
+
       let existingLogs: any[] = [];
       try {
         if (fs.existsSync(logPath)) {
           const content = fs.readFileSync(logPath, "utf-8");
           const parsed = JSON.parse(content);
           existingLogs = Array.isArray(parsed) ? parsed : [];
         }
       } catch {
         existingLogs = [];
       }
-      existingLogs.push(logEntry);
-      // Prevent unbounded growth
+      existingLogs.push(safeEntry);
       if (existingLogs.length > MAX_ENTRIES) {
         existingLogs = existingLogs.slice(-MAX_ENTRIES);
       }
 
       const payload = JSON.stringify(existingLogs, null, 2);
       try {
         fs.writeFileSync(tmpPath, payload);
         fs.renameSync(tmpPath, logPath);
       } catch {
         console.warn(
           chalk.yellow("Failed to write image failure log atomically")
         );
         try {
           fs.writeFileSync(logPath, payload);
         } catch {
           console.warn(chalk.yellow("Failed to write image failure log"));
         }
       } finally {
         try {
           if (fs.existsSync(tmpPath)) fs.unlinkSync(tmpPath);
         } catch {
           // best-effort cleanup
         }
       }
       return undefined;
     })
     .catch((e) => {
-      // Reset mutex on failure to avoid blocking future writes
       console.warn(
         chalk.yellow("Image failure log write error; resetting queue"),
         e
       );
       return undefined;
     });
 
   await imageLogWriting;
 }
Suggestion importance[1-10]: 7


Why: Ensuring the log directory exists and serializing/trimming entries strengthens reliability without changing behavior. It fits the added logging and is correctly mapped. Impact is moderate on stability.

Medium
Honor statusFilter in final guard

Ensure the final defensive filter respects statusFilter as well as includeRemoved; without this, statusFilter could be bypassed if the upstream transform changes. Apply both filters before transforming to PageWithStatus.

scripts/notion-fetch-all/fetchAll.ts [98-105]

 const { data: rawData = [], metrics } = await runFetchPipeline({
   filter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
       const transformed = applyFetchAllTransform(
         Array.isArray(pages) ? pages : [],
         {
           statusFilter,
           maxPages,
           includeRemoved,
         }
       );
       return Array.isArray(transformed) ? transformed : [];
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
 
+// Apply defensive filters for both removal and explicit status
+const defensivelyFiltered = rawData.filter((p) => {
+  const status = getStatusFromRawPage(p);
+  if (!includeRemoved && status === "Remove") return false;
+  if (statusFilter && status !== statusFilter) return false;
+  return true;
+});
+
+const pages = defensivelyFiltered.map((page) => transformPage(page));
+
Suggestion importance[1-10]: 7


Why: Applying statusFilter in the final defensive filter is accurate and useful to ensure consistency if upstream transform is bypassed. It’s a reasonable enhancement with moderate impact on correctness.

Medium
Prevent double-ending spinners

Avoid calling both fail/succeed and stop on the same spinner to prevent
double-ending and potential inconsistent states. Only stop the spinner if it is
still spinning, and do not call stop after succeed/fail.

scripts/notion-fetch/runFetch.ts [40-55]

 const fetchSpinner = ora(fetchSpinnerText).start();
 const unregisterFetchSpinner = trackSpinner(fetchSpinner);
-...
+try {
+  let data = await fetchNotionData(filter);
+  data = Array.isArray(data) ? data : [];
+  data = await sortAndExpandNotionData(data);
+  data = Array.isArray(data) ? data : [];
+  if (transform) {
+    const transformed = await transform(data);
+    data = Array.isArray(transformed) ? transformed : [];
+  }
+  if (fetchSpinner.isSpinning) {
+    fetchSpinner.succeed(chalk.green("Data fetched successfully"));
+  }
+  if (!shouldGenerate) {
+    return { data };
+  }
+  const generateSpinner = ora(generateSpinnerText).start();
+  const unregisterGenerateSpinner = trackSpinner(generateSpinner);
+  try {
+    const metrics = await generateBlocks(data, (progress) => {
+      if (generateSpinner.isSpinning) {
+        generateSpinner.text = chalk.blue(
+          `${generateSpinnerText}: ${progress.current}/${progress.total}`
+        );
+      }
+      onProgress?.(progress);
+    });
+    if (generateSpinner.isSpinning) {
+      generateSpinner.succeed(chalk.green("Blocks generated successfully"));
+    }
+    return { data, metrics };
+  } catch (error) {
+    if (generateSpinner.isSpinning) {
+      generateSpinner.fail(chalk.red("Failed to generate blocks"));
+    }
+    throw error;
+  } finally {
+    unregisterGenerateSpinner();
+  }
 } catch (error) {
-  fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
+  if (fetchSpinner.isSpinning) {
+    fetchSpinner.fail(chalk.red("Failed to fetch data from Notion"));
+  }
   throw error;
 } finally {
-  if (fetchSpinner.isSpinning) fetchSpinner.stop();
   unregisterFetchSpinner();
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly notes not to call stop after succeed/fail to avoid double-ending spinners, which aligns with the PR’s spinner handling pattern. It’s a maintainability/robustness improvement, not critical to functionality.

Low
Avoid conflicting file filters

Using both files and glob can cause Lefthook to ignore the glob filter, running
tests when unrelated files change. Constrain the hook to staged changes within the
intended path by removing files and enabling stage_fixed to pass only staged files.
This prevents unnecessary test runs and speeds up commits.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
+  stage_fixed: true
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 6


Why: The suggestion correctly targets the new hook lines (26-29) and highlights a plausible Lefthook config conflict where files may override glob, causing unnecessary runs. Replacing files with stage_fixed: true is a reasonable improvement for performance, though impact is moderate and depends on project workflow.

Low
Improve pagination anomaly diagnostics

If an anomaly triggers early break, attach context (counts and last cursor) to the
warning to aid debugging. This helps identify data loss or API issues without
changing logic.

scripts/fetchNotionData.ts [21-71]

 while (hasMore) {
   if (++safetyCounter > MAX_PAGES) {
     console.warn(
-      "Pagination safety limit exceeded; returning partial results."
+      `Pagination safety limit exceeded at ${safetyCounter} pages; returning ${results.length} items so far.`
     );
     break;
   }
 
   const response = await enhancedNotion.databasesQuery({
     database_id: DATABASE_ID,
     filter,
     start_cursor: startCursor,
     page_size: 100,
   });
 
   const pageResults = Array.isArray(response.results) ? response.results : [];
 
-  // Detect duplicate IDs to avoid stalling and data corruption
   let duplicateDetected = false;
   for (const r of pageResults) {
     const id = (r as any)?.id;
     if (id && seenIds.has(id)) {
       duplicateDetected = true;
       break;
     }
     if (id) seenIds.add(id);
   }
 
   results.push(...pageResults);
 
   const prevCursor = startCursor;
   const prevCount = pageResults.length;
   hasMore = Boolean(response.has_more);
   startCursor = response.next_cursor ?? undefined;
 
   if (
     hasMore &&
     (duplicateDetected ||
       !startCursor ||
       startCursor === prevCursor ||
       prevCount === 0)
   ) {
     console.warn(
-      "Notion API pagination anomaly detected; stopping early with partial results."
+      `Notion API pagination anomaly detected (dup=${duplicateDetected}, prevCount=${prevCount}, prevCursor=${prevCursor}, nextCursor=${startCursor}); returning ${results.length} items so far.`
     );
     break;
   }
 }
Suggestion importance[1-10]: 5


Why: Adding more detailed context to warnings is correct and can aid debugging without changing behavior. It’s helpful but minor, focusing on observability rather than core functionality.

Low
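The anomaly conditions discussed in the pagination suggestions can be factored into a small pure helper that also makes the diagnostics easy to unit-test. This is an illustrative sketch, not the repository's actual code; the function name and option shape are assumptions.

```typescript
// Hypothetical helper mirroring the anomaly conditions described above.
// Returns a human-readable reason when pagination should stop early, else null.
function paginationAnomaly(opts: {
  hasMore: boolean;
  duplicateDetected: boolean;
  prevCursor?: string;
  nextCursor?: string;
  pageCount: number;
}): string | null {
  const { hasMore, duplicateDetected, prevCursor, nextCursor, pageCount } = opts;
  if (!hasMore) return null; // nothing left to fetch; not an anomaly
  if (duplicateDetected) return "duplicate page id seen";
  if (!nextCursor) return "has_more set but no next_cursor";
  if (nextCursor === prevCursor) return "cursor did not advance";
  if (pageCount === 0) return "empty page returned";
  return null;
}
```

Returning the reason string (rather than a boolean) gives the warning message the context the suggestion asks for without changing control flow.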
Suggestions up to commit 8a93e0c
Category | Suggestion | Impact
Possible issue
Use regex match indices for replacements

Using indexOf on the original markdown later can target the wrong occurrence for
duplicate matches, causing mis-replacements. Capture match indices from regex
execution and use them to apply replacements deterministically.

scripts/notion-fetch/generateBlocks.ts [1174-1204]

 const imgRegex = /!\[([^\]]*)\]\(\s*((?:\\\)|[^)])+?)\s*\)/g;
-...
+const imageMatches: Array<{ full: string; url: string; alt: string; idx: number; start: number; end: number }> = [];
+let m: RegExpExecArray | null;
+let tmpIndex = 0;
+let safetyCounter = 0;
+const SAFETY_LIMIT = 500;
 while ((m = imgRegex.exec(sourceMarkdown)) !== null) {
   if (++safetyCounter > SAFETY_LIMIT) {
-    console.warn(
-      chalk.yellow(
-        `⚠️  Image match limit (${SAFETY_LIMIT}) reached; skipping remaining.`
-      )
-    );
+    console.warn(chalk.yellow(`⚠️  Image match limit (${SAFETY_LIMIT}) reached; skipping remaining.`));
     break;
   }
+  const start = m.index;
+  const full = m[0];
+  const end = start + full.length;
   const rawUrl = m[2];
   const unescapedUrl = rawUrl.replace(/\\\)/g, ")");
   imageMatches.push({
-    full: m[0],
+    full,
     url: unescapedUrl,
     alt: m[1],
     idx: tmpIndex++,
+    start,
+    end,
   });
 }
+...
+// When building replacements
+for (const match of imageMatches) {
+  // compute replacementText for this match...
+  indexedReplacements.push({ start: match.start, end: match.end, text: replacementText });
+}
Suggestion importance[1-10]: 8


Why: The PR later uses indexOf on the whole markdown which can mis-target duplicates; capturing start/end from regex and using them for replacements is a robust fix that prevents incorrect substitutions and aligns with the current image-processing flow.

Medium
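The index-based approach this suggestion describes boils down to recording each match's start/end and applying replacements right-to-left so earlier offsets stay valid. A minimal self-contained sketch (names are hypothetical, not the PR's actual identifiers):

```typescript
interface IndexedReplacement {
  start: number; // inclusive offset into the source string
  end: number;   // exclusive offset
  text: string;  // replacement text
}

// Apply replacements from highest offset to lowest so that earlier
// offsets are still valid after each splice.
function applyIndexedReplacements(
  source: string,
  repls: IndexedReplacement[]
): string {
  const sorted = [...repls].sort((a, b) => b.start - a.start);
  let out = source;
  for (const r of sorted) {
    out = out.slice(0, r.start) + r.text + out.slice(r.end);
  }
  return out;
}
```

Unlike `indexOf`-based replacement, duplicate image links resolve unambiguously because each replacement carries its own offsets.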
Prevent premature pagination termination

Avoid stopping early solely due to duplicate IDs within a page; this can occur
legitimately across pages. Instead, skip duplicates when pushing into results and
only early-exit on robust anomalies (no next_cursor with has_more or zero results
repeatedly). This prevents silent data loss.

scripts/fetchNotionData.ts [37-71]

 const pageResults = Array.isArray(response.results) ? response.results : [];
 
-// Detect duplicate IDs to avoid stalling and data corruption
-let duplicateDetected = false;
+// Append only unseen items; duplicates are skipped but do not trigger early stop
 for (const r of pageResults) {
   const id = (r as any)?.id;
-  if (id && seenIds.has(id)) {
-    duplicateDetected = true;
-    break;
+  if (id) {
+    if (seenIds.has(id)) continue;
+    seenIds.add(id);
   }
-  if (id) seenIds.add(id);
+  results.push(r as Record<string, unknown>);
 }
 
-results.push(...pageResults);
-
 const prevCursor = startCursor;
-const prevCount = pageResults.length;
+const gotCount = pageResults.length;
 hasMore = Boolean(response.has_more);
 startCursor = response.next_cursor ?? undefined;
 
+// Only treat as anomaly when Notion signals more but gives no cursor or no items,
+// or when cursor doesn't advance across iterations.
 if (
   hasMore &&
-  (duplicateDetected ||
-    !startCursor ||
-    startCursor === prevCursor ||
-    prevCount === 0)
+  (!startCursor || startCursor === prevCursor || gotCount === 0)
 ) {
   console.warn(
     "Notion API pagination anomaly detected; stopping early with partial results."
   );
   break;
 }
Suggestion importance[1-10]: 8


Why: Skipping duplicates instead of early exit avoids silent data loss and makes pagination more robust; this directly improves correctness in the new pagination logic.

Medium
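The skip-duplicates-instead-of-aborting behavior can be sketched as a small merge helper. This is illustrative only; the generic item shape and function name are assumptions:

```typescript
// Merge one page of results into the accumulator, skipping ids already seen.
// Duplicates across pages are dropped silently rather than aborting pagination.
function mergePage<T extends { id?: string }>(
  results: T[],
  seen: Set<string>,
  page: T[]
): number {
  let added = 0;
  for (const item of page) {
    const id = item?.id;
    if (id) {
      if (seen.has(id)) continue; // duplicate: skip, don't stop early
      seen.add(id);
    }
    results.push(item);
    added++;
  }
  return added; // callers can still warn when added === 0 with has_more set
}
```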
Consolidate filtering into API request

Avoid using the upstream filter when statusFilter is also provided, as the transform
will then apply a second filter that can over‑constrain results. Instead, build the
Notion API filter to reflect both includeRemoved and statusFilter, and keep the
transform focused on slicing (e.g., maxPages). This prevents silently dropping pages
due to double filtering.

scripts/notion-fetch-all/fetchAll.ts [67-96]

+const notionFilter = buildStatusFilter(includeRemoved, statusFilter);
 const { data: rawData = [], metrics } = await runFetchPipeline({
-  filter,
+  filter: notionFilter,
   fetchSpinnerText:
     fetchSpinnerText ??
     "Fetching ALL pages from Notion (excluding removed items by default)...",
   generateSpinnerText:
     generateSpinnerText ?? "Exporting pages to markdown files",
   transform: (pages) => {
     try {
       fetchedCount = Array.isArray(pages) ? pages.length : 0;
-      const transformed = applyFetchAllTransform(
-        Array.isArray(pages) ? pages : [],
-        {
-          statusFilter,
-          maxPages,
-          includeRemoved,
-        }
-      );
-      return Array.isArray(transformed) ? transformed : [];
+      // Only apply client-side constraints not expressible server-side (e.g., maxPages)
+      const transformed = typeof maxPages === "number" && maxPages > 0
+        ? (Array.isArray(pages) ? pages.slice(0, maxPages) : [])
+        : (Array.isArray(pages) ? pages : []);
+      return transformed;
     } catch (e) {
       console.warn(
         "fetchAll transform failed, using untransformed data:",
         (e as Error)?.message ?? e
       );
       return Array.isArray(pages) ? pages : [];
     }
   },
   onProgress: progressLogger,
   shouldGenerate: exportFiles,
 });
Suggestion importance[1-10]: 7


Why: Pushing statusFilter into the API avoids double filtering and reduces data transfer; the change is reasonable and aligns with the code’s transform stage. However, it’s an optimization rather than a critical bug fix.

Medium
General
Use stable cache directory for logs

Writing logs into the current working directory can fail when the CWD is unwritable
and can scatter files. Use a stable, project-root-relative path (e.g., alongside
other caches) and ensure its directory exists. This prevents permission errors and
keeps artifacts organized.

scripts/notion-fetch/generateBlocks.ts [169-171]

-const logPath = path.join(process.cwd(), "image-failures.json");
+const logsDir = path.join(__dirname, "../../.cache");
+fs.mkdirSync(logsDir, { recursive: true });
+const logPath = path.join(logsDir, "image-failures.json");
 const tmpPath = `${logPath}.tmp`;
 const MAX_ENTRIES = 5000;
Suggestion importance[1-10]: 7


Why: Writing logs to process.cwd() can be brittle; using a stable cache dir and ensuring it exists improves reliability and organization. The improved code aligns with the intent and integrates cleanly with the PR’s logging feature.

Medium
Harden project-root path check

The path check can falsely reject valid paths on Windows (different separators) and
when the output file is exactly the CWD. Normalize both paths and handle equality to
avoid blocking legitimate writes. Then write the file.

scripts/notion-fetch-all/index.ts [428-439]

 const candidatePath = path.resolve(outputFile);
-const cwd = process.cwd();
-if (!candidatePath.startsWith(path.resolve(cwd) + path.sep)) {
-  throw new Error(
-    `Refusing to write outside project directory: ${candidatePath}`
-  );
+const projectRoot = path.resolve(process.cwd());
+const normalizedCandidate = path.normalize(candidatePath);
+const normalizedRoot = path.normalize(projectRoot);
+const isInside =
+  normalizedCandidate === normalizedRoot ||
+  normalizedCandidate.startsWith(normalizedRoot + path.sep);
+if (!isInside) {
+  throw new Error(`Refusing to write outside project directory: ${candidatePath}`);
 }
-outputPath = candidatePath;
-// eslint-disable-next-line security/detect-non-literal-fs-filename
+outputPath = normalizedCandidate;
 fs.writeFileSync(outputPath, outputContent, "utf8");
Suggestion importance[1-10]: 7


Why: Normalizing paths and allowing equality prevents false negatives on Windows and when writing directly to the project root, improving cross-platform correctness without reducing safety.

Medium
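A `path.relative`-based containment check (which a later commit on this PR also adopts) sidesteps the separator and drive-letter pitfalls described above. A hedged sketch; the function name is hypothetical:

```typescript
import * as path from "node:path";

// A candidate is inside root when the relative path from root to candidate
// neither climbs out ("..") nor lands on a different drive (absolute on Windows).
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return true; // candidate IS the root
  return (
    rel !== ".." &&
    !rel.startsWith(".." + path.sep) &&
    !path.isAbsolute(rel)
  );
}
```

Checking `rel !== ".."` and the `".." + path.sep` prefix (rather than a bare `startsWith("..")`) avoids falsely rejecting siblings whose names merely begin with two dots, e.g. `..notes`.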
Push statusFilter into API query

Add optional support for a specific statusFilter to be pushed down to Notion,
avoiding client-side filtering inconsistencies. This ensures server-side filtering
is authoritative and reduces data transfer and post-processing.

scripts/notion-fetch-all/fetchAll.ts [117-134]

-function buildStatusFilter(includeRemoved: boolean) {
+function buildStatusFilter(includeRemoved: boolean, statusFilter?: string) {
+  // If a specific status is requested, prefer that exact match regardless of includeRemoved
+  if (typeof statusFilter === "string" && statusFilter.trim()) {
+    return {
+      property: NOTION_PROPERTIES.STATUS,
+      select: { equals: statusFilter.trim() },
+    };
+  }
   if (includeRemoved) {
     return undefined;
   }
-
   return {
     or: [
       {
         property: NOTION_PROPERTIES.STATUS,
         select: { is_empty: true },
       },
       {
         property: NOTION_PROPERTIES.STATUS,
         select: { does_not_equal: "Remove" },
       },
     ],
   };
 }
Suggestion importance[1-10]: 6


Why: Extending buildStatusFilter to accept statusFilter is consistent with the previous suggestion and improves efficiency, but it’s a modest improvement and not critical to correctness.

Low
Avoid conflicting file filters

Ensure the test step only runs when relevant staged files change; using both files
and glob may not filter as intended. Replace with a single glob or a proper files
pattern referencing staged files. This prevents unexpected execution and speeds up
pre-commit.

lefthook.yml [26-29]

 - name: notion-fetch-tests
   run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 6


Why: Using both files: git ls-files -m and a glob in lefthook can be conflicting or ineffective; simplifying to a single glob is a reasonable improvement for correctness and speed. The change matches the PR context, but impact is moderate and depends on how lefthook interprets these fields.

Low
Make test hook non-interactive, concise

Prepend the test command with the package runner to fail fast on missing bunx or
vitest and ensure non-interactive CI behavior. Add --run and --reporter=dot plus
--silent to reduce noisy output in hooks and enforce exit codes.

lefthook.yml [26-29]

 - name: notion-fetch-tests
-  run: bunx vitest run scripts/notion-fetch/__tests__/
-  files: git ls-files -m
+  run: bunx vitest run --run --reporter=dot --silent scripts/notion-fetch/__tests__/
   glob: "scripts/notion-fetch/**/*.{ts,js}"
Suggestion importance[1-10]: 5


Why: Adding --run, --reporter=dot, and --silent can streamline output and ensure non-interactive behavior, but it's optional and environment-dependent; bunx already enforces exit codes. The suggestion is reasonable yet not critical, with modest impact.

Low

@luandro luandro marked this pull request as draft September 26, 2025 13:51
@socket-security
Copy link

socket-security bot commented Sep 26, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff — direct dependency updates:
- Updated @docusaurus/tsconfig 3.8.1 → 3.9.1
- Updated @docusaurus/preset-classic 3.8.1 → 3.9.1

View full report

@luandro luandro marked this pull request as ready for review September 26, 2025 16:52
@luandro luandro marked this pull request as draft September 26, 2025 16:56
@luandro luandro force-pushed the fix-fetch-all-align-export-with-notion-page-scoping branch from 6f70b58 to 5b2329a Compare September 26, 2025 17:30
@luandro luandro marked this pull request as ready for review September 26, 2025 18:48
…ilities

- Add notionFixtures.ts with reusable test data structures
- Enhance helpers.ts with proper axios mocking patterns
- Improve mocks.ts with better mock implementations
- Update index.ts to export new test utilities

These utilities provide foundation for reliable test data creation,
proper HTTP mocking, and hermetic test isolation across the test suite.
- Add downloadImage.test.ts with retry logic and error handling tests
- Add runFetchPipeline.test.ts with pipeline coordination tests
- Add integration.test.ts with test utility validation
- Fix generateBlocks.test.ts with proper test structure

Key improvements:
- Implements exponential backoff retry testing with fake timers
- Comprehensive error handling coverage (timeouts, DNS, HTTP errors)
- Proper test data structure with Sub-item relations for multi-language pages
- Full mocking of axios, SpinnerManager, and image processing pipeline
- Hermetic tests with proper fs mocking to prevent file system side effects

All tests now pass (30+ assertions) providing reliable regression protection
for the Notion fetch pipeline, replacing need for manual testing.
- Add index.cli.test.ts with comprehensive CLI component testing
- Fix static method mocking for PreviewGenerator, StatusAnalyzer, ComparisonEngine
- Correct test imports to work with actual static class methods
- Add proper environment variable validation testing

Key fixes:
- Mock static methods instead of instance methods (generatePreview,
  analyzePublicationStatus, compareWithPublished)
- Update test assertions to validate static class API instead of instance API
- Ensure proper CLI initialization and error handling coverage

All 7 CLI tests now pass, providing coverage for the notion-fetch-all
command-line interface and its core components.
- Fix existing generateBlocks.test.ts with proper runFetchPipeline import
- Update TASK.md to reflect all completed deliverables and blockers resolved
- Ensure test imports match actual function names in implementation

This completes the final pieces of the test reliability improvement task,
ensuring all existing tests work with the new comprehensive test suite.
All deliverables from the Notion fetch test suite stabilization task
have been successfully completed:

- ✅ Fixed axios mocking and runFetch typos
- ✅ Added comprehensive retry, error handling, and fallback behavior tests
- ✅ Implemented hermetic testing with proper fs mocking
- ✅ All 37 tests passing across notion-fetch and notion-fetch-all suites

The test suite is now reliable and trustworthy enough to replace
manual regression testing as originally requested.
…essing

- Add robust title fallback logic in resolvePageTitle() function
- Implement createStandalonePageGroup() for pages without sub-items
- Fix heading consumption bug in Toggle section processing
- Update test fixtures and enhance test coverage
- Achieve 100% test pass rate (13/13 tests passing)

Resolves all issues documented in TASK.md:
- Title property access with fallback to legacy "Title" property
- Pages without Sub-item relations now properly processed
- Heading titles correctly applied to subsequent Toggle sections
…vements

- Move spinner cleanup to finally blocks for proper resource management
- Replace top-level await with async IIFE for better compatibility
- Add test-environment-aware retry logic (10-50ms in tests vs 1000-4000ms in production)
- Implement pagination safety limits (10,000 pages max) to prevent infinite loops
- Enhance input validation for status extraction with type checking
- Improve optional chaining usage for cleaner code
- Add cursor validation to prevent infinite pagination loops

Test performance improvement: 19.76s → 1.52s (92% faster)
All 13 tests passing with proper error handling and robustness maintained.
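The test-environment-aware retry logic described in this commit can be sketched as a pure delay schedule. The 10-50 ms test bounds and 1000-4000 ms production bounds come from the commit message; the exponential base-2 growth is an assumption, and the function name is hypothetical:

```typescript
// Illustrative backoff schedule: NODE_ENV === "test" selects short delays so
// suites stay fast; bounds mirror the commit message (10-50ms test,
// 1000-4000ms production). Base-2 exponential growth is an assumption.
function retryDelayMs(attempt: number, nodeEnv: string | undefined): number {
  const isTest = nodeEnv === "test";
  const base = isTest ? 10 : 1000;
  const cap = isTest ? 50 : 4000;
  return Math.min(base * 2 ** attempt, cap);
}
```

Keeping the schedule pure (env passed in, no timers) is what makes it trivial to exercise with fake timers in vitest.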
- Add comprehensive URL validation to prevent empty/invalid image references
- Implement intelligent fallback system that replaces failed images with informative placeholders
- Add persistent image cache to prevent re-downloading and improve performance
- Create detailed failure logging system for manual recovery (image-failures.json)
- Enhance error handling with 5-phase processing: validate → download → cache → fallback → sanitize
- Add comprehensive statistics and progress reporting
- Ensure zero image loss - all images are either downloaded successfully or replaced with documented placeholders
- Prevent empty URLs in markdown that cause MDX compilation errors

Key improvements:
- URL validation catches empty, null, undefined, and malformed URLs before processing
- Caching system prevents re-downloading same images across runs
- Fallback placeholders preserve context with original URL for recovery
- Comprehensive logging enables manual recovery of failed images
- 100% protection against empty image references in final markdown
- Performance optimized with parallel processing and intelligent caching

All tests passing with 1.38s duration maintained.
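The URL-validation phase listed above (catching empty, null, undefined, and malformed URLs before processing) can be approximated by a small guard. A sketch under the assumption that only http(s) URLs are downloadable; the function name is hypothetical:

```typescript
// Pre-download guard: reject empty or non-string values and anything that
// does not parse as an http(s) URL.
function isDownloadableImageUrl(raw: unknown): boolean {
  if (typeof raw !== "string" || raw.trim().length === 0) return false;
  try {
    const url = new URL(raw.trim());
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // malformed URL
  }
}
```

Rejecting these values up front is what keeps empty image references out of the generated markdown, where they would otherwise break MDX compilation.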
@luandro luandro force-pushed the fix-fetch-all-align-export-with-notion-page-scoping branch from 57af855 to d0fb89e Compare September 26, 2025 21:46
@luandro luandro marked this pull request as draft September 26, 2025 21:47
@luandro luandro marked this pull request as ready for review September 26, 2025 21:48
@luandro luandro marked this pull request as draft September 26, 2025 21:51
- Fix image cache path normalization to prevent cache misses
- Ensure translation directory creation to prevent runtime errors
- Add test environment guard to graceful shutdown to prevent test runner exits
- Improve image failure logging atomicity with temp file strategy

All suggestions from PR #28 automated review have been applied.
Tests passing with enhanced error handling and robustness.
@luandro luandro marked this pull request as ready for review September 26, 2025 21:55
- Extended gracefulShutdown to return exit code when NODE_ENV=test
- Updated main() function to return exit code from gracefulShutdown
- Fixed all 11 failing test assertions to check returned exit codes
- Production behavior unchanged (still calls process.exit)
- All 15 integration tests now pass (previously 11 were failing)

Resolves issue where gracefulShutdown skipped process.exit in test mode,
leaving mocked process.exit unused and causing test expectation failures.
- Normalize cached image path format in downloadAndProcessImageWithCache
- Add safe error normalization in catch blocks for unknown error types
- Fix status comparison to use explicit string literal
- Add explicit pagination size to Notion API queries

These changes improve error handling robustness and ensure consistent
behavior across cached vs fresh image processing paths.
@luandro luandro marked this pull request as draft September 26, 2025 22:30
@luandro luandro marked this pull request as ready for review September 26, 2025 22:30
- Prevent URL mock overrides by implementing routing system in test mocks
- Fix brittle image regex matching by collecting matches before processing
- Avoid crashes when metrics are absent in index.ts
- Guard metrics when generation is skipped in fetchAll.ts
- Align image cache path resolution using consistent IMAGES_PATH constant
- Fix variable reference issue in image processing phase

These changes improve testing reliability, error handling robustness,
and image processing consistency based on pr-agent feedback.
@luandro luandro marked this pull request as draft September 27, 2025 03:51
@luandro luandro marked this pull request as ready for review September 27, 2025 03:51
@luandro luandro marked this pull request as draft September 27, 2025 04:22
- Add safety cap for image processing to prevent runaway loops (500 image limit)
- Await image failure logging promises to prevent unhandled rejections
- Add numeric guards for metrics logging to handle non-finite values
- Safely normalize content-type headers for robust format detection
- Add defensive checks for pipeline output validation with array guards
- Validate data array integrity across fetch pipeline transformations

These improvements enhance error handling, prevent resource exhaustion,
and ensure robust operation under various edge conditions.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:27
@luandro luandro marked this pull request as draft September 27, 2025 04:35
- Add pagination loop guard to prevent infinite loops on empty pages
- Fix image sanitization regex to avoid false positives on valid URLs
- Make logging atomic and resilient with proper temp file cleanup
- Add spinner state checks to prevent double-stopping issues
- Add defensive post-pipeline filtering for Remove status pages
- Ensure unhandled promise rejections are properly caught

These changes improve the reliability and robustness of the Notion
data fetching and processing pipeline based on PR review feedback.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:39
@luandro luandro marked this pull request as draft September 27, 2025 04:42
Implements several key improvements based on PR review:

- Use index-based safe replacements to avoid string replacement issues
- Harden pagination against duplicates and infinite loops
- Make logging mutex failure-safe with error recovery
- Safeguard spinner state checks for test environment compatibility
- Prevent regex backtracking and ReDoS attacks
- Sanitize error logging to avoid sensitive context leakage

These changes improve system reliability, security, and test compatibility
while maintaining existing functionality.
@luandro luandro marked this pull request as ready for review September 27, 2025 04:47
@luandro luandro marked this pull request as draft September 27, 2025 04:50
- Validate unified database ID source to properly check resolvedDatabaseId instead of just process.env.DATABASE_ID
- Add output path validation for security to prevent writing outside project directory
- Make pagination more graceful by using warnings instead of throwing errors on anomalies
- Add error handling around transform step to prevent pipeline crashes
- Remove ineffective path normalization on image file basenames
- Update test expectations to match new error messages
@luandro luandro marked this pull request as ready for review September 27, 2025 04:54
@luandro luandro marked this pull request as draft September 27, 2025 05:02
@luandro luandro marked this pull request as ready for review September 27, 2025 05:02
@luandro luandro marked this pull request as draft September 27, 2025 05:05
- Fix output path safety check using path.relative for cross-platform compatibility
- Sanitize translation keys and values to prevent corrupted code.json files
- Harden image failure logging with serialization guards and directory creation
- Honor statusFilter in final defensive guard to ensure consistent filtering
- Prevent double-ending spinners by removing redundant stop() calls
- Fix conflicting lefthook file filters by using stage_fixed instead of files
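The cross-platform containment check can be sketched with `path.relative`; `isInsideRoot` is a hypothetical name and the real helper in the PR may differ:

```typescript
import path from "node:path";

// Sketch: a candidate path is inside the root iff the relative path from
// root to candidate neither climbs out ("..") nor lands elsewhere entirely
// (absolute, e.g. another drive on Windows). This avoids brittle
// string-prefix comparisons that break on separators and "..".
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}

// POSIX examples:
console.log(isInsideRoot("/proj", "/proj/docs/en/page.md")); // → true
console.log(isInsideRoot("/proj", "/proj/../etc/passwd"));   // → false
```

Resolving both paths first normalizes `..` segments, so a traversal attempt surfaces as a relative path starting with `..` rather than slipping past a naive `startsWith(root)` test.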
@luandro marked this pull request as ready for review September 27, 2025 08:09
@luandro marked this pull request as draft September 27, 2025 09:55
- Use recorded match indices for image replacements to prevent incorrect matches
- Sanitize cached image filenames to prevent directory traversal
- Harden output path containment check with better validation
- Add retry mechanism for pagination anomalies
- Improve spinner lifecycle guarding to prevent TTY issues

Addresses high and medium impact suggestions from PR review
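The cached-filename sanitization against directory traversal can be sketched like so; `sanitizeImageFilename` and the character allowlist are illustrative, not the PR's actual code:

```typescript
import path from "node:path";

// Sketch: reduce a cached image name to a safe basename. path.basename
// strips any directory components (including "../" sequences), and the
// character filter replaces anything outside a conservative allowlist.
function sanitizeImageFilename(name: string): string {
  const base = path.basename(name);
  const cleaned = base.replace(/[^a-zA-Z0-9._-]/g, "_");
  // A name that is only dots ("." or "..") would still act as a path token.
  return /^\.+$/.test(cleaned) ? "_" : cleaned;
}

console.log(sanitizeImageFilename("../../etc/passwd"));   // → passwd
console.log(sanitizeImageFilename("hero image (1).png")); // → hero_image__1_.png
```

Taking the basename first means a hostile name from the cache can only ever produce a file inside the images directory, never climb out of it.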
@luandro marked this pull request as ready for review September 27, 2025 10:01
@luandro marked this pull request as draft September 27, 2025 10:04
- Preserve retry results during pagination anomaly handling to prevent data loss
- Update Lefthook config to use npx and scope tests to test files only
- Add --run flag to vitest commands for consistent CI behavior
- Quote specific test file paths to prevent shell expansion issues

Addresses latest review suggestions for improved reliability
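The Lefthook change can be sketched as a `lefthook.yml` fragment; the command names and globs here are illustrative, not the repository's actual hook config. `stage_fixed: true` re-stages files a command auto-fixes, which removes the need for a second, potentially conflicting `files` filter:

```yaml
# Illustrative lefthook.yml fragment (not the repo's actual config).
pre-commit:
  commands:
    lint:
      glob: "*.{ts,tsx}"
      run: npx eslint --fix {staged_files}
      stage_fixed: true   # re-stage auto-fixed files instead of a second filter
    tests:
      glob: "*.test.ts"
      run: npx vitest --run {staged_files}
```

Scoping the test command's glob to `*.test.ts` keeps hooks fast, and `--run` forces vitest into single-pass mode so a pre-commit hook never hangs in watch mode.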
@luandro marked this pull request as ready for review September 27, 2025 10:06
@luandro marked this pull request as draft September 27, 2025 10:11
@luandro marked this pull request as ready for review September 27, 2025 10:11
@github-actions (Contributor) commented:

Failed to generate code suggestions for PR

@luandro merged commit 1b3b703 into main Sep 27, 2025
5 checks passed
@luandro deleted the fix-fetch-all-align-export-with-notion-page-scoping branch February 13, 2026 13:40
Linked issue (closed by this merge): fix(fetch-all): align export with notion page scoping
