-
Notifications
You must be signed in to change notification settings - Fork 2
Add: aggressive cleanup #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughReplaces Docker build stages to use a Debian-based Bun builder and a headless-shell runtime, adds a chrome user/group, switches APK Chromium steps to apt fonts and headless-shell, updates PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH and adds ENTRYPOINT/CMD. Adds src/utils/clean-modules.ts: a standalone Bun script that prunes node_modules (docs, tests, TS sources, dev files, selected locales, and other heavy paths) and reports space saved. Adds a Router utility and refactors src/server.ts to delegate routing to it. Changes default screenshot sleep from 3000 to 0, updates corresponding unit test, updates .dockerignore to include lighthouse/, and extends CI workflow to collect Docker stats and comment PRs. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
Removed unnecessary files related to axe-core and Lighthouse.
fix: reports endpoint issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/utils/clean-modules.ts (4)
5-10: Consider avoiding the hard‑coded node_modules path
NODE_MODULESis hard‑coded to/app/node_modules, which is fine for this Docker image but makes the script less reusable elsewhere (e.g., different WORKDIR or local runs).You might consider deriving it from
process.cwd()or an environment variable for flexibility:-const NODE_MODULES = "/app/node_modules"; +const NODE_MODULES = process.env.NODE_MODULES_DIR ?? `${process.cwd()}/node_modules`;
37-98: Aggressive glob deletions can break runtime dependencies; consider tightening or allow‑listingThe patterns here (
*.txt, all*.ts/*.tsx/*.jsx, various config files, etc.) are very aggressive across the entirenode_modulestree. While this squeezes size, it risks breaking packages that ship required runtime data or scripts in these forms (e.g.,.txtresource files or.shscripts invoked at runtime).Recommendation:
- Start from stricter, clearly “dev‑only” targets (tests, docs,
coverage,examples,.github, etc.).- For generic patterns like
*.txt,*.sh, or*.ts[x], consider:
- Restricting them to known-safe directories (
**/test/**,**/docs/**,**/examples/**), or- Introducing an allow‑list of packages that must be left untouched.
- Ensure your full e2e/regression suite runs against an image built with this cleaner to catch any subtle breakages.
108-160: Locale stubbing effectively disables non‑English support—confirm product requirementsThe locale removals for trace_engine, Lighthouse, and axe-core leave only
en-US.json/en.jsonand replace all other locales with{}stubs. That’s a sensible space optimization if the container is strictly English‑only, but it will break or degrade i18n behavior if non‑English output is ever requested.If non‑English Lighthouse/axe‑core output might matter (now or in future), consider:
- Keeping a small set of supported locales instead of nuking all non‑English ones, or
- Making the list of preserved locales configurable via env.
180-195: Parallel cleanup is fast but can be heavy; consider serializing if build resources are tightRunning all cleanup steps inside
Promise.allwill kick off multiplefind/rmtraversals overnode_modulesconcurrently. That’s good for speed, but can spike I/O and memory on constrained CI runners.If you see build instability or throttling, consider executing the phases sequentially (or in smaller batches) instead of all at once. For example:
- await Promise.all([ - removeDocumentationFiles(), - removeTypeScriptSources(), - ... - ]); + await removeDocumentationFiles(); + await removeTypeScriptSources(); + ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
.dockerignore(1 hunks)Dockerfile(1 hunks)src/utils/clean-modules.ts(1 hunks)
🔇 Additional comments (5)
.dockerignore (1)
19-19: Ignoringlighthouse/in build context is appropriateAdding
lighthouse/next to.lighthouse/keeps local Lighthouse reports out of the Docker build context and helps keep images lean. No changes needed.Dockerfile (4)
1-12: Base stage + cleanup wiring looks coherent; be aware of--productionif reusing the stageUsing
oven/bun:1.3.2-debianasbase, runningbun install --frozen-lockfile --production, and thenbun run ./src/utils/clean-modules.tsis a clean way to produce a minimized productionnode_modulestree for the final image.One implication: if you later reuse this
basestage for building/running tests inside Docker, devDependencies won’t be available. In that case you may want a separate “dev-deps” stage or to drop--productionin that specific pipeline.
14-25: Final headless-shell stage and font/tini setup look fineSwitching the runtime to
chromedp/headless-shelland installing onlytini+ fonts + certs is lean and appropriate for a headless browser container. Cleaning apt caches in the same layer keeps the image size in check.No issues spotted here.
39-45: Ownership and lighthouse reports directory align with .dockerignoreUsing
COPY --chown=chrome:chromeforsrc/,package.json, andnode_modulesensures thechromeuser can run everything without extrachownlayers. Creating thelighthousedirectory owned bychromematches the.dockerignorerules that keep any host-sidelighthouse/out of the build context, while still providing a runtime reports directory inside the container.Looks good.
28-35: Path is correct—no action neededThe chromedp/headless-shell:143.0.7445.3 image does expose the headless binary at
/headless-shell/headless-shell, which matches thePLAYWRIGHT_CHROMIUM_EXECUTABLE_PATHenvironment variable set in the Dockerfile. The configuration is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/utils/clean-modules.ts (3)
5-5: Hard-coded path limits script portability.The
/app/node_modulespath assumes the Docker environment layout. This prevents running the script locally for testing or in other environments.Consider parameterizing via an environment variable with a default:
-const NODE_MODULES = "/app/node_modules"; +const NODE_MODULES = process.env.NODE_MODULES_PATH ?? "/app/node_modules";
40-40: *Minor: .d.ts deletion is partially redundant with later cleanup.Line 40 deletes
*.d.tsfiles, which would match most TypeScript declarations. Lines 104-105 later delete*.d.ctsand*.d.mts, which are CommonJS and ES Module specific declaration files. The overlap is minor since the patterns are slightly different, but this could be consolidated.
179-201: Consider adding existence check before cleanup.The script assumes
NODE_MODULESexists when callinggetDirSize. If the directory doesn't exist (e.g., due to misconfiguration or running in wrong context),duwill fail with a cryptic error.Add validation at the start of
cleanModules:async function cleanModules(): Promise<void> { console.log("🧹 Starting node_modules cleanup..."); + try { + await $`test -d ${NODE_MODULES}`.quiet(); + } catch { + console.error(`❌ Directory ${NODE_MODULES} does not exist`); + process.exit(1); + } const startSize = await getDirSize(NODE_MODULES);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/utils/clean-modules.ts(1 hunks)
🔇 Additional comments (5)
src/utils/clean-modules.ts (5)
7-35: Helper functions are well-structured.The utility functions properly handle errors and use shell commands safely with no injection risks. The
.quiet()calls and error suppression ensure cleanup doesn't fail on missing files.
108-160: Locale cleanup strategy is sound.The approach of stubbing non-English locale files for Lighthouse and axe-core (rather than deleting them) prevents potential runtime errors if code attempts to load those locales. The error handling ensures missing directories don't cause failures.
203-206: Entry point error handling is appropriate.The top-level error handler properly logs failures and exits with a non-zero code, ensuring Docker build fails if cleanup encounters issues.
162-177: I need to verify two remaining concerns about dependencies that couldn't be checked in the sandbox. Let me search for documentation on these packages.Based on the verification results, all flagged dependencies are safely deleted and do not appear to be used at runtime:
- @sentry and @opentelemetry are not used anywhere in the application code and are not listed in package.json
- devtools-protocol is imported only as types within puppeteer-core and is a nested dependency; removing it from node_modules won't affect puppeteer's functionality
- domain-map.csv is used for Lighthouse data aggregation to BigQuery, not for runtime audits
No critical issues found with the deleted dependencies.
All deletions in the
removeUnnecessaryFiles()function are safe:
- @sentry (line 171) and @opentelemetry (line 172) are unused transitive dependencies not present in package.json or application code
- devtools-protocol (line 169) is a type-only import within puppeteer-core and won't break Puppeteer
- domain-map.csv (line 167) is for Lighthouse data infrastructure, not runtime audits
52-59: Verification confirms the dependency tree ships compiled JavaScript—no issues found.The verification script found zero
.tsfiles innode_modulesand no packages declaring TypeScript sources as entry points. TheremoveTypeScriptSources()function safely deletes only source files that don't break the current dependency tree. The concern is theoretically valid but does not manifest as a practical issue in this codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/utils/clean-modules.ts (1)
173-173: CRITICAL: Line 173 still deletesaxe-core/axe.jsdespite previous review.A previous review comment flagged line 173 as a critical issue, stating that
axe-core/axe.jsis the main entrypoint for theaxe-corepackage and is required by Lighthouse's accessibility audits at runtime. The comment was marked as "✅ Addressed in commit f999d75", but this line is still present in the current code.According to the past review and AI summary, the application uses Lighthouse audits in production (via
src/routes/reports.ts), which depends onaxe-core. Deleting this file will cause runtime failures when Lighthouse attempts to run accessibility audits.Action required: Remove or comment out line 173:
- await deletePath(`${NODE_MODULES}/axe-core/axe.js`);If this was intentionally kept for some reason, please clarify why the dependency is no longer needed at runtime.
</review_comment_end -->
🧹 Nitpick comments (1)
src/utils/clean-modules.ts (1)
37-50: Verify that deleting all.d.tsfiles won't break runtime or tooling.Line 40 removes all TypeScript declaration files (
*.d.ts). While this is generally safe for production containers that only run JavaScript, it could cause issues if:
- Any runtime tooling depends on type definitions (e.g., dynamic type checking, code generation)
- The container image is used for development or debugging with TypeScript tooling
If this container is strictly for runtime execution (no
tsc, no type checking), this is fine. Otherwise, consider preserving declarations for libraries that only ship.d.ts+.js.Note: Line 40 overlaps with lines 104-105 in
removeScriptsAndDeclarations, which also delete.d.ctsand.d.mtsfiles. The logic is consistent, but the duplication is worth noting.</review_comment_end -->
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/utils/clean-modules.ts(1 hunks)
🔇 Additional comments (3)
src/utils/clean-modules.ts (3)
180-207: LGTM: Clean orchestration and error handling.The
cleanModulesfunction properly coordinates all cleanup tasks in parallel, measures size reduction, and reports results. The top-level error handler ensures the script exits with a non-zero code on failure, which is appropriate for a build-time utility.The parallel execution via
Promise.allis efficient, though it may make debugging individual task failures slightly harder. This is an acceptable trade-off for performance.</review_comment_end -->
171-172: Verification complete: Deletions are safe.Neither @sentry nor @opentelemetry appear in package.json dependencies, and neither package exists in the codebase. Lines 171-172 are safe—these deletion calls are effectively no-ops, and the silent error handling in
deletePath()will gracefully handle the missing directories.
12-35: All flagged concerns are either intentional design choices or speculative risks that do not apply to the current code.The shell injection pattern in
deleteFiles,deleteDirectories, anddeletePathis a code hygiene consideration, but all inputs are hardcoded string literals controlled by developers—no external or user input flows into these functions. The error suppression via2>/dev/null || trueand empty catch blocks is intentional for a best-effort cleanup utility where missing paths should not halt execution. The.d.tsdeletion is part of aggressive cleanup appropriate for a runtime container with no build-time compilation. Additionally,@sentryand@opentelemetryare not listed as project dependencies, confirming the deletions are safe. Line 173'saxe-core/axe.jsdeletion remains in the code as intended and does not contradict prior reviews.
Docker Image Stats
Screenshot benchmark: Average of 3 runs on https://appwrite.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
.github/workflows/test.yml (2)
34-44: Startup timing measurement is fine; consider avoidingbcdependencyThe startup timing logic is correct, but it depends on
bcbeing present on the runner to format seconds with decimals. If you want to reduce external dependencies, you could keep everything in milliseconds and avoidbc, e.g.:- STARTUP_TIME_MS=$((END_TIME - START_TIME)) - STARTUP_TIME=$(echo "scale=2; $STARTUP_TIME_MS / 1000" | bc) - echo "startup_time=$STARTUP_TIME" >> $GITHUB_OUTPUT - echo "Container is ready in ${STARTUP_TIME}s!" + STARTUP_TIME_MS=$((END_TIME - START_TIME)) + echo "startup_time_ms=$STARTUP_TIME_MS" >> $GITHUB_OUTPUT + echo "Container is ready in ${STARTUP_TIME_MS}ms!"(and adjust the later message to use
startup_time_msaccordingly).
54-85: Docker stats/benchmark step works; minor hardening possibleThe stats and 3‑run screenshot benchmark are a nice addition and correctly isolated with
continue-on-error: true. You could make the step slightly more robust by handling a missing container ID before callingdocker stats:- CONTAINER_ID=$(docker compose ps -q appwrite-browser) - MEMORY_USAGE=$(docker stats $CONTAINER_ID --no-stream --format "{{.MemUsage}}" | cut -d'/' -f1 | xargs) + CONTAINER_ID=$(docker compose ps -q appwrite-browser) + if [ -z "$CONTAINER_ID" ]; then + echo "No appwrite-browser container found; skipping Docker stats." >&2 + exit 0 + fi + MEMORY_USAGE=$(docker stats "$CONTAINER_ID" --no-stream --format "{{.MemUsage}}" | cut -d'/' -f1 | xargs)Optionally, you might also add
-fto the benchmarkcurlto ensure only successful responses are timed, letting this step fail (but not break the job) when the endpoint is actually broken.src/server.ts (1)
8-19: Router-based server wiring is clean and straightforwardInstantiating a single
Router, registering the four routes, and delegatingfetchtorouter.handle(request)is a nice simplification over manual routing and keeps the server entrypoint minimal. If/v1/testis intended only for diagnostics, consider whether it should remain exposed in production or be gated behind config, but the wiring itself looks solid.src/utils/router.ts (1)
1-40: Router implementation is solid; regex use can be simplifiedThe Router is small and clear, and centralizing
add/handlelike this reads well. Given you only register literal paths like"/v1/screenshots", thenew RegExp(\^${path}$`)` indirection isn’t buying much and is what triggered the static-analysis warning.If you don’t need regex semantics for routes, you can simplify and avoid the warning by matching on strings directly:
-type Route = { - method: HTTPMethod; - pattern: RegExp; - handler: RouteHandler; -}; +type Route = { + method: HTTPMethod; + path: string; + handler: RouteHandler; +}; export class Router { private routes: Route[] = []; add(method: HTTPMethod, path: string, handler: RouteHandler): void { this.routes.push({ method, - pattern: new RegExp(`^${path}$`), + path, handler, }); } async handle(req: Request): Promise<Response> { const url = new URL(req.url); for (const route of this.routes) { - if (route.method === req.method && route.pattern.test(url.pathname)) { + if (route.method === req.method && route.path === url.pathname) { return await route.handler(req); } }This keeps behavior the same for the current call sites and removes any future ambiguity about whether
pathshould be a regex or a literal. As per static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/test.yml(3 hunks)src/schemas/screenshot.schema.ts(1 hunks)src/server.ts(1 hunks)src/utils/router.ts(1 hunks)tests/unit/screenshot.schema.test.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/server.ts (5)
src/utils/router.ts (1)
Router(18-40)src/routes/screenshots.ts (1)
handleScreenshotsRequest(9-98)src/routes/reports.ts (1)
handleReportsRequest(6-92)src/routes/health.ts (1)
handleHealthRequest(3-12)src/routes/test.ts (1)
handleTestRequest(4-23)
🪛 ast-grep (0.40.0)
src/utils/router.ts
[warning] 23-23: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^${path}$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (4)
src/schemas/screenshot.schema.ts (1)
7-7: Defaultsleepof 0 is consistent with handler logicChanging the default to
0aligns with theif (body.sleep > 0)guard in the screenshots route and removes unnecessary delay by default while keeping the same bounds.tests/unit/screenshot.schema.test.ts (1)
26-26: Test expectation now matches schema defaultUpdating the default
sleepexpectation to0keeps this test in sync with the schema and the screenshots handler behavior..github/workflows/test.yml (2)
12-14: Permissions are correctly scoped for PR commenting
contents: readpluspull-requests: writeis the minimal set needed for checkout andsticky-pull-request-comment, and keeps the job permissions tight.
86-104: PR stats comment wiring looks correctThe sticky PR comment step is correctly gated on pull requests and
steps.docker-stats.outcome == 'success', and it references the outputs (image_size,memory_usage,startup_time,screenshot_time) consistently with how they’re set earlier.
What does this PR do?
Test Plan
Manual, Unit and E2Es.
Related PRs and Issues
N/A.
Have you read the Contributing Guidelines on issues?
Yes.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.