
Drop legacy task-html storage bucket#391

Merged: simonsmallchua merged 3 commits into main from work/drop-task-html-bucket on May 21, 2026
Conversation

@simonsmallchua
Contributor

@simonsmallchua simonsmallchua commented May 21, 2026

Summary

  • Adds migration supabase/migrations/20260521000000_drop_task_html_bucket.sql which removes the service-role RLS policy on storage.objects for bucket_id = 'task-html' and deletes the bucket row from storage.buckets.
  • Updates CHANGELOG.md (Unreleased) with the rationale and operational precondition.
  • Marks section 11 of docs/plans/page-content-storage-plan.md as historical/superseded.

Why

Page HTML has been written directly to Cloudflare R2 since 2026-04-25 (issue #332). No Go code path still references the task-html Supabase Storage bucket, but the bucket retained the objects written during the four-week window when it was the hot store. Those accumulated bytes pushed the Supabase project past its 100 GB storage allowance, which triggered Supabase's connection-slot restriction. The restriction surfaced in Sentry (HOVER-JG) as pgconn.ConnectError: FATAL: remaining connection slots are reserved for roles with the SUPERUSER attribute.

Operational precondition (manual step before applying migration)

Bucket contents must be cleared via the Supabase Storage dashboard or API before this migration is applied. The foreign key from storage.objects to storage.buckets will block the migration otherwise — that is the intended safety net.
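
A pre-flight check along these lines could confirm the bucket is empty before the migration runs. It is a sketch that assumes the standard Supabase storage schema, where each row in storage.objects carries its bucket_id, and it needs a role that can read the storage schema.

```sql
-- Sketch: count objects still stored in the legacy bucket.
-- Expect 0 before applying the migration.
SELECT count(*) AS remaining_objects
FROM storage.objects
WHERE bucket_id = 'task-html';
```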

What this PR does NOT do

  • Does not delete bucket contents (manual, owner-driven step above).
  • Does not drop tasks.html_storage_* columns. Historical rows still reference them and a follow-up PR will review removal once retrieval flows are confirmed unused.
  • Does not touch the archive sweep or any R2 code paths.

Test plan

  • Confirm via the per-bucket size query that task-html is the bucket consuming the overage (already inspected; objects total ~117 GB).
  • Empty the task-html bucket from the Supabase Storage dashboard.
  • Apply the migration in a non-production environment first; verify it runs cleanly and that SELECT * FROM storage.buckets WHERE id = 'task-html' returns zero rows afterwards.
  • Confirm no new errors appear in Sentry HOVER-JG once the connection-slot restriction lifts (Supabase usage refresh is up to one hour).
  • Apply in production.
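
The size check in the first step and the verification in the third step could be sketched as follows. The per-bucket query assumes the standard Supabase storage schema, in which each storage.objects row records its byte size under the size key of the metadata JSON column. The exact query the author ran is not shown in this PR.

```sql
-- Sketch: total stored bytes per bucket, largest first.
-- Assumes storage.objects.metadata holds a numeric 'size' key (bytes).
SELECT bucket_id,
       count(*)                                         AS object_count,
       sum((metadata->>'size')::bigint)                 AS total_bytes,
       pg_size_pretty(sum((metadata->>'size')::bigint)) AS total_pretty
FROM storage.objects
GROUP BY bucket_id
ORDER BY total_bytes DESC NULLS LAST;

-- Once the bucket row has been removed, this should return zero rows.
SELECT id, name
FROM storage.buckets
WHERE id = 'task-html';
```

Casting to bigint rather than int keeps the cast from overflowing on any single object larger than 2 GiB.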


Summary by CodeRabbit

  • Documentation

    • Marked the legacy storage as historical, removed guidance for its use, and clarified that per-task page HTML now uses private Cloudflare R2; updated changelog with migration notes.
  • Chores

    • Removed legacy storage access policy and cleared dangling references so tasks now rely on Cloudflare R2; preserved HTML metadata for historical analysis and documented the manual cleanup step.

@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb562376-ef72-42da-bf1a-d20bc1d6db37

📥 Commits

Reviewing files that changed from the base of the PR and between 2c20949 and 77ff653.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • docs/plans/page-content-storage-plan.md
  • supabase/migrations/20260521000000_drop_task_html_bucket.sql
✅ Files skipped from review due to trivial changes (1)
  • docs/plans/page-content-storage-plan.md

📝 Walkthrough

Walkthrough

This PR clears dangling task-html pointers on tasks, drops the legacy service-role RLS policy on storage.objects, and updates CHANGELOG and the page-content storage plan to mark task-html as historical and document manual bucket-deletion and data-cleanup details.

Changes

Storage Bucket Cleanup

Layer / File(s) | Summary
Clear dangling task-html pointers (supabase/migrations/20260521010000_clear_task_html_pointers.sql) | Updates tasks rows where html_storage_bucket = 'task-html', setting html_storage_bucket and html_storage_path to NULL while preserving other HTML metadata fields for historical analysis.
Drop service-role storage.objects policy (supabase/migrations/20260521000000_drop_task_html_bucket.sql) | Migration drops the "Service role can manage task html" RLS policy on storage.objects; the migration notes that the storage.buckets row is not removed by SQL and requires manual deletion after emptying the bucket.
Changelog and plan doc updates (CHANGELOG.md, docs/plans/page-content-storage-plan.md) | Adds an Unreleased Changed entry documenting retirement of task-html in favor of Cloudflare R2 and updates the Suggested Bucket Setup to mark task-html as historical/superseded and point to ARCHIVE_BUCKET usage.
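
Based on the summaries in the rows above, the two migrations might look roughly like this. The statements are reconstructed from the descriptions in this thread rather than copied from the merged files, so treat the exact SQL as an assumption.

```sql
-- supabase/migrations/20260521000000_drop_task_html_bucket.sql (sketch)
-- Drop the legacy service-role policy. The bucket row itself is removed
-- manually after the bucket has been emptied.
DROP POLICY IF EXISTS "Service role can manage task html" ON storage.objects;

-- supabase/migrations/20260521010000_clear_task_html_pointers.sql (sketch)
-- Null out pointers into the retired bucket; other html_* metadata stays.
UPDATE tasks
SET html_storage_bucket = NULL,
    html_storage_path   = NULL
WHERE html_storage_bucket = 'task-html';
```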

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Good-Native/hover#312: Both PRs touch tasks.html_storage_*—this PR clears pointers to task-html, while #312 adds partial indexes on tasks.html_storage_path for archive/maintenance queries.

Poem

🐰 I hopped through rows and nulled a name,
The old bucket slept while R2 took aim.
Policies dropped with a gentle bite,
Docs updated by lantern light.
A tidy burrow — storage set right.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: dropping the legacy task-html storage bucket. It is concise, clear, and directly reflects the primary objective documented in the PR objectives.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@github-actions
Contributor

github-actions Bot commented May 21, 2026

Release Versions

App patch: v0.34.15 → v0.34.16

Changelog

Changed

  • Retired the legacy task-html Supabase Storage bucket. Page HTML has been
    written directly to Cloudflare R2 since 2026-04-25, so the bucket was no
    longer referenced by any code path but had retained the objects written during
    the four-week window when it was the hot store. The accumulated bytes pushed
    the Supabase project past its 100 GB allowance and triggered connection-slot
    restrictions on the pooler, surfacing as pgconn.ConnectError events in
    Sentry (HOVER-JG). The migration drops only the service-role RLS policy on
    storage.objects. Removal of the bucket row itself cannot be done via SQL
    (Supabase blocks direct deletes from storage.buckets with SQLSTATE 42501)
    and must be performed via the Supabase Storage dashboard or API as a manual
    operational step, after the bucket has been emptied.
  • Cleared dangling task-html pointers on the tasks table. Rows written
    between 2026-03-21 and 2026-04-25 had html_storage_bucket = 'task-html' and
    a html_storage_path referencing the now-removed bucket. Both columns are
    NULLed for those rows; the remaining HTML metadata columns
    (html_content_type, html_content_encoding, html_size_bytes,
    html_compressed_size_bytes, html_sha256, html_captured_at) are kept for
    historical analysis. The html_storage_* columns remain in active use for
    newer rows, which point at the Cloudflare R2 bucket.
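
One way to check the second entry after the pointer-clearing migration has run is sketched below. The column names come from the entry itself. The assumptions are that tasks lives in the default schema and that html_captured_at is a timestamp that roughly tracks when the row's HTML was written.

```sql
-- Should return 0: no task row still points at the retired bucket.
SELECT count(*) AS dangling_pointers
FROM tasks
WHERE html_storage_bucket = 'task-html';

-- Rows from the legacy window whose pointers were cleared but whose
-- metadata was kept for historical analysis.
SELECT count(*) AS cleared_rows_with_metadata
FROM tasks
WHERE html_storage_bucket IS NULL
  AND html_storage_path IS NULL
  AND html_sha256 IS NOT NULL
  AND html_captured_at >= DATE '2026-03-21'
  AND html_captured_at <  DATE '2026-04-26';
```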

@supabase

supabase Bot commented May 21, 2026

Updates to Preview Branch (work/drop-task-html-bucket) ↗︎

Deployments Status Updated
Database Thu, 21 May 2026 08:32:43 UTC
Services Thu, 21 May 2026 08:32:43 UTC
APIs Thu, 21 May 2026 08:32:43 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Thu, 21 May 2026 08:32:44 UTC
Migrations Thu, 21 May 2026 08:32:47 UTC
Seeding Thu, 21 May 2026 08:32:54 UTC
Edge Functions Thu, 21 May 2026 08:32:57 UTC


@codecov

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
docs/plans/page-content-storage-plan.md (1)

196-199: ⚡ Quick win

Do a consistency pass so this plan has one canonical storage path.

This section now correctly says R2 is canonical, but other sections still describe Supabase Storage as the active upload target. Please align the remaining sections to avoid contradictory operator guidance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/plans/page-content-storage-plan.md` around lines 196 - 199, Several
sections still describe Supabase Storage (the `task-html` bucket) as the active
upload target while the plan's canonical storage is Cloudflare R2 via
`ARCHIVE_BUCKET` (production: `native-hover-archive`); update the document so R2
is the single source of truth. Locate all references to `task-html`, "Supabase
Storage", or instructions that tell operators to upload/read page HTML from
Supabase and replace them with the R2 workflow: mention `ARCHIVE_BUCKET` as the
configured env var, update examples and CLI/config snippets to use the R2
path/keys, remove any "active upload target" language for Supabase or mark it
explicitly as historical/superseded, and ensure the "## 3. Recommended Approach"
section and any upload/read procedures are consistent with the R2 design.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 34: Replace the typo phrase "written direct to Cloudflare R2" with
"written directly to Cloudflare R2" in the CHANGELOG entry so the wording reads
correctly; search for the exact string "written direct to Cloudflare R2" and
update it to "written directly to Cloudflare R2".

In `@docs/plans/page-content-storage-plan.md`:
- Around line 201-204: The superseded/historical section still lists
active-sounding bucket settings—remove the stale bullets referencing the
`task-html` bucket (bucket name: `task-html`, visibility: private, access:
signed URLs only) or convert them to an explicit archived note; update the
`task-html` bullet list to either delete these entries or prefix them with a
clear "Archived / Historical" label and brief context so readers know they are
no longer current guidance.

In `@supabase/migrations/20260521000000_drop_task_html_bucket.sql`:
- Line 17: The migration file 20260521000000_drop_task_html_bucket.sql contains
a destructive statement "DELETE FROM storage.buckets WHERE id = 'task-html';"
which violates the additive-only migration rule; replace this destructive delete
with a non-destructive deprecation step (e.g., update a bucket metadata/flags
column or insert a deprecation record) or, if deletion is required, add an
explicit approved-exception record and reference it in the migration (capture
approval id or ticket) before merging; locate the DELETE statement in
20260521000000_drop_task_html_bucket.sql and either change it to an
UPDATE/INSERT that marks the 'task-html' bucket as deprecated/inactive or add
the documented approved-exception metadata so the destructive delete is
explicitly allowed.
- Around line 15-17: Wrap the two statements — the DROP POLICY IF EXISTS
"Service role can manage task html" ON storage.objects and DELETE FROM
storage.buckets WHERE id = 'task-html' — in an explicit transaction so they run
atomically; open a transaction (e.g., BEGIN or START TRANSACTION), run both
statements, and COMMIT at the end (so if the DELETE fails the DROP is rolled
back), ensuring proper semicolons and error-safe transactional semantics.
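
The transaction wrapping suggested in the last comment would look roughly like the sketch below, applied to the revision under review. It is illustrative only: the changelog in this thread records that Supabase rejects direct deletes from storage.buckets with SQLSTATE 42501, and whether the migration runner already wraps each file in a transaction is not stated here.

```sql
BEGIN;

-- Remove the legacy service-role policy.
DROP POLICY IF EXISTS "Service role can manage task html" ON storage.objects;

-- Remove the bucket row; if this fails, the policy drop is rolled back too.
DELETE FROM storage.buckets WHERE id = 'task-html';

COMMIT;
```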

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bffa4f2c-d987-448c-8e3f-c84dbfbf3192

📥 Commits

Reviewing files that changed from the base of the PR and between 598f771 and dbe969e.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • docs/plans/page-content-storage-plan.md
  • supabase/migrations/20260521000000_drop_task_html_bucket.sql

Comment thread CHANGELOG.md Outdated
Comment thread docs/plans/page-content-storage-plan.md Outdated
Comment thread supabase/migrations/20260521000000_drop_task_html_bucket.sql Outdated
Comment thread supabase/migrations/20260521000000_drop_task_html_bucket.sql Outdated
@github-actions
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-391.fly.dev
Dashboard: https://hover-pr-391.fly.dev/dashboard

@simonsmallchua simonsmallchua merged commit d3faaf5 into main May 21, 2026
20 of 21 checks passed
@simonsmallchua simonsmallchua deleted the work/drop-task-html-bucket branch May 21, 2026 08:38
simonsmallchua added a commit that referenced this pull request May 21, 2026