Purge dataset storage blobs in background job on delete #103

Merged
antosubash merged 6 commits into main from claude/review-background-jobs-OJCOG
Apr 11, 2026

@antosubash
Owner

DeleteAsync previously soft-deleted the row but never removed the
original upload, normalized GeoJSON cache, or any derivative blobs —
so every deleted dataset leaked its entire storage footprint (often
GBs). Add PurgeDatasetJob that runs after the soft-delete and walks
the metadata to delete every referenced blob off the critical path.
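
The flow described above (soft-delete first, then hand cleanup to a background job) might look roughly like the sketch below. It is written in Python purely for illustration; the PR's code is C#, and the `Dataset` fields, the `JobQueue` class, and the `delete_dataset` function are hypothetical stand-ins, not the module's real types.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for the Dataset row; field names are assumptions."""
    id: int
    is_deleted: bool = False


@dataclass
class JobQueue:
    """Toy in-memory queue standing in for the real background-job producer."""
    pending: list = field(default_factory=list)

    async def enqueue(self, job_name: str, dataset_id: int) -> None:
        self.pending.append((job_name, dataset_id))


async def delete_dataset(dataset: Dataset, queue: JobQueue) -> None:
    # Step 1: soft-delete the row so the dataset disappears for callers at once.
    dataset.is_deleted = True
    # Step 2: hand blob cleanup to a background job instead of deleting inline,
    # which keeps slow storage I/O off the request's critical path.
    await queue.enqueue("PurgeDatasetJob", dataset.id)


if __name__ == "__main__":
    ds, q = Dataset(id=42), JobQueue()
    asyncio.run(delete_dataset(ds, q))
    print(ds.is_deleted, q.pending)
```

The point of the ordering is that the request returns as soon as the row is flagged and the job is queued; the blobs are removed later by whichever process consumes the queue.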

claude and others added 5 commits April 11, 2026 06:47
The Datasets module's background jobs shell out to native GIS tools:
gdal_translate/gdalwarp/ogr2ogr for rasters + non-GeoJSON vector
formats, SpatiaLite for GeoPackage reads, and tippecanoe for
vector→PMTiles. None of these ship with the dotnet/aspnet base image.

- Runtime stage now installs gdal-bin, libsqlite3-mod-spatialite, unzip
- New tippecanoe-builder stage compiles felt/tippecanoe from source
  (not in Debian repos) and the runtime copies the 6 installed binaries
  from /opt/tippecanoe/bin/

This adds ~300 MB to the runtime image (GDAL pulls in libproj, libgeos,
and libnetcdf), but that is unavoidable for real GIS support.

Split concerns so only the process that actually runs IModuleJob handlers
carries the ~300 MB of native GIS dependencies. The Host runs in
BackgroundJobs:WorkerMode=Producer and only enqueues jobs, so gdal-bin,
SpatiaLite, and tippecanoe have no business being on its image.

- Dockerfile: reverted to its original state (no GIS apt installs, no
  tippecanoe-builder stage)
- Dockerfile.worker: new, builds template/SimpleModule.Worker with its
  own tippecanoe-builder stage, gdal-bin / libsqlite3-mod-spatialite /
  unzip in the runtime, and dotnet/runtime:10.0 base (no ASP.NET stack
  needed since the Worker is a Generic Host). Installs Node in the
  build stage because the ExtractDtoTypeScript / ExtractRoutes MSBuild
  targets in SimpleModule.Hosting.targets shell out to `node tools/*.mjs`
  after CoreCompile.
- SimpleModule.Worker.csproj: add reference to SimpleModule.Datasets so
  the source generator registers ProcessDatasetJob / ConvertDatasetJob /
  PurgeDatasetJob for the consumer to execute.
- docker-compose.yml: add worker service built from Dockerfile.worker,
  pin api service to WorkerMode=Producer for explicitness, introduce
  a shared storage_data named volume so uploaded dataset blobs written
  by the Host are visible to the Worker when the background job picks
  them up.
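
As a rough illustration of the producer/consumer split described in this commit, the toy sketch below gates job execution on a worker mode. It is Python rather than the project's C#, and everything in it is an assumption made for illustration: `JobBus`, `drain`, and the `Consumer` mode name do not come from the PR (only `Producer` is named there).

```python
from collections import deque
from enum import Enum


class WorkerMode(Enum):
    PRODUCER = "Producer"   # named in the PR: the Host only enqueues
    CONSUMER = "Consumer"   # assumed name for the mode that executes handlers


class JobBus:
    """Toy shared queue standing in for the real job store."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def register(self, job_name, handler):
        self.handlers[job_name] = handler

    def enqueue(self, job_name, payload):
        self.queue.append((job_name, payload))

    def drain(self, mode: WorkerMode):
        """Run queued jobs, but only in a process configured as a consumer."""
        if mode is not WorkerMode.CONSUMER:
            return []  # a producer never executes handlers
        results = []
        while self.queue:
            job_name, payload = self.queue.popleft()
            results.append(self.handlers[job_name](payload))
        return results
```

Because a producer never runs handlers, it never invokes the native GIS tools, which is why only the Worker image needs them.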

Validated Worker publish against a git-archive-clean source tree
(mirroring the Docker build context with no node_modules, no wwwroot) —
builds clean with only the expected "node_modules not found" warnings
and produces SimpleModule.{Worker,Datasets,BackgroundJobs,Email}.dll.

- PurgeDatasetJob: add .AsNoTracking() on the row load (read-only), drop
  the duplicate DeserializeMetadata helper in favour of inline
  JsonSerializer.Deserialize, collect blob paths up front and parallelize
  deletes with Task.WhenAll instead of awaiting each one sequentially.
  Removes the mutable deleted counter in the process.
- DatasetsContractsService.DeleteAsync: drop the narrative comment above
  the EnqueueAsync call — the PurgeDatasetJob summary already explains why.
- Dockerfile.worker: drop per-stage banner comments that just restate the
  FROM ... AS label. Keep the top-of-file block and the tippecanoe
  source-build rationale.
- docker-compose.yml: drop the apologetic "set here for visibility"
  comment on the worker's BackgroundJobs__WorkerMode env var.
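
The PurgeDatasetJob change in the first bullet above (collect blob paths up front, then delete in parallel) can be sketched as follows. This is a Python illustration, not the C# from the PR: `asyncio.gather` plays the role of `Task.WhenAll`, and `InMemoryStorage`, `collect_blob_paths`, and the metadata keys `normalizedPath` and `derivatives` are assumed names for the sake of the example.

```python
import asyncio
import json


class InMemoryStorage:
    """Hypothetical blob store; the real storage provider API is not shown in this PR."""

    def __init__(self, blobs):
        self.blobs = set(blobs)

    async def delete(self, path: str) -> bool:
        await asyncio.sleep(0)  # yield control, as real storage I/O would
        existed = path in self.blobs
        self.blobs.discard(path)
        return existed


def collect_blob_paths(original_path, metadata_json):
    """Gather every blob path referenced by a dataset row before deleting anything."""
    metadata = json.loads(metadata_json) if metadata_json else {}
    paths = [original_path, metadata.get("normalizedPath")]
    paths.extend(metadata.get("derivatives", []))
    # Drop missing entries and duplicates while keeping a stable order.
    return list(dict.fromkeys(p for p in paths if p))


async def purge(storage, original_path, metadata_json) -> int:
    paths = collect_blob_paths(original_path, metadata_json)
    # Issue all deletes concurrently (asyncio.gather plays the role of Task.WhenAll).
    results = await asyncio.gather(*(storage.delete(p) for p in paths))
    # Count successes from the results rather than mutating a shared counter.
    return sum(results)
```

Deriving the count from the gathered results is what makes a mutable counter unnecessary once the deletes no longer run one after another.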
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 11, 2026

Deploying simplemodule-website with Cloudflare Pages

Latest commit: a3fbe55
Status: ✅  Deploy successful!
Preview URL: https://aeb0c080.simplemodule-website.pages.dev
Branch Preview URL: https://claude-review-background-job.simplemodule-website.pages.dev

PR #102 moved the Dataset entity out of SimpleModule.Datasets.Entities
and into SimpleModule.Datasets.Contracts alongside the other module
entities. Merging that change into this branch broke the build with
CS0234 on the now-missing namespace import in my test file. Drop the
using and rely on the existing SimpleModule.Datasets.Contracts import
for the Dataset type.
@antosubash merged commit 88ed99a into main Apr 11, 2026
5 checks passed
@antosubash deleted the claude/review-background-jobs-OJCOG branch April 11, 2026 20:17