Purge dataset storage blobs in background job on delete #103
Merged
antosubash merged 6 commits into main from claude-review-background-job on Apr 11, 2026
Conversation
DeleteAsync previously soft-deleted the row but never removed the original upload, normalized GeoJSON cache, or any derivative blobs — so every deleted dataset leaked its entire storage footprint (often GBs). Add PurgeDatasetJob that runs after the soft-delete and walks the metadata to delete every referenced blob off the critical path.
The Datasets module's background jobs shell out to native GIS tools: gdal_translate/gdalwarp/ogr2ogr for rasters and non-GeoJSON vector formats, SpatiaLite for GeoPackage reads, and tippecanoe for vector→PMTiles conversion. None of these ship with the dotnet/aspnet base image.
- The runtime stage now installs gdal-bin, libsqlite3-mod-spatialite, and unzip.
- A new tippecanoe-builder stage compiles felt/tippecanoe from source (it is not in the Debian repos), and the runtime copies the six installed binaries from /opt/tippecanoe/bin/.
This adds ~300 MB to the runtime image (GDAL pulls in libproj/libgeos/libnetcdf) but is unavoidable for real GIS support.
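A sketch of the multi-stage layout this describes; the builder base image, tippecanoe build flags, and copy destination are assumptions, not the PR's exact Dockerfile:

```dockerfile
# tippecanoe-builder: compile felt/tippecanoe from source (not packaged in Debian)
FROM debian:bookworm-slim AS tippecanoe-builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential git libsqlite3-dev zlib1g-dev ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN git clone https://github.com/felt/tippecanoe.git /opt/tippecanoe-src \
    && make -C /opt/tippecanoe-src -j"$(nproc)" \
    && make -C /opt/tippecanoe-src install PREFIX=/opt/tippecanoe

# runtime: add the native GIS tools the background jobs shell out to
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
        gdal-bin libsqlite3-mod-spatialite unzip \
    && rm -rf /var/lib/apt/lists/*
COPY --from=tippecanoe-builder /opt/tippecanoe/bin/ /usr/local/bin/
```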
Split concerns so only the process that actually runs IModuleJob handlers
carries the ~300 MB of native GIS dependencies. The Host runs in
BackgroundJobs:WorkerMode=Producer and only enqueues jobs, so gdal-bin,
SpatiaLite, and tippecanoe have no business being on its image.
- Dockerfile: reverted to its original state (no GIS apt installs, no
tippecanoe-builder stage)
- Dockerfile.worker: new, builds template/SimpleModule.Worker with its
own tippecanoe-builder stage, gdal-bin / libsqlite3-mod-spatialite /
unzip in the runtime, and dotnet/runtime:10.0 base (no ASP.NET stack
needed since the Worker is a Generic Host). Installs Node in the
build stage because the ExtractDtoTypeScript / ExtractRoutes MSBuild
targets in SimpleModule.Hosting.targets shell out to `node tools/*.mjs`
after CoreCompile.
- SimpleModule.Worker.csproj: add reference to SimpleModule.Datasets so
the source generator registers ProcessDatasetJob / ConvertDatasetJob /
PurgeDatasetJob for the consumer to execute.
- docker-compose.yml: add worker service built from Dockerfile.worker,
pin api service to WorkerMode=Producer for explicitness, introduce
a shared storage_data named volume so uploaded dataset blobs written
by the Host are visible to the Worker when the background job picks
them up.
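The compose wiring described above might look like the following; service names, build contexts, and the volume mount path are assumptions for illustration:

```yaml
services:
  api:
    build: .
    environment:
      # Host only enqueues jobs; the Worker consumes them.
      BackgroundJobs__WorkerMode: Producer
    volumes:
      - storage_data:/app/storage   # mount path is an assumption

  worker:
    build:
      context: .
      dockerfile: Dockerfile.worker
    environment:
      BackgroundJobs__WorkerMode: Consumer
    volumes:
      - storage_data:/app/storage   # same volume, so blobs written by the Host are visible

volumes:
  storage_data:
```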
Validated Worker publish against a git-archive-clean source tree
(mirroring the Docker build context with no node_modules, no wwwroot) —
builds clean with only the expected "node_modules not found" warnings
and produces SimpleModule.{Worker,Datasets,BackgroundJobs,Email}.dll.
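The project reference that pulls the Datasets jobs into the Worker could be sketched as below; the relative path is a guess at the repo layout:

```xml
<!-- SimpleModule.Worker.csproj: reference the Datasets module so the source
     generator registers ProcessDatasetJob / ConvertDatasetJob / PurgeDatasetJob
     handlers in the Worker. The Include path is an assumption. -->
<ItemGroup>
  <ProjectReference Include="..\SimpleModule.Datasets\SimpleModule.Datasets.csproj" />
</ItemGroup>
```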
- PurgeDatasetJob: add .AsNoTracking() on the row load (read-only), drop the duplicate DeserializeMetadata helper in favour of inline JsonSerializer.Deserialize, collect blob paths up front, and parallelize deletes with Task.WhenAll instead of awaiting each one sequentially. This also removes the mutable deleted counter.
- DatasetsContractsService.DeleteAsync: drop the narrative comment above the EnqueueAsync call; the PurgeDatasetJob summary already explains why.
- Dockerfile.worker: drop the per-stage banner comments that just restate the FROM ... AS label. Keep the top-of-file block and the tippecanoe source-build rationale.
- docker-compose.yml: drop the apologetic "set here for visibility" comment on the worker's BackgroundJobs__WorkerMode env var.
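The shape of the refactored job can be sketched as follows. This is an illustration only: DatasetMetadata, IBlobStorage, AppDbContext, and the property names are hypothetical stand-ins for the project's real types, not its actual API.

```csharp
// Sketch under assumed types; mirrors the pattern described above:
// read-only load, inline metadata deserialization, collect-then-parallel-delete.
public sealed class PurgeDatasetJob
{
    private readonly AppDbContext _db;       // hypothetical EF Core context
    private readonly IBlobStorage _storage;  // hypothetical blob storage abstraction

    public PurgeDatasetJob(AppDbContext db, IBlobStorage storage)
        => (_db, _storage) = (db, storage);

    public async Task ExecuteAsync(Guid datasetId, CancellationToken ct)
    {
        // Read-only load: no change tracking needed for a purge.
        var row = await _db.Datasets
            .AsNoTracking()
            .IgnoreQueryFilters()  // the row is already soft-deleted
            .SingleOrDefaultAsync(d => d.Id == datasetId, ct);
        if (row is null) return;

        // Inline deserialization instead of a dedicated DeserializeMetadata helper.
        var metadata = JsonSerializer.Deserialize<DatasetMetadata>(row.MetadataJson);

        // Collect every referenced blob path up front...
        var paths = new List<string> { row.OriginalUploadPath };
        if (metadata?.NormalizedGeoJsonPath is { } geojson) paths.Add(geojson);
        if (metadata?.DerivativePaths is { } derivatives) paths.AddRange(derivatives);

        // ...then delete in parallel rather than awaiting each delete sequentially.
        await Task.WhenAll(paths.Select(p => _storage.DeleteAsync(p, ct)));
    }
}
```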
Deploying simplemodule-website with Cloudflare Pages

| Latest commit: | a3fbe55 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://aeb0c080.simplemodule-website.pages.dev |
| Branch Preview URL: | https://claude-review-background-job.simplemodule-website.pages.dev |
PR #102 moved the Dataset entity out of SimpleModule.Datasets.Entities and into SimpleModule.Datasets.Contracts alongside the other module entities. Merging it into this branch triggered a CS0234 on the now-missing namespace import in my test file; drop that using and rely on the existing SimpleModule.Datasets.Contracts import for the Dataset type.