fix: prevent excessive disk usage from repo backups #235

Merged: arunavo4 merged 2 commits into main from fix/234-disk-space-backup-retention on Mar 18, 2026

@arunavo4
Collaborator

Summary

Fixes #234 — repo-backups directory growing to 17GB+ due to a backward-compatibility trap where legacy configs silently resolve to "always backup" mode.

  • Fix backward-compat trap: backupBeforeSync: true (the mapper default) was silently mapping to "always" in resolveBackupStrategy, creating full git bundles on every sync. Now maps to "on-force-push" (smart backup) instead.
  • Fix UI round-trip regression: Legacy backupBeforeSync: false configs were being overwritten to "on-force-push" after any auto-save. mapDbToUiConfig now correctly preserves the "disabled" state.
  • Lower default retention: Reduced backupRetentionCount from 20 to 5 bundles per repo across all layers (schema, defaults, mapper, UI).
  • Add time-based retention: New backupRetentionDays field (default 30 days) deletes old bundles alongside count-based retention, with a safety net to always keep at least 1 bundle.
  • Expose retention days in UI: Added "Snapshot retention days" input to the backup settings form (was documented but missing from UI).
  • Add UI warning: "Always Backup" option now shows "(high disk usage)" in the description.
  • Update docs: FORCE_PUSH_PROTECTION.md updated with new defaults and backward-compat table.
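
The combined count-based and time-based retention described above can be sketched as follows. This is a minimal illustration, not the project's actual pruning code: the `BundleInfo` shape and the `selectBundlesToDelete` helper are hypothetical names, and it assumes a bundle is deleted when it exceeds either limit, with the newest bundle always kept.

```typescript
// Hypothetical shape of a stored bundle; the real project may track this differently.
interface BundleInfo {
  path: string;
  createdAt: Date;
}

// Returns the bundles that would be deleted under the assumed policy.
function selectBundlesToDelete(
  bundles: BundleInfo[],
  retentionCount = 5, // default backupRetentionCount from this PR
  retentionDays = 30, // default backupRetentionDays from this PR
  now: Date = new Date(),
): BundleInfo[] {
  // Newest first, so index 0 is the bundle the safety net protects.
  const sorted = [...bundles].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime(),
  );
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;

  return sorted.filter((bundle, index) => {
    if (index === 0) return false; // always keep at least 1 bundle
    const overCount = index >= retentionCount; // beyond the newest N
    const tooOld = bundle.createdAt.getTime() < cutoff; // older than N days
    return overCount || tooOld;
  });
}
```

Under these assumptions, a repository whose bundles are all older than 30 days still retains its most recent bundle.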

Test plan

  • All 171 tests pass (2 test assertions updated for new behavior)
  • Production build succeeds
  • Verified legacy backupBeforeSync: false round-trip with direct mapper repro
  • Verified all 4 legacy config scenarios map correctly:
    • backupBeforeSync: false + no strategy → "disabled"
    • backupBeforeSync: true + no strategy → "on-force-push"
    • Explicit backupStrategy → preserved as-is
    • No fields at all → "on-force-push" (default)
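
The four scenarios above can be expressed as a small resolution function. This is a sketch under stated assumptions rather than the project's `resolveBackupStrategy`: the function name `resolveBackupStrategySketch` and the `LegacyBackupConfig` type are hypothetical, and only the three strategy values mentioned in this PR are modelled.

```typescript
// Only the strategies named in this PR; the real union may contain more values.
type BackupStrategy = "disabled" | "always" | "on-force-push";

// Hypothetical view of a stored config that may predate backupStrategy.
interface LegacyBackupConfig {
  backupStrategy?: BackupStrategy;
  backupBeforeSync?: boolean;
}

function resolveBackupStrategySketch(config: LegacyBackupConfig): BackupStrategy {
  // An explicit strategy is preserved as-is.
  if (config.backupStrategy !== undefined) return config.backupStrategy;
  // Legacy opt-out stays disabled.
  if (config.backupBeforeSync === false) return "disabled";
  // Legacy `true` and "no fields at all" both fall back to smart backup,
  // never to "always".
  return "on-force-push";
}
```

With this ordering, "always" can only result from an explicit `backupStrategy`, which is the behavior the fix aims for.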

Legacy configs with backupBeforeSync: true but no explicit backupStrategy
silently resolved to "always", creating full git bundles on every sync
cycle. This caused repo-backups to grow to 17GB+ for users with many
repositories.

Changes:
- Fix resolveBackupStrategy to map backupBeforeSync: true → "on-force-push"
  instead of "always", so legacy configs only back up when a force-push is detected
- Fix config mapper to always set backupStrategy explicitly ("on-force-push"),
  preventing the backward-compat fallback from triggering
- Lower default backupRetentionCount from 20 to 5 bundles per repo
- Add time-based retention (backupRetentionDays, default 30 days) alongside
  count-based retention, with safety net to always keep at least 1 bundle
- Add "high disk usage" warning on "Always Backup" UI option
- Update docs and tests to reflect new defaults and behavior
…se retention days

P1: mapDbToUiConfig now checks backupBeforeSync === false before
defaulting backupStrategy, preventing legacy "disabled" configs from
silently becoming "on-force-push" after any auto-save round-trip.
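
The round-trip in P1 can be illustrated with a pair of mappers. This is a hedged sketch, not the project's `mapDbToUiConfig`: both function names, the config types, and the behavior of the UI-to-DB direction are assumptions made for illustration.

```typescript
type BackupStrategy = "disabled" | "always" | "on-force-push";

// Hypothetical DB and UI shapes, reduced to the backup fields.
interface DbBackupConfig {
  backupStrategy?: BackupStrategy;
  backupBeforeSync?: boolean;
}
interface UiBackupConfig {
  backupStrategy: BackupStrategy;
}

// DB -> UI: check the legacy flag before applying the default.
function mapDbToUiBackupSketch(db: DbBackupConfig): UiBackupConfig {
  if (db.backupStrategy !== undefined) return { backupStrategy: db.backupStrategy };
  return {
    backupStrategy: db.backupBeforeSync === false ? "disabled" : "on-force-push",
  };
}

// UI -> DB: assumed to always write an explicit strategy on auto-save.
function mapUiToDbBackupSketch(ui: UiBackupConfig): DbBackupConfig {
  return {
    backupStrategy: ui.backupStrategy,
    backupBeforeSync: ui.backupStrategy !== "disabled",
  };
}

// Simulated auto-save of a legacy "disabled" config.
const legacy: DbBackupConfig = { backupBeforeSync: false };
const saved = mapUiToDbBackupSketch(mapDbToUiBackupSketch(legacy));
```

In this sketch `saved.backupStrategy` remains "disabled" after the simulated auto-save instead of being rewritten to "on-force-push".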

P3: Added "Snapshot retention days" input field to the backup settings
UI, matching the documented setting in FORCE_PUSH_PROTECTION.md.
@cloudflare-workers-and-pages

Deploying gitea-mirror-website with Cloudflare Pages

Latest commit: 67e085f
Status: ✅  Deploy successful!
Preview URL: https://85fe8e95.gitea-mirror-website.pages.dev
Branch Preview URL: https://fix-234-disk-space-backup-re.gitea-mirror-website.pages.dev

@github-actions
Contributor

🐳 Docker Image Built Successfully

Your PR image is available for testing:

Image Tag: pr-235
Full Image Path: ghcr.io/raylabshq/gitea-mirror:pr-235

Pull and Test

docker pull ghcr.io/raylabshq/gitea-mirror:pr-235
docker run -d \
  -p 4321:4321 \
  -e BETTER_AUTH_SECRET=your-secret-here \
  -e BETTER_AUTH_URL=http://localhost:4321 \
  --name gitea-mirror-test ghcr.io/raylabshq/gitea-mirror:pr-235

Docker Compose Testing

services:
  gitea-mirror:
    image: ghcr.io/raylabshq/gitea-mirror:pr-235
    ports:
      - "4321:4321"
    environment:
      - BETTER_AUTH_SECRET=your-secret-here
      - BETTER_AUTH_URL=http://localhost:4321
      - BETTER_AUTH_TRUSTED_ORIGINS=http://localhost:4321

💡 Note: PR images are tagged as pr-<number> and built for both linux/amd64 and linux/arm64.
Production images (latest, version tags) use the same multi-platform set.


📦 View in GitHub Packages

@github-actions
Contributor

🔍 Vulnerabilities of gitea-mirror:scan

📦 Image Reference gitea-mirror:scan
digest: sha256:416159814b999bec1fa112a4f01db206745af3c136e6fd7573049e9c28db5253
vulnerabilities: critical: 0 high: 4 medium: 0 low: 0
platform: linux/amd64
size: 283 MB
packages: 800
📦 Base Image debian:trixie
digest: sha256:13f29b6806e531c3ff3b565bb6eed73f2132506c8c9d41bb996065ca20fb27f2
vulnerabilities: critical: 0 high: 3 medium: 1 low: 24
critical: 0 high: 3 medium: 0 low: 0 glibc 2.41-12+deb13u1 (deb)

pkg:deb/debian/glibc@2.41-12%2Bdeb13u1?os_distro=trixie&os_name=debian&os_version=13

high: CVE-2026-0861

Affected range: <2.41-12+deb13u2
Fixed version: 2.41-12+deb13u2
EPSS Score: 0.008%
EPSS Percentile: 1st percentile
Description

Passing too large an alignment to the memalign suite of functions (memalign, posix_memalign, aligned_alloc) in the GNU C Library version 2.30 to 2.42 may result in an integer overflow, which could consequently result in a heap corruption. Note that the attacker must have control over both, the size as well as the alignment arguments of the memalign function to be able to exploit this. The size parameter must be close enough to PTRDIFF_MAX so as to overflow size_t along with the large alignment argument. This limits the malicious inputs for the alignment for memalign to the range [1<<62+ 1, 1<<63] and exactly 1<<63 for posix_memalign and aligned_alloc. Typically the alignment argument passed to such functions is a known constrained quantity (e.g. page size, block size, struct sizes) and is not attacker controlled, because of which this may not be easily exploitable in practice. An application bug could potentially result in the input alignment being too large, e.g. due to a different buffer overflow or integer overflow in the application or its dependent libraries, but that is again an uncommon usage pattern given typical sources of alignments.


high: CVE-2026-0915

Affected range: <2.41-12+deb13u2
Fixed version: 2.41-12+deb13u2
EPSS Score: 0.019%
EPSS Percentile: 5th percentile
Description

Calling getnetbyaddr or getnetbyaddr_r with a configured nsswitch.conf that specifies the library's DNS backend for networks and queries for a zero-valued network in the GNU C Library version 2.0 to version 2.42 can leak stack contents to the configured DNS resolver.


high: CVE-2025-15281

Affected range: <2.41-12+deb13u2
Fixed version: 2.41-12+deb13u2
EPSS Score: 0.053%
EPSS Percentile: 16th percentile
Description

Calling wordexp with WRDE_REUSE in conjunction with WRDE_APPEND in the GNU C Library version 2.0 to version 2.42 may cause the interface to return uninitialized memory in the we_wordv member, which on subsequent calls to wordfree may abort the process.


critical: 0 high: 1 medium: 0 low: 0 fast-xml-parser 5.5.5 (npm)

pkg:npm/fast-xml-parser@5.5.5

high 7.5: CVE-2026-33036 Improper Restriction of Recursive Entity References in DTDs ('XML Entity Expansion')

Affected range: >=4.0.0-beta.3, <=5.5.5
Fixed version: 5.5.6
CVSS Score: 7.5
CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
Description

Summary

The fix for CVE-2026-26278 added entity expansion limits (maxTotalExpansions, maxExpandedLength, maxEntityCount, maxEntitySize) to prevent XML entity expansion Denial of Service. However, these limits are only enforced for DOCTYPE-defined entities. Numeric character references (&#NNN; and &#xHH;) and standard XML entities (&lt;, &gt;, etc.) are processed through a separate code path that does NOT enforce any expansion limits.

An attacker can use massive numbers of numeric entity references to completely bypass all configured limits, causing excessive memory allocation and CPU consumption.

Affected Versions

fast-xml-parser v5.x through v5.5.3 (and likely v5.5.5 on npm)

Root Cause

In src/xmlparser/OrderedObjParser.js, the replaceEntitiesValue() function has two separate entity replacement loops:

  1. Lines 638-670: DOCTYPE entities — expansion counting with entityExpansionCount and currentExpandedLength tracking. This was the CVE-2026-26278 fix.
  2. Lines 674-677: lastEntities loop — replaces standard entities including num_dec (/&#([0-9]{1,7});/g) and num_hex (/&#x([0-9a-fA-F]{1,6});/g). This loop has NO expansion counting at all.

The numeric entity regex replacements at lines 97-98 are part of lastEntities and go through the uncounted loop, completely bypassing the CVE-2026-26278 fix.

Proof of Concept

const { XMLParser } = require('fast-xml-parser');

// Even with strict explicit limits, numeric entities bypass them
const parser = new XMLParser({
  processEntities: {
    enabled: true,
    maxTotalExpansions: 10,
    maxExpandedLength: 100,
    maxEntityCount: 1,
    maxEntitySize: 10
  }
});

// 100K numeric entity references — should be blocked by maxTotalExpansions=10
const xml = `<root>${'&#65;'.repeat(100000)}</root>`;
const result = parser.parse(xml);

// Output: 500,000 chars — bypasses maxExpandedLength=100 completely
console.log('Output length:', result.root.length);  // 500000
console.log('Expected max:', 100);  // limit was 100

Results:

  • 100K &#65; references → 500,000 char output (5x default maxExpandedLength of 100,000)
  • 1M references → 5,000,000 char output, ~147MB memory consumed
  • Even with maxTotalExpansions=10 and maxExpandedLength=100, 10K references produce 50,000 chars
  • Hex entities (&#x41;) exhibit the same bypass

Impact

Denial of Service — An attacker who can provide XML input to applications using fast-xml-parser can cause:

  • Excessive memory allocation (147MB+ for 1M entity references)
  • CPU consumption during regex replacement
  • Potential process crash via OOM

This is particularly dangerous because the application developer may have explicitly configured strict entity expansion limits believing they are protected, while numeric entities silently bypass all of them.

Suggested Fix

Apply the same entityExpansionCount and currentExpandedLength tracking to the lastEntities loop (lines 674-677) and the HTML entities loop (lines 680-686), similar to how DOCTYPE entities are tracked at lines 638-670.
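
The idea behind the suggested fix can be sketched in isolation. The following is not fast-xml-parser's code: `replaceNumericEntitiesWithLimits` and `ExpansionLimits` are hypothetical names, the limit field names are borrowed from the advisory text, and counting each replacement's output length toward `maxExpandedLength` is an assumption about how the library tracks it.

```typescript
// Hypothetical limits object; field names borrowed from the advisory text.
interface ExpansionLimits {
  maxTotalExpansions: number;
  maxExpandedLength: number;
}

// Replaces decimal (&#NNN;) and hex (&#xHH;) references while counting
// every expansion, so numeric entities can no longer bypass the limits.
function replaceNumericEntitiesWithLimits(
  input: string,
  limits: ExpansionLimits,
): string {
  let expansions = 0;
  let expandedLength = 0;
  const numericEntity = /&#(?:x([0-9a-fA-F]{1,6})|([0-9]{1,7}));/g;

  return input.replace(numericEntity, (match: string, hex?: string, dec?: string) => {
    const codePoint =
      hex !== undefined ? parseInt(hex, 16) : parseInt(dec as string, 10);
    if (codePoint > 0x10ffff) return match; // not a valid code point; leave as-is

    const replacement = String.fromCodePoint(codePoint);
    expansions += 1;
    expandedLength += replacement.length;
    if (expansions > limits.maxTotalExpansions) {
      throw new Error("Entity expansion limit exceeded");
    }
    if (expandedLength > limits.maxExpandedLength) {
      throw new Error("Expanded entity length limit exceeded");
    }
    return replacement;
  });
}
```

With `maxTotalExpansions: 10`, an input containing eleven numeric references throws instead of being expanded without bound.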

Workaround

Set htmlEntities:false

@github-actions
Contributor

Recommended fixes for local gitea-mirror:scan

Base image is debian:trixie

Name: trixie
Digest: sha256:13f29b6806e531c3ff3b565bb6eed73f2132506c8c9d41bb996065ca20fb27f2
Vulnerabilities: critical: 0 high: 3 medium: 1 low: 24
Pushed: 3 weeks ago
Size: 49 MB
Packages: 111

Refresh base image

Rebuild the image using a newer base image version. Updating this may result in breaking changes.

✅ This image version is up to date.

Change base image

✅ There are no tag recommendations at this time.

@github-actions
Contributor

Overview

Image reference: ghcr.io/raylabshq/gitea-mirror:latest | gitea-mirror:scan
- digest: 2aa51a15990b | 416159814b99
- tag: latest | scan
- provenance: 67e085f | oven-sh/bun@30e609e
- vulnerabilities: critical: 0 high: 4 medium: 4 low: 40 | critical: 0 high: 4 medium: 4 low: 40
- platform: linux/amd64 | linux/amd64
- size: 245 MB | 283 MB (+38 MB)
- packages: 800 | 800
Base Image: debian:trixie | debian:trixie
- vulnerabilities: critical: 0 high: 3 medium: 1 low: 24 | critical: 0 high: 3 medium: 1 low: 24
Labels (8 changes)
  • ± 8 changed
-org.opencontainers.image.created=2026-03-18T09:27:42.378Z
+org.opencontainers.image.created=2026-02-26T07:10:54.054Z
-org.opencontainers.image.description=Gitea Mirror auto-syncs GitHub repos to your self-hosted Gitea/Forgejo, with a sleek Web UI and easy Docker deployment.
+org.opencontainers.image.description=Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
-org.opencontainers.image.licenses=AGPL-3.0
+org.opencontainers.image.licenses=NOASSERTION
-org.opencontainers.image.revision=67e085f6c0729a6d3abcabe28425d06de73363c7
+org.opencontainers.image.revision=30e609e08073cf7114bfb278506962a5b19d0677
-org.opencontainers.image.source=https://github.com/RayLabsHQ/gitea-mirror
+org.opencontainers.image.source=https://github.com/oven-sh/bun
-org.opencontainers.image.title=gitea-mirror
+org.opencontainers.image.title=bun
-org.opencontainers.image.url=https://github.com/RayLabsHQ/gitea-mirror
+org.opencontainers.image.url=https://github.com/oven-sh/bun
-org.opencontainers.image.version=pr-235
+org.opencontainers.image.version=1.3.10-debian

arunavo4 merged commit ddd071f into main on Mar 18, 2026
9 checks passed
arunavo4 deleted the fix/234-disk-space-backup-retention branch on March 18, 2026 09:35
Development

Successfully merging this pull request may close these issues.

[Bug] App uses a lot of disk space
