Fix GitHub 2gb file size limit #12

Merged
KellyJDavis merged 2 commits into main from fix-github-2gb-file-size-limit
Nov 29, 2025
Conversation

@KellyJDavis

Owner

No description provided.

Implements automatic file splitting for compressed files exceeding 1.8GB
to work around GitHub's 2GB per-asset limit. Also refactors manifest
version handling to separate latest_manifest_version from default_toolchain.

Features:
- Automatic splitting: Files > 1.8GB are split using Unix 'split' command
- Split file metadata: Manifest includes 'parts' array with checksums and sizes
- Download/reassembly: 'leanexplore data fetch' automatically downloads and
  reassembles split files with per-part checksum verification
- Backward compatible: Non-split files continue to work as before
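
The splitting step described above can be sketched as follows. This is a minimal pure-Python stand-in for the Unix `split` call the script actually uses, not the code in `scripts/generate_manifest.py`. The function name `split_into_parts`, the metadata keys `name`, `sha256`, and `size_bytes`, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read buffer size in bytes


def split_into_parts(src: Path, max_part_bytes: int) -> list[dict]:
    """Split src into parts named <src>.000, <src>.001, ... next to src.

    Returns one metadata dict per part (hypothetical keys, see lead-in).
    """
    if max_part_bytes <= 0:
        raise ValueError("max_part_bytes must be positive")
    parts = []
    index = 0
    with src.open("rb") as fin:
        while True:
            digest = hashlib.sha256()
            written = 0
            part_path = src.with_name(f"{src.name}.{index:03d}")
            with part_path.open("wb") as fout:
                # Copy at most max_part_bytes into this part.
                while written < max_part_bytes:
                    buf = fin.read(min(CHUNK, max_part_bytes - written))
                    if not buf:
                        break
                    fout.write(buf)
                    digest.update(buf)
                    written += len(buf)
            if written == 0:
                # Input exhausted: drop the empty trailing part and stop.
                part_path.unlink()
                break
            parts.append(
                {
                    "name": part_path.name,
                    "sha256": digest.hexdigest(),
                    "size_bytes": written,
                }
            )
            index += 1
    return parts
```

A caller would pass the size threshold explicitly, for example `split_into_parts(Path("file.gz"), 1_800_000_000)` if 1.8GB is read as decimal gigabytes; the source does not say which interpretation the script uses.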

Changes:
- scripts/generate_manifest.py:
  * Add split_file() and process_split_file_parts() functions
  * Modify process_file() to detect and split large files
  * Replace --version with --latest-manifest-version and --default-toolchain
  * Add import error handling and parameter validation
  * Generate manifest with separate latest_manifest_version and default_toolchain

- src/lean_explore/cli/data_commands.py:
  * Add _download_split_file_parts() for downloading all parts
  * Add _reassemble_file_parts() for concatenating parts
  * Add _cleanup_split_file_artifacts() for cleanup
  * Update fetch() to handle split files in main loop
  * Support optional remote_name for split files (derives from local_name)

- src/lean_explore/defaults.py:
  * Add LATEST_MANIFEST_VERSION constant ("0.3.0")

- scripts/README.md:
  * Update documentation for new parameters and split file support
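
The verify, reassemble, and clean-up flow named in the `data_commands.py` changes could look roughly like the sketch below. It is not the code of `_reassemble_file_parts()` or `_cleanup_split_file_artifacts()`; the function `reassemble`, the helper `sha256_of`, the metadata keys, and the use of SHA-256 are assumptions, and the download step is left out.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for buf in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(buf)
    return digest.hexdigest()


def reassemble(part_dir: Path, parts: list[dict], dest: Path) -> None:
    """Verify each downloaded part, concatenate them in order, then clean up."""
    # Check every part before writing, so a corrupt part never reaches dest.
    for part in parts:
        path = part_dir / part["name"]
        if path.stat().st_size != part["size_bytes"]:
            raise ValueError(f"size mismatch for {part['name']}")
        if sha256_of(path) != part["sha256"]:
            raise ValueError(f"checksum mismatch for {part['name']}")

    # Concatenate in manifest order (.000, .001, ...).
    with dest.open("wb") as out:
        for part in parts:
            with (part_dir / part["name"]).open("rb") as f:
                shutil.copyfileobj(f, out)

    # Remove the part files once the joined file exists.
    for part in parts:
        (part_dir / part["name"]).unlink()
```

Verifying all parts before writing anything is a design choice of this sketch: a failed check leaves no partial output file behind.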

Manifest structure changes:
- Split files include a 'parts' array with metadata for each part
- Parts are named with .000, .001, .002 suffixes (e.g., file.gz.000)
- remote_name for split files is optional (used only as temp filename)
- Latest manifest version and default toolchain are now separate values

This maintains backward compatibility while enabling support for files that
exceed GitHub's asset size limits.
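
To make the manifest structure changes above concrete, here is a hypothetical pair of entries and the helpers a reader of the manifest might use. Only `latest_manifest_version`, `default_toolchain`, `parts`, `local_name`, and `remote_name` come from the description; every other key, every value except "0.3.0", and the `.gz` fallback for a missing `remote_name` are assumptions for illustration.

```python
def part_names(base_name: str, count: int) -> list[str]:
    """Part names use zero-padded three-digit suffixes: .000, .001, ..."""
    return [f"{base_name}.{i:03d}" for i in range(count)]


def is_split(entry: dict) -> bool:
    """Entries without a non-empty 'parts' array take the original path."""
    return bool(entry.get("parts"))


def temp_name(entry: dict) -> str:
    """Temp filename used while reassembling a split file.

    Assumption: when remote_name is omitted, fall back to local_name + '.gz'.
    """
    return entry.get("remote_name") or entry["local_name"] + ".gz"


# Hypothetical manifest fragment; placeholder strings stand in for real values.
manifest = {
    "latest_manifest_version": "0.3.0",
    "default_toolchain": "<toolchain>",
    "files": [
        {
            # Split file: remote_name omitted, parts listed explicitly.
            "local_name": "large.db",
            "parts": [
                {"name": n, "sha256": "<checksum>", "size_bytes": 0}
                for n in part_names("large.db.gz", 3)
            ],
        },
        {
            # Non-split file: unchanged from the previous format.
            "local_name": "small.db",
            "remote_name": "small.db.gz",
        },
    ],
}
```

With this shape, a fetch loop can branch on `is_split(entry)` and leave the handling of non-split entries untouched.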
@KellyJDavis KellyJDavis merged commit c8d9b54 into main Nov 29, 2025
2 checks passed
@KellyJDavis KellyJDavis deleted the fix-github-2gb-file-size-limit branch November 29, 2025 10:36