Fix GitHub 2gb file size limit#12
Merged
Merged
Conversation
Implements automatic file splitting for compressed files exceeding 1.8GB
to work around GitHub's 2GB per-asset limit. Also refactors manifest
version handling to separate latest_manifest_version from default_toolchain.
Features:
- Automatic splitting: Files > 1.8GB are split using Unix 'split' command
- Split file metadata: Manifest includes 'parts' array with checksums and sizes
- Download/reassembly: 'leanexplore data fetch' automatically downloads and
reassembles split files with per-part checksum verification
- Backward compatible: Non-split files continue to work as before
Changes:
- scripts/generate_manifest.py:
* Add split_file() and process_split_file_parts() functions
* Modify process_file() to detect and split large files
* Replace --version with --latest-manifest-version and --default-toolchain
* Add import error handling and parameter validation
* Generate manifest with separate latest_manifest_version and default_toolchain
- src/lean_explore/cli/data_commands.py:
* Add _download_split_file_parts() for downloading all parts
* Add _reassemble_file_parts() for concatenating parts
* Add _cleanup_split_file_artifacts() for cleanup
* Update fetch() to handle split files in main loop
* Support optional remote_name for split files (derives from local_name)
- src/lean_explore/defaults.py:
* Add LATEST_MANIFEST_VERSION constant ("0.3.0")
- scripts/README.md:
* Update documentation for new parameters and split file support
Manifest structure changes:
- Split files include a 'parts' array with metadata for each part
- Parts are named with .000, .001, .002 suffixes (e.g., file.gz.000)
- remote_name for split files is optional (used only as temp filename)
- Latest manifest version and default toolchain are now separate values
This maintains backward compatibility while enabling support for files that
exceed GitHub's asset size limits.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.