[NIT-2589] init: define parts with manifest file #2376

Merged
merged 18 commits into from
Jul 9, 2024

@gligneul gligneul commented Jun 7, 2024

In a previous PR, we added the capability of downloading the initialization snapshot in parts. This PR enhances that feature by requiring a manifest file that lists the parts' names and checksums. The manifest tells us how many parts the archive was divided into, so we avoid missing a part and loading a corrupted database.

When downloading a snapshot, the initialization code looks for it at the URL given by --init.url. Note that this URL can be set automatically using the --init.latest option, which is already merged.

For instance, let's assume the snapshot URL is https://snapshot.arbitrum.foundation/sepolia-rollup/2024/20/archive.tar. If this file is not found on the server (404 HTTP status code), nitro will look for the manifest file containing information about the parts. The manifest URL is the archive URL with the suffix .manifest.txt appended. The format of the manifest file is the output of the sha256sum command.
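
The fallback described above can be sketched as follows. This is only an illustration, not the code from this PR: the helper names manifestURL and nextURL are hypothetical, and the HTTP status is passed in as a plain integer so the example runs without network access.

```go
package main

import (
	"fmt"
	"net/http"
)

// manifestSuffix is appended to the archive URL to locate the manifest.
const manifestSuffix = ".manifest.txt"

// manifestURL returns the manifest location for a given archive URL.
func manifestURL(archiveURL string) string {
	return archiveURL + manifestSuffix
}

// nextURL is a hypothetical helper. Given the HTTP status returned for the
// archive URL, it reports which URL to fetch next and whether that URL
// points to a manifest rather than to the archive itself.
func nextURL(archiveURL string, archiveStatus int) (string, bool) {
	if archiveStatus == http.StatusNotFound {
		// The single-file archive is missing, so fall back to the manifest.
		return manifestURL(archiveURL), true
	}
	return archiveURL, false
}

func main() {
	archive := "https://snapshot.arbitrum.foundation/sepolia-rollup/2024/20/archive.tar"
	target, isManifest := nextURL(archive, http.StatusNotFound)
	fmt.Println(target, isManifest)
}
```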

So, assuming we have the following file structure on the server:

└── sepolia-rollup
    └── 2024
        └── 20
            ├── archive.tar.manifest.txt
            ├── archive.tar.part0
            ├── archive.tar.part1
            ├── archive.tar.part2
            ├── archive.tar.part3
            ├── archive.tar.part4
            └── archive.tar.part5

The contents of the manifest file will be:

a938e029605b81e03cd4b9a916c52d96d74c985ac264e2f298b90495c619af74  archive.tar.part0
9e095ce82e70fa62bb6e7b4421e7f2c04b2cd9e21d2bc62cbbaaeb877408357b  archive.tar.part1
e92172d6eaf770a76c7477e6768f742fc51555a5050de606bd0f837e59c7a61d  archive.tar.part2
d1b6fb9aeeb23903cdbb2a7cca8e6909bff4ee8e51c8a5acac2a142b3e3a5437  archive.tar.part3
f37e4552453202f2044e58b307bab7e466205bd280426abbc84f8646c6430cfa  archive.tar.part4
972c5f513faca6ac4fadd22c70bea97707c6d38e9a646432bc311f0ca10497ed  archive.tar.part5
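
A manifest in this format could be parsed roughly as shown below. This is a minimal sketch, not the parser from this PR: the part type and parseManifest function are hypothetical names, and the sketch assumes file names contain no whitespace. It also strips the leading "*" that sha256sum prints before the name in binary mode.

```go
package main

import (
	"bufio"
	"encoding/hex"
	"fmt"
	"strings"
)

// part is one manifest entry: a file name and its SHA-256 digest in hex.
type part struct {
	name     string
	checksum string
}

// parseManifest reads sha256sum-style lines ("<hex digest>  <file name>").
// Assumption: file names contain no whitespace.
func parseManifest(manifest string) ([]part, error) {
	var parts []part
	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, fmt.Errorf("malformed manifest line: %q", line)
		}
		// A SHA-256 digest is 32 bytes, i.e. 64 hex characters.
		sum, err := hex.DecodeString(fields[0])
		if err != nil || len(sum) != 32 {
			return nil, fmt.Errorf("invalid SHA-256 checksum: %q", fields[0])
		}
		// sha256sum prefixes the name with "*" in binary mode.
		name := strings.TrimPrefix(fields[1], "*")
		parts = append(parts, part{name: name, checksum: strings.ToLower(fields[0])})
	}
	return parts, scanner.Err()
}

func main() {
	manifest := "a938e029605b81e03cd4b9a916c52d96d74c985ac264e2f298b90495c619af74  archive.tar.part0\n" +
		"9e095ce82e70fa62bb6e7b4421e7f2c04b2cd9e21d2bc62cbbaaeb877408357b  archive.tar.part1\n"
	parts, err := parseManifest(manifest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range parts {
		fmt.Println(p.name, p.checksum)
	}
}
```

The number of parsed entries is what tells the downloader how many parts to expect.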

The init code will read the checksums and file names from the manifest and download each part from the server. If the checksum validation option is enabled, each part's checksum will also be validated after it is downloaded. Finally, the code will join all the parts into a single archive, extract it, and proceed to load the database.
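
The validate-then-join step can be illustrated with the sketch below. It is not the code from this PR: verifyChecksum and assemble are hypothetical helpers, and the parts are held in memory for brevity, whereas real snapshot parts are large files that would be streamed to and from disk.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyChecksum returns an error if the SHA-256 digest of data differs
// from wantHex (lowercase hex, as printed by sha256sum).
func verifyChecksum(data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != wantHex {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

// assemble optionally validates each part against its manifest checksum and
// concatenates the parts in manifest order into a single archive.
func assemble(parts [][]byte, wantHex []string, validate bool) ([]byte, error) {
	if len(parts) != len(wantHex) {
		return nil, fmt.Errorf("have %d parts, manifest lists %d", len(parts), len(wantHex))
	}
	var archive bytes.Buffer
	for i, p := range parts {
		if validate {
			if err := verifyChecksum(p, wantHex[i]); err != nil {
				return nil, fmt.Errorf("part %d: %w", i, err)
			}
		}
		archive.Write(p)
	}
	return archive.Bytes(), nil
}

func main() {
	parts := [][]byte{[]byte("first "), []byte("second")}
	// Compute the expected digests here; in practice they come from the manifest.
	sums := make([]string, len(parts))
	for i, p := range parts {
		s := sha256.Sum256(p)
		sums[i] = hex.EncodeToString(s[:])
	}
	archive, err := assemble(parts, sums, true)
	fmt.Println(string(archive), err)
}
```

Comparing the number of downloaded parts against the manifest is what catches a missing part before the archive is extracted.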

This PR also includes the changes from the #2384 PR.

@gligneul gligneul self-assigned this Jun 7, 2024
@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Jun 7, 2024
@gligneul gligneul marked this pull request as ready for review June 7, 2024 19:29
Base automatically changed from gligneul/search-snapshot to master June 11, 2024 15:28
Tristan-Wilson
Tristan-Wilson previously approved these changes Jun 14, 2024
amsanghi
amsanghi previously approved these changes Jun 17, 2024
amsanghi
amsanghi previously approved these changes Jun 18, 2024
@magicxyyz magicxyyz left a comment

LGTM, just one question about partUrl

// Download parts
for i, partName := range partNames {
	log.Info("Downloading database part", "part", partName)
	partUrl := url.JoinPath("..", partName).String()

why do we join ".." instead of "."?

Contributor Author

The .. removes the last element from the path. For instance, http://path/to/archive.tar becomes http://path/to/archive.tar/../archive.tar.part0, which then becomes http://path/to/archive.tar.part0.

I noticed the code is misleading because the variable url shadows the net/url package from the standard library. I will rename it to archiveUrl.
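
The path resolution can be checked with a small standalone program. This is a sketch, not the PR's code: partURL is a hypothetical helper, and it relies on the (*url.URL).JoinPath method from net/url, available since Go 1.19.

```go
package main

import (
	"fmt"
	"net/url"
)

// partURL resolves partName as a sibling of the archive file: the ".."
// element drops "archive.tar" from the path before partName is appended.
func partURL(archiveURL, partName string) (string, error) {
	u, err := url.Parse(archiveURL)
	if err != nil {
		return "", err
	}
	return u.JoinPath("..", partName).String(), nil
}

func main() {
	sibling, err := partURL("http://path/to/archive.tar", "archive.tar.part0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sibling) // http://path/to/archive.tar.part0

	// Joining "." instead keeps "archive.tar" as if it were a directory.
	u, _ := url.Parse("http://path/to/archive.tar")
	fmt.Println(u.JoinPath(".", "archive.tar.part0").String()) // http://path/to/archive.tar/archive.tar.part0
}
```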

makes sense, thanks!

@magicxyyz magicxyyz self-requested a review June 18, 2024 19:55
magicxyyz
magicxyyz previously approved these changes Jun 18, 2024
@magicxyyz magicxyyz left a comment

LGTM

@gligneul gligneul dismissed stale reviews from amsanghi and magicxyyz via 3b62f43 June 18, 2024 20:13
@magicxyyz magicxyyz left a comment

LGTM

@joshuacolvin0 joshuacolvin0 merged commit 81aa1b8 into master Jul 9, 2024
10 of 11 checks passed
@joshuacolvin0 joshuacolvin0 deleted the gligneul/download-manifest branch July 9, 2024 00:50
Labels
design-approved s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA.
5 participants