Add retries/resume to snapshot fetching #2544

LesnyRumcajs · 2023-02-14T06:32:33Z

Issue summary

The mainnet snapshot hosting seems volatile. Forest has failed numerous times both in the CI (dockerized, with aria2) and locally (without aria2). The error is the same in both cases: the download fails partway through.

Dockerized (aria2)

2023-02-13T20:45:35.637704Z  INFO forest_cli_shared::cli::snapshot_fetch: Snapshot url: https://pub-fd31751bcb69400eb39e694385c19457.r2.dev/minimal/2599920_2023_02_13T16_00_00Z.car    
2023-02-13T20:45:35.637744Z  INFO forest_cli_shared::cli::snapshot_fetch: Snapshot will be downloaded to /volumes/forest_data/snapshots/mainnet/filecoin_snapshot_mainnet_2023-02-13_height_2599920.car (117.14 GiB)    
2023-02-13T20:46:28.219805Z ERROR forest_cli::cli: Error: Failed fetching the snapshot: error reading a body from connection: unexpected end of file

No logs are available for the local build, but the failure is roughly the same.

This fails the CI from time to time and is annoying when setting up a server for a night job.

The goal is to implement a retry mechanism (or, even better, retry with a resume - something like this should be available with aria2 almost out of the box; for vanilla, we can be pragmatic and retry from scratch). Only if the download fails a few times should the daemon fail.
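For the aria2 path, the "almost out of the box" part likely refers to aria2c's own `--continue`, `--max-tries` and `--retry-wait` options. Below is a minimal sketch of how such a command line could be assembled. The helper `aria2_resume_command` and its parameters are hypothetical, and this is not necessarily how Forest invokes aria2.

```rust
use std::process::Command;

/// Hypothetical helper (not part of Forest): builds an `aria2c` invocation
/// that resumes a partial download and lets aria2 retry on its own.
pub fn aria2_resume_command(
    url: &str,
    target_dir: &str,
    max_tries: u32,
    retry_wait_secs: u64,
) -> Command {
    let mut cmd = Command::new("aria2c");
    // Continue a partially downloaded file instead of starting over.
    cmd.arg("--continue=true");
    // Number of tries per download; aria2 treats 0 as unlimited.
    cmd.arg(format!("--max-tries={max_tries}"));
    // Seconds to wait between two tries.
    cmd.arg(format!("--retry-wait={retry_wait_secs}"));
    // Directory in which the downloaded file is stored.
    cmd.arg(format!("--dir={target_dir}"));
    cmd.arg(url);
    cmd
}
```

The function only constructs the command. Running it (for example with `.status()`) requires `aria2c` to be installed and on the `PATH`.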

Task summary

Add a retry parameter to the config. It doesn't have to be exposed as a flag. Set it to a sane default.
Add a retry interval parameter. Set it to a sane default.
Use the parameters above to implement retry and retry/resume for snapshot fetch. This should work for both the daemon trying to download the snapshot and the forest-cli snapshot fetch.
Implement tests validating the logic
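The tasks above could be sketched roughly as follows for the vanilla (non-aria2) path, where a failed attempt simply restarts from scratch. `RetryConfig`, its field names, its default values and the `retry` helper are all assumptions made for illustration. They are not Forest's actual configuration keys or the implementation merged in the linked PR.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry settings; names and defaults are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct RetryConfig {
    /// Total number of attempts before giving up (treated as at least 1).
    pub max_attempts: u32,
    /// Pause between two consecutive attempts.
    pub interval: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_attempts: 3,
            interval: Duration::from_secs(10),
        }
    }
}

/// Calls `op` until it succeeds or `max_attempts` is exhausted.
/// `op` receives the 1-based attempt number; the last error is returned
/// if every attempt fails.
pub fn retry<T, E, F>(cfg: RetryConfig, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let attempts = cfg.max_attempts.max(1);
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= attempts => return Err(err),
            // Otherwise wait for the configured interval and try again.
            Err(_) => {
                sleep(cfg.interval);
                attempt += 1;
            }
        }
    }
}
```

Because `op` is an arbitrary closure, the same helper could wrap both the daemon's download and the `forest-cli snapshot fetch` path, and tests can pass a closure that fails a fixed number of times without touching the network.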

Acceptance Criteria

Forest daemon no longer fails after a single failure in the upstream host
Feature is properly tested
Changelog is updated

Other information and links

sudo-shashank self-assigned this Feb 17, 2023

sudo-shashank mentioned this issue Feb 21, 2023

Retries to snapshot fetch daemon and cli #2571

Merged

4 tasks

sudo-shashank closed this as completed in #2571 Feb 24, 2023
