
allow export of sha256sum for snapshots #1979

Merged (8 commits into main from checksum-for-exported-snapshots on Oct 6, 2022)

Conversation

@LesnyRumcajs (Member)

Summary of changes
Changes introduced in this pull request:

  • adds an option to export a SHA-256 checksum after exporting the snapshot. This step adds around 20 seconds to the total export time for calibnet. Unfortunately, it would be tricky to do with chunked hashing during snapshot creation.

Reference issue to close (if applicable)

Other information and links

Part of #1899
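For context, the exported checksum file can be verified with standard tooling. A minimal sketch, assuming the snapshot is exported as `snapshot.car` with a sibling checksum file `snapshot.sha256sum` (file names here are illustrative, not necessarily what Forest emits):

```shell
# Simulate an export: a snapshot file plus its checksum file
# (file names are hypothetical, for illustration only).
echo "snapshot data" > snapshot.car
sha256sum snapshot.car > snapshot.sha256sum

# Verify the snapshot against the recorded checksum.
sha256sum -c snapshot.sha256sum
```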

@lemmih (Contributor) commented Sep 30, 2022

Is it tricky to compute the hash concurrently with snapshot creation, or would it have to happen in the daemon rather than in the client?

@LesnyRumcajs (Member, Author) commented Sep 30, 2022

> Is it tricky to compute the hash concurrently with snapshot creation, or would it have to happen in the daemon rather than in the client?

We would need to implement our own AsyncWrite, as the non-trivial logic is nested in https://github.com/filecoin-project/ref-fvm/blob/8d3a6148cbb65b4ec469f5115277aa6c2ab6df31/ipld/car/src/lib.rs#L31-L51. I am not sure it's worth it, though I'm happy to give it a go if you feel otherwise.

@lemmih (Contributor) commented Sep 30, 2022

> > Is it tricky to compute the hash concurrently with snapshot creation, or would it have to happen in the daemon rather than in the client?
>
> We would need to implement our own AsyncWrite, as the non-trivial logic is nested in https://github.com/filecoin-project/ref-fvm/blob/8d3a6148cbb65b4ec469f5115277aa6c2ab6df31/ipld/car/src/lib.rs#L31-L51. I am not sure it's worth it, though I'm happy to give it a go if you feel otherwise.

Nah, let's leave it for the future. Or maybe @hanabi1224 could have a look at it.

@LesnyRumcajs (Member, Author) commented Sep 30, 2022

> > > Is it tricky to compute the hash concurrently with snapshot creation, or would it have to happen in the daemon rather than in the client?
> >
> > We would need to implement our own AsyncWrite, as the non-trivial logic is nested in https://github.com/filecoin-project/ref-fvm/blob/8d3a6148cbb65b4ec469f5115277aa6c2ab6df31/ipld/car/src/lib.rs#L31-L51. I am not sure it's worth it, though I'm happy to give it a go if you feel otherwise.
>
> Nah, let's leave it for the future. Or maybe @hanabi1224 could have a look at it.

It shouldn't be that difficult, just a wrapper around a BufWriter. But still, it introduces some logic and would obviously require proper test coverage. 😁

@lemmih lemmih enabled auto-merge (squash) October 3, 2022 09:55
@LesnyRumcajs (Member, Author)

@lemmih I revamped the hashing technology; please take a look again, as it has changed a bit.

@LesnyRumcajs LesnyRumcajs force-pushed the checksum-for-exported-snapshots branch from 79f6adf to c221c57 Compare October 3, 2022 17:53
@LesnyRumcajs LesnyRumcajs marked this pull request as draft October 3, 2022 18:33
@LesnyRumcajs (Member, Author)

Something is not right after merging with the main branch; I will investigate why.

@LesnyRumcajs LesnyRumcajs marked this pull request as ready for review October 3, 2022 20:27
@LesnyRumcajs (Member, Author)

Should be fine now. I had not been handling partial writes on Poll::Ready.
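The partial-write pitfall discussed here can be illustrated with a minimal synchronous sketch. A toy FNV-1a hash stands in for SHA-256, and the blocking `std::io::Write` trait stands in for `tokio::io::AsyncWrite`; the type name `WriterWithChecksum` is illustrative, not Forest's actual API:

```rust
use std::io::{self, Write};

// Sketch of a checksum-computing writer wrapper (hypothetical name,
// not Forest's actual type).
struct WriterWithChecksum<W: Write> {
    inner: W,
    hash: u64,
}

impl<W: Write> WriterWithChecksum<W> {
    fn new(inner: W) -> Self {
        // FNV-1a offset basis; a real implementation would hold a
        // sha2::Sha256 hasher here instead.
        Self { inner, hash: 0xcbf2_9ce4_8422_2325 }
    }

    /// Return the inner writer together with the digest of every byte
    /// that was actually written through it.
    fn finalize(self) -> (W, u64) {
        (self.inner, self.hash)
    }
}

impl<W: Write> Write for WriterWithChecksum<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Crucially, hash only the bytes the inner writer accepted:
        // on a partial write, feeding the whole `buf` into the hasher
        // would corrupt the digest.
        let written = self.inner.write(buf)?;
        for &b in &buf[..written] {
            self.hash ^= u64::from(b);
            self.hash = self.hash.wrapping_mul(0x0000_0100_0000_01b3);
        }
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() {
    let mut w = WriterWithChecksum::new(Vec::new());
    w.write_all(b"snapshot bytes").unwrap();
    let (data, digest) = w.finalize();
    assert_eq!(data, b"snapshot bytes");
    println!("checksum: {digest:016x}");
}
```

The same "hash only what was accepted" rule is what the async version must apply when `poll_write` returns `Poll::Ready(Ok(n))` with `n` smaller than the buffer.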

@lemmih (Contributor) commented Oct 3, 2022

Can forest_utils and forest_utils_sink be merged?

We'll end up using tokio::io::AsyncWrite rather than futures::AsyncWrite, but that's not a big deal since we can freely convert between the two.

@LesnyRumcajs (Member, Author)

@lemmih

> Can forest_utils and forest_utils_sink be merged?

I can rename forest_utils_sink to forest_utils_io and move the contents of forest_utils there. Right now the code seems to sit in node/, which is a bit counterintuitive. What do you think?

@lemmih (Contributor) commented Oct 4, 2022

> > Can forest_utils and forest_utils_sink be merged?
>
> I can rename forest_utils_sink to forest_utils_io and move the contents of forest_utils there. Right now the code seems to sit in node/, which is a bit counterintuitive. What do you think?

I feel like we have too many forest_utils crates, and having a single utility crate would be much easier to handle. Right now we have forest_net_utils, forest_hash_utils, forest_json_utils, forest_test_utils, and forest_utils. We also have other utility crates with different names: forest_macros and forest_auth. How about we merge these into a single crate? For now, you could put your code in forest_utils::io.

@LesnyRumcajs (Member, Author) commented Oct 4, 2022

> > > Can forest_utils and forest_utils_sink be merged?
> >
> > I can rename forest_utils_sink to forest_utils_io and move the contents of forest_utils there. Right now the code seems to sit in node/, which is a bit counterintuitive. What do you think?
>
> I feel like we have too many forest_utils crates, and having a single utility crate would be much easier to handle. Right now we have forest_net_utils, forest_hash_utils, forest_json_utils, forest_test_utils, and forest_utils. We also have other utility crates with different names: forest_macros and forest_auth. How about we merge these into a single crate?

I kind of prefer many smaller crates, partitioned by category, over one big crate. It makes it easier to reason about dependencies and cuts compilation times.

> For now, you could put your code in forest_utils::io.

This also doesn't work for me, conceptually. Importing utils from node/ (which is where forest_utils actually lives) is a bit off for blockchain/chain, no? I'd recommend the approach mentioned in the previous comment, but if you feel strongly about this then of course I can go ahead with what you proposed.

@LesnyRumcajs (Member, Author)

> We'll end up using tokio::io::AsyncWrite rather than futures::AsyncWrite, but that's not a big deal since we can freely convert between the two.

I moved it to tokio::io::AsyncWrite.

@lemmih (Contributor) commented Oct 4, 2022

> I kind of prefer many smaller crates, partitioned by category, over one big crate. It makes it easier to reason about dependencies and cuts compilation times.

What does it mean to "reason about dependencies"? And I don't think it would cut compilation time. In what scenario would it cut compilation time?

@LesnyRumcajs (Member, Author)

> What does it mean to "reason about dependencies"? And I don't think it would cut compilation time.

If I saw a tide dependency in utils_hash, I would be really surprised and would investigate further. The same dependency in utils, and I wouldn't even blink, because I'd just ignore the entire wall of dependencies.

> In what scenario would it cut compilation time?

To my understanding, if you change something in one significant crate dependency (especially in Cargo.lock), all dependents have to rebuild. The same goes for having different binaries: you would not need to pull everything from crates.io if you had only a small subset of dependencies.
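The rebuild-propagation argument can be sketched abstractly: in a workspace, editing one crate dirties exactly that crate plus its transitive dependents. A toy illustration (the crate names are from this repository, but the graph is heavily simplified and hypothetical):

```rust
use std::collections::{HashMap, HashSet};

/// Edges point from a crate to the crates that depend on it.
/// Returns the set of crates Cargo would have to recompile after
/// `changed` is edited: the crate itself plus all transitive dependents.
fn dependents_to_rebuild(
    reverse_deps: &HashMap<&str, Vec<&str>>,
    changed: &str,
) -> HashSet<String> {
    let mut dirty = HashSet::new();
    let mut stack = vec![changed];
    while let Some(krate) = stack.pop() {
        if dirty.insert(krate.to_string()) {
            if let Some(users) = reverse_deps.get(krate) {
                stack.extend(users.iter().copied());
            }
        }
    }
    dirty
}

fn main() {
    // Simplified, hypothetical slice of the Forest workspace graph.
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert("forest_metrics", vec!["forest"]);
    graph.insert("forest_json", vec!["forest_vm", "forest_blocks"]);
    graph.insert("forest_vm", vec!["forest"]);
    graph.insert("forest_blocks", vec!["forest"]);

    // Touching the leaf crate forest_metrics dirties only itself and forest,
    assert_eq!(dependents_to_rebuild(&graph, "forest_metrics").len(), 2);
    // while touching forest_json dirties it, both its users, and the binary.
    assert_eq!(dependents_to_rebuild(&graph, "forest_json").len(), 4);
}
```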

@lemmih (Contributor) commented Oct 4, 2022

> > What does it mean to "reason about dependencies"? And I don't think it would cut compilation time.
>
> If I saw a tide dependency in utils_hash, I would be really surprised and would investigate further. The same dependency in utils, and I wouldn't even blink, because I'd just ignore the entire wall of dependencies.

While nice to have, I don't think that justifies the significant administrative cost of having a plethora of crates.

> > In what scenario would it cut compilation time?
>
> To my understanding, if you change something in one significant crate dependency (especially in Cargo.lock), all dependents have to rebuild. The same goes for having different binaries: you would not need to pull everything from crates.io if you had only a small subset of dependencies.

But the bulk of our code will use all of the utility crates. I have a hard time imagining that keeping forest_hash_utils and forest_json_utils as two separate crates could ever save any compilation time. How would we avoid recompilation by keeping those two crates separate? Do we have significant code bases that use one crate but not the other? And do we modify those utility crates frequently enough to justify the administrative cost?

@LesnyRumcajs (Member, Author) commented Oct 4, 2022

How could it not save compilation time? You change something in forest_metrics, a dependency of only forest. Only that dependency tree needs to be checked and recompiled:

❯ cargo build
   Compiling forest_metrics v0.2.0 (/home/rumcajs-work/prj/forest/utils/metrics)
   Compiling forest v0.4.0 (/home/rumcajs-work/prj/forest/forest)
    Finished dev [unoptimized] target(s) in 18.61s

You change something in forest_json, which has a larger dependency tree:

❯ cargo build
   Compiling forest_json v0.2.0 (/home/rumcajs-work/prj/forest/utils/json)
   Compiling forest_vm v0.4.0 (/home/rumcajs-work/prj/forest/vm)
   Compiling forest_networks v0.2.0 (/home/rumcajs-work/prj/forest/types/networks)
   Compiling forest_statediff v0.2.0 (/home/rumcajs-work/prj/forest/utils/statediff)
   Compiling forest_fil_types v0.4.0 (/home/rumcajs-work/prj/forest/types)
   Compiling forest_message v0.8.0 (/home/rumcajs-work/prj/forest/vm/message)
   Compiling forest_state_migration v0.2.0 (/home/rumcajs-work/prj/forest/vm/state_migration)
   Compiling forest_blocks v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/blocks)
   Compiling forest_actor_interface v0.2.0 (/home/rumcajs-work/prj/forest/vm/actor_interface)
   Compiling forest_paramfetch v0.2.0 (/home/rumcajs-work/prj/forest/utils/paramfetch)
   Compiling forest_interpreter v0.2.0 (/home/rumcajs-work/prj/forest/vm/interpreter)
   Compiling forest_chain v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/chain)
   Compiling forest_state_manager v0.1.0 (/home/rumcajs-work/prj/forest/blockchain/state_manager)
   Compiling forest_libp2p v0.2.0 (/home/rumcajs-work/prj/forest/node/forest_libp2p)
   Compiling forest_genesis v0.2.0 (/home/rumcajs-work/prj/forest/utils/genesis)
   Compiling forest_message_pool v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/message_pool)
   Compiling forest_chain_sync v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/chain_sync)
   Compiling forest_rpc-api v0.2.0 (/home/rumcajs-work/prj/forest/node/rpc-api)
   Compiling forest_fil_cns v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/consensus/fil_cns)
   Compiling forest_deleg_cns v0.2.0 (/home/rumcajs-work/prj/forest/blockchain/consensus/deleg_cns)
   Compiling forest_rpc v0.2.0 (/home/rumcajs-work/prj/forest/node/rpc)
   Compiling forest_rpc-client v0.2.0 (/home/rumcajs-work/prj/forest/node/rpc-client)
   Compiling forest v0.4.0 (/home/rumcajs-work/prj/forest/forest)
    Finished dev [unoptimized] target(s) in 22.58s

Of course, it won't cut the time by 80%, but when you make changes and monitor everything with cargo watch, it does matter.

All in all, up to you. Maybe it's only my unpopular opinion.

@lemmih (Contributor) commented Oct 4, 2022

Your example highlights my point. Even in such an extreme case where lots of crates are recompiled, the difference is only a few seconds: 18.61s vs 22.58s.

It was good discussing the pros and cons. Let's not add any more crates at this time.

@hanabi1224 (Contributor)

> a few seconds: 18.61s vs 22.58s

In this case, I think linking the forest executable takes ~15s (on my machine) in both cases; maybe cargo check is a better command for the comparison. BTW, I tried RUSTFLAGS="-Cprefer-dynamic" locally to reduce link time for a dev build, but it failed with tons of errors. :(

@elmattic (Contributor) left a comment


Good!

@@ -70,32 +77,56 @@ where
 }

     let file = File::create(&out).await.map_err(JsonRpcError::from)?;
-    let writer = BufWriter::new(file);
+    let writer = AsyncWriterWithChecksum::<Sha256, _>::new(BufWriter::new(file));
Contributor:

In the case where we want to skip this checksum, it seems to me we are still paying the price of computing it and only skipping writing the file, no?

Contributor:

Hm, indeed. We'll address that in another PR.

@lemmih lemmih enabled auto-merge (squash) October 6, 2022 09:31
@lemmih lemmih merged commit 83db96b into main Oct 6, 2022
@lemmih lemmih deleted the checksum-for-exported-snapshots branch October 6, 2022 10:00