Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helper agg init: remove copy of report share data. #3303

Merged
merged 1 commit into from
Jul 15, 2024

Conversation

branlwyd
Copy link
Member

Previously, we copied report share data into the transaction callback (re-copying it on transaction retry). Notably, this required us to keep two copies of the output shares in memory. This was done to allow modification of the report aggregations based on information retrieved from the datastore, i.e. if the report was replayed.

Now, we remove the copy by passing an Arc of the relevant data into the transaction callback. Modification of the report aggregation is implemented via a Cow, which will be borrowed from the data in the Arc in the common case of no modification. The aggregation job writer is augmented with the ability to accept report aggregations already stored in a Cow, and is taught to be smart enough not to double-wrap Cow around its already-Cow'ed internal state.

Closes #3285.

Previously, we copied report share data into the transaction callback
(re-copying it on transaction retry). Notably, this required us to keep
two copies of the output shares in memory. This was done to allow
modification of the report aggregations based on information retrieved
from the datastore, i.e. if the report was replayed.

Now, we remove the copy by passing an Arc of the relevant data into the
transaction callback. Modification of the report aggregation is
implemented via a Cow, which will be borrowed from the data in the Arc
in the common case of no modification. The aggregation job writer is
augmented with the ability to accept report aggregations already stored
in a Cow, and is taught to be smart enough not to double-wrap Cow around
its already-Cow'ed internal state.
@branlwyd branlwyd requested a review from a team as a code owner July 13, 2024 00:31
@branlwyd branlwyd changed the title Helper agg init: Remove copy of report share data. Helper agg init: remove copy of report share data. Jul 13, 2024
Copy link
Contributor

@inahga inahga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM and I think they accomplish the goal.

Do you have memory profiles that show improvement? Since a few LGTMs are not necessarily evidence of it working. Broadly, it would be nice to document how we're profiling, if for no other reason than I'm unfamiliar with Rust profiling strategies (but I don't think that should hold up this PR).

@@ -507,7 +507,7 @@ where
aggregation_job: Cow::Borrowed(aggregation_job),
report_aggregations: report_aggregations
.iter()
.map(Cow::Borrowed)
.map(RA::borrow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check my understanding: at this point, report_aggregations is (roughly) a Vec<impl ReportAggregationUpdate>, which, concretely, could be a struct ReportAggregationMetadata or a Cow of it?

In which case a naive map(Cow::Borrowed) would double Cow a Cow<'_, ReportAggregationMetadata>, so we need a trait method to avoid this.

Copy link
Member Author

@branlwyd branlwyd Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. The implementation of ReportAggregationUpdate over Cow exists to allow a Cow<WritableReportAggregation> (or a Cow of anything else that implements ReportAggregationUpdate) to be provided by the user of AggregationJobWriter, without needing to deal with Cow<Cow<T>> internally.

@@ -2373,24 +2375,23 @@ impl VdafOps {
}

// Write report shares, and ensure this isn't a repeated report aggregation.
try_join_all(report_share_data.iter_mut().map(|rsd| {
let report_aggregations = try_join_all(report_share_data.iter().map(|rsd| {
let task = Arc::clone(&task);
Copy link
Contributor

@inahga inahga Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we accomplish the goal of the PR by doing .iter_mut() and modifying the elements of ReportShareData in place? We would need to at least wrap it in a std::sync::Mutex for this to work, but it would avoid copying in all cases and save a bunch of code.

(Just checking if you've considered it, it probably is more involved of a change than I'm thinking).

Copy link
Member Author

@branlwyd branlwyd Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't modify report_share_data: it's in an Arc to avoid copying the ReportShareData inside the Vec, and Arc doesn't allow mutable references. Or, to look at it another way, we can't modify report_share_data because we might need the unmodified value if we retry the transaction callback.

(We used to be able to modify report_share_data, but that's because we were cloning the entire Vec & its contents with each transaction retry -- which is the copying we are trying to avoid with this PR.)

Copy link
Contributor

@inahga inahga Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I was able to coerce my suggestion into "working" by having report_share_data being in an Arc<tokio::sync::Mutex<_>>, but it then becomes difficult to take a .report_aggregation for use in the AggregationJobWriter without .clone()ing it--i.e. I landed on needing to pass around a Cow anyway. So I think your approach is super-correct 👍. (Not that I doubted it, was just curious where exactly Mutex breaks down).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- N.B. the problem with Arc<Mutex<Vec<T>>> is that, without a Cow, ultimately we would be modifying the report_share_data, which would remain modified in the case of a transaction retry (and on retry, we want to start from the same data as in the previous attempt, to ensure correctness).

As you note, we could do Arc<Mutex<Vec<Cow<T>>>>, but at that point, I think we're better off just generating a new Vec<Cow<T>> based off of the report_share_data since we need to modify the vector anyway.

@divergentdave
Copy link
Contributor

I re-tested this and it demonstrated the desired drop in peak memory usage. (something a bit below 50%) I merged https://github.com/divergentdave/janus/tree/david/experiment-dhat-harness-2 onto this branch, built it in release mode with debug symbols, populated test fixture files by running the different subcommands several times, and then ran valgrind --tool=dhat --dhat-out-file=dhat.out.helper-arc-cow ./target/profiling/profiler_harness helper. I popped the resulting file into the DHAT viewer static webapp, and looked at the "At t-gmax (bytes)" metric.

@branlwyd
Copy link
Member Author

Thank you for profiling this!

@branlwyd branlwyd merged commit 7e6eca7 into main Jul 15, 2024
9 checks passed
@branlwyd branlwyd deleted the bran/remove-output-share-copy branch July 15, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Helper peak memory usage can be reduced
4 participants