Compaction seems to be broken #2170

Open · frankmcsherry opened this issue Mar 2, 2020 · 3 comments

Labels: A-dataflow (Area: dataflow), C-bug (Category: something is broken), T-memory (Theme: memory)

@frankmcsherry (Contributor) commented Mar 2, 2020

Reading some data from files, with tailing set to true, we end up in a state with a materialized tripdata collection where

materialize=> select * from mz_catalog.mz_records_per_dataflow_global;
 id  |                                name                                | records 
-----+--------------------------------------------------------------------+---------
 779 | Dataflow: materialize.public.tripdata_primary_idx                  | 1829830
 197 | Dataflow: mz_catalog.mz_dataflow_names_primary_idx                 |    1016
 316 | Dataflow: mz_catalog.mz_records_per_dataflow_primary_idx           |     595
 520 | Dataflow: mz_catalog.mz_perf_peek_durations_core_primary_idx       |       0
 485 | Dataflow: mz_catalog.mz_perf_arrangement_records_primary_idx       |     974
 375 | Dataflow: mz_catalog.mz_perf_dependency_frontiers_primary_idx      |     208
 567 | Dataflow: mz_catalog.mz_perf_peek_durations_bucket_primary_idx     |       0
 167 | Dataflow: mz_catalog.mz_addresses_with_unit_length_primary_idx     |    1368
 355 | Dataflow: mz_catalog.mz_records_per_dataflow_global_primary_idx    |     504
 239 | Dataflow: mz_catalog.mz_dataflow_operator_dataflows_primary_idx    |    1684
 287 | Dataflow: mz_catalog.mz_records_per_dataflow_operator_primary_idx  |     895
 724 | Dataflow: mz_catalog.mz_perf_peek_durations_aggregates_primary_idx |       0
(12 rows)

Time: 1.829 ms
materialize=> select count(*) from (select distinct * from tripdata);
 count 
-------
  3105
(1 row)

which remains steady with time. This is with DIFFERENTIAL_EAGER_MERGE=1000 set, so in principle this should compact down to the limiting size given enough time. For some reason this isn't happening. Some quick println! debugging indicates that the coordinator is providing AllowCompaction messages for the relevant arrangement u15.

When we check out the arrangement information

materialize=> select * from mz_arrangement_sizes where operator = 855;
 operator | worker | records | batches 
----------+--------+---------+---------
      855 |      0 | 1829830 |       2
(1 row)

Time: 1.046 ms
materialize=> 

which suggests that physical compaction continues to happen, but logical compaction (where times would align and we would collapse down to just 3105 records) doesn't seem to occur.
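
For intuition, here is a minimal sketch of what logical compaction would accomplish here, written as plain Rust over toy ((key, val), time, diff) tuples rather than differential's actual trace and batch types: once the compaction frontier passes the load timestamps, the times align and updates of the same row consolidate (or cancel).

```rust
use std::collections::HashMap;

/// Advance every time that is not beyond `frontier` up to `frontier`, then
/// consolidate: updates whose times become equal sum their diffs, and rows
/// whose diffs cancel drop out entirely. A toy version of what logical
/// compaction does to the contents of a trace.
fn logically_compact(
    updates: Vec<((String, String), u64, i64)>, // ((key, val), time, diff)
    frontier: u64,
) -> Vec<((String, String), u64, i64)> {
    let mut consolidated: HashMap<((String, String), u64), i64> = HashMap::new();
    for ((key, val), time, diff) in updates {
        let advanced = time.max(frontier); // times at or before the frontier all become the frontier
        *consolidated.entry(((key, val), advanced)).or_insert(0) += diff;
    }
    let mut result: Vec<_> = consolidated
        .into_iter()
        .filter(|&(_, diff)| diff != 0)
        .map(|((kv, time), diff)| (kv, time, diff))
        .collect();
    result.sort();
    result
}

fn main() {
    // The same row loaded at three different timestamps occupies three records in
    // the trace; once the compaction frontier passes those times, the times align
    // and the three updates consolidate into one record with multiplicity 3.
    let updates = vec![
        (("k1".into(), "v1".into()), 3, 1),
        (("k1".into(), "v1".into()), 5, 1),
        (("k1".into(), "v1".into()), 7, 1),
        (("k2".into(), "v2".into()), 6, 1),
    ];
    let compacted = logically_compact(updates, 10);
    assert_eq!(compacted.len(), 2); // four updates -> two distinct rows
    assert_eq!(compacted[0].2, 3);  // ("k1", "v1") now carries diff 3
}
```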

@frankmcsherry frankmcsherry added C-bug Category: something is broken A-dataflow Area: dataflow labels Mar 2, 2020
@frankmcsherry frankmcsherry self-assigned this Mar 2, 2020
@frankmcsherry (Contributor, Author) commented Mar 14, 2020

More detail on this.

Compaction in materialized isn't broken. What is going on is that differential has a hard time automatically deciding between two reasonable choices:

  1. "you have advanced time but haven't added any data; I should go quiet" and
  2. "you have advanced time but haven't added any data; I should apply logical compaction to otherwise physically compact traces".

The former is popular and was requested. The latter is what we should be doing here, at least if we want the footprint to drop once the compaction window catches up to the moment we stopped adding data.

It is a bit tricky for differential to sort this out automatically. It could just keep running hot, continually compacting traces; this works, but it is a weird experience on personal computers. It could have specialized logic for total orders, where this is easier to see, but that means a lot more abstraction breaking. Probably the compaction machinery should just be more "user programmable".
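
As a rough illustration of what more "user programmable" compaction could look like from the outside, here is a sketch in which the holder of a trace handle drives the compaction frontier itself once it knows no more data is coming. It assumes differential-dataflow's `arrange_by_self` and the `TraceReader` methods `set_logical_compaction` / `set_physical_compaction` (named `advance_by` / `distinguish_since` in older releases); it is not how materialized actually drives compaction, just an illustration of choice 2 above.

```rust
use differential_dataflow::input::InputSession;
use differential_dataflow::operators::arrange::ArrangeBySelf;
use differential_dataflow::trace::TraceReader;
use timely::progress::frontier::AntichainRef;

fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        let mut input = InputSession::<u64, String, isize>::new();

        // Arrange the collection and keep a read handle to its trace.
        let mut trace = worker.dataflow(|scope| {
            input.to_collection(scope).arrange_by_self().trace
        });

        // Load "static" data at increasing timestamps, as materialized does.
        for t in 0..1_000u64 {
            input.advance_to(t);
            input.insert(format!("row-{}", t % 10)); // only 10 distinct rows
        }
        input.advance_to(1_000);
        input.flush();

        // Choice 2 from above: even though no new data arrives, keep telling the
        // trace that times before 1_000 need not be distinguished, so ongoing
        // merge work can fold the 1_000 updates down to the 10 distinct rows.
        for _ in 0..100 {
            trace.set_logical_compaction(AntichainRef::new(&[1_000]));
            trace.set_physical_compaction(AntichainRef::new(&[1_000]));
            worker.step();
        }
    })
    .expect("timely computation failed");
}
```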

There are some exacerbating factors in materialized:

  1. We have a 60-second compaction window by default. This means there is plenty of time for the collection to reach "physically compacted" form (one non-empty batch) before we would notice the benefits of logical compaction. Reducing this window made the issue start to go away. It may not be as important to have such a window if we have more reliable timestamp advances in our input streams.

  2. We load static data (pre-existing Kafka streams and files) with increasing timestamps, to give an "interactive" feel. This results in some wasted computation (see "Create sources 'as of' a timestamp/frontier", #1210) but also a lot of uncompacted changes. Had the sources been loaded at a single time, there would be no need for compaction (but also no fun partial / incorrect answers streaming in). Resolving #1210 and getting folks to use it would mitigate a fair bit of the pain here.

  3. We don't allow users to control the compaction frontier. This is totally something that could be exposed, either explicitly (e.g. to "pin" a view at a logical time to ensure it stays available for queries at that time) or implicitly (if we cannot select a timestamp for a query due to tight compaction, we can stop compacting each input arrangement until the upper bounds of each align and allow the query).

Putting a single record in drops the footprint down to the compacted size. Not a great solution, but this may also be something we only expect in certain cases (worst case: lots of changes all of a sudden, and then quiet, as when loading static data).

@frankmcsherry (Contributor, Author) commented:

Filed #2310 to track the issue that we may want to use fewer timestamps in loading data. That isn't the only potential resolution to this issue, and wouldn't resolve all cases, but it would probably mitigate the most obvious cases.
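
As a toy illustration of why fewer load timestamps helps (plain Rust, illustrative numbers only): before any compaction, a trace holds roughly one update per distinct (row, time) pair, so loading everything at a single timestamp leaves far less for compaction to do in the first place.

```rust
use std::collections::HashSet;

/// Toy model: the number of updates a trace must hold before any compaction
/// is the number of distinct (row, time) pairs it has absorbed.
fn uncompacted_updates(rows: &[&str], times: &[u64]) -> usize {
    rows.iter()
        .zip(times.iter())
        .map(|(row, time)| (row.to_string(), *time))
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    let rows = ["a", "a", "b", "a", "b"]; // five loaded rows, two distinct
    // Loading each row at its own timestamp leaves five updates for compaction to chew through...
    assert_eq!(uncompacted_updates(&rows, &[1, 2, 3, 4, 5]), 5);
    // ...while loading everything at a single timestamp leaves only two to begin with.
    assert_eq!(uncompacted_updates(&rows, &[1, 1, 1, 1, 1]), 2);
}
```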

@frankmcsherry frankmcsherry added the T-memory Theme: memory label Sep 3, 2020
@frankmcsherry frankmcsherry removed their assignment May 13, 2021
@awang awang added this to Icebox in Compute via automation Jun 30, 2021
@awang awang removed the G-dataflow label Jun 30, 2021
@uce uce added this to the Later milestone Jul 19, 2021
@ruchirK (Contributor) commented Dec 4, 2021

I hit this yesterday when restarting from a persisted Kafka upsert source. Just to make sure I clearly understand the issue here:

The main reason we don't want to do logical compaction over and over again is that we would then continually be updating times and potentially consolidating records, and most likely, after some time, that would just be wasted cycles, right? It's not so much "logical compaction is expensive" as "at some point, the data in the batch are fully logically compacted, and after that, doing more logical compaction is kind of silly" -- do I have that right?

It seems like when there are single-dimensional times, you can detect whether a batch is fully logically compacted by comparing the number of distinct (key, val) pairs with the number of distinct ((key, val), time) triples -- if there are more of the latter than the former, then there's potentially some compaction we could still do to consolidate things down.
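
A sketch of that check over plain update tuples (not the actual batch/cursor API), for totally ordered times:

```rust
use std::collections::HashSet;

/// Heuristic from above: if a batch holds more distinct ((key, val), time)
/// triples than distinct (key, val) pairs, advancing the compaction frontier
/// past those times could still consolidate records away.
fn compaction_could_help(updates: &[((String, String), u64, i64)]) -> bool {
    let keyed: HashSet<_> = updates.iter().map(|((k, v), _, _)| (k, v)).collect();
    let timed: HashSet<_> = updates.iter().map(|((k, v), t, _)| (k, v, t)).collect();
    timed.len() > keyed.len()
}

fn main() {
    // The same row present at two different times: compaction could still collapse these.
    let pending = vec![
        (("k".to_string(), "v".to_string()), 1, 1),
        (("k".to_string(), "v".to_string()), 2, 1),
    ];
    assert!(compaction_could_help(&pending));

    // Already fully compacted: one time per (key, val), nothing more to gain.
    let compacted = vec![(("k".to_string(), "v".to_string()), 2, 2)];
    assert!(!compaction_could_help(&compacted));
}
```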

I don't know yet how that generalizes to partially ordered times, and I know you said that specialized logic for total orders (where this is easier to see) would mean a lot more abstraction breaking. I'm not really advocating that we should do this; I just want to double-check that:

  1. if we had a heuristic for "could doing logical compaction lead to data reduction?", it would be easier for the system to know what to do, and

  2. the heuristic mentioned above is a valid one for totally ordered times.
