Skip to content

Conversation

@mheinzel
Copy link
Collaborator

Description

Should go on top of #608 and #609. We should also add some union-related labelling.

@mheinzel mheinzel mentioned this pull request Mar 10, 2025
@mheinzel mheinzel force-pushed the mheinzel/qls-enable-unions branch from a96a9cb to 46b6c90 Compare March 10, 2025 18:50
mheinzel added a commit that referenced this pull request Mar 10, 2025
This commit should be dropped before merging, in favour of #614.
@dcoutts
Copy link
Collaborator

dcoutts commented Mar 13, 2025

We can rebase this once #608 is merged.

@mheinzel mheinzel force-pushed the mheinzel/qls-enable-unions branch from 46b6c90 to 402b713 Compare March 13, 2025 17:56
@mheinzel mheinzel marked this pull request as ready for review March 13, 2025 17:56
Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice nice!

Comment on lines 2007 to 2010
genUnionTableVar = QC.frequency [
(9 * length unionDescendantTableVars, genUnionDescendantTableVar)
, (1 * length notUnionDescendantTableVars, genNotUnionDescendantTableVar)
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is to skew more towards union descendants?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though in a follow-up patch I made this even more extreme and moved the trivial cases of non-union tables to a unit test, and replaced genUnionTableVar with genUnionDescendantTableVar so the QLS test only covers the non-trivial cases of union-derived tables.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the original code does not do what it intends to do. It intends to have 90% : 10% skew toward union tables, but in fact the result is 40% : 60% skew, i.e. 60% are trivial non-union tables.

I fix this in a later patch (by removing the trivial case entirely), but it could also be fixed by moving the two cases up into the parent QC.frequency, each with their corresponding guard. That way we don't get forced into picking the trivial case when there are no non-trivial cases available. It's very common that there are no union tables available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but even in the parent QC.frequency you'd get a different distribution than the integers you supply, depending on what actions are available and which guards are true/false. But I agree it's still better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you would, but that should then work as intended.

@dcoutts dcoutts force-pushed the mheinzel/qls-enable-unions branch from 402b713 to 72d1f12 Compare March 14, 2025 21:32
@dcoutts
Copy link
Collaborator

dcoutts commented Mar 14, 2025

@jorisdral I see my changes raced with your review, but I happened to address some of your comments.

@dcoutts dcoutts requested a review from jorisdral March 14, 2025 21:43
@dcoutts dcoutts force-pushed the mheinzel/qls-enable-unions branch from 449d67f to c66028b Compare March 14, 2025 22:56
Copy link
Collaborator

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously I'm also reviewing my own changes here, so I think @mheinzel should review this one too now.

Comment on lines +1808 to +1811
-- We already want to enable unions, but some operations on tables don't
-- support unions yet. Therefore, we want to only run them on tables that
-- don't descend from a union.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the extra changes I've added to this PR, there's now another reason: because we don't want to squander QLS coverage on boring trivial cases that are now covered in unit tests.

Comment on lines 2007 to 2010
genUnionTableVar = QC.frequency [
(9 * length unionDescendantTableVars, genUnionDescendantTableVar)
, (1 * length notUnionDescendantTableVars, genNotUnionDescendantTableVar)
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the original code does not do what it intends to do. It intends to have 90% : 10% skew toward union tables, but in fact the result is 40% : 60% skew, i.e. 60% are trivial non-union tables.

I fix this in a later patch (by removing the trivial case entirely), but it could also be fixed by moving the two cases up into the parent QC.frequency, each with their corresponding guard. That way we don't get forced into picking the trivial case when there are no non-trivial cases available. It's very common that there are no union tables available.

--
-- Tests for 0-way and 1-way unions are included in the UnitTests
-- module. n-way unions for n>3 lead to large unions, which are less
-- Tests for 1-way unions are included in the UnitTests module.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point being that there are no 0-way tests (the type forbids it), so it's not actually covered in the unit tests.

[ genActionsSession
, genActionsTables
, genUnionActions
, genActionsUnion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for consistency, while we're at it

Comment on lines +2251 to +2242
-- | The state of model tables at the point they were closed. This is used
-- to augment the tables from the final model state (which of course has
-- only tables still open in the final state).
, closedTables :: !(Map Model.TableID Model.SomeTable)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that I never needed this change (as I had expected I would) but I've left it here anyway since it's a cheap and sane generalisation, and may indeed be useful later.

Comment on lines +2264 to +2257
-- | The subset of tables (open or closed) that were created as a result
-- of a union operation. This can be used for example to select subsets of
-- the other per-table tracking maps above, or the state from the model.
-- The map value is the size of the union table at the point it was created,
-- so we can distinguish trivial empty unions from non-trivial.
, unionTables :: !(Map Model.TableID Int)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the reason we need this is that the model tables do not in fact tell us that they're derived from a union, despite the existence of isUnionDescendant :: Model.Table k v b -> IsUnionDescendant. That's because Model.Table k v b and Model.Table k v b are not the same type. One is Database.LSMTree.Model.Session as Model and the other is Database.LSMTree.Model.Table as Model. The latter is the one we get from the model state, and does not tell us about unions. The former is really a table handle, while the latter is a table state.

Otherwise I would have used the model final state + the closed tables (generalised in the previous patch) and selected a subset using isUnionDescendant.

But as it is, we have to track union tables again in the stats.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because Model.Table k v b and Model.Table k v b are not the same type. One is Database.LSMTree.Model.Session as Model and the other is Database.LSMTree.Model.Table as Model.

This is a slightly confusing sentence. Maybe we should qualify the imports differently? Or is there a typo?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a confusing sentence reflecting the confusion I encountered in debugging it :-)

In each individual module the imports are clear, but it goes like this:

module Test.Database.LSMTree.StateMachine where
import qualified Database.LSMTree.Model.Session as Model

module Database.LSMTree.Model.Session where
import qualified Database.LSMTree.Model.Table as Model

so for code like

        , let unionTables =
                Map.filter (Model.withSomeTable (\t -> Model.isUnionDescendant t == Model.IsUnionDescendant))
                           (snd <$> Model.tables finalState)

we get

    • Couldn't match expected type: Model.Table k40 v0 b0
                  with actual type: Database.LSMTree.Model.Table.Table k v b
      NB: ‘Model.Table’ is defined in ‘Database.LSMTree.Model.Session’
          ‘Database.LSMTree.Model.Table.Table’
            is defined in ‘Database.LSMTree.Model.Table’
    • In the first argument of ‘Model.isUnionDescendant’, namely ‘t’

which is sort-of clear actually. GHC does tell us the definition sites.

Comment on lines +162 to +165
inserts table' [(Key1 17, Value1 43, Nothing)]
inserts table'' [(Key1 17, Value1 44, Nothing)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! The untested unit test was buggy!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which no doubt git blame will tell me was my fault!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was all me 😜

Comment on lines 847 to 849
| credits <= 0 = do
_ <- guardTableIsOpen t
pure c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit more consistent/regular to specify it this way. We don't really expect the impl to allow supplying 0 credits to closed tables without an error. All operations on closed tables are an error, trivial or not.

Comment on lines 2002 to 2005
genUnionTableVar = QC.frequency [
(9 * length unionDescendantTableVars, genUnionDescendantTableVar)
, (1 * length notUnionDescendantTableVars, genNotUnionDescendantTableVar)
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nesting of QC.frequency in an outer QC.frequency doesn't do what one might expect. The decisions are dependent rather than independent.

Comment on lines +1639 to +1657
--TODO: remove this 0 special case once the general case covers it.
-- We do not need to optimise the 0 case. It is just here to
-- simplify test coverage.
| otherwise -> error "supplyUnionCredits: not yet implemented"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured we would more reliably notice this TODO and resolve it when we implement the general case, rather than notice a TODO on an out-of-the-way unit test to say to enable it once supplyUnionCredits was implemented. So better to have the unit test running, and temporarily implement the special case in the real impl.

@dcoutts
Copy link
Collaborator

dcoutts commented Mar 14, 2025

If @jorisdral and @mheinzel are happy, we can squash the fixes and merge.

Comment on lines +2264 to +2257
-- | The subset of tables (open or closed) that were created as a result
-- of a union operation. This can be used for example to select subsets of
-- the other per-table tracking maps above, or the state from the model.
-- The map value is the size of the union table at the point it was created,
-- so we can distinguish trivial empty unions from non-trivial.
, unionTables :: !(Map Model.TableID Int)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because Model.Table k v b and Model.Table k v b are not the same type. One is Database.LSMTree.Model.Session as Model and the other is Database.LSMTree.Model.Table as Model.

This is a slightly confusing sentence. Maybe we should qualify the imports differently? Or is there a typo?

Comment on lines +162 to +165
inserts table' [(Key1 17, Value1 43, Nothing)]
inserts table'' [(Key1 17, Value1 44, Nothing)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was all me 😜

Comment on lines 2007 to 2010
genUnionTableVar = QC.frequency [
(9 * length unionDescendantTableVars, genUnionDescendantTableVar)
, (1 * length notUnionDescendantTableVars, genNotUnionDescendantTableVar)
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but even in the parent QC.frequency you'd get a different distribution than the integers you supply, depending on what actions are available and which guards are true/false. But I agree it's still better

@jorisdral jorisdral force-pushed the mheinzel/qls-enable-unions branch from 1cde123 to 07c4354 Compare March 17, 2025 10:25
mheinzel and others added 11 commits March 17, 2025 12:02
The parentTable in the Stats tracks the ultimate parent for each table.
This is used along with the action log to measure coverage of
operations on table duplicates.

With unions, a table can now have multiple ultimate parents. A union
table's ultimate parents are the ultimate parents of each of its input
tables.

Also deal with the consequences of multiple parent tables for the
dupTableActionLog tracking: we extend the action log for _each_ ultimate
parent table separately.
Co-authored-by: Joris Dral <joris@well-typed.com>
Track the whole closed table, not just the table size. This will let us
reuse the same tracking for additional purposes.
Label the number of tables with trivial and non-trivial unions.

Label the number of actions on tables from non-trivial unions.
Can be enabled now that unions are implemented.

And of course, because the test was not being run, there was a
previously undetected bug in the unit test. Now fixed.
…redits

The trivial case is when the table has no union level, as then there is
no union debt, and supplying credits returns them all as leftovers.

And implement these trivial cases.
We cover the trivial case of querying or supplying credits to tables
that are not derived from a union operation in the unit tests, so we do
not need to spend coverage space in the QLS tests on the trivial cases.

So only gnerate RemainingUnionDebt, SupplyUnionCredits for tables that
are derived from union operations.
Same principle as previous patches. Keep the precious coverage space in
the QLS tests for the non-trivial cases, and cover the trivial case in a
unit test.
@dcoutts dcoutts force-pushed the mheinzel/qls-enable-unions branch from e837a83 to a3c5cd0 Compare March 17, 2025 12:02
Slightly better memory use in the tests.
@dcoutts dcoutts enabled auto-merge March 17, 2025 16:34
@dcoutts dcoutts added this pull request to the merge queue Mar 17, 2025
Merged via the queue into main with commit 6c219ca Mar 17, 2025
27 checks passed
@dcoutts dcoutts deleted the mheinzel/qls-enable-unions branch March 17, 2025 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants