Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework column_knowledge so it is recursion_safe #18161

Closed
Tracked by #17915
aalexandrov opened this issue Mar 16, 2023 · 0 comments · Fixed by #18323
Closed
Tracked by #17915

Rework column_knowledge so it is recursion_safe #18161

aalexandrov opened this issue Mar 16, 2023 · 0 comments · Fixed by #18323
Assignees

Comments

@aalexandrov
Copy link
Contributor

aalexandrov commented Mar 16, 2023

Size estimate

L

Notes

  • The transform maintains a Let-based context of Vec<DatumKnowledge> estimated for each binding that is currently in scope.
  • The transform then acts as a bottom up transform that both:
    • Computes Vec<DatumKnowledge> for the current node based on the Vec<DatumKnowledge> for its children and the current LocalId ⇒ Vec<DatumKnowledge> context, and
    • Uses the Vec<DatumKnowledge> for the current node to simplify its MirScalarExpr children.

LetRec implementation (basic)

Update: This was not sound as highlighted by @ggevay in this PR comment. We're abandoning this in #18323 in favor of the advanced solution sketched below.

As a first approximation, we can just:

  • Initialize knowledge[i][j] = DatumKnowledge::default() for each column j and CTE `i.
  • Using the initial estimates, descend with harvest into the original values, updating knowledge[i][j] sequentially using absorb, and then continue to the body.

LetRec implementation (advanced)

First, we need to rework DatumKnowledge to be a proper complete lattice. The proposed structure is

enum DatumKnowledge {
  Any(/* nullable: */ bool),
  Literal(/* value: */ Result<mz_repr::Row, EvalError>, /* typ: */ ScalarType),
  Nothing
}

where

DatumKnowledge::Nothing < DatumKnowledge::Literal(_, _) < DatumKnowledge::Any(true)`

and

DatumKnowledge::Any(false) < DatumKnowledge::Any(true) 

As a diagram using the INT type:

graph BT;
    Nothing --> lit_dots("Literal(..., Int64)");
    Nothing --> lit_1("Literal(Ok(1), Int64)");
    Nothing --> lit_2("Literal(Ok(2), Int64)");
    Nothing --> lit_err("Literal(Err(...), Int64)");
    Nothing --> lit_null("Literal(Ok(null))");
    lit_dots --> any_false("Any(false)");
    lit_1 --> any_false("Any(false)");
    lit_2 --> any_false("Any(false)");
    lit_err --> any_false("Any(false)");
    lit_null --> any_true("Any(true)");
    any_false --> any_true;

For the LetRec case, we try to find a fixpoint using forward analysis and the join operator of that fixpoint.

  • In the base case, we initialize knowledge[i][j] = DatumKnowledge::bottom() for each column j and CTE i because each CTE is initialized to the empty collection.
  • In the inductive case, we sequentially update knowledge[i][j] to the union of its current state and the knowledge that can be computed on a clone of values[i] using the current estimates.
  • We bound our attempts to at most N = min(100, 4 * C) where C is the sum of all LetRec binding arities. If no fixpoint is reached in N iterations, we set knowledge[i][j] = DatumKnowledge::top().

Once the above loop is done, we descend with harvest into the original values and then to the body using the estimated LetRec binding knowledge.

Open questions

  1. Can the inductive case from the Implementation sketch above diverge? (cc: @mgree I think in our offline discussion earlier today you came up with some good reasons against that concern)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant