
Generated left-join is too complicated for Postgres to plan well #72

Open
mitchellwrosen opened this issue Jun 22, 2021 · 9 comments · Fixed by #81

@mitchellwrosen commented Jun 22, 2021

Hi folks,

My colleague @tstat and I were playing around with the library for the first time and we ran into a performance issue with a generated query that may or may not be totally expected. I understand that translating monadic syntax to SQL queries is perhaps going to be unavoidably "for-loopy" at times, so please feel free to close this issue as a #wontfix. If that's the case, though, I think we could put together some documentation that mentions some known performance gotchas, so that users won't necessarily have to examine plan output themselves.

Anyways, the query we were interested in writing with rel8 was a simple left-join. The schema is unimportant here, but let me know if it would be helpful to provide a repo that reproduces this issue exactly. (I think it will be clear enough from the following high-level description).

We have a students table, where each student has a first and last name, and we're interested in pairing each student with any student whose last name matches their first name (keeping the student even when there is no match):

select *
from students s1
left join students s2 on s1.first = s2.last;

This generates the following plan:

 Hash Left Join  (cost=2.35..4.78 rows=60 width=114)
   Hash Cond: (s1.first = s2.last)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Hash  (cost=1.60..1.60 rows=60 width=57)
         ->  Seq Scan on students s2  (cost=0.00..1.60 rows=60 width=57)

In rel8, we translated this as:

s1 <- each studentsSchema
s2 <- optional do
  s2 <- each studentsSchema
  where_ (first s1 ==. last s2)
  pure s2
pure (s1, s2)

which generated a messier version of the following query:

select *
from students s1
cross join lateral (
  select *
  from (select 0) t1
  left outer join (
    select *
    from students s2
    where s1.first = s2.last
  ) t2 on true
) t2;

Unfortunately, Postgres' optimizer does not seem to consider a hash join in this case, and falls back to a much more expensive nested loop join:

 Nested Loop  (cost=0.00..109.00 rows=60 width=118)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Nested Loop Left Join  (cost=0.00..1.77 rows=1 width=61)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
         ->  Seq Scan on students s2  (cost=0.00..1.75 rows=1 width=57)
               Filter: (s1.first = last)

Curiously, this is not merely due to the existence of a lateral join with a pushed-in where-clause. We tried rewriting the original SQL in this style:

select *
from students s1
left join lateral (
  select *
  from students s2
  where s1.first = s2.last
) t2 on true;

and in this case Postgres did again consider the hash join:

 Hash Left Join  (cost=2.35..4.78 rows=60 width=114)
   Hash Cond: (s1.first = s2.last)
   ->  Seq Scan on students s1  (cost=0.00..1.60 rows=60 width=57)
   ->  Hash  (cost=1.60..1.60 rows=60 width=57)
         ->  Seq Scan on students s2  (cost=0.00..1.60 rows=60 width=57)

@ocharles commented:

Very interesting! I imagine there are no indices here. In practice we tend to index these join conditions, at which point we usually see a nested loop and index scan - but I don't think we ever looked at an un-indexed join. Will look into this!
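
For context, that means something like the following index on the join column (illustrative only; the index name is made up, and the column name comes from the example schema above):

  -- With an index like this in place, the planner can use a nested loop
  -- with an index scan on the inner side instead of hashing.
  create index students_last_idx on students (last);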

@ocharles commented:

Could you share select version(), too?

@ocharles commented Jun 23, 2021

Ok, we think this is a slightly deeper problem in Opaleye itself. At work, we use Opaleye with tomjaguarpaw/haskell-opaleye#480 applied, which produces the plan that you're expecting. There are a few solutions (one is to just use that PR, another is to write smarter code). We'll see where this PR goes first, but we at least do have a handle on the problem now.

Thanks for reporting this!

ocharles added a commit that referenced this issue Jun 28, 2021:
0.7.3 has important changes to where lateral joins are introduced, and fixes #72.
shane-circuithub pushed a commit that referenced this issue Jun 28, 2021.

@tstat commented Jun 29, 2021

I believe that the original query was an example of the following problem: a lateral subquery that does not get pulled up constrains the planner to use a nested loop to perform the join, and a nested loop isn't the most efficient way to perform every join. This problem is described a bit here.

While the changes to opaleye simplified the original example enough to allow it to be pulled up, it is probably worth noting that there remain many cases where the subquery would not be pulled up, and then the original problem described above returns. For example, I think this query would be natural to write with rel8's api:

    s1 <- each studentsSchema
    s2 <- many $ do
      s2 <- each studentsSchema
      where_ $ stlastname s1 ==. stlastname s2
      pure (stfirstname s2)
    pure (stlastname s1, s2)
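
For reference, the SQL this generates has roughly the following shape (a simplified sketch rather than the exact output; a full example of the real generated SQL appears in a later comment):

  -- Simplified sketch: the aggregate lives inside a LATERAL subquery that
  -- references the outer s1, so Postgres cannot pull the subquery up and is
  -- forced into a nested loop join.
  select s1.stlastname, t2.first_names
  from students s1
  left outer join lateral (
    select array_agg(s2.stfirstname) as first_names
    from students s2
    where s2.stlastname = s1.stlastname
  ) t2 on true;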

The array_agg in the lateral subquery prevents it from getting pulled up, and the lateral forces a nested loop, resulting in the following plan:

Nested Loop Left Join  (cost=0.00..6643.83 rows=430 width=64)
  ->  Seq Scan on students "T1"  (cost=0.00..14.30 rows=430 width=32)
  ->  Subquery Scan on "T1_1"  (cost=0.00..15.41 rows=1 width=33)
        ->  GroupAggregate  (cost=0.00..15.40 rows=1 width=36)
              Group Key: 0
              ->  Seq Scan on students "T1_2"  (cost=0.00..15.38 rows=2 width=36)
                    Filter: ("T1".stlastname = stlastname)

compared to a non-lateral variant:

  explain
  select
  s1.stlastname,
  s2.first_names
  from students s1
  left outer join (
   select array_agg(s2.stfirstname) as first_names,
   s2.stlastname
   from students s2
   group by s2.stlastname
  ) s2 using (stlastname)

which allows postgres to consider a hash join:

Hash Left Join  (cost=23.45..38.90 rows=430 width=64)
  Hash Cond: (s1.stlastname = s2.stlastname)
  ->  Seq Scan on students s1  (cost=0.00..14.30 rows=430 width=32)
  ->  Hash  (cost=20.95..20.95 rows=200 width=64)
        ->  HashAggregate  (cost=16.45..18.95 rows=200 width=64)
              Group Key: s2.stlastname
              ->  Seq Scan on students s2  (cost=0.00..14.30 rows=430 width=64)

@ocharles commented Jul 1, 2021

Thanks @tstat, this is a great point. I'm not entirely sure what to do, but two options come to mind:

  1. Do nothing. This is really just hoping that PostgreSQL gets smarter and can optimize what we give it into something better.
  2. Try and do the "pulling up" ourselves, if possible. This is basically getting into writing an optimizer, which we've been trying very hard to avoid as it's really hard to do right.

The possibility of introducing a special non-lateral leftJoin function is tricky. The obvious leftJoin :: Query a -> (a -> Expr Bool) -> Query (MaybeTable a) isn't enough - that Query a could still refer to other variables bound in Query. It's a specific goal of Rel8 not to need ST-like tricks to deal with scoping. That said, maybe emitting a LATERAL LEFT JOIN would be fine, and if it's not actually lateral, PostgreSQL will be smart enough to figure out what's going on. Either way, I'm not very excited about this because it doesn't play well with the "join functions" that I advocate for in my Rel8 talk.

I dunno if @shane-circuithub has any ideas here.

@shane-circuithub commented Jul 8, 2021

I appreciate all the input from @tstat and @mitchellwrosen here. I still don't have a good intuition for this, but hopefully that will come!

Having said that, I don't think the two examples in @tstat's comment (the Rel8 version and the Postgres version) are really the same query. If I were to translate the Postgres version into Rel8, I would use Rel8.Tabulate (which unfortunately we haven't released yet).

studentsByLastname :: Tabulation (Expr Text) (Student Expr)
studentsByLastname = fromQuery $ do
  student@Student {lastname} <- each studentsSchema
  pure (lastname, student)

example :: Tabulation (Expr Text) (ListTable Expr (Expr Text))
example = do
  _ <- studentsByLastname
  manyTabulation $ firstname <$> studentsByLastname

I haven't tested this (if you could post the SQL to create this students table and any indices, that would be helpful), but I'm almost certain that would produce the same query plan as the handwritten SQL.

@ocharles commented Jul 8, 2021

@shane-circuithub has asked me to elaborate on the above.

First, I can reproduce the bug/poor query plan using just values:

students :: Query (Expr Text, Expr Text)
students = values [(lit firstName, lit lastName) | firstName <- ["A", "B", "C"], lastName <- ["D", "E"]]

query1 = do
    s1 <- students
    s2 <- many $ do
      s2 <- students
      where_ $ snd s1 ==. snd s2
      pure (fst s2)
    pure (snd s1, s2)

query1 produces:

SELECT
CAST("values1_1" AS text) as "_1",
CAST(CASE WHEN ("rebind0_7") IS NULL THEN CAST(ARRAY[] AS text[]) ELSE "result0_6" END AS text[]) as "_2"
FROM (SELECT *
      FROM
      (SELECT
       *
       FROM (SELECT "column1" as "values0_1",
                    "column2" as "values1_1"
             FROM
             (VALUES
              (CAST(E'A' AS text),CAST(E'D' AS text)),
              (CAST(E'A' AS text),CAST(E'E' AS text)),
              (CAST(E'B' AS text),CAST(E'D' AS text)),
              (CAST(E'B' AS text),CAST(E'E' AS text)),
              (CAST(E'C' AS text),CAST(E'D' AS text)),
              (CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1"
      LEFT OUTER JOIN
      LATERAL
      (SELECT
       TRUE as "rebind0_7",
       *
       FROM (SELECT
             *
             FROM (SELECT
                   ARRAY_AGG("inner0_6") as "result0_6"
                   FROM (SELECT
                         "values0_3" as "inner0_6",
                         *
                         FROM (SELECT
                               *
                               FROM (SELECT "column1" as "values0_3",
                                            "column2" as "values1_3"
                                     FROM
                                     (VALUES
                                      (CAST(E'A' AS text),CAST(E'D' AS text)),
                                      (CAST(E'A' AS text),CAST(E'E' AS text)),
                                      (CAST(E'B' AS text),CAST(E'D' AS text)),
                                      (CAST(E'B' AS text),CAST(E'E' AS text)),
                                      (CAST(E'C' AS text),CAST(E'D' AS text)),
                                      (CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1"
                               WHERE (("values1_1") = ("values1_3"))) as "T1") as "T1"
                   GROUP BY COALESCE(0)) as "T1") as "T1") as "T2"
      ON
      TRUE) as "T1"

which has the plan

 Nested Loop Left Join  (cost=0.00..0.84 rows=6 width=64)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6 width=32)
   ->  Subquery Scan on "T1"  (cost=0.00..0.12 rows=1 width=33)
         ->  GroupAggregate  (cost=0.00..0.11 rows=1 width=36)
               Group Key: 0
               ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.09 rows=1 width=36)
                     Filter: ("*VALUES*".column2 = column2)
(7 rows)

If we use the upcoming Rel8.Tabulate interface, we can write:

query2 = toQuery do
  _ <- studentsByLastName
  manyTabulation (fst <$> studentsByLastName)
  where
    studentsByLastName :: Tabulation (Expr Text) (Expr Text, Expr Text)
    studentsByLastName = fromQuery do
      s <- students
      return (snd s, s)

which produces

SELECT
CAST("values1_1" AS text) as "_1",
CAST(CASE WHEN ("rebind0_13") IS NULL THEN CAST(ARRAY[] AS text[]) ELSE "result1_6" END AS text[]) as "_2"
FROM (SELECT *
      FROM
      (SELECT
       *
       FROM (SELECT "column1" as "values0_1",
                    "column2" as "values1_1"
             FROM
             (VALUES
              (CAST(E'A' AS text),CAST(E'D' AS text)),
              (CAST(E'A' AS text),CAST(E'E' AS text)),
              (CAST(E'B' AS text),CAST(E'D' AS text)),
              (CAST(E'B' AS text),CAST(E'E' AS text)),
              (CAST(E'C' AS text),CAST(E'D' AS text)),
              (CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1"
      LEFT OUTER JOIN
      LATERAL
      (SELECT
       TRUE as "rebind0_13",
       *
       FROM (SELECT
             *
             FROM (SELECT
                   "inner0_6" as "result0_6",
                   ARRAY_AGG("inner1_6") as "result1_6"
                   FROM (SELECT
                         "values1_4" as "inner0_6",
                         "values0_4" as "inner1_6",
                         *
                         FROM (SELECT
                               *
                               FROM (SELECT "column1" as "values0_4",
                                            "column2" as "values1_4"
                                     FROM
                                     (VALUES
                                      (CAST(E'A' AS text),CAST(E'D' AS text)),
                                      (CAST(E'A' AS text),CAST(E'E' AS text)),
                                      (CAST(E'B' AS text),CAST(E'D' AS text)),
                                      (CAST(E'B' AS text),CAST(E'E' AS text)),
                                      (CAST(E'C' AS text),CAST(E'D' AS text)),
                                      (CAST(E'C' AS text),CAST(E'E' AS text))) as "V") as "T1") as "T1") as "T1"
                   GROUP BY "inner0_6") as "T1"
             WHERE (("values1_1") = ("result0_6"))) as "T1") as "T2"
      ON
      TRUE) as "T1"

which has the plan

 Hash Right Join  (cost=0.26..0.47 rows=6 width=64)
   Hash Cond: ("T1".result0_6 = "*VALUES*".column2)
   ->  Subquery Scan on "T1"  (cost=0.11..0.24 rows=6 width=65)
         ->  HashAggregate  (cost=0.11..0.18 rows=6 width=64)
               Group Key: "*VALUES*_1".column2
               ->  Values Scan on "*VALUES*_1"  (cost=0.00..0.08 rows=6 width=64)
   ->  Hash  (cost=0.08..0.08 rows=6 width=32)
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6 width=32)

There's nothing really special about Rel8.Tabulate; it's just a convenient way to write these types of queries. We think of a Tabulation k v as being like a Map k v. The monad instance for Tabulation is intersection.

@tstat commented Jul 13, 2021

Interesting, it looks like in the Tabulate plan the LEFT OUTER JOIN LATERAL doesn't contain an array_agg directly, because the aggregate is pushed into another subquery nested inside the lateral subquery, and that inner subquery doesn't contain any lateral references. As a result, the inner subquery still isn't pulled up (due to the aggregate), but the lateral subquery itself is pulled up and the lateral is eliminated.
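
In other words, the generated query has roughly this shape (a simplified, assumed sketch in terms of the students example, not the exact SQL above):

  -- Sketch: the aggregate sits in an inner subquery with no lateral
  -- references, so it survives as-is, while the LATERAL wrapper around it is
  -- just a WHERE clause and can be pulled up into a plain join.
  select s1.stlastname, t2.first_names
  from students s1
  left outer join lateral (
    select g.first_names
    from (
      select s2.stlastname, array_agg(s2.stfirstname) as first_names
      from students s2
      group by s2.stlastname
    ) g
    where g.stlastname = s1.stlastname
  ) t2 on true;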

As far as what to do about the problem: I think some documentation about performance considerations would be helpful to the users of rel8.

We have already discussed how lateral queries force a particular join method, but another constraint to be mindful of is that they force a join order. Consider the following example:

  create table s (
    x int8 not null primary key,
    y int8 not null
  );
  
  insert into s
  select n, mod(n, 2)
  from generate_series(1, 5) as s(n)
  order by random();
  
  create table t (
    x int8 not null primary key,
    y int8 not null
  );

  insert into t
  select n, mod(n, 5)
  from generate_series(1, 1000000) as t(n)
  order by random();

  create table u (
    x int8 not null primary key,
    y int8 not null
  );
  
  insert into u
  select n, mod(n, 5)
  from generate_series(1, 1000000) as t(n)
  order by random();
  
  analyze s, t, u;

So s has 5 rows while t and u both have 1,000,000. If we want to join them all by their x columns then we could write:

  select *
  from t
  inner join u on t.x = u.x
  inner join s on s.x = u.x

and postgres will consider different join methods and join orders (because inner join is commutative and associative).

Here is what it planned on my machine:

QUERY PLAN
Nested Loop  (cost=1.96..4.27 rows=5 width=48) (actual time=0.025..0.071 rows=5 loops=1)
  ->  Merge Join  (cost=1.53..1.88 rows=5 width=32) (actual time=0.021..0.044 rows=5 loops=1)
        Merge Cond: (u.x = s.x)
        ->  Index Scan using u_pkey on u  (cost=0.42..51341.77 rows=1000000 width=16) (actual time=0.005..0.026 rows=6 loops=1)
        ->  Sort  (cost=1.11..1.12 rows=5 width=16) (actual time=0.013..0.013 rows=5 loops=1)
              Sort Key: s.x
              Sort Method: quicksort  Memory: 25kB
              ->  Seq Scan on s  (cost=0.00..1.05 rows=5 width=16) (actual time=0.001..0.002 rows=5 loops=1)
  ->  Index Scan using t_pkey on t  (cost=0.42..0.48 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=5)
        Index Cond: (x = u.x)
Planning Time: 0.515 ms
Execution Time: 0.103 ms

The statistics postgres keeps on every table help inform it that it can use a merge join that reads only 6 of u's 1,000,000 rows, since there are only 5 rows in s and u's primary-key btree is in u.x order. Postgres expects only 5 rows to result from this join (the statistics indicate that both s.x and u.x are unique). A nested loop with the 5-row outer and an index scan on the inner side is then seen as the cheapest way to join to t, as index scans are cheap and we are only doing 5 loops.
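
As an aside (not something discussed in the thread), the per-column estimates the planner relies on here can be inspected directly through the pg_stats view; an n_distinct of -1 means Postgres believes every value in the column is distinct:

  -- Inspect the planner's statistics for the join columns.
  select tablename, attname, n_distinct
  from pg_stats
  where tablename in ('s', 't', 'u')
    and attname = 'x';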

If I rewrote this to use lateral joins like this (using offset 0 to prevent subquery pullup):

  select *
  from t
  , lateral (
    select *
    from u
    where u.x = t.x
    offset 0
  ) as u,
  lateral (
    select *
    from s
    where u.x = s.x
    offset 0
  ) as s

then I force the join order ((t join u) join s) and the join method to nested loop, and I get a terrible plan:

QUERY PLAN
Nested Loop  (cost=0.42..9560406.00 rows=1000000 width=48) (actual time=386.602..2168.792 rows=5 loops=1)
  ->  Nested Loop  (cost=0.42..8477906.00 rows=1000000 width=32) (actual time=0.050..1595.613 rows=1000000 loops=1)
        ->  Seq Scan on t  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.006..50.673 rows=1000000 loops=1)
        ->  Index Scan using u_pkey on u  (cost=0.42..8.44 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=1000000)
              Index Cond: (x = t.x)
  ->  Seq Scan on s  (cost=0.00..1.06 rows=1 width=16) (actual time=0.000..0.000 rows=0 loops=1000000)
        Filter: (u.x = x)
        Rows Removed by Filter: 5
Planning Time: 0.288 ms
Execution Time: 2168.823 ms

Although all of the joined columns are indexed, doing 2,000,000 index scans is expensive and not necessary.

If I reorder my lateral query to read from s first:

  select *
  from s
  , lateral (
    select *
    from t
    where s.x = t.x
    offset 0
  ) as t,
  lateral (
    select *
    from u
    where u.x = t.x
    offset 0
  ) as u

Then I will read the 5 rows from s, do 5 index scans to join to t, and 5 more index scans to join to u, resulting in a much nicer plan:

QUERY PLAN
Nested Loop  (cost=0.85..85.67 rows=5 width=48) (actual time=0.053..0.100 rows=5 loops=1)
  ->  Nested Loop  (cost=0.42..43.36 rows=5 width=32) (actual time=0.039..0.068 rows=5 loops=1)
        ->  Seq Scan on s  (cost=0.00..1.05 rows=5 width=16) (actual time=0.005..0.006 rows=5 loops=1)
        ->  Index Scan using t_pkey on t  (cost=0.42..8.44 rows=1 width=16) (actual time=0.012..0.012 rows=1 loops=5)
              Index Cond: (x = s.x)
  ->  Index Scan using u_pkey on u  (cost=0.42..8.44 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=5)
        Index Cond: (x = t.x)
Planning Time: 0.295 ms
Execution Time: 0.124 ms

So, when writing queries with rel8, if a subquery is not simple (and thus cannot be pulled up), then one should be mindful of the bind order.

@ocharles commented Jul 13, 2021

Edit: I was mistaken, so you can ignore my original post below - but it's here if you want to see it!

I was ready to say "yea, we should just write documentation", but I think you might have found another query that's not currently expressible by Rel8. I wanted to express

  select *
  from s
  , lateral (
    select *
    from t
    where s.x = t.x
    offset 0
  ) as t,
  lateral (
    select *
    from u
    where u.x = t.x
    offset 0
  ) as u

in Rel8, but without LATERAL references, and I wanted SQL like:

select * from t, (select * from u offset 0) u, (select * from s offset 0) as s where t.x = u.x and s.x = u.x;

This has a good plan:

                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.59..29159.65 rows=5 width=48) (actual time=7.064..88.236 rows=5 loops=1)
   ->  Hash Join  (cost=1.16..29157.21 rows=5 width=32) (actual time=7.059..88.215 rows=5 loops=1)
         Hash Cond: (u.x = s.x)
         ->  Seq Scan on u  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.019..44.101 rows=1000000 loops=1)
         ->  Hash  (cost=1.10..1.10 rows=5 width=16) (actual time=0.004..0.004 rows=5 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on s  (cost=0.00..1.05 rows=5 width=16) (actual time=0.002..0.002 rows=5 loops=1)
   ->  Index Scan using t_pkey on t  (cost=0.42..0.48 rows=1 width=16) (actual time=0.002..0.003 rows=1 loops=5)
         Index Cond: (x = u.x)
 Planning time: 0.188 ms
 Execution time: 88.255 ms
(11 rows)

However, I don't seem to be able to express this. The obvious

putStrLn $ showQuery $
  liftA3 (,,) (offset 0 (each tSchema)) (offset 0 (each sSchema)) (offset 0 (each uSchema))
    >>= \(t, s, u) -> where_ (fst u ==. fst t &&. fst u ==. fst s) >> return (t, s, u)

produces

explain analyze SELECT                                                                                                       
CAST("x0_1" AS int8) as "_1/_1",
CAST("y1_1" AS int8) as "_1/_2",
CAST("x0_3" AS int8) as "_2/_1",
CAST("y1_3" AS int8) as "_2/_2",
CAST("x0_6" AS int8) as "_3/_1",
CAST("y1_6" AS int8) as "_3/_2"
FROM (SELECT
      *
      FROM (SELECT
            *
            FROM (SELECT
                  *
                  FROM (SELECT
                        "x" as "x0_1",
                        "y" as "y1_1"
                        FROM "t" as "T1") as "T1") as "T1"
            OFFSET 0) as "T1",
           LATERAL
           (SELECT
            *
            FROM (SELECT
                  *
                  FROM (SELECT
                        "x" as "x0_3",
                        "y" as "y1_3"
                        FROM "s" as "T1") as "T1") as "T1"
            OFFSET 0) as "T2",
           LATERAL
           (SELECT
            *
            FROM (SELECT
                  *
                  FROM (SELECT
                        "x" as "x0_6",
                        "y" as "y1_6"
                        FROM "u" as "T1") as "T1") as "T1"
            OFFSET 0) as "T3"
      WHERE ((("x0_6") = ("x0_1")) AND (("x0_6") = ("x0_3")))) as "T1"

But this doesn't plan as well:

                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=29157.27..58313.32 rows=5 width=48) (actual time=96.610..177.602 rows=5 loops=1)
   Hash Cond: ("T1".x = "T1_1".x)
   ->  Seq Scan on u "T1"  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.023..44.073 rows=1000000 loops=1)
   ->  Hash  (cost=29157.21..29157.21 rows=5 width=32) (actual time=89.632..89.632 rows=5 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Hash Join  (cost=1.16..29157.21 rows=5 width=32) (actual time=6.532..89.631 rows=5 loops=1)
               Hash Cond: ("T1_1".x = "T1_2".x)
               ->  Seq Scan on t "T1_1"  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.007..45.434 rows=1000000 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=5 width=16) (actual time=0.005..0.005 rows=5 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Seq Scan on s "T1_2"  (cost=0.00..1.05 rows=5 width=16) (actual time=0.002..0.002 rows=5 loops=1)
 Planning time: 0.174 ms
 Execution time: 177.623 ms

I wonder if a custom Applicative instance that doesn't use LATERAL is in order?

I made a mistake in my Rel8 query. I wrote

liftA3 (,,) (offset 0 (each tSchema)) (offset 0 (each sSchema)) (offset 0 (each uSchema))

But the SQL I wanted is actually produced by

liftA3 (,,) (each tSchema) (offset 0 (each sSchema)) (offset 0 (each uSchema))

This does have the correct query plan.

So at this point, I think it's just a case of taking care when writing Rel8 queries, and we should write documentation explaining what that care looks like. Fortunately there's a goldmine of information in this issue, so we can be a lot more precise than before!
