Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cautionary note about dropping preloads - distinct required? #44

Closed
astjohn opened this issue Dec 30, 2017 · 3 comments
Closed

Cautionary note about dropping preloads - distinct required? #44

astjohn opened this issue Dec 30, 2017 · 3 comments

Comments

@astjohn
Copy link

astjohn commented Dec 30, 2017

Thanks for the great library. I was recently building a query and noticed a small quirk.

query = Trip
        |> join(:left, [t], d in assoc(t, :destinations))
        |> join(:left, [t, _d], c in assoc(t, :companies))
        |> join(:left, [t, _d, _c], u in assoc(t, :travellers))
        |> where([_t, _d, c, _u], c.id == ^company.id)
        |> order_by([t, _d, _c, _u], desc: t.start)
        |> preload([_t, d, _c, u], destinations: d, travellers: u)

query
==> #Ecto.Query<from t0 in Radar.Trips.Trip,
 left_join: d in assoc(t0, :destinations),
 left_join: c in assoc(t0, :companies), 
 left_join: t1 in assoc(t0, :travellers),
 where: c.id == ^"91955970-34eb-4a1b-97e6-53e9084abbfa",
 order_by: [desc: t0.start], preload: [travellers: t1, destinations: d]>

query |> Repo.all() |> length()
==> 3424

# to mimic the count query
query |> exclude(:preload) |> exclude(:order_by) |> Repo.all() |> length()
==> 4556

Notice that by excluding the preloads, the count jumps. This is likely because the joins are kept, but the removal of preloads results in duplicate rows during the count query.

Adding distinct removes the issue.

query = Trip
        |> distinct(true)
        |> join(:left, [t], d in assoc(t, :destinations))
        |> join(:left, [t, _d], c in assoc(t, :companies))
        |> join(:left, [t, _d, _c], u in assoc(t, :travellers))
        |> where([_t, _d, c, _u], c.id == ^company.id)
        |> order_by([t, _d, _c, _u], desc: t.start)
        |> preload([_t, d, _c, u], destinations: d, travellers: u)

query |> Repo.all() |> length()
==> 3424

query |> exclude(:preload) |> exclude(:order_by) |> Repo.all() |> length()
==> 3424

Should a small note be made in the readme about it?

@drewolson
Copy link
Owner

Thanks for the detailed explanation. I agree on adding a note in the readme. Would you like to take a pass at it in a PR? If so I'd be happy to merge it.

@astjohn
Copy link
Author

astjohn commented Jan 2, 2018

Gladly. However, our specific use case might be a bit more involved and is more related to proper pagination when joins are involved so that repeated rows are avoided while maintaining the true count.

This seemed to address something similar:
drewolson/scrivener#30

Did it ever make it into the code base here?

The SQL we need to execute is something like the following:

SELECT t.*, d.*, u.* FROM 
(
    SELECT x.* FROM trips AS x
    INNER JOIN trips_companies AS tc ON tc.trip_id = x.id
    INNER JOIN companies AS c ON tc.company_id = c.id    
    WHERE c.id = '91955970-34eb-4a1b-97e6-53e9084abbfa'
    ORDER BY x.start DESC
    LIMIT 100 
    OFFSET 0
) AS t
INNER JOIN destinations AS d ON d.trip_id = t.id
LEFT JOIN trips_travellers AS tt ON tt.trip_id = t.id
LEFT JOIN users AS u ON tt.user_id = u.id
ORDER BY t.start DESC

Any suggestions?

@astjohn
Copy link
Author

astjohn commented Jan 8, 2018

I'll attempt to implement the solution proposed from that old pull request unless a better approach is recommended.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants