New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance of including relationships #1800

Merged
merged 2 commits into from Apr 12, 2018

Conversation

Projects
3 participants
@alechirsch
Copy link
Contributor

alechirsch commented Mar 26, 2018

Including a relationship causes very slow performance. I first noticed this when including a relationship that had a .through, It was sometimes taking over 2 seconds to return from bookshelf. I did some digging and found that one of the queries that was supposed to return only 2 results, returned over 4k results. All of the 4k results were duplicate rows. Here is an example of a query that was generated for the following relationship:

// relationship
	supplier(){
		return this.belongsTo(Models.Supplier).through(Models.ClientSupplier, 'client_supplier_id');
	},

// query
select "suppliers".*, "clients_suppliers"."id" as "_pivot_id", "clients_suppliers"."supplier_id" as "_pivot_supplier_id"
from "suppliers" inner join "clients_suppliers" on "clients_suppliers"."supplier_id" = "suppliers"."id"
inner join "invoices" on "clients_suppliers"."id" = "invoices"."clients_supplier_id"
where "invoices"."client_supplier_id" in (17, 18)

Since my invoices table had thousands of rows, this query returns many duplicates of the same supplier.

Making the request use the distinct keyword ensures that the result only includes the desired results without duplicates.

@alechirsch alechirsch changed the title fixed issue where relation query returns thousands of duplicate rows Fix performance of including relationships Mar 26, 2018

@alechirsch alechirsch referenced this pull request Mar 26, 2018

Open

Performance concern #1774

@ricardograca
Copy link
Member

ricardograca left a comment

Great find! However the tests must pass before this gets merged. Can you take a look at it? It's probably just a matter of changing the expected query.

@ricardograca ricardograca added this to To Do in Version 0.14.0 via automation Mar 27, 2018

@ricardograca ricardograca moved this from To Do to In progress in Version 0.14.0 Mar 27, 2018

@alechirsch

This comment has been minimized.

Copy link
Contributor

alechirsch commented Mar 27, 2018

@ricardograca Will do, I just got my environment all set up for testing

@alechirsch

This comment has been minimized.

Copy link
Contributor

alechirsch commented Mar 27, 2018

@ricardograca All of the tests pass on my local machine, have you ever run into issues where tests differ between machines?

@alechirsch

This comment has been minimized.

Copy link
Contributor

alechirsch commented Mar 27, 2018

screen shot 2018-03-27 at 9 43 31 am

@ricardograca

This comment has been minimized.

Copy link
Member

ricardograca commented Mar 27, 2018

That's weird. Looking at the failed test it seems like the two expected objects in the response array have their order swapped, even though they are correct individually. I've never seen this particular error before. Can you check if there could be some kind of race condition here? It's important that any async tests always return promises or call done() when done.

Also, what Node version are you using?

@alechirsch

This comment has been minimized.

Copy link
Contributor

alechirsch commented Mar 27, 2018

I am using Node 8.9.2

@ricardograca

This comment has been minimized.

Copy link
Member

ricardograca commented Mar 27, 2018

Looking at the test more closely it's a bit weak since there shouldn't be any requirement (or guarantee) that the columns are retrieved in any particular order. I'll ignore that failed test, but I still need to review this more carefully to make sure no new bugs or breaking changes are introduced and I don't have time for that right now.

@alechirsch

This comment has been minimized.

Copy link
Contributor

alechirsch commented Mar 27, 2018

It is very strange, the test that is failing here is passing consistently on my end.

@ricardograca
Copy link
Member

ricardograca left a comment

I'm merging this even with the tests failing because they only fail for PostgreSQL and only because the order of the expected results isn't as expected, but there's also no ORDER BY clause on the queries, so some randomness isn't too surprising.

The issue of failing tests will be fixed with a later PR.

@ricardograca ricardograca merged commit 880f3bf into bookshelf:master Apr 12, 2018

1 check failed

continuous-integration/travis-ci/pr The Travis CI build failed
Details

Version 0.14.0 automation moved this from In progress to Done Apr 12, 2018

@jwhitmarsh

This comment has been minimized.

Copy link

jwhitmarsh commented Jan 15, 2019

Apologies if this is not the place to put this - thank you for this great fix, but one (rare) GOTCHA is that case statements in an order statement stop working now that all relationships are called with the distinct keyword. It's taken me a while to work out what was causing my errors so just wanted to put this out there.

My solution was to create a "computed" column using case in the select and to apply the order to that. Again, just in case someone in the future is experiencing the same thing.

select * 
from table 
order by 
    case when col1 is not null then col1 else col2 end

becomes

select *, 
case when col1 is not null then col1 else col2 end as order_col 
from table
order by order_col

(obviously all achieved via knex, not raw sql)

@ricardograca

This comment has been minimized.

Copy link
Member

ricardograca commented Jan 15, 2019

@jwhitmarsh Can you share your actual code solution to the issue you describe?

@jwhitmarsh

This comment has been minimized.

Copy link

jwhitmarsh commented Jan 15, 2019

No probs. In the instance that I'm fixing it it's a fairly complicated query, but this is the key bit:

query.column([
    'sections.*', 
    this.database.knex.raw(`
        case when section_order > 0 then section_order else section_order_draft end as order_col
    `)
]);

query.orderByRaw(`order_col`);

for reference, this used to look like:

// no need to define query.columns()

query.orderByRaw( 
    `case when section_order > 0 then section_order end, case when section_order < 0 then section_order_draft end`
)

The "fixed" way is actually much neater and, like I said, I only wanted to post this in case someone stumbled across it in the future (relevant XKCD https://xkcd.com/979/)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment