Skip to content

Fix performance of including relationships#1800

Merged
ricardograca merged 2 commits into
bookshelf:masterfrom
alechirsch:master
Apr 12, 2018
Merged

Fix performance of including relationships#1800
ricardograca merged 2 commits into
bookshelf:masterfrom
alechirsch:master

Conversation

@alechirsch

Copy link
Copy Markdown
Contributor

Including a relationship causes very slow performance. I first noticed this when including a relationship that had a .through, It was sometimes taking over 2 seconds to return from bookshelf. I did some digging and found that one of the queries that was supposed to return only 2 results, returned over 4k results. All of the 4k results were duplicate rows. Here is an example of a query that was generated for the following relationship:

// relationship
	supplier(){
		return this.belongsTo(Models.Supplier).through(Models.ClientSupplier, 'client_supplier_id');
	},

// query
select "suppliers".*, "clients_suppliers"."id" as "_pivot_id", "clients_suppliers"."supplier_id" as "_pivot_supplier_id"
from "suppliers" inner join "clients_suppliers" on "clients_suppliers"."supplier_id" = "suppliers"."id"
inner join "invoices" on "clients_suppliers"."id" = "invoices"."clients_supplier_id"
where "invoices"."client_supplier_id" in (17, 18)

Since my invoices table had thousands of rows, this query returns many duplicates of the same supplier.

Making the request use the distinct keyword ensures that the result only includes the desired results without duplicates.

@alechirsch alechirsch changed the title fixed issue where relation query returns thousands of duplicate rows Fix performance of including relationships Mar 26, 2018

@ricardograca ricardograca left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! However the tests must pass before this gets merged. Can you take a look at it? It's probably just a matter of changing the expected query.

@alechirsch

alechirsch commented Mar 27, 2018

Copy link
Copy Markdown
Contributor Author

@ricardograca Will do, I just got my environment all set up for testing

@alechirsch

Copy link
Copy Markdown
Contributor Author

@ricardograca All of the tests pass on my local machine, have you ever run into issues where tests differ between machines?

@alechirsch

Copy link
Copy Markdown
Contributor Author

screen shot 2018-03-27 at 9 43 31 am

@ricardograca

ricardograca commented Mar 27, 2018

Copy link
Copy Markdown
Member

That's weird. Looking at the failed test it seems like the two expected objects in the response array have their order swapped, even though they are correct individually. I've never seen this particular error before. Can you check if there could be some kind of race condition here? It's important that any async tests always return promises or call done() when done.

Also, what Node version are you using?

@alechirsch

Copy link
Copy Markdown
Contributor Author

I am using Node 8.9.2

@ricardograca

Copy link
Copy Markdown
Member

Looking at the test more closely it's a bit weak since there shouldn't be any requirement (or guarantee) that the columns are retrieved in any particular order. I'll ignore that failed test, but I still need to review this more carefully to make sure no new bugs or breaking changes are introduced and I don't have time for that right now.

@alechirsch

alechirsch commented Mar 27, 2018

Copy link
Copy Markdown
Contributor Author

It is very strange, the test that is failing here is passing consistently on my end.

@ricardograca ricardograca left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merging this even with the tests failing because they only fail for PostgreSQL and only because the order of the expected results isn't as expected, but there's also no ORDER BY clause on the queries, so some randomness isn't too surprising.

The issue of failing tests will be fixed with a later PR.

@jwhitmarsh

Copy link
Copy Markdown

Apologies if this is not the place to put this - thank you for this great fix, but one (rare) GOTCHA is that case statements in an order statement stop working now that all relationships are called with the distinct keyword. It's taken me a while to work out what was causing my errors so just wanted to put this out there.

My solution was to create a "computed" column using case in the select and to apply the order to that. Again, just in case someone in the future is experiencing the same thing.

select * 
from table 
order by 
    case when col1 is not null then col1 else col2 end

becomes

select *, 
case when col1 is not null then col1 else col2 end as order_col 
from table
order by order_col

(obviously all achieved via knex, not raw sql)

@ricardograca

Copy link
Copy Markdown
Member

@jwhitmarsh Can you share your actual code solution to the issue you describe?

@jwhitmarsh

Copy link
Copy Markdown

No probs. In the instance that I'm fixing it it's a fairly complicated query, but this is the key bit:

query.column([
    'sections.*', 
    this.database.knex.raw(`
        case when section_order > 0 then section_order else section_order_draft end as order_col
    `)
]);

query.orderByRaw(`order_col`);

for reference, this used to look like:

// no need to define query.columns()

query.orderByRaw( 
    `case when section_order > 0 then section_order end, case when section_order < 0 then section_order_draft end`
)

The "fixed" way is actually much neater and, like I said, I only wanted to post this in case someone stumbled across it in the future (relevant XKCD https://xkcd.com/979/)

@juliusza

juliusza commented Feb 18, 2019

Copy link
Copy Markdown

The extra distinct table.* causes problems with JSON fields.

"could not identify an equality operator for type json"

See #1941

@juliusza

Copy link
Copy Markdown

@ricardogama do you think this change is adequate?

It seems to me that this particular query can be optimized with .query(). Or add DISTINCT only to queries generated by .through().
Surely there is no reason to add DISTINCT to every single select query ?

See also #1941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants