Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Sequelize issue using duplicating=false on Generator.Collectors scope definition. #921

Merged
merged 5 commits into from Jul 9, 2018

Conversation

cassiodias
Copy link
Contributor

@cassiodias cassiodias commented Jul 6, 2018

According to Sequelize team, when limits are set they enforce the use of subqueries and JOINed tables (FK) are missing in sub-query in the generated SQL (issues opened since 2015).

Apparently, the Sequelize PR (2018 March) solved, but unfortunately, we were still having the issue.

I have created a previous Refocus PR forcing the FK to be in the subquery, however, it also required a (few changes in order to avoid those FK to be returned to the client)[https://github.com//pull/918/files/bf29c7c4dfb881e324aaac892ff47ac3be2c9ac0].

So I've got a reading on Sequelize (issues/code) and found the attribute duplicating=false which tells to sequelize to never return a duplication and saw many users saying that they could solve the issue.

I have added the flag in Generator.User and Generator.Collector, but seems just adding to collector it solves the problem.

Generator.addScope('collectors', {
      include: [
        {
          duplicating: false, -- Added here!
          association: assoc.collectors,

As you can see below we have the generated SQL before and now. They are the same, the difference is that now we are enforcing to return all the attributes from the sub-query.

The application still the same and there is no any extra "remove from response" back to the client API.

-- BEFORE: without duplicating, sub query doesn't return createdBy (User fk)

FROM 
    (
        SELECT "Generator"."name", "Generator"."id"
          FROM "Generators" AS "Generator"
         WHERE ("Generator"."deletedAt" > '2018-07-06 13:34:11.276 +00:00' OR "Generator"."deletedAt" IS NULL) 
         ORDER BY "Generator"."name" LIMIT 10000 OFFSET 0
    ) AS "Generator" 
      LEFT OUTER JOIN "Users" AS "user" 
        ON "Generator"."createdBy" = "user"."id"
        AND ("user"."deletedAt" > '2018-06-25 14:56:34.957 +00:00' OR "user"."deletedAt" IS NULL) 
    LEFT OUTER JOIN "Profiles" AS "user->profile" 
        ON "user"."profileId" = "user->profile"."id" 
        AND ("user->profile"."deletedAt" > '2018-06-25 14:56:34.957 +00:00' OR "user->profile"."deletedAt" IS NULL) 
    LEFT OUTER JOIN ( 
        "GeneratorCollectors" AS "collectors->GeneratorCollectors" 
        INNER JOIN "Collectors" AS "collectors" 
        ON "collectors"."id" = "collectors->GeneratorCollectors"."collectorId"
    ) 
    ON "Generator"."id" = "collectors->GeneratorCollectors"."generatorId" 
    AND ("collectors"."deletedAt" > '2018-06-25 14:56:34.957 +00:00' OR "collectors"."deletedAt" IS NULL)
ORDER BY "Generator"."name";

-- THEN: With duplicating

FROM 
	(
		SELECT "Generator"."description", "Generator"."helpEmail", "Generator"."helpUrl", "Generator"."id", "Generator"."intervalSecs", 
		"Generator"."isDeleted", "Generator"."isActive", "Generator"."name", "Generator"."subjectQuery", "Generator"."subjects", 
		"Generator"."aspects", "Generator"."tags", "Generator"."generatorTemplate", "Generator"."connection", "Generator"."context", 
		"Generator"."currentCollector", "Generator"."createdAt", "Generator"."updatedAt", "Generator"."deletedAt", "Generator"."createdBy" 
		FROM "Generators" AS "Generator" 
		WHERE ("Generator"."deletedAt" > '2018-07-06 13:34:11.276 +00:00' OR "Generator"."deletedAt" IS NULL) 
		ORDER BY "Generator"."name" LIMIT 10000 OFFSET 0
	) AS "Generator" 
	LEFT OUTER JOIN "Users" AS "user" 
		ON "Generator"."createdBy" = "user"."id" 
		AND ("user"."deletedAt" > '2018-07-06 13:34:11.276 +00:00' OR "user"."deletedAt" IS NULL) 
	LEFT OUTER JOIN "Profiles" AS "user->profile" 
		ON "user"."profileId" = "user->profile"."id" 
		AND ("user->profile"."deletedAt" > '2018-07-06 13:34:11.276 +00:00' OR "user->profile"."deletedAt" IS NULL)
	LEFT OUTER JOIN (
		"GeneratorCollectors" AS "collectors->GeneratorCollectors" 
		INNER JOIN "Collectors" AS "collectors" 
		ON "collectors"."id" = "collectors->GeneratorCollectors"."collectorId"
	) 
	ON "Generator"."id" = "collectors->GeneratorCollectors"."generatorId" 
	AND ("collectors"."deletedAt" > '2018-07-06 13:34:11.276 +00:00' OR "collectors"."deletedAt" IS NULL) 
 ORDER BY "Generator"."name";

Sources:
sequelize/sequelize#3007
sequelize/sequelize#6073
https://www.thelacunablog.com/5-things-learned-sequelize.html
sequelize/sequelize#222

… to return all attributes when there is limit/offset
@jgraff2
Copy link
Contributor

jgraff2 commented Jul 6, 2018

Awesome! This is much cleaner than adding and removing the keys!
Can you add a comment explaining why we are using the "duplicating" option?
Are there any side effects of doing this that could cause other problems?

Copy link
Contributor

@iamigo iamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jon - pls. add comment here describing why we're using this flag, etc.

@cassiodias
Copy link
Contributor Author

@jgraff2 @iamigo

It's hard to say with 100% of sure that there is no side effect.

I have debugged Sequelize last Friday and today trying to make sure that there is no side effect - all the tests are working sucessfully and when we run multiple associations with LIMIT (the corner case) the generated query is the same (same joins) but without subQuery (check below).

The flag duplicating avoids cartesian product and it's true by default.

Sequelize code that does that:

     // sequelize/model.js
     if (options.topModel === options.model && options.subQuery === undefined && options.topLimit) {
        if (include.subQuery) {
          options.subQuery = include.subQuery;
        } else if (include.hasDuplicating) {
          options.subQuery = true; // Its getting here because has LIMIT, there is no sub-query and has duplicating.
        }
      }

The combination of LIMIT + duplicating = true generates the sub-query.

Query without sub-query (duplicating:false):

SELECT 
"Generator"."name", 
"Generator"."id", 
"collectors"."id" AS "collectors.id", 
"collectors"."name" AS "collectors.name", 
"collectors"."registered" AS "collectors.registered",
"collectors"."status" AS "collectors.status", 
"collectors"."lastHeartbeat" AS "collectors.lastHeartbeat", 
"collectors"."isDeleted" AS "collectors.isDeleted", 
"collectors"."createdAt" AS "collectors.createdAt", 
"collectors"."updatedAt" AS "collectors.updatedAt", 
"collectors->GeneratorCollectors"."createdAt" AS "collectors.GeneratorCollectors.createdAt",
"collectors->GeneratorCollectors"."updatedAt" AS "collectors.GeneratorCollectors.updatedAt",
"collectors->GeneratorCollectors"."collectorId" AS "collectors.GeneratorCollectors.collectorId",
"collectors->GeneratorCollectors"."generatorId" AS "collectors.GeneratorCollectors.generatorId",
"user"."id" AS "user.id", 
"user"."name" AS "user.name",
"user"."email" AS "user.email",
"user"."fullName" AS "user.fullName",
"user->profile"."id" AS "user.profile.id",
"user->profile"."name" AS "user.profile.name" 
FROM "Generators" AS "Generator" 
LEFT OUTER JOIN ( 
	"GeneratorCollectors" AS "collectors->GeneratorCollectors" 
	INNER JOIN "Collectors" AS "collectors" ON "collectors"."id" = "collectors->GeneratorCollectors"."collectorId"
	) ON "Generator"."id" = "collectors->GeneratorCollectors"."generatorId" 
AND ("collectors"."deletedAt" > '2018-07-09 14:39:18.034 +00:00' OR "collectors"."deletedAt" IS NULL) 
LEFT OUTER JOIN "Users" AS "user" ON "Generator"."createdBy" = "user"."id" 
AND ("user"."deletedAt" > '2018-07-09 14:39:18.034 +00:00' OR "user"."deletedAt" IS NULL) 
LEFT OUTER JOIN "Profiles" AS "user->profile" ON "user"."profileId" = "user->profile"."id" 
AND ("user->profile"."deletedAt" > '2018-07-09 14:39:18.034 +00:00' OR "user->profile"."deletedAt" IS NULL) 
WHERE ("Generator"."deletedAt" > '2018-07-09 14:39:18.034 +00:00' OR "Generator"."deletedAt" IS NULL) 
ORDER BY "Generator"."name" LIMIT 10000 OFFSET 0

Query with sub-query

SELECT "Generator".*, 
 "collectors"."id" AS "collectors.id",
 "collectors"."name" AS "collectors.name", 
 "collectors"."registered" AS "collectors.registered",
 "collectors"."status" AS "collectors.status", 
 "collectors"."lastHeartbeat" AS "collectors.lastHeartbeat",
 "collectors"."isDeleted" AS "collectors.isDeleted", 
 "collectors"."createdAt" AS "collectors.createdAt",
 "collectors"."updatedAt" AS "collectors.updatedAt", 
 "collectors->GeneratorCollectors"."createdAt" AS "collectors.GeneratorCollectors.createdAt", 
 "collectors->GeneratorCollectors"."updatedAt" AS "collectors.GeneratorCollectors.updatedAt", 
 "collectors->GeneratorCollectors"."collectorId" AS "collectors.GeneratorCollectors.collectorId", 
 "collectors->GeneratorCollectors"."generatorId" AS "collectors.GeneratorCollectors.generatorId", "user"."id" AS "user.id",
 "user"."name" AS "user.name", "user"."email" AS "user.email", "user"."fullName" AS "user.fullName", 
 "user->profile"."id" AS "user.profile.id", "user->profile"."name" AS "user.profile.name" 
 FROM (
	 SELECT "Generator"."name", "Generator"."id" 
	 FROM "Generators" AS "Generator" 
	 WHERE ("Generator"."deletedAt" > '2018-07-09 14:41:14.274 +00:00' OR "Generator"."deletedAt" IS NULL) 
	 ORDER BY "Generator"."name" LIMIT 10000 OFFSET 0
	 ) AS "Generator" 
LEFT OUTER JOIN ( 
	"GeneratorCollectors" AS "collectors->GeneratorCollectors" 
	INNER JOIN "Collectors" AS "collectors" ON "collectors"."id" = "collectors->GeneratorCollectors"."collectorId"
	) ON "Generator"."id" = "collectors->GeneratorCollectors"."generatorId" 
AND ("collectors"."deletedAt" > '2018-07-09 14:41:14.274 +00:00' OR "collectors"."deletedAt" IS NULL) 
LEFT OUTER JOIN "Users" AS "user" ON **"Generator"."createdBy"** = "user"."id" 
AND ("user"."deletedAt" > '2018-07-09 14:41:14.274 +00:00' OR "user"."deletedAt" IS NULL) 
LEFT OUTER JOIN "Profiles" AS "user->profile" ON "user"."profileId" = "user->profile"."id" 
AND ("user->profile"."deletedAt" > '2018-07-09 14:41:14.274 +00:00' OR "user->profile"."deletedAt" IS NULL) 
ORDER BY "Generator"."name"

@coveralls
Copy link

coveralls commented Jul 9, 2018

Coverage Status

Coverage decreased (-0.03%) to 89.002% when pulling 35e4086 on sequelize_generator_multiple_association into 22a0ba5 on master.

@jgraff2 jgraff2 merged commit 604cfaa into master Jul 9, 2018
@jgraff2 jgraff2 deleted the sequelize_generator_multiple_association branch July 9, 2018 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants