New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index lesson_completions on student_id #888

Merged
merged 1 commit into from Apr 13, 2018

Conversation

Projects
None yet
2 participants
@gitKrystan
Copy link
Contributor

gitKrystan commented Apr 13, 2018

In exploring performance data on Odin Project's Skylight page, I noticed that this query was not optimized:

SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = ?

While the query itself isn't currently super slow, we noticed that it occurs on every endpoint (You can see it on the Skylight event sequence as "SELECT FROM lesson_completions"), and we noted that as the lesson_completions table grows the query might start to have a bigger impact on performance.

Despite the fact that the compound index does provide some benefit if it is only partially used, Postgres documentation suggests single-column indexes are generally faster (and the query planner is very effective at combining indexes, so they recommend using single-column indexes by default).

We populated the lesson_completions table with 250,000 rows and ran benchmarks on UsersController#show and CoursesController#index using Skylight in development mode and explain.

Before:

image

# User.find(200).lesson_completions.explain

D, [2018-04-13T11:04:30.082675 #63023] DEBUG -- :   LessonCompletion Load (31.9ms)  SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1  [["student_id", 200]]
 => EXPLAIN for: SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1 [["student_id", 200]]
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on lesson_completions  (cost=0.00..4749.30 rows=51 width=28)
   Filter: (student_id = 200)
(2 rows)

Note that Postgres appears to be doing a sequential scan on lesson_completions here.

After:

image

# User.find(200).lesson_completions.explain

 D, [2018-04-13T11:06:32.344059 #63023] DEBUG -- :   LessonCompletion Load (0.7ms)  SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1  [["student_id", 200]]

 => EXPLAIN for: SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1 [["student_id", 200]]
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using index_lesson_completions_on_student_id on lesson_completions  (cost=0.42..9.29 rows=50 width=28)
   Index Cond: (student_id = 200)
(2 rows)

And here we're doing an index scan using the new index.

Notes

We also looked at the lessons table, which looks like a significant source of slow queries on production. Because of the small size of this table, Postgres's query analyzer tends to think that seq scans will be fast than using indexes, so in our tests, altering the indexes on that table had no effect (but we also weren't experiencing the same long query times that are evident in production, so perhaps that database is configured somewhat differently).

CAVEAT: I work at Tilde on Skylight

paired with @zvkemp

@arku arku requested a review from KevinMulhern Apr 13, 2018

@KevinMulhern

This comment has been minimized.

Copy link
Member

KevinMulhern commented Apr 13, 2018

@gitKrystan thats an amazing performance upgrade 😮thank you to you and @zvkemp for this brilliant PR. And thanks for providing all details about the analysis you both carried out with this. It was very informative.

The lesson table is something I've looked at a few times as well, but haven't found a solution to it as of yet. I thought it might have been the amount of requests the course and lesson pages get in comparison to other pages on the site. I was thinking that caching might be the right approach to solve it.

Thanks again to you and @zvkemp for this PR

👍 LGTM

@KevinMulhern KevinMulhern merged commit 18ee784 into TheOdinProject:master Apr 13, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gitKrystan

This comment has been minimized.

Copy link
Contributor

gitKrystan commented Apr 13, 2018

Glad you liked it @KevinMulhern. We were pretty impressed with what y'all have done thus far with respect to performance and found it difficult to find something to improve. Good work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment