Index lesson_completions on student_id #888
In exploring performance data on Odin Project's Skylight page, I noticed that this query was not optimized:
SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = ?
While the query itself isn't currently very slow, it occurs on every endpoint (you can see it in the Skylight event sequence as "SELECT FROM lesson_completions"), and as the lesson_completions table grows it could start to have a bigger impact on performance.
Although the compound index does provide some benefit when it is only partially used, the Postgres documentation suggests that single-column indexes are generally faster. The query planner is also very effective at combining indexes, so the documentation recommends using single-column indexes by default.
We populated the
# User.find(200).lesson_completions.explain
D, [2018-04-13T11:04:30.082675 #63023] DEBUG -- : LessonCompletion Load (31.9ms) SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1 [["student_id", 200]]
=> EXPLAIN for: SELECT "lesson_completions".* FROM "lesson_completions" WHERE "lesson_completions"."student_id" = $1 [["student_id", 200]]
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on lesson_completions  (cost=0.00..4749.30 rows=51 width=28)
   Filter: (student_id = 200)
(2 rows)
Note that Postgres appears to be doing a sequential scan on lesson_completions here.
And here we're doing an index scan using the new index.
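For anyone less familiar with the difference between the two plans, here is a rough toy sketch in plain Ruby. It is not how Postgres works internally: the `Row` struct, the generated data, and the `seq_scan` / `index_scan` helpers are all made up for illustration, and a Hash stands in for the B-tree that Postgres builds by default. It only shows why a filter with no usable index has to look at every row, while an index on student_id can go straight to the matching rows.

```ruby
# Toy model only: plain Ruby arrays and hashes, not Postgres internals.
Row = Struct.new(:id, :student_id, :lesson_id)

# Hypothetical data: 1,000 students with 5 completions each.
rows = []
(1..1000).each do |student_id|
  (1..5).each do |lesson_id|
    rows << Row.new(rows.size + 1, student_id, lesson_id)
  end
end

# "Seq Scan": test every row in the table against the filter.
def seq_scan(rows, student_id)
  examined = 0
  matches = rows.select do |row|
    examined += 1
    row.student_id == student_id
  end
  [matches, examined]
end

# "Index": built once, maps each student_id to the rows that carry it.
# A Hash stands in for the B-tree here (an assumption of this sketch).
index = rows.group_by(&:student_id)

# "Index Scan": look up the key and touch only the matching rows.
def index_scan(index, student_id)
  matches = index.fetch(student_id, [])
  [matches, matches.size]
end

seq_matches, seq_examined = seq_scan(rows, 200)
idx_matches, idx_examined = index_scan(index, 200)

puts "seq scan:   #{seq_matches.size} matches, #{seq_examined} rows examined"
puts "index scan: #{idx_matches.size} matches, #{idx_examined} rows examined"
```

Both paths return the same rows; the only difference in this sketch is how many rows each one has to examine to find them. The index also has a cost the sketch glosses over: it has to be built and then kept up to date on every write.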
We also looked at the lessons table, which looks like a significant source of slow queries on production. Because the table is so small, Postgres's query planner tends to think that seq scans will be faster than using indexes, so in our tests, altering the indexes on that table had no effect. However, we also weren't experiencing the same long query times that are evident in production, so perhaps that database is configured somewhat differently.
CAVEAT: I work at Tilde on Skylight
paired with @zvkemp
@gitKrystan that's an amazing performance upgrade
The lessons table is something I've looked at a few times as well, but I haven't found a solution for it yet. I thought it might be down to the number of requests the course and lesson pages get compared to other pages on the site. I was thinking that caching might be the right approach to solve it.
Thanks again to you and @zvkemp for this PR