-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize search queries for saved entities #2007
Conversation
we missed the followed user bug because a lack of unit tests. i could add some more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks awesome!
109cb32
to
2e59c4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a fantastic change!! Just a few minor comments.
I did have one question regarding having saved_tracks as a subset of the track results - I think we discussed this over slack, but it seems like this could result in a scenario where we have a weak match in a saved track, which in today's live implementation would be returned separately in the saved tracks field, but in this proposed change wouldn't score highly enough to make it into our results at all.
I suspect we don't care, but this is a small product change so it may be worth coming up with an example or two to see if this issue manifests, and if so, running it by Forrest over slack to see that we're fine with the change. WDYT @dmanjunath?
search_str, | ||
limit, | ||
offset, | ||
True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we never set personalized
to True anymore (afaict), do we still need this argument present in all of the query functions or could we lose it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll remove this.
@@ -401,33 +358,34 @@ def submit_and_add(search_type): | |||
|
|||
if searchKind in [SearchKind.all, SearchKind.tracks]: | |||
submit_and_add("tracks") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it really matters, but we can lower the max_workers in in the threadpool now, since we only ever perform 4 requests in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh that's why it was set at 8. Yup I'll change it.
f70dd4e
to
975db1a
Compare
4deb491
to
6d6f801
Compare
f450fc9
to
94e7361
Compare
94e7361
to
d38ca94
Compare
603f3e9
to
da0d467
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me but i don't feel like i have enough context to appropriately review this PR. will defer to piazza and joe
also can we get this on one of the sandbox nodes so we can test with it?
).fetchall() | ||
|
||
# track_ids is list of tuples - simplify to 1-D list | ||
track_ids = [i[0] for i in track_data] | ||
saved_tracks = set([i[0] for i in track_data if i[3]]) # if track has user ID, the current user saved that track |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should pull out these indices into named variables so it's clear what they are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will do
Yup it's already on sandbox 2. I'm keeping that updated for testing. |
Description
This change reduces the db calls needed for searches when a user is logged in: for both full searches and autocomplete. Previously, each saved entity such as saved tracks, playlists, albums, or followed users would require an additional db call which could be avoided to optimize performance by around 1.5-2x.
It also fixes a bug with the search results for followed users where followed users was equivalent to users (nothing was filtered).
With this change, saved_tracks is a subset of tracks. A saved track might exist in the saved_tracks result but not in the all tracks result. We're essentially filtering the top X tracks for any tracks that user has saved. The same applies for other entities: playlists, followed users, albums.
In order to avoid saved tracks from being buried under the top X tracks limit, this change adds a similarity score boost for any entity the current user has saved.
Tests
Added a bunch of new tests for autocomplete, internal/external search, different entities.
Ran against a prod snapshot. Compared search results to prod. Ran locust load testing and came up with 1.5x improvement in latency. Tested latency of autocomplete and full searches. Notes updated in Optimize search queries doc.
How will this change be monitored?