Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize search queries for saved entities #2007

Merged
merged 15 commits into from
Nov 11, 2021
Merged

Conversation

isaacsolo
Copy link
Contributor

@isaacsolo isaacsolo commented Oct 27, 2021

Description

This change reduces the db calls needed for searches when a user is logged in: for both full searches and autocomplete. Previously, each saved entity such as saved tracks, playlists, albums, or followed users would require an additional db call which could be avoided to optimize performance by around 1.5-2x.

It also fixes a bug with the search results for followed users where followed users was equivalent to users (nothing was filtered).

With this change, saved_tracks is a subset of tracks. A saved track might exist in the saved_tracks result but not in the all tracks result. We're essentially filtering the top X tracks for any tracks that user has saved. The same applies for other entities: playlists, followed users, albums.

In order to avoid saved tracks from being buried under the top X tracks limit, this change adds a similarity score boost for any entity the current user has saved.

Tests

Added a bunch of new tests for autocomplete, internal/external search, different entities.

Ran against a prod snapshot. Compared search results to prod. Ran locust load testing and came up with 1.5x improvement in latency. Tested latency of autocomplete and full searches. Notes updated in Optimize search queries doc.

How will this change be monitored?

@isaacsolo isaacsolo marked this pull request as ready for review October 27, 2021 15:28
@isaacsolo
Copy link
Contributor Author

isaacsolo commented Oct 27, 2021

we missed the followed user bug because a lack of unit tests. i could add some more.

Copy link
Contributor

@dmanjunath dmanjunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks awesome!

Copy link
Contributor

@piazzatron piazzatron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a fantastic change!! Just a few minor comments.

I did have one question regarding having saved_tracks as a subset of the track results - I think we discussed this over slack, but it seems like this could result in a scenario where we have a weak match in a saved track, which in today's live implementation would be returned separately in the saved tracks field, but in this proposed change wouldn't score highly enough to make it into our results at all.

I suspect we don't care, but this is a small product change so it may be worth coming up with an example or two to see if this issue manifests, and if so, running it by Forrest over slack to see that we're fine with the change. WDYT @dmanjunath?

search_str,
limit,
offset,
True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we never set personalized to True anymore (afaict), do we still need this argument present in all of the query functions or could we lose it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll remove this.

@@ -401,33 +358,34 @@ def submit_and_add(search_type):

if searchKind in [SearchKind.all, SearchKind.tracks]:
submit_and_add("tracks")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it really matters, but we can lower the max_workers in in the threadpool now, since we only ever perform 4 requests in parallel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh that's why it was set at 8. Yup I'll change it.

@isaacsolo isaacsolo force-pushed the is-improve-search-saved branch 2 times, most recently from 4deb491 to 6d6f801 Compare November 2, 2021 20:52
@isaacsolo isaacsolo force-pushed the is-improve-search-saved branch 2 times, most recently from f450fc9 to 94e7361 Compare November 2, 2021 22:48
Copy link
Contributor

@dmanjunath dmanjunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me but i don't feel like i have enough context to appropriately review this PR. will defer to piazza and joe

also can we get this on one of the sandbox nodes so we can test with it?

).fetchall()

# track_ids is list of tuples - simplify to 1-D list
track_ids = [i[0] for i in track_data]
saved_tracks = set([i[0] for i in track_data if i[3]]) # if track has user ID, the current user saved that track
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should pull out these indices into named variables so it's clear what they are

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do

@isaacsolo
Copy link
Contributor Author

this looks good to me but i don't feel like i have enough context to appropriately review this PR. will defer to piazza and joe

also can we get this on one of the sandbox nodes so we can test with it?

Yup it's already on sandbox 2. I'm keeping that updated for testing.

@isaacsolo isaacsolo requested a review from jowlee November 4, 2021 18:32
@isaacsolo isaacsolo merged commit 7989bbc into master Nov 11, 2021
@isaacsolo isaacsolo deleted the is-improve-search-saved branch November 11, 2021 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants