Optimize search queries for saved entities #2007

isaacsolo · 2021-10-27T15:03:21Z

Description

This change reduces the number of db calls needed for searches when a user is logged in, for both full searches and autocomplete. Previously, each saved entity type (saved tracks, playlists, albums, or followed users) required an additional db call. Avoiding those calls improves performance by around 1.5-2x.

It also fixes a bug in the search results for followed users, where the followed users result was identical to the users result (nothing was filtered).

With this change, saved_tracks is a subset of tracks. Previously, a saved track could appear in the saved_tracks result but not in the all-tracks result. We're now essentially filtering the top X tracks for any tracks that the user has saved. The same applies to the other entities: playlists, followed users, albums.
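As a rough illustration of the subset relationship (not the PR's actual code), the sketch below assumes each row returned by a single search query carries a hypothetical `saved` flag for the current user. The row shape and the `split_saved` helper are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def split_saved(rows: List[Dict]) -> Tuple[List[int], List[int]]:
    """Return (all track ids, saved track ids) from one query result.

    Each row is a hypothetical dict with "track_id" and "saved" keys.
    """
    track_ids = [row["track_id"] for row in rows]
    # Saved ids are filtered out of the same rows, so no second db call is needed
    saved_ids = [row["track_id"] for row in rows if row["saved"]]
    return track_ids, saved_ids


# Top X rows, already ordered by score (made-up data)
rows = [
    {"track_id": 1, "saved": False},
    {"track_id": 2, "saved": True},
    {"track_id": 3, "saved": False},
]
track_ids, saved_ids = split_saved(rows)
```

Because the saved ids come from the same rows, they can never contain a track that is missing from the full result.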

To keep saved tracks from being buried below the top X tracks limit, this change adds a similarity score boost for any entity the current user has saved.
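A minimal sketch of how such a boost could keep a saved track inside the limit. The boost value, the additive form, and the `rank_tracks` helper are all assumptions for illustration; how and where the PR applies its boost is not shown here.

```python
from typing import List, Tuple

SAVED_BOOST = 0.25  # hypothetical value, not taken from the PR


def rank_tracks(candidates: List[Tuple[int, float, bool]], limit: int) -> List[int]:
    """Rank (track_id, similarity, saved_by_current_user) tuples and keep the top `limit`."""
    boosted = [
        (track_id, similarity + (SAVED_BOOST if saved else 0.0))
        for track_id, similarity, saved in candidates
    ]
    # Highest boosted score first
    boosted.sort(key=lambda item: item[1], reverse=True)
    return [track_id for track_id, _ in boosted[:limit]]


# Track 3 is a weaker match but was saved by the current user
candidates = [(1, 0.9, False), (2, 0.8, False), (3, 0.7, True)]
top_two = rank_tracks(candidates, limit=2)
```

Without the boost, track 3 would fall outside a limit of 2. With it, track 3 ranks first and stays in the result.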

Tests

Added new tests covering autocomplete, internal/external search, and the different entities.

Ran against a prod snapshot and compared search results to prod. Ran locust load testing, which showed a 1.5x improvement in latency. Tested latency of both autocomplete and full searches. Notes are updated in the Optimize search queries doc.

How will this change be monitored?

isaacsolo · 2021-10-27T17:41:12Z

we missed the followed user bug because of a lack of unit tests. i could add some more.

dmanjunath

this looks awesome!

discovery-provider/src/queries/search_queries.py

piazzatron

This is a fantastic change!! Just a few minor comments.

I did have one question regarding having saved_tracks as a subset of the track results - I think we discussed this over slack, but it seems like this could result in a scenario where we have a weak match in a saved track, which in today's live implementation would be returned separately in the saved tracks field, but in this proposed change wouldn't score highly enough to make it into our results at all.

I suspect we don't care, but this is a small product change so it may be worth coming up with an example or two to see if this issue manifests, and if so, running it by Forrest over slack to see that we're fine with the change. WDYT @dmanjunath?

piazzatron · 2021-10-29T16:16:30Z

discovery-provider/src/queries/search_queries.py

-                search_str,
-                limit,
-                offset,
-                True,


Since we never set personalized to True anymore (afaict), do we still need this argument present in all of the query functions or could we lose it?

Yeah I'll remove this.

piazzatron · 2021-10-29T21:42:53Z

discovery-provider/src/queries/search_queries.py

@@ -401,33 +358,34 @@ def submit_and_add(search_type):

            if searchKind in [SearchKind.all, SearchKind.tracks]:
                submit_and_add("tracks")


Not that it really matters, but we can lower the max_workers in the threadpool now, since we only ever perform 4 requests in parallel.

Ooh that's why it was set at 8. Yup I'll change it.
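For context, a hedged sketch of sizing the pool to the number of parallel searches. The list of four search types and the `run_search` stand-in are assumptions for this example; only `submit_and_add("tracks")` and the count of 4 come from the thread above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Assumed set of four searches; the real code may name them differently
SEARCH_TYPES = ["tracks", "users", "playlists", "albums"]


def run_search(search_type: str) -> List[str]:
    # Stand-in for a real query function
    return [f"{search_type}-result"]


def search_all() -> Dict[str, List[str]]:
    futures = {}
    # One worker per search is enough, since at most four run in parallel
    with ThreadPoolExecutor(max_workers=len(SEARCH_TYPES)) as executor:
        for search_type in SEARCH_TYPES:
            futures[search_type] = executor.submit(run_search, search_type)
        return {name: future.result() for name, future in futures.items()}


results = search_all()
```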

dmanjunath

this looks good to me but i don't feel like i have enough context to appropriately review this PR. will defer to piazza and joe

also can we get this on one of the sandbox nodes so we can test with it?

dmanjunath · 2021-11-04T14:35:18Z

discovery-provider/src/queries/search_queries.py

    ).fetchall()

    # track_ids is list of tuples - simplify to 1-D list
    track_ids = [i[0] for i in track_data]
+    saved_tracks = set([i[0] for i in track_data if i[3]]) # if track has user ID, the current user saved that track


we should pull out these indices into named variables so it's clear what they are
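One way this suggestion could look, as a sketch only. The constant names and the sample rows are hypothetical; the positions 0 and 3 are taken from the diff line above, and the meaning of the other columns is not known here.

```python
# Hypothetical names for the tuple positions used in the diff above
TRACK_ID_COL = 0
SAVED_USER_ID_COL = 3

# Made-up rows: (track id, <other columns>, saved user id or None)
track_data = [
    (11, "a", 0.9, None),
    (12, "b", 0.8, 7),
]

# track_data is a list of tuples - simplify to 1-D list
track_ids = [row[TRACK_ID_COL] for row in track_data]
# A non-empty saved user id means the current user saved that track
saved_tracks = {row[TRACK_ID_COL] for row in track_data if row[SAVED_USER_ID_COL]}
```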

isaacsolo · 2021-11-04T15:58:23Z

this looks good to me but i don't feel like i have enough context to appropriately review this PR. will defer to piazza and joe

also can we get this on one of the sandbox nodes so we can test with it?

Yup it's already on sandbox 2. I'm keeping that updated for testing.

isaacsolo added 2 commits October 27, 2021 15:02

Optimize search queries for saved entities

79cb016

fix unit test

4b8ffdd

isaacsolo requested a review from dmanjunath October 27, 2021 15:27

isaacsolo marked this pull request as ready for review October 27, 2021 15:28

raymondjacobson assigned piazzatron Oct 27, 2021

fix unit test

5d31f2d

dmanjunath approved these changes Oct 27, 2021

isaacsolo added 2 commits October 27, 2021 20:30

clean up

bd141a0

Add unit tests

2e59c4e

isaacsolo force-pushed the is-improve-search-saved branch from 109cb32 to 2e59c4e Compare October 28, 2021 17:45

isaacsolo requested a review from piazzatron October 28, 2021 17:59

piazzatron approved these changes Oct 29, 2021

Clean up args

975db1a

isaacsolo force-pushed the is-improve-search-saved branch from f70dd4e to 975db1a Compare October 29, 2021 23:57

isaacsolo added 2 commits November 1, 2021 20:03

fix autocomplete metadata

803c5b2

wip

aec68c6

isaacsolo force-pushed the is-improve-search-saved branch 2 times, most recently from 4deb491 to 6d6f801 Compare November 2, 2021 20:52

Add boost for saved entities

27b961f

isaacsolo force-pushed the is-improve-search-saved branch 2 times, most recently from f450fc9 to 94e7361 Compare November 2, 2021 22:48

fix follower query

d38ca94

isaacsolo force-pushed the is-improve-search-saved branch from 94e7361 to d38ca94 Compare November 2, 2021 22:55

isaacsolo assigned raymondjacobson Nov 2, 2021

isaacsolo requested review from piazzatron and dmanjunath November 3, 2021 17:47

Rename user id to saved user id

da0d467

isaacsolo force-pushed the is-improve-search-saved branch from 603f3e9 to da0d467 Compare November 3, 2021 17:49

dmanjunath reviewed Nov 4, 2021

isaacsolo added 2 commits November 4, 2021 18:15

Add col names

c242e28

Merge branch 'master' into is-improve-search-saved

b8306ee

isaacsolo requested a review from jowlee November 4, 2021 18:32

jowlee approved these changes Nov 10, 2021

isaacsolo added 2 commits November 10, 2021 15:15

Merge branch 'master' into is-improve-search-saved

ac920c1

Merge branch 'master' into is-improve-search-saved

6f5d9d1

raymondjacobson approved these changes Nov 10, 2021

isaacsolo merged commit 7989bbc into master Nov 11, 2021

isaacsolo deleted the is-improve-search-saved branch November 11, 2021 00:31
