Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search on es #3093

Merged
merged 46 commits into from
May 27, 2022
Merged

Search on es #3093

merged 46 commits into from
May 27, 2022

Conversation

stereosteve
Copy link
Contributor

@stereosteve stereosteve commented May 13, 2022

Description

  • full/search/full on ES
  • test against frontend
  • relevance tuning, share with ray/forrest
  • introduce new ENV variable to enable search: audius_elasticsearch_search_enabled
  • fallback to sql on error
  • autocomplete
  • other entity searches (tracks, etc.)
  • tests
  • script with assertions against production data

Completes PLAT-130

Tests

Using json-diff tool (npm i -g json-diff):

DISCPROV='http://x.x.x.x:5000'

time curl -s -H "X-User-ID: 2" "$DISCPROV/feed?limit=20" > es_feed.json
time curl -s "$DISCPROV/v1/full/search/full?limit=3&offset=0&query=isaac+dawn&user_id=aNzoj" > sql_user.json
time curl -s "$DISCPROV/v1/full/search/full?limit=3&offset=0&query=isaac+dawn" > sql_anon.json
time curl -s "$DISCPROV/v1/tracks/search?limit=15&query=isaac+dawn" > sql_anon.json
time curl -s "$DISCPROV/v1/playlists/search?limit=15&query=isaac+dawn" > sql_anon.json
time curl -s "$DISCPROV/v1/users/search?limit=15&query=isaac+dawn" > sql_anon.json
json-diff es_user.json sql_user.json | less

@stereosteve stereosteve marked this pull request as ready for review May 26, 2022 15:08
@stereosteve stereosteve requested review from jowlee and removed request for hareeshnagaraj and piazzatron May 26, 2022 17:28
discovery-provider/compose/docker-compose.backend.yml Outdated Show resolved Hide resolved
Comment on lines +49 to +50
logger.info('refreshing index: ' + indexName)
await es.indices.refresh({ index: indexName })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this take a while?
If already logging, might be nice to record the time it takes as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a few ms each index so probably ok without logging. it was added here since we extended the refresh_interval to 5s. an edge case would be when the indexer first catches up we want to refresh before making it available otherwise there could be no results for a few seconds. it's also necessary for test cases since we don't want to wait 5s.

discovery-provider/es-indexer/src/indexers/UserIndexer.ts Outdated Show resolved Hide resolved
discovery-provider/es-indexer/src/types/docs.ts Outdated Show resolved Hide resolved
Comment on lines +56 to +57
+ to_wei(user.get("associated_sol_wallets_balance", "0") or 0)
+ to_wei(user.get("waudio", "0") or 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to multiply the solana wormhole audio by 10^10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks right. @sddioulde can you confirm for us?

Copy link
Member

@raymondjacobson raymondjacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so great. I don't want to be critical review path here, but it looks great to me.

@@ -85,19 +117,19 @@ export class TrackIndexer extends BaseIndexer<TrackDoc> {
-- etl tracks
select
tracks.*,
(tracks.download->>'is_downloadable')::boolean as is_downloadable,
(tracks.download->>'is_downloadable')::boolean as downloadable,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change? just curious

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the naming we use in the response currently

"downloadable": false,
"download": {
"cid": null,
"is_downloadable": false,
"requires_follow": false
},

discovery-provider/scripts/search_quality.py Show resolved Hide resolved
discovery-provider/scripts/search_quality.py Show resolved Hide resolved

# users: finalize
for k in ["users", "followed_users"]:
users = drop_copycats(response[k])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just so i understand, we're well-protected here b/c of overfetch yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's a good catch. I will add some over-fetching to user limit

"fields": [
"suggest",
"suggest._2gram",
"suggest._3gram",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious does 2gram give us a lot in addition to having trigrams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused about how search_as_you_type worked at first.

By default it uses the default tokenizer (words) and then builds ngrams with those word tokens. In the past I've always used ngrams to describe letter couplets and shingles to describe word couplets... but in current config I think these ngrams fields are working with word tokens

So for example with you took that photo:

  • 2grams: you took took that that photo
  • 3grams: you took that took that photo

This still helps with phrase matching because the default suggest will match if any of the words appear in any order... but additional matches on 2grams and 3grams will happen if words appear in same order in the query and document, so more relevant results will rank higher. At least that's my current understanding... the search_as_you_type docs are a bit sparse haha.

We were considering if search_as_you_type was giving us much over just using a standard analyzer, but it seemed to be working well enough, so we kept using it. Since these fields are short (a handful of terms) the overhead of building these 2gram and 3gram subfields is negligible.

But might revisit in future mapping changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea that's how i understand it too and why i think track title artist vs artist track title ordering slightly matters.

Comment on lines +56 to +57
+ to_wei(user.get("associated_sol_wallets_balance", "0") or 0)
+ to_wei(user.get("waudio", "0") or 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks right. @sddioulde can you confirm for us?

@stereosteve stereosteve merged commit 8f9fe4c into master May 27, 2022
@stereosteve stereosteve deleted the search-on-es branch May 27, 2022 16:23
users = response["users"]
users = list(map(extend_user, users))
return success_response(users)
return success_response(response)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be like:

response = search(search_args)
return success_response(response['users'])

sliptype pushed a commit that referenced this pull request Sep 10, 2023
Co-authored-by: Saliou Diallo <saliou@audius.co>
sliptype pushed a commit that referenced this pull request Sep 10, 2023
[e804aa1] [C-2248, C-2373] Use playlistUpdates, remove legacyNotifications (#3094) Dylan Jeffers
[824933e] [C-2366] Improve web notification selection performance (#3103) Dylan Jeffers
[4b8edef] [PLAT-696] Add trending-playlists/underground notifications (#3089) Dylan Jeffers
[1f9cf3e] [C-2275] Fix android drawer offsets (#3095) Dylan Jeffers
[fc14c82] [PAY-1063][PAY-1085][PAY-1086] Update UI for inaccessible gated tracks from favorites and history pages (#3100) Saliou Diallo
[b0441f5] [C-2365] Update play buttons on web and mobile to show resume when track is current (#3101) Kyle Shanks
[453910f] [C-2378] Add upload v2 feature flag (#3099) Sebastian Klingler
[962a6df] [C-2337] Remove reachability mobile web (#3090) Raymond Jacobson
[4ad5cd2] Fix visible collectibles for upload popup (#3093) Saliou Diallo
[c143078] Fix feature flag bug (#3092) Saliou Diallo
[44435b5] Fix upload prompt modal learn more url (#3091) Saliou Diallo
[c9024ad] Use chat.messagesStatus instead of selector (#3087) Reed
[38d43c4] [C-2369] Fix issue where notification poll can break app on signout (#3088) Dylan Jeffers
[90122d9] [PAY-923] DMs: Add desktop entrypoints (#3083) Marcus Pasell
[00f27e8] [PAY-907] Mobile chat reactions (#3020) Reed
[4678b89] DMs: Fix broken typecheck on main (#3086) Marcus Pasell
[756ade4] [PAY-1000][PAY-1084][PAY-1096][PAY-1097][PAY-1098] - More gated content fixes (#3085) Saliou Diallo
[820aa9d] Fix upload and repost probers tests and lint (#3076) Sebastian Klingler
[345607e] [C-2320] Fix profile socials alignment (#3079) Dylan Jeffers
[569199c] Fix prod build timeout (#3084) Sebastian Klingler
[12f6c22] Remove ports for local dev (#3082) Theo Ilie
[1940618] Fix broken Main build due to typeerror (#3080) Marcus Pasell
[eb8d47e] [PAY-1082] DMs: Dedupe sent messages (#3066) Marcus Pasell
[50a11c3] Update SDK to 2.0.3-beta.0 (#3078) Marcus Pasell
[c420fbb] Clean up NPM package lock (#3077) Marcus Pasell
[35d1124] [C-2327] Add playlist updates slice (#3063) Dylan Jeffers
[59862ad] [C-2344] Update the web playbar scrubber to respect the playback speed of podcasts (#3075) Kyle Shanks
[ffeb0d3] [C-2349] Default download on wifi only to false (#3074) Andrew Mendelsohn
[cafae41] [C-2325] Fix playlist table date-added column (#3073) Dylan Jeffers
[384a510] [PAY-927] DMs: Empty messages state (#3068) Marcus Pasell
[1132f83] Update @jup-ag/core to 2.0.0-beta.9 (#3072) Marcus Pasell
[49c0ebf] [PAY-1072] Change "Download App" icon on Settings Page (#3067) Marcus Pasell
[928dcaf] [PAY-1056] - More gated content updates and fixes (#3070) Saliou Diallo
[1e1f769] [C-2345] Move PlaybackRate drawer to common drawers map (#3071) Kyle Shanks
[f5d1251] Fix web-dist CI steps (#3069) Sebastian Klingler
[5f89800] Fix heavy rotation playlist on client (#3056) sabrina-kiam
[c0191e2] [C-2316] Add remote config for all oauth verification (#3052) Raymond Jacobson
[40f5627] [PAY-1074][PAY-1075][PAY-1076][PAY-1080] - Update availability settings states + more QA fixes (#3059) Saliou Diallo
[5be60ac] [C-2339] Update podcast control updates to also work for audiobooks (#3065) Kyle Shanks
[163ebf5] [C-2297] Add fallback flag to podcast feature (#3064) Sebastian Klingler
[f206391] [PAY-904] - Add gated content upload prompt (#3057) Saliou Diallo
[1afc4e5] [C-1344] Move probers to monorepo and make tests pass (#3061) Sebastian Klingler
[e198279] Remove random line (#3062) Saliou Diallo
[24a001b] Add playback position logic for mobile (#3051) Kyle Shanks
[d210124] [PAY-1070] Update TabSlider/SegmentedControl slider size on resize (#3044) Marcus Pasell
@AudiusProject AudiusProject deleted a comment from linear bot Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants