-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ranking #941
Comments
Technically I'm not sure how this could be done, but we could indeed have a certain download number threshold before adding alternative names maybe? That way at least those three examples would not show up anymore, it bothered me too. The one that will be harder to avoid is |
just ordering by download should help no? |
The problem is that custom ranking comes after word relevance, unless you mean resorting in the frontend @bodinsamuel |
Downloads are already used as part of the ranking but as I understand they have lower priority than the match on project name, so they don't help here. Something I'm not 100% sure about though - I would expect that "ignored words" apply only to whole words, not parts of them so in "vuejs" the "js" suffix would not be ignored. But based on the search results, it either is ignored even when it's a suffix (in which case my proposed solution would help) or it's some other algolia feature that makes this rank higher (in which case the threshold idea may not help). |
The js is not an ignored word, but it's listed as an alternativeName, which is listed after the official name in searchableAttributes |
no frontend reordering :p I thought putting popular sooner in the ranking would help but it's the rule that promote those package indeed. we could remove the rule for promoting alternativeName? (I'm not entirely when it's relevant) |
Oh I see, I previously looked only at the index config but there's a whole bunch of logic that adds alternative names in the code. Then my previous suggestion stands - do this only if the package seems relevant based on other metadata. |
deprecated should already show up lower according to ranking, but if you spell it correctly, it should still be the first result, as sometimes you would be looking for that. I think we can add a certain popularity threshold before packages get alternative names. Will look into a PR |
I agree you should get deprecated packages when spelled correctly but not sure about alternative names - there it's a similar problem that if the name is similar enough, deprecation has too low priority to push the result down. My idea was to use multiple attributes for this because downloads may not always be enough, e.g. add it only if all:
If you didn't want to use the other rules then I suppose a higher popularity threshold. |
#951 should actually fullfill most of those use cases, at the cost of not showing popular alternative names that match in a first or second position. I think those cases are fairly small though, only one I could think of is places.js, hogan.js or floating.js (small one) but they all still eventually show up |
Any update on the reindex progress? Are we able to test the new attributes now? |
I don't think we started a full reindex, as it stops the synchronisation, and would take a very long time to catch up, while we don't yet have code for handling multiple instances. |
yup sorry with all the holidays I did not had time to look into it. I'll try to setup that today or tomorrow, full indexing takes more than a week. |
@bodinsamuel did you have time to set up the other instance? |
Yes it's currently running |
NB: you can test what you want using the tmp index named |
The index doesn't use the newly added field yet because we wanted to test it first after all packages are updated (0e5194d) without affecting the main index. Considering the bootstrap runs in a separate index anyway, what would be the best course of action? I suppose we could update the config right away? |
ah true, forgot about that. Since we did not merge, all manual changes will be erased after some time by the script so not good. We could merge the PR in master and do not deploy the main process |
I have swapped the index since it had fully reindexed. |
Awesome, it works really well! I tested all packages from my initial post here as well as those from comments in #951 and all of them return relevant results now. |
@bodinsamuel when you have time please check and swap the indexes if possible 😄 |
@bodinsamuel @Haroenv ping, can we get this live, please? 😄 |
I'll merge 👌🏻 |
Also not sure if the indexes were swapped yet. |
Applied on the main index 👍🏻 |
Sorry for the delay, it's hard to allocate time internally 🙇🏻 Thanks a lot for your patience |
After including package downloads within the search results, we noticed a very common problem with relevance. I believe it's caused by ignoring "js" and probably typo tolerance, and I wonder if there's a way to improve this. A few examples:
https://www-jsdelivr-com-pr-400.onrender.com/?query=vue
First match "vue" (correct), then "vue.js" (garbage), then "vuejs" (garbage, even marked as deprecated), then relevant vue-related packages.
https://www-jsdelivr-com-pr-400.onrender.com/?query=react
First match "react" (correct), then "reactjs" (garbage), then re-act (garbage), then relevant react-related packages.
https://www-jsdelivr-com-pr-400.onrender.com/?query=bootstrap
First match "bootstrap" (correct), then "bootstrapjs" (garbage).
I know that in some cases, it's the other way around and "package.js" is the correct popular name while "package" is garbage, so this isn't easy, but at the same time, npm has very often multiple packages with similar names, so this happens for the majority of searches. To get correct results, we'd need to allow the "relaxed" matching (js ignore and typos) only if a popular match isn't found.
I'm not sure how doable this is but one idea I got so far - maybe instead of ignoring "js" globally, only set the other version as alternate name for packages that seem relevant (downloads above some threshold, not deprecated, have readme, ...). Non-relevant packages wouldn't get this relaxed matching.
The text was updated successfully, but these errors were encountered: