Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Search User API #944

Closed
2 of 3 tasks
mrkvon opened this issue Dec 9, 2018 · 12 comments
Closed
2 of 3 tasks

Improve Search User API #944

mrkvon opened this issue Dec 9, 2018 · 12 comments

Comments

@mrkvon
Copy link
Contributor

mrkvon commented Dec 9, 2018

Follow up of #531.

Improve /api/users?search=string

  • pagination
  • expand the user profile fields returned, or allow user to specify which fields to return
  • improve search via usernames; figure out how dots, dashes etc. are treated by text index
    it is working fine for periods, hyphen and underscores. If you put period or hyphen into the search it works the same like a space
@mrkvon
Copy link
Contributor Author

mrkvon commented Dec 9, 2018

ping @Akronix

Feel free to update the issue with further requirements.

@petr-hajek petr-hajek changed the title Improve /api/users?search=string Improve Search User API Dec 12, 2018
@Akronix Akronix mentioned this issue Dec 13, 2018
9 tasks
@Akronix
Copy link
Contributor

Akronix commented Dec 31, 2018

How about fuzzy search?
Searching for "Abe" should return all instances of Abel, one of them myself: Abel Por Qué No.
And better, searching for "Avel" (similar names) should find "Abel" as well...

@simison
Copy link
Contributor

simison commented Jan 2, 2019

Searching for "Abe" should return all instances of Abel

So this was possible with MongoDB regexp queries but the issue was that they weren't relying on indices and thus potentially very slow — is that correct @mrkvon ?

I wonder if we could do 2nd expensive query with regexp if first one doesn't yield results?

And better, searching for "Avel" (similar names) should find "Abel" as well...

I agree that's how it should work ideally. Great if it's easy possible and performant enough with MongoDB Indices but I suspect it won't be either. :-( Investing time and maintenance energy in something like ElasticSearch at this point probably isn't worth the effort on the other hand?

@guaka
Copy link
Contributor

guaka commented Jan 2, 2019 via email

@simison
Copy link
Contributor

simison commented Jan 2, 2019

Yep, that's a valid point. If someone whips up a PR I'm happy to help to measure with production data (~34K users).

@guaka
Copy link
Contributor

guaka commented Jan 2, 2019 via email

@mrkvon
Copy link
Contributor Author

mrkvon commented Jan 2, 2019

potentially very slow

I believe the main issue is being a potential vector of DoS attack. Not the slowness on its own.
So, can we limit amount of requests from one user per second for a specific API endpoint?

Also, unlike a search using indices, this will get worse as the amount of members grows.

@simison
Copy link
Contributor

simison commented Jan 2, 2019

I believe the main issue is being a potential vector of DoS attack. Not the slowness on its own.
So, can we limit amount of requests from one user per second for a specific API endpoint?

Yes, we already do that for the whole /api/* but adding a separate more strict "bucket" for specific API endpoints is very straightforward.

It's configured in Nginx.

@mrkvon
Copy link
Contributor Author

mrkvon commented Jan 2, 2019

Then go on with the regex. Just make a restrictive rate limit and be aware that it's a temporary solution which is not scalable as the amount of members grows.

https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use

Some case-sensitive search (i.e. ?search=abel doesn't find Abel) can use indices.
Eventually, we could create lowercase versions of searched fields to have (pseudo)case-insensitive search with index.

Or we can migrate to a database which supports more option. 😅 👎

@Akronix
Copy link
Contributor

Akronix commented Jan 3, 2019

Good discussion.

I'd go ahead with regexp.
Then, when scaling become an actual issue, we can consider moving to elasticsearch.

@simison
Copy link
Contributor

simison commented Jan 3, 2019

Yep and yep. 👍

We can deploy it as such to prod and follow up with regexp version but it's not urgent; we have something that already works.

@github-actions
Copy link

This issue is marked as unloved because it has not had any activity for 180 days.

It doesn't mean it's not important, so please remove the unloved label if you like it, or add a comment saying what it means to you :) If this was a bug, maybe you can test to see if this is still an issue?

However, if you just leave it like this, I'll close it in 14 days to help keep your issues tidy!

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants