Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support $regex's x option #592

Open
1 task
AlekSi opened this issue May 11, 2022 · 7 comments
Open
1 task

Support $regex's x option #592

AlekSi opened this issue May 11, 2022 · 7 comments
Labels
area/fuzz Issues about fuzzing and smithing code/enhancement Some user-visible feature could work better not ready Issues that are not ready to be worked on; PRs that should skip CI

Comments

@AlekSi
Copy link
Member

AlekSi commented May 11, 2022

Cover with great tests and fuzzing.

DoD:

  • update user docs about $regex as this issue is mentioned there
@AlekSi AlekSi added the code/enhancement Some user-visible feature could work better label May 11, 2022
@noisersup
Copy link
Member

Hi, I was studying a little bit a codebase yesterday and I'm interested in this issue. Wouldn't you mind if I take that on me?

@AlekSi
Copy link
Member Author

AlekSi commented May 23, 2022

Sure. Please make sure that you add good tests for that. I will try to help with fuzzing as soon as possible :)

@noisersup
Copy link
Member

noisersup commented May 27, 2022

There's not too much about free-spacing mode on internet, as far as I found there are also small differences between engines, finally some of them doesn't implement it at all...

My approach is to use every regex input with "x" flag as an argument in a function which just translates multi-line, free-spaced, commented regex into a string understandable by Go regexp package.

However I'm not sure if my condidtions correspond to the intended behavior.

Conditions that I'm not sure:

  1. Spaces between [ ] shouldn't be removed from a string
  2. Spaces between { } shouldnt't be also removed.
    ( a{10} matches all text where a is repeated 10 times in a row, but a{1 0} just match "a{1 0}" in a text)

I have tried to find an answer in mongodb source code but still I'm not sure about that.
If anybody knows an answer I would be more than happy to hear about it.
If something is not clear let me know, I'll do my best to clarify.

Also I understand that there's a lot of work to do on this project so I will just try to implement that this way and continue to search.

@seeforschauer
Copy link
Contributor

@noisersup
Copy link
Member

@seeforschauer I'm already there!

@AlekSi
Copy link
Member Author

AlekSi commented May 27, 2022

Conditions that I'm not sure

In that case, we use integration tests to do the same thing as MongoDB. Please add a test there. See there for an overview of our testing.

@AlekSi AlekSi added this to the v0.3.1 milestone May 30, 2022
@AlekSi AlekSi added the not ready Issues that are not ready to be worked on; PRs that should skip CI label Jun 17, 2022
@AlekSi AlekSi removed this from the v0.4.0 milestone Jun 23, 2022
@AlekSi AlekSi added the area/fuzz Issues about fuzzing and smithing label Sep 1, 2022
@quasilyte
Copy link
Contributor

Doing a regexp pattern preprocessing as described above would require full-fledged regex parsing to avoid issues when working with [] and {}.

The Go standard regex/syntax package is not a good fit here as it wasn't really made for external users. It creates an AST that is good for regex package compilation, but otherwise, it's hard to work with. For instance, it's hard to convert it back to string (or use it to construct a new, preprocessed string).

I used this package to do regexp parsing in a couple of linters:
https://github.com/quasilyte/regex/tree/master/syntax

It supports most of the PCRE syntax as well, which can be handy since MongoDB uses this dialect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/fuzz Issues about fuzzing and smithing code/enhancement Some user-visible feature could work better not ready Issues that are not ready to be worked on; PRs that should skip CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants