Tag queries choke on certain punctuation #411

Closed
stockholmux opened this issue Aug 2, 2018 · 3 comments

@stockholmux
Contributor

According to the documentation:

The tokenization is simpler: The user can determine a separator (defaults to a comma) for multiple tags, and we only do whitespace trimming at the end of tags. Thus, tags can contain spaces, punctuation marks, accents, etc. The only two transformations we perform are lower-casing (for latin languages only as of now), and whitespace trimming.
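
The indexing-side rules in the quoted documentation can be sketched in a few lines of Python. This is only an illustration under assumptions, not RediSearch's actual implementation: `tokenize_tags` is a hypothetical name, whitespace is assumed to be trimmed from both ends of each tag, empty tags are assumed to be dropped, and Python's `str.lower()` lower-cases more than just Latin scripts.

```python
def tokenize_tags(raw: str, separator: str = ",") -> list[str]:
    """Hypothetical model of tag-field indexing: split, trim, lower-case."""
    tags = []
    for part in raw.split(separator):
        tag = part.strip()  # assumption: trim whitespace at both ends
        if tag:             # assumption: empty tags are skipped
            tags.append(tag.lower())  # punctuation is left untouched
    return tags


print(tokenize_tags("Hello:42, hello;42 , Foo Bar"))
# ['hello:42', 'hello;42', 'foo bar']
```

Under this model a value like `hello:42` is stored as a single tag with its colon intact, which fits the documents below being added without complaint.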

Test schema and a couple of documents:

127.0.0.1:6379> ft.create testtags schema atag tag
OK
127.0.0.1:6379> ft.add testtags mydoc1 1 fields atag "hello:42"
OK
127.0.0.1:6379> ft.add testtags mydoc2 1 fields atag "hello;42"
OK

So, you can add documents with tag field values that contain punctuation, but you can't query them.

127.0.0.1:6379> ft.search testtags "@atag:{ hello:42 }"
(error) Syntax error at offset 13 near 'hello'
127.0.0.1:6379> ft.search testtags "@atag:{ hello;42 }"
(error) Syntax error at offset 13 near 'hello'

It seems to be choking on the [semi-]colons. Escaping doesn't work either:

127.0.0.1:6379> ft.search testtags "@atag:{ hello\:42 }"
(error) Syntax error at offset 13 near 'hello'
127.0.0.1:6379> ft.search testtags "@atag:{ hello\;42 }"
(error) Syntax error at offset 13 near 'hello'

Some punctuation doesn't trigger a syntax error (these queries simply return no results):

127.0.0.1:6379> ft.search testtags "@atag:{ hello>42 }"
1) (integer) 0
127.0.0.1:6379> ft.search testtags "@atag:{ hello!42 }"
1) (integer) 0
127.0.0.1:6379> ft.search testtags "@atag:{ hello^42 }"
1) (integer) 0
127.0.0.1:6379> ft.search testtags "@atag:{ hello 42 }"
1) (integer) 0

Further testing:

127.0.0.1:6379> ft.search testtags "@name:{ hello:42 }"
(error) Syntax error at offset 13 near 'hello'
127.0.0.1:6379> ft.search testtags "@atag:{ hello;42 }"
(error) Syntax error at offset 13 near 'hello'
127.0.0.1:6379> ft.search testtags "@atag:{ hello$42 }"
(error) Syntax error at offset 13 near '42'
127.0.0.1:6379> ft.search testtags "@atag:{ hello%42 }"
(error) Syntax error at offset 14 near '42'
127.0.0.1:6379> ft.search testtags "@atag:{ hello@42 }"
(error) Syntax error at offset 17 near '42'
127.0.0.1:6379> ft.search testtags "@atag:{ hello*42 }"
(error) Syntax error at offset 17 near '42'
127.0.0.1:6379> ft.search testtags "@atag:{ hello(42 }"
(error) Syntax error at offset 17 near '42'

With other punctuation, the offset of the reported error seems to be all over the place.

@stockholmux
Contributor Author

Probably related.

If you do use punctuation that doesn't produce a syntax error in the query, the query parser still seems to tokenize the tag:

127.0.0.1:6379> ft.add testtags foo2 1 fields atag "hello!42"
OK
127.0.0.1:6379> ft.search testtags "@atag:{ hello!42 }"
0
127.0.0.1:6379> ft.explain testtags "@atag:{ hello!42 }"
TAG:@atag {
  INTERSECT {
    hello
    42
  }
}

I would expect the explain to be this instead of the INTERSECT:

> ft.explain testtags "@atag:{ hello!42 }"
TAG:@atag {
  hello!42
}
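
The `ft.explain` output above is consistent with a simple model of the query side: inside `{ ... }`, any unescaped punctuation or whitespace splits the text into separate terms, which are then intersected, while a backslash keeps the following character inside the term. The sketch below is a hypothetical approximation of that behaviour, not the real query parser; `split_tag_query_term` is an illustrative name, and treating letters, digits and `_` as the only term characters is an assumption.

```python
def split_tag_query_term(term: str) -> list[str]:
    """Approximate how an unescaped tag query term is split into tokens.

    Assumption: letters, digits and '_' stay in a token; any other
    unescaped character ends it; a backslash keeps the next character.
    """
    tokens: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(term):
        ch = term[i]
        if ch == "\\" and i + 1 < len(term):
            current.append(term[i + 1])  # escaped character stays in the token
            i += 2
            continue
        if ch.isalnum() or ch == "_":
            current.append(ch)
        elif current:
            tokens.append("".join(current))  # separator closes the token
            current = []
        i += 1
    if current:
        tokens.append("".join(current))
    return tokens


print(split_tag_query_term("hello!42"))    # ['hello', '42'], i.e. two terms (INTERSECT)
print(split_tag_query_term(r"hello\!42"))  # ['hello!42'], i.e. a single tag
```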

@dvirsky
Contributor

dvirsky commented Aug 6, 2018

But why? All punctuation marks act as token separators, Kyle. Escaping them requires double backslashes. There's no bug here AFAICT.
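
Following that advice, a client that builds tag queries programmatically can escape punctuation before embedding a value in `@field:{ ... }`. The sketch below is a minimal, hypothetical helper (`escape_tag_value` and `tag_query` are illustrative names, not part of RediSearch). It prefixes every ASCII punctuation character and space with one backslash; escaping spaces as well is an assumption based on tags being allowed to contain them.

```python
import string

# Assumption: every ASCII punctuation character and the space character
# must be escaped inside a tag query.
TAG_SPECIAL_CHARS = frozenset(string.punctuation + " ")


def escape_tag_value(value: str) -> str:
    """Prefix each special character in a tag value with one backslash."""
    return "".join("\\" + ch if ch in TAG_SPECIAL_CHARS else ch for ch in value)


def tag_query(field: str, value: str) -> str:
    """Build a single-tag query string, e.g. for FT.SEARCH."""
    return "@%s:{ %s }" % (field, escape_tag_value(value))


print(tag_query("atag", "hello:42"))  # @atag:{ hello\:42 }
print(tag_query("atag", "hello;42"))  # @atag:{ hello\;42 }
```

The returned string contains a single backslash before each colon or semicolon; a client library passes it to the server unchanged as one command argument.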

@stockholmux
Contributor Author

Glad that worked! We'll need to change the documentation, though. The only place I'm seeing double escaping is the Tokenization and Escaping page, which specifically says it applies only to text fields and points to the tag page for a description of how tag tokenization/escaping works. There's no mention of double escaping there (only single).
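
The double backslashes in the redis-cli examples most likely come from redis-cli's own argument parsing: inside a double-quoted argument, redis-cli treats a backslash as an escape character, so `\\` reaches the server as a single `\`. A small hypothetical helper (`redis_cli_quote` is an illustrative name) shows how an already-escaped query would have to be written when typed into redis-cli.

```python
def redis_cli_quote(arg: str) -> str:
    """Wrap an argument in double quotes for typing into redis-cli.

    Inside double quotes redis-cli interprets backslash escapes, so each
    literal backslash and double quote must itself be backslash-escaped.
    """
    escaped = arg.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


query = r"@atag:{ hello\:42 }"  # what the server should receive
print(redis_cli_quote(query))   # "@atag:{ hello\\:42 }"
```

So a query that a client library would send as `@atag:{ hello\:42 }` has to be typed into redis-cli as `"@atag:{ hello\\:42 }"`, which matches the double escaping mentioned in the thread.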

stockholmux pushed a commit to stockholmux/RediSearch that referenced this issue Aug 7, 2018
mnunberg added a commit that referenced this issue Aug 8, 2018
Clarifying double escaping on tag fields (#411)