Metadata fields for text do not support space delimiters #32

Closed
Akron opened this issue Aug 23, 2017 · 4 comments

Comments

Akron commented Aug 23, 2017

When searching a metadata field such as "author" as part of a virtual corpus, it is currently not possible to query it as a string containing a space delimiter; for example, "author eq 'Theodor Fontane'" does not work. Maybe text fields queried with "eq" should be treated as sequences of tokens delimited by spaces.

Akron commented Sep 13, 2017

Another possibility would be to index all text fields both as sequences of tokens and as a string.
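
To illustrate the idea of indexing a text field both ways, here is a minimal sketch in plain Python. It is not the project's code: the class `DualFieldIndex` and its methods are hypothetical names, and whitespace splitting with lowercasing is an assumed stand-in for a real tokenizer.

```python
from typing import Dict, Set


class DualFieldIndex:
    """Toy index that stores each field value as a verbatim string and as tokens."""

    def __init__(self) -> None:
        self.exact: Dict[str, Set[int]] = {}   # verbatim value -> doc ids
        self.tokens: Dict[str, Set[int]] = {}  # lowercased token -> doc ids

    def add(self, doc_id: int, value: str) -> None:
        # Index the untouched string for exact matching.
        self.exact.setdefault(value, set()).add(doc_id)
        # Index space-delimited, lowercased tokens (assumed tokenization).
        for token in value.lower().split():
            self.tokens.setdefault(token, set()).add(doc_id)

    def match_exact(self, value: str) -> Set[int]:
        """Documents whose field equals the given string verbatim."""
        return set(self.exact.get(value, set()))

    def match_all_tokens(self, query: str) -> Set[int]:
        """Documents whose field contains every token of the query."""
        wanted = query.lower().split()
        if not wanted:
            return set()
        return set.intersection(*(self.tokens.get(t, set()) for t in wanted))


index = DualFieldIndex()
index.add(1, "Theodor Fontane")
index.add(2, "Theodor Storm")
```

With this sketch, `index.match_exact("Theodor Fontane")` finds only document 1, while `index.match_all_tokens("theodor")` finds both documents. The cost is that every value is stored twice.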

Akron commented Mar 28, 2018

One problem is that metadata fields like "author" will be language-specific. For the moment I will make this German-only, but we may need to come up with a good solution that is not language-specific.

Akron commented Apr 4, 2018

For the moment, I ignore language-specific indexing and use the StandardTokenizer with standard lowercasing. It is now possible to search a text string as a single term and to search it using a phrase query as well. This is realized by prepending the verbatim string as a token, separated by a huge position gap, to the real token stream. After playing around with the preferred tokenizer pipeline in Lucene, I switched to creating the token stream myself.
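
The token stream described above can be sketched roughly as follows. This is a plain-Python illustration, not the actual Lucene code: the function names, the gap value of 1000 (the comment only says "huge"), and the regex used in place of the StandardTokenizer are all assumptions.

```python
import re
from typing import Dict, List, Sequence, Set, Tuple

POSITION_GAP = 1000  # assumed value; the issue only calls the gap "huge"

Token = Tuple[str, int]  # (term, position)


def build_token_stream(value: str, gap: int = POSITION_GAP) -> List[Token]:
    """Verbatim string at position 0, then lowercased tokens after the gap."""
    stream: List[Token] = [(value, 0)]
    # Crude stand-in for StandardTokenizer plus lowercasing.
    words = [w.lower() for w in re.findall(r"\w+", value)]
    for offset, word in enumerate(words):
        stream.append((word, gap + offset))
    return stream


def term_match(stream: Sequence[Token], term: str) -> bool:
    """True if any token in the stream equals the term exactly."""
    return any(t == term for t, _ in stream)


def phrase_match(stream: Sequence[Token], phrase: Sequence[str]) -> bool:
    """True if the phrase terms occur at consecutive positions."""
    if not phrase:
        return False
    positions: Dict[str, Set[int]] = {}
    for term, pos in stream:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + i in positions.get(term, set()) for i, term in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


stream = build_token_stream("Theodor Fontane")
```

Here `term_match(stream, "Theodor Fontane")` succeeds via the verbatim token, and `phrase_match(stream, ["theodor", "fontane"])` succeeds via the tokenized part. In this sketch, the gap keeps a phrase from matching across the verbatim token and the first real token.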

(Will be closed once the changes are reviewed.)

Akron commented Apr 5, 2018

Changes are now in master. They require reindexing to take effect.

Akron closed this as completed Apr 5, 2018