Metadata fields for text do not support space delimiters #32

Akron · 2017-08-23T12:19:25Z

When searching a metadata field such as "author" as part of a virtual corpus, it is currently not possible to query a string value that contains spaces; for example, "author eq 'Theodor Fontane'" does not work. Maybe text fields queried with "eq" should be treated as sequences of tokens delimited by spaces.

Akron · 2017-09-13T16:18:17Z

Another possibility would be to index all text fields both as sequences of tokens and as a string.

Akron · 2018-03-28T14:21:19Z

One problem is that metadata fields like "author" will be language-specific. For the moment I will make this German-only, but we may need to come up with a good solution that is not language-specific.

Akron · 2018-04-04T18:51:48Z

For the moment, I ignore language-specific indexing and use the StandardTokenizer with standard lowercasing. It is now possible to search a text string as a single term and to search it using a phrase query as well. This is realized by prepending the verbatim string as a token, separated by a huge position gap from the real token stream. After playing around with the preferred tokenizer pipeline in Lucene, I switched to creating the token stream myself.
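
To illustrate the idea, here is a minimal Python sketch, not the actual Lucene-based implementation. It assumes a simple regex word splitter as a stand-in for the StandardTokenizer, and a gap of 1000 positions, since the comment above only says "huge". The names `POSITION_GAP`, `build_token_stream`, `term_match` and `phrase_match` are hypothetical.

```python
import re

# Hypothetical gap size; the comment above only says the gap is "huge".
POSITION_GAP = 1000


def build_token_stream(value, gap=POSITION_GAP):
    """Return (term, position) pairs for one metadata value.

    The verbatim string comes first as a single token at position 0,
    followed by the lowercased word tokens, shifted by `gap` positions.
    """
    stream = [(value, 0)]
    # Simple word splitter, standing in for a real tokenizer.
    words = re.findall(r"\w+", value.lower())
    for offset, word in enumerate(words):
        stream.append((word, gap + offset))
    return stream


def term_match(stream, term):
    """Exact term search: true if any token equals `term`."""
    return any(t == term for t, _ in stream)


def phrase_match(stream, phrase):
    """Phrase search: the words of `phrase` must occupy consecutive positions."""
    words = re.findall(r"\w+", phrase.lower())
    if not words:
        return False
    positions = {}
    for term, pos in stream:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + i in positions.get(word, ()) for i, word in enumerate(words))
        for start in positions.get(words[0], ())
    )
```

In this sketch, the stream built from "Theodor Fontane" matches both the exact term "Theodor Fontane" and the phrase "theodor fontane". The gap keeps the verbatim token from counting as adjacent to the first word token: for a single-word value such as "fontane", a gap of 1 would let the phrase "fontane fontane" match by accident.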

(Will be closed once the changes are reviewed.)

Akron · 2018-04-05T12:39:32Z

Changes are now in master. They require reindexing to take effect.

Akron mentioned this issue Feb 15, 2018

corpusByMatch assistant KorAP/Kalamar#27

Closed

Akron added this to the Reindexing-2018 milestone Mar 7, 2018

Akron closed this as completed Apr 5, 2018
