Can't set "no stopwords" on analyzer #329
Comments
Yeah, that's a problem. With the way configuration is done, there is no way to distinguish an empty array (== no stop words) from not set (== default stop words). I am going to push a fix that will allow for
Can't set "no stopwords" on analyzer, closed by c0552bd.
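The ambiguity described above (an empty array being indistinguishable from an unset value) can be illustrated with a small sketch. This is not Elasticsearch's settings code and not the change made in c0552bd; the names `resolve_stopwords_buggy`, `resolve_stopwords`, and `_MISSING` are hypothetical, and the stop word list is an abbreviated placeholder.

```python
# Abbreviated placeholder for a default stop word list (assumption, not the real list).
DEFAULT_STOPWORDS = ("a", "an", "the")

# Hypothetical sentinel used to tell "key absent" apart from "key present but empty".
_MISSING = object()


def resolve_stopwords_buggy(settings):
    """Treats an empty list the same as a missing key, which is the reported problem."""
    configured = settings.get("stopwords")
    # An empty list is falsy, so it silently falls back to the defaults.
    return list(configured) if configured else list(DEFAULT_STOPWORDS)


def resolve_stopwords(settings):
    """Falls back to the defaults only when the key is really absent."""
    configured = settings.get("stopwords", _MISSING)
    if configured is _MISSING:
        return list(DEFAULT_STOPWORDS)
    return list(configured)


print(resolve_stopwords_buggy({"stopwords": []}))  # defaults are applied anyway
print(resolve_stopwords({"stopwords": []}))        # empty list is respected
print(resolve_stopwords({}))                       # defaults are applied
```

In this model the only difference is whether "present but empty" is checked separately from "absent"; a real configuration layer may need a different mechanism if it cannot represent an empty array at all.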
Pushed a fix, can you test it?
Awesome, and as always, thanks!
Yeah, will have an update for you tomorrow. Just started rebuilding content for a different reason anyway.
No problem, it's pretty late here, would love someone looking over my shoulder ;). And I fixed it for all analyzers, not just pattern.
Cool... will have details later tonight (my time) for you.
Looks good to me. Thanks!
Would be really great to have the
in the docs. I searched for several hours for a fix and finally found the
It appears that the empty stopwords list I set for a pattern analyzer is being ignored. Running a facet on the field shows that AN is not making it into the term list.
Here is my Elasticsearch config:
gateway:
  type: fs
  fs:
    location: /data/elasticsearch/
index:
  analysis :
    analyzer :
      piped_space_semi :
        type: pattern
        lowercase: true
        pattern: '[&||;| ]+'
        stopwords: []
path:
  logs: /data/elasticsearch/logs/
Here is a test to reproduce this. It creates the index, adds a mapping that uses the analyzer, submits two docs, and then shows the bug with the two queries.
curl -XPUT http://localhost:9200/testindex/
curl -XPUT http://localhost:9200/testindex/testindex/_mapping -d '{"testindex": {"date_formats": ["date_optional_time"], "dynamic": false, "properties": {"testsymbol": {"omit_norms": true, "type": "string", "analyzer": "piped_space_semi"}}, "_source": {"compress": true}}}'
curl -XPUT 'http://localhost:9200/testindex/testindex/test1' -d '
{ testsymbol: "US;MSFT" }
'
curl -XPUT 'http://localhost:9200/testindex/testindex/test2' -d '
{ testsymbol: "US;AN" }
'
Matches both - Should only match US;AN
http://localhost:9200/testindex/testindex/_search?q=%22US%20AN%22
Matches correctly
http://localhost:9200/testindex/testindex/_search?q=%22US%20MSFT%22
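Why the first query matches both documents can be sketched with a simplified model of the analyzer. This is only an approximation in Python, not Lucene's pattern analyzer: the `analyze` function is hypothetical, the regular expression is the issue's character class with the repeated `|` removed, and the stop word set is Lucene's default English list.

```python
import re

# Lucene's default English stop word set, used when no stopwords are configured.
DEFAULT_STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})

# Same character class as '[&||;| ]+' from the config, without the duplicate '|'.
SEPARATORS = re.compile(r"[&|; ]+")


def analyze(text, stopwords=DEFAULT_STOPWORDS):
    """Split on the separator pattern, lowercase, then drop stop words."""
    tokens = [t.lower() for t in SEPARATORS.split(text) if t]
    return [t for t in tokens if t not in stopwords]


# With the default stop words, "AN" is lowercased to "an" and dropped.
print(analyze("US;AN"))    # ['us']
print(analyze("US;MSFT"))  # ['us', 'msft']
print(analyze("US AN"))    # ['us']  (the query text)

# With an empty stop word set, which is what the config asked for, "an" survives.
print(analyze("US;AN", stopwords=frozenset()))  # ['us', 'an']
```

Under this model the query "US AN" reduces to the single term `us`, which both documents contain, so both match. With the empty stop word list honored, the query would keep `an` and only US;AN would match.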