Can't set "no stopwords" on analyzer #329

Closed
ppearcy opened this issue Aug 19, 2010 · 10 comments

Comments

Contributor

ppearcy commented Aug 19, 2010

It appears that the empty stopwords setting on my pattern analyzer is being ignored. Faceting on the field shows that AN is not making it into the term list.

Here is my Elasticsearch config:

gateway:
  type: fs
  fs:
    location: /data/elasticsearch/
index:
  analysis:
    analyzer:
      piped_space_semi:
        type: pattern
        lowercase: true
        pattern: '[&||;| ]+'
        stopwords: []
path:
  logs: /data/elasticsearch/logs/

Here is a test to reproduce this. It creates the index, adds a mapping that uses the analyzer, indexes two docs, and then shows the bug with two queries.

curl -XPUT http://localhost:9200/testindex/

curl -XPUT http://localhost:9200/testindex/testindex/_mapping -d '{"testindex": {"date_formats": ["date_optional_time"], "dynamic": false, "properties": {"testsymbol": {"omit_norms": true, "type": "string", "analyzer": "piped_space_semi"}}, "_source": {"compress": true}}}'

curl -XPUT 'http://localhost:9200/testindex/testindex/test1' -d '
{ "testsymbol": "US;MSFT" }
'

curl -XPUT 'http://localhost:9200/testindex/testindex/test2' -d '
{ "testsymbol": "US;AN" }
'

Matches both documents, but should only match US;AN:

http://localhost:9200/testindex/testindex/_search?q=%22US%20AN%22

Matches correctly:

http://localhost:9200/testindex/testindex/_search?q=%22US%20MSFT%22
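
The behaviour in the two queries above can be reproduced with a rough simulation. This is only a sketch, not Elasticsearch or Lucene code: `analyze` and `phrase_matches` are hypothetical helpers, the stopword set is a small illustrative subset of an English list, the split pattern is a simplified equivalent of the character class in the config, and position gaps left by removed stopwords are ignored.

```python
import re

# Simplified equivalent of the character class in '[&||;| ]+':
# split on runs of '&', '|', ';' or space.
SPLIT_PATTERN = re.compile(r"[&|; ]+")

# Illustrative subset of an English stopword list (assumption, not the full list).
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to"})


def analyze(text, stopwords=DEFAULT_STOPWORDS):
    """Lowercase, split on the pattern, then drop stopwords."""
    tokens = [t for t in SPLIT_PATTERN.split(text.lower()) if t]
    return [t for t in tokens if t not in stopwords]


def phrase_matches(query, document, stopwords=DEFAULT_STOPWORDS):
    """Crude phrase match: query tokens appear contiguously in the document tokens."""
    q = analyze(query, stopwords)
    d = analyze(document, stopwords)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# With default stopwords, "an" is dropped, so "US AN" reduces to ["us"]
# and matches both documents.
print(phrase_matches("US AN", "US;MSFT"))               # True
# With an empty stopword set, only the US;AN document matches.
print(phrase_matches("US AN", "US;MSFT", frozenset()))  # False
print(phrase_matches("US AN", "US;AN", frozenset()))    # True
```

If the empty `stopwords` setting were honoured, the analyzer would behave like the `frozenset()` case.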

Member

kimchy commented Aug 20, 2010

Yeah, that's a problem. With the way configuration is handled, there is no way to distinguish an empty array (== no stop words) from a setting that is not set (== default stop words). I am going to push a fix that allows stopwords: _none_, meaning no stopwords.
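
A minimal sketch of why a sentinel value helps, under the assumption that nested settings are flattened into key/value pairs so that an empty list leaves no keys behind. The `flatten` and `resolve_stopwords` helpers and the stopword subset are hypothetical and are not taken from the actual fix.

```python
NONE_SENTINEL = "_none_"

# Illustrative subset of a default stopword list (assumption).
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "the"})


def flatten(config, prefix=""):
    """Flatten nested settings into dotted keys; list items become indexed keys."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        elif isinstance(value, list):
            # An empty list contributes no keys at all.
            for i, item in enumerate(value):
                flat[f"{path}.{i}"] = item
        else:
            flat[path] = value
    return flat


def resolve_stopwords(flat, key="stopwords"):
    """Pick the stopword set from flattened settings."""
    scalar = flat.get(key)
    if scalar == NONE_SENTINEL:
        return frozenset()              # explicit "no stopwords"
    items = [v for k, v in flat.items() if k.startswith(key + ".")]
    if items:
        return frozenset(items)         # explicit list of stopwords
    if scalar is not None:
        return frozenset([scalar])      # a single stopword given as a scalar
    return DEFAULT_STOPWORDS            # nothing set, or an empty list


# An empty list and a missing setting flatten to the same thing.
print(flatten({"stopwords": []}) == flatten({}))                      # True
print(resolve_stopwords(flatten({"stopwords": "_none_"})) == frozenset())  # True
```

The sentinel is a scalar, so it survives flattening and can be told apart from a missing setting.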

Member

kimchy commented Aug 20, 2010

Can't set "no stopwords" on analyzer, closed by c0552bd.

Member

kimchy commented Aug 20, 2010

Pushed a fix, can you test it?

Contributor Author

ppearcy commented Aug 20, 2010

Awesome and as always thanks!

Contributor Author

ppearcy commented Aug 20, 2010

Yeah, will have an update for you tomorrow. Just started rebuilding content for a different reason anyways.

Member

kimchy commented Aug 20, 2010

No problem, it's pretty late here, would love someone looking over my shoulder ;). And I fixed it for all analyzers, not just pattern.

Contributor Author

ppearcy commented Aug 20, 2010

Cool... will have details later tonight (my time) for you.

Contributor Author

ppearcy commented Aug 20, 2010

Looks good to me. Thanks!


mrkamel commented Jan 10, 2014

Would be really great to have the

stopwords: _none_

in the docs. I searched for several hours for a fix and finally found the _none_ solution in the source code.
Please let me know if I can help extend the docs.

mrkamel pushed a commit to mrkamel/elasticsearch that referenced this issue Jan 13, 2014
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
This issue was closed.