Certain identifiers cause an ES exception #4

Open
emanuil-tolev opened this issue Jan 22, 2012 · 6 comments
@emanuil-tolev
Contributor

Try identifying "-" (a dash, without the quotes) on the web front-end. ElasticSearch tries to parse the dash as query syntax and throws an exception. The query needs to be escaped in some way.

Only relevant thing I found was this ElasticSearch issue: elastic/elasticsearch#41

That suggests ES should allow escaping. As far as I understand, it would work by adding a field to the query_string JSON object: escape: true or escape: 1. The trouble is, I'm not sure how to modify the query_string JSON object. It seems to me that dao.DomainObject.query() just takes q="string" and gives it to pyes, which turns it into a query_string object with the "query" field set to "string". I just can't quite grasp how to add "escape": "true" in this flow.

Any help? Looking at the pyes pydoc didn't yield an unexpected revelation...
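
One possible workaround, sketched here as an assumption rather than as what the project actually does: escape the Lucene query-syntax characters in the user's string before it is passed as q, so that nothing in the query_string object itself has to change. The helper names `escape_query_string` and `build_query` are hypothetical, and the sketch does not rely on any pyes call or on the escape option mentioned above.

```python
import re

# Characters that have special meaning in the Lucene query syntax used by
# query_string. '&' and '|' are escaped one at a time, which also covers
# the '&&' and '||' operators.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text):
    """Prefix every Lucene special character in `text` with a backslash."""
    return _LUCENE_SPECIAL.sub(r'\\\1', text)


def build_query(text):
    """Build a query_string search body with the user input escaped.

    Hypothetical helper: it only shows the shape of the JSON body, it does
    not send anything to ElasticSearch.
    """
    return {"query": {"query_string": {"query": escape_query_string(text)}}}


if __name__ == "__main__":
    print(build_query("-"))
```

With this approach the escaped string could be handed to the existing q="string" parameter unchanged, assuming nothing further down the chain escapes it a second time.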

@emanuil-tolev
Contributor Author

We might look at this for dev8d, but we've decided that we don't care enough about this edge case to fix it right now. It's not critical.

@emanuil-tolev
Contributor Author

UPDATE: the problem described in this comment turned out to be a separate issue and was resolved accordingly. The last comment below should give you the current status of the original issue.

Interesting, another problem: trying to identify the string
"car insurance systems" fails via both GET and POST with:

pyes.urllib3.connectionpool.MaxRetryError
MaxRetryError: Max retries exceeded for url: /idfind/uidentifier/car insurance system

I'm thinking the spaces have something to do with it.

@emanuil-tolev
Contributor Author

Another one, trying to identify
"(Forenames Surname|forenames.surname@gmail.com|xxx@somewhere.ac.uk)" via both GET and POST.

pyes.urllib3.connectionpool.MaxRetryError
MaxRetryError: Max retries exceeded for url: /idfind/uidentifier/(Forenames Surname|forenames.surname@gmail.com|xxx@somewhere.ac.uk)

@emanuil-tolev
Contributor Author

Okay, so all those errors were caused by commit 8f9746c which tried to prevent duplicates in the unknown identifiers (document type uidentifier) by assigning the identifier string as the document id. That doesn't sit very well with spaces and other "special" characters.
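
To illustrate why a raw identifier makes a poor document id: the id ends up as a path segment in the request URL, so spaces, parentheses, "|" and "@" need percent-encoding first. The sketch below uses only the standard library; `doc_path` is a hypothetical helper, not part of pyes or of this codebase, and whether the client would encode the path itself is not something this thread confirms.

```python
from urllib.parse import quote


def doc_path(index, doc_type, doc_id):
    """Build an /index/type/id path with every segment percent-encoded.

    safe="" means that "/" inside a segment is encoded as well, so an
    identifier can never add extra path components.
    """
    return "/" + "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))


if __name__ == "__main__":
    print(doc_path("idfind", "uidentifier", "car insurance systems"))
```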

All fixed now, but we now get a record of every unsuccessful attempt to identify something in the index. Note: every attempt (so if I try 'lalala' 5 times, we get 5 docs in the index). We don't really want duplicates, since we want to run some sort of background processing to identify those unknowns using newly submitted tests in the future, but we'll have to solve this some other way.
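
One way the duplicates might be avoided without putting the raw string in the URL, offered only as a sketch and not as the chosen solution: derive the document id from a hash of the identifier. The same string always maps to the same id, and a hex digest contains no spaces or special characters. `uidentifier_doc_id` is a hypothetical name.

```python
import hashlib


def uidentifier_doc_id(identifier):
    """Return a deterministic, URL-safe document id for an unknown identifier.

    The SHA-1 hex digest is 40 characters from 0-9 and a-f, so it is safe to
    use as a path segment no matter what the original string contains.
    """
    return hashlib.sha1(identifier.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Repeated attempts with the same string would overwrite one document
    # instead of creating a new one each time.
    print(uidentifier_doc_id("lalala"))
    print(uidentifier_doc_id("car insurance systems"))
```

The original string would still need to be stored as a field in the document body, since it cannot be recovered from the hash.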

emanuil-tolev added a commit that referenced this issue Mar 19, 2012
…t fixed) by allowing the index to have duplicates of unknown identifiers (uidentifier doc. type). Basically fix a bug introduced by the solution of #16 in commit 8f9746c.
@emanuil-tolev
Contributor Author

This issue stays open, as the original ES exception is still not fixed AFAIK.

@ghost ghost assigned emanuil-tolev May 15, 2012
@emanuil-tolev
Contributor Author

MaxRetryErrors above have been fixed (by ef930d2 I suspect).
