Support Unicode Index and Type Names #857

Closed
jbaiera opened this issue Sep 27, 2016 · 1 comment
Comments

@jbaiera
Member

jbaiera commented Sep 27, 2016

Currently, ES-Hadoop does not have total support for index or type names that are Unicode.

The ES-Hadoop REST Client is capable of interacting with Elasticsearch using Unicode names as long as they are properly encoded for URL usage:

val indexName = "בְּדִיקָה"
val indexNameEncoded = java.net.URLEncoder.encode(indexName, "UTF-8")
println(indexNameEncoded) //%D7%91%D6%B0%D6%BC%D7%93%D6%B4%D7%99%D7%A7%D6%B8%D7%94

When ES-Hadoop queries for which shards to run against during the sampling phase of setting up the number of input splits/slices, it receives index/type names back from Elasticsearch that may be Unicode. The library does no further encoding of this information before using it. Without the encoding, the ES-Hadoop REST Client cannot successfully continue communicating with Elasticsearch.

I'm not sure where else this may pop up, but it needs to be handled uniformly across the entire library. Unicode index/type names are allowed, and even if a user encodes the values before handing them to us, we will still break, because the names that come back from Elasticsearch are used without being re-encoded. If we add encoding support for internal operations, we might as well fully support receiving Unicode index/type names and do the encoding for the user to keep things uniform.
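
To illustrate the kind of uniform handling described above, here is a minimal sketch that percent-encodes each index/type name as UTF-8 before joining them into a request path. `ResourcePath`, `encodeSegment`, and `encode` are hypothetical names made up for this example and are not part of ES-Hadoop; the only real API used is `java.net.URLEncoder.encode`. Since `URLEncoder` produces form encoding (spaces become `+`), the sketch rewrites `+` to `%20` so the result is usable in a URL path.

```scala
import java.net.URLEncoder

// Hypothetical helper for illustration only; not the ES-Hadoop implementation.
object ResourcePath {
  // Percent-encode a single path segment as UTF-8.
  // URLEncoder emits '+' for spaces (form encoding), so convert those to %20.
  // A literal '+' in the input is already encoded as %2B, so the replace is safe.
  def encodeSegment(segment: String): String =
    URLEncoder.encode(segment, "UTF-8").replace("+", "%20")

  // Encode the index and type names separately, then join them with '/'.
  // Encoding them separately keeps the separator itself unescaped.
  def encode(index: String, typeName: String): String =
    Seq(index, typeName).map(encodeSegment).mkString("/")
}
```

Under these assumptions, names received from Elasticsearch would be passed through `ResourcePath.encode` once, right before the request is built, so callers never have to pre-encode anything themselves.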

@rtrujill007

@jbaiera I was just wondering if we can move the priority on this issue? We (Esri) opened a support ticket with ES a while ago to help fix this issue.

Is there anything holding this issue back from being fixed? Thanks!

@acchen97 acchen97 mentioned this issue Jan 11, 2017
3 tasks
@jbaiera jbaiera added v5.3.0 and removed v5.2.0 labels Jan 31, 2017
jbaiera added a commit that referenced this issue Feb 13, 2017
The SimpleRequest object takes both path and query parts for the URL, but so much of the framework combines them together into the path for ease of use. SimpleRequest now senses this, and splits them apart so that they may be each encoded separately from each other (queries are often encoded at time of their creation, but paths are encoded just before the request is sent).
fixes #857
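
The split described in the commit message can be sketched roughly as follows. This is not the actual `SimpleRequest` code; `RequestSplit`, `split`, `encodePath`, and `toUri` are hypothetical names, and the sketch assumes the query part arrives already encoded, as the commit message says is often the case.

```scala
import java.net.URLEncoder

// Hypothetical sketch of the path/query split; not the ES-Hadoop source.
object RequestSplit {
  // Split a combined "path?query" string at the first '?'.
  def split(target: String): (String, Option[String]) =
    target.indexOf('?') match {
      case -1 => (target, None)
      case i  => (target.substring(0, i), Some(target.substring(i + 1)))
    }

  // Encode each path segment as UTF-8, keeping '/' separators intact.
  // The -1 limit preserves leading and trailing empty segments.
  def encodePath(path: String): String =
    path.split("/", -1)
      .map(s => URLEncoder.encode(s, "UTF-8").replace("+", "%20"))
      .mkString("/")

  // Encode only the path; the query is assumed to be encoded already.
  def toUri(target: String): String = {
    val (path, query) = split(target)
    encodePath(path) + query.map("?" + _).getOrElse("")
  }
}
```

Encoding the path late and leaving the query untouched avoids double-encoding a query string that was escaped when it was created.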
@jbaiera jbaiera closed this as completed Mar 28, 2017
jbaiera added a commit that referenced this issue May 8, 2017
The SimpleRequest object takes both path and query parts for the URL, but so much of the framework combines them together into the path for ease of use. SimpleRequest now senses this, and splits them apart so that they may be each encoded separately from each other (queries are often encoded at time of their creation, but paths are encoded just before the request is sent).
fixes #857