Support Unicode Index and Type Names #857

jbaiera · 2016-09-27T05:32:44Z

Currently, ES-Hadoop does not fully support index or type names that contain Unicode characters.

The ES-Hadoop REST Client is capable of interacting with Elasticsearch using Unicode names as long as they are properly encoded for URL usage:

val indexName = "בְּדִיקָה"
val indexNameEncoded = java.net.URLEncoder.encode(indexName, "UTF-8")
println(indexNameEncoded) //%D7%91%D6%B0%D6%BC%D7%93%D6%B4%D7%99%D7%A7%D6%B8%D7%94

When ES-Hadoop asks Elasticsearch which shards to run against, during the sampling phase that sets up the number of input splits/slices, it receives index/type names back that may contain Unicode. The library does no further encoding of these names before using them in later requests. Without that encoding, the ES-Hadoop REST Client cannot continue communicating with Elasticsearch.
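To illustrate the missing step, here is a minimal sketch of encoding names that came back from Elasticsearch before they are reused in a request path. `ResourceEncoding`, `encodeSegment`, and `searchPath` are hypothetical names for illustration only, not existing ES-Hadoop APIs. Note that `URLEncoder` targets form data and writes a space as `+`, so the sketch rewrites that to `%20` for use in a path.

```scala
import java.net.URLEncoder

object ResourceEncoding {
  // Hypothetical helper: percent-encode a single path segment as UTF-8.
  // URLEncoder emits "+" for spaces (a literal "+" becomes "%2B"),
  // so replacing "+" with "%20" afterwards is safe for path usage.
  def encodeSegment(segment: String): String =
    URLEncoder.encode(segment, "UTF-8").replace("+", "%20")

  // Hypothetical helper: build an "index/type/_search" path from raw,
  // possibly Unicode, index and type names.
  def searchPath(index: String, typeName: String): String =
    Seq(encodeSegment(index), encodeSegment(typeName), "_search").mkString("/")
}
```

With something like this applied wherever names returned by Elasticsearch are turned back into request paths, the caller would never need to pre-encode anything.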

I'm not sure where else this may pop up, but it needs to be handled uniformly across the entire library. Unicode index/type names are allowed, and even if a user encodes the values before handing them to us, we will break on encoding anyway. If we add encoding support for internal operations, we might as well fully support receiving Unicode index/type names and do the encoding for the user to keep things uniform.


rtrujill007 · 2017-01-10T16:50:53Z

@jbaiera I was just wondering if we can raise the priority on this issue? We (Esri) opened a support ticket with ES a while back to help fix this issue.

Is there anything holding this issue back from being fixed? Thanks!

The SimpleRequest object takes both path and query parts for the URL, but much of the framework combines them into the path for ease of use. SimpleRequest now detects this and splits them apart so that each can be encoded separately (queries are often encoded when they are created, but paths are encoded just before the request is sent). Fixes #857
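The splitting described in that commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual SimpleRequest code: `RequestSplitting`, `SplitRequest`, `split`, `encodePath`, and `toUri` are hypothetical names, and the sketch assumes the query part is already encoded and the path part is not.

```scala
import java.net.URLEncoder

object RequestSplitting {
  // Hypothetical stand-in for a request that keeps path and query apart.
  final case class SplitRequest(path: String, query: Option[String])

  // Split a combined "path?query" string at the first '?'.
  def split(combined: String): SplitRequest =
    combined.indexOf('?') match {
      case -1 => SplitRequest(combined, None)
      case i  => SplitRequest(combined.substring(0, i), Some(combined.substring(i + 1)))
    }

  // Percent-encode each path segment as UTF-8, keeping the "/" separators.
  def encodePath(path: String): String =
    path.split("/", -1)
      .map(s => URLEncoder.encode(s, "UTF-8").replace("+", "%20"))
      .mkString("/")

  // Encode only the path; the query is assumed to be encoded already.
  def toUri(combined: String): String = {
    val req = split(combined)
    encodePath(req.path) + req.query.map("?" + _).getOrElse("")
  }
}
```

One consequence of encoding the path just before sending is that a path the caller has already percent-encoded would be encoded a second time (`%` becomes `%25`), which is one reason to accept raw Unicode names and do the encoding in a single place.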

jbaiera added :Rest enhancement v5.0.0-rc1 labels Sep 27, 2016

jbaiera added v5.0.0 and removed v5.0.0-rc1 labels Oct 10, 2016

jbaiera added v5.0.1 and removed v5.0.0 labels Oct 26, 2016

jbaiera added v5.0.2 and removed v5.0.1 labels Nov 17, 2016

jbaiera added v5.2.0 and removed v5.0.2 labels Dec 15, 2016

acchen97 mentioned this issue Jan 11, 2017

ES-Hadoop 5.3 release #916

Closed

3 tasks

jbaiera added v5.3.0 and removed v5.2.0 labels Jan 31, 2017

jbaiera closed this as completed Mar 28, 2017
