
Object field named 'properties' breaks parsing of ES mapping #809

Closed
1 of 2 tasks
andregarcia opened this issue Jul 20, 2016 · 0 comments

Comments

@andregarcia
Contributor

What kind of an issue is this?

  • Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

The parser for ES type mappings does not work properly when the mapping contains a field of type object named 'properties'. I checked the code and verified that the parser cannot distinguish between an object field named 'properties' and the 'properties' key that ES itself uses to define an object's sub-fields.

Steps to reproduce

Code:

Create the ES index, mapping, and a document (the mapping is created dynamically on first index):

curl -XPOST 'localhost:9200/sample_index/sample_type/5123' -d '{"name":"value0","properties":{"x":"value1","y":"value2"},"title":"value3"}'
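For reference, the document body indexed above nests x and y under a top-level field literally named 'properties', which collides with the key name ES itself uses in mappings (plain stdlib inspection):

```python
import json

# Exact document body from the curl command above.
doc = json.loads('{"name":"value0","properties":{"x":"value1",'
                 '"y":"value2"},"title":"value3"}')

# 'properties' is an ordinary object field of this document, colliding
# with the key ES mappings use to hold sub-field definitions.
print(sorted(doc))                # ['name', 'properties', 'title']
print(sorted(doc["properties"]))  # ['x', 'y']
```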

Query the index using elasticsearch-hadoop (Python code):

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName('app').setMaster('local')
sc = SparkContext(conf=conf)

# Point elasticsearch-hadoop at the index/type created above.
es_hadoop_conf = {
        'es.nodes' : 'localhost',
        'es.port' : '9200',
        'es.resource' : 'sample_index/sample_type'
}

# Read the index as an RDD of (document id, document fields) pairs.
rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_hadoop_conf
)
print rdd.collect()[0]

The output shows an empty document:

(u'5123', {})

Stack trace:

No exceptions are thrown, but the DEBUG log shows that although the mapping was retrieved correctly, it was not parsed correctly:

16/07/20 05:00:13 DEBUG header: >> "GET /sample_index/sample_type/_mapping HTTP/1.1[\r][\n]"
16/07/20 05:00:13 DEBUG HttpMethodBase: Adding Host request header
16/07/20 05:00:13 DEBUG header: >> "User-Agent: Jakarta Commons-HttpClient/3.1[\r][\n]"
16/07/20 05:00:13 DEBUG header: >> "Host: 127.0.0.1:9200[\r][\n]"
16/07/20 05:00:13 DEBUG header: >> "[\r][\n]"
16/07/20 05:00:13 DEBUG header: << "HTTP/1.1 200 OK[\r][\n]"
16/07/20 05:00:13 DEBUG header: << "HTTP/1.1 200 OK[\r][\n]"
16/07/20 05:00:13 DEBUG header: << "Content-Type: application/json; charset=UTF-8[\r][\n]"
16/07/20 05:00:13 DEBUG header: << "Content-Length: 187[\r][\n]"
16/07/20 05:00:13 DEBUG header: << "[\r][\n]"
16/07/20 05:00:13 DEBUG content: << "{"sample_index":{"mappings":{"sample_type":{"properties":{"name":{"type":"string"},"properties":{"properties":{"x":{"type":"string"},"y":{"type":"string"}}},"title":{"type":"string"}}}}}}"
16/07/20 05:00:13 INFO EsInputFormat: Discovered mapping {sample_type=[x=STRING, y=STRING]} for [sample_index/sample_type]

Version Info

OS : Ubuntu 14.04 LTS
JVM : java version "1.8.0_91"
Hadoop/Spark: spark-1.6.2-bin-hadoop2.6
ES-Hadoop : elasticsearch-hadoop-5.0.0-alpha4
ES : elasticsearch-2.2.0
