
Elasticsearch Mapreduce write "es.mapping.exclude" not working #752

Closed
lilimzeal opened this issue Apr 20, 2016 · 1 comment
lilimzeal commented Apr 20, 2016

What kind of an issue is this?

  • [x] Bug report.

Issue description

I wrote a map/reduce job that saves JSON objects to Elasticsearch. I set "es.mapping.exclude" in the configuration in the run() function, but it didn't work as expected: in the code example below, the "indexname" field is still present in the documents written to the output index.

Is this a bug, or is there an error in my code? Please help me. Thank you!

Steps to reproduce

Code:

public static class EsHadoopMapper extends Mapper<Object, Text, NullWritable, Text> {

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // parse the incoming JSON line and tag it with the target index name
        JSONObject jo = new JSONObject(value.toString());
        jo.put("indexname", "abcde");
        context.write(NullWritable.get(), new Text(jo.toString()));
    }
}

@SuppressWarnings("deprecation")
public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration config = new Configuration();
    config.setBoolean("mapreduce.map.speculative", false);
    config.set("es.nodes", args[1]);
    config.set("es.resource.write", "{indexname}/type1");
    config.set("es.input.json", "yes");
    config.set("es.mapping.exclude", "indexname");

    Job job = new Job(config, "EsHadoop");      
    job.setJarByClass(EsHadoopWriteTest.class);
    job.setMapperClass(EsHadoopMapper.class);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Text.class);     
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(EsOutputFormat.class);     
    job.setNumReduceTasks(0);       
    FileSystem fs = FileSystem.get(config);
    if(fs.exists(new Path(args[0]))) {
        FileInputFormat.addInputPath(job, new Path(args[0]));
    }

    return (job.waitForCompletion(true) ? 0 : 1);
}

Version Info

OS : Ubuntu 12.04
JVM : 1.7
Hadoop/Spark: Hadoop-2.6
ES-Hadoop : 2.2
ES : 2.2

@costin
Copy link
Member

costin commented Apr 20, 2016

To quote the docs:

For cases where the job input data is already in JSON, elasticsearch-hadoop allows direct indexing without applying any transformation; the data is taken as is and sent directly to Elasticsearch. In such cases, one needs to indicate the json input by setting the es.input.json parameter.

Basically, exclusion/inclusion of fields works only on native types, not on JSON data, which is passed through as-is. Instead of constructing the JSON yourself, pass the raw data to elasticsearch-hadoop and let it build the document in your place; that allows the connector to manipulate the fields.
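A rough sketch of what that could look like (untested, and assuming the org.json library and es-hadoop 2.x behavior): emit a MapWritable instead of a JSON Text, drop the es.input.json setting, and let the connector serialize the document, at which point es.mapping.exclude can take effect. The class and field names below follow the reporter's example; the field-copying loop is illustrative only.

    // Hypothetical replacement for EsHadoopMapper: outputs native writables
    // so es-hadoop performs the serialization and can honor es.mapping.exclude.
    import java.io.IOException;

    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.json.JSONObject;

    public static class EsHadoopMapMapper extends Mapper<Object, Text, NullWritable, MapWritable> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            JSONObject jo = new JSONObject(value.toString());
            MapWritable doc = new MapWritable();
            // copy the incoming JSON fields over as native writables
            for (String field : JSONObject.getNames(jo)) {
                doc.put(new Text(field), new Text(jo.get(field).toString()));
            }
            // routing field; es.mapping.exclude can now strip it before indexing
            doc.put(new Text("indexname"), new Text("abcde"));
            context.write(NullWritable.get(), doc);
        }
    }

In run(), this would pair with removing config.set("es.input.json", "yes") and changing job.setMapOutputValueClass(Text.class) to job.setMapOutputValueClass(MapWritable.class), while keeping es.resource.write = "{indexname}/type1" and es.mapping.exclude = "indexname".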
