
Elasticsearch Mapreduce write "es.mapping.exclude" not working #752

Closed
lilimzeal opened this issue Apr 20, 2016 · 1 comment
lilimzeal commented Apr 20, 2016

What kind of an issue is this?

  • [x] Bug report.

Issue description

I wrote a map/reduce job that saves JSON objects to Elasticsearch. I set "es.mapping.exclude" in the configuration in the run() function, but it didn't work as expected: in the code example below, the "indexname" field is still present in the documents written to the output index.

Is this a bug, or is there an error in my code? Please help me. Thank you!

Steps to reproduce

Code:

public static class EsHadoopMapper extends Mapper<Object, Text, NullWritable, Text> {

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // parse the incoming JSON line and tag it with the target index name
        JSONObject jo = new JSONObject(value.toString());
        jo.put("indexname", "abcde");
        context.write(NullWritable.get(), new Text(jo.toString()));
    }
}

@SuppressWarnings("deprecation")
public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration config = new Configuration();
    config.setBoolean("mapreduce.map.speculative", false);
    config.set("es.nodes", args[1]);
    config.set("es.resource.write", "{indexname}/type1");
    config.set("es.input.json", "yes");
    config.set("es.mapping.exclude", "indexname");

    Job job = new Job(config, "EsHadoop");      
    job.setJarByClass(EsHadoopWriteTest.class);
    job.setMapperClass(EsHadoopMapper.class);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Text.class);     
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(EsOutputFormat.class);     
    job.setNumReduceTasks(0);       
    FileSystem fs = FileSystem.get(config);
    if(fs.exists(new Path(args[0]))) {
        FileInputFormat.addInputPath(job, new Path(args[0]));
    }

    return (job.waitForCompletion(true) ? 0 : 1);
}

Version Info

OS : Ubuntu 12.04
JVM : 1.7
Hadoop/Spark: Hadoop-2.6
ES-Hadoop : 2.2
ES : 2.2

@costin
Copy link
Member

costin commented Apr 20, 2016

To quote the docs:

For cases where the job input data is already in JSON, elasticsearch-hadoop allows direct indexing without applying any transformation; the data is taken as is and sent directly to Elasticsearch. In such cases, one needs to indicate the json input by setting the es.input.json parameter.

Basically, exclusion/inclusion of fields works only on native types, not on JSON data, which is passed through as-is. Instead of constructing the JSON yourself, pass the raw data to elasticsearch-hadoop and let it build the document in your place; that allows the connector to manipulate the fields.
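A rough sketch of what that could look like (untested, and assuming the org.json library and es-hadoop 2.x behavior): emit a MapWritable instead of a JSON Text, drop the es.input.json setting, and let the connector serialize the document, at which point es.mapping.exclude can take effect. The class and field names below follow the reporter's example; the field-copying loop is illustrative only.

    // Hypothetical replacement for EsHadoopMapper: outputs native writables
    // so es-hadoop performs the serialization and can honor es.mapping.exclude.
    import java.io.IOException;

    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.json.JSONObject;

    public static class EsHadoopMapMapper extends Mapper<Object, Text, NullWritable, MapWritable> {

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            JSONObject jo = new JSONObject(value.toString());
            MapWritable doc = new MapWritable();
            // copy the incoming JSON fields over as native writables
            for (String field : JSONObject.getNames(jo)) {
                doc.put(new Text(field), new Text(jo.get(field).toString()));
            }
            // routing field; es.mapping.exclude can now strip it before indexing
            doc.put(new Text("indexname"), new Text("abcde"));
            context.write(NullWritable.get(), doc);
        }
    }

In run(), this would pair with removing config.set("es.input.json", "yes") and changing job.setMapOutputValueClass(Text.class) to job.setMapOutputValueClass(MapWritable.class), while keeping es.resource.write = "{indexname}/type1" and es.mapping.exclude = "indexname".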
