I wrote a MapReduce job that saves JSON objects to Elasticsearch. I set "es.mapping.exclude" in the run() function configuration, but it didn't work as expected: in the code example below, the "indexname" field is still present in the output index.
Is this a bug, or is there an error in my code? Please help me. Thank you!
Steps to reproduce
Code:
public static class EsHadoopMapper extends Mapper<Object, Text, NullWritable, Text> {
    private JSONObject jo = new JSONObject();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        jo = new JSONObject(value.toString());
        jo.put("indexname", "abcde");
        context.write(NullWritable.get(), new Text(jo.toString()));
    }
}
@SuppressWarnings("deprecation")
public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration config = new Configuration();
    config.setBoolean("mapreduce.map.speculative", false);
    config.set("es.nodes", args[1]);
    config.set("es.resource.write", "{indexname}/type1");
    config.set("es.input.json", "yes");
    config.set("es.mapping.exclude", "indexname");

    Job job = new Job(config, "EsHadoop");
    job.setJarByClass(EsHadoopWriteTest.class);
    job.setMapperClass(EsHadoopMapper.class);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(EsOutputFormat.class);
    job.setNumReduceTasks(0);

    FileSystem fs = FileSystem.get(config);
    if (fs.exists(new Path(args[0]))) {
        FileInputFormat.addInputPath(job, new Path(args[0]));
    }
    return job.waitForCompletion(true) ? 0 : 1;
}
For cases where the job input data is already in JSON, elasticsearch-hadoop allows direct indexing without applying any transformation; the data is taken as is and sent directly to Elasticsearch. In such cases, one needs to indicate the JSON input by setting the es.input.json parameter.
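To illustrate why that pass-through path leaves no room for field filtering, here is a toy serializer (purely hypothetical, not ES-Hadoop internals): an exclude filter can only hook into a path that rebuilds the document field by field, whereas a pre-serialized JSON string is forwarded verbatim.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ExcludeDemo {
    // Toy field-by-field serializer: because each key is visited, an exclude
    // set can drop fields. A pre-serialized JSON string would be copied
    // verbatim, with no point at which a filter could intervene.
    static String serialize(Map<String, String> doc, Set<String> exclude) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (exclude.contains(e.getKey())) continue; // the exclusion hook
            if (sb.length() > 1) sb.append(',');
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("indexname", "abcde");
        doc.put("message", "hello");
        System.out.println(serialize(doc, Set.of("indexname")));
        // prints {"message":"hello"}
    }
}
```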
Basically, exclusion/inclusion of fields works only on native types, not on JSON data, which is passed through as-is. Instead of constructing the JSON yourself, hand the data to the connector, which will serialize it in your place and can then apply the field manipulation.
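One way to follow that suggestion (a sketch only, assuming the same Hadoop and ES-Hadoop classes as the job above and MapWritable support in EsOutputFormat; not tested against a cluster) is to emit native writables from the mapper, remove the es.input.json setting, and set the map-output value class to MapWritable, so the connector serializes the fields itself and can honor es.mapping.exclude:

```java
// Sketch: emit native types (MapWritable) instead of a pre-serialized JSON
// string. In run(), drop config.set("es.input.json", "yes") and use
// job.setMapOutputValueClass(MapWritable.class).
public static class EsHadoopMapper extends Mapper<Object, Text, NullWritable, MapWritable> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        JSONObject jo = new JSONObject(value.toString());
        jo.put("indexname", "abcde");

        MapWritable doc = new MapWritable();
        for (String field : jo.keySet()) {
            // For simplicity this sketch assumes flat, string-valued fields.
            doc.put(new Text(field), new Text(jo.get(field).toString()));
        }
        // "indexname" is still present here, so the {indexname} pattern in
        // es.resource.write resolves; es.mapping.exclude then strips it from
        // the indexed document.
        context.write(NullWritable.get(), doc);
    }
}
```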
Version Info
OS: Ubuntu 12.04
JVM : 1.7
Hadoop/Spark: Hadoop-2.6
ES-Hadoop : 2.2
ES : 2.2