Es-Hadoop ingestion through Pig is missing the mappings #405
Comments
From Costin: @BalachanderGS This is an es-hadoop issue and should have been posted in the elastic/elasticsearch-hadoop issue tracker. The read/write operations succeeded, so the likely problem is that the data reaches an ES cluster, just not the one you expect. In the past we've seen issues with the DEFINE command ignoring the given configuration. In other words, while you configure EsStorage to point to 'es.nodes = 68.67.141.63', Pig might disregard that and use the default value, namely localhost.
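One way to test this hypothesis is to ask each candidate cluster whether the index exists. A HEAD request on an index returns 200 if it exists and 404 if it does not. The sketch below assumes both clusters listen on port 9200; the helper names are hypothetical, and the clusters are only contacted when the script is run with the `check` argument.

```sh
#!/bin/sh
# Sketch: report which cluster actually holds the index.
# Assumes both clusters listen on port 9200; helper names are hypothetical.

INDEX="ztmp_inventory_tool_sample"

# Build the index URL for a host, port and index name.
index_url() {
  printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Print the HTTP status of a HEAD request on the index:
# 200 = index exists, 404 = no such index, 000 = could not connect.
index_status() {
  curl -s -o /dev/null -I -w '%{http_code}' --max-time 5 "$(index_url "$1" "$2" "$3")"
}

# Only contact the clusters when invoked as: sh check_index.sh check
if [ "${1:-}" = "check" ]; then
  for host in localhost 68.67.141.63; do
    printf '%s -> %s\n' "$host" "$(index_status "$host" 9200 "$INDEX")"
  done
fi
```

If only 68.67.141.63 answers 200, the data is landing on the intended cluster and the DEFINE theory can be ruled out.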
I'm closing the issue for now; if you still have problems, please open a new one in the es-hadoop project. Thanks,
I think it is going to the right IP, as I can see the indexing through Marvel (on the desired cluster). We are on Pig 0.11 and CDH 4.4. Do you think there is a version issue here?
The versions should work. Pig 0.11 is somewhat old but still supported. Again, there are no errors in the logs, and they clearly report the number of records written. Consider turning on logging to see the network traffic and which ES nodes are hit.
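To put numbers on both sides, here is a small shell sketch that extracts N from Pig's "Successfully stored N records" line and the `count` value from the ES `_count` API, then compares them. The helper names and the `pig.log` file are hypothetical, and the sed patterns assume the output formats shown in this issue. Newly indexed documents only appear in `_count` after the index refreshes (every second by default), so run the check a moment after the job finishes.

```sh
#!/bin/sh
# Sketch: compare the record count Pig reports with the document count in ES.
# Helper names are hypothetical; patterns assume the formats shown in this issue.

# Extract N from a Pig line such as:
#   Successfully stored 100000 records in: "ztmp_inventory_tool_sample/invData"
pig_stored_count() {
  sed -n 's/.*Successfully stored \([0-9][0-9]*\) records.*/\1/p'
}

# Extract the "count" value from an ES _count response such as:
#   {"count":100000,"_shards":{"total":5,"successful":5,"failed":0}}
es_count() {
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Print "match" or "mismatch: pig=X es=Y".
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo match
  else
    echo "mismatch: pig=$1 es=$2"
  fi
}

# Usage: sh compare_counts.sh check pig.log
if [ "${1:-}" = "check" ] && [ -n "${2:-}" ]; then
  stored=$(pig_stored_count < "$2")
  indexed=$(curl -s "http://68.67.141.63:9200/ztmp_inventory_tool_sample/_count" | es_count)
  compare_counts "$stored" "$indexed"
fi
```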
Thanks.
Neither. Turn it on in the es-hadoop connector.
Thanks. I use the JAR that contains the EsStorage class to write from Pig. Is there a config file I should drop in the JAR's location to override the default log level?
When in doubt, see the reference manual.
(In reply to BalachanderGS, 3/30/15 8:34 PM.)
Costin
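A hedged sketch of such a config: a log4j 1.x properties file that raises the connector's REST package to TRACE, passed to Pig through its `-log4jconf` option (check `pig -help` on your version). Assumptions: the connector's HTTP classes live under `org.elasticsearch.hadoop.rest`, as in the 2.x sources, and the file name `es-log4j.properties` and helper `write_es_log4j` are arbitrary. This only configures the Pig client JVM; the bulk writes run inside the Hadoop tasks, so seeing that traffic likely also requires adjusting the task-side log4j configuration on the cluster.

```sh
#!/bin/sh
# Sketch: write a log4j 1.x config that traces es-hadoop REST calls,
# then hand it to Pig. Package name and file name are assumptions.

write_es_log4j() {
  cat > "$1" <<'EOF'
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
# es-hadoop REST layer: shows requests and the nodes they are sent to
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
EOF
}

# Usage: sh run_traced.sh run script.pig
if [ "${1:-}" = "run" ] && [ -n "${2:-}" ]; then
  write_es_log4j ./es-log4j.properties
  # -log4jconf overrides Pig's default log4j configuration (client side only).
  pig -log4jconf ./es-log4j.properties "$2"
fi
```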
Closing the issue; if the problem persists, please create a new one.
ES:
curl -XDELETE 'http://localhost:9200/ztmp_inventory_tool_sample'
curl -XPOST localhost:9200/ztmp_inventory_tool_sample -d '{
"settings" : {
"term_index_interval" : 256,
"term_index_divisor" : 5
},
"mappings" : {
"invData" : {
"_source" : { "enabled" : true },
"properties" : {
"ekv_raw" : { "type" : "byte" },
"ekv_flight" : { "type" : "byte" },
"event_id" : { "type" : "long" },
"cookie_id" : { "type" : "long" },
"dpId" : { "type" : "short" },
"vertical" : { "type" : "string" },
"activity_group" : { "type" : "string" },
"activity" : { "type" : "string" },
"eventDateTime" : {"type":"date", "format":"YYYY-MM-dd'"'T'"'HH:mm:ss.SSSZ"},
"departureDate" : { "type" : "date", "format":"YYYY-MM-dd", "ignore_malformed" : true},
"returnDate" : { "type" : "string" },
"origin" : { "type" : "string" },
"destination" : { "type" : "string" },
"destination_country_code" : { "type" : "string" },
"destination_state" : { "type" : "string" },
"destination_city" : { "type" : "string" },
"carrier" : { "type" : "string" },
"cabinClassGroup" : { "type" : "string" },
"currency" : { "type" : "string" },
"travelers" : { "type" : "short" },
"duration" : { "type" : "short" },
"bookedDate" : { "type" : "date", "format":"YYYY-MM-dd", "ignore_malformed" : true},
"airFare" : { "type" : "float" }}
}}}'
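These curl commands run against localhost:9200, while EsStorage below is pointed at 68.67.141.63:9200. If those are different machines, the explicit mapping and the indexed data end up on different clusters. The sketch below fetches the mapping from the cluster Pig writes to and checks a few of the fields declared above; `ES_HOST`, `ES_PORT` and `mapping_has_field` are hypothetical names, with the host defaulting to the EsStorage target.

```sh
#!/bin/sh
# Sketch: confirm the explicit mapping exists on the cluster Pig writes to.
# ES_HOST/ES_PORT are hypothetical; defaults match the EsStorage settings.
ES_HOST="${ES_HOST:-68.67.141.63}"
ES_PORT="${ES_PORT:-9200}"
INDEX="ztmp_inventory_tool_sample"

# Exit 0 if the mapping JSON on stdin contains the given field name as a key.
mapping_has_field() {
  grep -q "\"$1\"[[:space:]]*:"
}

# Usage: sh check_mapping.sh check
if [ "${1:-}" = "check" ]; then
  mapping=$(curl -s "http://$ES_HOST:$ES_PORT/$INDEX/_mapping")
  for field in event_id eventDateTime airFare; do
    if printf '%s\n' "$mapping" | mapping_has_field "$field"; then
      echo "$field: mapped"
    else
      echo "$field: missing"
    fi
  done
fi
```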
REGISTER elasticsearch-hadoop-2.1.0.Beta3/dist/elasticsearch-hadoop-2.1.0.Beta3.jar;
-- Note: es.http.timeout is set twice below (1m and 5s); only one value can take effect.
DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage('es.http.timeout=1m',
'es.resource=ztmp_inventory_tool_sample/invData',
'es.mapping.pig.tuple.use.field.names = true',
'es.http.timeout = 5s',
'es.index.auto.create = false',
'es.nodes = 68.67.141.63',
'es.port = 9200');
-- Read the pipe-delimited input with an explicit schema.
testEs = LOAD '/user/bganapathy/flightPKeys' USING PigStorage('|') AS (
    ekv_raw: int, ekv_flight: int, eventId: long, cookie_id: long, dpId: int,
    vertical: chararray, activity_group: chararray, activity: chararray,
    eventDateTime: chararray, departureDate: chararray, returnDate: chararray,
    origin: chararray, destination: chararray, destination_country_code: chararray,
    destination_state: chararray, destination_city: chararray, carrier: chararray,
    cabinClassGroup: chararray, currency: chararray, travelers: chararray,
    duration: chararray, bookedDate: chararray, airFare: chararray);
-- Keep at most 100000 records for this test.
testEs = LIMIT testEs 100000;
STORE testEs INTO 'ztmp_inventory_tool_sample/invData' USING EsStorage();
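The LOAD schema names the third field `eventId`, while the mapping declares `event_id`. Assuming EsStorage sends top-level fields under their Pig schema names, that column would be indexed as a new, dynamically mapped `eventId` field rather than picking up the explicit `long` mapping. The sketch below diffs the two name lists copied from this issue; `missing_from_mapping` is a hypothetical helper. Run as-is, it prints `eventId`.

```sh
#!/bin/sh
# Sketch: list Pig schema field names with no counterpart in the index mapping.
# Both lists are copied from this issue; the helper name is hypothetical.

# Print each name in $1 (space-separated) that does not appear in $2.
missing_from_mapping() {
  for name in $1; do
    case " $2 " in
      *" $name "*) ;;       # declared in the mapping
      *) echo "$name" ;;    # not explicitly mapped
    esac
  done
}

PIG_FIELDS="ekv_raw ekv_flight eventId cookie_id dpId vertical activity_group activity eventDateTime departureDate returnDate origin destination destination_country_code destination_state destination_city carrier cabinClassGroup currency travelers duration bookedDate airFare"
MAPPED_FIELDS="ekv_raw ekv_flight event_id cookie_id dpId vertical activity_group activity eventDateTime departureDate returnDate origin destination destination_country_code destination_state destination_city carrier cabinClassGroup currency travelers duration bookedDate airFare"

missing_from_mapping "$PIG_FIELDS" "$MAPPED_FIELDS"
```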
Logs:
2015-03-24 22:10:34,544 [JobControl] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2015-03-24 22:10:34,755 [JobControl] INFO org.elasticsearch.hadoop.mr.EsOutputFormat - Writing to [ztmp_inventory_tool_sample/invData]
2015-03-24 22:10:34,774 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-03-24 22:10:34,774 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-03-24 22:10:34,776 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-03-24 22:10:35,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408291723_27582
2015-03-24 22:10:35,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases testEs
2015-03-24 22:10:35,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: C: R: testEs[-1,-1]
2015-03-24 22:10:35,971 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://hnn101-lax1:50030/jobdetails.jsp?jobid=job_201408291723_27582
2015-03-24 22:10:47,051 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 75% complete
2015-03-24 22:10:59,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 92% complete
2015-03-24 22:11:15,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-03-24 22:11:15,721 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.4.0 0.11.0-cdh4.4.0 bganapathy 2015-03-24 22:09:46 2015-03-24 22:11:15 LIMIT
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_201408291723_27581 12 1 8 5 7 7 15 15 15 15 testEs
job_201408291723_27582 1 1 5 5 5 5 17 17 17 17 testEs ztmp_inventory_tool_sample/invData,
Input(s):
Successfully read 1200000 records (19469816 bytes) from: "/user/bganapathy/flightPKeys"
Output(s):
Successfully stored 100000 records in: "ztmp_inventory_tool_sample/invData"