Unable to insert data into ES from Hive - Cannot determine write shards #467

Closed
pricecarl opened this issue Jun 5, 2015 · 25 comments

@pricecarl

Hello,

I'm having a problem inserting data from Hive into ES using ES-Hadoop, and I can't for the life of me figure out what is going wrong.

I've created a simple Hive table in a database called dev:

CREATE external TABLE IF NOT EXISTS dev.carltest (
  TimeTaken                      string
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties (
"separatorChar" = ","
)   
LOCATION '/user/cloudera/carltest'
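
(A quick sanity check over the source table - a sketch, since the exact query isn't quoted here - is a plain select:)

SELECT TimeTaken FROM dev.carltest;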

When I run a select query like this on the table, it pulls back the single value absolutely fine. I've also created another simple Hive table in a database called elasticsearchtables:

CREATE EXTERNAL TABLE IF NOT EXISTS elasticsearchtables.carltest (
TimeTaken string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'proxylog-carltest/event',
               'es.index.auto.create' = 'true',
               'es.nodes' = <servername here>,
               'es.port' = '9200',
               'es.field.read.empty.as.null' ='true',
               'es.mapping.names' = 'TimeTaken:TimeTaken'
)

When I run the following Hive query:

INSERT OVERWRITE TABLE elasticsearchtables.carltest
SELECT TimeTaken
FROM dev.carltest

I get a MapReduce error, and looking in the Hive log it states:
Task with the most failures(4):

Diagnostic Messages for this Task:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
    ... 8 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [proxylog-carltest/event]; likely its format is incorrect (maybe it contains illegal characters?)
    at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:50)
    at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:427)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:392)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
    ... 9 more

My ES cluster is running 1.5.2, I'm using the latest beta release of ES-Hadoop, and I'm running CDH 5.1.3.

Any help with this would be greatly appreciated.

@costin
Member

costin commented Jun 5, 2015

Edited your post to add formatting

The exception indicates the cluster for some reason behaves like it's read-only. Can you please turn on logging and report back the results as a gist?
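
For reference, es-hadoop logging is typically driven by log4j categories in Hive's log4j.properties - a sketch (categories follow the es-hadoop package names; pick the levels that suit you, with the REST transport being the most informative for connectivity issues):

log4j.category.org.elasticsearch.hadoop.rest=TRACE
log4j.category.org.elasticsearch.hadoop.mr=TRACE
log4j.category.org.elasticsearch.hadoop.hive=DEBUG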

P.S. Please do add formatting to your posts - it's minimally intrusive and significantly increases readability, as you can see above. Thanks.

@pricecarl
Author

Apologies Costin, new to this.

I'm not sure how to turn on logging; I've tried adding log4j.category.org.elasticsearch.hadoop.hive=DEBUG to Hive's log4j.properties file, but the output in the log is the same as before:
https://gist.github.com/pricecarl/519d7162d2e23b368688

I don't think the cluster is read-only, as if I send the data in over REST it will create the index and insert the data no problem.
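
(For comparison, the manual REST insert would have been along these lines - a sketch with a made-up document, since the exact call isn't quoted in the thread:)

curl -XPOST 'http://<servername here>:9200/proxylog-carltest/event/' -d '{"TimeTaken":"1"}'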

@costin
Member

costin commented Jun 5, 2015

No worries.

Hive tends to be painful to configure, especially since there are multiple config files. I'm not sure whether you are trying to run things across a cluster or in a pseudo-distributed config (everything on the same machine); have you looked at their wiki?
Additionally, the CDH console might offer you some options as well.

The cluster might not be read-only, but something is clearly off, as the index is not created beforehand and it should be. By the way, can you drop the table and create it each time - just in case the table definition already exists with an invalid/incorrect configuration...

@pricecarl
Author

Hi Costin,

I've just dropped the table as suggested and created a new one:

-- Drop External Table as a new one needs to be created for every Index to be loaded.
DROP TABLE IF EXISTS elasticsearchtables.carltest;


-- External table inserting in to ES proxylogtest index/type, format = proxy-<yyyy-MM-dd>/event
CREATE EXTERNAL TABLE IF NOT EXISTS elasticsearchtables.carltest (
  TimeTaken string
  )
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'carltest/event',
               'es.index.auto.create' = 'true',
               'es.nodes' = 'calhdb09.cyber.lab',
               'es.port' = '9200',
               'es.field.read.empty.as.null' ='true',
               'es.mapping.names' = 'TimeTaken:TimeTaken'
)

Notice that I changed 'es.resource' = 'proxylog-carltest/event' to 'es.resource' = 'carltest/event' and this has worked.

But ES index naming does support hyphens - we have several indices with hyphens in their names?

@costin
Member

costin commented Jun 5, 2015

Seems like a bug - probably the HTTP encoding is messed up by the hyphen.

By the way, you don't need to set es.index.auto.create - it is true by default. Same goes for es.port. If you have multiple tables, consider extracting the common configuration into the HiveConf object - this way, instead of repeating the settings, you can define them once.
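
One possible shape for that - a sketch, assuming the es.* settings set in the session propagate through HiveConf to the job configuration:

-- set the shared settings once per session (or in hive-site.xml)
SET es.nodes = calhdb09.cyber.lab;

-- per-table definitions then only need the resource
CREATE EXTERNAL TABLE IF NOT EXISTS elasticsearchtables.carltest (
  TimeTaken string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'carltest/event');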

@costin
Member

costin commented Jun 5, 2015

@pricecarl Hi,

I've tried reproducing your error, to no avail. I have a similar table definition (with es.index.auto.create declared and es.resource pointing to proxylog-carltest/event). The table is created without any issues - I've tried this on multiple Hive versions, including the latest 1.2.0.

I'm not sure what actual Hive version is in CDH 5.3 (Hive 0.13 + patches doesn't say much), however based on the logs there's no escaping of the hyphen at all.
In other words, I don't think the index name has anything to do with the initial error - does it appear if you recreate the table definition and point to an index/type with a hyphen?

@pricecarl
Author

Hi Costin,

I've just tried it with carl-test/event as the index name/type and it works fine.
My colleague just sent me this:

‐ 8208 2010 HYPHEN
‑ 8209 2011 NON-BREAKING HYPHEN
‒ 8210 2012 FIGURE DASH
– 8211 2013 EN DASH
— 8212 2014 EM DASH
― 8213 2015 HORIZONTAL BAR

Taken from:
http://www.w3schools.com/charsets/ref_utf_punctuation.asp

So I could have picked up one of these somehow. I'm testing now and will report back.
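
(One quick way to check which character actually landed in the DDL is to dump its bytes - a sketch using standard tools; a plain ASCII hyphen is the single byte 2d, whereas e.g. U+2010 HYPHEN becomes the three bytes e2 80 90 in UTF-8:)

echo -n 'proxylog-carltest' | od -An -tx1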

@pricecarl
Author

It seems that none of these work.
I've tried creating multiple tables, copying those different hyphens off the page and pasting them into the Hive query window in Hue, but all fail.

It only works if you actually type the hyphen.

@pricecarl
Author

Hi Costin,

I've been investigating this further and it seems that changing the name from 'proxylog-carltest/event' to 'carltest/event' only worked because the latter didn't have its ES template applied to it.

I have a template:

{
"template": "proxy-*",
"settings": {
    "index.number_of_replicas": "1",
    "index.number_of_shards": "3"
},
"mappings": {
    "event": {
        "properties": {
            "@ImportDate": {
                "Type": "date",
                "format": "dateOptionalTime"
            },
            "@StartTime": {
                "Type": "date",
                "format": "dateOptionalTime"
            }
        }
    }
},
"aliases": {
    "proxylog": {}
}
}

Which would have been applied to 'proxylog-carltest/event', so does this mean ES-Hadoop cannot work with templates, or can you see where I am going wrong?
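
(For context, a template like this would have been registered with a plain PUT on ES 1.x - a sketch; the template name and file name here are hypothetical:)

curl -XPUT 'http://<servername here>:9200/_template/proxylog_template' -d @template.json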

@costin
Member

costin commented Jun 8, 2015

Thanks for the update. I don't see why it shouldn't work, since the template only takes care of the index creation.
I'll test this myself to see whether something pops up - again, having some logs would have made things easier, but I understand it's tricky to get Hive to play along.

@costin
Member

costin commented Jun 8, 2015

I've tried replicating the test again, this time using a template (Elasticsearch 1.5.2, latest master in es-hadoop, though it shouldn't matter), and have the following logs (notice the test creates the template right before the index):

00:54:01,518 TRACE main commonshttp.CommonsHttpTransport - Opening HTTP transport to localhost:9500
00:54:01,527 TRACE main commonshttp.CommonsHttpTransport - Tx [PUT]@[localhost:9500][_template/default-test_template] w/ payload [
{"template" : "default-spark-template-*",
"settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
},
"aliases" : { "spark-temp-index" : {} }
}]
00:54:01,619 TRACE main commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"acknowledged":true}]
00:54:01,622 TRACE main commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500

followed by:

00:54:02,408 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Opening HTTP transport to localhost:9500
00:54:02,410 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][_nodes/transport] w/ payload [null]
00:54:02,415 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","nodes":{"_Y874nQhQuOzLo0w2KbWnw":{"name":"Trump","transport_address":"inet[/192.168.1.50:9600]","host":"cerberus","ip":"192.168.1.50","version":"1.5.2","build":"62ff986","http_address":"inet[/192.168.1.50:9500]","attributes":{"local":"false"},"transport":{"bound_address":"inet[/0:0:0:0:0:0:0:0:9600]","publish_address":"inet[/192.168.1.50:9600]","profiles":{}}}}}]
00:54:02,433 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500
00:54:02,434 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Opening HTTP transport to localhost:9500
00:54:02,434 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][] w/ payload [null]
00:54:02,437 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{
  "status" : 200,
  "name" : "Trump",
  "cluster_name" : "ES-HADOOP-TEST",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
]
00:54:02,438 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500
00:54:02,440  INFO Executor task launch worker-0 util.Version - Elasticsearch Hadoop v2.1.0.BUILD-SNAPSHOT [675f46f055]
00:54:02,440  INFO Executor task launch worker-0 rdd.EsRDDWriter - Writing to [default-spark-template-index/alias]
00:54:02,445 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Opening HTTP transport to localhost:9500
00:54:02,445 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [PUT]@[localhost:9500][default-spark-template-index] w/ payload [null]
00:54:02,617  INFO elasticsearch[Trump][clusterService#updateTask][T#1] cluster.metadata - [Trump] [default-spark-template-index] creating index, cause [api], templates [default-test_template], shards [1]/[0], mappings []
00:54:02,725 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"acknowledged":true}]
00:54:02,727 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][/_cluster/health/default-spark-template-index?wait_for_status=yellow&timeout=10s] w/ payload [null]
00:54:02,802 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":1,"active_shards":1,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":1}]
00:54:02,803 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][default-spark-template-index/_search_shards] w/ payload [null]
00:54:02,807 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"nodes":{"_Y874nQhQuOzLo0w2KbWnw":{"name":"Trump","transport_address":"inet[/192.168.1.50:9600]","attributes":{"local":"false"}}},"shards":[[{"state":"STARTED","primary":true,"node":"_Y874nQhQuOzLo0w2KbWnw","relocating_node":null,"shard":0,"index":"default-spark-template-index"}]]}]
00:54:02,808 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][_nodes/http] w/ payload [null]
00:54:02,810 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","nodes":{"_Y874nQhQuOzLo0w2KbWnw":{"name":"Trump","transport_address":"inet[/192.168.1.50:9600]","host":"cerberus","ip":"192.168.1.50","version":"1.5.2","build":"62ff986","http_address":"inet[/192.168.1.50:9500]","attributes":{"local":"false"},"http":{"bound_address":"inet[/0:0:0:0:0:0:0:0:9500]","publish_address":"inet[/192.168.1.50:9500]","max_content_length_in_bytes":104857600}}}}]
00:54:02,812 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500
00:54:02,813 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Opening HTTP transport to 192.168.1.50:9500
00:54:02,822 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [PUT]@[192.168.1.50:9500][default-spark-template-index/alias/_bulk] w/ payload [{"index":{}}
{"number":"1","name":"MALICE MIZER","url":"http://www.last.fm/music/MALICE+MIZER","picture":"http://userserve-ak.last.fm/serve/252/10808.jpg","@timestamp":"2000-10-06T19:20:25.000Z","list":["quick", "brown", "fox"]}

@costin
Member

costin commented Jun 8, 2015

I can't imagine why a template would make things different - it's the same as if the index was defined beforehand. Unless you have some type of routing that makes the index be allocated on a node that the connector cannot access?
Again, the logs would be great here - I know I keep repeating myself, but they really would.

Cheers,

@pricecarl
Author

Can you try adding a mapping to your test template and running this test again?

It works fine for me too if I don't include the mappings section.

@costin
Member

costin commented Jun 10, 2015

That was it!
Managed to reproduce it - here is the log (notice the problem appears only if the template matches and the mapping takes effect):

12:59:26,083 TRACE main commonshttp.CommonsHttpTransport - Tx [PUT]@[localhost:9500][_template/with-meta-test_template] w/ payload [
{"template" : "*",
"settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
},
"mappings" : {
  "alias" : {
    "properties" : {
      "name" : "string",
      "number" : "long",
      "@ImportDate" : {
        "type" : "date"
       }
     }
   }
 },
"aliases" : { "spark-temp-index" : {} }
}]
12:59:25,859 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][] w/ payload [null]
12:59:25,862 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{
  "status" : 200,
  "name" : "Grim Reaper",
  "cluster_name" : "ES-HADOOP-TEST",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
]
12:59:25,863 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500
12:59:25,865  INFO Executor task launch worker-0 util.Version - Elasticsearch Hadoop v2.1.0.BUILD-SNAPSHOT [2a72f1b1c8]
12:59:25,865  INFO Executor task launch worker-0 rdd.EsRDDWriter - Writing to [default-spark-template-index/alias]
12:59:25,869 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Opening HTTP transport to localhost:9500
12:59:25,870 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [PUT]@[localhost:9500][default-spark-template-index] w/ payload [null]
12:59:26,051 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [400-Bad Request] [{"error":"MapperParsingException[mapping [alias]]; nested: MapperParsingException[Expected map for property [fields] on field [name] but got a class java.lang.String]; ","status":400}]
12:59:26,051 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][default-spark-template-index/_search_shards] w/ payload [null]
12:59:26,055 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"nodes":{},"shards":[]}]
12:59:26,056 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Tx [GET]@[localhost:9500][_nodes/http] w/ payload [null]
12:59:26,059 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Rx @[127.0.0.1] [200-OK] [{"cluster_name":"ES-HADOOP-TEST","nodes":{"Vg5gPh3mSVWOJIiFoanVpA":{"name":"Grim Reaper","transport_address":"inet[/192.168.1.50:9600]","host":"cerberus","ip":"192.168.1.50","version":"1.5.2","build":"62ff986","http_address":"inet[/192.168.1.50:9500]","attributes":{"local":"false"},"http":{"bound_address":"inet[/0:0:0:0:0:0:0:0:9500]","publish_address":"inet[/192.168.1.50:9500]","max_content_length_in_bytes":104857600}}}}]
12:59:26,060 TRACE Executor task launch worker-0 commonshttp.CommonsHttpTransport - Closing HTTP transport to localhost:9500
12:59:26,063 ERROR Executor task launch worker-0 executor.Executor - Exception in task 0.0 in stage 1.0 (TID 1)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [default-spark-template-index/alias]; likely its format is incorrect (maybe it contains illegal characters?)

Need to look at what's wrong, but it seems there's an error from Elasticsearch when the index is touched (due to the mapping) that is swallowed by the connector, which then continues; however, as there's no index, there are no shards allocated for it.

Thanks for your patience - I'll try to fix this as soon as possible and report back.

@pricecarl
Author

Awesome, thank you Costin.

@costin
Member

costin commented Jun 10, 2015

@pricecarl Took a while to figure out why the exception occurred - it was due to the mapping, which was incorrect. The exception is not caused by the data but rather by the mapping definition, which is about to be applied but fails.
Note this is fixed in 2.x, where the mapping itself is validated at template creation.
Once I fixed the mapping, things started working normally - however, I suspect an error occurs in your case as well but is swallowed somehow.
I've fixed this in master and pushed a build - can you please try the dev build and report back?

Thanks,

@pricecarl
Author

Hi Costin,

We have the latest build; we are currently testing it out and will get back to you ASAP.

@pricecarl
Author

Hi Costin,

Apologies that I've taken a while to get back to you on this. Now I have a different error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 8 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: RemoteTransportException[[calhdb10-es][inet[/192.168.24.171:9300]][indices:admin/create]]; nested: MapperParsingException[mapping [event]]; nested: MapperParsingException[No type specified for property [@ImportDate]]; 
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:382)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:438)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:404)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:392)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
... 9 more

I thought this might be because I was only trying to insert the timetaken field and it was expecting the dates as well, but I got the same error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1","importdate":"2015-06-18","starttime":"2015-06-18"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"timetaken":"1","importdate":"2015-06-18","starttime":"2015-06-18"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
... 8 more
Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: RemoteTransportException[[calhdb10-es][inet[/192.168.24.171:9300]][indices:admin/create]]; nested: MapperParsingException[mapping [event]]; nested: MapperParsingException[No type specified for property [@ImportDate]]; 
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:382)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:438)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:404)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:392)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:638)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
... 9 more

Any ideas?

@costin
Member

costin commented Jun 18, 2015

Again, it all boils down to your mapping. If you look closely, you'll notice the root exception is actually triggered from Elasticsearch - your field doesn't provide a proper mapping.
The connector is just a client - it accesses Elasticsearch from Hive and tries to touch/create the index; the index template is applied at that point, which is when the template inconsistency (namely, the field specified without a proper mapping) is triggered.

Again, template validation (including mappings) is something scheduled for Elasticsearch 2.0. In the meantime you can verify them directly by creating a dummy index and adding some sample data to it (from the command line or whatever REST tools you might want to use).
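
A minimal version of that check on ES 1.x might look like this - a sketch; the index name and document are made up, chosen so that the proxy-* template matches:

# creating the index alone should already trip a broken template mapping
curl -XPUT 'http://<servername here>:9200/proxy-dummytest'

# then index a sample document shaped like the Hive rows
curl -XPOST 'http://<servername here>:9200/proxy-dummytest/event/' -d '{"TimeTaken":"1","@ImportDate":"2015-06-18"}'

If the template's mappings are invalid, the first call should fail with the same MapperParsingException seen in the stack traces above.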

@pricecarl
Author

Sorry Costin, I don't understand. I have created a mapping which matches the incoming field names exactly, so I don't see how the fields wouldn't know their mapping.

You said "it tries to touch/create the index" - is there a way I can get it to "touch" the index with the mapping sent at the same time?

@costin
Member

costin commented Jun 18, 2015

@pricecarl Maybe my answer was not clear enough.
The exception above is triggered by Elasticsearch when it applies the mapping from the template that matches the index - the index created when the connector tries to insert data into it.

Forget about the connector and Hive for a moment; just use Elasticsearch directly, through curl or Marvel or whatever tool you are comfortable with. Using the same configuration for the index as above, try to add data in the JSON format you expect it to take from Hive and see what happens. Then replicate it in Hive and see whether there's a difference.

Without any logging in Hive (to understand what is going on behind the scenes and in your setup), and with the exception changing, isolating the problem at the Elasticsearch level is a sure way forward.

P.S. This issue would make a suitable discussion for the forum, as it might help others bumping into the same problem in the future.

@pricecarl
Author

Hi Costin

I've tried as you suggested and it's the same error. Whilst doing this, though, I noticed that in the template I have the word 'type' in the mappings section with a capital T. I have changed this now and it's working fine.
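
(For reference, the fix amounts to lowercasing the key in the mappings section of the template quoted earlier - Elasticsearch expects type, not Type:)

"mappings": {
    "event": {
        "properties": {
            "@ImportDate": { "type": "date", "format": "dateOptionalTime" },
            "@StartTime": { "type": "date", "format": "dateOptionalTime" }
        }
    }
}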

@costin
Member

costin commented Jun 19, 2015

So... does this mean the error is gone? Can the issue be closed or not (and if not, what's the exception)?

Thanks,


@pricecarl
Author

Yes it can be closed. Thank you very much for your help. :)

@costin
Member

costin commented Jun 19, 2015

Glad to hear the situation has been resolved. Cheers!

