Parameter ES_MAPPING_TIMESTAMP is not working using saveToEs #765
Comments
Same behavior on ES-Hadoop 2.3.2.
@armaseg it looks like you are reporting two bugs here:
See below the test case (will soon be in master):

@Test
def testEsRDDWriteWithMappingTimestamp() {
  val mapping = """{ "scala-timestamp-write": {
    |  "_timestamp" : {
    |    "enabled": true
    |  }
    |}
    }""".stripMargin
  val index = "spark-test"
  val target = s"$index/scala-timestamp-write"
  RestUtils.touch(index)
  RestUtils.putMapping(target, mapping.getBytes(StringUtils.UTF_8))
  val doc1 = Map("one" -> null, "two" -> Set("2"), "number" -> 1, "date" -> "2016-05-18T16:39:39.317Z")
  val doc2 = Map("OTP" -> "Otopeni", "SFO" -> "San Fran", "number" -> 2, "date" -> "2016-03-18T10:11:28.123Z")
  sc.makeRDD(Seq(doc1, doc2)).saveToEs(target, Map(ES_MAPPING_ID -> "number", ES_MAPPING_TIMESTAMP -> "date", ES_MAPPING_EXCLUDE -> "date"))
  assertEquals(2, EsSpark.esRDD(sc, target).count())
  assertTrue(RestUtils.exists(target + "/1"))
  assertTrue(RestUtils.exists(target + "/2"))
  val search = RestUtils.get(target + "/_search?")
  assertThat(search, containsString("SFO"))
  assertThat(search, not(containsString("date")))
  assertThat(search, containsString("_timestamp"))
}

and the HTTP logs:
Last but not least, note that the usage of _timestamp is deprecated in favour of a dedicated, user-defined date field.
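That advice can be sketched in plain Java (a hypothetical settings map; the actual saveToEs call is omitted since it needs a running cluster, and "es.mapping.id" is the string value behind ConfigurationOptions.ES_MAPPING_ID):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the deprecation advice: keep a dedicated, user-defined date field
// inside the document instead of lifting it into the deprecated _timestamp
// metadata, i.e. simply omit es.mapping.timestamp / es.mapping.exclude.
public class DedicatedDateField {
    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("number", 2);
        doc.put("date", "2016-03-18T10:11:28.123Z"); // stays in _source, mapped as a date type

        Map<String, String> cfg = new HashMap<>();
        cfg.put("es.mapping.id", "number");
        // intentionally no es.mapping.timestamp and no es.mapping.exclude entries

        System.out.println(doc.containsKey("date"));                 // true
        System.out.println(cfg.containsKey("es.mapping.timestamp")); // false
    }
}
```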
Hi @costin, thank you very much for all the time you spent working on the tests. About your questions:
{
  "name" : "Flygirl",
  "cluster_name" : "cluster_name",
  "version" : {
    "number" : "2.3.1",
    "build_hash" : "xxx",
    "build_timestamp" : "2016-04-04T12:25:05Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "xxx-2016.03.18",
        "_type": "log",
        "_id": "yyy-123",
        "_score": 1,
        "_source": {
          "@timestamp": "2016-03-18T16:39:39.317Z",
          "@version": "1",
          "foo": "bar",
          "topic": "xxx",
          "timenow": "2016-05-18T16:39:49.258Z",
          "calendardate": "2016-03-18T10:11:28.123Z"
        }
      }
    ]
  }
}
sparkConf.set(ConfigurationOptions.ES_INDEX_AUTO_CREATE, "true");
sparkConf.set(ConfigurationOptions.ES_NODES_WAN_ONLY, "true");
sparkConf.set(ConfigurationOptions.ES_INPUT_JSON, "false");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
JavaRDD<String> stringRDD = jsc.parallelize(ImmutableList.of(JSON));
JavaEsSpark.saveJsonToEs(stringRDD, ImmutableMap.of(
    ConfigurationOptions.ES_RESOURCE, "{topic}-{calendardate:YYYY.MM.dd}/log",
    ConfigurationOptions.ES_MAPPING_TIMESTAMP, "calendardate",
    ConfigurationOptions.ES_MAPPING_EXCLUDE, "id",
    ConfigurationOptions.ES_MAPPING_ID, "id"));
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "xxx-2016.03.18",
        "_type": "log",
        "_id": "yyy-123",
        "_score": 1,
        "_source": {
          "@timestamp": "2016-03-18T16:39:39.317Z",
          "@version": "1",
          "foo": "bar",
          "topic": "xxx",
          "id": "yyy-123",
          "timenow": "2016-05-18T16:39:49.258Z",
          "calendardate": "2016-03-18T10:11:28.123Z"
        }
      }
    ]
  }
}

I'm using another field to handle the date, but I wanted to report this since it looks like weird behavior to me. Did you try ES_MAPPING_EXCLUDE using saveJsonToEs without creating the mapping first? Thanks!
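Since saveJsonToEs sends the JSON strings through verbatim, one hedged workaround (not the connector's own behaviour) is to strip the field from each JSON string before the save. A real implementation should use a JSON library such as Jackson; the regex below is only a sketch that handles a flat string-valued field:

```java
// Hypothetical workaround sketch: remove a flat string field from a JSON
// document before handing it to saveJsonToEs. Only covers simple cases like
// "id":"yyy-123"; nested objects or non-string values need a JSON parser.
public class StripJsonField {
    static String stripField(String json, String field) {
        return json
            // removes  "field":"value"  with an optional leading comma
            .replaceAll(",?\\s*\"" + field + "\"\\s*:\\s*\"[^\"]*\"", "")
            // repairs  {,  left behind when the field was first in the object
            .replaceFirst("\\{\\s*,", "{");
    }

    public static void main(String[] args) {
        System.out.println(stripField("{\"id\":\"yyy-123\",\"topic\":\"xxx\"}", "id"));
        // prints {"topic":"xxx"}
    }
}
```

Each RDD element would be mapped through stripField before calling saveJsonToEs, at which point ES_MAPPING_EXCLUDE is no longer needed for that field.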
I think there's a misunderstanding on how timestamp works. Everything inside _source is the document itself, while _timestamp sits alongside it as metadata:

{
  "_index" : "spark-test",
  "_type" : "scala-timestamp-write",
  "_id" : "2",
  "_score" : null,
  "_timestamp" : 1458295888123,
  "_source" : {
    "OTP" : "Otopeni",
    "SFO" : "San Fran",
    "number" : 2
  },
  "sort" : [0]
}

Basically, a field in the Spark data was used to populate the metadata field and was excluded from the source before indexing.
Hopefully this clarifies things.
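The transformation described above can be sketched with a plain Java map (field names borrowed from the test case earlier in the thread; this only illustrates what the two settings do to a document, not the connector's internal code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of ES_MAPPING_TIMESTAMP="date" plus ES_MAPPING_EXCLUDE="date":
// the field's value is lifted into the _timestamp metadata, and the field is
// then dropped from _source before the document is indexed.
public class TimestampSketch {
    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("OTP", "Otopeni");
        doc.put("SFO", "San Fran");
        doc.put("number", 2);
        doc.put("date", "2016-03-18T10:11:28.123Z");

        // ES_MAPPING_TIMESTAMP -> "date": value becomes the _timestamp metadata
        Object timestamp = doc.get("date");

        // ES_MAPPING_EXCLUDE -> "date": field removed from _source
        Map<String, Object> source = new HashMap<>(doc);
        source.remove("date");

        System.out.println(timestamp);                  // 2016-03-18T10:11:28.123Z
        System.out.println(source.containsKey("date")); // false
    }
}
```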
Indeed, I got confused between the Logstash @timestamp and the ES _timestamp. Thank you for clarifying these. You may mark this bug as invalid. Thank you again!
Closed. Cheers.
Hello,
I'm trying to save a document to Elasticsearch using Scala, and the parameter ES_MAPPING_TIMESTAMP isn't working: I can see my field in the final document on ES, but not as @timestamp. Nevertheless, all the other parameters are working.
My final document looks like this:
On the other hand, in Java using JavaEsSpark.saveJsonToEs the parameter ES_MAPPING_TIMESTAMP works, but ConfigurationOptions.ES_MAPPING_EXCLUDE does not (related to #381).
Version Info
OS: Windows 7 64-bit / CentOS
JVM: JDK 1.8
Hadoop/Spark: HDP 2.4 / 1.6.0
ES-Hadoop: 2.3.1
ES: 2.3.1