generate timestamp when path is null #4718

Closed
khoan opened this issue Jan 14, 2014 · 7 comments


khoan commented Jan 14, 2014

Shouldn't a timestamp be generated when the field named by `path` is missing (null) in the document?

Mapping definition:

curl -X PUT  http://localhost:9200/twitter/ -d '{
    "mappings": {
        "_default_": {
            "_timestamp" : {
                "enabled" : "yes",
                "store": "yes",
                "path" : "post_date"
            },
            "properties": {
                "message": {
                    "type": "string"
                }
            }
        }
    }
}'

Get error when:

curl -X PUT http://127.0.0.1:9200/twitter/tweet/123 -d '{
  "message": "bam bam"
}'

=>  {"error":"ElasticSearchParseException[failed to parse doc to extract routing/timestamp]; nested: TimestampParsingException[failed to parse timestamp [null]]; ","status":400}

curl -X PUT http://127.0.0.1:9200/twitter/tweet/123 -d '{
  "message": "bam bam",
  "post_date": "2009-11-15T14:12:12Z"
}'

=> {"ok":true,"_index":"twitter","_type":"tweet","_id":"123","_version":1}
@clintongormley

This should probably be an option that can be turned on, so that bad data isn't silently ignored.

@dadoonet
Member

@clintongormley Where should we put that new option? Index settings? Mapping? Somewhere else?
Which name should we use: ignore_missing_timestamp or something similar?

@dadoonet
Member

The more I think about it, the more I think we should define a new option named `default` for the `_timestamp` field. `default` could be `now` (the default), a date that respects the configured `format`, or `null`.

I remember a use case on the mailing list where the user did not have a value for every document and wanted to fall back to 01/01/1970 in that case.

The new _timestamp field would look like this:

{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "path" : "post_date",
            "format" : "YYYY-MM-dd",
            "default" : "1970-01-01"
        }
    }
}

Or

{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "path" : "post_date",
            "format" : "YYYY-MM-dd",
            "default" : "now"
        }
    }
}

Or

{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "path" : "post_date",
            "format" : "YYYY-MM-dd",
            "default" : null
        }
    }
}
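
To make the three cases concrete, here is a rough Python sketch of how the proposed `default` option could resolve a timestamp. It only models the semantics described above; it is not Elasticsearch code. The function name `resolve_timestamp` and the `now` parameter are hypothetical, and the `%Y-%m-%d` pattern is an assumed stand-in for the `YYYY-MM-dd` format in the examples.

```python
from datetime import datetime, timezone


def resolve_timestamp(doc, ts_mapping, now=None):
    """Return the timestamp string for `doc`, or raise ValueError.

    Hypothetical helper modelling the proposed `default` option.
    """
    path = ts_mapping.get("path")
    value = doc.get(path) if path else None
    if value is not None:
        # The document supplies its own timestamp at `path`.
        return value

    # An absent "default" key behaves like "now"; an explicit null
    # (None) means the timestamp is mandatory.
    default = ts_mapping.get("default", "now")
    if default is None:
        raise ValueError("timestamp is required by the mapping")
    if default == "now":
        now = now or datetime.now(timezone.utc)
        # Assumption: hard-coded equivalent of the "YYYY-MM-dd" format.
        return now.strftime("%Y-%m-%d")
    # Any other value is taken as a fixed fallback date.
    return default
```

With `"default" : "1970-01-01"`, a document without `post_date` would get `1970-01-01`; with `"default" : null`, the same document would be rejected.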

WDYT?

@clintongormley

Sounds reasonable to me

@dadoonet
Member

PR opened: #7036.

dadoonet added a commit to dadoonet/elasticsearch that referenced this issue Jul 28, 2014
Indexing fails when `_timestamp` is enabled with the `path` option set and the document has no value at that path.
It fails with a `TimestampParsingException[failed to parse timestamp [null]]` message.

Reproduction:

```
DELETE test
PUT  test
{
    "mappings": {
        "test": {
            "_timestamp" : {
                "enabled" : "yes",
                "path" : "post_date"
            }
        }
    }
}
PUT test/test/1
{
  "foo": "bar"
}
```

You can define a default value for when timestamp is not provided
within the index request or in the `_source` document.

If you do not set it, the default value is `now`, which means the date at which the document was processed by the indexing chain.

You can disable that default value by setting `default` to `null`. It means that `timestamp` is mandatory:

```
{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "default" : null
        }
    }
}
```

If you don't provide any timestamp value, indexing will fail.

You can also set the default value to any date respecting timestamp format:

```
{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "format" : "YYYY-MM-dd",
            "default" : "1970-01-01"
        }
    }
}
```

If you don't provide any timestamp value, `_timestamp` will be set to this default value.

Closes elastic#4718.
dadoonet added a commit that referenced this issue Jul 31, 2014

Closes #4718.
Closes #7036.

(cherry picked from commit 85eb0ea)
dadoonet added a commit that referenced this issue Sep 8, 2014

Closes #4718.
Closes #7036.

alfasin commented May 1, 2015

Can you do:
"enabled" : "yes", ???

I thought that the only valid values of "enabled" are true/false...

@dadoonet
Member

dadoonet commented May 1, 2015

Actually "enabled":"whateveryouwant" might be considered as true. But you're right, true is definitely better!
This boolean parsing might become stricter in the future.
More info: http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html#boolean
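
For illustration, here is a small Python sketch of what such lenient parsing could look like. It is not the actual Elasticsearch code. The set of false-like strings (`false`, `0`, `off`, `no`) is an assumption based on my recollection of the behaviour at the time, and `lenient_bool` is a hypothetical name.

```python
# Assumed set of strings treated as false; everything else counts as true.
FALSE_STRINGS = {"false", "0", "off", "no"}


def lenient_bool(value):
    """Interpret a mapping value as a boolean, leniently (hypothetical)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    # Any string outside the false-like set is considered true,
    # which is why "yes" and "whateveryouwant" both enable the field.
    return str(value) not in FALSE_STRINGS
```

Under these assumptions both `"yes"` and `"whateveryouwant"` enable the field, while `"no"` disables it, so writing a real JSON `true` or `false` avoids any surprise.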
