Mapping: Improve date handling #10971

Closed
spinscale opened this issue May 5, 2015 · 7 comments
Labels
Meta :Search/Mapping Index mappings, including merging and defining field types

Comments

@spinscale
Contributor

The current date mapping code treats Unix timestamps differently from other date formats. We should unify this, even though it means changing our defaults and requiring users to explicitly configure the Unix timestamp use case.

Today we parse dates as follows:

Mapped fields with a format (defaults to dateOptionalTime)

  • If number, treat as epoch ms
  • If string, try to parse with defined format(s)
  • If it fails and is purely numeric, treat as epoch ms
  • Else fail
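The four rules above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the actual Elasticsearch code: the function name `parse_mapped_date_legacy` is hypothetical, and the two `strptime` layouts are only rough stand-ins for `dateOptionalTime`.

```python
from datetime import datetime, timezone

# Rough stand-ins for dateOptionalTime (an assumption; the real Joda format accepts more).
LEGACY_LAYOUTS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d")


def parse_mapped_date_legacy(value, layouts=LEGACY_LAYOUTS):
    """Return epoch milliseconds, following the pre-2.0 rules for mapped fields."""
    # Rule 1: a number is always treated as epoch milliseconds.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return int(value)
    if isinstance(value, str):
        # Rule 2: try each configured format in order.
        for layout in layouts:
            try:
                parsed = datetime.strptime(value, layout)
            except ValueError:
                continue
            return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
        # Rule 3: a purely numeric string falls back to epoch milliseconds.
        if value.isdigit():
            return int(value)
    # Rule 4: everything else fails.
    raise ValueError(f"failed to parse date field [{value}]")
```

In this sketch a numeric string such as `"1433788425964"` only succeeds through the implicit fallback in rule 3, which is the special-casing this issue proposes to remove.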

Dynamic date detection

  • If string,
  • and contains at least two :, -, or /
  • and matches dynamic date formats (defaults to dateOptionalTime || yyyy/MM/dd HH:mm:ss || yyyy/MM/dd )
  • then date, else string
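As an illustration of the detection steps above, here is a hedged Python sketch. The function name `detect_dynamic_type` is hypothetical, and the `strptime` layouts only approximate the default dynamic date formats. Python's `%Y` requires four digits, so this sketch is stricter than Joda and does not reproduce the lenient `"1/1/1"` behaviour.

```python
from datetime import datetime

# Approximations of dateOptionalTime || yyyy/MM/dd HH:mm:ss || yyyy/MM/dd (assumption).
DYNAMIC_LAYOUTS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%Y/%m/%d %H:%M:%S",
    "%Y/%m/%d",
)


def detect_dynamic_type(value):
    """Return "date" or "string" for string input, None for anything else."""
    if not isinstance(value, str):
        return None  # non-strings are not subject to date detection
    # Pre-check: the string must contain at least two of ':', '-' or '/'.
    separators = sum(value.count(ch) for ch in ":-/")
    if separators < 2:
        return "string"
    # The string must then match one of the dynamic date formats.
    for layout in DYNAMIC_LAYOUTS:
        try:
            datetime.strptime(value, layout)
        except ValueError:
            continue
        return "date"
    return "string"
```

Values like `2015.01.01` or `20150101T000000` never pass the separator pre-check in this sketch, so no configured format gets a chance to match them.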

There are a few issues which can surprise users:

  • Joda dates are not strict, so "1/1/1" is detected as a date, and "1" would be interpreted as 0001-01-01 00:00:00
  • It is not always possible to distinguish numeric values from string values, eg query string params are always strings (_timestamp), a date in the query_string query is always a string, and even in the JSON body some languages can render a number as a string and vice versa
  • Dates such as 2015.01.01 (German) or 20150101T000000 (ISO 8601) can never be detected dynamically

Proposals

Make date parsing as unambiguous as possible. Where there is ambiguity, it is because the user chose ambiguous options (which we can warn about in the docs).

For indices created in 2.0:

For mapped date field:

  • only check the specified formats, which default to strictDateOptionalTime || epoch_ms
  • No distinction between numeric and string values for date fields - always parsed as strings (ie coerce from numeric)

For dynamic date detection:

  • only check string values (don't coerce numerics)
  • accept any formats except epoch_ms and epoch_seconds
  • mapping should add just the matching format (optionally append epoch_ms?)
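To make the proposed rules for mapped date fields concrete, here is a minimal Python sketch under stated assumptions. The names `PARSERS` and `parse_mapped_date` are hypothetical, and the strict parser is only a rough stand-in for `strictDateOptionalTime`. The point it illustrates is that epoch milliseconds become just another named format, and that numeric input is coerced to a string before parsing.

```python
import re
from datetime import datetime, timezone


def _strict_date_optional_time(text):
    # Stand-in for strictDateOptionalTime: only two ISO-like layouts are tried.
    for layout in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(text, layout)
        except ValueError:
            continue
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"not a strict date: {text!r}")


def _epoch_ms(text):
    # Epoch milliseconds are an explicit format here, not an implicit fallback.
    if not re.fullmatch(r"-?[0-9]+", text):
        raise ValueError(f"not epoch milliseconds: {text!r}")
    return int(text)


PARSERS = {
    "strictDateOptionalTime": _strict_date_optional_time,
    "epoch_ms": _epoch_ms,
}


def parse_mapped_date(value, formats=("strictDateOptionalTime", "epoch_ms")):
    """Return epoch milliseconds using only the formats configured on the field."""
    text = str(value)  # no numeric/string distinction: always parse as a string
    for name in formats:
        try:
            return PARSERS[name](text)
        except ValueError:
            continue
    raise ValueError(
        f"failed to parse date field [{text}] with format [{'||'.join(formats)}]"
    )
```

With the default formats both `1433788425964` and `"1433788425964"` parse. With `formats=("strictDateOptionalTime",)` both fail, which is the explicit opt-in the proposal asks for.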

For indices created before 2.0:

We need to keep backwards compatibility (bwc) on older indices, so we follow the same rules as described at the beginning of this comment.

Query time

Users typically use a single format at index time (they don't mix epoch timestamps with formatted dates), which is why we should only parse the specified formats.

However, at query time it is quite possible that (eg) Kibana may query with epoch timestamps, even though the date field only accepts a formatted date. Today, in the range query we accept a format parameter which is used to parse dates at query time.

There are two options to deal with this situation:

  • Add a format parameter to the term, terms, query_string, and simple_query_string queries, and to the range aggregation
  • Add a special format for epoch timestamps which is always recognised, eg epoch_ms:123456789
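The second option can be sketched in a few lines of Python. This is an illustration only: `parse_query_date` and its `field_parser` argument are hypothetical names, and the prefix syntax is the one suggested in the bullet above, not an existing Elasticsearch feature.

```python
EPOCH_PREFIX = "epoch_ms:"


def parse_query_date(text, field_parser):
    """Parse a query-time date string into epoch milliseconds.

    field_parser is whatever parser the field's mapping configures; it is
    bypassed when the value carries the always-recognised epoch prefix.
    """
    if text.startswith(EPOCH_PREFIX):
        # Prefixed values are epoch milliseconds regardless of the mapping.
        return int(text[len(EPOCH_PREFIX):])
    return field_parser(text)
```

A client that always sends epoch timestamps could then write `epoch_ms:123456789` without knowing the field's configured format, whereas the first option requires every affected query type to grow a format parameter.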
@spinscale spinscale added v2.0.0-beta1 :Search/Mapping Index mappings, including merging and defining field types labels May 5, 2015
@clintongormley clintongormley added :Dates Meta and removed :Search/Mapping Index mappings, including merging and defining field types labels May 5, 2015
spinscale added a commit to spinscale/elasticsearch that referenced this issue Jun 3, 2015
This commit changes the date handling. First and foremost, Elasticsearch
no longer tries to parse every date as a unix timestamp before falling
back to the configured date format. This now allows dates like
`2015121212` to be parsed correctly.

Instead, unix timestamps are now explicit: this commit adds the
`epoch_second` and `epoch_millis` date formats. This also means that the
default date format is now `epoch_millis||dateOptionalTime` to remain
backwards compatible.

Closes elastic#5328
Relates elastic#10971
@simianhacker
Member

Does this affect range queries as well? I just tried using Kibana 4 with ES 2.0 and I got the following error:

Jun 8, 2015 11:48 AM INFO ? ? Caused by: ElasticsearchParseException[failed to parse date field [1433788425964] with format [dateOptionalTime]]; nested: IllegalArgumentException[Invalid format: "1433788425964" is malformed at "5964"];

If I add "format": "epoch_millis" to the range query then everything works as expected.
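For reference, a request body along those lines might be built as follows. This is a sketch: the helper name `epoch_range_query` and the `@timestamp` field in the usage note are placeholders, and the second bound is made up for illustration.

```python
import json


def epoch_range_query(field, start_ms, end_ms):
    """Build a range query whose bounds are epoch milliseconds.

    The explicit "format" tells Elasticsearch how to parse the bounds,
    independently of the format configured in the field's mapping.
    """
    return {
        "query": {
            "range": {
                field: {
                    "gte": start_ms,
                    "lte": end_ms,
                    "format": "epoch_millis",
                }
            }
        }
    }


def to_json(body):
    # Serialise the body as it would be sent in the search request.
    return json.dumps(body, indent=2)
```

Calling `to_json(epoch_range_query("@timestamp", 1433788425964, 1433788485964))` yields the JSON to send as the search request body.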

@spinscale
Contributor Author

@simianhacker Yes, it does. When creating a range query in Kibana, do you always use a unix timestamp, or just in this example?

I think it makes sense to make this backwards compatible. I will check it out.

@rashidkpc

We always use a Unix timestamp because it has historically been accepted regardless of format. While we know the format from the mapping, there are differences between our date formatting lib and Joda.

@kimchy
Member

kimchy commented Jun 12, 2015

I like this plan. At query time, I am leaning towards supporting the special format epoch_ms:12345; it's simpler to integrate, and it makes life simpler for clients such as Kibana, which only need to send that prefix.

spinscale added a commit to spinscale/elasticsearch that referenced this issue Jun 25, 2015
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.

Relates elastic#10971
@clintongormley

Also see #14565: Java 8 prepends a plus sign to years > 9999

@jpountz
Contributor

jpountz commented Aug 24, 2016

Is there anything that remains to be done before closing this issue? Most items look implemented, except the format option on query_string and simple_query_string, which does not feel useful since dates have epoch_millis as a parser by default?

@clintongormley

@jpountz The reason for the special format was that dates are no longer required to understand epoch ms, so eg Kibana (which always uses epoch ms) wouldn't work with a custom mapped field that doesn't specify epoch_ms. That said, I haven't heard any complaints about this so far, so I think we can close the issue for now.


6 participants