Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More strict parsing of ISO dates #6227

Conversation

spinscale
Copy link
Contributor

If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits.

Closes #6158

@spinscale
Copy link
Contributor Author

talked with @clintongormley - makes sense to not replace those dates, but create new ones with a 'strict' prefix (like strictDateOptionalTime), which forces the year to have at least 4 digits, month 2 and day 2 digits as well. And this will be the default for named dates and automatic recognition

@spinscale
Copy link
Contributor Author

@clintongormley updated the PR, maybe you can take another look

Changes:

  • Have a strict date format for each existing, where it made sense, which always requires you to have two digit months, days, weeks, hours, minutes and seconds and at least 4 digit years.
  • Made the strictOptionalDateTime the default
  • Added a couple of missing formats, which existed in the docs though (and vice versa)
  • Added a lot of tests to make sure the parsing is actually strict

If this is ok, I will need to update documentation, I will add the missing formats and their format (added this info in the test so far), and mention the strict ones. And see, if we can reuse some of the code in order to not duplicate the DateFormat class.

@clintongormley
Copy link

hi @spinscale

Almost there, but there are two default formats used for detection. one is now strict ,but the other is not: https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/mapper/object/RootObjectMapper.java#L45

This means that this illegal date is still mapped:

PUT /t/t/1
{
  "date": "5/11/10"
}

GET /_mapping

@spinscale
Copy link
Contributor Author

Updated the rootobjectmapper to have a more strict date. Talked with @clintongormley to possible have strict mapping for self formatted dates as well, like yyyy, from what I read so far, this might be tricky, but I need to dig deeper into joda for this.

@spinscale
Copy link
Contributor Author

@clintongormley ok, so having a parsing format like yyyy does not seem possible at the moment... are still good to go with the rest? If so, I would add documentation and maybe try to cut out some code if possible

@clintongormley
Copy link

LGTM

@clintongormley
Copy link

@spinscale any news on getting this one merged?

@clintongormley
Copy link

@brwe When this PR is in, then we can probably implement #1694 as well, as 1T will no longer be accepted as a date.

@clintongormley
Copy link

@spinscale giving this one back to you :)

@clintongormley clintongormley added the :Search/Mapping Index mappings, including merging and defining field types label Nov 11, 2014
@clintongormley
Copy link

Also see #5328

@spinscale
Copy link
Contributor Author

I dug a bit around in joda time and still do not see any other pattern than to copy the whole class and use the fixed date builders. Another approach would have been to take supplied date string and compare it with the date being created by the date formatter - if it doesnt match, then the formatting has been different. However this flaw has the problem, that we need to parse the date twice and it becomes really complex, when we use the || for allowing several dates in one field, as each format then needs to be tried out. This also has a performance penalty during index time.

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from 4c70305 to e852304 Compare May 22, 2015 11:55
.endObject().endObject().string();

String indexName = "test-" + randomAsciiOfLength(10).toLowerCase(Locale.ROOT);
IndexService index = createIndex(indexName, settingsBuilder().put(IndexMetaData.SETTING_VERSION_CREATED, Version.V_1_5_2_ID).build());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sucks we can do this, but it helps here :)

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from e852304 to ba2054d Compare June 25, 2015 11:41
@spinscale
Copy link
Contributor Author

incoporporated your review comments @bleskes

@@ -171,6 +244,38 @@ public static FormatDateTimeFormatter forPattern(String input, Locale locale) {
return new FormatDateTimeFormatter(input, formatter.withZone(DateTimeZone.UTC), locale);
}

public static FormatDateTimeFormatter getStrictStandardDateFormater() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: getStrictStandardDateFormat_t_er

@bleskes
Copy link
Contributor

bleskes commented Jun 29, 2015

I went through and left some comments. Looks nice. I think we also need to document this on the 2.0 migration doc. Also would love someone who is more at home with the mapping code to have a look /cc @rjernst

@@ -548,4 +554,13 @@ protected void doXContentBody(XContentBuilder builder, boolean includeDefaults,
}
}
}

private static FormatDateTimeFormatter getDateTimeFormatter(Settings indexSettings) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is cleaner to have inline? it is very short and not used anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. removed.

@rjernst
Copy link
Member

rjernst commented Jun 29, 2015

I left a couple comments on the mapping portion.

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from ba2054d to 392d8bd Compare June 30, 2015 12:19
if (Version.indexCreated(context.indexSettings()).before(Version.V_2_0_0)) {
boolean includesEpochFormatter = dateTimeFormatter.format().contains("epoch_");
if (fieldType().dateTimeFormatter().equals(Defaults.DATE_TIME_FORMATTER)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this have the same problem? If the user upgrades to 2.0, and tries to set their older index to use the new defaults, they will be forced back to non strict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right. A bug in the code didnt let my test fail accordingly... will rethink.
Thx for pointing out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a new commit with a test fixing this. I just check if the the parser actually parsed a new format and only in case if not, the defaults are applied

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from 392d8bd to d004f96 Compare July 1, 2015 12:37
@rjernst
Copy link
Member

rjernst commented Jul 2, 2015

LGTM

@spinscale
Copy link
Contributor Author

@bleskes are you good with the documentation changes?

@@ -49,6 +49,13 @@ first millisecond of the rounding scope. The semantics work as follows:
[[built-in]]
=== Built In Formats

Most of the above dates have a `strict` companion dates, which means, that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above -> bellow

@bleskes
Copy link
Contributor

bleskes commented Jul 2, 2015

left one minor comment. I think we also need an entry in migrate_2_0.asciidoc ?

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from d004f96 to f8faac5 Compare July 6, 2015 11:11
@spinscale
Copy link
Contributor Author

fixed the typo and added a small paragraph to the upgrade asciidoc

@bleskes
Copy link
Contributor

bleskes commented Jul 6, 2015

LGTM. Thx @spinscale

@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from f8faac5 to 72f82a0 Compare July 7, 2015 07:34
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.

Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.

In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.

Closes elastic#6158
@spinscale spinscale force-pushed the issue-6158-reject-wrongly-formatted-years-in-dates branch from 72f82a0 to b612cab Compare July 7, 2015 07:36
@spinscale spinscale merged commit b612cab into elastic:master Jul 7, 2015
@kevinkluge kevinkluge removed the review label Jul 7, 2015
@clintongormley clintongormley added :Search/Mapping Index mappings, including merging and defining field types and removed :Dates labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dynamic date detection accepting years < 4 digits
6 participants