Dates with formats starting with a leading '+' sign before the year part are getting parsed improperly #86

Closed
hshankar opened this Issue Nov 13, 2013 · 7 comments

Comments

Projects
None yet
3 participants

I stumbled upon an issue where dates with leading '+' signs in front of year part were getting parsed improperly. I have described the issue in stackoverflow (http://stackoverflow.com/q/19910018/373151). I'll paste the code again here:

System.out.println(DateTimeFormat.forPattern("yyyyMMdd").parseDateTime("20130101"));
// 2013-01-01T00:00:00.000+05:30 (Expected) (case 1)

System.out.println(DateTimeFormat.forPattern("yyyyMMdd").parseDateTime("+20130101"));
// 20130-10-01T00:00:00.000+05:30 (??? Notice that month changed to 10 also) (case 2)

System.out.println(DateTimeFormat.forPattern("MMyyyydd").parseDateTime("01+201301"));
// 20130-01-01T00:00:00.000+05:30 (??? At least month is fine this time) (case 3)

System.out.println(DateTimeFormat.forPattern("MM-yyyy-dd").parseDateTime("01-+2013-01"));
// 2013-01-01T00:00:00.000+05:30 (I expected an error, but this parsed correctly) (case 4)

The year is getting parsed as 20130 instead of 2013 in cases 2 and 3.

I looked through the code to see what might be happening. I think this is happening because of the position++ in the DateTimeFormatterBuilder class. I have forked the repo and tried to fix it with this commit: hshankar/joda-time@4d72559

If you think that is the right approach to fix this, I would love to submit this fix as a pull request. I have verified that all existing test cases pass successfully after this fix. Let me know if I should add any test cases. I have a small test class which I have used to test the fix:

public class Test {

    public static void main(String[] args) {
        test(1, "yyyyMMdd", "+20130101");
        test(2, "MMyyyydd", "01+201301");
        test(3, "MM-yyyy-dd", "01-+2013-01");
        test(4, "yyyydd", "+201321");
    }

    private static void test(int caseNo, String format, String input) {
        System.out.println((caseNo + ". " + "(" + format + ", " + input + ") => " + DateTimeFormat
                .forPattern(format).parseDateTime(input)));
    }
}

Here are the outputs:
Before the fix:

1. (yyyyMMdd, +20130101) => 20130-10-01T00:00:00.000+05:30
2. (MMyyyydd, 01+2013010) => 20130-01-10T00:00:00.000+05:30
3. (MM-yyyy-dd, 01-+2013-01) => 2013-01-01T00:00:00.000+05:30
4. (yyyydd, +201321) => 20132-01-01T00:00:00.000+05:30

After the fix:

1. (yyyyMMdd, +20130101) => 2013-01-01T00:00:00.000+05:30
2. (MMyyyydd, 01+201301) => 2013-01-01T00:00:00.000+05:30
3. (MM-yyyy-dd, 01-+2013-01) => 2013-01-01T00:00:00.000+05:30
4. (yyyydd, +201321) => 2013-01-21T00:00:00.000+05:30

I can add these tests to the proper test class if you can point me to the right place to add them!

You should probably add tests as JUnit test cases in https://github.com/JodaOrg/joda-time/blob/master/src/test/java/org/joda/time/format/TestDateTimeFormatterBuilder.java

You should add four test methods for each of these cases.

It would appear your approach is to detect the presence of a "+" and skip it. Personally, this would not match my expectations and lead to my own surprised results: namely, I specify a specific format (e.g. "yyyyMMdd"), then inputs that are "like my format" (e.g. "+20130101") are accepted. I would expect any deviation from the prescribed format to cause an exception. Though that does make the existing behavior strange.

Owner

jodastephen commented Nov 18, 2013

There was a change to parseInt at one point to either accept or reject '+', I can't remember which. Its possible that this is to do that.

Whether this can be adopted depends on whether it breaks other assumptions users of Joda-Time may have made. The correct use of '+' is only when the year exceeds 4 digits.

Even I did not find the acceptance of + as very intuitive. If we say that +2013 is a valid year, then we should be accepting +11 as a valid month as well. I guess it is because years can be negative as well, so signed months don't make sense in that way. But this is how it has been, so it would break code if we stop accepting it.

I did verify that parseInt would break if we keep the + in its input. If that was the only reason to increment position, then I think this should fix it without any side effects. How can we check that?

There is definitely a bug, because +20130101 gets interpreted as year 20130, month 10. It is happening because for the year part, position is incremented, but the length remains 4. So, the characters from position 1 to 1+4 (20130) get interpreted as the year. Then, the next two characters (10) get picked up as the month instead of 01.

Owner

jodastephen commented Nov 26, 2013

The ISO standard only uses + when outputting 5 or more characters. That is almost certainly why it parses 5 characters on input.

Was this never fixed because this isn't considered a bug? I notice this works fine:

/**
     * https://github.com/JodaOrg/joda-time/issues/86
     */
    public void correctly_parses_plus_in_front_of_year() {
        // Given: a date that includes + per the spec of allowing it for 5 digit years
        //final String pattern = "yyyyMMdd";
        //final String dateTime = "+20130109";
        final String dateTime = "+2013-01-09";
        // When: I try to parse this date
        //final DateTime result = DateTimeFormat.forPattern(pattern).parseDateTime(dateTime);
        final DateTime result = ISODateTimeFormat.dateTimeParser().parseDateTime(dateTime);
        // Then: I get back the correct year
        assertEquals(2013, result.getYear());
        // And: I get back the correct month
        assertEquals(1, result.getMonthOfYear());
        // And: I get back the correct day
        assertEquals(9, result.getDayOfMonth());
    }

It does seem strange to me to explicitly specify a pattern like "yyyyMMdd" and then expect ISO8601 dates to be accepted.
Then again, that is specifically the pattern of http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISODateTimeFormat.html#basicDate%28%29

Why someone would manually specify that as a pattern instead of using ISODateTimeFormat is strange to me. I suppose an argument could be made that an input pattern could be checked against a list of known patterns for given formats and made to be tolerant of the idiosyncrasies of that format. If that were the case, I would expect cPatternCache to contain some known mappings instead of starting out empty. That also seems to open up complexity in trying to keep that mapping up-to-date.

Alternately, for semantic clarity, I would expect the ISO definition of a year (with the strange +/- Y business https://en.wikipedia.org/wiki/ISO_8601#Years) to only be applied in that case. A pattern for a given input format would be applied strictly, in which case any non-numeric characters in a string expected to conform to "yyyyMMdd" would trigger an IllegalArgumentException.

What is correct behavior here?

Owner

jodastephen commented Oct 24, 2015

Change made based on Hari Shankar's code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment