Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revised testing architecture with yaml data #199

Open
wants to merge 59 commits into
base: v3.0
Choose a base branch
from

Conversation

idpaterson
Copy link
Collaborator

@idpaterson idpaterson commented Sep 23, 2016

As originally discussed in #196, this pull request begins the conversion from unittest2 to py.test. py.test was already used to run the tests, but as an alternative to unittest2 it provides an opportunity for intriguing code-level instrumentation to make tests easier to write and more comprehensive.

The following tests have been ported so far:

  • TestAlternativeAbbreviations - this is a weird one since the test actually overrides the locale to use different abbreviations, so I may not have understood why this exists and exactly what was being tested vs. the normal abbreviations in the locale
  • TestAustralianLocale
  • TestComplexDateTimes
  • TestContext - note that pdtContext will be tested on almost every other test since it is included in the assertions and specified in the test data
  • TestConvertUnitsAsWords - completed as an example of targeting numbers rather than dates or deltas
  • TestDayStartHour
  • TestDelta
  • TestErrors - these will be implemented throughout as test groups with the invalid_ prefix
  • TestFrenchLocale
  • TestGermanLocale - there are a few tests in this PR but only to make sure the load tests from every locale logic was working
  • TestInc
  • TestLocaleBase - oops this file is actually testing French
  • TestMultiple
  • TestNlp
  • TestPhrases
  • TestRanges
  • TestRussianLocale
  • TestSimpleDateTimes
  • TestSimpleOffsets
  • TestSimpleOffsetsHours
  • TestSimpleOffsetsNoon
  • TestStartTimeFromSourceTime
  • TestUnits

Additional tests needed for the test utilities:

  • @pdtFixture
  • datedelta
  • dateReplacement
  • nlpTarget
  • nlpTargetValue
  • TestGroup
  • TestCase
  • YAML constructors

Scope of this PR

This pull request can be merged once tests have been rewritten to cover all of the current test cases. At that time there will still be a lot of work to do to improve and add tests especially in non-English locales; merging to v3 will allow pull requests for community contributions.

There will be no changes to the functionality of parsedatetime in this pull request other than instrumentation necessary for testing. For example, a "wildcard" flag was added to pdtContext to support tests that do not specify an explicit context.

Additional testing support

Testing against "today" and edge case dates

Most tests in pdt 2 were based on the current date as sourceTime rather than a predetermined time. That can be good and bad – I think it made the tests more confusing to reason about but it also helped to catch edge cases related to leap years, end of year, end of month, etc. I'm looking at you, #155! 99% of the time, "today" is just a normal time on a normal day and there is no guarantee that CI would happen to run on Feb 29. If it does and an error comes up, it's already too late to push out a fix.

For tests where the sourceTime is truly arbitrary (absolute dates and anything that can be expressed with deltas come to mind) it would be interesting to parametrize the function with a few edge case sourceTimes. Run it against Feb 29, Dec 31, Jan 1, today, etc so that those cases are caught intentionally rather than accidentally.

Testing NLP may require a list of targets

Some changes will need to be made to allow multiple target dates in test data.

Notes

This is a collaborative pull request

Any parsedatetime maintainers can commit on this PR and everyone is welcome to the discussion. I am also going to continue working on the test cases.

The old tests have been deleted from this branch

I deleted the old tests to avoid further housekeeping later; anyone who would graciously contribute to this pull request will need to use an alternate copy of the old tests for comparison.

The new test names and file structure does not necessarily match the old tests

Some previously separate test classes will be combined. For example, the cases for TestMultiple and TestDelta are both in deltas.yaml because the "multiples" were just multi-unit deltas. I used comments to separate them into sections within the data file.

Documentation

I will start a new issue for this but since the first bit of code is now public I wanted to express my preference to move away from epydoc. I have always found the HTML documentation very difficult to use... the content is well-written but the format is cumbersome and outdated. Mike, are you open to hosting documentation on ReadTheDocs or as a GitHub page via the gh_pages branch so that updates can go out automatically?

The test modules in this PR are documented for sphinx with Google style docstrings. The ReadTheDocs theme for sphinx makes for (in my opinion) a better presentation and a simpler, less syntactically-verbose code style. Some of the documentation that I have currently written in the Python docstrings would be better managed in separate rst files, but for now they should be helpful to anyone contributing to the tests. The implementation of that will arrive in a different pull request.


This change is Reviewable

@idpaterson idpaterson added this to the 3.0 milestone Sep 23, 2016
@idpaterson idpaterson self-assigned this Sep 23, 2016
@codecov-io
Copy link

codecov-io commented Sep 23, 2016

Codecov Report

Merging #199 into v3.0 will decrease coverage by -50.26%.
The diff coverage is 70%.

@@             Coverage Diff             @@
##             v3.0     #199       +/-   ##
===========================================
- Coverage   78.04%   27.79%   -50.26%     
===========================================
  Files          14       14               
  Lines        1567     1576        +9     
  Branches      288      291        +3     
===========================================
- Hits         1223      438      -785     
- Misses        252     1120      +868     
+ Partials       92       18       -74
Impacted Files Coverage Δ
parsedatetime/context.py 56% <70%> (-25.82%)
parsedatetime/pdt_locales/icu.py 10.58% <0%> (-72.95%)
parsedatetime/init.py 18.26% <0%> (-56.7%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b4fd8e...4283265. Read the comment docs.

@bear
Copy link
Owner

bear commented Sep 23, 2016

Do you want me to review the changes as you make them or wait until you have the majority of them made?

@idpaterson
Copy link
Collaborator Author

That's up to you, I'm going to work on the cases that require additional support in the pytest setup first then port the remaining normal cases.

@idpaterson
Copy link
Collaborator Author

idpaterson commented Sep 23, 2016

How does this look for an nlp test?

long_phrases:
    sourceTime: 2013-08-01 21:25:00
    cases:
        - target: !nlpTarget
            - phrase: "At 8PM on August 5th"
              target: 2013-08-05 20:00:00
              context: !pdtContext month | day | hour
            - phrase: "next Friday at 9PM"
              target: 2013-08-09 21:00:00
              context: !pdtContext day | hour
            - phrase: "in 5 minutes"
              target: 2013-08-01 21:30:00
              context: !pdtContext minute
            - phrase: "next week"
              target: 2013-08-08 09:00:00
              context: !pdtContext week
          phrases:
            - >
                I'm so excited!! At 8PM on August 5th i'm going to fly to 
                Florida. Then next Friday at 9PM i'm going to Dog n Bone! 
                And in 5 minutes I'm going to eat some food! Talk to you 
                next week.
@pdtFixture('multiple_dates.yml')
def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
    assert cal.nlp(phrase, sourceTime) == nlpTarget

The target normalization adds the startIndex and endIndex for each phrase automatically based on substrings, so unless startIndex and endIndex are specified in the data file we just have to avoid having a string appear twice in the text.

Sample failures

Wrong context

cal = <parsedatetime.Calendar object at 0x109962790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8P...in 5 minutes'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 0 diff: (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR), 17, 37, 'At 8PM on August 5th') != (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8PM on August 5th')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         -   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         ?                                                                ----------------------
E         +   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         Detailed information truncated (15 more lines), use "-vv" to show

Missing date in test target

cal = <parsedatetime.Calendar object at 0x1109b3790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU...riday at 9PM'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 2 diff: (datetime.datetime(2013, 8, 1, 21, 30), pdtContext(accuracy=pdtContext.ACU_MIN), 122, 134, 'in 5 minutes') != (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Left contains more items, first extra item: (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         (datetime.datetime(2013, 8, 9, 21, 0),
E         pdtContext(accuracy=pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         73,
E         91,
E         'next Friday at 9PM'),
E         -  (datetime.datetime(2013, 8, 1, 21, 30),
E         -   pdtContext(accuracy=pdtContext.ACU_MIN),
E         -   122,
E         -   134,
E         -   'in 5 minutes'),
E         (datetime.datetime(2013, 8, 8, 9, 0),
E         pdtContext(accuracy=pdtContext.ACU_WEEK),
E         176,
E         185,
E         'next week'))

@bear
Copy link
Owner

bear commented Sep 23, 2016

I like it - it reads as easily as the others and doesn't require any repeated items. The "avoid having a string appear twice" constraint seems reasonable for a test suite

@idpaterson
Copy link
Collaborator Author

Yeah and it's only a constraint of convenience. There will need to be a test that includes multiples of the same string because that is an important thing to test, but that test will just need to specify startIndex and endIndex in the YAML.

@idpaterson idpaterson force-pushed the pull-requests/pytest-yaml branch from c54c460 to e36d937 Compare September 27, 2016 12:38
@idpaterson
Copy link
Collaborator Author

nlp tests are now implemented. Some of the existing nlp tests were specific to nlp (e.g. multiple phrases in a string) and some were generic date and times.

A new tests.data.nlpTarget class wraps data from the test and supports equality testing with nlp responses. It provides a consistent representation for tests with both normal and nlp targets (i.e. target: 2016-01-01 00:00:00 vs target: !nlpTarget ...) and allows some flexibility for the python to modify the test data. For example, the following test updates the sourcePhrase after wrapping the original phrase in quotes:

@pytest.mark.parametrize('prefix,suffix', (('"', '"'), ("'", "'"), ('(', ')')))
@pdtFixture('simple_datetimes.yml', ['times', 'invalid_times', 'dates',
                                     'invalid_dates'])
def test_simple_datetimes_wrapped(cal, phrase, sourceTime, nlpTarget, prefix,
                                  suffix):
    sourcePhrase = u'%s%s%s' % (prefix, phrase, suffix)
    nlpTarget.sourcePhrase = sourcePhrase
    assert cal.nlp(sourcePhrase, sourceTime) == nlpTarget

This allows the nlpTarget to calculate the proper start and end index for the phrase.

The test data can also specify startIndex but the documentation warns that this should only be done when required due to the string appearing more than once:

- target: !nlpTarget
    - phrase: "today"
      context: !pdtContext day
      startIndex: 5
      target: 2013-08-01 09:00:00
    - phrase: "today"
      context: !pdtContext day
      startIndex: 26
      target: 2013-08-01 09:00:00
  phrases: 
    - "Yep, today was as good as today could be"

If everything specified a startIndex it would be more of a pain to write a phrase modifying test like test_simple_datetimes_wrapped above since the startIndex would not be calculated and therefore would not match when the phrase is modified.

I ran into one case that failed (not in the original nlp tests, I pulled in some parse test cases). Since this pull request is strictly not going to fix anything in code I noted the failure with a FIXME:

def test_deltas(cal, phrase, sourceTime, nlpTarget):
    # FIXME: these tests fail
    if phrase in ('1855336.424 minutes ago',):
        return
    assert cal.nlp(phrase, sourceTime) == nlpTarget

All of these test utilities will need unit tests as well so I added them to the list at the top of this PR.

I toyed around with a !replace constructor that would call datetime.replace() on the source time but couldn't find a compelling enough reason to keep it, preferring absolute datetimes for clarity. It made the test data too abstract and I wasn't able to use it together with a delta. For example, "4pm 2 days from now" is a combination of a replacement (current time to 16:00:00) and a delta (plus 2 days).

@idpaterson idpaterson force-pushed the pull-requests/pytest-yaml branch from e36d937 to c21aae9 Compare September 27, 2016 13:04
@idpaterson idpaterson force-pushed the pull-requests/pytest-yaml branch from c21aae9 to 01371cf Compare September 28, 2016 00:07
@idpaterson
Copy link
Collaborator Author

I need to apologize for the lack of updates recently. A small paid project has taken away the time that I was using to work on parsedatetime. The following is not yet complete enough to commit.

Most recently I have reorganized the test classes to more easily add support for deriving times relative to source times and testing anything that does not require an explicit start date against multiple edge case dates. In the last comment I mentioned an attempt to represent a replacement of date components, I ended up with a succinct syntax for that:

day_suffixes:
  cases:
    - target: !replace 2008-08-22 xx:xx:xx
      phrases:
        - "August 22nd, 2008"

This test group has no sourceTime which means that it will be parametrized against each of the edge case dates (Feb 29 leap year, Feb 28 non leap year, end of year pre-1970). I like with this syntax how the replacement makes it clear that the time component will match that of the source time. I was trying to use obvious source times in the original tests like 01:02:03 to show that (because you could easily miss it if all the times are something like 12:00:00), but this is clearer.

There are some slightly weird consequences of testing against multiple dates when you're dealing with partial dates, for example:

dates:
  sourceTime: !replace xxxx-01-xx xx:xx:xx
  cases:
    - target: !replace xxxx-08-25 xx:xx:xx
      context: !pdtContext month | day
      phrases:
        - "8/25"

In this case if the edge case source times were to fall after August 25 it would map to the next year, so the replacement is used to pin the source time to January. The alternate behavior could then be tested with a source time in October where the date will map to the following year:

dates_next_year:
  sourceTime: !replace xxxx-10-xx xx:xx:xx
  cases:
    - target: !datedelta
        sourceTime: !replace xxxx-08-25 xx:xx:xx
        years: 1
      context: !pdtContext month | day
      phrases:
        - "8/25"

Still plugging along albeit quite slowly now.

@bear
Copy link
Owner

bear commented Oct 25, 2016

@idpaterson firstly no one working on an OSS project ever needs to apologize for doing paid work first - that's all part of what we do :)

Second, even if it wasn't paid, never worry about taking time to do something - we all work on this code because we love it and it solves an itch, but real life sometimes intrudes.

thanks for the update and for keeping the project moving forward!

@idpaterson
Copy link
Collaborator Author

Another bug was exposed by the new tests today where the old tests just happened to be written in a way that worked. For these particular phrases which are not valid dates, parse returns the current datetime regardless of the sourceTime provided.

  • 01/0
  • 08/35
  • 18/35
calendar = <parsedatetime.Calendar object at 0x1078b6150>, phrase = '18/35', sourceTime = datetime.datetime(1945, 12, 31, 3, 4, 5), context = pdtContext()

    @pdtFixture('errors.yml')
    def test_out_of_range(calendar, phrase, sourceTime, context):
>       assert calendar.parse(phrase, sourceTime) == (sourceTime.timetuple(),
                                                      context)
E       assert (time.struct_... pdtContext()) == (time.struct_t... pdtContext())
E         At index 0 diff: time.struct_time(tm_year=2017, tm_mon=2, tm_mday=27, tm_hour=8, tm_min=42, tm_sec=47, tm_wday=0, tm_yday=58, tm_isdst=0) != time.struct_time(tm_year=1945, tm_mon=12, tm_mday=31, tm_hour=3, tm_min=4, tm_sec=5, tm_wday=0, tm_yday=365, tm_isdst=-1)
E         Full diff:
E         - (time.struct_time(tm_year=2017, tm_mon=2, tm_mday=27, tm_hour=8, tm_min=42, tm_sec=47, tm_wday=0, tm_yday=58, tm_isdst=0),
E         ?                           -- ^                    ^^          ^          -         ^^                      -           ^
E         + (time.struct_time(tm_year=1945, tm_mon=12, tm_mday=31, tm_hour=3, tm_min=4, tm_sec=5, tm_wday=0, tm_yday=365, tm_isdst=-1),
E         ?                            ^^^         +           ^^          ^                   ^                     ++            ^^
E         pdtContext())
E         Detailed information truncated (-2 more lines), use "-vv" to show

tests/test_errors.py:6: AssertionError

@idpaterson
Copy link
Collaborator Author

@bear, I noticed that the CI and coverage have not been running on recent commits for this pull request. Would you please check whether those are still operational, maybe I hit a limit on commits for this PR?

@idpaterson
Copy link
Collaborator Author

All of the original base/English language test cases are now implemented in the new format. There is still some work to do before this can be merged, so I would like to aim for the following plan:

  1. Double check that all old test cases are adequately covered
  2. Add comments in test cases to aid translation and highlight nuances (e.g. identify that a word was chosen because it contains the abbreviation of a month to avoid someone translating the word directly rather than selecting a different word containing a month in the target language)
  3. Add more test cases where appropriate since it is now easier to add more phrases and to see what is being tested
  4. Discuss and finalize the test data structure and write meta tests for the test utilities
  5. Merge this PR into v3.0
  6. Pull request for the new sphinx documentation introduced in the new test instrumentation and update existing docstrings
  7. Create issues for bugs that surfaced while working on this, most of which are marked with TODO or FIXME in the test data (there are about 7 or 8 things to fix).
  8. Create issues to ask for community involvement in updating each locale's test cases, link to docs for translation instructions

I don't foresee any new features making it into 3.0 unless we get pull requests for them since there are already a number of bugs to fix and based on past experience trying to improve the German locale I suspect that more issues will pop up while working through the locales.

@idpaterson
Copy link
Collaborator Author

The test library now has 100% test coverage though I could not at least at this time justify creating separate unit tests for every aspect of the test lib. Before the additional tests, the parsedatetime tests provided 81% coverage of data.py and 92% coverage of fixtures.py. I have not yet worked on 100% test coverage of parsedatetime itself.

I think the last main step before merging this to v3.0 is to add comments in the test case data to aid translation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants