Revised testing architecture with yaml data #199

idpaterson · 2016-09-23T00:52:56Z

As originally discussed in #196, this pull request begins the conversion from unittest2 to py.test. py.test was already used to run the tests, but as an alternative to unittest2 it provides an opportunity for intriguing code-level instrumentation to make tests easier to write and more comprehensive.

The following tests have been ported so far:

Additional tests needed for the test utilities:

Scope of this PR

This pull request can be merged once tests have been rewritten to cover all of the current test cases. At that time there will still be a lot of work to do to improve and add tests especially in non-English locales; merging to v3 will allow pull requests for community contributions.

There will be no changes to the functionality of parsedatetime in this pull request other than instrumentation necessary for testing. For example, a "wildcard" flag was added to pdtContext to support tests that do not specify an explicit context.

Additional testing support

Testing against "today" and edge case dates

Most tests in pdt 2 were based on the current date as sourceTime rather than a predetermined time. That can be good and bad – I think it made the tests more confusing to reason about but it also helped to catch edge cases related to leap years, end of year, end of month, etc. I'm looking at you, #155! 99% of the time, "today" is just a normal time on a normal day and there is no guarantee that CI would happen to run on Feb 29. If it does and an error comes up, it's already too late to push out a fix.

For tests where the sourceTime is truly arbitrary (absolute dates and anything that can be expressed with deltas come to mind) it would be interesting to parametrize the function with a few edge case sourceTimes. Run it against Feb 29, Dec 31, Jan 1, today, etc so that those cases are caught intentionally rather than accidentally.

Testing NLP may require a list of targets

Some changes will need to be made to allow multiple target dates in test data.

Notes

This is a collaborative pull request

Any parsedatetime maintainers can commit on this PR and everyone is welcome to the discussion. I am also going to continue working on the test cases.

The old tests have been deleted from this branch

I deleted the old tests to avoid further housekeeping later; anyone who would graciously contribute to this pull request will need to use an alternate copy of the old tests for comparison.

The new test names and file structure does not necessarily match the old tests

Some previously separate test classes will be combined. For example, the cases for TestMultiple and TestDelta are both in deltas.yaml because the "multiples" were just multi-unit deltas. I used comments to separate them into sections within the data file.

Documentation

I will start a new issue for this but since the first bit of code is now public I wanted to express my preference to move away from epydoc. I have always found the HTML documentation very difficult to use... the content is well-written but the format is cumbersome and outdated. Mike, are you open to hosting documentation on ReadTheDocs or as a GitHub page via the gh_pages branch so that updates can go out automatically?

The test modules in this PR are documented for sphinx with Google style docstrings. The ReadTheDocs theme for sphinx makes for (in my opinion) a better presentation and a simpler, less syntactically-verbose code style. Some of the documentation that I have currently written in the Python docstrings would be better managed in separate rst files, but for now they should be helpful to anyone contributing to the tests. The implementation of that will arrive in a different pull request.

This change is

… of pdtContext

…ny other pdtContext

codecov-io · 2016-09-23T01:00:51Z

Codecov Report

Merging #199 into v3.0 will decrease coverage by -50.26%.
The diff coverage is 70%.

@@             Coverage Diff             @@
##             v3.0     #199       +/-   ##
===========================================
- Coverage   78.04%   27.79%   -50.26%     
===========================================
  Files          14       14               
  Lines        1567     1576        +9     
  Branches      288      291        +3     
===========================================
- Hits         1223      438      -785     
- Misses        252     1120      +868     
+ Partials       92       18       -74

Impacted Files	Coverage Δ
parsedatetime/context.py	`56% <70%> (-25.82%)`	❌
parsedatetime/pdt_locales/icu.py	`10.58% <0%> (-72.95%)`	❌
parsedatetime/init.py	`18.26% <0%> (-56.7%)`	❌

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b4fd8e...4283265. Read the comment docs.

bear · 2016-09-23T08:10:18Z

Do you want me to review the changes as you make them or wait until you have the majority of them made?

idpaterson · 2016-09-23T10:57:38Z

That's up to you, I'm going to work on the cases that require additional support in the pytest setup first then port the remaining normal cases.

idpaterson · 2016-09-23T12:58:20Z

How does this look for an nlp test?

long_phrases:
    sourceTime: 2013-08-01 21:25:00
    cases:
        - target: !nlpTarget
            - phrase: "At 8PM on August 5th"
              target: 2013-08-05 20:00:00
              context: !pdtContext month | day | hour
            - phrase: "next Friday at 9PM"
              target: 2013-08-09 21:00:00
              context: !pdtContext day | hour
            - phrase: "in 5 minutes"
              target: 2013-08-01 21:30:00
              context: !pdtContext minute
            - phrase: "next week"
              target: 2013-08-08 09:00:00
              context: !pdtContext week
          phrases:
            - >
                I'm so excited!! At 8PM on August 5th i'm going to fly to 
                Florida. Then next Friday at 9PM i'm going to Dog n Bone! 
                And in 5 minutes I'm going to eat some food! Talk to you 
                next week.

@pdtFixture('multiple_dates.yml')
def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
    assert cal.nlp(phrase, sourceTime) == nlpTarget

The target normalization adds the startIndex and endIndex for each phrase automatically based on substrings, so unless startIndex and endIndex are specified in the data file we just have to avoid having a string appear twice in the text.

Sample failures

Wrong context

cal = <parsedatetime.Calendar object at 0x109962790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8P...in 5 minutes'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 0 diff: (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR), 17, 37, 'At 8PM on August 5th') != (datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY), 17, 37, 'At 8PM on August 5th')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         -   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         ?                                                                ----------------------
E         +   pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         Detailed information truncated (15 more lines), use "-vv" to show

Missing date in test target

cal = <parsedatetime.Calendar object at 0x1109b3790>
phrase = "I'm so excited!! At 8PM on August 5th i'm going to fly to  Florida. Then next Friday at 9PM i'm going to Dog n Bone!  And in 5 minutes I'm going to eat some food! Talk to you  next week."
sourceTime = datetime.datetime(2013, 8, 1, 21, 25)
nlpTarget = ((datetime.datetime(2013, 8, 5, 20, 0), pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU...riday at 9PM'), (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week'))

    @pdtFixture('multiple_dates.yml')
    def test_long_phrases(cal, phrase, sourceTime, nlpTarget):
>       assert cal.nlp(phrase, sourceTime) == nlpTarget
E       assert ((datetime.da... 'next week')) == ((datetime.dat... 'next week'))
E         At index 2 diff: (datetime.datetime(2013, 8, 1, 21, 30), pdtContext(accuracy=pdtContext.ACU_MIN), 122, 134, 'in 5 minutes') != (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Left contains more items, first extra item: (datetime.datetime(2013, 8, 8, 9, 0), pdtContext(accuracy=pdtContext.ACU_WEEK), 176, 185, 'next week')
E         Full diff:
E         ((datetime.datetime(2013, 8, 5, 20, 0),
E         pdtContext(accuracy=pdtContext.ACU_MONTH | pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         17,
E         37,
E         'At 8PM on August 5th'),
E         (datetime.datetime(2013, 8, 9, 21, 0),
E         pdtContext(accuracy=pdtContext.ACU_DAY | pdtContext.ACU_HOUR),
E         73,
E         91,
E         'next Friday at 9PM'),
E         -  (datetime.datetime(2013, 8, 1, 21, 30),
E         -   pdtContext(accuracy=pdtContext.ACU_MIN),
E         -   122,
E         -   134,
E         -   'in 5 minutes'),
E         (datetime.datetime(2013, 8, 8, 9, 0),
E         pdtContext(accuracy=pdtContext.ACU_WEEK),
E         176,
E         185,
E         'next week'))

bear · 2016-09-23T17:19:53Z

I like it - it reads as easily as the others and doesn't require any repeated items. The "avoid having a string appear twice" constraint seems reasonable for a test suite

idpaterson · 2016-09-23T17:28:56Z

Yeah and it's only a constraint of convenience. There will need to be a test that includes multiples of the same string because that is an important thing to test, but that test will just need to specify startIndex and endIndex in the YAML.

…st data

# Conflicts: # tests/TestNlp.py

idpaterson · 2016-09-27T12:57:18Z

nlp tests are now implemented. Some of the existing nlp tests were specific to nlp (e.g. multiple phrases in a string) and some were generic date and times.

A new tests.data.nlpTarget class wraps data from the test and supports equality testing with nlp responses. It provides a consistent representation for tests with both normal and nlp targets (i.e. target: 2016-01-01 00:00:00 vs target: !nlpTarget ...) and allows some flexibility for the python to modify the test data. For example, the following test updates the sourcePhrase after wrapping the original phrase in quotes:

@pytest.mark.parametrize('prefix,suffix', (('"', '"'), ("'", "'"), ('(', ')')))
@pdtFixture('simple_datetimes.yml', ['times', 'invalid_times', 'dates',
                                     'invalid_dates'])
def test_simple_datetimes_wrapped(cal, phrase, sourceTime, nlpTarget, prefix,
                                  suffix):
    sourcePhrase = u'%s%s%s' % (prefix, phrase, suffix)
    nlpTarget.sourcePhrase = sourcePhrase
    assert cal.nlp(sourcePhrase, sourceTime) == nlpTarget

This allows the nlpTarget to calculate the proper start and end index for the phrase.

The test data can also specify startIndex but the documentation warns that this should only be done when required due to the string appearing more than once:

- target: !nlpTarget
    - phrase: "today"
      context: !pdtContext day
      startIndex: 5
      target: 2013-08-01 09:00:00
    - phrase: "today"
      context: !pdtContext day
      startIndex: 26
      target: 2013-08-01 09:00:00
  phrases: 
    - "Yep, today was as good as today could be"

If everything specified a startIndex it would be more of a pain to write a phrase modifying test like test_simple_datetimes_wrapped above since the startIndex would not be calculated and therefore would not match when the phrase is modified.

I ran into one case that failed (not in the original nlp tests, I pulled in some parse test cases). Since this pull request is strictly not going to fix anything in code I noted the failure with a FIXME:

def test_deltas(cal, phrase, sourceTime, nlpTarget):
    # FIXME: these tests fail
    if phrase in ('1855336.424 minutes ago',):
        return
    assert cal.nlp(phrase, sourceTime) == nlpTarget

All of these test utilities will need unit tests as well so I added them to the list at the top of this PR.

I toyed around with a !replace constructor that would call datetime.replace() on the source time but couldn't find a compelling enough reason to keep it, preferring absolute datetimes for clarity. It made the test data too abstract and I wasn't able to use it together with a delta. For example, "4pm 2 days from now" is a combination of a replacement (current time to 16:00:00) and a delta (plus 2 days).

…id (null) nlpTargets

idpaterson · 2016-10-25T00:38:52Z

I need to apologize for the lack of updates recently. A small paid project has taken away the time that I was using to work on parsedatetime. The following is not yet complete enough to commit.

Most recently I have reorganized the test classes to more easily add support for deriving times relative to source times and testing anything that does not require an explicit start date against multiple edge case dates. In the last comment I mentioned an attempt to represent a replacement of date components, I ended up with a succinct syntax for that:

day_suffixes:
  cases:
    - target: !replace 2008-08-22 xx:xx:xx
      phrases:
        - "August 22nd, 2008"

This test group has no sourceTime which means that it will be parametrized against each of the edge case dates (Feb 29 leap year, Feb 28 non leap year, end of year pre-1970). I like with this syntax how the replacement makes it clear that the time component will match that of the source time. I was trying to use obvious source times in the original tests like 01:02:03 to show that (because you could easily miss it if all the times are something like 12:00:00), but this is clearer.

There are some slightly weird consequences of testing against multiple dates when you're dealing with partial dates, for example:

dates:
  sourceTime: !replace xxxx-01-xx xx:xx:xx
  cases:
    - target: !replace xxxx-08-25 xx:xx:xx
      context: !pdtContext month | day
      phrases:
        - "8/25"

In this case if the edge case source times were to fall after August 25 it would map to the next year, so the replacement is used to pin the source time to January. The alternate behavior could then be tested with a source time in October where the date will map to the following year:

dates_next_year:
  sourceTime: !replace xxxx-10-xx xx:xx:xx
  cases:
    - target: !datedelta
        sourceTime: !replace xxxx-08-25 xx:xx:xx
        years: 1
      context: !pdtContext month | day
      phrases:
        - "8/25"

Still plugging along albeit quite slowly now.

bear · 2016-10-25T01:17:32Z

@idpaterson firstly no one working on an OSS project ever needs to apologize for doing paid work first - that's all part of what we do :)

Second, even if it wasn't paid, never worry about taking time to do something - we all work on this code because we love it and it solves an itch, but real life sometimes intrudes.

thanks for the update and for keeping the project moving forward!

…s in tests/data, added dateReplacement

idpaterson · 2017-02-27T13:51:43Z

Another bug was exposed by the new tests today where the old tests just happened to be written in a way that worked. For these particular phrases which are not valid dates, parse returns the current datetime regardless of the sourceTime provided.

01/0
08/35
18/35

calendar = <parsedatetime.Calendar object at 0x1078b6150>, phrase = '18/35', sourceTime = datetime.datetime(1945, 12, 31, 3, 4, 5), context = pdtContext()

    @pdtFixture('errors.yml')
    def test_out_of_range(calendar, phrase, sourceTime, context):
>       assert calendar.parse(phrase, sourceTime) == (sourceTime.timetuple(),
                                                      context)
E       assert (time.struct_... pdtContext()) == (time.struct_t... pdtContext())
E         At index 0 diff: time.struct_time(tm_year=2017, tm_mon=2, tm_mday=27, tm_hour=8, tm_min=42, tm_sec=47, tm_wday=0, tm_yday=58, tm_isdst=0) != time.struct_time(tm_year=1945, tm_mon=12, tm_mday=31, tm_hour=3, tm_min=4, tm_sec=5, tm_wday=0, tm_yday=365, tm_isdst=-1)
E         Full diff:
E         - (time.struct_time(tm_year=2017, tm_mon=2, tm_mday=27, tm_hour=8, tm_min=42, tm_sec=47, tm_wday=0, tm_yday=58, tm_isdst=0),
E         ?                           -- ^                    ^^          ^          -         ^^                      -           ^
E         + (time.struct_time(tm_year=1945, tm_mon=12, tm_mday=31, tm_hour=3, tm_min=4, tm_sec=5, tm_wday=0, tm_yday=365, tm_isdst=-1),
E         ?                            ^^^         +           ^^          ^                   ^                     ++            ^^
E         pdtContext())
E         Detailed information truncated (-2 more lines), use "-vv" to show

tests/test_errors.py:6: AssertionError

idpaterson · 2017-02-27T23:35:16Z

@bear, I noticed that the CI and coverage have not been running on recent commits for this pull request. Would you please check whether those are still operational, maybe I hit a limit on commits for this PR?

idpaterson · 2017-02-28T00:18:15Z

All of the original base/English language test cases are now implemented in the new format. There is still some work to do before this can be merged, so I would like to aim for the following plan:

Double check that all old test cases are adequately covered
Add comments in test cases to aid translation and highlight nuances (e.g. identify that a word was chosen because it contains the abbreviation of a month to avoid someone translating the word directly rather than selecting a different word containing a month in the target language)
Add more test cases where appropriate since it is now easier to add more phrases and to see what is being tested
Discuss and finalize the test data structure and write meta tests for the test utilities
Merge this PR into v3.0
Pull request for the new sphinx documentation introduced in the new test instrumentation and update existing docstrings
Create issues for bugs that surfaced while working on this, most of which are marked with TODO or FIXME in the test data (there are about 7 or 8 things to fix).
Create issues to ask for community involvement in updating each locale's test cases, link to docs for translation instructions

I don't foresee any new features making it into 3.0 unless we get pull requests for them since there are already a number of bugs to fix and based on past experience trying to improve the German locale I suspect that more issues will pop up while working through the locales.

… and object representations

…more accessible

…esting

…missing test group or file

idpaterson · 2017-03-09T01:52:38Z

The test library now has 100% test coverage though I could not at least at this time justify creating separate unit tests for every aspect of the test lib. Before the additional tests, the parsedatetime tests provided 81% coverage of data.py and 92% coverage of fixtures.py. I have not yet worked on 100% test coverage of parsedatetime itself.

I think the last main step before merging this to v3.0 is to add comments in the test case data to aid translation.

…tly associated with or dependent on the yaml data

…aml-dependent code isolated

idpaterson added 5 commits September 21, 2016 07:48

Added pdtContext.fromAccuracyStrings to allow a string representation…

8df3a1a

… of pdtContext

Added ACU_WILDCARD to support a wildcard pdtContext that will match a…

6adb275

…ny other pdtContext

Removed old tests

8717a5a

Added test data parsing and fixtures

422a775

Added a few tests

1dc7127

idpaterson added this to the 3.0 milestone Sep 23, 2016

idpaterson self-assigned this Sep 23, 2016

idpaterson added 2 commits September 26, 2016 20:05

Added explicit datedelta, nlpTarget, and nlpTargetValue classes to te…

cd4a753

…st data

Merge branch 'v3.0' into pull-requests/pytest-yaml

784a463

# Conflicts: # tests/TestNlp.py

idpaterson force-pushed the pull-requests/pytest-yaml branch from c54c460 to e36d937 Compare September 27, 2016 12:38

Use nlpTarget to wrap plain target values and added support for inval…

cf3fe6e

…id (null) nlpTargets

idpaterson force-pushed the pull-requests/pytest-yaml branch from e36d937 to c21aae9 Compare September 27, 2016 13:04

Added nlp tests

01371cf

idpaterson force-pushed the pull-requests/pytest-yaml branch from c21aae9 to 01371cf Compare September 28, 2016 00:07

Added some of the simple datetime tests

7eafcc2

idpaterson mentioned this pull request Nov 20, 2016

Allow locale override idpaterson/alfred-wunderlist-workflow#141

Open

idpaterson added 5 commits November 22, 2016 15:28

Changed “cal” fixture to “calendar”

4e5a6a5

Migrated functions in tests/fixtures to TestGroup and TestCase classe…

4133253

…s in tests/data, added dateReplacement

Updated tests to use !replace rather than explicit dates

5efd022

Ported noon and midnight test cases

57a24fa

Ported TestPhrases

73f1080

Ported TestUnits

af813ae

idpaterson added 2 commits February 27, 2017 08:54

Ported TestErrors

c1cfd8b

Fixed abysmal coverage due to not actually running the tests

498e1e1

idpaterson added 20 commits February 27, 2017 19:34

Ported TestSimpleOffsetsNoon

67b52a8

Ported TestInc

ec2450d

flake8 cleanup

bcbd248

Added test cases for datedelta and improved consistency of its string…

f792382

… and object representations

Added tests for dateReplacement

b20334d

Added tests for nlpTargetValue

8ff819a

Added test utilities to coverage test

bb5586e

Added a few tests to improve coverage of datedelta and dateReplacement

4cdca9e

Test datedelta argument type errors

d309971

Reorganized test lib files and tests to keep the parsedatetime tests …

1506566

…more accessible

Test data directory is now relative to the test file

188be6a

Added tests for pdtFixture

a97f596

Additional tests for number words

d085d9d

Added a few tests for nlpTarget where lacking coverage

205925f

Re-raise YAML loading errors to avoid silent failure

f6b15e9

Removed xfail tests that end up skipped in favor of more direct testing

ff924e0

Fixed parametrization when only one parameter is requested

692cf9a

Added a “case” fixture to get the TestCase instance to simplify metat…

dae6415

…esting

Treat skipped tests as errors and note when the skip may be due to a …

c78178d

…missing test group or file

Added some metatests to achieve full coverage of the test lib

108b19f

idpaterson added 2 commits March 9, 2017 09:18

Split data module into data and models since the models are not direc…

01122f7

…tly associated with or dependent on the yaml data

Split tests.lib.data into two modules, models and data, to keep the y…

4283265

…aml-dependent code isolated

idpaterson mentioned this pull request Mar 10, 2017

tests fail in March #215

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revised testing architecture with yaml data #199

Revised testing architecture with yaml data #199

idpaterson commented Sep 23, 2016 •

edited

Loading

codecov-io commented Sep 23, 2016 •

edited

Loading

bear commented Sep 23, 2016

idpaterson commented Sep 23, 2016

idpaterson commented Sep 23, 2016 •

edited

Loading

bear commented Sep 23, 2016

idpaterson commented Sep 23, 2016

idpaterson commented Sep 27, 2016

idpaterson commented Oct 25, 2016

bear commented Oct 25, 2016

idpaterson commented Feb 27, 2017

idpaterson commented Feb 27, 2017

idpaterson commented Feb 28, 2017

idpaterson commented Mar 9, 2017

Revised testing architecture with yaml data #199

Are you sure you want to change the base?

Revised testing architecture with yaml data #199

Conversation

idpaterson commented Sep 23, 2016 • edited Loading

Scope of this PR

Additional testing support

Testing against "today" and edge case dates

Testing NLP may require a list of targets

Notes

This is a collaborative pull request

The old tests have been deleted from this branch

The new test names and file structure does not necessarily match the old tests

Documentation

codecov-io commented Sep 23, 2016 • edited Loading

Codecov Report

bear commented Sep 23, 2016

idpaterson commented Sep 23, 2016

idpaterson commented Sep 23, 2016 • edited Loading

Sample failures

Wrong context

Missing date in test target

bear commented Sep 23, 2016

idpaterson commented Sep 23, 2016

idpaterson commented Sep 27, 2016

idpaterson commented Oct 25, 2016

bear commented Oct 25, 2016

idpaterson commented Feb 27, 2017

idpaterson commented Feb 27, 2017

idpaterson commented Feb 28, 2017

idpaterson commented Mar 9, 2017

idpaterson commented Sep 23, 2016 •

edited

Loading

codecov-io commented Sep 23, 2016 •

edited

Loading

idpaterson commented Sep 23, 2016 •

edited

Loading