
Automatically test if new version of HARK crashes DemARKs/REMARKs #29

Closed
shaunagm opened this issue Apr 30, 2019 · 13 comments

Comments

@shaunagm

Copying a comment from @llorracc in another issue:

Since we do not yet have much in the way of unit tests, I'm hoping that we can repurpose something that we DO have: our DemARKs and REMARKs. They reside in separate repos, but we could make those repos "submodules" in the HARK repo, pointing always to the master branch of DemARK and REMARK.
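A rough sketch of what that setup might look like (the repo URLs and paths here are illustrative, not confirmed):

# Add the DemARK and REMARK repos as submodules that track their master branches
# (URLs and paths are illustrative)
git submodule add -b master https://github.com/econ-ark/DemARK.git DemARK
git submodule add -b master https://github.com/econ-ark/REMARK.git REMARK

# In CI, pull the latest master of every submodule before running anything
git submodule update --init --remote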

A minimal "test" of new code is that it doesn't break any of our existing working DemARK and REMARK examples. This would be a good task for a sprinter who knows a reasonable amount about CI and testing, but nothing about HARK.

The first step would be for Travis to "update" the DemARK and REMARK submodules to their latest versions. Then it seems like it should be possible to get Travis to run some pseudo-code like this (in my native language of bash):

cd [path to DemARK/notebooks]
for f in *.py; do
  ipython "$f" || exit 1   # fail the build if any script errors
done

and see whether any of the runs crashes. (Now that we require CI for PRs on everything, this would also mean that any revision of an existing DemARK or REMARK would automatically be tested upon creation.) For REMARKs it is only slightly more complicated. Since the instructions for creating a do_min.py file are that it should take no more than a couple of minutes to run, something like this should work:

cd [path to REMARKs]

# run every do_min.py found in the main directory or subdirectories of each REMARK
for f in $(find . -name do_min.py); do
   ipython "$f" || exit 1
done
@hameerabbasi

Has anyone here looked at nbval for pytest integration? Currently it conflicts with pytest-cov (see computationalmodelling/nbval#116), but at least it tests everything.
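For reference, nbval plugs into pytest as a command-line flag, so the whole check could be something like this (the notebooks/ path is illustrative):

pip install pytest nbval
# --nbval re-runs every notebook and compares each cell's output to the stored output
pytest --nbval notebooks/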

@llorracc
Contributor

llorracc commented May 6, 2019 via email

@hameerabbasi

It has a loose mode which doesn't test the output but does test that the notebooks run successfully.
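Concretely, that loose mode is the --nbval-lax flag, e.g. (path illustrative):

# --nbval-lax only checks that every cell executes without an error;
# stored outputs are compared only for cells explicitly marked for checking
pytest --nbval-lax notebooks/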

@llorracc
Contributor

llorracc commented May 6, 2019

That sounds great, exactly what we need. @shaunagm, we might want to do this differently for DemARKs and REMARKs. For REMARKs the most important thing is that the do_min.py file runs from the command line; the notebooks tend to be gravy. But for the DemARKs, their whole point is to interactively demonstrate things, so nbval might be perfect for them.
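Putting those two pieces together, a rough sketch of the CI step might be (the directory layout here is an assumption):

# DemARKs: every notebook must execute cleanly (outputs not compared)
pytest --nbval-lax DemARK/notebooks/

# REMARKs: every do_min.py must run from the command line
for f in $(find REMARK -name do_min.py); do
  ipython "$f" || exit 1
done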

@shaunagm
Author

shaunagm commented May 6, 2019

@hameerabbasi, Keith, and I have been chatting about this. We probably want to identify a set of notebooks that should always be up to date with HARK - maybe the "gentle introduction to HARK" notebook and other instructional examples. If a change to HARK broke those notebooks, we'd know we were introducing a breaking change and could mark the release (or schedule the release) accordingly, while updating the notebooks.
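One way to keep that set explicit would be a small whitelist file that CI reads - just a sketch, with a made-up file name and notebook name:

# tested_notebooks.txt lists the notebooks that must always track the latest HARK,
# e.g. a line like Gentle-Intro-To-HARK.ipynb (file and notebook names made up)
while read -r nb; do
  pytest --nbval-lax "notebooks/$nb" || exit 1
done < tested_notebooks.txt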

But there are many notebooks we don't want to actively update every time we change HARK, so we can't test against them - or at least, we can't test that they execute, though we can (probably) test that they still load. I'm not sure what kind of change to HARK would be so catastrophic that it would prevent a notebook from even loading.

@llorracc
Contributor

llorracc commented May 6, 2019

One thing I've been wondering about is whether it is possible to have a second kind of testing for slow-running code (some of our do_min.py files might take 2-3 minutes, and if we were to test all of them there might be quite a delay before Travis approved them).

Is this part of the reason you think some of the notebooks should be excluded from Travis testing? If not for that consideration, it seems to me a good workflow would be to test all the DemARK notebooks and REMARK do_min's by default, and then, if we are notified that one of them breaks, choose either to fix the problem (if it's easy) or to remove that notebook from the master branch until it is fixed. (I REALLY don't want to have notebooks posted that break when new users try to use them.)

If the issue is the speed of the tests (not wanting to have to wait very long for Travis), then another kind of testing that we could, say, run overnight every night, with results reported the day after a merge, might be a good approach.
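Travis does support scheduled (cron) builds, so one rough sketch of splitting fast and slow tests in the build script might be (the script names are placeholders):

# Fast checks run on every push/PR; the slow runs happen only in the nightly
# cron build, which Travis marks with TRAVIS_EVENT_TYPE=cron
./run_fast_tests.sh
if [ "$TRAVIS_EVENT_TYPE" = "cron" ]; then
  ./run_slow_tests.sh
fi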

Does this make sense?

@shaunagm
Author

shaunagm commented May 6, 2019

@llorracc - my feeling is that we should have two kinds of notebooks.

Type A is "pinned" to a specific older version of HARK (and pinned to specific versions of any other dependencies). Those always work because they are a snapshot of history, frozen in time.

Type B uses whatever the most recent version of HARK is. This is the kind of notebook we would test new versions of HARK against, and it would need to be changed as we changed HARK. These would almost always work, but we might accidentally break them occasionally because we're maintaining them alongside HARK itself.

Type A notebooks are much, much easier to maintain and seem like a good fit for REMARKs, i.e. notebooks that capture an implementation/replication of a specific paper. Type B is a form of testing/documentation and is more work to maintain, but that's the nature of tests and documentation - they always need to be changed along with the code.
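For a Type A notebook, pinning could be as simple as installing exact releases before it is run - a sketch, assuming HARK is published on PyPI as econ-ark, with purely illustrative version numbers:

# Freeze the environment a Type A notebook was written against
# (assumes HARK is on PyPI as econ-ark; versions are illustrative)
pip install "econ-ark==0.10.1" "numpy==1.16.2" "matplotlib==3.0.3"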

I don't really care about how long the tests take to run - it doesn't seem too bad so far, so I haven't considered it as a factor. This is more about trying to avoid the cognitive labor of having to update dozens or hundreds of notebooks every time we change HARK.

@llorracc
Contributor

llorracc commented May 6, 2019

I see, that makes sense. What I was worried about is that if somebody had an updated version of HARK but downloaded a notebook that used to work and now doesn't, that would fluster them. But your "pinning" solution solves that problem. I think you're right about the REMARKs -- they are intended as a kind of snapshot of a moment in time and of the set of tools with which the problem was solved.

Is there a way to set things up so that new content automatically gets "marked" with the HARK version number under which it was initially tested, or is that something we would have to put in by hand when we merge to master for the first time (say)?

This is more about trying to avoid the cognitive labor of having to update dozens or hundreds of notebooks every time we change HARK.

I'm hoping that with most HARK version changes, not much will break. But possibly I'm overoptimistic on that ...

@shaunagm
Author

shaunagm commented May 6, 2019

What do you mean by "new content"? Like, new DemARKs, REMARKs, etc.? I don't know of an automatic way to ensure that dependencies are pinned, so my feeling is we'll just need to be good about checking for version numbers before merging PRs. But there may be a way I haven't heard about.
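One possibility (just a sketch of something we could script, assuming HARK exposes a __version__ attribute) would be to record the tested version at merge time:

# Stamp the HARK version a notebook was last tested against into a small
# file that travels with it (file name made up; assumes HARK.__version__ exists)
python -c "import HARK; print(HARK.__version__)" > HARK_tested_version.txt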

@llorracc
Contributor

llorracc commented May 6, 2019

What do you mean by "new content"? Like, new DemARKs, REMARKs, etc.?

Yes, that's what I meant.

So, it should be part of a checklist before merging new content into master. Or I guess there could be one master "requirements.txt" file at the root of the REMARK that applies to all of the content therein (like do_min, do_mid, do_all, and other ways of using the code)?
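Something like a single root-level requirements.txt could then cover do_min, do_mid, and do_all alike - a sketch with illustrative package names and versions:

# One requirements.txt at the root of the REMARK (contents illustrative) ...
cat > requirements.txt <<'EOF'
econ-ark==0.10.1
numpy==1.16.2
matplotlib==3.0.3
EOF

# ... and CI (or a reader) installs from it before running do_min/do_mid/do_all
pip install -r requirements.txt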

@shaunagm
Author

shaunagm commented May 6, 2019

Yes, a checklist seems like the solution here. We'll likely want to let different REMARKs have different requirements, but that'll depend a bit on whether we end up using mybinder, colab, etc. to host them.

@keithblaha

econ-ark/HARK#280 starts to work on this

@shaunagm
Author

With Mridul's help we have set this up to work on DemARK.
