
Automatically test if new version of HARK crashes DemARKs/REMARKs #29

Closed
shaunagm opened this issue Apr 30, 2019 · 13 comments

Comments

@shaunagm

Copying a comment from @llorracc in another issue:

Since we do not yet have much in the way of unit tests, I'm hoping that we can repurpose something that we DO have: our DemARKs and REMARKs. They reside in separate repos, but we could make those repos "submodules" in the HARK repo, pointing always to the master branch of DemARK and REMARK.
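A rough sketch of what that setup might look like (the repo URLs and paths here are illustrative, not confirmed):

# Add the DemARK and REMARK repos as submodules that track their master branches
# (URLs and paths are illustrative)
git submodule add -b master https://github.com/econ-ark/DemARK.git DemARK
git submodule add -b master https://github.com/econ-ark/REMARK.git REMARK

# In CI, pull the latest master of every submodule before running anything
git submodule update --init --remote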

A minimal "test" of new code is that it doesn't break any of our existing working DemARK and REMARK examples. This would be a good task for a sprinter who knows a reasonable amount about CI and testing, but nothing about HARK.

The first step would be for Travis to "update" the DemARK and REMARK submodules to their latest versions. Then it seems like it should be possible to get Travis to run some pseudo-code like this (in my native language of bash):

cd [path to DemARK/notebooks]
for f in *.py; do
  ipython "$f" || exit 1   # fail the build if any script errors
done

and see whether any of the runs crashes. (Now that we require CI for PRs on everything, this would also mean that any revision of an existing DemARK or REMARK would automatically be tested upon creation.) For REMARKs it is only slightly more complicated. Since the instructions for creating a do_min.py file are that it should take no more than a couple of minutes to run, something like this should work:

cd [path to REMARKs]

# run every do_min.py found in the main directory or subdirectories of each REMARK
for f in $(find . -name do_min.py); do
   ipython "$f" || exit 1
done
@hameerabbasi

Has anyone here looked at nbval for pytest integration? Currently it conflicts with pytest-cov (see computationalmodelling/nbval#116), but at least it tests everything.
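For reference, nbval plugs into pytest as a command-line flag, so the whole check could be something like this (the notebooks/ path is illustrative):

pip install pytest nbval
# --nbval re-runs every notebook and compares each cell's output to the stored output
pytest --nbval notebooks/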

@llorracc
Contributor

llorracc commented May 6, 2019 via email

@hameerabbasi

It has a loose mode which doesn't test the output but does test that the notebooks run successfully.
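Concretely, that loose mode is the --nbval-lax flag, e.g. (path illustrative):

# --nbval-lax only checks that every cell executes without an error;
# stored outputs are compared only for cells explicitly marked for checking
pytest --nbval-lax notebooks/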

@llorracc
Contributor

llorracc commented May 6, 2019

That sounds great, exactly what we need. @shaunagm, we might want to do this differently for DemARKs and REMARKs. For REMARKs the most important thing is that the do_min.py file runs from the command line; the notebooks tend to be gravy. But for the DemARKs, their whole point is to interactively demonstrate things, so nbval might be perfect for them.
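Putting those two pieces together, a rough sketch of the CI step might be (the directory layout here is an assumption):

# DemARKs: every notebook must execute cleanly (outputs not compared)
pytest --nbval-lax DemARK/notebooks/

# REMARKs: every do_min.py must run from the command line
for f in $(find REMARK -name do_min.py); do
  ipython "$f" || exit 1
done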

@shaunagm
Author

shaunagm commented May 6, 2019

@hameerabbasi, Keith, and I have been chatting about this. We probably want to identify a set of notebooks that should always be up to date with HARK - maybe the "gentle introduction to HARK" notebook and other instructional examples. If a change to HARK broke those notebooks, we'd know we were introducing a breaking change and could mark the release (or schedule the release) accordingly, while updating the notebooks.
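One way to keep that set explicit would be a small whitelist file that CI reads - just a sketch, with a made-up file name and notebook name:

# tested_notebooks.txt lists the notebooks that must always track the latest HARK,
# e.g. a line like Gentle-Intro-To-HARK.ipynb (file and notebook names made up)
while read -r nb; do
  pytest --nbval-lax "notebooks/$nb" || exit 1
done < tested_notebooks.txt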

But there are many notebooks we don't want to actively update every time we change HARK, so we can't test against them - or at least, we can't test that they execute, though we can (probably) test that they still load. I'm not sure what kind of change to HARK would be so catastrophic that it would prevent a notebook from even loading.

@llorracc
Contributor

llorracc commented May 6, 2019

One thing I've been wondering about is whether it is possible to have a second kind of testing for slow-running code (some of our do_min.py files might take 2-3 minutes, and if we were to test all of them there might be quite a delay before Travis approved them).

Is this part of the reason you think some of the notebooks should be excluded from Travis testing? If not for that consideration, it seems to me a good workflow would be to test all the DemARK notebooks and REMARK do_min's by default, and then, if we are notified that one of them breaks, choose either to fix the problem (if it's easy) or to remove that notebook from the master branch until it is fixed. (I REALLY don't want to have notebooks posted that break when new users try to use them.)

If the issue is the speed of the tests (not wanting to have to wait very long for Travis), then another kind of testing that we could, say, run overnight every night, with results reported the day after a merge, might be a good approach.
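Travis does support scheduled (cron) builds, so one rough sketch of splitting fast and slow tests in the build script might be (the script names are placeholders):

# Fast checks run on every push/PR; the slow runs happen only in the nightly
# cron build, which Travis marks with TRAVIS_EVENT_TYPE=cron
./run_fast_tests.sh
if [ "$TRAVIS_EVENT_TYPE" = "cron" ]; then
  ./run_slow_tests.sh
fi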

Does this make sense?

@shaunagm
Author

shaunagm commented May 6, 2019

@llorracc - my feeling is that we should have two kinds of notebooks.

Type A is "pinned" to a specific older version of HARK (and pinned to specific versions of any other dependencies). Those always work because they are a snapshot of history, frozen in time.

Type B uses whatever the most recent version of HARK is. This is the kind of notebook we would test new versions of HARK against, and it would need to be changed as we changed HARK. These would almost always work, but we might accidentally break them occasionally because we're maintaining them alongside HARK itself.

Type A notebooks are much, much easier to maintain and seem like a good fit for REMARKs, i.e. notebooks that capture an implementation/replication of a specific paper. Type B is a form of testing/documentation and is more work to maintain, but that's the nature of tests and documentation - they always need to be changed along with the code.
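For a Type A notebook, pinning could be as simple as installing exact releases before it is run - a sketch, assuming HARK is published on PyPI as econ-ark, with purely illustrative version numbers:

# Freeze the environment a Type A notebook was written against
# (assumes HARK is on PyPI as econ-ark; versions are illustrative)
pip install "econ-ark==0.10.1" "numpy==1.16.2" "matplotlib==3.0.3"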

I don't really care about how long the tests take to run - it doesn't seem too bad so far, so I haven't considered it as a factor. This is more about trying to avoid the cognitive labor of having to update dozens or hundreds of notebooks every time we change HARK.

@llorracc
Contributor

llorracc commented May 6, 2019

I see, that makes sense. What I was worried about is that if somebody had an updated version of HARK but downloaded a notebook that used to work and now doesn't, that would fluster them. But your "pinning" solution solves that problem. I think you're right about the REMARKs -- they are intended as a kind of snapshot of a moment in time and of the set of tools with which the problem was solved.

Is there a way to set things up so that new content automatically gets "marked" with the HARK version number under which it was initially tested, or is that something we would have to put in by hand when we merge to master for the first time (say)?

This is more about trying to avoid the cognitive labor of having to update dozens or hundreds of notebooks every time we change HARK.

I'm hoping that with most HARK version changes, not much will break. But possibly I'm overoptimistic on that ...

@shaunagm
Author

shaunagm commented May 6, 2019

What do you mean by "new content"? Like, new DemARKs, REMARKs, etc.? I don't know of an automatic way to ensure that dependencies are pinned, so my feeling is we'll just need to be good about checking for version numbers before merging PRs. But there may be a way I haven't heard about.
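One possibility (just a sketch of something we could script, assuming HARK exposes a __version__ attribute) would be to record the tested version at merge time:

# Stamp the HARK version a notebook was last tested against into a small
# file that travels with it (file name made up; assumes HARK.__version__ exists)
python -c "import HARK; print(HARK.__version__)" > HARK_tested_version.txt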

@llorracc
Contributor

llorracc commented May 6, 2019

What do you mean by "new content"? Like, new DemARKs, REMARKs, etc.?

Yes, that's what I meant.

So, it should be part of a checklist before merging new content into master. Or I guess there could be one master "requirements.txt" file at the root of the REMARK that applies to all of the content therein (like do_min, do_mid, do_all, and other ways of using the code)?
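Something like a single root-level requirements.txt could then cover do_min, do_mid, and do_all alike - a sketch with illustrative package names and versions:

# One requirements.txt at the root of the REMARK (contents illustrative) ...
cat > requirements.txt <<'EOF'
econ-ark==0.10.1
numpy==1.16.2
matplotlib==3.0.3
EOF

# ... and CI (or a reader) installs from it before running do_min/do_mid/do_all
pip install -r requirements.txt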

@shaunagm
Author

shaunagm commented May 6, 2019

Yes, a checklist seems like the solution here. We'll likely want to let different REMARKs have different requirements, but that'll depend a bit on whether we end up using mybinder, colab, etc. to host them.

@keithblaha

econ-ark/HARK#280 starts to work on this

@shaunagm
Author

With Mridul's help we have set this up to work on DemARK.
