Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Cantera examples a submodule / consolidate with Jupyter #129

Closed
ischoegl opened this issue Jan 17, 2022 · 17 comments
Closed

Make Cantera examples a submodule / consolidate with Jupyter #129

ischoegl opened this issue Jan 17, 2022 · 17 comments
Labels
feature-request New feature request

Comments

@ischoegl
Copy link
Member

Abstract

At the moment, Python examples are installed deep in the 'guts' of cantera, and thus are relatively hard to find. When installing from source, the installation location is also not consistent with other languages, see #51.

Beyond, there already is a separate Cantera/cantera-jupyter repository (as well as examples from workshops). These examples are currently not covered by CI testing.

One idea would be to create a dedicated repository for Cantera examples that can be cloned from GitHub. The repository should be included as a submodule in the main Cantera/cantera repo, so things are installed together when compiling from source and can be tested by CI (as already done at the moment).

Motivation

Describe the need for the proposed change:

  • What problem is it trying to solve? ... make examples easier to find / consolidate Jupyter and Python.
  • Who is affected by the change? ... anyone interested in examples
  • Why is this a good solution? ... consolidate different example locations and ensure that everything is covered by CI.
@ischoegl ischoegl added the feature-request New feature request label Jan 17, 2022
@ischoegl ischoegl changed the title Move examples into sub-repository / consolidate with Jupyter Move examples into submodule / consolidate with Jupyter Jan 17, 2022
@ischoegl ischoegl changed the title Move examples into submodule / consolidate with Jupyter Make Cantera examples a submodule / consolidate with Jupyter Jan 17, 2022
@bryanwweber
Copy link
Member

Why is this a good solution? ... consolidate different example locations and ensure that everything is covered by CI.

Thanks for the suggestion @ischoegl! I agree that the current location of the Python examples is probably suboptimal for anyone wanting to learn Cantera from those examples.

However, I think moving the examples to a separate repo would be even worse from a developer ergonomics perspective, especially for new contributors trying to figure out where files are located.

I actually wonder whether it makes sense to package the examples at all. I suppose it doesn't hurt anything to do so, but I'd be very surprised if anyone looked at the installed examples instead of the website. It feels like packaging examples comes from the days before easy internet access to documentation...

With that in mind, I don't think I'd be opposed to relocating the examples in the source to a common folder. I think this was discussed before though...

@bryanwweber
Copy link
Member

Fwiw, the commit of mine that @speth included in Cantera/cantera-jupyter#33 was the start of me trying to set up CI for that repo. It is pretty easy to trigger CI for another repo, see my workflows branch on Cantera/Cantera (sorry, on my phone, so a link is hard).

@ischoegl
Copy link
Member Author

ischoegl commented Jan 17, 2022

I actually wonder whether it makes sense to package the examples at all. I suppose it doesn't hurt anything to do so, but I'd be very surprised if anyone looked at the installed examples instead of the website. It feels like packaging examples comes from the days before easy internet access to documentation...

I tend to agree. I originally thought of proposing a separate package (for pip or conda), but realized that moving things to a stand-alone repo makes a lot more sense. Whether or not this is included as a submodule is secondary, especially as examples cannot be run from scons. The important thing is that they need to be covered by CI, and if this can be triggered from the main repo without making this a submodule, there really isn't much of a reason to include them.

PS: The only reason why I would still advocate for setting things up as a submodule are C++/Fortran examples, as they do depend on a working Cantera installation (where it may be harder to make everything work if things aren't compiled from source). This is mainly from the perspective of avoiding chatter on the UG.

PPS:

However, I think moving the examples to a separate repo would be even worse from a developer ergonomics perspective, especially for new contributors trying to figure out where files are located.

I believe the links created for submodules are actually really easy to follow, e.g. cantera/ext. One solution for this proposal would be to link directly to the submodule from cantera/samples.

@bryanwweber
Copy link
Member

I believe the links created for submodules are actually really easy to follow, e.g. cantera/ext. One solution for this proposal would be to link directly to the submodule from cantera/samples.

Fair enough in terms of finding the files. However, updating the examples would then require two pull requests, one to actually change the example and another to update the submodule commit in Cantera/cantera. I would not like that workflow, personally. Plus it would double the work for maintainers 🤪

I think the main point you're making is that the examples should be run by CI, right? Why can't that be done if the example files are still located in Cantera/cantera? And the Jupyter examples can be triggered by a commit to main?

Or is the concern more that all the examples we have created over the years should be collected in one place, including the Jupyter examples, workshop examples, and all the examples in Cantera/cantera? This case makes more sense to me to have a single "examples" repo, especially if the examples (for all the interfaces) were not going to be installed any more. It feels weird to have a project repo with no examples, but I'm starting to wrap my head around it 🤓 As far as CI, we can have Conda packages to make that a little bit easier. You can see in my workflows branch that the PyPI builds will run on every commit to main... I'm planning something similar for the Conda packages, so that shouldn't be too hard.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 17, 2022

Or is the concern more that all the examples we have created over the years should be collected in one place, including the Jupyter examples, workshop examples, and all the examples in Cantera/cantera? This case makes more sense to me to have a single "examples" repo, especially if the examples (for all the interfaces) were not going to be installed any more.

Yes, this is my point.

However, updating the examples would then require two pull requests, one to actually change the example and another to update the submodule commit in Cantera/cantera.

I believe this is similar to what is already necessary for any substantial code change that affects Cantera/cantera-website. In most instances, I believe that examples are an 'afterthought', regardless of them being extremely important. I guess the sample repo would require its own CI, which can be done. But it would still have to be triggered whenever Cantera/cantera is changed.

@speth
Copy link
Member

speth commented Jan 17, 2022

I think the following are good goals to work toward:

  • Make it easier to find the examples, for all user interfaces
  • Make sure that all examples are covered by CI testing

However, I strongly disagree that any examples should be moved out of the main repository. The examples should be continuously updated to keep pace with the main branch, and the easiest way to do that is if they can be updated in the same commits and as part of the same PRs as the base library.

One of the main reasons for keeping the Jupyter examples out of the main repo is because Jupyter notebooks really don't play nice with Git and tend to result in very messy histories. Keeping them in a repository by themselves helps us keep that mess isolated.

I don't think submodules are a great solution for this kind of scenario. They're really best for the purpose that we use them now, i.e. vendoring dependencies that we don't have to update all that often. If we start using them for a more tightly coupled repository, then there will be a lot of churn, since you will need to have a new commit in the parent repo every time you want to bump the commit checked out from the submodule. Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

I think the second goal above can be accomplished by having a CI job as part of Cantera/cantera that clones and runs the notebooks from the cantera-jupyter repository, as @bryanwweber already mentioned. We already have a CI job that runs all of the Python examples, and the Fortran and C/C++ examples are already run as part of scons test.

For the first goal, rather than having the examples as a separately clonable Git repository, I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job. I think this would help with your concern that the examples aren't that easy to find once you've installed Cantera (who is going to go digging through ~/.conda/envs/whatever/lib/python3.10/site-packages/cantera/examples/...?). With this, we could easily have available both examples for each release version, as well as a version corresponding to the current main branch. I guess in this case, it would be possible to move the Python examples out of interfaces/cython/cantera and into the samples folder, but I think that's kind of secondary.

Regarding other examples such as those from the workshops, I think there's a couple of (non-mutually-exclusive) options:

  1. Recognize them as being historical artifacts that worked with a particular version of Cantera and may not work with any future version
  2. For the ones that are most useful, bring them into the cantera-jupyter repository
  3. Keep a "live" repository for workshop notebooks that we expect to keep using in future workshops that also gets tested as part of the CI for Cantera/cantera.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 17, 2022

One of the main reasons for keeping the Jupyter examples out of the main repo is because Jupyter notebooks really don't play nice with Git and tend to result in very messy histories. Keeping them in a repository by themselves helps us keep that mess isolated.

I have used nbsphinx with good success elsewhere to create rendered notebooks, while saving notebooks themselves with none of the cells executed. This keeps the diffs small.

PS: adding the jupyter notebooks directly to the website would be imho a nice feature.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 17, 2022

I don't think submodules are a great solution for this kind of scenario. They're really best for the purpose that we use them now, i.e. vendoring dependencies that we don't have to update all that often. If we start using them for a more tightly coupled repository, then there will be a lot of churn, since you will need to have a new commit in the parent repo every time you want to bump the commit checked out from the submodule. Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

Fair enough.

I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job. I think this would help with your concern that the examples aren't that easy to find once you've installed Cantera (who is going to go digging through ~/.conda/envs/whatever/lib/python3.10/site-packages/cantera/examples/...?).

👍

I guess in this case, it would be possible to move the Python examples out of interfaces/cython/cantera and into the samples folder, but I think that's kind of secondary.

This would address #51 (which means that I'd still strongly be in favor).

@bryanwweber
Copy link
Member

bryanwweber commented Jan 17, 2022

I think the second goal above can be accomplished by having a CI job as part of Cantera/cantera that clones and runs the notebooks from the cantera-jupyter repository, as @bryanwweber already mentioned.

@speth I actually think this would be better going the other way, waiting for the Conda packages to be built on each push to main and then installing the package and running the Jupyter notebooks via a CI job in Cantera/cantera-jupyter. But the goal is the same 😄

Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

@ischoegl This was also my concern with having a submodule that's regularly updated in the main Cantera/cantera. Even with your example of the website, the only folks that have to do both pull requests are de facto creating a larger feature so that we have the motivation to coach them through the process. On the other hand, I sincerely doubt that Cantera/cantera#1173 would have happened correctly if the examples were in a submodule. Anyway, all of which is to say I agree with @speth that submodule is not appropriate here (regardless of whatever else happens with the examples).

I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job.

I don't really see a benefit of this, since the files can be downloaded from the website, but I suppose it wouldn't hurt.

@ischoegl
Copy link
Member Author

ischoegl commented Jan 17, 2022

I figure there's is disagreement all over the place 😂. I rest my case about submodules.

Fwiw, the issue remains that Python examples are well hidden. For jupyter, I believe using nbsphinx to render them on the website, while moving non-executed notebooks to the main repo would be a potential improvement (which means that the Cantera/cantera-jupyter repo could be phased out).

@speth
Copy link
Member

speth commented Jan 17, 2022

@speth I actually think this would be better going the other way, waiting for the Conda packages to be built on each push to main and then installing the package and running the Jupyter notebooks via a CI job in Cantera/cantera-jupyter. But the goal is the same 😄

I'm not going to pretend to understand what kind of behavior can be triggered by a push to Cantera/cantera, but if this workflow is possible, then sure. I guess the effect of what you're suggesting is that this would only run when a PR is merged, not on every update to every open PR? I think that makes sense, but I'm curious where the information that the the CI job to test the Jupyter examples goes -- that is, how do we see / get notified if it fails?

I don't really see a benefit of this, since the files can be downloaded from the website, but I suppose it wouldn't hurt.

I was thinking that the benefit was that you could download all the examples at once this way, and just extract them into your homedir or whatever location you liked.

PS: adding the jupyter notebooks directly to the website would be imho a nice feature.

We already have the rendered Jupyter notebooks available on the website. Is there something wrong with this: https://cantera.org/examples/jupyter/reactors/NonIdealShockTube.ipynb.html that I'm not seeing?

@ischoegl
Copy link
Member Author

We already have the rendered Jupyter notebooks available on the website. Is there something wrong with this: https://cantera.org/examples/jupyter/reactors/NonIdealShockTube.ipynb.html that I'm not seeing?

whoops. didn't recall that! But the point is that nbsphinx will generate output from non-executed notebooks, which play 'nice' with git.

@speth
Copy link
Member

speth commented Jan 18, 2022

whoops. didn't recall that! But the point is that nbsphinx will generate output from non-executed notebooks, which play 'nice' with git.

I see -- being able to keep the cantera-jupyter repository as just being non-executed notebooks would probably be beneficial in terms of keeping the source modifications more trackable. But I still wouldn't want to put them in the main repository. My experience is that it's still very easy to get lots of spurious changes, even from just using different versions of Jupyter to edit the notebooks.

@bryanwweber
Copy link
Member

But the point is that nbsphinx will generate output from non-executed notebooks

This is only true if nbsphinx is allowed to execute the notebooks, which means that the execution environment must contain all the dependencies of the notebooks. Although this wouldn't be too difficult with conda, it would substantially increase the time it takes to run builds of the website.

There are also services that enable nicer diffing of Notebooks in the GitHub interface, for example: https://www.reviewnb.com/ (from a quick Google search)

I don't have a super strong preference one way or the other, but I thought it would be worth pointing out.

@ischoegl
Copy link
Member Author

Although this wouldn't be too difficult with conda, it would substantially increase the time it takes to run builds of the website.

Setup isn't too bad, and I don't think the build time is a game stopper.

I've actually done this with ctwrap, see microcombustion/ctwrap. Got all excited about using that in one of my classes last year, which was promptly canceled. As I result, I still haven't updated to Cantera 2.51 😢

@ischoegl
Copy link
Member Author

@speth and @bryanwweber: thanks for your input here! I believe that the outcome of the discussion is already covered by #51 - closing.

@decaluwe
Copy link
Member

I was thinking that the benefit was that you could download all the examples at once this way, and just extract them into your homedir or whatever location you liked.

Just weighing in to agree with @speth here. Different users will have different preferences, but I personally don't want to hop around a website. Just give me a file folder and let me explore its contents. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request New feature request
Projects
None yet
Development

No branches or pull requests

4 participants