Make Cantera examples a submodule / consolidate with Jupyter #129

ischoegl · 2022-01-17T00:13:26Z

Abstract

At the moment, Python examples are installed deep in the 'guts' of cantera, and thus are relatively hard to find. When installing from source, the installation location is also not consistent with other languages, see #51.

Beyond, there already is a separate Cantera/cantera-jupyter repository (as well as examples from workshops). These examples are currently not covered by CI testing.

One idea would be to create a dedicated repository for Cantera examples that can be cloned from GitHub. The repository should be included as a submodule in the main Cantera/cantera repo, so things are installed together when compiling from source and can be tested by CI (as already done at the moment).

Motivation

Describe the need for the proposed change:

What problem is it trying to solve? ... make examples easier to find / consolidate Jupyter and Python.
Who is affected by the change? ... anyone interested in examples
Why is this a good solution? ... consolidate different example locations and ensure that everything is covered by CI.

The text was updated successfully, but these errors were encountered:

bryanwweber · 2022-01-17T00:59:17Z

Why is this a good solution? ... consolidate different example locations and ensure that everything is covered by CI.

Thanks for the suggestion @ischoegl! I agree that the current location of the Python examples is probably suboptimal for anyone wanting to learn Cantera from those examples.

However, I think moving the examples to a separate repo would be even worse from a developer ergonomics perspective, especially for new contributors trying to figure out where files are located.

I actually wonder whether it makes sense to package the examples at all. I suppose it doesn't hurt anything to do so, but I'd be very surprised if anyone looked at the installed examples instead of the website. It feels like packaging examples comes from the days before easy internet access to documentation...

With that in mind, I don't think I'd be opposed to relocating the examples in the source to a common folder. I think this was discussed before though...

bryanwweber · 2022-01-17T01:02:05Z

Fwiw, the commit of mine that @speth included in Cantera/cantera-jupyter#33 was the start of me trying to set up CI for that repo. It is pretty easy to trigger CI for another repo, see my workflows branch on Cantera/Cantera (sorry, on my phone, so a link is hard).

ischoegl · 2022-01-17T01:41:34Z

I actually wonder whether it makes sense to package the examples at all. I suppose it doesn't hurt anything to do so, but I'd be very surprised if anyone looked at the installed examples instead of the website. It feels like packaging examples comes from the days before easy internet access to documentation...

I tend to agree. I originally thought of proposing a separate package (for pip or conda), but realized that moving things to a stand-alone repo makes a lot more sense. Whether or not this is included as a submodule is secondary, especially as examples cannot be run from scons. The important thing is that they need to be covered by CI, and if this can be triggered from the main repo without making this a submodule, there really isn't much of a reason to include them.

PS: The only reason why I would still advocate for setting things up as a submodule are C++/Fortran examples, as they do depend on a working Cantera installation (where it may be harder to make everything work if things aren't compiled from source). This is mainly from the perspective of avoiding chatter on the UG.

PPS:

However, I think moving the examples to a separate repo would be even worse from a developer ergonomics perspective, especially for new contributors trying to figure out where files are located.

I believe the links created for submodules are actually really easy to follow, e.g. cantera/ext. One solution for this proposal would be to link directly to the submodule from cantera/samples.

bryanwweber · 2022-01-17T04:27:59Z

I believe the links created for submodules are actually really easy to follow, e.g. cantera/ext. One solution for this proposal would be to link directly to the submodule from cantera/samples.

Fair enough in terms of finding the files. However, updating the examples would then require two pull requests, one to actually change the example and another to update the submodule commit in Cantera/cantera. I would not like that workflow, personally. Plus it would double the work for maintainers 🤪

I think the main point you're making is that the examples should be run by CI, right? Why can't that be done if the example files are still located in Cantera/cantera? And the Jupyter examples can be triggered by a commit to main?

Or is the concern more that all the examples we have created over the years should be collected in one place, including the Jupyter examples, workshop examples, and all the examples in Cantera/cantera? This case makes more sense to me to have a single "examples" repo, especially if the examples (for all the interfaces) were not going to be installed any more. It feels weird to have a project repo with no examples, but I'm starting to wrap my head around it 🤓 As far as CI, we can have Conda packages to make that a little bit easier. You can see in my workflows branch that the PyPI builds will run on every commit to main... I'm planning something similar for the Conda packages, so that shouldn't be too hard.

ischoegl · 2022-01-17T16:50:15Z

Or is the concern more that all the examples we have created over the years should be collected in one place, including the Jupyter examples, workshop examples, and all the examples in Cantera/cantera? This case makes more sense to me to have a single "examples" repo, especially if the examples (for all the interfaces) were not going to be installed any more.

Yes, this is my point.

However, updating the examples would then require two pull requests, one to actually change the example and another to update the submodule commit in Cantera/cantera.

I believe this is similar to what is already necessary for any substantial code change that affects Cantera/cantera-website. In most instances, I believe that examples are an 'afterthought', regardless of them being extremely important. I guess the sample repo would require its own CI, which can be done. But it would still have to be triggered whenever Cantera/cantera is changed.

speth · 2022-01-17T17:44:58Z

I think the following are good goals to work toward:

Make it easier to find the examples, for all user interfaces
Make sure that all examples are covered by CI testing

However, I strongly disagree that any examples should be moved out of the main repository. The examples should be continuously updated to keep pace with the main branch, and the easiest way to do that is if they can be updated in the same commits and as part of the same PRs as the base library.

One of the main reasons for keeping the Jupyter examples out of the main repo is because Jupyter notebooks really don't play nice with Git and tend to result in very messy histories. Keeping them in a repository by themselves helps us keep that mess isolated.

I don't think submodules are a great solution for this kind of scenario. They're really best for the purpose that we use them now, i.e. vendoring dependencies that we don't have to update all that often. If we start using them for a more tightly coupled repository, then there will be a lot of churn, since you will need to have a new commit in the parent repo every time you want to bump the commit checked out from the submodule. Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

I think the second goal above can be accomplished by having a CI job as part of Cantera/cantera that clones and runs the notebooks from the cantera-jupyter repository, as @bryanwweber already mentioned. We already have a CI job that runs all of the Python examples, and the Fortran and C/C++ examples are already run as part of scons test.

For the first goal, rather than having the examples as a separately clonable Git repository, I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job. I think this would help with your concern that the examples aren't that easy to find once you've installed Cantera (who is going to go digging through ~/.conda/envs/whatever/lib/python3.10/site-packages/cantera/examples/...?). With this, we could easily have available both examples for each release version, as well as a version corresponding to the current main branch. I guess in this case, it would be possible to move the Python examples out of interfaces/cython/cantera and into the samples folder, but I think that's kind of secondary.

Regarding other examples such as those from the workshops, I think there's a couple of (non-mutually-exclusive) options:

Recognize them as being historical artifacts that worked with a particular version of Cantera and may not work with any future version
For the ones that are most useful, bring them into the cantera-jupyter repository
Keep a "live" repository for workshop notebooks that we expect to keep using in future workshops that also gets tested as part of the CI for Cantera/cantera.

ischoegl · 2022-01-17T17:56:00Z

One of the main reasons for keeping the Jupyter examples out of the main repo is because Jupyter notebooks really don't play nice with Git and tend to result in very messy histories. Keeping them in a repository by themselves helps us keep that mess isolated.

I have used nbsphinx with good success elsewhere to create rendered notebooks, while saving notebooks themselves with none of the cells executed. This keeps the diffs small.

PS: adding the jupyter notebooks directly to the website would be imho a nice feature.

ischoegl · 2022-01-17T17:59:59Z

I don't think submodules are a great solution for this kind of scenario. They're really best for the purpose that we use them now, i.e. vendoring dependencies that we don't have to update all that often. If we start using them for a more tightly coupled repository, then there will be a lot of churn, since you will need to have a new commit in the parent repo every time you want to bump the commit checked out from the submodule. Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

Fair enough.

I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job. I think this would help with your concern that the examples aren't that easy to find once you've installed Cantera (who is going to go digging through ~/.conda/envs/whatever/lib/python3.10/site-packages/cantera/examples/...?).

👍

I guess in this case, it would be possible to move the Python examples out of interfaces/cython/cantera and into the samples folder, but I think that's kind of secondary.

This would address #51 (which means that I'd still strongly be in favor).

bryanwweber · 2022-01-17T18:03:01Z

I think the second goal above can be accomplished by having a CI job as part of Cantera/cantera that clones and runs the notebooks from the cantera-jupyter repository, as @bryanwweber already mentioned.

@speth I actually think this would be better going the other way, waiting for the Conda packages to be built on each push to main and then installing the package and running the Jupyter notebooks via a CI job in Cantera/cantera-jupyter. But the goal is the same 😄

Even with our infrequent updates to submodule versions, I've seen these get messed up in various pull requests where users end up inadvertently updating or reverting changes to the submodules.

@ischoegl This was also my concern with having a submodule that's regularly updated in the main Cantera/cantera. Even with your example of the website, the only folks that have to do both pull requests are de facto creating a larger feature so that we have the motivation to coach them through the process. On the other hand, I sincerely doubt that Cantera/cantera#1173 would have happened correctly if the examples were in a submodule. Anyway, all of which is to say I agree with @speth that submodule is not appropriate here (regardless of whatever else happens with the examples).

I think it would be easier for users if we just generated a .zip file with all of the examples as a CI job.

I don't really see a benefit of this, since the files can be downloaded from the website, but I suppose it wouldn't hurt.

ischoegl · 2022-01-17T18:06:26Z

I figure there's is disagreement all over the place 😂. I rest my case about submodules.

Fwiw, the issue remains that Python examples are well hidden. For jupyter, I believe using nbsphinx to render them on the website, while moving non-executed notebooks to the main repo would be a potential improvement (which means that the Cantera/cantera-jupyter repo could be phased out).

speth · 2022-01-17T18:20:05Z

@speth I actually think this would be better going the other way, waiting for the Conda packages to be built on each push to main and then installing the package and running the Jupyter notebooks via a CI job in Cantera/cantera-jupyter. But the goal is the same 😄

I'm not going to pretend to understand what kind of behavior can be triggered by a push to Cantera/cantera, but if this workflow is possible, then sure. I guess the effect of what you're suggesting is that this would only run when a PR is merged, not on every update to every open PR? I think that makes sense, but I'm curious where the information that the the CI job to test the Jupyter examples goes -- that is, how do we see / get notified if it fails?

I don't really see a benefit of this, since the files can be downloaded from the website, but I suppose it wouldn't hurt.

I was thinking that the benefit was that you could download all the examples at once this way, and just extract them into your homedir or whatever location you liked.

PS: adding the jupyter notebooks directly to the website would be imho a nice feature.

We already have the rendered Jupyter notebooks available on the website. Is there something wrong with this: https://cantera.org/examples/jupyter/reactors/NonIdealShockTube.ipynb.html that I'm not seeing?

ischoegl · 2022-01-17T18:46:16Z

We already have the rendered Jupyter notebooks available on the website. Is there something wrong with this: https://cantera.org/examples/jupyter/reactors/NonIdealShockTube.ipynb.html that I'm not seeing?

whoops. didn't recall that! But the point is that nbsphinx will generate output from non-executed notebooks, which play 'nice' with git.

speth · 2022-01-18T02:53:16Z

whoops. didn't recall that! But the point is that nbsphinx will generate output from non-executed notebooks, which play 'nice' with git.

I see -- being able to keep the cantera-jupyter repository as just being non-executed notebooks would probably be beneficial in terms of keeping the source modifications more trackable. But I still wouldn't want to put them in the main repository. My experience is that it's still very easy to get lots of spurious changes, even from just using different versions of Jupyter to edit the notebooks.

bryanwweber · 2022-01-18T03:08:37Z

But the point is that nbsphinx will generate output from non-executed notebooks

This is only true if nbsphinx is allowed to execute the notebooks, which means that the execution environment must contain all the dependencies of the notebooks. Although this wouldn't be too difficult with conda, it would substantially increase the time it takes to run builds of the website.

There are also services that enable nicer diffing of Notebooks in the GitHub interface, for example: https://www.reviewnb.com/ (from a quick Google search)

I don't have a super strong preference one way or the other, but I thought it would be worth pointing out.

ischoegl · 2022-01-18T03:45:48Z

Although this wouldn't be too difficult with conda, it would substantially increase the time it takes to run builds of the website.

Setup isn't too bad, and I don't think the build time is a game stopper.

I've actually done this with ctwrap, see microcombustion/ctwrap. Got all excited about using that in one of my classes last year, which was promptly canceled. As I result, I still haven't updated to Cantera 2.51 😢

ischoegl · 2022-01-19T01:48:01Z

@speth and @bryanwweber: thanks for your input here! I believe that the outcome of the discussion is already covered by #51 - closing.

decaluwe · 2022-01-19T17:04:50Z

I was thinking that the benefit was that you could download all the examples at once this way, and just extract them into your homedir or whatever location you liked.

Just weighing in to agree with @speth here. Different users will have different preferences, but I personally don't want to hop around a website. Just give me a file folder and let me explore its contents. :)

ischoegl added the feature-request New feature request label Jan 17, 2022

ischoegl changed the title ~~Move examples into sub-repository / consolidate with Jupyter~~ Move examples into submodule / consolidate with Jupyter Jan 17, 2022

ischoegl changed the title ~~Move examples into submodule / consolidate with Jupyter~~ Make Cantera examples a submodule / consolidate with Jupyter Jan 17, 2022

bryanwweber mentioned this issue Jan 18, 2022

Add package workflow Cantera/cantera#1158

Merged

5 tasks

ischoegl closed this as completed Jan 19, 2022

This was referenced Jan 19, 2022

Make Python sample location more accessible #51

Closed

Automatically push copies of Cantera examples to dedicated repository Cantera/cantera#1177

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Cantera examples a submodule / consolidate with Jupyter #129

Make Cantera examples a submodule / consolidate with Jupyter #129

ischoegl commented Jan 17, 2022

bryanwweber commented Jan 17, 2022

bryanwweber commented Jan 17, 2022

ischoegl commented Jan 17, 2022 •

edited

bryanwweber commented Jan 17, 2022

ischoegl commented Jan 17, 2022 •

edited

speth commented Jan 17, 2022

ischoegl commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited

bryanwweber commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited

speth commented Jan 17, 2022

ischoegl commented Jan 17, 2022

speth commented Jan 18, 2022

bryanwweber commented Jan 18, 2022

ischoegl commented Jan 18, 2022

ischoegl commented Jan 19, 2022

decaluwe commented Jan 19, 2022

Make Cantera examples a submodule / consolidate with Jupyter #129

Make Cantera examples a submodule / consolidate with Jupyter #129

Comments

ischoegl commented Jan 17, 2022

bryanwweber commented Jan 17, 2022

bryanwweber commented Jan 17, 2022

ischoegl commented Jan 17, 2022 • edited

bryanwweber commented Jan 17, 2022

ischoegl commented Jan 17, 2022 • edited

speth commented Jan 17, 2022

ischoegl commented Jan 17, 2022 • edited

ischoegl commented Jan 17, 2022 • edited

bryanwweber commented Jan 17, 2022 • edited

ischoegl commented Jan 17, 2022 • edited

speth commented Jan 17, 2022

ischoegl commented Jan 17, 2022

speth commented Jan 18, 2022

bryanwweber commented Jan 18, 2022

ischoegl commented Jan 18, 2022

ischoegl commented Jan 19, 2022

decaluwe commented Jan 19, 2022

ischoegl commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited

bryanwweber commented Jan 17, 2022 •

edited

ischoegl commented Jan 17, 2022 •

edited