Teaching, training, documentation, and coordinating documentation #7

jpivarski · 2023-06-28T01:04:47Z

Helping users find and understand the software they need, keeping documentation up to date, and documenting procedures that cut across multiple packages.

jpivarski · 2023-06-29T13:02:53Z

From @alexander-held in #9 (comment):

how do users get help for things they cannot find in documentation?

That's an important point that I didn't mention anywhere, in any group descriptions. It sounds to me like it should go in #7 (here). Having places to ask questions accounts for the fact that teaching, training, and documentation won't be all-inclusive; there will always be something that isn't clear. But then, knowing where those places are is also something to be learned from the training or documentation.

I'd say that discussions about real-time chat (Gitter, Discord, etc.) and help message boards (Discourse, GitHub Discussions, StackOverflow, etc.) should be included in discussions about teaching/training/documentation, and therefore be in this group.

amangoel185 · 2023-07-06T01:49:11Z

I'd be interested in this group!

Might also overlap indirectly with #6 (as is naturally the case with many of the topics): how open-source work is delegated, how much time can be officially dedicated to training, documentation, etc.

btovar · 2023-07-08T11:43:49Z

+1

JMolinaHN · 2023-07-11T16:51:50Z

Very interesting topic. I am interested in participating.

mattbellis · 2023-07-15T21:20:54Z

+1

ph-ilten · 2023-07-18T20:10:22Z

+1 Interested in coordinating tutorials and in the best practices people are using for tutorials.

ianna · 2023-07-19T15:47:32Z

+1

klieret · 2023-07-19T16:03:32Z

+1 interested.

Another topic that wasn't directly mentioned yet: How do we deal with Binder having reduced resources? Can Codespaces replace Binder (at HSF Training we tested it successfully recently with ROOT and Scikit-HEP)? What other options do we have?

mattbellis · 2023-07-19T20:18:23Z

@klieret I've never tried it with ROOT, but I've had good luck with Google Colab and installing uproot and awkward. Here's a sample that we've used to show people how to access some CMS open data files, converted to nanoAOD+, that we are hosting on Google Cloud Storage, as a test case.

https://colab.research.google.com/drive/16XiPu8W_1RQox-B6VeEcDuCmBfDngVqH?usp=sharing

It doesn't solve all issues, and like many things Google-related, who knows how long it will be available. :) But from an educator's viewpoint, it's a great tool.

mdsokoloff · 2023-07-19T21:06:08Z

> @klieret I've never tried it with ROOT, but I've had good luck with Google Colab and installing uproot and awkward. Here's a sample that we've used to show people how to access some CMS open data files, converted to nanoAOD+, that we are hosting on Google Cloud Storage, as a test case.
>
> https://colab.research.google.com/drive/16XiPu8W_1RQox-B6VeEcDuCmBfDngVqH?usp=sharing
>
> It doesn't solve all issues, and like many things Google-related, who knows how long it will be available. :) But from an educator's viewpoint, it's a great tool.

I have been working with high school interns doing LHCb analysis. We started using Jupyter notebooks on local servers. They prefer to work in Google Colab (and are doing so). They have been using uproot, awkward, and iminuit with .root files on servers. I still prefer Jupyter notebooks on a private server (or personal computer), but Colab lets them work together more easily.

clelange · 2023-07-25T13:36:50Z

I'd be interested in discussing deprecating documentation and ensuring the latest and greatest is the actual entrypoint. A lot of people find outdated documentation and examples and waste their time trying to understand why it doesn't work for them...

agoose77 · 2023-07-25T15:06:37Z

@nsmith- here are a few links pertaining to MyST Markdown:

MyST is a spec for a Markdown flavour, and also a brand for open-source tools (https://mystmd.org/guide/quickstart-jupyter-lab-myst)

Jupyter Book builds web "books" using Sphinx, and can read MyST, execute and render notebooks, and integrate with existing Sphinx projects (often by dropping the top-level Jupyter Book CLI and using the Sphinx components). You can add tags to cells to make them drop-downs, just as you can add tags to MyST admonitions to do the same for non-cells.
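To make the cell-tag mechanism concrete, here is a minimal sketch that adds the `hide-input` tag, one of the cell tags Jupyter Book documents for collapsing a cell's source behind a toggle, by editing the `.ipynb` JSON structure directly. The helper `add_tag` and the tiny in-memory notebook are hypothetical; in practice tags are usually added in the notebook editor rather than by script.

```python
import json


def add_tag(notebook: dict, tag: str, cell_indices: list[int]) -> dict:
    """Add `tag` to metadata.tags of the selected cells (no duplicates)."""
    for i in cell_indices:
        cell = notebook["cells"][i]
        tags = cell.setdefault("metadata", {}).setdefault("tags", [])
        if tag not in tags:
            tags.append(tag)
    return notebook


# A tiny in-memory notebook in nbformat-4 layout (illustrative only).
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}, "source": ["import uproot\n"],
         "outputs": [], "execution_count": None},
        {"cell_type": "markdown", "metadata": {}, "source": ["# Results\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

add_tag(nb, "hide-input", [0])
print(json.dumps(nb["cells"][0]["metadata"]))  # → {"tags": ["hide-input"]}
```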

henryiii · 2023-07-25T16:04:05Z

+1

See https://learn.scientific-python.org/development/guides/docs/

mattbellis · 2023-07-25T17:29:25Z

> I'd be interested in discussing deprecating documentation and ensuring the latest and greatest is the actual entrypoint. A lot of people find outdated documentation and examples and waste their time trying to understand why it doesn't work for them...

+1000 to this @clelange

klieret · 2023-07-27T16:17:25Z

Live notes from Tuesday's discussion

Tuesday session: User experience for physics data analysis tools/Documentation/Training

Present: Matthew Bellis, Angus, Oksana, Mason, Aman, Zoe, Kilian, Ben, Remco, Clemens, Josue, Benjamin, Juraj

gh issue

Questions & discussion

Documentation

How to connect the different resources? Scikit-HEP is decentralized development, but central vision.

Relation between different kinds of material: Diataxis: https://diataxis.fr/ - lots of work but can be used as an overall guide

Training

How to adapt to different prerequisite knowledges?

How to make training discoverable?

HSF Training Center

Have Plausible web analytics on all of our training material; use it to investigate discoverability/user behavior

Had GSoC proposal to rebuild this in a more dynamic website able to list more and filter it by need (and some alpha-versions were made that we could start from)

Need to find good balance between too narrow in focus and too wide

Negative example for "too wide": "Awesome lists" that keep on expanding and stop being useful

Should we provide our own Binder service?

Contribution

Should we have a mechanism for third-parties to contribute documentation to specific repositories from a centralised source?

How can we get users (especially new users) to write documentation?

How can we break down hurdles:

Lean more heavily on web IDEs (GitHub Codespaces etc.) for PRs

Essential to give users the option to give feedback quickly (without having to create issues/pull requests). Ideal: feedback/comment button. Example sphinx-comments

Could people earn "karma"? E.g. become official contributors to the project if they report a documentation issue.

Angus: Let's come up with a "social ethos" to help build a community of users. How to build a community?

Matthew: People are very shy/apprehensive about doing things "in public". Could there be a sandbox for it? Or having private tickets (also think of GDPR)? Philip: Pythia does this.

Matthew: People may start to prefer video resources rather than reading material. One problem is that it is harder to keep video up-to-date.

Kilian: There are already some prototype videos documenting some HSF Training modules (like Docker etc.).

Juraj: Tutorials run from CI.

Angus: create stubs:

Users can see which topics are already identified

New developers can see where to start contributing

Long-term developers can contribute in free time

User feedback:

How can we hear back from the students?

Other ideas/suggestions:

Benjamin: Workshop about "what to do if there's no documentation?"

Philip: This discussion sounds similar to HEP Forge: find packages, link them together on a page. Are we repeating history? What can we learn from it? Reasons HEP Forge failed:

ran out of funding

overambitious: wanted to be more than just an organizing/discoverability project but wanted to solve versioning (and failed)

switch from SVN to Phabricator/git (and lost people in the switch)

Running code:

Angus: Should we have central "Binder" service?

Making our lives easier:

Have more things as prerequisites to even out the level of participants and to make sure we don't lose time with "trivial things"

Oksana: How can we train developers to engage users and write proper documentation?

How to write:

Conclusions

Discoverability: Need to link resources together and make them discoverable:

Documentation/training material should be interlinked

HSF Training Center can be expanded to make tutorials discoverable. Must strike balance between notable/maintained and inclusive. Considerations:

Resources have to be curated. Set minimum standards for quality and notability.

Be clear about scope, don't be HEP forge or one of the awesome-xyz lists

Plausible can be used to understand where users come from/where they go

Contributions: Making it easier to contribute/give comments (rather than opening PRs): Options include:

teaching people about GitHub Web IDEs for simple PRs

include more feedback/comment buttons (like sphinx-comments), ideally also anonymous.

Prerequisites: Having prerequisites for workshops can "even out" experience levels of people and avoid "trivial questions"

Maintainability: guaranteeing that code examples work (see also CI remark; documentation from Jupyter notebooks) and that interlinking is correct (e.g. linkcheck).

Hacking away & other ideas:

Regular "documentation day"

Half-day workshop as part of PyHEP to help developers write good (or any at all) user guides/documentation. Could also use that to get everyone to interlink things.

To be continued on Thursday

klieret · 2023-07-31T13:42:50Z

Final notes (restructured and including material from plenary session on Thu):

Teaching, training, documentation and coordination

Present Tuesday session: Matthew Bellis, Angus, Oksana, Mason, Aman, Zoe, Kilian, Ben, Remco, Clemens, Josue, Benjamin, Juraj

gh issue

Questions & discussion

Documentation interlinking:

How to connect the different resources (how-to guides, training)? Scikit-HEP is decentralized development, but central vision. Possible solution: analysis gallery

Analysis Gallery: Should there be a central "analysis" gallery?

LHCb StarterKit Analysis Essentials has an example analysis for beginners that brings everything together

Astropy also has learn.astropy.org, a collection of notebooks that serve as how-to guides

Matthew: Does the complexity of our "real analyses" map onto such simple examples?

Matt: Also, Astropy is just a single package (with the majority of its field behind it)

Matthew: ROOT has gallery of examples. Could convert these. (https://root.cern/doc/master/group__Tutorials.html)

Alex: How to curate? What's the threshold between "too simple" and "too hard"?

Aman: Could have an interactive "map" that can be clickable and links to the documentation

Jim: Might host training and documentation together

Clemens: Could have "learning paths"

Discoverability

How to make training discoverable?

HSF Training Center

Have Plausible web analytics on all of our training material; use it to investigate discoverability/user behavior

Had GSoC proposal to rebuild this in a more dynamic website able to list more and filter it by need (and some alpha-versions were made that we could start from)

Need to find good balance between too narrow in focus and too wide

Negative example for "too wide": "Awesome lists" that keep on expanding and stop being useful

How to interlink different documentations/trainings

Relation between different kinds of material: Diataxis: https://diataxis.fr/ - lots of work but can be used as an overall guide

Getting people to contribute & getting user feedback

Should we have a mechanism for third-parties to contribute documentation to specific repositories from a centralised source?

How can we get users (especially new users) to write documentation?

How to add a "comment" box for notebooks:

Jim: Could have link to gh issue/hackmd/etc.

How to notify the developer?

Could use Hypothesis (hypothes.is) annotations
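Jim's "link to a GitHub issue" idea needs no extra service: GitHub's `/issues/new` page accepts `title` and `body` query parameters, so a docs page or notebook can carry a link that opens a prefilled issue. A minimal sketch, assuming a hypothetical page identifier `user/intro`; the repository is simply the one mentioned later in this thread, used as an example, and the body template wording is made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def feedback_issue_url(owner: str, repo: str, page: str) -> str:
    """Return a GitHub 'new issue' link with a prefilled title and body.

    GitHub reads the `title` and `body` query parameters on /issues/new;
    the template text below is only an illustrative example.
    """
    params = {
        "title": f"Docs feedback: {page}",
        "body": f"Page: {page}\n\nWhat was unclear, outdated, or broken?\n",
    }
    return f"https://github.com/{owner}/{repo}/issues/new?{urlencode(params)}"


# Example: a link that could sit at the bottom of a notebook or docs page.
url = feedback_issue_url("scikit-hep", "scikit-hep.github.io", "user/intro")
print(url)
# Round-trip check: the prefilled title survives URL encoding.
print(parse_qs(urlparse(url).query)["title"])  # → ['Docs feedback: user/intro']
```

Because the link only opens GitHub's own form, users still need a GitHub account; an anonymous channel (as suggested in the conclusions) would need something else.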

How can we break down hurdles:

Lean more heavily on web IDEs (GitHub Codespaces etc.) for PRs

Essential to give users the option to give feedback quickly (without having to create issues/pull requests). Ideal: feedback/comment button. Example sphinx-comments, sphinx-disqus (from this FAQ on RTD)

Could people earn "karma"? E.g. become official contributors to the project if they report a documentation issue.

Angus: Let's come up with a "social ethos" to help build a community of users. How to build a community?

Matthew: People are very shy/apprehensive about doing things "in public". Could there be a sandbox for it? Or having private tickets (also think of GDPR)? Philip: Pythia does this.

Juraj: Tutorials run from CI.

Angus: create stubs:

Users can see which topics are already identified

New developers can see where to start contributing

Long-term developers can contribute in free time

Training paradigms:

Matthew: People may start to prefer video resources rather than reading material. One problem is that it is harder to keep video up-to-date.

Kilian: There are already some prototype videos documenting some HSF Training modules (like Docker etc.).

Having more things as prerequisites to even out level of participants and to make sure we don't lose time with "trivial things"

Platforms for running code for training:

Angus: Should we have central "Binder" service?

Forum & chat

similar to ROOT forum?

Both chat and forum have merit

It has been over a year since we talked about this.

What to use for chat? Discord?

How to balance chat vs forum?

Jerry: Bot in chat that generates a Discourse post, initially invisible, for posterity.

Forum ~ stackoverflow-ish
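As one possible approach, Jerry's chat-to-forum bot could call Discourse's REST API: a POST to `/posts.json` that includes a `title` creates a new topic, authenticated with the `Api-Key` and `Api-Username` headers. The sketch below only builds the request and does not send it. The forum URL, bot user name, category id, and the `thread_to_markdown` formatting are all hypothetical, and keeping the new topic hidden at first would need an extra moderation step that is not shown.

```python
import json
import urllib.request


def thread_to_markdown(messages):
    """Turn a chat thread [(author, text), ...] into Markdown for a forum post."""
    return "\n\n".join(f"**{author}:** {text}" for author, text in messages)


def discourse_post_request(base_url, api_key, api_username, title, raw, category_id):
    """Build (but do not send) a POST to Discourse's /posts.json endpoint.

    A post with a `title` starts a new topic; `category` is the numeric
    category id.
    """
    payload = {"title": title, "raw": raw, "category": category_id}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/posts.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Api-Key": api_key,
            "Api-Username": api_username,
            "Content-Type": "application/json",
        },
        method="POST",
    )


thread = [
    ("alice", "How do I read a TTree into awkward arrays?"),
    ("bob", "Try the arrays() method on the uproot TTree object."),
]
req = discourse_post_request(
    "https://forum.example.org/",  # hypothetical forum URL
    api_key="REPLACE_ME",
    api_username="chat-archive-bot",  # hypothetical bot account
    title="Reading a TTree into awkward arrays",
    raw=thread_to_markdown(thread),
    category_id=5,  # hypothetical category id
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```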

Other ideas/suggestions:

Benjamin: Workshop about "what to do if there's no documentation?"

Philip: This discussion sounds similar to HEP Forge: find packages, link them together on a page. Are we repeating history? What can we learn from it? Reasons HEP Forge failed:

ran out of funding

overambitious: wanted to be more than just an organizing/discoverability project but wanted to solve versioning (and failed)

switch from SVN to Phabricator/git (and lost people in the switch)

Oksana: How can we train developers to engage users and write proper documentation?

Office hours similar to SciPy's? E.g. a recurring new-contributors meeting

✨ Conclusions & actionable items ✨

Discoverability: Need to link resources together and make them discoverable:

Documentation/training material should be interlinked

HSF Training Center can be expanded to make tutorials discoverable. Must strike balance between notable/maintained and inclusive. Considerations:

Resources have to be curated. Set minimum standards for quality and notability.

Be clear about scope, don't be HEP forge or one of the awesome-xyz lists

Plausible can be used to understand where users come from/where they go

Contributions: Making it easier to contribute/give comments (rather than opening PRs): Options include:

teaching people about GitHub Web IDEs for simple PRs

include more feedback/comment buttons (like sphinx-comments), ideally also anonymous.

include stubs for things that are missing in docs and should be filled in

Prerequisites: Having prerequisites for workshops can "even out" experience levels of people and avoid "trivial questions"

Feedback: Jim recommended directly implementing feedback buttons into notebooks (e.g., via Slido)

Maintainability: guaranteeing that code examples work (see also CI remark; documentation from Jupyter notebooks) and that interlinking is correct (e.g. linkcheck).

Hacking away & other ideas:

Regular "documentation day"

Half-day workshop as part of PyHEP to help developers write good (or any at all) user guides/documentation. Could also use that to get everyone to interlink things.
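For the maintainability point (guaranteeing that code examples work, e.g. tutorials run from CI), dedicated tools already exist, such as `jupyter nbconvert --to notebook --execute` or the `nbmake` pytest plugin, and Sphinx's `linkcheck` builder covers the interlinking side. To show what such a check does, here is a minimal stdlib-only sketch that executes a notebook's code cells in order and reports the first failure. The script name `check_notebooks.py` is hypothetical, and the sketch deliberately ignores IPython magics and real kernels.

```python
import json
import sys


def run_notebook_cells(notebook: dict) -> list[tuple[int, str]]:
    """Execute a notebook's code cells in order, sharing one namespace.

    Returns (cell index, error) pairs and stops at the first failure, since
    later cells usually depend on earlier ones. IPython magics such as
    `%pip` are not plain Python and would fail here.
    """
    namespace = {"__name__": "__main__"}
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files store source as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            failures.append((index, f"{type(exc).__name__}: {exc}"))
            break
    return failures


demo = {"cells": [
    {"cell_type": "code", "source": ["x = 2\n", "y = x * 21\n"]},
    {"cell_type": "markdown", "source": ["Some prose"]},
    {"cell_type": "code", "source": "assert y == 42\n"},
]}
print(run_notebook_cells(demo))  # → []

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage in CI: python check_notebooks.py tutorial1.ipynb tutorial2.ipynb
    exit_code = 0
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for index, error in run_notebook_cells(json.load(f)):
                print(f"{path}: cell {index}: {error}")
                exit_code = 1
    sys.exit(exit_code)
```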

Actionable items

Fork (or take inspiration from) https://learn.astropy.org

HSF Training has around 3-5 alternative "from scratch" implementations/PoCs of similar platforms that we could consider/start from. Might also rope in some of the GSoC candidates with JS knowledge.

Purpose is to provide a minimal (basis?) set of tutorials that provide a starting point for self-learning(?).

Do we split tutorial from guides at the top-level, or as a filter criterion?

Design an updated pipeline/map for AS (maybe even interactive) - https://iris-hep.org/as.html

Can be non-linear, to provide a better visual overview of how different packages integrate/can be used at different stages

Adding a place for videos (for the ones that have been presented live), and a place for time estimate: "This tutorial will take 10 minutes."

Easy pipeline for contributors to add new ones.
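The actionable items above (listing tutorials and guides, time estimates, optional videos, filtering by need, and an easy path for contributors) suggest a small, validated metadata record per entry. A minimal sketch, assuming hypothetical field names and example.org placeholder URLs; it is not the schema of the HSF Training Center or of learn.astropy.org.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tutorial:
    """One entry in a hypothetical training-center catalog."""
    title: str
    url: str
    kind: str  # "tutorial" or "guide": a filter criterion rather than a top-level split
    minutes: int  # estimated time to complete
    packages: list[str] = field(default_factory=list)
    video_url: Optional[str] = None  # recording, if it was presented live

    def blurb(self) -> str:
        return f"This {self.kind} will take {self.minutes} minutes."


def validate(entry: Tutorial) -> list[str]:
    """Minimum standards a contributed entry must meet before it is listed."""
    problems = []
    if entry.kind not in {"tutorial", "guide"}:
        problems.append(f"unknown kind {entry.kind!r}")
    if entry.minutes <= 0:
        problems.append("time estimate must be positive")
    if not entry.url.startswith("https://"):
        problems.append("url must use https")
    return problems


def find(catalog, *, kind=None, package=None, max_minutes=None):
    """Filter the catalog by need: kind, package used, and time budget."""
    return [
        t for t in catalog
        if (kind is None or t.kind == kind)
        and (package is None or package in t.packages)
        and (max_minutes is None or t.minutes <= max_minutes)
    ]


catalog = [  # placeholder entries, not real tutorials
    Tutorial("Reading ROOT files", "https://example.org/uproot-intro", "tutorial", 10, ["uproot", "awkward"]),
    Tutorial("Filling histograms", "https://example.org/hist-basics", "tutorial", 30, ["hist"]),
    Tutorial("Rebinning for datacards", "https://example.org/rebin", "guide", 20, ["uproot", "hist"]),
]
assert all(not validate(t) for t in catalog)
print([t.title for t in find(catalog, package="uproot", max_minutes=15)])  # → ['Reading ROOT files']
print(catalog[0].blurb())  # → This tutorial will take 10 minutes.
```

Running `validate` in CI on each contributed entry is one way to keep the "easy pipeline for contributors" compatible with the minimum quality standards from the conclusions.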

eduardo-rodrigues · 2023-08-10T14:30:27Z

Hello @klieret, all. Thank you for making this executive summary. Very handy for people not present and in general. Very instructive, too 👍.

You say above

> How to connect the different resources (how to guides, training)? Scikit-HEP is decentralized development, but central vision. Possible solution: Analysis gallery

What do you mean by decentralised? After all we have always been community-driven/-oriented.

Note also that the idea of an analysis gallery has been with us for a while, see scikit-hep/scikit-hep.github.io#108. It would be excellent if anyone would be willing to push the idea over the creation threshold.

agoose77 · 2023-08-10T14:56:24Z

I can elaborate slightly based upon my recollection.

The challenge with getting started in our ecosystem is that we operate in a very decentralised fashion, crudely on a per-package basis, with some developers maintaining a subset of packages with greater overview. But, users aren't likely to think about their problems in terms of Python packages when e.g. asking the question "how do you read a ROOT file from some CMS experiment and perform this analysis?". It would be nice if we had a central hub that gave users a starting point to

understand how the community is structured (packages, people, etc)
understand which tools they might need (tutorials, guides etc)

An analysis gallery is one aspect to this, but we also discussed a user forum.

pfackeldey · 2023-08-11T13:59:30Z

Hi @agoose77,

I'd like to add a comment/idea to your post which came to my mind after this discussion at PyHEP.dev.

We talked about how important workflow systems are because they encapsulate logical steps in an analysis and connect them in a graph (e.g. Luigi/law). This results in the pattern you describe: one logical step does not necessarily involve only one package of the ecosystem, and users (at least me) think in these logical steps. One exemplary step I have in mind is usually done in a typical (CMS) analysis workflow:

Before writing datacards (for combine) you have a bunch of histograms from your coffea processing step, and now you want to read them, rebin them, potentially apply some modifications (e.g. smoothing), and write them back into ROOT TH1. The steps are:

Read histograms from the coffea output (e.g. pickle) [uproot (reading)]
Calculate a new binning (+ e.g. smoothing) [numpy / numba]
Apply a new binning [boost-histogram / hist]
Write the rebinned histograms to ROOT TH1 [uproot (writing)]

I've added the packages necessary for this logical unit of an analysis in square brackets behind each step (at least that's what we used in our analysis). While each package has wonderful documentation about its own API and usage, there is no - or at least very few - documentation about such a whole logical unit of an analysis.
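As an illustration of documenting a whole logical unit, here is a plain-Python sketch of steps 2 and 3 (computing a new binning and applying it). It stands in for what boost-histogram/hist would do in a real analysis and only shows the bookkeeping: merged bins sum both their contents and their variances (sums of squared weights). The `edges_with_min_content` rule (close a bin once it holds at least a minimum content, fold any leftover tail into the last bin) is a hypothetical example, not the binning used in the analysis described above; reading (step 1) and writing TH1s with uproot (step 4) are left out.

```python
def edges_with_min_content(edges, values, min_content):
    """Hypothetical binning rule (step 2).

    Walk from the left and close a bin as soon as it holds at least
    `min_content`; any leftover tail is merged into the last closed bin.
    """
    new_edges = [edges[0]]
    running = 0
    for i, value in enumerate(values):
        running += value
        if running >= min_content:
            new_edges.append(edges[i + 1])
            running = 0
    if new_edges[-1] != edges[-1]:
        if len(new_edges) > 1:
            new_edges[-1] = edges[-1]  # fold the tail into the previous bin
        else:
            new_edges.append(edges[-1])  # nothing reached min_content: one bin
    return new_edges


def rebin(edges, values, variances, new_edges):
    """Merge adjacent bins so the histogram uses `new_edges` (step 3).

    `new_edges` must be an increasing subset of `edges` with the same first
    and last edge. Merged bins sum both values and variances.
    """
    if new_edges[0] != edges[0] or new_edges[-1] != edges[-1]:
        raise ValueError("new_edges must span the same range as edges")
    position = {edge: i for i, edge in enumerate(edges)}
    try:
        cuts = [position[edge] for edge in new_edges]
    except KeyError as err:
        raise ValueError(f"{err.args[0]} is not an existing bin edge") from None
    new_values, new_variances = [], []
    for lo, hi in zip(cuts, cuts[1:]):
        if hi <= lo:
            raise ValueError("new_edges must be strictly increasing")
        new_values.append(sum(values[lo:hi]))
        new_variances.append(sum(variances[lo:hi]))
    return list(new_edges), new_values, new_variances


# Toy unweighted histogram: 5 bins on [0, 5]; variances equal counts.
edges = [0, 1, 2, 3, 4, 5]
values = [1, 4, 2, 2, 1]
variances = [1, 4, 2, 2, 1]

new_edges = edges_with_min_content(edges, values, min_content=4)
print(rebin(edges, values, variances, new_edges))  # → ([0, 2, 5], [5, 5], [5, 5])
```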

Thinking very far in the future now: It might be cool to have a sketch like this:

but rather arranged in typical logical units of a full analysis, where users can click on a logical unit and see (real world) examples how a multitude of packages from the ecosystem can be used to solve these common steps.

At the same time this arrangement/sketch might be encouraging for users to use and think in workflows for their analysis as it is already arranged like this.
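One way to picture such a map of logical units is as a small dependency graph in which each step lists the packages it uses. The sketch below encodes Peter's rebinning unit with Python's standard-library `graphlib` (Python 3.9+) to derive an execution order and the set of packages involved, which is the information a clickable map would link to. It is a toy stand-in for a real workflow system such as Luigi/law, not their API, and the dependency edges between steps are an assumption.

```python
from graphlib import TopologicalSorter

# Hypothetical map of one logical unit from the comment above:
# step name -> (packages used, steps it depends on).
STEPS = {
    "read histograms": (["uproot"], []),
    "compute new binning": (["numpy", "numba"], ["read histograms"]),
    "apply new binning": (["boost-histogram", "hist"], ["read histograms", "compute new binning"]),
    "write ROOT TH1": (["uproot"], ["apply new binning"]),
}


def execution_order(steps):
    """Order the steps so every step runs after the steps it depends on."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def packages_in_unit(steps):
    """All packages a user would meet in this logical unit, for linking docs."""
    return sorted({pkg for pkgs, _ in steps.values() for pkg in pkgs})


for step in execution_order(STEPS):
    print(f"{step}: {', '.join(STEPS[step][0])}")
print(packages_in_unit(STEPS))
```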

Sorry to chime in on the discussion as a user, and please ignore the noise if this has already been discussed.

Best, Peter

klieret · 2023-08-21T12:27:50Z

@eduardo-rodrigues @agoose77 Sorry for the late reply! We'll have a dedicated meeting today at 5pm CERN time to discuss how to combine the "analysis gallery" ideas together with a revamp of our training center.

What do you mean by decentralised? After all we have always been community-driven/-oriented

Ah, probably 'decentralized' wasn't the right word (perhaps 'modularized' would have been more factual). This was meant in comparison to ROOT where everything is a single package (which has certain advantages for new learners).

agoose77 · 2023-08-21T16:02:24Z

Chat log from Zoom: https://gist.github.com/agoose77/01a22f4c2a3e33c424815ab68ffa2731

eduardo-rodrigues · 2023-08-30T16:34:38Z

Hello all. Many thanks for your clarifications and comments. Makes very much sense.

Unfortunately you caught me about to go on hols and I'm now catching up; almost done. I will for sure follow what I can.

jpivarski added 2023 PyHEP.dev 2023 topical-group Topic for discussion labels Jun 28, 2023

alexander-held mentioned this issue Jun 29, 2023

User experience for physics data analysis tools #9

Closed

redeboer mentioned this issue Jul 10, 2023

Fitting tools, combined fits, partial wave analysis, and machine learning #5

Closed

redeboer added the documentation see #7 label Jul 11, 2023

klieret mentioned this issue Aug 21, 2023

[page request] Applications Gallery scikit-hep/scikit-hep.github.io#108

Open

jpivarski closed this as completed Jan 25, 2024
