Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve PR Labeling #3273

Open
allgandalf opened this issue Mar 8, 2024 · 10 comments
Open

Improve PR Labeling #3273

allgandalf opened this issue Mar 8, 2024 · 10 comments

Comments

@allgandalf
Copy link
Collaborator

Bug Description

We recently implemented automated labeling of Pull Requests in #3199, #3204,

This was with very limited scope in mind, and by looking at some comments in the review from @infotroph #3204 (comment), #3204 (comment) I am of the opinion that we need to improve the labels to better define the scope of the Pull requests,

The file for the labels is : https://github.com/PecanProject/pecan/blob/develop/.github/labeler.yml , no need to update automation just the labels. :)

If someone could first propose over here the file structure we should follow and some basic implementation details then it would really be helpful for us

@allgandalf
Copy link
Collaborator Author

allgandalf commented Mar 8, 2024

Adding good first issue label, as this is a beginner friendly issue

Also, asked for help on slack: https://pecanproject.slack.com/archives/C06E3309CJK/p1709884113573149

@Sweetdevil144
Copy link
Contributor

I've been brainstorming about our PR labeling and came up with some ideas to tweak our process. Thought I'd share them here :

  1. Refining Labels: How about splitting 'Documentation' into 'User' and 'Developer' docs? It could help us quickly spot the focus of changes.

  2. Sub-Categories: In sections like 'Models' or 'Modules', sub-categories might be a game-changer. For example, 'Models' could have 'atmosphere' for file data.atmosphere` and similar sub-labels.

  3. Language-Specific Labels: Adding labels for R, Fortran, bash etc., could streamline reviews, especially when changes are language-specific.

  4. Smarter Automation?: While initially said no changes to automation, I'm wondering if it's worth exploring a system that picks up keywords from PRs for label suggestions.

Let me know what you think. If we're on the same page, I can get started on these updates.

@Sweetdevil144
Copy link
Contributor

Here's how I want to proceed with the labeler.yaml:

'Documentation/User':
  - book_source/**
  - documentation/user_guides/**

'Documentation/Developer':
  - CONTRIBUTING.md
  - DEBUGGING.md
  - DEV-INTRO.md

# Introducing sub-categories under Models
'Models/Climate':
  - models/climate/**

'Models/Biophysical':
  - models/biophysical/**

# Adding a new label for Configuration changes
'Configuration':
  - '**/*.config'
  - setup.sh

# Specific labels for different programming languages
'Python':
  - '**/*.py'

'R':
  - '**/*.R'

'JavaScript':
  - '**/*.js'

These changes aim to make our labeling more intuitive and aligned with our project's structure. Let me know what you think! Also, as an addition to the Documentation part. Like, the book-source already has details about changes whgich will be made to book source. Similarly we can maybe move DEV-INTRO.md IN 'DEVELOPER' documentation and README.md within 'USER' guides.

@Sweetdevil144
Copy link
Contributor

Insights @GandalfGwaihir ??

@allgandalf
Copy link
Collaborator Author

@infotroph , if you can find time to review this proposaal, overall I find it useful and what @Sweetdevil144 suggested here is really good I guess, but would love to hear what you have to say :)

@allgandalf allgandalf self-assigned this Mar 18, 2024
@infotroph
Copy link
Member

To define how labeling should work, we should start by agreeing who these labels are for and what they need them to do. Here are a couple user stories that I'm picturing -- please push back if you think these aren't right.

  • "As a first-time contributor to PEcAn who has found a small bug in [module], I want to see if it is fixed in a currently-open PR before I open my own, so that I can avoid duplicating effort"
  • "As a regular contributor to [component] of PEcAn, I want to easily see whether a PR touches [component], so that I can be responsive where my reviews are needed without being overwhelmed by notifications about modules I do not own"
  • "As a maintainer who works across the whole system, I want an overview of what components have changes pending, so that I can group my review effort and make sure that people working on related systems are talking to each other as they work."

For these particular three users (who, to be clear, I just made up!), I suspect the right labeling granularity would be something like individual models (Many people only use PEcAn with one model, so they'll pay attention to the sipnet label but not basgra or vise-versa), a few sub-categories of modules (something like "meteorology", "data assimilation", "format wrangling", ... ?), and single broad-brush labels for topics like "build", "CI", "documentation", "Docker", "web". But my understanding of the labeler patterns is that they need to correspond to paths that actually exist within the directory, so I'm not immediately sure how to define the subcategories other than by listing all the packages that should get that label.

I don't see a need for language-specific labels -- to a first approximation, PEcAn is in R and the exceptions are in places that will be ~obvious from the other labels (e.g. if we had a PHP label it would appear in basically the same PRs labeled web).

@allgandalf
Copy link
Collaborator Author

@Sweetdevil144 your thoughts on ^ ?

@Sweetdevil144
Copy link
Contributor

I suspect the right labeling granularity would be something like individual models (Many people only use PEcAn with one model, so they'll pay attention to the sipnet label but not basgra or vise-versa), a few sub-categories of modules (something like "meteorology", "data assimilation", "format wrangling", ... ?), and single broad-brush labels for topics like "build", "CI", "documentation", "Docker", "web".

I agree with this. Splitting labeling according to sub-models/sub-modules would help a User determine the scope of modules/models affected by a particular PR.

other than by listing all the packages that should get that label

That would be the case for now as Regex-based Path Matching would help in determining individual paths. (But it would be entirely manual, that is, defining which specific sub-modules/sub-models to be implemented, maybe any other approach to optimise/automate it?)

I don't see a need for language-specific labels -- to a first approximation,

That works fine for me. Also, an increased number of labels would surely overwhelm a beginner contributor.

@allgandalf
Copy link
Collaborator Author

I know you're busy with your GSoC project @Sweetdevil144 , when you find time can you give me template of how the labelling should be now after all this discussion ? I'll come up with a PR for the same by next month

@Sweetdevil144
Copy link
Contributor

Hey. Sorry for late response. I think the only changes that can be done should be

  • The Documentation splitting into Dev-Docs and User-Docs
  • Splitting modules and models into each separate paths.

It would go something like this :

# Documentation divided into User and Developer specific
'Documentation/User':
  - book_source/**
  - documentation/user_guides/**
  - README.md

'Documentation/Developer':
  - CONTRIBUTING.md
  - DEBUGGING.md
  - DEV-INTRO.md
  - documentation/developer_guides/**

... Rest of Blocks

# Model-specific sub-categories
'Models/sipnet':
  - models/sipnet/**

'Models/basgra':
  - models/basgra/**

# Modules with defined sub-categories
'Modules/Date-Atmosphere':
  - modules/data.atmosphere/**

'Modules/Data-Assimilation':
  - modules/data_assimilation/**

Should this work fine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants