Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11804: [Developer] Offer to create JIRA issue #9598

Closed
wants to merge 4 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 27, 2021

Rationale

Currently all contributors are required to make a JIRA account and do some mechanical JIRA creation to create well formed Arrow PRs. This is mindless work and people who are used to it may forget the barrier it imposes on casual contributions and burden on maintainers to keep JIRA synced.

Here is a good example of such a PR where we almost lost a potential contributor: #9576 (comment) (and despite @pierwill and I both working for InfluxData I swear I didn't put him up to this :) )

More discussion can be found on this mailing list thread

Changes

This PR modifies the merge_pr.py to offer to create a JIRA and link it back to github if a pre-existing JIRA can not be found. For the existing workflow where the PR title contains the appropriate JIRA reference, there is no change. With this change, devs / maintainers would only have to ensure the title of PRs started with the correct components, and JIRAs could be automatically created, if they so desire

The script requires ARROW_GITHUB_API_TOKEN to have been set to some token that has the ability to update PR descriptions.

Current behavior

If merge_pr.py is invoked on a PR without a JIRA ticket reference the following error occurs

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust$ ../dev/merge_arrow_pr.py 9594
ARROW_HOME = /Users/alamb/Software/arrow2/dev
PROJECT_NAME = arrow
Restoring head pointer to 0f647261
Note: switching to '0f647261'.
...

HEAD is now at 0f6472610 ARROW-11778: [Rust] Cast from LargeUtf8 to Numerical and temporal types
Traceback (most recent call last):
  File "/Users/alamb/Software/arrow2/rust/../dev/merge_arrow_pr.py", line 597, in <module>
    cli()
  File "/Users/alamb/Software/arrow2/rust/../dev/merge_arrow_pr.py", line 566, in cli
    pr = PullRequest(cmd, github_api, PR_REMOTE_NAME, jira_con, pr_num)
  File "/Users/alamb/Software/arrow2/rust/../dev/merge_arrow_pr.py", line 309, in __init__
    self.jira_issue = self._get_jira()
  File "/Users/alamb/Software/arrow2/rust/../dev/merge_arrow_pr.py", line 336, in _get_jira
    self.cmd.fail("PR title should be prefixed by a jira id "
  File "/Users/alamb/Software/arrow2/rust/../dev/merge_arrow_pr.py", line 270, in fail
    raise Exception(msg)
Exception: PR title should be prefixed by a jira id ARROW-XXX or PARQUET-XXX, but found [Rust] Add link to the doc

New Behavior

With the changes in this PR, the maintainer is prompted to optionally create the JIRA ticket:

(arrow_dev) alamb@ip-10-0-0-49:~/Software/arrow2$ ./dev/merge_arrow_pr.py 9594
ARROW_HOME = /Users/alamb/Software/arrow2/dev
PROJECT_NAME = arrow
No JIRA link found
  Looked for PR title prefixed by ARROW-XXX or PARQUET-XXX
=== NEW JIRA ===
Summary		[Rust] Add link to the doc
Assignee	NONE
Components	Rust
Status		New

Create JIRA and link to PR? (y/n): y
  Created ARROW-11818

=== Pull Request #9594 ===
title	ARROW-11818: [Rust] Add link to the doc
source	alamb/alamb/test_change
target	master
url	https://api.github.com/repos/apache/arrow/pulls/9594
=== JIRA ARROW-11818 ===
Summary		[Rust] Add link to the doc
Assignee	NOT ASSIGNED!!!
Components	Rust
Status		Open
URL		https://issues.apache.org/jira/browse/ARROW-11818

Proceed with merging pull request #9594? (y/n):
...

Follow on Work

  1. If this script is accepted, I would like to file a ticket to improve the wording of the message left by the JIRA bot example to make it clear that the creation of a JIRA account + ticket is not required (though perhaps still encouraged for large changes)

  2. Add additional mappings from PR description to components (I only added the ones I use and a few others to prove out the idea)

@github-actions
Copy link

@ritchie46
Copy link
Contributor

I think this is a good idea, if feels mechanical :).

Piggybacking a bit on this on communications in general. I find the level of entry (asides from PRs quite high). In the past I've asked something on the mailing list and on a JIRA issue and did not get any response. Is there at any time something considered as a chat/ discussions forum like gitter or github discussions?

@alamb
Copy link
Contributor Author

alamb commented Feb 28, 2021

In the past I've asked something on the mailing list and on a JIRA issue and did not get any response. Is there at any time something considered as a chat/ discussions forum like gitter or github discussions?

@jorgecarleitao has mentioned something about that as well. I personally find it hard to keep up with the firehose of updates across the various channels of (PR, JIRA, mailing list) and so I spend most of my efforts / time in github as that is where the code lives.

@elferherrera
Copy link
Contributor

I would agree that a discord channel would be easier to follow than the mailing list. It would also help new contributors to get familiar and ask questions that may take a lot of time to answer in the mailing list

@jorgecarleitao
Copy link
Member

@ritchie46 and @elferherrera , I agree with you. Arrow did have a slack channel, but that was closed. See rational here: https://mail-archives.apache.org/mod_mbox/arrow-dev/201806.mbox/%3cCAJPUwMDyiZfXUWdW1Mk7UwKbbvd--Dws6oOVqjBF3ZadTmXTGg@mail.gmail.com%3e .

I suggest that you raise that in the mailing list so that we keep this discussion focused on JIRA's burden.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • When should we set assignee?
  • It may be useful that we have a mode or separated tool just doing this. If we can do this before merge stage, we can sync all discussions after running this feature into the associated JIRA issue.

# Return the best matching JIRA component from a PR title, if any
# "[Rust] Fix something" --> Rust
# "[Rust][DataFusion] Fix" --> Rust - DataFusion
# "[CPP] Fix " --> "C++"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# "[CPP] Fix " --> "C++"
# "[C++] Fix " --> "C++"

print("Assignee\tNONE")
print("Components\t{}".format(component))
print("Status\t\tNew")
description = ("Issue automatically created " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the pull request description as is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale to add a note on provenance was:

  1. I felt for auto generated tickets it was unlikely the author would keep JIRA and the Pull Request synced, and the PR was likely to be the source of truth
  2. Hint to anyone reading the JIRA issue they should look at the PR if they wanted to know the current status of the change.

I am happy to remove the auto-link if that is the consensus

@alamb
Copy link
Contributor Author

alamb commented Mar 1, 2021

When should we set assignee?

@kou in order to set the JIRA assignee I think the code would need a mapping from github username to JIRA username. I could not find any such mapping. Since the original provenance of the code is tracked in Github (as github usernames), if someone feels the JIRA assignee field set is important, they can always post process tickets without the assignee set, find the github author, and update the JRIA accordingly.

It may be useful that we have a mode or separated tool just doing this. If we can do this before merge stage, we can sync all discussions after running this feature into the associated JIRA issue.

I think we could invoke this function as a standalone tool pretty easily. That might be a good follow on PR for anyone who would use it. However, I think getting contributors to switch to JIRA for discussion rather than Github pull request comments will be harder (as it requires changing people's behavior :) )

alamb added a commit that referenced this pull request Mar 1, 2021
This is a test PR with a minor fix, that has no JIRA issue, that I am using to validate my new fancy script (#9598). However, I think it is a reasonable change on its own

Closes #9594 from alamb/alamb/test_change

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@elferherrera
Copy link
Contributor

@jorgecarleitao after reading the discussion about slack channel I understand why they closed it. It makes sense since they want to keep all conversations stored and they also want traceability. However, we should specify that these real time chats should be used only for quick questions and ad hoc conversations. If someone wants to get consensus on a topic then it would be directed to the mailing list. That should be one of the many rules people using the chat must follow. Also, there should be individual channels for each language. The small mini cells that are being created for each language should be able to talk to each other in a rather simple and quick way. Don't you think so?

@nealrichardson
Copy link
Contributor

As an alternative approach, what if we had a GitHub Actions comment bot that would create a JIRA issue from your issue or PR? That would solve @kou's concern on the mailing list about only adding the JIRA at the end. Apache Way questions aside, a practical consideration for wanting the JIRA sooner is that we use JIRA actively to prepare a release, and it would be difficult to see how many open issues are left to merge for the current release if the issues haven't been made.

@nealrichardson
Copy link
Contributor

(I meant to start that with: thanks for taking the initiative to make our processes better! ❤️)

@alamb
Copy link
Contributor Author

alamb commented Mar 3, 2021

(I meant to start that with: thanks for taking the initiative to make our processes better! ❤️)

Thank you!

As an alternative approach, what if we had a GitHub Actions comment bot that would create a JIRA issue from your issue or PR?

@neilrichardson,

I think having the actions bot offer to create a JIRA issue for you would work great for me personally. The reason I didn't start from there was

  1. Auto creating a ticket effectively removes the choice of should a ticket be created from the PR author and maintainer, and I worried we might create tickets accidentally for people who simply forgot the reference in the title as well as run into philosophical objections to such behavior.
  2. Some surmountable but annoying technical concerns over credentials and not wanting to sort out JIRA authorization tokens / process for github bots (e.g. whose JIRA account credentials would it use, etc, who would it assign, etc.)

In general, I felt the approach to adding an optional, manually acknowledged step, to merge_pr was a minimal change to the workflow that improved experiences (merge_pr.py was going to exit anyways), and left the maintainer in control of if a JIRA should be created.

Clearly my attempt to dodge the "should we still be using JIRA" conversation failed miserably - LOL. I know of few other topics that incite such passion than engineering process and workflow tools!

@nealrichardson
Copy link
Contributor

In general, I felt the approach to adding an optional, manually acknowledged step

I agree. What I'm envisioning is that the existing bot that says "Thanks for opening a pull request! Please name your PR with ARROW-XXXX: Issue Title, or go to JIRA and make an issue" would instead say "Thanks for opening a pull request! Please name your PR with ARROW-XXXX: Issue Title, or comment please make a JIRA on the PR to have me make one for you". And a similar message for new GitHub Issues. That way, the contributor doesn't have to leave GitHub if they don't want, but we get the JIRA issue up front too.

@@ -259,6 +268,21 @@ def get_pr_data(self, number):
return get_json("%s/pulls/%s" % (self.github_api, number),
headers=self.headers)

def update_pr_title(self, number, title):
if not self.headers:
msg = ("Can not update PR {} title to '{}'. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Can not/Cannot/



# If no jira ID can be found for the PR, offer to create one
# returns returns the github pr_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate word: returns returns

return pr_data

components = [{"name": component}]
summary = "{}".format(title)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, f-strings could be used in most of the places where format is currently being used.

Before:

>>> title = "Hello"
>>> summary = "{}".format(title)
>>> summary
'Hello'

After:

>>> title = "Hello"
>>> summary = f"{title}"
>>> summary
'Hello'

@@ -547,6 +566,86 @@ def get_pr_num():
return input("Which pull request would you like to merge? (e.g. 34): ")


_JIRA_COMPONENT_REGEX = re.compile(r'(\[[^\]]*\])+')

# Maps PR title prefixes to JIRA components
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Python be in this mapping?

)

cmd.continue_maybe("Create JIRA and link to PR?")
issue = jira_con.create_issue(project='ARROW',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

The bracketing style isn't totally consistent in this file, but it appears to be either:

    issue = jira_con.create_issue(project='ARROW',
                                  summary=summary,
                                  description=description,
                                  issuetype={'name': 'Bug'},
                                  components=components)

or:

    issue = jira_con.create_issue(
        project='ARROW',
        summary=summary,
        description=description,
        issuetype={'name': 'Bug'},
        components=components,
    )

But not:

    issue = jira_con.create_issue(project='ARROW',
                                  summary=summary,
                                  description=description,
                                  issuetype={'name': 'Bug'},
                                  components=components,
                                  )

@alamb
Copy link
Contributor Author

alamb commented Mar 11, 2021

Thanks @dianaclarke -- I am not sure what the fate of this PR will become. I thought it improved the workflow but the discussion on the mailing list left me a bit confused.

My confusion largely stems from the various different topics that came up but I never did get an answer to this particular PR's code (nor do I really want to spend time starting another lengthy discussion about the pros and cons of JIRA :) )

@dianaclarke
Copy link
Contributor

I'm also not entirely sure where the conversations left off. Mostly, I was just reading this code as a learning exercise, so don't mind me ;)

@alamb
Copy link
Contributor Author

alamb commented Apr 14, 2021

I think we failed to reach consensus that this PR did more good than harm, so closing it

@alamb alamb closed this Apr 14, 2021
alamb added a commit to apache/arrow-rs that referenced this pull request Apr 20, 2021
This is a test PR with a minor fix, that has no JIRA issue, that I am using to validate my new fancy script (apache/arrow#9598). However, I think it is a reasonable change on its own

Closes #9594 from alamb/alamb/test_change

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This is a test PR with a minor fix, that has no JIRA issue, that I am using to validate my new fancy script (apache#9598). However, I think it is a reasonable change on its own

Closes apache#9594 from alamb/alamb/test_change

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
This is a test PR with a minor fix, that has no JIRA issue, that I am using to validate my new fancy script (apache#9598). However, I think it is a reasonable change on its own

Closes apache#9594 from alamb/alamb/test_change

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants