[Feature][Data Schema] Add an original_status field to table pull_requests and use status to indicate standardized statuses #4745

Closed
6 of 7 tasks
Startrekzky opened this issue Mar 22, 2023 · 7 comments · Fixed by apache/incubator-devlake-website#521

Comments

@Startrekzky
Contributor

Startrekzky commented Mar 22, 2023

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Use case

DevLake has to standardize pull request statuses from GitHub, GitLab, and BitBucket to provide pre-defined PR/MR metrics in dashboards such as Engineering Overview. The queries in these dashboards need to determine unambiguously whether a PR is open, merged, or closed, regardless of the source tool.

However, if DevLake replaces the original PR statuses with standardized ones, users may be confused because the stored values differ from what they see in the source tool.

Therefore, both the original and the standardized PR status need to be saved in the pull_requests table.

Description

The status field in pull_requests should have three possible values. All PRs/MRs from GitHub, GitLab, BitBucket, and Azure Repos should be mapped to one of them:

  • OPEN
  • MERGED
  • CLOSED

For BitBucket PRs:

  • BitBucket OPEN -> OPEN
  • BitBucket MERGED -> MERGED
  • BitBucket DECLINED -> CLOSED

For GitHub PRs from the REST API:

  • GitHub OPEN -> OPEN
  • GitHub CLOSED and merged_date is not null -> MERGED
  • GitHub CLOSED and merged_date is null -> CLOSED
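The REST rule is the only one that needs a second field, because the REST API reports a merged PR as closed. A minimal sketch of that rule, assuming the state and merge date have already been extracted from the API response (the function name and argument names are hypothetical, not DevLake code):

```python
from typing import Optional


def standardize_github_rest_status(state: str, merged_date: Optional[str]) -> str:
    """Map a GitHub REST PR state to OPEN, MERGED, or CLOSED."""
    normalized = state.strip().upper()
    if normalized == "OPEN":
        return "OPEN"
    if normalized == "CLOSED":
        # REST reports merged PRs as closed, so the merge date decides.
        return "MERGED" if merged_date is not None else "CLOSED"
    # Fail loudly on anything the rules above do not cover.
    raise ValueError(f"unexpected GitHub REST state: {state!r}")
```

Raising on an unknown state is a design choice for this sketch; a real collector might prefer to log and keep the original value.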

For GitHub PRs from the GraphQL API:

  • GitHub OPEN -> OPEN
  • GitHub MERGED -> MERGED
  • GitHub CLOSED -> CLOSED

For GitLab PRs:

  • GitLab opened -> OPEN
  • GitLab merged -> MERGED
  • GitLab closed or locked -> CLOSED

For Azure PRs (refer to this doc, can you please help verify this? @CamilleTeruel ):

  • Azure active -> OPEN
  • Azure completed -> MERGED
  • Azure abandoned -> CLOSED
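For the tools whose statuses map one-to-one, the rules above could be expressed as a lookup table. This is only a sketch: the tool keys (`bitbucket`, `github_graphql`, `gitlab`) and the function name are made up for illustration, and Azure is left out because its values are still being verified in this issue.

```python
# Original status (upper-cased) -> standardized status, per tool.
STATUS_RULES = {
    "bitbucket": {"OPEN": "OPEN", "MERGED": "MERGED", "DECLINED": "CLOSED"},
    "github_graphql": {"OPEN": "OPEN", "MERGED": "MERGED", "CLOSED": "CLOSED"},
    "gitlab": {
        "OPENED": "OPEN",
        "MERGED": "MERGED",
        "CLOSED": "CLOSED",
        "LOCKED": "CLOSED",
    },
}


def standardize_status(tool: str, original_status: str) -> str:
    """Return OPEN, MERGED, or CLOSED for a tool's original PR/MR status."""
    rules = STATUS_RULES[tool]
    key = original_status.strip().upper()
    if key not in rules:
        raise ValueError(f"no rule for {tool} status {original_status!r}")
    return rules[key]
```

The original value would still be stored unchanged in original_status; only status receives the result of the lookup.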

To Do:

  • Add an original_status field to table pull_requests to save the original PR/MR status
  • Convert each tool's original PR/MR statuses to standardized statuses and save them to the field status in table pull_requests based on the above rules.
  • Check whether dashboards that contain PR-status-related metrics are affected by this change, for example the GitHub, GitLab, BitBucket, and Engineering Overview dashboards.
  • Update the pull_request description in the schema doc with the above rules.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Startrekzky Startrekzky added the type/feature-request This issue is a proposal for something new label Mar 22, 2023
@Startrekzky Startrekzky added this to the v0.18.0 milestone Mar 22, 2023
@github-actions

This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

@github-actions

github-actions bot commented May 2, 2023

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

@CamilleTeruel
Contributor

For Azure PRs (refer to this doc, can you please help verify this? @CamilleTeruel ):

  • Azure pending, notSet, or notApplicable -> OPEN
  • Azure succeeded -> MERGED
  • Azure failed or error -> CLOSED

The Azure statuses we get from the API are described here.
Currently we have:

  • Azure active -> OPEN
  • Azure completed -> MERGED
  • Azure abandoned -> CLOSED

@Startrekzky
Contributor Author

Startrekzky commented May 10, 2023

@CamilleTeruel Are these all Azure PR statuses? I noticed that there're statuses like 'pending', 'failed', and 'error' as described in the Azure API docs. Have we standardized them or kept the original values?

@CamilleTeruel
Contributor

@CamilleTeruel Are these all Azure PR statuses?

There are also all and notSet. all is only used as a query parameter for searching PRs, and notSet is mapped to None, but I have never encountered it in real-world data.

I noticed that there're statuses like 'pending', 'failed', and 'error' as described in the Azure API docs. Have we standardized them or kept the original values?

Those are the original values.
The pending, failed, and error values belong to a different concept of "status".
Confusingly, there are two things that are called PR status in Azure DevOps:

  • there is the status property of the pull request objects you get from _apis/git/repositories/{repositoryId}/pullrequests, which tells whether a PR is open ("active"), closed ("abandoned"), or merged ("completed"). This is the one we are interested in here.
  • there are the status values that a third-party CI tool can attach to a PR, accessible from /_apis/git/repositories/{repositoryId}/pullRequests/{pullRequestId}/statuses, which we don't collect.
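To make the distinction concrete, here is a rough sketch of the mapping for the first kind of status only, i.e. the status property of a pull request object. The function name and the dict-shaped input are assumptions for illustration, not the plugin's actual code; returning None for notSet follows the description above.

```python
from typing import Optional

# Values of the pull request "status" property, not the CI statuses.
AZURE_STATUS_RULES = {
    "active": "OPEN",
    "completed": "MERGED",
    "abandoned": "CLOSED",
}


def standardize_azure_status(pull_request: dict) -> Optional[str]:
    """Map the status property of an Azure PR object to a standardized status."""
    original = pull_request.get("status", "notSet")
    if original == "notSet":
        # Assumed handling: no standardized status when Azure reports notSet.
        return None
    if original not in AZURE_STATUS_RULES:
        # Covers "all" (a query parameter only) and CI values such as "pending".
        raise ValueError(f"not a pull request status: {original!r}")
    return AZURE_STATUS_RULES[original]
```

Values such as pending, failed, or error are rejected here on purpose, since they come from the statuses endpoint and describe something else.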

@Startrekzky
Contributor Author

That's very clear. Thank you, @CamilleTeruel

@Startrekzky
Contributor Author

Startrekzky commented Jun 29, 2023

I tested the PR statuses with the SQL below and it passed:

SELECT DISTINCT
  SUBSTRING_INDEX(id, ':', 1) AS source,
  status,
  original_status
FROM pull_requests;
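For anyone reading without a MySQL instance at hand, the query takes the part of each id before the first colon as the source and lists every distinct (source, status, original_status) combination. A small Python equivalent, using made-up rows (the ids below are illustrative, not real data):

```python
def distinct_status_triples(rows):
    """Return the distinct (source, status, original_status) combinations.

    rows is an iterable of (id, status, original_status) tuples.
    """
    return {
        # Same as SUBSTRING_INDEX(id, ':', 1): text before the first colon.
        (pr_id.split(":", 1)[0], status, original_status)
        for pr_id, status, original_status in rows
    }


# Hypothetical sample rows for illustration only.
sample_rows = [
    ("github:GithubPullRequest:1:10", "MERGED", "closed"),
    ("github:GithubPullRequest:1:11", "MERGED", "closed"),
    ("gitlab:GitlabMergeRequest:2:5", "OPEN", "opened"),
]
triples = distinct_status_triples(sample_rows)
```

Each resulting combination can then be checked by eye against the mapping rules in the issue description.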
