What do we do when(if?) major tools stop calling the default branch master? #5215

Closed
chris48s opened this issue Jun 14, 2020 · 15 comments
Labels
service-badge Accepted and actionable changes, features, and bugs

Comments

@chris48s
Member

⛈️ There's a storm brewing... time for a GitHub essay 📚

For the last ~15 years the standard name for a git default branch has been master. This convention was established by git and all major git-based ecosystems have inherited the decision. Shields inherits this norm: Branch is an optional param. We expect your default branch to be called master. If you want a badge for a non-default branch, or you've called your default branch something else, we require you to specify it. This assumption applies to source control services (github, bitbucket, gitlab, etc) and source control-adjacent services (CI, coverage tools, etc). There are currently a number of discussions going on in various git communities/businesses about changing the standard name for a default branch. For example:

It is worth noting that in all cases platforms are looking at changing the default for newly created repos. None are looking at retrospectively forcing a change on repos that already exist. Given GitHub and GitLab are working on it I wouldn't be surprised to see bitbucket follow suit, although that is entirely unsubstantiated speculation.

Speaking of unsubstantiated speculation, I figure there's probably 3 ways things might play out here..

🗞️ Read all about it - nothing happened [highly unlikely]

Anticlimax. None of this goes anywhere. All common git source control tools/platforms continue to call their default branch master.

🌈 Why can't we all just get along? [also quite unlikely]

The entire git community unites and reaches a consensus. All git clients and platforms unanimously decide on a single blessed solution. The gods of git decree "Thy default branch shall henceforth be called primary". The villagers rejoice.

💥 The great split [most likely]

Turns out naming things is hard. Who knew? 🤔 Nobody can agree on anything. The ecosystem splinters into a fragmented mess where everyone's default branch is variously called default, main, trunk, develop, release, base, primary etc. Defaults vary by platform, convention varies within platforms. Default vs main is the new tabs vs spaces. Master traditionalists rage against the Trunk reformers. You've met developers before, right?


So.. what are we going to do? Well first, we wait..

Right now today, if you create a git repo using any mainstream tool, the default branch is going to be called master unless you explicitly change it. While a small number of repositories (shields included) are looking at making a change, even if all the major platforms change the default today, the vast majority of git repositories created in the last 15 years have a default branch called master. "Your default branch is probably called master" might not be everyone's favourite assumption, but it is currently accurate and backwards compatible.

..but then what? Change might happen fast, more likely it will play out slowly, with different tools making changes at different times, but at some point we need to react. I think there are 3 important principles that are worth having in mind as we think about solutions (spoiler alert: we probably have to compromise at least one of them):

⬅️ Backwards Compatibility

We strive to maintain backwards compatibility and usually prioritise backwards compatibility for our existing users ahead of other objectives we would like to pursue as maintainers. If you put a badge in your README in 2014 and the service is still running, the badge should still render today. On occasions where we have made a non-BC change, we haven't done so lightly. Apart from any other consideration, we don't register or track our users and we basically have no good way to communicate a non-BC change to 99% of our userbase. This kind of ties our hands here.

🏎️ Performance

A badge should be quick to render. In most contexts where our badges are displayed there's a hard limit of 4 seconds on a request. In general we try to avoid accepting badges that require more than one network call to render (although there is some non-zero number of exceptions to this).

🌱 Reflect Community Norms

Shields is a project which is descriptive rather than prescriptive. If your package registry uses a weird version schema, we implement a weird version parser/sorter. We don't tell you you're doing versioning wrong. We reflect the established norms of the communities we serve.


So.. at some point things change and we need to react. What are the available options? We'll probably end up variously adopting several of these strategies depending on upstream decisions/support/etc.

🤷 Do nothing

Continue to assume the main branch is called master if it is not specified, even if upstream services change their default for newly created repos. The big advantage of this is maintaining backwards compatibility for existing users. The big downside is our defaults don't line up with the upstream defaults moving forwards, which will be less intuitive for new users.
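
For illustration, the "assume master" behaviour boils down to something like the sketch below. It is Python rather than the project's actual code, and `parse_badge_path` and the `/<user>/<repo>[/<branch>]` route shape are hypothetical stand-ins, not real Shields routes.

```python
import re

# Hypothetical route shape: /<user>/<repo>[/<branch>], mimicking a ':branch?' param.
ROUTE = re.compile(r"^/(?P<user>[^/]+)/(?P<repo>[^/]+)(?:/(?P<branch>.+))?$")

ASSUMED_DEFAULT_BRANCH = "master"  # the assumption under discussion


def parse_badge_path(path):
    """Split a badge path into user, repo and branch, assuming master if absent."""
    match = ROUTE.match(path)
    if match is None:
        raise ValueError(f"not a badge path: {path!r}")
    params = match.groupdict()
    if params["branch"] is None:
        # No branch in the URL: fall back to the assumed default.
        params["branch"] = ASSUMED_DEFAULT_BRANCH
    return params
```

A branchless path such as `/badges/shields` resolves to `master`, while `/badges/shields/main` keeps the explicit branch. That is exactly why existing badges keep working and why new repos with a different default would get a less intuitive result.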

🔨 The big BC break

The opposite approach. Upstream services change their default and we change with them. This would be the biggest non-BC change we've ever made. Even if we leave it a long time, there's never a good time to do it. Community backlash ensues.

⚠️ Make branch a required param

:branch? becomes :branch. The one character change that broke everyone's readme 😱 This would also be a huge BC break, but maybe a more "egalitarian" one in that it breaks everything for everyone (users of the new and old defaults alike). This strategy could be particularly necessary if upstream services go in the direction of "let's make a decision not to decide" and defer the decision to users rather than standardize, and if a large number of services don't provide a way to find out the default branch.

🧠 Try to be clever

In some cases, it may be possible for us to perform some kind of query which asks an upstream service "what is the default branch for this repo/project called". If this is possible, it's going to be a good solution for new and existing users. I see 2 likely downsides. One is that it will depend heavily on upstream support. Maybe github's API gives us a way to find out the name of the default branch but coveralls doesn't (for example) which potentially leads to inconsistency (if I use the defaults I have to specify my branch name with some services but not others). The other big problem is likely to be performance. If the workflow ends up being one request to find out the name of the default branch then another request to get [metric we care about] for that branch, potentially we double our network latency for a substantial subset of our badges if we rely on this approach widely. If we can do it in a single request and defer the process of determining the default upstream without sacrificing performance this is an ideal solution, but I don't anticipate it being available in all cases.
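
To make the performance concern concrete, here is a minimal sketch of the two-request workflow. `fetch_default_branch` and `fetch_metric` are hypothetical callables standing in for upstream API requests; nothing here reflects a real service's API.

```python
def render_badge_value(repo, branch, fetch_default_branch, fetch_metric):
    """Return (value, upstream_calls) for a badge.

    fetch_default_branch(repo) and fetch_metric(repo, branch) are hypothetical
    callables standing in for upstream API requests.
    """
    calls = 0
    if branch is None:
        # Extra round trip: ask upstream what the default branch is called.
        branch = fetch_default_branch(repo)
        calls += 1
    # The request we always have to make: the metric for a known branch.
    value = fetch_metric(repo, branch)
    calls += 1
    return value, calls
```

With an explicit branch the function makes one upstream call; with no branch it makes two, which is where the doubled latency would come from.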

💡 Your idea here

Other thoughts? Suggestions welcome... 🙏


Finally, to give an idea of the likely scope of this issue, here is a rough list of all the service families where 1 or more badges have the concept of an optional branch parameter:

  • appveyor
  • azure-devops
  • bitbucket
  • bitrise
  • buildkite
  • circleci
  • cirrus
  • codacy
  • codecov
  • codefactor
  • codeship
  • continuousphp
  • coveralls
  • drone
  • github
  • gitlab
  • nycrc
  • osslifecycle
  • packagist
  • requires
  • scrutinizer
  • shippable
  • travis
  • visual-studio-app-center
  • wercker

This is a sizeable list and includes a number of our most popular/highest traffic service families.

@chris48s chris48s added the needs-discussion A consensus is needed to move forward label Jun 14, 2020
@JoeIzzard
Contributor

One potential solution to maintain backwards compatibility, and future compatibility would be to fallback on master as an assumption. So for example, assuming main is the new name used:

  • Request against the branch "Main"
  • If this fails, try again with branch "Master"

This has the benefit of maintaining backwards compatibility with existing projects and avoiding a potential performance hit for new projects. It would mean all existing projects, that don't choose to update (Some may do) would have a performance hit on each badge.
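
A rough sketch of that fallback, assuming a hypothetical `fetch_metric(repo, branch)` callable that raises `BranchNotFound` when upstream has no such branch (both names are made up for illustration):

```python
class BranchNotFound(Exception):
    """Raised by the (hypothetical) fetcher when upstream has no such branch."""


def fetch_with_fallback(fetch_metric, repo, candidates=("main", "master")):
    """Try each candidate branch in order; return (branch, value, attempts)."""
    for attempts, branch in enumerate(candidates, start=1):
        try:
            return branch, fetch_metric(repo, branch), attempts
        except BranchNotFound:
            # This candidate does not exist upstream: try the next one.
            continue
    raise BranchNotFound(f"none of {candidates} exist for {repo}")
```

A repo that only has `master` costs two attempts, and a repo that has `main` costs one. Note that the sketch never learns the real default branch; it only tries names in a fixed order.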

Another problem I could see developing if Git providers go different routes, for example Git choosing 'trunk', GitHub choosing 'main' and Gitlab choosing 'primary' for example, would be that CI/CD providers could make an assumption for each codebase provider. This would mean we would need to know the provider to poll the CI/CD in this manner.

@calebcartwright
Member

Thanks for laying all this out Chris! I didn't get a chance to read through everything yet, but curious to what degree our redirectors could be of use as it relates to making "breaking" changes to the routes (either assumed default branch or making branch a required param) while maintaining BC (for example, have redirectors that handle the current/assumed-default-branch-named-master case by redirecting /foo/bar to /foo/bar/master)
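
As a sketch of the redirector idea, assuming a hypothetical two-segment `/<user>/<repo>` legacy shape (real routes will differ per service):

```python
def legacy_redirect(path, assumed_branch="master"):
    """Map a branchless legacy path to its explicit-branch equivalent.

    Returns the redirect target, or None if the path already names a branch.
    Assumes a hypothetical two-segment '/<user>/<repo>' legacy shape.
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) == 2:
        # Branchless legacy URL: pin it to the old assumed default.
        return "/" + "/".join(segments + [assumed_branch])
    return None
```

Here `/foo/bar` redirects to `/foo/bar/master`, and `/foo/bar/main` is left alone.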

@andersk

andersk commented Jun 15, 2020

Maybe github's API gives us a way to find out the name of the default branch but coveralls doesn't (for example)…

FYI, the default branch is part of the standard Git protocol—as it must be, since it’s used in git clone and git remote set-head --auto. One way to query it directly is

$ git ls-remote --symref https://github.com/desktop/desktop.git HEAD
ref: refs/heads/development	HEAD
21ab3ce0d6f3e8b36a5c6794df7388bf4bcca72b	HEAD
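
A minimal Python sketch of reading the default branch out of that output. The function names are made up for illustration, and the 4 second timeout is only an assumption echoing the request limit mentioned earlier in the thread.

```python
import subprocess


def parse_default_branch(ls_remote_output):
    """Extract the branch name from `git ls-remote --symref <url> HEAD` output."""
    prefix = "ref: refs/heads/"
    for line in ls_remote_output.splitlines():
        if line.startswith(prefix):
            # Line looks like 'ref: refs/heads/<branch><whitespace>HEAD'.
            return line[len(prefix):].split()[0]
    return None


def remote_default_branch(url, timeout=4):
    """Ask the remote directly; needs the git CLI and network access."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", url, "HEAD"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return parse_default_branch(result.stdout)
```

Fed the output shown above, `parse_default_branch` returns `development`.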

@flying-sheep

flying-sheep commented Jun 15, 2020

Should be easy, right?

Call the branchless URLs “legacy”, keep them pointing at “master” and change your UI so it'll give people URLs with explicitly stated branches.

So the UX would be that someone enters the repo name in your UI, you then do a git request to find out the default branch name, and then show a page that gives the user URLs to copy that contain both repo name and default branch.

In fact, going that route from the start would have been more correct and more user friendly for people already not using “master” as default branch name. Nobody needs branchless URLs.

@chris48s
Member Author

Thanks for the replies on this, all 👍 There are several great suggestions here I had not thought of.

@JoeIzzard commented:

So for example, assuming main is the new name used:

  • Request against the branch "Main"
  • If this fails, try again with branch "Master"

This has the benefit of maintaining backwards compatibility with existing projects and avoiding a potential performance hit for new projects. It would mean all existing projects, that don't choose to update (Some may do) would have a performance hit on each badge.

Nice suggestion. In a lot of cases we would be making two requests, but as more new projects are created over time (or more existing projects choose to adopt it), we would gradually head towards more and more projects requiring only a single API call over time.

@andersk commented:

FYI, the default branch is part of the standard Git protocol—as it must be, since it’s used in git clone and git remote set-head --auto. One way to query it directly is git ls-remote --symref https://github.com/desktop/desktop.git HEAD

Thanks for the suggestion, but when we render a badge relating to a git repo we don't take a local checkout of the repo (and we couldn't for performance reasons). We need to rely on what is exposed via the APIs for affected platforms.

@calebcartwright commented:

curious to what degree our redirectors could be of use as it relates to making "breaking" changes to the routes (either assumed default branch or making branch a required param) while maintaining BC (for example, have redirectors that handle the current/assumed-default-branch-named-master case by redirecting /foo/bar to /foo/bar/master)

I think

  • branch is not a required param. If you don't specify it, we'll assume master
  • branch is a "required" param (but if you don't specify it we'll assume master for backwards-compatibility via a redirect - don't tell anyone though 😉 )

functionally achieve exactly the same thing from the end user's point of view, but one of them makes one request and the other makes two requests (at least at the badge level). Maybe there is a difference in how we communicate it and how we present things in the front-end though. We could at least show branch as a required param in the builder UI even if it's not really required at the routing layer, which brings us nicely to...

@flying-sheep commented:

Should be easy, right?

Probably not, but I admire your optimism! 😄

Call the branchless URLs “legacy”, keep them pointing at “master” and change your UI so it'll give people URLs with explicitly stated branches.

So the UX would be that someone enters the repo name in your UI, you then do a git request to find out the default branch name, and then show a page that gives the user URLs to copy that contain both repo name and default branch.

This is another nice suggestion I hadn't considered. This way we wouldn't change anything at all at the routing level, no breaking change, but we could do a bit more in the front-end to make a smarter, more sensible suggestion for new users/repos.

One thing we don't really know is how many people actually use https://shields.io/ as their primary mechanism to generate badges. There are definitely a lot of badges that get generated via copy & paste or generated by template projects/README generation tools.

@flying-sheep

flying-sheep commented Jun 15, 2020

Ha, yeah, I’ve been a developer for too long to make assumptions about something in a codebase I know nothing about, sorry!

There are definitely a lot of badges that get generated via copy & paste or generated by template projects/README generation tools.

Good point. Maybe we could enumerate those before making assumptions about their existence? If people copy&paste instead of using an “official” way, they’re on their own, but if there’s tools, those tools should do the right thing.

Thanks for the suggestion, but when we render a badge relating to a git repo we don't take a local checkout of the repo

You misunderstood. This doesn’t check out anything (nor does it require a checked-out copy), ls-remote works on … remotes. Check out man git-ls-remote

Therefore another option of course is to always query a repo’s default branch name when a request to a branchless URL is made. Feels like it’ll be wasteful and slow though …

  • Request against the branch "Main"
  • If this fails, try again with branch "Master"

Not a fan of enhancing the existing assumption magic by making more assumptions. This would have all the downsides of querying the actual default branch name (as mentioned above) without the added advantage of actually being correct. Just imagine one or both branches being there and neither of them being the default branch, that’d be confusing!

@andersk

andersk commented Jun 15, 2020

@andersk commented:

FYI, the default branch is part of the standard Git protocol—as it must be, since it’s used in git clone and git remote set-head --auto. One way to query it directly is git ls-remote --symref https://github.com/desktop/desktop.git HEAD

Thanks for the suggestion, but when we render a badge relating to a git repo we don't take a local checkout of the repo (and we couldn't for performance reasons). We need to rely on what is exposed via the APIs for affected platforms.

The ls-remote command does not need a local checkout. It works entirely remotely.

@chris48s
Member Author

The ls-remote command does not need a local checkout. It works entirely remotely.

Thanks for clarifying! TIL 🎓

So we could actually shell out to git to reliably get the default branch name for any repo regardless of what the platform exposes via the API 🙂 It is still 2 network requests though, which does have a performance cost.
I need to do a bit of thinking about how that applies to the git-adjacent tools (e.g: build/coverage tools). I wonder if we always know enough from the badge URL to infer where the repo is hosted or if we only do in a subset of cases.

@JoeIzzard
Contributor

JoeIzzard commented Jun 15, 2020

Not a fan of enhancing the existing assumption magic by making more assumptions. This would have all the downsides of querying the actual default branch name (as mentioned above) without the added advantage of actually being correct. Just imagine one or both branches being there and neither of them being the default branch, that’d be confusing!

I admit it's not a foolproof method, my view was that if master and main exist, it uses the new default, which if that isn't correct you need to specify it as you do now anyway. It would protect performance on new projects but at the expense of the old; which when compared results in less of a hit overall.

Another option could also be using the domain as a distinguisher. There was talk about dropping img as a requirement to move onto Heroku and allow requests directly to shields.io for badges. Would there be a way to detect the URL, and if coming from img, which could be assumed legacy, change default? For example:

  • Badge request to img.shields.io would assume default is master
  • Badge request to shields.io would assume the default is main

If you are not using the default for that version, you need to specify. This maintains one API call for existing and future badges, while also allowing us to use a default so the branch parameter remains optional. It would complicate the move to Heroku (Issue #5014) a bit and may mean we limit shields.io to only serve places with a new default (Or platforms with no change required or intended) to prevent people generating badges with a default change coming.
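
A sketch of the host-based idea, purely for illustration. The mapping is hypothetical, `main` is only an example name for the new default, and the badge paths in the usage note are made up.

```python
from urllib.parse import urlsplit

# Hypothetical mapping following the suggestion above; 'main' is only an example name.
DEFAULT_BRANCH_BY_HOST = {
    "img.shields.io": "master",  # legacy host keeps the old assumption
    "shields.io": "main",        # new host assumes the new default
}


def assumed_branch_for(badge_url):
    """Pick the assumed default branch from the host a badge was requested on."""
    host = urlsplit(badge_url).hostname
    try:
        return DEFAULT_BRANCH_BY_HOST[host]
    except KeyError:
        raise ValueError(f"unrecognised badge host: {host!r}") from None
```

So `https://img.shields.io/travis/foo/bar` would assume `master` and `https://shields.io/travis/foo/bar` would assume `main`, still with a single upstream call in both cases.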

@chris48s
Member Author

chris48s commented Jun 18, 2020

I did a bit more digging through all the services with an optional branch param. One bit of good news is that in many of the above cases we are already issuing a branchless query to the upstream service so we're already in a position where we will inherit the upstream service's decision. The list of service families where we've got 1 or more badges where we're actually explicitly making the assumption about default branch name is:

which is a much more manageable list than I originally feared 🎉

I suspect Packagist branch aliases is probably the most complicated case so I'm going to save that particular can of worms for another day 🐛

With bitbucket, github, gitlab, nycrc, osslifecycle and travis, we do always know where the repo is hosted so we know we can reliably query the remote for the default branch. With the Github badges it might make sense to see if we can rewrite some of them to use the GraphQL API and do it in one call though.
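
As a rough sketch of the "one call" idea for GitHub: the GraphQL `Repository` type exposes a `defaultBranchRef` field, so the default branch name and data about its head commit can be requested together. The helper names below are hypothetical and the selected fields are just an example; a real badge would select whatever metric it needs.

```python
import json

# Example query: default branch name plus the commit id at its tip.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      name
      target { ... on Commit { oid } }
    }
  }
}
"""


def build_payload(owner, name):
    """Build the JSON body for a POST to the GraphQL endpoint."""
    return json.dumps({"query": QUERY, "variables": {"owner": owner, "name": name}})


def extract_default_branch(response):
    """Return (branch_name, commit_oid) from a decoded response, or None."""
    ref = response["data"]["repository"]["defaultBranchRef"]
    if ref is None:  # e.g. an empty repository
        return None
    return ref["name"], ref["target"]["oid"]
```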

Shippable supports both GitHub and Bitbucket, but we don't know from the shippable projectId where the repo is hosted. If that (and maybe packagist) are the only services where we have to do something different I reckon we can live with that.

I think the next steps are to

  • investigate what the best approach is for github badges - does the V4 API help us out here?
  • have a look at a POC for the 2x requests approach for the affected bitbucket/gitlab/nycrc/osslifecycle/travis badges and see what the performance is like

I'm happy to pick that up and I'll worry about shippable/packagist after.

@andersk

andersk commented Jun 19, 2020

You may be able to pass the symbolic ref name HEAD to some APIs where you might otherwise pass a branch name, e.g. https://api.github.com/repos/badges/shields/commits/master → https://api.github.com/repos/badges/shields/commits/HEAD.
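
In code, that could look something like this sketch. `commits_url` is a hypothetical helper; only the URL shape comes from the example above.

```python
def commits_url(owner, repo, branch=None):
    """Build a GitHub commits URL, using HEAD when no branch is specified."""
    # HEAD resolves to the default branch upstream, so no branch name is assumed here.
    ref = branch if branch is not None else "HEAD"
    return f"https://api.github.com/repos/{owner}/{repo}/commits/{ref}"
```

With no branch given this yields `https://api.github.com/repos/badges/shields/commits/HEAD` for `badges/shields`, which keeps the badge to a single request.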

@chris48s
Member Author

Getting there - PRs for everything except packagist branch aliases are either merged or open

@chris48s
Member Author

Still haven't got round to working on it properly yet, but a bit of reading on composer/packagist: https://blog.packagist.com/composer-and-default-git-branches/

The packagist API is now surfacing a default-branch param, so I guess anywhere we were hard-coding dev-master we should now be looking for the alias where default-branch = True and using that instead to bring our logic into line with the composer 2 behaviour.
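
A sketch of what that lookup might look like. The shape of `versions` (version name mapped to a metadata dict carrying a `default-branch` key) is an assumption about the API response, and falling back to `dev-master` for older metadata is my own addition rather than anything decided here.

```python
def default_branch_version(versions, fallback="dev-master"):
    """Return the version name flagged as the default branch.

    versions: mapping of version name -> metadata dict (assumed shape).
    Falls back to 'dev-master' if nothing is flagged and that version exists.
    """
    for name, meta in versions.items():
        if meta.get("default-branch") is True:
            return name
    return fallback if fallback in versions else None
```

For example, given `{"1.0.0": {}, "dev-main": {"default-branch": True}}` this picks `dev-main` instead of a hard-coded `dev-master`.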


📰 in other news
Atlassian confirm their move: https://bitbucket.org/blog/moving-away-from-master-as-the-default-name-for-branches-in-git
Git 2.28 introduces init.defaultBranch: https://github.blog/2020-07-27-highlights-from-git-2-28/

@chris48s
Member Author

chris48s commented Sep 6, 2020

#5474 closes the final outstanding service

@chris48s chris48s closed this as completed Sep 6, 2020
