Inform and Act

Read our official statements regarding recent executive orders and find resources to help you take action. Learn more

Introducing review requests

You can now request a review explicitly from collaborators, making it easier to specify who you'd like to review your pull request.

You can also see a list of people who you are awaiting review from in the pull request page sidebar, as well as the status of reviews from those who have already left them.

gif of requesting review from sidebar

Pending requests for review will also show in the merge box. They do not affect mergability, however, so you can still merge your pull request even if you are still awaiting review from another collaborator.

image of merge box showing a requested review

Learn more about requesting reviews in our Help docs.

Relative links for GitHub pages

You've been able to use relative links when authoring Markdown on for a while. Now, those links will continue to work when published via GitHub Pages.

If you have a Markdown file in your repository at docs/, and you want to link from that file to docs/, you can do so with the following markup:

[a relative link](

When you view the source file on, the relative link will continue to work, as it has before, but now, when you publish that file using GitHub Pages, the link will be silently translated to docs/another-page.html to match the target page's published URL.

Under the hood, we're using the open source Jekyll Relative Links plugin, which is activated by default for all builds.

Relative links on GitHub Pages also take into account custom permalinks (e.g., permalink: /docs/page/) in a file's YAML front matter, as well as prepend project pages' base URL as appropriate, ensuring links continue to work in any context.

Happy (consistent) publishing!

Git 2.11 has been released

The open source Git project has just released Git 2.11.0, with features and bugfixes from over 70 contributors. Here's our look at some of the most interesting new features:

Abbreviated SHA-1 names

Git 2.11 prints longer abbreviated SHA-1 names and has better tools for dealing with ambiguous short SHA-1s.

You've probably noticed that Git object identifiers are really long strings of hex digits, like 66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f. They're generated from the output of the SHA-1 hash function, which is always 160 bits, or 40 hexadecimal characters. Since the chance of any two SHA-1 names colliding is roughly the same as getting struck by lightning every year for the next eight years1, it's generally not something to worry about.

You've probably also noticed that 40-digit names are inconvenient to look at, type, or even cut-and-paste. To make this easier, Git often abbreviates identifiers when it prints them (like 66c22ba), and you can feed the abbreviated names back to other git commands. Unfortunately, collisions in shorter names are much more likely. For a seven-character name, we'd expect to see collisions in a repository with only tens of thousands of objects2.

To deal with this, Git checks for collisions when abbreviating object names. It starts at a relatively low number of digits (seven by default), and keeps adding digits until the result names a unique object in the repository. Likewise, when you provide an abbreviated SHA-1, Git will confirm that it unambiguously identifies a single object.

So far, so good. Git has done this for ages. What's the problem?

The issue is that repositories tend to grow over time, acquiring more and more objects. A name that's unique one day may not be the next. If you write an abbreviated SHA-1 in a bug report or commit message, it may become ambiguous as your project grows. This is exactly what happened in the Linux kernel repository; it now has over 5 million objects, meaning we'd expect collisions with names shorter than 12 hexadecimal characters. Old references like this one are now ambiguous and can't be inspected with commands like git show.

To address this, Git 2.11 ships with several improvements.

First, the minimum abbreviation length now scales with the number of objects in the repository. This isn't foolproof, as repositories do grow over time, but growing projects will quickly scale up to larger, future-proof lengths. If you use Git with even moderate-sized projects, you'll see commands like git log --oneline produce longer SHA-1 identifiers. [source]

That still leaves the question of what to do when you somehow do get an ambiguous short SHA-1. Git 2.11 has two features to help with that. One is that instead of simply complaining of the ambiguity, Git will print the list of candidates, along with some details of the objects. That usually gives enough information to decide which object you're interested in. [source]

SHA-1 candidate list

Of course, it's even more convenient if Git simply picks the object you wanted in the first place. A while ago, Git learned to use context to figure out which object you meant. For example, git log expects to see a commit (or a tag that points to a commit). But other commands, like git show, operate on any type of object; they have no context to guess which object you meant. You can now set the core.disambiguate config option to prefer a specific type. [source]

Automatically disambiguating between objects

Performance Optimizations

One of Git's goals has always been speed. While some of that comes from the overall design, there are a lot of opportunities to optimize the code itself. Almost every Git version ships with more optimizations, and 2.11 is no exception. Let's take a closer look at a few of the larger examples.

Delta Chains

Git 2.11 is faster at accessing delta chains in its object database, which should improve the performance of many common operations. To understand what's going on, we first have to know what the heck a delta chain is.

You may know that Git avoids storing files multiple times, because all data is stored in objects named after the SHA-1 of the contents. But in a version control system, we often see data that is almost identical (i.e., your files change just a little bit from version to version). Git stores these related objects as "deltas": one object is chosen as a base that is stored in full, and other objects are stored as a sequence of change instructions from that base, like "remove bytes 50-100" and "add in these new bytes at offset 50". The resulting deltas are a fraction of the size of the full object, and Git's storage ends up proportional to the size of the changes, not the size of all versions.

As files change over time, the most efficient base is often an adjacent version. If that base is itself a delta, then we may form a chain of deltas: version two is stored as a delta against version one, and then version three is stored as a delta against version two, and so on. But these chains can make it expensive to reconstruct the objects when we need them. Accessing version three in our example requires first reconstructing version two. As the chains get deeper and deeper, the cost of reconstructing intermediate versions gets larger.

For this reason, Git typically limits the depth of a given chain to 50 objects. However, when repacking using git gc --aggressive, the default is bumped to 250, with the assumption that it would make a significantly smaller pack. But that number was chosen somewhat arbitrarily, and it turns out that the ideal balance between size and CPU actually is around 50. So that's the default in Git 2.11, even for aggressive repacks. [source]

Even 50 deltas is a lot to go through to construct one object. To reduce the impact, Git keeps a cache of recently reconstructed objects. This works out well because deltas and their bases tend to be close together in history, so commands like git log which traverse history tend to need those intermediate bases again soon. That cache has an adjustable size, and has been bumped over the years as machines have gotten more RAM. But due to storing the cache in a fairly simple data structure, Git kept many fewer objects than it could, and frequently evicted entries at the wrong time.

In Git 2.11, the delta base cache has received a complete overhaul. Not only should it perform better out of the box (around 10% better on a large repository), but the improvements will scale up if you adjust the core.deltaBaseCacheLimit config option beyond its default of 96 megabytes. In one extreme case, setting it to 1 gigabyte improved the speed of a particular operation on the Linux kernel repository by 32%. [source, source]

Object Lookups

The delta base improvements help with accessing individual objects. But before we can access them, we have to find them. Recent versions of Git have optimized object lookups when there are multiple packfiles.

When you have a large number of objects, Git packs them together into "packfiles": single files that contain many objects along with an index for optimized lookups. A repository also accumulates packfiles as part of fetching or pushing, since Git uses them to transfer objects over the network. The number of packfiles may grow from day-to-day usage, until the next repack combines them into a single pack. Even though looking up an object in each packfile is efficient, if there are many packfiles Git has to do a linear search, checking each packfile in turn for the object.

Historically, Git has tried to reduce the cost of the linear search by caching the last pack in which an object was found and starting the next search there. This helps because most operations look up objects in order of their appearance in history, and packfiles tend to store segments of history. Looking in the same place as our last successful lookup often finds the object on the first try, and we don't have to check the other packs at all.

In Git 2.10, this "last pack" cache was replaced with a data structure to store the packs in most recently used (MRU) order. This speeds up object access, though it's only really noticeable when the number of packs gets out of hand.

In Git 2.11, this MRU strategy has been adapted to the repacking process itself, which previously did not even have a single "last found" cache. The speedups are consequently more dramatic here; repacking the Linux kernel from a 1000-pack state is over 70% faster. [source, source]

Patch IDs

Git 2.11 speeds up the computation of "patch IDs", which are used heavily by git rebase.

Patch IDs are a fingerprint of the changes made by a single commit. You can compare patch IDs to find "duplicate" commits: two changes at different points in history that make the exact same change. The rebase command uses patch IDs to find commits that have already been merged upstream.

Patch ID computation now avoids both merge commits and renames, improving the runtime of the duplicate check by a factor of 50 in some cases. [source, source]

Advanced filter processes

Git includes a "filter" mechanism which can be used to convert file contents to and from a local filesystem representation. This is what powers Git's line-ending conversion, but it can also execute arbitrary external programs. The Git LFS system hooks into Git by registering its own filter program.

The protocol that Git uses to communicate with the filter programs is very simple. It executes a separate filter for each file, writes the filter input, and reads back the filter output. If you have a large number of files to filter, the overhead of process startup can be significant, and it's hard for filters to share any resources (such as HTTP connections) among themselves.

Git 2.11 adds a second, slightly more complex protocol that can filter many files with a single process. This can reportedly improve checkout times with many Git LFS objects by as much as a factor of 80.

Git LFS improvements

The original protocol is still available for backwards compatibility, and the new protocol is designed to be extensible. Already there has been discussion of allowing it to operate asynchronously, so the filter can return results as they arrive. [source]


  • In our post about Git 2.9, we mentioned some improvements to the diff algorithm to make the results easier to read (the --compaction-heuristic option). That algorithm did not become the default because there were some corner cases that it did not handle well. But after some very thorough analysis, Git 2.11 has an improved algorithm that behaves similarly but covers more cases and does not have any regressions. The new option goes under the name --indent-heuristic (and diff.indentHeuristic), and will likely become the default in a future version of Git. [source]
  • Ever wanted to see just the commits brought into a branch by a merge commit? Git now understands negative parent-number selectors, exclude the given parent (rather than selecting it). It may take a minute to wrap your head around that, but it means that git log 1234abcd^-1 will show all of the commits that were merged in by 1234abcd, but none of the commits that were already on the branch. You can also use ^- (omitting the 1) as a shorthand for ^-1. [source]
  • There's now a credential helper in contrib/ that can use GNOME libsecret to store your Git passwords. [source]
  • The git diff command now understands --submodule=diff (as well as setting the diff.submodule config to diff), which will show changes to submodules as an actual patch between the two submodule states. [source]
  • git status has a new machine-readable output format that is easier to parse and contains more information. Check it out if you're interested in scripting around Git. [source]
  • Work has continued on converting some of Git's shell scripts to C programs. This can drastically improve performance on platforms where extra processes are expensive (like Windows), especially in programs that may invoke sub-programs in a loop. [source, source]

The whole shebang

That's just a sampling of the changes in Git 2.11, which contains over 650 commits. Check out the the full release notes for the complete list.

[1] It's true. According to the National Weather Service, the odds of being struck by lightning are 1 in a million. That's about 1 in 220, so the odds of it happening in 8 consecutive years (starting with this year) are 1 in 2160.

[2] It turns out to be rather complicated to compute the probability of seeing a collision, but there are approximations. With 5 million objects, there's about a 1 in 1035 chance of a full SHA-1 collision, but the chance of a collision in 7 characters approaches 100%. The more commonly used metric is "numbers of items to reach a 50% chance of collision", which is the square root of the total number of possible items. If you're working with exponents, that's easy; you just halve the exponent. Each hex character represents 4 bits, so a 7-character name has 228 possibilities. That means we expect a collision around 214, or 16384 objects.

Git Merge 2017 tickets are now available


Tickets for Git Merge 2017 are now on sale 🎉

Git Merge is the pre-eminent Git-focused conference: a full day offering technical talks and user case studies, plus a full day of pre-conference, add-on workshops for Git users of all levels. Git Merge 2017 will take place February 2-3 in Brussels.

Confirmed Speakers

  • Durham Goode, Facebook
  • Santiago Perez De Rosso, MIT
  • Carlos Martin Nieto, GitHub

Git users of all levels are invited to dive into a variety of topics with some of the best Git trainers in the world. Learn about improving workflows with customized configurations, submodules and subtrees, getting your repo under control, and much more. Workshops are included in the cost of a conference ticket, but space is limited. Make sure to RSVP when you get your conference ticket.

Sponsorship Opportunities
By sponsoring Git Merge, you are supporting a community of users and developers dedicated a tool that's become integral to your development workflow. Check out the Sponsorship Prospectus for more information.

Tickets are €99 and all proceeds are donated to the Software Freedom Conservancy. General admission also includes entrance to the after party.


New in the shop: The Octoplush

It's time to cozy up with the all new Octoplush collectable-available now in the GitHub Shop.

Share the Octoplush with friends and family. Just don't feed these octocats. They're already stuffed.

Now through Tuesday, enjoy 30% off everything in the GitHub Shop with discount code OCTOCYBER2016 and free shipping for orders over $30.

GitHub Extension now supports Visual Studio 2017 RC

The GitHub Extension for Visual Studio now supports Visual Studio 2017 RC, including support for cloning repositories directly from the Visual Studio Start Page. We've also improved our startup time to get you productive as fast as possible.

This release, version, is available as an optional installation component in the installer for both Visual Studio 2015 and Visual Studio 2017 RC. You can also install it directly from the Visual Studio gallery or download it from

Last year we introduced the GitHub Extension for Visual Studio as an open source project under the MIT license. You can log issues and contribute to the extension in the repository.

Clone from GitHub on Start Page

New in the Shop: GitHub Activity Book

Go ahead, color outside the lines with the GitHub Activity Book starring our very own Mona the Octocat! Now available in the GitHub Shop.

Activity Book

GitKraken joins the Student Developer Pack

GitKraken is now part of the Student Developer Pack. Students can manage Git projects in a faster, more user-friendly way with GitKraken's Git GUI for Windows, Mac, and Linux.

GitKraken joins the Student Developer Pack

GitKraken is a cross-platform GUI for Git that makes Git commands more intuitive. The interface equips you with a visual understanding of branching, merging and your commit history. GitKraken works directly with your repositories with no dependencies—you don’t even need to install Git on your system. You’ll also get a built-in merge tool with syntax highlighting as well as one-click undo and redo for when you make mistakes. Other features of GitKraken are:

  • Drag and drop to merge, rebase, reset, push
  • Resizable, easy-to-understand commit graph
  • File history and blame
  • View image diffs in app
  • Fuzzy finder and command palette
  • Submodules and Gitflow support
  • Easily clone, add remotes, and open pull requests in app
  • Keyboard shortcuts
  • Dark and light color themes
  • GitHub integration

Members of the pack get GitKraken Pro free for one year. With GitKraken Pro, Student Developer Pack members will get all the features of GitKraken plus:

  • The ability to resolve merge conflicts in the app
  • Multiple profiles for work and personal use
  • Support for GitHub Enterprise

Students can get free access to professional developer tools from companies like Datadog, Travis CI, and Unreal Engine. The Student Developer Pack lets you learn, experiment, and build software with the tools developers use at work every day without worrying about cost.

Students, get a Git GUI now with your pack.

Operation Code: connecting tech and veterans

Today is Veteran’s Day here in the United States, or Remembrance Day in many places around the world, when we recognize those who have served in the military. Today many businesses will offer veterans a cup of coffee or a meal, but one organization goes further.

You might have watched ex-Army Captain David Molina speak at CodeConf LA, or GitHub Universe about Operation Code, a nonprofit he founded in 2014 after he couldn’t use the benefits of the G.I. Bill to pay for code school. Operation Code lowers the barrier of entry into software development and helps military personnel in the United States better their economic outcomes as they transition to civilian life. They leverage open source communities to provide accessible online mentorship, education, and networking opportunities.

The organization is also deeply invested in facilitating policy changes that will allow veterans to use their G.I. Bill benefits at coding schools and boot camps, speeding up their re-entry to the workforce. Next week Captain Molina will testify in Congress as to the need for these updates. The video below explains more about their work.

Operation Code - On a mission to expand the GI Bill

Although Operation Code currently focuses on the United States, they hope to develop a model that can be replicated throughout the world.

Why Operation Code matters

Operation Code is working to address a problem that transcends politics. Here's a look into the reality U.S. veterans face:

  • The unemployment rate for veterans over the age of 18 as of August 2016 is 3.9% for men and 7.0% for women.
  • As of 2014, less that seven percent of enlisted personnel have a Bachelor’s degree or higher
  • More than 200,000 active service members leave the military every year, and are in need of employment
  • U.S. Studies show that members of underrepresented communities are more frequently joining the military to access better economic and educational opportunities

How you can help

GitHub is headed to AWS re:Invent


The GitHub team is getting ready for AWS re:Invent on November 28, and we'd love to meet you there.

Why? GitHub works alongside AWS to ensure your code is produced and shipped quickly and securely, giving you a platform that plugs right into existing workflows, saving time and allowing your team to use tools they’re already familiar with. And at AWS re:Invent, we’re hosting events throughout the week to help you learn how GitHub and AWS work together.

Level up your DevOps program

DevOps is a never-ending journey, and implementing the best tools and practices for the job is only the beginning. Hear from Accenture Federal Services’ Natalie Bradley and GitHub’s Matthew McCullough about how GitHub Enterprise and AWS formed the backbone of a DevOps program that not only raised code quality and shipping speed, but defined how to scale tools for thousands of users.

Unwind at TopGolf

Join us on Tuesday, the 29th for a party at TopGolf—the perfect place to unwind from a full day of travel, training, or meetings. Tee time is 7:30 PM at the MGM Grand. RSVP today.

Meet with GitHub Engineers

You'll also have a chance to get some in-depth advice from our team of technical Hubbers headed to Vegas by scheduling a 1:1 chat with them.

You can visit the Octobooth on the expo floor to watch live demos, talk to one of our product specialists, or just grab some swag. Stop by and say hi at booth #607.

GitHub Enterprise 2.8 is now available with code review, project management tools, and Jupyter notebook rendering

GitHub Enterprise 2.8 adds power and versatility directly into your workflow with Reviews for more streamlined code review and discussion, Projects to bring development-centric project management into GitHub, and Jupyter notebook rendering to visualize data.

Code better with Reviews

Reviews help you build flexible code review workflows into your pull requests, streamlining conversations, reducing notifications, and adding more clarity to discussions. You can comment on specific lines of code, formally "approve" or "request changes" to pull requests, batch comments, and have multiple conversations per line. These initial improvements are only the first step of a much greater roadmap toward faster, friendlier code reviews.

Organize projects while staying close to your code

With Projects, you can manage work directly from your GitHub repositories. Create task cards from pull requests, issues, or notes and drag and drop them into categorized columns. You can use categories like "In-progress", "Done", "Never going to happen", or any other framework your team prefers. Move the cards within a column to prioritize them or from one column to another as your work progresses. And with notes, you can capture every early idea that comes up as part of your standup or team sync, without polluting your list of issues.

Visualize data-driven workflows with Jupyter Notebook rendering

Producing and sharing data on GitHub is a common challenge for researchers and data scientists. Jupyter notebooks make it easy to capture those data-driven workflows that combine code, equations, text, and visualizations. And now they render in all your GitHub repositories.

Share your story as a developer

This release takes the contribution graph to new heights with your GitHub timeline—a snapshot of your most important triumphs and contributions. Curate and showcase your defining moments, from pinned repositories that highlight your best work to a profile timeline that chronicles important milestones in your career.

Amp up administrator visibility and security enforcement

GitHub Enterprise 2.8 gives administrators more ways to enforce security policies, understand and improve performance, and get developers the support they need. Site admins can now enforce the use of two-factor authentication at the organization level, efficiently visualize LDAP authentication-related problems—like polling, repeated failed login attempts, and slow servers—and direct users to their support website throughout the appliance.

Upgrade today

Upgrade to GitHub Enterprise 2.8 today to start using these new features and keep improving the way your team works. You can also check out the release notes to see what else is new or enable update checks to automatically check for the latest releases of GitHub Enterprise.

What's new in GitHub Pages with Jekyll 3.3

GitHub Pages has upgraded to Jekyll 3.3.0, a release with some nice quality-of-life features.

First, Jekyll 3.3 introduces two new convenience filters, relative_url and absolute_url. They provide an easy way to ensure that your site's URLs will always appear correctly, regardless of where or how your site is viewed. To make it easier to use these two new filters, GitHub Pages now automatically sets the site.url and site.baseurl properties, if they're not already set.

This means that starting today {{ "/about/" | relative_url }} will produce /repository-name/about/ for Project Pages (and /about/ for User Pages). Additionally, {{ "/about/" | absolute_url }} will produce for Project Pages (and for User Pages or if you have a custom domain set up).

Second, with Jekyll 3.3, when you run jekyll serve in development, it will override your url value with http://localhost:4000. No more confusion when your locally-modified CSS isn't loading because the URL is set to the production site. Additionally, site.url and absolute_url will now yield http://localhost:4000 when running a server locally.

Finally, to make it easier to vendor third-party dependencies via package managers like Bundler or NPM (or Yarn), Jekyll now ignores the vendor and node_modules directories by default, speeding up build times and avoiding potential errors. If you need those directories included in your site, set exclude: [] in your site's configuration file.

For more information, see the Jekyll changelog and if you have any questions, please let us know.

Happy publishing!

Game Off Theme Announcement

GitHub Game Off 2016 Theme is Hacking, Modding, or Augmenting

We announced the GitHub Game Jam, our very own month-long game jam, a few weeks ago. Today, we're announcing the theme and officially kicking it off. Ready player one!

The Challenge

You have the entire month of November to create a game loosely based on the theme hacking, modding and/or augmenting.

What do we mean by loosely based on hacking, modding and/or augmenting? Here are some examples:

  • an endless runner where you hack down binary trees in your path with a pixelated axe
  • a modern take on a classic e.g. a roguelike set in a 3D or VR world
  • an augmented reality game bringing octopus/cat hybrids into the real world

Unleash your creativity. You can work alone or with a team and build for any platform or device. The use of open source game engines and libraries is encouraged but not required.

We'll highlight some of our favorites games on the GitHub blog, and the world will get to enjoy (and maybe even contribute to or learn from) your creations.

How to participate

  • Sign up for a free personal account if you don't already have one
  • Fork the github/game-off-2016 repository to your personal account (or to a free organization account)
  • Clone the repository on your computer and build your game
  • Push your game source code to your forked repository before December 1st
  • Update the file to include a description of your game, how to play or download it, how to build and compile it, what dependencies it has, etc
  • Submit your final game using this form

It's dangerous to go alone

If you're new to Git, GitHub, or version control

  • Git Documentation: everything you need to know about version control and how to get started with Git
  • GitHub Help: everything you need to know about GitHub
  • Questions about GitHub? Please contact our Support team and they'll be delighted to help you
  • Questions specific to the GitHub Game Off? Please create an issue. This will be the official FAQ

The official Twitter hashtag for the Game Off is #ggo16. We look forward to playing your games.

GLHF! <3

Save the Date: Git Merge 2017

Git Merge 2017 February 2-3 in Brussels

We’re kicking off 2017 with Git Merge, February 2-3 in Brussels. Join us for a full day of technical talks and user case studies, plus a day of pre-conference workshops for Git users of all levels (RSVP is required, as space is limited). If you’ll be in Brussels for FOSDEM, come in early and stop by. Just make sure to bundle up!

Git Merge is the pre-eminent Git-focused conference dedicated to amplifying new voices from the Git community and to showcasing thought-provoking projects from contributors, maintainers, and community managers. When you participate in Git Merge, you’ll contribute to one of the largest and most forward-thinking communities of developers in the world.

Call for Speakers
We're accepting proposals starting now through Monday, November 21. Submit a proposal and we’ll email you back by Friday, December 9. For more information on our process and what kind of talks we’re seeking, check out our Call For Proposals (CFP).

Code of Conduct
Git Merge is about advancing the Git community at large. We value the participation of each member and want all attendees to have an enjoyable and fulfilling experience. Check out our Code of Conduct for complete details.

Git Merge would not be possible without the help of our sponsors and community partners. If you're interested in sponsoring Git Merge, you can download the sponsorship prospectus for more information.

Tickets are €99 and all proceeds are donated to the Software Freedom Conservancy. General Admission includes access to the pre-conference workshops and after party in addition to the general sessions.


See you in Brussels!

Incident Report: Inadvertent Private Repository Disclosure

On Thursday, October 20th, a bug in GitHub’s system exposed a small amount of user data via Git pulls and clones. In total, 156 private repositories of users were affected (including one of GitHub's). We have notified everyone affected by this private repository disclosure, so if you have not heard from us, your repositories were not impacted and there is no ongoing risk to your information.

This was not an attack, and no one was able to retrieve vulnerable data intentionally. There was no outsider involved in exposing this data; this was a programming error that resulted in a small number of Git requests retrieving data from the wrong repositories.

Regardless of whether or not this incident impacted you specifically, we want to sincerely apologize. It’s our responsibility not only to keep your information safe but also to protect the trust you have placed in us. GitHub would not exist without your trust, and we are deeply sorry that this incident occurred.

Below is the technical analysis of our investigation, including a high-level overview of the incident, how we mitigated it, and the specific measures we are taking to safeguard against incidents like this from happening in the future.

High-level overview

In order to speed up unicorn worker boot times, and simplify the post-fork boot code, we applied the following buggy patch:


The database connections in our rails application are split into three pools: a read-only group, a group used by Spokes (our distributed Git back-end), and the normal Active Record connection pool. The read-only group and the Spokes group are managed manually, by our own connection handling code. This meant the pool was shared between all child processes of the rails application when running using the change. The new line of code disconnected only ConnectionPool objects that are managed by Active Record, whereas the previous snippet would disconnect all ConnectionPool objects held in memory.

The impact of this bug for most queries was a malformed response, which errored and caused a near immediate rollback. However, a very small percentage of the queries responses were interpreted as legitimate data in the form of the file server and disk path where repository data was stored. Some repository requests were routed to the location of another repository. The application could not differentiate these incorrect query results from legitimate ones, and as a result, users received data that they were not meant to receive.

When properly functioning, the system works as sketched out roughly below. However, during this failure window, the MySQL response in step 4 was returning malformed data that would end up causing the git proxy to return data from the wrong file server and path.

System Diagram

Our analysis of the ten-minute window in question uncovered:

  • 17 million requests to our git proxy tier, most of which failed with errors due to the buggy deploy
  • 2.5 million requests successfully reached git-daemon on our file server tier
  • Of the 2.5 million requests that reached our file servers, the vast majority were "already up to date" no-op fetches
  • 40,000 of the 2.5 million requests were non-empty fetches
  • 230 of the 40,000 non-empty requests were susceptible to this bug and served incorrect data
  • This represented 0.0013% of the total operations at the time

Deeper analysis and forensics

After establishing the effects of the bug, we set out to determine which requests were affected in this way for the duration of the deploy. Normally, this would be an easy task, as we have an in-house monitor for Git that logs every repository access. However, those logs contained some of the same faulty data that led to the misrouted requests in the first place. Without accurate usernames or repository names in our primary Git logs, we had to turn to data that our git proxy and git-daemon processes sent to syslog. In short, the goal was to join records from the proxy, to git-daemon, to our primary Git logging, drawing whatever data was accurate from each source. Correlating records across servers and data sources is a challenge because the timestamps differ depending on load, latency, and clock skew. In addition, a given Git request may be rejected at the proxy or by git-daemon before it reaches Git, leaving records in the proxy logs that don’t correlate with any records in the git-daemon or Git logs.

Ultimately, we joined the data from the proxy to our Git logging system using timestamps, client IPs, and the number of bytes transferred and then to git-daemon logs using only timestamps. In cases where a record in one log could join several records in another log, we considered all and took the worst-case choice. We were able to identify cases where the repository a user requested, which was recorded correctly at our git proxy, did not match the repository actually sent, which was recorded correctly by git-daemon.

We further examined the number of bytes sent for a given request. In many cases where incorrect data was sent, the number of bytes was far larger than the on-disk size of the repository that was requested but instead closely matched the size of the repository that was sent. This gave us further confidence that indeed some repositories were disclosed in full to the wrong users.

Although we saw over 100 misrouted fetches and clones, we saw no misrouted pushes, signaling that the integrity of the data was unaffected. This is because a Git push operation takes place in two steps: first, the user uploads a pack file containing files and commits. Then we update the repository’s refs (branch tips) to point to commits in the uploaded pack file. These steps look like a single operation from the user’s point of view, but within our infrastructure, they are distinct. To corrupt a Git push, we would have to misroute both steps to the same place. If only the pack file is misrouted, then no refs will point to it, and git fetch operations will not fetch it. If only the refs update is misrouted, it won’t have any pack file to point to and will fail. In fact, we saw two pack files misrouted during the incident. They were written to a temporary directory in the wrong repositories. However, because the refs-update step wasn’t routed to the same incorrect repository, the stray pack files were never visible to the user and were cleaned up (i.e., deleted) automatically the next time those repositories performed a “git gc” garbage-collection operation. So no permanent or user-visible effect arose from any misrouted push.

A misrouted Git pull or clone operation consists of several steps. First, the user connects to one of our Git proxies, via either SSH or HTTPS (we also support git-protocol connections, but no private data was disclosed that way). The user’s Git client requests a specific repository and provides credentials, an SSH key or an account password, to the Git proxy. The Git proxy checks the user’s credentials and confirms that the user has the ability to read the repository he or she has requested. At this point, if the Git proxy gets an unexpected response from its MySQL connection, the authentication (which user is it?) or authorization (what can they access?) check will simply fail and return an error. Many users were told during the incident that their repository access “was disabled due to excessive resource use.”

In the operations that disclosed repository data, the authentication and authorization step succeeded. Next, the Git proxy performs a routing query to see which file server the requested repository is on, and what its file system path on that server will be. This is the step where incorrect results from MySQL led to repository disclosures. In a small fraction of cases, two or more routing queries ran on the same Git proxy at the same time and received incorrect results. When that happened, the Git proxy got a file server and path intended for another request coming through that same proxy. The request ended up routed to an intact location for the wrong repository. Further, the information that was logged on the repository access was a mix of information from the repository the user requested and the repository the user actually got. These corrupted logs significantly hampered efforts to discover the extent of the disclosures.

Once the Git proxy got the wrong route, it forwarded the user’s request to git-daemon and ultimately Git, running in the directory for someone else’s repository. If the user was retrieving a specific branch, it generally did not exist, and the pull failed. But if the user was pulling or cloning all branches, that is what they received: all the commits and file objects reachable from all branches in the wrong repository. The user (or more often, their build server) might have been expecting to download one day’s commits and instead received some other repository’s entire history.

Users who inadvertently fetched the entire history of some other repository, surprisingly, may not even have noticed. A subsequent “git pull” would almost certainly have been routed to the right place and would have corrected any overwritten branches in the user’s working copy of their Git repository. The unwanted remote references and tags are still there, though. Such a user can delete the remote references, run “git remote prune origin,” and manually delete all the unwanted tags. As a possibly simpler alternative, a user with unwanted repository data can delete that whole copy of the repository and “git clone” it again.

Next steps

To prevent this from happening again, we will modify the database driver to detect and only interpret responses that match the packet IDs sent by the database. On the application side, we will consolidate the connection pool management so that Active Record's connection pooling will manage all connections. We are following this up by upgrading the application to a newer version of Rails that doesn't suffer from the "connection reuse" problem.

We will continue to analyze the events surrounding this incident and use our investigation to improve the systems and processes that power GitHub. We consider the unauthorized exposure of even a single private repository to be a serious failure, and we sincerely apologize that this incident occurred.