
GitHub availability this week

GitHub.com suffered two outages early this week that resulted in one hour and 46 minutes of downtime and another hour of significantly degraded performance. This is far below our standard of quality, and for that I am truly sorry. I want to explain what happened and give you some insight into what we're doing to prevent it from happening again.

First, some background

During a maintenance window in mid-August our operations team replaced our aging pair of DRBD-backed MySQL servers with a 3-node cluster. The servers collectively present two virtual IPs to our application: one that's read/write and one that's read-only. These virtual IPs are managed by Pacemaker and Heartbeat, a high availability cluster management stack that we use heavily in our infrastructure. Coordination of MySQL replication to move 'active' (a MySQL master that accepts reads and writes) and 'standby' (a read-only MySQL slave) roles around the cluster is handled by Percona Replication Manager, a resource agent for Pacemaker. The application primarily uses the 'active' role for both reads and writes.

This new setup provides, among other things, more efficient failovers than our old DRBD setup. In our previous architecture, failing over from one database to another required a cold start of MySQL. In the new infrastructure, MySQL is running on all nodes at all times; a failover simply moves the appropriate virtual IP between nodes after flushing transactions and appropriately changing the read_only MySQL variable.
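
To give a feel for the moving parts, a heavily simplified Pacemaker configuration for this kind of cluster looks something like the following sketch (the IP addresses, resource names, and agent parameters are illustrative, not our production values):

$ crm configure primitive mysql ocf:percona:mysql \
    params config="/etc/my.cnf" replication_user="repl" \
    op monitor interval="5s" role="Master" \
    op monitor interval="10s" role="Slave"
$ crm configure ms ms_mysql mysql \
    meta master-max="1" clone-max="3" notify="true"
$ crm configure primitive writer-vip ocf:heartbeat:IPaddr2 params ip="10.0.0.10"
$ crm configure primitive reader-vip ocf:heartbeat:IPaddr2 params ip="10.0.0.11"
$ crm configure colocation writer-vip-with-master inf: writer-vip ms_mysql:Master
$ crm configure order promote-then-vip inf: ms_mysql:promote writer-vip:start

A failed health check on the master tells Pacemaker to demote that node, promote a slave, and move the writer virtual IP along with the 'active' role. That automation is exactly what figures into the events below.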

Monday, September 10th

The events that led up to Monday's downtime began with a rather innocuous database migration. We use a two-pass migration system to allow for zero-downtime MySQL schema migration. This has been a relatively recent addition, but we've used it a handful of times without any issue.

Monday's migration caused higher load on the database than our operations team had previously seen during these sorts of migrations. So high, in fact, that it caused Percona Replication Manager's health checks to fail on the master. In response to the failed master health check, Percona Replication Manager moved the 'active' role and the master database to another server in the cluster and stopped MySQL on the node it perceived as failed.

At the time of this failover, the new database selected for the 'active' role had a cold InnoDB buffer pool and performed rather poorly. The system load generated by the site's query load on a cold cache soon caused Percona Replication Manager's health checks to fail again, and the 'active' role failed back to the server it was on originally.

At this point, I decided to disable all health checks by enabling Pacemaker's maintenance-mode, an operating mode in which no health checks or automatic failover actions are performed. Performance on the site slowly recovered as the buffer pool warmed back to normal levels.
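
For reference, that cluster-wide switch is a single property (shown here with the crm shell; the mechanics are the interesting part, not the exact command):

$ crm configure property maintenance-mode=true    # stop acting on health checks entirely
$ crm configure property maintenance-mode=false   # hand control back to Pacemaker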

Tuesday, September 11th

The following morning, our operations team was notified by a developer of incorrect query results returning from the node providing the 'standby' role. I investigated the situation and determined that when the cluster was placed into maintenance-mode the day before, actions that should have caused the node elected to serve the 'standby' role to change its replication master and start replicating were prevented from occurring. I determined that the best course of action was to disable maintenance-mode and allow Pacemaker and Percona Replication Manager to rectify the situation.

When we attempted to disable maintenance-mode, a Pacemaker segfault occurred that resulted in a cluster state partition. After this segfault, two nodes (I'll call them 'a' and 'b') rejected most messages from the third node ('c'), while the third node rejected most messages from the other two. Despite having configured the cluster to require a majority of machines to agree on the state of the cluster before taking action, two simultaneous master election decisions were attempted without proper coordination. In the first, two-node cluster, master election was interrupted by messages from the second cluster and MySQL was stopped.

In the second, single-node cluster, node 'c' was elected master at 8:19 AM, and any subsequent messages from the two-node cluster were discarded. As luck would have it, the 'c' node was the one our operations team had previously determined to be out of date. We detected this fact and powered off the out-of-date node at 8:26 AM to end the partition and prevent further data drift, taking down all production database access and thus all access to github.com.

As a result of this data drift, inconsistencies between MySQL and other data stores in our infrastructure were possible. We use Redis to query dashboard event stream entries and repository routes from automatically generated MySQL ids. In situations where the id MySQL generated for a record is used to query data in Redis, the cross-data-store foreign key relationships became out of sync for records created during this window.
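
To make that failure mode concrete, here's a deliberately simplified sketch of the pattern (the table, columns, and Redis key are hypothetical, not our actual schema): the id MySQL assigns to a new record is embedded in a Redis key, so if a diverged master hands out an id that another master already used, the Redis entry ends up pointing at someone else's row.

$ event_id=$(mysql -N -e 'INSERT INTO events (actor_id, action) VALUES (1234, "push"); SELECT LAST_INSERT_ID();')
$ redis-cli RPUSH "dashboard:user:5678" "$event_id"
# if two masters issue the same AUTO_INCREMENT id, this entry now references
# a different row than the one that was actually created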

Consequently, some events created during this window appeared on the wrong users' dashboards. Also, some repositories created during this window were incorrectly routed. We've removed all of the leaked events and performed an audit of all repositories incorrectly routed during this window. Sixteen of these repositories were private, and for seven minutes (from 8:19 AM to 8:26 AM PDT on Tuesday, September 11th) they were accessible to people outside of the repository's list of collaborators or team members. We've contacted all of the owners of these repositories directly. If you haven't received a message from us, your repository was not affected.

After confirming that the out-of-date database node was properly terminated, our operations team began to recover the state of the cluster on the 'a' and 'b' nodes. The original attempt to disable maintenance-mode was not reflected in the cluster state at this time, and subsequent attempts to make changes to the cluster state were unsuccessful. After tactical evaluation, the team determined that a Pacemaker restart was necessary to obtain a clean state.

At this point, all Pacemaker and Heartbeat processes were stopped on both nodes, then started on the 'a' node. MySQL was successfully started on the 'a' node and assumed the 'active' role. Performance on the site slowly recovered as the buffer pool warmed back to normal levels.

In summary, three primary events contributed to the downtime of the past few days. First, several failovers of the 'active' database role happened when they shouldn't have. Second, a cluster partition occurred that resulted in incorrect actions being performed by our cluster management software. Finally, the failovers triggered by these first two events impacted performance and availability more than they should have.

In ops I trust

The automated failover of our main production database could be described as the root cause of both of these downtime events. In each situation in which a failover occurred, if any member of our operations team had been asked whether it should be performed, the answer would have been a resounding no. There are many situations in which automated failover is an excellent strategy for ensuring the availability of a service. After careful consideration, we've determined that ensuring the availability of our primary production database is not one of these situations. To this end, we've made changes to our Pacemaker configuration to ensure failover of the 'active' database role will only occur when initiated by a member of our operations team.
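
As a sketch of the kind of change involved (again simplified, reusing the illustrative resource names from the earlier sketch), the monitor operations can be told to block the failed resource instead of triggering a move, leaving promotion to a human:

# change the monitor operations so a failed health check blocks rather than fails over
$ crm configure edit mysql
    ...
    op monitor interval="5s" role="Master" on-fail="block" \
    op monitor interval="10s" role="Slave" on-fail="block"
# failover now happens only when an operator explicitly asks for it:
$ crm resource migrate ms_mysql db2    # 'db2' is an example node name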

We're also investigating solutions to ensure that these failovers don't impact performance when they must be performed, either in an emergency situation or as a part of scheduled maintenance. There are various facilities for warming the InnoDB buffer pool of slave databases that we're investigating and testing for this purpose.
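
One candidate approach (among several, and by no means the final answer) is to pre-touch hot data before a node is allowed to take the 'active' role, either with deliberate queries or with the buffer pool dump/restore facilities offered by Percona Server and newer MySQL releases:

# warm a standby by scanning its hottest tables before promotion
# (the table names here are hypothetical)
$ mysql -e "SELECT COUNT(*) FROM repositories; SELECT COUNT(*) FROM users;"
# or rely on buffer pool save/restore where the server version supports it,
# e.g. variables along the lines of innodb_buffer_pool_load_at_startup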

Finally, our operations team is performing a full audit of our Pacemaker and Heartbeat stack focusing on the code path that triggered the segfault on Tuesday. We're also performing a strenuous round of hardware testing on the server on which the segfault occurred out of an abundance of caution.

Status Site Downtime on Tuesday, September 11th

We host our status site on Heroku to ensure its availability during an outage. However, during our downtime on Tuesday our status site experienced some availability issues.

As traffic to the status site began to ramp up, we increased the number of dynos running from 8 to 64 and finally 90. This had a negative effect because we were still running on an old development database add-on (a shared database). The larger number of dynos maxed out the available connections to the database, causing additional processes to crash.

We worked with Heroku Support to bring a production database online that would be able to handle the traffic the site was receiving. Once this database was online we saw an immediate improvement to the availability of the status site.

Since the outage we've added a database slave to improve our availability options for unforeseen future events.
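
For those keeping score at home, the sequence maps onto a handful of Heroku commands roughly like these (the plan and database names are examples of the shape of the fix, not an exact transcript):

$ heroku ps:scale web=90                          # more dynos, but each one holds database connections
$ heroku addons:add heroku-postgresql:crane       # provision a production-tier database
$ heroku pg:promote HEROKU_POSTGRESQL_SILVER      # point the app at it
$ heroku addons:add heroku-postgresql:crane --follow HEROKU_POSTGRESQL_SILVER   # follower for failover options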

Looking ahead

The recent changes made to our database stack were carefully made specifically with high availability in mind, and I can't apologize enough that they had the opposite result in this case. Our entire operations team is dedicated to providing a fast, stable GitHub experience, and we'll continue to refine our infrastructure, software, and methodologies to ensure this is the case.

Searching and Filtering Stars

You can already see all the repositories you’ve starred at github.com/stars, but today your Stars page just got a whole lot better.

Search

Looking for a repository you starred? Search for it:

Search starred repositories

Filter by Language

Sometimes you can’t remember the name of a repository, though, so we’ve added a language breakdown of your stars, too:

Starred repository language filter

Filter by Repository Type

You can also filter your stars by what type of repository you’re looking for — public, private, and so on:

Starred repository type filter

Mix and match

The best thing? All of these filters are stackable, so you can search all of your stars for other people’s public JavaScript repositories matching “plugin”, for example.

Keyboard Navigation

As with most pages on GitHub, the Stars page responds to a few keyboard shortcuts:

  • Use j and k to move up and down the list quickly.
  • Type enter to go to the selected repository.
  • Press cmd+enter (OS X) or ctrl+enter (Windows, Linux) to open the selected repository in a new tab.
  • Hit / to quickly focus the search field.

Go ahead; check out the new Stars page.

Happy stargazing! :sparkles::star2::star::dizzy:

GitHub Enterprise 11.10.280 Release

We're excited to announce the latest release of GitHub Enterprise. We've been working hard to bring you the most recent features from GitHub.com, and we're shipping this version with our new Commit Status API included, and much more. Along with a variety of general improvements and adjustments, this new release brings the following features from GitHub.com:

In addition, we're also including several new Enterprise-specific features:

Subversion Bridge

For quite some time GitHub.com has had full support for using Subversion clients to interact with Git repositories hosted there. This feature wasn't widely known, which prompted a recent blog post detailing how it can be used.

Support for this feature was not immediately available with GitHub Enterprise, but has been widely requested. We're extremely happy to announce that this release of Enterprise now includes the full Subversion Bridge that's available in GitHub.com! After upgrading, it can be taken advantage of immediately with no additional configuration.

New GitHub Enterprise Header Design

One of the benefits of having GitHub Enterprise is that it's nearly identical to GitHub.com. Unfortunately, it can almost be too identical, and it was possible to mix up which site you were on. With this change it will be easier to tell whether you're visiting GitHub.com or your own GitHub Enterprise installation by taking a look at Enterprise's gorgeous new header:

Log Forwarding

Now, you can centralize the logs produced by GitHub Enterprise by forwarding them to your own logging servers. We're using rsyslog to stream application and server logs out of the appliance for your convenience – you just need to specify an endpoint that can consume forwarded syslog data, such as Logstash or Splunk.
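
If the consuming host happens to run rsyslog itself, accepting the forwarded stream is a couple of lines. This is a minimal sketch (the file name and port are just examples; the appliance side only needs the host:port entered in the settings page):

$ echo '$ModLoad imtcp' | sudo tee /etc/rsyslog.d/10-github-enterprise.conf
$ echo '$InputTCPServerRun 514' | sudo tee -a /etc/rsyslog.d/10-github-enterprise.conf
$ sudo service rsyslog restart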

VirtualBox Guest Additions Installer

Beginning with the 11.10.260 release, we've included a utility to help install VMware Tools on Enterprise installations running in VMware environments. Beginning with this release, we're also including a utility to help install VirtualBox Guest Additions. You can get instructions on how to do this here.

New CLI Utilities

We're also including a couple new CLI utilities to help manage the appliance:

  • ghe-vm-reboot – Not all GitHub Enterprise appliance administrators have direct access to the hypervisor the VM is running on. Previously, they might have had to go through a separate VM management team to perform a system reboot when one was necessary. Now they can do it themselves with this script.
  • ghe-repo-repair – This utility will detect any repositories on-disk that may have incorrect permissions on the server. If any are detected, you can optionally correct those permissions as well.
$ ghe-repo-repair -f
 --> Checking for repositories with bad permissions...
 --> Fixing permissions for /data/repositories/gist/1.git...
 --> * Fixing ownership permissions...
 --> * Fixing directory permissions...
 --> * Fixing file permissions...
 --> Done.

Updated Admin Tools UI

The UI for the Admin Tools dashboard has been updated to be more consistent with how settings are represented in the user account settings area:

Administrative actions are now grouped more logically and have better descriptions as well.

Sticky Protocol Selection

This feature has actually been on GitHub.com for a while, but wasn't announced on the blog. Some users prefer SSH over the now-default HTTP protocol for cloning. Now the last protocol a user selected will be remembered when viewing any other repository page or creating a new repository. The instructions displayed on empty repositories for what to do next will also display the appropriate steps based on that preference.


We hope you enjoy these features as much as we do. Don't forget that there is more information available about GitHub Enterprise at https://enterprise.github.com/.

Watcher API Changes

We recently changed the Watcher behavior on GitHub. What used to be known as "Watching" is now "Starring". Starring is basically a way to bookmark interesting repositories. Watching is a way to indicate that you want to receive email or web notifications on a Repository.

This works well on GitHub.com, but poses a problem for the GitHub API. How do we change this in a way that developers can gracefully upgrade their applications? We’re currently looking at rolling out the changes in three phases over an extended period of time.

Today we are announcing the first steps towards separate "Watching" and "Starring" APIs.

New API Endpoints

There are some new Star endpoints for the API. Don't fret: your old Watch endpoints are still working. The new Watch endpoints are available too; however, their paths use the internal "subscriber" and "subscriptions" terms so they don't clash with legacy apps using the Watch endpoints.
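
For the curious, the split looks roughly like this from curl (the user and repository are just examples; substitute your own, plus an OAuth token wherever writes are involved):

# starring (bookmarks)
$ curl https://api.github.com/users/defunkt/starred
$ curl -X PUT -H "Authorization: token $TOKEN" https://api.github.com/user/starred/defunkt/resque
# watching (notifications) -- note the "subscriber"/"subscriptions" wording in the paths
$ curl https://api.github.com/repos/defunkt/resque/subscribers
$ curl https://api.github.com/users/defunkt/subscriptions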

We've detailed the slow transition of the Star and Watch endpoints, which will help provide a smooth upgrade path.

Changes Blog

The API Developer site now has a Changes blog for upcoming breaking changes. It also has a low-volume Atom feed you can subscribe to. If you are developing tools on top of the GitHub API, you should keep up via this blog or the @GitHubAPI Twitter account.

How we keep GitHub fast

The most important factor in web application design is responsiveness. And the first step toward responsiveness is speed. But speed within a web application is complicated.

Our strategy for keeping GitHub fast begins with powerful internal tools that expose and explain performance metrics. With this data, we can more easily understand a complex production environment and remove bottlenecks to keep GitHub fast and responsive.

Performance dashboard

Response time as a simple average isn’t very useful in a complex application. But what number is useful? The performance dashboard attempts to give an answer to this question. Powered by data from Graphite, it displays an overview of response times throughout github.com.

We split response times by the kind of request we’re serving. For the ambiguous items:

  • Browser - A page loaded in a browser by a logged in user.
  • Public - A page loaded in a browser by a logged out user.

Clicking one of the rows allows you to dive in and see the mean, 98th percentile, and 99.9th percentile response times.
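
These views are built on plain Graphite queries; pulling the 98th percentile for one kind of request is a single render call along these lines (the hostname and metric name are made up for the example):

$ curl "https://graphite.example.com/render?target=nPercentile(stats.timers.app.browser.time,98)&from=-1h&format=json"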

The performance dashboard shows performance information, but it doesn't explain it. We needed something more fine-grained and detailed.

Mission control bar

GitHub staff can browse the site in staff mode. This mode is activated via a keyboard shortcut and provides access to staff-only features, including our Mission control bar. When it’s showing, we see staff-only features and have the ability to moderate the site. When it’s hidden, we’re just regular users.

Spoiler alert: you might notice a few things in this screenshot that haven’t fully shipped yet.

The left-hand side shows which branch is currently deployed and the total time it took to serve and render the page. For some browsers (like Chrome), we show a detailed breakdown of the various time periods that make up a rendered page. This is massively useful in understanding where slowness comes from: the network, the browser, or the application.

The right-hand side is a collection of various application metrics for the given page. We show the current compressed JavaScript & CSS size, the background job queue, and various data source times. For the ambiguous items:

  • render – How long did it take to render this page on the server?
  • cache – memcached calls.
  • sql – MySQL calls.
  • git – Grit calls.
  • jobs – The current background job queue.

When we’re ready to make a page fast we can dive into some of these numbers by clicking on them. We’ve hijacked many features from rack-bug and query-reviewer to produce these breakdowns.

And many more…

It goes without saying that we use many other tools like New Relic, Graphite, and plain old UNIX-foo to aid in our performance investigations as well.

A lot of the numbers in this post are much slower than I’d like them to be, but we’re hoping with better transparency we’ll be able to deliver the fastest web application that’s ever existed.

As @tnm says: it’s not fully shipped until it’s fast.

Commit Status API

Today, we shipped an API for third party services to attach statuses to commits.

We created this API to allow services to color the discussion on pull requests. For example, we use this API internally with our continuous integration (CI) setup to automatically update the status of every commit on every branch. Now, when we discuss pull requests, we can easily and automatically ensure they are safe to merge.

We also designed this API to be flexible. It lets each service decide what it means for a commit to be successful. As a result, the API can be used for just about anything — like design reviews or ensuring contributor license agreements are filed.

Pull Requests

Pull requests show the status of the most recent commit, as well as the (optional) description and URL metadata attached to the status.

Every commit in the pull request features a status indicator.

The merge button for the pull request will also take status into account.

The merge indicator for the pull request takes status into account and offers a warning if the latest status is not 'success'. The pull request can still be merged if the status isn't successful; we just warn you about it.

Services Available Today

Travis CI already pushes status information for all Travis-enabled repositories without any additional configuration required. For example, this pull request in Gollum already has statuses attached. See their blog post for more details.

Sprint.ly offers status integration for project management. For enabled repositories, they push status updates when tickets change state. Check out their blog post for more details.

Using the Commit Status API

Integrating with our Status API is simple. Each status includes a state, a SHA, a repository, and an optional URL and description. The state reflects whether or not a SHA (commit) is successful at that point in time. The states we currently support are "pending", "success", "failure", and "error". If the status provides a URL or description, we display it in our UI.

Statuses cannot be changed once added to a commit (they're completely immutable), but any number of statuses may be attached to a single commit. We only display the most recent status for any given commit in our UI.
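
Creating a status is a single authenticated POST. A CI server's integration can be as small as this (the repository, SHA, token, and URL below are placeholders):

$ curl -X POST -H "Authorization: token $TOKEN" \
    -d '{"state": "success", "target_url": "https://ci.example.com/builds/42", "description": "The build passed!"}' \
    https://api.github.com/repos/octocat/Hello-World/statuses/6dcb09b5b57875f334f61aebed695e2e4193db5e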

For more details on the nitty-gritty, see the API docs.

We are very excited to see what integrations you, the community, develop around this API!

Deploying at GitHub

Deploying is a big part of the lives of most GitHub employees. We don't have a release manager and there are no set weekly deploys. Developers and designers are responsible for shipping new stuff themselves as soon as it's ready. This means that deploying needs to be as smooth and safe a process as possible.

The best system we've found so far to provide this flexibility is to have people deploy branches. Changes never get merged to master until they have been verified to work in production from a branch. This means that master is always stable; a safe point that we can roll back to if there's a problem.

The basic workflow goes like this:

  • Push changes to a branch
  • Wait for the build to pass on our CI server
  • Tell Hubot to deploy it
  • Verify that the changes work and fix any problems that come up
  • Merge the branch into master

Not too long ago, however, this system wasn't very smart. A branch could accidentally be deployed before the build finished, or even if the build failed. Employees could mistakenly deploy over each other. As the company has grown, we've needed to add some checks and balances to help us prevent these kinds of mistakes.

Safety First

The first thing we do now, when someone tries to deploy, is make a call to Janky to determine whether the current CI build is green. If it hasn't finished yet or has failed, we'll tell the deployer to fix the situation and try again.

Next we check whether the application is currently "locked". The lock indicates that a particular branch is being deployed in production and that no other deploys of the application should proceed for the moment. Successful builds on the master branch would otherwise get deployed automatically, so we don't want those going out while a branch is being tested. We also don't want another developer to accidentally deploy something while the branch is out.

The last step is to make sure that the branch we're deploying contains the latest commit on master that has made it into production. Once a commit on master has been deployed to production, it should never be “removed” from production by deploying a branch that doesn’t have that commit in it yet.

We use the GitHub API to verify this requirement. An endpoint on the github.com application exposes the SHA1 that is currently running in production. We submit this to the GitHub compare API to obtain the "merge base", or the common ancestor, of master and the production SHA1. We can then compare this to the branch that we're attempting to deploy to check that the branch is caught up. By using the common ancestor of master and production, code that only exists on a branch can be removed from production, and changes that have landed on master but haven't been deployed yet won't require branches to merge them in before deploying.
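
In rough terms, the check looks something like this (the internal endpoint is a placeholder; the compare call is the public API):

# what's currently running in production? (internal endpoint, shown as a placeholder)
$ production_sha=$(curl -s https://github.example.com/site/deployed-sha)
# the compare API's merge_base_commit gives the common ancestor of master and production
$ curl -s "https://api.github.com/repos/github/github/compare/master...$production_sha" | grep -m1 -A3 '"merge_base_commit"'
# the branch being deployed must contain that commit, or it gets master merged in first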

If it turns out the branch is behind, master gets merged into it automatically. We do this using the new :sparkles:Merging API:sparkles: that we're making available today. This merge starts a new CI build like any other push-style event, which starts a deploy when it passes.
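
The Merging API call itself is tiny; catching a branch up with master is a single POST (the repository and branch names here are examples):

$ curl -X POST -H "Authorization: token $TOKEN" \
    -d '{"base": "my-feature-branch", "head": "master", "commit_message": "Merge master into my-feature-branch"}' \
    https://api.github.com/repos/github/github/merges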

At this point the code actually gets deployed to our servers. We usually deploy to all servers for consistency, but a subset of servers can be specified if necessary. This subset can be by functional role — front-end, file server, worker, search, etc. — or we can specify an individual machine by name, e.g., 'fe7'.

Watch it in action

What now? It depends on the situation, but as a rule of thumb, small to moderate changes should be observed running correctly in production for at least 15 minutes before they can be considered reasonably stable. During this time we monitor exceptions, performance, tweets, and do any extra verification that might be required. If non-critical tweaks need to be made, changes can be pushed to the branch and will be deployed automatically. In the event that something bad happens, rolling back to master only takes 30 seconds.

All done!

If everything goes well, it's time to merge the changes. At GitHub, we use Pull Requests for almost all of our development, so merging typically happens through the pull request page. We detect when the branch gets merged into master and unlock the application. The next deployer can now step up and ship something awesome.

How do we do it?

Most of the magic is handled by an internal deployment service called Heaven. At its core, Heaven is a catalog of Capistrano recipes wrapped up in a Sinatra application with a JSON API. Many of our applications are deployed using generic recipes, but more complicated apps can define their own to specify additional deployment steps. Wiring it up to Janky, along with clever use of post-receive hooks and the GitHub API, lets us hack on the niceties over time. Hubot is the central interface to both Janky and Heaven, giving everyone in Campfire great visibility into what’s happening all of the time. As of this writing, 75 individual applications are deployed by Heaven.

Always be shipping

Not all projects at GitHub use this workflow, which is core to github.com development, but here are a couple of stats for build and deploy activity company-wide so far in 2012:

  • 41,679 builds
  • 12,602 deploys

The lull in mid-August was our company summit, which kicked off the following week with a big dose of inspiration. Our busiest day yet, Aug. 23, saw 563 builds and 175 deploys.

Optimizing Sales for Happiness

We've spoken about how we try to optimize for happiness at GitHub. You're also likely to hear the term first principles thrown around if you spend a few days at our office. We insist on taking the time to research and discuss various solutions when faced with a new challenge rather than blindly accepting the status quo. How we approach sales at GitHub is no different.

GitHub Has a Sales Team?

We sure do, but figuring out what sales means within a very developer-centric company wasn't all unicorns and octocats. The siren song of big revenue gains can easily disrupt your team's ultimate goals and culture if you're not careful. It wasn't difficult to spot the slippery slope after hiring even one dedicated salesperson, so how do we optimize sales for happiness?

By putting our product and people first. If that means turning down revenue opportunities to remain true to this philosophy, so be it. Making gobs of money has never been our main focus, and it turns out we can operate this way and still enjoy massive revenue growth across all of our products.

The first (and most important) thing we had going for us was a product that sells itself. With an unwavering focus on building the best products possible, we're in the wonderful position that 99% of our customers require very little handholding. People just want to use GitHub, so we make it easy to pay for it and essentially get out of their way.

Sales our Way

The remaining 1% of customers is where sales comes into play. Much in the same way support guides folks through technical questions, we needed people to guide customers through business questions. Not only that, developers within larger organizations sometimes need help convincing the people with the purchasing authority to buy the products they really, really want to use.

We still call this role sales, but our team likes to think of themselves as developer liaisons. The role is akin to a sales engineer or technical account manager. It's definitely not your prototypical salesperson cold-calling people all day to hit a quota.

In fact, all of our salespeople have technical backgrounds; they just happen to also love the support and business side of things. They enjoy speaking with customers, building long-term relationships, and are comfortable navigating the inner workings of organizations to ultimately find a way to make sure those developers get to use the products they're asking us for.

No commissions

Another traditional sales tool that doesn't really make sense for us is paying commissions. Commissions are an incentive to churn and burn through orders as fast as possible, regardless of the consequences. They also introduce massive overhead and logistical problems for the salespeople and the company as a whole, so we're happy to avoid all of that crap.

We want our salespeople compensated just like everyone else in our organization: great salaries, great benefits, and stock options. We expect everyone to work hard, but we're all in this together, and no one should feel marginalized because sales is reaping the rewards from years of hard work done by others.

At the end of the day, the key to sales at GitHub is the key to GitHub's success in general: build an awesome product and treat people well. We don't claim to have unlocked some huge business secret with this idea, but we're excited about our sales team at GitHub because we feel that doing it this way is best for our company and our customers. We're in this for the long haul!

Join our team!

If you have a technical background and are stuck in a traditional sales role, where your company thinks of you as a coin-driven operator and not as a human being, please check out our job post.

We are excited to meet you!

Notifications & Stars

Today, we’re releasing a new version of our notifications system and changing the way you watch repositories on GitHub. You’ll find the new notifications indicator next to the GitHub logo that lights up blue when you have unread notifications.

There’s a lot going on in this announcement, so hold onto to something and prepare to take a step into the future.

Introducing stars

Stars are a new way to keep track of repositories that you find interesting. Any repositories you were previously watching can now be found on your stars page.

You can star or unstar a repository from the nav with the brand new star button.

A quick note: activity from starred repositories will not show up in your dashboard feed.

Notifications: now powered by the watch button

Notifications are now powered by the repositories you are watching. If you are watching a repository, you will receive notifications for all discussions.

  • Issues and their comments
  • Pull Requests and their comments
  • Comments on any commits

If you are not watching a repository, you'll only be notified about discussions in which you participate.

  • @mentions
  • Assignments
  • Commits you author or commit
  • Any discussion you've commented on

You're automatically watching a bunch of repositories based on your permissions — it's probably a good idea to go through and unwatch repositories you're not interested in.

Activity from your watched repositories will show up on your dashboard feed as well.

Auto watch

When you're given push access to a repository, we automatically watch the repository for you.

If you'd like your notifications to be 100% opt-in, you can disable this feature in the watching section.

Threading

Notifications now roll up into threads of activity, much like conversations in Gmail do. This should work great in your email client of choice if you receive notifications via email.

Simple configuration

We’ve simplified notifications settings and taken a new approach to controlling which notifications trigger emails.

Per organization email routing

For those of you who want to route notifications for work-related projects to different email addresses, you can now configure email routing based on the organization a repository lives in.

First class email

As a bonus, we're also rolling out improved notification emails today.

Enjoy!

Notification Email Improvements

Reading notifications on github.com is greatly improved thanks to the numerous enhancements shipped out earlier today. We're excited to also announce a brand new email backend with support for rich formatting (HTML), smart mail headers, and shared read state tracking.

Rich formatting

Issue, Pull Request, and all comment-related email notifications are now delivered with both HTML and plain text parts. Markdown formatting such as image embeds, links, lists, code blocks, bold, emphasis, and blockquotes is supported. GitHub enhancements like Emoji, @mentions, and issue references all work too.

We've tested under most popular desktop, web, mobile, and terminal based mail readers. Here's a taste of what you might see in Gmail:


See your mail client's configuration or settings area if you prefer plain text email. Most clients support disabling HTML content, either globally or for specific senders.

Mail header refinements

Notification emails now make better use of the To, Cc, and Bcc headers to signal message importance. Instead of addressing notification email directly To you, we use the Bcc and Cc fields: Bcc unless you're participating in the message, in which case Cc is used instead. This can be used to enable advanced mail client filters and interface behaviors.

Examples of mail client features that take advantage of these changes:

  • Gmail's mute feature. Muting a thread means that new messages will not show up in your Inbox unless you are @mentioned.
  • Importance indicators such as Gmail's tiny chevron guys.
  • Gmail's Priority Inbox uses To and Cc to prioritize messages that are sent directly to you.

All email is now delivered with the From address: notifications@github.com.

Shared notification read state

Notifications that are read as HTML email are automatically marked as read in the github.com notifications interface. An invisible image is embedded in each mail message to enable this.

You must allow viewing images in mail from notifications@github.com in order for this feature to work. Most mail readers allow images by default, but some require confirmation before images are displayed.

Email Verification

We are rolling out email address verification today:

Verified email addresses will enable our support team to better assist you if you lose your password or have issues with missing email from us. Current notifications and emails from GitHub will function normally for verified and unverified email addresses at this time.

This is a great time to verify the email addresses you use for notifications. We will be throwing up a reminder banner soon for users who have no verified email addresses.

Friendlier Edit and Delete Actions

Last night we shipped some friendlier edit and delete actions on your comments. Previously the actions would appear on hover at the top right of the comment box.


The goal was to give the edit and delete actions prominent placement without obscuring the text below.

So we fixed it!


The icons to the far right are the new edit and delete actions, making life easier for all of us.

Surviving the SSHpocalypse

Over the past few days, we have had some issues with our SSH infrastructure affecting a small number of Git SSH operations. We apologize for the inconvenience, and are happy to report that we've completed one round of architectural changes in order to make sure our SSH servers keep their sparkle. :sparkles:

As we've said before, we use GitHub to build GitHub, so the recent intermittent SSH connection failures have been affecting us as well.

Before today, every Git operation over SSH would open its own connection to our MySQL database during the authentication step. In the past this wasn't a problem; however, we've started seeing sporadic issues as our SSH traffic has grown.

Realizing we were potentially on the cusp of a more serious situation, we patched our SSH servers to increase timeouts, retry connections to the database, and verbosely log failures. After this initial pass of incremental changes aimed at pinpointing the source of the problem, we realized this piece of our infrastructure wasn't as easily modified as we would have liked. We decided to take a more drastic approach.

Starting on Tuesday, I worked with @jnewland to retire our 4+ year-old SSH patches and rewrite them all from scratch. Rather than opening a database connection for each SSH client, we call out to a shared library plugin (written in C) that lives in our Rails app. The library uses an HTTP endpoint exposed by our Rails app in order to check for authorized public keys. The Rails app is backed by a web server with persistent database connections, which keeps us from creating unbounded database connections, as we were doing previously. This is pretty neat because, like all code that lives in the GitHub Rails app, we can redeploy it near-instantly at any time. This gives us tremendous flexibility in continuing to scale our SSH services.
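
Stripped of the C plugin details, the lookup now amounts to one HTTP request per connection instead of one MySQL connection per connection. A purely hypothetical sketch of the shape of that call (the endpoint, parameter, and output are invented for illustration, not our actual API):

# hypothetical endpoint and parameter -- illustrative of the flow only
$ curl -sf "https://github-auth.example.com/internal/authorized_keys?fingerprint=$KEY_FINGERPRINT"
ssh-rsa AAAAB3NzaC1yc2E... user-12345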

@jnewland deployed the changes around 9:20 AM on Thursday, and things seem to be in much better shape now. Below is a graph that shows connections to the MySQL database. You can see a drastic reduction in the number of database connections:

You can also observe an overall smaller number of SSH server processes (they're not all stuck because of contention on the database server anymore):

Of course, we are also exploring additional scalability improvements in this area.

Anywho, sorry for the mess. As always, please ping our support team if you see any further issues on github.com where Git over SSH hangs up randomly.

GitHub Android App Released

We are extremely pleased to announce the initial release of the GitHub Android App available on Google Play. The app is free to download and you can also browse the code from the newly open sourced repository.

This release includes support for working with Issues and Gists as well as an integrated news feed for keeping up to date with all your organizations, friends, and repositories.

The app features a dashboard for quick access to all your created, watched, and assigned issues so you can always stay connected with the discussion and progress. You can also view and bookmark any repository's issue list with configurable filters for labels, milestones, and assignees.

Head over to the github/android repository to see exactly how the app was built, report any feature requests or issues, and stay up to date as development of the app continues.

The GitHub Android app was built on some great open source projects that are definitely worth checking out if you are looking to build your own Android apps or want to contribute to the GitHub or Gaug.es apps.

Try Git In Your Browser


Today we're launching a unique and easy way, in the format of a Code School interactive course, for new Git and GitHub users to try both the tool and the service without installing a single bit of software.

If you know of a developer, designer, or other knowledge worker who would benefit from using Git but hasn't made the leap just yet, send them over to try.github.com when they have a spare 15 minutes.

Try Git Site

Made from 100% real Git and GitHub

This site even interacts with GitHub via OAuth and will push your resultant tutorial repository to your GitHub account as a repo named try_git.

Try Git Site

Pairs with Intro-to-Git Videos

If you are interested in a matched set of introductory Git videos, check out the video page of the Git SCM site too.
