Git.io: GitHub URL Shortener

Do you have a GitHub URL you'd like to shorten? Use Git.io!

$ curl -i https://git.io -F "url=https://github.com/..."
HTTP/1.1 201 Created
Location: https://git.io/abc123

$ curl -i https://git.io/abc123
HTTP/1.1 302 Found
Location: https://github.com/...

You can specify your own code to set up your own vanity URL:

$ curl -i https://git.io -F "url=https://github.com/technoweenie" \
    -F "code=t"
HTTP/1.1 201 Created
Location: https://git.io/t
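
The same request can be built from Ruby's standard library. This is a minimal sketch rather than an official client: `build_shorten_request` is a hypothetical helper name, and the multipart encoding is an assumption made to mirror curl's `-F` flag.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the POST that the curl
# examples issue. `code` is the optional vanity code.
def build_shorten_request(long_url, code = nil)
  request = Net::HTTP::Post.new(URI('https://git.io/'))
  fields = [['url', long_url]]
  fields << ['code', code] if code
  # curl's -F flag sends multipart/form-data, so do the same here.
  request.set_form(fields, 'multipart/form-data')
  request
end

# Sending it needs network access and a live service:
#   uri = URI('https://git.io/')
#   response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
#     http.request(build_shorten_request('https://github.com/technoweenie', 't'))
#   end
#   response['Location']  # the short URL, if the service answers 201 Created
```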

You can currently see Git.io in action on certain third-party services, such as Campfire.

![](https://github-images.s3.amazonaws.com/blog/2011/gitio.jpg)

The Code

Git.io was written and deployed by @atmos and me as a big experiment. Our goals:

  • Use Riak.
  • Deploy on Rackspace Cloud.

You can assemble your own with Guillotine, the URL shortening hobby kit. It's written in Ruby as a Sinatra app, and supports storing links in Riak or a relational DB.

Though a URL shortener is about the easiest project one could take on, it gave us a chance to experiment. As a result, I've been able to spread my excitement about Riak to more people at GitHub. The Riak 1.0 upgrade gave us a chance to experiment with a rolling upgrade across our cluster (for the new version, and the new leveldb backend). We also have better support with our Hubot and puppet scripts for managing deployments through Rackspace Cloud.

Edit like an Ace

Ace is a code editor written in JavaScript. It powers Cloud9 IDE and, as of today, file editing on GitHub.

If you're using a recent version of Safari, Chrome, or Firefox, here's how it works:

1. Hit the "Edit" button (or the e hotkey) on any blob

2. Edit your code

The basics should all work: TAB to indent, Shift+TAB to unindent, Command+/ (OSX) or Control+/ (Win/Linux) to comment out or uncomment a line.

3. Preview your changes

4. Commit!

Options

While we try to guess whether your file is using tabs or spaces and the indentation level, you can set those yourself using the options in the top right of the editor view:

Richtext

If the file you're editing is Markdown, Textile, or any other richtext format GitHub supports, we'll render a preview of it instead of a diff:

More modes

If your favorite language isn't being highlighted, consider adding a mode for it! Check out Ace's "Creating or Extending an Edit Mode" wiki page for more info.

Ace

This is just the start. Help us make editing on GitHub even better by forking and improving Ace at ajaxorg/ace.

As always, please email support@github.com with any bugs you find or ideas you have. Happy hacking!

Block the Bullies

GitHub has always been about collaboration: we want to make it easy for you to work with other people to build great software. Whether that's a co-worker sitting next to you or a stranger across the globe, it doesn't matter. Collaboration should be easy and fun.

Unfortunately, there will always be people whose idea of fun is bullying others. That's why today we are rolling out a feature designed to help you control the people you interact with on GitHub.

block

Every profile page on GitHub now has a settings gear which lets you block a user or report them for abuse. This feature is inspired by Twitter and other social networks that let you decide whose content you do and don't see.

This is just the beginning. Moving forward, we'll integrate this functionality more tightly with the site and continue adding features to help you better control your GitHub experience.

Finally, we're sorry if abusive users caused you to have a bad GitHub experience. When you log in to github.com you should see things that make you happy: new Pull Requests, comments on your Issues, messages from people who love your software, intense debates about the quality of code. To that end, GitHub should have had "block user" a long time ago.

Let us know if you have ideas on how we can continue to improve GitHub, and please report anyone causing trouble to support@github.com — we're here to help.

Rolling out the Redcarpet

Here at GitHub, we love Markdown. We use it everywhere: to render the wikis, issues, pull requests, and all user-generated comments. We even encourage developers to write their READMEs in this awesome markup language. In fact, we use it so much that we've learnt a few lessons on Markdown parsing the hard way.

Every day, GitHub renders thousands of Markdown documents with all kinds of user-submitted content, ranging from poorly formatted to downright malicious. Your average Markdown parser is not prepared to deal with potentially pathological inputs, and hence is vulnerable to DoS attacks. That's why we took Natacha Porté's awesome library, Upskirt, and pimped it with everything you'd expect in a Markdown library for the web, both in features and in security.

Our fork of the library also comes with a Ruby wrapper, aptly named Redcarpet. Redcarpet works as a drop-in replacement for BlueCloth and RDiscount; we've been slowly deploying it through all our frontend machines, and so far none of them has caught 🔥. We consider this a tremendous success, but since we strive for perfection, please report any rendering errors you may encounter in your Markdown documents to help us improve the library.

Finally, to celebrate the release of the new library we're enabling syntax highlighted code blocks in GitHub Flavored Markdown.

Four-space indentation is no longer required when including code, backtraces, and other text in a comment, issue, Gist, or any other Markdown-enabled text. Instead, simply create a fenced block with ```. An optional language identifier after the backticks will syntax highlight the code in that language.

``` ruby
require 'redcarpet'
markdown = Redcarpet.new("Hello World!")
puts markdown.to_html
```

GitHub + Rebase Support in IntelliJ

The IntelliJ platform has had GitHub integration for a few months, but today they rolled out Advanced GitHub Integration: Rebase My GitHub Fork.

What is it? Rebase support.

Now staying up to date with upstream repos is easier than ever in IntelliJ! Best of all, it's available in the free community edition of IDEA as well as the commercial edition. Integration for other products is underway.

Check their blog post for more information and a screencast, and keep watching for more awesome stuff from the JetBrains team in the future.

Download IntelliJ

Mind Control with Frickin Lasers

Several months ago I hosted a GitHub meetup in Boston, where tons of local geeks showed up and drank free beer. During that meetup, I talked to Andrew Leifer, a local graduate student in biophysics at Harvard, who told me that he loved GitHub and was in fact using it to collaborate on a program that accomplished mind control. with lasers. on worms.

Well, it turns out that I had not in fact been drinking too much and the project is real. Andrew's research is called CoLBeRT: Controlling Locomotion and Behavior in Real-Time and works by running real-time analysis on video of a 1mm long specially bred light-sensitive C. elegans worm. The CoLBeRT system tracks the worm as it moves and shines laser light on specific neurons as the worm is moving to stimulate or inhibit those neurons.

The system can make the worm paralyzed, lay eggs, back up, speed up or sense touch in different areas of its body, all by directing laser light into specific neurons. That's right, I said lay eggs. Check out this kick-ass laser:

If you aimed that at me, I'd probably lay eggs too.

Andrew's research has recently been published in Nature Methods and covered in Science News and Scientific American. True to his word, the source code for laser worm mind control is open source and on GitHub, aptly named MindControl.

So, though the human brain has over 100 billion neurons instead of the worm's 302, and we're not photosensitive, this project brings us 1 evil pull request closer to total world domination. with frickin lasers.

3D GitHub Badge with Pure HTML/CSS

Nico Hagenburger released a nifty bit of HTML/CSS today that allows you to get the famous "Fork me on GitHub" banner on the corner of your site. But this time, with a twist. Literally. If you hover over the banner in Safari, the banner will flip around and show some alternate text on the other side. Fancy!

Go take a look and try it out for yourself!

Online Git Training Series

This December 14th we'll be kicking off the first in a series of all-day online Git training sessions. The series will be taught by our partner Matthew McCullough of Ambient Ideas, who has been doing excellent Git training and talking all over the world lately.

If you or your colleagues would like a one-day crash course in Git, check out our new training page and sign up for the first course. Pants are, of course, optional, as they are in our in-person trainings.

https://github.com/training/online

Hope to see you there!

Get Good with Git

If you have little or no experience on the command line, and no experience with Git, this might be the ebook for you: Getting Good With Git. From command line basics to using GitHub for collaboration, there's a ton of great stuff in this thing.

With great content and a beautiful layout, you should definitely check it out if you want to learn Git.

Sidejack Prevention Phase 3: SSL Proxied Assets

This is the third, and hopefully final, response to session hijacking on github.com. We've been safe from session hijacking for a while now but we were still serving pages with mixed-content warnings. People have complained about these warnings in the past but it still remains an issue in most browsers. We want our users to focus on getting things done and we want them to feel secure while they use our site.

A few of our pages allowed people to embed images directly via GitHub Flavored Markdown. Our users find this really useful and we wanted to avoid leaving people's browsers looking like this:

insecure

You can now link to remote images in your comments/readmes/issues without creating mixed content warnings.

issue preview

We did this by rewriting the src attribute on img tags when we render GitHub Flavored Markdown. The src attribute is rewritten to proxy through our normal asset servers so it appears to come from a secure source. On the backend we wrote a simple HTTP proxy in Node that runs behind our normal nginx setup. The code is available here.
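
The rewriting step can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the code we run: the proxy host, the shared key, the URL layout, and both helper names are hypothetical, and the HMAC digest is one plausible way to stop the proxy from fetching URLs the site never rendered.

```ruby
require 'openssl'
require 'uri'

# Hypothetical values for illustration only.
PROXY_BASE = 'https://assets.example.com/proxy'
SHARED_KEY = 'not-a-real-secret'

# Build a proxied URL carrying a digest of the original URL, so the proxy
# can check that the link was generated by the site.
def proxied_src(url)
  digest = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('sha1'), SHARED_KEY, url)
  "#{PROXY_BASE}/#{digest}?url=#{URI.encode_www_form_component(url)}"
end

# Rewrite only plain-http image sources; https sources are left untouched.
def rewrite_insecure_images(html)
  html.gsub(/(<img\b[^>]*?\bsrc=")(http:\/\/[^"]+)(")/i) do
    prefix, url, suffix = $1, $2, $3
    "#{prefix}#{proxied_src(url)}#{suffix}"
  end
end
```

A regular expression keeps the sketch short; a real renderer would more likely rewrite the attribute while walking the parsed HTML.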

Please open a support ticket if you find pages on the site that are still generating mixed content warnings. So far the system seems to be holding up well and we're ready to get back to hacking on features for GitHub. Thanks for your patience over the last few weeks.

Sidejack Prevention Phase 2: SSL Everywhere

Last Tuesday, we rolled out secure cookies for all SSL-protected pages. This meant that all private repositories, user dashboards, and admin settings (even for free users and repositories) were protected against sidejacking attempts. However, any user actions on gists and public repositories (such as issues, wikis, downloads) were still vulnerable.

Last night, we rolled out the next phase from our latest security audit: SSL everywhere. Every hit to the website, whether you're logged in or not, is over HTTPS with a secure cookie.

This is a big step, but we're still seeing some resources being served directly from other sites and giving SSL warnings. We're going to address this issue next. In the meantime your browsers might give warnings that look like this.

Insecure Resources

Our next step will be to fix these insecure assets that you might see in commit and issue comments. We're hoping to have the remaining issues fixed over the next few days.

Our new build status indicator

If you've been following us on the twitters, you know that we recently got our first office space in San Francisco. Our first course of action was to immediately purchase a stoplight.

But what do you do with a stoplight? Hook up an Arduino to CI Joe to show the status of our build, of course!

Luckily our friend Greg flew out and did the Arduino magic to get it all working. He even wrote a post explaining the process and open sourced the Arduino firmware.

Thanks Greg!

Campfire Service Hook Returns

“Hey, why isn’t anyone working this week?!”

Actually, they’re working hard. It’s the Service Hooks that took the week off.

37signals released a [Campfire API](http://developer.37signals.com/campfire/) which requires you to provide an API Token instead of a username / password when using Campfire’s API.

What this means is you need to enter your Campfire API Token in your Campfire Service Hook to get it working again.

Here’s our Illustrated Upgrade Guide:

Now get back to work!

Introducing Resque

Resque is our Redis-backed library for creating background jobs, placing
those jobs on multiple queues, and processing them later.

Background jobs can be any Ruby class or module that responds to
perform. Your existing classes can easily be converted to background
jobs or you can create new classes specifically to do work. Or, you
can do both.
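
To make the "responds to perform" idea concrete, here is a minimal
sketch. It does not use Resque or Redis: `TinyQueue` is a hypothetical
in-memory stand-in, and the JSON payload shape is an assumption chosen
for illustration.

```ruby
require 'json'

# A job is any class or module that responds to `perform`.
module Archive
  def self.perform(repo_id, branch = 'master')
    "archived repo #{repo_id} at #{branch}"
  end
end

# Hypothetical in-memory stand-in for a Redis-backed queue.
class TinyQueue
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store the job as a plain string: the class name plus its arguments.
  def enqueue(queue, job_class, *args)
    @lists[queue] << JSON.generate('class' => job_class.name, 'args' => args)
  end

  # Pop the oldest job from the queue and run it; nil when the queue is empty.
  def work_one(queue)
    raw = @lists[queue].shift
    return nil unless raw
    payload = JSON.parse(raw)
    Object.const_get(payload['class']).perform(*payload['args'])
  end
end
```

Because the payload is just a class name and arguments, the job
arguments have to survive a round trip through JSON.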

All the details are in the README. We've used it to process
over 10m jobs since our move to Rackspace and are extremely happy with it.

But why another background library?

A Brief History of Background Jobs

We've used many different background job systems at GitHub: SQS,
Starling, ActiveMessaging, BackgroundJob, DelayedJob, and beanstalkd.
Each change was out of necessity: we were running into a
limitation of the current system and needed to either fix it or move
to something designed with that limitation in mind.

With SQS, the limitation was latency. We were a young site and heard
stories on Amazon forums of multiple minute lag times between push and
pop. That is, once you put something on a queue you wouldn't be able
to get it back for what could be a while. That scared us so we moved.

ActiveMessaging was next, but only briefly. We wanted something
focused more on Ruby itself and less on libraries. That is, our jobs
should be Ruby classes or objects, whatever makes sense for our app,
and not subclasses of some framework's design.

BackgroundJob (bj) was a perfect compromise: you could process Ruby
jobs or Rails jobs in the background. How you structured the jobs was
largely up to you. It even included priority levels, which would let
us make "repo create" and "fork" jobs run faster than the "warm some
caches" jobs.

However, bj loaded the entire Rails environment for each job. Loading
Rails is no small feat: it is CPU-expensive and takes a few
seconds. So for a job that may take less than a second, you could have
8 - 20s of added overhead depending on how big your app is, how many
dependencies it requires, and how bogged down your CPU is at that time.

DelayedJob (dj) fixed this problem: it is similar to bj, with a
database-backed queue and priorities, but its workers are
persistent. They only load Rails when started, then process jobs in a
loop.

Jobs are just YAML-marshalled Ruby objects. With some magic you can
turn any method call into a job to be processed later.

Perfect. DJ lacked a few features we needed but we added them and
contributed the changes back.

We used DJ very successfully for a few months before running into some
issues. First: backed up queues. DJ works great with small datasets,
but once your site starts overloading and the queue backs up (to, say,
30,000 pending jobs) its queries become expensive. Creating jobs can
take 2s+ and acquiring locks on jobs can take 2s+, as well. This means
an added 2s per job created for each page load. On a page that fires
off two jobs, you're at a baseline of 4s before doing anything else.

If your queue is backed up because your site is overloaded, this added
overhead just makes the problem worse.

Solution: move to beanstalkd. beanstalkd is great because it's fast,
supports multiple queues, supports priorities, and speaks YAML
natively. A huge queue has constant time push and pop operations,
unlike a database-backed queue.

beanstalkd also has experimental persistence - we need persistence.

However, we quickly missed DJ features: seeing failed jobs, seeing
pending jobs (beanstalkd only allows you to 'peek' ahead at the next
pending job), manipulating the queue (e.g. running through and
removing all jobs that were created by a bug or with a bad job name),
etc. A database-queue gives you a lot of cool features. So we moved
back to DJ - the tradeoff was worth it.

Second: if a worker gets stuck, or is processing a job that will take
hours, DJ has facilities to release a lock and retry that job when
another worker is looking for work. But that stuck worker, even
though his work has been released, is still processing a job that you
most likely want to abort or fail.

You want that worker to fail or restart. We added code so that,
instead of simply retrying a job that failed due to timeout, other
workers will a) fail that job permanently then b) restart the locked
worker.

In a sense, all the workers were babysitting each other.

But what happens when all the workers are processing stuck or long
jobs? Your queue quickly backs up.

What you really need is a manager: someone like monit or god who can
watch workers and kill stale ones.

Also, your workers will probably grow in memory a lot during the
course of their life. So you need to either make sure you never create
too many objects or "leak" memory, or you need to kill them when they
get too large (just like you do with your frontend web instances).

At this point we have workers processing jobs with god watching them
and killing any that are a) bloated or b) stale.

But how do we know all this is going on? How do we know what's sitting
on the queue? As I mentioned earlier, we had a web interface which
would show us pending items and try to infer how many workers are
working. But that's not easy - how do you have a worker you just
kill -9'd gracefully manage its own state? We added a process to
inspect workers and add their info to memcached, which our web
frontend would then read from.

But who monitors that process? And do we have one running on each
server? This is quickly becoming very complicated.

Also we have another problem: startup time. There's a multi-second
startup cost when loading a Rails environment, not to mention the
added CPU time. With lots of workers doing lots of jobs being
restarted on a non-trivial basis, that adds up.

It boils down to this: GitHub is a warzone. We are constantly
overloaded and rely very, very heavily on our queue. If it's backed
up, we need to know why. We need to know if we can fix it. We need
workers to not get stuck and we need to know when they are stuck.

We need to see what the queue is doing. We need to see what jobs have
failed. We need stats: how long are workers living, how many jobs are
they processing, how many jobs have been processed total, how many
errors have there been, are errors being repeated, did a deploy
introduce a new one?

We need a background job system as serious as our web framework.
I highly recommend DelayedJob to anyone whose site is not 50%
background work.

But GitHub is 50% background work.

In Search of a Solution

In the Old Architecture, GitHub had one slice dedicated to processing
background jobs. We ran 25 DJ workers on it and all they did was run
jobs. It was known as our "utility" slice.

In the New Architecture, certain jobs needed to be run on certain
machines. With our emphasis on sharding data and high availability, a
single utility slice no longer fit the bill.

Both beanstalkd and bj supported named queues or "tags," but DelayedJob
did not. Basically we needed a way to say "this job has a tag of X"
and then, when starting workers, tell them to only be interested in
jobs with a tag of X.

For example, our "archive" background job creates tarballs and zip
files for download. It needs to be run on the machine which serves
tarballs and zip files. We'd tag the archive job with "file-serve" and
only run it on the file serving slice. We could then re-use this tag
with other jobs that needed to only be run on the file serving slice.
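
The tagging idea can be sketched as follows. This is a toy in-memory
model, not the feature we added to DelayedJob: `TaggedQueues` and its
method names are hypothetical.

```ruby
# Hypothetical model: one FIFO list per tag, and workers that only look
# at the tags they were started with.
class TaggedQueues
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }
  end

  def push(tag, job)
    @queues[tag] << job
  end

  # Return [tag, job] for the first listed tag that has work, else nil.
  # The order of `tags` doubles as a simple priority order.
  def reserve(*tags)
    tags.each do |tag|
      job = @queues[tag].shift
      return [tag, job] if job
    end
    nil
  end
end

queues = TaggedQueues.new
queues.push('file-serve', 'archive repo 1')
queues.push('default', 'warm some caches')

# A worker on the file serving slice only asks for its own tag.
queues.reserve('file-serve')
```

A worker started with several tags drains them in the order given, so
it never picks up a job that belongs on another machine.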

We added this feature to DelayedJob but then realized it was an
opportunity to re-evaluate our background job situation. Did someone
else support this already? Was there a system which met our upcoming
needs (distributed worker management - god/monit for workers on
multiple machines along with visibility into the state)? Should we
continue adding features to DelayedJob? Our fork had deviated from
master and the merge (plus subsequent testing) was not going to be fun.

We made a list of all the things we needed on paper and started
re-evaluating a lot of the existing solutions. Kestrel, AMQP,
beanstalkd (persistence still hadn't been rolled into an official
release a year after being pushed to master).

Here's that list:

  • Persistence
  • See what's pending
  • Modify pending jobs in-place
  • Tags
  • Priorities
  • Fast pushing and popping
  • See what workers are doing
  • See what workers have done
  • See failed jobs
  • Kill fat workers
  • Kill stale workers
  • Kill workers that are running too long
  • Keep Rails loaded / persistent workers
  • Distributed workers (run them on multiple machines)
  • Workers can watch multiple (or all) tags
  • Don't retry failed jobs
  • Don't "release" failed jobs

Redis to the Rescue

Can you name a system with all of these features:

  • Atomic, O(1) list push and pop
  • Ability to paginate over lists without mutating them
  • Queryable keyspace, high visibility
  • Fast
  • Easy to install - no dependencies
  • Reliable Ruby client library
  • Store arbitrary strings
  • Support for integer counters
  • Persistent
  • Master-slave replication
  • Network aware

I can. Redis.

If we let Redis handle the hard queue problems, we can focus on the
hard worker problems: visibility, reliability, and stats.

And that's Resque.

With a web interface for monitoring workers, a parent / child forking
model for responsiveness, swappable failure backends (so we can send
exceptions to, say, Hoptoad), and the power of Redis, we've found
Resque to be a perfect fit for our architecture and needs.
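
The parent / child forking model can be sketched with plain `fork`.
This illustrates the general technique under stated assumptions, not
Resque's actual worker code: `run_in_child` is a hypothetical helper,
and it only runs on platforms that support `fork`.

```ruby
# Run a block in a child process and report its output and exit status.
# The parent stays small and responsive; whatever memory the job
# allocates is released when the child exits.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    begin
      writer.write(yield.to_s)
      writer.close
      exit!(0)   # skip at_exit handlers inherited from the parent
    rescue StandardError
      exit!(1)   # a failed job becomes a non-zero exit status
    end
  end
  writer.close
  output = reader.read          # returns once the child closes its end
  reader.close
  _, status = Process.wait2(pid)
  [output, status.success?]
end

output, ok = run_in_child { 6 * 7 }
```

Since the parent only waits on the child, it is also the natural place
to enforce a timeout and kill a job that gets stuck.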

web ui

We hope you enjoy it. We certainly do!

NY State Senate Code on GitHub

If you’re in New York, or are interested in Open Government initiatives, you may be excited to know that the New York State Senate has opened up to the online community in a big way. They have put up a Free and Open-Source Software & Services website that provides and documents an API for all of their legislative data, feeds and widgets for that data, a browser for that data that even uses Disqus to allow you to comment on legislation, and open source software projects that help consume that data.

The cool thing for us is that they’ve put all their open source projects up on GitHub at github.com/nysenate for you to use and improve.

As a user of Open-Source software the New York Senate wants to help give back to the community that has given it so much – including this website. To meet its needs the Senate is constantly developing new code and fixing existing bugs. Not only does the Senate recognize that it has a responsibility to give back to the Open Source community, but public developments made with public money should be public.

We are very happy that we can help them share these projects, and I hope more local and federal government efforts will open up to this degree. Congratulations to the New York Senate for moving forward with openness and accountability.