GitHub data, ready for you to explore with BigQuery

GitHub data is available for public analysis using Google BigQuery, and we’d like to help you take it for a spin.

If you'd like to learn more about what data is available and how it's been used so far, watch this conversation between GitHub Data Analyst Alyson La and Google Developer Advocate Felipe Hoffa. You'll learn the story behind the datasets and what types of analysis they make possible. You'll also see how we've visualized data with Tableau and Looker.

There's a lot of data out there, and it's all available through BigQuery in two large datasets. The original, community-led GitHub Archive project launched in 2012 and captures almost 30 million events monthly, including issues, commits, and pushes. Last year, we worked with Google to release The GitHub Public Data Set, a separate set of tables covering all projects that have open source licenses, including their commits, file contents, and file paths.

You can also use the GHTorrent project to complement the existing datasets with additional metadata.

We queried the datasets above to create the open source section of our Octoverse report, but anyone can run an analysis. Here are some results from queries people have run so far:

  • "This should never have happened" has appeared in code comments more than a million times (hear this data point for yourself in this Changelog episode)
  • Where does open source happen? GitHub top countries shares which countries have the most open source developers per capita
  • How reliable is GitHub? Felipe runs a query to find out in GitHub reliability with BigQuery
  • There are a lot of feels in open source. Geeksta examines how emotions are expressed in GitHub commit messages
  • Jessie Frazelle looked at the top 15 projects on GitHub in terms of pull requests opened vs. pull requests closed

Happy exploring!

Flatiron School joins the GitHub Student Developer Pack

Flatiron School has joined the Student Developer Pack to offer students one free month of their Community-Powered Bootcamp, a flexible online course in web development.


The Community-Powered Bootcamp is a self-paced subscription program for beginners. You'll learn online using the same course of study as the Web Developer Program—a comprehensive curriculum tailored to job seekers. In a month, you can pick up a few in-demand skills and work with a community of other learners to start reaching your goals, whether they are technical literacy, a new programming language, or a new career.

The details

  • Get the first month of tuition free
  • Start 800+ hours of rigorous web development coursework
  • Take on topics like HTML, CSS, JavaScript, Node.js, React, and Ruby on Rails
  • Learn online and at your own pace with a curated community of students
  • Build a portfolio
  • Get help when you need it, 24/7

After one month, you can sign up for a monthly subscription of $149 USD.

The Student Developer Pack gives students free access to the best developer tools from different technology companies like Datadog, GitKraken, Travis CI, and Unreal Engine. Sign up for the pack, and start learning.

New theme chooser for GitHub Pages

You can now build a GitHub Pages website with a Jekyll theme in just a few clicks.

  1. Create a new GitHub repository or go to an existing one.
  2. Open the theme chooser in the GitHub Pages section of your repository settings.
  3. Select a theme.


Using a Jekyll theme means that your website content lives in Markdown files, which you can edit as needed and manage using your favorite Git workflow.

As soon as you apply a Jekyll theme to your site, you can add more pages simply by committing new Markdown files.

The theme chooser replaces the old automatic page generator, which didn't use Jekyll. Rest assured, existing GitHub Pages sites created with the automatic page generator will automatically switch to a matching Jekyll theme the first time you use the theme chooser.

Finally, the Jekyll themes in the theme chooser are all open sourced on GitHub.

For additional information, check out the documentation.

Game Off IV Highlights

Last month, we challenged you to create a game based on the theme hacking, modding, and/or augmenting. To everyone who submitted an entry, thank you! A little holiday gift will be making its way to your inbox shortly.

Here's a selection of some of our favorites that you can play, hack on, or learn from. Enjoy!

Sir Jumpelot

» Play in browser · View source

The main character (who has a strong resemblance to Hubot) must survive waves of enemies with some calculated jumping. Be careful: new mods change the game mechanics as you play. Created by @MelvinPoppelaars with Unity and hosted on itch.io.

Lighting

» Play in browser · View source

This illuminating game is inspired by a couple of real-life hacks on MIT's Green Building and the Cira Centre in Philadelphia. Created by @hiddenwaffle using three.js, tween.js, and howler.js. Hosted on GitHub Pages.

Muntz

» Play in browser · View source

A puzzle game inspired by the work of TV pioneer Earl "Madman" Muntz, a TV hacker from the 1950s. Attempt to simplify increasingly complex circuits, and learn a little about American TV and broadcast history in the process. Created by @joemagiv with Unity.

Boxie Coody

» Play in browser · View source

A Sokoban-like puzzle game, where a friendly bot called Coody teaches you how to complete your work, and it gets more challenging every day. Created by @LastFlower, @LastLeaf, and @tongtongggggg. Graphics created with Inkscape, sound effects created with LMMS and GeneralUser GS. Hosted on GitHub Pages.

Pongout

» Play in browser (mobile friendly) · View source

What happens when you remix Pong with Breakout? Pongout! Created by @kurehajime and written in JavaScript. Hosted on GitHub Pages.

Mine Hacker

» Download · View source

A dungeon-crawling, roguelike Minesweeper game in which you navigate a map of hidden tiles by using the small amount of information provided by the robot's sensors. Created by @BelowParallelStudios using Unity. Music was created with Bosca Ceoil. Graphics created with Aseprite and GIMP.

Byter

» Play in browser · View source · Download

Byter is an open source clicker game, where you have to hack various targets. Created by @KyleBanks with Unity.

Hacking in Progress

» Download (Win) · View source

Run around the computer store in this 80s-style stealth hacking game, but be sure to avoid the clerks and security cameras. Created by @flipcoder using their very own open source 2D/3D OpenGL game engine called Qor.

Airplane

» Play in browser · View source

Enjoy this one with a friend or coworker as you try to destroy one another's airplanes. Created by @IonicaBizau. Written in JavaScript and hosted on GitHub Pages.

The Terminal

» Download (Windows) · Build from source (All platforms)

A cooperative hacking game in the style of Keep Talking and Nobody Explodes. One player must hack into a computer terminal, while the other player follows instructions from the manual. Created by @Juzley and @paulo6. Written in Python using Pygame.

Hackshot

» Play in browser · View source

A JavaScript coding game where you programmatically control a cannon to take down hordes of incoming enemies. Created by @buch415 using CodeMirror for the code editor and syntax highlighting. Hosted on GitHub Pages.

Code Explorer

» Play in browser · View source

You control a voxel programmer as he (literally) jumps through the program's code. What will happen when you reach the end? Play to quine out. Created by @michalbe using JavaScript and hosted on GitHub Pages.

Pwnterman

» Play in browser · View source

The README.md file suggests it's the worst game ever made. We beg to differ. It was made in two hours and is controlled with the Vim key bindings H, J, K, and L. What's not to like? Created by @joshbressers.

Unnamed Hacking Themed Game

» Download (Win) · View source

Use various attacks to alter enemy defenses and objects to your advantage in this 2D action platforming game. Created by @DamienPWright using Unity.

Tech Wars

» Play in browser · View source · Download (Win, Mac, Linux)

The team at Vital built a Unity game that introduces a digitally enhanced virus to a tech startup. Help your infected coworkers using your Nerf darts dipped in the antidote.

Mona's Escape

» Play in browser · View source

Here's a small demo where you play as Mona trying to escape a containment cell in Area 51. Watch out for the guards, though. Cat and octopus hybrids are not well equipped for combat. Created by @joshuashoemaker using Pixi.js.

Knot Fun

» View source · Store: Android + iOS

Don't let the name mislead you. This game about untangling knots is lotsa fun.* Created by @AStox using C#, Unity, and GIMP.

* If you love untangling your earphone cords, Christmas tree lights, kids' shoelaces, etc.

TPS Report Simulator

» Play in browser · View source

Don't let your coworkers stop you from hacking all the terminals on your way out the door. Use the red staplers and ISP CDs to your advantage. Created by @bjshively and @spaghettioh using the JavaScript game engine, Phaser. Hosted on Heroku.

Zombie Chess

» Play in browser · View source

Using open source libraries and frameworks, Gravitywell and friends created a chess game that sees iconic rock musicians fight against contemporary EDM DJs. Built with Meteor.js, Stockfish.js (AI), chessboard.js (UI), and Howler.js (SFX).

Immolation Organization

» Download from the Apple App Store · Build from source

Take control of a mob that attempts to control a group of bad guys. Created by @EtherTyper, @AnimatorJoe, and @SamHollenbeck from Westlake High School's Accessible Programming Club using Swift and SpriteKit.

Hackable Mi

» Play in browser · View source

Guide Mi through an array of puzzles and mazes with code. Created by @Vandise and hosted on Heroku.

Octopout

» Download (Win) · View source

In this Breakout clone, octopuses have been transformed into metal creatures. Created by @panoramix360, @guilhermekf, @fpjuni, and @stopellileo using GameMaker.

Git 2.11 has been released

The open source Git project has just released Git 2.11.0, with features and bugfixes from over 70 contributors. Here's our look at some of the most interesting new features:

Abbreviated SHA-1 names

Git 2.11 prints longer abbreviated SHA-1 names and has better tools for dealing with ambiguous short SHA-1s.

You've probably noticed that Git object identifiers are really long strings of hex digits, like 66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f. They're generated from the output of the SHA-1 hash function, which is always 160 bits, or 40 hexadecimal characters. Since the chance of any two SHA-1 names colliding is roughly the same as getting struck by lightning every year for the next eight years[1], it's generally not something to worry about.

You've probably also noticed that 40-digit names are inconvenient to look at, type, or even cut-and-paste. To make this easier, Git often abbreviates identifiers when it prints them (like 66c22ba), and you can feed the abbreviated names back to other git commands. Unfortunately, collisions in shorter names are much more likely. For a seven-character name, we'd expect to see collisions in a repository with only tens of thousands of objects[2].

To deal with this, Git checks for collisions when abbreviating object names. It starts at a relatively low number of digits (seven by default), and keeps adding digits until the result names a unique object in the repository. Likewise, when you provide an abbreviated SHA-1, Git will confirm that it unambiguously identifies a single object.
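The abbreviate-then-extend loop and the matching lookup check can be sketched in a few lines of Python. This is a toy model over an in-memory list of names, not Git's actual implementation, which works against its object database; the function names are hypothetical.

```python
def shortest_unique_abbrev(oid: str, all_oids: list, min_len: int = 7) -> str:
    """Shortest prefix of `oid`, at least `min_len` hex digits long,
    that no other object name in `all_oids` starts with."""
    others = [o for o in all_oids if o != oid]
    for length in range(min_len, len(oid) + 1):
        prefix = oid[:length]
        if not any(o.startswith(prefix) for o in others):
            return prefix
    return oid  # unreachable when all names are distinct and the same length


def resolve_abbrev(prefix: str, all_oids: list) -> str:
    """Expand an abbreviated name, refusing if it matches zero or several objects."""
    matches = [o for o in all_oids if o.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"short SHA-1 {prefix} matches {len(matches)} objects")
    return matches[0]
```

If two names share the first seven digits (say 66c22ba6... and 66c22ba9...), the first one is printed as 66c22ba6, and feeding back just 66c22ba raises an error instead of silently picking one.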

So far, so good. Git has done this for ages. What's the problem?

The issue is that repositories tend to grow over time, acquiring more and more objects. A name that's unique one day may not be the next. If you write an abbreviated SHA-1 in a bug report or commit message, it may become ambiguous as your project grows. This is exactly what happened in the Linux kernel repository: it now has over 5 million objects, meaning we'd expect collisions with names shorter than 12 hexadecimal characters. Some old abbreviated references recorded in its history are now ambiguous and can't be inspected with commands like git show.

To address this, Git 2.11 ships with several improvements.

First, the minimum abbreviation length now scales with the number of objects in the repository. This isn't foolproof, as repositories do grow over time, but growing projects will quickly scale up to larger, future-proof lengths. If you use Git with even moderate-sized projects, you'll see commands like git log --oneline produce longer SHA-1 identifiers. [source]
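One way to scale the length is to apply the birthday bound from footnote [2] in reverse. The sketch below is modeled on that reasoning rather than copied from Git, so treat its exact rounding as an assumption; `auto_abbrev_len` is a hypothetical name.

```python
MIN_ABBREV = 7  # the long-standing default abbreviation length


def auto_abbrev_len(object_count: int) -> int:
    """Pick an abbreviation length (in hex digits) from the object count.

    With fewer than 2**bits objects, names need about 2 * bits bits to keep
    collisions unlikely (birthday bound), and each hex digit holds 4 bits,
    so we need about bits / 2 hex digits, rounded up.
    """
    bits = object_count.bit_length()  # object_count < 2**bits
    hex_digits = (bits + 1) // 2      # ceil(bits / 2)
    return max(MIN_ABBREV, hex_digits)
```

For a 5-million-object repository this gives 12, matching the Linux kernel figure above, while small repositories stay at 7.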

That still leaves the question of what to do when you somehow do get an ambiguous short SHA-1. Git 2.11 has two features to help with that. One is that instead of simply complaining of the ambiguity, Git will print the list of candidates, along with some details of the objects. That usually gives enough information to decide which object you're interested in. [source]


Of course, it's even more convenient if Git simply picks the object you wanted in the first place. A while ago, Git learned to use context to figure out which object you meant. For example, git log expects to see a commit (or a tag that points to a commit). But other commands, like git show, operate on any type of object; they have no context to guess which object you meant. You can now set the core.disambiguate config option to prefer a specific type. [source]
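For example, running git config core.disambiguate commit tells Git to prefer commits when a short name is ambiguous. The filtering idea can be sketched like this; it's a simplification that matches object types exactly (Git's own preferences can also accept, for example, tags that point at commits), and `disambiguate` is a hypothetical helper.

```python
from typing import Dict, Optional


def disambiguate(prefix: str, objects: Dict[str, str], prefer: Optional[str] = None) -> str:
    """Resolve `prefix` against a {object_name: object_type} mapping.

    When several objects match and `prefer` is set, keep only candidates of
    that type, loosely mirroring a core.disambiguate-style preference.
    """
    candidates = [name for name in objects if name.startswith(prefix)]
    if len(candidates) > 1 and prefer is not None:
        candidates = [name for name in candidates if objects[name] == prefer]
    if len(candidates) != 1:
        raise LookupError(f"short object ID {prefix} is ambiguous or unknown")
    return candidates[0]
```

With a commit and a blob both starting with abc, asking for abc with prefer="commit" returns the commit; without a preference, the lookup fails just as an ambiguous git show would.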


Performance Optimizations

One of Git's goals has always been speed. While some of that comes from the overall design, there are a lot of opportunities to optimize the code itself. Almost every Git version ships with more optimizations, and 2.11 is no exception. Let's take a closer look at a few of the larger examples.

Delta Chains

Git 2.11 is faster at accessing delta chains in its object database, which should improve the performance of many common operations. To understand what's going on, we first have to know what the heck a delta chain is.

You may know that Git avoids storing files multiple times, because all data is stored in objects named after the SHA-1 of the contents. But in a version control system, we often see data that is almost identical (i.e., your files change just a little bit from version to version). Git stores these related objects as "deltas": one object is chosen as a base that is stored in full, and other objects are stored as a sequence of change instructions from that base, like "remove bytes 50-100" and "add in these new bytes at offset 50". The resulting deltas are a fraction of the size of the full object, and Git's storage ends up proportional to the size of the changes, not the size of all versions.

As files change over time, the most efficient base is often an adjacent version. If that base is itself a delta, then we may form a chain of deltas: version two is stored as a delta against version one, and then version three is stored as a delta against version two, and so on. But these chains can make it expensive to reconstruct the objects when we need them. Accessing version three in our example requires first reconstructing version two. As the chains get deeper and deeper, the cost of reconstructing intermediate versions gets larger.
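Git's pack format expresses a delta as a series of "copy this byte range from the base" and "insert these literal bytes" instructions. The sketch below uses that shape with Python tuples instead of Git's binary encoding, and a hypothetical dict-based object store, to show why reading the tip of a chain means replaying every delta beneath it.

```python
def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild a target from its base using copy/insert instructions.

    ("copy", offset, size) copies a byte range out of the base;
    ("insert", data) appends literal bytes that the base doesn't contain.
    """
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown delta instruction {op[0]!r}")
    return bytes(out)


def reconstruct(name: str, store: dict) -> tuple:
    """Return (contents, depth) for an object in a toy object store.

    `store` maps a name to ("full", data) or ("delta", base_name, delta).
    Every delta in the chain must be applied, so the cost grows with depth.
    """
    chain = []
    entry = store[name]
    while entry[0] == "delta":
        chain.append(entry[2])       # remember this object's delta...
        entry = store[entry[1]]      # ...and walk down to its base
    data = entry[1]                  # a full (non-delta) base ends the chain
    for delta in reversed(chain):    # replay deltas from the base upward
        data = apply_delta(data, delta)
    return data, len(chain)
```

If version three is a delta against version two, which is a delta against a full version one, reconstruct reports a depth of 2: both deltas had to be applied to produce it.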

For this reason, Git typically limits the depth of a given chain to 50 objects. However, when repacking using git gc --aggressive, the default is bumped to 250, with the assumption that it would make a significantly smaller pack. But that number was chosen somewhat arbitrarily, and it turns out that the ideal balance between size and CPU actually is around 50. So that's the default in Git 2.11, even for aggressive repacks. [source]

Even 50 deltas is a lot to go through to construct one object. To reduce the impact, Git keeps a cache of recently reconstructed objects. This works out well because deltas and their bases tend to be close together in history, so commands like git log which traverse history tend to need those intermediate bases again soon. That cache has an adjustable size, and has been bumped over the years as machines have gotten more RAM. But because the cache used a fairly simple data structure, Git kept far fewer objects than it could and frequently evicted entries at the wrong time.

In Git 2.11, the delta base cache has received a complete overhaul. Not only should it perform better out of the box (around 10% better on a large repository), but the improvements will scale up if you adjust the core.deltaBaseCacheLimit config option beyond its default of 96 megabytes. In one extreme case, setting it to 1 gigabyte improved the speed of a particular operation on the Linux kernel repository by 32%. [source, source]
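The post doesn't spell out the new data structure. The sketch below assumes a hash map with least-recently-used eviction bounded by total bytes, one natural way to honor a byte budget like core.deltaBaseCacheLimit; the class name and its methods are hypothetical.

```python
from collections import OrderedDict


class DeltaBaseCache:
    """Byte-bounded LRU cache of reconstructed objects (a toy model)."""

    def __init__(self, limit_bytes: int = 96 * 1024 * 1024):
        self.limit = limit_bytes  # mirrors the 96 MB default mentioned above
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str):
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        # Evict least recently used entries until we're back under budget.
        while self.used > self.limit and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
```

Because eviction is driven by recency and total size rather than by slot collisions, raising the limit lets the cache keep proportionally more bases, which is the scaling behavior the post describes.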

Object Lookups

The delta base improvements help with accessing individual objects. But before we can access them, we have to find them. Recent versions of Git have optimized object lookups when there are multiple packfiles.

When you have a large number of objects, Git packs them together into "packfiles": single files that contain many objects along with an index for optimized lookups. A repository also accumulates packfiles as part of fetching or pushing, since Git uses them to transfer objects over the network. The number of packfiles may grow with day-to-day usage, until the next repack combines them into a single pack. Even though looking up an object in each packfile is efficient, if there are many packfiles, Git has to do a linear search, checking each packfile in turn for the object.

Historically, Git has tried to reduce the cost of the linear search by caching the last pack in which an object was found and starting the next search there. This helps because most operations look up objects in order of their appearance in history, and packfiles tend to store segments of history. Looking in the same place as our last successful lookup often finds the object on the first try, and we don't have to check the other packs at all.

In Git 2.10, this "last pack" cache was replaced with a data structure to store the packs in most recently used (MRU) order. This speeds up object access, though it's only really noticeable when the number of packs gets out of hand.

In Git 2.11, this MRU strategy has been adapted to the repacking process itself, which previously did not even have a single "last found" cache. The speedups are consequently more dramatic here; repacking the Linux kernel from a 1000-pack state is over 70% faster. [source, source]
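The MRU idea can be sketched as a linear search that moves each hit to the front of the list. Each pack is modeled as a Python set of object names standing in for a packfile and its index; the class is hypothetical and counts probes only to make the benefit visible.

```python
class PackList:
    """Packs searched in most-recently-used order (a toy model)."""

    def __init__(self, packs):
        self.packs = list(packs)
        self.probes = 0  # how many packs we've checked so far

    def find(self, oid):
        for i, pack in enumerate(self.packs):
            self.probes += 1
            if oid in pack:
                if i:  # move the hit to the front so the next search starts there
                    self.packs.insert(0, self.packs.pop(i))
                return pack
        return None
```

Once an object is found in the second pack, that pack moves to the front, so a follow-up lookup for a neighboring object in the same pack costs one probe instead of two.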

Patch IDs

Git 2.11 speeds up the computation of "patch IDs", which are used heavily by git rebase.

Patch IDs are a fingerprint of the changes made by a single commit. You can compare patch IDs to find "duplicate" commits: two changes at different points in history that make the exact same change. The rebase command uses patch IDs to find commits that have already been merged upstream.

Patch ID computation now avoids both merge commits and renames, improving the runtime of the duplicate check by a factor of 50 in some cases. [source, source]
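The git patch-id documentation describes a patch ID as a SHA-1 of a patch's diff with whitespace and line numbers ignored. The sketch below is a hypothetical simplification of that idea: it drops hunk headers (which carry line numbers) and index lines (which carry blob names), strips all whitespace, and hashes the rest.

```python
import hashlib


def patch_id(diff_text: str) -> str:
    """Fingerprint a diff so the same change at different positions matches."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue  # hunk headers carry line numbers, which we ignore
        if line.startswith("index "):
            continue  # blob names differ between otherwise identical changes
        h.update("".join(line.split()).encode())  # ignore all whitespace
    return h.hexdigest()
```

A change cherry-picked onto a branch where the code sits 30 lines lower produces the same fingerprint, which is how a rebase can recognize that the commit is already upstream.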

Advanced filter processes

Git includes a "filter" mechanism which can be used to convert file contents to and from a local filesystem representation. This is what powers Git's line-ending conversion, but it can also execute arbitrary external programs. The Git LFS system hooks into Git by registering its own filter program.

The protocol that Git uses to communicate with the filter programs is very simple. It executes a separate filter for each file, writes the filter input, and reads back the filter output. If you have a large number of files to filter, the overhead of process startup can be significant, and it's hard for filters to share any resources (such as HTTP connections) among themselves.

Git 2.11 adds a second, slightly more complex protocol that can filter many files with a single process. This can reportedly improve checkout times with many Git LFS objects by as much as a factor of 80.


The original protocol is still available for backwards compatibility, and the new protocol is designed to be extensible. Already there has been discussion of allowing it to operate asynchronously, so the filter can return results as they arrive. [source]
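The post doesn't describe the new protocol's wire format. As we understand it, the long-running filter protocol exchanges messages using Git's pkt-line framing: each packet starts with four hex digits giving its total length (including those four digits), and "0000" is a flush packet marking the end of a message. A minimal sketch of that framing, with hypothetical function names:

```python
FLUSH = b"0000"  # a flush packet: "end of this message"


def pkt_line(payload: bytes) -> bytes:
    """Frame one packet: four hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_lines(stream: bytes) -> list:
    """Split a byte string into packet payloads; None marks a flush packet."""
    packets, pos = [], 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            packets.append(None)
            pos += 4
        elif length < 4:
            raise ValueError(f"unsupported special packet length {length}")
        else:
            packets.append(stream[pos + 4:pos + length])
            pos += length
    return packets
```

For instance, pkt_line(b"version=2\n") produces b"000eversion=2\n". The strings git-filter-client and version=2 used in such examples follow the handshake as the gitattributes documentation describes it; treat them as illustrative and check that documentation before building a filter. Because one process handles many files, framing like this is what lets Git stream file after file through a single filter.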

Sundries

  • In our post about Git 2.9, we mentioned some improvements to the diff algorithm to make the results easier to read (the --compaction-heuristic option). That algorithm did not become the default because there were some corner cases that it did not handle well. But after some very thorough analysis, Git 2.11 has an improved algorithm that behaves similarly but covers more cases and does not have any regressions. The new option goes under the name --indent-heuristic (and diff.indentHeuristic), and will likely become the default in a future version of Git. [source]
  • Ever wanted to see just the commits brought into a branch by a merge commit? Git now understands negative parent-number selectors, which exclude the given parent (rather than selecting it). It may take a minute to wrap your head around that, but it means that git log 1234abcd^-1 will show all of the commits that were merged in by 1234abcd, but none of the commits that were already on the branch. You can also use ^- (omitting the 1) as a shorthand for ^-1. [source]
  • There's now a credential helper in contrib/ that can use GNOME libsecret to store your Git passwords. [source]
  • The git diff command now understands --submodule=diff (as well as setting the diff.submodule config to diff), which will show changes to submodules as an actual patch between the two submodule states. [source]
  • git status has a new machine-readable output format that is easier to parse and contains more information. Check it out if you're interested in scripting around Git. [source]
  • Work has continued on converting some of Git's shell scripts to C programs. This can drastically improve performance on platforms where extra processes are expensive (like Windows), especially in programs that may invoke sub-programs in a loop. [source, source]
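The negative parent selector from the list above can be read as set arithmetic: per the gitrevisions documentation, rev^-n is equivalent to rev^n..rev, that is, everything reachable from the merge minus everything reachable from its nth parent. A small sketch over a hypothetical in-memory commit graph:

```python
def reachable(start, parents):
    """All commits reachable from `start` by following parent links."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merged_in(merge, parents, n=1):
    """Commits selected by `merge^-n`: reachable from the merge but not
    from its nth parent (the merge commit itself is included)."""
    return reachable(merge, parents) - reachable(parents[merge][n - 1], parents)
```

If main has commits A and B, a side branch adds X and Y on top of A, and M merges them with parents B and Y, then merged_in("M", parents) returns M, X, and Y: exactly the side-branch work, with nothing that was already on main.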

The whole shebang

That's just a sampling of the changes in Git 2.11, which contains over 650 commits. Check out the full release notes for the complete list.


[1] It's true. According to the National Weather Service, the odds of being struck by lightning are 1 in a million. That's about 1 in 2^20, so the odds of it happening in 8 consecutive years (starting with this year) are 1 in 2^160.

[2] It turns out to be rather complicated to compute the probability of seeing a collision, but there are approximations. With 5 million objects, there's about a 1 in 10^35 chance of a full SHA-1 collision, but the chance of a collision in 7 characters approaches 100%. The more commonly used metric is the "number of items needed to reach a 50% chance of collision", which is roughly the square root of the total number of possible items. If you're working with exponents, that's easy; you just halve the exponent. Each hex character represents 4 bits, so a 7-character name has 2^28 possibilities. That means we expect a collision around 2^14, or 16384 objects.
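These figures can be checked with the standard birthday approximation, P(collision) ≈ 1 - exp(-n(n-1) / 2N), where n is the number of objects and N the number of possible names. The function name below is our own; math.expm1 keeps the tiny full-length probability from rounding to zero.

```python
import math


def collision_probability(n_objects: int, hex_digits: int) -> float:
    """Birthday-bound estimate of any two names sharing `hex_digits` digits."""
    space = 2 ** (4 * hex_digits)              # 4 bits per hex digit
    pairs = n_objects * (n_objects - 1) / 2    # number of distinct pairs
    return -math.expm1(-pairs / space)         # 1 - exp(-pairs / space)
```

For 5 million objects this gives roughly 8.6e-36 for full 40-digit names (about 1 in 10^35) and effectively 1.0 for 7-digit names. At 2^14 = 16384 objects with 7-digit names it gives about 0.39, a reminder that the square-root rule marks where collisions become likely rather than an exact 50% point.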

New in the shop: The Octoplush

It's time to cozy up with the all-new Octoplush collectible, available now in the GitHub Shop.

Share the Octoplush with friends and family. Just don't feed these octocats. They're already stuffed.

Now through Tuesday, enjoy 30% off everything in the GitHub Shop with discount code OCTOCYBER2016 and free shipping for orders over $30.

New in the Shop: GitHub Activity Book

Go ahead, color outside the lines with the GitHub Activity Book starring our very own Mona the Octocat! Now available in the GitHub Shop.


GitKraken joins the Student Developer Pack

GitKraken is now part of the Student Developer Pack. Students can manage Git projects in a faster, more user-friendly way with GitKraken's Git GUI for Windows, Mac, and Linux.


GitKraken is a cross-platform GUI for Git that makes Git commands more intuitive. The interface equips you with a visual understanding of branching, merging and your commit history. GitKraken works directly with your repositories with no dependencies—you don’t even need to install Git on your system. You’ll also get a built-in merge tool with syntax highlighting as well as one-click undo and redo for when you make mistakes. Other features of GitKraken are:

  • Drag and drop to merge, rebase, reset, push
  • Resizable, easy-to-understand commit graph
  • File history and blame
  • View image diffs in app
  • Fuzzy finder and command palette
  • Submodules and Gitflow support
  • Easily clone, add remotes, and open pull requests in app
  • Keyboard shortcuts
  • Dark and light color themes
  • GitHub integration

Members of the pack get GitKraken Pro free for one year. With GitKraken Pro, Student Developer Pack members will get all the features of GitKraken plus:

  • The ability to resolve merge conflicts in the app
  • Multiple profiles for work and personal use
  • Support for GitHub Enterprise

Students can get free access to professional developer tools from companies like Datadog, Travis CI, and Unreal Engine. The Student Developer Pack lets you learn, experiment, and build software with the tools developers use at work every day without worrying about cost.

Students, get a Git GUI now with your pack.

Operation Code: connecting tech and veterans

Today is Veterans Day here in the United States, or Remembrance Day in many places around the world, when we recognize those who have served in the military. Today many businesses will offer veterans a cup of coffee or a meal, but one organization goes further.

You might have seen ex-Army Captain David Molina speak at CodeConf LA or GitHub Universe about Operation Code, a nonprofit he founded in 2014 after he couldn't use his G.I. Bill benefits to pay for code school. Operation Code lowers the barrier to entry into software development and helps U.S. military personnel improve their economic outcomes as they transition to civilian life. It leverages open source communities to provide accessible online mentorship, education, and networking opportunities.

The organization is also deeply invested in policy changes that would allow veterans to use their G.I. Bill benefits at coding schools and boot camps, speeding up their re-entry into the workforce. Next week Captain Molina will testify before Congress about the need for these updates. The video below explains more about their work.

Operation Code - On a mission to expand the GI Bill

Although Operation Code currently focuses on the United States, they hope to develop a model that can be replicated throughout the world.

Why Operation Code matters

Operation Code is working to address a problem that transcends politics. Here's a look into the reality U.S. veterans face:

  • The unemployment rate for veterans over the age of 18 as of August 2016 is 3.9% for men and 7.0% for women.
  • As of 2014, fewer than seven percent of enlisted personnel have a Bachelor’s degree or higher
  • More than 200,000 active service members leave the military every year and need employment
  • U.S. studies show that members of underrepresented communities are more frequently joining the military to access better economic and educational opportunities

How you can help

Game Off Theme Announcement

GitHub Game Off 2016 Theme is Hacking, Modding, or Augmenting

A few weeks ago, we announced the GitHub Game Off, our very own month-long game jam. Today, we're announcing the theme and officially kicking it off. Ready player one!

The Challenge

You have the entire month of November to create a game loosely based on the theme hacking, modding and/or augmenting.

What do we mean by loosely based on hacking, modding and/or augmenting? Here are some examples:

  • an endless runner where you hack down binary trees in your path with a pixelated axe
  • a modern take on a classic, e.g., a roguelike set in a 3D or VR world
  • an augmented reality game bringing octopus/cat hybrids into the real world

Unleash your creativity. You can work alone or with a team and build for any platform or device. The use of open source game engines and libraries is encouraged but not required.

We'll highlight some of our favorite games on the GitHub blog, and the world will get to enjoy (and maybe even contribute to or learn from) your creations.

How to participate

  • Sign up for a free personal account if you don't already have one
  • Fork the github/game-off-2016 repository to your personal account (or to a free organization account)
  • Clone the repository on your computer and build your game
  • Push your game source code to your forked repository before December 1st
  • Update the README.md file to include a description of your game, how to play or download it, how to build and compile it, what dependencies it has, etc.
  • Submit your final game using this form

It's dangerous to go alone

If you're new to Git, GitHub, or version control:

  • Git Documentation: everything you need to know about version control and how to get started with Git
  • GitHub Help: everything you need to know about GitHub
  • Questions about GitHub? Please contact our Support team and they'll be delighted to help you
  • Questions specific to the GitHub Game Off? Please create an issue. This will be the official FAQ

The official Twitter hashtag for the Game Off is #ggo16. We look forward to playing your games.

GLHF! <3

Meet Nahi: Developer and Ruby Contributor

To highlight the people behind projects we admire, we bring you the GitHub Developer Profile blog series.

Hiroshi “Nahi” Nakamura

Hiroshi “Nahi” Nakamura, currently a Site Reliability Engineer (SRE) and Software Engineer at Treasure Data, is a familiar face in Ruby circles. Over the last 25 years, he has not only grown his own career but also supported developers all over the world as a Ruby code contributor. We spoke to Nahi about his work with Ruby and open source, as well as his inspiration for getting started as a developer.

You’ll notice this interview is shared in both Japanese (which the interview was conducted in) and English—despite our linguistic differences, open source connects people from all corners of the globe.

Aki: Give me the brief overview—who is Nahi and what does he do?

簡単に自己紹介をお願いします。中村浩士さんというのは、どんな方で、何をなさっている方でしょうか?

Nahi: I have been an open source software (OSS) developer since I encountered Ruby in 1999, as well as a committer to CRuby and JRuby. Right now, I am an SRE and software engineer at Treasure Data.

1999年にRubyと出会って以来のOSS開発者で、CRuby、JRubyのコミッタです。
現在勤めているTreasure Data Inc.という会社では、SRE兼ソフトウェアエンジニアをやっています。

Aki: How long have you been developing software?

今までどのくらいの期間に渡ってソフトウエアの開発を行ってこられたのでしょうか?

Nahi: I wrote my first BASIC program when I was about twelve. During college, I started working part-time at a Japanese system development company, and for the past 25 years, I’ve worked in software development at various companies and on various projects.

初めてBasicでプログラムを書き始めたのは12才の頃でした。大学在学中に日本のシステム開発会社でアルバイトを始め、以後様々な会社、プロジェクトで25年ほど、ソフトウェア開発に携わっています。

Aki: Who did you look up to in your early days?

ソフトウエア開発を始められた当初、どなたを尊敬されていたか教えて頂けますか?

Nahi: The research lab that I was part of in college had wonderful mentors. In addition, Perl and Common Lisp (of course!) had open source code and taught me that I could freely enhance those programming languages by myself.

The first extension I made was to Perl (version 4.018); I believe it sped up a particular piece of string processing. Every program that ran on Perl benefited from the change, and though it was small, it gave me an incredible sense of accomplishment.

Since then, I have had great respect for the creator of the Perl programming language, Larry Wall, whose work has provided me with opportunities like this.

大学で在籍していた研究室には素晴らしい先輩がたくさんいて、PerlやCommon Lispなどのプログラミング言語にも(もちろん!)ソースコードがあり、自分で自由に拡張できることを教えてくれました。

はじめて拡張したのはPerl(version 4.018)で、ある文字列処理の高速化だったと思います。Perlで動く各種プログラムすべてがよい影響を受け、小さいながらも、素晴らしい達成感を得られました。

その頃から、このような機会を与えてくれた、Perl作者のLarry Wallさんを尊敬しています。

Aki: Tell us about your journey into the world of software development (first computer, first project you contributed to, first program you wrote?)

ソフトウエア開発の世界に入って行かれた頃のお話をお聞かせ頂けますか?(最初に使ったコンピューター、最初に参画されたプロジェクト、最初に書いたプログラム等)

Nahi: I discovered Ruby shortly after I started to work as a software engineer. Until then, I had written in languages like C, C++, and SQL for software for work, and in Perl for my own development support tools.

Without really understanding object-oriented programming, I ported those tools to Ruby as a way to learn, and began distributing them the way the Ruby community did. Back then the Ruby community was small, and even a newcomer like me had many opportunities to interact with the brilliant developers in it. Through Ruby, I learned many things about software development.

The first open source (we called it ‘free software’ back then) Ruby program I distributed was a logger library. To this day, whenever I type require 'logger' in Ruby, it reminds me of the embarrassing code I wrote long ago. The logger library bundled with Ruby today shows no trace of that original code. It has matured magnificently, shaped by use on many different platforms and for many different purposes.

ソフトウェアエンジニアとして働き始めてしばらくして、Rubyに出会いました。それまでは、C、C++、SQLなどで仕事用のソフトウェアを書き、Perlで自分向けの開発支援ツールを書いていました。

オブジェクト指向のなんたるかもよくわからず、勉強がてらそれらツールを移植していき、またRubyコミュニティの流儀にしたがって配布し始めました。その頃はRubyコミュニティも小さく、私のような新参者でも、Rubyコミュニティにいた素晴らしい開発者たちに触れ合える機会が多くあり、Rubyを通じ、ソフトウェア開発のいろいろなことを学びました。

最初にOSS(その頃はfree softwareと呼んでいました)として配布したRubyのプログラムは、ログ取得ライブラリです。今でもRubyでrequire 'logger'すると、いつでも昔の恥ずかしいコードを思い出すことができます。今Rubyと共に配布されているものは、いろいろなプラットフォーム、いろいろな用途の元で叩かれて、立派に成長しており、その頃の面影はもうありません。

Aki: What resources did you have available when you first got into software development?

ソフトウエア開発を始められた当初、お使いになっていたリソースがどのようなものだったか教えて頂けますか?

Nahi: I wrote SQL, Common Lisp, C, and everything else in vi and Emacs. Perl ran anywhere and was easy to modify, so I treasured it as a tool in my software developer’s toolbelt.

SQL、Common Lisp、C、なんでもviとemacsで書いていました。ソフトウェア開発者のツールベルトに入れる道具として、どこでも動き、変更がし易いPerlは大変重宝しました。

Aki: What advice would you give someone just getting into software development now?

ソフトウエア開発の世界に入ったばかりの方に、どのようなアドバイスを差し上げますか?

Nahi: I think I became the software engineer I am today by joining an open source community full of great developers, engaging in friendly competition with them, and trying out what I learned there in my professional work. Unlike when I first came across Ruby, there are now many such communities, and plenty of opportunities to put them to use at work. I don’t have much advice beyond that, but I hope everyone gets the chance to know lots of great engineers.

ソフトウェア開発者としての私は、よい技術者がたくさん居るOSSコミュニティに参加し、彼らの切磋琢磨に参加することと、そこで得た経験を業務で試した経験により作られたと思っています。 でも、私がRubyと出会った頃とは違い、今はそのようなコミュニティがたくさんありますし、それを業務に活かすチャンスもたくさんありますね。私ができるアドバイスはほとんどありません。みなさんがよい技術者とたくさん知り合えることを祈っています。

Aki: If you forgot everything you knew about software development, and were to start learning to code today, what programming language might you choose and why?

もしソフトウエア開発に関して現在お持ちの知識を全て忘れて、今日からプログラミングを学ぶこととなった場合、どのプログラミング言語を選びますか?またその理由を教えて頂けますか?

Nahi: I would choose either Ruby or Python. If I still knew what I know now, it would be Python. Either way, I would again pick a language where the OS and the network sit just beneath a thin layer, where they are easy to see.

RubyかPythonを選びます。もし現在の知識が残っていればPythonですね。薄い皮の下に、OSやネットワークがすぐに見えるような言語をまた選びたいと思います。

Aki: On that note, you make a huge impact as part of Ruby's core contributing team. How specifically did you get started doing that?

Rubyのコアコントリビュートチームの一員として、(コミュニティーに)大きなインパクトを与えてこられましたが、具体的にどのような形/きっかけで(Rubyコミュニティーへの貢献を)始められたか教えて頂けますか?

Nahi: After releasing my first open source software, I went on to release several Ruby libraries that I had built for work, such as network proxy, csv, logger, soap, and httpclient. With Ruby 1.8, Matz (Yukihiro “Matz” Matsumoto, the chief designer of Ruby) adopted a policy of expanding the Standard Library to help Ruby spread. The idea was that users could do most of what they needed right after installing Ruby, without additional libraries. Several of my libraries were selected at the time, and since then I have mainly worked on maintaining the Standard Library. That policy was a fortunate coincidence for me, since it gave me the chance to build experience.

初めてOSSで公開して以後、業務で使うために作ったRubyのライブラリをいくつか公開していきました。network proxy、csv、logger、soap、httpclientなど。Ruby 1.8の時、MatzがRubyを広めるために、標準添付ライブラリを拡充する方針を立てました。インストールすれば、追加ライブラリなしに一通りのことができるようにしよう、というわけです。その際に、私の作っていたライブラリもいくつか候補に選ばれ、以後主に、標準ライブラリのメンテナンスをするようになりました。標準添付ライブラリ拡充方針は、私が経験を積むことが出来たという点で、大変よい偶然でした。

Aki: For new contributors to Ruby, what do you think is the most surprising thing about the process?

Rubyの新たなコントリビューターの方にとって、(Rubyコミュニティーの開発)プロセスに関し、どのような部分が最も驚きのある部分とお考えになりますか?

Nahi: To be honest, I haven’t been able to contribute to Ruby itself over the past few years, so I am not familiar with the details of the current development process. However, I think the most surprising part is that there doesn’t appear to be any process at all.

In reality, a group of core contributors discuss and make decisions on the direction of development and releases, so to contribute to Ruby itself, you must ultimately propose an idea or make a request to those core contributors.

That’s the same with any community, though. One distinctive trait may be that, since there is no culture of formal documentation, proposals can be fairly informal.

正直に言うと、この数年はRubyそのものへのコントリビュートを行えていないので、具体的な開発プロセス詳細については把握していません。が、明らかに、プロセスがあるように見えないのが、一番驚きのある部分だと思います。

実際には、開発の方向性決定、リリースの決定については、一部のコアなコントリビュータが相談しつつ行っていて、Rubyそのものへのコントリビュートは、最終的には彼/彼女らに対する提案、要望となる必要があります。でもそれは、どのコミュニティでも同じですね。文書化の文化がない分、提案もわりとルーズで構わないのは特徴かもしれません。

Aki: Okay, we have to ask. What is the most interesting pull request you've received for Ruby?

お尋ねしなくてはならないことなのですが。。今までRubyの開発を行ってこられたなかで、(中村さんが)お受けになった最も興味深い/面白いPull Requestはどのようなものでしょうか?

Nahi: While not literally “pull requests,” I have received all sorts of memorable suggestions: replacing the Ruby execution engine, swapping out the regular expression library, gemifying the Standard Library, and so on. The most memorable one involving me personally was a request to replace the CSV library I had written with a different, faster library. Looking at it objectively, it was a legitimate request, but every step toward the right decision took a long time.

"Pull request"という名前ではありませんが、印象深いものはたくさんあります。Ruby実行エンジンの差し替え、正規表現ライブラリの置き換え、標準ライブラリのgem化など。私個人に関するものとしては、自身の作ったcsvライブラリを、別の高速ライブラリで置き換えたい、というリクエストが一番印象深いものでした。冷静に考えて正しいリクエストでしたが、適切な判断をするために、いちいち時間がかかりました。

Aki: Outside of your open source work, you also work full time as a developer. Does your participation in open source inform choices you make at work? How?

Open Sourceに関する活動とは別に、フルタイムのソフトウエア開発者としてご勤務されていますが、Open Sourceコミュニティへの参加は職場における(日々の)意思決定にどのような影響を与えていますか?

Nahi: Active involvement in open source is one of the pillars of business at the company I currently work for, and it shapes the choices the other engineers and I make, even when we don’t think about it. When developing something new for the business, we never start a project without first examining existing open source software and the open source community. As much as possible, we avoid building anything that duplicates existing software. But when we are convinced it is necessary, we build things the way we believe they should be built, even if existing software serves the same purpose. Then we release our version to the world as open source and let it compete. The experience and knowledge that we gain, and give back, through this process are the lifeblood of software development.

Before joining my current company a year and a half ago, I spent about 15 years in the enterprise IT world, leading dozens of system development projects, mainly as a technical architect. Back then, I took part in open source as an individual rather than through my company.

現在所属している会社は、Open Sourceへの積極的な関与をビジネスの柱の一つとしていることもあり、特に意識せずとも、私および各エンジニアの意思決定に影響を与えています。ビジネスのため、何か新しい物を開発する時、既存のOpen Sourceソフトウェア、またOSSコミュニティの調査なしに作り始めることはありません。可能な限り、用途が重複するものは作りません。しかしそうと信じれば、用途が同じでも、あるべき姿のものを作ります。そしてそれは、Open Sourceとして世の中に還元し、競争していきます。そのような中で得られる、また提供できる経験、知見は、ソフトウェア開発の血液のようなものです。

唐突ですが、1年半前に現在の会社に来る前までは、15年ほど、エンタープライズITの世界で、主にテクニカルアーキテクトとして数十のシステム開発プロジェクトをリードしていました。その頃は、会社ではなく個人でOSS活動を行っていました。

Aki: Tell us about your view on where the enterprise IT world is lagging behind. How do you see the open source developer community making a contribution to change that?

エンタープライズITの世界がどのような点で(Open Source等の世界)から遅れているとお考えになるか教えて頂けますか?Open Sourceコミュニティーのソフトウエア開発者の方々が、(エンタープライズITの状況を)変革させることに、どのような貢献ができるとお考えになっているか教えて頂けますか?

Nahi: In the enterprise IT world, we were trying to create a predictable future in order to control the complexity of business and the possibility of change. Now, however, it is hard to predict what things will be like one or two years down the road, and the impact of this unpredictability has become too large to ignore. Luckily, I was given the opportunity to lead a variety of projects, and what helped me then was the experience and knowledge I had gained through my involvement in the open source community.

To be honest, developers participating in the open source community have already made a variety of contributions to the enterprise IT world, and I am one of the beneficiaries. To keep the lifeblood of software development circulating, developers in the enterprise IT world need to be able to participate more in open source. If I had to name something, building that kind of environment and showing understanding toward it could be seen as further contributions from the enterprise side.

エンタープライズITの世界では、ビジネスの複雑さと変更可能性をcontrolするため、予測可能な未来を作ろうとしていました。しかし今では、1年、2年後を予測するのは困難です。この予測できないことの影響は、無視できないほど大きくなっています。私は幸いにも、各種プロジェクトをリードする機会を与えられました。その時に役立ったのは、Open Sourceコミュニティとの関わりの中で得られた経験、知見でした。

正直に言うと、現在Open Sourceコミュニティに参加している開発者は、エンタープライズITの世界に、既に様々な貢献をされていると思います。私もその恩恵を受けた一人です。
ソフトウェア開発の血液を循環させるためには、エンタープライズITの世界に居る開発者が、もっとOpen Sourceコミュニティに参加できるようにならないといけません。しいて言えば、そのような環境を整えること、理解を示すこと、などは、更なる貢献として考えられることかもしれません。

To learn more about Nahi’s contributions to Ruby, visit his GitHub profile page. You can also learn more about Ruby itself by visiting its homepage.

GitHub Shop: Octicon sticker packs are here

Octicons are meant to be shared. Get a pack of vinyl Octicon stickers to divvy up with friends—now available in the GitHub Shop.

Octicon Stickers

Top open source launches on GitHub

The open source community on GitHub has released some of the world's most influential technologies. Earlier this month, a new dependency manager for JavaScript called Yarn was launched and hit 10,000 stars by its second day on GitHub. Stars are an important measure of the community's interest and just one of the many ways to determine a project's success.

Based on the number of stars in a project's first week, here are the top open source releases on GitHub since 2015.

Chart of total stars in first week

Anime

Anime is a flexible and lightweight JavaScript animation library by @juliangarnier. Check out some of the incredible demos.

Released: June 27, 2016
Stars in the first week: 6,013

create-react-app

create-react-app was released by Facebook. Its success is a testament to the popularity of React, the fifth most starred project on GitHub.

Released: July 22, 2016
Stars in the first week: 6,348

Clipboard.js

Clipboard.js is a lightweight JavaScript library by @zenorocha that makes it easy to copy text to the clipboard, which used to require plugins in older browsers.

Released: November 27, 2015
Stars in the first week: 6,522

Visual Studio Code

VS Code is an Electron-based code editor from Microsoft. Whether it's text editors or libraries, Microsoft is using open source to build essential tools for developers.

Released: November 18, 2015
Stars in the first week: 7,847

N1

N1 is an extensible desktop mail app built on Electron. Themes and plugins make N1 a powerful and customizable mail client.

Released: October 5, 2015
Stars in the first week: 8,588

Material Design Lite

Material Design Lite from Google lets you add a Material Design look and feel to your static content websites. Check out the showcase to see great examples of what the community has built.

Released: July 7, 2015
Stars in the first week: 9,609

React Native

Released by Facebook, React Native is a framework for building native apps with React and is the second React project on this list.

Released: March 26, 2015
Stars in the first week: 10,976

TensorFlow

TensorFlow is an open source library for machine learning that was released by Google. With TensorFlow, developers can build intelligent systems using the same tools Google uses for Search, Gmail, Photos, speech recognition, and many other products.

Released: November 9, 2015
Stars in the first week: 11,822

Yarn

Yarn is a dependency manager for JavaScript released by Facebook, Exponent, Google, and Tilde. It aims to ease the management of dependencies in JavaScript projects with features like deterministic dependency resolution, more efficient and resilient networking, and offline mode.

Released: October 11, 2016
Stars in the first week: 16,068

Swift

Swift is a general-purpose programming language originally unveiled by Apple in 2014 and open-sourced in 2015. Swift is already among the 15 most popular languages on GitHub by number of pull requests opened, and that number grew by 262% in the last year.

Released: December 3, 2015
Stars in the first week: 23,097


Open source software is about more than code. Getting the community engaged early is important to building momentum, and a successful launch attracts developers, designers, community managers, users, and companies that help the project thrive.

This data was gathered from queries against the GitHub Archive dataset available on Google BigQuery.
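The post does not show the queries behind these numbers. As a minimal sketch of how a first-week star count could be computed, the Python below pairs a hypothetical BigQuery query with a local function that applies the same filter to simplified event records. It assumes that stars appear in GitHub Archive as `WatchEvent` records, and it assumes the `githubarchive.day.*` table layout and the `type`, `repo.name`, and `actor.login` columns; check these against the current schema before relying on them. The function names and the event-dict shape are illustrative only, not part of any GitHub or BigQuery API.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, Tuple

# Hypothetical query against the GitHub Archive daily tables on BigQuery.
# Table and column names are assumptions; verify them before running.
FIRST_WEEK_SQL = """
SELECT COUNT(DISTINCT actor.login) AS stars
FROM `githubarchive.day.*`
WHERE _TABLE_SUFFIX BETWEEN @first_day AND @last_day
  AND type = 'WatchEvent'
  AND repo.name = @repo
"""


def first_week_suffixes(released: date) -> Tuple[str, str]:
    """Return YYYYMMDD table suffixes covering release day plus the next six days."""
    last = released + timedelta(days=6)
    return released.strftime("%Y%m%d"), last.strftime("%Y%m%d")


def first_week_stars(events: Iterable[dict], repo: str, released: date) -> int:
    """Count distinct users who starred `repo` in its first seven days.

    Each event is a simplified stand-in for an archive row:
    {"type": str, "repo": str, "actor": str, "created_at": datetime}.
    """
    start = datetime.combine(released, datetime.min.time())
    end = start + timedelta(days=7)  # exclusive upper bound
    stargazers = {
        e["actor"]
        for e in events
        if e["type"] == "WatchEvent"
        and e["repo"] == repo
        and start <= e["created_at"] < end
    }
    return len(stargazers)


if __name__ == "__main__":
    # Toy events for illustration only; these are not real archive data.
    toy = [
        {"type": "WatchEvent", "repo": "yarnpkg/yarn", "actor": "a",
         "created_at": datetime(2016, 10, 11, 9)},
        {"type": "WatchEvent", "repo": "yarnpkg/yarn", "actor": "a",
         "created_at": datetime(2016, 10, 12, 9)},  # same user, counted once
        {"type": "ForkEvent", "repo": "yarnpkg/yarn", "actor": "b",
         "created_at": datetime(2016, 10, 13, 9)},  # not a star
        {"type": "WatchEvent", "repo": "yarnpkg/yarn", "actor": "c",
         "created_at": datetime(2016, 10, 18, 0)},  # day 8, outside window
    ]
    print(first_week_suffixes(date(2016, 10, 11)))
    print(first_week_stars(toy, "yarnpkg/yarn", date(2016, 10, 11)))
```

Counting distinct `actor.login` values rather than raw events is a design choice in this sketch, so a user who shows up more than once in the window is counted once; the published figures may have been computed differently. The `@first_day`, `@last_day`, and `@repo` placeholders would be passed as query parameters, for example using the suffixes returned by `first_week_suffixes`.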

Introducing GitHub Community Guidelines

Building software should be safe for everyone. The GitHub community is made up of millions of developers around the world, ranging from the new developer who created their first "Hello World" project to the most well-known software developers in the world. We want the GitHub community to be a welcoming environment where people feel empowered to share their opinion and aren't silenced by fear or shouted down.

Beginning today, we will be accepting feedback on proposed GitHub Community Guidelines. By outlining what we expect to see within our community, we hope to help you understand how best to collaborate on GitHub and what type of actions or content may violate our Terms of Service. The policy consists of four parts:

  1. Best practices for building a strong community - people are encouraged to be welcoming, assume no malice, stay on topic, and use clear and concise language at all times.
  2. What to do if something offends you - project maintainers are encouraged to communicate expectations and to moderate comments within their community — including locking conversations or blocking users when necessary.
  3. What behavior is not allowed on GitHub - the community will not tolerate threats of violence, hate speech, bullying, harassment, impersonation, invasions of privacy, sexually explicit content, or active malware.
  4. What happens if someone breaks the rules - GitHub may block or remove content and may terminate or suspend accounts that violate these rules.

As always, we will continue to investigate any abuse reports and may moderate public content on our site that we determine to be in violation of our Terms of Service. To be clear, GitHub does not actively seek out content to moderate. Instead, we rely on community members like you to communicate expectations, moderate projects, and report abusive behavior or content.

Additionally, we are releasing the guidelines under the Creative Commons Zero License in hopes of encouraging other platforms to establish similar norms to govern their respective communities.

These guidelines are first and foremost community guidelines and we'd like to hear your thoughts on them before they're finalized. Please get in touch with us with any feedback or questions prior to November 20th, 2016. Together, we can make the open source community a healthy, inclusive place we can all be proud of.

Get testing with Taplytics in the Student Developer Pack

Taplytics is now offering mobile testing to students in the Student Developer Pack.

Taplytics joins the Student Developer Pack

Taplytics helps mobile developers create great experiences through A/B testing, push notifications, and custom analytics. As part of the GitHub Student Developer Pack, Taplytics will give you complete access to its suite of tools for native mobile apps.

For members of the pack, Taplytics is offering full, unlimited access to the platform free for 6 months. You will be able to run visual tests on your apps and make the design decisions that work best for your users. You’ll also get analytics on your apps to help you iterate on them. Taplytics also includes tools that help you provide users with the right information at the right time.

The Student Developer Pack gives students free access to the best developer tools from different technology companies like Datadog, Travis CI, and Unreal Engine.

Students, get testing now with your pack.