Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Hadoop tool to import the GitHub public timeline
Java Shell
Branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.



Hadoop tool to import data from the GitHub public timeline XML format.

Run with:

$ hadoop jar hubstats.jar hubstats.HubStats input output

Output format

The following fields are output:

  • event_id - The unique numeric event id
  • event_type - The type of event
  • at - The date and time of the event
  • actor - The GitHub account that created the event
  • repo_account - The GitHub account associated with the related repository
  • repo_name - The name of the repository
  • branch - The Git branch name
  • tag - The Git tag name
  • alternate_id - An alternative numeric id if the event relates to an Issue, PullRequest or Gist
  • subtype - Further subtype of the event, for example an Issues event can be 'Opened' or 'Closed'

Sample Output

Sample output from a run over 30 gigs of the timeline requests (about 300,000 GitHub events in a bit over 11 days) is available in sample-output.gz.

$ gzcat sample-output.gz | awk '{print $3}' | sort | uniq -c | sort -nr
173260 Push
48186 Create
34782 Watch
12256 Gist
11756 Issues
9453 Follow
8731 Fork
8202 Gollum
6672 PullRequest
5132 Delete
4027 CommitComment
2527 Member
1666 Download
1053 ForkApply
 288 Public
Something went wrong with that request. Please try again.