Add sample output from a capture of 11 days of GitHub traffic.
Update README.
Andrew Crump committed Nov 30, 2010
1 parent c42bffc commit 22c640c
Showing 2 changed files with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -3,6 +3,10 @@
## About
Hadoop tool to import data from the GitHub public timeline XML format.

Run with:

    $ hadoop jar hubstats.jar hubstats.HubStats input output

## Output format
The following fields are output:

@@ -16,3 +20,23 @@ The following fields are output:
* *tag* - The Git tag name
* *alternate_id* - An alternative numeric id if the event relates to an Issue, PullRequest or Gist
* *subtype* - Further subtype of the event, for example an Issues event can be 'Opened' or 'Closed'

## Sample Output
Sample output from a run over 30 gigs of the timeline requests (about 300,000 GitHub events in a bit over 11 days) is available in `sample-output.gz`.

    $ gzcat sample-output.gz | awk '{print $3}' | sort | uniq -c | sort -nr
    173260 Push
    48186 Create
    34782 Watch
    12256 Gist
    11756 Issues
    9453 Follow
    8731 Fork
    8202 Gollum
    6672 PullRequest
    5132 Delete
    4027 CommitComment
    2527 Member
    1666 Download
    1053 ForkApply
    288 Public
Binary file added sample-output.gz
Binary file not shown.
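
For readers without `gzcat` or `awk` to hand, the per-type tally shown in the Sample Output section can be approximated in Python. This is a minimal sketch, not part of the commit: it assumes, as the `awk '{print $3}'` command implies, that fields in `sample-output.gz` are whitespace-separated and that the event type is the third field. The function name `count_event_types` and its `column` parameter are hypothetical.

```python
import gzip
from collections import Counter


def count_event_types(path, column=2):
    """Tally the values found in one whitespace-separated column of a gzipped text file.

    `column` is zero-based, so the default of 2 corresponds to awk's $3.
    Assumption: the event type sits in that column, as the awk command suggests.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split()  # splits on runs of spaces or tabs, like awk's default
            # Unlike awk, which would emit an empty string, this sketch skips short lines.
            if len(fields) > column:
                counts[fields[column]] += 1
    return counts
```

Calling `count_event_types("sample-output.gz").most_common()` should then list the event types from most to least frequent, comparable to the `sort | uniq -c | sort -nr` stage of the pipeline.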
