Description: Git/Cassandra Ruby Library
Clone URL: git://github.com/schacon/agitmemnon.git
name age message
file .document Thu Jul 30 14:45:35 -0700 2009 jewlered agitmemnon [schacon]
file .gitignore Thu Jul 30 14:45:35 -0700 2009 jewlered agitmemnon [schacon]
file DESIGN.txt Wed Sep 02 13:28:47 -0700 2009 new web ui [schacon]
file LICENSE Thu Jul 30 14:45:35 -0700 2009 jewlered agitmemnon [schacon]
file README.rdoc Wed Aug 19 13:28:30 -0700 2009 checkpoint [schacon]
file Rakefile Thu Jul 30 14:45:35 -0700 2009 jewlered agitmemnon [schacon]
file TODO.txt Fri Sep 04 10:50:32 -0700 2009 blob and directory viewing working [Scott Chacon]
file VERSION.yml Thu Jul 30 14:45:35 -0700 2009 jewlered agitmemnon [schacon]
file example.conf.xml Tue Sep 08 15:50:09 -0700 2009 fix gravatar fallback, update for example conf [schacon]
directory lib/ Mon Sep 07 11:51:05 -0700 2009 Merge remote branch 'github/master' Conflicts:... [schacon]
file post-receive.rb Thu Jul 30 14:28:53 -0700 2009 initial design [schacon]
directory scripts/ Mon Sep 07 11:50:18 -0700 2009 absolute paths [git]
directory server/ Tue Sep 08 15:50:09 -0700 2009 fix gravatar fallback, update for example conf [schacon]
directory spec/ Fri Jul 31 15:22:40 -0700 2009 initially working version with web server [schacon]
README.rdoc

Agitmemnon

Agitmemnon is a Ruby library for filling and interacting with a Cassandra cluster with a specific keyspace for storing Git repository data. This library can take a Git repo on disk and fill the Cassandra keyspace with all the data it needs, then can be used to get Git data back out of it.

Currently the Cassandra keyspace looks something like this:

    <Keyspaces>
        <Keyspace Name="Agitmemnon">
            <ColumnFamily CompareWith="UTF8Type" Name="Objects"/>
            <ColumnFamily ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" Name="Repositories"/>
            <ColumnFamily CompareWith="UTF8Type" Name="CommitDiffs"/>
            <ColumnFamily ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" Name="RevTree"/>
            <ColumnFamily CompareWith="UTF8Type" Name="PackCache"/>
            <ColumnFamily CompareWith="UTF8Type" Name="PackCacheIndex"/>
        </Keyspace>
    </Keyspaces>

The Repositories column family keeps one row per repository, which holds all the current references and the time of the last update (push). The Objects column family holds one row per Git object, keyed by the object's SHA-1, which gives one global object space shared by all repositories. The CommitDiffs family keeps the diff of each commit for fast retrieval. Finally, the RevTree family keeps the commit revlist and difftree for each repository (though I’m not sure if I want to do it this way yet - if so, I’ll have to shard the rows for larger repos).
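Because every object is keyed by its SHA-1, identical objects from different repositories land in the same row. The following is a minimal sketch of how such a key can be computed, using Git's standard object hashing (the SHA-1 of a "type size\0" header followed by the content). The helper names `git_object_sha` and `object_row` and the plain Hash standing in for the Objects column family are illustrative assumptions, not part of the Agitmemnon API.

```ruby
require 'digest'

# Git hashes "<type> <byte size>\0<content>", not the raw content alone.
def git_object_sha(type, content)
  data = content.b
  header = "#{type} #{data.bytesize}\0".b
  Digest::SHA1.hexdigest(header + data)
end

# Hypothetical row layout, loosely following the Objects columns
# (:type, :size, :data); values are kept as strings.
def object_row(type, content)
  { 'type' => type, 'size' => content.bytesize.to_s, 'data' => content }
end

# A plain Hash stands in for the Objects column family (assumption).
objects = {}
content = "hello world\n"
sha = git_object_sha('blob', content)
objects[sha] = object_row('blob', content)

puts sha
```

Storing the same blob from a second repository would compute the same key and overwrite the row with identical data, which is what makes a single shared object space workable.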

The keyspace schema looks like this:

        Repositories (__main_listing__) {repo_handle => {'updated' => Time.now.to_i.to_s}}
        Repositories (projectname) {:heads, :tags, :remotes}
        Objects (sha) {:type, :size, :data, :json (for commits/trees)}
        RevTree (projectname) {:sha => {:parents, :objects}}
        PackCacheIndex (projectname) {(cache_key) => (list of objects), ...}
        PackCache (cache_key) {:size => (size), :count => (count), :data => (list of objects)}
        # CommitDiffs (sha) {'diff' => diff}
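To make the row shapes above more concrete, here is a rough sketch that models the Repositories and RevTree rows as nested Ruby hashes and walks RevTree to list everything reachable from a head. The in-memory `keyspace` Hash, the helpers `record_push` and `reachable_objects`, and the representation of `parents` and `objects` as arrays are all assumptions made for illustration; the real library talks to Cassandra and may encode these columns differently.

```ruby
# In-memory stand-in for the keyspace (assumption):
# column family name => row key => columns
keyspace = Hash.new { |families, name| families[name] = {} }

# Record a push: update the main listing and the repository's heads.
def record_push(keyspace, repo, heads, now = Time.now)
  repos = keyspace['Repositories']
  repos['__main_listing__'] ||= {}
  repos['__main_listing__'][repo] = { 'updated' => now.to_i.to_s }
  repos[repo] ||= { 'heads' => {}, 'tags' => {}, 'remotes' => {} }
  repos[repo]['heads'].merge!(heads)
end

# Walk one repository's RevTree row breadth-first from a head commit and
# collect every commit plus the objects each commit lists.
def reachable_objects(revtree, head)
  seen = {}
  queue = [head]
  found = []
  until queue.empty?
    sha = queue.shift
    next if seen[sha]
    seen[sha] = true
    entry = revtree.fetch(sha)
    found << sha
    found.concat(entry['objects'])
    queue.concat(entry['parents'])
  end
  found.uniq
end

# Toy data with short placeholder names instead of real SHA-1 values.
keyspace['RevTree']['demo'] = {
  'c2' => { 'parents' => ['c1'], 'objects' => ['t2', 'b1'] },
  'c1' => { 'parents' => [],     'objects' => ['t1', 'b1'] }
}
record_push(keyspace, 'demo', { 'master' => 'c2' }, Time.at(1_252_450_209))

head = keyspace['Repositories']['demo']['heads']['master']
wanted = reachable_objects(keyspace['RevTree']['demo'], head)
puts wanted.inspect
```

A clone would then fetch each of the collected keys from Objects. Keeping the whole walk inside one RevTree row is also why very large repositories would need that row sharded.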

I’ve also included a small Sinatra app that uses Agitmemnon to run a GitWeb-style web UI, but it’s just for testing things out and isn’t fully featured yet.

Right now this works pretty well for smaller repositories. To scale up, we need to be able to fragment large blobs into multiple smaller Object entries and then pull them together on clones. I also want to add a column family for partial packfile caches, so I can precalculate groups of objects and pull hundreds of small commit/tree objects out of the db at once on clones.
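One possible shape for the blob fragmentation described above is sketched below: the object's row keeps the metadata and a chunk count, and each chunk goes into its own row keyed by the SHA plus an index. None of this exists in the library yet; the `"sha:index"` key format, the `chunks` column, and the helpers `split_bytes`, `store_fragmented`, and `load_fragmented` are hypothetical, and a plain Hash again stands in for the Objects column family.

```ruby
# Split data into byte-sized pieces of at most chunk_size bytes.
def split_bytes(data, chunk_size)
  bytes = data.b
  (0...bytes.bytesize).step(chunk_size).map do |offset|
    bytes.byteslice(offset, chunk_size)
  end
end

# Store metadata under the object's key and each chunk under "sha:index"
# (hypothetical key format).
def store_fragmented(objects, sha, type, data, chunk_size)
  chunks = split_bytes(data, chunk_size)
  chunks.each_with_index do |chunk, index|
    objects["#{sha}:#{index}"] = { 'data' => chunk }
  end
  objects[sha] = {
    'type'   => type,
    'size'   => data.bytesize.to_s,
    'chunks' => chunks.size.to_s
  }
end

# Reassemble the original bytes by reading the chunk rows in order.
def load_fragmented(objects, sha)
  count = objects.fetch(sha)['chunks'].to_i
  (0...count).map { |index| objects.fetch("#{sha}:#{index}")['data'] }.join
end

objects = {}
data = '0123456789'
# A tiny chunk size keeps the demo readable; real chunks would be far larger.
store_fragmented(objects, 'demo-sha', 'blob', data, 4)
restored = load_fragmented(objects, 'demo-sha')
puts restored == data
```

Slicing by bytes rather than characters matters here because Git blobs are arbitrary binary data.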

For cloning, I’ve also created a git-daemon (by modifying the Dulwich project) that serves clones directly out of Agitmemnon. You can find this project here:

github.com/schacon/agitmemnon-server

Soon, this project will help in scaling Git to the MOON!

Copyright

Copyright © 2009 Scott Chacon. See LICENSE for details.