A Git-backed key-value store, for tracking changes to documents and other files over time.
Let's say you're building a scraper to see when Members of Congress post press releases to their websites and to track how those press releases change over time. You could build a purpose-built application, store each revision in a database, and then build an interface to view all the known press releases and compare their history. Stop the insanity!
But wait. What if I told you that you could just commit each press release to Git and let Git/GitHub do the heavy lifting? Without writing a single line of HTML, you can browse all the press releases, see when they were changed, diff exactly how they were changed, and get hosting baked in.
Having built that first kind of app more times than I'd like to admit, I thought I'd make a gem to facilitate building lightweight apps that use Git to track changes to scraped documents (or whatever you want, really).
Want to see it in action? Check out this lightweight demo, which scrapes the White House RSS feed.
ChangeAgent writes values directly to the Git database based on a given key (path), and immediately triggers a commit, providing you with both a snapshot and a timestamp for every change.
change_agent = ChangeAgent.init "path/to/repo"
=> #<ChangeAgent::Client repo="path/to/repo">
change_agent.set "foo", "bar"
=> #<ChangeAgent::Document key="foo">
change_agent.get "foo"
=> "bar"
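For a sense of what "writing directly to the Git database" means: Git stores each value as a blob object whose id is the SHA-1 of a small header ("blob", the byte count, a NUL byte) followed by the raw content. The stdlib-only sketch below computes that id the way git hash-object does; git_blob_id is a hypothetical helper that illustrates Git's storage model, not ChangeAgent's internals.

```ruby
require "digest/sha1"

# Hypothetical helper: computes the object id Git assigns to a blob.
# Git hashes the header "blob <byte count>\0" followed by the raw bytes.
def git_blob_id(content)
  data = content.b                     # work on raw bytes
  header = "blob #{data.bytesize}\0".b # "\0" is the NUL separator
  Digest::SHA1.hexdigest(header + data)
end

puts git_blob_id("hello\n") # prints ce013625030ba8dba906f756967f9e9ca394464a
```

Because the id depends only on the content, two keys holding identical values point at the same blob, which is how Git avoids storing duplicate content.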
Keys (files) are intended to be namespaced when logically grouped. For example, if you were storing congressional press releases, you might store Rep. Balter's November 26th press release on puppies as balter/2014/11/26/puppies.html, or just balter/2014-11-26-puppies.txt, or even just balter/puppies.
change_agent.set "foo/bar", "baz"
=> #<ChangeAgent::Document key="foo/bar">
change_agent.get "foo/bar"
=> "baz"
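Behind the scenes, Git records a slash-separated key as nested tree objects: foo/bar is a blob named bar inside a tree named foo. As a rough illustration (plain Ruby, not part of ChangeAgent), the hypothetical key_tree helper below groups keys into the nested directories Git would record; it assumes no key is also used as a directory name.

```ruby
# Hypothetical helper: nests slash-separated keys the way Git nests
# trees (directories) and blobs (files). Assumes no key is also used
# as a directory name.
def key_tree(keys)
  keys.each_with_object({}) do |key, root|
    *dirs, file = key.split("/")
    # Descend through (creating as needed) one hash per directory segment.
    node = dirs.reduce(root) { |tree, dir| tree[dir] ||= {} }
    node[file] = :blob
  end
end

p key_tree(["balter/2014/11/26/puppies.html", "balter/puppies", "foo/bar"])
```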
It's really up to you, but you'll get performance and usability bumps the more you namespace. I'd recommend thinking about what you want the Git repo to look like when you browse it, and working backwards from there.
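If your scraper builds keys from structured data, a small helper keeps the layout consistent. press_release_key below is a hypothetical sketch (not part of ChangeAgent) that produces the balter/2014/11/26/puppies.html style of key from a member name, a date, and a slug:

```ruby
require "date"

# Hypothetical helper: builds a namespaced key such as
# "balter/2014/11/26/puppies.html" so every document for one
# member lands under the same directory, grouped by date.
def press_release_key(member, date, slug, ext: "html")
  [member.downcase, date.strftime("%Y/%m/%d"), "#{slug}.#{ext}"].join("/")
end

key = press_release_key("Balter", Date.new(2014, 11, 26), "puppies")
puts key # prints balter/2014/11/26/puppies.html
```

The resulting string can be passed as the key to change_agent.set like any other key.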
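To work with an existing remote repository, pass a local directory and the repository's URL to ChangeAgent::Client.new, which clones the remote into that directory:
repo = "https://github.com/benbalter/change_agent_demo"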
directory = "data"
change_agent = ChangeAgent::Client.new(directory, repo)
change_agent.get("foo")
=> "bar"
Ready to push your Git repo to a server? It's as simple as:
# add a remote (if there's not already one from the clone)
change_agent.add_remote "origin", "https://github.com/benbalter/change_agent_demo"
# pull in the latest data
change_agent.pull
# push the data
change_agent.push
# do both
change_agent.sync
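In a scraper, one reasonable pattern is to write every document first (each set already creates its own commit) and sync once at the end, rather than pushing after every write. The sketch below assumes only that the client responds to set and sync as shown above; archive and RecordingClient are hypothetical names, and RecordingClient merely records calls so the example runs without a Git repository or network access.

```ruby
# Hypothetical helper: write every document, then sync once at the end.
# `client` only needs #set(key, value) and #sync, as in the examples above;
# with ChangeAgent, each #set already creates its own commit.
def archive(client, documents)
  documents.each { |key, body| client.set(key, body) }
  client.sync
end

# Hypothetical stand-in that records calls instead of touching Git.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set(key, _value)
    @calls << [:set, key]
  end

  def sync
    @calls << [:sync]
  end
end

docs = {
  "balter/puppies" => "<p>Puppies!</p>",
  "balter/kittens" => "<p>Kittens!</p>"
}
client = RecordingClient.new
archive(client, docs)
p client.calls # prints [[:set, "balter/puppies"], [:set, "balter/kittens"], [:sync]]
```

With a real setup you would pass your ChangeAgent::Client instance instead of the recording stand-in.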
By default, Change Agent supports token-based authentication. Simply pass an OAuth token via the GITHUB_TOKEN environment variable and ensure all remotes use the https protocol. Change Agent will take care of the rest. You'll likely want to use a bot account for this.
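Before syncing, it can help to fail fast if the token setup is incomplete. This is a hypothetical pre-flight check (not part of Change Agent) that looks for a non-empty GITHUB_TOKEN and an https remote URL, using only Ruby's standard library:

```ruby
require "uri"

# Hypothetical helper: true only for URLs whose scheme is https.
# scp-style SSH remotes (git@github.com:user/repo.git) either fail to
# parse or have no https scheme, so they return false.
def https_remote?(url)
  URI.parse(url).scheme == "https"
rescue URI::Error
  false
end

# Hypothetical pre-flight check: a non-empty GITHUB_TOKEN plus an https remote.
# env defaults to the real environment but accepts any Hash for testing.
def token_auth_ready?(remote_url, env = ENV)
  token = env["GITHUB_TOKEN"]
  !token.nil? && !token.empty? && https_remote?(remote_url)
end

puts token_auth_ready?("https://github.com/benbalter/change_agent_demo")
```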
Rugged supports additional authentication strategies (such as SSH keys). For more information, see the Rugged documentation. Here's an example of how you might implement an alternative authentication mechanism:
change_agent = ChangeAgent::Client.new "data", "https://github.com/benbalter/change_agent_demo"
creds = Rugged::Credentials::UserPassword.new :username => "benbalter", :password => "passw0rd"
change_agent.credentials = creds
This project is an initial proof of concept.
Add this line to your application's Gemfile:
gem 'change_agent'
And then execute:
$ bundle
Or install it yourself as:
$ gem install change_agent
- Fork it ( https://github.com/[my-github-username]/change_agent/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request