BuzzData Ruby Client Library

The BuzzData Ruby Client Library is a simple wrapper for the BuzzData HTTP API that allows you to easily interact with datasets on BuzzData.

Please note: we are aware of a couple of issues with the current release, and fixes are in progress.


# Setup the API
require "rubygems"
require "buzzdata"
buzzdata = Buzzdata.new('YOUR_API_KEY')

# Create a Dataset
dataset = buzzdata.create_dataset(username: 'eviltrout',
                                  name: "My Awesome Dataset!",
                                  public: false,
                                  readme: "This is my awesome dataset",
                                  license: 'cc0',
                                  topics: ['testing-buzzdata'])

# Upload some data
buzzdata.upload(dataset['id'], File.new('datasets/celebrities.csv'), 'My first dataset!')


If you already have Ruby and RubyGems installed, installation is as simple as running the following command in a terminal:

>> gem install buzzdata

Mac OS X and most Unix/Linux distributions come with Ruby and RubyGems preinstalled. If you do not have Ruby and RubyGems installed, please check the Ruby website for instructions.

API Documentation

Getting Started

Create an instance of the Buzzdata client:

>> buzzdata = Buzzdata.new('YOUR_API_KEY')

To make it even simpler, if you create a file config/buzzdata.yml with your api_key in it, you can omit the key parameter:

>> buzzdata = Buzzdata.new
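
For reference, a minimal config/buzzdata.yml might look like the fragment below. (The exact key name is an assumption based on the api_key mentioned above.)

```yaml
# config/buzzdata.yml -- assumed layout; the client reads api_key from here
api_key: YOUR_API_KEY
```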

Downloading Data

To download data from a dataset, just do this:

>> buzzdata.download_data 'eviltrout/b-list-celebrities'

Dataset Information

Using dataset_overview you can get an overview of a Dataset's information. It returns a hash of attribute names and their values. See the API Documentation below for a list of the returned attributes.

>> ds = buzzdata.dataset_overview 'eviltrout/b-list-celebrities'
>> puts ds['name']  # outputs B-List Celebrities

Listing Datasets

You can view a user's datasets by calling datasets_list. It returns an array with information on each of their datasets.

>> datasets = buzzdata.datasets_list 'eviltrout'
>> datasets.each {|ds| puts ds['id'] }

Dataset History

You can retrieve a list of uploaded versions of a dataset by calling dataset_history:

>> buzzdata.dataset_history('eviltrout/b-list-celebrities').each {|v| puts "version #{v['version']}!" }

Creating a Dataset

You can use the create_dataset method to create a new dataset. All fields are required:

>> ds = buzzdata.create_dataset(username: 'eviltrout',
                                name: "My Awesome Dataset!",
                                public: false,
                                readme: "This is my awesome dataset",
                                license: 'cc0',
                                topics: ['testing-buzzdata'])  

>> puts ds['id']     # outputs eviltrout/my-awesome-dataset

Uploading Data

If your account has the ability to upload data to a dataset, you can do so like this:

>> buzzdata.upload('eviltrout/b-list-celebrities', File.new('datasets/celebrities.csv'), 'Release notes...')

Publish a dataset

>> buzzdata.publish_dataset('eviltrout/b-list-celebrities')

Clone another user's dataset

>> buzzdata.clone_dataset('pete/pete-forde-s-genome')

Delete a dataset

>> buzzdata.delete_dataset('eviltrout/tasteless-dataset')    

Get a user's information

>> user = buzzdata.user_info('eviltrout')
>> puts user['name']   # outputs Robin Ward

Search BuzzData

>>"pets").each do |r|
     puts r['label']    # Outputs each search result label
   end

Get a list of usable Licenses

>> buzzdata.licenses

Get a list of usable Topics

>> buzzdata.topics

Updating a Dataset

In some cases, it makes more sense to update your data on a row by row basis rather than importing a whole dataset at once. For example, perhaps you made a GPS application and once an hour want to update it with your location. You could use the row level API to insert only the new locations into the dataset.

Staging your Updates

Before you get started with the Row-Level API, you'll have to familiarize yourself with the concept of staging your updates. Every time you upload a dataset to BuzzData, it creates a new version in the history. If you have an application that updates the dataset frequently, you would end up with many, many versions polluting your dataset's history. To get around this, we ask that you stage your updates before committing them.

You can think of staging as setting aside the data that you want to commit later. You start by creating a stage with a REST call. Then you add your updates to the stage whenever you want. Once you are ready to create a new version and update the dataset, you call commit.

In the GPS example above, a good idea might be to stage a day's worth of updates, and then commit them at the end of the day. It's up to you to decide when it makes sense for your dataset to be released as a new version!

Create a Stage

>> stage_id = buzzdata.create_stage('eviltrout/pets')

Stage an insertion

>> buzzdata.insert_row('eviltrout/pets', stage_id, ['col1', 'col2', 'col3'])

Stage an update

# where 1 is the row number we're updating.
>> buzzdata.update_row('eviltrout/pets', stage_id, 1, ['col1', 'col2', 'col3'])

Stage a delete

# in this example, 2 is the row number we're removing
>> buzzdata.delete_row('eviltrout/pets', stage_id, 2)

Commit the Stage

# Create a new version of your dataset with the updates.
>> buzzdata.commit_stage('eviltrout/pets', stage_id)

Rollback the Stage

# In case you decide you don't want to create a new version
>> buzzdata.rollback_stage('eviltrout/pets', stage_id)

Admin Level Functions

The following functions can only be used if your API key has been granted admin access:

Create a user

>> buzzdata.create_user(username: 'eviltrout_jr', email: '', password: 'aSECUREp4ssword')    

Copyright © 2011 BuzzData, released under the MIT license