Ruby script to parse CSV file(s) into a database.
Ruby
Switch branches/tags
Nothing to show
Clone or download

README.md

csv2sqlite

Ruby script to parse CSV file(s) into a sqlite database. Automatically generates the table schemas based on filenames, headers, and data types.

NOTE: For an alterative implementation in python, see https://github.com/onyxfish/csvkit

Example

Let's say you have the following data in baby-names-10.csv:

"year","name","percent","sex"
1880,"John",0.081541,"boy"
1880,"William",0.080511,"boy"
1880,"James",0.050057,"boy"
1880,"Charles",0.045167,"boy"
1880,"George",0.043292,"boy"
1880,"Frank",0.02738,"boy"
1880,"Joseph",0.022229,"boy"
1880,"Thomas",0.021401,"boy"
1880,"Henry",0.020641,"boy"

To create (or update) an sqlite3 database To parse the CSV data and create or update) an sqlite database in ./babynames.db:

ruby ~/csv2sqlite/csv2sqlite.rb baby-names-10.csv --output babynames.db

csv2sqlite will derive the table names from the file name, and will guess the type of each column, defaulting to String.

We can get the SQL dump of the new database by running sqlite3 babynames.db .dump:

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE `baby_names_10` (`year` integer, `name` varchar(255), `percent` double precision, `sex` varchar(255));
INSERT INTO "baby_names_10" VALUES(1880,'John',0.081541,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'William',0.080511,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'James',0.050057,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'Charles',0.045167,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'George',0.043292,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'Frank',0.02738,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'Joseph',0.022229,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'Thomas',0.021401,'boy');
INSERT INTO "baby_names_10" VALUES(1880,'Henry',0.020641,'boy');
COMMIT;

Now you can interact with the data via sqlite3:

$ sqlite3 babynames.db

sqlite3> .tables
baby_names_10

sqlite> .schema
CREATE TABLE `baby_names_10` (`year` integer, `name` varchar(255), `percent` double precision, `sex` varchar(255));

sqlite> SELECT * FROM baby_names_10 WHERE percent > .05;
1880|John|0.081541|boy
1880|William|0.080511|boy
1880|James|0.050057|boy

For more information about how to use the sqlite3 shell, see http://www.sqlite.org/sqlite.html

Even better, you can now interact with the data via the "sequel" gem's console:

sequel sqlite://babynames.db # launches IRB with sequel preloaded

irb> DB.tables
 => [:baby_names_10]

irb> DB[:baby_names_10].select(:name, :percent).limit(2).all
  [
  [0] {
    :name => "John",
      :percent => 0.081541
  },
    [1] {
      :name => "William",
      :percent => 0.080511
    }
  ]

irb> DB[:baby_names_10].select(:name, :year, :percent).where(Sequel.like(:name, 'J%')).sql
  "SELECT `name`, `year`, `percent` FROM `baby_names_10` WHERE (`name` LIKE 'J%' ESCAPE '\\')"

For more info, see the following Sequel documentation guide pages:

IRB Usage

The default outputting of arrays and objects in IRB is not pretty. Consider installing the awesome_print gem to improve matters:

gem install awesome_print

cat >> ~/.irbrc <<EOT 
require "awesome_print"
AwesomePrint.irb!
EOT

Usage

Usage:
  csv2sqlite [options] TABLENAME.csv [...]

  where [options] are:
       --irb-console, -i:   Open an IRB session after loading FILES into an
                            in-memory DB
    --sqlite-console, -s:   Execute 'sqlite3 FILENAME.db' afterwards
        --output, -o <s>:   FILENAME.db where to save the sqlite database
              --help, -h:   Show this message

Installation

Via Rubygems

TODO: not implemented yet

sudo gem install csv2sqlite

Via Bundler

I recommend installing bundler to manage gem dependencies:

gem install bundler

Then you can run the following to download dependencies specified csv2sqlite/Gemfile:

bundle install 

Installing RVM

I recommend installing RVM (Ruby Version Manager), which makes it easy to run multiple Ruby versions. To install RVM:

# download and execute the RVM install script
curl -L https://get.rvm.io | bash -s stable --ruby
# to have RVM work in the current terminal window:
source $HOME/.rvm/scripts/rvm 
# to have it work in all future terminal windows
echo "source $HOME/.rvm/scripts/rvm" >> ~/.bash_profile

Misc resources