Skip to content

coffeesam/postgres_upsert

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

postgres_upsert

Allows your rails app to load data in a very fast way, avoiding calls to ActiveRecord.

Using the PG gem and postgres's powerful COPY command, you can create thousands of rails objects in your db in a single query.

Install

Put it in your Gemfile

gem 'postgres_upsert'

Run the bundle command

bundle

Usage

The gem will add the aditiontal class method to ActiveRecord::Base

  • pg_upsert io_object_or_file_path, [options]

io_object_or_file_path => is a file path or an io object (StringIO, FileIO, etc.)

options: :delimiter - the string to use to delimit fields. Default is "," :format - the format of the file (valid formats are :csv or :binary). Default is :csv :header => specifies if the file/io source contains a header row. Either :header option must be true, or :columns list must be passed. Default true :key_column => the primary key or unique key column on your ActiveRecord table, used to distinguish new records from existing records. Default is the primary_key of your ActiveRecord model class. :update_only => when true, postgres_upsert will ONLY update existing records, and not insert new. Default is false.

pg_upsert will allow you to copy data from an arbritary IO object or from a file in the database server (when you pass the path as string). Let's first copy from a file in the database server, assuming again that we have a users table and that we are in the Rails console:

User.pg_upsert "/tmp/users.csv"

This command will use the headers in the CSV file as fields of the target table, so beware to always have a header in the files you want to import. If the column names in the CSV header do not match the field names of the target table, you can pass a map in the options parameter.

User.pg_upsert "/tmp/users.csv", :map => {'name' => 'first_name'}

In the above example the header name in the CSV file will be mapped to the field called first_name in the users table.

To copy a binary formatted data file or IO object you can specify the format as binary

User.pg_upsert "/tmp/users.dat", :format => :binary, :columns => ["id, "name"]

Which will generate the following SQL command:

COPY users (id, name) FROM '/tmp/users.dat' WITH BINARY

NOTE: binary files do not include header columns, so passing a :columns array is required for binary files.

pg_upsert supports 'upsert' or 'merge' operations. In other words, the data source can contain both new and existing objects, and pg_upsert will handle either case. Since the Postgres native COPY command does not handle updating existing records, pg_upsert accomplishes update and insert using an intermediary temp table:

This merge/upsert happend in 5 steps (assume your data table is called "users")

  • create a temp table named users_temp_### where "###" is a random number. In postgres temp tables are only visible to the current database session, so naming conflicts should not be a problem.
  • COPY the data to user_temp
  • issue a query to insert all new records from users_temp_### into users (newness is determined by the presence of the primary key in the users table)
  • issue a query to update all records in users with the data in users_temp_### (matching on primary key)
  • drop the temp table.

overriding the key_column

By default pg_upsert uses the primary key on your ActiveRecord table to determine if each record should be inserted or updated. You can override the column using the :key_field option:

User.pg_upsert "/tmp/users.dat", :format => :binary, :key_column => ["external_twitter_id"]

obviously, the field you pass must be a unique key in your database (this is not enforced at the moment, but will be)

passing :update_only = true will ensure that no new records are created, but records will be updated.

Note on Patches/Pull Requests

  • Fork the project
  • add your feature/fix to your fork(rpsec tests pleaze)
  • submit a PR
  • If you find an issue but can't fix in in a PR, please log an issue. I'll do my best.

About

Uses Postgres's powerful COPY command to upsert or merge large sets of data into ActiveRecord tables.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Ruby 100.0%