+------------------+ | __ | _|_|_|_| _| _|_|_| | uh oh >(' ) | _| _|_| _|_| _| _| _|_| _|_|_| _|_|_| | )/ | _|_|_| _| _| _| _|_|_|_| _| _|_| _|_| _| _| _|_| | /( | _| _| _| _| _| _| _| _| _| _| _|_| | / `----/ | _| _|_| _| _|_|_| _|_|_| _| _|_|_| _|_|_| | \ ~=- / | | ~^~^~^~^~^~^~^~^ | Rapid loading of CSVs for Postgres +----------------- +
In MySQL you can do this:
mysql> LOAD DATA INFILE '/tmp/data.csv' -> REPLACE INTO TABLE 'my_data' -> FIELDS (isbn, price, stock);
to load CSV data from
/tmp/data.csv into the
my_data table, replacing rows
that match on a unique constraint. Sadly, the Postgres's
COPY command does
not support this option.
This package provides a work-around to allow using CSVs to update tables in Postgres.
$ pip install foiegras $ python
>>> import psycopg2, foiegras >>> conn = psycopg2.connect("dbname=mydb") >>> goose = foiegras.Goose(conn) >>> goose.load('my_data', '/tmp/data.csv', ('isbn', 'price', 'stock'))
Yay! Don't forget to close the connection:
Interesting... tell me more
The Postgres equivalent of
LOAD DATA INFILE
COPY ... FROM ...
command, however it does not support replacing rows that match on a unique
constraint. Using CSVs to load data into a table is the fastest way of
updating a table - hence it's frustrating that Postgres doesn't
REPLACE option that MySQL does.
This package works around this issue by loading the CSV to a temporary table then
updating the master table using two
UPDATE commmands that insert new records and update
matching records respectively. All very simple.
Does this work with Django?
>>> from django.db import connection >>> connection.enter_transaction_management() >>> goose = foisgras.Goose(connection) >>> goose.load(Record._meta.db_table, '/path/to/data.csv', ('isbn', 'price', 'stock'))
Surely this has been done already?
Apparently not. There are some similar CSV loading libraries for Django
django-csvimport- This is a Django app for uploading CSV files to populate model tables. It uses the ORM directly to save new instances and so does not perform well when loading larges volumes of data.
pgloader- This looks like it might be quite good but it's hard to tell as the docs are so bad. It's also not on PyPI.
Very simple at the moment. All you have to do is instantiate a
Goose object passing a
>>> conn = psycopg2.connect("dbname=mydb") >>> goose = foiegras.Goose(conn)
Then the only method you need to know is
load which has signature:
def load(self, table_name, filepath, fields, delimiter=",", replace_duplicates=True, has_header=False): ...
replace_duplicates == False, then rows from the CSV file that match on a
unique constraint will be ignored.
Zero to tests passing in 6 presses of
$ git clone email@example.com:codeinthehole/foiegras.git $ cd foiegras $ mkvirtualenv foiegras $ python setup.py develop $ pip install -r requirements.txt $ nosetests