Skip to content

codeinthehole/foiegras

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

+------------------+
|          __      |    _|_|_|_|          _|                  _|_|_|                                 
| uh oh >(' )      |    _|        _|_|          _|_|        _|        _|  _|_|    _|_|_|    _|_|_|   
|         )/       |    _|_|_|  _|    _|  _|  _|_|_|_|      _|  _|_|  _|_|      _|    _|  _|_|       
|        /(        |    _|      _|    _|  _|  _|            _|    _|  _|        _|    _|      _|_|   
|       /  `----/  |    _|        _|_|    _|    _|_|_|        _|_|_|  _|          _|_|_|  _|_|_|     
|       \  ~=- /   |
| ~^~^~^~^~^~^~^~^ |    Rapid loading of CSVs for Postgres
+----------------- +

In MySQL you can do this:

    mysql> LOAD DATA INFILE '/tmp/data.csv'
        -> REPLACE INTO TABLE 'my_data'
        -> FIELDS (isbn, price, stock);

to load CSV data from /tmp/data.csv into the my_data table, replacing rows that match on a unique constraint. Sadly, the Postgres's COPY command does not support this option.

This package provides a work-around to allow using CSVs to update tables in Postgres.

    $ pip install foiegras 
    $ python
    >>> import psycopg2, foiegras
    >>> conn = psycopg2.connect("dbname=mydb")
    >>> goose = foiegras.Goose(conn)
    >>> goose.load('my_data', '/tmp/data.csv', ('isbn', 'price', 'stock'))

Yay! Don't forget to close the connection:

	>>> conn.close()

Interesting... tell me more

The Postgres equivalent of LOAD DATA INFILE is the COPY ... FROM ... command, however it does not support replacing rows that match on a unique constraint. Using CSVs to load data into a table is the fastest way of updating a table - hence it's frustrating that Postgres doesn't support the REPLACE option that MySQL does.

This package works around this issue by loading the CSV to a temporary table then updating the master table using two UPDATE commmands that insert new records and update matching records respectively. All very simple.

Further reading:

Does this work with Django?

Yup.

	>>> from django.db import connection
	>>> connection.enter_transaction_management()
	>>> goose = foisgras.Goose(connection)
	>>> goose.load(Record._meta.db_table, '/path/to/data.csv', ('isbn', 'price', 'stock'))

Surely this has been done already?

Apparently not. There are some similar CSV loading libraries for Django

  • django-csvimport - This is a Django app for uploading CSV files to populate model tables. It uses the ORM directly to save new instances and so does not perform well when loading larges volumes of data.

  • pgloader - This looks like it might be quite good but it's hard to tell as the docs are so bad. It's also not on PyPI.

API

Very simple at the moment. All you have to do is instantiate a Goose object passing a psycopg2 connection:

	>>> conn = psycopg2.connect("dbname=mydb")
	>>> goose = foiegras.Goose(conn)

Then the only method you need to know is load which has signature:

	def load(self, table_name, filepath, fields, delimiter=",", 
	         replace_duplicates=True, has_header=False):
	    ...

If replace_duplicates == False, then rows from the CSV file that match on a unique constraint will be ignored.

## Contribute

Zero to tests passing in 6 presses of :

	$ git clone git@github.com:codeinthehole/foiegras.git
	$ cd foiegras
	$ mkvirtualenv foiegras
	$ python setup.py develop
	$ pip install -r requirements.txt
	$ nosetests

About

The missing CSV loading library for Postgres

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages