Cordwainer

A better CSV library

Features

  • Lets you program in both Python 2 and Python 3 as if you had the Python 3 CSV module.
  • Under Python 2, provides a Python 3 compatible csv module.
  • Under Python 3, passes through transparently.
  • Lets your CSV files be encoded any way you want.

Python 3 compatibility

import cordwainer.csv as csv ought to be equivalent to importing the Python 3 csv module, whether you are running under Python 2 or Python 3. See the Python 3 csv module documentation for details.

CSV file encoding

The Python 2 csv module expects the file handles passed to it to return byte strings encoded in ASCII or UTF-8, and it writes data to files the same way.

The Python 3 csv module expects the handles passed to it to return text data that is already decoded, and it writes un-encoded text data to them. It's your responsibility to arrange for the conversion when you open the file, or to pipe your stream through some kind of converter.

Using Cordwainer, you can just pass an extra encoding parameter to say that your stream is providing or expecting binary data with the specified encoding, and Cordwainer will take care of all necessary conversions.

If encoding is omitted or None, Cordwainer assumes the provided stream will provide or expect un-encoded text data, just like Python 3's csv module.
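The rule above (encoding given means a binary stream, encoding omitted means a text stream) can be approximated on Python 3 with the standard library alone. This is a minimal sketch under that assumption, not Cordwainer's implementation: make_reader and make_writer are hypothetical helpers that wrap a binary stream in io.TextIOWrapper when an encoding is given and otherwise hand the stream straight to csv.

```python
import csv
import io


def make_reader(stream, encoding=None, **kwargs):
    """Hypothetical helper (not Cordwainer's API): choose decoding based on `encoding`."""
    if encoding is None:
        # No encoding: the stream is assumed to yield already-decoded text.
        return csv.reader(stream, **kwargs)
    # Encoding given: treat the stream as binary and decode it on the fly.
    # newline="" leaves line-ending handling to the csv module.
    return csv.reader(io.TextIOWrapper(stream, encoding=encoding, newline=""), **kwargs)


def make_writer(stream, encoding=None, **kwargs):
    """Hypothetical helper: returns (writer, text_stream).

    With an encoding, text_stream wraps the binary stream; call its
    detach() (or flush()) when done so the encoded bytes reach `stream`.
    """
    if encoding is None:
        return csv.writer(stream, **kwargs), stream
    text = io.TextIOWrapper(stream, encoding=encoding, newline="")
    return csv.writer(text, **kwargs), text


# Usage with in-memory binary streams:
raw = io.BytesIO("name,city\r\nAna,Zürich\r\n".encode("latin-1"))
rows = list(make_reader(raw, encoding="latin-1"))
# rows == [['name', 'city'], ['Ana', 'Zürich']]

out = io.BytesIO()
writer, text = make_writer(out, encoding="latin-1")
writer.writerows(rows)
text.detach()  # flushes the wrapper and releases `out` without closing it
# out.getvalue() == b'name,city\r\nAna,Z\xfcrich\r\n'
```

The explicit detach() is the main wrinkle of doing this by hand: a TextIOWrapper buffers its output and closes the underlying stream when it is discarded, which is exactly the kind of bookkeeping an encoding parameter is meant to hide.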

Suppose you need to read a .CSV file that was written using cp720 encoding. In Python 2, you would have to arrange to read it in, decode the data to characters, then encode it again to UTF-8 before you could pass it to the csv module. To write out an updated file, you would have to do all that in reverse.

In Python 3, you still have to arrange to read it in and decode it before passing to csv, and encode the output.
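For comparison, here is roughly what the manual Python 3 approach looks like with only the standard library: the decoding and encoding happen in open(), and the csv module only ever sees str. The function name copy_csv and the temporary file paths are illustrative assumptions.

```python
import csv
import os
import tempfile


def copy_csv(src_path, dst_path, encoding="cp720"):
    """Read a CSV file in `encoding` and write its rows to a new file in the same encoding.

    newline="" is what the Python 3 csv documentation asks for on files
    that the csv module reads or writes.
    """
    with open(src_path, "r", encoding=encoding, newline="") as src:
        rows = list(csv.reader(src))  # each row is a list of str
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        csv.writer(dst).writerows(rows)  # str is encoded back to `encoding` on write
    return rows


# Usage: round-trip a small cp720 file in a temporary directory.
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "cp720file.csv")
dst_path = os.path.join(tmp, "newcp720file.csv")
with open(src_path, "wb") as f:
    f.write("name,greeting\r\nAli,سلام\r\n".encode("cp720"))
rows = copy_csv(src_path, dst_path)
```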

With Cordwainer, just pass in the encoding:

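import cordwainer.csv as csv

# Reading: open the file in binary mode; Cordwainer decodes the cp720 bytes.
infile = open("cp720file.csv", "rb")
reader = csv.reader(infile, encoding="cp720")

# Writing: open the file in binary mode; Cordwainer encodes rows to cp720.
outfile = open("newcp720file.csv", "wb")
writer = csv.writer(outfile, encoding="cp720")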

Misc. Usage Notes

The encoding parameter applies only to reading from and writing to streams.

String arguments should always be passed to the API as characters (text strings: unicode under Python 2, str under Python 3), and results are always returned as characters.

E.g.:

  • Pass fieldnames to DictWriter as characters
  • Pass the values in rows to writerow() as characters
  • next() on a reader returns rows of characters
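The reason for passing characters is easy to see with Python 3's own csv module, which Cordwainer is meant to mirror. The example below uses only the standard library, not Cordwainer: text fieldnames and values round-trip cleanly, but a bytes value handed to writerow() is not decoded; csv converts it with str(), so its repr ends up in the output.

```python
import csv
import io

# Characters (str) in, characters out: the intended usage.
good = io.StringIO()
dict_writer = csv.DictWriter(good, fieldnames=["name", "city"])  # fieldnames as str
dict_writer.writeheader()
dict_writer.writerow({"name": "Ana", "city": "Zürich"})
# good.getvalue() == 'name,city\r\nAna,Zürich\r\n'

row = next(csv.reader(io.StringIO(good.getvalue())))
# row == ['name', 'city'], a list of str

# Bytes in: csv does not decode them; it calls str(), so the repr is written.
bad = io.StringIO()
csv.writer(bad).writerow(["café", b"caf\xc3\xa9"])
# bad.getvalue() == "café,b'caf\\xc3\\xa9'\r\n"
```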

Intended (eventually) features

  • Optional header row
  • Validate expected fields and their types (probably by specifying a Django form to do the validation)
  • Verbose error handling: report what the problem was, and on which line, for every line that has an error
  • Optionally stop processing after N errors
  • Optionally import the lines that are valid while skipping invalid ones
  • Optionally do the whole thing in one transaction
  • Optionally ignore any extra columns
  • Optionally save the uploaded file and then process it in a background task (so as not to delay the HTTP request)
  • For Excel, be flexible in deciding which sheet to import, or even import multiple sheets from one upload
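Several of these planned features can be sketched on top of the standard csv module. The following is a hypothetical illustration only, not part of Cordwainer: load_rows and its parameters (validators, max_errors, ignore_extra) are invented names, and plain converter callables such as int stand in for the Django form the list mentions. It reports errors by line number via csv.reader's line_num, keeps valid lines while skipping invalid ones, optionally stops after N errors, and can ignore extra columns.

```python
import csv
import io


def load_rows(stream, validators, max_errors=None, ignore_extra=True):
    """Hypothetical sketch of several planned features (not part of Cordwainer).

    stream     -- text stream whose first row is a header
    validators -- dict: field name -> callable that converts a value or raises ValueError
    Returns (valid_rows, errors), where errors is a list of (line_number, message).
    """
    reader = csv.reader(stream)
    header = next(reader)
    missing = [name for name in validators if name not in header]
    if missing:
        return [], [(reader.line_num, "missing columns: " + ", ".join(missing))]

    valid, errors = [], []
    for row in reader:
        problem = None
        if len(row) < len(header):
            problem = "expected %d columns, got %d" % (len(header), len(row))
        elif len(row) > len(header) and not ignore_extra:
            problem = "%d unexpected extra column(s)" % (len(row) - len(header))
        else:
            record = dict(zip(header, row))  # zip() drops any extra columns
            for name, convert in validators.items():
                try:
                    record[name] = convert(record[name])
                except ValueError as exc:
                    problem = "%s: %s" % (name, exc)
                    break
        if problem is None:
            valid.append(record)  # keep valid lines, skip invalid ones
            continue
        errors.append((reader.line_num, problem))  # report the offending line
        if max_errors is not None and len(errors) >= max_errors:
            break  # stop processing after N errors
    return valid, errors


data = "id,qty\n1,5\n2,x\n3\n4,7,extra\n"
valid, errors = load_rows(io.StringIO(data), {"id": int, "qty": int})
# valid == [{'id': 1, 'qty': 5}, {'id': 4, 'qty': 7}]
# errors holds line 3 (qty is not an int) and line 4 (too few columns)
```

Note that line_num counts physical lines read, so for a record containing a quoted multi-line field it points at the record's last line.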