This is a Python script that converts a “properly structured” CSV file to a Postgres database table.
The CSV file must have a heading row in which each column name, when lowercased and with spaces replaced by underscores, becomes a valid database column name. Most database columns will be of type text, but certain column names trigger the use of the date or integer types.
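A minimal sketch of that heading-to-column-name transformation (the function name is illustrative, not part of the script):

```python
def to_column_name(heading):
    """Lowercase a CSV heading and replace spaces with underscores."""
    return heading.strip().lower().replace(" ", "_")

print(to_column_name("Account ID"))  # account_id
```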
csv2db.py -f CSVfile123.csv
- Creates a table named csvfile in the database named after the user's username.
csv2db.py -p id region -t regions -db accounts < csvfile.csv
- Creates a table named regions in the accounts database; the primary key is (id, region).
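For the second example, the generated DDL would look roughly like this (a sketch with hypothetical columns; the exact statement csv2db.py emits may differ):

```python
# Hypothetical column set for a regions CSV; the types follow the suffix rules
# described below (bare "id" does not match the _id suffix, so it stays text).
columns = [("id", "text"), ("region", "text"), ("sales_num", "integer")]
primary_key = ["id", "region"]

col_defs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
ddl = (f"CREATE TABLE regions ({col_defs}, "
       f"PRIMARY KEY ({', '.join(primary_key)}))")
print(ddl)
# CREATE TABLE regions (id text, region text, sales_num integer,
#                       PRIMARY KEY (id, region))
```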
- The `--table_name` (`-t`) option gives the name of the table to create. The table is replaced if it already exists. If no table name is specified, it defaults to the stem of the CSV file name, lowercased and with digits and hyphens stripped. If input is from stdin and no table name is specified, the table name is test.
- Column names from the header row set the types of the database columns: names ending in `_date` are of type date; names ending in `_id`, `_num`, and `_nbr` are of type integer. All other columns are of type text.
- The command line options `-date`, `-id`, `-num`, and `-nbr` suppress the use of the date/integer types for the corresponding column-name patterns.
- The `--modulus` (`-m`) n option, with a value greater than 0 for n, turns on progress reporting: the number of lines processed is displayed every n lines (i.e., when the line count is zero modulo n).
- The `--primary_key` (`-p`) option gives a list of column names to use as the primary key for the resulting table. If omitted, the table is created with no primary key.
- The `--columns` (`-c`) option displays the column names and types for the generated table, then exits. It is intended as a helper for setting up and checking the primary key settings.
- There is a `--debug` (`-d`) option, but it doesn't do anything.
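The naming rules above can be sketched as follows (helper names and signatures are illustrative, not the script's actual internals):

```python
import re
from pathlib import Path

def infer_type(column, suppress=frozenset()):
    """Suffix rules: _date -> date; _id/_num/_nbr -> integer; otherwise text.
    `suppress` holds the pattern names ("date", "id", "num", "nbr") disabled
    by the corresponding command line options."""
    if column.endswith("_date") and "date" not in suppress:
        return "date"
    for suffix in ("id", "num", "nbr"):
        if column.endswith("_" + suffix) and suffix not in suppress:
            return "integer"
    return "text"

def default_table_name(csv_path=None):
    """Stem of the CSV file name, lowercased, with digits and hyphens
    stripped; "test" when input comes from stdin (csv_path is None)."""
    if csv_path is None:
        return "test"
    stem = Path(csv_path).stem.lower()
    return re.sub(r"[0-9-]", "", stem)

print(infer_type("start_date"))               # date
print(infer_type("account_id"))               # integer
print(infer_type("account_id", {"id"}))       # text (suppressed by -id)
print(default_table_name("CSVfile123.csv"))   # csvfile
print(default_table_name())                   # test
```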
The program is intended as a development utility for loading spreadsheets into database tables, so that standard SQL queries, rather than traditional spreadsheet pivot tables and lookup functions, can be used to explore the data quickly. Accordingly, no attempt is made to keep the code robust in the face of anomalies: database errors are not handled, and the user is presumed able to make the necessary adjustments to the command line options or even to the CSV file itself.