The index and constraint names given by the PostgreSQL catalogs should not be second-guessed; we need to just quote them. Down-casing identifiers is interesting when we get them from another system for a migration, but it is wrong when dropping existing indexes in PostgreSQL. Also, the :null special value from Postmodern was routing the code badly, so transform it to nil manually when fetching the index list.
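For example, with a hypothetical mixed-case index name as found in the catalogs, only the quoted form targets the right object:

    -- the unquoted name is down-cased to "myindex" by PostgreSQL
    DROP INDEX MyIndex;

    -- quoting uses the catalog name verbatim
    DROP INDEX "MyIndex";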
The previous coding added 2000 to the year as an integer whenever it was below 2000, which parses 1999 as 3999. Oops. Trigger the correction only when the year is given with only 2 digits, parsing 04 as 2004; years given with 3 digits are kept as-is. Playing with the *century* special parameter allows cancelling this behavior, which maybe should be made entirely optional. It's just too common to find current years written with only 2 digits, sadly.
The FreeTDS protocol apparently sends unsigned versions of the values on the wire, so we have to convert them back to signed numbers upon reception.
The new option 'drop indexes' reuses the existing code to build all the indexes in parallel, but failed to properly account for that in the summary report and its timings. While fixing this, also fix the SQL used to re-establish the indexes and associated constraints to allow for parallel execution: the ALTER TABLE statements would otherwise block each other behind ACCESS EXCLUSIVE locks and make our efforts vain.
PostgreSQL rightfully forbids DROP INDEX when the index is used to enforce a constraint; the proper SQL to issue then is ALTER TABLE DROP CONSTRAINT. Also, in such a case pg_dump issues a single ALTER TABLE ADD CONSTRAINT statement to restore the situation. Have pgloader do the same with indexes that are used to back a constraint.
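For a hypothetical table with a unique constraint (table, column and constraint names made up for illustration), the dance then looks like this:

    -- rejected: the index is required by the constraint
    DROP INDEX foo_email_key;

    -- works, and drops the backing index along with the constraint
    ALTER TABLE foo DROP CONSTRAINT foo_email_key;

    -- a single statement restores both once the data is loaded
    ALTER TABLE foo ADD CONSTRAINT foo_email_key UNIQUE (email);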
In MySQL it's possible to have a bigint with 20 digits when using the "unsigned" variant of the data type, whereas PostgreSQL has no such variant and its bigint is "limited" to the range -9223372036854775808 to +9223372036854775807 (19-digit numbers). Fix the default casting rule to switch to PostgreSQL numeric in such cases.
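As a quick illustration in psql, the maximum MySQL "bigint unsigned" value overflows a PostgreSQL bigint but fits in numeric:

    SELECT 18446744073709551615::bigint;   -- fails: out of range for bigint
    SELECT 18446744073709551615::numeric;  -- fine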
When loading against a table that already has index definitions, the load can be quite slow. The previous commit introduced a warning in such a case. This commit introduces the "drop indexes" option, which is not used by default. When this option is used, pgloader drops the indexes before loading the data, then creates the indexes again with the same definitions as before. All the indexes are created again in parallel to optimize performance. Only primary key indexes can't be created in parallel, so those are created in two steps (create a unique index, then alter the table).
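The two-step dance for a primary key might look like the following, with made-up table and index names:

    -- step 1: the unique index itself can be built in parallel with the others
    CREATE UNIQUE INDEX foo_pkey ON foo (id);

    -- step 2: promote the existing index into the primary key constraint
    ALTER TABLE foo ADD CONSTRAINT foo_pkey PRIMARY KEY USING INDEX foo_pkey;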
Pre-existing indexes will reduce data loading performance, and it's generally better to DROP the indexes prior to the load and CREATE them again once the load is done. See #251 for an example of that. In this patch we just add a WARNING about the situation; the next patch will add support for a new WITH clause option allowing pgloader to take care of the DROP/CREATE dance around the data loading.
In some cases (such as when using a very old PostgreSQL instance or an Amazon Redshift service, as in #255), the function pg_get_keywords() does not exist, but we assume that pgloader might still be able to complete its job. We're better off with a static list of keywords than with an unhandled error here, so let's see what happens next with Redshift.
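The keyword list is normally fetched with a query along these lines, and now falls back to a static list when the function is missing:

    -- available on PostgreSQL proper, but not on Redshift or very old versions
    SELECT word FROM pg_get_keywords();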
Related to #249, stop reporting 0 errors on sources where we failed to handle some data transformation.
The problem in #249 is that SQLite is happy processing floats in an integer field, so pgloader needs to be instructed via the CAST mechanism to cast to float at migration time. But then the transformation function would choke on integers, because of its optimisation "declare" statement. Of course the integer representation expected by PostgreSQL is float-compatible, so just instruct the function that integers are welcome to the party.
Some CSV files use the CSV escape character internally in their fields. In that case we hit a parsing bug in cl-csv where backtracking out of parsing the escape string isn't possible (or at least isn't implemented). To handle the case, change the quote parameter from \" to just \ and let cl-csv use its escape-quote mechanism to decide whether we're escaping only separators or just any data. See AccelerationNet/cl-csv#17, where the escape mode feature was already introduced for pgloader issue #80.
The error handling was good enough to continue parsing the CSV data after a recoverable parser error, but not good enough to actually report its misfortunes to the user. See #250 for a report where this is misleading.
As per the PostgreSQL documentation on connection strings, allow overriding the main URI components in the options part, using a percent-encoded syntax for the parameters. This allows bypassing the limitations of the main URI parser, as seen in #199 (how to have a password start with a colon?). See: http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
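A hypothetical connection string using that syntax, where a password of ":secret" is passed percent-encoded in the options part rather than in the user-info part of the URI:

    postgresql://pgloader@localhost:5432/dbname?password=%3Asecret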
To allow importing JSON one-liners as-is into the database, it can be interesting to leverage the CSV parser in a compatible setup. That setup requires being able to use any separator character as the escape character.
Some CSV files come with a header line containing the list of their column names; use that when given the "csv header" option. Note that when both the "skip header" and "csv header" options are used, pgloader first skips the requested number of lines and then uses the next one as the CSV header. Because of a temporary failure to install the `ronn` documentation tool, this patch only commits the changes to the source docs and omits updating the man page (pgloader.1); a following patch is intended to fix that. See #236, which uses shell tricks to retrieve the field list from the CSV file itself and motivated this patch to finally get written.