In some cases (such as when using a very old PostgreSQL instance or an Amazon Redshift service, as in #255), the function pg_get_keywords() does not exist, but we assume that pgloader might still be able to complete its job. We're better off with a static list of keywords than with an unhandled error here, so let's see what happens next with Redshift.
Related to #249, stop reporting 0 errors on sources where we failed to handle some data transformation.
The problem in #249 is that SQLite is happy processing floats in an integer field, so pgloader needs to be instructed via the CAST mechanism to cast to float at migration time. But then the transformation function would choke on integers because of its optimisation "declare" statement. Of course the integer representation expected by PostgreSQL is float-compatible, so just instruct the function that integers are welcome to the party.
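For illustration, a cast rule of the kind involved might look like the following sketch, where the source path and target database are made up; casting to float is what routes each value through the float transformation function discussed above:

    LOAD DATABASE
         FROM sqlite:///path/to/source.db
         INTO postgresql:///target

    CAST type int to float;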
Some CSV files use the CSV escape character internally in their fields. In that case we hit a parsing bug in cl-csv where backtracking from parsing the escape string isn't possible (or at least isn't implemented). To handle the case, change the quote parameter from \" to just \ and let cl-csv use its escape-quote mechanism to decide whether we're escaping only separators or just any data. See AccelerationNet/cl-csv#17 where the escape mode feature was introduced for pgloader issue #80 already.
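The fix concerns how pgloader maps its user-facing escape options onto cl-csv internals; a typical setup exercising those options might read as follows (the file and target names are made up):

    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///target?csvdata
         WITH fields optionally enclosed by '"',
              fields escaped by backslash-quote,
              fields terminated by ',';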
The error handling was good enough to continue parsing the CSV data after a recoverable parser error, but not good enough to actually report its misfortunes to the user. See #250 for a report where this is misleading.
As per the PostgreSQL documentation on connection strings, allow overriding of the main URI components in the options part, with a percent-encoded syntax for parameters. This allows bypassing the main URI parser's limitations as seen in #199 (how do you have a password start with a colon?). See: http://www.postgresql.org/docs/9.3/interactive/libpq-connect.html#LIBPQ-CONNSTRING
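For example, a connection string along these lines (user, host, and password all made up) uses %3A to percent-encode the leading colon of the password, something the main URI parser can't express:

    postgresql://dim@pgserver/dbname?password=%3Asecret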
To allow importing JSON one-liners as-is into the database, it can be interesting to leverage the CSV parser in a compatible setup. That setup requires being able to use any separator character as the escape character.
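One plausible such setup, assuming '|' never appears in the JSON data and with the target table and column names invented for the example, would read each line into a single field untouched:

    LOAD CSV
         FROM 'objects.json'
         INTO postgresql:///target?jsondata (data)
         WITH fields terminated by '|',
              fields escaped by '|';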
Some CSV files are given with a header line containing the list of their column names; use that when given the option "csv header". Note that when both the "skip header" and "csv header" options are used, pgloader first skips as many lines as required and then uses the next one as the CSV header. Because of a temporary failure to install the `ronn` documentation tool, this patch only commits the changes to the source docs and omits updating the man page (pgloader.1). A following patch is intended to be pushed that fixes that. See #236, which is using shell tricks to retrieve the field list from the CSV file itself and motivated this patch to finally get written.
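Combined, the two options might be spelled as in this sketch (file and target names made up), where pgloader skips one line and then reads the next as the list of column names:

    LOAD CSV
         FROM 'data.csv'
         INTO postgresql:///target?csvdata
         WITH skip header = 1,
              csv header,
              fields terminated by ',';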
It turns out that SQLite3 data type handling is back to kick us wherever it hurts, this time with the driver deciding to return blob data (a vector of unsigned bytes) when we expect properly encoded text data. In the wikipedia data test case used to reproduce the bug, we're lucky enough that the byte vectors actually map to properly encoded strings. Of course doing the proper thing costs some performance. I'd like to be able to decide if I should blame the SQLite driver or the whole product on this one. The per-value data type handling still is a disaster in my book, though, which means it's crucially important for pgloader to get it right and allow users to seamlessly migrate away from using such a system.
pgloader used to have a single database name parsing rule that was supposed to be compliant with PostgreSQL identifier rules. Of course it turns out that MySQL naming rules are different, so adjust the parser so that the following connection string is accepted: mysql://root@localhost/3scale_system_development
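In context, such a connection string appears in a load command like this sketch (the target database is made up):

    LOAD DATABASE
         FROM mysql://root@localhost/3scale_system_development
         INTO postgresql:///3scale_system_development;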
MS SQL default values can be quite... sophisticated, so work around that by using a more complex expression in the SQL query that retrieves the default values. The query and implementation have been largely provided by the luqelinux and jstans GitHub users, and I finally merged their cumulated efforts on this front manually.
When given a file in the COPY format, we should expect that its content is already properly escaped as expected by PostgreSQL. Rather than unescape the data then escape it again, add a new mode of operation to format-vector-row in which it won't even try to reformat the data. In passing, fix an off-by-one bug in dealing with non-ascii characters.