CSV-290 - Fix the wrong assumptions in PostgreSQL formats #265
CSVFormat.POSTGRESQL_CSV - special characters are not escaped. CSVFormat.POSTGRESQL_TEXT - values are not quoted.
Codecov Report
@@            Coverage Diff            @@
##             master     #265   +/-   ##
=========================================
  Coverage     97.34%   97.34%
  Complexity      535      535
=========================================
  Files            11       11
  Lines          1169     1169
  Branches        205      205
=========================================
  Hits           1138     1138
  Misses           18       18
  Partials         13       13
Hi @angusdev
I can't test previous versions of PostgreSQL, but it is fairly safe to say that this applies to all versions. PostgreSQL has supported export to CSV since version 8.0 (2005, https://www.postgresql.org/docs/8.0/release-8-0.html). For the text format (tab-delimited), there is no reason to quote the text.
I tested that import and export behave consistently. Test case: export CSV/TSV from PostgreSQL, read it with commons-csv and write it to a new CSV/TSV, import that into PostgreSQL, export CSV/TSV again, then compare the first and second export files.
drop table COMMONS_CSV_PSQL_TEST;
create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, COL3 VARCHAR, COL4 VARCHAR);
insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 'test line 2', null, '';
insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || chr(10) || ' \r:' || chr(13), 'a', 'b';
insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.tsv';
Use commons-csv to read '/tmp/psql.csv' and write to '/tmp/outpsql.csv'; do the same for '/tmp/psql.tsv', writing to '/tmp/outpsql.tsv'.
truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.csv' with (FORMAT CSV);
truncate table COMMONS_CSV_PSQL_TEST;
copy COMMONS_CSV_PSQL_TEST(ID, COL1, COL2, COL3, COL4) from '/tmp/outpsql.tsv';
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql2.tsv';
diff /tmp/psql.csv /tmp/psql2.csv
diff /tmp/psql.tsv /tmp/psql2.tsv
Hello @angusdev
Thank you for updating your PR.
(1) I think you need to test for tab characters (ASCII 9) in values.
(2) In the PG docs I read
Please help me understand why the git master code does not match this definition.
Added a test for tab characters (ASCII 9) in values. For QUOTE and ESCAPE, see below example
In PG (CSV), ESCAPE is used to escape the quote character, whereas in commons-csv, ESCAPE escapes the delimiter and special characters. In PG (TEXT), QUOTE is not needed because the format is tab-delimited and the delimiter (tab) is escaped as '\t'.
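To make the distinction between the two PostgreSQL formats concrete, here is a minimal Java sketch of how a single value is encoded on export in each format, as I understand the default COPY behaviour. It uses only the JDK and is not the commons-csv implementation; the class name `PgCopySketch` and its two methods are hypothetical names chosen for illustration, and the sketch assumes the default settings (tab delimiter and `\N` for NULL in TEXT; comma delimiter, `"` as both QUOTE and ESCAPE, and an unquoted empty field for NULL in CSV).

```java
public final class PgCopySketch {
    private PgCopySketch() {}

    /** TEXT format: no quoting; special characters become backslash sequences. */
    public static String escapeText(String value) {
        if (value == null) {
            return "\\N"; // assumed default NULL marker in TEXT format
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break; // the delimiter itself
                case 0x0B: sb.append("\\v"); break; // vertical tab
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** CSV format: quote when needed; the quote character is escaped by doubling it. */
    public static String quoteCsv(String value) {
        if (value == null) {
            return ""; // NULL is written as an unquoted empty field
        }
        boolean needsQuotes = value.isEmpty() // "" distinguishes empty string from NULL
                || value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a\tb"));     // prints a\tb (backslash, then t)
        System.out.println(quoteCsv("\"quoted\"")); // prints """quoted"""
        System.out.println(quoteCsv("b,c,d"));      // prints "b,c,d"
    }
}
```

In this sketch a tab inside a TEXT value never needs quoting because it is rewritten as the two characters `\t`, while a line feed inside a CSV value is kept verbatim inside quotes rather than escaped.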
I tested with psql 14.5 (Homebrew) on an M1 Mac.