Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The pgloader is not responding (with a lot of errors after Ctrl+C). #705

Closed
hyperion-cs opened this issue Dec 26, 2017 · 11 comments
Closed

Comments

@hyperion-cs
Copy link

hyperion-cs commented Dec 26, 2017

Hello, Dimitri!
Once I started importing a large CSV file. In the process, the program showed such errors:

debugger invoked on a SIMPLE-TYPE-ERROR in thread
#<THREAD "lparallel" RUNNING {100B25D153}>:
Argument Y is not a NUMBER: NIL
2017-12-26T14:38:10.862000Z ERROR PostgreSQL [""TableHere""] Database error 22P02: : "197?"
CONTEXT: COPY TableHere, 14168, ColumnHere: "197?"

debugger invoked on a SIMPLE-TYPE-ERROR in thread
#<THREAD "lparallel" RUNNING {100B25C673}>:
Argument Y is not a NUMBER: NIL
2017-12-26T14:38:11.464000Z ERROR PostgreSQL [""TableHere""] Database error 22P02: : "197?"
CONTEXT: COPY TableHere, 11172, ColumnHere: "197?"

This is okay, however, after that the pgloader did not respond. I waited a very long time (several hours)...
In total, I could not stand it and made a forced stop pgloader using Ctrl+C
After that, the pgloader showed a huge bunch of errors. I attach them to pastebin: https://pastebin.com/pd4btiu8
What was it and why did the pgloader hang?

@dimitri
Copy link
Owner

dimitri commented Dec 26, 2017

Can you share a self-contained example for me to reproduce your error here?

It seems that one key to this problem is the PostgreSQL error type being reported: 22P02 is invalid_text_representation, as seen in https://www.postgresql.org/docs/10/static/errcodes-appendix.html. It could be mis-handled by pgloader, either because the COPY error message isn't reporting a line number, or because pgloader believes it can't process this kind of situation at all --- and then we might have two bugs to fix.

@hyperion-cs
Copy link
Author

Below I attach an example of a table schema and an example of a file for import, which gives the error above.
Dataset_Paradox.zip

@dimitri
Copy link
Owner

dimitri commented Dec 26, 2017

Thanks for the test case! Here's what I get with it, given the following load file:

load csv
    from './Dataset_Paradox/DateToLoad.csv'
    into pgsql:///pgloader
    target table t705

    with truncate,
         fields optionally enclosed by '0x27',
         fields terminated by ';'

    before load execute 't705.sql';

Note that 0x27 is single quote and an easy trick to specify it here, since it looks like I forgot to enable escaping it properly.

2017-12-26T20:57:38.071842+01:00 ERROR we are reading non quoted csv data and found a quote at 1
'ВЛАДИМИР';'КУДРИН';'ПЕТРОВИЧ';;;;1;1;1937;;;'АНТИКРЕДИТОРЫ АСТРАХАНЬ 2010';'{"D":["БОМЖ","","","","","","","АЛКОГОЛИЗМ","","2096.03.29"]}'
2017-12-26T20:57:39.401697+01:00 ERROR PostgreSQL ["t705"] Database error 22P02: invalid input syntax for integer: "196-"
CONTEXT: COPY t705, line 9753, column YearBirth: "196-"
2017-12-26T20:57:40.147637+01:00 LOG report summary reset
             table name     errors       rows      bytes      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0                     0.000s
            before load          0          2                     0.027s
-----------------------  ---------  ---------  ---------  --------------
                   t705          2      45600    11.4 MB          2.077s
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          0          1                     0.000s
COPY Threads Completion          0          2                     2.077s
-----------------------  ---------  ---------  ---------  --------------
      Total import time          2      45600    11.4 MB          2.077s

Also the following t705.sql script is used, based on yours:

DROP TABLE IF EXISTS t705;

CREATE TABLE t705
(
    "FirstName" text   ,
    "LastName" text    ,
    "MiddleName" text  ,
    "Telephone" text   ,
    "Car" text         ,
    "Passport" text    ,
    "DayBirth" smallint,
    "MonthBirth" smallint,
    "YearBirth" smallint,
    "SNILS" text       ,
    "INN" text         ,
    "Information" text ,
    "Base" text        
);

dimitri added a commit that referenced this issue Dec 26, 2017
The option "fields optionally enclosed by" was missing a way to easily
specify a single quote as the quoting character. Add '\'' to the existing
solution '0x27' which isn't as friendly.

See #705.
@dimitri
Copy link
Owner

dimitri commented Dec 26, 2017

So please build pgloader from fresh sources and try again. Also you will now be able to use the following definition for import:

load csv
    from './Dataset_Paradox/DateToLoad.csv'
    into pgsql:///pgloader
    target table t705

    with truncate,
         fields optionally enclosed by '\'',
         fields terminated by ';'

    before load execute 't705.sql';

@hyperion-cs
Copy link
Author

hyperion-cs commented Dec 26, 2017

I use next commands file (I've already got an idea about 0x27 for a long time and I use it actively, because simple escaping did not work ...):

LOAD CSV
FROM stdin WITH ENCODING utf8
HAVING FIELDS
(
"FirstName","LastName","MiddleName","Telephone","Car","Passport","DayBirth","MonthBirth","YearBirth","SNILS","INN","Base","Information"
)
INTO postgresql://login:password@localhost/dbname?"People"
TARGET COLUMNS
(
"FirstName","LastName","MiddleName","Telephone","Car","Passport","DayBirth","MonthBirth","YearBirth","SNILS","INN","Base","Information"
)
WITH
fields optionally enclosed by '0x27',
fields terminated by ';',
quote identifiers,
workers = 16, concurrency = 4

SET work_mem to '256 MB', maintenance_work_mem to '4096 MB';

Do you see an error with this config?

@hyperion-cs
Copy link
Author

A similar error (22P02) is observed in other files (with the same table and config), where incorrect input is specified instead of smallint (ex.: 196-, 197? and so on). Everything freezes, and then Ctrl + C is shown a lot of mistakes.
Help me please. This is a very important problem that I can not solve.

@dimitri
Copy link
Owner

dimitri commented Dec 28, 2017

Which version of pgloader are you using? did you try with a local build from current git master version?

@hyperion-cs
Copy link
Author

hyperion-cs commented Dec 28, 2017

$ pgloader --version
pgloader version "3.4.1"
compiled with SBCL 1.3.1.debian

@dimitri
Copy link
Owner

dimitri commented Dec 28, 2017

Yeah, please try with current git version, and report back. Given that I can't reproduce, I expect it'll fix it for you!

@hyperion-cs
Copy link
Author

It seems that the update helped. However, I'm still testing. I keep you informed, thank you!

dimitri added a commit that referenced this issue Jan 14, 2018
The previous patch introduced parser conflicts and we couldn't parse some
expressions any more, such as the following:

        fields escaped by '\',

It's now possible to represent single quote as either '''', '\'', or '0x27'
and we still can parse '\' as being a single backslash character.

See #705.
@dimitri
Copy link
Owner

dimitri commented Feb 1, 2018

Any news? Shall we close the issue now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants