
Performance: slow CSV uploads #1251

@esoni88

Description

Hi,
I have to import a big CSV into Postgres. The file has only 5 columns but a lot of rows; it is around 20 GB. The pgloader import into PostgreSQL takes 4h38m for the 20 GB file. These are my pgloader settings for the import:

LOAD CSV
     FROM {{FILEPATH}}
     HAVING FIELDS
      (
         NUM_TELEFONO,
         CODICE_CIC,
         DESCRIZIONE_CIC,
         DATA_ATTIVAZIONE_COMPONENTE [date format 'YYYY-MM-DD HH24:MI:SS'],
         DATA_CESSAZIONE_COMPONENTE [date format 'YYYY-MM-DD HH24:MI:SS']
      )     
     INTO postgresql://{{PGUSER}}@{{PGHOST}}:{{PGPORT}}/{{PGDATABASE}}
     TARGET TABLE TABLEDEST
        (
          NUM_TELEFONO text using (or NUM_TELEFONO "ND"),
          CODICE_CIC text using (or CODICE_CIC "ND"),
          DESCRIZIONE_CIC text using (or DESCRIZIONE_CIC "ND"),
          DATA_ATTIVAZIONE_COMPONENTE,
          DATA_CESSAZIONE_COMPONENTE,
          MESE_RIFERIMENTO text using "XXXXX",
          YYYYMMDD text using "YYYY"
        )
     WITH 
          workers = 16, concurrency = 4,
          fields optionally enclosed by '"',
          batch rows = 100000,
          batch size = 200 MB,
          batch concurrency = 3,
          fields terminated by ';'

      SET client_encoding to 'utf8',
          work_mem to '12MB',
          standard_conforming_strings to 'on';
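
For reference, that load rate works out to roughly:

    4h38m = 4*3600 + 38*60 ≈ 16,680 s
    20 GB ≈ 20,480 MB
    20,480 MB / 16,680 s ≈ 1.2 MB/s sustained into PostgreSQL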

Here are the main details of my environment:

  • the database is Cloud SQL on Google Cloud Platform: 8 vCPUs, 32 GB of memory
  • the VM that launches pgloader is a Google Cloud Platform VM: 8 vCPUs, 32 GB of memory
  • the VM and the database are inside the same VPC and do not go over the internet; they communicate over the private network
  • the target table of the import has no indexes and no triggers (see the check just after this list)
  • bad records are extremely rare, so I can configure pgloader to use big batches to improve performance
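
To double-check the index/trigger point, I ran roughly this against the target table (a sketch; the unquoted name TABLEDEST is stored lowercase as tabledest):

    -- indexes on the target table
    SELECT indexname FROM pg_indexes WHERE tablename = 'tabledest';

    -- user-level triggers on the target table (internal ones excluded)
    SELECT tgname FROM pg_trigger
     WHERE tgrelid = 'tabledest'::regclass
       AND NOT tgisinternal;

Both queries come back empty.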

My questions are:

  • can I improve the performance? So far the 20 GB CSV imports in 4h38m (the variant I am considering is sketched after this list)
  • do any of the settings configured above cause slowness?
  • bearing in mind that I will have a powerful VM and a powerful DB in production, which options can improve the performance of my import?
  • what is the behavior of the batch concurrency option? I can't find any note about it in the docs
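
In case it helps, this is the direction I was thinking of for the big-batch idea. It is an untested sketch, the values are just my guesses, and prefetch rows is an option I found in the pgloader reference:

     WITH workers = 8, concurrency = 2,
          batch rows = 250000,
          batch size = 512 MB,
          prefetch rows = 500000,
          fields optionally enclosed by '"',
          fields terminated by ';'

I left batch concurrency at its default here because I don't know yet what it does (hence the last question).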

Thanks a lot
regards
