-
-
Notifications
You must be signed in to change notification settings - Fork 597
Open
Description
Hi,
i have to import big csv into postgres. The file is a csv and it have only 5 columns but a lot of rows; the size is around 20 GB.
The pgloader import to postgresql takes 4h38m for a file with 20 GB size. This is my pgloader settings for the import is
LOAD CSV
FROM {{FILEPATH}}
HAVING FIELDS
(
NUM_TELEFONO,
CODICE_CIC,
DESCRIZIONE_CIC,
DATA_ATTIVAZIONE_COMPONENTE [date format 'YYYY-MM-DD HH24:MI:SS'],
DATA_CESSAZIONE_COMPONENTE [date format 'YYYY-MM-DD HH24:MI:SS']
)
INTO postgresql://{{PGUSER}}@{{PGHOST}}:{{PGPORT}}/{{PGDATABASE}}
TARGET TABLE TABLEDEST
(
NUM_TELEFONO text using (or NUM_TELEFONO "ND"),
CODICE_CIC text using (or CODICE_CIC "ND"),
DESCRIZIONE_CIC text using (or DESCRIZIONE_CIC "ND"),
DATA_ATTIVAZIONE_COMPONENTE,
DATA_CESSAZIONE_COMPONENTE,
MESE_RIFERIMENTO text using "XXXXX",
YYYYMMDD text using "YYYY"
)
WITH
workers = 16, concurrency = 4,
fields optionally enclosed by '"',
batch rows=100000,
batch size = 200 MB,
batch concurrency = 3,
fields terminated by ';'
SET client_encoding to 'utf8',
work_mem to '12MB',
standard_conforming_strings to 'on';
I give you major details of my environment:
- the database is cloud sql of google cloud platform: 8 vCPUs, 32 GB memory
- the vm that launch pgloader is a vm of google cloud platform : 8 vCPUs, 32 GB memory
- the vm and the dabatase are inside the same vpc and not communicate with internet but using private network
- the table target of the import not have index and trigger.
- the bad records is extremely rare so i can configure pgloader to use big batch in order to improve performance
My question are:
- can i improve the performance ? actually i have imported the big csv: 20 GB, in 4h38m
- some of the settings configured above cause slowness ?
- bearing in mind that I will have a powerful vm and an powrful db in production, what option can improve the performance of my import ?
- what is the behavior of batch concurrency option ? i dont find any note in the docs
Thanks a lot
regards
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels