Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot understand migration error reasons #1530

Closed
2 tasks
Anth0nyME opened this issue Sep 13, 2023 · 8 comments
Closed
2 tasks

Cannot understand migration error reasons #1530

Anth0nyME opened this issue Sep 13, 2023 · 8 comments

Comments

@Anth0nyME
Copy link

Anth0nyME commented Sep 13, 2023

Hello!

After ~24h of migration (source DB size ~ 200Gb, not so quick RAID solution) i got error. I can't understand reasons of erorr, please assist (if possible) to understand it.

Thank you!

OS: Rocky Linux 8.8 (Green Obsidian)
postgres (PostgreSQL) 15.4 (dnf, https://download.postgresql.org/pub/repos)
MariaDB: 10.11.5-MariaDB

pgloader version "3.6.7~devel" (dnf)
compiled with SBCL 2.2.10-1.rhel8

Goal of migration: transfer the Zabbix 6.0.21 LTS partitioned DB from MariaDB to PostgreSQL (without TimescaleDB ATM) via Unix sockets (single host enviroment). This is TEST server, no zabbix-server daemon is running.

DS size (on disk) on failure time.
Source (mariadb): ~172Gb
Destination (postgresql): ~210Gb

  • did you test a fresh compile from the source tree?

No

  • did you search for other similar issues?

Yes but it too complicated without reasons understanding.

What i did:

sudo -u postgres pgloader zabbix_migrate.load

Content of "zabbix_migrate.load":

LOAD DATABASE
FROM mysql://user:password@unix:/var/lib/mysql/mysql.sock:/zabbixdb
INTO postgresql://unix:/tmp:5432/zabbixdb
WITH include no drop,
truncate,
create no tables,
create no indexes,
no foreign keys,
reset sequences,
data only
SET maintenance_work_mem TO '1024MB', work_mem to '256MB'
ALTER SCHEMA 'zabbixdb' RENAME TO 'public'
BEFORE LOAD EXECUTE create.sql
AFTER LOAD EXECUTE alter.sql;

Result (console):

. . .
:START 13223 :COUNT 100000) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" free owner=0> :NOTIFY-PUSH #<SB-THREAD:WAITQUEUE #1=Anonymous condition variable {101D796B13}> :NOTIFY-POP #<SB-THREAD:WAITQUEUE #1# {100E4BB783}>) :DISABLE-TRIGGERS NIL :ON-ERROR-STOP T :COLUMNS NIL)
8: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
9: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
10: (LPARALLEL.KERNEL::EXEC-TASK/WORKER #<unavailable argument> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
11: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
12: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #<unavailable argument>)
13: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT))
14: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<FUNCTION (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1005EF6A5B}> #<FUNCTION FUNCALL> #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
15: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD))
16: ((LAMBDA NIL :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
17: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
18: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
19: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
20: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
21: (SB-THREAD::RUN)
22: ("foreign function: call_into_lisp_")
23: ("foreign function: funcall1")


What I am doing here?

Database error: Connection to database server lost.

Result (pgloader.log):

 :START 13223 :COUNT 100000) :LOCK #<SB-THREAD:MUTEX "Anonymous lock" free owner=0> :NOTIFY-PUSH #<SB-THREAD:WAITQUEUE #1=Anonymous condition variable {101D796B13}> :NOTIFY-POP #<SB-THREAD:WAITQUEUE #1# {100E4BB783}>) :DISABLE-TRIGGERS NIL :ON-ERROR-STOP T :COLUMNS NIL)
8: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
9: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
10: (LPARALLEL.KERNEL::EXEC-TASK/WORKER #<unavailable argument> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
11: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
12: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #<unavailable argument>)
13: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT))
14: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<FUNCTION (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1005EF6A5B}> #<FUNCTION FUNCALL> #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1007984643}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" free owner=0> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# free owner=0> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10057C0003}> :RUNNING-CATEGORY :DEFAULT :INDEX 3 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD #2=(LPARALLEL.SPIN-QUEUE::DUMMY) :TAIL #2#)))
15: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD))
16: ((LAMBDA NIL :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
17: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
18: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-THREAD::RUN))
19: ((FLET SB-UNIX::BODY :IN SB-THREAD::RUN))
20: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::RUN))
21: (SB-THREAD::RUN)
22: ("foreign function: call_into_lisp_")
23: ("foreign function: funcall1")

The "systemctl status" shows both servers in alive state ATM.

@svantevonerichsen6906
Copy link
Collaborator

The start of the stacktrace seems to be missing, and the error message.

@Anth0nyME
Copy link
Author

Anth0nyME commented Sep 13, 2023

Oops....

I think i found reason, it's my carelessness.

df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 1.1M 7.7G 1% /dev/shm
tmpfs 7.7G 362M 7.4G 5% /run
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/sda1 120G 11G 109G 10% /
/dev/sda3 3.9G 881M 2.8G 24% /boot
/dev/sdb1 549G 521G 65M 100% /data
tmpfs 1.6G 0 1.6G 0% /run/user/1000

I'm sorry.

@Anth0nyME
Copy link
Author

The start of the stacktrace seems to be missing, and the error message.

Please explain what it might look like? I will posting more full report in future.

@svantevonerichsen6906
Copy link
Collaborator

Ah, you may notice that the lines you showed start with 8:, 9:, etc. Each of these presents a stack frame. There are 8 frames on top of those shown. The output should also usually start with a description of the error, such as »error STORAGE-CONDITION signalled. [Condition of type STORAGE-CONDITION]«.

@Anth0nyME
Copy link
Author

Many thanks! )

@Anth0nyME
Copy link
Author

Damn, another error...

Couldn't write to #<SB-SYS:FD-STREAM for "socket, peer: /var/lib/mysql/mysql.sock" {1028B0ADA3}>:
Broken pipe

Will try to use TCP instead of unix sockets

@Anth0nyME
Copy link
Author

Anth0nyME commented Sep 18, 2023

Damn, another error (

From console:

`
Heap exhausted during garbage collection: 640 bytes available, 2112 requested.
Immobile Object Counts
Gen layout fdefn symbol code Boxed Cons Raw Code SmMix Mixed LgRaw LgCode LgMix Waste% Alloc Trig Dirty GCs Mem-age
1 0 0 0 0 530 4 6671 0 0 15 0 0 0 1.3 233410960 143313498 7220 1 1.3589
2 0 0 0 0 860 1 7697 0 0 6 0 0 0 1.3 276994448 10737418 8410 0 0.6591
3 0 0 0 0 764 2 6117 0 1 13 0 0 0 1.1 223513840 10737418 6460 0 0.2350
4 0 0 2 0 149 4 629 0 1 14 0 0 0 1.0 25842032 10737418 630 0 0.0000
5 48 0 0 70 394 27 1730 1 7 41 0 0 4300 2.0 208818064 219555482 1780 29 0.0000
fatal error encountered in SBCL pid 856986 tid 856998:
GC invariant lost, file "gencgc.c", line 522

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
(GC in progress, oldspace=1, newspace=2)
ldb>
`

No errors in pgloader.log file.

Last records:

2023-09-16T13:40:11.026301+01:00 NOTICE COPY public.history_uint with 2318469921 rows estimated [3/4] 2023-09-16T13:40:11.035301+01:00 NOTICE COPY public.history with 747700580 rows estimated [1/4] 2023-09-16T20:21:37.475199+01:00 NOTICE COPY public.trends with 59719160 rows estimated [1/4] 2023-09-16T21:05:40.778050+01:00 NOTICE COPY public.auditlog with 19082647 rows estimated [1/4]

I don't understand what i need to do to fix it...
Console error and "ldb>" prompt give no direction how to continue migration process.

Can anybody assist to make this problem more... clear?

Thank you!

@Anth0nyME
Copy link
Author

Found: #962

Will try to use "with prefetch rows = 10000"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants