Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Lemmy 0.18.3 BE - Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates #3756

Closed
4 tasks done
wpuckering opened this issue Jul 28, 2023 · 40 comments
Labels
area: database bug Something isn't working

Comments

@wpuckering
Copy link

wpuckering commented Jul 28, 2023

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

When upgrading Lemmy BE from the 0.18.2 container image to the 0.18.3 container image, database migrations fail with this error:

thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: syntax error at or near "trigger"', crates/db_schema/src/utils.rs:221:25

Reverted back to the 0.18.2 tag and my instance is running normally again without issue.

For the Postgres database I'm using postgres:13.2-alpine.

Steps to Reproduce

  1. Spin down Lemmy 0.18.2 BE container.
  2. Change container image to reference 0.18.3 tag.
  3. Spin up Lemmy 0.18.3 BE container.

Technical Details

thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: syntax error at or near "trigger"', crates/db_schema/src/utils.rs:221:25 note: run with 'RUST_BACKTRACE=1' environment variable to display a backtrace

Version

BE 0.18.3

Lemmy Instance URL

No response

@wpuckering wpuckering added the bug Something isn't working label Jul 28, 2023
@dessalines
Copy link
Member

Where did you find that postgres version?

@wpuckering
Copy link
Author

I've been using this container image for some of my self-hosted Postgres containers for a while now: https://hub.docker.com/layers/library/postgres/13.2-alpine/images/sha256-3335d0494b62ae52f0c18a1e4176a83991c9d3727fe67d8b1907b569de2f6175?context=explore

Admittedly I should get around to upgrading them all to a more recent version, but everything's been working fine for the 20 or so other services I host that use this image, so I wasn't in a rush. I didn't think the likelihood of hitting an incompatibility would be all that high, but maybe that's the case now?

Lemmy 0.18.2 runs fine with the database using that image. Does Lemmy 0.18.3 introduce some syntax for database migrations that's only valid on a higher version of Postgres?

@dessalines
Copy link
Member

That's not a postgres version we support. All our CI jobs would fail if we used older postgres versions. Use the backup and restore docs to do a postgres upgrade.

@dessalines dessalines closed this as not planned Won't fix, can't repro, duplicate, stale Jul 28, 2023
@wpuckering
Copy link
Author

Okay, I'll upgrade and see if it resolves the issue. Thanks!

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@dessalines
Copy link
Member

We can't support from-scratch installs, just docker and ansible installs, and neither of those use the postgres version above.

@RocketDerp

This comment was marked as abuse.

@dessalines
Copy link
Member

We already have that warning: https://join-lemmy.org/docs/administration/administration.html

@RocketDerp

This comment was marked as abuse.

@wpuckering
Copy link
Author

wpuckering commented Jul 28, 2023

Just chiming in, I took a backup of my Postgres 13.2 database, and restored it into Postgres 15.3. It's still having issues running the migrations, it fails at the same point but yields a more accurate error:

thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: relation "comment" does not exist', crates/db_schema/src/utils.rs:221:25

Might possibly have something to do with this? #3501

@wpuckering
Copy link
Author

Looks like somehow my comment table is missing. I didn't do anything on my part which should have caused it to be deleted, but that appears to be the root cause.

@wpuckering
Copy link
Author

I'm back in business. Something went very wrong with the migrations in 0.18.3, but I wasn't able to pinpoint what it was (I didn't try too hard to be honest, I just wanted to get up and running again).

Here's what I did that put me back into a good state in the end:

  • Deleted my database completely, and let it get rebuilt fresh
  • Stopped Lemmy
  • Restored my database backup from when my database was on Postgres 13.2, dropping existing objects and recreating them fresh as part of the restore
  • Deleted sent_activity and received_activity tables, since they were added when I initially tried to upgrade seamlessly (and these were causing the migration scripts to get hung up if I restored right away and tried to start Lemmy from 0.18.3 right away)
  • Started Lemmy, and let the migration scripts do their thing (this time they worked)

I don't know if that will help anyone else or not. Something seems janky with the migration scripts, maybe only if you had been running with an older version of Postgres (like 13) since before 0.18.3. Just glad I got up to the new version. Will definitely exercise more caution when upgrading from here on.

@dessalines
Copy link
Member

Were you by any chance running an rc, or main branch? I can't imagine why the comment table would go missing like that.

@wpuckering
Copy link
Author

Never ran any release candidates, always stuck to the stable container images.

@dessalines
Copy link
Member

Can you be precise, exact quote or GitHub line-link to precise line of content, exact warning that upgrades will crash in 0.18.3 that I added?

Screenshot_2023-07-28-18-48-04-445_mark.via.gp.png

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@k4r4b3y
Copy link

k4r4b3y commented Aug 21, 2023

I compiled from source in order to upgrade from 0.18.2 to 0.18.4. I am following the "Install from scratch" guide (I know it is not the official method of lemmy install, however, I am not familiar with docker, and like running programs directly on my debian 12 system).

After compilation, I tried restarting the lemmy systemd service, however it fails to run. In the journalctl, I see the following error message:

lemmy_server[17631]: thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: syntax error at or near "trigger"', crates/db_schema/src/utils.rs:221:25 

Is there a solution to that? I have lost my lemmy instance because of that.

@RocketDerp

This comment was marked as abuse.

@k4r4b3y
Copy link

k4r4b3y commented Aug 22, 2023

I have a backup before the upgrade. However, in my server, I have postgresql 13 and 15. I had the server running debian 11, and then I have dist-upgrade'd into debian 12, and along with it, came postgresql 15.

However, the psql db that lemmy is running is still 13 (up until lemmy v0.18.2).

To get things more trickier, I have been running a matrix dendrite server on the same VPS, on the psql 13 database (under a different database username, of course).

So, how would a database restore work in this case? I would like the dendrite database to be left unscathed and only migrate the lemmy db data from the psql 13 database to the psql database 15.

@RocketDerp

This comment was marked as abuse.

@k4r4b3y
Copy link

k4r4b3y commented Aug 22, 2023

@RocketDerp are you in the Lemmy Matrix support rooms? If so, can you DM me, or ping me on one of the Lemmy Upgrade or Lemmy Install matrix rooms? My matrix username is @k4r4b3y:karapara.net

I would like to learn more about how I can do the lemmy database migration from psql 13 to 15. I have never used psql (or any other SQL software before).

Also, afaik, the lemmy is by default using the psql 15 on my system, however, the debian psql 15 have inherited the previous existing psql 13 database and running it. So, the software psql 15 is running the db psql 13 on my debian 12. How do I fix these things?

@RocketDerp

This comment was marked as abuse.

@k4r4b3y
Copy link

k4r4b3y commented Aug 22, 2023

@RocketDerp thank you for the detailed answer. I will create an account on the lemmy server you posted and detail my issue on the sub-lemmy you posted there. Is that ok?

I'd start there and talk specific port numbers.

I looked into those yesterday with phind.com AI assistant. And I realized that the ports 5432 and 5433 are being used. Afaik, 5432 belongs to the psql 13 and 5433 belongs to the psql 15.

@RocketDerp

This comment was marked as abuse.

@k4r4b3y
Copy link

k4r4b3y commented Aug 22, 2023

Thanks. I should have posted a thread on there in an hour.

@k4r4b3y
Copy link

k4r4b3y commented Aug 22, 2023

@RocketDerp , I posted here: https://bulletintree.com/post/3047685

@RocketDerp

This comment was marked as abuse.

@akohlsmith
Copy link

I have the same issue. I'm using an EXTERNAL postgres database and for similar reasons to others, the schema update failed. For me, however, restoring from backup after upgrading to Postgresql 15 (which deletes the old db) didn't work and I still have the same error.

Now I can manually fix the up.sql/down.sql (and have tested this) but docker-compose up still fails, presumably becuase the up/down scripts in the docker image are still "not fixed".

I did try docker-compose run lemmy /bin/sh to try to poke around inside the container but I'm quite sure that's not the right way to do this. I know enough about docker to be dangerous, but that's it.

@gribodyr
Copy link

For anyone who lands here trying to run a fresh "install from scratch", here's what worked for me.

  1. Make sure you have PostgreSQL 16 installed
  2. Make sure YOU DO NOT have previous versions of PostgreSQL installed (like 12 which is bundled with Ubuntu 20.2). To check:
user@lemmy:~/lemmy$ pg_lsclusters 
Ver Cluster Port Status Owner    Data directory              Log file
16  main    5432 online postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log
  1. If you have older PG clusters running, do something like this. MAKE SURE YOU DON'T HAVE ANY DATABASES RUNNING ON THE OLD CLUSTER(S).
pg_dropcluster --stop 12 main
  1. Make sure your PG16 port number is 5432 (if you had other clusters, it might be different). Change it to 5432 if necessary by editing postgresql.conf and restart PG if necessary.
  2. Repeat "Install from Scratch" instructions for setting up a DB on the PG16's cluster if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: database bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants