Upgrade to operator 2.13.0 fail when postgres_extra_args is present #1760

Closed
3 tasks done
woodbb opened this issue Mar 12, 2024 · 14 comments · Fixed by #1765

@woodbb

woodbb commented Mar 12, 2024

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

When upgrading AWX from an older version using postgres-13 to the 2.13.0 operator with postgres-15, a new (empty) postgres PV is created and the old PV is removed.

AWX Operator version

2.13.0

AWX version

24.0.0

Kubernetes platform

openshift

Kubernetes/Platform version

4.13

Modifications

no

Steps to reproduce

Install the latest AWX operator (2.13.0) into a namespace where the old operator is running (using the same name as the old instance).

Expected results

I would expect the same PV to be reused and the database migrations to run, or else documented steps for migrating data from the old PV to the new PV.

Actual results

A new PV is created and the old PV is removed, resulting in loss of data.

Additional information

If we are not upgrading properly, and there is documentation on the proper way to upgrade, please point me in that direction.

Or if we should be leveraging labels on the PV for postgres in a customized manner, please let me know what parameters should be in our awx instance.

Any guidance is appreciated.

Operator Logs

No response

@woodbb
Author

woodbb commented Mar 13, 2024

Side note:

    postgres_extra_args:
    - '-c'
    - 'max_connections=1000'

^ this setting seems to cause the postgres pod to fail with an entrypoint error. Removing these extra_args will allow the pod to start.
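
For context, a minimal sketch of where this setting sits in an AWX custom resource. The instance name and namespace below are placeholders chosen for illustration, not values from this thread; only the `postgres_extra_args` list is taken from the comment above.

```yaml
# Sketch only: metadata values are hypothetical placeholders.
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx            # assumed instance name
  namespace: awx       # assumed namespace
spec:
  # Extra arguments for the operator-managed postgres container.
  # On operator 2.13.0 the postgres 15 pod reportedly fails to start
  # while this list is present; removing it lets the pod come up.
  postgres_extra_args:
    - '-c'
    - 'max_connections=1000'
```

Each `-c` flag and its `name=value` setting are separate list items, as in the original report.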

@saif-88

saif-88 commented Mar 13, 2024

I had to copy the data to the new PV before starting the operator, and even after the database migration to PostgreSQL 15 the web UI was stuck at "AWX is Upgrading".

@TheRealHaoLiu
Member

investigating now

@TheRealHaoLiu self-assigned this on Mar 13, 2024
@Tfinn92

Tfinn92 commented Mar 13, 2024

We observed the same behavior as above. I had to remove all postgres_extra_args for the postgres 15 pod to come up correctly:

    postgres_extra_args:
    - '-c'
    - 'huge_pages=off'
    - '-c'
    - 'shared_buffers=512MB'
    - '-c'
    - 'max_connections=1000'

That being said, the data eventually migrated over; however, we started seeing TONS of "X-DAB-JW-TOKEN header not set" errors:

    2024-03-12 23:11:33,055 INFO     X-DAB-JW-TOKEN header not set for JWT authentication
    2024-03-12 23:11:33,071 ERROR    Websocket connection does not provide valid authentication
    172.16.1.17 - - [12/Mar/2024:23:11:33 +0000] "GET /websocket/ HTTP/1.1" 403 5 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15" "10.xx.xx.xxx"

As a troubleshooting step, we killed the pods in the awx namespace. When they came back up, our database had been lost and all data inside was missing. We had to roll back to 23.9.0 and 2.12.2 and restore a db dump we had taken beforehand.

@TheRealHaoLiu
Member

it seems like the postgres 15 image doesn't take the args correctly for some reason...

@TheRealHaoLiu
Member

Is the postgres 15 container in CrashLoopBackOff? If so, can you provide the log output?

@TheRealHaoLiu changed the title from "Upgrade to operator 2.13.0 creates new DB PVC" to "Upgrade to operator 2.13.0 fail when postgres_extra_args is present" on Mar 13, 2024
@gundalow

gundalow commented Mar 13, 2024

@woodbb @saif-88 @Tfinn92 Just wanted to say thanks for testing this and the detailed feedback :)

@TheRealHaoLiu
Member

The DAB error is a separate problem. Can you open another issue for that, @Tfinn92?

@Tfinn92

Tfinn92 commented Mar 13, 2024

> The DAB is a separate problem. Can you open another issue for that @Tfinn92

I can, yeah. I nuked my install of that newer version, so beyond the few lines I posted I sadly don't have many more logs to share at this time. But let me get that issue opened up.

@woodbb
Author

woodbb commented Mar 13, 2024

I see this is closed, but the original issue is still my bigger concern: a new PV is created for postgres, the old one is deleted (I believe it should be retained by default?), and no data is migrated from the old PV to the new PV.

I have run through this twice with the same results.

@TheRealHaoLiu
Member

I also fixed the retention thing: #1767

@woodbb
Author

woodbb commented Mar 14, 2024

Thank you @TheRealHaoLiu!

...I assume data migration from old PV to new PV will also happen?

@TheRealHaoLiu
Member

yes

@woodbb
Author

woodbb commented Mar 14, 2024

Just a quick note: I have tested this on a few different AWX instances, and the upgrade from operator 2.12.x to 2.13.1 now goes as expected.

Upgrading from 2.13.0 to 2.13.1 did not go well, but it was an easy rollback, so I just rolled those instances back to the 2.12 operator and upgraded to 2.13.1.

Thanks to all who worked on this! Excellent and timely response!
