Migration fails and leaves AWX install in a bad state #138

Closed
allen00se opened this issue Sep 12, 2017 · 44 comments

Comments

@allen00se

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • Installer
SUMMARY

The AWX install seemed to complete without error; however, it appears to be hung in the migration phase. The web interface shows that AWX is upgrading, and tailing the log files shows the same traceback repeating.

ENVIRONMENT
STEPS TO REPRODUCE

Install AWX per guide

EXPECTED RESULTS
ACTUAL RESULTS
ADDITIONAL INFORMATION

Here is a snippet of the error that keeps repeating in the log file:

django.db.utils.ProgrammingError: relation "main_schedule" does not exist
LINE 1: ...le"."next_run", "main_schedule"."extra_data" FROM "main_sche...

@wenottingham
Contributor

Full error, please.

@allen00se
Author

https://paste.ee/p/A4zGi

@matburt
Member

matburt commented Sep 12, 2017

I've been hearing about this for a little while from various folks. Is there any way you can run this again and, if it happens again, grab more of the output that occurred above this? This isn't the totality of the awx_task container log.

Basically, in some situations migrations seem to be failing, but no one has been able to show me a log with the actual migration failure at the top of the awx_task container log.

@knechtionscoding

@matburt This is the same error I encountered yesterday. #116.

@allen00se The solution that worked for me and @phandolin was the following:

  1. Stop each container in descending order:
  • awx_task
  • awx_web
  • memcached
  • rabbitmq
  • postgres
  2. Remove /tmp/pgdocker/
  3. Re-run install.yml with no other changes.

Not sure what the cause was, I couldn't grab the logs yesterday before they were overwritten.
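
The steps above could be scripted roughly as follows. This is a minimal Python sketch, not part of the installer: the container names and `/tmp/pgdocker` come from this thread, while the `inventory` file name and the `build_reset_plan` / `reset_install` helpers are assumptions made for illustration. Note that removing the data directory discards the Postgres database.

```python
import shutil
import subprocess

# Container names as listed in this thread, in the order they were stopped.
CONTAINERS = ("awx_task", "awx_web", "memcached", "rabbitmq", "postgres")


def build_reset_plan(containers=CONTAINERS):
    """Return the external commands for the workaround as argv lists."""
    plan = [["docker", "stop", name] for name in containers]
    # Assumption: install.yml is run from the installer directory with an
    # inventory file named "inventory".
    plan.append(["ansible-playbook", "-i", "inventory", "install.yml"])
    return plan


def reset_install(data_dir="/tmp/pgdocker", dry_run=True):
    """Stop the containers, delete the Postgres data dir, re-run the installer.

    With dry_run=True (the default) nothing is executed; the plan is returned.
    """
    plan = build_reset_plan()
    if dry_run:
        return plan
    for cmd in plan[:-1]:
        subprocess.run(cmd, check=False)  # a container may already be stopped
    shutil.rmtree(data_dir, ignore_errors=True)  # WARNING: deletes the database
    subprocess.run(plan[-1], check=True)
    return plan
```

Calling `reset_install()` with the default `dry_run=True` only returns the list of commands, which makes it easy to review the plan before running anything destructive.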

@allen00se
Author

@matburt

You're in luck, because I have it from the start...

https://paste.ee/p/i1buQ

@allen00se
Author

@knechtionscoding

Thanks man, that looks to be working, further along than the first try already.

@matburt
Member

matburt commented Sep 12, 2017

The important thing from @knechtionscoding was this bit:

  1. Remove /tmp/pgdocker

Here's the relevant line from your log, @allen00se:

django.db.utils.IntegrityError: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
DETAIL:  Key (typname, typnamespace)=(django_migrations_id_seq, 2200) already exists.

Had you tried to start it up once before, and it failed and then went further the next time?

Maybe two parallel installs?
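
To pick lines like these out of a long awx_task log, something along the following lines might help. It is only a sketch: the two patterns are taken from the error messages quoted in this thread, and the `SIGNATURES` table and `classify_log` helper are hypothetical names, not anything shipped with AWX.

```python
import re

# Signatures taken from the log excerpts quoted in this thread.
SIGNATURES = {
    "missing_relation": re.compile(
        r'ProgrammingError: relation "([^"]+)" does not exist'
    ),
    "duplicate_migration_sequence": re.compile(
        r"IntegrityError: duplicate key value violates unique constraint "
        r'"pg_type_typname_nsp_index"'
    ),
}


def classify_log(log_text):
    """Return the names of known failure signatures found in log_text, sorted."""
    return sorted(name for name, rx in SIGNATURES.items() if rx.search(log_text))
```

A log containing only the repeating `relation "main_schedule" does not exist` traceback would be reported as `missing_relation`, while a log that also contains the earlier `IntegrityError` would be reported with both names, pointing at the first failed migration attempt.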

@matburt changed the title from "AWS migration hung" to "Migration fails and leaves AWX install in a bad state" on Sep 12, 2017
@matburt
Member

matburt commented Sep 12, 2017

We need to find a way to make sure this kind of error starting up awx_task can't happen.

@knechtionscoding

knechtionscoding commented Sep 12, 2017

@matburt When I had the issue I had previously run the playbook and had to fix a different error. I didn't even think about that.

Could the installer check for /tmp/pgdocker? Is that the permanent storage for postgres? Could it be removed when the playbook runs?

If it can be removed on each run, then the Ansible can be as simple as something like this:

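- name: Check whether the postgres data directory exists
  stat:
    path: /tmp/pgdocker/
  register: pgdocker
- name: Remove the postgres data directory (destroys the database)
  file:
    path: /tmp/pgdocker/
    state: absent
  when: pgdocker.stat.isdir is defined and pgdocker.stat.isdir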

If /tmp/pgdocker is permanent storage for the postgres db then this becomes an issue if someone runs the playbook against an already running awx tower (i.e. upgrade to a newer version, etc.).
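
One way to soften that risk would be to move the directory aside rather than delete it, so that a mistaken run can be undone. The following is a minimal Python sketch of that idea; the `move_aside` helper and the `.bak-<timestamp>` naming are assumptions for illustration, and the containers would need to be stopped first.

```python
import os
import shutil
import time


def move_aside(data_dir, suffix=None):
    """Rename data_dir to a timestamped sibling instead of deleting it.

    Returns the new path, or None if data_dir is not an existing directory.
    """
    data_dir = data_dir.rstrip("/")
    if not os.path.isdir(data_dir):
        return None
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = f"{data_dir}.bak-{suffix}"
    if os.path.exists(backup):
        raise FileExistsError(backup)  # never overwrite an earlier backup
    shutil.move(data_dir, backup)
    return backup
```

For example, `move_aside("/tmp/pgdocker/")` would leave the old data under a `/tmp/pgdocker.bak-...` path, where it can be restored or removed later by hand.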

@allen00se
Author

@matburt @knechtionscoding
Same here. I had the flag set for the AWX branding logos but didn't have them in the correct location, which caused the install to fail. I put the logos where they belonged and then ran the install again. I'm sure that is what caused the /pgdocker folder to exist already.

@phandolin

@matburt @knechtionscoding @allen00se Yes, I had also run the playbook once, had to stop it and update the server, then ran again and that's when it happened here as well.

@matburt
Member

matburt commented Sep 12, 2017

We definitely don't want to remove the database directory, otherwise everyone would lose their data, but this is super helpful to know. Usually these things are protected by transactions to keep from running into partial migrations.

I'll look at this a little closer.
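
The idea of protecting a migration with a transaction can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in (AWX itself runs Django migrations against PostgreSQL), and the `apply_migration` helper is a hypothetical name: either every statement is applied, or none is.

```python
import sqlite3


def apply_migration(conn, statements):
    """Run all statements in one transaction; roll everything back on failure.

    conn must be opened with isolation_level=None so that the explicit
    BEGIN / COMMIT / ROLLBACK below control the transaction.
    """
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undo the statements that did succeed
        return False
    conn.execute("COMMIT")
    return True
```

If the second of two `CREATE TABLE` statements fails, the first table is rolled back too, so a re-run starts from a clean state instead of tripping over leftovers from the earlier attempt.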

@xjohnyknox

Thanks, it works!

@switchboardOp

Having the same or similar issue but using an external postgres host.
https://paste.ee/p/TDdd1

@AlanCoding
Member

Is anyone still hitting this error? I see #116 is a related issue that seems to have resolved itself.

@ricalenil

ricalenil commented Dec 23, 2017

Greetings,

I'm having this error when upgrading from 1.0.2.0 to 1.0.2.289

django.db.utils.ProgrammingError: column main_jobtemplate.credential_id does not exist
LINE 1: ...urvey_enabled", "main_jobtemplate"."survey_spec", "main_jobt...
^
I'm falling back to 1.0.2.0, but this needs to be corrected/fixed.

@matburt
Member

matburt commented Jan 8, 2018

@ricalenil sorry, but there's no direct upgrade path from 1.0.2.0 to anything later. See: https://groups.google.com/d/msg/awx-project/PQLxKl5Rj9s/UGy-3VaCCQAJ

@mkempster22

Just got this issue again while upgrading today. I had this a while back and switched to an external postgres to try to avoid the issue. Do we have a way to fix this yet? Completely removing the database is a bit drastic.

@mkempster22

mkempster22 commented Jan 12, 2018

For info, I'm currently running AWX 1.0.2.327.
Running any sort of upgrade leaves AWX completely stuck on 'AWX is upgrading' with a log similar to switchboardOp's.

Unfortunately, I also have the issue of my /var/lib/docker/overlay folder filling up, and the only way to clean it is to remove the containers, which causes the latest containers to be fetched. So either I run out of space and can't launch AWX, or I clean the files, upgrade, and then can't launch AWX.

@dfollereau

I have the same issue with OpenShift Origin 3.7.1 and ansible/latest (1.0). The symptom is the web page constantly showing "AWX is Upgrading" and staying in this state forever. Address: http://awx-web-svc-awx.192.168.99.100.nip.io/. I don't know what information to give you to help find the issue.

@thealexauer

If I recall correctly, you can get the logs from awx-task/awx-celery. The log should be there.

@FloThinksPi

@matburt I also got this when updating from 1.0.2.0 to 1.0.3.0. You said there is no (automatic) update path from 1.0.2 to anything later. Is there a manual way we can upgrade? A database export/import or something like that?

@joshuacherry

@matburt

there's no direct upgrade path from 1.0.2.0 to anything later

There seem to have been a few people in the past couple of days who have reported problems upgrading to 1.0.3, which suggests that some people (myself included) assumed there would be a path to upgrade between versions. I think it would be useful to clarify what AWX users can expect in terms of upgrading so this confusion can be avoided.

As mentioned in #1133 by @jakemcdermott

There is no explicit expectation that one should be able to smoothly upgrade from one commit to the next on the devel branch.

@akcrisp

akcrisp commented Feb 13, 2018

Guys, on doing upgrades: we appreciate it's dev, but equally I think you have to apply some common sense and realise you've got a lot of people using this software. It would therefore be doing the community a great service if you could supply some guidelines, even manual ones, for doing upgrades.

I still struggle with the thought that if you're breaking the dev branch, then you're inevitably going to break the Ansible Tower deployments, and therefore someone must surely be looking into what changes are taking place?

So can someone please document what steps (even manual ones) are required to allow updates? Would a db dump and load onto a fresh db work? I have the added complexity of going from 1.0.1.173 at some point...

Andy

@johnjeffers

Agreed with @akcrisp

Please give us something here besides manually recreating everything by hand. Once you get more than a handful of job templates, it's an excruciating task, especially if the jobs include surveys.

FWIW I looked into migrating to a new version via the API, but many things refer to specific IDs in the database. For example, job templates refer to credentials by their ID, which probably won't be the same in the new DB. This makes export/import via the API rather difficult.
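
The ID problem described above could in principle be worked around by matching objects on their names instead of their IDs. Here is a minimal Python sketch of that remapping step. The data shapes (a `credential` field holding an ID, plus simple ID/name lookup tables) and the `remap_credentials` helper are assumptions for illustration rather than the actual AWX API schema, and the approach only works if credential names are unique.

```python
def remap_credentials(templates, old_credentials, new_credentials):
    """Rewrite each template's credential ID from the old DB to the new one.

    templates:        list of dicts with "name" and "credential" (old ID)
    old_credentials:  {old_id: credential_name} from the source install
    new_credentials:  {credential_name: new_id} from the target install
    Raises KeyError if a credential has no counterpart in the target.
    """
    remapped = []
    for template in templates:
        name = old_credentials[template["credential"]]
        # Copy the template so the exported data is left untouched.
        remapped.append({**template, "credential": new_credentials[name]})
    return remapped
```

The same pattern would have to be repeated for every other cross-reference (inventories, projects, and so on), which is part of why a hand-rolled export/import gets tedious quickly.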

I understand this is not a commercial product like Tower, and there is no expectation of support, but other Red Hat open-source products have reasonable upgrade paths.

@AlanCoding
Member

There is a feature in the works for tower-cli (now also adopting the alias awx-cli) to copy and export data.

ansible/tower-cli#197

Like everything with the CLI project, it is intended to be cross-version compatible. This should solve the problem for some users in some situations once it rolls out.

@johnjeffers

Another (kind of obvious) benefit of giving your users an upgrade path is that it gets people testing your newer code. If I'm stuck on an old version because I can't upgrade, I'm not doing you much good as a tester.

@bobobox

bobobox commented Feb 13, 2018

Totally agreed - some path to be able to keep an AWX stack reasonably up to date without having to start over each time would be much appreciated, and frankly, seems like a basic requirement for Red Hat to be able to say that they're offering a good faith free and non-commercial version of their product.

I've created scripts that use tower-cli (it's a handy Python library in addition to being a cli tool) to export and import my Job Templates and Surveys, as recreating all of these would be a nightmare, but it still results in the loss of Job History, and just isn't a fun time.

I understand that the AWX devs are working hard and that supporting DB upgrades between arbitrary commits is not practical, but clearly the upgrade issue is solved for downstream Tower. Maybe all it would take is to tag certain commits in AWX which 'track' Tower?

@matburt
Member

matburt commented Feb 14, 2018

We may do more to track upgrades in the future, but currently that's not the case. This is very much an upstream, development-focused branch for Tower, and as we've said before, there are no guarantees.

@akcrisp

akcrisp commented Feb 14, 2018 via email

@ricalenil

@matburt this kind of answer takes credibility away from this project. I think this community is helping Tower development a lot with invaluable work. The least I expect from you is a little respect, because we are trusting you and trusting that you are doing a quality job. Imagine Fedora telling the community that they are not going to support upgrading any more. What do you think would happen? I understand that there's no warranty, but upgrading is a very basic thing that any serious software should have!

Sent with GitHawk

@gregdek
Contributor

gregdek commented Feb 15, 2018

@ricalenil as a former Fedora Project Leader, I can tell you the exact answer: for the first several versions of Fedora, we did not recommend upgrades in place, because they would break users in unsupportable ways.

AWX is young now, just as Fedora was young then. We are not saying "we will never support upgrades in place for AWX." We are saying "we do not currently support upgrades in place for AWX because we have other issues that are a higher priority." For now, that answer is firm.

@akcrisp

akcrisp commented Feb 15, 2018

@gregdek Although awx might be young as an open source project, ansible tower (the licensed version) is not, and awx is the upstream version of that product. So if you are breaking and preventing awx upgrades it is a logical assumption you will also be breaking the down stream ansible tower upgrades. For that not to happen, seeing as people are paying for that support, someone must be tracking those changes and planning remediation? If so, why can't that be made available to the awx community?

Regards

Andy

@mkempster22

Unfortunately I don't think it would be as easy as that. I don't see any way it could be kept upgradeable without extra resources dedicated to that, which clearly is not going to happen.
Although I do agree that the way AWX is currently developed is going to turn people away from using it, such as when, over December, there was an issue where no one could run any job templates at all for weeks (this sort of thing shouldn't really get pushed without any form of testing).

The best way to use AWX at the moment is to download a version that is known to work and stick with it. Use it to see whether you would benefit from Tower or not. The only reason to keep upgrading is if you are using AWX in order to contribute to it and are not actually planning to use it as an alternative to Tower.

As for an open source alternative to Tower, there doesn't seem to be a good option at the moment.

@shanemcd
Member

it is a logical assumption you will also be breaking the down stream ansible tower upgrades

This is an incorrect assumption. Tower upgrade paths are tested and supported. It should be completely understandable that there will be changes to the data models in the development branch between major releases that might cause issues when pulling new code. You should automate the provisioning of your AWX server rather than relying on a stable upgrade path.

@akcrisp

akcrisp commented Feb 15, 2018 via email

@FloThinksPi

So IMHO the issue can be closed as wontfix; the reason is understandable and was explained above.

The focus at the moment is development, and once this project comes near a Tower release, there will be efforts to develop an upgrade path between the new and old Tower/AWX versions, as @shanemcd pointed out. This will repeat, etc.

It makes total sense not to waste time supplying upgrade paths between tiny version jumps that are in rapid change and development anyway. So we will see over time in which direction this goes and whether upgrade paths get released at some point, or at least rough migration instructions. 😉

@matburt
Member

matburt commented Feb 16, 2018

I have closed this issue. The original problem might still occur if you interrupt the install while it's performing the first database migration.

@AlanCoding
Member

The tower-cli feature to receive/upload data has been submitted:

ansible/tower-cli#479

I mentioned this in the mailing list, but I thought some people might get the notification from this issue.

@MrMEEE
Contributor

MrMEEE commented May 1, 2018

As far as I can see, tower-cli is not really an option (not on its own, anyway), as it's missing too many things (credentials, logs, groups, LDAP and more).

I have noticed that there are:

awx-manage dumpdata
awx-manage loaddata

I can get dumpdata to dump the data to a file, but I can't get loaddata to work.

Does anyone know how to load the data back in, and whether it can be used when upgrading?

@akcrisp

akcrisp commented May 1, 2018 via email

@akcrisp

akcrisp commented May 1, 2018 via email

@gregdek
Contributor

gregdek commented May 1, 2018

The tower-cli changes are still relatively new, just released a few days ago, so we're still working on documenting this procedure. It's coming.
