Rails migrations failing randomly without a proper error message #258

Open
k0nserv opened this Issue Dec 4, 2014 · 13 comments

k0nserv commented Dec 4, 2014

I ran into this today after a migration failed in our Rails/MySQL application. The same migration passed on the staging environment, which is an exact copy of production.

This thread on the forums talks about the issue. https://forums.aws.amazon.com/message.jspa?messageID=587770#587770

The error that is shown makes no sense and gives no clue as to what is going wrong.

================================================================================
Error executing action `deploy` on resource 'deploy[/srv/www/XXX]'
================================================================================


Chef::Exceptions::Exec
----------------------
if [ -f Gemfile ]; then echo 'OpsWorks: Gemfile found - running migration with bundle exec' && /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate; else echo 'OpsWorks: no Gemfile - running plain migrations' && /usr/local/bin/rake db:migrate; fi returned 1, expected 0

Further, as I wrote in the thread, I was able to get around the issue with the following workaround, which makes little sense:

Okay, even weirder: since it was working on our staging environment, I decided to try creating a new instance in the production stack and running the migration on that one. Guess what? It worked.

Steps.

  1. Create new instance
  2. Run deploy with migrations on that instance only
  3. Kill the new instance
  4. Run deploy without migrations on the other instance(s)

This is super annoying and extremely flaky. It needs attention and a fix ASAP.

svetozar02 Feb 4, 2015

Any progress on this issue?

k0nserv Feb 4, 2015

I've found that this happens when the migrations fail. Deploying without migrations and then running the migrations manually will show you the error.
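
One way to surface the hidden error is to run the migration command by hand and capture its combined output and exit status. A minimal Ruby sketch, assuming it is run as the deploy user from the release directory; the helper name `run_and_capture` is hypothetical and not part of OpsWorks or Chef:

```ruby
require 'open3'

# Hypothetical helper: run a shell command and return its combined
# stdout/stderr together with the exit code, so that a failure is not
# reduced to "returned 1, expected 0".
def run_and_capture(command)
  output, status = Open3.capture2e(command)
  { output: output, exit_code: status.exitstatus, ok: status.success? }
end
```

Calling `run_and_capture('/usr/local/bin/bundle exec /usr/local/bin/rake db:migrate')` and printing the `:output` value should show the underlying rake error rather than only the Chef exception.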

fred Mar 16, 2015

I've seen this problem too. Let me ask you, how many instances do you have in production? :)
When we had 8 Rails instances in production, the migration command ran on all servers at the same time. The first server had already done the migration, but the others also tried to execute the same migration on a table that was already migrated to the new schema. It works in staging because you may have only 1 instance there.
This is an architecture flaw of OpsWorks; it's not yet mature or intended for large production deployments as of now.
I believe you are aware of the thread in the AWS forums, for which no fix has been patched into the code: https://forums.aws.amazon.com/thread.jspa?messageID=606594

The solution is actually quite simple. We use the following deployment strategy, assuming we have 8 app servers:

  1. deploy Rails application code to 7 servers without migration
  2. deploy Rails application code to 1 server with migration

Both can be done at the same time. This is not ideal for haproxy, or in case one deploy fails, but it works for now.

OpsWorks could fix it easily as well by changing how "rake db:migrate" is run. All that needs to be done is to run the migration on only one of the Rails app servers. For example, run rake db:migrate only if the server is at index zero in the array of instances for the Rails layer, i.e. node["opsworks"]["layers"][layer]["instances"]

...should be a fix of a few lines.
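
The "only one server migrates" idea can be sketched in plain Ruby. This is an illustration, not the OpsWorks cookbook code: it assumes the instances value is a Hash keyed by hostname, and both the helper name `migration_host?` and the hostnames are made up. Sorting the hostnames means every server picks the same one regardless of hash order.

```ruby
# Hypothetical helper: decide whether this host should run migrations.
# `instances` is assumed to be a Hash keyed by hostname, standing in for
# node["opsworks"]["layers"][layer]["instances"].
def migration_host?(instances, hostname)
  first = instances.keys.map(&:to_s).sort.first
  !first.nil? && first == hostname.to_s
end

# Example data standing in for the node attributes (made-up hostnames).
instances = {
  'rails-app2' => {},
  'rails-app1' => {},
  'rails-app3' => {}
}

instances.keys.sort.each do |host|
  puts "#{host}: migrate=#{migration_host?(instances, host)}"
end
```

Note that this sketch does not handle the case where the chosen host fails or is offline.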

n1t1nv3rma Apr 1, 2015

I actually agree with k0nserv... this error also happens when the migrations fail due to some other reason. Deploying without migrations and then running the migrations manually will show you the error.

So I tried running the migration manually and noticed the error...

Ex:

su - deploy
$ cd /srv/www/mynextapp/releases/20150401052640
$ /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate
rake aborted!
ExecJS::RuntimeUnavailable: Could not find a JavaScript runtime. See https://github.com/sstephenson/execjs for a list of available runtimes.
....
...

So I added the following to the Gemfile and ran "/usr/local/bin/bundle update":
gem 'execjs'
gem 'therubyracer', :platforms => :ruby

This installed the following gems:
...
Installing libv8 (3.16.14.7)
Installing ref (1.0.5)
Installing therubyracer (0.12.1)
...

After that, the error did not appear in the next deployment with migration... so it seems that this error and the Chef exception are quite misleading!

Also, this did not occur in the 'development' environment, perhaps because I was using 'sqlite3' in dev but 'mysql' in prod.
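
Before changing the Gemfile, it may help to check whether the instance has a JavaScript runtime executable at all. A rough Ruby sketch, where the helper name and the candidate executable names (`node`, `nodejs`) are assumptions; it only looks on PATH and does not detect embedded runtimes such as therubyracer:

```ruby
# Hypothetical check: look for a JavaScript runtime executable on PATH.
# The candidate names are assumptions; embedded runtimes such as
# therubyracer are not detected by this check.
def find_executable_on_path(names, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    names.each do |name|
      candidate = File.join(dir, name)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

runtime = find_executable_on_path(%w[node nodejs])
puts(runtime ? "JavaScript runtime found: #{runtime}" : 'No node/nodejs executable on PATH')
```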

k0nserv Apr 1, 2015

@fred If I remember correctly, according to their docs Amazon should be picking a single instance for the migration instead of running it on all of them. And it is doing that in our case.

@n1t1nv3rma I don't think you should have to add therubyracer manually; it should be part of the dependencies of whatever gem needs it. Also, just a recommendation: I think you should run MySQL in both development and prod to avoid issues like these. Check out Vagrant for an easier way to set up a local environment that is close to your production servers.

fulvi0 Oct 26, 2015

I'm trying out AWS OpsWorks by following a workshop from the 2015 Rails conference, to learn more about the service and how it works. While working through the steps of the workshop, I got an error trying to deploy the Rails app. Here is the log:

[2015-10-25T18:00:44+00:00] ERROR: Running exception handlers
[2015-10-25T18:00:44+00:00] ERROR: Exception handlers complete
[2015-10-25T18:00:44+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2015-10-25T18:00:44+00:00] ERROR: deploy[/srv/www/todoapp] (deploy::rails line 65) had an error: Chef::Exceptions::Exec: if [ -f Gemfile ]; then echo 'OpsWorks: Gemfile found - running migration with bundle exec' && /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate; else echo 'OpsWorks: no Gemfile - running plain migrations' && /usr/local/bin/rake db:migrate; fi returned 1, expected 0
[2015-10-25T18:00:44+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1) 

I've been looking but couldn't find anything that helps. Any idea how I can solve this issue?

Note: I deployed once without the migration, which means my app doesn't work at all.

k0nserv Oct 26, 2015

@fulvi0 you are going to want to SSH into one of your instances and run the migrations manually to understand what the error is.

johncblandii Oct 27, 2015

Yes, SSH in, and the first thing you should try is accessing the DB from the command line. I've hit this for many reasons, but the majority were caused by a database problem.

You should also run the migration manually to test it, prefixed with bundle exec.
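
As a first connectivity check from the instance, a small Ruby sketch can test whether the database port is reachable at all. The helper name `port_reachable?` is hypothetical, and the host you pass in is whatever your database configuration points at; this only checks the TCP port, not credentials or permissions.

```ruby
require 'socket'

# Hypothetical helper: return true if a TCP connection to host:port
# can be opened within `timeout` seconds. This only checks that the
# database port is reachable, not that the credentials are valid.
def port_reachable?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

For example, `port_reachable?(db_host, 3306)` would cover the default MySQL port and `port_reachable?(db_host, 5432)` the default PostgreSQL port, where `db_host` stands for your own database host.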

fulvi0 Oct 27, 2015

@k0nserv @johncblandii, thank you. I found the problem through SSH and solved it by updating the MySQL gem with

gem 'mysql2', '~> 0.3.18'

That should pull in the proper version for ActiveRecord. Then deploy the app.
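
To see which versions the `~> 0.3.18` constraint actually permits, RubyGems can evaluate it directly. A small sketch; the version numbers being checked are chosen only for illustration:

```ruby
# RubyGems is loaded by default, so Gem::Requirement is available.
requirement = Gem::Requirement.new('~> 0.3.18')

%w[0.3.17 0.3.18 0.3.21 0.4.0].each do |v|
  allowed = requirement.satisfied_by?(Gem::Version.new(v))
  puts "mysql2 #{v}: #{allowed ? 'allowed' : 'rejected'}"
end
```

The pessimistic operator here accepts 0.3.18 and later 0.3.x releases but rejects anything from 0.4.0 onward, so Bundler stays on the 0.3 series.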

diegodurante Apr 5, 2016

I have experienced the same problem using PostgreSQL, and it happens both in production (where I have 3 running instances) and in staging (just one instance).

Sometimes the deploy with migrations fails with this error:

[2016-04-05T10:44:10+02:00] INFO: Running queued delayed notifications before re-raising exception
[2016-04-05T10:44:10+02:00] ERROR: Running exception handlers
[2016-04-05T10:44:10+02:00] ERROR: Exception handlers complete
[2016-04-05T10:44:10+02:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache.stage2/chef-stacktrace.out
[2016-04-05T10:44:10+02:00] ERROR: deploy[/srv/www/app] (deploy::rails line 65) had an error: Chef::Exceptions::Exec: if [ -f Gemfile ]; then echo 'OpsWorks: Gemfile found - running migration with bundle exec' && /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate; else echo 'OpsWorks: no Gemfile - running plain migrations' && /usr/local/bin/rake db:migrate; fi returned 1, expected 0
[2016-04-05T10:44:10+02:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)

but the migrations were performed! If I log in via SSH to my instance and check the database, I can see the changes.

My stack:

  • Ruby 2.1.8
  • Rails 4.2.3
  • PostgreSQL 9.5.1
  • Rails pg gem: 0.18.4
  • OpsWorks Agent: 3433
  • OS: Ubuntu 14.04 LTS

Does anyone have any news?

diegodurante Apr 5, 2016

Hi all, maybe I found the problem; at least it worked for me.

The migration can fail for many reasons, and the OpsWorks logs can't help us.
So, as many of you suggested, to understand exactly what is going on we have to log in via SSH to one of our instances and try to run the command that causes the failure.

Log in via SSH and then:

su - deploy
$ cd /srv/www/mynextapp/releases/20150401052640
$ /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate 

Now this command should fail, showing us why.
@n1t1nv3rma got the "ExecJS::RuntimeUnavailable: Could not find a JavaScript runtime" error, which he fixed as described above.

For me the error was:

/usr/lib/postgresql/9.5/bin/pg_dump: invalid option -- 'i'
Try "pg_dump --help" for more information.
rake aborted!
Error dumping database

And I fixed it just by updating to Rails 4.2.6 (before it was 4.2.3).
Look here for more details: https://gist.github.com/nruth/a3bc1b75281109b036e4

Hope it helps.

andrewhood125 Jul 20, 2017

This is still an issue.

I ran /usr/local/bin/bundle exec /usr/local/bin/rake db:migrate manually and it exited with 0. I deployed via OpsWorks without migrations and it succeeded.

andrewhood125 Jul 25, 2017

Turns out the app wouldn't boot. When I ran rake db:migrate, current was symlinked to a different release than the one Chef was running on. To debug this, we ran a deploy without migrations, which succeeded. Then we went to the current release on the server and ran rake db:migrate. In our case it was a bad middleware preventing the app from booting.
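
To avoid testing against the wrong release by accident, one can compare where current points with the newest directory under releases. A Ruby sketch under the assumption that the app root contains a releases directory of timestamp-named folders plus a current symlink, as the paths in this thread suggest; all helper names are hypothetical:

```ruby
# Hypothetical helpers, assuming the layout seen in this thread:
#   <app_root>/releases/<timestamp>  and  <app_root>/current (symlink)
def current_release(app_root)
  link = File.join(app_root, 'current')
  File.symlink?(link) ? File.realpath(link) : nil
end

# Timestamp-named directories sort chronologically as plain strings.
def latest_release(app_root)
  newest = Dir.children(File.join(app_root, 'releases')).sort.last
  newest && File.realpath(File.join(app_root, 'releases', newest))
end

def current_is_latest?(app_root)
  current = current_release(app_root)
  !current.nil? && current == latest_release(app_root)
end
```

If `current_is_latest?` returns false for your app root, a manual rake db:migrate run from current is exercising a different release than the one the failed deploy was working on.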
