
Troubleshooting

  1. General
  2. Pip install fails on Mac OS X 10.10
  3. PhantomJS fails
  4. Select.2 is missing
  5. Empty facets
  6. Can't search for Datasets
  7. WorkerLost where are you and how can I get rid of pending tasks?
  8. Provisioning
  9. Improperly configured config.json
  10. fab vm update fails
  11. Virtual machine
  12. Are all services running?
  13. Supervisorctl can't connect
  14. Supervisor fails starting and stopping any service
  15. PostgreSQL server crash
  16. Re-associate a lost VM with Vagrant
  17. Vagrant ssh authentication failure
  18. Time in the Vagrant VM is drifting out of sync
  19. Remount shared folders
  20. Generate ToolDefinitions
  21. PyCharm

General

Pip install fails on Mac OS X 10.10 because of missing ffi.h

If pip fails to install dependencies with fatal error: 'ffi.h' file not found, check whether Xcode's command-line developer tools are properly installed:

$ xcode-select --install

Credits: http://stackoverflow.com/a/30453414/981933

PhantomJS fails

If PhantomJS fails to start for whatever reason, it is most likely an issue of who called grunt. In general, grunt is called by the VM, which in turn downloads PhantomJS and triggers its compilation. If you then call PhantomJS from outside the VM using a different OS, e.g. Mac OS X, the PhantomJS binary won't work. To get around this issue you can pass --host to grunt, which tells karma to use a global installation, i.e. you should have a global phantomjs binary in your path. Note that if you pass --host without having a global phantomjs binary, Grunt will complain.
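For example (a sketch; "test" stands in for whatever Grunt task you are running, and the phantomjs npm package is one way to obtain a global binary):

# install a global phantomjs binary so it ends up in your PATH
npm install -g phantomjs
# tell karma (via grunt) to use the global installation
grunt test --host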

Select.2 is missing

If you see an error saying that select.2 is missing, then you need to re-compile and re-build the UI. The Bower package of select.2 doesn't come with a minified version, so Grunt needs to minify the source manually. Run:

grunt make

Empty facets

If you see unrelated and empty facets and you've run vagrant destroy at some point, then you need to rebuild Solr's node index:

./manage.py rebuild_index --using=data_set_manager --batch-size=25

Can't search for Datasets

If you are unable to search for datasets on the dashboard, you need to rebuild Solr's node index: https://github.com/parklab/refinery-platform/wiki/Solr-development#updating-indexes
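In a development VM this is typically the same command as for empty facets above (see the linked page for other index options):

./manage.py rebuild_index --using=data_set_manager --batch-size=25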

WorkerLost where are you and how can I get rid of pending tasks?

If a worker is lost

WorkerLostError: Worker exited prematurely.

but the VM keeps running hot, then Celery is probably still working hard and will keep doing so even after the VM has been restarted. To clear pending tasks, log into the VM's Django shell (./manage.py shell) and execute the following:

from celery.task.control import discard_all
discard_all()

Make sure that Celery is running at execution time. Afterwards you can restart Celery with supervisorctl shutdown && supervisord and things should work smoothly again.
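If your Django version supports shell -c (Django 1.11+; on older versions run the two Python lines in the interactive shell instead), the whole sequence can be scripted, as a sketch:

# discard all pending Celery tasks
./manage.py shell -c "from celery.task.control import discard_all; print(discard_all())"
# restart all supervisor-managed services, including the Celery workers
supervisorctl shutdown && supervisord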

Provisioning

Improperly configured config.json

If you see an error saying something like:

ImproperlyConfigured: Missing setting 'VARIABLE_NAME'

that means you're trying to update an existing VM but haven't compared config.json with config.json.erb. Every now and then settings change, get added, or get removed. For example, the phone number of our instant pizza delivery service might have changed, the service went bankrupt, or we switched to an instant green salad delivery service because salad is healthier.

The Vagrant provisioning scripts will not overwrite a config.json that already exists because it might have been customized. Instead, you should manually compare both files and make adjustments accordingly. The config.json.erb file is a templated JSON file (so that various settings can differ on Vagrant versus AWS). The command-line tool erb can be used to render the template into a proper JSON file if you want to use that for comparison: erb config.json.erb > config.json.sample.
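For example (config.json.sample is just a scratch name for the rendered output):

# render the template and compare it against your existing config
erb config.json.erb > config.json.sample
diff -u config.json config.json.sample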

fab vm update fails

You may need to start ssh-agent:

$ eval `ssh-agent`
$ ssh-add ~/.ssh/id_rsa

Virtual machine

Are all services running?

Check status of services:

supervisorctl status

Restart all services at once:

supervisorctl restart all

Supervisorctl can't connect

Start supervisord:

supervisord

Supervisor fails starting and stopping any service

Sometimes supervisor cannot get any service to run or to stop. For example, you will see errors like:

$ supervisorctl status
celerybeat                       FATAL     Exited too quickly (process log may have details)
celerycam                        FATAL     Exited too quickly (process log may have details)
celeryd:worker1                  FATAL     Exited too quickly (process log may have details)
celeryd:worker2                  FATAL     Exited too quickly (process log may have details)
runserver                        FATAL     Exited too quickly (process log may have details)
$ supervisorctl restart all
celerycam: ERROR (abnormal termination)
celeryd:worker1: ERROR (abnormal termination)
celerybeat: ERROR (abnormal termination)
runserver: ERROR (abnormal termination)
celeryd:worker2: ERROR (abnormal termination)

Try to restart supervisor:

$ supervisorctl shutdown
$ supervisord
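Before restarting, the process logs usually explain the failure ("process log may have details"); supervisorctl can show them directly:

# show the tail of a failing process's log
supervisorctl tail celerybeat
# or follow it live
supervisorctl tail -f celerybeat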

PostgreSQL server crash

OperationalError at /
could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Celery leaks memory when DEBUG=True in Django settings. If the PostgreSQL server crashes or other services fail to start, check memory use with free -m. If the amount of free memory is below 250M, run supervisorctl restart celerycam from the Refinery virtualenv and restart the PostgreSQL server with sudo /usr/sbin/service postgresql start.
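The recovery sequence from above, collected in one place:

# check available memory (values in MB)
free -m
# if free memory is below ~250M, restart celerycam (from the Refinery virtualenv)
supervisorctl restart celerycam
# bring the PostgreSQL server back up
sudo /usr/sbin/service postgresql start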

Re-associate a lost VM with Vagrant

After an unsuccessful shutdown, Vagrant may lose the association with the actual VM. This is the case if Vagrant suddenly starts to provision a fresh VM even though you have already set up Refinery. To re-associate the lost VM, either destroy and re-provision the VM or follow these steps:

  1. Go to Refinery's root and activate the deployment environment

    $ workon refinery-deployment
    
  2. List all VMs

    $ VBoxManage list vms
    

    You will see something like this

    "refinery_default_1431955986189_84447" {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
    
  3. Copy the ID of your original VM, i.e. xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

  4. Go to

    $ cd ./.vagrant/machines/default/virtualbox
    
  5. Create or edit the file id

    $ nano ./id
    
  6. Insert your VM's ID from step 3 (or use the one-line shortcut shown after this list)

  7. Save the file.

  8. Go back to the root and start Vagrant

    $ vagrant up
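
Alternatively, steps 4 through 7 can be collapsed into a single command (a sketch, assuming the default Vagrant machine name "default"):

# write the VirtualBox VM ID into Vagrant's machine id file
echo -n "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" > ./.vagrant/machines/default/virtualbox/id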
    

If you see Warning: Authentication failure. Retrying... it is most likely that the VM's and Vagrant's public/private keys are out of sync. To sync them manually, follow the steps in Vagrant ssh authentication failure below.

Vagrant ssh authentication failure

Since Vagrant v1.7, keys are re-generated at every logout. When your VM hasn't been shut down properly, it can happen that Vagrant's and the VM's private/public keys are out of sync. To manually sync them again, follow these steps:

  1. Generate a public key from your Vagrant's private key. (Make sure you are in Refinery's root directory.)

    ssh-keygen -f ./.vagrant/machines/default/virtualbox/private_key -y
    

    This will generate something like this:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDRwjUEWSbQoe6FWOeaAuFQ0FmxDJRU+L+JdeT5rGS09U4KGnZTARXKmkZWFuc8C3UZt9cpU7ydt0sSMvYviyL9BFtCALNqgf7xw7G9wyfBTwkwIhZ4oQLzzfCo/zKPUoJ2MUaK9tAYBy+TTFjertqkXDwUibNtuK4+tJ223pHRDtuaA2hUdY90NKiBr0VGp+sU+MxHCIJ4FnN7lgm42Jt/sJo0H8J6VCUr1SETrPpvAvs1Y/npoV6b6ksWSakvT3KAxLZdqYBluCRS5zSEd7Oi8wf14MqrZ9pIVq5X5d5LZQCtvHDHOPf1hnumZAiFWHgQpOSbJJmRaRrXxOGa897/
    
  2. Copy everything except the leading ssh-rsa (i.e. only the base64-encoded key).

  3. Manually ssh into Refinery with the password vagrant

    vagrant ssh
    
  4. Open authorized_keys

    nano ~/.ssh/authorized_keys
    
  5. Replace the existing key with the one from step 2. Be sure not to remove vagrant from the end of the line.

  6. Leave the VM and try ssh-ing in again; it should now work.

    vagrant ssh
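
The whole procedure can also be scripted (a sketch; it assumes vagrant ssh falls back to password authentication while the keys are out of sync, and it overwrites authorized_keys with just the regenerated key):

# regenerate the public key from Vagrant's private key and install it inside the VM in one go
ssh-keygen -y -f ./.vagrant/machines/default/virtualbox/private_key | vagrant ssh -c 'cat > ~/.ssh/authorized_keys'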
    

Time in the Vagrant VM is drifting out of sync

Install VirtualBox Guest Additions and run the following inside the VM:

sudo /usr/sbin/VBoxService --timesync-set-start

Remount shared folders

Sometimes we've seen Vagrant report that it can't mount VirtualBox shared folders:

Vagrant was unable to mount VirtualBox shared folders. This is usually
because the filesystem "vboxsf" is not available. This filesystem is
made available via the VirtualBox Guest Additions and kernel module.
Please verify that these guest additions are properly installed in the
guest. This is not a bug in Vagrant and is usually caused by a faulty
Vagrant box. For context, the command attempted was:

set -e
mount -t vboxsf -o uid=`id -u vagrant`,gid=`getent group vagrant | cut -d: -f3` vagrant /vagrant
mount -t vboxsf -o uid=`id -u vagrant`,gid=`id -g vagrant` vagrant /vagrant

The error output from the command was:

: No such device
  • Solution: run this in your VM and reload: sudo ln -sf /opt/VBoxGuestAdditions-5.1.20/lib/VBoxGuestAdditions/mount.vboxsf /sbin/mount.vboxsf

Note: your installed version of VBoxGuestAdditions may differ; adjust the version in the command accordingly.

Generate ToolDefinitions

If you run into an error like "There is a known error when trying to import a Galaxy Workflow from a file that utilizes asterisked workflow_outputs", you can resolve it by performing the following steps in your Galaxy Workflow Editor:

  • Unselect any asterisked workflow outputs and save your workflow
  • Reselect the same workflow outputs and save your workflow again
  • Re-run ./manage.py load_tools --workflows

PyCharm

Problem: git push fails with java.io.IOException: Cannot negotiate, proposals do not match. [Solution](https://stackoverflow.com/a/31238453): go to Settings --> Version Control --> Git and choose Native in the SSH executable dropdown.
