
Reduce artimed extras #172

Merged
merged 27 commits into from
Jul 7, 2016
Conversation

mvdbeek
Collaborator

@mvdbeek mvdbeek commented Jun 15, 2016

Summary of changes:

  • Use startup.sh script for launching ansible and starting up supervisor
  • Strip artimed_extras to the data manager functionality only
  • Simplify get_tool_list_from_galaxy.py script (no need for admin api key anymore)
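The startup flow in the first bullet can be sketched as a minimal entrypoint script. This is a hypothetical sketch, not the startup.sh from this PR; the inventory default, playbook name, and supervisord invocation are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of a startup.sh entrypoint; the inventory default and
# file names are assumptions, not the actual script from this PR.
set -e
INVENTORY="${1:-inventory_files/artimed}"

# Provision with the requested inventory (guarded so the sketch also runs
# where ansible or the playbook is absent).
if command -v ansible-playbook >/dev/null 2>&1 && [ -f galaxy.yml ]; then
    ansible-playbook -i "$INVENTORY" galaxy.yml -c local
fi

# `exec` replaces this shell, so the supervisor daemon inherits PID 1 and
# receives termination signals directly, which allows graceful stopping of
# the managed processes. A real script would `exec supervisord -n`; the
# echo below is a runnable stand-in.
exec echo "would exec: supervisord -n (inventory: $INVENTORY)"
```

The key point is the final exec: without it, the shell stays as PID 1 and supervisord never sees the termination signal sent on container or VM shutdown.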

@drosofff
Member

We should rename the artimed_extras role to galaxykickstart

@mvdbeek
Collaborator Author

mvdbeek commented Jun 15, 2016

We should rename the artimed_extras role to galaxykickstart

I think we should remove it completely and move the data managers part to a new data managers role (If we decide we want to keep this functionality)

@drosofff
Member

I am OK with completely removing it and adding a data_managers role.

@mvdbeek mvdbeek force-pushed the reduce_artimed_extras branch 3 times, most recently from 1e144ff to 8b9e6b6 on June 17, 2016 11:36
@mvdbeek
Collaborator Author

mvdbeek commented Jun 17, 2016

I propose to create a script folder at the root, with install_tool_shed_tools.py, generate_tool_list_from_ga_workflow_files.py and other scripts to come (I will work on a script that creates a group_vars file, inventory, etc., from a workflow and/or a tool list).

Can you outline this in an issue? I am looking at this as well.

Truncated commit message fragments from this push:

  • …therefore move the action up from roles/galaxy.movedata/tasks/import.yml to roles/set_supervisor_env_vars/tasks/main.yml
  • This startup script takes the inventory to be used as an argument and passes the process on to supervisor as PID 1 (which allows graceful stopping of processes).
  • …to work without an admin API key for Galaxy newer than 16.01.
  • …so that an empty list simply causes the task to be skipped.
  • …and rename artimed_extras to data_managers.
  • update galaxy role and switch back to galaxyproject galaxy-extras role
  • …galaxy_tools_admin_user_password to group_vars
@mvdbeek
Collaborator Author

mvdbeek commented Jun 18, 2016

@drosofff this is ready for review!

@@ -1,9 +1,9 @@
 [submodule "roles/galaxyprojectdotorg.galaxy-tools"]
 	path = roles/galaxyprojectdotorg.galaxy-tools
-	url = https://github.com/galaxyproject/ansible-galaxy-tools.git
+	url = https://github.com/mvdbeek/ansible-galaxy-tools.git
Member

ansible-galaxy-tools is forked in the ARTbio repo. Why not rely on that fork? (If rolling back the changeset revision to the one you are providing is required, that is not a problem for me.)

Collaborator Author

@mvdbeek mvdbeek Jun 19, 2016

Because we're automatically synchronizing this from upstream in a crontab, I'm afraid we may break that synchronization. The changes are already in galaxyproject/ansible-galaxy-tools#31, so as soon as that gets merged we will have them in the ARTbio fork as well (which I consider to be a backup, as discussed in #62).

Member

OK. When you say in #62:

I put this script in my crontab:
https://gist.github.com/mvdbeek/37b77326e6921f963993

with this we are updating our forks every 2 hours.

Where is your crontab running?

Collaborator Author

My iMac in the lab.

Member

Well, I am sure we can do better for the sustainability of the synchronization.

@drosofff
Member

OK, looks nice 👍
I would like to run a couple of installations with the reorganized roles before merging.

@drosofff
Member

I get this error from the ansible-playbook run on the branch:

TASK [galaxyprojectdotorg.galaxy-tools : Install Tool Shed tools] **************
failed: [localhost] => (item=extra-files/artimed/artimed_tool_list.yml) => {"changed": true, "cmd": ["/tmp/venv/bin/python", "install_tool_shed_tools.py", "-t", "artimed_tool_list.yml", "-a", "admin", "-g", "localhost"], "delta": "0:00:00.182402", "end": "2016-06-19 14:49:10.523772", "failed": true, "item": "extra-files/artimed/artimed_tool_list.yml", "rc": 1, "start": "2016-06-19 14:49:10.341370", "stderr": "Traceback (most recent call last):\n  File \"install_tool_shed_tools.py\", line 590, in <module>\n    install_tools(options)\n  File \"install_tool_shed_tools.py\", line 471, in install_tools\n    itl = installed_tool_revisions(gi)  # installed tools list\n  File \"install_tool_shed_tools.py\", line 170, in installed_tool_revisions\n    itl = tsc.get_repositories()\n  File \"/tmp/venv/local/lib/python2.7/site-packages/bioblend/galaxy/toolshed/__init__.py\", line 36, in get_repositories\n    return Client._get(self)\n  File \"/tmp/venv/local/lib/python2.7/site-packages/bioblend/galaxy/client.py\", line 147, in _get\n    raise ConnectionError(msg)\nbioblend.galaxy.client.ConnectionError: GET: error 403: '{\"err_msg\": \"Provided API key is not valid.\", \"err_code\": 403001}', 0 attempts left: None", "stdout": "", "stdout_lines": [], "warnings": []}

Apparently an API key issue.

However, I had to sync/update the submodules that were changed in the branch, and I cannot guarantee that this is not the problem. From my git session:

From https://github.com/mvdbeek/ansible-galaxy-tools
 * [new branch]      install_individual_tools -> origin/install_individual_tools
 + 036bb22...c259caa master     -> origin/master  (forced update)
 * [new branch]      predefined_api_key -> origin/predefined_api_key
 * [new branch]      timeout    -> origin/timeout
Submodule path 'roles/galaxyprojectdotorg.galaxy-tools': checked out 'c259caa75621dadea8280dfa7d06db9df1c122bd'

@mvdbeek
Collaborator Author

mvdbeek commented Jun 19, 2016

Submodule path 'roles/galaxyprojectdotorg.galaxy-tools': checked out 'c259caa75621dadea8280dfa7d06db9df1c122bd'

Yep, this should be 188e7cd136052f1e00efa3d19ffbcd9fe8f29dd5. In the ansible-artimed repo try a git submodule sync && git submodule update.
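For anyone hitting the same checked-out-commit mismatch, the recovery can be demonstrated end to end. Everything below except the two recovery commands is scaffolding with made-up repository names; inside an actual ansible-artimed checkout you would only run git submodule sync and git submodule update:

```shell
# Self-contained demo of the recovery using throwaway repositories.
set -e
demo=$(mktemp -d)

# Toy "role" repository standing in for ansible-galaxy-tools:
git init -q "$demo/role"
git -C "$demo/role" -c user.email=demo@example.org -c user.name=demo \
    commit -q --allow-empty -m "initial commit"

# Toy superproject standing in for ansible-artimed:
git init -q "$demo/playbook"
git -C "$demo/playbook" -c protocol.file.allow=always \
    submodule --quiet add "$demo/role" roles/galaxy-tools
git -C "$demo/playbook" -c user.email=demo@example.org -c user.name=demo \
    commit -q -m "add submodule"

# The recovery itself: re-read submodule URLs from .gitmodules, then check
# out the exact commit recorded in the superproject tree.
git -C "$demo/playbook" submodule --quiet sync
git -C "$demo/playbook" -c protocol.file.allow=always submodule --quiet update --init

# Verify which commit each submodule now sits on:
git -C "$demo/playbook" submodule status
```

The sync step matters here because this branch changed the fork URL in .gitmodules; update alone would keep fetching from the stale URL recorded in .git/config.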

@mvdbeek
Collaborator Author

mvdbeek commented Jun 19, 2016

@mvdbeek please test vagrant up, too.

Works for me, but I ran this on a new machine. I suspect the problem comes from updating docker.

"rmtree failed: [Errno 16] Device or resource busy: '/var/lib/docker/devicemapper'"}
    to retry, use: --limit @galaxy.retry

This is a lockup; can you do a supervisorctl stop docker and see if it passes through?

@drosofff
Member

TASK [data_managers : Run data managers] ***************************************
failed: [localhost] => (item=extra-files/artimed/artimed_data_manager_tasks.yml) => {"changed": true, "cmd": ["/tmp/venv/bin/python", "install_tool_shed_tools.py", "-d", "extra-files/artimed/artimed_data_manager_tasks.yml", "-a", "admin", "-g", "localhost"], "delta": "0:00:00.116701", "end": "2016-06-19 19:10:39.312228", "failed": true, "item": "extra-files/artimed/artimed_data_manager_tasks.yml", "rc": 1, "start": "2016-06-19 19:10:39.195527", "stderr": "Traceback (most recent call last):\n  File \"install_tool_shed_tools.py\", line 654, in <module>\n    run_data_managers(options)\n  File \"install_tool_shed_tools.py\", line 415, in run_data_managers\n    kl = load_input_file(dbkeys_list_file)  # Input file contents\n  File \"install_tool_shed_tools.py\", line 126, in load_input_file\n    with open(tool_list_file, 'r') as f:\nIOError: [Errno 2] No such file or directory: 'extra-files/artimed/artimed_data_manager_tasks.yml'", "stdout": "", "stdout_lines": [], "warnings": []}

RUNNING HANDLER [galaxyprojectdotorg.galaxy : restart galaxy] ******************

RUNNING HANDLER [galaxyprojectdotorg.galaxy : email administrator with changeset id] ***

PLAY RECAP *********************************************************************
localhost                  : ok=156  changed=92   unreachable=0    failed=1

@mvdbeek
Collaborator Author

mvdbeek commented Jun 20, 2016

@drosofff
This should work now (and it has been broken ever since we moved these files to extra-files, if it ever worked). The reason we only notice this now is that I have removed all the run_* variables (like run_data_managers).

@drosofff
Member

It is not:

commit 5822fbc5a645bb0a4310d1051c540955b1dd29e8
Author: Marius van den Beek <m.vandenbeek@gmail.com>
Date:   Mon Jun 20 09:17:31 2016 +0200

    When copying task lists, only pass basename to install_tool_shed_tools.py

ansible-playbook -i inventory_files/artimed galaxy.yml on a fresh IFB instance

TASK [data_managers : Remove data manager task file] ***************************
failed: [localhost] => (item=extra-files/artimed/artimed_data_manager_tasks.yml) => {"failed": true, "item": "extra-files/artimed/artimed_data_manager_tasks.yml", "msg": "rmtree failed: [Errno 2] No such file or directory: '/tmp/ccRRi7Mf.s'"}

RUNNING HANDLER [galaxyprojectdotorg.galaxy : restart galaxy] ******************

RUNNING HANDLER [galaxyprojectdotorg.galaxy : email administrator with changeset id] ***

PLAY RECAP *********************************************************************
localhost                  : ok=158  changed=94   unreachable=0    failed=1

Please test your commits yourself, because I have no time to do it anymore this week.

@mvdbeek
Collaborator Author

mvdbeek commented Jun 20, 2016

Please test your commits yourself cause I have no time to do it anymore this week

I have tested this in vagrant, where it works. If you don't have time to test it this week then don't test it, I will move forward.

@drosofff
Member

drosofff commented Jul 1, 2016

Coming back to this PR after numerous rounds of testing and retesting in vagrant, the IFB cloud, the AWS cloud...

First issue

How does this branch currently diverge from the gcc2016 branch? Are the modifications in the Vagrantfile the only changes? If yes, the question is: shall we merge this branch, or gcc2016?

Second issue

I think that the role ansible-galaxy-tools/tasks/main.yml should contain additional code such as

- include: restart_galaxy.yml
  when: galaxy_tools_install_tools  # this condition is even optional in my opinion

otherwise, the new playbook implies that you have to manually restart Galaxy, which is a regression from the current master. I understand that this comes from a notify statement in another role, whose notification should also be removed if we restart Galaxy in our playbook.
I have tested this additional code and it seems to work.

Third (most important) issue

There is a complex issue (at least for me) with the /tmp directory: probably with the permissions of /tmp, but it could also be its deletion... or its non-deletion.
The fact is that /tmp is implicated in various errors when you play or replay the playbook with different inventory files.

Here is an example:

TASK [galaxyprojectdotorg.galaxy-tools : Create Galaxy bootstrap user] *********
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["/home/galaxy/galaxy/.venv/bin/python", "manage_bootstrap_user.py", "-c", "/home/galaxy/galaxy/config/galaxy.ini", "create", "-e", "admin@galaxy.org", "-u", "cloud", "-p", "admin", "-a", "admin"], "delta": "0:00:02.506624", "end": "2016-06-29 16:51:12.106424", "failed": true, "rc": 1, "start": "2016-06-29 16:51:09.599800", "stderr": "Traceback (most recent call last):\n  File \"manage_bootstrap_user.py\", line 230, in <module>\n    log = _setup_global_logger()\n  File \"manage_bootstrap_user.py\", line 86, in _setup_global_logger\n    file_handler = logging.FileHandler('/tmp/galaxy_tools_bootstrap_user.log')\n  File \"/usr/lib/python2.7/logging/__init__.py\", line 903, in __init__\n    StreamHandler.__init__(self, self._open())\n  File \"/usr/lib/python2.7/logging/__init__.py\", line 928, in _open\n    stream = open(self.baseFilename, self.mode)\nIOError: [Errno 13] Permission denied: '/tmp/galaxy_tools_bootstrap_user.log'", "stdout": "", "stdout_lines": [], "warnings": []}
    to retry, use: --limit @galaxy.retry

PLAY RECAP *********************************************************************
localhost                  : ok=120  changed=18   unreachable=0    failed=1

But it can also happen that restarting supervisorctl (galaxy:uwsgi) fails due to the absence of /tmp (probably deleted in a previous playbook round). You can recover just by running mkdir /tmp && chmod 777 /tmp.

And last, but not least, I finally figured out why the installation of the deseq2 package systematically fails with the new simplified playbook:
it is precisely the absence of /tmp, and/or overly restrictive access rights if it already exists.
From the Galaxy admin panel, the repair of this tool (which includes reinstalling libxml) won't work until you manually run mkdir /tmp && chmod 777 /tmp.
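A small aside on the mkdir /tmp && chmod 777 /tmp workaround: the conventional mode for /tmp is 1777 (world-writable plus the sticky bit), so chmod 1777 is slightly safer. A runnable sketch, demonstrated on a scratch directory rather than the real /tmp:

```shell
# Recreate a deleted /tmp with conventional permissions. Demonstrated on a
# scratch directory here; on the affected machine the target would be /tmp.
target=$(mktemp -d)/tmp   # stand-in for /tmp in this sketch
mkdir -p "$target"
# 1777 = rwx for everyone plus the sticky bit, so users can only delete
# their own files; plain 777 would let any user delete anyone's files.
chmod 1777 "$target"
ls -ld "$target"
```

With plain 777, any process (including the galaxy user's tool installs) could delete another user's temporary files, which is one plausible source of the flaky behavior described above.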

In summary, the behavior of the /tmp directory over the course of the playbook run is not clear to me, because I understand it can be manipulated by several submodules, including our galaxy-tools submodule. But I feel this important /tmp handling is still a bit wobbly (not well automated yet).

Fourth issue

The data_managers role is not crystal clear to me either. Is it really an important feature, or just a leftover from the previous playbook?


Finally, I would very much like to merge this PR (or a PR from gcc2016 if equivalent) into master, to move forward, but without regressions in the automation. As Bjorn said, we are working for usability, not for geeks.

file: dest={{ galaxy_tools_base_dir }}/install_tool_shed_tools.py state=absent

- name: Remove data manager task file
file: src={{ item }} dest={{ galaxy_tools_base_dir }}/ state=absent
Member

should be

  file: dest={{ galaxy_tools_base_dir }}/{{ item|basename }} state=absent

otherwise, /tmp is deleted and replaying the playbook fails
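To make the effect of the |basename filter concrete, here is a small shell illustration; the paths come from the failure logs above, and treating galaxy_tools_base_dir as /tmp is an assumption based on this thread:

```shell
base=/tmp                  # assumed value of galaxy_tools_base_dir
item=extra-files/artimed/artimed_data_manager_tasks.yml

# Buggy task: dest={{ galaxy_tools_base_dir }}/ resolves to the base
# directory itself, so state=absent removes /tmp recursively.
echo "buggy dest: ${base}/"

# Fixed task: dest={{ galaxy_tools_base_dir }}/{{ item|basename }} targets
# only the file that was copied into the base directory.
echo "fixed dest: ${base}/$(basename "$item")"
```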

Member

That said, the data_managers role seems obsolete.

Collaborator Author

Absolutely, I agree.

@mvdbeek
Collaborator Author

mvdbeek commented Jul 4, 2016

First issue

How does this branch currently diverge from the gcc2016 branch? Are the modifications in the Vagrantfile the only changes? If yes, the question is: shall we merge this branch, or gcc2016?

Yes, those are the only changes... I wanted to demo ansible without the automatic provisioning that vagrant up does. I would prefer to merge the gcc2016 branch, but ultimately I don't think this is important.

Second issue

I think that the role ansible-galaxy-tools/tasks/main.yml should contain additional code such as:

  • include: restart_galaxy.yml
    when: galaxy_tools_install_tools # this condition is even optional in my opinion

otherwise, the new playbook implies that you have to manually restart Galaxy, which is a regression from the current master. I understand that this comes from a notify statement in another role, whose notification should also be removed if we restart Galaxy in our playbook.
I have tested this additional code and it seems to work.

I am intentionally removing these things, as they should be done once and only once when the play has finished, from inside the play, not the role. (The role should only notify of a necessary restart, while the play implements the restart; I'll add this before merging the PR.) All this restarting unnecessarily slows down the playbook, which in turn limits the amount of testing we can do in Travis.

For the third issue, it comes down to #172 (comment), which should solve most of these problems. The underlying problem is that the tool installation script is copied and removed, which doesn't really make sense. It should become part of ephemeris, and then we just install ephemeris.

@@ -1,6 +1,9 @@
- name: start galaxy
supervisorctl: name='galaxy:' state=started

- name: restart galaxy
supervisorctl: name='galaxy:' state=restarted

- name: restart galaxy handler
Member

Is this one never just "start"?

@mvdbeek mvdbeek force-pushed the reduce_artimed_extras branch 2 times, most recently from 38e21e7 to d3cd54b on July 5, 2016 18:29
@@ -1 +1 @@
-Subproject commit fda11a2e1fe72c5a425079fcf50369f318287217
+Subproject commit 48750d41f75de22dbd4059ae72fee2e7f6f4673e
Member

Shouldn't this be updated to 34ec0ce959c8b4a2f95c1c7f19104cffbd73696e?

@drosofff drosofff merged commit f95df15 into master Jul 7, 2016
@drosofff drosofff deleted the reduce_artimed_extras branch June 8, 2017 09:16