
Add support for bioconda picard installation. #22

Closed
wants to merge 9 commits

Conversation

@karl616 (Contributor) commented Jul 27, 2018

  • rely on the bioconda wrapper being a symbolic link to the
    folder containing picard.jar

@leepc12 (Contributor) commented Jul 27, 2018

Thanks for your PR. We will test it and let you know.

BTW, there are some issues with readlink -f on macOS, but there is a workaround.
Example:

#DEST_DIR=$(readlink -f $2)   # GNU-only; BSD readlink on macOS has no -f
DEST_DIR=$(cd "$(dirname "$2")" && pwd -P)/$(basename "$2")   # portable, quoted to survive spaces

@karl616 (Contributor, Author) commented Jul 28, 2018 via email

@karl616 (Contributor, Author) commented Jul 30, 2018

I shifted it all into Python.
For testing, what do you normally do? Is the test example sufficient?

@karl616 (Contributor, Author) commented Aug 1, 2018

Hi @leepc12,
I'm setting up a small test set based on ENCSR356KRQ. That way I can at least make sure that the code I generate executes locally.

I'm a bit confused with regard to phantompeakqualtools: its documentation states that spp v1.14 is required, and you also provide the custom archive.

Here you are using spp v1.13; at least it looks that way in the environment file, and the custom installation of spp is disabled in the installation script.

==> two questions:

  1. What was changed in the custom v1.14? Can we use the stock version?
  2. What version of spp is needed by phantompeakqualtools? v1.14 or v1.13?

@karl616 (Contributor, Author) commented Aug 1, 2018

I understood one reason for the custom conda installation today: macs2 requires python2 and idr requires python3. One solution would be to have multiple environments and load them on demand. I came up with one design, but as this is my first try with WDL, I would appreciate feedback before I start testing:

task trim_adapter {
        ....
        # environment
        Boolean? useConda

        command {
                # activate the py3 environment only when requested
                if [[ ${select_first([useConda,false])} == "true" ]]; then
                        source activate encode-atac-seq-pipeline-python3
                fi
                python $(which encode_trim_adapter.py) \
                        ${write_tsv(fastqs)} \
                        --adapters ${write_tsv(adapters)} \
                        ${if paired_end then "--paired-end" else ""} \
                        ${if select_first([auto_detect_adapter,false]) then "--auto-detect-adapter" else ""} \
                        ${"--min-trim-len " + select_first([min_trim_len,5])} \
                        ${"--err-rate " + select_first([err_rate,'0.1'])} \
                        ${"--nth " + select_first([cpu,2])}
                # drop back out of the environment afterwards
                if [[ ${select_first([useConda,false])} == "true" ]]; then
                        source deactivate
                fi
        }
        ....
}

Alternatively, it could be made part of the Python script. I'm not sure which is best, or whether it is a viable solution at all. What do you think?

@vsoch commented Aug 1, 2018

Just a heads up: Gael announced that scikit-learn is dropping support for Python 2; see the red box here:

http://gael-varoquaux.info/programming/sprint-on-scikit-learn-in-paris-and-austin.html


And it's actually part of a larger effort - if you click the "other projects" link there:

http://python3statement.org/

This likely doesn't require immediate action, but I would suggest a general mindset of developing toward Python 3 and sunsetting Python 2, if at all possible. If macs2 requires Python 2, then macs2 should be updated for Python 3, and not the other way around.

@karl616 (Contributor, Author) commented Aug 1, 2018 via email

@leepc12 (Contributor) commented Aug 1, 2018

@karl616: spp 1.14 was released to fix compilation issues for R (>=3.3.0), but our pipeline uses R 3.2.x. I just wanted to keep the latest spp in the phantompeakqualtools repo when I migrated it from Google Code (currently 1.15 is the latest, though). Both 1.13 and 1.14 produce the same output for our pipeline, and 1.14 was harder to install (it needed more dependencies), so I chose 1.13 for the pipeline.

@leepc12 (Contributor) commented Aug 1, 2018

About Python versions: the old pipeline has two separate environments for py2 and py3, just like you implemented above. It activates the py3 env only for IDR tasks.

But for this new pipeline, I didn't want to have any Conda-specific things in the WDL file and input JSON. There is a workaround to use both py2 and py3 at the same time. One major drawback of this hack is that users need to remove their own py2-based Conda from their PATH and install a new py3-based Conda for the pipeline. All py3-based apps like IDR should be installed in the root environment and called explicitly with python3 app.py.

I hate this workaround too. We will eventually remove all py2 dependencies and move on to py3. MACS2 is the only problem.
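
For reference, a minimal sketch of that layout (the paths, env name, and app names here are illustrative, not the pipeline's actual installer):

# the py3-based Conda is the only one on PATH
export PATH="$HOME/miniconda3/bin:$PATH"

# py2 tools (e.g. MACS2) live in a py2 child environment
conda create -n encode-atac-seq-pipeline python=2

# py3 apps (e.g. IDR) sit in the root environment and are
# invoked explicitly through the py3 interpreter:
python3 app.py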

@vsoch commented Aug 1, 2018

If you could use a container, you wouldn't need to worry about these things.

@karl616 (Contributor, Author) commented Aug 2, 2018

@vsoch containers are very convenient, and WDL has the runtime docker parameter as well. Still, I'm one of the sometimes underprivileged users (no root access and restrictive admins), which is why I would want a pure conda solution.

@leepc12 I understand the problem now. Thanks for the explanation. It would be nice if WDL had built-in conda support. I know nextflow is going that way.

So no conda in the pipeline... Perhaps if IDR were moved from the root environment to its own environment, and the same linking trick were applied, it would be possible to avoid requiring a clean conda installation. I will try that.

@karl616 (Contributor, Author) commented Aug 2, 2018

My initial test on a small subset was positive. The requirements.txt file anticipates that phantompeakqualtools will be accepted into bioconda, but beyond that, and some extensive clean-up, I find this installation process easier.

It also installs on top of an existing conda instance.

@leepc12 (Contributor) commented Aug 2, 2018

Thanks @karl616. I tested your PR and it worked fine except for phantompeakqualtools. Once phantompeakqualtools is available on a conda repo, I will test again and let you know.

BTW, I cannot remove the MACS2 py egg extraction workaround in the installer script:
kundajelab/atac_dnase_pipelines#72

Karl Nordström added 3 commits August 3, 2018 09:21
…ipeline into addCondaPicardSupport

Conflicts:
	installers/install_dependencies.sh
@karl616 (Contributor, Author) commented Aug 3, 2018

OK, phantompeakqualtools is there. It should work. One thing I didn't mention earlier was that I also added cromwell as a dependency. It's not directly part of the pipeline, but having it in the bundle removes one step for the end-user.

I undid the removal of the egg extraction. In a similar vein, I wasn't sure about this:
https://github.com/karl616/atac-seq-pipeline/blob/fb5f4255e932394c3a81a112cce95bcd3162ec7d/installers/install_dependencies.sh#L29-L32
It is currently excluded and works on my system, but I don't have access to all the various backends you support.

@karl616 changed the title from "Add support for bioconda picard installtions." to "Add support for bioconda picard installations." Aug 3, 2018
@karl616 changed the title from "Add support for bioconda picard installations." to "Add support for bioconda picard installation." Aug 3, 2018
@karl616 (Contributor, Author) commented Aug 3, 2018

If this environment setup gets through the tests, we could also simplify the container generation (I think). (And I apologize if I dismissed the idea too harshly earlier; some of my systems are too backwards, and I was one step behind in the thought process.)

What should be possible is to simply outsource the generation of containers to BioContainers, which is an extension/partner of bioconda.

http://biocontainers.pro/

They have one container per bioconda recipe, but also provide multi-package containers:

http://biocontainers.pro/multi-package-containers

I have to figure out if it is possible with all these legacy dependencies, but if so, that would relieve you of supporting containers along with the pipeline.

BioContainers provides both Docker containers:

https://quay.io/organization/biocontainers

and Singularity containers:

https://depot.galaxyproject.org/singularity/

Perhaps this belongs in a separate pull request, and some of the bioconda recipes we added haven't been ported to BioContainers yet (I'm following this up), but before venturing forth I wanted to check if there is interest.

It requires some changes now, but I imagine it will make the process of updating software easier in the long run.

@karl616 (Contributor, Author) commented Aug 3, 2018

It works with older versions. Pull requests like these:

python2: https://github.com/BioContainers/multi-package-containers/pull/529/files
python3: https://github.com/BioContainers/multi-package-containers/pull/528/files

give you containers like these:

python2:
https://quay.io/repository/biocontainers/mulled-v2-c0d01ccc79bbd3fe061a321d7ead24b5abd34899
docker pull quay.io/biocontainers/mulled-v2-c0d01ccc79bbd3fe061a321d7ead24b5abd34899:723b6776d5a7f3816c8ef3035a855cff230f1e65-0
singularity:
https://depot.galaxyproject.org/singularity/mulled-v2-c0d01ccc79bbd3fe061a321d7ead24b5abd34899%3A723b6776d5a7f3816c8ef3035a855cff230f1e65-0

python3:
https://quay.io/repository/biocontainers/mulled-v2-4079a2d072f7abce3806c0603812aee84e36b5d7
docker pull quay.io/biocontainers/mulled-v2-4079a2d072f7abce3806c0603812aee84e36b5d7:fd6613618d8dafbebac12d49f3884cfa879c0ef5-0
singularity:
https://depot.galaxyproject.org/singularity/mulled-v2-4079a2d072f7abce3806c0603812aee84e36b5d7%3Afd6613618d8dafbebac12d49f3884cfa879c0ef5-0

They replicate the conda environments, which is good for reproducibility. The drawback is that you need two containers, and the naming is a bit clumsy.

Anyhow, my idea was that this could be handled within the WDL file:
http://cromwell.readthedocs.io/en/develop/RuntimeAttributes/#docker

What do you think?

@karl616 (Contributor, Author) commented Aug 7, 2018

@leepc12 the containers seem to require more work than I expected. BioContainers are built upon busybox and hence don't provide full bash functionality, which is required by built-in WDL functions like read_map. I'll see if I can find a way around it.
I'll open a new pull request if I come up with something and leave this one here for bioconda.

@leepc12 (Contributor) commented Aug 8, 2018

@karl616: Thanks for your work. Yes, let's talk more about the containers in another PR or issue.

BTW, your PR works fine, but I got slightly different MACS2 outputs between Conda and Docker due to some changes in numpy/BLAS, so I still need to figure this out. I will make a new branch derived from this PR and will let you know about progress.
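
One way to see which numpy/BLAS builds each side picked up (a quick diagnostic, not part of the pipeline):

conda list | grep -E 'numpy|openblas|mkl'        # package versions and build strings
python -c 'import numpy; numpy.show_config()'    # the BLAS numpy was compiled against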

@karl616 (Contributor, Author) commented Aug 9, 2018

@leepc12 Sounds good.
The difference is very inconvenient. I compared my original environment to the new one and there were a lot of differing versions.
To me this suggests that we should lock down the versions of more software, perhaps even all of them. One way to go is this:

https://conda.io/docs/user-guide/tasks/manage-environments.html#cloning-an-environment

That would ensure that versions stay the same.
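
For example, following that page (the env and file names are assumed):

conda list --explicit -n encode-atac-seq-pipeline > spec-file.txt    # exact package list, including builds
conda create --name encode-atac-seq-pipeline --file spec-file.txt    # recreate the identical environment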

I'm going back and setting up the old version anew in order to do a comparison. If you want assistance, perhaps you could share the output from conda list for the old and new encode-atac-seq-pipeline environments.

I suspect that we will see differences just by comparing our two installations. In my experience, conda tries to stay as up to date as possible when no version is specified.
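
To compare two installations directly (env names are placeholders):

diff <(conda list -n encode-atac-seq-pipeline-old) <(conda list -n encode-atac-seq-pipeline-new)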

@karl616 (Contributor, Author) commented Aug 9, 2018

Here are my files if you want to have a look:

First, two versions of the original pipeline, installed roughly one week apart:
encode-atac-seq-pipeline.orig.180802.txt
encode-atac-seq-pipeline.orig.180810.txt

And the new installation process:
encode-atac-seq-pipeline.new.180802.txt

My point is that only the software with fixed versions stays constant... and even then there might be new builds. Cython and biopython are two packages that changed just from installing the old environment twice...

I looked at the Dockerfile and there are quite a few differences from the conda environment... Would it be interesting to harmonize this?

I tried to do this quickly before and I messed it up...

@karl616 (Contributor, Author) commented Aug 10, 2018

leepc12 added a commit that referenced this pull request Aug 15, 2018
@leepc12 (Contributor) commented Aug 22, 2018

@karl616: I am very sorry about the very late update on this PR. We appreciate your work on it.
I am still struggling with dependency problems related to MACS2 and am still getting different outputs on different platforms, but I will finish it soon and will let you know.
Thanks again for your work.

@leepc12 mentioned this pull request Aug 27, 2018
@leepc12 closed this Sep 6, 2018