Fix/Create mock data subworkflow #206

Merged
evanroyrees merged 11 commits into dev from mock_data
Jan 5, 2022

@chasemc chasemc commented Dec 17, 2021

Part of, but does not fix, #152.

chasemc commented Dec 17, 2021

This PR allows creating mock contigs from an input set of GenBank assembly accessions. It also creates two minimal reports at the end showing the binning results: one colored by genus (parsed from the organism name) and one colored by assembly accession.

Example output:
mock_data_reports.zip
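
For illustration only, the two groupings used in the reports could be derived roughly as in the Python sketch below. It assumes the genus is the first word of the organism name (skipping a leading "Candidatus") and that accessions follow the standard GCA_/GCF_ form of nine digits plus a version number. The function names are hypothetical, and the parsing actually used for the reports in this PR may differ.

```python
import re

# Assumed accession shape: "GCA_" or "GCF_", nine digits, a dot, and a version.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def is_assembly_accession(text: str) -> bool:
    """Return True if text looks like a GenBank/RefSeq assembly accession."""
    return bool(ACCESSION_RE.match(text.strip()))


def genus_from_name(organism_name: str) -> str:
    """Return the first word of an organism name as its genus.

    A leading "Candidatus" is skipped; empty names map to "unknown".
    """
    tokens = organism_name.split()
    if tokens and tokens[0].lower() == "candidatus":
        tokens = tokens[1:]
    return tokens[0] if tokens else "unknown"
```

Checking accessions before fetching genomes would catch typos early, though it cannot predict downstream failures such as an assembly with no markers.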

@chasemc chasemc marked this pull request as ready for review December 17, 2021 21:17

chasemc commented Dec 17, 2021

Notes:
Mock data reports should write out to the main output folder.

To run the pipeline with mock data, set the parameter --mock_test true

@evanroyrees evanroyrees added the nextflow Nextflow related issues/code label Dec 21, 2021

chasemc commented Dec 21, 2021

A couple of things to fix (or not) before merging in (@WiscEvan I don't think I'll have time to do these today)

  1. This process needs a docker image. @ajlail98 could maybe look around to find one?

    // TODO: Docker image for emboss or another seqsplitter/python script

  2. This one also:

    // container TODO: The R "Rocker" Docker images don't have ps which is required by Nextflow so this may have to be a custom image?
    // docker build https://gist.githubusercontent.com/chasemc/818111640daae05beb2b070641aa33fb/raw/09107704fa60a6311fb09542c8b99b848e168ea3/Dockerfile --tag mock_data_reporter

    I have an example there, but it has to be built first. Maybe that's okay if the mock data is only going to be used by developers, in which case instructions to build the image first could be provided.
    Note: that Dockerfile is a modified version of:
    https://github.com/rocker-org/rocker/blob/master/r-rmd/Dockerfile
    in which procps is also installed (required by Nextflow), so the Rocker project license would have to be included.

  3. Last, just a note: when I happened to run this with "GCF_013307045.1", it failed because no markers were found. This may be worth looking into.

🐳 Add modules/local/get_genomes_for_mock.nf for GET_GENOMES_FOR_MOCK process
🐳 Add modules/local/mock_data_reporter.nf for MOCK_DATA_REPORT process
🐳🍏 Add container tag to processes with Dockerfiles
🎨 Add Makefile command to build local modules processes images
@evanroyrees
Collaborator

📝 I've added a tag to the GET_GENOMES_FOR_MOCK process in get_genomes_for_mock.nf so the user can easily tell how many genomes are being fetched for the mock community.

Runtime Note

  • I built the docker image jason-c-kwan/autometa:dev (docker build . -t jason-c-kwan/autometa:dev) prior to running:
nextflow run . -profile docker -params-file "nf-params.json" --mock_test true --input .

nf-params.json

{
    "autometa_image_tag": "dev"
}
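
The same setup can be scripted. The Python sketch below renders the params file shown above and assembles the command from this comment as an argument list, without running it. The helper names are hypothetical and not part of Autometa.

```python
import json
import shlex
from typing import List


def render_params(image_tag: str = "dev") -> str:
    """Return nf-params.json content selecting the given Autometa image tag."""
    return json.dumps({"autometa_image_tag": image_tag}, indent=4)


def mock_test_command(params_file: str = "nf-params.json") -> List[str]:
    """Build the mock-data run command quoted in this thread as an argv list."""
    return [
        "nextflow", "run", ".",
        "-profile", "docker",
        "-params-file", params_file,
        "--mock_test", "true",
        "--input", ".",
    ]


if __name__ == "__main__":
    # Print the file content and the shell-quoted command for inspection.
    print(render_params())
    print(shlex.join(mock_test_command()))
```

Passing the list to subprocess.run would execute it, provided the jason-c-kwan/autometa:dev image has been built first.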

Dockerfiles

I've also added Dockerfiles for the processes you mentioned. I was not sure where to put these, so I've placed them in a $HOME/Autometa/docker/modules sub-directory. If you have guidance on where they should go, feel free to move them. If you do move them, the Makefile command modules-images will need to be updated to match the new paths.

For example, to build all Autometa Nextflow module Docker images from the Makefile:

make modules-images
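
As a rough picture of what such a target might run, the Python sketch below builds one docker build command per Dockerfile under docker/modules. It assumes each image is tagged with the Dockerfile's stem (matching the mock_data_reporter tag in the docker build comment quoted in this thread) and that the build context is the repository root. The actual Makefile recipe is not shown here and may differ.

```python
from pathlib import PurePosixPath
from typing import List

# Dockerfiles named in this thread.
MODULE_DOCKERFILES = [
    "docker/modules/get_genomes_for_mock.Dockerfile",
    "docker/modules/mock_data_reporter.Dockerfile",
]


def build_command(dockerfile: str, context: str = ".") -> List[str]:
    """Return a docker build argv for one module Dockerfile.

    The tag is assumed to be the file name without ".Dockerfile".
    """
    tag = PurePosixPath(dockerfile).stem
    return ["docker", "build", "--file", dockerfile, "--tag", tag, context]


if __name__ == "__main__":
    for path in MODULE_DOCKERFILES:
        print(" ".join(build_command(path)))
```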

A couple of things to fix (or not) before merging in

  1. This process needs a docker image (modules/local/get_genomes_for_mock.nf).
  • 🐳 Docker image for get_genomes_for_mock.nf can be built with docker/modules/get_genomes_for_mock.Dockerfile
  2. This one also:

    // container TODO: The R "Rocker" Docker images don't have ps which is required by Nextflow so this may have to be a custom image?
    // docker build https://gist.githubusercontent.com/chasemc/818111640daae05beb2b070641aa33fb/raw/09107704fa60a6311fb09542c8b99b848e168ea3/Dockerfile --tag mock_data_reporter

    I have an example there, but it has to be built first. Maybe that's okay if the mock data is only going to be used by developers, in which case instructions to build the image first could be provided.
    Note: that Dockerfile is a modified version of:
    https://github.com/rocker-org/rocker/blob/master/r-rmd/Dockerfile
    in which procps is also installed (required by Nextflow), so the Rocker project license would have to be included.

I've used conda to create the R env

  • 🐳 Docker image for mock_data_reporter.nf can be built with docker/modules/mock_data_reporter.Dockerfile

@evanroyrees evanroyrees self-requested a review January 5, 2022 00:27

@evanroyrees evanroyrees left a comment

I've added Dockerfiles for the respective Autometa Nextflow module images to docker/modules. I'm not sure whether this is acceptable to the nf-core team, but everything seems to be working.

NOTE: The contents of the "beeswarm" plots appear in the mock report, but their axis titles do not. This doesn't appear to be a breaking issue, so I'm going to approve and merge.

👍

@evanroyrees evanroyrees merged commit 6198205 into dev Jan 5, 2022
@evanroyrees evanroyrees deleted the mock_data branch January 6, 2022 16:46