Quick start
###########
The following is a condensed summary of the steps required to get Autometa installed, configured and running. There are links throughout to the appropriate documentation sections that provide more detail where required.

Installation
************
For full installation instructions, please see the :ref:`Installation` section.

Installation requires Miniconda to be installed on your system (`<https://docs.conda.io/en/latest/miniconda.html>`_).
To create the conda environment, run the following command:

.. code-block:: bash

conda env create --file=https://raw.githubusercontent.com/KwanLab/Autometa/main/nextflow-env.yml
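
If you'd like to confirm that the environment was created before activating it, an optional check using standard conda commands:

.. code-block:: bash

    # lists all conda environments; "autometa-nf" should appear among them
    conda env list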

Next, activate the environment:

.. code-block:: bash

conda activate autometa-nf

Configuring a scheduler
***********************
For full details on how to configure your scheduler, please see the :ref:`Configuring your process executor` section.

If you are using a Slurm scheduler, you will need to create a configuration file. If you do not have a scheduler, skip ahead to "Running Autometa".

First you will need to know the name of your Slurm partition. Run :code:`sinfo` to find this. In the example below, the partition name is "queue".

.. image:: ../img/slurm_partitions.png
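
If the screenshot above does not render for you, a typical :code:`sinfo` listing looks something like the sketch below (illustrative output only; your partition names, node counts and states will differ):

.. code-block:: bash

    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    queue*       up   infinite      4   idle node[001-004]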

Next, create a new file called :code:`slurm_nextflow.config` using nano:

.. code-block:: bash

nano slurm_nextflow.config

Then copy the following code block into that new file:

.. code-block:: groovy

    profiles {
        slurm {
            process.executor = "slurm"
            // "queue" is the Slurm partition to submit to;
            // change it to whatever your partition is called
            process.queue = "queue"
            docker.enabled = true
            docker.userEmulation = true
            singularity.enabled = false
            podman.enabled = false
            shifter.enabled = false
            charliecloud.enabled = false
            executor {
                // maximum number of jobs submitted to Slurm at any one time
                queueSize = 8
            }
        }
    }

Keep this file somewhere central to you. For the sake of this example, I will be keeping it in a folder called "Useful_scripts" in my home directory, e.g. :code:`/home/me/Useful_scripts/slurm_nextflow.config`.

Save your new file with Ctrl+O and then exit nano with Ctrl+X. Installation and setup is now complete. 🎉 🥳

Running Autometa
****************

For a comprehensive list of features and options and how to use them, please see :ref:`Running the pipeline`.

Autometa can bin one or several metagenomic datasets in a single run. Regardless of the number of metagenomes you
want to process, you will need to provide a sample sheet which specifies the name of each sample and the full path to
where its data is found. If the metagenome was assembled via SPAdes, Autometa can extract coverage and
contig length information from the sequence headers. If you used a different assembler, you will need to provide either raw reads for coverage calculation or a table of contig/scaffold coverage. Full details for data preparation may be found under :ref:`sample-sheet-preparation`.
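
For orientation, a minimal sample sheet might look like the sketch below. The column names and values here are illustrative assumptions only; treat :ref:`sample-sheet-preparation` as authoritative.

.. code-block:: bash

    # sample_sheet.csv (illustrative sketch; column names are assumptions)
    sample,assembly,fastq_1,fastq_2,coverage_tab,cov_from_assembly
    sample_1,/path/to/sample_1/scaffolds.fasta,,,,spades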

First ensure that your Autometa conda environment is activated. You can activate your environment by running:

.. code-block:: bash

conda activate autometa-nf

Navigate to where your metagenome files are and run the following code to launch Autometa:

.. code-block:: bash

nf-core launch KwanLab/Autometa
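
.. note::

    The pipeline code is fetched automatically from GitHub on first run. If you'd prefer to download and cache it ahead of time (optional), that can also be done with standard Nextflow tooling:

    .. code-block:: bash

        nextflow pull KwanLab/Autometa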

You will now use the arrow keys to move up and down between your options and hit "Enter" (or "Return") to make your choice.

Menu 1:
The double right-facing arrows should already indicate "2.0.0". This is the latest version of Autometa; hit "Enter" to choose this option.

.. image:: ../img/Menu1.png

Menu 2:
Pick the command-line option.

.. note::

Unless you've tunneled into your server or are using Autometa locally, this is your best option.

.. image:: ../img/Menu2.png

Menu 3:
If you are using a scheduler (Slurm in this example), the only thing you'll need to change is the profile option. If you are not using a scheduler, leave this blank.

.. image:: ../img/Menu3.png

Menu 4:
Now we need to give Autometa the full path to our sample sheet, an output directory (a folder will be generated per metagenome within this directory), and a directory in which to store all the log files (a.k.a. the trace files).

.. image:: ../img/Menu4.png
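
Your answers are saved to :code:`nf-params.json`, which is passed to Nextflow later via :code:`-params-file`. A sketch of what the path-related entries might look like, assuming hypothetical paths and the common nf-core parameter names :code:`input`, :code:`outdir` and :code:`tracedir` (check your own generated file for the exact names):

.. code-block:: json

    {
        "input": "/home/me/metagenomes/sample_sheet.csv",
        "outdir": "/home/me/metagenomes/autometa_output",
        "tracedir": "/home/me/metagenomes/autometa_output/trace"
    }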

Menu 5:
This is where you can change your binning parameters. If you're not sure what you're doing, I would recommend only changing the "length_cutoff".
The default cutoff is 3000 bp, which means that any contigs/scaffolds smaller than 3000 bp will not be considered for binning. The appropriate cutoff will depend on how good your assembly is: e.g. if your N50 is 1200 bp, I would choose a cutoff of 1000; if your N50 is more along the lines of 5000, I would leave the cutoff at the default 3000. I would strongly recommend against choosing a number below 900 here. In the example below, I have chosen a cutoff of 1000 bp, as my assembly was not particularly good.

.. image:: ../img/Menu5.png

Menu 6:
Here you have a choice to make.
By enabling taxonomy-aware mode, Autometa will attempt to use taxonomic data to make your bins more accurate. However, this step is more computationally expensive and will make the process take longer. By leaving this option at the default "False", Autometa will bin according to coverage and k-mer patterns alone.

With either choice, you need to provide a path to where the necessary databases are stored via the "single_db_dir"
option. Here, I have enabled taxonomy-aware mode and provided the path to where the databases are stored.
See :ref:`Databases` for additional details on required databases.

.. image:: ../img/Menu6.png

Menu 7:
This will depend on the computational resources you have available. You could start with the default values and see
how the binning goes. If you have particularly complex datasets, you may want to bump these up a bit. For your
average metagenome, you won't need more than 150 GB of memory. I've opted to use 75 GB as a
starting point for a few biocrust (somewhat diverse) metagenomes.


.. note::

For terabytes' worth of assembled data, you may want to try the :ref:`autometa-bash-workflow` using the `autometa-large-data-mode.sh <https://github.com/KwanLab/Autometa/blob/main/workflows/autometa-large-data-mode.sh>`_ template.

.. image:: ../img/Menu7.png

Last few options:
You will now be presented with a choice. If you are NOT using a scheduler, you can go ahead and type "y" to launch the workflow. If you are using a scheduler, type "n"; we have one more step to go. In the example below, I am using a scheduler, so I have typed "n" to prevent the nextflow run command from executing immediately.

.. image:: ../img/launch_choice.png

If you recall, we created a file called :code:`slurm_nextflow.config` that contains the information Autometa needs to talk to the Slurm scheduler. We need to include that file using the :code:`-c` (configuration) flag. Therefore, to launch the Autometa workflow, I would run the following command (change the :code:`/home/sam/slurm_nextflow.config` file path to whatever is appropriate for your system):

.. code-block:: bash

nextflow run KwanLab/Autometa -r 2.0.0 -profile "slurm" -params-file "nf-params.json" -c "/home/sam/slurm_nextflow.config"

Once you have hit "Enter" to submit the command, a progress display will pop up and let you know the status of your binning run, such as the one below:

.. image:: ../img/progress.png

When it's complete, the output will be stored in your designated output folder (see Menu 4).


Basic
#####
