add all the bacteria formatter
rpetit3 committed Mar 22, 2024
1 parent 3f0c46c commit 59106ae
Showing 4 changed files with 265 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

## 1.0.9

- Added `bactopia-atb-formatter` to format All-the-Bacteria assemblies for Bactopia

## 1.0.8

- Fixed `bactopia-prepare` usage of `--prefix` not working
106 changes: 106 additions & 0 deletions README.md
@@ -271,6 +271,112 @@ Below is the `--help` output for each subcommand.
╰──────────────────────────────────────────────────────────────────────────────────────╯
```

# All The Bacteria (ATB)

[AllTheBacteria](https://www.biorxiv.org/content/10.1101/2024.03.08.584059v1) is a collection
of nearly 2,000,000 bacterial genomes. Using available FASTQ files from the European Nucleotide
Archive (ENA) and Sequence Read Archive (SRA), the genomes were assembled using
[Shovill](https://github.com/tseemann/shovill) and made publicly available by the
[Iqbal Lab](https://github.com/iqbal-lab-org/AllTheBacteria).

To make it easy to utilize [Bactopia Tools](https://bactopia.github.io/latest/bactopia-tools/) with
assemblies from AllTheBacteria, `bactopia-atb-formatter` was created. This tool will create a
directory structure that resembles output from an actual Bactopia run.
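
As a rough sketch of that layout, based on what `bactopia-atb-formatter` writes for each
sample, every sample gets a metadata file under `main/gather` and its assembly under
`main/assembler`. The helper name `expected_sample_layout` and the accession below are
hypothetical and only used for illustration; they are not part of the `bactopia` package.

```python
from pathlib import Path


def expected_sample_layout(bactopia_dir, sample):
    """Return the files the formatter is expected to create for one sample.

    Hypothetical helper for illustration; not part of the bactopia package.
    """
    main = Path(bactopia_dir) / sample / "main"
    return {
        "meta": main / "gather" / f"{sample}-meta.tsv",
        "assembly": main / "assembler" / f"{sample}.fna",
    }


# "SAMN00000000" is a placeholder accession, not a real sample
layout = expected_sample_layout("bactopia", "SAMN00000000")
for label, path in layout.items():
    print(f"{label}: {path.as_posix()}")
```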

### `bactopia-atb-formatter`

```{bash}
Usage: bactopia-atb-formatter [OPTIONS]
Restructure All-the-Bacteria assemblies to allow usage with Bactopia Tools
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --path -p TEXT Directory where FASTA files are stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Bactopia Directory Structure Options ───────────────────────────────────────────────╮
│ --bactopia-dir -b TEXT The path you would like to place bactopia │
│ structure │
│ [default: bactopia] │
│ --publish-mode -m [symlink|copy] Designates how placement of assemblies will be │
│ handled │
│ [default: symlink] │
│ --recursive -r Traverse recursively through provided path │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
```

### Example Usage for _Legionella pneumophila_

To demonstrate the usage of `bactopia-atb-formatter`, we will use assemblies for
_Legionella pneumophila_. The following steps will download the assemblies, build the
Bactopia directory structure, and then run [legsta](https://github.com/tseemann/legsta)
via the [Bactopia Tool](https://bactopia.github.io/latest/bactopia-tools/legsta/).

#### Download the Assemblies

First, we will download the _Legionella pneumophila_ assemblies from AllTheBacteria. After downloading,
we will extract them into a folder called `legionella-assemblies`. Within this folder, there will be
a subdirectory for each tarball that was downloaded.

```{bash}
mkdir atb-legionella
cd atb-legionella
# Download the assemblies
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__01.asm.tar.xz
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__02.asm.tar.xz
# Extract the assemblies
mkdir legionella-assemblies
tar -C legionella-assemblies -xJf legionella_pneumophila__01.asm.tar.xz
tar -C legionella-assemblies -xJf legionella_pneumophila__02.asm.tar.xz
```
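
The same extraction could also be sketched in Python with the standard-library `tarfile`
module, should a Python-only workflow be preferred. This is a minimal sketch, assuming the
tarballs have already been downloaded; the function name `extract_tarballs` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_tarballs(tarballs, outdir):
    """Extract each .tar.xz archive into outdir, mirroring `tar -C outdir -xJf`."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    for tarball in tarballs:
        # "r:xz" opens an xz-compressed tar archive for reading
        with tarfile.open(tarball, "r:xz") as archive:
            archive.extractall(outdir)
    return outdir
```

For the example above, it might be called as
`extract_tarballs(sorted(Path(".").glob("legionella_pneumophila__*.asm.tar.xz")), "legionella-assemblies")`.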

#### Create the Bactopia Directory Structure

With the assemblies extracted, we can now create the Bactopia directory structure using
`bactopia-atb-formatter`. Once complete, each assembly will have its own folder, named
after the BioSample accession of the assembly.

```{bash}
# Create the Bactopia directory structure
bactopia-atb-formatter --path legionella-assemblies --recursive
2024-03-22 14:30:07 INFO 2024-03-22 14:30:07:root:INFO - Setting up Bactopia directory structure (use --verbose to see more details) atb_formatter.py:129
2024-03-22 14:30:08 INFO 2024-03-22 14:30:08:root:INFO - Bactopia directory structure created at bactopia atb_formatter.py:134
INFO 2024-03-22 14:30:08:root:INFO - Total assemblies processed: 5393
```

Please note the use of `--recursive`, which traverses the `legionella-assemblies` directory
to find all of the assemblies it contains. At this point, the `bactopia` directory structure has
been created for 5,393 assemblies and is ready for use with Bactopia Tools.
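
The effect of `--recursive` can be illustrated with `pathlib`, which the formatter's
`search_path` function relies on: `glob` only matches files directly inside the given
directory, while `rglob` also descends into subdirectories. Below is a small sketch using
a throwaway directory; the subdirectory and file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Mimic one extracted tarball: assemblies sit one level below the root
    (root / "batch_01").mkdir()
    (root / "batch_01" / "SAMN00000001.fa").touch()
    (root / "batch_01" / "SAMN00000002.fa").touch()

    non_recursive = sorted(p.name for p in root.glob("*.fa"))
    recursive = sorted(p.name for p in root.rglob("*.fa"))

print(non_recursive)  # []
print(recursive)  # ['SAMN00000001.fa', 'SAMN00000002.fa']
```

Because each tarball extracts into its own subdirectory, running without `--recursive`
on the top-level folder would find no assemblies.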

#### Use Bactopia to run Legsta

As mentioned above, we will use [legsta](https://github.com/tseemann/legsta) to analyze each
of the _Legionella pneumophila_ assemblies. To do this, we will use the
[legsta Bactopia Tool](https://bactopia.github.io/latest/bactopia-tools/legsta/).

```{bash}
# Run legsta (please utilize Docker or Singularity only for reproducibility)
bactopia --wf legsta -profile singularity
```

Please note, for reproducibility, it is recommended to use Docker or Singularity with
Bactopia Tools.

Upon completion, you should be met with something like the following:

```{bash}
```

That's it! Now you can take advantage of any of the [Bactopia Tools](https://bactopia.github.io/latest/bactopia-tools/)
that utilize assemblies as inputs.

# Feedback
Your feedback is very valuable! If you run into any issues using Bactopia, have questions, or have some ideas to improve Bactopia, I highly encourage you to submit them to the [Issue Tracker](https://github.com/bactopia/bactopia/issues).

153 changes: 153 additions & 0 deletions bactopia/cli/atb_formatter.py
@@ -0,0 +1,153 @@
import logging
import shutil
import sys
from pathlib import Path

import rich
import rich.console
import rich.traceback
import rich_click as click
from rich.logging import RichHandler

import bactopia

# Set up Rich
stderr = rich.console.Console(stderr=True)
rich.traceback.install(console=stderr, width=200, word_wrap=True, extra_lines=1)
click.rich_click.USE_RICH_MARKUP = True
click.rich_click.OPTION_GROUPS = {
    "bactopia-atb-formatter": [
        {"name": "Required Options", "options": ["--path"]},
        {
            "name": "Bactopia Directory Structure Options",
            "options": [
                "--bactopia-dir",
                "--publish-mode",
                "--recursive",
            ],
        },
        {
            "name": "Additional Options",
            "options": [
                "--verbose",
                "--silent",
                "--version",
                "--help",
            ],
        },
    ]
}


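def search_path(path, pattern, recursive=False):
    """Return an iterator of paths under `path` matching `pattern`."""
    if recursive:
        # rglob also descends into subdirectories
        return Path(path).rglob(pattern)
    else:
        return Path(path).glob(pattern)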


def create_sample_directory(sample, assembly, bactopia_dir, publish_mode="symlink"):
    """Create the Bactopia-style directory for one sample and publish its assembly."""
    logging.debug(f"Creating {sample} directory ({bactopia_dir}/{sample})")
    sample_dir = Path(f"{bactopia_dir}/{sample}")
    if not sample_dir.exists():
        sample_dir.mkdir(parents=True, exist_ok=True)

    # Make remaining subdirectories (which will be empty)
    Path(f"{bactopia_dir}/{sample}/main").mkdir(parents=True, exist_ok=True)
    Path(f"{bactopia_dir}/{sample}/main/gather").mkdir(parents=True, exist_ok=True)
    Path(f"{bactopia_dir}/{sample}/main/assembler").mkdir(parents=True, exist_ok=True)

    # Write the meta.tsv file
    logging.debug(f"Writing {sample}-meta.tsv")
    with open(f"{bactopia_dir}/{sample}/main/gather/{sample}-meta.tsv", "w") as meta_fh:
        meta_fh.write(
            "sample\truntype\toriginal_runtype\tis_paired\tis_compressed\tspecies\tgenome_size\n"
        )
        # One tab-separated value per header column: species is "null", genome_size is 0
        meta_fh.write(
            f"{sample}\tassembly_accession\tassembly_accession\tfalse\tfalse\tnull\t0\n"
        )

    # Write the assembly file
    final_assembly = f"{bactopia_dir}/{sample}/main/assembler/{sample}.fna"
    final_assembly_path = Path(final_assembly)
    if publish_mode == "symlink":
        logging.debug(f"Creating symlink of {assembly} at {final_assembly}")
        final_assembly_path.symlink_to(assembly)
    else:
        logging.debug(f"Copying {assembly} to {final_assembly}")
        shutil.copyfile(assembly, final_assembly)

    return True


@click.command()
@click.version_option(bactopia.__version__, "--version", "-V")
@click.option(
    "--path", "-p", required=True, help="Directory where FASTA files are stored"
)
@click.option(
    "--bactopia-dir",
    "-b",
    default="bactopia",
    show_default=True,
    help="The path you would like to place bactopia structure",
)
@click.option(
    "--publish-mode",
    "-m",
    default="symlink",
    show_default=True,
    type=click.Choice(["symlink", "copy"], case_sensitive=False),
    help="Designates how placement of assemblies will be handled",
)
@click.option(
    "--recursive", "-r", is_flag=True, help="Traverse recursively through provided path"
)
@click.option("--verbose", is_flag=True, help="Increase the verbosity of output")
@click.option("--silent", is_flag=True, help="Only critical errors will be printed")
def atb_formatter(
    path,
    bactopia_dir,
    publish_mode,
    recursive,
    verbose,
    silent,
):
    """Restructure All-the-Bacteria assemblies to allow usage with Bactopia Tools"""
    # Setup logs
    logging.basicConfig(
        format="%(asctime)s:%(name)s:%(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        handlers=[
            RichHandler(rich_tracebacks=True, console=rich.console.Console(stderr=True))
        ],
    )
    logging.getLogger().setLevel(
        logging.ERROR if silent else logging.DEBUG if verbose else logging.INFO
    )

    abspath = Path(path).absolute()
    fasta_ext = ".fa"

    # Match Assemblies
    count = 0
    logging.info(
        "Setting up Bactopia directory structure (use --verbose to see more details)"
    )
    for fasta in search_path(abspath, f"*{fasta_ext}", recursive=recursive):
        # Strip only the trailing extension; str.replace would also remove any
        # earlier ".fa" in the file name
        fasta_name = fasta.name[: -len(fasta_ext)]
        create_sample_directory(fasta_name, fasta, bactopia_dir, publish_mode)
        count += 1
    logging.info(f"Bactopia directory structure created at {bactopia_dir}")
    logging.info(f"Total assemblies processed: {count}")


def main():
    if len(sys.argv) == 1:
        atb_formatter.main(["--help"])
    else:
        atb_formatter()


if __name__ == "__main__":
    main()
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "bactopia"
version = "1.0.8"
version = "1.0.9"
description = "A Python package for working with Bactopia"
authors = [
"Robert A. Petit III <robbie.petit@gmail.com>",
@@ -19,6 +19,7 @@ bactopia-prepare = "bactopia.cli.prepare:main"
bactopia-search = "bactopia.cli.search:main"
bactopia-summary = "bactopia.cli.summary:main"
bactopia-update = "bactopia.cli.update:main"
bactopia-atb-formatter = "bactopia.cli.atb_formatter:main"

[tool.poetry.dependencies]
python = "^3.8.0"