Ulthran/pycov3

pycov3


A package for generating COV3 files from SAM files, which provide coverage information, and a FASTA file, which provides binned contigs. COV3 files are the input for the DEMIC R package, which calculates the peak-to-trough ratio (PTR), an estimate of bacterial growth rate.

Installation

PyPI

pip install pycov3
pycov3 -h

Bioconda

conda create -n pycov3 -c conda-forge -c bioconda pycov3
conda activate pycov3
pycov3 -h

DockerHub

docker pull ctbushman/pycov3:latest
docker run --rm --name pycov3 ctbushman/pycov3:latest pycov3 -h

GitHub

git clone https://github.com/Ulthran/pycov3.git
cd pycov3/
pip install .
pycov3 -h

Usage

Use -h to see options for running the CLI.

$ pycov3 -h

All FASTA files should be in a single directory, with names of the format {sample}.{bin_name}.fasta (the .fa and .fna extensions are also accepted). All SAM files should likewise be in a single directory, with names of the format {sample}_{bin_name}.sam. The output COV3 files are written to the output directory with names of the format {sample}.{bin_name}.cov3.
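The naming rules above can be sketched as a few small helpers. This is only an illustration of the file name formats: the function names and the FASTA_SUFFIXES constant are hypothetical and are not part of the pycov3 API.

```python
from pathlib import PurePath

# Extensions named in the README for FASTA inputs
FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def fasta_name(sample: str, bin_name: str, suffix: str = ".fasta") -> str:
    """Build an input FASTA name: {sample}.{bin_name}{suffix}."""
    if suffix not in FASTA_SUFFIXES:
        raise ValueError(f"unsupported FASTA suffix: {suffix}")
    return f"{sample}.{bin_name}{suffix}"


def sam_name(sample: str, bin_name: str) -> str:
    """Build an input SAM name: {sample}_{bin_name}.sam."""
    return f"{sample}_{bin_name}.sam"


def cov3_name(sample: str, bin_name: str) -> str:
    """Build an output COV3 name: {sample}.{bin_name}.cov3."""
    return f"{sample}.{bin_name}.cov3"


def is_fasta(filename: str) -> bool:
    """Check whether a file name ends in one of the FASTA extensions."""
    return PurePath(filename).suffix in FASTA_SUFFIXES
```

Helpers like these can be used to check a directory listing before running the CLI, so that misnamed inputs are caught early.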

You can also use the library in your own code. Create a SAM directory and FASTA directory, set any non-default window or coverage parameters, then create a COV3 directory and use it to generate a COV3 file for each contig set in the FASTA directory.

    from pathlib import Path

    from pycov3.Directory import Cov3Dir, FastaDir, SamDir

    # Directory holding the input SAM files
    sam_d = SamDir(Path("/path/to/sams/"), False)

    # Entries left as None are dropped below and not passed on
    window_params = {
        "window_size": None,
        "window_step": None,
        "edge_length": sam_d.calculate_edge_length(),
    }
    coverage_params = {
        "mapq_cutoff": None,
        "mapl_cutoff": None,
        "max_mismatch_ratio": None,
    }
    window_params = {k: v for k, v in window_params.items() if v is not None}
    coverage_params = {k: v for k, v in coverage_params.items() if v is not None}

    # Directory holding the input FASTA files
    fasta_d = FastaDir(Path("/path/to/fastas/"), False)

    # Output directory for the generated COV3 files
    cov3_d = Cov3Dir(
        Path("/path/to/output/"),
        False,
        fasta_d.get_filenames(),
        window_params,
        coverage_params,
    )

    # Generate a COV3 file for each contig set in the FASTA directory
    cov3_d.generate(sam_d, fasta_d)

Alternatively, to use the bare application logic and do all the file handling yourself, you can use the Cov3Generator class which takes a list of generators as SAM inputs and a generator as a FASTA input.

    from pathlib import Path

    from pycov3.Cov3Generator import Cov3Generator
    from pycov3.File import Cov3File

    # sam_generators, fasta_generator, sample, bin_name, window_params and
    # coverage_params must be defined by the caller
    cov3_generator = Cov3Generator(
        sam_generators,
        fasta_generator,
        sample,
        bin_name,
        window_params,
        **coverage_params,
    )

    cov3_dict = cov3_generator.generate_cov3()

    # Write output
    cov3_file = Cov3File(Path("/path/to/output/"), "001")
    cov3_file.write_generator(cov3_generator.generate_cov3())

Resource Requirements

Threads: pycov3 uses multiprocessing to parallelize processing of input fastas. Increasing --thread_num up to the number of input fastas should improve runtime, with no benefits beyond that number.
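As a rough illustration of the thread guidance, the sketch below picks a value for --thread_num by capping the number of input FASTAs at the number of available CPUs. The helper name suggested_thread_num is hypothetical and not part of pycov3, and the CPU cap is an added assumption rather than something the tool enforces.

```python
import os
from typing import Optional


def suggested_thread_num(n_fastas: int, available_cpus: Optional[int] = None) -> int:
    """Return a thread count no larger than the number of input FASTAs.

    Threads beyond the number of FASTAs bring no benefit, and the CPU
    cap (an assumption of this sketch) avoids oversubscribing the machine.
    """
    if available_cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU
        available_cpus = os.cpu_count() or 1
    return max(1, min(n_fastas, available_cpus))
```

For example, with 3 input FASTAs on an 8-core machine this suggests 3 threads, and with 10 FASTAs on a 4-core machine it suggests 4.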

Memory: pycov3 uses generators as much as possible. The main memory users are the Contig objects, which each hold a contig's sequence and information for each Window over its length. There is also a coverages dictionary that could potentially grow to the size of the largest contig (but that is very unlikely). At a minimum, twice the size of the largest contig should be given per thread.
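The memory guideline can be turned into a quick back-of-the-envelope estimate. The sketch below finds the longest contig in a FASTA and applies the "twice the largest contig per thread" rule. It assumes roughly one byte per base, so the result should be read as a lower bound; both function names are hypothetical and not part of pycov3.

```python
from typing import Iterable


def largest_contig_length(fasta_lines: Iterable[str]) -> int:
    """Return the length in bases of the longest contig in FASTA text."""
    longest = 0
    current = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous contig
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)
    # Account for the final contig, which has no header after it
    return max(longest, current)


def min_memory_bytes(largest_contig: int, thread_num: int) -> int:
    """Apply the README guideline: 2 x largest contig, per thread.

    Assumes about one byte per base, which is an assumption of this sketch.
    """
    return 2 * largest_contig * thread_num
```

With a longest contig of 5,000,000 bases and 4 threads, this gives a minimum of about 40 MB before any interpreter or per-object overhead.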

Algorithmic Complexity: Assuming enough threads are provided to have each fasta file processed separately, the time complexity is roughly O(cwsr).

c: Number of contigs in the FASTA file
s: Number of SAM files
w: Max number of windows per contig
r: Max number of records per SAM file
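To get a feel for how the O(cwsr) bound scales, the sketch below multiplies the four factors for a made-up workload. The function name and the input numbers are purely illustrative, and the result is an order-of-magnitude count of basic steps, not a runtime prediction.

```python
def estimated_operations(c: int, s: int, w: int, r: int) -> int:
    """Upper-bound step count for one FASTA file under the O(cwsr) estimate.

    c: contigs in the FASTA, s: SAM files,
    w: max windows per contig, r: max records per SAM file.
    """
    return c * w * s * r


# Illustrative values only: 100 contigs, 10 SAM files,
# 50 windows per contig, 1,000,000 records per SAM file
baseline = estimated_operations(c=100, s=10, w=50, r=1_000_000)

# Doubling the number of SAM files doubles the estimate
doubled = estimated_operations(c=100, s=20, w=50, r=1_000_000)
```

Because the bound is a plain product, doubling any one factor doubles the estimate.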

Help

Please use the Issues on this repo for any problems, questions, or suggestions.