Bioconvert

Bioconvert is a collaborative project to facilitate the interconversion of life science data from one format to another.

Contributions

Want to add a converter? Please join #1

How to cite

Caro et al., BioConvert: a comprehensive format converter for life sciences (2023) NAR Genomics and Bioinformatics 5(3). https://doi.org/10.1093/nargab/lqad074

Overview

Life science data come in many different formats. Some are old, others have complex syntax, and converting between them can be a challenge. Bioconvert aims to provide a common tool and interface for converting life science data from one format to another.

Many conversion tools already exist, but they may be dispersed, focused on a few specific formats, difficult to install, or not optimised. With Bioconvert, we plan to cover a wide spectrum of format conversions; we re-use existing tools when possible and provide facilities to compare different conversion tools or methods via benchmarking. New implementations are provided when they are considered better than existing ones.

As of January 2023, 50 formats and 100 direct conversions were available.

Installation

BioConvert is developed in Python. Please use conda or any Python environment manager to install BioConvert using the pip command:

pip install bioconvert

50% of the conversions should work out of the box. However, many conversions require external tools, which is why we recommend using a conda environment: most external tools are available on the bioconda channel. For instance, to convert a SAM file to a BAM file you would need to install samtools as follows:

conda install -c bioconda samtools
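
Because some conversions depend on external executables, it can help to check that they are on your PATH before starting. The snippet below is a minimal sketch using only the Python standard library; `missing_tools` is a hypothetical helper name, not part of BioConvert.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# samtools is the external tool needed for the SAM to BAM example above.
missing = missing_tools(["samtools"])
if missing:
    print("Install these tools before converting:", ", ".join(missing))
```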

Since bioconvert is available on bioconda, one solution that installs BioConvert and all its dependencies is to use conda/mamba:

conda create --name bioconvert -c conda-forge mamba
conda activate bioconvert
mamba install -c conda-forge -c bioconda bioconvert
bioconvert --help

See the Installation section for more details and alternative solutions (docker, singularity).

Quick Start

There are many conversions available. Type:

bioconvert --help

to get a list of the valid conversions. Taking the example of a conversion from a FastQ file into a FastA file, you could do the conversion as follows:

bioconvert fastq2fasta input.fastq output.fasta
bioconvert fastq2fasta input.fq    output.fasta
bioconvert fastq2fasta input.fq.gz output.fasta.gz
bioconvert fastq2fasta input.fq.gz output.fasta.bz2

When there is no ambiguity, you can be implicit:

bioconvert input.fastq output.fasta
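
To illustrate the idea behind implicit mode, here is a minimal sketch of how a converter name could be guessed from two file names. It is not BioConvert's actual implementation: the function names, the compression suffixes, and the extension aliases are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical lookup tables, for illustration only.
COMPRESSION_SUFFIXES = {".gz", ".bz2"}
EXTENSION_ALIASES = {"fq": "fastq", "fa": "fasta"}


def guess_format(filename):
    """Guess a format name from a file name, ignoring compression suffixes."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        raise ValueError(f"cannot guess a format from {filename!r}")
    ext = suffixes[-1].lstrip(".")
    return EXTENSION_ALIASES.get(ext, ext)


def guess_converter(infile, outfile):
    """Build a converter name of the form <input format>2<output format>."""
    return f"{guess_format(infile)}2{guess_format(outfile)}"


print(guess_converter("input.fq.gz", "output.fasta.bz2"))
```

With the tables above, both `input.fastq` and `input.fq.gz` map to the `fastq` format, so either one paired with a `.fasta` output yields `fastq2fasta`.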

The default method of conversion is used, but you may use another one. Check out the available methods with:

bioconvert fastq2fasta --show-methods

For more help about a conversion, just type:

bioconvert fastq2fasta --help

and more generally:

bioconvert --help
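
The explicit command-line form shown above can also be driven from a script. This is a minimal sketch that assumes the `bioconvert` executable is on your PATH; `build_command` and `run_conversion` are hypothetical helper names, and only the `bioconvert <converter> <input> <output>` form from the examples above is used.

```python
import subprocess


def build_command(converter, infile, outfile):
    """Build the argument list for an explicit bioconvert call."""
    return ["bioconvert", converter, str(infile), str(outfile)]


def run_conversion(converter, infile, outfile):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(converter, infile, outfile), check=True)


# Example (requires bioconvert and an existing input.fastq):
# run_conversion("fastq2fasta", "input.fastq", "output.fasta")
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.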

You may also call BioConvert from a Python shell:

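# import a converter
from bioconvert.fastq2fasta import FASTQ2FASTA

# input and output file names
infile = "input.fastq"
outfile = "output.fasta"

# instantiate with infile/outfile names
convert = FASTQ2FASTA(infile, outfile)

# the conversion itself (uses the default method):
convert()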
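
To convert several files with the same converter, the call pattern above can be wrapped in a loop. The sketch below assumes only what the example shows, namely that a converter class is created with `(infile, outfile)` and then called without arguments; `convert_directory` and its parameters are hypothetical names, not part of BioConvert.

```python
from pathlib import Path


def convert_directory(converter_cls, src_dir, dst_dir,
                      in_ext=".fastq", out_ext=".fasta"):
    """Convert every file ending in `in_ext` in `src_dir`; return the output paths."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    outputs = []
    for infile in sorted(Path(src_dir).glob(f"*{in_ext}")):
        outfile = dst / (infile.stem + out_ext)
        convert = converter_cls(str(infile), str(outfile))
        convert()
        outputs.append(outfile)
    return outputs


# Example (requires bioconvert to be installed):
# from bioconvert.fastq2fasta import FASTQ2FASTA
# convert_directory(FASTQ2FASTA, "reads", "fasta_out")
```

The converter class is passed in as an argument so the same helper can be reused with other converters by changing the class and the two extensions.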

Available Converters

Conversion table

Converter           Default method
abi2fasta           BIOPYTHON
abi2fastq           BIOPYTHON
abi2qual            BIOPYTHON
bam2bedgraph        BEDTOOLS
bam2bigwig          DEEPTOOLS
bam2cov             BEDTOOLS
bam2cram            SAMTOOLS
bam2fasta           SAMTOOLS
bam2fastq           SAMTOOLS
bam2json            BAMTOOLS
bam2sam             SAMBAMBA
bam2tsv             SAMTOOLS
bam2wiggle          WIGGLETOOLS
bcf2vcf             BCFTOOLS
bcf2wiggle          WIGGLETOOLS
bed2wiggle          WIGGLETOOLS
bedgraph2bigwig     UCSC
bedgraph2cov        BIOCONVERT
bedgraph2wiggle     WIGGLETOOLS
bigbed2bed          DEEPTOOLS
bigbed2wiggle       WIGGLETOOLS
bigwig2bedgraph     DEEPTOOLS
bigwig2wiggle       WIGGLETOOLS
bplink2plink        PLINK
bplink2vcf          PLINK
bz22gz              Unix commands
clustal2fasta       BIOPYTHON
clustal2nexus       GOALIGN
clustal2phylip      BIOPYTHON
clustal2stockholm   BIOPYTHON
cram2bam            SAMTOOLS
cram2fasta          SAMTOOLS
cram2fastq          SAMTOOLS
cram2sam            SAMTOOLS
csv2tsv             BIOCONVERT
csv2xls             Pandas
dsrc2gz             DSRC software
embl2fasta          BIOPYTHON
embl2genbank        BIOPYTHON
fasta2clustal       BIOPYTHON
fasta2faa           BIOCONVERT
fasta2fasta_agp     BIOCONVERT
fasta2fastq         PYSAM
fasta2genbank       BIOCONVERT
fasta2nexus         GOALIGN
fasta2phylip        BIOPYTHON
fasta2twobit        UCSC
fasta_qual2fastq    PYSAM
fastq2fasta         BIOCONVERT available
fastq2fasta_qual    BIOCONVERT
fastq2qual          READFQ
genbank2embl        BIOPYTHON
genbank2fasta       BIOPYTHON
genbank2gff3        BIOCODE
gfa2fasta           BIOCONVERT
gff22gff3           BIOCONVERT
gff32gff2           BIOCONVERT
gff32gtf            BIOCONVERT
gz2bz2              pigz/pbzip2 software
gz2dsrc             DSRC software
json2yaml           Python
maf2sam             BIOCONVERT
newick2nexus        GOTREE
newick2phyloxml     GOTREE
nexus2clustal       GOALIGN
nexus2fasta         BIOPYTHON
nexus2newick        GOTREE
nexus2phylip        GOALIGN
nexus2phyloxml      GOTREE
ods2csv             pyexcel library
pdb2faa             BIOCONVERT
phylip2clustal      BIOPYTHON
phylip2fasta        BIOPYTHON
phylip2nexus        GOALIGN
phylip2stockholm    BIOPYTHON
phylip2xmfa         BIOPYTHON
phyloxml2newick     GOTREE
phyloxml2nexus      GOTREE
plink2bplink        PLINK
plink2vcf           PLINK
sam2bam             SAMTOOLS
sam2cram            SAMTOOLS
sam2paf             BIOCONVERT
scf2fasta           BIOCONVERT
scf2fastq           BIOCONVERT
sra2fastq           FASTQDUMP
stockholm2clustal   BIOPYTHON
stockholm2phylip    BIOPYTHON
tsv2csv             BIOCONVERT
twobit2fasta        DEEPTOOLS
vcf2bcf             BCFTOOLS
vcf2bed             BIOCONVERT
vcf2bplink          PLINK
vcf2plink           PLINK
vcf2wiggle          WIGGLETOOLS
wig2bed             BEDOPS
xls2csv
xlsx2csv            Pandas library
xmfa2phylip         BIOPYTHON
yaml2json           Pandas library
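
Each converter in the table is a direct conversion between two formats, so the table can be read as a directed graph of formats. The sketch below is an illustration only, not a BioConvert feature: it takes a handful of (input, output) pairs from the table and uses a breadth-first search to find the shortest chain of direct converters between two formats. `DIRECT` and `find_chain` are hypothetical names.

```python
from collections import deque

# A few direct conversions from the table, as (input format, output format).
DIRECT = [
    ("sra", "fastq"),
    ("fastq", "fasta"),
    ("fasta", "genbank"),
    ("genbank", "embl"),
    ("embl", "fasta"),
]


def find_chain(direct, source, target):
    """Return the shortest list of converter names from source to target, or None."""
    graph = {}
    for src, dst in direct:
        graph.setdefault(src, []).append(dst)
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for nxt in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, chain + [f"{fmt}2{nxt}"]))
    return None


print(find_chain(DIRECT, "sra", "embl"))
# ['sra2fastq', 'fastq2fasta', 'fasta2genbank', 'genbank2embl']
```

The pairs are listed explicitly rather than parsed from the converter names, because names such as gff22gff3 or fasta_qual2fastq cannot be split reliably on the character "2".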

Contributors

Setting up and maintaining Bioconvert has been possible thanks to users and contributors. Thanks to all.

Changes

Version Description

1.1.1

  • Fix benchmark labels.
  • NEW: fast52pod5 conversion
  • FIX: set goalign and gotree instead of go requirements

1.1.0

  • Benchmarks can now report CPU and memory usage, not just time

1.0.0

  • Fix bam2fastq for paired data that computed useless intermediate file #325
  • more realistic fastq simulator
  • pin openpyxl to <=3.0.10 to prevent regression error in v3.1.0

0.6.3

  • add picard method in bam2sam
  • Fixed all CI workflows to use mamba
  • drop python3.7 support and add 3.10 support
  • update bedops test file to fit the latest bedops 2.4.41 version
  • revisit logging system

0.6.2

  • added gff3 to gtf conversion.
  • Added pdb to faa conversion
  • Added missing --reference argument to the cram2sam conversion

0.6.1

  • output file can be in sub-directories, allowing syntax such as 'bioconvert fastq2fasta test.fastq outputs/test.fasta'
  • fix all CI actions
  • add more examples as notebooks in ./examples
  • add a Snakefile for the paper in ./doc/Snakefile_paper

0.6.0

  • Fix bug in bam2sam (method sambamba)
  • Fix graph layout
  • add threading in fastq2fasta (seqkit method)
  • multibenchmark feature added
  • stable version used for web interface

0.5.2

  • Update requirements and environment.yml and add a conda spec-file.txt file

0.5.1

  • add genbank2gff3 requirement material in bioconvert.utils.biocode

0.5.0

  • Add CI actions for all converters
  • remove sniffer (now in biosniff on pypi https://pypi.org/project/biosniff/)
  • A complete benchmarking suite (see doc/Snakefile_benchmark file and benchmarking)
  • documentation and tests for all converters
  • removed the validators (we assume inputs are correct)

0.4.X

  • (Aug 2019) added nexus2fasta, cram2fasta, fasta2faa ... ; 1-to-many and many-to-one converters are now part of the API.

0.3.X

  • (May 2019) new methods abi2qual, bigbed2bed, etc.; added --threads option

0.2.X

  • (Aug 2018) abi2fastx and bioconvert_stats tool added

0.1.X

  • major refactoring to have subcommands with implicit/explicit mode