Chantal Renau-Mínguez1,
Paula Herrero-Abadía2,
Vicente Sentandreu3,
Paula Ruiz-Rodriguez1,
Eduard Torrents4,5,
Álvaro Chiner-Oms6,
Manuela Torres-Puente6,
Iñaki Comas6,
Esther Julián2*
and Mireia Coscolla1*
1. I2SysBio, University of Valencia-CSIC, FISABIO Joint Research Unit Infection and Public Health, Valencia, Spain
2. Genetics and Microbiology Department, Faculty of Biosciences, Autonomous University of Barcelona, 08193, Bellaterra, Barcelona, Spain
3. Genomics Unit, Central Service for Experimental Research (SCSIE), University of Valencia, Spain
4. Bacterial Infections and Antimicrobial Therapies Group, Institute for Bioengineering of Catalonia (IBEC), Baldiri Reixac 15-21, 08028 Barcelona, Spain
5. Microbiology Section, Department of Genetics, Microbiology and Statistics, Biology Faculty, Universitat de Barcelona, 08028 Barcelona, Spain
6. Instituto de Biomedicina de Valencia (IBV), CSIC, 46010, Valencia, Spain
* Correspondence:
mireia.coscolla@uv.es (Mireia Coscolla); Esther.Julian@uab.cat (Esther Julián)
The main purpose of this repository is to share the scripts used in this study so that its analyses can be reproduced. It also makes the closed genome of Mycobacterium brumae ATCC 51384T publicly available.
This folder contains several subfolders, each holding the scripts for a specific analysis.
Dependencies
- Python 3
- BLAST+
- MUMmer 3
Scripts to determine the percentage identity of Mycobacterium tuberculosis H37Rv proteins in the analyzed genomes of interest.
- run_blast.py: script that runs a tblastn analysis, extracting the genes (as amino-acid sequences) and searching for them in a FASTA file. It calculates the coincidence (identity) percentage between each gene and its target and then filters the genes at three thresholds: 80, 70 and 60%.
python3 run_blast.py -g gene.txt -f brumae.fasta -p h37rv.fasta -n brumae_find
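The threshold filtering described for run_blast.py can be illustrated with the minimal sketch below. It is not the script itself: it assumes the tblastn results are stored in BLAST+ tabular format (`-outfmt 6`), that the "coincidence percentage" corresponds to the `pident` column, and that only the best hit per gene is kept. The function names are hypothetical. Note that `pident` is computed over the aligned region, which may differ from a percentage calculated over the full gene length.

```python
"""Hypothetical sketch: bin tblastn hits by percentage identity."""

# Identity thresholds (%) listed in this README
THRESHOLDS = (80, 70, 60)


def best_identity_per_gene(lines):
    """Return {query_id: highest pident} from BLAST+ -outfmt 6 lines.

    Assumes the default 12-column layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, pident = fields[0], float(fields[2])
        # Keep only the highest identity seen for each query gene
        if pident > best.get(query, -1.0):
            best[query] = pident
    return best


def filter_by_thresholds(best, thresholds=THRESHOLDS):
    """Return {threshold: sorted gene IDs whose best identity >= threshold}."""
    return {t: sorted(g for g, p in best.items() if p >= t) for t in thresholds}
```

Called on the lines of a results file, `filter_by_thresholds(best_identity_per_gene(lines))` returns one gene list per threshold; a gene whose best hit has 74% identity appears under 70 and 60 but not under 80.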
Scripts to identify genes with repeats of less than 300 bp so that these genes can be excluded from Illumina genomic analyses.
- clean_genes.py: script that converts a GFF file into a tab-separated file with the gene ID, orientation, start (nt) and end (nt) of each gene.
- multifasta.py: script that extracts the nucleotide sequence of each gene from a FASTA file, using the gene coordinates in a TSV file.
- process_mummer_output.py: script to process the MUMmer output.
- command_mummer.sh: Bash script that runs the steps in sequential order to obtain the repeated genes from a GFF file and a FASTA file; it also includes the MUMmer command "run-mummer3".
- duplicated_genescoord.tsv: tab-separated file listing the genes to exclude from the Illumina analysis, one gene per line in the format gene_id, orientation, start and end (gene id + "\t" + orientation + "\t" + start + "\t" + end + "\n").
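The steps performed by clean_genes.py and multifasta.py can be illustrated with the sketch below. It is an assumption-based illustration, not the repository's code: it reads GFF3 lines, keeps features of type `gene`, takes the gene ID from the `ID` attribute, formats rows in the gene_id/orientation/start/end layout described for duplicated_genescoord.tsv, and slices each gene out of a genome string, reverse-complementing genes on the minus strand. GFF coordinates are 1-based and inclusive, so a gene spanning start..end corresponds to the Python slice `[start - 1:end]`. All function names are hypothetical.

```python
"""Hypothetical sketch: GFF3 -> gene coordinate table -> gene sequences."""

# Translation table used to complement DNA (upper and lower case)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gff_to_rows(gff_lines, feature_type="gene"):
    """Yield (gene_id, strand, start, end) for each matching GFF3 feature."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # not a 9-column feature line of the requested type
        # Column 9 holds key=value attributes separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield attrs.get("ID", "."), cols[6], int(cols[3]), int(cols[4])


def row_to_tsv(row):
    """Format one row as gene_id<TAB>orientation<TAB>start<TAB>end<NEWLINE>."""
    gene_id, strand, start, end = row
    return f"{gene_id}\t{strand}\t{start}\t{end}\n"


def extract_sequence(genome, strand, start, end):
    """Return the gene sequence; GFF coordinates are 1-based and inclusive."""
    seq = genome[start - 1:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return seq
```

For example, with a genome string `"AAGATTACAGGCTT"`, a `+` gene at 3..8 yields `"GATTAC"` and a `-` gene at 10..12 yields `"GCC"`. Real GFF files may also URL-encode attribute values, which this sketch does not decode.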
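Once repeated regions have been found with MUMmer, selecting the genes to exclude amounts to an interval-overlap test between repeat coordinates and gene coordinates. The sketch below assumes the repeats have already been parsed from the MUMmer output into 1-based, inclusive (start, end) pairs on the reference (possibly reversed for reverse-strand matches); it does not parse MUMmer's own output format and does not reproduce the 300 bp criterion, whose exact use is defined by the repository's scripts. The names are hypothetical.

```python
"""Hypothetical sketch: flag genes that overlap repeated regions."""

from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    # Normalise reversed pairs so that start <= end, then sort by start
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def genes_overlapping(genes, repeats):
    """Return IDs of genes (gene_id, strand, start, end) that overlap any repeat."""
    merged = merge_intervals(repeats)
    starts = [s for s, _ in merged]
    hits = []
    for gene_id, _strand, g_start, g_end in genes:
        # Merged repeats are disjoint and sorted, so the last one starting at or
        # before the gene end has the largest end among all candidates
        i = bisect_right(starts, g_end) - 1
        if i >= 0 and merged[i][1] >= g_start:
            hits.append(gene_id)
    return hits
```

Merging first keeps each lookup to a single binary search, which matters little for one bacterial genome but keeps the overlap test simple and correct.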
This folder contains the files for the analyzed closed genome of Mycobacterium brumae ATCC 51384T:
- brumae.fasta: FASTA file with the genomic sequence of Mycobacterium brumae ATCC 51384T.
- brumae.gff: GFF annotation file used in this study.
- brumae.sqn: Sequin (.sqn) file for submission to NCBI.