Comparative Genomics Script

The shell script performs large-scale genomic comparisons between 7 genomes using RagTag and BLAST, and visualises them in Artemis. The script can be used to:

filter the input data (Seqkit)
re-arrange scaffolds according to a reference (RagTag)
find genomic similarities (Blast)
visualise genomic comparisons (Artemis)

Figure 1. Example output using Artemis for visualisation (automatically generated).

INSTALL REQUIRED SOFTWARE

ARTEMIS

Install Artemis (for visualisation and genome browsing):

conda config --add channels bioconda   
conda config --add channels conda-forge
conda install artemis

Alternatively, you can get it directly from the Artemis website:
http://sanger-pathogens.github.io/Artemis/Artemis/
https://www.sanger.ac.uk/tool/artemis-comparison-tool-act/

RAGTAG (for contig reordering)

Install RagTag (for scaffold reordering according to a reference):

conda install -c bioconda ragtag

BLAST NCBI (for genome comparisons)

sudo apt-get install ncbi-blast+

Seqkit and Sequence length HISTOGRAM

Visit website: https://bioinf.shenwei.me/seqkit/download/

pip install bashplotlib
conda install -c bioconda seqkit
conda install -c bioconda csvtk
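Before launching the script, it can help to confirm that every tool is reachable from your $PATH. The following is a minimal sketch, not part of the original script. The executable names in REQUIRED_TOOLS are assumptions based on the usual names installed by these packages; adjust them to match your installation.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on $PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS="seqkit csvtk ragtag.py makeblastdb blastn act"

# Word splitting of $REQUIRED_TOOLS is intentional here.
missing=$(missing_tools $REQUIRED_TOOLS)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing >&2
else
    echo "All required tools found."
fi
```

If anything is reported as missing, install it with the commands above or add its directory to your $PATH.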

HOW TO LAUNCH SCRIPT IN YOUR TERMINAL

Use the script in your terminal like this:

sh genome_comparison_7comps_with2Refs.sh \
       draft_genome1.fasta minimal_contig_length_1 \
       draft_genome2.fasta minimal_contig_length_2 \
       draft_genome3.fasta minimal_contig_length_3 \
       draft_genome4.fasta minimal_contig_length_4 \
       draft_genome5.fasta minimal_contig_length_5 \
       draft_genome6.fasta minimal_contig_length_6 \
       draft_genome7.fasta minimal_contig_length_7 \
       reference_genome.fasta minimal_contig_length_8 blast_parameter | tee log_file.txt

EXAMPLE SCRIPT:

sh genome_comparison_7comps_with2Refs.sh \
        SS1_clado_contigs_metaMDGB.fasta 500000 \
        SS7_clado_contigs_metaMDGB.fasta 500000 \
        SS8_clado_contigs_metaMDGB.fasta 500000 \
        WT10_clado_contigs_metaMDGB.fasta 500000 \
        SS3_clado_contigs_metaMDGB.fasta 500000 \
        SS5_clado_contigs_metaMDGB.fasta 500000 \
        SS9_clado_contigs_metaMDGB.fasta 500000 \
        GCA_947184155.1_Cgoreaui_SCF055-01_genomic.fna 500000 5000 | tee log_file.txt
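The command takes 17 positional arguments: eight pairs of FASTA file and minimum contig length, followed by the BLAST parameter. A check along the following lines could catch a missing or misplaced argument before any analysis starts. This is a sketch under that assumption; the function names validate_args and is_positive_int are hypothetical and do not come from the original script.

```shell
#!/bin/sh
# Return 0 if the argument is a positive integer, 1 otherwise.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) [ "$1" -gt 0 ] ;;
    esac
}

# Validate the 17 positional arguments: 8 (FASTA, minimum length) pairs
# followed by the BLAST parameter. Prints the first problem found.
validate_args() {
    if [ "$#" -ne 17 ]; then
        echo "expected 17 arguments, got $#"
        return 1
    fi
    pair=1
    while [ "$pair" -le 8 ]; do
        fasta=$1
        min_len=$2
        shift 2
        if [ ! -f "$fasta" ]; then
            echo "genome $pair: file not found: $fasta"
            return 1
        fi
        if ! is_positive_int "$min_len"; then
            echo "genome $pair: invalid minimum length: $min_len"
            return 1
        fi
        pair=$((pair + 1))
    done
    if ! is_positive_int "$1"; then
        echo "invalid blast_parameter: $1"
        return 1
    fi
    return 0
}

# Possible use at the top of a wrapper script:
# validate_args "$@" || exit 1
```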

INPUT PARAMETERS ARE:

draft_genome1.fasta = Your 1st draft genome that you want to compare to a reference.
minimal_contig_length_1 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome2.fasta = Your 2nd draft genome that you want to compare to a reference.
minimal_contig_length_2 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome3.fasta = Your 3rd draft genome that you want to compare to a reference.
minimal_contig_length_3 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome4.fasta = Your 4th draft genome that you want to compare to a reference. This genome also serves as the BLAST reference genome: it sits in the middle of the comparison, acts only as a BLAST database, and is not itself compared to the other genomes.
minimal_contig_length_4 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome5.fasta = Your 5th draft genome that you want to compare to a reference.
minimal_contig_length_5 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome6.fasta = Your 6th draft genome that you want to compare to a reference.
minimal_contig_length_6 = Any sequences below this length threshold get excluded to make the BLAST run faster.

draft_genome7.fasta = Your 7th draft genome that you want to compare to a reference.
minimal_contig_length_7 = Any sequences below this length threshold get excluded to make the BLAST run faster.

reference_genome.fasta = All genomic scaffolds will be re-arranged according to this reference file.
minimal_contig_length_8 = Any sequences below this length threshold get excluded to make the BLAST run faster.

blast_parameter = BLAST word size parameter. Use a value between 2000 and 5000; higher values require more resources.
LOGFILE (| tee log_file.txt) = Append this to the end of the command to automatically save a log file in the current directory.
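To illustrate what the minimal_contig_length thresholds do, the sketch below drops every FASTA record shorter than a given length. It uses plain awk as a portable stand-in; it is not the script's own filter, which relies on Seqkit (for example, `seqkit seq -m 500000 input.fasta` applies the same kind of minimum-length cut-off). The function name filter_min_length is hypothetical.

```shell
#!/bin/sh
# Keep only FASTA records whose sequence length is at least MIN_LEN.
# Sequences are written on a single line (line wrapping is removed).
# Usage: filter_min_length MIN_LEN < input.fasta > filtered.fasta
filter_min_length() {
    awk -v min_len="$1" '
        # Emit the record collected so far if it passes the threshold.
        function flush_record() {
            if (header != "" && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        /^>/ { flush_record(); header = $0; seq = ""; next }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { flush_record() }
    '
}
```

A record exactly at the threshold is kept; only sequences strictly shorter than it are excluded.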

NOTES

ADDITIONAL CONSIDERATIONS

All genome files need to be in the same directory. All software needs to be in your $PATH.
If you download Artemis manually, add the Artemis executable to your $PATH with:

export PATH=$PATH:/path/to/dir_from_artemis
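If you add this line to a start-up file, the same directory can end up in $PATH several times. A small sketch that appends the directory only when it is missing, and then checks that the ACT launcher is found, might look like this. The helper name path_with_dir and the ARTEMIS_DIR variable are illustrative assumptions.

```shell
#!/bin/sh
# Print PATH-style list $1 with directory $2 appended, unless already present.
path_with_dir() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Placeholder location; replace with your own Artemis directory.
ARTEMIS_DIR="/path/to/dir_from_artemis"
PATH=$(path_with_dir "$PATH" "$ARTEMIS_DIR")
export PATH

# Confirm that the ACT launcher can now be found.
command -v act >/dev/null 2>&1 || echo "act not found in PATH; check ARTEMIS_DIR" >&2
```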

The script creates its own directory in which it runs the whole analysis. The directory name is derived from the input data.
All intermediate files are deleted at the end. If you would like to keep them, comment out the respective code.

The Artemis Comparison Tool (ACT) is started automatically with the respective genome comparison at the end of the script. You can modify the very last line to start different ACT comparisons.
If there is an error with the crunch comparison files, adjust the contig and BLAST thresholds to make the run less computationally intensive.
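ACT expects sequence files and comparison files to alternate on its command line (sequence 1, comparison 1 vs 2, sequence 2, and so on). The sketch below builds such a command for an arbitrary list of genomes and reports comparison files that are missing or empty. It is an illustration only: the "<upper>_vs_<lower>.crunch" naming scheme and both function names are assumptions, not the names used by the script.

```shell
#!/bin/sh
# Print an ACT command line for the genomes given in display order.
# Comparison files are assumed to be named "<upper>_vs_<lower>.crunch".
build_act_command() {
    cmd="act"
    prev=""
    for genome in "$@"; do
        if [ -n "$prev" ]; then
            cmd="$cmd ${prev}_vs_${genome}.crunch"
        fi
        cmd="$cmd $genome"
        prev=$genome
    done
    printf '%s\n' "$cmd"
}

# Print every comparison file in the argument list that is missing or empty.
empty_crunch_files() {
    for f in "$@"; do
        [ -s "$f" ] || printf '%s\n' "$f"
    done
}

# Example: show the command for three genomes without running it.
build_act_command genome1 genome2 genome3
```

Checking for empty crunch files before starting ACT gives a clearer hint that the thresholds need adjusting than a failure inside the viewer.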

DEFINE INPUT PARAMETERS

THINGS TO ADJUST BEFORE RUNNING THE SCRIPT

OUTPUT DIRECTORY NAME # Change your output directory name in the line "## SET UP OUTPUT DIRECTORY"
RAGTAG SETTINGS # The number of available threads is currently set to 4
BLAST SETTINGS # Set the number of BLAST matches; keep it at 5 for now so the comparison does not become overly complicated

OUTPUT_DIRECTORY="NEW_SCRIPT_TEST"
NUMBER_OF_CORES="4"
WORD_SIZE_RAGTAG="5000"
MAX_TARGET_SEQS="5"
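As a rough illustration of where such settings typically end up, the sketch below prints (without running) one RagTag and one blastn command built from these variables. The exact invocations inside the script may differ; the output file names, the database name draft_genome4_db, and both function names are assumptions. The flags shown (`-o` and `-t` for `ragtag.py scaffold`; `-word_size`, `-max_target_seqs`, `-num_threads`, `-outfmt` and `-out` for `blastn`) are standard options of those tools.

```shell
#!/bin/sh
# Settings as defined above.
OUTPUT_DIRECTORY="NEW_SCRIPT_TEST"
NUMBER_OF_CORES="4"
MAX_TARGET_SEQS="5"

# Print a RagTag scaffolding command for one draft genome.
# Arguments: reference FASTA, draft FASTA.
ragtag_command() {
    echo "ragtag.py scaffold $1 $2 -o $OUTPUT_DIRECTORY/ragtag_$(basename "$2" .fasta) -t $NUMBER_OF_CORES"
}

# Print a blastn command that writes tabular output for one comparison.
# Arguments: query FASTA, BLAST database name, word size.
blastn_command() {
    echo "blastn -query $1 -db $2 -word_size $3 -max_target_seqs $MAX_TARGET_SEQS -num_threads $NUMBER_OF_CORES -outfmt 6 -out $OUTPUT_DIRECTORY/$(basename "$1" .fasta)_vs_$2.crunch"
}

# Dry run: print the commands instead of executing them.
ragtag_command reference_genome.fasta draft_genome1.fasta
blastn_command draft_genome1.fasta draft_genome4_db 5000
```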

Figure 2. Example output using Artemis for visualisation (automatically generated).


PatrickBuerger/ComparativeGenomics
