bamarshall2/evolutionary_flexibility

Evolutionary_flexibility

Code used to perform the analysis in "Evolutionary flexibility and rigidity in the bacterial MEP pathway"

Requirements

Conda environments with:
Orfipy at https://github.com/urmi-21/orfipy
kofam_scan at https://github.com/takaram/kofam_scan

Step 1: Download a database of genomes as FASTA files

NCBI hosts a sortable database of genomes that can be sorted by quality. I recommend taking only the highest-quality genomes and renaming the files so they are human readable.
Copy the genomes into "genome_database".
Make a manifest of the genomes in this database, called "manifest.txt", which the scripts will use.
This can be done from the evolutionary_flexibility folder with "ls genome_database/ >> manifest.txt". Because ">>" appends, delete any existing manifest.txt first to avoid duplicate entries.
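
A minimal sketch of a re-runnable version of the manifest step. It overwrites the manifest instead of appending to it, so running it twice does not duplicate entries. The helper name make_manifest is hypothetical and not part of this repository.

```shell
# Sketch only: make_manifest is a hypothetical helper, not a script from this repository.
make_manifest() {
    local genome_dir="$1" manifest="$2"
    # "ls -1" prints one file name per line; ">" replaces any previous manifest.
    ls -1 "$genome_dir" > "$manifest"
}
```

From the evolutionary_flexibility folder this would be called as: make_manifest genome_database manifest.txt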

Step 2: Use orfipy to extract open reading frames (ORFs)

In a conda environment with orfipy installed, navigate into the folder "extracted_orfs" and run the bash script found there:
"bash mass_run_orfipy.sh"

Collect the output files in the "collected_orf_outputs" folder.
To copy the output .faa files containing the ORFs into that folder, navigate to "extracted_orfs" and execute:
find . -name '*.faa' -exec cp -t ./collected_orf_outputs/ {} +

Make another manifest that captures the file names exactly, extensions included:
ls collected_orf_outputs/ >> manifest_with_extension.txt
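
The collection and manifest steps can be sketched as one function. This assumes GNU find and GNU cp (needed for "cp -t"); the helper name collect_faa is hypothetical. It skips the destination folder while searching, so a second run does not try to copy files onto themselves, and it overwrites the manifest rather than appending to it.

```shell
# Sketch only: collect_faa is a hypothetical helper. Assumes GNU find and GNU cp.
collect_faa() {
    local src="${1%/}"                      # strip a trailing slash, if any
    local dest="$src/collected_orf_outputs"
    mkdir -p "$dest"
    # -prune skips the destination folder; every other *.faa file is copied into it.
    find "$src" -path "$dest" -prune -o -type f -name '*.faa' -exec cp -t "$dest" {} +
    # Overwrite the manifest so repeated runs do not duplicate entries.
    ls -1 "$dest" > "$src/manifest_with_extension.txt"
}
```

From inside "extracted_orfs" this would be called as: collect_faa . Note that two .faa files with the same name in different subfolders would overwrite each other in the destination.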

Step 3: Construct a kofam_scan "ko_list" for the genes of interest

Navigate into the kofam directory.
The following is based on the steps at: https://taylorreiter.github.io/2019-05-11-kofamscan/
Download ko_list
"wget ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz"
Download hmm profiles
"wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz"
Download the kofamscan tool
This can be achieved by downloading the kofam_scan tar.gz file from https://www.genome.jp/ftp/tools/kofam_scan/
Download readme
"wget ftp://ftp.genome.jp/pub/tools/kofamscan/README.md"

Consider making a new conda environment for running kofamscan. Taylor Reiter performs the following:
"conda create -n kofamscan hmmer parallel"
"conda activate kofamscan"
"conda install -c conda-forge ruby"

Decompress files
"gunzip ko_list.gz"
"tar -xvf profiles.tar.gz"
"tar -xvf kofam_scan-VERSION.tar.gz"

Trim ko_list to the KOs of interest and save the result as "ko_list_curated".
Note that the "ko_list_curated" included in this version of the repository contains the MEP- and MVA-relevant genes.

Copy the HMM profiles for the KOs of interest into their own folder called "profiles_curated".
This can be done in a single line by navigating into profiles/ and executing:
cp $(grep -o -P '.{0,1}K.{5}' ../ko_list_curated | sed 's/$/.hmm/') ../profiles_curated/
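
The grep pattern in the one-liner above matches a "K" followed by any five characters anywhere on a line, so text in other columns could in principle yield names that are not profiles. A stricter sketch is given here, under the assumption that ko_list_curated is tab-separated with the KO identifier (K plus five digits) in the first column. The helper name copy_curated_profiles is hypothetical.

```shell
# Sketch only: copy_curated_profiles is a hypothetical helper.
# Assumption: the KO list is tab-separated and column 1 holds identifiers such as K00099.
copy_curated_profiles() {
    local ko_list="$1" profiles_dir="$2" dest="$3"
    local ko status=0
    mkdir -p "$dest"
    # Keep only first-column values that are exactly "K" + five digits (this also skips a header line).
    for ko in $(awk -F'\t' '$1 ~ /^K[0-9][0-9][0-9][0-9][0-9]$/ { print $1 }' "$ko_list"); do
        if [ -f "$profiles_dir/$ko.hmm" ]; then
            cp "$profiles_dir/$ko.hmm" "$dest/"
        else
            # Report, rather than silently skip, a KO without a profile file.
            echo "missing profile: $ko.hmm" >&2
            status=1
        fi
    done
    return "$status"
}
```

From the kofam directory this would be called as: copy_curated_profiles ko_list_curated profiles profiles_curated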

In kofam_scan-VERSION, edit the config.yml file to reflect the paths to the profile database and ko_list.
Note that the file must be named config.yml, not config-template.yml.
config.yml should look like this:

# Path to your KO-HMM database
# A database can be a .hmm file, a .hal file or a directory in which
# .hmm files are. Omit the extension if it is .hal or .hmm file
profile: ../profiles_curated

#Path to the KO list file
ko_list: ../ko_list_curated
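
If you prefer to generate the file rather than edit it by hand, the two settings above can be written with a small function. This is a sketch: the helper name write_kofam_config is hypothetical, and it writes only the two keys shown above.

```shell
# Sketch only: write_kofam_config is a hypothetical helper.
# It writes just the "profile" and "ko_list" keys shown in this README.
write_kofam_config() {
    local config="$1" profile="$2" ko_list="$3"
    printf '%s\n' \
        "# Path to your KO-HMM database" \
        "profile: $profile" \
        "" \
        "# Path to the KO list file" \
        "ko_list: $ko_list" > "$config"
}
```

From inside kofam_scan-VERSION this would be called as: write_kofam_config config.yml ../profiles_curated ../ko_list_curated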

Step 4: Perform ortholog identification

Navigate into kofam/kofam_scan-VERSION

Run the bash script that iteratively runs kofamscan over the ORF files for the KOs of interest:
"bash mass_run_kofam_scan.sh"

If the program errors out because it is looking for profiles you are not interested in (i.e. not in your ko_list_curated), try emptying the tmp folder.
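
For orientation, here is a sketch of the kind of loop such a script might run. It is not the repository's mass_run_kofam_scan.sh. It only prints one exec_annotate command per manifest entry so the commands can be inspected before running them. The helper name, the output naming, and the per-input tmp_ directories are assumptions. Giving each run its own --tmp-dir is one way to keep a stale tmp folder out of the picture, but whether that resolves the error described above in your setup is untested. File names are assumed to contain no whitespace.

```shell
# Sketch only: print_kofam_commands is a hypothetical helper, NOT the repository's script.
# It prints commands instead of running them; -o and --tmp-dir are exec_annotate options.
print_kofam_commands() {
    local manifest="$1" orf_dir="$2" out_dir="$3"
    local name
    while IFS= read -r name; do
        [ -n "$name" ] || continue          # skip blank manifest lines
        # One output file and one temporary directory per input .faa file.
        printf './exec_annotate -o %s/%s.txt --tmp-dir tmp_%s %s/%s\n' \
            "$out_dir" "${name%.faa}" "${name%.faa}" "$orf_dir" "$name"
    done < "$manifest"
}
```

The printed lines can be reviewed and then piped to bash from inside kofam_scan-VERSION.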

Step 5: Pull info from the output_files folder into a single file.

Navigate to the main folder, then run the R script that catches genes whose score exceeds the KEGG-defined cutoff for identifying orthologs (catch_KO_threshold.R), or the R script that catches the gene with the lowest E-value (catch_min_escore.R). The results are written to evolutionary_flexibility/results/

"Rscript catch_KO_threshold.R"
"Rscript catch_min_escore.R"

Note that these scripts will fail if any output file is empty. To fix this, note the name of the empty file, then remove it from the "output_files" folder.
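
Empty files can be located without opening each one. The sketch below assumes "empty" means zero bytes and moves such files into a holding folder instead of deleting them, printing each path as it goes. The helper name quarantine_empty_outputs and the holding folder are hypothetical.

```shell
# Sketch only: quarantine_empty_outputs is a hypothetical helper.
# Assumption: an "empty" output file is a zero-byte file.
quarantine_empty_outputs() {
    local out_dir="$1" holding_dir="$2"
    mkdir -p "$holding_dir"
    # -empty matches zero-byte files; -print reports each one before it is moved aside.
    find "$out_dir" -maxdepth 1 -type f -empty -print -exec mv {} "$holding_dir/" \;
}
```

For example: quarantine_empty_outputs output_files empty_outputs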
