This part of the pipeline uses BiG-SCAPE to cluster all biosynthetic gene clusters (BGCs) found across the entire genome dataset.

### Paths and parameters

#### Pipeline input folders

In [None]:
antismash_output="./10-MGEs/BGCs/output"
metadata="./genomes_metadata"

#### Pipeline output folders

In [None]:
task_root="./11-BGCClustering"
bigscape_input="$task_root/input"
bigscape_output="$task_root/output"
network_folder="$bigscape_output/network_files/*/mix"

mkdir -p "$task_root" "$bigscape_input" "$bigscape_output"

#### Tool pointers and parameters

In [None]:
pfam_db="/mnt/STORAGE/databases/PFAM"
n_cores=22
cutoff=0.50

annotate_network="./utils/annotate_bigscape_network.py"

### Checking dependencies

In [None]:
conda activate bigscape
bigscape --version
conda deactivate
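
If a tool is missing, the version check above fails with a generic shell error. As a minimal sketch (not part of the original pipeline), a small helper can report missing commands by name. The function name `require_cmd` and the placeholder tool name are hypothetical and used only for illustration.

```shell
# Return non-zero and report on stderr if a command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing dependency: $1" >&2
        return 1
    fi
}

# Collect the names of all missing tools instead of stopping at the first one.
# "definitely_not_a_real_tool_xyz" is a placeholder that should never exist.
missing=""
for tool in sh definitely_not_a_real_tool_xyz; do
    require_cmd "$tool" || missing="$missing $tool"
done
echo "missing:$missing"
```

In this notebook, the helper would presumably be called as `require_cmd bigscape` after `conda activate bigscape`, since the tool is only on `PATH` inside that environment.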

### Gathering all antiSMASH region GenBank files

Copy the antiSMASH region GenBank files, together with their directory structure, into the BiG-SCAPE input folder, then flatten the structure by moving every GenBank file out of its subfolder and prefixing its name with the subfolder name.

In [None]:
dir -1 $antismash_output | xargs -I % bash -c "
mkdir -p $bigscape_input/%
dir -1 $antismash_output/% | grep -E '.+\.region[0-9]{3}\.gbk' | xargs -I {} cp -u $antismash_output/%/{} $bigscape_input/%/{}
dir -1 $bigscape_input/% | xargs -I {} mv $bigscape_input/%/{} $bigscape_input/%.{}
dir -1 $bigscape_input | grep -v .gbk | xargs -I {} rm -rf $bigscape_input/{}"
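
The copy-and-flatten step can be hard to follow because of the nested `xargs` calls. The sketch below shows the same idea with a plain loop on a throwaway directory. It is an illustration only: the folder and file names (`genomeA`, `contig1.region001.gbk`, and so on) are made up, and the real cell works on `$antismash_output` and `$bigscape_input`.

```shell
# Hypothetical demo layout in a temporary directory.
demo=$(mktemp -d)
src="$demo/antismash"
dst="$demo/bigscape_input"
mkdir -p "$src/genomeA" "$src/genomeB" "$dst"
touch "$src/genomeA/contig1.region001.gbk" "$src/genomeA/genomeA.gbk" \
      "$src/genomeB/contig7.region002.gbk"

# Copy every region GenBank to <genome folder>.<file name> in one flat folder.
# Whole-genome GenBank files (no ".regionNNN" part) are not matched by the glob.
for path in "$src"/*/*.region[0-9][0-9][0-9].gbk; do
    [ -e "$path" ] || continue          # skip if the glob matched nothing
    genome=$(basename "$(dirname "$path")")
    cp "$path" "$dst/$genome.$(basename "$path")"
done

# Count and list the flattened files.
flat_count=$(ls -1 "$dst" | wc -l | tr -d ' ')
ls -1 "$dst"
```

Unlike the cell above, this variant never creates the intermediate subfolders, so there is nothing to clean up afterwards.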

Finally, rename the files to the format `%assembly_accession_ID.%region#`.

In [None]:
root=$(pwd)
cd $bigscape_input
paste <(dir -1) <(dir -1 | cut -d '.' -f 3-) > ../new.filenames
while read -r old new
do
    mv "$old" "$new"
done < ../new.filenames
rm ../new.filenames
cd $root
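
Because `cut -d '.' -f 3-` drops the first two dot-separated fields, two files that differ only in those fields end up with the same new name, and the second `mv` silently overwrites the first. The following dry run is a sketch for checking this before renaming. The file names are hypothetical and chosen to trigger a collision.

```shell
# Hypothetical collapsed file names; real names depend on the antiSMASH output.
names="GENOME_A.1.record1.region001.gbk
GENOME_A.1.record1.region002.gbk
GENOME_B.1.record1.region001.gbk"

# Build the same old -> new mapping as the rename cell, without moving anything.
mapping=$(printf '%s\n' "$names" | while IFS= read -r old; do
    new=$(printf '%s\n' "$old" | cut -d '.' -f 3-)
    printf '%s\t%s\n' "$old" "$new"
done)
printf '%s\n' "$mapping"

# Any target name listed here would be overwritten by a later mv.
duplicates=$(printf '%s\n' "$mapping" | cut -f 2 | sort | uniq -d)
echo "colliding targets: $duplicates"
```

An empty `duplicates` value on the real file list suggests the rename is safe to run.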

### Running BiG-SCAPE

In [None]:
conda activate bigscape

In [None]:
bigscape -i $bigscape_input -o $bigscape_output --pfam_dir $pfam_db -c $n_cores --include_singletons --cutoffs $cutoff --mix --mibig

In [None]:
conda deactivate

### Annotate network files

Add rRNA cluster metadata and duplicate the links so that the network can be easily imported into Cytoscape.

In [None]:
network_file=$(dir -1 $network_folder | grep -E "$cutoff.network$")
python $annotate_network $network_folder/$network_file $metadata $antismash_output $task_root/c$cutoff.annotated.network
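
The `grep` above assumes that exactly one file in the network folder ends in `$cutoff.network`. A minimal sketch of how that assumption could be checked is shown below on a temporary folder. The file names `mix_c0.50.network` and `mix_clans_0.50.tsv` are assumptions about what the network folder contains, and `demo_net` and `demo_cutoff` are illustrative names.

```shell
# Hypothetical stand-in for $network_folder with assumed file names.
demo_net=$(mktemp -d)
touch "$demo_net/mix_c0.50.network" "$demo_net/mix_clans_0.50.tsv"
demo_cutoff=0.50

# Escape the dots so that "0.50" only matches a literal "0.50".
pattern="$(printf '%s' "$demo_cutoff" | sed 's/\./\\./g')\.network\$"
matches=$(ls -1 "$demo_net" | grep -E "$pattern")

# Count non-empty lines: anything other than 1 means the match is ambiguous or missing.
n_matches=$(printf '%s\n' "$matches" | grep -c .)
echo "matched $n_matches file(s): $matches"
```

If `n_matches` is not 1 on the real folder, the annotation command would receive either no file or several, so it is worth stopping before calling the script.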

**Ready for visualisation in Cytoscape!**