This part of the pipeline runs a Wagner parsimony ancestral state reconstruction (ASR) for the entire *Clostridia* set on the core-genome tree, using the Count tool.

### Paths and parameters

#### Pipeline input folders

In [None]:
pa="05-pangenomes/merge/matrix.csv"
ann="07-PangenomeAnnotation/COG/mapper/all_protein_families_merge.emapper.annotations"
tree="08-core-phylogeny/subtrees/merge.contree"

#### Pipeline output folders

In [None]:
task_root="12-ASR-analysis"

mkdir -p $task_root

#### Tool pointers and parameters

In [None]:
convert_inputs="utils/convert_ASR_inputs.R"
Count="utils/Count.jar"

### Convert the inputs for use in Count

`Count` expects its input files in specific formats, so the pipeline outputs are converted before import.

Leaf labels are made consistent so that each leaf can be traced back to its assembly. The pangenome presence/absence (P/A) matrix is converted from a dataframe of genomic coordinates into a plain count matrix. Finally, the COG annotation for each assembly is tabulated in a TSV file.

In [None]:
root=$(pwd)
cd $task_root

In [None]:
Rscript $root/$convert_inputs $root/$tree $root/$pa $root/$ann .
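
Because the reconstruction depends on the tree leaves and the matrix columns carrying the same labels, a quick consistency check after the conversion can catch mismatches early. The sketch below is not part of the original pipeline. It assumes unquoted Newick labels without spaces, and it assumes that the count matrix is tab-separated with a header row whose first field names the family column; both assumptions should be verified against the actual output of `convert_ASR_inputs.R`. The function names are hypothetical.

```shell
# Print the leaf labels of a Newick tree, one per line, sorted.
# Leaves follow "(" or ","; internal labels and support values follow ")"
# and are therefore skipped. Assumes unquoted labels without spaces.
newick_leaves() {
    tr -d '\n' < "$1" \
        | grep -oE '[(,][^(),:;]+' \
        | tr -d '(,' \
        | sort
}

# Print the taxon names from the header row of a tab-separated matrix,
# skipping the first field (assumed to be the family-name column).
matrix_taxa() {
    head -n 1 "$1" | tr '\t' '\n' | tail -n +2 | sort
}

# Succeed only if both label sets are identical.
check_leaf_labels() {
    tree_labels=$(newick_leaves "$1")
    matrix_labels=$(matrix_taxa "$2")
    if [ "$tree_labels" = "$matrix_labels" ]; then
        echo "OK: tree leaves and matrix columns match"
    else
        echo "MISMATCH between tree leaves and matrix columns" >&2
        return 1
    fi
}
```

Under these assumptions, `check_leaf_labels input_ready.tree matrix_counts.tsv` could be run from inside `$task_root` before starting Count.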

### Run Count in CLI mode

In [None]:
java -Xmx50G -cp $root/$Count ca.umontreal.iro.evolution.genecontent.AsymmetricWagner \
-gain 1 input_ready.tree matrix_counts.tsv > analysis_export.tsv

### Split the output file

In [None]:
grep '# FAMILY' analysis_export.tsv | cut -f 2- > families.tsv
grep '# PRESENCE' analysis_export.tsv | cut -f 2- > presences.tsv
grep '# CHANGE' analysis_export.tsv | cut -f 2- > changes.tsv
rm -f analysis_export.tsv
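
The same three-way split can be wrapped in a small helper so that it is reusable with different output locations. This is a sketch rather than part of the original notebook: the function name `split_count_export` and its optional suffix argument are hypothetical, and it assumes that each record tag starts its line and is followed by a tab, as the `grep` and `cut` calls above imply.

```shell
# Split a Count export into one table per record type.
# Usage: split_count_export EXPORT_FILE OUT_DIR [SUFFIX]
split_count_export() {
    export_file=$1
    out_dir=$2
    suffix=${3:-}   # optional, e.g. "_0.5" to tag a parameter value

    mkdir -p "$out_dir"
    # Keep only lines starting with the tag, then drop the tag column.
    grep '^# FAMILY'   "$export_file" | cut -f 2- > "$out_dir/families${suffix}.tsv"
    grep '^# PRESENCE' "$export_file" | cut -f 2- > "$out_dir/presences${suffix}.tsv"
    grep '^# CHANGE'   "$export_file" | cut -f 2- > "$out_dir/changes${suffix}.tsv"
}
```

For example, `split_count_export analysis_export.tsv .` would write the same three files as the cell above into the current directory.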

In [None]:
cd $root

### Do a gain/loss penalty sensitivity analysis

In [None]:
cd $task_root

In [None]:
mkdir -p ratio_sensitivity

In [None]:
range=( 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 )
for i in "${range[@]}"
do
    java -Xmx50G -cp $root/$Count ca.umontreal.iro.evolution.genecontent.AsymmetricWagner \
    -gain $i input_ready.tree matrix_counts.tsv > analysis_export.tsv
    
    grep '# FAMILY' analysis_export.tsv | cut -f 2- > ratio_sensitivity/families_$i.tsv
    grep '# PRESENCE' analysis_export.tsv | cut -f 2- > ratio_sensitivity/presences_$i.tsv
    grep '# CHANGE' analysis_export.tsv | cut -f 2- > ratio_sensitivity/changes_$i.tsv
    rm -f analysis_export.tsv
done
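
One simple way to read the sensitivity run is to count, for each gain penalty, how many lines of the presences table differ from a baseline table. The sketch below is an illustration and not part of the original analysis: `compare_to_baseline` is a hypothetical helper, and it assumes that all presences tables list their rows in the same order, so that a line-wise `diff` is meaningful.

```shell
# For every presences_*.tsv in DIR, print the file name and the number of
# baseline lines that are changed or missing in that file.
# Usage: compare_to_baseline BASELINE_FILE DIR
compare_to_baseline() {
    baseline=$1
    dir=$2
    for f in "$dir"/presences_*.tsv; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
        # diff prefixes baseline-only lines with "<"; grep -c counts them.
        # "|| true" keeps a zero count from being treated as a failure.
        n=$(diff "$baseline" "$f" | grep -c '^<' || true)
        printf '%s\t%s\n' "$(basename "$f")" "$n"
    done
}
```

Called with the presences table of the `-gain 1` run as the baseline and the sensitivity output directory as the second argument, this prints one tab-separated line per penalty value; a count of 0 means that the table is identical to the baseline.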

In [None]:
cd $root