# WormCat and WormCat Batch installation


This notebook provides step-by-step instructions for installing WormCat and WormCat Batch on a Linux machine running Conda.

---

You may want to run your own instance of WormCat and WormCat Batch if:

1. You have _many_ gene sets to run against, and you do not want to manually run them through the WormCat website.
2. You would like to provide your own Annotation file or modify one of the provided Annotation files.
3. You desire to integrate WormCat into a Bioinformatics pipeline.

---

Before installing, make sure you have considered the simpler ways to work with WormCat.

* The WormCat [website](http://www.wormcat.com) is the primary way researchers interact with WormCat
* The WormCat [Docker Container](https://hub.docker.com/r/danhumassmed/wormcat_batch) is a more convenient way to run WormCat locally


## Before starting: validate the current compute environment
Let's compare my system to yours to check for alignment.

* We should be running some version of `Linux`.
* And the underlying architecture should be `x86_64`.
    * `x86_64` tells us we are running on 64-bit Intel/AMD hardware
* The Linux distribution is less important but good to know for comparison.
  * I will be installing on a Long Term Support (LTS) version of Ubuntu
* Finally, we should be running a relatively up-to-date version of Conda.

__Note:__ All the commands executed in `%%bash` cells can also be copied directly into a bash command-line shell.

In [None]:
%%bash
# What type of OS and architecture is being used?
# (-m reports the machine hardware name; -p prints "unknown" on some distributions)
uname -sm

In [None]:
%%bash
# What Linux Distribution is being used?
lsb_release -a

In [None]:
%%bash
# What Version of Conda are we using?
conda --version
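
The three checks above can also be combined into one pass. The following is a minimal sketch, not part of WormCat: `is_supported_platform` is a hypothetical helper name, and the block only reports what it finds without changing anything. It can be pasted into a `%%bash` cell or a shell.

```shell
# Sketch: check the platform assumptions from this section in one pass.
# is_supported_platform is a hypothetical helper, not part of WormCat.
is_supported_platform() {
    # $1 = kernel name (uname -s), $2 = machine architecture (uname -m)
    [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]
}

if is_supported_platform "$(uname -s)" "$(uname -m)"; then
    echo "Platform OK: $(uname -sm)"
else
    echo "Untested platform: $(uname -sm)"
fi

# Report the conda version, without failing if conda is missing.
if command -v conda >/dev/null 2>&1; then
    conda --version
else
    echo "conda not found on PATH"
fi
```

An "Untested platform" message does not necessarily mean the installation will fail; it only means your system differs from the one used in this notebook.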


## Installation Process

1. Create a Conda Environment
2. Install Python (WormCat Batch is a Python Program)
3. Install R and R Devtools (WormCat is an R Program)
4. Install the Prerequisite packages for WormCat
5. Install WormCat & WormCat Batch

#### Create a new conda environment and install Python 3.9 as the base

* __Note:__ Any version of Python from 3.5 onward could be used
* __Note:__ ipykernel installs IPython Kernel for Jupyter (interactive Python)
    * ipykernel is _Not required_ if you are not using Jupyter Notebooks


In [None]:
%%bash
# -y answers the confirmation prompt, which a notebook cell cannot do interactively
conda create -q -y -n wormcat_env python=3.9 ipykernel
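
If you re-run this notebook, it can help to check whether the environment already exists first. Below is a small sketch under the assumption that `conda env list` prints the environment name in the first column; `env_exists` is a hypothetical helper, and the block only reports what it finds rather than creating anything.

```shell
# env_exists is a hypothetical helper: it reads `conda env list` output on
# stdin and succeeds if the first column contains the given environment name.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH"
elif conda env list | env_exists wormcat_env; then
    echo "wormcat_env already exists; no need to create it again"
else
    echo "wormcat_env not found; create it with the conda create command above"
fi
```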

#### Install R and devtools

* Here we install R 4.3.1; any R version newer than 4.0.0 should work, although other versions have not been specifically tested
* We are also installing devtools to support the later installation of WormCat


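__Note:__

* You can install the libmamba solver with `conda install -n base conda-libmamba-solver`
* and then append __--solver=libmamba__ to your `conda install` commands, as the commands below do.
* The solver is _not required_, but it makes Conda's dependency resolution significantly faster. If you skip it, drop __--solver=libmamba__ from the commands below.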

In [None]:
%%bash
conda install -q -y -n wormcat_env -c conda-forge r-base=4.3.1 --solver=libmamba
conda install -q -y -n wormcat_env -c r r-devtools --solver=libmamba


#### Install the WormCat & WormCat Batch dependencies

* WormCat requires: ggplot2, ggthemes, plyr, and svglite
* WormCat Batch requires argparse 
    * __Note:__ Although WormCat Batch is a Python program, it requires R's argparse to call WormCat correctly.


In [None]:
%%bash
conda install -q -y -n wormcat_env -c conda-forge r-ggplot2 r-ggthemes r-plyr r-svglite
conda install -q -y -n wormcat_env -c bioconda r-argparse

#### Install WormCat & WormCat Batch

* Activate the newly created Conda Environment
* Install wormcat
* Install wormcat_batch

In [None]:
%%bash
eval "$(conda shell.bash hook)"
conda activate wormcat_env

# dependencies = FALSE: the prerequisite packages were already installed with conda above
R -e "library('devtools'); install_github('dphiggs01/wormcat', dependencies = FALSE)"
pip install wormcat_batch
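
Once the installation finishes, one way to confirm that R can see everything is to try loading each package. This is a sketch, not an official WormCat check: `check_r_packages` is a hypothetical helper, and the package list is simply the set of names mentioned in this notebook. Run it with `wormcat_env` active so that `Rscript` comes from that environment.

```shell
# Packages named in this notebook; check_r_packages is a hypothetical helper.
R_PACKAGES="ggplot2 ggthemes plyr svglite argparse devtools wormcat"

check_r_packages() {
    # Returns 2 if Rscript is unavailable, 1 if any package fails to load, else 0.
    command -v Rscript >/dev/null 2>&1 || { echo "Rscript not found on PATH"; return 2; }
    missing=0
    for pkg in "$@"; do
        if Rscript -e "quit(status = as.integer(!requireNamespace('$pkg', quietly = TRUE)))" >/dev/null 2>&1; then
            echo "ok       $pkg"
        else
            echo "MISSING  $pkg"
            missing=1
        fi
    done
    return "$missing"
}

# Word splitting of $R_PACKAGES is intentional here.
check_r_packages $R_PACKAGES || echo "Some checks did not pass (see above)"
```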

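## Before testing, set the environment to `wormcat_env`

If you are working from the console, you have already done this above with `conda activate wormcat_env`. In a notebook, each `%%bash` cell starts a new shell, so that activation does not carry over from the previous cell; the notebook itself must be running in `wormcat_env`.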


In [None]:
%%bash
# Ensure wormcat_cli is pointing to the environment we just installed and activated (wormcat_env)
which wormcat_cli
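
The same check can be automated. The sketch below assumes the default Conda layout, in which a named environment lives under `.../envs/<name>/bin/`; `path_in_env` is a hypothetical helper, and environments created with a custom prefix would need a different pattern.

```shell
# path_in_env is a hypothetical helper: succeeds if path $1 sits in the bin/
# directory of a conda environment named $2 (default envs/<name> layout assumed).
path_in_env() {
    case "$1" in
        */envs/"$2"/bin/*) return 0 ;;
        *) return 1 ;;
    esac
}

cli_path="$(command -v wormcat_cli || true)"
if [ -z "$cli_path" ]; then
    echo "wormcat_cli not found on PATH"
elif path_in_env "$cli_path" wormcat_env; then
    echo "wormcat_cli comes from wormcat_env: $cli_path"
else
    echo "wormcat_cli comes from a different location: $cli_path"
fi
```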


In [None]:
%%bash
# Take a look at the WormCat Batch help.
wormcat_cli --help

### Run a test using an Excel file

__Steps__
1. Create a Test Directory and make sure it is empty
2. Download an Example Excel file to test with
3. Execute WormCat Batch cli (Command Line Interface)
4. Evaluate the Results

In [None]:
%%bash
mkdir -p ~/wormcat_test  # Make a directory if it does not exist
rm -rf ~/wormcat_test/*  # Delete the contents if we have already run some tests here
cd ~/wormcat_test        # Change into the test directory
wget -q -O Murphy_TS.xlsx http://www.wormcat.com/static/download/Murphy_TS.xlsx # Download a sample Excel
wget -q -O customized.csv https://dphiggs01.github.io/Wormcat_data/data/whole_genome_v2_nov-11-2021.csv # Download a sample Annotation file
ls -lh ~/wormcat_test        # List the current directory content
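
Because `wget -q` suppresses its messages, a failed download is easy to miss, and `-O` can leave an empty file behind. A small sketch of a sanity check follows; `require_nonempty` is a hypothetical helper, and the file names are the ones used in the cell above.

```shell
# require_nonempty is a hypothetical helper: it fails if any listed file is
# missing or has zero bytes.
require_nonempty() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING or empty: $f"
            status=1
        fi
    done
    return "$status"
}

# Run in a subshell so the current directory is left unchanged.
( cd ~/wormcat_test 2>/dev/null && require_nonempty Murphy_TS.xlsx customized.csv ) \
    || echo "Download check did not pass; re-run the wget commands above"
```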

#### Make a simple call to WormCat Batch

* Here, we are providing an Excel file as input and are using a Customized Annotation file

In [None]:
%%bash
cd ~/wormcat_test
wormcat_cli --input-excel ./Murphy_TS.xlsx --output-path ./wormcat_out_xlsx --annotation-file ./customized.csv

#### Let's take a look at the output

__tree__ is a Linux utility that pretty-prints a directory structure.

Here we can see that the output directory has been created and that it contains the zip file with the results.

__Note:__ 
* If you do not have __tree__ installed, install it with `sudo apt-get install tree`


In [None]:
%%bash
cd ~/wormcat_test
tree
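
If you cannot install __tree__, `find` can locate the results archive instead. This is a sketch: `list_zip_results` is a hypothetical helper, and since the archive's exact name is not assumed here, it simply lists every `.zip` file under the output directory used above.

```shell
# list_zip_results is a hypothetical helper: it prints every .zip file under
# the directory given as $1, using only find (no tree required).
list_zip_results() {
    find "$1" -type f -name '*.zip' 2>/dev/null | sort
}

out_dir="$HOME/wormcat_test/wormcat_out_xlsx"
if [ -d "$out_dir" ]; then
    list_zip_results "$out_dir"
else
    echo "No output directory at $out_dir yet"
fi
```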

#### Call WormCat Batch with a directory that contains the gene sets to be processed

* Here, we are providing a Directory as input 
* and we are passing the Annotation file name
* and we are not deleting the intermediate files

In [None]:
%%bash
rm -rf ~/wormcat_test/* 
cd ~/wormcat_test

echo "1. wget -q -O Murphy_TS_csv.zip"
wget -q -O Murphy_TS_csv.zip https://github.com/dphiggs01/Wormcat_batch/raw/master/docker/wormcat_batch/Murphy_TS_csv.zip
echo

echo "2. unzip Murphy_TS_csv.zip"
unzip Murphy_TS_csv.zip
echo

echo "3. List Directory Content"
ls -l Murphy_TS_csv
echo

echo "4. Look at Content of one of the files to be processed"
head Murphy_TS_csv/hypodermis.csv

In [None]:
%%bash
cd ~/wormcat_test/
wormcat_cli --input-csv-path ./Murphy_TS_csv --output-path ./wormcat_out_csv --annotation-file whole_genome_v2_nov-11-2021.csv --clean-temp False
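
With many gene sets spread over several directories, the same call can be generated in a loop. The sketch below is a dry run that only prints the commands: `print_wormcat_cmd` is a hypothetical helper, `./another_experiment_csv` is a placeholder directory name, and the `./wormcat_out_<name>` output naming is an assumption of this example. The flags are the ones used in the cell above.

```shell
# print_wormcat_cmd is a hypothetical helper: it prints (does not run) the
# wormcat_cli call for one directory of gene-set CSV files.
print_wormcat_cmd() {
    csv_dir="${1%/}"                  # strip a trailing slash, if any
    name="$(basename "$csv_dir")"     # used to build a per-directory output path
    echo "wormcat_cli --input-csv-path $csv_dir --output-path ./wormcat_out_$name --annotation-file $2 --clean-temp False"
}

# ./another_experiment_csv is a placeholder; replace it with your own directories.
for d in ./Murphy_TS_csv ./another_experiment_csv; do
    print_wormcat_cmd "$d" whole_genome_v2_nov-11-2021.csv
done
```

Once the printed commands look right, they can be copied into a `%%bash` cell and run one by one.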

In [None]:
%%bash
cd ~/wormcat_test
tree

#### Call WormCat R function directly

* Here, we are providing a single CSV file
* an output directory
* and we are pointing to an Annotation file on the Web!

In [None]:
%%bash
conda install -q -y -n wormcat_env -c r rpy2

In [None]:
%load_ext rpy2.ipython
#Enable R Calls

In [None]:
%%R
library(wormcat)

file_to_process <- "~/wormcat_test/Murphy_TS_csv/hypodermis.csv"
title <- "hypodermis"
output_dir <- "~/wormcat_test/wormcat_out_single"
rm_dir <- FALSE
annotation_file <- "https://dphiggs01.github.io/Wormcat_data/data/whole_genome_v2_nov-11-2021.csv"
input_type <- "Wormbase.ID"
zip_files <- FALSE


# Call the WormCat function
cat("Calling wormcat\n")
worm_cat_fun(file_to_process,
             title,
             output_dir,
             rm_dir,
             annotation_file,
             input_type,
             zip_files)
