## Setup Notebook
This notebook configures your VM with the tools and data needed to run the transcriptome assembly training module.

We start by updating the system and installing Java, which is needed for Nextflow.

In [None]:
#First install java
!sudo apt update
!sudo apt-get install default-jdk -y
!java -version

Install Mambaforge, which is needed for the supporting information for the TransPi databases.

In [None]:
!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge
!~/mambaforge/bin/mamba install -c bioconda sra-tools perl-dbd-sqlite perl-dbi -y

Install the program "bc", a calculator used in the TransPi.nf script.

In [None]:
!sudo apt-get install -y bc

Create a variable that holds the base working directory (your current position) for later use in file paths.  

In [None]:
import subprocess
workdir=subprocess.check_output("pwd").decode("utf-8").rstrip()
workdir
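
A subprocess call is not strictly required for this step. As a minimal alternative sketch, the standard library's `os.getcwd()` should return the same absolute path without launching an external program. The name `workdir_alt` is hypothetical and is used here only so that the `workdir` variable defined above is not overwritten.

```python
import os

# Hypothetical alternative name; the notebook itself uses `workdir`.
workdir_alt = os.getcwd()

# Print the absolute path of the current working directory.
print(workdir_alt)
```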

Also create a variable to hold the address of the Google Cloud Storage bucket that holds the materials for download.

In [None]:
gbucket="gs://nigms-sandbox/nosi-inbremaine-storage"
gbucket

Clone the TransPi repository from GitHub.

In [None]:
!git clone https://github.com/PalMuc/TransPi.git

We have made some changes to TransPi.nf, so download our version from the GS bucket. The original file is kept under the name old.TransPi.nf.

In [None]:
!mv TransPi/TransPi.nf TransPi/old.TransPi.nf
!gsutil cp $gbucket/TransPi/TransPi.nf ./
%mv TransPi.nf TransPi/TransPi.nf

Retrieve the uniprot_sprot fasta file for use as the "custom database" for annotation purposes. We install it in its own directory in order to enable automation of the precheck_TransPi.sh script.

In [None]:
!gsutil cp -r $gbucket/DBs/uniprot_db/ ./

Create the input string for the precheck script. Its output is used below to check the hard-coded answers passed to that script.

In [None]:
input_string="2\ny\n2\n6\n1\n2\n" + workdir + "/uniprot_db/\n1\ny\n"
input_string
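
The comparison between this string and the one hard-coded in the precheck cell can also be done programmatically rather than by eye. The following is a minimal sketch under stated assumptions: `build_answers`, `HARDCODED`, and `matches_hardcoded` are hypothetical names introduced for illustration, and `HARDCODED` is copied from the precheck command in this notebook.

```python
def build_answers(workdir):
    """Build the answer string the same way the cell above does."""
    return "2\ny\n2\n6\n1\n2\n" + workdir + "/uniprot_db/\n1\ny\n"


# Copy of the string hard-coded in the precheck command of this notebook.
HARDCODED = (
    "2\ny\n2\n6\n1\n2\n"
    "/home/jupyter/nosi-mdibl-inbrecloud/uniprot_db/\n1\ny\n"
)


def matches_hardcoded(workdir):
    """Return True if the answers built from `workdir` equal HARDCODED."""
    return build_answers(workdir) == HARDCODED


# Example with the default directory the notebook expects.
print(matches_hardcoded("/home/jupyter/nosi-mdibl-inbrecloud"))
```

If the check returns `False` for your own `workdir`, the hard-coded string in the precheck command needs to be edited.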

Run the TransPi setup script "precheck_TransPi.sh". This is normally an interactive script, but here we supply its answers through the string shown below.

***VERY IMPORTANT*** There is a hard-coded directory in the middle of the string that points to */home/jupyter/nosi-mdibl-inbrecloud/uniprot_db/*  This is the default that we are expecting assuming that you are working within the cloned directory.  

The output of the previous code cell should match the string in the command below exactly. If it does not, edit the string in the command below so that it matches that output.

In [None]:
%cd $workdir/TransPi
!bash ./precheck_TransPi.sh ./ <<< $'2\ny\n2\n6\n1\n2\n/home/jupyter/nosi-mdibl-inbrecloud/uniprot_db/\n1\ny\n'
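
One way to avoid the hard-coded path is to pass the answers to the script's standard input from Python. This is a sketch, not the method used by this notebook: `run_with_answers` is a hypothetical helper, and the demonstration below runs a harmless stand-in command rather than the precheck script itself.

```python
import subprocess
import sys


def run_with_answers(command, answers, cwd=None):
    """Run `command` and feed `answers` to its standard input.

    Output is captured and returned rather than streamed live.
    """
    return subprocess.run(
        command,
        input=answers,
        text=True,
        cwd=cwd,
        capture_output=True,
    )


# Stand-in command: a child Python process that echoes the lines it receives.
demo = run_with_answers(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().splitlines())"],
    "2\ny\n",
)
print(demo.stdout)
```

In this notebook the call might look like `run_with_answers(["bash", "./precheck_TransPi.sh", "./"], input_string, cwd=workdir + "/TransPi")`. That usage is an untested assumption, and because the output is captured, progress messages would only be visible after the script finishes.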

***NOTE*** Sometimes the web servers that the script above uses are offline, which causes the process to hang and leaves the installation incomplete. If that happens to you, halt the process (use the stop button at the top). Unfortunately, the script doesn't fail in a manner that allows it to be easily restarted, so you should create a new code cell below this markdown cell and then execute the following commands in sequence.
>1. %cd \$workdir/TransPi/
>2. !gsutil -m cp -r \$gbucket/TransPi_DB_info/scripts/ ./
>3. !chmod --recursive a+x scripts/evigene
>4. !gsutil -m cp -r \$gbucket/TransPi_DB_info/DBs/ ./
>5. !cd ..
>6. !gsutil -m cp -r \$gbucket/TransPi_DB_info/nextflow ./
>7. !chmod a+x nextflow

***Also NOTE*** because of the use of markdown and a local environment variable, the raw markdown code in this cell includes a leading backslash (\\) that makes the markdown appear correctly, but that will not work if it is included in the command line argument.  So for example, the first line above should be executed as <pre>%cd $workdir/TransPi/</pre> rather than <pre>%cd \\$workdir/TransPi/</pre>

Put an executable copy of the Nextflow program into the executable path at /usr/local/bin/.

In [None]:
!sudo cp $workdir/TransPi/nextflow /usr/local/bin/nextflow
!sudo chmod a+rx /usr/local/bin/nextflow
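
To confirm that the copy worked, the executable can be looked up on the search path. The sketch below uses `shutil.which` from the standard library; `find_tool` is a hypothetical helper name.

```python
import shutil


def find_tool(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Report where nextflow was found, if anywhere.
path = find_tool("nextflow")
print(path if path else "nextflow not found on PATH")
```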

Explicitly copy the uniprot_sprot.fasta file into place, as the precheck script sometimes fails to do so.

In [None]:
!cp $workdir/uniprot_db/uniprot_sprot.fasta $workdir/TransPi/DBs/uniprot_db/
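
A quick sanity check on the copied file is to count its FASTA header lines, which begin with `>`. This is a minimal sketch: `count_fasta_records` is a hypothetical helper, and the path is built on the assumption that the current directory is the base working directory.

```python
import os


def count_fasta_records(path):
    """Count the lines starting with '>' (one per FASTA record)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Assumption: the current directory is the base working directory.
fasta = os.path.join(
    os.getcwd(), "TransPi", "DBs", "uniprot_db", "uniprot_sprot.fasta"
)

if os.path.exists(fasta):
    print(count_fasta_records(fasta), "records in", fasta)
else:
    print("not found:", fasta)
```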

Make a directory in which to carry out the work, and copy the test sequences into it.

In [None]:
!mkdir -p $workdir/transpi_example
%cd $workdir/transpi_example
#copy the test sequences from the GS bucket (the -m argument allows for parallel transfer)
!gsutil -m cp -r $gbucket/seq2 ./