# BRAKER & TSEBRA Workshop

All software required for this workshop is available via the RESPONSE image in the AppHub. This notebook serves merely as documentation in case you ever have to install the required software in a different unix system without root permissions. Note that the RESPONSE image also contains some software for genome assembly that you might not need for genome annotation.

## Software installation

This installation builds on top of the DataScience+TF image at University of Greifswald.

I first installed my own conda, in terminal, interactive session:

```
echo "Installing my own conda because I can't install software with the pre-installed conda..."
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh &> /dev/null
bash Anaconda3-2022.05-Linux-x86_64.sh
```

In [1]:
%%script bash
# ~10 seconds
echo "Installing zlib (depedency of flye assembler)..."
source ${HOME}/.bashrc
wget https://zlib.net/zlib-1.2.12.tar.gz &> /dev/null
tar -zxf zlib-1.2.12.tar.gz &> /dev/null
rm zlib-1.2.12.tar.gz
cd zlib-1.2.12
./configure --prefix=${HOME}/RESPONSE2022/zlib-1.2.12/build &> /dev/null
make &> /dev/null
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/zlib-1.2.12/build/lib" >> ${HOME}/.bashrc
echo "Done."

Installing zlib (depedency of flye assembler)...
Done.


In [2]:
%%script bash
# ~2:11 minues
echo "Installing flye (long read assembly tool)..."
source ${HOME}/.bashrc
git clone https://github.com/fenderglass/Flye &> /dev/null
cd Flye
make &> /dev/null
export PATH=${PATH}:${HOME}/RESPONSE2022/Flye/bin
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/Flye/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which flye

Installing flye (long read assembly tool)...
/home/jovyan/RESPONSE/Flye/bin/flye


### Installing busco (for assessing assemblies and predicted genes)... - I did this in interactive mode in a terminal

```
conda create -n busco_env -c conda-forge -c bioconda busco=5.4.2
# To activate this environment, use
#
#     $ conda activate busco_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate
```
confirmed that busco works fine if kept in a separate channel, otherwise interference with BRAKER/augustus

In [3]:
%%script bash
echo "Installing blobtools (for assessing assemblies)..."
source ${HOME}/.bashrc
conda install -y -c bioconda blobtools &> /dev/null
conda activate
which blobtools

Installing blobtools (for assessing assemblies)...
/home/jovyan/anaconda3/bin/blobtools


In [5]:
%%script bash
echo "Installing seqstats (for assessing assemblies)..."
git clone --recursive https://github.com/clwgg/seqstats &> /dev/null
cd seqstats
make &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/seqstats" >> ${HOME}/.bashrc
source ~/.bashrc
which seqstats

Installing seqstats (for assessing assemblies)...
/home/jovyan/RESPONSE/seqstats/seqstats


In [6]:
%%script bash
# ~12 seconds
echo "Installing minimap2 (long read mapping tool)..."
source ${HOME}/.bashrc
git clone https://github.com/lh3/minimap2 &> /dev/null
cd minimap2
make &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/minimap2" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which minimap2

Installing minimap2 (long read mapping tool)...
/home/jovyan/RESPONSE/minimap2/minimap2


In [7]:
%%script bash
# 8 seconds
echo "Installing cdbfasta (fast entry access tools)..."
source ${HOME}/.bashrc
git clone https://github.com/gpertea/cdbfasta.git &> /dev/null
cd cdbfasta
make &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/cdbfasta" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which cdbfasta
which cdbyank

Installing cdbfasta (fast entry access tools)...
/home/jovyan/RESPONSE/cdbfasta/cdbfasta
/home/jovyan/RESPONSE/cdbfasta/cdbyank


In [8]:
%%script bash
echo "Installing hisat2 (for RNA-Seq alignment)..."
source ${HOME}/.bashrc
# takes a long time to compile, I took a walk
git clone https://github.com/DaehwanKimLab/hisat2.git &> /dev/null
make &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/hisat2" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which hisat2
which hisat2-build

Installing hisat2 (for RNA-Seq alignment)...
/home/jovyan/RESPONSE/hisat2/hisat2
/home/jovyan/RESPONSE/hisat2/hisat2-build


In [9]:
%%script bash
# ~1:43 minutes
echo "Installing m4 (dependency of autoconf)..."
source ${HOME}/.bashrc
wget http://ftp.gnu.org/gnu/m4/m4-1.4.19.tar.gz &> /dev/null
tar -xf m4-1.4.19.tar.gz
rm m4-1.4.19.tar.gz
cd m4-1.4.19
./configure --prefix=${HOME}/RESPONSE2022/m4-1.4.19/ &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/m4-1.4.19/bin" >> ${HOME}/.bashrc
echo "export M4=/home/jovyan/RESPONSE2022/m4-1.4.19/bin/m4" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
echo "Done."

Installing m4 (dependency of autoconf)...
Done.


In [10]:
%%script bash
# ~18 seconds
echo "Installing autoconf (dependency of htslib)..."
source ${HOME}/.bashrc
wget https://alpha.gnu.org/pub/gnu/autoconf/autoconf-2.69e.tar.gz &> /dev/null
tar -xf autoconf-2.69e.tar.gz
rm autoconf-2.69e.tar.gz
cd autoconf-2.69e
./configure --prefix=${HOME}/RESPONSE2022/autoconf-2.69e/bin &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/autoconf-2.69e/bin/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which autoconf

Installing autoconf (dependency of htslib)...
/home/jovyan/RESPONSE/autoconf-2.69e/bin/bin/autoconf


In [11]:
%%script bash
# ~12 seconds
echo "Installing automake (dependency of htslib)..."
source ${HOME}/.bashrc
wget https://ftp.gnu.org/gnu/automake/automake-1.16.5.tar.gz &> /dev/null
tar -xf automake-1.16.5.tar.gz
rm automake-1.16.5.tar.gz
cd automake-1.16.5
./configure --prefix=${HOME}/RESPONSE2022/automake-1.16.5/bin/bin &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/automake-1.16.5/bin/bin/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which automake

Installing automake (dependency of htslib)...
/home/jovyan/RESPONSE/automake-1.16.5/bin/bin/bin/automake


In [12]:
%%script bash
echo "Installing meson (dependency of htslib)..."
source ${HOME}/.bashrc
pip3 install --user meson &> /dev/null
echo "export PATH=\$PATH:${HOME}/.local/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which meson

Installing meson (dependency of htslib)...
/home/jovyan/.local/bin/meson


In [13]:
%%script bash
# 37 seconds
echo "Installing ninja (dependency of htslib)..."
source ${HOME}/.bashrc
git clone https://github.com/ninja-build/ninja.git &> /dev/null
cd ninja
./configure.py --bootstrap &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/ninja" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which ninja

Installing ninja (dependency of htslib)...
/home/jovyan/RESPONSE/ninja/ninja


In [14]:
%%script bash
# ~6 seconds
echo "Installing bzip2 (dependency of htslib)..."
source ${HOME}/.bashrc
git clone https://github.com/libarchive/bzip2.git &> /dev/null
cd bzip2
meson --prefix ${HOME}/RESPONSE2022/bzip2/bin ${HOME}/RESPONSE2022/bzip2/builddir  &> /dev/null
ninja -C ${HOME}/RESPONSE2022/bzip2/builddir  &> /dev/null
ninja -C ${HOME}/RESPONSE2022/bzip2/builddir install  &> /dev/null
echo "export PATH=${HOME}/RESPONSE2022/bzip2/bin/bin:\$PATH" >> ${HOME}/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/bzip2/bin/lib/x86_64-linux-gnu" >> ${HOME}/.bashrc
echo "export LIBRARY_PATH=\$LIBRARY_PATH:${HOME}/RESPONSE2022/bzip2/bin/lib/x86_64-linux-gnu" >> ${HOME}/.bashrc
echo "export CPATH=\$CPATH:${HOME}/RESPONSE2022/bzip2/bin/include" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which bzip2

Installing bzip2 (dependency of htslib)...
/home/jovyan/RESPONSE/bzip2/bin/bin/bzip2


In [15]:
%%script bash
# 1:30 minutes
echo "Installing htslib (dependency of samtools)..."
source ${HOME}/.bashrc
git clone https://github.com/samtools/htslib.git &> /dev/null
cd htslib
git submodule update --init --recursive &> /dev/null
autoreconf -i
./configure --prefix=${HOME}/RESPONSE2022/htslib/bin &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/htslib/bin/lib" >> ${HOME}/.bashrc
echo "export LIBRARY_PATH=\$LIBRARY_PATH:${HOME}/RESPONSE2022/htslib/bin/lib" >> ${HOME}/.bashrc
echo "export CPATH=\$CPATH:${HOME}/RESPONSE2022/htslib/bin/include/htslib" >> ${HOME}/.bashrc
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/htslib/bin/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which tabix

Installing htslib (dependency of samtools)...
/home/jovyan/anaconda3/bin/tabix


In [16]:
%%script bash
# 2:15 minutes
echo "Installing ncurses (dependency of samtools)..."
source ${HOME}/.bashrc
wget https://ftp.gnu.org/pub/gnu/ncurses/ncurses-6.3.tar.gz &> /dev/null
tar -xf ncurses-6.3.tar.gz
rm ncurses-6.3.tar.gz
cd ncurses-6.3
./configure --prefix=${HOME}/RESPONSE2022/ncurses-6.3/bin &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=\$PATH::${HOME}/RESPONSE2022/ncurses-6.3/bin/bin" >> ${HOME}/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/ncurses-6.3/bin/lib" >> ${HOME}/.bashrc
echo "export LIBRARY_PATH=\$LIBRARY_PATH:${HOME}/RESPONSE20222/ncurses-6.3/bin/lib" >> ${HOME}/.bashrc
echo "export CPATH=\$CPATH:${HOME}/RESPONSE2022/ncurses-6.3/bin/include/" >> ${HOME}/.bashrc
echo "Done."

Installing ncurses (dependency of samtools)...
Done.


In [17]:
%%script bash
# ~40 seconds
echo "Installing samtools (dependency of BRAKER, also required for genome assembly)..."
source ${HOME}/.bashrc
git clone https://github.com/samtools/samtools.git &> /dev/null
cd samtools
autoreconf -i &> /dev/null
./configure --prefix=${HOME}/RESPONSE2022/samtools/bin &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/samtools/bin/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which samtools

Installing samtools (dependency of BRAKER, also required for genome assembly)...
/home/jovyan/anaconda3/bin/samtools


In [18]:
%%script bash
# ~1 second
echo "Installing DIAMOND (dependency of BRAKER, also required for genome assembly)..."
source ${HOME}/.bashrc
mkdir diamond
cd diamond
wget http://github.com/bbuchfink/diamond/releases/download/v2.0.15/diamond-linux64.tar.gz &> /dev/null
tar -xf diamond-linux64.tar.gz
rm diamond-linux64.tar.gz
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/diamond" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which diamond

Installing DIAMOND (dependency of BRAKER, also required for genome assembly)...
/home/jovyan/RESPONSE/diamond/diamond


In [19]:
%%script bash
echo "Installing Boost (dependency of AUGUSTUS, we need this exact version for the precompiled binaries, shared objects...)"
source ${HOME}/.bashrc
wget https://boostorg.jfrog.io/artifactory/main/release/1.75.0/source/boost_1_75_0.tar.gz &> /dev/null
tar -xf boost_1_75_0.tar.gz
rm boost_1_75_0.tar.gz
cd boost_1_75_0
./bootstrap.sh --prefix=${HOME}/RESPONSE2022/boost_1_75_0/bin &> /dev/null
./b2 headers &> /dev/null
./b2 &> /dev/null
./b2 --with-serialization
echo "export BOOST_INCLUDEDIR=${HOME}/RESPONSE2022/boost_1_75_0/boost" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/boost_1_75_0/stage/lib" >> ${HOME}/.bashrc
echo "Done."

Installing Boost (dependency of AUGUSTUS, we need this exact version for the precompiled binaries, shared objects...)
Done.


In [20]:
%%script bash
echo "Installing bamtools (dependency of bam2hints...)"
source ${HOME}/.bashrc
git clone https://github.com/pezmaster31/bamtools.git &> /dev/null
mkdir build
cmake &> /dev/null
make &> /dev/null
make install &> /dev/null
echo "export PATH=${PATH}:\$HOME/RESPONSE2022/bamtools/build/bin" >> ${HOME}/.bashrc
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:${HOME}/RESPONSE2022/bamtools/build/lib" >> ${HOME}/.bashrc
echo "export LIBRARY_PATH=\$LIBRARY_PATH:${HOME}/RESPONSE2022/bamtools/build/lib" >> ${HOME}/.bashrc
echo "export CPATH=\$CPATH:${HOME}/RESPONSE2022/bamtools/build/include/" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which bamtools

Installing bamtools (dependency of bam2hints...)
/home/jovyan/RESPONSE/bamtools/build/bin/bamtools


In [21]:
%%script bash
echo "Installing AUGUSTUS (dependency of BRAKER, this is a mess)..."
source ${HOME}/.bashrc
git clone https://github.com/Gaius-Augustus/Augustus.git &> /dev/null # yes, we need this despite binary download
cd Augustus/bin
echo "Downloading pre-compiled augustus binaries (because compiling is too painful)..."
wget https://bioinf.uni-greifswald.de/bioinf/katharina/RESPONSE/augustus &> /dev/null
wget https://bioinf.uni-greifswald.de/bioinf/katharina/RESPONSE/etraining &> /dev/null
wget https://bioinf.uni-greifswald.de/bioinf/katharina/RESPONSE/joingenes &> /dev/null
chmod u+x *
cd ..
echo "Compiling bam2hints (because pre-compiled binaries do not work)..."
# modifying common.mk for auxprogs compilation
cat common.mk > common.tmp
echo "INCLUDE_PATH_ZLIB := -I/home/jovyan/RESPONSE2022/zlib-1.2.12/build/include" >> common.tmp
echo "LIBRARY_PATH_ZLIB := -L/home/jovyan/RESPONSE2022/zlib-1.2.12/build/lib -Wl,-rpath,/home/jovyan/RESPONSE2022/zlib-1.2.12/build/lib" >> common.tmp
echo "INCLUDE_PATH_BAMTOOLS := -I/home/jovyan/RESPONSE2022/bamtools/build/include/bamtools" >> common.tmp
echo "LIBRARY_PATH_BAMTOOLS := -L/home/jovyan/RESPONSE2022/bamtools/build/lib -Wl,-rpath,/home/jovyan/RESPONSE/bamtools/build/lib" >> common.tmp
mv common.tmp common.mk
make auxprogs &> /dev/null # don't care if things after bam2hints fail to compile, only bam2hints matters
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/Augustus/bin:${HOME}/RESPONSE2022/Augustus/scripts" >> ${HOME}/.bashrc
echo "export AUGUSTUS_CONFIG_PATH=${HOME}/RESPONSE2022/Augustus/config" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which augustus
which etraining
which bam2hints
which joingenes
which optimize_augustus.pl

Installing AUGUSTUS (dependency of BRAKER, this is a mess)...
Downloading pre-compiled augustus binaries (because compiling is too painful)...
Compiling bam2hints (because pre-compiled binaries do not work)...
./augustus
./etraining
./bam2hints
./joingenes
/home/jovyan/RESPONSE/Augustus/scripts/optimize_augustus.pl


In [22]:
%%script bash
echo "Installing NCBI-BLAST (dependency of BRAKER, we could probably do with the system installed blast that is in DataScience Notebook, didn't try that though)..."
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.13.0+-x64-linux.tar.gz &> /dev/null
tar -zxf ncbi-blast-2.13.0+-x64-linux.tar.gz &> /dev/null
rm ncbi-blast-2.13.0+-x64-linux.tar.gz
echo "export PATH=${HOME}/RESPONSE2022/ncbi-blast-2.13.0+/bin:\$PATH" >> ${HOME}/.bashrc
which blastp # representative for the other binaries

Installing NCBI-BLAST (dependency of BRAKER, we could probably do with the system installed blast that is in DataScience Notebook, didn't try that though)...
/usr/bin/blastp


In [62]:
%%script bash
echo "Installing Perl (dependencies of BRAKER)..."
source ${HOME}/.bashrc
conda install -y -c anaconda perl
conda install -y -c bioconda perl-app-cpanminus
conda install -y -c bioconda perl-hash-merge
conda install -y -c bioconda perl-parallel-forkmanager
conda install -y -c bioconda perl-scalar-util-numeric
conda install -y -c bioconda perl-yaml
conda install -y -c bioconda perl-class-data-inheritable
conda install -y -c bioconda perl-exception-class
conda install -y -c bioconda perl-test-pod
conda install -y -c anaconda biopython
conda install -y -c bioconda perl-file-which
conda install -y -c bioconda perl-mce
conda install -y -c bioconda perl-threaded
conda install -y -c bioconda perl-list-util
conda install -y -c bioconda perl-math-utils
echo "Done."

Installing Perl (dependencies of BRAKER)...
Done.


## Install GeneMark & ProtHint

This cannot be compiled into an image because it has a license that every individual has to sign, and the license expires after 200 days. Every student will have to install their own GeneMark in the following way:

   * Go to http://exon.gatech.edu/GeneMark/license_download.cgi
   * Select GeneMark-ES/ET/EP ver 4.69_lic LINUX 64 kernel 2.6 - 3
   * Fill in Name, Institution = "University of Greifswald", Country
   * Click on agree to the license agreement

On the next website, click on "here" to download **gmes_linux_64.tar.gz**, and click on "64_bit" to download a file **gm_key_64.gz**.

In the AppHub, upload both files into your home directory.

### Not for image

This code is not for building an image, it's for students, later. I only have it here to test whether everything is running or whether dependencies are still missing.

```
cd ${HOME}
source .bashrc
gunzip gmes_linux_64.tar.gz
tar -xf gmes_linux_64.tar
export PATH=${PATH}:${HOME}/gmes_linux_64
echo "export PATH=$PATH" >> ${HOME}/.bashrc
cd gmes_linux_64
change_path_in_perl_scripts.pl "/home/jovyan/anaconda3/bin/perl" &> /dev/null
gunzip gm_key_64.gz
mv gm_key_64 ${HOME}/.gm_key
which gmes_petap.pl
cd ${HOME}
git clone https://github.com/gatech-genemark/ProtHint.git &> /dev/null
cp -r ${HOME}/gmes_linux_64_4/* ProtHint/dependencies/GeneMarkES
export PATH=${PATH}:${HOME}/ProtHint/bin
echo "export PATH=$PATH" >> ${HOME}/.bashrc
which prothint.py # representative for the other scripts
```

In [23]:
%%script bash
echo "Installing BRAKER..."
source ${HOME}/.bashrc
git clone https://github.com/Gaius-Augustus/BRAKER.git &> /dev/null
cd BRAKER/example
wget http://bioinf.uni-greifswald.de/augustus/datasets/RNAseq.bam &> /dev/null # file needed for running tests
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/BRAKER/scripts" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which braker.pl
# if genemark & prothint are present:
# ${HOME}/RESPONSE2022/BRAKER/example/tests/test1.sh # should run a while... adding --skipOptimize reduces runtime
# ${HOME}/RESPONSE2022/BRAKER/example/tests/test2.sh # should run a while... adding --skipOptimize reduces runtime
# both tests worked fine with all the dependencies from my environment

Installing BRAKER...
/home/jovyan/RESPONSE/BRAKER/scripts/braker.pl


In [24]:
%%script bash
echo "Installing TSEBRA..."
source ${HOME}/.bashrc
git clone https://github.com/Gaius-Augustus/TSEBRA &> /dev/null
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/TSEBRA/bin" >> ${HOME}/.bashrc
source ${HOME}/.bashrc
which tsebra.py

Installing TSEBRA...
/home/jovyan/RESPONSE/TSEBRA/bin/tsebra.py


In [26]:
%%script bash 
echo "Installing MakeHub..."
source ${HOME}/.bashrc
git clone https://github.com/Gaius-Augustus/MakeHub.git &> /dev/null
cd MakeHub
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/bedToBigBed &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/genePredCheck &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/faToTwoBit &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/gtfToGenePred &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/hgGcPercent &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/ixIxx &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/twoBitInfo &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/wigToBigWig &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/genePredToBed &> /dev/null
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64.v369/genePredToBigGenePred &> /dev/null
chmod u+x bedToBigBed genePredCheck faToTwoBit gtfToGenePred hgGcPercent ixIxx  twoBitInfo wigToBigWig genePredToBed genePredToBigGenePred make_hub.py
echo "export PATH=\$PATH:${HOME}/RESPONSE2022/MakeHub" >> ${HOME}/.bashrc
which make_hub.py

Installing MakeHub...
./make_hub.py


## Data preparation Blobtools

Do **not** attempt to do this in a user instance of the AppHub! The data is way larger than your maximum home directory size. The database has been prepared at alphafold_data for you.

#### Download NCBI taxdump and create nodesdb

In [None]:
%%script bash
source ${HOME}/.bashrc
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz -P taxonomy/ &> /dev/null
tar zxf taxonomy/taxdump.tar.gz -C blobtools_db/taxonomy/ nodes.dmp names.dmp
blobtools nodesdb --nodes blobtools_db/taxonomy/nodes.dmp --names blobtools_db/taxonomy/names.dmp

#### Download and extract UniProt reference proteomes

In [None]:
%%script bash
source ${HOME}/.bashrc
UNIPROT=./blobtools_db/uniprot_2022_07_27
TAXDUMP=./blobtools_db/taxonomy
mkdir -p $UNIPROT
wget -q -O $UNIPROT/reference_proteomes.tar.gz \
 ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/$(curl \
     -vs ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/ 2>&1 | \
     awk '/tar.gz/ {print $9}')
cd $UNIPROT
tar xf reference_proteomes.tar.gz

touch reference_proteomes.fasta.gz
find . -mindepth 2 | grep "fasta.gz" | grep -v 'DNA' | grep -v 'additional' | xargs cat >> reference_proteomes.fasta.gz

printf "accession\taccession.version\ttaxid\tgi\n" > reference_proteomes.taxid_map
zcat */*/*.idmapping.gz | grep "NCBI_TaxID" | awk '{print $1 "\t" $1 "\t" $3 "\t" 0}' >> reference_proteomes.taxid_map

diamond makedb -p 16 --in ${UNIPROT}/reference_proteomes.fasta.gz --taxonmap ${UNIPROT}/reference_proteomes.taxid_map --taxonnodes $TAXDUMP/nodes.dmp --taxonnames $TAXDUMP/names.dmp -d ${UNIPROT}/reference_proteomes.dmnd
cd -

## Data preparation BRAKER2

Do **not** attempt to do this in a user instance of the AppHub! The data is way larger than your maximum home directory size. The database has been prepared at alphafold_data for you.