Structural biology - Practical day 01
=======================================

Part 01
-------

Document written by [Adrián Diaz](mailto:adrian.diaz@vub.be) & [David Bickel](mailto:david.bickel@vub.be)

**Vrije Universiteit Brussel**

## The scenario

Three proteins are involved in this practical:

- Gyrase (dimer) `Gyr:Gyr`
- Toxin (dimer) `CcdB:CcdB`
- Anti-toxin `CcdA`

### Sequences

The following code block contains the amino-acid sequences of the toxin (`CcdB`) and the anti-toxin (`CcdA`) in FASTA format. You will use them to build the input for the prediction job.

```fasta
>CcdA
MKQRITVTVDSDSYQLLKAYDVNISGLVSTTMQNEARRLRAERWKAENQEGMAEVARFIEMNGSFADENRDW

>CcdB
MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI
```

### Task A
Predict the following complexes using ColabFold:

- Group A: `CcdB:CcdB:CcdA`
- Group B: `CcdB:CcdB`
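
Each complex is described to the predictor as a FASTA file with one record per chain, so a homodimer simply repeats the same sequence twice. The sketch below shows one way to expand a specification such as `CcdB:CcdB:CcdA` into such a file. The helper name `build_multimer_fasta` and the `_1`, `_2` header suffixes are illustrative choices, not something the prediction tools require.

```python
# Sequences copied from the FASTA block in the "Sequences" section.
SEQUENCES = {
    "CcdA": "MKQRITVTVDSDSYQLLKAYDVNISGLVSTTMQNEARRLRAERWKAENQEGMAEVARFIEMNGSFADENRDW",
    "CcdB": "MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI",
}


def build_multimer_fasta(complex_spec, sequences=SEQUENCES):
    """Expand a spec like 'CcdB:CcdB:CcdA' into a multi-record FASTA string.

    Hypothetical helper: every chain in the spec becomes its own record,
    numbered per chain type (CcdB_1, CcdB_2, CcdA_1, ...).
    """
    counts = {}
    records = []
    for chain in complex_spec.split(":"):
        if chain not in sequences:
            raise KeyError(f"Unknown chain: {chain}")
        counts[chain] = counts.get(chain, 0) + 1
        records.append(f">{chain}_{counts[chain]}\n{sequences[chain]}\n")
    return "\n".join(records)


group_a = build_multimer_fasta("CcdB:CcdB:CcdA")
group_b = build_multimer_fasta("CcdB:CcdB")
print(group_a)
```

The resulting strings can be used in place of a hand-written `prediction_input`, which avoids pasting the wrong sequence under a header.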


## AlphaFold2 Job
AlphaFold is installed for the Nvidia Ampere GPUs in Hydra. Submit your jobs with the options `--gpus-per-node=1 --partition=ampere_gpu` to specifically request one of those GPUs.

```bash
python3 docker/run_docker.py \
  --fasta_paths=sequences.fasta \
  --max_template_date=2021-11-01 \
  --model_preset=multimer \
  --data_dir=$DOWNLOAD_DIR \
  --output_dir=/home/user/absolute_path_to_the_output_dir
```

#### Group A

`CcdB:CcdB:CcdA`

In [None]:
job_name         = "structbio_multimer"
input_name       = "multimer.fasta"
output_path      = "multimer"

prediction_input = """
>CcdB_1
MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI

>CcdB_2
MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI

>CcdA_1
MKQRITVTVDSDSYQLLKAYDVNISGLVSTTMQNEARRLRAERWKAENQEGMAEVARFIEMNGSFADENRDW

"""

#### Group B

`CcdB:CcdB`

In [None]:
job_name         = "structbio_multimer"
input_name       = "multimer.fasta"
output_path      = "multimer"

prediction_input = """
>CcdB_1
MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI

>CcdB_2
MQFKVYTYKRESRYRLFVDVQSDIIDTPGRRMVIPLASARLLSDKVSRELYPVVHIGDESWRMMTTDMASVPVSVIGEEVADLSHRENDIKNAINLMFWGI

"""

#### Creating the files

In [None]:
%%bash
# Create input directory
mkdir -p ./input

# Create output directory
mkdir -p ./output

In [None]:
import os
input_path = os.path.abspath(os.path.join('./input', input_name))

print("Saving input file in", input_path)

with open(input_path, "w") as file_handler:
    file_handler.write(prediction_input)

### About Slurm jobs in Hydra

Slurm provides a set of commands to manage and monitor your jobs. The two you will need here are:

- submitting job scripts to the queue (`sbatch`)
- printing information about the queue (`mysqueue`)

Jobs are described in Bash scripts whose header lists `#SBATCH` options such as:

- `--job-name=job_name` : Set job name to job_name
- `--time=DD-HH:MM:SS`: Define the time limit
- `--mail-type=BEGIN|END|FAIL|REQUEUE|ALL`: Conditions for sending alerts by email
- `--partition=partition_name`: Request a partition (a group of nodes), e.g. `ampere_gpu`


More information: https://hpc.vub.be/docs/job-submission/
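
To make the link between these options and the job file concrete, here is a minimal sketch that renders a dictionary of options as `#SBATCH` header lines. The function name `sbatch_header` is hypothetical and the option values are only examples taken from this practical.

```python
def sbatch_header(options):
    """Render Slurm options as the header of a Bash job script.

    Hypothetical helper: each key/value pair becomes one
    '#SBATCH --key=value' line below the shebang.
    """
    lines = ["#!/bin/bash"]  # the shebang has to be the first line
    for name, value in options.items():
        lines.append(f"#SBATCH --{name}={value}")
    return "\n".join(lines) + "\n"


header = sbatch_header({
    "job-name": "structbio_multimer",
    "time": "02:00:00",
    "partition": "ampere_gpu",
    "gpus-per-node": 1,
})
print(header)
```

Slurm reads these lines as if they had been passed to `sbatch` on the command line, while Bash treats them as ordinary comments.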

In [None]:
# The shebang must be the very first line of the script, otherwise sbatch rejects it.
prediction_job = """#!/bin/bash

# SCRIPT RESOURCE CONFIG:
#SBATCH --partition=ampere_gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-gpu=16
#SBATCH --job-name={job_name}
#SBATCH --time=02:00:00
#SBATCH --reservation=structbio1

# LOADING CUDA (GPU) DEPENDENCIES:
export CUDA_MPS_PIPE_DIRECTORY=$TMPDIR/nvidia-mps-pipe
export CUDA_MPS_LOG_DIRECTORY=$TMPDIR/nvidia-mps-log
nvidia-cuda-mps-control -d

# LOADING ALPHAFOLD IN MEMORY:
module load AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0

# SETUP INPUT/OUTPUT FILES
BASE_DIR={wrkdir}
mkdir -p $BASE_DIR/output/{output_path}

# LAUNCH ALPHAFOLD (multimer preset: the input FASTA holds several chains)
run_alphafold.py \\
    --model_preset=multimer \\
    --fasta_paths=$BASE_DIR/input/{input_name} \\
    --max_template_date=2999-12-31 \\
    --output_dir=$BASE_DIR/output/{output_path}
""".format(job_name=job_name, input_name=input_name, output_path=output_path, wrkdir=os.getcwd())

In [None]:
job_path = os.path.join("./input", job_name + ".sh")

print("Saving job file in", job_path)

with open(job_path, "w") as file_handler:
    file_handler.write(prediction_job)

print("Job file saved in", job_path)

## Job enqueueing

The job is submitted using the command `sbatch` followed by the name of our job file. You will receive a job identifier as output.

In [None]:
import subprocess

process = subprocess.Popen(['sbatch', job_path],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

stdout, stderr = process.communicate()
return_code    = process.poll()

print(f"stdout (exit code={return_code}):")
for line in stdout.decode().split("\n"):
    print(line)

print(f"stderr (exit code={return_code}):")
for line in stderr.decode().split("\n"):
    print(line)
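
On success, `sbatch` normally prints a line of the form `Submitted batch job <id>`. A small sketch for extracting that identifier from the captured output follows, so it can be reused when looking for the log file. The helper name `parse_job_id` and the example identifier are made up for illustration.

```python
import re


def parse_job_id(sbatch_stdout):
    """Return the job ID from sbatch's output as a string, or None if absent."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return match.group(1) if match else None


# Made-up example of what sbatch prints on a successful submission.
example_output = "Submitted batch job 1234567\n"
job_id = parse_job_id(example_output)
print(job_id)
```

In the notebook you would pass `stdout.decode()` from the cell above instead of `example_output`.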

### Query the queue status
The command to run is `mysqueue`. While the job is running, you can follow the Slurm log in the `slurm-JOBID.out` file, where `JOBID` is the identifier returned by `sbatch`:

- `cat slurm-JOBID.out`: print the full content of the file.
- `tail -f slurm-JOBID.out`: print the last lines of the file; with `-f`, the command keeps following the output as it grows.

In [None]:
import subprocess

process = subprocess.Popen(['mysqueue'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

stdout, stderr = process.communicate()
return_code    = process.poll()

print(f"stdout (exit code={return_code}):")
for line in stdout.decode().split("\n"):
    print(line)

print(f"stderr (exit code={return_code}):")
for line in stderr.decode().split("\n"):
    print(line)
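
If you prefer to stay inside the notebook, the last lines of the Slurm log can also be read with plain Python. This is a minimal sketch of a `tail`-like helper; the name `tail_log` is hypothetical, and `slurm-JOBID.out` is a placeholder that you need to replace with your actual log file name.

```python
from collections import deque
from pathlib import Path


def tail_log(path, n=10):
    """Return the last n lines of a text file, or an empty list if it is missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open() as handle:
        # A bounded deque keeps only the most recent n lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Placeholder file name: replace JOBID with the identifier of your job.
for line in tail_log("slurm-JOBID.out", n=5):
    print(line)
```

Unlike `tail -f`, this reads the file once, so rerun the cell to refresh the view.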

We are ready to continue working on the next Jupyter Notebook!