# OpenFold Local Notebook

This notebook lets you run inference on a target sequence with a local, Docker-based installation of OpenFold, and visualize the results with the same plots used in the OpenFold Colab Notebook.

This notebook utilizes the provided utility functions to execute OpenFold via Docker. It includes logic to handle results, allowing you to experiment with different parameters, reuse computed MSAs, filter the best model, and plot metrics. It also supports asynchronous and long-running executions.

If you have access to a machine and want to perform quick inference and visualize results, this notebook offers several useful features:

- Use precomputed alignments, enabling you to run inference with different model parameters for result comparison.
- Identify the best model and generate metric plots.
- Manage long-running executions.
- Work with large datasets by splitting your input and running the parts concurrently on multiple GPUs.

While you can achieve this entirely through Docker commands in the terminal, you would need to code or adjust the Colab functions to work with local data. This notebook gives you a head start.

## Set up the notebook

First, build OpenFold using Docker. Follow this [guide](https://openfold.readthedocs.io/en/latest/original_readme.html#building-and-using-the-docker-container).

Then, go to the notebook folder

`cd notebooks`

Create an environment to run Jupyter with the requirements

`mamba create -n openfold_notebook python==3.10`

Activate the environment

`mamba activate openfold_notebook`

Install the requirements

`pip install -r src/requirements.txt`

Start your Jupyter server in the current folder

`jupyter lab . --ip="0.0.0.0"`

Access the notebook URL or connect remotely using VSCode.


## Running Inference 

**Inputs:** files or strings with sequences

**Output:** 

```bash
data/ 
├── run_<date>_<run_id>/ # each run is stored under a random ID, which can be used to re-run inference
│   ├── fasta_dir/ 
│   │   ├── tmp/ # one generated .fasta file per sequence
│   │   └── sequences.fasta # validated input sequences, merged into a single .fasta file
│   └── output/
│       ├── alignments/ # one folder of MSA results per sequence
│       ├── msa_plots/ # one .png file per alignment
│       ├── predictions/ # inference results (.pkl and .pdb files)
│       ├── selected_predictions/ # the selected best prediction and its metric plots
│       └── timings.json # inference time
```
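
Because every run lives in a `run_<date>_<run_id>` folder, a run can be located from its ID alone. The following is a minimal sketch under that assumption. The helper names `find_run_dir` and `list_predictions` are hypothetical and not part of the notebook's `src` package, and the recursive search assumes nothing about how `predictions/` is nested.

```python
from pathlib import Path


def find_run_dir(data_dir, run_id):
    """Return the single run_<date>_<run_id> folder for run_id, or raise."""
    matches = sorted(Path(data_dir).glob(f"run_*_{run_id}"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one run folder for {run_id!r}, found {len(matches)}"
        )
    return matches[0]


def list_predictions(run_dir):
    """Return all .pdb files below output/predictions, sorted by path."""
    return sorted((Path(run_dir) / "output" / "predictions").rglob("*.pdb"))
```

For example, `list_predictions(find_run_dir("data", "<your_run_id>"))` would list the structures written by that run.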

#### Initialize the client

In [3]:
import docker
from src.inference import InferenceClientOpenFold

# Connect to the local Docker daemon
docker_client = docker.from_env()

# Alternatively, connect to a remote Docker daemon, e.g.:
# remote_docker_client = docker.DockerClient(base_url='tcp://<remote_server_ip>:2375')

# Initialize the OpenFold Docker client with the path to the databases

databases_dir = "/path/to/databases"

openfold_client = InferenceClientOpenFold(databases_dir, docker_client)

#### Inference using a sequence string

In [4]:
# For multiple sequences, separate sequences with a colon `:`
input_string = "DAGAQGAAIGSPGVLSGNVVQVPVHVPVNVCGNTVSVIGLLNPAFGNTCVNA:AGETGRTGVLVTSSATNDGDSGWGRFAG"

model_name = "multimer" # or "monomer"
weight_set = 'AlphaFold' # or 'OpenFold'

# Run inference
run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_string)

Run ID: 0KCTJ6
Fasta root directory: /home/juliocesar/Models/openfold/notebooks/data/run_27_08_24_23_06_24_0KCTJ6/fasta_dir
Sequences written to FASTA file: /home/juliocesar/Models/openfold/notebooks/data/run_27_08_24_23_06_24_0KCTJ6/fasta_dir/sequences.fasta
Saved sequence_E3XARJK5 to /home/juliocesar/Models/openfold/notebooks/data/run_27_08_24_23_06_24_0KCTJ6/fasta_dir/tmp/sequence_E3XARJK5.fasta
Saved sequence_OYILNP8A to /home/juliocesar/Models/openfold/notebooks/data/run_27_08_24_23_06_24_0KCTJ6/fasta_dir/tmp/sequence_OYILNP8A.fasta
Running Docker container...



== CUDA ==

CUDA Version 11.3.1

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
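
The colon-separated input format used above can be illustrated with a small sketch. It is not the client's actual validation logic, which is not shown here. It assumes only the 20 standard amino-acid letters are accepted, and it uses hypothetical `sequence_<n>` headers, whereas the client generates random IDs such as `sequence_E3XARJK5`.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def split_sequences(input_string):
    """Split a colon-separated string into validated, upper-case sequences."""
    sequences = [s.strip().upper() for s in input_string.split(":") if s.strip()]
    if not sequences:
        raise ValueError("no sequences found")
    for seq in sequences:
        invalid = sorted(set(seq) - VALID_RESIDUES)
        if invalid:
            raise ValueError(f"invalid residue(s) {invalid} in {seq!r}")
    return sequences


def to_fasta(sequences):
    """Render sequences as FASTA text with hypothetical sequence_<n> headers."""
    return "".join(f">sequence_{i}\n{seq}\n" for i, seq in enumerate(sequences, 1))
```

Checking an input string this way before calling `run_inference` surfaces typos early, before a container is started.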



#### Inference using a FASTA file

In [None]:
input_file = "/path/to/test.fasta"

run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file)

#### Inference using pre-computed alignments for a run_id

In [None]:
model_name = "monomer"
weight_set = 'OpenFold'

openfold_client.run_inference(weight_set, model_name, use_precomputed_alignments=True, run_id=run_id)
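
Before reusing alignments, it can help to confirm that the earlier run actually produced them. The sketch below relies on the output layout described earlier and on one assumption that is not confirmed by this notebook: that each folder under `output/alignments/` is named after the matching file stem in `fasta_dir/tmp/`. The helper name `missing_alignments` is hypothetical.

```python
from pathlib import Path


def missing_alignments(run_dir):
    """Return sequence IDs whose alignments folder is absent or empty."""
    run_dir = Path(run_dir)
    tmp_dir = run_dir / "fasta_dir" / "tmp"
    sequence_ids = sorted(p.stem for p in tmp_dir.glob("*.fasta"))
    alignments = run_dir / "output" / "alignments"
    missing = []
    for seq_id in sequence_ids:
        folder = alignments / seq_id
        # A folder counts as usable only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(seq_id)
    return missing
```

An empty result suggests the run is safe to reuse with `use_precomputed_alignments=True`.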

## Metrics and Visualizations 

#### Get the MSA Plots for one sequence in a run

In [None]:
from src.plot_msas import get_msa_plot

# Provide the fasta sequence id and the run_id
get_msa_plot("<your_run_id>", fasta_id="<sequence_id>")

In [None]:

# To get the plots for all sequence alignments in the run
get_msa_plot("<your_run_id>")

#### Get the best prediction by pLDDT and metrics

In [None]:
from src.metrics import get_metrics_and_visualizations, plot_plddt_legend

model_name = "multimer"
weight_set = 'AlphaFold'

plot_plddt_legend()
get_metrics_and_visualizations("<your_run_id>", weight_set, model_name, "<sequence_id>", relax_prediction=True)
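
To make the selection criterion concrete, here is a minimal sketch of ranking predictions by pLDDT. It assumes that "best" means the highest mean per-residue pLDDT and that the per-residue scores have already been loaded into plain lists. How `get_metrics_and_visualizations` ranks models internally is not shown here, and the function names are hypothetical. The confidence bands follow the commonly used AlphaFold legend.

```python
def best_by_plddt(plddt_by_model):
    """Return (model_name, mean_plddt) for the model with the highest mean pLDDT."""
    means = {name: sum(scores) / len(scores) for name, scores in plddt_by_model.items()}
    best = max(means, key=means.get)
    return best, means[best]


def plddt_band(score):
    """Map a pLDDT score (0-100) to its confidence band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"
```

For instance, calling `best_by_plddt({"model_1": [80.0, 90.0], "model_2": [95.0, 91.0]})` selects `model_2`, because its mean of 93.0 is higher than 85.0.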

## Concurrent/Async inference

If you have multiple GPUs, you can run several experiments concurrently, each on its own device:

In [None]:
from concurrent.futures import ProcessPoolExecutor, as_completed
import signal

def experiment_1():
    print("Experiment 1 is running")
    input_file = "/path/to/experiment_1.fasta"
    gpu = "cuda:0"
    model_name = "multimer"
    weight_set = 'AlphaFold'
    run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file, gpu=gpu)  
    return "Experiment 1 completed"

def experiment_2():
    print("Experiment 2 is running")
    input_file = "/path/to/experiment_2.fasta"
    gpu = "cuda:1"
    model_name = "monomer"
    weight_set = 'OpenFold'
    run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file, gpu=gpu)
    return "Experiment 2 completed"

experiments = [experiment_1, experiment_2]

# Function to handle keyboard interrupt
def signal_handler(sig, frame):
    print("Interrupt received, stopping...")
    raise KeyboardInterrupt

# Register the signal handler
signal.signal(signal.SIGINT, signal_handler)

try:
    # Execute tasks in parallel
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(task) for task in experiments]
        results = []
        for future in as_completed(futures):
            results.append(future.result())
    print("Results:", results)
except KeyboardInterrupt:
    print("Execution interrupted by user.")
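
When the experiments come from one large FASTA file rather than separate files, the input first has to be split into one part per GPU. The sketch below shows one possible way to do that with a round-robin split. The function names are hypothetical and the parser handles only plain FASTA text.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_round_robin(records, n_parts):
    """Deal records into n_parts lists (one per GPU), dropping empty parts."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [part for part in parts if part]
```

Each part can then be written to its own file and passed to `run_inference` with a different `gpu` value. A split like this suits independent sequences. Chains that belong to the same complex should stay in the same part.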


# License and Disclaimer

This notebook and other information provided is for theoretical modelling only; caution should be exercised in its use. It is provided ‘as-is’ without any warranty of any kind, whether expressed or implied. Information is not intended to be a substitute for professional medical advice, diagnosis, or treatment, and does not constitute medical or other professional advice.

## AlphaFold/OpenFold Code License

Copyright 2021 AlQuraishi Laboratory

Copyright 2021 DeepMind Technologies Limited.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## Model Parameters License

DeepMind's AlphaFold parameters are made available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license. You can find details at: https://creativecommons.org/licenses/by/4.0/legalcode


## Third-party software

Use of the third-party software, libraries or code referred to in this notebook may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.


## Mirrored Databases

The following databases have been mirrored by DeepMind, and are available with reference to the following:
* UniRef90: v2021\_03 (unmodified), by The UniProt Consortium, available under a [Creative Commons Attribution-NoDerivatives 4.0 International License](http://creativecommons.org/licenses/by-nd/4.0/).
* MGnify: v2019\_05 (unmodified), by Mitchell AL et al., available free of all copyright restrictions and made fully and freely available for both non-commercial and commercial use under [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/).
* BFD: (modified), by Steinegger M. and Söding J., modified by DeepMind, available under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by/4.0/). See the Methods section of the [AlphaFold proteome paper](https://www.nature.com/articles/s41586-021-03828-1) for details.