# LDDT

We compute LDDT scores with [OpenStructure (v2.1.0)](https://git.scicore.unibas.ch/schwede/openstructure/-/tree/master/docker), running inside a Docker container.

Inside the container we run a custom Python script (`src/graphqa/data/lddt_docker.py`, mounted as `/lddt.py`), adapted from one of the examples on the OpenStructure website.
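For intuition about what the scores measure: lDDT takes every inter-atomic distance shorter than an inclusion radius (15 Å) in the native structure, checks whether the same distance in the decoy is preserved within 0.5, 1, 2 and 4 Å, and averages the preserved fractions over the four thresholds. The local score of a residue uses only the distances involving that residue. The sketch below is a simplified, CA-only version of that idea written from scratch with NumPy. It is not OpenStructure's implementation, which works on all atoms and can add stereochemistry checks, and the function name `lddt_ca` is hypothetical.

```python
import numpy as np


def lddt_ca(native, model, inclusion_radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified CA-only lDDT (hypothetical helper): returns (global_score, per_residue_scores).

    native, model: (L, 3) CA coordinates with residues in the same order.
    """
    native = np.asarray(native, dtype=float)
    model = np.asarray(model, dtype=float)
    n_res = len(native)
    # Pairwise CA-CA distance matrices, shape (L, L)
    d_native = np.linalg.norm(native[:, None, :] - native[None, :, :], axis=-1)
    d_model = np.linalg.norm(model[:, None, :] - model[None, :, :], axis=-1)
    # Only pairs of distinct residues that are close in the native structure count
    mask = (d_native < inclusion_radius) & ~np.eye(n_res, dtype=bool)
    # For each pair, the fraction of thresholds within which the distance is preserved
    deviation = np.abs(d_native - d_model)
    preserved = np.mean([deviation < t for t in thresholds], axis=0) * mask
    # Local score: average over the pairs involving each residue (NaN if there are none)
    n_pairs = mask.sum(axis=1)
    local = np.where(n_pairs > 0, preserved.sum(axis=1) / np.maximum(n_pairs, 1), np.nan)
    # Global score: average over all considered pairs
    total = mask.sum()
    global_score = float(preserved.sum() / total) if total else float("nan")
    return global_score, local


# Toy example: three CA atoms on a line, the third one displaced by 3 Å in the "decoy"
native_ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
model_ca = native_ca.copy()
model_ca[2, 0] += 3.0
global_score, local_scores = lddt_ca(native_ca, model_ca)
# global_score ≈ 0.5, local_scores ≈ [0.625, 0.625, 0.25]
```

Translating the whole decoy leaves every distance, and therefore the score, unchanged: lDDT needs no superposition.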

Both the global and the local LDDT scores of a target are saved in a `CASP*/decoys/<target_id>.lddt.npz` file containing:
- `decoys`: 1D array of decoy names
- `global_lddt`: 1D array of global scores
- `local_lddt`: 2D array of local (per-residue) scores with shape `(num_decoys, seq_length)`
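A minimal sketch of how one target's scores could be packed into and read back from this layout. It assumes that per-decoy results arrive as a global score plus a mapping from 1-based residue number to local score, that decoys are sorted by name, and that residues a decoy does not model are left as `NaN`. The helper `save_lddt_npz` and the decoy names are hypothetical; `lddt_docker.py` may order or pad differently.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_lddt_npz(output_path, decoy_scores, seq_length):
    """Hypothetical helper: write one target's scores in the .lddt.npz layout.

    decoy_scores: {decoy_name: (global_score, {residue_number: local_score})},
    residue numbers are 1-based; residues missing from a decoy stay NaN.
    """
    decoys = sorted(decoy_scores)
    global_lddt = np.array([decoy_scores[d][0] for d in decoys], dtype=np.float32)
    local_lddt = np.full((len(decoys), seq_length), np.nan, dtype=np.float32)
    for i, decoy in enumerate(decoys):
        for resnum, score in decoy_scores[decoy][1].items():
            local_lddt[i, resnum - 1] = score
    np.savez(output_path, decoys=np.array(decoys), global_lddt=global_lddt, local_lddt=local_lddt)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "T0000.lddt.npz"
    save_lddt_npz(
        path,
        {"decoy_B": (0.71, {1: 0.9, 2: 0.8}), "decoy_A": (0.64, {2: 0.7, 3: 0.6})},
        seq_length=3,
    )
    # Reading back: each array is loaded into memory when indexed
    with np.load(path) as data:
        decoys = data["decoys"].tolist()
        global_lddt = data["global_lddt"]
        local_lddt = data["local_lddt"]
```

Storing the decoy names as a plain string array (rather than an object array) keeps `np.load` working without `allow_pickle=True`.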

```python
from pathlib import Path

import docker
import numpy as np
import pandas as pd

from loguru import logger
from joblib import Parallel, delayed

from graphqa.data.aminoacids import *

docker_client = docker.from_env()
```

Pull the [OpenStructure](https://www.openstructure.org/docs/2.0/install/) Docker image and start a container named `lddt`, with the LDDT Python script mounted at `/lddt.py` and the `../data` directory mounted at `/native`, `/decoy` and `/output`:

```bash
docker pull 'registry.scicore.unibas.ch/schwede/openstructure:2.1.0'
docker stop lddt 2> /dev/null
docker run --rm --tty --detach \
  --name 'lddt' \
  --entrypoint 'bash' \
  --mount "type=bind,source=$(realpath ../src/graphqa/data/lddt_docker.py),target=/lddt.py" \
  --mount "type=bind,source=$(realpath ../data),target=/native" \
  --mount "type=bind,source=$(realpath ../data),target=/decoy" \
  --mount "type=bind,source=$(realpath ../data),target=/output" \
  'registry.scicore.unibas.ch/schwede/openstructure:2.1.0'
docker ps --filter "name=lddt"
```

```
2.1.0: Pulling from schwede/openstructure
Digest: sha256:501789035234406f903fb3633e0cda07176704de2e181c4828ba6833e42b46db
Status: Image is up to date for registry.scicore.unibas.ch/schwede/openstructure:2.1.0
95580d6ac8cb057d2f3cf72d0cdb317f139e243e7248fcd868ab8390164c0cae
CONTAINER ID        IMAGE                                                    COMMAND             CREATED             STATUS                  PORTS               NAMES
95580d6ac8cb        registry.scicore.unibas.ch/schwede/openstructure:2.1.0   "bash"              1 second ago        Up Less than a second                       lddt
```


```python
lddt_container = docker_client.containers.get("lddt")
df_natives = pd.read_csv("natives_casp.csv")
target_lengths = pd.read_csv("sequences.csv").set_index("target_id").length.to_dict()
```

```python
def run_lddt_in_docker(seq_len, native_path, decoys_dir, output_path):
    """Score all decoys of one target inside the running `lddt` container."""
    # With demux=True the output is a (stdout, stderr) tuple
    exit_code, (stdout, stderr) = lddt_container.exec_run(
        cmd=["/lddt.py", str(seq_len), native_path, decoys_dir, output_path], demux=True
    )

    if exit_code != 0:
        # stderr is None when the script wrote nothing to it
        logger.error(f"LDDT error {native_path}: {(stderr or b'').decode()}")


# Only schedule targets without an .lddt.npz file, so the cell can be safely re-run
with Parallel(n_jobs=10, prefer="threads") as pool:
    missing_targets = [
        dict(
            seq_len=target_lengths[target.target_id],
            native_path=f"CASP{target.casp_ed}/native/{target.target_id}.pdb",
            decoys_dir=f"CASP{target.casp_ed}/decoys/{target.target_id}",
            output_path=f"CASP{target.casp_ed}/decoys/{target.target_id}.lddt.npz",
        )
        for target in df_natives.itertuples()
        if not Path(
            f"CASP{target.casp_ed}/decoys/{target.target_id}.lddt.npz"
        ).is_file()
    ]
    logger.info(f"Launching {len(missing_targets)} jobs")
    pool(delayed(run_lddt_in_docker)(**target_dict) for target_dict in missing_targets)
```
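The existence check in the cell above makes it safe to re-run: targets that already have an `.lddt.npz` file are skipped. The same pattern is sketched below with only the standard library, swapping joblib for `concurrent.futures.ThreadPoolExecutor`. The names `pending_jobs` and `run_all` and the `(casp_ed, target_id, seq_len)` tuple shape are hypothetical, and the runner is passed in so the logic can be exercised without Docker.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pending_jobs(targets, root="."):
    """Hypothetical helper: job kwargs for targets whose .lddt.npz output is missing.

    targets: iterable of (casp_ed, target_id, seq_len) tuples.
    """
    jobs = []
    for casp_ed, target_id, seq_len in targets:
        decoys_dir = f"CASP{casp_ed}/decoys/{target_id}"
        output_path = f"{decoys_dir}.lddt.npz"
        if (Path(root) / output_path).is_file():
            continue  # already scored on a previous run
        jobs.append(
            dict(
                seq_len=seq_len,
                native_path=f"CASP{casp_ed}/native/{target_id}.pdb",
                decoys_dir=decoys_dir,
                output_path=output_path,
            )
        )
    return jobs


def run_all(jobs, runner, n_jobs=10):
    # Threads are enough: each job mostly waits on a process in the container
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda job: runner(**job), jobs))


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "CASP13/decoys/T1.lddt.npz"
    done.parent.mkdir(parents=True)
    done.touch()  # pretend T1 was scored earlier
    jobs = pending_jobs([(13, "T1", 100), (13, "T2", 50)], root=tmp)
    results = run_all(jobs, runner=lambda **job: job["output_path"])
```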

```python
pdb = set(p.with_suffix("").name for p in Path().glob("CASP*/native/*.pdb"))
lddt = set(p.with_suffix("").with_suffix("").name for p in Path().glob("CASP*/decoys/*.lddt.npz"))
for fail in pdb - lddt:
    logger.warning(f"LDDT failed on: {fail}")
```
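A file that exists is not necessarily complete. The hypothetical helper `check_lddt_file` below goes one step further and verifies that a file holds the three arrays listed at the top of this notebook, with shapes consistent with each other and with the sequence length, and that the global scores lie in [0, 1]. In this notebook, `seq_length` would come from `target_lengths`.

```python
import tempfile
from pathlib import Path

import numpy as np


def check_lddt_file(path, seq_length):
    """Hypothetical helper: list the problems found in one .lddt.npz file (empty if consistent)."""
    problems = []
    with np.load(path) as data:
        missing = {"decoys", "global_lddt", "local_lddt"} - set(data.files)
        if missing:
            return [f"missing arrays: {sorted(missing)}"]
        n_decoys = len(data["decoys"])
        global_lddt = data["global_lddt"]
        local_lddt = data["local_lddt"]
        if global_lddt.shape != (n_decoys,):
            problems.append(f"global_lddt has shape {global_lddt.shape}, expected ({n_decoys},)")
        if local_lddt.shape != (n_decoys, seq_length):
            problems.append(
                f"local_lddt has shape {local_lddt.shape}, expected ({n_decoys}, {seq_length})"
            )
        # NaN compares False on both sides, so missing scores are not flagged here
        if np.any((global_lddt < 0) | (global_lddt > 1)):
            problems.append("global_lddt has values outside [0, 1]")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "T1.lddt.npz"
    np.savez(good, decoys=np.array(["a", "b"]), global_lddt=np.array([0.5, 0.7]),
             local_lddt=np.zeros((2, 10)))
    bad = Path(tmp) / "T2.lddt.npz"
    np.savez(bad, decoys=np.array(["a", "b"]), global_lddt=np.array([0.5, 1.2]),
             local_lddt=np.zeros((2, 8)))
    good_problems = check_lddt_file(good, seq_length=10)
    bad_problems = check_lddt_file(bad, seq_length=10)
```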



```bash
docker stop lddt
```

```
lddt
```
