# DEERFold Examples

This notebook shows how to run DEERFold in two modes:

* Unconstrained prediction 
* Constrained prediction 

## Unconstrained prediction

For comparison, DEERFold can accept an empty CSV file to generate unconstrained models.
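Such an empty constraint file takes only a few lines to create. This is a minimal sketch: `write_empty_constraints` is a hypothetical helper, not part of DEERFold, and we assume the empty file is supplied wherever a constraint CSV would normally go (the `sp` parameter in the constrained example).

```python
from pathlib import Path

def write_empty_constraints(path: str) -> Path:
    """Create an empty constraint CSV (zero rows, i.e. no DEER constraints).

    Hypothetical helper; DEERFold itself only needs the resulting file.
    """
    csv_path = Path(path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)  # create parent dirs if needed
    csv_path.write_text("")  # an empty file carries no residue-pair constraints
    return csv_path
```

For example, `write_empty_constraints("examples/PfMATE/empty.csv")` returns the path to pass on; the location is illustrative.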

To generate unconstrained models with DEERFold, provide the following parameters:
- `fasta_file`: Input sequence file in FASTA format.
- `msa_dir`: Directory containing multiple sequence alignments.
- `out_dir`: Directory to save output models.
- `model`: Directory storing all the model weights.
- `neff`: MSA Neff value.
- `num`: Number of models to generate.

```python
import os

# Replace 'your_module' with the module that provides predict_with_sp_batch
from your_module import predict_with_sp_batch

# Define parameters (these mirror the required parameters listed above)
fasta_file = "examples/PfMATE/PfMATE.fasta"  # input sequence
msa_dir = "examples/alignments"               # precomputed MSAs
out_dir = "out/PfMATE_unconstrained"          # where predicted models are written
model_weights_dir = "model"                   # directory holding the model weights
neff = 5                                      # MSA Neff value
num_models = 100                              # number of models to generate

# Ensure output directory exists
os.makedirs(out_dir, exist_ok=True)

# Run the prediction without DEER constraints
# (keyword names follow the constrained example in the next section)
results = predict_with_sp_batch(
    fasta_file=fasta_file,
    msa_dir=msa_dir,
    out_dir=out_dir,
    model=model_weights_dir,
    neff=neff,
    num=num_models,
)

print(f"Prediction completed. Results saved in {out_dir}")
print(f"Number of models requested: {num_models}")
```


## Constrained prediction

### Input Format
DEERFold accepts DEER distance constraints as a CSV file in the following format:
```
18,95,0,0,0,0,2.00E-05,7.00E-05,0.00027,0.0009,0.0027,0.00708,0.01622,0,0,0,0,...
18,215,0,0,0,0,0,0,8.00E-05,0.000940009,0.007260073,0.035440354,0.109791098,0.215792158,...
18,240,0,0,1.00E-05,4.00E-05,0.000169995,0.00067998,0.00233993,0.006879794,0.017259482,...
```
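The rows above are truncated with `...`, so a reader cannot check their length directly. The sketch below assumes each row holds two residue numbers followed by one probability per distance bin (100 values), and rejects rows that do not match. `parse_constraint_line` and `read_constraints` are hypothetical helpers, not DEERFold functions.

```python
from typing import List, Tuple

NUM_BINS = 100  # number of distogram bins described in this section

def parse_constraint_line(line: str) -> Tuple[int, int, List[float]]:
    """Split one CSV row into (residue_i, residue_j, bin probabilities).

    Assumed layout: residue_i, residue_j, then NUM_BINS probabilities.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    i, j = int(fields[0]), int(fields[1])
    probs = [float(x) for x in fields[2:]]  # float() accepts values like 2.00E-05
    if len(probs) != NUM_BINS:
        raise ValueError(f"expected {NUM_BINS} bin values, got {len(probs)}")
    return i, j, probs

def read_constraints(path: str) -> List[Tuple[int, int, List[float]]]:
    """Parse every non-blank row of a constraint CSV."""
    with open(path) as fh:
        return [parse_constraint_line(line) for line in fh if line.strip()]
```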

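Each row gives a pair of residue positions followed by the probability assigned to each of the 100 distance bins; together, the rows define a distogram (shape L×L×100 for a sequence of length L). The bins are: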
- Bin 1: Less than or equal to 1.5 Å
- Bin 2: (1.5 Å, 2.5 Å] 
- Bin 3: (2.5 Å, 3.5 Å] 
- ...
- Bin 99: (98.5 Å, 99.5 Å]
- Bin 100: Greater than 99.5 Å
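The bin edges above reduce to a simple rule: for bins 2 to 99, bin k covers (k − 0.5 Å, k + 0.5 Å], with everything at or below 1.5 Å in bin 1 and everything above 99.5 Å in bin 100. The sketch below applies that rule and histograms a DEER distance distribution P(r) into one CSV row. It assumes each row should be a normalized probability distribution; `distance_to_bin` and `distribution_to_row` are hypothetical helpers.

```python
import math
from typing import Iterable, List

NUM_BINS = 100

def distance_to_bin(d: float) -> int:
    """Return the 1-based bin index for a distance d in Å.

    Bin 1: d <= 1.5; bin k (2..99): k - 0.5 < d <= k + 0.5; bin 100: d > 99.5.
    """
    return min(max(math.ceil(d - 0.5), 1), NUM_BINS)

def distribution_to_row(i: int, j: int, distances: Iterable[float],
                        weights: Iterable[float]) -> str:
    """Histogram a sampled P(r) for residues i and j into one constraint row."""
    probs: List[float] = [0.0] * NUM_BINS
    for d, w in zip(distances, weights):
        probs[distance_to_bin(d) - 1] += w
    total = sum(probs)
    if total > 0:
        probs = [p / total for p in probs]  # assumed: rows sum to 1
    return ",".join([str(i), str(j)] + [f"{p:.6g}" for p in probs])
```

Because each distance is assigned to exactly one bin, the weights are simply summed per bin; finer-grained P(r) samples therefore collapse onto the 1 Å grid.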

To run inference on a sequence with the given DEER constraints, make sure you have the following:
- `fasta_file`: Input sequence file in FASTA format.
- `msa_dir`: Directory containing multiple sequence alignments.
- `out_dir`: Directory to save output models.
- `model`: Directory storing all the model weights.
- `sp`: Input file with DEER constraints in CSV format.
- `neff`: MSA Neff value.
- `num`: Number of models to generate.
- `refs`: Reference PDB files for RMSD and TM-score analysis (optional).
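A missing file is easier to spot before a long run than after it. The following is a minimal pre-flight sketch: `check_inputs` is a hypothetical helper, not part of DEERFold, and it only checks that the paths listed above exist with the expected type (file or directory).

```python
import os
from typing import List, Optional

def check_inputs(fasta_file: str, msa_dir: str, model_dir: str, sp: str,
                 refs: Optional[List[str]] = None) -> List[str]:
    """Return a list of problems with the input paths (empty if all exist)."""
    problems = []
    # Single files: sequence and DEER constraint CSV
    for label, path in [("fasta_file", fasta_file), ("sp", sp)]:
        if not os.path.isfile(path):
            problems.append(f"{label}: file not found: {path}")
    # Directories: MSAs and model weights
    for label, path in [("msa_dir", msa_dir), ("model", model_dir)]:
        if not os.path.isdir(path):
            problems.append(f"{label}: directory not found: {path}")
    # Optional reference structures
    for ref in refs or []:
        if not os.path.isfile(ref):
            problems.append(f"refs: file not found: {ref}")
    return problems
```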

```python
import os

# Replace 'your_module' with the module that provides predict_with_sp_batch
from your_module import predict_with_sp_batch

# Define parameters
fasta_file = "examples/PfMATE/PfMATE.fasta"    # input sequence
msa_dir = "examples/alignments"                # precomputed MSAs
out_dir = "out/PfMATE"                         # where predicted models are written
model_weights_dir = "model"                    # directory holding the model weights
csv_file = "examples/PfMATE/PfMATE_low.csv"    # DEER distance constraints
neff = 5                                       # MSA Neff value
num_models = 100                               # number of models to generate
ref_pdbs = ["examples/PfMATE/6gwh.pdb", "examples/PfMATE/6fhz.pdb"]  # optional references

# Ensure output directory exists
os.makedirs(out_dir, exist_ok=True)

# Run prediction; reference PDBs are passed as one comma-separated string
results = predict_with_sp_batch(
    fasta_file=fasta_file,
    msa_dir=msa_dir,
    out_dir=out_dir,
    model=model_weights_dir,
    sp=csv_file,
    neff=neff,
    num=num_models,
    refs=",".join(ref_pdbs),
)

print(f"Prediction completed. Results saved in {out_dir}")
print(f"Number of models requested: {num_models}")
```


### Output

DEERFold writes the predicted models in PDB format. The models are ranked by the Earth Mover's Distance (EMD) between each prediction and the input distance constraints, so the top-ranked models are those that fit the constraints most closely by this measure.
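To make the ranking criterion concrete: for two histograms on the same uniform grid, the 1D EMD equals the bin width times the summed absolute difference of their cumulative distributions. The sketch below implements that textbook definition. DEERFold's own scoring may differ in details (normalization, treatment of the open-ended first and last bins, how pairs are combined), so `emd_1d` and `total_emd` are illustrative helpers only.

```python
from itertools import accumulate
from typing import Dict, List, Sequence, Tuple

def emd_1d(p: Sequence[float], q: Sequence[float], bin_width: float = 1.0) -> float:
    """EMD between two histograms on the same uniform bins.

    In 1D, EMD is the area between the two cumulative distributions.
    Both histograms are normalized to sum to 1 first.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

def total_emd(predicted: Dict[Tuple[int, int], List[float]],
              target: Dict[Tuple[int, int], List[float]]) -> float:
    """Sum EMD over every residue pair that has a target constraint."""
    return sum(emd_1d(predicted[pair], target[pair]) for pair in target)
```

Under these assumptions, sorting models by `total_emd` in ascending order puts the best-fitting model first.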

### Analyzing Results

After running the prediction, you can analyze the results by examining the output PDB files in the specified output directory. If you provided reference PDB files, DEERFold will also perform RMSD and TM-score analysis comparing the predictions to these references.

To visualize the top models, open them in a molecular viewer such as PyMOL or ChimeraX; for further analysis, you can write short scripts that read the PDB files directly.
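As an example of such a script, the sketch below lists the PDB files in an output directory and measures the C-alpha distance between two residues using the fixed-column PDB `ATOM` record layout. C-alpha distances are only a rough proxy for DEER distances, which are measured between spin labels that sit several Å from the backbone. The function names and the `*.pdb` file pattern are assumptions, not DEERFold's output conventions.

```python
import math
from pathlib import Path
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]

def list_models(out_dir: str) -> List[Path]:
    """Return every PDB file under out_dir, sorted by path."""
    return sorted(Path(out_dir).rglob("*.pdb"))

def ca_coords(pdb_path: Path, chain: Optional[str] = None) -> Dict[int, Coord]:
    """Map residue number -> C-alpha coordinates (fixed PDB columns)."""
    coords: Dict[int, Coord] = {}
    with open(pdb_path) as fh:
        for line in fh:
            if not line.startswith("ATOM") or line[12:16].strip() != "CA":
                continue
            if chain is not None and line[21] != chain:
                continue  # skip atoms from other chains
            resnum = int(line[22:26])
            coords[resnum] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords

def ca_distance(coords: Dict[int, Coord], i: int, j: int) -> float:
    """Euclidean C-alpha distance (Å) between residues i and j."""
    return math.dist(coords[i], coords[j])
```

For example, `ca_distance(ca_coords(path), 18, 95)` gives a backbone-level view of the 18/95 pair from the example constraints.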