# 3. Property Filters

Property filters are the first major refinement step applied to the initial pool of oligos. They remove sequences whose intrinsic properties, such as GC content or melting temperature (Tm), fall outside the ranges required by the experiment, so that only suitable oligos are passed on to subsequent steps.


## Imports and setup

In [None]:
import os

from pathlib import Path
from Bio.SeqUtils import MeltingTemp as mt

from oligo_designer_toolsuite.database import (
    OligoDatabase,
)

from oligo_designer_toolsuite.oligo_property_filter import (
    GCContentFilter,
    HardMaskedSequenceFilter,
    MeltingTemperatureNNFilter,
    PropertyFilter,
    SoftMaskedSequenceFilter,
)

In [2]:
dir_output = os.path.abspath("./results")
Path(dir_output).mkdir(parents=True, exist_ok=True)

n_jobs = 3

## Filtering by property

### Load the database
Property filters operate on `OligoDatabase` objects. If you are not familiar with them, see the [oligo database tutorial](2-oligo-database.ipynb). In this tutorial, we load an existing database.

In [None]:
# Create Database object
min_oligos_per_region = 3
write_regions_with_insufficient_oligos = True
lru_db_max_in_memory = n_jobs * 2 + 2
database_name = "db_oligos"

oligo_database = OligoDatabase(
    min_oligos_per_region=min_oligos_per_region, 
    write_regions_with_insufficient_oligos=write_regions_with_insufficient_oligos, 
    lru_db_max_in_memory=lru_db_max_in_memory, 
    database_name=database_name, 
    dir_output=dir_output, 
    n_jobs=n_jobs,
)

# Load Database
dir_database = os.path.abspath("./data/1_db_oligos_initial")
oligo_database.load_database(dir_database=dir_database, database_overwrite=True)

### Define property filters
Each property filter is implemented as a class inheriting from the abstract base class `PropertyFilterBase`. This ensures all filters have a standardized `apply()` method, which takes an `OligoDatabase` object as input, applies the filter, and returns the filtered database.
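
The shared-interface idea can be illustrated with a minimal, pure-Python sketch. The class names `SketchFilterBase` and `SketchGCContentFilter`, the `keep()` helper, and the use of a plain dict in place of an `OligoDatabase` are all hypothetical and chosen for illustration only; this is not the toolsuite's implementation, and the inclusive range bounds are an assumption.

```python
from abc import ABC, abstractmethod


class SketchFilterBase(ABC):
    """Hypothetical stand-in for a filter base class (not the toolsuite's code)."""

    @abstractmethod
    def keep(self, sequence: str) -> bool:
        """Return True if the sequence passes the filter."""

    def apply(self, oligos: dict[str, str]) -> dict[str, str]:
        """Return a new dict that contains only the oligos passing the filter."""
        return {oligo_id: seq for oligo_id, seq in oligos.items() if self.keep(seq)}


class SketchGCContentFilter(SketchFilterBase):
    """Keep sequences whose GC content (in percent) lies within [gc_min, gc_max]."""

    def __init__(self, gc_min: float, gc_max: float) -> None:
        self.gc_min = gc_min
        self.gc_max = gc_max

    def keep(self, sequence: str) -> bool:
        seq = sequence.upper()
        if not seq:
            return False  # an empty sequence has no defined GC content
        gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
        # Inclusive bounds are an assumption made for this sketch
        return self.gc_min <= gc_percent <= self.gc_max


# A plain dict stands in for the oligo database in this sketch
oligos = {
    "oligo_1": "ATGCGCATTA",  # 4 of 10 bases are G or C
    "oligo_2": "ATATATATAT",  # no G or C
    "oligo_3": "GCGCGCGCAT",  # 8 of 10 bases are G or C
}

filtered = SketchGCContentFilter(gc_min=40, gc_max=60).apply(oligos)
print(filtered)
```

Because every filter exposes the same `apply()` method, filters can be combined or swapped without changing the calling code.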

In [4]:
# define parameters of property filters
oligo_GC_content_min = 40
oligo_GC_content_max = 60

oligo_Tm_min = 65 
oligo_Tm_max = 75 

Tm_parameters_oligo = {
    "check": True, #default
    "strict": True, #default
    "c_seq": None, #default
    "shift": 0, #default
    "nn_table": "DNA_NN3", # Allawi & SantaLucia (1997)
    "tmm_table": "DNA_TMM1", #default
    "imm_table": "DNA_IMM1", #default
    "de_table": "DNA_DE1", #default
    "dnac1": 50, #[nM]
    "dnac2": 0, #[nM]
    "selfcomp": False, #default
    "saltcorr": 7, # Owczarzy et al. (2008)
    "Na": 39, #[mM]
    "K": 75, #[mM]
    "Tris": 20, #[mM]
    "Mg": 10, #[mM]
    "dNTPs": 0, #[mM] default
}
# Replace the table names with the corresponding Biopython lookup tables
for table in ("nn_table", "tmm_table", "imm_table", "de_table"):
    Tm_parameters_oligo[table] = getattr(mt, Tm_parameters_oligo[table])

Tm_chem_correction_param_oligo = {
    "DMSO": 0, #default
    "fmd": 20, #[%] formamide (percent, as required by fmdmethod 1)
    "DMSOfactor": 0.75, #default
    "fmdfactor": 0.65, #default
    "fmdmethod": 1, #default
    "GC": None, #default
}

Tm_salt_correction_param_oligo = None # use default settings

# Create property filters
hard_masked_sequences = HardMaskedSequenceFilter()
soft_masked_sequences = SoftMaskedSequenceFilter()
gc_content = GCContentFilter(
    GC_content_min=oligo_GC_content_min, 
    GC_content_max=oligo_GC_content_max 
)
melting_temperature = MeltingTemperatureNNFilter(
    Tm_min=oligo_Tm_min, 
    Tm_max=oligo_Tm_max, 
    Tm_parameters=Tm_parameters_oligo, 
    Tm_chem_correction_parameters=Tm_chem_correction_param_oligo, 
    Tm_salt_correction_parameters=Tm_salt_correction_param_oligo, 
)
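
To get a feeling for the chemical correction parameters, the arithmetic can be worked through by hand. With `fmdmethod` 1, Biopython's `chem_correction` lowers the Tm linearly by `fmdfactor` °C per percent formamide and by `DMSOfactor` °C per percent DMSO. The sketch below re-implements only this linear rule in plain Python; the function name `linear_chem_correction` and the uncorrected Tm of 83 °C are hypothetical, and it is an assumption that the filter compares the corrected value against `Tm_min` and `Tm_max`.

```python
def linear_chem_correction(
    tm: float,
    dmso: float = 0.0,
    fmd: float = 0.0,
    dmso_factor: float = 0.75,
    fmd_factor: float = 0.65,
) -> float:
    """Linear Tm correction for DMSO and formamide (both given in percent)."""
    return tm - dmso_factor * dmso - fmd_factor * fmd


uncorrected_tm = 83.0  # hypothetical nearest-neighbor Tm in degrees Celsius
corrected_tm = linear_chem_correction(uncorrected_tm, fmd=20)
shift = uncorrected_tm - corrected_tm

# Check the corrected value against the Tm window used in this tutorial
tm_min, tm_max = 65, 75
passes = tm_min <= corrected_tm <= tm_max

print(f"formamide shift: {shift:.1f} degrees, corrected Tm: {corrected_tm:.1f}, passes: {passes}")
```

With 20 % formamide and a factor of 0.65, the Tm drops by about 13 °C, so the formamide concentration has a large influence on which oligos fall into the chosen Tm window.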

### Apply filters
To streamline the application of multiple filters, the `PropertyFilter` wrapper class applies a user-defined list of filters in the given order. Computationally cheap filters (e.g., GC content) should come first, so that more expensive filters (e.g., Tm) run on a smaller dataset. The implemented property filters are listed in the [API documentation](https://oligo-designer-toolsuite.readthedocs.io/en/latest/_api_docs/oligo_designer_toolsuite.oligo_property_filter.html).

> ⚠️ Order matters!
>
> Filters are applied sequentially, so each filter only processes the oligos that passed all previous filters. Placing the computationally intensive filters last reduces the overall runtime.

In [None]:
filters = [
    hard_masked_sequences,
    soft_masked_sequences,
    gc_content,
    melting_temperature,
]

property_filter = PropertyFilter(filters=filters)

oligo_database = property_filter.apply(
    oligo_database=oligo_database,
    sequence_type="oligo",
    n_jobs=n_jobs,
)

dir_database = oligo_database.save_database(name_database="2_db_oligos_property_filter")
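
The effect of filter order on the workload can be sketched in plain Python. The helper `apply_in_order`, the two toy filters, and the example sequences are hypothetical; `costly_check` merely stands in for an expensive computation such as a Tm calculation and is not how the toolsuite works internally.

```python
from typing import Callable


def gc_in_range(seq: str) -> bool:
    """Cheap filter: keep sequences with 40-60 % GC content."""
    gc_percent = 100.0 * sum(base in "GC" for base in seq.upper()) / len(seq)
    return 40 <= gc_percent <= 60


def costly_check(seq: str) -> bool:
    """Stand-in for an expensive filter, e.g. a melting temperature calculation."""
    return len(seq) >= 8


def apply_in_order(
    sequences: list[str],
    filters: list[tuple[str, Callable[[str], bool]]],
) -> tuple[list[str], dict[str, int]]:
    """Apply filters one after another and record how many sequences each one sees."""
    calls: dict[str, int] = {}
    remaining = list(sequences)
    for name, keep in filters:
        calls[name] = len(remaining)
        remaining = [seq for seq in remaining if keep(seq)]
    return remaining, calls


sequences = ["ATGCGCATTA", "ATATATATAT", "GCGCGCGCAT", "ATGCATGCAT", "TTTTTTTTAA"]

cheap_first, calls_cheap_first = apply_in_order(
    sequences, [("gc", gc_in_range), ("costly", costly_check)]
)
costly_first, calls_costly_first = apply_in_order(
    sequences, [("costly", costly_check), ("gc", gc_in_range)]
)

print(calls_cheap_first)
print(calls_costly_first)
```

Both orders keep the same sequences, but the expensive check is evaluated on fewer sequences when the cheap GC filter runs first.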

Applying property filters to the `OligoDatabase` has several benefits:

- **Improves Experimental Suitability:** Ensures that sequences meet critical physical and chemical requirements for optimal binding and stability.
- **Reduces Computational Load:** Eliminates unsuitable sequences early, saving resources for downstream processes.
- **Modular and Extensible:** The `PropertyFilterBase` design makes it easy to add new filters for additional properties as needed.