# Filter By Polymer Chain Type Demo

This example shows how to build a custom filter for polymer chain types. It also demonstrates how to combine filters with **or**, **and**, and **not**.

#### Supported polymer chain type includes  (Both string and class variable can be used a input parameter)

* ContainsPolymerChainType.D_PEPTIDE_COOH_CARBOXY_TERMINUS = "D-PEPTIDE COOH CARBOXY TERMINUS"
* ContainsPolymerChainType.D_PEPTIDE_NH3_AMINO_TERMINUS = "D-PEPTIDE NH3 AMINO TERMINUS"
* ContainsPolymerChainType.D_PEPTIDE_LINKING = "D-PEPTIDE LINKING"
* ContainsPolymerChainType.D_SACCHARIDE = "D-SACCHARIDE"
* ContainsPolymerChainType.D_SACCHARIDE_14_and_14_LINKING = "D-SACCHARIDE 1,4 AND 1,4 LINKING"
* ContainsPolymerChainType.D_SACCHARIDE_14_and_16_LINKING = "D-SACCHARIDE 1,4 AND 1,6 LINKING"
* ContainsPolymerChainType.DNA_OH_3_PRIME_TERMINUS = "DNA OH 3 PRIME TERMINUS"
* ContainsPolymerChainType.DNA_OH_5_PRIME_TERMINUS = "DNA OH 5 PRIME TERMINUS"
* ContainsPolymerChainType.DNA_LINKING = "DNA LINKING"
* ContainsPolymerChainType.L_PEPTIDE_COOH_CARBOXY_TERMINUS = "L-PEPTIDE COOH CARBOXY TERMINUS"
* ContainsPolymerChainType.L_PEPTIDE_NH3_AMINO_TERMINUS = "L-PEPTIDE NH3 AMINO TERMINUS"
* ContainsPolymerChainType.L_PEPTIDE_LINKING = "L-PEPTIDE LINKING"
* ContainsPolymerChainType.L_SACCHARIDE = "L-SACCHARIDE"
* ContainsPolymerChainType.L_SACCHARIDE_14_AND_14_LINKING = "L-SACCHARDIE 1,4 AND 1,4 LINKING"
* ContainsPolymerChainType.L_SACCHARIDE_14_AND_16_LINKING = "L-SACCHARIDE 1,4 AND 1,6 LINKING"
* ContainsPolymerChainType.PEPTIDE_LINKING = "PEPTIDE LINKING"
* ContainsPolymerChainType.RNA_OH_3_PRIME_TERMINUS = "RNA OH 3 PRIME TERMINUS"
* ContainsPolymerChainType.RNA_OH_5_PRIME_TERMINUS = "RNA OH 5 PRIME TERMINUS"
* ContainsPolymerChainType.RNA_LINKING = "RNA LINKING"
* ContainsPolymerChainType.OTHER = "OTHER"
* ContainsPolymerChainType.SACCHARIDE = "SACCHARIDE"

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader
from mmtfPyspark.filters import *
from mmtfPyspark.structureViewer import view_structure

#### Configure Spark 

In [2]:
spark = SparkSession.builder.master("local[4]") \
                                 .appName("FilterByPolymerChainTypeDemo") \
                                 .getOrCreate()

## Read in MMTF Files, filter and count

The `ContainsPolymerChainType` filter can be used to create a custom filter.  We also show how to negate a filter by wrapping it in a `NotFilter`. Here we create a filter that returns structure that:
1. Contain polymer chains with L-amino acids, **or** D-amino acids, **or** Glycine (PEPTIDE_LINKING)
2. **And** that do **not** contain DNA **or** RNA polymer chains.

In [3]:
path = "../../resources/mmtf_reduced_sample/"

structures = mmtfReader.read_sequence_file(path) \
                .filter(ContainsPolymerChainType([ContainsPolymerChainType.L_PEPTIDE_LINKING,  # use predefined types
                                                  ContainsPolymerChainType.D_PEPTIDE_LINKING,  # multiple types are or-ed
                                                  ContainsPolymerChainType.PEPTIDE_LINKING])) \
                .filter(NotFilter(ContainsPolymerChainType(["DNA_LINKING",  ## or specify types as strings
                                                            "RNA_LINKING"])))
                                                            
print(f"Number of proteins: {structures.count()}")

Number of proteins: 9707


## View Structures

In [4]:
structure_names = structures.keys().collect()
view_structure(structure_names, style='sphere');

interactive(children=(IntSlider(value=0, continuous_update=False, description='Structure', max=9706), Output()…

## Terminate Spark 

In [5]:
spark.stop()