# Filter Protein-DNA Complexes Demo

This example shows how to filter PDB entries for protein-DNA complexes.

![](./figures/ProteinDnaComplex.png)

## Imports

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader
from mmtfPyspark.filters import ContainsDnaChain, ContainsLProteinChain
from mmtfPyspark.structureViewer import view_structure

## Configure Spark

In [2]:
spark = SparkSession.builder.appName("FilterProteinDnaComplexesDemo").getOrCreate()

## Read in MMTF Files

In [3]:
path = "../../resources/mmtf_reduced_sample/"

pdb = mmtfReader.read_sequence_file(path)

## Filter for entries that contain a DNA chain and an L-protein chain

1) Retain PDB entries that contain at least one L-protein chain
2) Of those, retain PDB entries that also contain at least one DNA chain

In [4]:
structures = (
    pdb.filter(ContainsLProteinChain())  # keep entries with an L-protein chain
       .filter(ContainsDnaChain())       # of those, keep entries with a DNA chain
)
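
Chaining two filters keeps only the entries that pass both, which is a logical AND of the two conditions. The sketch below illustrates this with plain Python lists rather than Spark. The entry IDs, the chain labels, and the predicates `contains_l_protein` and `contains_dna` are made up for illustration and are not the mmtfPyspark implementation.

```python
# Toy stand-in for the (PDB ID, structure) pairs held in the RDD.
# IDs and chain labels are hypothetical placeholders.
entries = [
    ("XXX1", ["L-protein", "DNA"]),
    ("XXX2", ["L-protein"]),
    ("XXX3", ["DNA", "DNA"]),
    ("XXX4", ["L-protein", "DNA", "RNA"]),
]


def contains_l_protein(entry):
    """Hypothetical predicate: True if any chain is an L-protein."""
    _, chains = entry
    return "L-protein" in chains


def contains_dna(entry):
    """Hypothetical predicate: True if any chain is DNA."""
    _, chains = entry
    return "DNA" in chains


# Two chained filters, mirroring pdb.filter(...).filter(...)
chained = list(filter(contains_dna, filter(contains_l_protein, entries)))

# The same selection written as a single combined condition
combined = [e for e in entries if contains_l_protein(e) and contains_dna(e)]

complex_ids = [pdb_id for pdb_id, _ in chained]
print(complex_ids)
```

Because each predicate only asks for at least one matching chain, an entry that also carries other chain types (such as `XXX4` here) still passes both filters.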

## Count number of entries

In [5]:
count = structures.count()

print(f"Number of entries that contain L-protein and DNA: {count}")

Number of entries that contain L-protein and DNA: 148


## Visualize Structures

In [6]:
structure_names = structures.keys().collect()
view_structure(structure_names)
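
The call above collects only the keys of the filtered pairs, so the viewer receives a plain list of structure identifiers. As a rough sketch in plain Python (no Spark), assuming each record is an `(id, structure)` pair, the key extraction and the range of valid indices into the resulting list look like this. The IDs and the `None` placeholders are hypothetical.

```python
# Hypothetical (id, structure) pairs; None stands in for the structure data.
pairs = [("XXX1", None), ("XXX4", None), ("XXX7", None)]

# Plain-Python analogue of rdd.keys().collect(): keep the first item of each pair.
names = [key for key, _ in pairs]

# A list of n names is indexed from 0 to n - 1.
last_index = len(names) - 1

print(names, last_index)
```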


## Terminate Spark 

In [7]:
spark.stop()