# Custom Report Demo

This demo shows how to create and query a dataset. Here the dataset is generated by calling an RCSB PDB web service that returns a custom report of PDB annotations.

[PDB custom report](http://www.rcsb.org/pdb/results/reportField.do)

## Imports

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.datasets import customReportService

## Configure Spark

In [2]:
spark = SparkSession.builder.appName("CustomReportDemo").getOrCreate()

## Retrieve PDB annotation
Retrieve the binding affinities (Ki, Kd), the group name of the ligand (hetId), and the SMILES string of the ligand.

In [3]:
ds = customReportService.get_dataset(["Ki","Kd","hetId","ligandSmiles"])

## Show the schema of this dataset

In [4]:
ds.printSchema()

root
 |-- structureChainId: string (nullable = true)
 |-- structureId: string (nullable = true)
 |-- chainId: string (nullable = true)
 |-- Ki: string (nullable = true)
 |-- Kd: string (nullable = true)
 |-- hetId: string (nullable = true)
 |-- ligandSmiles: string (nullable = true)



## Filtering

### Select structures that have Ki or Kd value(s)

In [5]:
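# Keep only rows that have at least one binding affinity value
ds = ds.filter("Ki IS NOT NULL OR Kd IS NOT NULL")

# Show up to 10 rows from a roughly 1% random sample (fixed seed)
ds.sample(fraction=0.01, seed=1).show(10)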

+----------------+-----------+-------+-----------------+--------------------+-----+--------------------+
|structureChainId|structureId|chainId|               Ki|                  Kd|hetId|        ligandSmiles|
+----------------+-----------+-------+-----------------+--------------------+-----+--------------------+
|          1A0T.R|       1A0T|      R|             null|  50000000 (PDBbind)|  SUC|C([C@@H]1[C@H]([C...|
|          1A4L.D|       1A4L|      D| 0.001-0.01 (BDB)|                null|  DCF|c1nc2c(n1[C@H]3C[...|
|          1A85.A|       1A85|      A|      26000 (BDB)|                null|  0DY|                null|
|          1AD8.H|       1AD8|      H|    250 (PDBbind)|                null|  MDL|CN[C@H](Cc1ccccc1...|
|          1BNV.A|       1BNV|      A|             null|1.70 (PDBbind)#1....|  AL7|CN[C@@H]1CN(S(=O)...|
|          1BXR.E|       1BXR|      E| 400000 (PDBbind)|                null|  ANP|c1nc(c2c(n1)n(cn2...|
|          1D6S.A|       1D6S|      A|             null
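
The schema above reports Ki and Kd as strings, and the sample rows show values such as `250 (PDBbind)` and `0.001-0.01 (BDB)`, so they cannot be compared numerically as they stand. The following is a minimal sketch of how such a string might be parsed in plain Python. The helper `parse_affinity` and the `Affinity` tuple are hypothetical names, not part of mmtfPyspark, and the sketch assumes that `#` separates multiple measurements within one cell, which the truncated output suggests but does not confirm.

```python
import re
from typing import List, NamedTuple, Optional

# One entry such as "250 (PDBbind)" or "0.001-0.01 (BDB)".
# The format is inferred from the sample rows above (an assumption).
_ENTRY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*\(([^)]+)\)\s*$"
)


class Affinity(NamedTuple):
    low: float    # lower bound (equals high for a single value)
    high: float   # upper bound
    source: str   # database label in parentheses, e.g. "BDB"


def parse_affinity(raw: Optional[str]) -> List[Affinity]:
    """Parse a Ki/Kd cell into a list of (low, high, source) entries."""
    if raw is None:
        return []
    entries = []
    # Assumption: '#' separates multiple measurements in one cell.
    for part in raw.split("#"):
        match = _ENTRY.match(part)
        if match is None:
            continue  # skip anything that does not fit the assumed format
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        entries.append(Affinity(low, high, match.group(3)))
    return entries


print(parse_affinity("0.001-0.01 (BDB)"))
```

A function like this could be wrapped in a Spark UDF to add numeric columns to the dataset, although whether that suits the full report depends on formats that may not appear in this small sample.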

## Terminate Spark

In [6]:
spark.stop()