# 5-MapReduce
This tutorial demonstrates how to use Map and Reduce to count the total number of atoms across a set of PDB structures.

In [1]:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader

### Configure Spark

In [2]:
spark = SparkSession.builder.master("local[4]").appName("5-MapReduce").getOrCreate()
sc = spark.sparkContext

### Read PDB structures

In [3]:
path = "../resources/mmtf_full_sample"

pdb = mmtfReader.read_sequence_file(path, sc)

## Map
Use a lambda expression to get the number of atoms for each entry.
The variable t represents a tuple (PDB ID, mmtfStructure):

* t[0]: PDB ID
* t[1]: mmtfStructure

In [4]:
num_atoms = pdb.map(lambda t: t[1].num_atoms)

Print the number of atoms for the first 10 entries.

In [5]:
num_atoms.take(10)

[3099, 2111, 1379, 2730, 2183, 1657, 2220, 2997, 4891, 8873]
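The map step can be mimicked without Spark to see what the lambda does to each tuple. The sketch below is a plain-Python analogue, not mmtfPyspark code: `FakeStructure` and the IDs `ID_A`, `ID_B`, `ID_C` are hypothetical stand-ins, and the atom counts are simply the first three values from the output above.

```python
from dataclasses import dataclass


@dataclass
class FakeStructure:
    """Hypothetical stand-in for an mmtfStructure; only num_atoms is modeled."""
    num_atoms: int


# Each entry mimics the (PDB ID, mmtfStructure) tuple; the IDs are made up.
entries = [
    ("ID_A", FakeStructure(3099)),
    ("ID_B", FakeStructure(2111)),
    ("ID_C", FakeStructure(1379)),
]

# Same lambda as in the Spark cell: keep t[1].num_atoms, drop the ID.
local_num_atoms = list(map(lambda t: t[1].num_atoms, entries))
print(local_num_atoms)
```

Unlike Python's built-in `map`, the Spark `map` is a lazy transformation on a distributed dataset; nothing is computed until an action such as `take` or `reduce` is called.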

## Reduce
Use the reduce method with a summation function, defined as a lambda expression, to add up the atom counts of all entries.

In [6]:
num_atoms.reduce(lambda a, b: a+b)

34248081
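To illustrate how reduce folds values pairwise, here is a minimal plain-Python sketch using `functools.reduce` on the ten atom counts printed in the Map section. It is a local analogue under that assumption, not the Spark call itself, and the variable names are illustrative.

```python
from functools import reduce

# The ten atom counts shown in the Map section output.
first_ten = [3099, 2111, 1379, 2730, 2183, 1657, 2220, 2997, 4891, 8873]

# reduce applies the function to a running result and the next element:
# ((3099 + 2111) + 1379) + ... until a single value remains.
subtotal = reduce(lambda a, b: a + b, first_ten)
print(subtotal)

# For addition, this matches the built-in sum.
assert subtotal == sum(first_ten)
```

Spark's `reduce` combines values within each partition and then merges the partial results, so the function passed to it should be commutative and associative. Addition satisfies both, which is why the total does not depend on how the data is partitioned.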

In [7]:
spark.stop()