# **LAB 11 - Executing MNIST classifier on Standalone Cluster in AWS**

### Note: Before executing the script, ensure that the EC2 instances are running and that Elephas is installed on the Master and the Workers.

### Start the Spark cluster on the Master using the command "sh /opt/spark/sbin/start-all.sh"

Import the PySpark and findspark libraries

In [None]:
import pyspark
import findspark

Locate the Spark installation folder

In [None]:
findspark.find()

Initialize findspark with the Spark installation folder so that PySpark can be imported from it

In [None]:
findspark.init('/opt/spark')

Import the SparkContext and SparkConf classes

In [None]:
from pyspark import SparkContext, SparkConf

Connect to the Spark Cluster using the private IP address of the Master. <br>
Note: Replace "private-ip-address of Master" (including the angle brackets) in the command below with the Master's private IP address before execution.

In [None]:
conf = SparkConf().setAppName('Mnist_Spark_MLP_1').setMaster('spark://<private-ip-address of Master>:7077')
sc = SparkContext(conf=conf)
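
The master URL passed to setMaster above can also be built and checked in code, which catches a placeholder that was left unedited. The following is a minimal sketch: `build_master_url` is a hypothetical helper that is not part of PySpark, and the address `10.0.0.12` is a made-up example.

```python
import ipaddress


def build_master_url(private_ip: str, port: int = 7077) -> str:
    """Return a spark:// URL for a standalone Master (hypothetical helper)."""
    # IPv4Address raises a ValueError subclass for anything that is not a
    # valid IPv4 address, such as an unedited placeholder string.
    address = ipaddress.IPv4Address(private_ip)
    if not address.is_private:
        raise ValueError(f"{private_ip} is not a private IP address")
    return f"spark://{address}:{port}"


# Made-up private address; substitute the Master's own private IP.
print(build_master_url("10.0.0.12"))
```

The returned string can then be passed to setMaster in place of the hand-edited URL.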

Import the necessary TensorFlow, Keras, and Elephas libraries to build and distribute a model for classifying the MNIST data

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

Define the batch size, number of output classes, and number of epochs for training.

In [None]:
# Define basic parameters
batch_size = 64
nb_classes = 10
epochs = 3

Load the MNIST data <br>
Reshape each 28x28 image into a 784-element vector <br>
Normalize the pixel values to the range [0, 1] <br>
Convert the Train and Test labels into one-hot (binary class) matrices.

In [None]:
# Load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices
y_train = to_categorical(y_train, nb_classes)
y_test = to_categorical(y_test, nb_classes)
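
To make the preprocessing concrete, here is a small plain-Python sketch of the two transformations applied above: scaling pixel values by 255 and one-hot encoding the labels. The helpers `normalize_pixels` and `one_hot` are illustrative names only; they mimic the effect of the cell above on tiny inputs and are not how NumPy or `to_categorical` are implemented.

```python
def normalize_pixels(pixels):
    """Scale raw 0-255 pixel values into the range [0, 1]."""
    return [p / 255 for p in pixels]


def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows of length num_classes."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows


# Tiny example: three pixel values and two labels out of four classes.
print(normalize_pixels([0, 51, 255]))
print(one_hot([3, 0], 4))
```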

Define the model for classifying the MNIST data

In [None]:
model = Sequential()
model.add(Dense(128, input_dim=784))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

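# 'lr' is deprecated in TensorFlow 2.x; use 'learning_rate' instead
sgd = SGD(learning_rate=0.1)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['acc'])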
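
As a sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer contributes one weight per input-output pair plus one bias per output, while the Activation and Dropout layers add none. The sketch below does this arithmetic for the layer widths 784, 128, 128, 10 used above; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    # Pair each layer width with the next one: (784, 128), (128, 128), (128, 10).
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


print(dense_param_count([784, 128, 128, 10]))
```

The result should agree with the total reported by model.summary() for the model defined above.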

Convert the training data from NumPy arrays into an RDD in the existing Spark context <br>
Initialize the SparkModel from the tensorflow.keras model defined above.

In [None]:
# Build RDD from numpy features and labels
rdd = to_simple_rdd(sc, x_train, y_train)

# Initialize SparkModel from the tensorflow.keras model, using asynchronous training
spark_model = SparkModel(model, mode='asynchronous')
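
Conceptually, the RDD built above holds one (features, label) pair per training sample, and Spark distributes those pairs across the workers in partitions. The plain-Python sketch below illustrates that idea on toy data. The function names and the contiguous, near-equal split are assumptions for illustration; the real partitioning is decided by Spark.

```python
def pair_samples(features, labels):
    """Zip features and labels into (features, label) pairs."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    return list(zip(features, labels))


def split_into_partitions(items, num_partitions):
    """Split items into contiguous, near-equal slices (illustrative only)."""
    n = len(items)
    return [
        items[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


# Toy data: five one-feature samples shared between two hypothetical workers.
pairs = pair_samples([[0.1], [0.2], [0.3], [0.4], [0.5]], [0, 1, 0, 1, 1])
for index, partition in enumerate(split_into_partitions(pairs, 2)):
    print(f"partition {index}: {len(partition)} samples")
```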

Train the Spark model on the Cluster <br>
Observe the communication messages between the Master and the workers

In [None]:
# Train Spark model
spark_model.fit(rdd, epochs=epochs, batch_size=batch_size, verbose=2, validation_split=0.1)

Evaluate the trained model on the test data

In [None]:
# Evaluate Spark model by evaluating the underlying model
score = spark_model.evaluate(x_test, y_test, verbose=2)
print('Test accuracy:', score[1])
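
The accuracy printed above is the fraction of test images whose highest-probability class matches the true label. The sketch below reproduces that calculation in plain Python on made-up softmax outputs, assuming one-hot labels as prepared earlier; `accuracy` is an illustrative helper and not the Keras metric implementation.

```python
def accuracy(probabilities, one_hot_labels):
    """Fraction of samples whose arg-max prediction matches the true class."""
    correct = 0
    for probs, target in zip(probabilities, one_hot_labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        actual = max(range(len(target)), key=target.__getitem__)
        correct += predicted == actual
    return correct / len(probabilities)


# Made-up predictions for three samples and two classes.
example_probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
example_labels = [[0, 1], [1, 0], [1, 0]]
print(accuracy(example_probs, example_labels))
```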

Stop the Spark context

In [None]:
sc.stop()