Spark provides a simple standalone deploy mode. You can launch a standalone cluster either manually, 
by starting a master and workers by hand, or with the provided launch scripts. 
It is also possible to run these daemons on a single machine for testing.


The launch scripts do not currently support Windows. To run a Spark cluster on Windows, 
start the master and workers by hand (http://spark.apache.org/docs/latest/spark-standalone.html).

# Architecture

In [None]:
Driver Program (SparkContext) => Master (Cluster Manager) => Workers (Executors => cores/slots)

# The master hosts the cluster manager and keeps track of the available resources on the worker nodes
# Within the driver program, the SparkContext initializes the application and coordinates the processes running on the cluster
## In client mode the driver program runs on the local machine; in cluster mode it is launched on a worker machine
## (spark-shell and spark-submit use client mode by default; cluster mode has to be requested with --deploy-mode cluster when submitting the application)

## Workers are the nodes of the cluster: independent machines

## Working interactively in a Jupyter notebook is client mode.

<img src="Data/Architecture.PNG">

In [2]:
## In this exercise I configure a standalone-mode cluster,
## start it with PowerShell commands, and access the running cluster from a Jupyter notebook to do the analysis

# Settings

# Accessing Cluster in Jupyter

In [3]:
## After applying the settings and starting the cluster from the terminal, I access it here
from pyspark.sql import SparkSession
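
Before building the session, it can help to confirm that the master is actually listening, since a wrong host or port otherwise only shows up as a slow connection failure. The sketch below uses only the standard library; `parse_master_url` and `master_is_reachable` are hypothetical helper names, not part of PySpark, and the check only proves that something accepts TCP connections on that port.

```python
import socket
from urllib.parse import urlparse


def parse_master_url(url):
    """Split a standalone master URL such as spark://host:port into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a standalone master URL: {url!r}")
    return parsed.hostname, parsed.port


def master_is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the master's host and port succeeds."""
    host, port = parse_master_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unknown host, and similar failures
        return False


if __name__ == "__main__":
    master = "spark://127.0.0.1:7077"  # same URL as used in this notebook
    print(parse_master_url(master))
    print("reachable:", master_is_reachable(master))
```

If the check returns False, the usual suspects are a master that was not started or a URL that differs from the one shown at the top of the master web UI.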

In [6]:
spark=SparkSession.builder.appName("spark-shell").master("spark://127.0.0.1:7077").getOrCreate()

In [7]:
df = spark.read.format("json").load("Data/sparkify_log_small.json")

In [8]:
df.count()

10000

In [9]:
spark.stop()

## After running with the settings above, this is how my web UI looks:

2 executors per worker, each with 2 cores and 1g of memory (the worker has 2 GB and 4 cores)

<img src="Data/executors per worker.PNG">

<img src="Data/Total_executors.PNG">
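
The executor count per worker can be sketched as simple division: a worker can host as many executors as fit within both its cores and its memory. This is a simplified model under stated assumptions (a single application, no `spark.cores.max` limit), and `parse_memory_mb` and `executors_per_worker` are hypothetical helper names rather than Spark APIs.

```python
def parse_memory_mb(text):
    """Convert a JVM-style size such as '1g' or '1300m' to megabytes."""
    units = {"m": 1, "g": 1024}
    text = text.strip().lower()
    return int(text[:-1]) * units[text[-1]]


def executors_per_worker(worker_cores, worker_memory, executor_cores, executor_memory):
    """Number of executors that fit on one worker, limited by cores and by memory."""
    by_cores = worker_cores // executor_cores
    by_memory = parse_memory_mb(worker_memory) // parse_memory_mb(executor_memory)
    return min(by_cores, by_memory)


if __name__ == "__main__":
    # Worker with 4 cores and 2g; executors with 2 cores and 1g each
    print(executors_per_worker(4, "2g", 2, "1g"))
    # Same worker, but executors asking for 1300m each
    print(executors_per_worker(4, "2g", 2, "1300m"))
```

With 1g executors, cores and memory both allow two executors per worker. Raising the executor memory to 1300m makes memory the limiting resource, so only one executor fits on a 2g worker in this model.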

# Reflection
####  We have three entries in the executors list. The driver is always listed there alongside the executors

In [None]:
## If I look at the memory,
## I have 1g of memory divided equally into three parts among all executors. But when I check my master,
## the memory is allocated correctly. So let us understand memory allocation

## I have a driver memory setting, set SPARK_DRIVER_MEMORY=1g, and I think it affects this in some way. Let us figure it out
## We have daemons: the master and the workers
##    SPARK_DAEMON_MEMORY	Memory to allocate to the Spark master and worker daemons themselves (default: 1g)
##    SPARK_WORKER_MEMORY	Total amount of memory to allow Spark applications to use on the machine, 
##   e.g. 1000m, 2g (default: total memory minus 1 GB); note that each application's individual memory 
## is configured using its spark.executor.memory property.

In [None]:
# The driver asks the master for resources, the master then allocates workers to this application, 
# and each worker starts executors, which are processes that run computations and store data for your application.

In [None]:
## When I set SPARK_WORKER_MEMORY, I am reserving a portion of the machine's memory for Spark applications
## SPARK_DAEMON_MEMORY = 1g (Memory to allocate to the Spark master and worker daemons themselves)
## SPARK_WORKER_MEMORY = 2g
## SPARK_EXECUTOR_MEMORY = application specific (set per application)

In [10]:
spark=SparkSession.builder.appName("spark-shell2").master("spark://127.0.0.1:7077").config("spark.executor.memory", "1300m").getOrCreate()

In [None]:
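##  "spark.executor.memory", "1300m" = Java heap (-Xmx) of each executor
reserved = 300 (450 = 1.5 * 300 is only the minimum heap size Spark accepts, it is not subtracted)
spark memory = (1300-300)*0.6 = 600 with the nominal heap; the web UI shows 513
The JVM reports a usable heap smaller than -Xmx: 513/0.6 + 300 = 1155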
 

In [None]:
Driver Memory(System Memory) = 2G
reservedMemory = 300
minSystemMemory = (reservedMemory * 1.5) = 1.5 * 300 = 450 (minimum heap size only, not subtracted)
usableMemory = systemMemory - reservedMemory = 2048 - 300 = 1748M
Spark memory = 0.6 * 1748 = 1048.8
The web UI shows 912, because the JVM reports a usable heap smaller than 2048M: 912/0.6 + 300 = 1820M

In [None]:
Driver Memory(System Memory) = 1G
Spark memory = (1024-300)*0.6 = 434.4
The web UI shows 366.3, which corresponds to a JVM-reported heap of 366.3/0.6 + 300 = 910.5M rather than the full 1024M
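
The memory arithmetic above can be sketched as a small helper. It assumes the rule from Spark's unified memory manager as I understand it: 300 MB is reserved, 450 MB (1.5 times the reserve) is only the minimum heap Spark accepts, and the remainder is multiplied by `spark.memory.fraction` (default 0.6). The function names are hypothetical. The second helper works backwards from a web UI value to the heap size the JVM must have reported, which is typically somewhat below the configured `-Xmx`.

```python
RESERVED_MB = 300                  # reserved system memory
MIN_SYSTEM_MB = RESERVED_MB * 1.5  # smallest heap accepted (450 MB)
MEMORY_FRACTION = 0.6              # default spark.memory.fraction


def spark_memory_mb(system_memory_mb, fraction=MEMORY_FRACTION):
    """Unified (execution + storage) memory for a heap of the given size in MB."""
    if system_memory_mb < MIN_SYSTEM_MB:
        raise ValueError(f"heap must be at least {MIN_SYSTEM_MB} MB")
    return (system_memory_mb - RESERVED_MB) * fraction


def implied_system_memory_mb(ui_memory_mb, fraction=MEMORY_FRACTION):
    """Heap size in MB that would produce the memory value shown in the web UI."""
    return ui_memory_mb / fraction + RESERVED_MB


if __name__ == "__main__":
    # (label, configured heap in MB, value observed in the web UI in MB)
    observations = [
        ("executor 1300m", 1300, 513),
        ("driver 2g", 2048, 912),
        ("driver 1g", 1024, 366.3),
    ]
    for label, heap_mb, ui_mb in observations:
        nominal = spark_memory_mb(heap_mb)
        implied = implied_system_memory_mb(ui_mb)
        print(f"{label}: nominal {nominal:.1f} MB, UI {ui_mb} MB, "
              f"implied heap {implied:.1f} MB ({implied / heap_mb:.1%} of configured)")
```

In all three observations the implied heap comes out at roughly the same share of the configured size, which suggests the difference between the nominal value and the web UI comes from how the JVM reports its maximum heap rather than from a different reserve.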