# Chapter 7.1 - Spark Streaming

Paul E. Anderson

## Ice Breaker

What was the best Halloween costume you saw over the weekend?

Also, what's the best candy?

## Streaming and Data Analysis
Streaming analysis means processing data in real time, as it arrives at high velocity.

<img src="https://opensistemas.com/wp-content/uploads/2020/06/4-Vs-of-big-data-1.jpg">

## Velocity

* Data drivers
    * Social media (e.g., Twitter)
    * IoT (e.g., Smart Watches)
    * Mobile applications

## Business Use Cases

### Streaming ETL
* Traditional ETL (extract, transform, load) tools run as batch jobs in data warehouse environments: they read data, convert it to a database-compatible format, and then write it to the target database.
* With streaming ETL, data is continuously cleaned and aggregated before it is pushed into data stores.
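
To make the contrast with batch ETL concrete, here is a minimal plain-Python sketch (not Spark code). The record fields `user` and `amount` and the helper names `clean` and `streaming_etl` are assumptions made up for illustration. Each incoming batch is cleaned and folded into a running aggregate, so an up-to-date result is available after every batch rather than only at the end of a nightly job.

```python
from collections import defaultdict


def clean(record):
    """Normalize one raw record; return None if it is unusable."""
    user = record.get("user", "").strip().lower()
    amount = record.get("amount")
    if not user or amount is None:
        return None
    return {"user": user, "amount": float(amount)}


def streaming_etl(raw_batches):
    """Yield a snapshot of the running per-user totals after each batch."""
    totals = defaultdict(float)
    for batch in raw_batches:
        for record in batch:
            cleaned = clean(record)
            if cleaned is not None:
                totals[cleaned["user"]] += cleaned["amount"]
        yield dict(totals)


# Hypothetical input: two small batches of raw records.
raw_batches = [
    [{"user": " Alice ", "amount": "10.5"}, {"user": "", "amount": "3"}],
    [{"user": "alice", "amount": 4.5}, {"user": "Bob", "amount": 2}],
]
snapshots = list(streaming_etl(raw_batches))
```

The record with an empty user is dropped during cleaning, and the two spellings of "Alice" end up under the same key.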

### Data Enrichment
* Enriches live data by combining it with static data, allowing organizations to conduct more complete real-time analysis.
* For example, online advertisers combine historical customer data with live customer behavior data to deliver more personalized, targeted ads in real time, in the context of what customers are doing.
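
As a rough illustration of enrichment, the plain-Python sketch below joins each live event against a static lookup table. The field names (`user_id`, `page`, `segment`) and the `enrich` helper are hypothetical and chosen only for this example; in Spark the static side would typically be a broadcast variable or a joined dataset.

```python
# Static (historical) data, assumed to be loaded once before the stream starts.
customer_profiles = {
    "u1": {"segment": "sports"},
    "u2": {"segment": "travel"},
}


def enrich(events, profiles):
    """Attach the customer's segment to each live event as it arrives."""
    for event in events:
        profile = profiles.get(event["user_id"], {})
        yield {**event, "segment": profile.get("segment", "unknown")}


# Hypothetical live events; "u9" has no historical profile.
live_events = [
    {"user_id": "u1", "page": "/shoes"},
    {"user_id": "u9", "page": "/home"},
]
enriched = list(enrich(live_events, customer_profiles))
```

Events without a matching profile are kept and labeled `"unknown"` rather than dropped, which is one possible design choice among several.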

### Trigger Event Detection
* Detect and respond quickly to rare or unusual behaviors (“trigger events”) that could indicate a potentially serious problem within the system. 
* Financial institutions use triggers to detect fraudulent transactions and stop fraud in their tracks.
* Hospitals also use triggers to detect potentially dangerous health changes while monitoring patient vital signs, sending automatic alerts to the right caregivers who can then take immediate and appropriate action.
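
One simple way to sketch a trigger is a rule that compares each new event against a running statistic. The plain-Python example below is an assumption for illustration, not a real fraud model: it flags a transaction when its amount exceeds `factor` times the running mean of that card's earlier transactions. The names `detect_triggers`, `factor`, and `min_history` are made up here.

```python
from collections import defaultdict


def detect_triggers(transactions, factor=5.0, min_history=3):
    """Yield (card, amount) pairs that look unusual for that card.

    A transaction is flagged when the card already has at least
    `min_history` earlier transactions and the new amount is more than
    `factor` times the mean of those earlier amounts.
    """
    history = defaultdict(lambda: [0, 0.0])  # card -> [count, running sum]
    for card, amount in transactions:
        count, total = history[card]
        if count >= min_history and amount > factor * (total / count):
            yield (card, amount)
        # Every transaction, flagged or not, is added to the history.
        history[card][0] = count + 1
        history[card][1] = total + amount


# Hypothetical stream of (card, amount) events.
transactions = [
    ("c1", 20), ("c1", 25), ("c1", 15),
    ("c1", 500),  # far above c1's running mean of 20
    ("c2", 900),  # first transaction for c2: not enough history to judge
    ("c1", 30),
]
alerts = list(detect_triggers(transactions))
```

Because the check runs as each event arrives, an alert can be raised immediately instead of after a later batch job.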

### Complex Session Analysis
* Events relating to live sessions, such as user activity after logging into a website or application, can be grouped together and quickly analyzed.
* Session information can also be used to continuously update machine learning models.
* Companies such as Netflix use this functionality to gain immediate insight into how users are engaging with their site and to provide more real-time movie recommendations.
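
Grouping events into sessions can be sketched in plain Python as follows. This is an illustrative assumption, not the approach of any particular company: events are `(user, timestamp_in_seconds)` pairs arriving in time order, and a new session starts whenever a user has been inactive for longer than `gap` seconds. The `sessionize` helper is a made-up name.

```python
from collections import defaultdict


def sessionize(events, gap=30):
    """Group each user's event timestamps into sessions.

    `events` is an iterable of (user, timestamp) pairs, assumed to be
    ordered by timestamp. A user's event joins their latest session if it
    arrives within `gap` seconds of that session's last event; otherwise
    it opens a new session.
    """
    sessions = defaultdict(list)  # user -> list of sessions (lists of timestamps)
    for user, ts in events:
        user_sessions = sessions[user]
        if user_sessions and ts - user_sessions[-1][-1] <= gap:
            user_sessions[-1].append(ts)
        else:
            user_sessions.append([ts])
    return dict(sessions)


# Hypothetical click events.
events = [("ann", 0), ("bob", 5), ("ann", 10), ("bob", 20), ("ann", 100)]
sessions = sessionize(events)
```

Here Ann's event at 100 seconds opens a second session because it arrives 90 seconds after her previous event.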

### Other high-level use cases
* Twitter needs to process hundreds of millions of tweets per day to publish trending topics
* Credit card companies need to process millions of transactions to identify fraud
* Mobile applications like WhatsApp need to constantly crunch logs to monitor service availability

### Real Time Analytics
* We need to process terabytes of streaming data in real time to get up-to-date analysis
* Data arrives from more than one stream
* We need to combine historical data with real-time data
* We need to process stream data for downstream applications

## There are alternatives to Spark
* Apache Storm
    * Distributed real-time stream processing system; it runs independently of HDFS
    * Created at BackType and open-sourced by Twitter

## Spark Streaming
<img src="https://miro.medium.com/max/720/1*FLYjc6U-qAQ64yDLLrzdWw.jpeg">

### Micro batch
* Spark Streaming is a fast batch processing system
* It collects stream data into small batches and processes them
* The batch interval can range from about a second to multiple hours
* The stream is represented as a DStream (discretized stream): a sequence of RDDs, one per batch
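
The micro-batch idea can be imitated without Spark. The plain-Python sketch below is only a simulation under stated assumptions: events are `(timestamp_in_seconds, value)` pairs, and the hypothetical `micro_batches` helper assigns each event to the batch whose interval contains its timestamp. Intervals with no events produce empty batches, which is also what a streaming job sees when no data arrives during an interval.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width batches.

    Returns a list of batches from the first non-empty interval to the
    last one; intervals in between with no events appear as empty lists.
    """
    by_index = defaultdict(list)
    for ts, value in events:
        by_index[int(ts // batch_interval)].append(value)
    if not by_index:
        return []
    first, last = min(by_index), max(by_index)
    return [by_index.get(i, []) for i in range(first, last + 1)]


# Hypothetical lines typed at 0.2 s, 1.1 s and 3.5 s, with a 1-second interval.
events = [(0.2, "this is a test"), (1.1, "test"), (3.5, "testing test test")]
batches = micro_batches(events, batch_interval=1)
```

Nothing arrived between seconds 2 and 3, so the third batch is empty.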

## Example: WordCount

### The usual SparkContext

In [1]:
from pyspark import SparkConf
from pyspark.context import SparkContext

# Reuse the running SparkContext if there is one; otherwise start a local one on all cores.
sc = SparkContext.getOrCreate(SparkConf().setMaster("local[*]"))

23/11/14 02:32:50 WARN Utils: Your hostname, classes resolves to a loopback address: 127.0.1.1; using 192.168.122.111 instead (on interface enp1s0)
23/11/14 02:32:50 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/11/14 02:32:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/11/14 02:32:51 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/11/14 02:32:51 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
23/11/14 02:32:51 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
23/11/14 02:32:51 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.


### Grab a streaming context

In [2]:
from pyspark.streaming import StreamingContext

# The second argument is the batch interval in seconds: one micro-batch per second.
ssc = StreamingContext(sc, 1)



In [3]:
PORT = 9999  # Change this to a unique port before running individually
HOST = "localhost"

In [4]:
print("Run this command at the terminal and type in words and hit enter periodically:")
print(f"nc -lk {PORT}")

Run this command at the terminal and type in words and hit enter periodically:
nc -lk 9999
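
If `nc` is not available, a small Python server can play a similar role, since `socketTextStream` connects to the given host and port as a client. The sketch below is not part of the original notebook: `start_line_server` is a hypothetical helper that listens on a port, waits for a single client, sends each line followed by a newline, and then closes. Unlike `nc -lk`, it sends a fixed list of lines rather than whatever you type.

```python
import socket
import threading


def start_line_server(lines, host="127.0.0.1", port=0):
    """Serve `lines` to the first client that connects, then close.

    With port=0 the operating system picks a free port. Returns the port
    actually bound and the background thread doing the sending.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, thread
```

A possible usage, assuming the rest of the notebook is unchanged, is to call `start_line_server(["this is a test", "test"], port=PORT)` before `ssc.start()`.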


In [5]:
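# Connect to the socket as a client; each line of text received becomes one record.
lines = ssc.socketTextStream(HOST, PORT)
# Split each line into words, pair every word with 1, then sum the 1s per word within each batch.
counts = lines.flatMap(lambda line: line.split(" "))\
              .map(lambda word: (word, 1))\
              .reduceByKey(lambda a, b: a+b)
# Print the first few counts of every batch.
counts.pprint()

ssc.start()
ssc.awaitTerminationOrTimeout(60) # wait up to 60 seconds; this does not stop the stream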

                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:06
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:07
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:08
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:09
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:10
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:11
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:12
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:13
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:14
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:15
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:16
-------------------------------------------



23/11/14 02:33:21 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
23/11/14 02:33:21 WARN BlockManager: Block input-0-1699929201400 replicated to only 0 peer(s) instead of 1 peers


-------------------------------------------
Time: 2023-11-14 02:33:17
-------------------------------------------



23/11/14 02:33:23 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
23/11/14 02:33:23 WARN BlockManager: Block input-0-1699929202400 replicated to only 0 peer(s) instead of 1 peers
23/11/14 02:33:25 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
23/11/14 02:33:25 WARN BlockManager: Block input-0-1699929205000 replicated to only 0 peer(s) instead of 1 peers


-------------------------------------------
Time: 2023-11-14 02:33:18
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:19
-------------------------------------------



23/11/14 02:33:26 WARN RandomBlockReplicationPolicy: Expecting 1 replicas with only 0 peer/s.
23/11/14 02:33:26 WARN BlockManager: Block input-0-1699929206400 replicated to only 0 peer(s) instead of 1 peers


-------------------------------------------
Time: 2023-11-14 02:33:20
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:21
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:22
-------------------------------------------
('this', 1)
('is', 1)
('a', 1)
('test', 1)

-------------------------------------------
Time: 2023-11-14 02:33:23
-------------------------------------------
('test', 1)



                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:24
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:25
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:26
-------------------------------------------
('test', 1)

-------------------------------------------
Time: 2023-11-14 02:33:27
-------------------------------------------
('testing', 1)
('test', 2)

-------------------------------------------
Time: 2023-11-14 02:33:28
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:29
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:30
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:31
-------------------------------------------

-----------------------------------------

                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:41
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:42
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:43
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:44
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:45
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:46
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:47
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:48
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:49
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:50
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:51
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:52
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:53
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:54
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:55
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:33:56
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:57
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:33:58
-------------------------------------------



False

-------------------------------------------
Time: 2023-11-14 02:33:59
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:00
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:01
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:02
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:03
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:04
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:05
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:06
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:07
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:08
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:09
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:10
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:11
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:12
-------------------------------------------

                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:13
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:14
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:15
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:16
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:17
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:18
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:19
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:20
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:21
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:22
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:23
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:24
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:25
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:26
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:27
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:28
-------------------------------------------

                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:29
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:30
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:31
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:32
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:33
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:34
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:35
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:36
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:37
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:38
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:39
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:40
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:41
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:42
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:43
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:44
-------------------------------------------



                                                                                

-------------------------------------------
Time: 2023-11-14 02:34:45
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:46
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:47
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:48
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:49
-------------------------------------------

-------------------------------------------
Time: 2023-11-14 02:34:50
-------------------------------------------
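
The output above shows that counts are computed per batch and are not cumulative: `test` is reported as 1 in several batches and as 2 in the batch at 02:33:27, never as a running total. The plain-Python sketch below mirrors the `flatMap` / `map` / `reduceByKey` pipeline on one batch at a time to make that visible. It is a simulation with made-up input lines, not Spark code, and `word_count` is a hypothetical helper.

```python
from collections import Counter


def word_count(batch_lines):
    """Count words in a single batch of text lines.

    Splitting on a single space corresponds to the flatMap step; Counter
    performs the pairing with 1 and the per-key sum in one go.
    """
    words = [word for line in batch_lines for word in line.split(" ")]
    return Counter(words)


# Hypothetical batches: the lines that happened to arrive in each interval.
batches = [["this is a test"], ["test"], [], ["testing test test"]]
per_batch = [dict(word_count(batch)) for batch in batches]
```

Each batch starts from an empty count, so an empty batch yields an empty result. Keeping a running total across batches in the DStream API requires a stateful operation such as `updateStateByKey`.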

