# Spark Streaming

## Which of the following statements accurately describes Spark Streaming?

- Spark Streaming is a batch processing framework for processing large volumes of data
- Spark Streaming is a real-time data processing framework for processing streaming data ***
- Spark Streaming is a relational database management system
- Spark Streaming is a machine learning framework

## What is a DStream in Spark Streaming and how does it differ from an RDD in Apache Spark?

- A DStream is a sequence of static datasets, while an RDD is a continuous stream of data
- A DStream is a continuous stream of data, while an RDD represents a static dataset ***
- A DStream and an RDD are both continuous streams of data, but a DStream is optimized for real-time processing
- A DStream and an RDD are both static datasets, but a DStream is optimized for parallel processing
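
The distinction can be illustrated without a cluster. The sketch below is a plain-Python model, not the Spark API: it assumes timestamped records and a hypothetical helper `to_micro_batches` that groups them by batch interval. Each inner list plays the role of one RDD (a static dataset), and the growing sequence of lists plays the role of a DStream.

``` python
from collections import defaultdict


def to_micro_batches(records, batch_interval):
    """Group (timestamp_seconds, value) records into per-interval batches.

    Hypothetical helper for illustration only: each returned list stands in
    for one RDD, and the list of lists stands in for a DStream.
    """
    if not records:
        return []
    buckets = defaultdict(list)
    for timestamp, value in records:
        buckets[int(timestamp // batch_interval)].append(value)
    last_bucket = max(buckets)
    # Emit a batch for every interval, including empty ones, as a stream would.
    return [buckets.get(i, []) for i in range(last_bucket + 1)]


# Four records arriving over roughly four seconds, with a 1-second batch interval.
records = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
batches = to_micro_batches(records, batch_interval=1)
```

Here `batches` contains one list per second, including an empty list for the interval in which nothing arrived.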

## What is the typical process of Spark Streaming?

- Spark Streaming reads data from a file, processes it in real time, and writes the output to a file
- Spark Streaming reads data from a database, processes it in real time, and writes the output to a database
- Spark Streaming reads data from a streaming source, processes it in real time, and writes the output to a streaming sink ***
- Spark Streaming reads data from a batch source, processes it in near real time, and writes the output to a batch sink

## What is `StreamingContext` in Spark Streaming and what is its role in the streaming process?

- `StreamingContext` is a data structure that represents the input data in Spark Streaming, and it is responsible for processing the data streams in real time
- `StreamingContext` is the entry point for Spark Streaming applications, and it represents the main interface for creating DStreams and configuring the streaming process ***
- `StreamingContext` is a component that is responsible for writing the output data to external storage systems, such as HDFS or S3
- `StreamingContext` is a machine learning library in Spark that is used for training models on real-time data streams


## Which of the following code snippets correctly creates a `StreamingContext` in Spark Streaming?

- This code
``` python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sparkContext, 10)
```

- This code
``` python
from pyspark import StreamingContext
ssc = StreamingContext("local[2]", "MyStreamingApp", 10)
```

- This code
``` python
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sparkConf, 10)
```

- This code ***
``` python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
sc = SparkContext("local[2]", "MyStreamingApp")
ssc = StreamingContext(sc, 10)
```




## What does the `awaitTermination()` method do in Spark Streaming?

- The `awaitTermination()` method waits for the completion of all batch processing tasks before stopping the streaming context
- The `awaitTermination()` method waits indefinitely until the streaming context is stopped or terminated by a user interrupt signal ***
- The `awaitTermination()` method waits for a specified amount of time for the streaming context to process the incoming data streams, and then stops the context
- The `awaitTermination()` method waits for the availability of new data streams and continuously processes the data until the streaming context is explicitly stopped



## What are window operations in Spark Streaming?

- A way to create subsets of data which are more performant
- A way to perform aggregation on a window of data over a specified period of time ***
- A way to filter out irrelevant data points from the incoming data stream
- A way to perform machine learning algorithms on the incoming data streams



## What is the output of the following code that performs a window operation on a DStream?

``` python
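from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Local context with two threads: one for the receiver, one for processing
sc = SparkContext("local[2]", "WindowCount")
ssc = StreamingContext(sc, 1)  # 1-second batch interval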
ssc.checkpoint("checkpoint_dir")

input_stream = ssc.socketTextStream("localhost", 9999)
numbers_stream = input_stream.map(lambda x: int(x))
windowed_stream = numbers_stream.window(10, 5)
count_stream = windowed_stream.count()

count_stream.pprint()

ssc.start()
ssc.awaitTermination()
```

- The code will not execute as there is no data source defined
- The code will calculate the count of numbers over a sliding window of 10 seconds and print the results every 5 seconds ***
- The code will calculate the count of numbers over a sliding window of 5 seconds and print the results every 10 seconds
- The code will calculate the sum of numbers over a sliding window of 10 seconds and print the results every 5 seconds
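
The window arithmetic can be checked with a small simulation. This is a plain-Python sketch, not the Spark API: the hypothetical function `windowed_counts` treats the stream as a list of 1-second batches and assumes that a result is emitted at every slide interval, with the first window covering only the batches that have arrived so far.

``` python
def windowed_counts(batches, window_length, slide_interval):
    """Count elements per sliding window over a list of micro-batches.

    Plain-Python model (not the Spark API). `batches[i]` holds the records of
    batch i; lengths are expressed in numbers of batches, which equals seconds
    when the batch interval is 1 second.
    """
    counts = []
    # A result is emitted every `slide_interval` batches.
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        counts.append(sum(len(batch) for batch in batches[start:end]))
    return counts


# 20 one-second batches, each containing two numbers.
batches = [[1, 2] for _ in range(20)]
counts = windowed_counts(batches, window_length=10, slide_interval=5)
```

Under these assumptions the first result covers only 5 batches, and every later result covers the most recent 10 batches, so consecutive windows overlap by 5 seconds.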

## Which of the following methods can be used to save a DStream in Spark Streaming?

- `dstream.saveAsTextFiles("output_dir")` ***
- `dstream.saveAsParquetFile("output_dir")`
- `dstream.saveAsSequenceFile("output_dir")`
- All of the above



## Which of the following statements is true about `foreachRDD` in Spark Streaming?

- `foreachRDD` is an output operation that applies a function to each RDD in a DStream ***
- `foreachRDD` is an action that applies a function to each element in a DStream
- `foreachRDD` is a method that writes each RDD in a DStream to a file system or external storage
- `foreachRDD` is a method that allows custom processing of some DataFrames in a DStream

## Which of the following statements is true about `persist()` in Spark Streaming?

- `persist()` is a method that saves a DStream to a file system or external storage
- `persist()` is a transformation that applies a function to each RDD in a DStream
- `persist()` is a method that caches the RDDs of a DStream in memory or on disk for faster processing ***
- `persist()` is an action that returns the first element of a DStream

## Which of the following statements is true about checkpointing in Spark Streaming?

- Checkpointing is a method that saves a DStream to a file system or external storage
- Checkpointing is a transformation that applies a function to each RDD in a DStream
- Checkpointing is a mechanism that stores metadata about the state of a Spark Streaming application for fault tolerance ***
- Checkpointing is an action that returns the first element of a DStream
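
A common way to use checkpointing for recovery is `StreamingContext.getOrCreate`, which rebuilds the context from checkpoint data when it exists and otherwise calls a setup function. The following is a minimal sketch under stated assumptions: the checkpoint path, master URL, application name, and socket source are all placeholders, and the function names `create_context` and `main` are illustrative.

``` python
CHECKPOINT_DIR = "checkpoint_dir"  # assumed path; a fault-tolerant store such as HDFS is typical


def create_context():
    """Build a fresh StreamingContext; called only when no checkpoint exists."""
    # Imports are kept inside the function so the module can be loaded
    # without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "CheckpointSketch")  # assumed master and app name
    ssc = StreamingContext(sc, 1)  # 1-second batch interval
    ssc.checkpoint(CHECKPOINT_DIR)

    lines = ssc.socketTextStream("localhost", 9999)  # assumed source
    lines.count().pprint()
    return ssc


def main():
    from pyspark.streaming import StreamingContext

    # Recreate the context from checkpoint data if present, else build a new one.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
```

Calling `main()` starts the application. The whole DStream graph is defined inside `create_context` so that it can be restored from the checkpoint on restart.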