# Challenge 1: Kafka Stream Setup for VPN Security Events

## Task Description
In this challenge, we need to:
1. Connect to a Kafka topic that streams VPN connection events
2. Define a schema for the data
3. Create a streaming DataFrame
4. Display the streaming data

## Setup

In [None]:
from pyspark.sql import SparkSession
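# Explicit imports avoid shadowing Python built-ins (sum, min, max, ...) with Spark functions
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType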

# Create a Spark session
spark = SparkSession.builder \
    .appName("VPN Security Stream Processing") \
    .master("local[*]") \
    .config("spark.sql.shuffle.partitions", "8") \
    .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1") \
    .getOrCreate()

# Set log level
spark.sparkContext.setLogLevel("WARN")
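
The Kafka connector named in `spark.jars.packages` generally has to match the installed Spark version and its Scala build, so the hard-coded `3.4.1` only fits a Spark 3.4.1 installation. A minimal sketch of deriving the coordinate instead; the helper name `kafka_package` is hypothetical, and the Scala version `2.12` is an assumption carried over from the coordinate in the setup cell:

```python
def kafka_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate for the Spark Kafka connector."""
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


# The version used in the setup cell
print(kafka_package("3.4.1"))
```

In the setup cell, `pyspark.__version__` could be passed as `spark_version` so the connector follows whichever PySpark release is installed.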

## Define Schema
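Each VPN connection event is a JSON record. Per the hint in the cell below, its fields include `user_id`, `timestamp`, `country`, `device_type`, and `connection_status`: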

In [None]:
# TODO: Define schema for VPN connection events
schema = StructType([
    # Define fields here
    # Hint: Include user_id, timestamp, country, device_type, connection_status
    StructField("user_id", StringType(), True),
    # Add more fields...
])

## Connect to Kafka Stream

In [None]:
# Kafka connection parameters
kafka_bootstrap_servers = "kafka:9092"
kafka_topic = "vpn_connection_events"

# TODO: Create streaming DataFrame from Kafka
# Parentheses (not backslashes) let the chain span lines and contain comments
stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", kafka_bootstrap_servers)
    .option("subscribe", kafka_topic)
    .option("startingOffsets", "earliest")
    # Add any other options you need
    .load()
)

# TODO: Parse the value column from Kafka and assign the result to parsed_df
# (parsed_df is used by the display cell below)
# Hint: The value column contains the JSON data as binary

In [None]:
# Display stream (for development)
query = parsed_df \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .option("truncate", False) \
    .start()

# Block until the streaming query terminates
# (in a notebook, interrupt the kernel to stop it)
query.awaitTermination()

## Testing Notes
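- You can test this code once the Kafka producer is running and publishing to `vpn_connection_events`
- Watch the console output to verify that events are arriving; in Jupyter, console-sink output may appear in the terminal running the kernel rather than in the notebook
- The schema's field names and types must match what the producer sends; if you parse with `from_json`, fields that do not match typically come back as null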
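
To check that the schema matches the producer's output before wiring up the stream, one option is to compare the keys of a sample payload with the expected field names. A minimal sketch using only the standard library; the helper `missing_fields` and the sample payload are hypothetical, and the expected names are taken from the schema hint:

```python
import json

EXPECTED_FIELDS = ["user_id", "timestamp", "country", "device_type", "connection_status"]


def missing_fields(raw_value: bytes, expected=EXPECTED_FIELDS) -> list:
    """Return the expected field names absent from a JSON-encoded Kafka value."""
    event = json.loads(raw_value.decode("utf-8"))
    return [name for name in expected if name not in event]


# Made-up payload that lacks device_type and connection_status
sample = b'{"user_id": "u1", "timestamp": "2024-01-01T00:00:00Z", "country": "NL"}'
print(missing_fields(sample))
```

An empty list would suggest the payload carries every field the schema expects; it says nothing about whether the types agree.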