# Setup dependencies
This notebook uses PySpark for Structured Streaming and Spark SQL, with findspark to locate the local Spark installation.
<details>
    <summary>pip install...</summary>

```shell
# Install a Python package
pip install package-name
# Or install a specific version of a package
pip install package-name==version
```
</details>


In [1]:
# Suppress warnings generated by the code in this notebook
import warnings

def warn(*args, **kwargs):
    pass

# Replace warnings.warn with a no-op and ignore anything that still reaches the filters
warnings.warn = warn
warnings.filterwarnings('ignore')
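
The cell above silences every warning for the rest of the session by replacing `warnings.warn`. A narrower alternative is the standard library's `warnings.catch_warnings` context manager, which suppresses warnings only around a specific call. The sketch below is meant to be run in a fresh interpreter (it assumes `warnings.warn` has not been replaced), and `noisy_call` is a hypothetical function standing in for any code that emits warnings.

```python
import warnings

def noisy_call():
    # Hypothetical stand-in for library code that emits a warning
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42

# Suppress warnings only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_call()

# Outside that block the previous filters are restored.
# Here the warning is recorded instead of hidden, to show it is emitted again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_call()

print(result, len(caught))
```

Scoping the suppression this way keeps unrelated warnings visible, which can help when debugging the streaming cells later in the notebook.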

# Create Spark Session

In [2]:
import findspark

# Initializing FindSpark to locate Spark installation
findspark.init()

from pyspark.sql import SparkSession

# Initialize the Spark session in local mode.
# fs.defaultFS points at the local filesystem here; hdfs://localhost:9000 would be the HDFS alternative.
spark = SparkSession.builder.appName("Smart Building HVAC Monitoring") \
    .master("local[*]") \
    .config("spark.hadoop.fs.defaultFS", "file:///C:/Users/carlo/Desktop/Trabajo") \
    .config("spark.driver.extraJavaOptions", "-Duser.timezone=UTC") \
    .config("spark.executor.extraJavaOptions", "-Duser.timezone=UTC") \
    .getOrCreate()
spark.sparkContext.setLogLevel("DEBUG")

### Simulate sensor data:

Use Spark’s rate source to generate continuous readings from multiple rooms.

In [3]:
from pyspark.sql.functions import expr, rand, when

# Simulate sensor data with room IDs and readings.
# checkpointLocation is a writeStream option, so it is not set on the source.
sensor_data = spark.readStream.format("rate") \
    .option("rowsPerSecond", 5).load() \
    .withColumn("room_id", expr("CAST(value % 10 AS STRING)")) \
    .withColumn("temperature", when(expr("value % 10 == 0"), 15)  # Room 0 always reads 15, a critical value
                .otherwise(20 + rand() * 25)) \
    .withColumn("humidity", expr("40 + rand() * 30"))
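
The rate source emits rows with a `timestamp` column and a steadily increasing `value` column, and the cell above derives the sensor columns from `value`. The plain-Python sketch below mirrors that derivation for a handful of values so the resulting ranges can be inspected without a running stream. The function name `simulate_reading` and the use of `random.Random` are illustrative assumptions, not part of the notebook.

```python
import random

def simulate_reading(value, rng):
    """Mirror the column expressions applied to the rate source's `value` column."""
    room_id = str(value % 10)                  # CAST(value % 10 AS STRING)
    if value % 10 == 0:
        temperature = 15.0                     # room "0" always reports 15
    else:
        temperature = 20 + rng.random() * 25   # uniform in [20, 45)
    humidity = 40 + rng.random() * 30          # uniform in [40, 70)
    return {"room_id": room_id, "temperature": temperature, "humidity": humidity}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
readings = [simulate_reading(v, rng) for v in range(20)]
for r in readings[:3]:
    print(r)
```

With these ranges, only room 0 ever falls below 18 degrees, and neither a temperature above 60 nor a humidity above 75 can occur; only the `humidity < 45` condition is reached by random readings.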

### Create a temporary SQL view:
Create a temporary SQL view so that SQL queries can run against the streaming data.

In [4]:
# Create a temporary SQL view for the sensor data
sensor_data.createOrReplaceTempView("sensor_table")

### Define SQL queries for aggregation and analysis:

* **Critical temperature query**: Detect rooms with critical temperature levels
* **Average readings query**: Calculate average readings over a 1-minute window
* **Attention needed query**: Identify rooms that need immediate attention based on humidity levels


In [5]:
# SQL Query to detect rooms with critical temperatures
critical_temperature_query = """
    SELECT 
        room_id, 
        temperature, 
        humidity, 
        timestamp 
    FROM sensor_table 
    WHERE temperature < 18 OR temperature > 60
"""

# SQL Query to calculate average readings over a 1-minute window
average_readings_query = """
    SELECT 
        room_id,
        window(timestamp, '1 minute') AS time,
        AVG(temperature) AS avg_temperature, 
        AVG(humidity) AS avg_humidity, 
        window.start AS window_start 
    FROM sensor_table
    GROUP BY room_id, window(timestamp, '1 minute')
"""

# SQL Query to find rooms that need immediate attention based on humidity
attention_needed_query = """
    SELECT 
        room_id, 
        COUNT(*) AS critical_readings 
    FROM sensor_table 
    WHERE humidity < 45 OR humidity > 75
    GROUP BY room_id
"""
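
The average readings query groups rows into 1-minute tumbling windows per room. As a rough illustration of that grouping, the plain-Python sketch below assigns each timestamp to the window that starts at the top of its minute and averages the temperatures per (room, window) pair. The sample rows and the helper name `window_start` are made up for illustration; Spark's `window` function performs the equivalent bucketing internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def window_start(ts, width=timedelta(minutes=1)):
    """Return the start of the tumbling window of the given width that contains ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % width   # time elapsed since the window opened
    return ts - offset

# Hypothetical sample rows: (room_id, timestamp, temperature)
rows = [
    ("1", datetime(2024, 1, 1, 12, 0, 5), 22.0),
    ("1", datetime(2024, 1, 1, 12, 0, 55), 24.0),
    ("1", datetime(2024, 1, 1, 12, 1, 10), 30.0),
    ("2", datetime(2024, 1, 1, 12, 0, 30), 40.0),
]

# Group temperatures by (room_id, window start), as GROUP BY room_id, window(...) does
groups = defaultdict(list)
for room_id, ts, temperature in rows:
    groups[(room_id, window_start(ts))].append(temperature)

averages = {key: sum(vals) / len(vals) for key, vals in groups.items()}
for (room_id, start), avg in sorted(averages.items()):
    print(room_id, start, start + timedelta(minutes=1), avg)
```

The two readings for room 1 in the 12:00 minute land in the same window, while the 12:01:10 reading opens a new one.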


### Execute the SQL queries:
Execute each SQL query to create streaming DataFrames.

In [6]:
# Execute the critical temperature query
critical_temperatures_stream = spark.sql(critical_temperature_query)

# Execute the average readings query
average_readings_stream = spark.sql(average_readings_query)

# Execute the attention needed query
attention_needed_stream = spark.sql(attention_needed_query)

### Output the results to the console:
Display the results from each query in real time.

In [7]:
# Output the results to the console for all queries
critical_query = critical_temperatures_stream.writeStream \
    .outputMode("append") \
    .format("console") \
    .queryName("Critical Temperatures") \
    .start()

average_query = average_readings_stream.writeStream \
    .outputMode("complete") \
    .format("console") \
    .queryName("Average Readings") \
    .start()

attention_query = attention_needed_stream.writeStream \
    .outputMode("complete") \
    .format("console") \
    .queryName("Attention Needed") \
    .start()
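
The three sinks use two different output modes: `append` for the filter-only query and `complete` for the two aggregations. The sketch below imitates the difference in plain Python over two hypothetical micro-batches of humidity readings. In append mode each trigger emits only the newly arrived rows that pass the filter, while in complete mode each trigger re-emits the whole aggregated result table. It is a simplified model under those assumptions, not Spark's actual execution.

```python
from collections import Counter

# Hypothetical micro-batches of (room_id, humidity) readings
batches = [
    [("1", 42.0), ("2", 50.0), ("1", 44.0)],
    [("2", 41.0), ("3", 60.0)],
]

running_counts = Counter()   # state kept across triggers for the aggregation
append_outputs = []
complete_outputs = []

for batch in batches:
    # Append mode: only rows from this batch that match the filter are emitted
    new_rows = [row for row in batch if row[1] < 45]
    append_outputs.append(new_rows)

    # Complete mode: update the running aggregate, then emit the full table
    running_counts.update(room for room, _ in new_rows)
    complete_outputs.append(dict(running_counts))

for i, (a, c) in enumerate(zip(append_outputs, complete_outputs)):
    print(f"trigger {i}: append={a} complete={c}")
```

This is also why a streaming aggregation without a watermark cannot use append mode: its result rows keep changing as new data arrives, so they are never final.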

### Keep the streams running:
Ensure that the streaming queries continue to run to process incoming data.

In [None]:
# Keep the streams running.
# awaitTermination() blocks on a single query, so three calls in a row would never
# get past the first one. Wait on the session's active queries as a group instead.
print("******** Streaming: Critical Temperatures, Average Readings, Attention Needed ********")

# Blocks until any one of the active queries terminates or fails
spark.streams.awaitAnyTermination()

# Stop Spark Session

In [None]:
spark.stop()

In [None]:
spark.version