# Exploratory Data Analysis on IoT Data

This notebook performs exploratory data analysis (EDA) on the IoT data streams. The goal is to visualize the data and identify the patterns and anomalies present in it.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName('IoT Anomaly Detection EDA').getOrCreate()

# Load the IoT data
data_path = '../data/processed/iot_data.csv'
iot_data = spark.read.csv(data_path, header=True, inferSchema=True)

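# Convert to a pandas DataFrame for EDA.
# Note: toPandas() collects every row onto the driver, so this assumes the dataset fits in memory.
iot_data_pd = iot_data.toPandas()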

# Display the first few rows of the dataset
iot_data_pd.head()

In [None]:
# Summary statistics
iot_data_pd.describe()
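
`describe()` summarizes numeric columns only by default, so a per-column look at dtypes and missing values can be a useful companion. The following is a minimal sketch, not part of the original notebook: the helper `summarize_quality` is hypothetical, and the sample frame (including the `device_id` and `humidity` columns) is invented so the block runs on its own. In the notebook it would presumably be called on `iot_data_pd`.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and missing-value share (%)."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
        'missing_pct': (df.isna().mean() * 100).round(1),
    })


# Synthetic stand-in for iot_data_pd; column names other than
# 'temperature' are assumptions made for illustration.
sample = pd.DataFrame({
    'device_id': ['a', 'b', 'c', 'd'],
    'temperature': [21.5, None, 22.1, 21.9],
    'humidity': [40.0, 41.2, None, None],
})

quality = summarize_quality(sample)
print(quality)
```

Columns with a high share of missing values may need imputation or exclusion before the plots and correlations are interpreted.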

In [None]:
# Visualize the distribution of key features
plt.figure(figsize=(12, 6))
sns.histplot(iot_data_pd['temperature'], bins=30, kde=True)
plt.title('Temperature Distribution')
plt.xlabel('Temperature')
plt.ylabel('Count')
plt.show()

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
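# Restrict to numeric columns; in pandas >= 2.0, corr() raises an error on non-numeric columns
correlation_matrix = iot_data_pd.select_dtypes(include='number').corr()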
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
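
Since the stated goal includes spotting anomalies, one simple first pass is to flag values that fall outside the interquartile-range (IQR) fences. This is a minimal sketch under stated assumptions, not the project's anomaly detection method: the helper `flag_iqr_outliers` is hypothetical, the multiplier `k=1.5` is the conventional default rather than a tuned value, and the sample series is synthetic data standing in for `iot_data_pd['temperature']`.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic readings standing in for iot_data_pd['temperature'] (assumption).
sample = pd.Series(
    [21.2, 21.5, 22.0, 22.3, 21.8, 22.1, 35.0, 21.9],
    name='temperature',
)

outlier_mask = flag_iqr_outliers(sample)
print(sample[outlier_mask])
```

The IQR rule makes no distributional assumption, which suits a first look; skewed or multi-modal sensor readings may call for a different threshold or a per-device baseline.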

## Conclusion

In this notebook, we performed exploratory data analysis on the IoT data: we loaded the dataset, reviewed summary statistics, plotted the temperature distribution, and examined the correlations between features. These views help identify patterns and potential anomalies in the data.