# Exploratory Data Analysis

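In this notebook, we perform exploratory data analysis (EDA) on the simulated sensor data. We check for missing values, look at the temperature distribution, examine the relationship between temperature and humidity, plot both over time, and compute pairwise correlations between the numeric features.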

In [1]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')

In [2]:
# Load the processed data
data_path = '../data/processed/processed_data.csv'
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()
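The `timestamp` column is converted with `pd.to_datetime` in a later cell. Alternatively, it can be parsed while the CSV is read. The sketch below assumes a `timestamp` column. It uses a small in-memory CSV with invented readings in place of `processed_data.csv`, so it runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample standing in for processed_data.csv (values are invented)
csv_text = """timestamp,temperature,humidity
2024-01-01 00:00:00,21.5,40.1
2024-01-01 01:00:00,21.9,39.4
2024-01-01 02:00:00,22.3,38.8
"""

# parse_dates converts the listed column to datetime64 while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['timestamp'])

# Confirm the column types before plotting
print(sample.dtypes)
```

Checking `dtypes` early catches columns that were read as strings, which would otherwise break the plots and the correlation step.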

In [3]:
# Check for missing values
missing_values = data.isnull().sum()
missing_values[missing_values > 0]
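A raw count of missing values is hard to judge without the number of rows. One common alternative is to express it as a fraction per column. The sketch below is a minimal example on an invented frame, not the processed sensor data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few gaps (values are invented for illustration)
df = pd.DataFrame({
    'temperature': [21.5, np.nan, 22.3, 22.0],
    'humidity': [40.1, 39.4, np.nan, np.nan],
})

# mean() of the boolean missing-value mask gives the share of missing rows per column
missing_share = df.isnull().mean().sort_values(ascending=False)
print(missing_share)
```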

In [4]:
# Visualize the distribution of temperature
plt.figure(figsize=(10, 6))
sns.histplot(data['temperature'], bins=30, kde=True)
plt.title('Temperature Distribution')
plt.xlabel('Temperature (°C)')
plt.ylabel('Frequency')
plt.show()
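The histogram shows the shape of the temperature distribution. A numeric follow-up is to flag readings that fall far outside it. The sketch below applies the common 1.5 * IQR (interquartile range) rule. This is one heuristic among several and is not part of the original analysis. The readings are invented, with one deliberate spike.

```python
import pandas as pd

# Hypothetical temperature readings; 35.0 is a deliberate spike (values are invented)
temperature = pd.Series([21.0, 21.4, 21.8, 22.1, 22.5, 35.0])

# IQR rule: flag points more than 1.5 * IQR below Q1 or above Q3
q1, q3 = temperature.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = temperature[(temperature < lower) | (temperature > upper)]
print(outliers)
```

Whether flagged points are sensor faults or real extremes depends on the simulation and cannot be decided from the histogram alone.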

In [5]:
# Visualize the relationship between temperature and humidity
plt.figure(figsize=(10, 6))
sns.scatterplot(x='temperature', y='humidity', data=data)
plt.title('Temperature vs Humidity')
plt.xlabel('Temperature (°C)')
plt.ylabel('Humidity (%)')
plt.show()
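The scatter plot gives a visual impression of the relationship, and a correlation coefficient puts a number on it. The sketch below contrasts the Pearson coefficient, which measures linear association, with the rank-based Spearman coefficient, which measures monotonic association. It runs on invented data. Which coefficient suits the real sensors is an assumption to check against the plot.

```python
import pandas as pd

# Hypothetical readings where humidity falls as temperature rises (values are invented)
df = pd.DataFrame({
    'temperature': [18.0, 20.0, 22.0, 24.0, 26.0],
    'humidity': [60.0, 55.0, 50.0, 45.0, 40.0],
})

# Pearson: linear association between the two columns
pearson = df['temperature'].corr(df['humidity'])

# Spearman: Pearson correlation of the ranks, i.e. monotonic association
spearman = df.corr(method='spearman').loc['temperature', 'humidity']
print(f'Pearson: {pearson:.2f}, Spearman: {spearman:.2f}')
```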

In [6]:
# Analyze trends over time: parse timestamps and use them as a sorted DatetimeIndex
data['timestamp'] = pd.to_datetime(data['timestamp'])
data = data.set_index('timestamp').sort_index()

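# Plot temperature and humidity over time on separate axes, since their units differ
fig, (ax_temp, ax_hum) = plt.subplots(2, 1, figsize=(14, 7), sharex=True)
data['temperature'].plot(ax=ax_temp, color='red')
ax_temp.set_title('Temperature and Humidity Over Time')
ax_temp.set_ylabel('Temperature (°C)')
data['humidity'].plot(ax=ax_hum, color='blue')
ax_hum.set_ylabel('Humidity (%)')
ax_hum.set_xlabel('Time')
plt.tight_layout()
plt.show()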
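Once the data has a `DatetimeIndex`, dense sensor series can be resampled before plotting. Aggregated values often show trends more clearly than raw readings. The sketch below uses invented hourly data and assumes that daily means are a sensible aggregation for these sensors. The real sampling rate may call for a different window.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings over two days (values are invented)
index = pd.date_range('2024-01-01', periods=48, freq='60min')
df = pd.DataFrame({
    'temperature': np.r_[np.full(24, 20.0), np.full(24, 22.0)],
    'humidity': np.r_[np.full(24, 50.0), np.full(24, 45.0)],
}, index=index)

# Downsample to one row per calendar day; resample requires a DatetimeIndex
daily = df.resample('D').mean()
print(daily)
```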

In [7]:
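# Correlation heatmap of the numeric columns (numeric_only avoids errors on text columns in pandas >= 2.0)
plt.figure(figsize=(10, 8))
correlation = data.corr(numeric_only=True)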
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()
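When there are more than a few features, reading every cell of the heatmap becomes tedious. One option is to list each feature pair once, ordered by absolute correlation. The sketch below uses invented data. The `pressure` column is hypothetical and is included only to make the ranking non-trivial.

```python
import pandas as pd

# Hypothetical numeric features; 'pressure' is an invented extra column
df = pd.DataFrame({
    'temperature': [18.0, 20.0, 22.0, 24.0, 26.0],
    'humidity': [60.0, 56.0, 49.0, 45.0, 41.0],
    'pressure': [1012.0, 1010.0, 1013.0, 1011.0, 1012.0],
})

corr = df.corr(numeric_only=True)

# Collect each unordered pair once (upper triangle of the matrix, diagonal excluded)
cols = corr.columns
pairs = pd.Series({
    (a, b): corr.loc[a, b]
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
})

# Strongest relationships first, regardless of sign
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Sorting by absolute value keeps strong negative correlations, such as temperature against humidity here, at the top of the list.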

## Conclusion

In this exploratory data analysis, we checked for missing values, visualized the distribution of temperature, examined the relationship between temperature and humidity, and plotted both variables over time. The correlation heatmap summarized the pairwise Pearson correlations between the numeric features.