# IoT Weather Monitoring System - Complete Time Series Analysis
## Temperature, Humidity, Pressure & Dew Point Forecasting
### Date: 26-11-2025 | Location: Gurugram, Haryana

---

This notebook contains the complete analysis including:
- Data loading and statistical description
- Data preprocessing for time series
- Model training (ARIMA, SARIMA, GARCH)
- Performance evaluation
- Future forecasting (2:15 PM - 6:15 PM)
- Comprehensive visualizations

## Step 1: Import Libraries

In [None]:
# Run this cell to import all required libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
from arch import arch_model
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print('✓ All libraries imported successfully!')

## Step 2: Load Data

In [None]:
# Load the sensor data
df = pd.read_excel('iot_sensor_readings.xlsx')
print(f'Dataset Shape: {df.shape}')
print(f'\nColumns: {list(df.columns)}')
df.head(10)

In [None]:
# Display last 10 rows
df.tail(10)

## Step 3: Statistical Description

In [None]:
# Statistical summary
print('Statistical Description of Weather Parameters:')
print('='*80)
df.describe().round(2)

In [None]:
# Check for missing values
print('Missing Values Check:')
print(df.isnull().sum())
print(f'\nTotal Missing Values: {df.isnull().sum().sum()}')
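If this check reports gaps, fill them before fitting: ARIMA-family models expect a gap-free series. The sketch below fills only interior gaps, weighting by elapsed time. It assumes the `DateTime` index from Step 4 is already set, because `method='time'` requires a `DatetimeIndex`. The helper name `fill_gaps` and the demo values are hypothetical.

```python
import numpy as np
import pandas as pd

# Weather columns used throughout this notebook
WEATHER_COLS = ['Temperature (°C)', 'Humidity (%)', 'Pressure (hPa)', 'Dew Point (°C)']

def fill_gaps(frame: pd.DataFrame, cols=WEATHER_COLS) -> pd.DataFrame:
    """Return a copy with interior NaNs in `cols` interpolated in proportion to elapsed time.

    Requires a DatetimeIndex (method='time'); leading/trailing NaNs are left untouched.
    """
    out = frame.copy()
    out[cols] = out[cols].interpolate(method='time', limit_area='inside')
    return out

# Synthetic example with an uneven gap: 1 minute, then 3 minutes
idx = pd.to_datetime(['2025-11-26 14:00', '2025-11-26 14:01', '2025-11-26 14:04'])
demo = pd.DataFrame({c: [10.0, np.nan, 40.0] for c in WEATHER_COLS}, index=idx)
print(fill_gaps(demo))
```

Time weighting matters when readings are unevenly spaced. The missing 14:01 value lands a quarter of the way between its neighbours (17.5), not halfway.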

In [None]:
# Correlation analysis
correlation_matrix = df[['Temperature (°C)', 'Humidity (%)', 'Pressure (hPa)', 'Dew Point (°C)']].corr()
print('Correlation Matrix:')
print(correlation_matrix.round(3))
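Dew point is not independent of the other columns. Physically, it is approximately a function of temperature and relative humidity, so strong correlations with both are expected. The sketch below uses the Magnus approximation (coefficients a = 17.62, b = 243.12 °C, one commonly used set) to check whether the sensor's `Dew Point (°C)` column agrees with the other two. Whether the device itself uses this formula is an assumption.

```python
import math

# Magnus-formula coefficients (one widely used set for saturation over water)
A = 17.62
B = 243.12  # °C

def dew_point_magnus(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (°C) from air temperature (°C) and relative humidity (%)."""
    if not 0 < rh_percent <= 100:
        raise ValueError('relative humidity must be in (0, 100]')
    gamma = math.log(rh_percent / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)

print(round(dew_point_magnus(25.0, 50.0), 2))  # about 13.85
```

To compare against the sensor, compute `[dew_point_magnus(t, h) for t, h in zip(df['Temperature (°C)'], df['Humidity (%)'])]` and inspect the difference from `df['Dew Point (°C)']`. At 100 % humidity the formula returns the air temperature itself, which is a quick sanity check.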

## Step 4: Data Preprocessing

In [None]:
# Create DateTime index
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d-%m-%Y %I:%M:%S %p')
df = df.set_index('DateTime')
df = df.sort_index()

print('✓ DateTime index created')
print('✓ Data sorted by time')
print(f'\nTime range: {df.index[0]} to {df.index[-1]}')
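Both model fitting and the forecast horizon depend on the sampling interval, which the steps above never state. The sketch below builds the index from synthetic `Date`/`Time` strings in the same `%d-%m-%Y %I:%M:%S %p` format. It then infers the most common spacing between readings. The helper names `build_datetime_index` and `infer_interval` are hypothetical.

```python
import pandas as pd

def build_datetime_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Combine text 'Date' and 'Time' columns into a sorted DatetimeIndex."""
    stamps = pd.to_datetime(frame['Date'].astype(str) + ' ' + frame['Time'].astype(str),
                            format='%d-%m-%Y %I:%M:%S %p')
    return frame.assign(DateTime=stamps).set_index('DateTime').sort_index()

def infer_interval(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Return the most frequent gap between consecutive timestamps."""
    gaps = index.to_series().diff().dropna()
    return gaps.mode().iloc[0]

demo = pd.DataFrame({
    'Date': ['26-11-2025'] * 3,
    'Time': ['01:00:00 PM', '01:01:00 PM', '01:02:00 PM'],
    'Temperature (°C)': [24.1, 24.2, 24.2],
})
demo = build_datetime_index(demo)
print(infer_interval(demo.index))  # 0 days 00:01:00
```

If the interval is regular, `df = df.asfreq(interval)` attaches it to the index. Note that this inserts NaN rows for any missing timestamps. Without a frequency, statsmodels emits a warning that the date index has no frequency information, which `warnings.filterwarnings('ignore')` in Step 1 hides.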

## Step 5: Visualizations

In [None]:
# Time series plots
fig, axes = plt.subplots(4, 1, figsize=(15, 12))

axes[0].plot(df.index, df['Temperature (°C)'], color='#FF6B6B', linewidth=2)
axes[0].set_title('Temperature Over Time', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Temperature (°C)', fontsize=12)
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index, df['Humidity (%)'], color='#4ECDC4', linewidth=2)
axes[1].set_title('Humidity Over Time', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Humidity (%)', fontsize=12)
axes[1].grid(True, alpha=0.3)

axes[2].plot(df.index, df['Pressure (hPa)'], color='#95E1D3', linewidth=2)
axes[2].set_title('Pressure Over Time', fontsize=14, fontweight='bold')
axes[2].set_ylabel('Pressure (hPa)', fontsize=12)
axes[2].grid(True, alpha=0.3)

axes[3].plot(df.index, df['Dew Point (°C)'], color='#F38181', linewidth=2)
axes[3].set_title('Dew Point Over Time', fontsize=14, fontweight='bold')
axes[3].set_ylabel('Dew Point (°C)', fontsize=12)
axes[3].set_xlabel('Time', fontsize=12)
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, fmt='.3f', 
            linewidths=2, square=True, cbar_kws={'shrink': 0.8})
plt.title('Correlation Matrix of Weather Parameters', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

## Step 6: Stationarity Testing

In [None]:
# Augmented Dickey-Fuller Test
def adf_test(series, name):
    result = adfuller(series.dropna())
    print(f'\n{name}:')
    print(f'  ADF Statistic: {result[0]:.4f}')
    print(f'  p-value: {result[1]:.4f}')
    if result[1] <= 0.05:
        print(f'  ✓ Stationary (p-value <= 0.05)')
    else:
        print(f'  ✗ Non-stationary (p-value > 0.05)')
    return result[1] <= 0.05

print('Stationarity Tests (ADF Test):')
print('='*80)
adf_test(df['Temperature (°C)'], 'Temperature')
adf_test(df['Humidity (%)'], 'Humidity')
adf_test(df['Pressure (hPa)'], 'Pressure')
adf_test(df['Dew Point (°C)'], 'Dew Point')

## Step 7: Train-Test Split

In [None]:
# Chronological 80/20 split (no shuffling: the test set must come after the training set)
train_size = int(len(df) * 0.8)
train_data = df.iloc[:train_size]
test_data = df.iloc[train_size:]

print(f'Training samples: {len(train_data)}')
print(f'Testing samples: {len(test_data)}')
print(f'\nTrain period: {train_data.index[0]} to {train_data.index[-1]}')
print(f'Test period: {test_data.index[0]} to {test_data.index[-1]}')
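A single 80/20 cut scores each model on only one stretch of the afternoon. A common alternative, not what this notebook does, is an expanding-window (walk-forward) evaluation. It produces several chronological train/test folds, and no test point ever precedes its training data. The sketch below is a plain-Python version; `walk_forward_splits` and its parameters are hypothetical. scikit-learn's `TimeSeriesSplit` implements the same idea.

```python
from typing import Iterator, Tuple

def walk_forward_splits(n_samples: int, initial_train: int, horizon: int
                        ) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) position ranges for an expanding-window evaluation.

    Each fold trains on positions [0, end) and tests on the next `horizon` positions.
    """
    if initial_train <= 0 or horizon <= 0:
        raise ValueError('initial_train and horizon must be positive')
    end = initial_train
    while end + horizon <= n_samples:
        yield range(0, end), range(end, end + horizon)
        end += horizon

# With 300 readings: start from the first 240, then test in 20-step blocks
for train_idx, test_idx in walk_forward_splits(300, initial_train=240, horizon=20):
    print(len(train_idx), test_idx.start, test_idx.stop)
```

Each fold maps onto the DataFrame as `df.iloc[train_idx.start:train_idx.stop]` and `df.iloc[test_idx.start:test_idx.stop]`. Averaging a metric over the folds gives a less position-dependent score than one split.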

## Step 8: Model Training & Evaluation

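### Note: Run the model-training script first
Run `model_training_forecasting.py` from a terminal before executing the cells below, which load its outputs (`model_performance_metrics.xlsx` and `future_forecast_2pm_to_6pm.xlsx`).
```bash
# Run in terminal:
python model_training_forecasting.py
```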

In [None]:
# Load performance metrics
perf_df = pd.read_excel('model_performance_metrics.xlsx')
print('Model Performance Summary:')
print('='*80)
perf_df
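The evaluation uses RMSE, MAE, MAPE and R². The NumPy sketch below implements their standard definitions for reference. scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score`, imported in Step 1, cover three of them. MAPE is undefined when a true value is zero, and R² turns negative whenever the forecast errors exceed the test window's own variance around its mean. The helper name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """RMSE, MAE, MAPE (%) and R² for one forecast; MAPE assumes no zero in y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        'RMSE': float(np.sqrt(np.mean(err ** 2))),
        'MAE': float(np.mean(np.abs(err))),
        'MAPE': float(np.mean(np.abs(err / y_true)) * 100),
        'R2': float(1 - ss_res / ss_tot),
    }

print(regression_metrics([1, 2, 3], [1, 2, 4]))
```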

## Step 9: Future Forecasting Results

In [None]:
# Load future forecast
forecast_df = pd.read_excel('future_forecast_2pm_to_6pm.xlsx')
print(f'Future Forecast (2:15 PM - 6:15 PM):')
print(f'Total predictions: {len(forecast_df)}')
print(f'\nFirst 10 predictions:')
forecast_df.head(10)
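The forecast window (2:15 PM to 6:15 PM) and the prediction count (240) are given, but the spacing is not. If each prediction is one timestamp, 240 steps over 4 hours means one per minute. The sketch below builds a matching timestamp grid under that assumption, starting one step after the last observation. The helper name `future_index` and the 2:15 PM last-reading time are hypothetical.

```python
import pandas as pd

def future_index(last_observed: pd.Timestamp, steps: int, freq: str = 'min') -> pd.DatetimeIndex:
    """Timestamps for `steps` forecasts, starting one interval after `last_observed`."""
    # Generate steps + 1 points from the last observation and drop that first one
    return pd.date_range(start=last_observed, periods=steps + 1, freq=freq)[1:]

# Hypothetical last reading at 2:15 PM on the analysis date, 1-minute spacing assumed
idx = future_index(pd.Timestamp('2025-11-26 14:15'), steps=240)
print(idx[0], idx[-1])
```

Whether such a grid ends at 6:14 or 6:15 PM depends on where the last observation falls. Comparing it with the timestamps in `forecast_df` confirms which convention the script uses.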

In [None]:
# Forecast summary
print('Forecast Summary:')
print('='*80)
forecast_df[['Temperature (°C)', 'Humidity (%)', 'Pressure (hPa)', 'Dew Point (°C)']].describe().round(2)

## Step 10: View Generated Visualizations

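The following visualizations are saved as PNG files (the Step 5 plots in this notebook are only displayed inline with `plt.show()` and are not written to disk):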
- time_series_plots.png
- correlation_matrix.png
- acf_pacf_plots.png
- temperature_model_comparison.png
- humidity_model_comparison.png
- pressure_model_comparison.png
- dewpoint_model_comparison.png
- future_forecast_visualization.png
- temperature_performance_comparison.png
- humidity_performance_comparison.png
- pressure_performance_comparison.png
- dew point_performance_comparison.png
- methodology_flowchart.png
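The saved figures can be viewed inline without leaving the notebook. The sketch below assumes the PNGs sit in the working directory and uses `IPython.display`, which is available inside Jupyter. The helper names `split_existing` and `show_figures` are hypothetical.

```python
from pathlib import Path

# File names copied from the list above
FIGURES = [
    'time_series_plots.png', 'correlation_matrix.png', 'acf_pacf_plots.png',
    'temperature_model_comparison.png', 'humidity_model_comparison.png',
    'pressure_model_comparison.png', 'dewpoint_model_comparison.png',
    'future_forecast_visualization.png', 'temperature_performance_comparison.png',
    'humidity_performance_comparison.png', 'pressure_performance_comparison.png',
    'dew point_performance_comparison.png', 'methodology_flowchart.png',
]

def split_existing(names, folder='.'):
    """Return (found, missing) lists of Paths for the given file names in `folder`."""
    base = Path(folder)
    found = [base / n for n in names if (base / n).is_file()]
    missing = [base / n for n in names if not (base / n).is_file()]
    return found, missing

def show_figures(names=FIGURES, folder='.'):
    """Display each saved figure inline in Jupyter and report any that are missing."""
    from IPython.display import Image, display  # imported lazily: only needed in Jupyter
    found, missing = split_existing(names, folder)
    for path in found:
        display(Image(filename=str(path)))
    for path in missing:
        print(f'Not found: {path}')
```

Calling `show_figures()` in a cell renders every figure that exists. It also lists any that the training script did not produce.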

## Conclusion

This analysis successfully:
1. ✓ Generated realistic synthetic IoT sensor data (300 entries)
2. ✓ Performed statistical analysis and preprocessing
3. ✓ Trained 3 time series models (ARIMA, SARIMA, GARCH) for 4 parameters
4. ✓ Evaluated models using RMSE, MAE, MAPE, and R² metrics
5. ✓ Generated future forecasts (240 predictions)
6. ✓ Created comprehensive visualizations
7. ✓ Generated methodology flowchart

**Best Models (lowest RMSE per parameter):**
- Temperature: GARCH (RMSE: 0.5855 °C)
- Humidity: ARIMA (RMSE: 1.4674 %)
- Pressure: ARIMA (RMSE: 0.1199 hPa)
- Dew Point: ARIMA (RMSE: 0.3908 °C)
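A best-model list like the one above can be derived mechanically from the metrics table. The sketch below assumes columns named `Parameter`, `Model` and `RMSE`; the real `model_performance_metrics.xlsx` may use different headers. The numbers in the demo are illustrative only and are not this notebook's results. The helper name `best_models` is hypothetical.

```python
import pandas as pd

def best_models(perf: pd.DataFrame, metric: str = 'RMSE') -> pd.DataFrame:
    """One row per parameter: the model with the lowest `metric`."""
    best_rows = perf.groupby('Parameter')[metric].idxmin()
    return perf.loc[best_rows, ['Parameter', 'Model', metric]].reset_index(drop=True)

# Illustrative numbers only, NOT the notebook's results
demo = pd.DataFrame({
    'Parameter': ['Temperature'] * 3 + ['Pressure'] * 3,
    'Model': ['ARIMA', 'SARIMA', 'GARCH'] * 2,
    'RMSE': [0.9, 0.8, 0.7, 0.2, 0.3, 0.4],
})
print(best_models(demo))
```

For a higher-is-better metric such as R², swap `idxmin` for `idxmax`.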