# Predicting Weather Using Hidden Markov Model (HMM)

This notebook demonstrates how to predict weather states from weather data using a Hidden Markov Model (HMM).

## 1. Import Required Libraries

We use NumPy and pandas for data handling, matplotlib for visualization, hmmlearn for building the HMM, and scikit-learn for preprocessing, data splitting, and evaluation metrics.

In [None]:
# Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from hmmlearn import hmm
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 2. Load and Explore Weather Data

Load your weather dataset, display basic statistics, and visualize key features.

In [None]:
# Load and Explore Weather Data
# Replace 'weather.csv' with your actual data file path
df = pd.read_csv('weather.csv')

# Display first few rows
print(df.head())

# Display basic statistics
print(df.describe())

# Visualize key features
plt.figure(figsize=(10,5))
df['Temperature'].plot(title='Temperature Over Time')
plt.xlabel('Time')
plt.ylabel('Temperature')
plt.show()

if 'Weather' in df.columns:
    plt.figure(figsize=(6,4))
    df['Weather'].value_counts().plot(kind='bar', title='Weather State Distribution')
    plt.xlabel('Weather State')
    plt.ylabel('Count')
    plt.show()

## 3. Preprocess Data for HMM

Handle missing values, normalize or scale features, and select relevant columns for modeling.

In [None]:
# Preprocess Data for HMM

# Handle missing values
df = df.dropna()

# Select relevant features (e.g., Temperature, Humidity)
features = ['Temperature', 'Humidity']
X = df[features].values

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
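
Fitting the scaler on the full dataset lets statistics from the later test rows influence the training features. The sketch below shows one way to avoid that, using plain NumPy on made-up numbers: the mean and standard deviation come from the earlier rows only and are reused for the later rows. The array `X_demo` and the 80% cut are assumptions for illustration, not part of the notebook's data.

```python
import numpy as np

# Hypothetical feature matrix: columns are temperature and humidity.
X_demo = np.array([
    [20.0, 60.0],
    [22.0, 65.0],
    [24.0, 70.0],
    [26.0, 75.0],
    [30.0, 90.0],
])

split = int(len(X_demo) * 0.8)  # the first 80% of rows play the role of training data
train_part, test_part = X_demo[:split], X_demo[split:]

# Same statistics StandardScaler learns from train_part (population std, ddof=0).
mean = train_part.mean(axis=0)
std = train_part.std(axis=0)

train_scaled = (train_part - mean) / std
test_scaled = (test_part - mean) / std  # reuse the training statistics, do not refit

print(train_scaled.mean(axis=0))  # approximately zero for each column
print(test_scaled)
```

With scikit-learn, the same idea would be `scaler.fit(...)` on the training rows followed by `scaler.transform(...)` on the test rows.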

## 4. Encode Weather States

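Convert categorical weather states (e.g., 'Sunny', 'Rainy') into integer labels. The HMM itself is fit without labels; the encoded labels are used later to map hidden states to weather states and to evaluate the predictions.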

In [None]:
# Encode Weather States

if 'Weather' in df.columns:
    le = LabelEncoder()
    y = le.fit_transform(df['Weather'])
    weather_states = le.classes_
    print("Encoded Weather States:", dict(zip(weather_states, range(len(weather_states)))))
else:
    # Later cells need the encoded labels, so stop here instead of continuing without them
    raise KeyError("No 'Weather' column found. Please ensure your dataset contains a 'Weather' column.")
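
To make the encoding step concrete, here is a small NumPy sketch of the same mapping `LabelEncoder` produces: classes are sorted, and each label is replaced by its index in the sorted list. The label values are made up for illustration.

```python
import numpy as np

# Hypothetical weather labels in time order.
labels = np.array(["Sunny", "Rainy", "Sunny", "Cloudy", "Rainy"])

# np.unique returns the sorted classes and, with return_inverse=True,
# the index of each label within that sorted array.
classes, codes = np.unique(labels, return_inverse=True)

print(classes)         # sorted class names
print(codes)           # integer code per sample
print(classes[codes])  # indexing with the codes recovers the original labels
```

In the notebook, `le.inverse_transform` plays the role of `classes[codes]` when turning predicted integers back into state names.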

## 5. Split Data into Training and Testing Sets

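Divide the dataset into training and testing sets to evaluate the model's performance. Because an HMM models transitions between consecutive observations, the split keeps the rows in their original order (assumed to be chronological) instead of shuffling them.

In [None]:
# Split Data into Training and Testing Sets

# For HMM, we need sequences. Here, each split is treated as one contiguous sequence.
# Alternatively, you can split into multiple sequences if your data allows.

# shuffle=False keeps time order: the first 80% of rows train the model and the
# last 20% test it. Stratified splitting requires shuffling, so it is not used here.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, shuffle=False
)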
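
If the data consist of several independent recordings stacked in one table, hmmlearn's `fit` and `predict` accept a `lengths` argument giving the number of rows in each sequence. The following is a minimal sketch of how such lengths could be derived; the `sequence_ids` array is a hypothetical per-row identifier, not a column the notebook assumes.

```python
import numpy as np

# Hypothetical sequence identifier per row (e.g., one id per recording period).
# Rows belonging to the same sequence must be contiguous.
sequence_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# Positions where the id changes mark the start of a new sequence.
change_points = np.flatnonzero(np.diff(sequence_ids)) + 1
boundaries = np.concatenate(([0], change_points, [len(sequence_ids)]))
lengths = np.diff(boundaries)

print(lengths)  # one entry per sequence; the entries sum to the number of rows

# With hmmlearn, the stacked observations and these lengths would be passed together:
# model.fit(X_all, lengths=lengths)       # X_all: assumed (n_rows, n_features) array
# model.predict(X_all, lengths=lengths)
```

Passing `lengths` prevents the model from treating the last row of one recording and the first row of the next as a real transition.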

## 6. Train Hidden Markov Model

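Fit hmmlearn's GaussianHMM, which models continuous features, on the training data. Fitting is unsupervised: the labels are used only to choose the number of hidden states. For discrete observations, hmmlearn also offers discrete-emission models such as CategoricalHMM and MultinomialHMM.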

In [None]:
# Train Hidden Markov Model

# For continuous features, use GaussianHMM
n_components = len(np.unique(y_train))  # Number of weather states

model = hmm.GaussianHMM(n_components=n_components, covariance_type='diag', n_iter=100, random_state=42)
model.fit(X_train)

print("HMM trained with", n_components, "hidden states.")
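
One way to sanity-check the fitted model is to compare its learned transition matrix (`model.transmat_`) with transitions counted directly from the labels. The sketch below counts transitions between consecutive encoded labels; the helper name `empirical_transition_matrix` and the demo labels are hypothetical. It is only meaningful when the labels are in chronological order, and because hidden-state indices are arbitrary, a comparison with `model.transmat_` needs the states mapped to labels first.

```python
import numpy as np

def empirical_transition_matrix(labels, n_states):
    """Count transitions between consecutive labels and normalize each row."""
    counts = np.zeros((n_states, n_states))
    for current, nxt in zip(labels[:-1], labels[1:]):
        counts[current, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical encoded labels in chronological order (0, 1, 2 are made-up states).
demo_labels = np.array([0, 0, 1, 1, 1, 2, 0, 0])
trans = empirical_transition_matrix(demo_labels, n_states=3)
print(trans)
```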

## 7. Predict Weather States

Decode the most likely hidden-state sequence for the test data with the trained HMM, then map each hidden state to a weather label.

In [None]:
# Predict Weather States

# Predict hidden states for test data
hidden_states = model.predict(X_test)

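# Map hidden states to weather labels (best effort: HMM states are learned without labels,
# so their indices need not match the encoded labels)
# We'll use majority voting to assign each hidden state to the most common true label in training
from collections import Counter

# Decode the training sequence once instead of once per state
train_states = model.predict(X_train)
# Fallback for states that never occur in training: the overall most common label
default_label = Counter(y_train).most_common(1)[0][0]

state_label_map = {}
for state in range(n_components):
    idx = (train_states == state)
    if np.any(idx):
        state_label_map[state] = Counter(y_train[idx]).most_common(1)[0][0]
    else:
        state_label_map[state] = default_label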

# Convert hidden states to predicted labels
y_pred = np.array([state_label_map[s] for s in hidden_states])
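
By default, `model.predict` in hmmlearn uses Viterbi decoding, which finds the single most likely state path for the whole sequence. The following NumPy sketch illustrates the idea on a toy two-state model; the function `viterbi` and all probabilities are made up for illustration and are not hmmlearn's implementation.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state path given log-probabilities.

    log_start: (n_states,) log initial-state probabilities
    log_trans: (n_states, n_states) log transition probabilities, row = from-state
    log_emit:  (n_samples, n_states) log-likelihood of each observation under each state
    """
    n_samples, n_states = log_emit.shape
    score = np.empty((n_samples, n_states))
    backptr = np.zeros((n_samples, n_states), dtype=int)

    score[0] = log_start + log_emit[0]
    for t in range(1, n_samples):
        # candidates[i, j]: best score ending in state i at t-1, then moving to j.
        candidates = score[t - 1][:, None] + log_trans
        backptr[t] = candidates.argmax(axis=0)
        score[t] = candidates.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the best final state.
    path = np.empty(n_samples, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n_samples - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Toy two-state example with made-up probabilities.
start = np.log(np.array([0.5, 0.5]))
trans = np.log(np.array([[0.9, 0.1],
                         [0.1, 0.9]]))
# The third sample weakly favors state 1, but the sticky transitions
# make it cheaper for the path to stay in state 0.
emit = np.log(np.array([[0.9, 0.1],
                        [0.9, 0.1],
                        [0.4, 0.6],
                        [0.9, 0.1]]))
print(viterbi(start, trans, emit))
```

Because each step depends on the previous state, the decoded path depends on the order of the samples, which is why the observations should be in time order.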

## 8. Evaluate Model Performance

Calculate accuracy, confusion matrix, and other metrics to assess prediction quality.

In [None]:
# Evaluate Model Performance

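# Pass the full label set explicitly so the matrix and report keep one row per
# weather state even if a state is missing from the test set or the predictions
all_labels = np.arange(len(weather_states))
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=all_labels)
report = classification_report(
    y_test, y_pred, labels=all_labels, target_names=weather_states, zero_division=0
)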

print("Accuracy:", acc)
print("Confusion Matrix:\n", cm)
print("Classification Report:\n", report)
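
Accuracy is easier to interpret next to a simple baseline, since one weather state may dominate the data. A minimal sketch, using made-up encoded labels in place of `y_train` and `y_test`, predicts the most frequent training label for every test sample:

```python
import numpy as np

# Hypothetical encoded labels; in the notebook these would be y_train and y_test.
y_train_demo = np.array([0, 0, 0, 1, 2, 0, 1, 0])
y_test_demo = np.array([0, 1, 0, 0, 2])

# Predict the most frequent training label for every test sample.
majority_label = np.bincount(y_train_demo).argmax()
baseline_pred = np.full_like(y_test_demo, majority_label)
baseline_acc = np.mean(baseline_pred == y_test_demo)

print("Majority label:", majority_label)
print("Baseline accuracy:", baseline_acc)
```

If the HMM's accuracy is not clearly above this baseline, the hidden states are probably not capturing the weather labels.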

## 9. Visualize Predictions

Plot actual vs predicted weather states to visualize the model's performance.

In [None]:
# Visualize Predictions

plt.figure(figsize=(12,4))
plt.plot(y_test, label='Actual', marker='o')
plt.plot(y_pred, label='Predicted', marker='x')
plt.title('Actual vs Predicted Weather States')
plt.xlabel('Sample')
plt.ylabel('Weather State (Encoded)')
plt.legend()
plt.show()

---

**Note:**  
- Replace `'weather.csv'` with your actual weather data file path.
- Ensure your dataset contains columns like 'Temperature', 'Humidity', and 'Weather', with rows in chronological order.
- You may need to adjust feature selection and preprocessing based on your specific dataset.