# **Weather Markov Model Analysis**

**This notebook fits a first-order Markov chain with three states (Sunny, Cloudy, Rainy) to historical weather data, then uses it to estimate the long-run state distribution and to simulate future weather.**

In [1]:
import pandas as pd
import numpy as np
import random

## 1. Load and Clean Data

Load the weather history data and remove any missing values in the Summary column.

In [2]:
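# Path is relative to the notebook's working directory
df = pd.read_csv("../data/weatherHistory.csv")

# Drop rows with no Summary; later steps assume the remaining rows are in chronological order
df = df.dropna(subset=["Summary"])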

## 2. Map Weather Summary to 3 States

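Classify each weather summary into one of three states: Sunny, Cloudy, or Rainy. The keyword checks run in order, so a summary that mentions both cloud and rain maps to Cloudy, and any summary that matches no keyword also falls back to Cloudy.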

In [3]:
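def map_weather(summary):
    """Map a free-text weather summary to Sunny, Cloudy, or Rainy."""
    summary = summary.lower()
    # Checks run in order, so the first matching keyword wins.
    if "clear" in summary or "sunny" in summary:
        return "Sunny"
    elif "cloudy" in summary or "overcast" in summary:
        return "Cloudy"
    elif "rain" in summary or "drizzle" in summary or "storm" in summary:
        return "Rainy"
    else:
        # Fallback: anything unmatched is treated as Cloudy.
        return "Cloudy"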

df["State"] = df["Summary"].apply(map_weather)
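
A quick way to sanity-check the mapping is to run it on a handful of summaries. The sketch below repeats the function so it runs on its own; the sample strings are hypothetical and are not taken from the dataset.

```python
def map_weather(summary):
    """Same rules as the cell above, repeated so this sketch is self-contained."""
    summary = summary.lower()
    if "clear" in summary or "sunny" in summary:
        return "Sunny"
    elif "cloudy" in summary or "overcast" in summary:
        return "Cloudy"
    elif "rain" in summary or "drizzle" in summary or "storm" in summary:
        return "Rainy"
    return "Cloudy"

# Hypothetical summaries chosen to exercise each branch, including the fallback.
samples = ["Clear", "Partly Cloudy", "Light Rain", "Foggy", "Rain and Overcast"]
mapped = {s: map_weather(s) for s in samples}
for summary, state in mapped.items():
    print(f"{summary!r} -> {state}")
```

The last sample shows the effect of the check order: a summary containing both "rain" and "overcast" is labeled Cloudy, which may be one reason the Rainy state ends up rare.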

## 3. Create Weather Sequence

Extract the weather states as a sequence for analysis.

In [4]:
weather_sequence = df["State"].tolist()

states = ["Sunny", "Cloudy", "Rainy"]

## 4. Initial Probability Vector π₀

Estimate the initial state distribution π₀ as the relative frequency of each state over the whole sequence.

In [5]:
initial_counts = {s: 0 for s in states}

for state in weather_sequence:
    initial_counts[state] += 1

total = sum(initial_counts.values())
pi_0 = np.array([initial_counts[s] / total for s in states])

print("\nInitial State Distribution π0:")
for s, p in zip(states, pi_0):
    print(f"{s}: {round(p,4)}")


Initial State Distribution π0:
Sunny: 0.1129
Cloudy: 0.8859
Rainy: 0.0012
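
The same frequency estimate can be sketched more compactly with `collections.Counter`. The toy sequence below is hypothetical and stands in for `weather_sequence`.

```python
from collections import Counter

# Hypothetical toy sequence standing in for weather_sequence.
toy_sequence = ["Sunny", "Cloudy", "Cloudy", "Rainy",
                "Cloudy", "Sunny", "Cloudy", "Cloudy"]
states = ["Sunny", "Cloudy", "Rainy"]

counts = Counter(toy_sequence)
total = len(toy_sequence)

# Counter returns 0 for a state that never occurs, so no KeyError is raised.
pi_0 = [counts[s] / total for s in states]
print(dict(zip(states, pi_0)))
```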


## 5. Transition Count Matrix

Count how often each state is followed by each state in consecutive records. Rows are the current state and columns are the next state.

In [6]:
transition_counts = {s: {s2: 0 for s2 in states} for s in states}

for i in range(len(weather_sequence) - 1):
    curr_state = weather_sequence[i]
    next_state = weather_sequence[i+1]
    transition_counts[curr_state][next_state] += 1

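# pd.DataFrame turns the outer dict keys into columns, so transpose to put the current state on the rows
transition_matrix = pd.DataFrame(transition_counts).T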
print("\nTransition Count Matrix:")
print(transition_matrix)


Transition Count Matrix:
        Sunny  Cloudy  Rainy
Sunny    7576    3314      0
Cloudy   3314   82118     18
Rainy       0      18     94
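
As an alternative to the index loop, consecutive pairs can be counted with `zip` and `Counter`. This is a minimal sketch on a hypothetical toy sequence, not a replacement for the cell above.

```python
from collections import Counter

# Hypothetical toy sequence standing in for weather_sequence.
toy_sequence = ["Sunny", "Sunny", "Cloudy", "Cloudy", "Rainy", "Cloudy", "Sunny"]
states = ["Sunny", "Cloudy", "Rainy"]

# zip(seq, seq[1:]) yields each (current, next) pair exactly once.
pair_counts = Counter(zip(toy_sequence, toy_sequence[1:]))

# Rows are the current state, columns are the next state.
count_matrix = [[pair_counts[(cur, nxt)] for nxt in states] for cur in states]
for cur, row in zip(states, count_matrix):
    print(f"{cur:<7}", row)
```

A sequence of length n always yields n - 1 transitions, which is a cheap consistency check on the count matrix.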


## 6. Transition Probability Matrix

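Divide each row of the count matrix by its row total, so that every row sums to 1 and entry (i, j) estimates the probability of moving from state i to state j.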

In [7]:
transition_prob = transition_matrix.div(transition_matrix.sum(axis=1), axis=0)
print("\nTransition Probability Matrix:")
print(transition_prob)


Transition Probability Matrix:
           Sunny    Cloudy     Rainy
Sunny   0.695684  0.304316  0.000000
Cloudy  0.038783  0.961006  0.000211
Rainy   0.000000  0.160714  0.839286
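
The normalization can be reproduced from the printed counts alone. The sketch below uses plain NumPy and adds a guard for rows with a zero total, which is an assumption about how unseen states should be handled and is not part of the original cell.

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]

# Counts copied from the Transition Count Matrix output above.
counts = np.array([
    [7576, 3314, 0],
    [3314, 82118, 18],
    [0, 18, 94],
], dtype=float)

row_totals = counts.sum(axis=1, keepdims=True)

# Rows with a zero total (a state never observed as "current") stay all zeros
# instead of turning into NaN.
P = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)

# Every observed row must be a probability distribution.
assert np.allclose(P.sum(axis=1), 1.0)
print(np.round(P, 6))
```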


## 7. Steady State Distribution

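Calculate the long-term probability distribution using eigenvalue decomposition. The steady state π satisfies π = πP, so it is a left eigenvector of the transition matrix P with eigenvalue 1, which is the same as an ordinary eigenvector of the transpose of P.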

In [8]:
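A_np = transition_prob.values

# np.linalg.eig returns right eigenvectors, so decompose the transpose
# to obtain left eigenvectors of the transition matrix
eigenvalues, eigenvectors = np.linalg.eig(A_np.T)

# Pick the eigenvector whose eigenvalue is closest to 1
idx = np.argmin(np.abs(eigenvalues - 1))

# Keep the real part and rescale so the entries sum to 1
steady_state = eigenvectors[:, idx].real
steady_state = steady_state / steady_state.sum()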

print("\nSteady-State Distribution:")
for s, p in zip(states, steady_state):
    print(f"{s}: {round(p,4)}")


Steady-State Distribution:
Sunny: 0.1129
Cloudy: 0.8859
Rainy: 0.0012
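
The eigenvector result can be cross-checked by power iteration: start from any distribution and keep multiplying by the transition matrix until it stops changing. This is a sketch of a different technique from the cell above; the uniform starting vector, the tolerance, and the iteration cap are arbitrary choices.

```python
import numpy as np

# Counts copied from the Transition Count Matrix output above.
counts = np.array([
    [7576, 3314, 0],
    [3314, 82118, 18],
    [0, 18, 94],
], dtype=float)
P = counts / counts.sum(axis=1, keepdims=True)

pi = np.full(3, 1 / 3)  # arbitrary starting distribution
for _ in range(10_000):
    nxt = pi @ P  # one step of the chain applied to the whole distribution
    converged = np.abs(nxt - pi).max() < 1e-12
    pi = nxt
    if converged:
        break

print(np.round(pi, 4))
```

Power iteration converges here because every state can reach every other state and each state has a nonzero probability of repeating.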


## 8. Predict Next Day

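Create a function that samples the next weather state from the transition probabilities of the current state. The draw is random, so repeated runs can return different states unless the random seed is fixed. One step corresponds to one row of the dataset, so "next day" is accurate only if the data has one record per day.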

In [9]:
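def predict_next_day(current_state, transition_prob):
    """Sample the next state from the row of `current_state`."""
    next_states = list(transition_prob.columns)
    probabilities = transition_prob.loc[current_state].values
    # random.choices returns a list, so take its single element
    return random.choices(next_states, weights=probabilities)[0]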

current_state = "Sunny"
next_day = predict_next_day(current_state, transition_prob)
print(f"\nTomorrow prediction if today is {current_state}: {next_day}")


Tomorrow prediction if today is Sunny: Sunny
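
Since a single draw says little about the underlying probabilities, one way to check the sampler is to draw many times with a fixed seed and compare the observed frequencies with the matrix row. The seed value and the number of draws below are arbitrary assumptions.

```python
import random

states = ["Sunny", "Cloudy", "Rainy"]

# Sunny row copied from the Transition Probability Matrix output above.
sunny_row = [0.695684, 0.304316, 0.0]

rng = random.Random(42)  # fixed seed (arbitrary) so the run is reproducible
draws = rng.choices(states, weights=sunny_row, k=10_000)

# Observed share of each next state after a Sunny record.
freq = {s: draws.count(s) / len(draws) for s in states}
print(freq)
```

Using a dedicated `random.Random` instance keeps the seeding local, so it does not affect other code that relies on the global `random` module.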


## 9. Generate Future Weather

Chain the one-step sampler to simulate a sequence of weather states. The start state counts as the first entry, so `days=7` produces six sampled transitions.

In [10]:
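def generate_weather_sequence(start_state, days, transition_prob):
    """Simulate `days` states, with `start_state` as the first entry."""
    sequence = [start_state]
    current_state = start_state
    # The start state is already in the list, so sample days - 1 transitions
    for _ in range(days - 1):
        next_state = predict_next_day(current_state, transition_prob)
        sequence.append(next_state)
        current_state = next_state
    return sequence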

weather_7days = generate_weather_sequence("Sunny", 7, transition_prob)
print("\nPredicted Weather for 7 Days:")
print(weather_7days)


Predicted Weather for 7 Days:
['Sunny', 'Cloudy', 'Cloudy', 'Cloudy', 'Cloudy', 'Cloudy', 'Cloudy']
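
Each simulated sequence is only one random path. To see what the model expects on average, the exact distribution after n steps can be computed as the start vector times the n-th power of the transition matrix. This is a sketch using the rounded probabilities printed earlier; the step counts 1, 3, and 6 are illustrative choices.

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]

# Values copied from the Transition Probability Matrix output above.
P = np.array([
    [0.695684, 0.304316, 0.000000],
    [0.038783, 0.961006, 0.000211],
    [0.000000, 0.160714, 0.839286],
])

start = np.array([1.0, 0.0, 0.0])  # the chain starts in Sunny with certainty

dists = {}
for n in (1, 3, 6):
    # Distribution over states after n transitions.
    dists[n] = start @ np.linalg.matrix_power(P, n)
    print(n, dict(zip(states, np.round(dists[n], 4))))
```

With n = 6 this corresponds to the last entry of a 7-entry sequence that starts in Sunny.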


## 10. Compare Initial and Steady-State Distributions

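Compare the initial state probabilities with the long-term steady-state distribution. The two agree to four decimal places, which is expected: π₀ was estimated from the same sequence that produced the transition counts, and in a long sequence each state is entered almost exactly as often as it is left.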

In [11]:
print("\nComparison Between Initial and Steady-State:\n")
for i, s in enumerate(states):
    print(f"{s}: π0 = {round(pi_0[i],4)} , Steady = {round(steady_state[i],4)}")


Comparison Between Initial and Steady-State:

Sunny: π0 = 0.1129 , Steady = 0.1129
Cloudy: π0 = 0.8859 , Steady = 0.8859
Rainy: π0 = 0.0012 , Steady = 0.0012
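
The agreement can be checked directly from the printed counts. In the sketch below, the row sums (how often a state is left) equal the column sums (how often it is entered), and whenever that holds, the normalized row sums are exactly stationary for the row-normalized matrix. The variable names are illustrative.

```python
import numpy as np

# Counts copied from the Transition Count Matrix output above.
counts = np.array([
    [7576, 3314, 0],
    [3314, 82118, 18],
    [0, 18, 94],
], dtype=float)

row_sums = counts.sum(axis=1)  # times each state appears as "current"
col_sums = counts.sum(axis=0)  # times each state appears as "next"
print(row_sums, col_sums)

P = counts / row_sums[:, None]
pi_hat = row_sums / row_sums.sum()

# pi_hat @ P equals col_sums / N, so equal row and column sums imply pi_hat @ P == pi_hat.
is_stationary = np.allclose(pi_hat @ P, pi_hat)
print(is_stationary)
```

Row and column sums of a transition count matrix can differ only through the first and last records of the sequence, so on a long sequence the frequency estimate and the steady state will be nearly identical by construction.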
