The naive approach to weather forecasting is to assume that the weather on successive days is uncorrelated: if we have a long-term estimate of the number of sunny, rainy, and snowy days, we can forecast any given day simply by drawing from those long-term probabilities.

In [1]:
from __future__ import division,print_function

import numpy as np
N_sunny = 159.6875
N_rainy = 140.677
N_snowy = 64.635

p = np.array([N_sunny,N_rainy,N_snowy])
p/=p.sum()
weather_tomorrow = np.random.choice(['Sunny','Rainy','Snowy'],p=p)
print(weather_tomorrow)


Sunny


We could produce a long-term forecast simply by drawing many independent samples:

In [2]:
weather = np.random.choice(['Sunny','Rainy','Snowy'],10000,p=p)
print(weather[:50])
print(sum(weather=='Sunny')/len(weather),p[0])


['Sunny' 'Sunny' 'Sunny' 'Sunny' 'Sunny' 'Rainy' 'Rainy' 'Sunny' 'Rainy'
 'Snowy' 'Snowy' 'Rainy' 'Sunny' 'Sunny' 'Rainy' 'Snowy' 'Rainy' 'Snowy'
 'Snowy' 'Sunny' 'Rainy' 'Snowy' 'Rainy' 'Snowy' 'Sunny' 'Rainy' 'Snowy'
 'Sunny' 'Rainy' 'Sunny' 'Rainy' 'Snowy' 'Sunny' 'Rainy' 'Sunny' 'Sunny'
 'Snowy' 'Sunny' 'Sunny' 'Sunny' 'Sunny' 'Snowy' 'Rainy' 'Snowy' 'Sunny'
 'Sunny' 'Sunny' 'Sunny' 'Snowy' 'Sunny']
0.4381 0.43750059931588947


This is clearly a silly model for short-term forecasting (although quite close to what is actually used for long-term forecasting).

A better model might be that the weather tomorrow is predicted by the weather today, or 
$$
W_{t+1} = f(W_t).
$$
Imagine that we collected long-term statistics, and found that there is a conditional probability table given by:

In [3]:
A = np.array([[0.5,0.4,0.1],[0.2,0.5,0.3],[0.8,0.1,0.1]])

where entry $A_{kj}$ represents the probability of tomorrow's state being the $j$-th state, given that we are in the $k$-th state now.  Each row is therefore a probability distribution and sums to one.  Thus, if we use the ordering Sunny, Rainy, Snowy, the probability that it will be snowy tomorrow if it's sunny today is 0.1, the probability that it will be sunny tomorrow if it's snowing today is 0.8, and so on.
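
As a small sanity check on this indexing convention, the sketch below confirms that every row of the table sums to one and looks up entries by state name. The helper `transition_prob` and the `states` list are names introduced here for illustration only.

```python
import numpy as np

# Ordering assumed throughout: Sunny, Rainy, Snowy
states = ['Sunny', 'Rainy', 'Snowy']
A = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.8, 0.1, 0.1]])

# Each row is a conditional distribution over tomorrow's state,
# so every row should sum to one.
row_sums = A.sum(axis=1)
print(row_sums)

def transition_prob(today, tomorrow):
    """Hypothetical helper: P(tomorrow's state | today's state), read from A."""
    return A[states.index(today), states.index(tomorrow)]

print(transition_prob('Sunny', 'Snowy'))  # row Sunny, column Snowy
print(transition_prob('Snowy', 'Sunny'))  # row Snowy, column Sunny
```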

Now if we want to make a prediction of tomorrow's state, we can use this transition matrix.  Let's imagine that today's weather is observed to be sunny, which gives us the row vector
$$
P(W_t) = [1,0,0].
$$
To assess the probability of tomorrow's weather, we can (right)-multiply this by the transition matrix

In [4]:
PW_t = np.array([1,0,0])
np.dot(PW_t,A)

array([0.5, 0.4, 0.1])

Of course, the most probable case for the weather tomorrow is again sunny, which we could have read directly from the first row of our transition matrix.  What about the weather after two days?  One of the nice properties of the transition matrix is that we can make predictions further ahead by taking powers of it:  $(A\times A)_{kj}$ is the probability that we will be in state $j$ in two days, given that we are in state $k$ now.

In [5]:
np.dot(PW_t,np.dot(A,A))

array([0.41, 0.41, 0.18])
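
To see where the first entry comes from, it can help to write out the sum over tomorrow's (intermediate) state by hand. Using 1-based indices with the ordering Sunny, Rainy, Snowy, and the entries of $A$ given above:
$$
(A \times A)_{11} = \sum_{m=1}^{3} A_{1m} A_{m1} = 0.5 \cdot 0.5 + 0.4 \cdot 0.2 + 0.1 \cdot 0.8 = 0.41.
$$
Each term is the probability of one particular two-day path from Sunny back to Sunny (via Sunny, via Rainy, or via Snowy).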

Now our probabilities are more ambiguous.  As it turns out, as we take higher and higher powers of the transition matrix, each of its rows converges to the long-term probabilities of the states, the so-called stable (or stationary) distribution.  If we had a long-term record of the data, we could compute this empirically.

In [6]:
# 100th power of A; the built-in reduce is not available in Python 3
A100 = np.linalg.matrix_power(A, 100)
np.dot(PW_t,A100)

array([0.4375    , 0.38541667, 0.17708333])

If we look at the columns of $A^{100}$, we'll see that each one is constant, which is to say that all of its rows are identical.  If we multiply any row vector that sums to one (as our weather probabilities must) by this matrix, we'll just get that common row back again.  Thus our initial information, that the weather was sunny, has diffused away, and our estimate reverts to the long-term frequencies from the data:

In [10]:
#print (A100)
print(np.dot(PW_t,A100))
print(p)

[0.4375     0.38541667 0.17708333]
[0.4375006  0.38541697 0.17708243]
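
The claim that the starting state no longer matters after many steps can be checked directly. The sketch below recomputes the 100-step matrix and applies it to each of the three possible certain starting states; the names `rows_identical` and `forecasts` are illustrative.

```python
import numpy as np

A = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.8, 0.1, 0.1]])

# 100-step transition matrix
A100 = np.linalg.matrix_power(A, 100)

# If the chain has forgotten its starting state, every row of A100
# matches the first row (to floating-point precision).
rows_identical = np.allclose(A100, A100[0])
print(rows_identical)

# Forecast 100 days ahead from each certain starting state
forecasts = [np.dot(start, A100) for start in np.eye(3)]
for start, forecast in zip(['Sunny', 'Rainy', 'Snowy'], forecasts):
    print(start, forecast)
```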


Of course, taking high powers of the transition matrix is wasteful.  A better way is to recognize that for a stable state:
$$
P(W_{t+1}) = P(W_{t}) = P(W_{t}) A,
$$
which is to say that applying the transition matrix doesn't change our state probabilities.  A way to compute this special state (we'll call it $P(\hat{W}_t)$) is more easily seen by taking the transpose and defining $\lambda=1$:
$$
A^T P(\hat{W}_t) = \lambda P(\hat{W}_t).
$$
This is the equation for an eigenvector/eigenvalue pair, with the eigenvalue fixed at 1.  Because the rows of $A$ sum to one, the columns of $A^T$ sum to one, and such a matrix is guaranteed to have an eigenvalue equal to one.  We can compute this easily:

In [11]:
w,v = np.linalg.eig(A.T)

Eigenvectors are only defined up to a scale factor, so we can rescale the eigenvector associated with $\lambda=1$ so that its entries sum to one, which leaves us with the steady-state probabilities:

In [13]:
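# np.linalg.eig does not guarantee any ordering, so locate the eigenvalue closest to 1
idx = np.argmin(np.abs(w - 1))
print(w[idx].real)
v_stable = v[:, idx].real
p_stable = v_stable / v_stable.sum()
print(p_stable)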

1.0000000000000009
[0.4375     0.38541667 0.17708333]


These are the same probabilities that we obtained by taking high powers of $A$.
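
As a cross-check that avoids the eigendecomposition altogether, the stable distribution can also be obtained by solving the stationarity condition together with the requirement that the probabilities sum to one. This is a minimal sketch of that alternative, not the method used above; the names `M`, `b`, and `pi` are introduced here for illustration.

```python
import numpy as np

A = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.8, 0.1, 0.1]])
n = A.shape[0]

# Stationarity: (A^T - I) pi = 0, plus the constraint sum(pi) = 1.
# Stack both into one (n+1) x n system and solve it by least squares.
M = np.vstack([A.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0

pi, residuals, rank, sv = np.linalg.lstsq(M, b, rcond=None)
print(pi)

# Applying the transition matrix should leave pi unchanged
print(np.dot(pi, A))
```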

Now let's use the transition matrix $A$ to generate some data.  We can initialize the sequence with our observation of today's weather.

In [14]:
states = ['Sunny','Rainy','Snowy']
W_i = 'Sunny'
weather_log = [W_i]

Now, we can simply loop over the number of days that we want to predict, and draw randomly based on our probability table.

In [15]:
for i in range(50):
    W_i = np.random.choice(states,p=A[states.index(W_i)])
    weather_log.append(W_i)

In [16]:
print(weather_log)

['Sunny', 'Rainy', 'Snowy', 'Sunny', 'Sunny', 'Snowy', 'Sunny', 'Sunny', 'Sunny', 'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Sunny', 'Rainy', 'Rainy', 'Rainy', 'Sunny', 'Sunny', 'Rainy', 'Rainy', 'Rainy', 'Snowy', 'Sunny', 'Sunny', 'Rainy', 'Snowy', 'Sunny', 'Rainy', 'Rainy', 'Rainy', 'Sunny', 'Sunny', 'Snowy', 'Sunny', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Rainy', 'Snowy', 'Sunny', 'Rainy', 'Rainy', 'Snowy', 'Sunny', 'Sunny', 'Snowy', 'Sunny', 'Sunny', 'Rainy']


A quick glance at this data shows that a snowy day is almost invariably followed by a sunny one (as expected from the transition probability of 0.8), and so on.  Thus the Markov model lets us generate random sequences in which each state depends explicitly on the one before it.
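
Going the other way, a conditional probability table like $A$ can be estimated from a long record by counting how often each state follows each other state. The sketch below simulates such a record and recovers the table from it; the sequence length, the random seed, and the names `counts` and `A_hat` are assumptions made for illustration.

```python
import numpy as np

states = ['Sunny', 'Rainy', 'Snowy']
A = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.8, 0.1, 0.1]])

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Simulate a long sequence of state indices, starting from Sunny (index 0)
n_days = 20000
sequence = [0]
for _ in range(n_days):
    sequence.append(rng.choice(3, p=A[sequence[-1]]))

# Count transitions: counts[k, j] = number of times state j followed state k
counts = np.zeros((3, 3))
for today, tomorrow in zip(sequence[:-1], sequence[1:]):
    counts[today, tomorrow] += 1

# Normalize each row to turn counts into conditional probabilities
A_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(A_hat, 2))
```

With a record this long the estimate should land close to $A$; with only the 50 days generated above, the estimate would be much noisier, particularly for the rarer snowy days.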