# Clustering Weather model

Clustering is a Machine Learning technique that involves the grouping of data points. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. (https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68)

Basic Algorithm for K-Means:
* Step 1: Randomly pick K points to place K centroids
* Step 2: Assign all the data points to the centroids by distance. The closest centroid to a point is the one it is assigned to.
* Step 3: Average all the points belonging to each centroid to find the middle of those clusters (center of mass). Place the corresponding centroids into that position.
* Step 4: Reassign every point once again to the closest centroid.
* Step 5: Repeat steps 3-4 until no point changes which centroid it belongs to.
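
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the helper names (`squared_distance`, `closest_centroid`, `k_means`), the toy data, and the choice to seed the centroids with K randomly chosen data points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Step 1 (assumption: K randomly chosen data points act as initial centroids).
    centroids = rng.sample(points, k)
    # Step 2: assign every point to its closest centroid.
    labels = [closest_centroid(p, centroids) for p in points]
    for _ in range(max_iters):
        # Step 3: move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 4: reassign every point to the closest centroid.
        new_labels = [closest_centroid(p, centroids) for p in points]
        # Step 5: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Which cluster receives index 0 or 1 depends on the random initialisation, so only the grouping itself is meaningful, not the label values.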

In [15]:
import tensorflow as tf
import tensorflow_probability as tfp

# Weather Model
We will model a simple weather system and try to predict the temperature on each day given the following information.

1. Cold days are encoded by a 0 and hot days are encoded by a 1.
2. The first day in our sequence has a 20% chance of being cold (and an 80% chance of being hot).
3. A cold day has a 30% chance of being followed by a hot day.
4. A hot day has a 20% chance of being followed by a cold day.
5. On each day the temperature is normally distributed: mean 0 and standard deviation 5 on a cold day, mean 15 and standard deviation 10 on a hot day.

In [16]:
tfd = tfp.distributions  # making a shortcut for later on
initial_distribution = tfd.Categorical(probs=[0.2, 0.8])  # Refer to point 2 above
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],
                                                 [0.2, 0.8]])  # refer to points 3 and 4 above
observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.])  # refer to point 5 above

# the loc argument is the mean and the scale argument is the standard deviation
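
To build intuition for what these three distributions describe, here is a small simulation in plain Python that draws one week of hidden states and temperatures. It is only a sketch: the probabilities are copied from the cell above, while the constant names and the `simulate_week` helper are hypothetical and not part of TensorFlow Probability.

```python
import random

# Probabilities copied from the distributions above (index 0 = cold, 1 = hot).
INITIAL_PROBS = [0.2, 0.8]
TRANSITION_PROBS = [[0.7, 0.3],   # from a cold day: P(cold), P(hot)
                    [0.2, 0.8]]   # from a hot day:  P(cold), P(hot)
MEANS = [0.0, 15.0]
STD_DEVS = [5.0, 10.0]


def simulate_week(num_days=7, seed=None):
    """Draw one sequence of hidden states and observed temperatures."""
    rng = random.Random(seed)
    states, temperatures = [], []
    # Pick the first day's state from the initial distribution.
    state = rng.choices([0, 1], weights=INITIAL_PROBS)[0]
    for _ in range(num_days):
        states.append(state)
        # The observed temperature depends only on the current hidden state.
        temperatures.append(rng.gauss(MEANS[state], STD_DEVS[state]))
        # The next state depends only on the current state.
        state = rng.choices([0, 1], weights=TRANSITION_PROBS[state])[0]
    return states, temperatures


states, temperatures = simulate_week(seed=42)
print(states)
print([round(t, 1) for t in temperatures])
```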

Hidden Markov model distribution (inherits from Distribution). Constructor signature:

tfp.distributions.HiddenMarkovModel(
    initial_distribution,
    transition_distribution,
    observation_distribution,
    num_steps,
    validate_args=False,
    allow_nan_stats=True,
    time_varying_transition_distribution=False,
    time_varying_observation_distribution=False,
    mask=None,
    name='HiddenMarkovModel'
)

In [17]:
model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7)

The number of steps is the number of days we want to make predictions for. In this case we've chosen 7, an entire week.

To get the expected temperatures on each day we can do the following.

In [18]:
mean = model.mean()

# TensorFlow 2.x runs eagerly by default, so the value of this tensor can be
# read directly with .numpy(); wrapping the call in tf.compat.v1.Session() is not needed
print(mean.numpy())

[12.       10.5       9.75      9.375001  9.1875    9.093751  9.046875]
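
These values can be checked by hand. The expected temperature on a given day is the probability of each state on that day multiplied by that state's mean, and the state probabilities are advanced one day at a time with the transition matrix. The sketch below does this in plain Python, using the probabilities from the code cells above; the `expected_temperatures` helper is a hypothetical name introduced for illustration.

```python
initial_probs = [0.2, 0.8]            # P(cold), P(hot) on day 1
transition_probs = [[0.7, 0.3],       # from cold: P(cold), P(hot)
                    [0.2, 0.8]]       # from hot:  P(cold), P(hot)
state_means = [0.0, 15.0]             # mean temperature on cold and hot days


def expected_temperatures(num_steps):
    """Expected temperature per day from the marginal state probabilities."""
    probs = list(initial_probs)
    means = []
    for _ in range(num_steps):
        # E[temperature] = sum over states of P(state) * mean(state)
        means.append(sum(p * m for p, m in zip(probs, state_means)))
        # Advance one day: P_next(j) = sum_i P(i) * T[i][j]
        probs = [sum(probs[i] * transition_probs[i][j] for i in range(2))
                 for j in range(2)]
    return means


hand_computed = expected_temperatures(7)
print(hand_computed)
```

Up to floating-point rounding, the result agrees with the output of `model.mean()` shown above.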



Expected temperature on each day (rounded to two decimals):
1. 12
2. 10.5
3. 9.75
4. 9.38
5. 9.19
6. 9.09
7. 9.05
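
The expected temperatures settle toward 9 because the chain of hot and cold days approaches its stationary distribution. For a two-state chain this has a simple closed form, sketched below in plain Python with the transition probabilities from the model above; the variable names are chosen only for this illustration.

```python
transition_probs = [[0.7, 0.3],   # from cold: P(cold), P(hot)
                    [0.2, 0.8]]   # from hot:  P(cold), P(hot)
state_means = [0.0, 15.0]

p_cold_to_hot = transition_probs[0][1]
p_hot_to_cold = transition_probs[1][0]

# In the long run, the flow from cold to hot must balance the flow from hot to cold:
# P(cold) * p_cold_to_hot == P(hot) * p_hot_to_cold
stationary_hot = p_cold_to_hot / (p_cold_to_hot + p_hot_to_cold)
stationary_cold = 1.0 - stationary_hot

long_run_mean = stationary_cold * state_means[0] + stationary_hot * state_means[1]
print(stationary_cold, stationary_hot, long_run_mean)
```

In the long run 60% of days are hot, which gives an expected temperature of about 9, the value the daily means approach.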