In [1]:
!git clone https://github.com/Ignas12345/Pattern-Recognition.git
!mv Pattern-Recognition/* ./

Cloning into 'Pattern-Recognition'...
remote: Enumerating objects: 49, done.[K
remote: Counting objects: 100% (49/49), done.[K
remote: Compressing objects: 100% (49/49), done.[K
remote: Total 49 (delta 19), reused 0 (delta 0), pack-reused 0[K
Receiving objects: 100% (49/49), 670.62 KiB | 5.04 MiB/s, done.
Resolving deltas: 100% (19/19), done.


Here is the main training loop. After each iteration, the mean and standard deviation of each Gaussian (one per state: standing, walking and running) are printed, together with the transition matrix.
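The `HMM` class comes from the cloned repository, and its `train` method is not shown in this notebook. To illustrate what one EM (Baum-Welch) iteration does for a single 1-D feature such as absolute acceleration, here is a minimal NumPy sketch. It assumes scaled forward-backward recursions and one Gaussian output distribution per state; the functions `gauss_pdf` and `baum_welch_step` are hypothetical names, not part of the repository, and the sketch also re-estimates `q`, which `HMM.train` may or may not do.

```python
import numpy as np

def gauss_pdf(x, mean, std):
    """1-D Gaussian density N(x; mean, std**2), broadcasting over arrays."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (np.sqrt(2 * np.pi) * std)

def baum_welch_step(x, q, A, means, stds):
    """One EM (Baum-Welch) iteration for an HMM with 1-D Gaussian outputs (sketch).

    x: observations (T,); q: initial state probabilities (N,);
    A: transition matrix (N, N); means, stds: per-state parameters (N,).
    Returns updated (q, A, means, stds) and the log-likelihood of x
    under the parameters that were passed in.
    """
    x = np.asarray(x, dtype=float)
    q, A = np.asarray(q, dtype=float), np.asarray(A, dtype=float)
    means, stds = np.asarray(means, dtype=float), np.asarray(stds, dtype=float)
    T, N = len(x), len(q)
    # pX[j, t] = p(x_t | state j)
    pX = gauss_pdf(x[None, :], means[:, None], stds[:, None])

    # Scaled forward pass: each column of alpha sums to 1, c[t] holds the scale factor
    alpha = np.zeros((N, T))
    c = np.zeros(T)
    a = q * pX[:, 0]
    c[0] = a.sum()
    alpha[:, 0] = a / c[0]
    for t in range(1, T):
        a = (alpha[:, t - 1] @ A) * pX[:, t]
        c[t] = a.sum()
        alpha[:, t] = a / c[t]

    # Scaled backward pass, reusing the forward scale factors
    beta = np.ones((N, T))
    for t in range(T - 2, -1, -1):
        beta[:, t] = A @ (pX[:, t + 1] * beta[:, t + 1]) / c[t + 1]

    # gamma[j, t] = P(state j at time t | all observations)
    gamma = alpha * beta
    # Expected number of i -> j transitions, summed over time
    xi_sum = np.zeros((N, N))
    for t in range(T - 1):
        xi_sum += np.outer(alpha[:, t], pX[:, t + 1] * beta[:, t + 1]) * A / c[t + 1]

    # M-step: re-estimate q, A and the Gaussian parameters
    new_q = gamma[:, 0]
    new_A = xi_sum / xi_sum.sum(axis=1, keepdims=True)
    weight = gamma.sum(axis=1)
    new_means = gamma @ x / weight
    new_vars = (gamma * (x[None, :] - new_means[:, None]) ** 2).sum(axis=1) / weight
    return new_q, new_A, new_means, np.sqrt(new_vars), np.log(c).sum()
```

Calling `baum_welch_step` repeatedly, starting from the initial `q`, `A` and the per-activity means and standard deviations, mirrors the structure of the training run below: the returned `A`, means and standard deviations play the role of the printed `A-matrix`, `means` and `std` lines. Because this is EM, the returned log-likelihood should never decrease from one iteration to the next.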

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from GaussD import GaussD
import numpy as np
from MarkovChain import MarkovChain
from HMM import HMM

# This function sets up the initial parameters of the Gaussians. It uses the fact that we know which activity each training sequence came from. This label information is only used for initialisation and never at test time, so it does not leak into the evaluation.

def initialize_emissions(activity_data):
    standing_mean = activity_data[0]['Absolute acceleration (m/s^2)'].mean()
    standing_std = activity_data[0]['Absolute acceleration (m/s^2)'].std()

    walking_mean = activity_data[1]['Absolute acceleration (m/s^2)'].mean()
    walking_std = activity_data[1]['Absolute acceleration (m/s^2)'].std()

    running_mean = activity_data[2]['Absolute acceleration (m/s^2)'].mean()
    running_std = activity_data[2]['Absolute acceleration (m/s^2)'].std()

    standing_dist = [standing_mean, standing_std]
    walking_dist = [walking_mean, walking_std]
    running_dist = [running_mean, running_std]
    return standing_dist, walking_dist, running_dist

# Paths to the training and test data files.

file_path_running_train = 'train_sets/Acceleration_without_g_running_train_1.xls'
file_path_standing_train = 'train_sets/Acceleration_without_g_standing_train_1.xls'
file_path_walking_train = 'train_sets/Acceleration_without_g_walking_train_1.xls'

file_path_running_test = 'test_sets/Acceleration_without_g_running_test_1.xls'
file_path_standing_test = 'test_sets/Acceleration_without_g_standing_test_1.xls'
file_path_walking_test = 'test_sets/Acceleration_without_g_walking_test_1.xls'

# Load the data from Excel files
running_train = pd.read_excel(file_path_running_train, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])
standing_train = pd.read_excel(file_path_standing_train, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])
walking_train = pd.read_excel(file_path_walking_train, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])

running_test = pd.read_excel(file_path_running_test, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])
standing_test = pd.read_excel(file_path_standing_test, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])
walking_test = pd.read_excel(file_path_walking_test, engine='xlrd', usecols=["Absolute acceleration (m/s^2)"])

# Set up initial guesses for the transition matrix A and the initial state distribution q.

q = np.array([1/3, 1/3, 1/3])
A = np.array([
    [0.9, 0.05, 0.05],
    [0.05, 0.9, 0.05],
    [0.05, 0.05, 0.9]
])

# Set up the Markov chain and the initial Gaussian output distributions, then combine them into the HMM.

chain = MarkovChain(q, A)
training_data = [standing_train, walking_train, running_train]
initial_standing_distribution, initial_walking_distribution, initial_running_distribution = initialize_emissions(training_data)
standing_distribution = GaussD(means=[initial_standing_distribution[0]], stdevs=[initial_standing_distribution[1]])
walking_distribution = GaussD(means=[initial_walking_distribution[0]], stdevs=[initial_walking_distribution[1]])
running_distribution = GaussD(means=[initial_running_distribution[0]], stdevs=[initial_running_distribution[1]])
h = HMM(chain, [standing_distribution, walking_distribution, running_distribution])

combined_training_array = np.concatenate([
    standing_train['Absolute acceleration (m/s^2)'].values,
    walking_train['Absolute acceleration (m/s^2)'].values,
    running_train['Absolute acceleration (m/s^2)'].values
])

standing_testdata = standing_test['Absolute acceleration (m/s^2)'].values
walking_testdata = walking_test['Absolute acceleration (m/s^2)'].values
running_testdata = running_test['Absolute acceleration (m/s^2)'].values

# Compute the likelihood of each training observation under each state's Gaussian: pX[j, t] = p(x_t | state j).
nStates = 3
nSamples = len(combined_training_array)
pX = np.zeros((nStates, nSamples))
scale_factors = np.zeros(nSamples)
for t in range(nSamples):
    for j, g in enumerate([standing_distribution, walking_distribution, running_distribution]):
        pX[j, t] = g.prob(combined_training_array[t])
# Run the EM (Baum-Welch) training loop. In each iteration the forward and backward algorithms compute
# the probability of being in each state at each observation. The EM update formulas from the book then
# re-estimate the transition matrix A and the output model B, i.e. the mean and standard deviation of each state's Gaussian.
# This is repeated for a few iterations.
h.train(combined_training_array, pX)
print('training is done!')

A-matrix [[9.79218751e-01 2.06223540e-02 1.58894505e-04]
 [3.04670267e-02 9.60018057e-01 9.51491634e-03]
 [2.02903750e-04 7.71061540e-03 9.92086481e-01]]
means 0.507439316703383
std 0.39473294736306214
means 2.5148214185400715
std 1.0231641662243183
means 14.873604845369034
std 8.34156310499677
Iteration 1 complete
A-matrix [[9.79530232e-01 2.04696824e-02 8.59619331e-08]
 [2.75514743e-02 9.66972232e-01 5.47629372e-03]
 [1.44547564e-07 4.44069857e-03 9.95559157e-01]]
means 0.4622175385520439
std 0.34585968705397474
means 2.3820458871341557
std 0.9462422232704037
means 14.672869625355364
std 8.373573055700525
Iteration 2 complete
A-matrix [[9.79049955e-01 2.09500446e-02 3.40147988e-11]
 [2.53843811e-02 9.69829438e-01 4.78618122e-03]
 [7.86265950e-11 4.04531456e-03 9.95954685e-01]]
means 0.42623153662186236
std 0.3112969045448223
means 2.2863239232539736
std 0.9400434384486926
means 14.586845446570683
std 8.390933239512211
Iteration 3 complete
A-matrix [[9.79783657e-01 2.02163427e-02 1.88
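
The training cell fills `pX` with a double Python loop, calling `GaussD.prob` once per sample and state, and it allocates `scale_factors` without ever filling them. Assuming each state's output is a 1-D Gaussian with the printed mean and standard deviation, the same matrix can be built with one broadcast NumPy expression. The sketch below works in the log domain and divides each column by its largest entry so that very small likelihoods do not underflow; this scaling scheme is an assumption about how such factors could be used, not what the repository's `GaussD` or `HMM.train` does, and `scaled_emission_likelihoods` is a hypothetical name.

```python
import numpy as np

def scaled_emission_likelihoods(x, means, stds):
    """Vectorised, underflow-safe version of the pX loop for 1-D Gaussian states.

    Returns (pX_scaled, log_scale) where pX_scaled[j, t] * exp(log_scale[t])
    equals the Gaussian density of x[t] under state j.
    """
    x = np.asarray(x, dtype=float)[None, :]          # shape (1, T)
    means = np.asarray(means, dtype=float)[:, None]  # shape (N, 1)
    stds = np.asarray(stds, dtype=float)[:, None]    # shape (N, 1)
    # Log-density of every sample under every state, shape (N, T)
    log_p = -0.5 * ((x - means) / stds) ** 2 - np.log(stds) - 0.5 * np.log(2 * np.pi)
    # Subtract each column's maximum so the largest entry per column is exactly 1
    log_scale = log_p.max(axis=0)
    return np.exp(log_p - log_scale), log_scale
```

Because the scale factor at time t is shared by all states, it cancels in the state posteriors computed by the forward-backward algorithm; it only has to be added back, as a sum of `log_scale` values, when the total log-likelihood of the sequence is needed.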

In the following cells the Viterbi algorithm is used with the learned matrices and parameters to find the most likely state sequence for each test sequence. The state that occurs most often in that sequence is then used to classify it as standing, walking or running.
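
`HMM.viterbi` and `HMM.classify_sequence` are defined in the repository and not shown here. A minimal log-domain sketch of the same two steps follows, assuming the state order standing, walking, running used in the training cell and per-state log-likelihoods `log_pX` that have already been computed. `viterbi_log` and `classify_by_majority` are hypothetical names, not the repository's API.

```python
import numpy as np

def viterbi_log(q, A, log_pX):
    """Most likely state path for an HMM, computed with log probabilities.

    q: initial state probabilities (N,); A: transition matrix (N, N);
    log_pX: (N, T) array with log p(x_t | state j).
    """
    N, T = log_pX.shape
    with np.errstate(divide="ignore"):  # log(0) -> -inf is acceptable here
        log_q, log_A = np.log(q), np.log(A)
    delta = log_q + log_pX[:, 0]          # best log-score ending in each state
    back = np.zeros((N, T), dtype=int)    # back[j, t]: best predecessor of state j at t
    for t in range(1, T):
        scores = delta[:, None] + log_A   # scores[i, j]: best path ending in i, then i -> j
        back[:, t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_pX[:, t]
    # Backtrack from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path

def classify_by_majority(path, labels=("standing", "walking", "running")):
    """Label a sequence by the state that occurs most often in its Viterbi path."""
    counts = np.bincount(path, minlength=len(labels))
    return labels[int(counts.argmax())]
```

Working with log probabilities avoids the underflow that products of many likelihoods would cause on long test sequences. In this sketch a tie in the majority vote is resolved toward the state with the lower index, because `argmax` returns the first maximum.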

In [2]:
seq = h.viterbi(standing_testdata)
h.classify_sequence(seq)

The sequence is classified as standing!


In [5]:
seq = h.viterbi(running_testdata)
h.classify_sequence(seq)

The sequence is classified as running!


In [6]:
seq = h.viterbi(walking_testdata)
h.classify_sequence(seq)

The sequence is classified as walking!
