# Kalman Filter for Air Quality
See also 'Optimum Linear Estimation' at https://www.sciencedirect.com/topics/social-sciences/kalman-filter

- Replaces a batch multiple linear regression (MLR) run: the filter runs recursively on an Arduino ATmega1284P as an adaptive filter for the MLR regression coefficients
- The Kalman filter is derived from https://github.com/zziz/kalman-filter
- For the theory, please read https://en.wikipedia.org/wiki/Kalman_filter
- The key is to set the process noise covariance matrix Q to a zero matrix! See the hint in 'Optimum Linear Estimation' at https://www.sciencedirect.com/topics/social-sciences/kalman-filter.
- A big thank you to those who provided these great foundational contributions
- To run the notebook, click the button 'Re-start the kernel, and then re-run the whole notebook' in the toolbar above
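Because the raw gas resistance is the only observation, the innovation covariance S in `update()` is a 1×1 matrix, so its inverse is a plain division. That keeps one filter step cheap enough for a small microcontroller. The sketch below is a hypothetical plain-Python port of a single update step (it is not the author's Arduino code, which is not shown here); `scalar_update` is an illustrative name, and it uses the simple covariance update P = (I - K h) P instead of the Joseph form of the notebook's `KalmanFilter` class.

```python
def scalar_update(x, P, h, z, r):
    """One Kalman update for a scalar observation z = h . x + noise.

    x: state estimate (list of n floats)
    P: state covariance (n x n nested lists, symmetric)
    h: observation row (list of n floats), e.g. [1, T, aH, 1]
    r: variance of the observation noise
    Only loops, multiplications and a single division are needed.
    """
    n = len(x)
    # P h^T (column vector of length n)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    # innovation covariance S = h P h^T + r is a scalar
    S = sum(h[i] * Ph[i] for i in range(n)) + r
    # Kalman gain K = P h^T / S: a division replaces np.linalg.inv(S)
    K = [Ph[i] / S for i in range(n)]
    # innovation y = z - h x
    y = z - sum(h[i] * x[i] for i in range(n))
    x_new = [x[i] + K[i] * y for i in range(n)]
    # simple covariance update P = (I - K h) P = P - K (P h^T)^T, valid because P is symmetric
    P_new = [[P[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return x_new, P_new


# First filter step in this notebook's setting: x = 0, P = I, R = 1e-8, H = [1, T, aH, 1]
x0 = [0.0] * 4
P0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x1, P1 = scalar_update(x0, P0, [1.0, 24.6, 6.504326360105633, 1.0], 177160.0, 1e-8)
print(x1[1], x1[2])  # temperature and humidity coefficients after the first update
```

With the first sample of the dataset as input, the two printed coefficients agree with the values the notebook's filter reports after its first step.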

## Prerequisites (only necessary if you use interactive graphs with '%matplotlib widget', see below)

- On Linux, the following software installations are necessary to get interactive matplotlib graphs working.
- For other operating systems, please search online for appropriate solutions.
- Install Anaconda and pip if not yet installed.
- Exit JupyterLab and shut down the JupyterLab notebook server.
- Then run the following commands from the Linux command line:
> pip install ipympl<br/>
> conda install -y nodejs<br/>
> pip install --upgrade jupyterlab<br/>
> jupyter labextension list<br/>
> jupyter lab clean --all<br/>
> jupyter labextension install @jupyter-widgets/jupyterlab-manager<br/>
> jupyter labextension install jupyter-matplotlib<br/>
> jupyter nbextension enable --py widgetsnbextension<br/>
> jupyter lab build<br/>
> jupyter labextension list<br/>
- Then restart JupyterLab.
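Before switching to `%matplotlib widget`, a quick check can tell whether the ipympl backend is importable in the running kernel. A minimal sketch using only the standard library; the helper name `ipympl_available` is hypothetical:

```python
import importlib.util


def ipympl_available() -> bool:
    """Return True if the ipympl package (the backend behind '%matplotlib widget') can be imported."""
    return importlib.util.find_spec("ipympl") is not None


if not ipympl_available():
    print("ipympl not found: install it as described above, or use '%matplotlib inline' instead")
```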



## Basic Kalman Filter class from https://github.com/zziz/kalman-filter:

In [1]:
import numpy as np

class KalmanFilter(object):
    def __init__(self, F=None, B=None, H=None, Q=None, R=None, P=None, x0=None):

        if F is None or H is None:
            raise ValueError("Set proper system dynamics.")

        self.n = F.shape[1]   # dimension of the state vector
        self.m = H.shape[0]   # dimension of the observation vector

        self.F = F
        self.H = H
        self.B = 0 if B is None else B
        self.Q = np.eye(self.n) if Q is None else Q
        self.R = np.eye(self.m) if R is None else R
        self.P = np.eye(self.n) if P is None else P
        self.x = np.zeros((self.n, 1)) if x0 is None else x0

    def predict(self, u=0):
        self.x = np.dot(self.F, self.x) + np.dot(self.B, u)             # Predicted (a priori) state estimate
        self.P = np.dot(np.dot(self.F, self.P), self.F.T) + self.Q      # Predicted (a priori) estimate covariance
        return self.x

    def update(self, z):
        y = z - np.dot(self.H, self.x)                                  # Innovation or measurement pre-fit residual
        S = self.R + np.dot(self.H, np.dot(self.P, self.H.T))           # Innovation (or pre-fit residual) covariance
        K = np.dot(np.dot(self.P, self.H.T), np.linalg.inv(S))          # Optimal Kalman gain
        self.x = self.x + np.dot(K, y)                                  # Updated (a posteriori) state estimate
        I = np.eye(self.n)
        # Updated (a posteriori) estimate covariance (Joseph form, numerically stable)
        self.P = np.dot(np.dot(I - np.dot(K, self.H), self.P), (I - np.dot(K, self.H)).T) + np.dot(np.dot(K, self.R), K.T)
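To see why Q = 0 turns this filter into a least-squares estimator, consider the simplest case of the class above: a scalar constant state (F = 1, H = 1, Q = 0). With prior mean 0 and a large initial variance p0, the estimate after n measurements is sum(z) / (n + r/p0), which is essentially the sample mean, i.e. the least-squares estimate of a constant. A minimal sketch; `estimate_constant` and its parameter values are hypothetical:

```python
def estimate_constant(measurements, r=1.0, p0=1e6):
    """Scalar Kalman filter for a constant state: F = 1, H = 1, Q = 0, prior mean 0."""
    x, p = 0.0, p0
    for z in measurements:
        # predict step: x = F x and p = F p F + Q leave x and p unchanged for F = 1, Q = 0
        k = p / (p + r)       # Kalman gain
        x = x + k * (z - x)   # a posteriori estimate
        p = (1.0 - k) * p     # a posteriori variance
    return x


data = [2.0, 4.0, 9.0]
print(estimate_constant(data))                # close to the sample mean 5.0
print(sum(data) / (len(data) + 1.0 / 1e6))    # closed form sum(z) / (n + r/p0)
```

Without process noise the filter never "forgets" old samples, so every measurement keeps the same weight, exactly as in a batch fit.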

## Read historian.csv (the same input file that 'Multiple linear regression for BME680 gas readings of a single sensor.ipynb' uses)

In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd
import numpy as np

# The historian export uses German number formats (thousands ".", decimal ",")
df0 = pd.read_csv("historian.csv", sep=';', thousands=".", decimal=",", skiprows=[0, 1, 2], header=None,
                  encoding='unicode_escape',
                  names=['Datum', 'Mode', 'raw_gas_resistance', 'relative_humidity', 'temperature'])
# pd.datetime no longer exists in pandas 2.x, so parse the timestamps with pd.to_datetime
df0['Datum'] = pd.to_datetime(df0['Datum'], format='%d.%m.%Y %H:%M:%S,%f')

# show the first 10 rows of the pandas dataframe
df0.head(10)


| | Datum | Mode | raw_gas_resistance | relative_humidity | temperature |
|---:|---|---:|---:|---:|---:|
| 0 | 2021-01-15 19:00:00.000 | 2 | 177160 | 28.9 | 24.6 |
| 1 | 2021-01-15 19:00:56.674 | 2 | 177160 | 28.9 | 24.6 |
| 2 | 2021-01-15 19:00:56.683 | 2 | 177160 | 29.6 | 24.6 |
| 3 | 2021-01-15 19:00:56.688 | 2 | 174900 | 29.6 | 24.6 |
| 4 | 2021-01-15 19:05:29.488 | 2 | 174900 | 29.6 | 24.6 |
| 5 | 2021-01-15 19:05:29.498 | 2 | 174900 | 30.1 | 24.6 |
| 6 | 2021-01-15 19:05:29.504 | 2 | 175280 | 30.1 | 24.6 |
| 7 | 2021-01-15 19:10:02.302 | 2 | 175280 | 30.1 | 24.6 |
| 8 | 2021-01-15 19:10:02.306 | 2 | 175280 | 29.7 | 24.6 |
| 9 | 2021-01-15 19:10:02.312 | 2 | 175860 | 29.7 | 24.6 |
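The `read_csv` options above handle the German number format of the historian export. A minimal sketch on two hypothetical lines in that format (the real file, including its three skipped header rows, is not reproduced here) shows what `thousands`, `decimal` and the timestamp format do:

```python
import io

import pandas as pd

# Hypothetical sample lines: ";" separator, "." as thousands separator, "," as decimal mark
sample = (
    "15.01.2021 19:00:00,000;2;177.160;28,9;24,6\n"
    "15.01.2021 19:00:56,674;2;177.160;28,9;24,6\n"
)

df_sample = pd.read_csv(
    io.StringIO(sample), sep=";", thousands=".", decimal=",", header=None,
    names=["Datum", "Mode", "raw_gas_resistance", "relative_humidity", "temperature"],
)
# parse the timestamps explicitly; ",%f" reads the milliseconds after the comma
df_sample["Datum"] = pd.to_datetime(df_sample["Datum"], format="%d.%m.%Y %H:%M:%S,%f")
print(df_sample)
```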


In [3]:
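# keep every 3rd row: the CCU historian records every change of a datapoint as a separate row,
# so rows 3, 6, 9, ... hold the complete sample after its three datapoint updates
df = df0[(df0.index % 3 == 0)].copy()   # .copy() avoids pandas' SettingWithCopyWarning later on

df.head(7)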

| | Datum | Mode | raw_gas_resistance | relative_humidity | temperature |
|---:|---|---:|---:|---:|---:|
| 0 | 2021-01-15 19:00:00.000 | 2 | 177160 | 28.9 | 24.6 |
| 3 | 2021-01-15 19:00:56.688 | 2 | 174900 | 29.6 | 24.6 |
| 6 | 2021-01-15 19:05:29.504 | 2 | 175280 | 30.1 | 24.6 |
| 9 | 2021-01-15 19:10:02.312 | 2 | 175860 | 29.7 | 24.6 |
| 12 | 2021-01-15 19:14:35.120 | 2 | 173780 | 29.9 | 24.6 |
| 15 | 2021-01-15 19:19:07.932 | 2 | 170320 | 30.3 | 24.6 |
| 18 | 2021-01-15 19:23:40.751 | 2 | 167060 | 30.7 | 24.6 |
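Selecting every third row relies on each sample producing exactly three historian rows, always in the same order. A hedged alternative, assuming that all datapoint changes of one sample fall within the same second (true for the rows shown above, but not guaranteed if a sample straddles a second boundary), groups the rows by their timestamp floored to whole seconds and keeps the last, complete row of each group. The toy data are the first raw rows listed earlier:

```python
import pandas as pd

# First raw historian rows as listed earlier (Mode column omitted)
raw = pd.DataFrame({
    "Datum": pd.to_datetime([
        "2021-01-15 19:00:00.000", "2021-01-15 19:00:56.674", "2021-01-15 19:00:56.683",
        "2021-01-15 19:00:56.688", "2021-01-15 19:05:29.488", "2021-01-15 19:05:29.498",
        "2021-01-15 19:05:29.504",
    ]),
    "raw_gas_resistance": [177160, 177160, 177160, 174900, 174900, 174900, 175280],
    "relative_humidity": [28.9, 28.9, 29.6, 29.6, 29.6, 30.1, 30.1],
    "temperature": [24.6] * 7,
})

# Assumption: all datapoint changes of one sample share the same whole second
second = raw["Datum"].dt.floor("s").rename("second")
# keep the last (complete) row of every one-second group
samples = raw.groupby(second).last().reset_index(drop=True)
print(samples)
```

On these rows the result matches the rows 0, 3 and 6 kept by the `index % 3 == 0` selection.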


Print the values of the pandas dataframe. Please cross-check that they meet your expectations!

In [4]:
df.values

array([[Timestamp('2021-01-15 19:00:00'), 2, 177160, 28.9, 24.6],
       [Timestamp('2021-01-15 19:00:56.688000'), 2, 174900, 29.6, 24.6],
       [Timestamp('2021-01-15 19:05:29.504000'), 2, 175280, 30.1, 24.6],
       ...,
       [Timestamp('2021-01-19 16:41:43.161000'), 2, 112700, 38.5, 24.9],
       [Timestamp('2021-01-19 16:46:15.860000'), 2, 112860, 38.5, 24.9],
       [Timestamp('2021-01-19 16:50:48.569000'), 2, 113640, 38.6, 24.9]],
      dtype=object)

In [5]:
df.head(-1)

| | Datum | Mode | raw_gas_resistance | relative_humidity | temperature |
|---:|---|---:|---:|---:|---:|
| 0 | 2021-01-15 19:00:00.000 | 2 | 177160 | 28.9 | 24.6 |
| 3 | 2021-01-15 19:00:56.688 | 2 | 174900 | 29.6 | 24.6 |
| 6 | 2021-01-15 19:05:29.504 | 2 | 175280 | 30.1 | 24.6 |
| 9 | 2021-01-15 19:10:02.312 | 2 | 175860 | 29.7 | 24.6 |
| 12 | 2021-01-15 19:14:35.120 | 2 | 173780 | 29.9 | 24.6 |
| ... | ... | ... | ... | ... | ... |
| 3660 | 2021-01-19 16:28:04.985 | 2 | 111520 | 38.4 | 24.9 |
| 3663 | 2021-01-19 16:32:37.689 | 2 | 112140 | 38.5 | 24.9 |
| 3666 | 2021-01-19 16:37:10.407 | 2 | 112460 | 38.4 | 24.9 |
| 3669 | 2021-01-19 16:41:43.161 | 2 | 112700 | 38.5 | 24.9 |


## Formulas for calculating the absolute humidity

In [6]:
import numpy as np
# Functions that calculate the absolute humidity from the two arguments 'temperature' and 'relative humidity'
# For details see https://www.kompf.de/weather/vent.html; use https://rechneronline.de/barometer/luftfeuchtigkeit.php to cross-check the result

# Magnus formula constants
a = 6.112   # hPa
b = 17.67
c = 243.5   # °C

# Compute the saturated water vapor pressure in hPa
# Param t - temperature in °C
def svp(t):
    return a * np.exp((b * t) / (c + t))

# Compute the actual water vapor pressure in hPa
# Param rh - relative humidity in %
# Param t - temperature in °C
def vp(rh, t):
    return rh / 100. * svp(t)

# Compute the absolute humidity in g/m³
# Param t - temperature in °C
# Param rh - relative humidity in %
def calculate_absolute_humidity(t, rh):
    mw = 18.016  # kg/kmol (molecular weight of water vapor)
    rs = 8314.3  # J/(kmol*K) (universal gas constant)
    ah = 10**5 * mw / rs * vp(rh, t) / (t + 273.15)
    return ah    # absolute humidity in g/m³

# now apply the functions above to create the pandas dataframe column 'absolute_humidity'
df['absolute_humidity'] = calculate_absolute_humidity(df['temperature'], df['relative_humidity'])
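The factor 10^5 in `calculate_absolute_humidity` follows from the ideal gas law for water vapor combined with the Magnus-type formula and the constants a, b, c used above (T in °C, rH in %, M_w = 18.016 kg/kmol, R* = 8314.3 J/(kmol·K)). Converting e from hPa to Pa contributes a factor 100 and kg/m³ to g/m³ a factor 1000:

```latex
\begin{aligned}
e_s(T) &= a \exp\!\left(\frac{b\,T}{c + T}\right), \qquad e = \frac{rH}{100}\, e_s(T) \qquad [\mathrm{hPa}] \\
\rho_w &= \frac{M_w \cdot 100\, e}{R^{*}\,(T + 273.15)} \qquad [\mathrm{kg/m^3}] \\
AH &= 1000\, \rho_w = 10^{5}\, \frac{M_w}{R^{*}}\, \frac{e}{T + 273.15} \qquad [\mathrm{g/m^3}]
\end{aligned}
```

For T = 24.6 °C and rH = 28.9 % this gives about 6.50 g/m³, in line with the first row of the dataframe.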



## Plot the calculated absolute humidity

In [7]:
%matplotlib widget
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
from matplotlib.ticker import AutoMinorLocator

fig, ax = plt.subplots(figsize=(12, 12))
plt.xticks(rotation=60)
ax.xaxis.set_major_formatter(DateFormatter('%b %d %Y %H:%M'))
ax.xaxis.set_minor_locator(AutoMinorLocator())
# ax.plot handles datetime x-values directly (ax.plot_date is deprecated in recent matplotlib versions)
ax.plot(df['Datum'], df['absolute_humidity'], linestyle='solid', color='brown')
plt.title('absolute humidity [g/m³]', fontsize=18)
plt.xlabel('time', fontsize=14)
plt.ylabel('absolute humidity [g/m³]', fontsize=14)
plt.grid(True)
plt.show()


Print the first 5 rows of the pandas dataframe again and check that a column for the absolute humidity has been added.

In [8]:
df.head()

| | Datum | Mode | raw_gas_resistance | relative_humidity | temperature | absolute_humidity |
|---:|---|---:|---:|---:|---:|---:|
| 0 | 2021-01-15 19:00:00.000 | 2 | 177160 | 28.9 | 24.6 | 6.504326 |
| 3 | 2021-01-15 19:00:56.688 | 2 | 174900 | 29.6 | 24.6 | 6.661871 |
| 6 | 2021-01-15 19:05:29.504 | 2 | 175280 | 30.1 | 24.6 | 6.774402 |
| 9 | 2021-01-15 19:10:02.312 | 2 | 175860 | 29.7 | 24.6 | 6.684377 |
| 12 | 2021-01-15 19:14:35.120 | 2 | 173780 | 29.9 | 24.6 | 6.72939 |


Create a subset of the measurement data with the columns 'raw_gas_resistance', 'temperature' and 'absolute_humidity'

In [9]:
my_observations = df[['raw_gas_resistance','temperature','absolute_humidity']] 
my_observations.head()

| | raw_gas_resistance | temperature | absolute_humidity |
|---:|---:|---:|---:|
| 0 | 177160 | 24.6 | 6.504326 |
| 3 | 174900 | 24.6 | 6.661871 |
| 6 | 175280 | 24.6 | 6.774402 |
| 9 | 175860 | 24.6 | 6.684377 |
| 12 | 173780 | 24.6 | 6.72939 |


Convert the measurement subset to a list of rows for further processing

In [10]:
list_of_rows = [list(row) for row in my_observations.values]

Print the first four elements (rows) of the list of lists

In [11]:
print(list_of_rows[:4])

[[177160.0, 24.6, 6.504326360105633], [174900.0, 24.6, 6.661870597201618], [175280.0, 24.6, 6.77440219512732], [175860.0, 24.6, 6.684376916786758]]


Convert the selection of measurements to a numpy array

In [12]:
measurements = np.array(list_of_rows)
print("number of measurement datapoints = ", len(measurements))

number of measurement datapoints =  1226


## Set the parameters of the Kalman filter

- A Kalman filter with a zero process noise covariance matrix is well known as the recursive linear minimum mean-square error (LMMSE) estimator for a linear system, under some assumptions on the auto- and cross-correlations of the process noise, the measurement noise and the initial state. With F = I and Q = 0 the state is constant, so the filter reduces to a recursive least-squares (RLS) estimator of the regression coefficients.

- observation z                                :   [raw_gas_resistance]; dimension 1; note: 'temperature' and 'aH' are NOT part of the observation vector!
- system state vector x                        :   [VOC_resistance, alpha_temperature, beta_ah, delta_intercept]; dimension 4
- state transition matrix F                    :   identity matrix (4, 4)
- observation matrix H                         :   (1, 4); initially all ones, then set at every step from the current measurements to [1, T, aH, 1]
- covariance matrix of the process noise Q     :   zero matrix (4, 4)
- covariance matrix of the observation noise R :   (1, 1) matrix with a very small value (1e-8)
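One consequence of H = [1, T, aH, 1]: the VOC resistance x[0] and the intercept x[3] both enter with the factor 1, so the data can only determine their sum. This is why the final intercept is reported as x[0] + x[3] at the end of the notebook. A small numpy sketch with hypothetical temperature and humidity values:

```python
import numpy as np

# Observation rows H_k = [1, T_k, aH_k, 1] for a few hypothetical samples
T = np.array([24.6, 24.8, 25.1, 24.9])
aH = np.array([6.50, 6.66, 6.77, 6.68])
H_stack = np.column_stack([np.ones_like(T), T, aH, np.ones_like(T)])

# Columns 0 and 3 are identical, so at most 3 of the 4 state components can be resolved
print(np.linalg.matrix_rank(H_stack))  # 3

# Moving an amount from the VOC resistance to the intercept leaves every prediction H @ x unchanged
x = np.array([100000.0, 7000.0, -30000.0, 50000.0])
x_shift = x + np.array([1234.0, 0.0, 0.0, -1234.0])
print(np.allclose(H_stack @ x, H_stack @ x_shift))  # True
```

The filter still runs because its initial covariance P = I acts as a weak prior that fixes how the sum is split between the two components.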

In [13]:
F = np.eye(4)
H = np.array([ [1, 1, 1, 1] ]).reshape(1, 4)
# the key is to set Q to a zero matrix; in this case the Kalman filter works as a recursive (ordinary) least-squares estimator
Q = np.array([ [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0] ]).reshape(4, 4)
# set the covariance of the gas resistance measurements to a very small value as well
R = np.array([ [0.00000001] ]).reshape(1, 1)

print("\nF = ",F)  # the state-transition model;
print("\nInitial H = ",H)  # the observation model;
print("\nQ = ",Q)  # covariance of the process noise
print("\nR = ",R)  # covariance of the observation noise



F =  [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Initial H =  [[1 1 1 1]]

Q =  [[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]

R =  [[1.e-08]]
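Why does Q = 0 give a least-squares fit? With F = I and Q = 0 the state never changes, and after processing all samples the filter returns the regularized least-squares solution x = (HᵀH + R·P0⁻¹)⁻¹ Hᵀ z, where H stacks all observation rows, the prior mean is 0 and R is scalar. With P0 = I (the class default) and R = 1e-8 as set above, the regularization term is only 1e-8·I, so the filter essentially reproduces the batch least-squares fit. The sketch below checks this equivalence on synthetic data; the helper `kalman_q0`, the 3-state model and all data values are hypothetical:

```python
import numpy as np


def kalman_q0(H_rows, z, P0, R):
    """Kalman filter with F = I, Q = 0 and scalar observations; returns the final state."""
    n = H_rows.shape[1]
    x = np.zeros((n, 1))
    P = P0.copy()
    I = np.eye(n)
    for h, zk in zip(H_rows, z):
        h = h.reshape(1, n)
        S = h @ P @ h.T + R                     # 1x1 innovation covariance
        K = P @ h.T / S                         # Kalman gain (n x 1)
        x = x + K * (zk - h @ x)                # a posteriori state
        P = (I - K @ h) @ P @ (I - K @ h).T + R * (K @ K.T)   # Joseph form
    return x.ravel()


rng = np.random.default_rng(0)
N = 200
T = 20.0 + 5.0 * rng.random(N)
aH = 5.0 + 3.0 * rng.random(N)
H_rows = np.column_stack([np.ones(N), T, aH])
z = H_rows @ np.array([100.0, 7.0, -3.0]) + rng.normal(0.0, 0.5, N)

P0 = np.eye(3)
R = 0.25
x_kf = kalman_q0(H_rows, z, P0, R)
# closed-form (batch) solution of the same regularized least-squares problem
x_batch = np.linalg.solve(H_rows.T @ H_rows + R * np.linalg.inv(P0), H_rows.T @ z)
print(x_kf)
print(x_batch)
```

A larger R than in the notebook is used here only to make the regularization visible; the equivalence holds for any R > 0.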


## Initialize the Kalman filter

In [14]:
kf = KalmanFilter(F = F, H = H, Q = Q, R = R)
predictions = []
compensated_gas_resistance=[]
states=[]

#print("raw gas resistance measurements =", measurements[:,0])

print("dim measurements : ", measurements.shape)

last_index = len(measurements)

print ("last index of measurement array = ", last_index)



dim measurements :  (1226, 3)
last index of measurement array =  1226


## Run the Kalman filter

In [15]:
it = 0  # iteration index
for z in measurements:
    zg = z[0]  # raw_gas_resistance: the only observation variable
    # make the observation model matrix depend on the current measurements:
    # z[1]: measured temperature
    # z[2]: calculated absolute humidity aH(T, rH)
    H = np.array([[1, z[1], z[2], 1]]).reshape(1, 4)
    # estimated state vector x:
    # x[0]: estimated VOC resistance
    # x[1]: estimated regression coefficient for the temperature T dependency
    # x[2]: estimated regression coefficient for the absolute humidity aH dependency
    # x[3]: estimated intercept of the linear regression
    kf.H = H
    # with F = I and Q = 0, predict() leaves x and P unchanged, so one call per step is enough
    x_pred = kf.predict()
    it = it + 1
    if it == last_index:  # print results for the last sample of the measurement sequence
        print("\nIteration index = ", it)
        print("\nState vector kf.x = ", kf.x)
        print("\nObservation vector z = ", z)
        print("\nObservation matrix kf.H = ", kf.H)
        print("\nKalman filter state prediction = ", x_pred)
        print("\nPredicted gas resistance H*x = ", np.dot(H, x_pred))
        print("\n\n")
    predictions.append(np.dot(H, x_pred))
    # compensated gas resistance: remove the estimated temperature and humidity interference
    compensated_gas_resistance.append(zg - x_pred[1, 0]*z[1] - x_pred[2, 0]*z[2])
    print("\nraw gas resistance                 = ", zg)
    print("\ntemperature coefficient prediction = ", x_pred[1, 0])
    print("\ntemperature                        = ", z[1])
    print("\ntemperature compensation           = ", -x_pred[1, 0]*z[1])
    print("\nhumidity coefficient prediction    = ", x_pred[2, 0])
    print("\nabsolute humidity                  = ", z[2])
    print("\nhumidity compensation              = ", -x_pred[2, 0]*z[2])
    states.append(kf.x)
    kf.update(zg)  # only zg (raw_gas_resistance) is an observation variable!


raw gas resistance                 =  177160.0

temperature coefficient prediction =  0.0

temperature                        =  24.6

temperature compensation           =  -0.0

humidity coefficient prediction    =  0.0

absolute humidity                  =  6.504326360105633

humidity compensation              =  -0.0

raw gas resistance                 =  174900.0

temperature coefficient prediction =  6710.334714765191

temperature                        =  24.6

temperature compensation           =  -165074.2339832237

humidity coefficient prediction    =  1774.236055706469

absolute humidity                  =  6.661870597201618

humidity compensation              =  -11819.731012005897

raw gas resistance                 =  175280.0

temperature coefficient prediction =  10958.321227917626

temperature                        =  24.6

temperature compensation           =  -269574.7022067736

humidity coefficient prediction    =  -14345.163290546509

absolute humidity                  =  6.774402195
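Because this filter replaces a batch MLR run, a batch least-squares fit on the same `measurements` array (columns raw gas resistance, temperature, absolute humidity) is a useful cross-check of the coefficients the filter converges to. A minimal sketch; the helper `batch_mlr` and the synthetic values are hypothetical. Results on the real data should be close to, but not necessarily identical with, the filter's values, because the filter starts from x = 0 with P = I:

```python
import numpy as np


def batch_mlr(measurements):
    """Batch least-squares fit of R_raw = alpha*T + beta*aH + delta.

    measurements: array of shape (N, 3) with columns
    [raw_gas_resistance, temperature, absolute_humidity].
    Returns (alpha, beta, delta).
    """
    zg, T, aH = measurements[:, 0], measurements[:, 1], measurements[:, 2]
    A = np.column_stack([T, aH, np.ones_like(T)])   # design matrix
    coeffs, *_ = np.linalg.lstsq(A, zg, rcond=None)
    alpha, beta, delta = coeffs
    return alpha, beta, delta


# Synthetic, noise-free example with known coefficients (hypothetical values)
T = np.array([24.6, 24.8, 25.1, 24.9, 24.7])
aH = np.array([6.50, 6.66, 6.77, 6.68, 6.55])
zg = 165000.0 + 7600.0 * T - 30000.0 * aH
alpha, beta, delta = batch_mlr(np.column_stack([zg, T, aH]))
print(alpha, beta, delta)
```

In the notebook, `batch_mlr(measurements)` can be compared with the alpha, beta and x[0] + x[3] values the filter prints at the end.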

## Plot the results of the Kalman filter

### Plot measured gas resistance versus corrected gas resistance (compensation of temperature and humidity interference)

In [16]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 12))
ax.plot(range(len(measurements)), measurements[:,0], label = 'measured gas resistance R_raw')
ax.plot(range(len(compensated_gas_resistance)), compensated_gas_resistance, label = 'corrected Kalman filter prediction R_raw_corrected')
ax.set_xlabel('sample index')
ax.set_ylabel('gas resistance [Ohm]')
ax.legend()
plt.show()


## Plot alpha (temperature coefficient) and beta (aH coefficient) regression coefficients

In [17]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 12))
ax.plot(range(len(predictions)), np.array(states)[:,1], label = 'alpha (temperature coefficient)')
ax.plot(range(len(predictions)), np.array(states)[:,2], label = 'beta (aH coefficient)')
ax.legend()
plt.show()


## Regression results of the recursive linear minimum mean-square error (LMMSE) Kalman filter

In [18]:
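x_final = kf.predict()  # with F = I and Q = 0 this equals the final a posteriori estimate kf.x
print("\n\nLinear regression coefficient of temperature interference alpha_LMMSE      = %11.3f" % x_final[1, 0])
print("Linear regression coefficient of absolute humidity interference beta_LMMSE = %11.3f" % x_final[2, 0])
# x[0] (VOC resistance) and x[3] (intercept) both enter H with factor 1, so their sum is reported
print("Linear regression intercept delta_LMMSE                                    = %11.3f" % (x_final[0, 0] + x_final[3, 0]))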
print("\n\n")



Linear regression coefficient of temperature interference alpha_LMMSE      =    7601.788
Linear regression coefficient of absolute humidity interference beta_LMMSE =  -30282.263
Linear regression intercept delta_LMMSE                                    =  165457.980
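A sketch of applying the final regression coefficients to a single reading, using the same formula as `compensated_gas_resistance` in the filter loop. The helper `compensate` is hypothetical; when the linear model fits, the result approximates the VOC resistance plus intercept, x[0] + x[3]:

```python
def compensate(r_raw, temperature, abs_humidity, alpha, beta):
    """Remove the linear temperature and absolute humidity interference from a raw gas resistance.

    R_comp = R_raw - alpha * T - beta * aH
    """
    return r_raw - alpha * temperature - beta * abs_humidity


# Coefficients as printed above; the reading is the first sample of the dataset
alpha_lmmse = 7601.788
beta_lmmse = -30282.263
print(compensate(177160.0, 24.6, 6.504326, alpha_lmmse, beta_lmmse))
```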





# You are done! Congratulations!