# Report for 2D Project Physical World and Digital World

Cohort: 9

Team No.: 8

Members:

Chan Jun Wei - 1002920

Gabriel Chan Zheng Yong - 1002820

Lo An Guo - 1003142

Khairunnisa Bte Kunhimohamed N - 1002508

Yoo Fei Yi - 1003102

# Introduction

Contact thermometers cannot directly measure the temperature of their surroundings; instead, they measure their own temperature. Hence, when measuring the temperature of any object with a contact thermometer, heat must flow between the thermometer and the object until thermal equilibrium is approached. Only then will the temperature of the thermometer be sufficiently close to the object's actual temperature. The rate of heat flow determines the speed at which the thermometer approaches thermal equilibrium, and is in turn determined by the thermal conductance between the thermometer and the object. Conventional thermometers can take around 60 seconds to deliver an accurate reading.

However, the system of the thermometer and the object can, to a first-order approximation, be modelled by Newton's law of cooling. Applying Newton's law yields a first-order differential equation describing the evolution of the system over time. In other words, it is possible to find a relationship between the behaviour of the system over a shorter time period and the final temperature. This allows the final temperature to be predicted without having to wait for thermal equilibrium to be established.
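
For reference, the first-order model can be written out explicitly. The symbols are notation assumed for this sketch: $T(t)$ is the thermometer temperature, $T_0$ its temperature at the moment of immersion, $T_s$ the temperature of the object, and $\tau$ the time constant of the sensor.

$$
\frac{dT}{dt} = \frac{T_s - T}{\tau}
\qquad\Longrightarrow\qquad
T(t) = T_s + (T_0 - T_s)\,e^{-t/\tau}
$$

The final temperature $T_s$ appears as a parameter of the curve from $t = 0$ onwards, which is why readings taken well before equilibrium can, in principle, be used to estimate it.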

Linear regression is used to predict the final temperature and the average accuracy of the model was measured to be 97.4%.

# Description of Data from Experiment

## Data Collection

To obtain the dataset, a hollow styrofoam sphere was filled halfway with hot water (an estimated 2 litres; the sphere was not completely filled due to issues with leakage). This ensures that minimal heat is lost to the surroundings during the measurement session, and that the water is as close as possible to the ideal heat sink modelled in the physical analysis. A loop was used to continuously read the thermometer temperature and the associated time for 100 seconds, as fast as the thermometer allows (around once every 0.88 seconds). The measurements were repeated over a range of temperatures between 10 and 60 degrees Celsius until 12 sets of measurements were obtained. The reference cooking thermometer was confirmed in earlier tests to agree with the readings of the temperature sensor; therefore, the final reading of the temperature sensor is used as the final temperature.
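
The sampling loop described above might look roughly like the following. This is a minimal sketch, not the code used in the experiment: `collect_readings` and its `read_temperature` argument are hypothetical names, and `read_temperature` stands in for whatever blocking call returns one sensor reading.

```python
import time

def collect_readings(read_temperature, duration=100.0, clock=time.monotonic):
    """Poll the sensor as fast as it allows and return (elapsed_s, temperature) pairs.

    read_temperature is a hypothetical callable that blocks until one reading
    is available (around 0.88 s per call for the sensor in this report).
    """
    start = clock()
    readings = []
    while True:
        elapsed = clock() - start
        if elapsed > duration:
            break
        # The timestamp is taken just before the read is requested.
        readings.append((elapsed, read_temperature()))
    return readings
```

Passing the clock in as an argument is a design choice made here so that the loop can be exercised without real hardware.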


## Data Preparation

To put the data into a form suitable for training a regression model, the measurements were truncated such that the time at 0 seconds corresponds to the moment the thermometer is immersed in the water and its temperature starts changing. The final temperature for each measurement is set as the temperature attained at the end of the measurement session. This stage was performed manually. The Excel file is included.
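
The truncation was done by hand, but the same idea can be sketched in code. The following is only an illustration under stated assumptions: `truncate_at_rise` is a hypothetical helper, and the 0.1 degree threshold for deciding that the temperature has started to move is an arbitrary choice, not a value from the experiment.

```python
def truncate_at_rise(times, temps, threshold=0.1):
    """Drop readings taken before the temperature leaves its starting value.

    Returns (times, temps) with time re-zeroed at the last reading that was
    still at the baseline. threshold (deg C) is an assumed value.
    """
    baseline = temps[0]
    for i in range(1, len(temps)):
        # abs() so that both warming and cooling runs are handled.
        if abs(temps[i] - baseline) > threshold:
            start = i - 1
            t0 = times[start]
            return [t - t0 for t in times[start:]], list(temps[start:])
    # The temperature never moved: nothing usable in this run.
    return [], []
```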

## Data Format

The dataset consists of 12 sets of temperature-time measurements with final temperatures ranging from 10 to 60 degrees Celsius. For training the model, only the first 10 readings (corresponding to the first 8 seconds) of each set are used, since the model is expected to be predictive. The initial temperature is then subtracted from each measurement set. This is because the solution to the Newton's law equation expresses the temperature change as an exponential function of time. The final temperature can be reconstructed from the initial temperature and the predicted final temperature change. The initial temperature is therefore the first feature.

The truncated measurements are then fit to a quadratic function of time and the initial time derivative of the temperature change is obtained as the coefficient of the linear term. Newton's law states that the rate of heat transfer (and by extension the rate of change of the temperature) between objects is proportional to the temperature difference. Therefore the initial time derivative is expected to be a linear function of the final temperature change. This gives the second feature.
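
The link between the quadratic fit and the final temperature change can be made explicit. In the notation assumed here, $\Delta T(t)$ is the temperature change since immersion, $\Delta T_f$ is the final temperature change, and $\tau$ is the time constant of the sensor. Under the exponential model,

$$
\Delta T(t) = \Delta T_f\left(1 - e^{-t/\tau}\right)
= \frac{\Delta T_f}{\tau}\,t - \frac{\Delta T_f}{2\tau^{2}}\,t^{2} + O(t^{3})
$$

so the coefficient of the linear term is $\Delta T_f/\tau$, and for a fixed $\tau$ the final temperature change is $\tau$ times the initial gradient. A least-squares quadratic fitted over a finite window only approximates this coefficient, since the higher-order terms are not zero.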

![scatter.png](attachment:scatter.png)

The code for preprocessing the data is given below.



In [17]:
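from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

def preprocess(df):
    # Use only the first 9 rows: column 0 is time (s), column 1 is temperature (deg C).
    time = df.values[:9,0].reshape(-1,1); temp = df.values[:9,1].reshape(-1,1)
    delta_t = temp - temp[0][0]  # temperature change relative to the first reading
    # Fit delta_t as a quadratic in time; the linear coefficient estimates the initial gradient.
    poly_time = PolynomialFeatures(2,include_bias=False).fit_transform(time)
    maclaurin = linear_model.LinearRegression()
    maclaurin.fit(poly_time,delta_t)
    init_grad = maclaurin.coef_[0][0]; init_temp = temp[0][0]
    return [init_grad, init_temp]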


# Training Model

To train the model, the dataset was first preprocessed and split into training and testing sets in a 10:2 ratio. The model, implemented using a class, is trained on the extracted features and reference final temperatures. The train method of the class subtracts the initial temperatures from the final temperatures and uses the difference to train the internal scikit-learn linear regression model. The predict method of the class uses the input gradient feature to predict the final temperature change, then adds the initial temperature feature to the result to obtain the predicted final temperature. The class definition is given below.

In [18]:
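import numpy as np
from sklearn import linear_model

class tempmodel:
    def __init__(self):
        self.model = linear_model.LinearRegression()
        
    def train(self,feat_list,target_list):
        grad_list = feat_list[:,0].reshape(-1,1)
        # Train on the temperature change; a new array is built so the caller's targets are not modified.
        delta_list = np.asarray(target_list).reshape(-1,1) - feat_list[:,1].reshape(-1,1)
        self.model.fit(grad_list,delta_list)
        return self.model.coef_
        
    def predict(self,features):
        init_grad = features[0]; init_temp = features[1]
        # scikit-learn expects a 2D array of shape (n_samples, n_features).
        final_temp = self.model.predict(np.asarray([[init_grad]])) + init_temp
        return final_temp[0][0]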

# Verification and Accuracy

The testing set from the split described in the previous section is used to validate the model. As there are only 2 sets of temperature data in the testing set, the test is repeated 20 times with different random seeds and the accuracies of the 40 tests are averaged. The average accuracy was 97.4%. The complete code used to train and test the accuracy is included in the accompanying modeltraining.py file.
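
The repeated-split procedure can be sketched as follows. This is not the code in modeltraining.py, which is not reproduced in this report: `repeated_split_accuracy` is a hypothetical function, the straight-line fit is done with NumPy rather than the model class, and the accuracy formula is assumed to be the same one used in the instructor's checking cell later in this report.

```python
import numpy as np

def repeated_split_accuracy(grads, init_temps, final_temps, n_test=2, n_repeats=20):
    """Average accuracy over n_repeats random train/test splits.

    Returns (mean_accuracy_percent, number_of_individual_tests).
    """
    grads = np.asarray(grads, dtype=float)
    init_temps = np.asarray(init_temps, dtype=float)
    final_temps = np.asarray(final_temps, dtype=float)
    accuracies = []
    for seed in range(n_repeats):
        # A different seed gives a different split on each repeat.
        order = np.random.default_rng(seed).permutation(len(grads))
        test_idx, train_idx = order[:n_test], order[n_test:]
        # Fit (final - initial) as a straight line in the initial gradient.
        delta = final_temps[train_idx] - init_temps[train_idx]
        slope, intercept = np.polyfit(grads[train_idx], delta, 1)
        predicted = slope * grads[test_idx] + intercept + init_temps[test_idx]
        error = np.abs(final_temps[test_idx] - predicted)
        accuracies.extend(100 - error / final_temps[test_idx] * 100)
    return float(np.mean(accuracies)), len(accuracies)
```

With 12 measurement sets, 2 held out per split, and 20 repeats, the second return value is 40, matching the number of tests averaged above.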

However, as this is a linear model, it will only work for the sensor used to train it. This is because the coefficient of the linearised physical model depends on the physical parameters (more specifically, the time constant) of the sensor: if a different sensor with a different thermal conductivity is used, the model will fail to predict accurately. The time constant is also assumed to be constant, but in reality it may vary. Where the time constant is different or variable, an alternative is to fit the theoretical curve itself to a small window of datapoints. The alternative fitting function, which uses gradient descent with a momentum term, is given below. A W12PredTherm.py file containing the function is also included. Since this method fits the theoretical curve itself, with the time constant and final temperature as parameters, it is expected to work on the instructor dataset as well.
When used on the first 15 measurements of the sample data, this method achieved an accuracy of 97.7% when compared to the sensor final temperature, and 98.9% when compared to the alcohol thermometer value. This alternative function is used in the physical demonstration.

The disadvantage of this method is that gradient descent requires a significant number of iterations to converge. When run on the Raspberry Pi in a continuous loop (so that the program can predict the temperature continuously), the program can only sample once every 2.3 seconds. As it was set to predict using the previous 10 measurements, the program took 23 seconds to predict the final temperature. The prediction can be made faster by reducing the number of measurements used, but likely at the expense of accuracy. Alternatively, this method can be used to measure the new time constant, which is then used to correct the output of the pretrained linear model. This is possible because a detailed mathematical analysis shows that the final temperature change is proportional to the initial gradient of the temperature change (the selected feature), with the time constant as the constant of proportionality. The correction multiplier is then the ratio of the time constants. The time constant for the sample data was determined to be 19. However, this does not address the issue that the time constant varies with the temperature difference.
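
The time-constant correction can be illustrated with a short sketch. The function and argument names are hypothetical, and the direction of the ratio (new time constant over training time constant) is inferred from the proportionality stated above rather than taken from the project code.

```python
def corrected_final_temp(init_grad, init_temp, coef, intercept, tau_train, tau_new):
    """Rescale the pretrained linear model's output for a sensor with a different time constant.

    coef and intercept are the fitted parameters of the pretrained linear model;
    tau_train is the time constant of the training sensor (about 19 for the
    sample data) and tau_new is the time constant measured for the new sensor.
    """
    # Temperature change the pretrained model would predict for this gradient.
    delta_pretrained = coef * init_grad + intercept
    # Final change is (time constant) x (initial gradient), so scale by the ratio.
    return init_temp + (tau_new / tau_train) * delta_pretrained
```

For example, with a pretrained coefficient of 19 and a new sensor whose time constant is half that, the predicted temperature change for the same initial gradient is halved.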

In [19]:
import numpy as np

# This function accepts a variable number of temperature-time datapoints.
# The learning rate is calibrated for 10 temperature-time inputs.
def tempfit(t, T):
    # t: timings in seconds, T: temperatures (lists or arrays of equal length).
    # Returns (W, loss_sum) with W = [[Ts],[tau]]: Ts is the final temperature, tau is the time constant.
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    T = np.asarray(T, dtype=float).reshape(-1, 1)
    Tpi = T[0, 0]  # initial thermometer temperature
    tau = 18.      # initial guess for the time constant
    Ts = Tpi + tau*(T[1, 0] - T[0, 0])/(t[1, 0] - t[0, 0])  # initial guess from the first slope
    W = np.asarray([Ts, tau]).reshape(-1, 1)

    grad_mag = 1; step = np.zeros((2, 1)); wdc = 0
    while grad_mag > 1e-6:
        if wdc > 25000:  # stop if the fit has not converged after 25000 iterations
            return W, loss_sum
        X = np.exp(-t/W[1][0])
        tX = -((Tpi - W[0][0])/W[1][0]**2)*np.multiply(t, X)
        L = T - (Tpi - W[0][0])*X - W[0][0]  # residuals of the fitted curve
        loss_sum = L.sum()
        XtX = np.concatenate((X.T, tX.T))
        d_loss = 2*(np.matmul(XtX, L) + np.asarray([[-loss_sum], [0]]))  # gradient of the squared error
        step = 0.43*d_loss + 0.69*step  # gradient step with momentum
        W -= step; wdc += 1
        grad_mag = np.linalg.norm(d_loss)

    # loss_sum is used to determine the goodness of fit. The predicted value is rejected and
    # the program displays 'predicting' if this value exceeds a set threshold (0.06).
    return W, loss_sum
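
The continuous prediction loop described earlier, which fits the previous 10 measurements and rejects poor fits, could be organised along these lines. This is a sketch under assumptions, not the demonstration code: `make_predictor` is a hypothetical helper, `fit` is any callable returning `(final_temp, loss_sum)` (for instance a thin wrapper around `tempfit`), and comparing the magnitude of `loss_sum` against the threshold is an assumption.

```python
from collections import deque

def make_predictor(fit, window=10, threshold=0.06):
    """Return an update(t, temp) function that predicts from the last `window` readings."""
    recent = deque(maxlen=window)  # old readings fall off automatically

    def update(t, temp):
        recent.append((t, temp))
        if len(recent) < window:
            return 'predicting'  # not enough data yet
        # Re-zero time so the first reading in the window is at t = 0,
        # which is what a model anchored on the first temperature expects.
        t0 = recent[0][0]
        times = [p[0] - t0 for p in recent]
        temps = [p[1] for p in recent]
        final_temp, loss_sum = fit(times, temps)
        if abs(loss_sum) > threshold:
            return 'predicting'  # fit rejected as unreliable
        return final_temp

    return update
```

Each call to `update` costs one full fit, so the sampling period on the device is bounded by the fitting time.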

# Testing Using Instructor's Data
(Student's Note: this won't work on the vanilla linear model)

Instruction:

* Store your trained model into a pickle object which can be loaded. 
* Read an excel file with the following format:
```
time (s)	reading
0.00	    25.812
0.90	    28.562
1.79	    31.875
2.68	    35.062
3.55	    37.937
4.43	    40.687
5.30	    43.25
```
where the first column indicates the time in seconds and the second column indicates the sensor reading in Celsius. 
* The number of rows in the instructors' data can be of any number. If your code has a minimum number of rows, your code must be able to handle and exit safely when the data provided is less than the required minimum.
* Write a code to prepare the data for prediction.
* Write a code to predict the final temperature.
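
One way to meet the minimum-row requirement is to check the row count before extracting features. The sketch below is illustrative only: `preprocess_or_skip` is a hypothetical wrapper, and the default of 9 rows is an assumption based on the 9 rows sliced by the `preprocess` function in this report.

```python
def preprocess_or_skip(rows, extract_features, min_rows=9):
    """Extract features only when enough rows are present; otherwise report and return None.

    rows can be anything supporting len(), such as a pandas DataFrame;
    extract_features is the feature function to apply to it.
    """
    if len(rows) < min_rows:
        print('Need at least {} rows, got {}; skipping.'.format(min_rows, len(rows)))
        return None
    return extract_features(rows)
```

Callers would then skip any file for which the result is `None` instead of passing it to the model.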



In [20]:
# write a code to load your trained model from a pickle object
import pickle
from modeltraining import tempmodel

filename = 'model.p' # enter your pickle file name containing the model
with open(filename,'rb') as f:
    model = pickle.load(f)


In [21]:
# write a code to read an excel file
import pandas as pd
num_test = 9
filename = 'temp_' 
filekey = [] # instructors will key in this
dataframe = {} # this is to store the data for different temperature, the keys are in filekey
for idx in range(num_test):
    dataframe[filekey[idx]] = pd.read_excel(filename+filekey[idx]+'.xlsx')



In [None]:
# write a code to prepare the data for predicting
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

def preprocess(df):
    # use this function to extract the features from the data frame
    time = df.values[:9,0].reshape(-1,1); temp = df.values[:9,1].reshape(-1,1)
    delta_t = temp - temp[0][0]
    poly_time = PolynomialFeatures(2,include_bias=False).fit_transform(time)
    maclaurin = linear_model.LinearRegression()
    maclaurin.fit(poly_time,delta_t)
    init_grad = maclaurin.coef_[0][0]; init_temp = temp[0][0]
    return [init_grad, init_temp]

data_test = {}
for key in filekey:
    data_test[key]=preprocess(dataframe[key])

In [None]:
# write a code to predict the final temperature
# store the predicted temperature in a variable called "predicted"
# predicted is a dictionary where the keys are listed in filekey

predicted = {}
for key in filekey:
    predicted[key]=model.predict(data_test[key])

In [None]:
# checking accuracy

# first instructor will load the actual temp from a pickle object
import pickle
error_d = {}
accuracy_percent_d = {}

for test in range(num_test):
    filename = 'data_'+filekey[test]+'.pickle'
    with open(filename,'rb') as f:
        final_temp, worst_temp = pickle.load(f)

    # then calculate the error
    error_final = abs(final_temp-predicted[filekey[test]])
    accuracy_final_percent = 100-error_final/final_temp*100
    error_worst = abs(worst_temp-predicted[filekey[test]])
    accuracy_worst_percent = 100-error_worst/worst_temp*100
    
    error_d[filekey[test]] = (error_final, error_worst)
    accuracy_percent_d[filekey[test]] = (accuracy_final_percent, accuracy_worst_percent)

    # displaying the error
    print('===================================')
    print('Testing: {}'.format(filekey[test]))
    print('Predicted Temp: {:.2f}'.format(predicted[filekey[test]]))
    print('Final Sensor Temp: {:.2f}, Alcohol Temp:{:.2f}'.format(final_temp, worst_temp))
    print('Error w.r.t Final Sensor Temp: {:.2f} deg, {:.2f}% accuracy'.format(error_final, accuracy_final_percent))
    print('Error w.r.t Alcohol Temp: {:.2f} deg, {:.2f}% accuracy'.format(error_worst, accuracy_worst_percent))
    
avg_final = sum([ final for final, worst in accuracy_percent_d.values()])/len(error_d.values())
avg_worst = sum([ worst for final, worst in accuracy_percent_d.values()])/len(error_d.values())
print('==============================')
print('Average accuracy for final sensor temp: {:.2f}'.format(avg_final))
print('Average accuracy for alcohol temp: {:.2f}'.format(avg_worst))
