# Regression Model for ATMO Temperature Prediction 

Created by Aman Mishra and Visheshh Mundra 

## Fetch and Clean Data from IoT 

This section first queries the ThingSpeak servers for a history of readings from the temperature and humidity sensors installed inside and outside a location. It then fills or drops entries with missing values and prepares the data for training.

In [45]:
import requests
import pandas as pd

# Define your ThingSpeak Channel details
channel_id = '2556720'
read_api_key = 'TF8W9CGKN47UWESL'
results = 50000  # Number of results to fetch

# Construct the URL for the ThingSpeak API
url = f'https://api.thingspeak.com/channels/{channel_id}/feeds.json?api_key={read_api_key}&results={results}'

# Fetch the data and stop early if the server returns an HTTP error
response = requests.get(url, timeout=30)
response.raise_for_status()
data = response.json()

# Parse the data into a DataFrame
feeds = data['feeds']
df = pd.DataFrame(feeds)

# Extract relevant fields (assuming fields 1 to 4 are used for temperature and humidity from two sensors)
df = df[['created_at', 'field1', 'field2', 'field3', 'field4']]
df.columns = ['timestamp', 'sensor1_humidity', 'sensor1_temp', 'sensor2_humidity', 'sensor2_temp']

# Convert to appropriate data types
df['sensor1_humidity'] = pd.to_numeric(df['sensor1_humidity'])
df['sensor1_temp'] = pd.to_numeric(df['sensor1_temp'])
df['sensor2_humidity'] = pd.to_numeric(df['sensor2_humidity'])
df['sensor2_temp'] = pd.to_numeric(df['sensor2_temp'])

print(df.head(15))


               timestamp  sensor1_humidity  sensor1_temp  sensor2_humidity  \
0   2024-06-06T00:35:26Z               NaN           NaN          67.95340   
1   2024-06-06T00:35:42Z               NaN           NaN          67.96265   
2   2024-06-06T00:35:58Z               NaN           NaN          67.96159   
3   2024-06-06T00:36:13Z               NaN           NaN          68.05897   
4   2024-06-06T00:36:29Z               NaN           NaN          67.99745   
5   2024-06-06T00:36:45Z               NaN           NaN          67.99278   
6   2024-06-06T00:37:00Z          75.45586      19.44275               NaN   
7   2024-06-06T00:37:16Z          75.47913      19.44675               NaN   
8   2024-06-06T00:37:32Z          75.44203      19.44027               NaN   
9   2024-06-06T00:37:47Z          75.54178      19.44923               NaN   
10  2024-06-06T00:38:03Z          75.51546      19.44599               NaN   
11  2024-06-06T00:38:19Z          75.49429      19.43989        
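
The parsing step can be checked offline without contacting ThingSpeak. The sketch below is only an illustration: the `sample` payload and its values are made up, and it assumes the response carries readings as strings with missing readings as JSON `null`, which is why `pd.to_numeric` is needed. It uses `errors="coerce"` (a choice of this sketch, not of the cell above) so that any non-numeric entry becomes `NaN` instead of raising.

```python
import pandas as pd

# Hypothetical payload shaped like the 'feeds' list used above (toy values).
sample = {
    "feeds": [
        {"created_at": "2024-06-06T00:35:26Z", "field1": None, "field2": None,
         "field3": "50.0", "field4": "20.0"},
        {"created_at": "2024-06-06T00:37:00Z", "field1": "60.0", "field2": "21.0",
         "field3": None, "field4": None},
    ]
}

demo = pd.DataFrame(sample["feeds"])
demo = demo[["created_at", "field1", "field2", "field3", "field4"]]
demo.columns = ["timestamp", "sensor1_humidity", "sensor1_temp",
                "sensor2_humidity", "sensor2_temp"]

# Convert all reading columns at once; unparseable entries become NaN.
value_cols = ["sensor1_humidity", "sensor1_temp", "sensor2_humidity", "sensor2_temp"]
demo[value_cols] = demo[value_cols].apply(pd.to_numeric, errors="coerce")

# Count the gaps per column before deciding how to fill them.
missing_per_column = demo[value_cols].isna().sum()
print(missing_per_column)
```

Counting the gaps per column like this shows how much of each sensor's history will later be filled rather than measured.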

In [46]:
# Backward fill: replace each NaN with the next valid value further down the same column
# (DataFrame.bfill() replaces the deprecated fillna(method='bfill', inplace=True) on single columns)
sensor_cols = ['sensor1_humidity', 'sensor1_temp', 'sensor2_humidity', 'sensor2_temp']
df[sensor_cols] = df[sensor_cols].bfill()

# Create the servo_temp column
# Normalize sensor1_temp to a range of 0-1
normalized_temp = (df['sensor1_temp'] - df['sensor1_temp'].min()) / (df['sensor1_temp'].max() - df['sensor1_temp'].min())

# Invert the normalized values
inverted_temp = 1 - normalized_temp

# Scale the inverted values to the range 18.00 to 22.00
df['servo_temp'] = 18.00 + (inverted_temp * 4.00)

print(df.head(15))  # Display the first 15 rows for review

               timestamp  sensor1_humidity  sensor1_temp  sensor2_humidity  \
0   2024-06-06T00:35:26Z          75.45586      19.44275          67.95340   
1   2024-06-06T00:35:42Z          75.45586      19.44275          67.96265   
2   2024-06-06T00:35:58Z          75.45586      19.44275          67.96159   
3   2024-06-06T00:36:13Z          75.45586      19.44275          68.05897   
4   2024-06-06T00:36:29Z          75.45586      19.44275          67.99745   
5   2024-06-06T00:36:45Z          75.45586      19.44275          67.99278   
6   2024-06-06T00:37:00Z          75.45586      19.44275          67.91039   
7   2024-06-06T00:37:16Z          75.47913      19.44675          67.91039   
8   2024-06-06T00:37:32Z          75.44203      19.44027          67.91039   
9   2024-06-06T00:37:47Z          75.54178      19.44923          67.91039   
10  2024-06-06T00:38:03Z          75.51546      19.44599          67.91039   
11  2024-06-06T00:38:19Z          75.49429      19.43989        

In [47]:
# Drop any remaining rows with NaN values
df.dropna(inplace=True)
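
The cleaning steps above can be traced on a small example. This is a minimal sketch on made-up values (the `toy` frame is hypothetical): it shows that a backward fill copies the next valid reading upward, that a gap at the very end of a column has nothing to copy from and stays `NaN` (which is why `dropna` is still needed), and how the normalize, invert, and scale steps map the coldest reading to 22.00 and the warmest to 18.00.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps, including one at the end of sensor2_temp.
toy = pd.DataFrame({
    "sensor1_temp": [np.nan, np.nan, 19.0, 20.0, np.nan, 21.0],
    "sensor2_temp": [18.0, np.nan, 18.5, np.nan, 19.0, np.nan],
})

# Backward fill: each NaN takes the next valid value below it.
filled = toy.bfill()

# Normalize sensor1_temp to 0-1, invert it, then scale to 18.00-22.00.
t = filled["sensor1_temp"]
normalized = (t - t.min()) / (t.max() - t.min())
filled["servo_temp"] = 18.00 + (1 - normalized) * 4.00

# The trailing NaN in sensor2_temp survives the fill, so that row is dropped.
cleaned = filled.dropna()
print(cleaned)
```

Note that a backward fill writes later readings into earlier rows, so filled rows are not independent measurements.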

## Train model for predictions 

In this section the script learns from the user's habits and the outdoor conditions, and predicts the user's future preferences for indoor temperature.

We tested two regression models. We believe regression is a better choice than classification here because we are predicting a value over a continuous (non-discrete) range of temperatures.

In [84]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

In [85]:
# Prepare the dataset
X = df[['sensor1_humidity', 'sensor1_temp', 'sensor2_humidity']]
y = df['sensor2_temp']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
modelLR = LinearRegression()
modelLR.fit(X_train, y_train)

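# Fit a scaler on the training features
# (the scaled arrays are not used below: the model is trained and evaluated on unscaled features)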
scalerLR = StandardScaler()
X_train_scaled = scalerLR.fit_transform(X_train)
X_test_scaled = scalerLR.transform(X_test)

# Predict and evaluate
y_pred = modelLR.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f'Mean Squared Error: {mse}')


Mean Squared Error: 0.007266862608707839
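
In the cell above the scaler is fitted after the model has been trained, and the model is evaluated on the unscaled `X_test`, so scaling has no influence on the reported error. If scaling is meant to be part of the model, one option (an assumption on our part, not what the cell above does) is scikit-learn's `make_pipeline`, which applies the scaler before every `fit` and `predict` call. The sketch below uses synthetic data (`X_demo` and `y_demo` are made up) and also checks that, for ordinary least squares, standardizing the features leaves the predictions unchanged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three sensor features (hypothetical ranges).
rng = np.random.default_rng(0)
X_demo = rng.uniform([50.0, 15.0, 40.0], [80.0, 25.0, 70.0], size=(200, 3))
y_demo = 0.5 * X_demo[:, 1] + 0.05 * X_demo[:, 0] + rng.normal(0.0, 0.1, size=200)

# Model trained directly on the raw features.
plain = LinearRegression().fit(X_demo, y_demo)

# Model with the scaler bundled in, so scaling is applied consistently.
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)

# Linear regression with an intercept gives the same predictions either way.
same_predictions = np.allclose(plain.predict(X_demo), scaled.predict(X_demo))
print(same_predictions)
```

Bundling the scaler this way also means only one object has to be saved and loaded later.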


In [86]:
# Prepare the dataset
X = df[['sensor1_humidity', 'sensor1_temp', 'sensor2_humidity']]
y = df['sensor2_temp']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Random Forest model
modelRFR = RandomForestRegressor(n_estimators=100, random_state=42)
modelRFR.fit(X_train, y_train)

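# Fit a scaler on the training features
# (the scaled arrays are not used below: the model is trained and evaluated on unscaled features)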
scalerRFR = StandardScaler()
X_train_scaled = scalerRFR.fit_transform(X_train)
X_test_scaled = scalerRFR.transform(X_test)

# Predict and evaluate
y_pred = modelRFR.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f'Mean Squared Error: {mse}')


Mean Squared Error: 0.00027313443903056545
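
Mean squared error is expressed in squared units, so it can help to take the square root to get a typical error in the same units as the temperature readings. A short calculation using the two values reported above:

```python
import math

# Test-set MSE values copied from the two cells above.
mse_linear = 0.007266862608707839
mse_forest = 0.00027313443903056545

# RMSE is the square root of the MSE, in the units of the target.
rmse_linear = math.sqrt(mse_linear)
rmse_forest = math.sqrt(mse_forest)

print(f"Linear Regression RMSE: {rmse_linear:.4f}")
print(f"Random Forest RMSE:     {rmse_forest:.4f}")
print(f"MSE ratio (linear / forest): {mse_linear / mse_forest:.1f}")
```

This works out to a typical test error of roughly 0.085 for Linear Regression and roughly 0.017 for Random Forest, in the units of the temperature readings.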


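Random Forest Regression reaches a much lower test error than Linear Regression (a Mean Squared Error of about 0.00027 versus about 0.0073). Since temperature preferences are also not linear, a Linear Regression model is not ideal for training and predicting, so we will use Random Forest Regression.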

## Prediction with Real Data 

In [75]:
# Example new data for prediction
new_data = {
    'sensor1_humidity': [65.0, 75.0, 55.0, 69.44],
    'sensor1_temp': [22.0, 32.0, 15.0, 19.17],
    'sensor2_humidity': [60.0, 40.0, 63.0, 63.52]
}
new_df = pd.DataFrame(new_data)

In [79]:
# Predict the temperature using Linear Regression
predicted_temp = modelLR.predict(new_df)
print(f'Predicted Sensor 2 Temperature using Linear Regression: {predicted_temp}')

Predicted Sensor 2 Temperature using Linear Regression: [21.66434562 29.92427125 16.23735307 20.00070898]


In [80]:
# Predict the temperature using Random Forest Regression
predicted_temp = modelRFR.predict(new_df)
print(f'Predicted Sensor 2 Temperature using Random Forest: {predicted_temp}')

Predicted Sensor 2 Temperature using Random Forest: [19.55642208 20.8607933  19.6588648  20.05720622]
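
The two models disagree most on the second and third example rows. One possible contributor, which we have not verified against this data set, is that a random forest averages target values seen during training, so its predictions cannot fall outside the range of the training targets, while a linear model extrapolates freely. The sketch below illustrates that general behavior on synthetic data (all values are made up and unrelated to the sensor readings).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic training data: y = 2x for x between 0 and 10.
X_train_demo = np.linspace(0.0, 10.0, 50).reshape(-1, 1)
y_train_demo = 2.0 * X_train_demo.ravel()

lr_demo = LinearRegression().fit(X_train_demo, y_train_demo)
rf_demo = RandomForestRegressor(n_estimators=100, random_state=42).fit(
    X_train_demo, y_train_demo
)

# Query a point far outside the training range.
X_far = np.array([[20.0]])
lr_pred = lr_demo.predict(X_far)[0]
rf_pred = rf_demo.predict(X_far)[0]

print(f"Linear Regression at x=20: {lr_pred:.2f}")
print(f"Random Forest at x=20:     {rf_pred:.2f}")
print(f"Largest training target:   {y_train_demo.max():.2f}")
```

If inputs outside the observed range are expected in use, it may be worth checking how far the new readings lie from the training data before trusting either prediction.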


## Result 

We conclude that Random Forest Regression is the better of the two methods tested for predicting temperature for ATMO. The trained model can be exported for use in further development.

In [88]:
import joblib

# Save the Random Forest model to a PKL file
joblib.dump(modelRFR, 'random_forest_model.pkl')

print("Random Forest model saved as random_forest_model.pkl")

Random Forest model saved as random_forest_model.pkl


In [87]:
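# Save the scaler to a PKL file
# Note: modelRFR was trained on unscaled features, so this scaler should not be
# applied to inputs before calling modelRFR.predict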
joblib.dump(scalerRFR, 'scaler.pkl')

print("Scaler saved as scaler.pkl")

Scaler saved as scaler.pkl
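
For further development, a saved model can be read back with `joblib.load`. The sketch below is a self-contained round trip on a small stand-in model (the synthetic data and the `demo_model.pkl` file name are hypothetical) and checks that the restored model gives the same predictions as the original.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small stand-in model trained on synthetic data with three features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(100, 3))
y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back from disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
round_trip_ok = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(round_trip_ok)
```

When loading `random_forest_model.pkl`, the input should have the same three feature columns, in the same order, as the `X` used for training.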
