# LSTM Model

This Long Short-Term Memory (LSTM) deep learning model is designed to learn and classify patterns in indoor air quality dynamics rather than identifying specific substances. Using time series data from three gas sensors along with temperature and humidity, the model captures how air conditions evolve over time during normal household activity and during events that degrade air quality, such as poor ventilation, cooking, or the presence of harmful gases. By modeling temporal dependencies, the LSTM can distinguish brief, harmless fluctuations from sustained or abnormal changes that may impact occupant health.

The model runs in the cloud and processes incoming data in near real time, producing air quality state predictions that are smoothed over multiple windows before triggering alerts. Model performance is evaluated using classification metrics such as accuracy and F1-score, and results are visualized as predicted air quality states over time.
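The smoothing step described above can be sketched as a majority vote over the most recent window predictions. This is only an illustration under stated assumptions: the class name `PredictionSmoother`, the state labels `"normal"` and `"poor"`, and the default of five windows are hypothetical and are not taken from the project code.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Majority-vote smoother over the most recent window predictions.

    Hypothetical helper: the labels and window count are assumptions.
    """

    def __init__(self, n_windows=5, alert_states=("poor",)):
        # Keep only the last n_windows predictions
        self.history = deque(maxlen=n_windows)
        self.alert_states = set(alert_states)

    def update(self, predicted_state):
        """Add one window's prediction; return (smoothed_state, alert_flag)."""
        self.history.append(predicted_state)
        # Most common label among the retained predictions
        smoothed, _ = Counter(self.history).most_common(1)[0]
        # Alert only once the buffer is full and the majority is an alert state
        buffer_full = len(self.history) == self.history.maxlen
        alert = buffer_full and smoothed in self.alert_states
        return smoothed, alert


smoother = PredictionSmoother(n_windows=3)
for state in ["normal", "poor", "normal", "poor", "poor"]:
    print(state, "->", smoother.update(state))
```

A single `"poor"` prediction surrounded by `"normal"` ones does not raise an alert here; the alert fires only when `"poor"` holds the majority of a full buffer, which is one way to separate brief fluctuations from sustained changes.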


### Environment Setup

In [1]:
!git clone https://github.com/aladenisun/MSAAI_530_FinalProject

Cloning into 'MSAAI_530_FinalProject'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 48 (delta 9), reused 29 (delta 4), pack-reused 0 (from 0)
Receiving objects: 100% (48/48), 990.96 KiB | 3.20 MiB/s, done.
Resolving deltas: 100% (9/9), done.


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!pip install -q -r /content/MSAAI_530_FinalProject/requirements.txt

In [4]:
import sys
import os

def in_colab():
    return "COLAB_GPU" in os.environ or "google.colab" in sys.modules

if in_colab():
    # Running in Google Colab
    repo_path = "/content/MSAAI_530_FinalProject"
    data_path = "/content/MSAAI_530_FinalProject/data"

    # Set working directory to the repo root
    os.chdir(repo_path)

else:
    # Running locally in VS Code
    repo_path = os.path.abspath(os.path.join(os.getcwd(), ".."))
    data_path = os.path.abspath(os.path.join(repo_path, "data"))

    # Add repo root to Python path
    if repo_path not in sys.path:
        sys.path.append(repo_path)

    # Set working directory to the repo root
    os.chdir(repo_path)

print("Using repo path:", repo_path)
print("Using data path:", data_path)
print("CWD:", os.getcwd())

Using repo path: /content/MSAAI_530_FinalProject
Using data path: /content/MSAAI_530_FinalProject/data
CWD: /content/MSAAI_530_FinalProject


In [12]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
import os

from sklearn.preprocessing import StandardScaler

### Create Train, Validation, and Test Datasets

We train the model on one segment of the data and then measure its performance on simulated streaming data drawn from another segment. The training and validation sets come from a chronological 80/20 split. The test split will come from the training data as part of the TensorFlow LSTM model call.

In [21]:
dataset_csv = os.path.join(data_path, "Cleaned_HT_Sensor_Dataset.csv")

# Load the CSV and drop rows with missing values
df = pd.read_csv(dataset_csv, delimiter=",").dropna()

# Focus on the first three sensors, temp, and humidity to match our system design
feature_cols = ["R1", "R2", "R3", "Temp", "Humidity"]
df_model = df[feature_cols].copy()

print(df_model.tail())

# Chronological 80/20 split
split_idx = int(np.floor(0.8 * len(df_model)))

train_df = df_model.iloc[:split_idx].reset_index(drop=True)
val_df   = df_model.iloc[split_idx:].reset_index(drop=True)

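# Standardize features because Temp and Humidity are on different scales than the sensor readings.
# The scaler is fit on the training split only and then applied to the validation split.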
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_df.values)
val_scaled   = scaler.transform(val_df.values)

print(train_scaled.shape)
print(val_scaled.shape)


            R1       R2       R3     Temp  Humidity
12810  13.0313  9.49602  9.56483  26.1461   56.9043
12811  13.0312  9.49625  9.56518  26.1456   56.9057
12812  13.0317  9.49630  9.56567  26.1450   56.9069
12813  13.0313  9.49633  9.56628  26.1446   56.9080
12814  13.0314  9.49603  9.56666  26.1441   56.9090
(10252, 5)
(2563, 5)
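An LSTM layer expects input shaped `(samples, timesteps, features)`, so the scaled 2-D arrays above would still need to be cut into fixed-length windows. The sketch below shows one way to do that with plain NumPy. The helper name `make_windows` and the window length and stride values are assumptions chosen for illustration, not settings taken from this project.

```python
import numpy as np


def make_windows(data, window_len, stride=1):
    """Slice a (time, features) array into overlapping windows.

    Returns an array shaped (n_windows, window_len, features).
    Hypothetical helper; window_len and stride are assumed values.
    """
    data = np.asarray(data)
    if len(data) < window_len:
        raise ValueError("not enough rows for a single window")
    # Start index of every window that fits entirely inside the data
    starts = range(0, len(data) - window_len + 1, stride)
    return np.stack([data[s:s + window_len] for s in starts])


# Small stand-in for train_scaled: 10 time steps, 5 features
demo = np.arange(50, dtype=float).reshape(10, 5)
windows = make_windows(demo, window_len=4, stride=2)
print(windows.shape)  # (4, 4, 5)
```

With an assumed window length of 60 and a stride of 1, the `(10252, 5)` training array would yield 10252 - 60 + 1 = 10193 windows. Windowing the training and validation arrays separately keeps every window within a single split, which preserves the chronological separation.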
