Using environmental sensor data and machine learning methods, we seek to detect whether a room is occupied. We use the Occupancy Detection dataset from the UCI Machine Learning Repository for this. It contains time-stamped measurements of environmental factors including temperature, humidity, light, and CO2, recorded about once a minute, together with a ground-truth occupancy label. Using an ML model in place of a physical PIR sensor could be cheaper and require little upkeep. This could be helpful in the HVAC (Heating, Ventilation, and Air Conditioning) industry.
Three data sets are provided: one for training and two for testing. Ground-truth occupancy was obtained from time-stamped pictures that were taken every minute. For the journal publication, the processing R scripts can be found in:. Data Link
The dataset has the following columns:
- date: the datetime of collection, formatted as year-month-day hour:minute:second
- Temperature: room temperature in degrees Celsius
- Humidity: relative humidity of the air, in percent
- Light: intensity of illumination, measured in lux
- CO2: CO2 concentration in parts per million (ppm)
- HumidityRatio: derived quantity from temperature and relative humidity, in kg water vapor / kg air
- Occupancy: categorical value, 0 for not occupied, 1 for occupied
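
To make the derived HumidityRatio column concrete, here is a minimal sketch of how such a value can be computed from temperature and relative humidity. The dataset description does not state the exact formula, so the Magnus approximation for saturation vapor pressure and the standard sea-level pressure of 1013.25 hPa used below are assumptions, and `humidity_ratio` is a hypothetical helper name.

```python
import math


def humidity_ratio(temp_c, rel_humidity_pct, pressure_hpa=1013.25):
    """Approximate kg of water vapor per kg of dry air.

    Assumptions: Magnus approximation for saturation vapor pressure,
    and standard sea-level pressure unless another value is passed.
    """
    # Saturation vapor pressure over water, in hPa (Magnus approximation).
    p_sat = 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))
    # Partial pressure of water vapor at the given relative humidity.
    p_w = rel_humidity_pct / 100.0 * p_sat
    # 0.622 is the ratio of the molar masses of water vapor and dry air.
    return 0.622 * p_w / (pressure_hpa - p_w)


# Temperature and Humidity taken from the first row of the data snapshot.
w = humidity_ratio(23.18, 27.2720)
print(f"{w:.6f}")
```

Under these assumptions the result lands within about 1% of the 0.004793 listed for that row; the small gap is expected if the original processing used a different vapor-pressure formula or ambient pressure.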
Here is a snapshot of the data:
date | Temperature | Humidity | Light | CO2 | HumidityRatio | Occupancy |
---|---|---|---|---|---|---|
2015-02-04 17:51:00 | 23.18 | 27.2720 | 426.0 | 721.25 | 0.004793 | 1 |
2015-02-04 17:51:59 | 23.15 | 27.2675 | 429.5 | 714.00 | 0.004783 | 1 |
2015-02-04 17:53:00 | 23.15 | 27.2450 | 426.0 | 713.50 | 0.004779 | 1 |
2015-02-04 17:54:00 | 23.15 | 27.2000 | 426.0 | 708.25 | 0.004772 | 1 |
2015-02-04 17:55:00 | 23.10 | 27.2000 | 426.0 | 704.50 | 0.004757 | 1 |
The date column is not needed for modeling, so it is dropped.
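
As a rough illustration of this step, the sketch below loads the five snapshot rows from an in-memory CSV string (a stand-in for the real data files, whose paths are not given here) and drops the `date` column with pandas. The variable names `csv_text` and `features` are illustrative only.

```python
import io

import pandas as pd

# The five snapshot rows shown above, inlined so the example is self-contained.
csv_text = """date,Temperature,Humidity,Light,CO2,HumidityRatio,Occupancy
2015-02-04 17:51:00,23.18,27.2720,426.0,721.25,0.004793,1
2015-02-04 17:51:59,23.15,27.2675,429.5,714.00,0.004783,1
2015-02-04 17:53:00,23.15,27.2450,426.0,713.50,0.004779,1
2015-02-04 17:54:00,23.15,27.2000,426.0,708.25,0.004772,1
2015-02-04 17:55:00,23.10,27.2000,426.0,704.50,0.004757,1
"""

# Parse the timestamp so its format can be checked before it is discarded.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Keep only the sensor readings and the label.
features = df.drop(columns=["date"])
print(features.columns.tolist())
```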
To explore the data further, the following questions need to be answered:
- What is the distribution of the Temperature column?
- What is the distribution of the Humidity column?
- What is the distribution of the Light column?
- What is the distribution of the CO2 column?
- What is the distribution of the HumidityRatio column?
- What is the distribution of the Occupancy column?
- How are all the features correlated with one another, and how are they jointly distributed?
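
The questions above can be approached numerically with a few pandas calls. The sketch below is only an illustration: the DataFrame is synthetic (randomly generated with made-up means and spreads) and merely mimics the column names of the dataset, so none of the printed numbers say anything about the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with the same column names as the dataset.
rng = np.random.default_rng(0)
n = 500
occupancy = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "Temperature": 20.5 + 1.0 * occupancy + rng.normal(0, 0.5, n),
    "Humidity": rng.normal(26, 3, n),
    "Light": 400.0 * occupancy + rng.normal(30, 10, n),
    "CO2": 450 + 300 * occupancy + rng.normal(0, 50, n),
    "HumidityRatio": rng.normal(0.004, 0.0005, n),
    "Occupancy": occupancy,
})

# Per-column distribution summary: count, mean, std, min, quartiles, max.
summary = df.describe().T

# Distribution of the binary label as class proportions.
class_balance = df["Occupancy"].value_counts(normalize=True).sort_index()

# Pairwise Pearson correlation between all columns.
corr = df.corr()
corr_with_target = corr["Occupancy"].drop("Occupancy").sort_values(ascending=False)

print(summary)
print(class_balance)
print(corr_with_target)
```

For visual answers, `df.hist()` or `seaborn.pairplot(df, hue="Occupancy")` would draw the same distributions, assuming matplotlib and seaborn are installed.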

The modeling workflow is as follows:

- Set up the data with required columns
- Split data into train set & test set
- Scaling using standard scaler
- Fitting Logistic Regression Model
  - Predict test data 1 & 2
  - Accuracy score
  - Classification report (precision, recall, f1-score)
  - Confusion matrix & Heatmap
  - ROC curve for 3 sets
- Fitting Linear Support Vector Classifier Model
  - Predict test data 1 & 2
  - Accuracy score
  - Classification report (precision, recall, f1-score)
  - Confusion matrix & Heatmap
  - ROC curve for 3 sets
- Fitting DecisionTreeClassifier Model
  - Predict test data 1 & 2
  - Accuracy score
  - Classification report (precision, recall, f1-score)
  - Confusion matrix & Heatmap
  - ROC curve for 3 sets
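
The steps above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the exact code behind this analysis: the data is synthetic (produced by a hypothetical `make_synthetic` helper) and stands in for the training file and the two test files, and the hyperparameters (`max_iter=1000`, `max_depth=3`) are illustrative assumptions. The ROC curve is summarized by its area instead of being plotted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, auc, classification_report,
                             confusion_matrix, roc_curve)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic(n, seed):
    """Hypothetical stand-in for one data file: 5 features, binary label."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    # The label depends mostly on feature 2, loosely mimicking a dominant sensor.
    y = (X[:, 2] + 0.3 * X[:, 3] + 0.2 * rng.normal(size=n) > 0).astype(int)
    return X, y


datasets = {
    "train": make_synthetic(600, seed=0),
    "test1": make_synthetic(200, seed=1),
    "test2": make_synthetic(300, seed=2),
}
X_train, y_train = datasets["train"]

# Each pipeline fits the scaler on the training data only, then the classifier.
models = {
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC()),
    "DecisionTree": make_pipeline(StandardScaler(), DecisionTreeClassifier(max_depth=3, random_state=0)),
}


def ranking_scores(model, X):
    """Continuous scores for ROC: probabilities if available, else margins."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)  # LinearSVC has no predict_proba


results = {}
for model_name, model in models.items():
    model.fit(X_train, y_train)
    for set_name, (X, y) in datasets.items():
        y_pred = model.predict(X)
        fpr, tpr, _ = roc_curve(y, ranking_scores(model, X))
        results[(model_name, set_name)] = {
            "accuracy": accuracy_score(y, y_pred),
            "confusion": confusion_matrix(y, y_pred),
            "auc": auc(fpr, tpr),
        }
    # Precision, recall, and f1-score on the first test set.
    X_t, y_t = datasets["test1"]
    print(model_name)
    print(classification_report(y_t, model.predict(X_t)))
```

Each `results[(model_name, set_name)]["confusion"]` array can be passed to `seaborn.heatmap(..., annot=True)` to draw the heatmap, and the `fpr`/`tpr` arrays can be plotted to show the ROC curve for all three sets.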