Joseph Bruno, Stacey Van, Vu Dinh

CMPE 255 Term Project / Group 2

## **Part 2: PEMS-SF_prototype**
This notebook contains the prototype that classifies the day of the week from PEMS-SF traffic time-series inputs.
<br> Follow the steps outlined in the README.md file for details.

**Required Files:**
1. PEMS-SF_prototype.ipynb
2. PEMS_test.txt
3. PEMS_testlabels.txt
4. guess_1_test.txt
5. guess_1_label.txt
6. guess_2_test.txt
7. guess_2_label.txt
8. guess_3_test.txt
9. guess_3_label.txt
10. random_forest_model.pkl

### **Import Python Libraries**

In [264]:
import joblib
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, accuracy_score

### **Load Model from Part 1**

In [266]:
# load the model
loaded_model = joblib.load('random_forest_model.pkl')

#### **Ensure proper formatting**

In [268]:
def load_labels(file_path):
    """
    Load and clean the day-of-week labels.
    1. Read the whitespace-separated label file with pandas.
    2. Strip brackets and extra whitespace, then convert to integers.
    """
    df = pd.read_csv(file_path, header=None, sep=r'\s+')  # one column per whitespace-separated token

    def clean(col):
        # remove MATLAB-style brackets and strip extra whitespace
        return (col.astype(str)
                   .str.replace('[', '', regex=False)
                   .str.replace(']', '', regex=False)
                   .str.strip())

    df = df.apply(clean)
    labels = df.values.flatten().astype(int)  # flatten to a 1-D integer array
    return labels

# parse data files
def parse_pems_data(file_path):
    """
    1. Parse each line as one day's time-series matrix.
    2. Input format: MATLAB matrix syntax, e.g. [a b c; d e f].
    """
    data = []
    with open(file_path, 'r') as file:
        for line in file: # convert MATLAB matrix syntax to NumPy array
            line = line.strip().replace('[', '').replace(']', '')
            rows = line.split(';')
            matrix = np.array([[float(x) for x in row.split()] for row in rows])
            data.append(matrix)
    return np.array(data)
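
To make the two input formats concrete, the sketch below applies the same cleaning steps to in-memory strings instead of files. The sample strings and the helper names `parse_matrix_line` and `parse_label_line` are made up for illustration, and the sample matrix is far smaller than the real 963 x 144 matrices. Only the standard library is used, so it runs without the data files.

```python
def parse_matrix_line(line):
    """Turn one MATLAB-style line such as '[1 2; 3 4]' into a list of rows."""
    cleaned = line.strip().replace('[', '').replace(']', '')
    # ';' separates matrix rows, whitespace separates values within a row
    return [[float(x) for x in row.split()] for row in cleaned.split(';')]


def parse_label_line(line):
    """Turn a bracketed label line such as '[1 2 3 ]' into a list of ints."""
    cleaned = line.strip().replace('[', '').replace(']', '')
    return [int(x) for x in cleaned.split()]


# Hypothetical inputs, not taken from the PEMS files.
sample_matrix = "[0.01 0.02 0.03; 0.04 0.05 0.06]"
sample_labels = "[1 2 3 ]"

matrix = parse_matrix_line(sample_matrix)
labels = parse_label_line(sample_labels)

print(len(matrix), len(matrix[0]))  # rows, values per row
print(labels)
```

In the notebook's own functions the parsed rows are additionally wrapped in NumPy arrays, which is what gives each data file its `(samples, 963, 144)` shape.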

#### **Assign numerical values with day-of-week**

In [270]:
day_names = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday', 5: 'Thursday', 6: 'Friday', 7: 'Saturday'}
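
A plain `day_names[value]` lookup raises a `KeyError` if a label ever falls outside 1 to 7. As a small sketch, a lookup with a fallback could look like the following; `label_to_day` is a hypothetical helper that is not part of this notebook, and the mapping is copied from the cell above.

```python
# Mapping copied from the notebook cell above.
day_names = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday',
             5: 'Thursday', 6: 'Friday', 7: 'Saturday'}


def label_to_day(label):
    """Return the day name for a label, or a placeholder for an unknown label."""
    # int() also accepts NumPy integer scalars such as model predictions
    return day_names.get(int(label), f'Unknown label {label}')


print(label_to_day(6))
print(label_to_day(9))
```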

#### **Load files and print datasets shape**

In [272]:
# prototype example
test_file = "PEMS_test.txt"
test_label_file = "PEMS_testlabels.txt"
first_guess_file = "guess_1_test.txt"
first_guess_label_file = "guess_1_label.txt"
second_guess_file = "guess_2_test.txt"
second_guess_label_file = "guess_2_label.txt"
third_guess_file = "guess_3_test.txt"
third_guess_label_file = "guess_3_label.txt"

# parse
test_data = parse_pems_data(test_file)
first_guess_test_data = parse_pems_data(first_guess_file)
second_guess_test_data = parse_pems_data(second_guess_file)
third_guess_test_data = parse_pems_data(third_guess_file)

# load 
test_labels = load_labels(test_label_file)
first_guess_test_labels = load_labels(first_guess_label_file)
second_guess_test_labels = load_labels(second_guess_label_file)
third_guess_test_labels = load_labels(third_guess_label_file)

# print
print("Test data shape:", test_data.shape)
print("Test labels shape:", test_labels.shape)
print("\nFirst guess data shape:", first_guess_test_data.shape)
print("First guess label shape:", first_guess_test_labels.shape)
print("\nSecond guess data shape:", second_guess_test_data.shape)
print("Second guess label shape:", second_guess_test_labels.shape)
print("\nThird guess data shape:", third_guess_test_data.shape)
print("Third guess label shape:", third_guess_test_labels.shape)

Test data shape: (18, 963, 144)
Test labels shape: (18,)

First guess data shape: (1, 963, 144)
First guess label shape: (1,)

Second guess data shape: (1, 963, 144)
Second guess label shape: (1,)

Third guess data shape: (1, 963, 144)
Third guess label shape: (1,)


#### **Accuracy and Classification Report for Baseline Model**

In [274]:
# predictions
test_pred = loaded_model.predict(test_data.reshape(test_data.shape[0], -1))
print("Baseline Accuracy:", accuracy_score(test_labels, test_pred))
print("\nClassification Report:\n", classification_report(test_labels, test_pred))

Baseline Accuracy: 0.9444444444444444

Classification Report:
               precision    recall  f1-score   support

           1       1.00      1.00      1.00         3
           2       1.00      1.00      1.00         3
           3       1.00      1.00      1.00         3
           4       1.00      0.75      0.86         4
           5       0.67      1.00      0.80         2
           6       1.00      1.00      1.00         1
           7       1.00      1.00      1.00         2

    accuracy                           0.94        18
   macro avg       0.95      0.96      0.95        18
weighted avg       0.96      0.94      0.95        18
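
The report above is consistent with exactly one class-4 sample being predicted as class 5: class 4 loses recall (3 of 4 found) and class 5 loses precision (2 of 3 predictions correct). The sketch below rebuilds that situation by hand to show where the numbers come from. The label lists are reconstructed from the support column and are an assumption, not the actual test labels, and `precision_recall_f1` is a hypothetical helper.

```python
# Hypothetical labels that reproduce the support column of the report.
y_true = [1] * 3 + [2] * 3 + [3] * 3 + [4] * 4 + [5] * 2 + [6] * 1 + [7] * 2
y_pred = list(y_true)
y_pred[y_true.index(4)] = 5  # one class-4 sample predicted as class 5


def precision_recall_f1(y_true, y_pred, cls):
    """Compute precision, recall and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.4f}")
for cls in (4, 5):
    p, r, f = precision_recall_f1(y_true, y_pred, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With 17 of 18 samples correct, the accuracy is 17/18, which matches the 0.944 reported by `accuracy_score`.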



# **Prototype Demonstration**
### &nbsp;&nbsp;&nbsp;&nbsp;The moment we've all been waiting for...

In [276]:
print("\033[1mAre you smarter than an AI model?\033[0m")
print("\033[1mWelcome to AI Trivia!!!!\033[0m")

Are you smarter than an AI model?
Welcome to AI Trivia!!!!


## **Scenario:**
#### &nbsp;&nbsp;&nbsp;&nbsp;While taking a stroll in downtown San Francisco, you happen to stumble upon a data file. 

#### &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Strangely enough, the single sample contains exactly 963 features (same as the test data) 
#### &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AND exactly 144 time dimensions (what a coincidence...) 
<br>
&nbsp;&nbsp;&nbsp;&nbsp; [<i>** Cue the soundtrack: Wild Pokemon Battle **</i>]

### **Try #1**

In [279]:
first = first_guess_test_data.reshape(first_guess_test_data.shape[0], -1)
print('\033[1mFrom this data, can you guess what day it is?\033[0m\n')
print(first[0, :144])  # first 144 flattened values: the 144 time steps of the first feature

From this data, can you guess what day it is?

[0.0203 0.0218 0.0234 0.0256 0.0256 0.0212 0.0209 0.0236 0.0223 0.021
 0.0219 0.0205 0.0229 0.0253 0.0222 0.025  0.0262 0.0224 0.0196 0.0233
 0.022  0.0244 0.0247 0.0254 0.0275 0.0239 0.0308 0.0321 0.0337 0.0348
 0.0358 0.0449 0.0454 0.0578 0.0618 0.0768 0.0712 0.0931 0.086  0.0672
 0.0721 0.0684 0.0643 0.0669 0.069  0.064  0.0621 0.0631 0.063  0.0637
 0.0768 0.071  0.0715 0.0755 0.0656 0.0605 0.0741 0.0689 0.0542 0.0472
 0.0603 0.0632 0.0678 0.0684 0.072  0.0744 0.0757 0.0692 0.0659 0.0675
 0.069  0.0553 0.0659 0.0703 0.0751 0.076  0.0773 0.0707 0.0685 0.063
 0.0666 0.062  0.0713 0.0653 0.0716 0.071  0.0694 0.0616 0.0729 0.0599
 0.0683 0.0745 0.0638 0.0709 0.0609 0.0641 0.0681 0.0684 0.067  0.0637
 0.0654 0.0611 0.0686 0.0679 0.0668 0.0613 0.0628 0.0624 0.0671 0.0646
 0.0611 0.0563 0.0605 0.0594 0.0458 0.0563 0.0446 0.0494 0.0437 0.0467
 0.0429 0.0448 0.0396 0.04   0.0405 0.0398 0.0373 0.0375 0.0371 0.0373
 0.0387 0.0349 0.0327 0.0
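
The slice `[0, :144]` works because `reshape(n, -1)` flattens each sample in row-major order, so the first 144 flattened values are the 144 time steps of the first feature. The sketch below shows the same behavior on a toy array; the 1 x 3 x 4 shape is a stand-in chosen for readability, whereas the real arrays in this notebook are `(n, 963, 144)`.

```python
import numpy as np

# Toy stand-in for one parsed sample: 1 day, 3 features, 4 time steps.
sample = np.arange(12, dtype=float).reshape(1, 3, 4)

# Same flattening call the notebook uses before predicting.
flat = sample.reshape(sample.shape[0], -1)
print(flat.shape)

# The first 4 flattened values are exactly the first feature's time steps.
print(np.array_equal(flat[0, :4], sample[0, 0]))

# The next 4 values belong to the second feature, and so on.
print(np.array_equal(flat[0, 4:8], sample[0, 1]))
```

By the same logic, each flattened row passed to the model in this notebook has 963 * 144 = 138672 values.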

In [280]:
print("\033[1mNow it is the AI model's turn\033[0m\n")
first_guess_test_pred = loaded_model.predict(first_guess_test_data.reshape(first_guess_test_data.shape[0], -1))
print("Baseline Accuracy:", accuracy_score(first_guess_test_labels, first_guess_test_pred))
print("\nClassification Report:\n", classification_report(first_guess_test_labels, first_guess_test_pred))
first_value = first_guess_test_pred[0]
print(f'\033[1mWhat day is it: {day_names[first_value]}\033[0m')

Now it is the AI model's turn

Baseline Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           6       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1

What day is it: Friday
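
With a single sample, accuracy can only be 0.0 or 1.0 and the classification report has one row, so each round reduces to comparing a guess against the true label. A minimal sketch of scoring a round for both players is shown below. `score_round` is a hypothetical helper, and the human guess of 2 is made up; the model prediction and label of 6 are the values from Try #1.

```python
# Mapping copied from the notebook.
day_names = {1: 'Sunday', 2: 'Monday', 3: 'Tuesday', 4: 'Wednesday',
             5: 'Thursday', 6: 'Friday', 7: 'Saturday'}


def score_round(human_guess, model_pred, true_label):
    """Compare a human guess and a model prediction against the true label."""
    return {
        'answer': day_names[true_label],
        'human_correct': human_guess == true_label,
        'model_correct': model_pred == true_label,
    }


# Try #1: the model predicted 6 and the label was 6; the human guess is invented.
result = score_round(human_guess=2, model_pred=6, true_label=6)
print(result)
```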


### **Try #2**

In [282]:
second = second_guess_test_data.reshape(second_guess_test_data.shape[0], -1)
print('\033[1mFrom this data, can you guess what day it is?\033[0m\n')
print(second[0, :144])  # first 144 flattened values: the 144 time steps of the first feature

From this data, can you guess what day it is?

[0.012  0.0134 0.0116 0.0146 0.0113 0.0118 0.0097 0.0076 0.0085 0.0073
 0.0092 0.01   0.0096 0.0105 0.0078 0.0087 0.0097 0.0097 0.0069 0.0087
 0.0093 0.0075 0.0082 0.0101 0.0091 0.0083 0.0118 0.015  0.0114 0.0117
 0.0155 0.0163 0.0188 0.0226 0.0209 0.0181 0.0188 0.0185 0.0257 0.0308
 0.0329 0.0279 0.0325 0.0323 0.0345 0.0306 0.0356 0.0382 0.0404 0.0368
 0.0378 0.0447 0.0459 0.0498 0.0489 0.0471 0.043  0.049  0.0539 0.0595
 0.0516 0.0548 0.0537 0.0585 0.0598 0.0592 0.0635 0.064  0.0565 0.0625
 0.0661 0.0703 0.0598 0.0628 0.0745 0.0659 0.07   0.0687 0.063  0.0621
 0.07   0.054  0.0685 0.0619 0.0577 0.0597 0.0673 0.0658 0.0673 0.0599
 0.0637 0.0551 0.0546 0.0644 0.0679 0.0517 0.0697 0.0594 0.0663 0.0601
 0.0627 0.0561 0.0573 0.056  0.0575 0.0577 0.0578 0.0482 0.0527 0.0597
 0.0554 0.0473 0.0509 0.0523 0.0459 0.038  0.0326 0.04   0.0395 0.0405
 0.0344 0.0394 0.0341 0.0323 0.0414 0.0357 0.0368 0.0371 0.0355 0.0337
 0.0373 0.0326 0.0313 0

In [283]:
print("\033[1mNow it is the AI model's turn\033[0m\n")
second_guess_test_pred = loaded_model.predict(second_guess_test_data.reshape(second_guess_test_data.shape[0], -1))
print("Baseline Accuracy:", accuracy_score(second_guess_test_labels, second_guess_test_pred))
print("\nClassification Report:\n", classification_report(second_guess_test_labels, second_guess_test_pred))
second_value = second_guess_test_pred[0]
print(f'\033[1mWhat day is it: {day_names[second_value]}\033[0m')

Now it is the AI model's turn

Baseline Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           7       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1

What day is it: Saturday


### **Try #3**

In [285]:
third = third_guess_test_data.reshape(third_guess_test_data.shape[0], -1)
print('\033[1mFrom this data, can you guess what day it is?\033[0m\n')
print(third[0, :144])  # first 144 flattened values: the 144 time steps of the first feature

From this data, can you guess what day it is?

[0.0243 0.0224 0.0182 0.0189 0.0248 0.0163 0.0232 0.0229 0.0171 0.0178
 0.0214 0.0231 0.0227 0.0196 0.0246 0.0227 0.0277 0.0196 0.021  0.0262
 0.0245 0.0245 0.027  0.0294 0.0275 0.0318 0.0322 0.0318 0.0397 0.0334
 0.0413 0.0436 0.0498 0.0665 0.0647 0.0665 0.0833 0.0876 0.0888 0.2484
 0.1889 0.0987 0.0594 0.062  0.0551 0.059  0.0637 0.0546 0.0623 0.0681
 0.075  0.0715 0.0619 0.0701 0.0688 0.0557 0.0601 0.0565 0.0652 0.0677
 0.0568 0.0673 0.0682 0.0642 0.0671 0.0711 0.0635 0.0587 0.0662 0.068
 0.077  0.0634 0.071  0.0629 0.0596 0.0661 0.0684 0.0614 0.0634 0.0705
 0.0681 0.0705 0.0723 0.0623 0.0633 0.063  0.0652 0.06   0.0695 0.074
 0.0668 0.0682 0.0627 0.0628 0.0665 0.0609 0.0625 0.0599 0.0643 0.051
 0.0652 0.0604 0.0708 0.0681 0.061  0.0704 0.0576 0.0687 0.0582 0.0625
 0.0676 0.0636 0.0548 0.0591 0.0505 0.0445 0.0481 0.0459 0.0462 0.0415
 0.0384 0.0354 0.0397 0.0401 0.0364 0.0376 0.0375 0.0326 0.0348 0.035
 0.0335 0.0292 0.033  0.033

In [286]:
print("\033[1mNow it is the AI model's turn\033[0m\n")
third_guess_test_pred = loaded_model.predict(third_guess_test_data.reshape(third_guess_test_data.shape[0], -1))
print("Baseline Accuracy:", accuracy_score(third_guess_test_labels, third_guess_test_pred))
print("\nClassification Report:\n", classification_report(third_guess_test_labels, third_guess_test_pred))
third_value = third_guess_test_pred[0]
print(f'\033[1mWhat day is it: {day_names[third_value]}\033[0m')

Now it is the AI model's turn

Baseline Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           3       1.00      1.00      1.00         1

    accuracy                           1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1

What day is it: Tuesday


### &nbsp;&nbsp;&nbsp;&nbsp;**Woohoo. How did you do?**
#### &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**Congrats on completing "Are you smarter than an AI Model?"!!!**