# Monitoring IoT Sensor Data
## Data Subscriber Simulation

This part of the code simulates an IoT data subscriber that receives the records a publisher sends through an MQTT broker. Note that the subscriber receives nothing unless both a publisher and a broker are running.

See `IoT and sensor data.ipynb` for more details about the publisher and the broker.
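The sample output further down shows that each MQTT payload is the text form of a Python dict, for example `{'Glucose': 0.79..., 'BloodPressure': 0.52..., 'Insulin': 0.0, 'Outcome': 0}`. That is why the subscriber parses it with `ast.literal_eval` rather than `eval`. Below is a minimal sketch of that decoding step, assuming the publisher sends `str(dict)` as the output suggests; `parse_payload` and `FEATURES` are hypothetical names used only for illustration. If you control the publisher, JSON (`json.dumps` / `json.loads`) would be a more portable wire format.

```python
import ast

# Hypothetical constant: feature order used by the subscriber ("Outcome" is the label)
FEATURES = ("Glucose", "BloodPressure", "Insulin")

def parse_payload(payload: bytes) -> list:
    """Decode one payload into [Glucose, BloodPressure, Insulin, Outcome].

    Assumes the publisher sends str(dict). ast.literal_eval only accepts
    Python literals, so code embedded in a payload is rejected, not executed.
    """
    data = ast.literal_eval(payload.decode())
    if not isinstance(data, dict):
        raise ValueError("payload is not a dict literal")
    return [data[name] for name in FEATURES] + [data["Outcome"]]

# First payload from the sample output
sample = b"{'Glucose': 0.7989949748743719, 'BloodPressure': 0.5245901639344263, 'Insulin': 0.0, 'Outcome': 0}"
print(parse_payload(sample))  # [0.7989949748743719, 0.5245901639344263, 0.0, 0]
```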

 - The subscriber receives the data continuously. For each newly received record r, train a 3-nearest-neighbour (3NN) classifier on the 5 records received immediately before r, use it to predict r's outcome from r's features, and compare the prediction with the Outcome label in r.
   - Repeat this 1000 times and report the number of correct classifications.
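The evaluation protocol above can be sketched offline, without a broker, on any list of records. This is a pure-Python stand-in for scikit-learn's `KNeighborsClassifier(n_neighbors=3)` (uniform weights, Euclidean distance); the function names and the toy stream are hypothetical. Neighbours at exactly equal distance may be ordered differently than in scikit-learn.

```python
import math
from collections import Counter, deque

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    # sorted() is stable, so exact distance ties keep arrival order
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # With binary labels and k=3 the vote can never tie
    return votes.most_common(1)[0][0]

def sliding_window_accuracy(stream, window=5, k=3, max_predictions=1000):
    """Predict each record's label from the `window` records received before it.

    Each record is [feature_1, ..., feature_n, outcome]. Returns (correct, total).
    """
    history = deque(maxlen=window + 1)  # window previous records + the current one
    correct = total = 0
    for record in stream:
        history.append(record)
        if len(history) < window + 1:
            continue  # not enough history yet
        *previous, current = history
        X = [r[:-1] for r in previous]
        y = [r[-1] for r in previous]
        if knn_predict(X, y, current[:-1], k) == current[-1]:
            correct += 1
        total += 1
        if total == max_predictions:
            break
    return correct, total

# Toy stream of [feature_1, feature_2, outcome] records (made up for illustration)
stream = [
    [0.1, 0.1, 0], [0.2, 0.1, 0], [0.9, 0.8, 1], [0.8, 0.9, 1],
    [0.15, 0.2, 0], [0.85, 0.85, 1], [0.1, 0.15, 0], [0.5, 0.5, 0],
]
print(sliding_window_accuracy(stream))  # (2, 3)
```

The first two scored records are classified correctly; the last one sits between the two clusters, its three nearest neighbours vote 1, and the prediction misses its label 0.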

In [1]:
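import os
# Optional: on some Windows machines joblib/loky (used internally by scikit-learn)
# cannot count physical cores and prints a "found 0 physical cores < 1" warning.
# Capping the core count via LOKY_MAX_CPU_COUNT should silence it; results are unchanged.
os.environ.setdefault("LOKY_MAX_CPU_COUNT", "1")

import paho.mqtt.client as mqtt
import time
import ast  # safe parsing of the dict-formatted payload
from collections import deque  # fixed-size sliding window of records
import numpy as np  # array slicing into features and labels
from sklearn.neighbors import KNeighborsClassifier  # the 3NN classifier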


In [2]:
# Initialize a deque to store the last 6 records (5 previous + 1 current)
records = deque(maxlen=6)

# Define the on_connect function to handle connection events
def on_connect(client, userdata, flags, rc):
    print("Connected with result code", rc)
    # Subscribe to the "Records" topic
    client.subscribe("Records")

# Initialize a counter for correct classifications
correct_classifications = 0
total_iterations = 1000

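# Define the on_message function to handle incoming messages
def on_message(client, userdata, msg):
    global records, correct_classifications, total_iterations
    # Ignore messages that arrive after the target count has been reached
    # (the main thread may not have stopped the network loop yet)
    if total_iterations <= 0:
        return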
    # Decode the message payload
    raw_data = msg.payload.decode()
    print("Raw received data:", raw_data)  # Debugging step

    try:
        # Convert string representation of dictionary to actual dictionary
        data = ast.literal_eval(raw_data)

        # Extract expected features from the dictionary
        record = [
            data["Glucose"],
            data["BloodPressure"],
            data["Insulin"],
            data["Outcome"]
        ]

        # Append to the sliding window; the deque drops the oldest record automatically
        records.append(record)

        # Once the window is full (5 previous records + the new one), run 3NN
        if len(records) == 6:
            # Convert records to a numpy array
            records_array = np.array(records)
            # Split the features and the outcome
            X = records_array[:-1, :-1]  # the 5 previous records, feature columns only
            y = records_array[:-1, -1]   # the 5 previous records, Outcome column only
            r = records_array[-1, :-1]   # newest record, feature columns only
            actual_outcome = records_array[-1, -1]  # newest record's true Outcome

            # Debugging steps to print the training data and new record
            print("Training Data (X):", X)
            print("Training Labels (y):", y)
            print("New Record (r):", r)

            # Create and train the 3NN classifier
            knn = KNeighborsClassifier(n_neighbors=3)
            knn.fit(X, y)
            # Predict the outcome for the latest record
            predicted_outcome = knn.predict([r])[0]

            # Print the predicted and actual outcomes
            print(f"Predicted Outcome: {predicted_outcome}, Actual Outcome: {actual_outcome}")
            print(total_iterations)  # remaining iterations, printed before the decrement

            # Check if the prediction is correct
            if predicted_outcome == actual_outcome:
                correct_classifications += 1

            # One more classification done; the main loop below stops at zero
            total_iterations -= 1

    except Exception as e:
        print("Error:", e)

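# Create the MQTT client and register the event handlers.
# This uses the paho-mqtt 1.x callback API; with paho-mqtt >= 2.0, create the client as
# mqtt.Client(mqtt.CallbackAPIVersion.VERSION1) to keep these callback signatures.
mqttc = mqtt.Client()
mqttc.on_connect = on_connect
mqttc.on_message = on_message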

# Connect to MQTT broker
mqttc.connect("mqtt.eclipseprojects.io", 1883, 60)
# Start the network loop in a background thread; it dispatches the callbacks above
mqttc.loop_start()

# Wait until total_iterations reaches zero
while total_iterations > 0:
    time.sleep(0.1)  # Small delay to prevent high CPU usage

# Stop and disconnect after 1000 iterations
mqttc.loop_stop()
mqttc.disconnect()

print(f"Final Result: {correct_classifications} correct classifications out of 1000.")

Connected with result code 0
Raw received data: {'Glucose': 0.7989949748743719, 'BloodPressure': 0.5245901639344263, 'Insulin': 0.0, 'Outcome': 0}
Raw received data: {'Glucose': 0.9045226130653267, 'BloodPressure': 0.5409836065573771, 'Insulin': 0.0, 'Outcome': 1}
Raw received data: {'Glucose': 0.7336683417085427, 'BloodPressure': 0.45901639344262296, 'Insulin': 0.0, 'Outcome': 0}
Raw received data: {'Glucose': 0.35678391959798994, 'BloodPressure': 0.5737704918032788, 'Insulin': 0.0, 'Outcome': 0}
Raw received data: {'Glucose': 0.5175879396984925, 'BloodPressure': 0.5409836065573771, 'Insulin': 0.0, 'Outcome': 1}
Raw received data: {'Glucose': 0.5276381909547738, 'BloodPressure': 0.0, 'Insulin': 0.0, 'Outcome': 0}
Training Data (X): [[0.79899497 0.52459016 0.        ]
 [0.90452261 0.54098361 0.        ]
 [0.73366834 0.45901639 0.        ]
 [0.35678392 0.57377049 0.        ]
 [0.51758794 0.54098361 0.        ]]
Training Labels (y): [0. 1. 0. 0. 1.]
New Record (r): [0.52763819 0.        

found 0 physical cores < 1
  File "c:\Users\cbech\anaconda3\Lib\site-packages\joblib\externals\loky\backend\context.py", line 282, in _count_physical_cores
    raise ValueError(f"found {cpu_count_physical} physical cores < 1")


Raw received data: {'Glucose': 0.5175879396984925, 'BloodPressure': 0.6557377049180328, 'Insulin': 0.09692671394799054, 'Outcome': 0}
Training Data (X): [[0.90452261 0.54098361 0.        ]
 [0.73366834 0.45901639 0.        ]
 [0.35678392 0.57377049 0.        ]
 [0.51758794 0.54098361 0.        ]
 [0.52763819 0.         0.        ]]
Training Labels (y): [1. 0. 0. 1. 0.]
New Record (r): [0.51758794 0.6557377  0.09692671]
Predicted Outcome: 0.0, Actual Outcome: 0.0
999
Raw received data: {'Glucose': 0.507537688442211, 'BloodPressure': 0.4098360655737705, 'Insulin': 0.0425531914893617, 'Outcome': 0}
Training Data (X): [[0.73366834 0.45901639 0.        ]
 [0.35678392 0.57377049 0.        ]
 [0.51758794 0.54098361 0.        ]
 [0.52763819 0.         0.        ]
 [0.51758794 0.6557377  0.09692671]]
Training Labels (y): [0. 0. 1. 0. 0.]
New Record (r): [0.50753769 0.40983607 0.04255319]
Predicted Outcome: 0.0, Actual Outcome: 0.0
998
Raw received data: {'Glucose': 0.44221105527638194, 'BloodPr
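The cell above waits by polling `total_iterations` every 0.1 s and has no timeout, so it blocks forever if the publisher stops sending. A `threading.Event` is one alternative: the callback sets it when the target is reached, and the main thread waits with a timeout. The sketch below simulates the MQTT network thread with a plain `threading.Thread`; `EvaluationState`, `handle_prediction`, and the 5-second timeout are hypothetical, not part of paho-mqtt.

```python
import threading

class EvaluationState:
    """Hypothetical helper: counters shared with the MQTT callback thread."""

    def __init__(self, target):
        self.target = target
        self.correct = 0
        self.completed_count = 0
        self.lock = threading.Lock()
        self.finished = threading.Event()

    def handle_prediction(self, predicted, actual):
        # Called once per classified record; records after the target are ignored
        with self.lock:
            if self.finished.is_set():
                return
            self.correct += int(predicted == actual)
            self.completed_count += 1
            if self.completed_count >= self.target:
                self.finished.set()

state = EvaluationState(target=1000)

def fake_network_thread():
    # Stand-in for the thread started by loop_start(); sends more than needed
    for i in range(1200):
        state.handle_prediction(predicted=(i % 3 != 0), actual=True)

worker = threading.Thread(target=fake_network_thread)
worker.start()
# In the real cell, mqttc.loop_start() would go here and mqttc.loop_stop() after the wait
completed = state.finished.wait(timeout=5.0)  # False if the stream stalls
worker.join()
print(completed, state.correct, state.completed_count)  # True 666 1000
```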

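On the first run, the final result was 623 correct classifications out of 1000 (62.3%).
This suggests the three features carry some signal about the outcome, but 62.3% is not much more reliable than random guessing. The model also does not improve over time, because each prediction is made from only the 5 records preceding it. A fairer reference than 50% is the accuracy of always predicting the most common Outcome, which depends on the class balance of the stream.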
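A minimal sketch for computing two trivial baselines on the same positions the 3NN loop scores (every record after the first five), assuming the Outcome labels are collected into a list. `majority_baselines` is a hypothetical helper and the labels in the example are made up. The "global" baseline uses hindsight about the whole stream, while the "window" baseline predicts the most common label among the previous 5 records, which is a fair online comparison for 3NN.

```python
from collections import Counter

def majority_baselines(outcomes, window=5):
    """Return (global_acc, window_acc) on outcomes[window:].

    global_acc: always predict the most common label of the scored range (hindsight).
    window_acc: predict the most common label among the previous `window` labels.
    """
    scored = outcomes[window:]
    if not scored:
        return 0.0, 0.0
    majority_label = Counter(scored).most_common(1)[0][0]
    global_acc = sum(label == majority_label for label in scored) / len(scored)
    window_hits = 0
    for i in range(window, len(outcomes)):
        # With window=5 and binary labels the vote can never tie
        guess = Counter(outcomes[i - window:i]).most_common(1)[0][0]
        window_hits += guess == outcomes[i]
    return global_acc, window_hits / len(scored)

labels = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # made-up Outcome sequence
print(majority_baselines(labels))  # (0.6, 0.2)
```

Running this on the real Outcome stream and comparing both numbers with 62.3% would show whether the 3NN model adds anything beyond the label distribution.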