
Use the diabetes.csv dataset to do the following:
1. Select the following 4 attributes (3 features + 1 class label):
• Glucose, BloodPressure, Insulin, Outcome
2. Normalize Glucose, BloodPressure and Insulin to [0, 1] using MinMax.
3. Store the new data (3 normalized features + 1 class label) in another dataset S.
4. Modify the MQTT example to do the following:
• The publisher publishes records in S continuously. When it reaches the end of S, it continues to send from the
beginning again.
• The subscriber continuously receives the data. For each newly received record r, classify r with 3NN, using the
5 records received immediately before r as the labelled neighbours, and compare the predicted class with the Outcome label in r.
• Repeat this 1000 times and report the number of correct classifications.

In [3]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import time
import paho.mqtt.client as mqtt


In [4]:
df = pd.read_csv("diabetes.csv")

# Summarize column types, non-null counts, and basic statistics
print(df.info())
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None


       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin        BMI  DiabetesPedigreeFunction        Age   Outcome
count        768.0       768.0          768.0          768.0       768.0      768.0                     768.0      768.0     768.0
mean      3.845052  120.894531      69.105469      20.536458   79.799479  31.992578                  0.471876  33.240885  0.348958
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.88416                  0.331329  11.760232  0.476951
min            0.0         0.0            0.0            0.0         0.0        0.0                     0.078       21.0       0.0
25%            1.0        99.0           62.0            0.0         0.0       27.3                   0.24375       24.0       0.0
50%            3.0       117.0           72.0           23.0        30.5       32.0                    0.3725       29.0       0.0
75%            6.0      140.25           80.0           32.0      127.25       36.6                   0.62625       41.0       1.0
max           17.0       199.0          122.0           99.0       846.0       67.1                      2.42       81.0       1.0


1. Select the following 4 attributes (3 features + 1 class label):
• Glucose, BloodPressure, Insulin, Outcome

In [5]:
# Select the required columns: Glucose, BloodPressure, Insulin, Outcome
selected_columns = ["Glucose", "BloodPressure", "Insulin", "Outcome"]
df_selected = df[selected_columns]

# Display first few rows
df_selected.head()


   Glucose  BloodPressure  Insulin  Outcome
0      148             72        0        1
1       85             66        0        0
2      183             64        0        1
3       89             66       94        0
4      137             40      168        1


2. Normalize Glucose, BloodPressure and Insulin to [0, 1] using MinMax.

In [6]:
# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
df_selected = df_selected.copy()

# Apply scaling to the selected features
features = ["Glucose", "BloodPressure", "Insulin"]
df_selected[features] = scaler.fit_transform(df_selected[features])

# Display normalized dataset
df_selected.head()


    Glucose  BloodPressure   Insulin  Outcome
0  0.743719       0.590164       0.0        1
1  0.427136       0.540984       0.0        0
2  0.919598        0.52459       0.0        1
3  0.447236       0.540984  0.111111        0
4  0.688442       0.327869  0.198582        1
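
As a sanity check on the normalized values, the min-max formula x' = (x - min) / (max - min) can be applied by hand. The sketch below is an illustration only, not part of the original notebook: the helper name min_max is hypothetical, and the column minima and maxima are copied from the describe() output shown earlier.

```python
def min_max(value, lo, hi):
    """Scale value from the range [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


# (min, max) per column, copied from the describe() output
ranges = {"Glucose": (0, 199), "BloodPressure": (0, 122), "Insulin": (0, 846)}

# Raw values of row 3 from the selected (unscaled) dataset
row3 = {"Glucose": 89, "BloodPressure": 66, "Insulin": 94}

scaled = {name: min_max(row3[name], *ranges[name]) for name in row3}
print({name: round(value, 6) for name, value in scaled.items()})
# {'Glucose': 0.447236, 'BloodPressure': 0.540984, 'Insulin': 0.111111}
```

These agree with row 3 of the normalized table. The helper divides by zero for a constant column (max equal to min), a case that does not occur in these three features.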


3. Store the new data (3 normalized features + 1 class label) in another dataset S.

In [7]:
# Save the new dataset to a CSV file
file_path = "S.csv"
df_selected.to_csv(file_path, index=False)

print(f"Normalized dataset saved as '{file_path}'")


Normalized dataset saved as 'S.csv'


4. Modify the MQTT example to do the following:
- The publisher publishes records in S continuously. When it reaches the end of S, it continues to send from the
beginning again.
- The subscriber continuously receives the data. For each newly received record r, classify r with 3NN, using the
5 records received immediately before r as the labelled neighbours, and compare the predicted class with the Outcome label in r.
- Repeat this 1000 times and report the number of correct classifications.
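
The classification step described in the list above can be sketched in plain Python, independently of MQTT. This is one possible reading of the task, not the notebook's own code: it assumes Euclidean distance over the three normalized features and a majority vote among the 3 nearest of the 5 preceding records. The names knn_predict, WindowClassifier and demo_records are hypothetical; the demo values are the first six published records, rounded to six decimals.

```python
import math
from collections import Counter, deque

FEATURES = ("Glucose", "BloodPressure", "Insulin")


def knn_predict(window, record, k=3):
    """Return the majority Outcome among the k records in window nearest to record."""
    def distance(other):
        return math.dist([record[f] for f in FEATURES],
                         [other[f] for f in FEATURES])

    nearest = sorted(window, key=distance)[:k]
    votes = Counter(int(r["Outcome"]) for r in nearest)
    return votes.most_common(1)[0][0]


class WindowClassifier:
    """Keeps the last window_size records and scores each new record against them."""

    def __init__(self, window_size=5, k=3):
        self.window = deque(maxlen=window_size)
        self.k = k

    def step(self, record):
        """Return True/False for a correct/incorrect prediction, or None while the window is filling."""
        result = None
        if len(self.window) == self.window.maxlen:
            predicted = knn_predict(self.window, record, self.k)
            result = predicted == int(record["Outcome"])
        self.window.append(record)  # the new record becomes history for the next one
        return result


demo_records = [
    {"Glucose": 0.743719, "BloodPressure": 0.590164, "Insulin": 0.0, "Outcome": 1},
    {"Glucose": 0.427136, "BloodPressure": 0.540984, "Insulin": 0.0, "Outcome": 0},
    {"Glucose": 0.919598, "BloodPressure": 0.524590, "Insulin": 0.0, "Outcome": 1},
    {"Glucose": 0.447236, "BloodPressure": 0.540984, "Insulin": 0.111111, "Outcome": 0},
    {"Glucose": 0.688442, "BloodPressure": 0.327869, "Insulin": 0.198582, "Outcome": 1},
    {"Glucose": 0.582915, "BloodPressure": 0.606557, "Insulin": 0.0, "Outcome": 0},
]

classifier = WindowClassifier()
print([classifier.step(r) for r in demo_records])
# [None, None, None, None, None, True]
```

The first five records only fill the window. For the sixth record the three nearest neighbours are records 0, 1 and 3 (labels 1, 0, 0), so the vote is 0, which matches its Outcome. With k = 3 and two classes the vote can never tie.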

In [8]:
# Load the normalized dataset S saved in the previous step
df = pd.read_csv(file_path)

# Convert the dataframe to a list of dictionaries, one per record
records = df.to_dict(orient='records')
num_records = len(records)

# MQTT setup (paho-mqtt >= 2.0 expects an explicit callback API version)
mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
mqttc.connect("mqtt.eclipseprojects.io", 1883, 60)
mqttc.loop_start()  # Run the network loop in a background thread

def publish_data():
    index = 0
    print("Publishing records...")
    while True:
        record = records[index % num_records]  # Wrap around at the end of S
        mqttc.publish("diabetes/data", str(record))
        print(f"Published: {record}")
        index += 1
        time.sleep(0.05)  # Publishing interval in seconds

if __name__ == "__main__":
    publish_data()


Publishing records...
Published: {'Glucose': 0.7437185929648241, 'BloodPressure': 0.5901639344262295, 'Insulin': 0.0, 'Outcome': 1}
Published: {'Glucose': 0.4271356783919598, 'BloodPressure': 0.5409836065573771, 'Insulin': 0.0, 'Outcome': 0}
Published: {'Glucose': 0.9195979899497488, 'BloodPressure': 0.5245901639344263, 'Insulin': 0.0, 'Outcome': 1}
Published: {'Glucose': 0.4472361809045226, 'BloodPressure': 0.5409836065573771, 'Insulin': 0.1111111111111111, 'Outcome': 0}
Published: {'Glucose': 0.6884422110552764, 'BloodPressure': 0.3278688524590164, 'Insulin': 0.1985815602836879, 'Outcome': 1}
Published: {'Glucose': 0.5829145728643216, 'BloodPressure': 0.6065573770491803, 'Insulin': 0.0, 'Outcome': 0}
Published: {'Glucose': 0.3919597989949749, 'BloodPressure': 0.4098360655737705, 'Insulin': 0.1040189125295508, 'Outcome': 1}
Published: {'Glucose': 0.577889447236181, 'BloodPressure': 0.0, 'Insulin': 0.0, 'Outcome': 0}
Published: {'Glucose': 0.9899497487437188, 'BloodPressure': 0.5737704

KeyboardInterrupt: 
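
The notebook stops at the publisher, so the following is only a sketch of how the subscriber side might be wired up, under several assumptions. It targets paho-mqtt 2.0 or later (callback API version 2), reuses the broker and topic from the publisher cell, and accepts payloads that are either JSON or a Python dict literal such as the one produced by str(record). The names parse_payload, TrialCounter and run_subscriber are hypothetical, and classify stands for any callable that takes a record and returns True or False for a scored prediction, or None when no prediction was possible yet.

```python
import ast
import json


def parse_payload(payload):
    """Decode a payload holding either JSON or a Python dict literal."""
    text = payload.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return ast.literal_eval(text)


class TrialCounter:
    """Feeds each received record to classify and stops after limit scored trials."""

    def __init__(self, classify, limit=1000, topic="diabetes/data"):
        self.classify = classify
        self.limit = limit
        self.topic = topic
        self.trials = 0
        self.correct = 0

    def on_connect(self, client, userdata, flags, reason_code, properties=None):
        client.subscribe(self.topic)

    def on_message(self, client, userdata, msg):
        if self.trials >= self.limit:
            return  # ignore messages that arrive while disconnecting
        outcome = self.classify(parse_payload(msg.payload))
        if outcome is None:
            return  # not enough history yet, so this record is not a trial
        self.trials += 1
        self.correct += bool(outcome)
        if self.trials >= self.limit:
            print(f"Correct classifications: {self.correct} / {self.trials}")
            client.disconnect()


def run_subscriber(classify, limit=1000):
    # Imported here so the helpers above can be exercised without paho-mqtt installed
    import paho.mqtt.client as mqtt

    counter = TrialCounter(classify, limit)
    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt >= 2.0
    client.on_connect = counter.on_connect
    client.on_message = counter.on_message
    client.connect("mqtt.eclipseprojects.io", 1883, 60)
    client.loop_forever()  # returns once on_message calls disconnect()
    return counter.correct
```

Calling run_subscriber with a classifier that compares each record against the five records received before it would block until 1000 records have been scored and then return the number of correct classifications. The subscriber has to run in a separate process or notebook from the publisher, since both loop indefinitely.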