<center><h1>Real Time Monitoring of Water Distribution System</h1></center>


# Installing Required Libraries

These cells install the required libraries, such as `kafka-python`, which allows Python to interact with Kafka. `pip` installs a library only if it is not already present in the environment.



In [2]:
pip install kafka-python

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [12]:
pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)
Collecting tzdata>=2022.7
  Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)
Installing collected packages: tzdata, pandas
Successfully installed pandas-2.2.3 tzdata-2024.2
Note: you may need to restart the kernel to use updated packages.


In [30]:
pip install tkinter

Defaulting to user installation because normal site-packages is not writeable
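ERROR: Could not find a version that satisfies the requirement tkinter (from versions: none)
ERROR: No matching distribution found for tkinter
Note: you may need to restart the kernel to use updated packages.

This install fails because `tkinter` is part of Python's standard library and is not distributed on PyPI. On Linux it is typically provided by a system package such as `python3-tk`.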


In [15]:
pip install seaborn

Defaulting to user installation because normal site-packages is not writeable
Collecting seaborn
  Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)
Installing collected packages: seaborn
Successfully installed seaborn-0.13.2
Note: you may need to restart the kernel to use updated packages.


In [30]:
pip install dash

Defaulting to user installation because normal site-packages is not writeable
Collecting dash
  Downloading dash-2.18.1-py3-none-any.whl (7.5 MB)
Collecting retrying
  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Collecting Werkzeug<3.1
  Downloading werkzeug-3.0.6-py3-none-any.whl (227 kB)
Collecting dash-core-components==2.0.0
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-html-components==2.0.0
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting dash-table==5.0.0
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting Flask<3.1,>=1.0.4
  Downloading flask-3.0.3-py3-none-a

# Generating Synthetic Data

In this section, we generate synthetic data to simulate real-world events. This data will be sent to a Kafka topic, allowing us to test the Kafka producer and consumer functionality. By generating data with random values, timestamps, or other features, we can simulate scenarios like sensor readings, financial transactions, or user activities.

Using Python’s `random` and `datetime` libraries together with NumPy, we generate numerical data with timestamps. This example draws random floating-point values to represent sensor readings. Each entry will have:
## Unique timestamp
> The time at which the sensor reading was taken.
## Flow rate  
> I chose a normal distribution because flow rate typically fluctuates around an average value, with occasional spikes or dips. The normal distribution captures this behavior with a bell-shaped curve where most values fall near the mean and fewer occur at the extremes.
## Pressure
> Pressure in a water distribution system is unlikely to be negative, but it can vary significantly. The log-normal distribution takes only positive values and has a long right tail, making it suitable for modeling pressure readings, where most values cluster around a typical level and some can be much higher.
## Temperature
> Similar to flow rate, temperature in a water distribution system fluctuates around an average value with occasional deviations, so it is also modeled with a normal distribution.
## pH
> Ideally, the pH of water should be close to neutral (pH = 7). The normal distribution with a small standard deviation (sigma) allows for slight variations around this ideal value while keeping most readings within a narrow acceptable range.
## Turbidity
> Turbidity refers to the cloudiness of water. Similar to pressure, turbidity readings are unlikely to be negative and can vary considerably. The log-normal distribution with a low mean reflects this, with most values being low (clear water) and some potential for higher turbidity levels.

This data structure will be published to Kafka as individual JSON messages.
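As a minimal sketch of what one such message could look like, the snippet below draws a single reading with the same distributions and parameters as the generator and encodes it as JSON. The helper name `make_reading` and the lowercase key names are illustrative assumptions, not part of the original pipeline, and `random.lognormvariate` stands in for `np.random.lognormal` so that only the standard library is needed.

```python
import json
import random
from datetime import datetime

def make_reading(sensor_id, timestamp):
    """Draw one synthetic reading (hypothetical helper, for illustration only)."""
    return {
        "sensor_id": sensor_id,
        "timestamp": timestamp.isoformat(),
        "flow_rate": round(random.gauss(50, 10), 2),            # normal
        "pressure": round(random.lognormvariate(4, 0.25), 2),    # log-normal
        "temperature": round(random.gauss(20, 5), 2),            # normal
        "pH": round(random.gauss(7, 0.5), 2),                    # normal
        "turbidity": round(random.lognormvariate(0.5, 0.75), 2)  # log-normal
    }

reading = make_reading("Sensor_1", datetime(2024, 8, 3, 19, 13))
message = json.dumps(reading).encode("utf-8")  # bytes, ready to be sent
print(message.decode("utf-8"))
```

The timestamp is converted to an ISO 8601 string because `datetime` objects are not JSON-serializable by default.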



In [2]:
import random
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns

# Function to generate random water monitoring data for a single sensor
def generate_sensor_data(sensor_id, start_time, num_records):
    data = []
    for i in range(num_records):
        timestamp = start_time + timedelta(minutes=i * 10)  # Data recorded every 10 minutes
        
        # More realistic distributions
        flow_rate = round(random.gauss(mu=50, sigma=10), 2)  # Normal distribution for flow rate
        pressure = round(np.random.lognormal(mean=4, sigma=0.25), 2)  # Log-normal for pressure
        temperature = round(random.gauss(mu=20, sigma=5), 2)  # Normal distribution for temperature
        pH = round(random.gauss(mu=7, sigma=0.5), 2)  # Normal distribution for pH
        turbidity = round(np.random.lognormal(mean=0.5, sigma=0.75), 2)  # Log-normal for turbidity

        data.append([sensor_id, timestamp, flow_rate, pressure, temperature, pH, turbidity])
    
    return data

# Function to generate data for multiple sensors
def generate_water_distribution_data(num_sensors, num_records_per_sensor, start_time):
    all_data = []
    for sensor_id in range(1, num_sensors + 1):
        sensor_data = generate_sensor_data(f"Sensor_{sensor_id}", start_time, num_records_per_sensor)
        all_data.extend(sensor_data)
    
    columns = ['Sensor ID', 'Timestamp', 'Flow_Rate', 'Pressure', 'Temperature', 'pH', 'Turbidity']
    df = pd.DataFrame(all_data, columns=columns)
    return df

# Parameters for the data generation
num_sensors = 3  # Number of sensors in the system
num_records_per_sensor = 14400  # Number of records per sensor (14400 records = 100 days at 10-minute intervals)
start_time = datetime.now() - timedelta(days=100)  # Start time set to 100 days ago

# Generate the synthetic data
df = generate_water_distribution_data(num_sensors, num_records_per_sensor, start_time)

# Save the data to a CSV file
df.to_csv('water_distribution_monitoring_data.csv', index=False)


In [3]:
df.head()

| | Sensor ID | Timestamp | Flow_Rate | Pressure | Temperature | pH | Turbidity |
|---|---|---|---|---|---|---|---|
| 0 | Sensor_1 | 2024-08-03 19:13:04.347273 | 38.99 | 52.9 | 21.92 | 7.73 | 2.08 |
| 1 | Sensor_1 | 2024-08-03 19:23:04.347273 | 47.99 | 83.14 | 19.64 | 7.73 | 4.73 |
| 2 | Sensor_1 | 2024-08-03 19:33:04.347273 | 52.12 | 51.77 | 17.35 | 7.87 | 7.25 |
| 3 | Sensor_1 | 2024-08-03 19:43:04.347273 | 51.55 | 79.7 | 20.05 | 6.53 | 1.39 |
| 4 | Sensor_1 | 2024-08-03 19:53:04.347273 | 51.23 | 56.14 | 26.0 | 6.44 | 0.77 |


In [4]:
# dcc and html ship with Dash 2.x; the standalone
# dash_core_components / dash_html_components packages are deprecated
from dash import Dash, dcc, html, Input, Output
import plotly.express as px

# Assumes the DataFrame 'df' created in the data-generation cell above

app = Dash(__name__)

app.layout = html.Div([
    html.H1('Water Monitoring Dashboard'),
    
    # Dropdown for selecting the sensor ID
    html.Div([
        html.Label("Select Sensor ID"),
        dcc.Dropdown(
            id='sensor-dropdown',
            options=[{'label': i, 'value': i} for i in df['Sensor ID'].unique()],
            value=df['Sensor ID'].unique()[0],
            placeholder="Select Sensor ID"
        )
    ], style={'width': '48%', 'display': 'inline-block'}),
    
    # Dropdown for selecting variables for the time series graph
    html.Div([
        html.Label("Select Variables for Time Series"),
        dcc.Dropdown(
            id='variable-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns if col not in ['Sensor ID', 'Timestamp']],
            multi=True,
            placeholder="Select Variables for Time Series"
        ),
    ], style={'width': '48%', 'display': 'inline-block'}),

    # Time series graph
    dcc.Graph(id='time-series-graph'),

    # Histogram feature selection and graph
    html.Div([
        html.Label("Select Feature for Histogram"),
        dcc.Dropdown(
            id='histogram-feature-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns if col not in ['Sensor ID', 'Timestamp']],
            value='Flow_Rate',  # Default value for histogram
            placeholder="Select Feature for Histogram"
        ),
        dcc.Graph(id='histogram-graph')
    ], style={'width': '48%', 'display': 'inline-block'}),

    # Box plot feature selection and graph
    html.Div([
        html.Label("Select Feature for Box Plot"),
        dcc.Dropdown(
            id='boxplot-feature-dropdown',
            options=[{'label': col, 'value': col} for col in df.columns if col not in ['Sensor ID', 'Timestamp']],
            value='Turbidity',  # Default value for box plot
            placeholder="Select Feature for Box Plot"
        ),
        dcc.Graph(id='box-plot-graph')
    ], style={'width': '48%', 'display': 'inline-block'})
])

@app.callback(
    [Output('time-series-graph', 'figure'),
     Output('histogram-graph', 'figure'),
     Output('box-plot-graph', 'figure')],
    [Input('sensor-dropdown', 'value'),
     Input('variable-dropdown', 'value'),
     Input('histogram-feature-dropdown', 'value'),
     Input('boxplot-feature-dropdown', 'value')]
)
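def update_graphs(sensor_id, selected_variables, histogram_feature, boxplot_feature):
    filtered_df = df[df['Sensor ID'] == sensor_id]

    # The multi-select dropdown has no default value, so fall back to one
    # variable until the user picks something
    if not selected_variables:
        selected_variables = ['Flow_Rate']

    # Time series plot, limited to the first 500 records to keep rendering fast
    fig1 = px.line(filtered_df.iloc[:500], x='Timestamp', y=selected_variables)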
    
    # Histogram Plot for selected histogram feature
    fig2 = px.histogram(filtered_df, x=histogram_feature)
    
    # Box Plot for selected box plot feature
    fig3 = px.box(filtered_df, x=boxplot_feature)

    return fig1, fig2, fig3

if __name__ == '__main__':
    app.run_server(debug=True)


<center><h1>Training the Model and Testing for Anomalies</h1></center>

##  Custom Isolation Forest
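This approach approximates an Isolation Forest with a Random Forest classifier trained on synthetic labels. Here's a general outline: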

**1. Label Most Data as Normal**: Assign a label (e.g., label = 0) to the majority of the data, assuming most of it is normal.<br>
**2. Randomly Sample and Label Anomalies**: Randomly select a small subset of the data and label them as 1 (anomalies). The exact proportion can be adjusted based on your expectations for anomaly prevalence.<br>
**3. Train a Random Forest Classifier** using these labeled data points to distinguish anomalies based on decision boundaries.

In [5]:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

# Start Spark session
spark = SparkSession.builder.appName("AnomalyDetection").getOrCreate()
# Convert the pandas DataFrame to a Spark DataFrame
data_spark = spark.createDataFrame(df)

def create_isolation_forest_approximation(df: DataFrame, feature_columns, contamination=0.05):
    # Define VectorAssembler to assemble features
    assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
    data = assembler.transform(df)
    
    # Label data: Assuming most data is normal
    normal_df = data.withColumn("label", F.lit(0))

    # Sample a small fraction as anomalies
    anomaly_df = normal_df.sample(fraction=contamination).withColumn("label", F.lit(1))
    training_df = normal_df.union(anomaly_df)  # Combine labeled data
    
    # Train the Random Forest model
    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=50)
    model = rf.fit(training_df)
    
    # Predict anomalies in the original data
    predictions = model.transform(data)
    return predictions

feature_columns = ["Flow_Rate", "Turbidity" ,"Pressure" ,"Temperature"]

predictions = create_isolation_forest_approximation(data_spark,feature_columns)

# Filter to show only rows identified as anomalies
anomalies = predictions.filter(F.col("prediction") == 1)

# Show anomalies
anomalies.show(truncate=False)

24/11/11 19:18:17 WARN Utils: Your hostname, ishan-HP-Laptop-15s-fq5xxx resolves to a loopback address: 127.0.1.1; using 192.168.1.145 instead (on interface wlo1)
24/11/11 19:18:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/11 19:18:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
                                                                                

+---------+---------+---------+--------+-----------+---+---------+--------+-------------+-----------+----------+
|Sensor ID|Timestamp|Flow_Rate|Pressure|Temperature|pH |Turbidity|features|rawPrediction|probability|prediction|
+---------+---------+---------+--------+-----------+---+---------+--------+-------------+-----------+----------+
+---------+---------+---------+--------+-----------+---+---------+--------+-------------+-----------+----------+



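No anomalies were detected. This is expected: the anomaly labels are assigned at random, independent of the feature values, so the classifier has no signal to learn from and predicts the majority class (0) for every row.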

## Custom Clustering-Based Isolation Using KMeans

Here's a general outline:

**1. Cluster the data** into `K` clusters using KMeans.<br>
**2. Calculate distance** of each point from the nearest cluster center.<br>
**3. Flag outliers** as those points whose distance from the centroid is greater than a specified threshold (e.g., based on standard deviation or percentile).
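Steps 2 and 3 can be sketched without Spark as follows. This is a toy illustration under stated assumptions: the points and cluster centers are made-up values, the centers are taken as already fitted, and the 80th percentile is used instead of a higher one only because the sample is tiny.

```python
import math
import statistics

# Hypothetical 2-D points (flow rate, turbidity) and already-fitted centers
points = [(48.0, 1.5), (52.0, 2.0), (50.0, 1.0),
          (71.0, 1.2), (69.0, 0.8), (95.0, 14.0)]
centers = [(50.0, 1.5), (70.0, 1.0)]

def distance_to_nearest_center(point, centers):
    """Euclidean distance from a point to its closest cluster center."""
    return min(math.dist(point, c) for c in centers)

# Step 2: distance of every point from its nearest center
distances = [distance_to_nearest_center(p, centers) for p in points]

# Step 3: flag points whose distance exceeds a percentile cutoff
cutoff = statistics.quantiles(distances, n=5)[-1]  # 80th percentile
outliers = [p for p, d in zip(points, distances) if d > cutoff]
print(outliers)
```

A percentile cutoff always flags a fixed share of the data, so it is a ranking device rather than a test of whether any point is truly abnormal.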

In [6]:
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import functions as F
from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType

data_spark = spark.createDataFrame(df)
# Assemble features
assembler = VectorAssembler(inputCols=["Flow_Rate", "Turbidity"], outputCol="features")
data = assembler.transform(data_spark)

# Train KMeans model
kmeans = KMeans(k=5, featuresCol="features", predictionCol="cluster")
model = kmeans.fit(data)

# Get cluster centers and add distances to the DataFrame
centers = model.clusterCenters()
data = model.transform(data)

# Define a UDF to compute Euclidean distance to the nearest center
def calculate_distance(features, cluster):
    center = centers[cluster]
    return float(sum((features[i] - center[i]) ** 2 for i in range(len(center))) ** 0.5)

distance_udf = F.udf(calculate_distance, DoubleType())
data = data.withColumn("distance", distance_udf("features", "cluster"))

# Define threshold based on the distribution of distances
threshold = data.selectExpr("percentile(distance, 0.95)").collect()[0][0]  # 95th percentile as threshold
anomalies = data.filter(col("distance") > threshold)

anomalies.show()

24/11/11 19:18:43 WARN InstanceBuilder: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
                                                                                

+---------+--------------------+---------+--------+-----------+----+---------+-------------+-------+------------------+
|Sensor ID|           Timestamp|Flow_Rate|Pressure|Temperature|  pH|Turbidity|     features|cluster|          distance|
+---------+--------------------+---------+--------+-----------+----+---------+-------------+-------+------------------+
| Sensor_1|2024-08-03 21:33:...|    33.93|   43.02|      18.85|6.97|    15.02|[33.93,15.02]|      1|12.916060578920192|
| Sensor_1|2024-08-04 06:13:...|    42.91|   45.96|      17.26|6.77|     8.47| [42.91,8.47]|      0| 6.291805322146821|
| Sensor_1|2024-08-04 11:23:...|    72.89|   46.26|      25.75|6.84|      0.6|  [72.89,0.6]|      4|5.8424332484811226|
| Sensor_1|2024-08-04 22:33:...|    74.77|   37.93|      11.69|6.59|     3.99| [74.77,3.99]|      4| 7.705392975456011|
| Sensor_1|2024-08-05 00:03:...|    25.72|   30.39|      25.68| 7.2|     1.07| [25.72,1.07]|      1| 7.025247990928396|
| Sensor_1|2024-08-05 01:53:...|    38.0

## Mahalanobis Distance for Multivariate Outlier Detection

Using **Mahalanobis distance** on multiple features can identify anomalies, especially when features are correlated. It measures the distance from the center (mean vector) scaled by the covariance.

**1. Compute the mean vector** and **covariance matrix** of the features.<br>
**2. Calculate Mahalanobis distance** for each point.<br>
**3. Set a threshold** (such as the Chi-square critical value) to label points as anomalies.
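
For reference, the distance in the steps above can be written as follows, where $\mu$ is the mean vector, $\Sigma$ the covariance matrix, and $p$ the number of features (this notation is introduced here and does not appear in the code). Under the assumption that the features are approximately multivariate normal, the squared distance follows a chi-square distribution with $p$ degrees of freedom, which is why a chi-square critical value is a natural threshold.

```latex
D_M(x) = \sqrt{(x - \mu)^{\top} \, \Sigma^{-1} \, (x - \mu)}, \qquad
\text{flag } x \text{ as an anomaly if } D_M(x)^2 > \chi^2_{p,\,0.95}
```

For example, $\chi^2_{4,\,0.95} \approx 9.488$, so with four features a point would be flagged when $D_M(x) > \sqrt{9.488} \approx 3.08$.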

In [7]:
import numpy as np
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import col

# Start Spark session (reuses the existing session if one is already running)
spark = SparkSession.builder.appName("AnomalyDetection").getOrCreate()
# Step 1: Assemble features
data_spark = spark.createDataFrame(df)
assembler = VectorAssembler(inputCols=["Flow_Rate", "Turbidity" ,"Pressure" ,"Temperature"], outputCol="features")
data = assembler.transform(data_spark)

# Step 2: Calculate mean and covariance
mean_vector = np.array(data.select("features").rdd.map(lambda x: x[0]).mean())
cov_matrix = np.array(data.select("features").rdd.map(lambda x: np.outer(x[0] - mean_vector, x[0] - mean_vector)).mean())

# Step 3: Define Mahalanobis distance function
def mahalanobis_distance(row, mean, cov_inv):
    x = np.array(row)
    return float(np.sqrt((x - mean).T @ cov_inv @ (x - mean)))

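# Invert covariance matrix
cov_matrix_inv = np.linalg.inv(cov_matrix)

# Declare the return type explicitly; without it the UDF returns strings
from pyspark.sql.types import DoubleType
mahalanobis_udf = F.udf(lambda x: mahalanobis_distance(x, mean_vector, cov_matrix_inv), DoubleType())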

# Step 4: Add Mahalanobis distance column and filter anomalies
data = data.withColumn("mahalanobis_distance", mahalanobis_udf("features"))

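# Set threshold for anomalies. Note: 7.815 is the chi-square critical value for
# 3 degrees of freedom at 95% confidence, and it applies to the squared distance.
# With the 4 features used here the matching cutoff would be sqrt(9.488) ≈ 3.08,
# so 7.815 on the unsquared distance acts as a much stricter, empirical cutoff.
threshold = 7.815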
anomalies = data.filter(col("mahalanobis_distance") > threshold)
anomalies.show()


+---------+--------------------+---------+--------+-----------+----+---------+--------------------+--------------------+
|Sensor ID|           Timestamp|Flow_Rate|Pressure|Temperature|  pH|Turbidity|            features|mahalanobis_distance|
+---------+--------------------+---------+--------+-----------+----+---------+--------------------+--------------------+
| Sensor_1|2024-08-05 16:53:...|    60.46|   57.58|      19.65| 6.9|    24.82|[60.46,24.82,57.5...|  11.913321220470868|
| Sensor_1|2024-08-07 03:03:...|    59.25|   69.36|      23.51| 7.0|    24.58|[59.25,24.58,69.3...|  11.833324250278647|
| Sensor_1|2024-08-18 15:03:...|    38.81|   58.18|       18.3|7.27|     22.4|[38.81,22.4,58.18...|  10.671856708733037|
| Sensor_1|2024-08-26 23:43:...|    52.31|   48.64|      22.64|6.66|    23.44|[52.31,23.44,48.6...|  11.177205866595084|
| Sensor_1|2024-08-27 08:13:...|    60.94|   35.59|      12.12|7.41|    17.67|[60.94,17.67,35.5...|   8.465705990339437|
| Sensor_1|2024-08-28 00:23:...|

                                                                                


# Importing Required Modules

In this cell, we import the necessary classes from `kafka-python`. Specifically, we import:
- `KafkaProducer`: for sending messages to a Kafka topic.
- `KafkaConsumer`: for reading messages from a Kafka topic.

These classes provide the core functionality needed for Kafka interactions.


# Setting Up the Kafka Producer

Here, we create a Kafka Producer, which will allow us to send messages to a Kafka topic. We configure the producer with the following parameters:
- `bootstrap_servers`: The address of the Kafka server, typically set as "localhost:9092" for local testing.
- `value_serializer`: Serializes message data to JSON format, making it easier to handle structured data in Kafka.

The producer can now send JSON-encoded messages to Kafka.
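
To make the serialization step concrete, here is a small sketch of what the `value_serializer` does to a message, together with a matching decoder that a consumer could use (for instance through `KafkaConsumer`'s `value_deserializer` parameter). The function names and the sample reading are illustrative assumptions, and no broker is needed to run it.

```python
import json

def serialize(value):
    """Same logic as the value_serializer lambda: dict -> JSON -> UTF-8 bytes."""
    return json.dumps(value).encode("utf-8")

def deserialize(payload):
    """Assumed counterpart on the consumer side: UTF-8 bytes -> JSON -> dict."""
    return json.loads(payload.decode("utf-8"))

# Hypothetical sensor reading
reading = {"sensor_id": "sensor_1", "flow_rate": 148.37, "turbidity": 1.92}
payload = serialize(reading)

print(type(payload).__name__)           # bytes
print(deserialize(payload) == reading)  # True
```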


In [9]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import random
import json
import time
from kafka import KafkaProducer

# Initialize Kafka Producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda x: json.dumps(x).encode('utf-8')
)

# Define the interactive widgets for mean and sigma
flow_rate_mean_slider = widgets.FloatSlider(value=150, min=50, max=200, step=1, description='Flow Rate Mean:')
flow_rate_sigma_slider = widgets.FloatSlider(value=10, min=1, max=20, step=1, description='Flow Rate Sigma:')

turbidity_mean_slider = widgets.FloatSlider(value=0.5, min=0, max=5, step=0.1, description='Turbidity Mean:')
turbidity_sigma_slider = widgets.FloatSlider(value=0.75, min=0.1, max=2, step=0.1, description='Turbidity Sigma:')

temperature_mean_slider = widgets.FloatSlider(value=20, min=10, max=30, step=0.5, description='Temperature Mean:')
temperature_sigma_slider = widgets.FloatSlider(value=5, min=1, max=10, step=1, description='Temperature Sigma:')

pressure_mean_slider = widgets.FloatSlider(value=4, min=2, max=6, step=0.1, description='Pressure Mean:')
pressure_sigma_slider = widgets.FloatSlider(value=0.25, min=0.1, max=1, step=0.05, description='Pressure Sigma:')

# Slider for the number of messages
num_messages_slider = widgets.IntSlider(value=50, min=10, max=1000, step=10, description='Number of Messages:')

send_button = widgets.Button(description="Send Data")

# Create layouts for each sensor parameter
flow_rate_box = widgets.VBox([flow_rate_mean_slider, flow_rate_sigma_slider])
turbidity_box = widgets.VBox([turbidity_mean_slider, turbidity_sigma_slider])
temperature_box = widgets.VBox([temperature_mean_slider, temperature_sigma_slider])
pressure_box = widgets.VBox([pressure_mean_slider, pressure_sigma_slider])

# Organize all widgets into a main layout
main_box = widgets.VBox([
    widgets.HTML("<h3>Flow Rate</h3>"),
    flow_rate_box,
    widgets.HTML("<h3>Turbidity</h3>"),
    turbidity_box,
    widgets.HTML("<h3>Temperature</h3>"),
    temperature_box,
    widgets.HTML("<h3>Pressure</h3>"),
    pressure_box,
    num_messages_slider,

    send_button
])

# Display the main layout
display(main_box)

# Define the function to send data using the slider values
def send_data(button):
    clear_output(wait=True)  # Clear previous outputs
    display(main_box)  # Redisplay the main box after clearing
    num_messages = num_messages_slider.value
    for _ in range(num_messages):
        data = {
            'sensor_id': 'sensor_1',
            'timestamp': int(time.time()),
            'flow_rate': round(random.gauss(mu=flow_rate_mean_slider.value, sigma=flow_rate_sigma_slider.value), 2),
            'turbidity': round(random.lognormvariate(mu=turbidity_mean_slider.value, sigma=turbidity_sigma_slider.value), 2),
            'temperature': round(random.gauss(mu=temperature_mean_slider.value, sigma=temperature_sigma_slider.value), 2),
            'pressure': round(random.lognormvariate(mu=pressure_mean_slider.value, sigma=pressure_sigma_slider.value), 2),
        }
        producer.send('test-topic', value=data)
        print(f"Sent data: {data}")
        time.sleep(0.1)  # Optional: add a small delay between messages

# Link the button to the function
send_button.on_click(send_data)



## Sending the Trained Model

In [8]:
from kafka import KafkaProducer
import pickle
import numpy as np

# No value_serializer here: the payloads are already bytes produced by pickle
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
)

def send_model():
    # Serialize and send the mean vector of the training features
    serialized_mean = pickle.dumps(np.array(mean_vector))
    producer.send('test-topic', value=serialized_mean)
    producer.flush()

    # Serialize and send the covariance matrix of the training features
    serialized_cov = pickle.dumps(np.array(cov_matrix))
    producer.send('test-topic', value=serialized_cov)
    producer.flush()

send_model()
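
On the receiving side, each message value would have to be turned back into an array with `pickle.loads`. The sketch below shows that round trip using plain nested lists as stand-ins for `mean_vector` and `cov_matrix`, so it runs without NumPy or a broker; the values are made up for illustration. Because both payloads go to the same topic, this sketch relies on arrival order (mean first, covariance second), which is an assumption about how the consuming code is written. `pickle.loads` should only be applied to messages from a trusted producer.

```python
import pickle

# Stand-in values (hypothetical); the notebook sends NumPy arrays instead
mean_vector = [50.0, 2.2]
cov_matrix = [[100.0, 0.0], [0.0, 4.0]]

# Producer side: the bytes that producer.send(..., value=...) would carry
messages = [pickle.dumps(mean_vector), pickle.dumps(cov_matrix)]

# Consumer side: decode in the same order the payloads were sent
received_mean = pickle.loads(messages[0])
received_cov = pickle.loads(messages[1])

print(received_mean == mean_vector, received_cov == cov_matrix)
```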