GENERATE 1000 IOT TASKS

In [1]:
# IMPORT REQUIRED LIBRARIES
import numpy as np
import pandas as pd

# SET RANDOM SEED FOR REPRODUCIBILITY
np.random.seed(42)

# -------------------------------
# GENERATE TASK SIZE (GB)
# -------------------------------
# USE A LOGNORMAL DISTRIBUTION TO MIMIC REAL-LIFE FILE SIZES
# THEN RESCALE SO THE LARGEST TASK IS 500 GB AND EVERY VALUE STAYS ABOVE 0.001 GB
raw_sizes = np.random.lognormal(mean=0.5, sigma=1.2, size=1000)
task_sizes = 0.001 + (raw_sizes / raw_sizes.max()) * (500 - 0.001)

# -------------------------------
# GENERATE RESOURCE UTILIZATION (CPU DEMAND UNITS)
# -------------------------------
# GAMMA DISTRIBUTION (SHAPE 2, SCALE 15) GIVES A RIGHT-SKEWED DEMAND PROFILE
cpu_util = np.random.gamma(shape=2.0, scale=15.0, size=1000)
# CLIP TO THE RANGE [1, 100]
cpu_util = np.clip(cpu_util, 1, 100)

# -------------------------------
# GENERATE EXECUTION TIME (MS)
# -------------------------------
# NORMAL DISTRIBUTION (MEAN 25, STD 10), CLIPPED TO THE RANGE [0, 50] MS
exec_time = np.random.normal(loc=25, scale=10, size=1000)
exec_time = np.clip(exec_time, 0, 50)

# -------------------------------
# CREATE DATAFRAME
# -------------------------------
df_tasks = pd.DataFrame({
    "Task_ID": np.arange(1, 1001),
    "Task_Size_GB": task_sizes,
    "CPU_Demand_Units": cpu_util,
    "Exec_Time_ms": exec_time
})

# -------------------------------
# SAVE TO CSV
# -------------------------------
df_tasks.to_csv("IoT_TASKS.csv", index=False)

# SHOW FIRST FEW ROWS
df_tasks.head()

   Task_ID  Task_Size_GB  CPU_Demand_Units  Exec_Time_ms
0        1      8.912881         63.068579     40.628888
1        2      4.160568         47.520374     22.260069
2        3     10.682941         26.172607      22.56616
3        4     30.538766         14.448242     22.001618
4        5       3.70845         46.650909     44.041366


# GENERATED 1000 SYNTHETIC IOT TASKS

THE CELL ABOVE CREATES A SYNTHETIC DATASET OF 1000 IOT TASKS WITH THE FOLLOWING CHARACTERISTICS:

- **TASK SIZE (0.001 GB TO 500 GB)**  
  - GENERATED USING A LOGNORMAL DISTRIBUTION  
  - MIMICS REAL-WORLD FILE SIZE VARIABILITY  

- **CPU DEMAND (1 TO 100 UNITS)**  
  - SLIGHTLY RIGHT-SKEWED USING A GAMMA DISTRIBUTION  
  - REPRESENTS COMPUTATIONAL LOAD OF A TASK  

- **EXECUTION TIME (0 TO 50 MS)**  
  - NORMAL DISTRIBUTION (MEAN 25 MS, STANDARD DEVIATION 10 MS) CLIPPED TO THIS RANGE  
  - SAMPLED INDEPENDENTLY OF TASK SIZE AND CPU DEMAND  

THE RESULTING 1000-TASK DATASET IS SAVED AS **`IoT_TASKS.csv`**,  
WHICH WILL BE USED TO TRAIN THE LINEAR REGRESSION MODEL FOR EXECUTION TIME PREDICTION.
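
AS A QUICK SANITY CHECK ON THE RANGES LISTED ABOVE, THE SKETCH BELOW REGENERATES THE THREE COLUMNS IN MEMORY AND REPORTS THEIR BOUNDS TOGETHER WITH THE SHARE OF SAMPLES SITTING ON A LIMIT. IT IS AN ILLUSTRATION RATHER THAN PART OF THE ORIGINAL NOTEBOOK: IT ASSUMES A LOCAL `np.random.RandomState(42)` IN PLACE OF THE GLOBAL SEED, AND THE HELPER NAME `generate_tasks` IS HYPOTHETICAL.

```python
import numpy as np

def generate_tasks(n=1000, seed=42):
    """REGENERATE THE THREE SYNTHETIC COLUMNS (HYPOTHETICAL HELPER)."""
    rng = np.random.RandomState(seed)  # ASSUMPTION: LOCAL GENERATOR, NOT THE GLOBAL SEED
    raw = rng.lognormal(mean=0.5, sigma=1.2, size=n)
    size_gb = 0.001 + (raw / raw.max()) * (500 - 0.001)
    cpu = np.clip(rng.gamma(shape=2.0, scale=15.0, size=n), 1, 100)
    exec_ms = np.clip(rng.normal(loc=25, scale=10, size=n), 0, 50)
    return size_gb, cpu, exec_ms

size_gb, cpu, exec_ms = generate_tasks()

columns = [
    ("Task_Size_GB", size_gb, 0.001, 500),
    ("CPU_Demand_Units", cpu, 1, 100),
    ("Exec_Time_ms", exec_ms, 0, 50),
]

for name, col, low, high in columns:
    # FRACTION OF SAMPLES THAT SIT EXACTLY ON A LOWER OR UPPER LIMIT
    at_limit = np.mean((col <= low) | (col >= high))
    print(f"{name}: min={col.min():.3f}, max={col.max():.3f}, at limit={at_limit:.1%}")
```

A LARGE SHARE OF VALUES AT A LIMIT WOULD SUGGEST THAT CLIPPING, RATHER THAN THE UNDERLYING DISTRIBUTION, IS SHAPING THAT COLUMN.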


TRAIN LINEAR REGRESSION (LR) ON SYNTHETIC IOT DATA

# TRAIN LINEAR REGRESSION MODEL FOR EXECUTION TIME PREDICTION

THIS SECTION LOADS THE GENERATED DATASET, PERFORMS NECESSARY PREPROCESSING, TRAINS A LINEAR REGRESSION MODEL, AND EVALUATES ITS PERFORMANCE USING STANDARD REGRESSION METRICS.

THE TARGET VARIABLE IS:
- `Exec_Time_ms`

THE INPUT FEATURES ARE:
- `Task_Size_GB`
- `CPU_Demand_Units`

THE WORKFLOW STEPS ARE:
- TRAIN-TEST SPLIT (80/20)
- FEATURE SCALING (OPTIONAL; `StandardScaler` IS USED HERE, FITTED ON THE TRAINING SET ONLY)
- MODEL FITTING
- PREDICTION AND EVALUATION (MAE, RMSE, R²)
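
FOR PLAIN LEAST SQUARES WITH AN INTERCEPT, STANDARDIZING THE FEATURES RESCALES THE COEFFICIENTS BUT LEAVES THE PREDICTIONS UNCHANGED, WHICH IS WHY THE SCALING STEP CAN BE TREATED AS OPTIONAL FOR THIS MODEL. THE SKETCH BELOW ILLUSTRATES THIS WITH NUMPY ONLY. IT USES `np.linalg.lstsq` AS A STAND-IN FOR `LinearRegression`, ILLUSTRATIVE RANDOM DATA WITH AN ASSUMED SEED, AND A HYPOTHETICAL HELPER NAMED `fit_predict`.

```python
import numpy as np

rng = np.random.RandomState(0)  # ASSUMED SEED, ILLUSTRATIVE DATA ONLY
X = np.column_stack([rng.uniform(0.001, 500, 200), rng.uniform(1, 100, 200)])
y = rng.normal(25, 10, 200)

def fit_predict(X_train, y_train, X_new):
    """LEAST SQUARES WITH AN INTERCEPT COLUMN (HYPOTHETICAL HELPER)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

# SIMPLE 80/20 SPLIT BY POSITION
X_train, X_test, y_train = X[:160], X[160:], y[:160]

# STANDARDIZE WITH TRAINING-SET STATISTICS ONLY (POPULATION STD, DDOF=0)
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)

pred_raw = fit_predict(X_train, y_train, X_test)
pred_scaled = fit_predict((X_train - mu) / sd, y_train, (X_test - mu) / sd)

# THE TWO SETS OF PREDICTIONS SHOULD AGREE UP TO FLOATING-POINT ERROR
print(np.allclose(pred_raw, pred_scaled))
```

SCALING STILL MATTERS FOR REGULARIZED OR GRADIENT-BASED MODELS, SO KEEPING THE STEP IN THE WORKFLOW MAKES IT EASIER TO SWAP THE ESTIMATOR LATER.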

In [3]:
# IMPORT REQUIRED LIBRARIES
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt

# -------------------------------
# LOAD THE GENERATED DATASET
# -------------------------------
df = pd.read_csv("IoT_TASKS.csv")  # SAME FILENAME (AND CASE) AS WRITTEN BY THE GENERATION CELL

# -------------------------------
# SELECT FEATURES & TARGET
# -------------------------------
X = df[["Task_Size_GB", "CPU_Demand_Units"]]
y = df["Exec_Time_ms"]

# -------------------------------
# TRAIN-TEST SPLIT (80/20)
# -------------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------------------------------
# FEATURE SCALING USING STANDARD SCALER
# -------------------------------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# -------------------------------
# TRAIN THE LINEAR REGRESSION MODEL
# -------------------------------
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# -------------------------------
# PREDICT ON TRAIN AND TEST SETS
# -------------------------------
y_pred_train = lr.predict(X_train_scaled)
y_pred_test = lr.predict(X_test_scaled)

# -------------------------------
# EVALUATE MODEL PERFORMANCE
# -------------------------------
mae = mean_absolute_error(y_test, y_pred_test)
mse = mean_squared_error(y_test, y_pred_test)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_test)

# -------------------------------
# PRINT MODEL PERFORMANCE
# -------------------------------
print("MODEL PERFORMANCE:")
print(f"Mean Absolute Error (MAE): {mae}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R² Score: {r2}")

MODEL PERFORMANCE:
Mean Absolute Error (MAE): 7.892800235740287
Root Mean Squared Error (RMSE): 10.028657530039087
R² Score: -0.03704793262113104
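
AN R² SLIGHTLY BELOW ZERO MEANS THE MODEL DOES MARGINALLY WORSE ON THE TEST SET THAN A CONSTANT PREDICTION EQUAL TO THE TEST-SET MEAN. THAT IS CONSISTENT WITH THE GENERATION CELL, WHERE `exec_time` IS DRAWN WITHOUT REFERENCE TO `task_sizes` OR `cpu_util`. THE SKETCH BELOW IS ONE WAY TO CHECK THIS: IT REGENERATES THE COLUMNS WITH AN ASSUMED LOCAL `np.random.RandomState(42)` AND COMPUTES THE PEARSON CORRELATION OF EACH FEATURE WITH THE TARGET.

```python
import numpy as np

rng = np.random.RandomState(42)  # ASSUMPTION: LOCAL GENERATOR MIRRORING THE NOTEBOOK SEED

# SAME SAMPLING ORDER AS THE GENERATION CELL: SIZE, THEN CPU, THEN EXECUTION TIME
raw = rng.lognormal(mean=0.5, sigma=1.2, size=1000)
size_gb = 0.001 + (raw / raw.max()) * (500 - 0.001)
cpu = np.clip(rng.gamma(shape=2.0, scale=15.0, size=1000), 1, 100)
exec_ms = np.clip(rng.normal(loc=25, scale=10, size=1000), 0, 50)

# PEARSON CORRELATION OF EACH FEATURE WITH THE TARGET
corr_size = np.corrcoef(size_gb, exec_ms)[0, 1]
corr_cpu = np.corrcoef(cpu, exec_ms)[0, 1]

print(f"corr(Task_Size_GB, Exec_Time_ms)     = {corr_size:+.3f}")
print(f"corr(CPU_Demand_Units, Exec_Time_ms) = {corr_cpu:+.3f}")
```

CORRELATIONS CLOSE TO ZERO INDICATE THAT NEITHER FEATURE CARRIES A LINEAR SIGNAL ABOUT THE TARGET, SO A LINEAR MODEL HAS NOTHING TO LEARN BEYOND THE MEAN.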
