# AI, Cloud, Edge & IoT: Systems, Simulations and Applications

**Goal:** Show how AI scales from devices to cloud: simulate IoT data, run a lightweight anomaly detector (edge inference), compare edge vs cloud latency, and visualise an end-to-end pipeline.

**What you'll get:**
- concise conceptual notes (IoT ↔ Edge ↔ Cloud)
- reproducible Python demos (simulated device stream, anomaly detection using a small ML model, latency comparison)
- pipeline diagram 



## Learning objectives

By the end of this notebook you will be able to:
- Simulate a stream of IoT device data and explore basic preprocessing.
- Implement a lightweight anomaly detector suitable for edge deployment.
- Measure and compare naive edge vs cloud inference latency (simulated) and discuss trade-offs.
- Draw and reason about an IoT → Edge → Cloud data pipeline and realistic applications (smart cities, predictive maintenance).

## 1) Short conceptual primer — how the pieces fit

- **IoT device:** sensor + small compute (example: thermostat, camera, vibration sensor). Produces a continuous stream of telemetry.
- **Edge:** local inference point close to devices. Runs lightweight models for low-latency decisions (alerts, actuation).
- **Cloud:** centralized storage, heavy training, long-term analytics, dashboards and model management.
- **Why two layers?** Edge cuts response latency and uplink bandwidth use; cloud provides scale, model training, and business insights.

Typical pipeline: **Device → Edge (filter/aggregate/infer) → Cloud (store/train/visualize) → Dashboard/Operator**

## 2) Simulate IoT device data (temperature + vibration + battery) — small reproducible stream

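We'll generate a time series that mimics one device sending telemetry once per second (each row is one time step `t`). A few readings get an injected temperature or vibration spike and are labelled `is_anomaly = 1` so that detection can be checked later; with the default `n=200` the simulator picks four such readings.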

In [1]:
import time
import random
import math
from collections import deque
import numpy as np
import pandas as pd

# Parameters
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

def simulate_device(n=200, anomaly_indices=None):
    """Return a pandas DataFrame with simulated telemetry."""
    if anomaly_indices is None:
        anomaly_indices = set(random.sample(range(20, n-10), k=max(3, n//50)))
    rows = []
    base_temp = 35.0  # degrees
    base_vibration = 0.2  # g
    for t in range(n):
        # normal diurnal-ish variation + small noise
        temp = base_temp + 2.5 * math.sin(2 * math.pi * (t / 144.0)) + np.random.normal(0,0.4)
        vib = base_vibration + np.random.normal(0,0.05)
        battery = 100 - 0.02 * t + np.random.normal(0,0.1)
        label = 0
        if t in anomaly_indices:
            # anomaly: sudden temp spike or vibration spike
            if random.random() < 0.6:
                temp += random.uniform(6, 12)
            else:
                vib += random.uniform(0.8, 1.5)
            label = 1
        rows.append({
            't': t,
            'temperature': round(temp, 3),
            'vibration': round(vib, 3),
            'battery': round(battery, 2),
            'is_anomaly': label,
        })
    return pd.DataFrame(rows)

# quick demo
df = simulate_device(200)
df.head()

| | t | temperature | vibration | battery | is_anomaly |
|---|---|---|---|---|---|
| 0 | 0 | 35.199 | 0.193 | 100.06 | 0 |
| 1 | 1 | 35.718 | 0.188 | 99.96 | 0 |
| 2 | 2 | 35.85 | 0.238 | 99.91 | 0 |
| 3 | 3 | 35.543 | 0.177 | 99.89 | 0 |
| 4 | 4 | 35.531 | 0.104 | 99.75 | 0 |

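The cell above builds the whole series in one batch, whereas a device emits one reading at a time. As a rough sketch of that streaming view, the generator below reuses the same signal model with the standard library only. The name `stream_device` is hypothetical (it is not part of the notebook), it draws noise from its own `random.Random` instance instead of NumPy, so the values will not match the table above, and it omits anomaly injection for brevity.

```python
import math
import random
from typing import Dict, Iterator


def stream_device(n: int = 200, seed: int = 42) -> Iterator[Dict[str, float]]:
    """Yield one telemetry reading per time step (hypothetical streaming variant)."""
    rng = random.Random(seed)  # private RNG, so the stream is reproducible
    base_temp, base_vibration = 35.0, 0.2
    for t in range(n):
        # Same shape as the batch simulator: slow sine drift plus Gaussian noise.
        temp = base_temp + 2.5 * math.sin(2 * math.pi * t / 144.0) + rng.gauss(0, 0.4)
        vib = base_vibration + rng.gauss(0, 0.05)
        battery = 100 - 0.02 * t + rng.gauss(0, 0.1)
        yield {
            "t": t,
            "temperature": round(temp, 3),
            "vibration": round(vib, 3),
            "battery": round(battery, 2),
        }


# Consume the stream the way an edge loop would: one reading at a time.
for reading in stream_device(n=3):
    print(reading)
```

A generator keeps memory use constant regardless of `n`, which is closer to how an edge process would see the data than a fully materialised DataFrame.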

## 3) Lightweight anomaly detection for edge: feature engineering + IsolationForest demo

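- On-device (edge) constraints: small memory, low CPU, single-shot inference.
- We'll show a simple approach: rolling-window features plus an IsolationForest fitted on the first 80 points and then used to score the remaining 120. The first 80 points are treated as normal, although the simulator can inject anomalies anywhere from `t = 20`, so a few may land in the training window; `contamination=0.02` tells the model to expect roughly 2% outliers in its training data.
- Note: In production you'd use model quantization, TinyML, or rule-based checks for extremely constrained devices.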

In [2]:
# Minimal edge-style anomaly detector using sklearn's IsolationForest (for demo only)
from sklearn.ensemble import IsolationForest

def make_features(df, window=5):
    df = df.copy()
    df['temp_mean_w'] = df['temperature'].rolling(window, min_periods=1).mean()
    df['vib_mean_w'] = df['vibration'].rolling(window, min_periods=1).mean()
    df['temp_std_w'] = df['temperature'].rolling(window, min_periods=1).std().fillna(0)
    return df

df_feats = make_features(df, window=5)
train_df = df_feats.iloc[:80]  # pretend first 80 points are normal for training
test_df = df_feats.iloc[80:]

features = ['temperature','vibration','temp_mean_w','vib_mean_w','temp_std_w']
X_train = train_df[features].values
X_test = test_df[features].values

clf = IsolationForest(n_estimators=100, contamination=0.02, random_state=SEED)
clf.fit(X_train)
scores = clf.decision_function(X_test)  # higher means more normal
preds = clf.predict(X_test)  # -1 anomaly, 1 normal

res = test_df[['t','temperature','vibration','is_anomaly']].copy()
res['score'] = scores
res['pred_anomaly'] = (preds == -1).astype(int)
res.head(10)

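| | t | temperature | vibration | is_anomaly | score | pred_anomaly |
|---|---|---|---|---|---|---|
| 80 | 80 | 33.828 | 0.194 | 0 | 0.112663 | 0 |
| 81 | 81 | 34.39 | 0.14 | 0 | 0.090469 | 0 |
| 82 | 82 | 33.753 | 0.167 | 0 | 0.067738 | 0 |
| 83 | 83 | 34.008 | 0.137 | 0 | 0.078307 | 0 |
| 84 | 84 | 34.599 | 0.252 | 0 | 0.111129 | 0 |
| 85 | 85 | 33.463 | 0.263 | 0 | 0.071909 | 0 |
| 86 | 86 | 33.744 | 0.239 | 0 | 0.104767 | 0 |
| 87 | 87 | 33.454 | 0.038 | 0 | 0.058275 | 0 |
| 88 | 88 | 33.292 | 0.138 | 0 | 0.06952 | 0 |
| 89 | 89 | 32.739 | 0.178 | 0 | 0.100868 | 0 |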

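The note at the top of this section mentions rule-based checks for extremely constrained devices. One possible shape for such a check is a rolling z-score over a bounded window, sketched below with the standard library only. The class name `RollingZScoreDetector`, the window size, the threshold of 4 standard deviations and the `min_samples` warm-up are all illustrative assumptions, not values taken from the notebook.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag a value that deviates strongly from the recent window (rule-based sketch)."""

    def __init__(self, window: int = 30, threshold: float = 4.0, min_samples: int = 10):
        self.values = deque(maxlen=window)  # bounded memory: only the last `window` values
        self.threshold = threshold          # assumed cut-off, in standard deviations
        self.min_samples = min_samples      # warm-up: no flags until the baseline exists

    def update(self, x: float) -> bool:
        """Return True if x looks anomalous relative to the values seen so far."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            is_anomaly = std > 0 and abs(x - mean) / std > self.threshold
        if not is_anomaly:
            self.values.append(x)  # keep flagged spikes out of the baseline
        return is_anomaly


# Synthetic readings: a steady signal followed by one large spike.
detector = RollingZScoreDetector(window=30, threshold=4.0)
readings = [35.0 + 0.1 * (i % 5) for i in range(40)] + [45.0]
flags = [detector.update(x) for x in readings]
print("spike flagged:", flags[-1], "| total flags:", sum(flags))
```

Excluding flagged values from the baseline keeps a single spike from inflating the window's variance, at the cost that a genuine, lasting shift in the signal would keep being flagged until the detector is reset.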

### Quick evaluation: precision, recall and F1


In [3]:
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = res['is_anomaly'].values
y_pred = res['pred_anomaly'].values
print('Precision:', precision_score(y_true, y_pred, zero_division=0))
print('Recall   :', recall_score(y_true, y_pred, zero_division=0))
print('F1       :', f1_score(y_true, y_pred, zero_division=0))

res.sample(8)

Precision: 0.5
Recall   : 0.5
F1       : 0.5


| | t | temperature | vibration | is_anomaly | score | pred_anomaly |
|---|---|---|---|---|---|---|
| 121 | 121 | 33.052 | 0.235 | 0 | 0.079929 | 0 |
| 152 | 152 | 35.812 | 0.148 | 0 | 0.201994 | 0 |
| 83 | 83 | 34.008 | 0.137 | 0 | 0.078307 | 0 |
| 172 | 172 | 37.55 | 0.151 | 0 | 0.228059 | 0 |
| 95 | 95 | 32.111 | 0.192 | 0 | 0.081419 | 0 |
| 165 | 165 | 37.199 | 0.148 | 0 | 0.197825 | 0 |
| 135 | 135 | 34.292 | 0.147 | 0 | 0.07024 | 0 |
| 150 | 150 | 35.622 | 0.248 | 0 | 0.194617 | 0 |

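Precision, recall and F1 are ratios, and with only a handful of anomalies in the test slice each one may rest on very few events. A small sketch of the raw confusion counts makes that visible. The helper names `confusion_counts` and `precision_recall` are hypothetical, and the labels below are made up for illustration; they are not the notebook's results.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives/negatives for binary labels (1 = anomaly)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 1 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Derive precision and recall from the counts, returning 0.0 on empty denominators."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for illustration only.
y_true_demo = [0, 0, 1, 0, 1, 0, 0, 0]
y_pred_demo = [0, 1, 1, 0, 0, 0, 0, 0]
demo_counts = confusion_counts(y_true_demo, y_pred_demo)
print(demo_counts, precision_recall(demo_counts))
```

In the notebook, the same helper could be called as `confusion_counts(y_true, y_pred)` on the arrays from the cell above to see how many events lie behind each reported ratio.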

## 4) Edge vs Cloud latency simulation

We can't measure real network latency here, but we can simulate the effect of **local** inference (edge) vs **remote** inference (cloud) by adding synthetic network delays.

- **Edge inference:** model runs on-device → immediate response (small compute time)
- **Cloud inference:** device sends data → network transit → cloud runs heavy model → returns result → device receives response (network + compute time)

We'll run a tiny timing loop to compare average times.

In [4]:
import time
import statistics

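def edge_infer(sample, local_compute_ms=10):
    """Simulate on-device inference; return elapsed seconds."""
    # `sample` is unused: only the delay is simulated, no model is run.
    t0 = time.perf_counter()  # monotonic clock, better suited to short intervals than time.time()
    time.sleep(local_compute_ms / 1000.0)
    return time.perf_counter() - t0

def cloud_infer(sample, network_ms=60, cloud_compute_ms=80, network_return_ms=60):
    """Simulate a cloud round trip (uplink + compute + downlink); return elapsed seconds."""
    t0 = time.perf_counter()
    time.sleep(network_ms / 1000.0)         # uplink
    time.sleep(cloud_compute_ms / 1000.0)   # cloud work
    time.sleep(network_return_ms / 1000.0)  # downlink
    return time.perf_counter() - t0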

samples = df.sample(20, random_state=SEED).to_dict('records')
edge_times = [edge_infer(s, local_compute_ms=8 + random.random()*6) for s in samples]
cloud_times = [cloud_infer(s, network_ms=20+random.random()*100, cloud_compute_ms=50+random.random()*60, network_return_ms=20+random.random()*100) for s in samples]

print('Edge (avg ms):', round(statistics.mean(edge_times)*1000, 1))
print('Cloud (avg ms):', round(statistics.mean(cloud_times)*1000, 1))
print('Edge is ~{:.1f}x faster (simulated)'.format(statistics.mean(cloud_times)/statistics.mean(edge_times)))

Edge (avg ms): 11.0
Cloud (avg ms): 223.6
Edge is ~20.3x faster (simulated)

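Because every delay in the timing cell is drawn as `base + random.random() * span`, the expected averages can be worked out without sleeping at all. The sketch below does that arithmetic; `uniform_mean` is a hypothetical helper, and the ranges are copied from the calls in the cell above.

```python
def uniform_mean(low: float, high: float) -> float:
    """Mean of a uniform distribution on [low, high]."""
    return (low + high) / 2.0


# Ranges from the timing cell, in milliseconds: base + random.random() * span.
edge_ms = uniform_mean(8, 8 + 6)
uplink_ms = uniform_mean(20, 20 + 100)
cloud_compute_ms = uniform_mean(50, 50 + 60)
downlink_ms = uniform_mean(20, 20 + 100)

# The cloud path pays for all three stages in sequence.
cloud_ms = uplink_ms + cloud_compute_ms + downlink_ms

print(f"expected edge  : {edge_ms:.1f} ms")
print(f"expected cloud : {cloud_ms:.1f} ms")
print(f"expected ratio : {cloud_ms / edge_ms:.1f}x")
```

This gives 11 ms for edge, 220 ms for cloud and a ratio of 20x, close to the measured 11.0 ms, 223.6 ms and 20.3x. The small gap is plausibly sampling noise over 20 draws plus the overhead of `time.sleep`. It also shows that the ratio is set entirely by the chosen delay ranges, so it says nothing about any real network.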

### Discussion — tradeoffs
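- **Latency:** Edge wins by keeping the network round trip out of the loop (about 11 ms vs about 224 ms in the simulated run above), which is crucial for safety and control loops.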
- **Compute & Model Complexity:** Cloud can host large models (LLMs, big vision networks); edge must use tiny models or offload.
- **Bandwidth & Cost:** Edge reduces uplink bandwidth and cloud ingestion costs by pre-filtering.
- **Freshness / Retraining:** Cloud is the central place for model training and long-term analytics; edge gets periodic model updates.

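The bandwidth point above can be made concrete with a small sketch of edge pre-filtering: the edge node uplinks one summary per window and forwards only out-of-range readings immediately. The function name `edge_prefilter`, the message fields, the window of 10 and the 40.0 temperature limit are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import Dict, List


def edge_prefilter(readings: List[Dict], window: int = 10,
                   temp_limit: float = 40.0) -> List[Dict]:
    """Return the messages an edge node would uplink under a hypothetical policy."""
    messages: List[Dict] = []
    buffer: List[float] = []
    for r in readings:
        if r["temperature"] > temp_limit:
            # Out-of-range readings are forwarded immediately as alerts.
            messages.append({"type": "alert", **r})
        buffer.append(r["temperature"])
        if len(buffer) == window:
            # Everything else is reduced to one summary per full window.
            messages.append({
                "type": "summary",
                "t_end": r["t"],
                "temp_mean": round(mean(buffer), 3),
                "temp_max": max(buffer),
            })
            buffer = []
    # Note: a trailing partial window is not flushed in this sketch.
    return messages


# 100 synthetic readings with a single spike at t = 42.
readings = [{"t": t, "temperature": 35.0} for t in range(100)]
readings[42]["temperature"] = 45.0
uplink = edge_prefilter(readings)
print(len(readings), "readings ->", len(uplink), "uplink messages")
```

Here 100 raw readings shrink to 11 uplink messages (10 summaries plus 1 alert). The actual saving in a real deployment depends on the window size, the alert rate and the payload sizes.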
## 5) Pipeline diagram (ASCII / simple graph) + what to store in cloud

```
Device(s) --> Edge Gateway (filter/aggregate/infer) --> Message Broker (MQTT/Kafka) --> Cloud Ingest --> Data Lake / Feature Store
                                                              |--> Model Training (cloud GPU)
                                                              |--> Monitoring & Dashboard
                                                              |--> Model Registry --> Edge Deployment
```

What to keep in the cloud: long-term telemetry, labeled events, retraining datasets, aggregated KPIs, model artifacts, audit logs, and dashboards.

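The diagram does not say what a message between the edge gateway and the broker looks like. One possible shape, sketched below, is a small JSON payload that carries the reading, the edge decision and the model version that produced it. The `EdgeEvent` class, its field names, `device-001` and `v0-demo` are hypothetical; the numeric values are borrowed from the `t = 80` row of the results table in section 3.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EdgeEvent:
    """One scored reading as the edge might publish it (hypothetical schema)."""
    device_id: str
    t: int
    temperature: float
    vibration: float
    score: float
    pred_anomaly: int
    model_version: str  # lets the cloud trace which deployed model made the call


def to_payload(event: EdgeEvent) -> str:
    """Serialise the event to a JSON string with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


event = EdgeEvent(
    device_id="device-001",   # hypothetical identifier
    t=80,
    temperature=33.828,
    vibration=0.194,
    score=0.112663,
    pred_anomaly=0,
    model_version="v0-demo",  # hypothetical version tag
)
payload = to_payload(event)
print(payload)
```

Carrying a model version in each event is one way to connect the "Model Registry --> Edge Deployment" branch back to the stored telemetry, so that labeled events can be grouped by the model that scored them.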
## 6) Example production extensions (short bullets)
- **Smart city traffic lights:** edge cameras detect congestion (local), cloud aggregates city-wide flow and optimizes signal timings.
- **Predictive maintenance (manufacturing):** vibration sensor anomalies detected at edge trigger immediate shutdown; cloud performs root-cause analytics and schedules maintenance.
- **Healthcare wearables:** on-device alerts (arrhythmia), cloud stores history for physician review & model updates.
- **Retail inventory:** edge computer vision counts shelf stock in real time; cloud centralizes inventory across stores.

## 7) Operational considerations (short checklist)
- Model monitoring & drift detection
- Secure over-the-air (OTA) model updates with signed artifacts
- Data privacy: keep PII on-device where possible
- Fault tolerance: graceful fallback when the cloud is unreachable
- Logging & cost controls for cloud inference pipelines
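
As an illustration of the fault-tolerance item above, the sketch below shows one common fallback pattern: buffer outgoing messages locally while the cloud is unreachable and drain the buffer once it is back. The class name `StoreAndForward` and the `send` callback are hypothetical; the sketch assumes `send` returns `True` on success and `False` on failure instead of raising.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class StoreAndForward:
    """Queue messages locally and retry them in order (fallback sketch)."""

    def __init__(self, send: Callable[[Dict], bool], max_buffer: int = 1000):
        self.send = send
        # Bounded buffer: when full, the oldest pending message is dropped.
        self.pending: Deque[Dict] = deque(maxlen=max_buffer)

    def publish(self, message: Dict) -> None:
        """Queue a message, then try to drain the queue."""
        self.pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Send pending messages oldest-first; stop at the first failure."""
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break
            self.pending.popleft()
            sent += 1
        return sent


# Fake uplink that can be switched on and off.
cloud_up = {"ok": False}
received: List[Dict] = []


def fake_send(message: Dict) -> bool:
    if cloud_up["ok"]:
        received.append(message)
    return cloud_up["ok"]


link = StoreAndForward(fake_send, max_buffer=3)
for i in range(5):
    link.publish({"seq": i})   # cloud is down: messages accumulate locally
cloud_up["ok"] = True
link.flush()                   # cloud is back: drain what is still buffered
print([m["seq"] for m in received])
```

With a buffer of 3 and five messages published during the outage, only the three most recent ones (`seq` 2, 3, 4) are delivered. Dropping the oldest entries is a deliberate choice for memory-constrained devices; a deployment that cannot lose data would need persistent storage instead.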