

# 📓 **Jupyter Notebook: Practice Exercises for `generated_his_system_metrics.csv`**

---

## 📘 **Section A: Basic EDA**

---

### **1. Inspecting Data Structure**

```python
import pandas as pd

# Load dataset
df = pd.read_csv("generated_his_system_metrics.csv")

# Basic properties: column names, dtypes, non-null counts
df.info()

# Missing values per column
df.isna().sum()
```

---

### **2. Summary Statistics**

```python
df[['cpu_usage','memory_usage','latency_ms']].describe()
```

**Prompt:**
Write 1–2 sentences interpreting one notable observation from the summary statistics.

---

## 📘 **Section B: Visualizations**

*(Assumes Matplotlib and Seaborn are installed.)*

```python
import matplotlib.pyplot as plt
import seaborn as sns
```

---

### **3. Time Series Plot (CPU Usage)**

```python
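# Parse timestamps first so the x-axis is a real time axis, not string labels
df['timestamp'] = pd.to_datetime(df['timestamp'])

plt.figure(figsize=(14,5))
plt.plot(df['timestamp'], df['cpu_usage'])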
plt.title("CPU Usage Over Time")
plt.xlabel("Time")
plt.ylabel("CPU Usage (%)")
plt.show()

# Select the rows where CPU usage exceeds 85%
high_cpu = df[df['cpu_usage'] > 85]
high_cpu.head()
```
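
The filter above returns individual rows rather than intervals. One possible way to turn them into intervals is to group consecutive above-threshold rows into runs. The sketch below is an illustration only: the helper name `high_cpu_intervals` is hypothetical, the toy frame is made up and not taken from the CSV, and it assumes `timestamp` is already a datetime column.

```python
import pandas as pd

def high_cpu_intervals(frame, threshold=85):
    """Group consecutive rows with cpu_usage above `threshold` into intervals."""
    frame = frame.sort_values("timestamp").reset_index(drop=True)
    above = frame["cpu_usage"] > threshold
    # A new run starts whenever the above/below state flips
    run_id = (above != above.shift()).cumsum().rename("run")
    return (
        frame[above]
        .groupby(run_id[above])
        .agg(
            start=("timestamp", "min"),
            end=("timestamp", "max"),
            n_rows=("cpu_usage", "size"),
            peak_cpu=("cpu_usage", "max"),
        )
        .reset_index(drop=True)
    )

# Toy data for illustration only (not from the real CSV)
toy = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=8, freq="min"),
    "cpu_usage": [50, 90, 92, 60, 88, 70, 95, 96],
})
intervals = high_cpu_intervals(toy)
print(intervals)
```

In the notebook, calling `high_cpu_intervals(df)` should give one row per sustained high-CPU episode, which is usually easier to interpret than a list of isolated samples.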

---

### **4. Scatterplot: Active Users vs Latency**

```python
plt.figure(figsize=(7,5))
sns.scatterplot(data=df, x="active_users", y="latency_ms")
plt.title("Active Users vs Latency")
plt.show()
```

Add a short written interpretation below the graph.

---

### **5. Correlation Heatmap**

```python
metrics = ['cpu_usage','memory_usage','disk_io_read','disk_io_write','latency_ms']

plt.figure(figsize=(8,6))
sns.heatmap(df[metrics].corr(), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix for Key System Metrics")
plt.show()
```

Note which pairs of variables are most strongly correlated.
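
Reading the strongest pairs off a heatmap by eye can be error-prone, so a ranked list is a useful cross-check. The following is a minimal sketch, assuming a hypothetical helper `top_correlated_pairs` and a made-up toy frame; it keeps only the upper triangle of the correlation matrix so each pair appears once, then sorts by absolute correlation.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(frame, columns, n=3):
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = frame[columns].corr()
    # Upper triangle only (k=1 skips the diagonal), so each pair is listed once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data for illustration only (not from the real CSV)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})
top = top_correlated_pairs(toy, ["a", "b", "c"])
print(top)
```

In the notebook this could be called as `top_correlated_pairs(df, metrics)`, reusing the `metrics` list defined for the heatmap.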

---

## 📘 **Section C: Grouping, Filtering, Anomaly Detection**

---

### **6. Peak Load Analysis**

```python
peak = df[(df['active_users'] > 200) & (df['cpu_usage'] > 75)]
len(peak)
```

---

### **7. Outlier Detection for Latency (IQR)**

```python
Q1 = df['latency_ms'].quantile(0.25)
Q3 = df['latency_ms'].quantile(0.75)
IQR = Q3 - Q1

lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

outliers = df[(df['latency_ms'] < lower) | (df['latency_ms'] > upper)]

lower, upper, len(outliers)
```
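
The same IQR rule can be applied to several metrics at once to compare how many outliers each one has. This is a sketch only: the helper name `iqr_outlier_counts` is hypothetical, the multiplier `k=1.5` mirrors the cell above, and the toy values are made up.

```python
import pandas as pd

def iqr_outlier_counts(frame, columns, k=1.5):
    """Compute IQR fences and outlier counts for each listed column."""
    rows = []
    for col in columns:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        n_out = int(((frame[col] < lower) | (frame[col] > upper)).sum())
        rows.append({"metric": col, "lower": lower, "upper": upper, "n_outliers": n_out})
    return pd.DataFrame(rows).set_index("metric")

# Toy data for illustration only (not from the real CSV)
toy = pd.DataFrame({
    "latency_ms": [10, 12, 11, 13, 12, 11, 100],
    "cpu_usage": [40, 42, 41, 43, 42, 41, 44],
})
summary = iqr_outlier_counts(toy, ["latency_ms", "cpu_usage"])
print(summary)
```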

---

### **8. Daily Network Load**

```python
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['date'] = df['timestamp'].dt.date

daily = df.groupby('date')[['network_in','network_out']].sum()
daily['total_network'] = daily['network_in'] + daily['network_out']

daily.sort_values("total_network", ascending=False).head(1)
```

---

## 📘 **Section D: Interpretive Questions (Markdown Cells)**

---

### **9. Performance Bottleneck (Short Answer)**

**Write your answer:**

> Based on the pattern of high latency + normal CPU + high disk activity, the likely bottleneck is…
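
Before answering, it may help to check whether rows matching that pattern exist in the data at all. The sketch below is one possible filter, not a definitive test: the helper name `disk_bound_candidates` and the quantile cut-offs (90th percentile for "high", 75th for "normal" CPU) are assumptions chosen for illustration, and the toy frame is made up.

```python
import pandas as pd

def disk_bound_candidates(frame, high_q=0.90, cpu_q=0.75):
    """Rows with high latency and high disk I/O while CPU stays in its normal range."""
    disk_total = frame["disk_io_read"] + frame["disk_io_write"]
    high_latency = frame["latency_ms"] > frame["latency_ms"].quantile(high_q)
    normal_cpu = frame["cpu_usage"] <= frame["cpu_usage"].quantile(cpu_q)
    high_disk = disk_total > disk_total.quantile(high_q)
    return frame[high_latency & normal_cpu & high_disk]

# Toy data for illustration only (not from the real CSV)
toy = pd.DataFrame({
    "latency_ms":    [20, 22, 21, 23, 20, 22, 21, 24, 23, 300],
    "cpu_usage":     [40, 42, 41, 43, 40, 42, 44, 90, 45, 41],
    "disk_io_read":  [5, 6, 5, 6, 5, 6, 5, 6, 5, 80],
    "disk_io_write": [3, 3, 4, 3, 4, 3, 4, 3, 4, 60],
})
candidates = disk_bound_candidates(toy)
print(candidates)
```

If `disk_bound_candidates(df)` returns many rows on the real dataset, that supports a disk-related explanation; if it returns none, the thresholds or the hypothesis may need revisiting.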

---

### **10. Early-Warning Indicator (Short Answer)**

**Write your answer:**

> The best leading indicator of high latency appears to be…
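
A leading indicator should correlate with latency a few steps later, not just at the same timestamp. One rough way to explore this is to correlate each candidate metric, shifted by a lag, against latency. The sketch below assumes a hypothetical helper `lagged_correlations`, evenly spaced rows, and a made-up toy frame in which latency follows user count by two steps; a strong lagged correlation is suggestive, not proof of causation.

```python
import pandas as pd

def lagged_correlations(frame, candidates, target="latency_ms", max_lag=3):
    """Correlate each candidate, delayed by 0..max_lag rows, with the target."""
    frame = frame.sort_values("timestamp")
    result = {}
    for col in candidates:
        # shift(lag) aligns the candidate's value from `lag` rows earlier with the current target
        result[col] = {
            lag: frame[col].shift(lag).corr(frame[target])
            for lag in range(max_lag + 1)
        }
    return pd.DataFrame(result).rename_axis("lag")

# Toy data for illustration only: latency tracks active_users from two rows earlier
users = [10, 50, 20, 80, 30, 90, 15, 60, 25, 70, 40, 85]
toy = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=len(users), freq="min"),
    "active_users": users,
    "latency_ms": [40, 40] + [2 * u for u in users[:-2]],
})
lag_table = lagged_correlations(toy, candidates=["active_users"])
print(lag_table)
```

On the real data, candidates could include `active_users`, `cpu_usage`, `memory_usage`, `disk_io_read`, and `disk_io_write`; the metric whose correlation peaks at a lag greater than zero is a reasonable starting point for the answer.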

---
