# **Welcome to the IEEE x AIS Workshop!**

In this notebook, we will:
1. Create an artificial dataset
2. Explore the data
3. Visualize relationships using graphs
4. Preprocess the data
5. Train a machine learning model
6. Evaluate the model's performance
7. Visualize the model's performance with a confusion matrix


In [None]:
# --- CELL 1: INSTALL & IMPORT REQUIRED LIBRARIES ---
!pip install seaborn scikit-learn

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## **Step 1: Creating an Artificial Dataset**
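To demonstrate data science concepts, we generate a synthetic dataset that mimics electronic component failure. Each row describes one component, and `Failure_Label` is 1 for a failure and 0 for a reliable part. The label is sampled independently of the other columns, so the dataset illustrates the workflow rather than a failure pattern a model can learn.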

In [None]:
# --- CELL 2: GENERATE ARTIFICIAL DATASET ---
n_samples = 1000
np.random.seed(42)

data = pd.DataFrame({
    "Voltage_Tolerance": np.random.normal(5, 0.5, n_samples),     # normal, mean 5, std 0.5
    "Temperature_Cycle": np.random.randint(0, 500, n_samples),    # integers in [0, 500)
    "Humidity_Level": np.random.uniform(10, 90, n_samples),       # uniform in [10, 90)
    # The probabilities passed to `p` must sum to 1
    "Manufacturing_Defect": np.random.choice([0, 1], n_samples, p=[0.9, 0.1]),
    "Vibration_Exposure": np.random.uniform(0, 5, n_samples),     # uniform in [0, 5)
    "Component_Age": np.random.randint(1, 10, n_samples),         # integers 1 to 9 (upper bound exclusive)
    "Failure_Label": np.random.choice([0, 1], n_samples, p=[0.8, 0.2])  # about 20% failures
})

print("Sample of Dataset:")
print(data.head())
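
In the cell above, `Failure_Label` is drawn independently of every feature. If you want a synthetic label that a model can actually learn, one option is to make the failure probability depend on some of the columns. The sketch below does this with a logistic function. The coefficients and the names `demo`, `logit`, and `failure_prob` are hypothetical choices for illustration, not part of the workshop dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000

# Two feature columns with the same ranges as the workshop dataset
demo = pd.DataFrame({
    "Temperature_Cycle": rng.integers(0, 500, n),
    "Component_Age": rng.integers(1, 10, n),
})

# Hypothetical coefficients: more cycles and older parts raise the failure odds
logit = -4.0 + 0.004 * demo["Temperature_Cycle"] + 0.25 * demo["Component_Age"]
failure_prob = 1.0 / (1.0 + np.exp(-logit))

# Draw each label from a Bernoulli distribution with its own probability
demo["Failure_Label"] = rng.binomial(1, failure_prob.to_numpy())

print("Overall failure rate:", demo["Failure_Label"].mean())
print(demo.groupby("Component_Age")["Failure_Label"].mean())
```

With labels built this way, the failure rate should tend to rise with `Component_Age`, which gives the later steps something to detect.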

## **Step 2: Exploring the Dataset**
Before training a model, we inspect summary statistics for each column and check how balanced the two classes are.

In [None]:
# --- CELL 3: DATA EXPLORATION ---
print("Dataset Overview:")
print(data.describe())

print("\nClass Distribution:")
print(data["Failure_Label"].value_counts())
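
Two further checks are often useful at this stage: counting missing values per column and expressing the class counts as proportions. The sketch below runs both on a small stand-in frame (`toy` is a hypothetical name; in the notebook you would apply the same calls to `data`).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Small stand-in for the workshop dataset
toy = pd.DataFrame({
    "Humidity_Level": rng.uniform(10, 90, 200),
    "Failure_Label": rng.choice([0, 1], 200, p=[0.8, 0.2]),
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Share of each class instead of raw counts
class_share = toy["Failure_Label"].value_counts(normalize=True)

print("Missing values per column:")
print(missing_per_column)
print("\nClass proportions:")
print(class_share)
```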

## **Step 3: Visualizing the Data**
We'll use seaborn and matplotlib to analyze feature relationships with a correlation heatmap and a pairplot.


In [None]:
# --- CELL 4: DATA VISUALIZATION ---
plt.figure(figsize=(10,5))
sns.heatmap(data.corr(), annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.show()

# Pairwise scatter plots of all features, colored by failure label
sns.pairplot(data, hue="Failure_Label")
plt.show()
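
A heatmap can be hard to read when you only care about one column. As a numeric companion, the sketch below ranks features by the absolute value of their correlation with the label. The frame `frame` and its label rule are hypothetical and exist only so that the example has a visible signal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 500

frame = pd.DataFrame({
    "Vibration_Exposure": rng.uniform(0, 5, n),
    "Humidity_Level": rng.uniform(10, 90, n),
})

# Hypothetical rule: failures become likely when vibration is high
noisy_vibration = frame["Vibration_Exposure"] + rng.normal(0, 1, n)
frame["Failure_Label"] = (noisy_vibration > 4).astype(int)

# Correlation of each feature with the label, strongest first
target_corr = frame.corr()["Failure_Label"].drop("Failure_Label")
ranked = target_corr.abs().sort_values(ascending=False)

print(ranked)
```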

## **Step 4: Preprocessing Data**
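We split the data into training (80%) and test (20%) sets, then standardize the features. The scaler is fitted on the training set only, so no information from the test set leaks into training.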

In [None]:
# --- CELL 5: PREPROCESSING DATA ---
X = data.drop(columns=["Failure_Label"])
y = data["Failure_Label"]

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (zero mean, unit variance), fitting the scaler on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
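
To see what the split and the scaler actually do, the sketch below repeats the step on stand-in arrays and prints the resulting statistics. It also passes `stratify=`, an optional `train_test_split` argument that is not used in the workshop cell, to keep the class ratio the same in both sets. All variable names here are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Stand-in features and an imbalanced label
X_demo = rng.normal(loc=50.0, scale=10.0, size=(500, 3))
y_demo = rng.choice([0, 1], size=500, p=[0.8, 0.2])

# stratify=y_demo keeps the share of failures equal in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)  # learns mean and std from the training set
X_te_s = scaler_demo.transform(X_te)      # reuses the training mean and std

print("Train mean per feature:", X_tr_s.mean(axis=0).round(3))
print("Train std per feature: ", X_tr_s.std(axis=0).round(3))
print("Test mean per feature: ", X_te_s.mean(axis=0).round(3))
print("Failure share train/test:", y_tr.mean(), y_te.mean())
```

The training features come out with mean 0 and standard deviation 1 by construction. The test features are only close to those values, which is expected because the scaler never saw them.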

## **Step 5: Training a Machine Learning Model**
We'll use a Random Forest Classifier to predict failures.

In [None]:
# --- CELL 6: MODEL TRAINING ---
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
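
A fitted random forest exposes `feature_importances_`, which gives a rough view of the columns the trees relied on. The sketch below uses a hypothetical dataset in which only the `signal` column determines the label, so the expected ranking is easy to verify.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 600

# One informative column and two pure-noise columns (hypothetical data)
features = pd.DataFrame({
    "signal": rng.normal(size=n),
    "noise_a": rng.normal(size=n),
    "noise_b": rng.normal(size=n),
})
target = (features["signal"] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(features, target)

# Importances are non-negative and sum to 1 across features
importances = pd.Series(
    forest.feature_importances_, index=features.columns
).sort_values(ascending=False)

print(importances)
```

In the notebook, the same idea would apply to `model.feature_importances_` paired with `X.columns`.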

## **Step 6: Evaluating the Model**
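We check the model's accuracy and its per-class precision, recall, and F1 score on the held-out test set. Because about 80% of the samples belong to class 0, accuracy should be read alongside the per-class metrics.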


In [None]:
# --- CELL 7: MODEL EVALUATION ---
y_pred = model.predict(X_test_scaled)

print("Model Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
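
With an imbalanced label, a useful reference point is a baseline that always predicts the majority class. The sketch below builds one with scikit-learn's `DummyClassifier` on stand-in data (names are hypothetical) and reports both plain and balanced accuracy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

rng = np.random.default_rng(3)

# Stand-in data with roughly 80% class 0 and 20% class 1
X_eval = rng.normal(size=(400, 4))
y_eval = rng.choice([0, 1], size=400, p=[0.8, 0.2])

# Baseline that ignores the features and always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_eval, y_eval)
baseline_pred = baseline.predict(X_eval)

baseline_accuracy = accuracy_score(y_eval, baseline_pred)
baseline_balanced = balanced_accuracy_score(y_eval, baseline_pred)

print("Baseline accuracy:", baseline_accuracy)
print("Baseline balanced accuracy:", baseline_balanced)
```

The baseline reaches an accuracy equal to the majority-class share while its balanced accuracy is 0.5, so a model is only informative if it beats these numbers.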


## **Step 7: Visualizing Performance**
We plot the confusion matrix to see how the predictions break down. Rows show the actual class and columns show the predicted class.

In [None]:
# --- CELL 8: CONFUSION MATRIX ---
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5,4))
sns.heatmap(
    conf_matrix,
    annot=True,
    fmt="d",
    cmap="Blues",
    xticklabels=["Reliable", "Failure"],
    yticklabels=["Reliable", "Failure"],
)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
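
The four cells of a binary confusion matrix are enough to compute precision and recall by hand. The sketch below does this on a tiny hand-made example (the arrays are hypothetical) and compares the result with scikit-learn's own functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Ten hand-made labels: 0 = Reliable, 1 = Failure
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # 3 / 5 = 0.6: share of predicted failures that were real
recall = tp / (tp + fn)     # 3 / 4 = 0.75: share of real failures that were caught

print("tn, fp, fn, tp:", tn, fp, fn, tp)
print("Precision:", precision, "vs sklearn:", precision_score(y_true, y_hat))
print("Recall:   ", recall, "vs sklearn:", recall_score(y_true, y_hat))
```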
