Think of an AI/ML system like a car engine.
If one small part breaks, such as a loose bolt, the entire car can stop running.
Similarly, in AI systems, even a tiny mistake (wrong data, bad input, or a training issue) can make the whole model fail.

To keep the "AI engine" running smoothly, we need error handling, which works like:

Seatbelts → protect the system from crashing (input validation)

Warning lights → tell us when something goes wrong (error logging)

Shock absorbers → help the system recover instead of breaking (exception handling)

With good error handling, AI systems become safer, more reliable, and easier to fix.

By the end of this section, you will learn how to:

Find common errors in machine learning pipelines

Validate inputs to avoid bad data

Catch and handle errors during model training

Log errors so you know what went wrong

Test your system to make sure it handles problems correctly

In short: error handling makes your ML system tougher, smarter, and more reliable, just like safety systems do for a car.


1. Create and Save a Toy Dataset

In [1]:
import pandas as pd
import numpy as np

# Set seed so the random data is the same every run
np.random.seed(42)

# Create a toy dataset with 100 rows
data = {
    'feature1': np.random.choice(['error_A', 'error_B', 'error_C'], size=100),
    'feature2': np.random.choice(['severity_high', 'severity_low'], size=100),
    'solution': np.random.choice(['restart', 'update', 'contact_support'], size=100)
}

df = pd.DataFrame(data)

# Save dataset to a CSV file
df.to_csv('toy_data.csv', index=False)

print("Toy dataset created and saved as toy_data.csv")


Toy dataset created and saved as toy_data.csv


2. Load & Explore the Dataset

This is like opening the file in Excel and checking the first few rows.

In [2]:
import pandas as pd

# Load the dataset
df = pd.read_csv('toy_data.csv')

# Explore the dataset
print(df.info())     # shows column names, data types, row count
print(df.head())     # shows the first 5 rows


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   feature1  100 non-null    object
 1   feature2  100 non-null    object
 2   solution  100 non-null    object
dtypes: object(3)
memory usage: 2.5+ KB
None
  feature1       feature2         solution
0  error_C   severity_low          restart
1  error_A   severity_low  contact_support
2  error_C   severity_low           update
3  error_C  severity_high          restart
4  error_A  severity_high          restart


3. Data Validation With Error Handling

Think of this like a quality check:

❌ Is the input NOT a DataFrame? → Error

❌ Are there missing values? → Error

✔️ Otherwise → It's valid

In [3]:
def validate_data(data):
    try:
        if not isinstance(data, pd.DataFrame):
            raise ValueError("Input must be a pandas DataFrame.")
        
        if data.isnull().values.any():
            raise ValueError("Missing values detected in the dataset.")
        
        print("Data validation successful.")
    
    except ValueError as e:
        print(f"Data validation error: {e}")

# Validate the dataset
validate_data(df)


Data validation successful.
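
The validator above prints its result, so the calling code cannot tell whether the data passed. A minimal sketch of a variant that raises instead and also checks for expected columns; the `validate_or_raise` helper and the `REQUIRED_COLUMNS` list are hypothetical names introduced for this example:

```python
import pandas as pd

# Hypothetical list of expected columns, matching the toy dataset
REQUIRED_COLUMNS = ['feature1', 'feature2', 'solution']

def validate_or_raise(data, required_columns=REQUIRED_COLUMNS):
    """Return the data unchanged if it passes all checks, otherwise raise."""
    if not isinstance(data, pd.DataFrame):
        raise TypeError("Input must be a pandas DataFrame.")

    # Check the schema before looking at the values
    missing_cols = [c for c in required_columns if c not in data.columns]
    if missing_cols:
        raise ValueError(f"Missing required columns: {missing_cols}")

    if data[required_columns].isnull().values.any():
        raise ValueError("Missing values detected in the dataset.")

    return data

good = pd.DataFrame({
    'feature1': ['error_A'],
    'feature2': ['severity_low'],
    'solution': ['restart'],
})
validate_or_raise(good)  # passes silently

try:
    validate_or_raise(good.drop(columns=['solution']))
except ValueError as e:
    print(f"Data validation error: {e}")
```

Raising leaves the decision to the caller: a notebook can print the message, while a pipeline can stop or skip the batch.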


4. Split Data & Train a Model With Error Handling
👉 We split the data into features (X) and solution labels (y):

X = the symptoms

y = the known solution the model learns to predict

Then we train a Decision Tree model.

In [4]:
from sklearn.model_selection import train_test_split

# Convert text into numbers using one-hot encoding
X = pd.get_dummies(df[['feature1', 'feature2']])
y = df['solution']

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
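
One common pipeline error shows up later, at prediction time: `pd.get_dummies` only creates columns for the categories it sees, so a new batch of inputs can produce fewer columns than the model was trained on. A minimal sketch of one way to guard against this with `DataFrame.reindex`; the `train_inputs` and `new_inputs` frames are made up for illustration:

```python
import pandas as pd

train_inputs = pd.DataFrame({
    'feature1': ['error_A', 'error_B', 'error_C'],
    'feature2': ['severity_high', 'severity_low', 'severity_low'],
})
X_train_encoded = pd.get_dummies(train_inputs)

# A new input that contains only one category per column
new_inputs = pd.DataFrame({'feature1': ['error_B'],
                           'feature2': ['severity_high']})
X_new = pd.get_dummies(new_inputs)

# Align to the training columns: add missing columns as 0, drop unseen ones
X_new_aligned = X_new.reindex(columns=X_train_encoded.columns, fill_value=0)

print(list(X_new.columns))          # columns before alignment
print(list(X_new_aligned.columns))  # columns after alignment
```

Without the alignment step, a fitted model would typically reject `X_new` because its columns do not match the training data.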


Model training with error handling

In [5]:
from sklearn.tree import DecisionTreeClassifier

def train_model(X_train, y_train):
    try:
        model = DecisionTreeClassifier()
        model.fit(X_train, y_train)
        print("Model trained successfully.")
        return model
    except ValueError as e:
        print(f"Model training error: {e}")
        return None  # make the failure explicit so the caller can check for it

# Train the model
model = train_model(X_train, y_train)


Model trained successfully.
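
To see the `except` branch of a training wrapper run, the model needs inputs it will reject. A minimal sketch, assuming a hypothetical `try_fit` helper and made-up inputs; scikit-learn raises `ValueError` when the number of feature rows and the number of labels differ:

```python
from sklearn.tree import DecisionTreeClassifier

def try_fit(X, y):
    """Return a fitted model, or None if scikit-learn rejects the inputs."""
    try:
        model = DecisionTreeClassifier(random_state=42)
        model.fit(X, y)
        return model
    except ValueError as e:
        print(f"Model training error: {e}")
        return None

X_small = [[0, 1], [1, 0], [1, 1]]

ok_model = try_fit(X_small, ['restart', 'update', 'restart'])
bad_model = try_fit(X_small, ['restart', 'update'])  # 3 rows but only 2 labels

print(ok_model is not None, bad_model is None)
```

Because the wrapper returns `None` on failure, later cells should check the result before calling `predict` on it.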


5. Error Logging

Instead of just printing errors, we also write them to a file.

Think of this like a black box recorder in an airplane.

In [6]:
import logging

# Set up logging to file
logging.basicConfig(filename='ml_errors.log', level=logging.ERROR)

def validate_data_with_logging(data):
    try:
        if not isinstance(data, pd.DataFrame):
            raise ValueError("Input must be a pandas DataFrame.")
        
        if data.isnull().values.any():
            raise ValueError("Missing values detected in the dataset.")
        
        print("Data validation successful.")
    
    except ValueError as e:
        logging.error(f"Data validation error: {e}")
        print(f"Logged error: {e}")

# Validate normally
validate_data_with_logging(df)


Data validation successful.
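
A log entry is more useful with a timestamp and a traceback. A minimal sketch using only the standard `logging` module; the logger name `ml_pipeline`, the file name `ml_errors_demo.log`, and the `risky_step` function are placeholders chosen for this example:

```python
import logging

logger = logging.getLogger('ml_pipeline')
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep these entries out of the root logger's handlers

# Attach the file handler once, so re-running the cell does not duplicate entries
if not logger.handlers:
    handler = logging.FileHandler('ml_errors_demo.log')
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s')
    )
    logger.addHandler(handler)

def risky_step(values):
    """Placeholder step that fails on an empty input."""
    if not values:
        raise ValueError("No values to process.")
    return sum(values) / len(values)

try:
    risky_step([])
except ValueError:
    # logger.exception logs at ERROR level and appends the traceback
    logger.exception("risky_step failed")

# Show what was written to the file
with open('ml_errors_demo.log') as f:
    print(f.read())
```

The traceback in the log points to the exact line that failed, which a printed one-line message does not.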


6. Test Error Handling

In [7]:
# Create a damaged dataset with missing values
df_with_missing = df.copy()
df_with_missing.iloc[0, 0] = None  # remove one value

# Validate again
validate_data_with_logging(df_with_missing)


Logged error: Missing values detected in the dataset.
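
Checking printed output by eye does not scale as the number of cases grows. A minimal sketch of automated checks with plain `assert` statements, assuming a hypothetical `is_valid` function that returns `True` or `False` instead of printing:

```python
import pandas as pd

def is_valid(data):
    """Hypothetical validator: True only for a DataFrame without missing values."""
    return isinstance(data, pd.DataFrame) and not data.isnull().values.any()

clean = pd.DataFrame({'feature1': ['error_A', 'error_B'],
                      'feature2': ['severity_low', 'severity_high']})

damaged = clean.copy()
damaged.iloc[0, 0] = None  # introduce one missing value

# Each assert stops the cell with an AssertionError if the validator misbehaves
assert is_valid(clean)
assert not is_valid(damaged)
assert not is_valid([1, 2, 3])  # not a DataFrame
assert not is_valid(None)

print("All validation checks passed.")
```

The same assertions can be moved into test functions for a test runner once the notebook code becomes a module.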
