# Five Tribes of Machine Learning: Iris Classification
## An Educational Demonstration

Welcome! This notebook demonstrates how the five tribes of machine learning (from Pedro Domingos' book "The Master Algorithm") each approach the classic Iris flower classification problem.

### The Five Tribes:
- 🌳 **Symbolists** - Learn through logical rules
- 🧠 **Connectionists** - Learn by mimicking the brain
- 🧬 **Evolutionaries** - Learn through simulated evolution
- 📊 **Bayesians** - Learn through probabilistic inference
- 📏 **Analogizers** - Learn by recognizing similarity

### What You'll Learn:
1. How different ML paradigms approach the same problem
2. The philosophical differences between approaches
3. When to use each type of algorithm
4. Working implementations you can modify and experiment with

## Table of Contents
1. [Introduction](#introduction)
2. [Problem Setup](#problem-setup)
3. [🌳 Symbolists: Decision Trees](#symbolists)
4. [🧠 Connectionists: Neural Networks](#connectionists)
5. [🧬 Evolutionaries: Genetic Algorithms](#evolutionaries)
6. [📊 Bayesians: Naive Bayes](#bayesians)
7. [📏 Analogizers: k-Nearest Neighbors](#analogizers)
8. [Comparison & Conclusion](#comparison)
9. [Glossary](#glossary)

In [None]:
# Configure Keras to use JAX backend (must be set before importing keras)
import os
os.environ['KERAS_BACKEND'] = 'jax'

# Standard library imports
import warnings
warnings.filterwarnings('ignore')

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning - General
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler

# Machine learning - Tribe specific
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Neural networks - Keras 3.x with JAX backend
import keras
from keras import layers

# Genetic algorithms
from deap import base, creator, tools, algorithms
import random

# Set random seeds for reproducibility
np.random.seed(42)
keras.utils.set_random_seed(42)
random.seed(42)

# Plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

print("All imports successful! ✓")
print(f"Using Keras backend: {keras.backend.backend()}")

<a id="introduction"></a>
## Introduction

Machine learning isn't just one thing; it's a collection of fundamentally different approaches to learning from data. Pedro Domingos, in his book "The Master Algorithm," identifies five major "tribes" of machine learning, each with its own philosophy and techniques.

**Why does this matter?** Because understanding these different paradigms helps you:
- Choose the right algorithm for your problem
- Understand why an algorithm works (or doesn't)
- Combine approaches for better results
- Think more deeply about what "learning" really means

In this notebook, we'll see how each tribe tackles the same problem: classifying iris flowers based on their physical measurements. By the end, you'll understand not just *that* different algorithms exist, but *why* they approach problems differently.

<a id="problem-setup"></a>
## Problem Setup: The Iris Dataset

The Iris dataset is the "Hello World" of machine learning. It contains measurements of 150 iris flowers from three species:
- **Setosa**
- **Versicolor**
- **Virginica**

For each flower, we have four measurements:
1. Sepal length (cm)
2. Sepal width (cm)
3. Petal length (cm)
4. Petal width (cm)

**Our task:** Given these four measurements, predict which species the flower belongs to.

**Why Iris?** It's simple enough to understand but complex enough to be non-trivial. Perfect for comparing different approaches!

In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names

# Create a DataFrame for easier exploration
df = pd.DataFrame(X, columns=feature_names)
df['species'] = pd.Categorical.from_codes(y, target_names)

print(f"Dataset shape: {X.shape}")
print(f"Number of samples: {len(X)}")
print(f"Number of features: {X.shape[1]}")
print(f"Number of classes: {len(target_names)}")
print(f"\nClass distribution:")
print(df['species'].value_counts())
print(f"\nFirst 5 samples:")
df.head()

In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

for idx, feature in enumerate(feature_names):
    row = idx // 2
    col = idx % 2
    for species_idx, species in enumerate(target_names):
        species_data = df[df['species'] == species][feature]
        axes[row, col].hist(species_data, alpha=0.6, label=species, bins=15)
    axes[row, col].set_xlabel(feature)
    axes[row, col].set_ylabel('Frequency')
    axes[row, col].legend()
    axes[row, col].set_title(f'Distribution of {feature}')

plt.tight_layout()
plt.show()

print("💡 Notice how some features (like petal length) separate the species better than others!")

In [None]:
# Split into training and testing sets
# We'll use the same split for all five tribes to ensure fair comparison
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {len(X_train)} samples")
print(f"Test set size: {len(X_test)} samples")
print(f"\nTraining set class distribution:")
print(pd.Series(y_train).value_counts().sort_index())
print(f"\nTest set class distribution:")
print(pd.Series(y_test).value_counts().sort_index())

<a id="symbolists"></a>
## 🌳 Symbolists: Decision Trees

### Philosophy

Symbolists believe that learning is the **inverse of deduction**. Just as you can deduce specific conclusions from general rules, Symbolists learn by inducing general rules from specific examples.

**Real-world analogy:** Think of how a detective works. They see clues (data) and build up a theory (rules) that explains all the evidence. "If the footprint is larger than 12 inches AND the suspect is over 6 feet tall, THEN treat this person as a person of interest."

**Master Algorithm:** Inverse deduction

### Key Concepts

- **Logic and Rules**: Learning produces human-readable "if-then" rules
- **Interpretability**: You can understand exactly why the algorithm made a decision
- **Tree Structure**: Rules are organized hierarchically like a flowchart
- **Greedy Splitting**: At each step, choose the split that best separates the classes
- **Decision Boundaries**: Creates rectangular decision regions in feature space
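
To make "greedy splitting" concrete, here is a minimal sketch of how a single split could be chosen for one feature. It assumes Gini impurity as the split criterion (scikit-learn's default for `DecisionTreeClassifier`) and tries midpoints between consecutive values as candidate thresholds. The helper names `gini` and `best_threshold` and the toy data are made up for illustration; a real tree repeats this search over every feature and then recurses into each side.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Greedily pick the threshold with the lowest weighted Gini impurity."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up toy measurements (not taken from the Iris data)
toy_lengths = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
toy_labels = ["a", "a", "a", "b", "b", "b"]

threshold, impurity = best_threshold(toy_lengths, toy_labels)
print(f"Best threshold: {threshold}, weighted impurity: {impurity}")
```

Because the choice is greedy, each split is the best one available at that node; nothing guarantees that the tree as a whole is the best possible.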

In [None]:
# Create and train a decision tree
tree_model = DecisionTreeClassifier(
    max_depth=3,  # Limit depth for interpretability
    random_state=42
)

tree_model.fit(X_train, y_train)

# Make predictions
y_pred_tree = tree_model.predict(X_test)

# Evaluate
accuracy_tree = accuracy_score(y_test, y_pred_tree)
print(f"Decision Tree Accuracy: {accuracy_tree:.3f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_tree, target_names=target_names))

In [None]:
# Visualize the decision tree
plt.figure(figsize=(20, 10))
plot_tree(
    tree_model,
    feature_names=feature_names,
    class_names=target_names,
    filled=True,
    rounded=True,
    fontsize=10
)
plt.title("Decision Tree Structure", fontsize=16, pad=20)
plt.show()

# Show the rules in text format
print("\n📋 Decision Rules (text format):")
print("="*50)
tree_rules = export_text(tree_model, feature_names=feature_names)
print(tree_rules)

### Results & Interpretation

The decision tree creates a flowchart of questions about the flowers' measurements. Notice how it:

1. **Starts with the most informative feature** (usually petal-related measurements)
2. **Creates simple yes/no questions** at each node
3. **Produces human-readable rules** you could write down on paper
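
As a rough illustration of what "rules you could write down on paper" means, a shallow tree can be transcribed into a plain Python function. The function name and the thresholds below are hypothetical placeholders chosen for this sketch; they are not read from `tree_model`, whose actual rules come from the `export_text` output.

```python
def classify_by_rules(petal_length_cm, petal_width_cm):
    """Hand-written if-then rules in the style of a depth-2 decision tree.

    The thresholds are hypothetical placeholders, not values learned by a model.
    """
    if petal_length_cm <= 2.5:
        return "setosa"
    if petal_width_cm <= 1.7:
        return "versicolor"
    return "virginica"


# Each call walks the same short chain of yes/no questions
print(classify_by_rules(1.4, 0.2))
print(classify_by_rules(4.5, 1.4))
print(classify_by_rules(6.0, 2.2))
```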

**Strengths of the Symbolist approach:**
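- ✅ Highly interpretable: you can explain every decision
- ✅ No feature scaling needed (splits depend only on the ordering of values)
- ✅ Can handle both numerical and categorical data (scikit-learn's trees require categorical features to be numerically encoded first)
- ✅ Performs implicit feature selection (uninformative features are never chosen for a split)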

**Weaknesses:**
- ❌ Can overfit if not constrained (tree too deep)
- ❌ Unstable: small data changes can produce different trees
- ❌ Creates axis-aligned boundaries (can't capture diagonal patterns well)

**When to use:** When you need to explain your model's decisions to stakeholders, or when interpretability is crucial (medical diagnosis, loan approval, etc.)

<a id="connectionists"></a>
## 🧠 Connectionists: Neural Networks

### Philosophy

Connectionists believe that intelligence emerges from **networks of simple units** working together, just like neurons in the brain. Learning happens by adjusting the connections between these units.

**Real-world analogy:** Think of learning to ride a bike. You don't learn explicit rules. Instead, your brain's neural connections gradually adjust through practice until the right patterns emerge. You can't explain *how* you balance, but your brain knows.

**Master Algorithm:** Backpropagation

### Key Concepts

- **Neurons and Layers**: Simple processing units organized in layers
- **Weights and Biases**: Connections between neurons have adjustable strengths
- **Activation Functions**: Non-linear transformations that enable complex patterns
- **Gradient Descent**: Learning by following the slope of the error downhill
- **Backpropagation**: Efficiently computing how to adjust weights to reduce error
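
The gradient descent and backpropagation ideas above can be sketched on the smallest possible "network": a single sigmoid neuron trained with cross-entropy loss. This is a simplified illustration, not what Keras does internally. The function `train_neuron`, the toy data, and the hyperparameter values are assumptions made for this example. With only one neuron, backpropagation reduces to a single application of the chain rule; deeper networks repeat the same step layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(xs, ys, learning_rate=0.5, epochs=200):
    """Fit one sigmoid neuron (weight w, bias b) with batch gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)  # forward pass
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Backward pass: for sigmoid + cross-entropy, dLoss/dz = p - y,
            # and the chain rule gives dz/dw = x and dz/db = 1
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= learning_rate * grad_w / n  # step downhill along the gradient
        b -= learning_rate * grad_b / n
        losses.append(loss / n)
    return w, b, losses


# Made-up 1-D toy data: negative inputs are class 0, positive inputs are class 1
toy_x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
toy_y = [0, 0, 0, 1, 1, 1]

w, b, losses = train_neuron(toy_x, toy_y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss after the last epoch should be lower than after the first, which is the same downhill behavior the training curves for the Keras model show at a larger scale.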

In [None]:
# Neural networks work better with scaled data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Data scaled for neural network training ✓")
print(f"Original range: [{X_train.min():.2f}, {X_train.max():.2f}]")
print(f"Scaled range: [{X_train_scaled.min():.2f}, {X_train_scaled.max():.2f}]")

In [None]:
# Build a simple neural network
nn_model = keras.Sequential([
    keras.Input(shape=(4,)),  # Four input features; preferred over input_shape= in Keras 3
    layers.Dense(16, activation='relu', name='hidden_layer_1'),
    layers.Dense(8, activation='relu', name='hidden_layer_2'),
    layers.Dense(3, activation='softmax', name='output_layer')
])

# Compile the model
nn_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display architecture
print("Neural Network Architecture:")
print("="*50)
nn_model.summary()

In [None]:
# Train the neural network
history = nn_model.fit(
    X_train_scaled, y_train,
    epochs=100,
    batch_size=16,
    validation_split=0.2,
    verbose=0
)

# Make predictions
y_pred_probs = nn_model.predict(X_test_scaled, verbose=0)
y_pred_nn = np.argmax(y_pred_probs, axis=1)

# Evaluate
accuracy_nn = accuracy_score(y_test, y_pred_nn)
print(f"Neural Network Accuracy: {accuracy_nn:.3f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_nn, target_names=target_names))

In [None]:
# Visualize training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot accuracy
axes[0].plot(history.history['accuracy'], label='Training Accuracy')
axes[0].plot(history.history['val_accuracy'], label='Validation Accuracy')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Model Accuracy Over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot loss
axes[1].plot(history.history['loss'], label='Training Loss')
axes[1].plot(history.history['val_loss'], label='Validation Loss')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].set_title('Model Loss Over Time')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("💡 Notice how the model learns: loss decreases and accuracy increases over epochs!")

### Results & Interpretation

The neural network learned by repeatedly adjusting its weights through backpropagation. Notice how:

1. **Learning is gradual** - accuracy improves smoothly over epochs
2. **It's a black box** - we can't easily explain individual decisions
3. **It learns non-linear patterns** - the hidden layers capture complex relationships

**Strengths of the Connectionist approach:**
- ✅ Can learn very complex, non-linear patterns
- ✅ Scales well to large datasets
- ✅ Can be extended (add more layers) for harder problems
- ✅ Works well with raw data (images, audio, text)

**Weaknesses:**
- ❌ Black box: hard to interpret why it made a decision
- ❌ Requires lots of data to avoid overfitting
- ❌ Needs careful tuning (learning rate, architecture, etc.)
- ❌ Computationally expensive to train

**When to use:** When you have lots of data, complex patterns to learn, and don't need to explain individual predictions (image recognition, speech recognition, etc.)

<a id="evolutionaries"></a>
## 🧬 Evolutionaries: Genetic Algorithms

### Philosophy

Evolutionaries believe that learning is **simulated evolution**. Just as species evolve through natural selection, algorithms can evolve through mutation, crossover, and survival of the fittest.

**Real-world analogy:** Think of breeding dogs. You start with a diverse population, select the best ones (fastest, strongest, friendliest), breed them to create offspring with mixed traits, and occasionally get random mutations. Over generations, the population gets better at whatever you're selecting for.

**Master Algorithm:** Genetic programming

### Key Concepts

- **Population**: Multiple candidate solutions compete
- **Fitness**: How well each candidate performs
- **Selection**: Better candidates more likely to reproduce
- **Crossover**: Combine two parents to create offspring
- **Mutation**: Random changes to maintain diversity
- **Evolution**: Populations improve over generations

**Note:** For simplicity, we'll use a genetic algorithm to evolve per-feature weights for a k-NN classifier rather than full genetic programming, which evolves entire programs.
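
Before bringing in a library, the concepts listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the DEAP setup used in this notebook: the `TARGET` genome, the helper names, and all parameter values are made up. It uses one-point crossover for brevity, and it keeps the single best genome unchanged each generation (elitism).

```python
import random

rng = random.Random(0)  # local generator so the sketch is reproducible

TARGET = [0.2, 0.8, 0.5, 0.9]  # hypothetical "ideal" genome for this toy problem


def fitness(genome):
    """Higher is better: negative squared distance to TARGET."""
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))


def tournament(population, size=3):
    """Selection: the fittest of a few randomly drawn candidates wins."""
    return max(rng.sample(population, size), key=fitness)


def crossover(parent_a, parent_b):
    """One-point crossover: prefix from one parent, the rest from the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(genome, sigma=0.1, rate=0.2):
    """Mutation: add Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0, sigma) if rng.random() < rate else g for g in genome]


population = [[rng.random() for _ in TARGET] for _ in range(30)]
initial_best = max(map(fitness, population))

for _ in range(40):
    elite = max(population, key=fitness)  # elitism: carry the best genome over
    children = [
        mutate(crossover(tournament(population), tournament(population)))
        for _ in range(len(population) - 1)
    ]
    population = [elite] + children

final_best = max(map(fitness, population))
print(f"best fitness: {initial_best:.4f} -> {final_best:.4f}")
```

Because the elite genome is always carried over, the best fitness in this sketch can never get worse from one generation to the next.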

In [None]:
# Set up DEAP for genetic algorithm
# We'll evolve a simple classifier by optimizing feature weights

# Create fitness and individual classes
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

# Each gene is a feature weight between 0 and 1
toolbox.register("attr_float", random.uniform, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_float, n=4)  # 4 features
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def eval_individual(individual, X_train, y_train, X_val, y_val):
    """Evaluate fitness by weighting features and testing accuracy"""
    weights = np.array(individual)
    
    # Apply feature weights
    X_train_weighted = X_train * weights
    X_val_weighted = X_val * weights
    
    # Train simple k-NN classifier
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train_weighted, y_train)
    
    # Return accuracy as fitness
    accuracy = knn.score(X_val_weighted, y_val)
    return (accuracy,)

# Create validation split
X_train_evo, X_val_evo, y_train_evo, y_val_evo = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

# Register genetic operators
toolbox.register("evaluate", eval_individual,
                 X_train=X_train_evo, y_train=y_train_evo,
                 X_val=X_val_evo, y_val=y_val_evo)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.2, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

print("Genetic algorithm components initialized ✓")

In [None]:
# Run the genetic algorithm
population_size = 50
num_generations = 40

# Create initial population
pop = toolbox.population(n=population_size)

# Track statistics
fitness_over_time = []
best_fitness_over_time = []

print("Starting evolution...")
print("="*50)

for gen in range(num_generations):
    # Evaluate all individuals
    fitnesses = list(map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
    
    # Track progress
    fits = [ind.fitness.values[0] for ind in pop]
    fitness_over_time.append(np.mean(fits))
    best_fitness_over_time.append(np.max(fits))
    
    if gen % 10 == 0:
        print(f"Generation {gen}: Avg Fitness = {np.mean(fits):.3f}, "
              f"Best Fitness = {np.max(fits):.3f}")
    
    # Select and breed next generation
    offspring = toolbox.select(pop, len(pop))
    offspring = list(map(toolbox.clone, offspring))
    
    # Apply crossover
    for child1, child2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < 0.7:
            toolbox.mate(child1, child2)
            del child1.fitness.values
            del child2.fitness.values
    
    # Apply mutation
    for mutant in offspring:
        if random.random() < 0.2:
            toolbox.mutate(mutant)
            del mutant.fitness.values
    
    pop[:] = offspring

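# Offspring created in the last generation had their fitness deleted,
# so evaluate them before picking the best individual
for ind in pop:
    if not ind.fitness.valid:
        ind.fitness.values = toolbox.evaluate(ind)

# Get best individual
best_ind = tools.selBest(pop, 1)[0]
best_weights = np.array(best_ind)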

print("\n" + "="*50)
print(f"Evolution complete!")
print(f"Best feature weights: {best_weights}")

In [None]:
# Test the evolved solution
X_test_weighted = X_test * best_weights

# Train final model with evolved weights
knn_evo = KNeighborsClassifier(n_neighbors=3)
knn_evo.fit(X_train * best_weights, y_train)

# Predict
y_pred_evo = knn_evo.predict(X_test_weighted)

# Evaluate
accuracy_evo = accuracy_score(y_test, y_pred_evo)
print(f"Genetic Algorithm Accuracy: {accuracy_evo:.3f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_evo, target_names=target_names))

print(f"\n💡 Feature importance (evolved weights):")
for feature, weight in zip(feature_names, best_weights):
    print(f"  {feature}: {weight:.3f}")

In [None]:
# Visualize evolution progress
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot fitness over generations
axes[0].plot(fitness_over_time, label='Average Fitness', linewidth=2)
axes[0].plot(best_fitness_over_time, label='Best Fitness', linewidth=2)
axes[0].set_xlabel('Generation')
axes[0].set_ylabel('Fitness (Accuracy)')
axes[0].set_title('Evolution of Population Fitness')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot feature weights
axes[1].bar(range(len(feature_names)), best_weights, color='green', alpha=0.7)
axes[1].set_xticks(range(len(feature_names)))
axes[1].set_xticklabels(feature_names, rotation=45, ha='right')
axes[1].set_ylabel('Weight')
axes[1].set_title('Evolved Feature Weights')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("💡 Notice how the population improves over generations through selection and variation!")

### Results & Interpretation

The genetic algorithm evolved feature weights through simulated natural selection. Notice how:

1. **Fitness improves over generations** - the population adapts to the problem
2. **Diversity matters** - mutation prevents premature convergence
3. **No gradients needed** - works even when we can't compute derivatives
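
To see why "no gradients needed" matters for a fitness like accuracy, consider this small sketch. Accuracy only changes when a prediction flips, so nudging a parameter slightly usually leaves it unchanged and the estimated slope is zero. The threshold rule and the toy data here are made up for illustration.

```python
def accuracy(threshold, values, labels):
    """Fraction of points classified correctly by the rule `value > threshold`."""
    predictions = [1 if v > threshold else 0 for v in values]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Made-up toy data, not taken from the Iris set
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_labels = [0, 0, 1, 0, 1, 1]

eps = 1e-3
for threshold in (1.5, 2.5, 3.5):
    here = accuracy(threshold, toy_values, toy_labels)
    nudged = accuracy(threshold + eps, toy_values, toy_labels)
    slope = (nudged - here) / eps  # finite-difference estimate of the gradient
    print(f"threshold={threshold}: accuracy={here:.3f}, slope={slope}")
```

A gradient-based optimizer gets no direction from a zero slope, whereas an evolutionary search only needs to compare the accuracy of different candidates.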

**Strengths of the Evolutionary approach:**
- ✅ Works on any fitness function (no need for gradients)
- ✅ Can optimize discrete or continuous parameters
- ✅ Good at avoiding local optima (thanks to diversity)
- ✅ Naturally parallelizable (evaluate population in parallel)

**Weaknesses:**
- ❌ Computationally expensive (many fitness evaluations)
- ❌ Slow convergence compared to gradient-based methods
- ❌ Many hyperparameters to tune (population size, mutation rate, etc.)
- ❌ No guarantees of finding global optimum

**When to use:** When you can't compute gradients, have a complex search space, or need to optimize discrete structures (network architectures, rule sets, etc.)