# LEARNING

This notebook serves as supporting material for topics covered in **Chapter 18 - Learning from Examples**, **Chapter 19 - Knowledge in Learning**, and **Chapter 20 - Learning Probabilistic Models** from the book *Artificial Intelligence: A Modern Approach*. This notebook uses implementations from [learning.py](https://github.com/aimacode/aima-python/blob/master/learning.py). Let's start by importing everything from the module:

In [None]:
import math

from learning import *
from notebook import *

## 0. CONTENTS

* Machine Learning Overview
* Datasets
* Iris Visualization
* Distance Functions
* k-Nearest Neighbours
* Decision Tree Learner
* Linear Learner
* Logistic Linear Learner
* Model Evaluation & Comparison
* Hands-on Exercises

## 1. MACHINE LEARNING OVERVIEW

In this notebook, we learn about agents that can improve their behavior through diligent study of their own experiences.

An agent is **learning** if it improves its performance on future tasks after making observations about the world.

There are three types of feedback that determine the three main types of learning:

### **Supervised Learning**:

In Supervised Learning the agent observes some example input-output pairs and learns a function that maps from input to output.

**Example**: Let's think of an agent to classify images containing cats or dogs. If we provide an image containing a cat or a dog, this agent should output a string "cat" or "dog" for that particular image. To teach this agent, we will give a lot of input-output pairs like {cat image-"cat"}, {dog image-"dog"} to the agent. The agent then learns a function that maps from an input image to one of those strings.

### **Unsupervised Learning**:

In Unsupervised Learning the agent learns patterns in the input even though no explicit feedback is supplied. The most common type is **clustering**: detecting potentially useful clusters of input examples.

**Example**: A taxi agent would develop a concept of *good traffic days* and *bad traffic days* without ever being given labeled examples.

### **Reinforcement Learning**:

In Reinforcement Learning the agent learns from a series of reinforcements: rewards or punishments.

**Example**: Let's talk about an agent that plays the popular Atari game [Pong](http://www.ponggame.org). We will award the agent a point for every correct move and deduct a point for every wrong move. Eventually, the agent will figure out which of its actions prior to the reinforcement were most responsible for it.

## 2. DATASETS

For the following tutorials we will use a range of datasets, to better showcase the strengths and weaknesses of the algorithms. The datasets are the following:

* [Fisher's Iris](https://github.com/aimacode/aima-data/blob/a21fc108f52ad551344e947b0eb97df82f8d2b2b/iris.csv): Each item represents a flower, with four measurements: the length and the width of the sepals and petals. Each item/flower is categorized into one of three species: Setosa, Versicolor and Virginica.

* [Zoo](https://github.com/aimacode/aima-data/blob/a21fc108f52ad551344e947b0eb97df82f8d2b2b/zoo.csv): The dataset holds different animals and their classification as "mammal", "fish", etc. The new animal we want to classify has the following measurements: 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 4, 1, 0, 1 (don't concern yourself with what the measurements mean).

To make using the datasets easier, we have written a class, `DataSet`, in `learning.py`. The tutorials found here make use of this class.

Let's have a look at how it works before we get started with the algorithms.

### 2.1. Intro

A lot of the datasets we will work with are .csv files (although other formats are supported too). We have a collection of sample datasets ready to use [on aima-data](https://github.com/aimacode/aima-data/tree/a21fc108f52ad551344e947b0eb97df82f8d2b2b). Two examples are the datasets mentioned above (*iris.csv* and *zoo.csv*). You can find plenty of datasets online, and a good repository of such datasets is the [UCI Machine Learning Repository](https://archive.ics.uci.edu/). Other notable platforms are [OpenML](https://www.openml.org/) and [Kaggle](https://www.kaggle.com/).

In such files, each line corresponds to one item/measurement. Each individual value in a line represents a *feature* and usually there is a value denoting the *class* of the item.
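
To make the file layout concrete, here is a minimal sketch that parses two rows laid out like *iris.csv* using only the standard library. The helper `parse_value` and the in-memory string `raw` are illustrative names, not part of `learning.py`, and this is not how `DataSet` itself loads files.

```python
import csv
import io

# Two rows in the iris layout: four numeric features, then the class label.
raw = "5.1,3.5,1.4,0.2,setosa\n7.0,3.2,4.7,1.4,versicolor\n"


def parse_value(text):
    """Convert a CSV field to a float when possible, else keep the string."""
    try:
        return float(text)
    except ValueError:
        return text


# Each line becomes one example (a list of values).
examples = [[parse_value(field) for field in row]
            for row in csv.reader(io.StringIO(raw))]

# The last value is the class; everything before it is a feature.
features, label = examples[0][:-1], examples[0][-1]
print(features, label)
```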

You can find the code for the dataset here:

In [None]:
psource(DataSet)

### 2.2. Class Attributes

* **examples**: Holds the items of the dataset. Each item is a list of values.
* **attrs**: The indexes of the features (by default in the range of [0,f), where *f* is the number of features). For example, `item[i]` returns the feature at index *i* of *item*.
* **attr_names**: An optional list with attribute names. For example, `item[s]`, where *s* is a feature name, returns the feature of name *s* in *item*.
* **target**: The attribute a learning algorithm will try to predict. By default the last attribute.
* **inputs**: This is the list of attributes without the target.
* **values**: A list of lists which holds the set of possible values for the corresponding attribute/feature. If initially `None`, it gets computed (by the function `setproblem`) from the examples.
* **distance**: The distance function used in the learner to calculate the distance between two items. By default `mean_boolean_error`.
* **name**: Name of the dataset.
* **source**: The source of the dataset (url or other). Not used in the code.
* **exclude**: A list of indexes to exclude from `inputs`. The list can include either attribute indexes (attrs) or names (attr_names).

### 2.3. Class Helper Functions

These functions help modify a `DataSet` object to your needs.

* **sanitize**: Takes as input an example and returns it with non-input (target) attributes replaced by `None`. Useful for testing. Keep in mind that the example given is not itself sanitized, but instead a sanitized copy is returned.

* **classes_to_numbers**: Maps the class names of a dataset to numbers. If the class names are not given, they are computed from the dataset values. Useful for classifiers that return a numerical value instead of a string.

* **remove_examples**: Removes examples containing a given value. Useful for removing examples with missing values, or for removing classes (needed for binary classifiers).
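
The behaviour described above can be sketched on plain Python lists. This is only an illustration under simplifying assumptions: the functions below are standalone stand-ins written for this example (they return new lists rather than changing a `DataSet`), and the toy data has two features followed by the class.

```python
import copy

# Toy examples: two features followed by the class label (the target).
examples = [[5.1, 3.5, "setosa"], [7.0, 3.2, "versicolor"], [6.3, 3.3, "virginica"]]
target = 2
inputs = [0, 1]


def sanitize(example):
    """Return a copy with every non-input attribute replaced by None."""
    return [value if i in inputs else None for i, value in enumerate(example)]


def remove_examples(examples, value):
    """Keep only the examples that do not contain the given value."""
    return [ex for ex in examples if value not in ex]


def classes_to_numbers(examples, classes=None):
    """Return copies of the examples with class names replaced by indexes."""
    if classes is None:
        classes = sorted({ex[target] for ex in examples})
    converted = copy.deepcopy(examples)
    for ex in converted:
        ex[target] = classes.index(ex[target])
    return converted


binary = remove_examples(examples, "virginica")
print(sanitize(examples[0]))        # [5.1, 3.5, None]
print(classes_to_numbers(binary))   # [[5.1, 3.5, 0], [7.0, 3.2, 1]]
```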

### 2.4. Importing a Dataset

#### Importing from aima-data

Datasets uploaded on aima-data can be imported with the following line:

In [None]:
iris = DataSet(name="iris")

To check that we imported the correct dataset, we can do the following:

In [None]:
print(iris.examples[0])
print(iris.inputs)

This correctly prints the first line of the csv file and the list of input attribute indexes.

When importing a dataset, we can specify to exclude an attribute (for example, at index 1) by setting the parameter `exclude` to the attribute index or name.

In [None]:
iris2 = DataSet(name="iris",exclude=[1])
print(iris2.inputs)

### 2.5. Attributes

Here we showcase the attributes.

First we will print the first three items/examples in the dataset.

In [None]:
print(iris.examples[:3])

Then we will print `attrs`, `attr_names`, `target` and `inputs`. Notice how `attrs` holds values in [0,4], but since the attribute at index 4 is the target, `inputs` holds values in [0,3].

In [None]:
print("attrs:", iris.attrs)
print("attr_names (by default same as attrs):", iris.attr_names)
print("target:", iris.target)
print("inputs:", iris.inputs)

Now we will print all the possible values for the first feature/attribute.

In [None]:
print(iris.values[0])

Finally we will print the dataset's name and source. Keep in mind that we have not set a source for the dataset, so in this case it is empty.

In [None]:
print("name:", iris.name)
print("source:", iris.source)

A useful combination of the above is `dataset.values[dataset.target]` which returns the possible values of the target. For classification problems, this will return all the possible classes. Let's try it:

In [None]:
print(iris.values[iris.target])

### 2.6. Helper Functions

We will now take a look at the auxiliary functions found in the class.

First we will take a look at the `sanitize` function, which sets the non-input values of the given example to `None`.

In this case we want to hide the class of the first example, so we will sanitize it.

Note that the function doesn't actually change the given example; it returns a sanitized *copy* of it.

In [None]:
print("Sanitized:",iris.sanitize(iris.examples[0]))
print("Original:",iris.examples[0])

Currently the `iris` dataset has three classes: setosa, virginica and versicolor. Suppose we want to convert it to a binary-class dataset (a dataset with two classes) by removing the class "virginica". To accomplish that we will use the helper function `remove_examples`.

In [None]:
iris2 = DataSet(name="iris")

iris2.remove_examples("virginica")
print(iris2.values[iris2.target])

We also have `classes_to_numbers`. For a lot of the classifiers in the module (like the Neural Network), classes should have numerical values. With this function we map string class names to numbers.

In [None]:
print("Class of first example:",iris2.examples[0][iris2.target])
iris2.classes_to_numbers()
print("Class of first example:",iris2.examples[0][iris2.target])

As you can see "setosa" was mapped to 0.

Finally, we take a look at `find_means_and_deviations`. It finds the means and standard deviations of the features for each class.

In [None]:
means, deviations = iris.find_means_and_deviations()

print("Setosa feature means:", means["setosa"])
print("Versicolor mean for first feature:", means["versicolor"][0])

print("Setosa feature deviations:", deviations["setosa"])
print("Virginica deviation for second feature:",deviations["virginica"][1])
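
For intuition, the per-class statistics can be sketched without the `DataSet` class. The function name `class_means_and_deviations` and the toy data are made up for this example, and using the sample standard deviation from the `statistics` module is an assumption about what the module computes.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy data: two features followed by the class label.
examples = [
    [1.0, 10.0, "a"], [3.0, 14.0, "a"],
    [2.0, 20.0, "b"], [4.0, 20.0, "b"], [6.0, 20.0, "b"],
]
target = 2
inputs = [0, 1]


def class_means_and_deviations(examples):
    """Group examples by class, then summarise each feature column."""
    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex[target]].append([ex[i] for i in inputs])
    means, deviations = {}, {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        means[label] = [mean(col) for col in columns]
        deviations[label] = [stdev(col) for col in columns]
    return means, deviations


means, deviations = class_means_and_deviations(examples)
print(means["a"])        # [2.0, 12.0]
print(deviations["b"])   # [2.0, 0.0]
```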

## 3. IRIS VISUALIZATION

Since we will use the iris dataset extensively in this notebook, below we provide a visualization tool that helps in comprehending the dataset and thus how the algorithms work.

We plot the dataset in a 3D space using `matplotlib` and the function `show_iris` from `notebook.py`. The function takes as input three parameters, *i*, *j* and *k*, which are indices (0 to 3) into the iris features "Sepal Length", "Sepal Width", "Petal Length" and "Petal Width". By default we show the first three features.

In [None]:
iris = DataSet(name="iris")

show_iris()
show_iris(0, 1, 3)
show_iris(1, 2, 3)

You can play around with the values to get a good look at the dataset.

## 4. DISTANCE FUNCTIONS

In a lot of algorithms, there is a need to compare items, finding how *similar* or *close* they are. For that we have many different functions at our disposal. Below are the functions implemented in the module:

### 4.1. Manhattan Distance (`manhattan_distance`)

One of the simplest distance functions. It calculates the difference between the coordinates/features of two items. To understand how it works, imagine a 2D grid with coordinates *x* and *y*. In that grid we have two items, at the squares positioned at `(1,2)` and `(3,4)`. The difference between their two coordinates is `3-1=2` and `4-2=2`. If we sum these up we get `4`. That means to get from `(1,2)` to `(3,4)` we need four moves; two to the right and two more up. The function works similarly for n-dimensional grids.

In [None]:
def manhattan_distance(X, Y):
    return sum([abs(x - y) for x, y in zip(X, Y)])


distance = manhattan_distance([1,2], [3,4])
print("Manhattan Distance between (1,2) and (3,4) is", distance)

### 4.2. Euclidean Distance (`euclidean_distance`)

Probably the most popular distance function. It returns the square root of the sum of the squared differences between individual elements of two items.

In [None]:
def euclidean_distance(X, Y):
    return math.sqrt(sum([(x - y)**2 for x, y in zip(X,Y)]))


distance = euclidean_distance([1,2], [3,4])
print("Euclidean Distance between (1,2) and (3,4) is", distance)

### 4.3. Hamming Distance (`hamming_distance`)

This function counts the number of differences between single elements in two items. For example, if we have two binary strings "111" and "011" the function will return 1, since the two strings only differ at the first element. The function works the same way for non-binary strings too.

In [None]:
def hamming_distance(X, Y):
    return sum(x != y for x, y in zip(X, Y))


distance = hamming_distance(['a','b','c'], ['a','b','b'])
print("Hamming Distance between 'abc' and 'abb' is", distance)

### 4.4. Mean Boolean Error (`mean_boolean_error`)

To calculate this distance, we find the ratio of different elements over all elements of two items. For example, if the two items are `(1,2,3)` and `(1,4,5)`, the ratio of different/all elements is 2/3, since they differ in two out of three elements.

In [None]:
def mean_boolean_error(X, Y):
    return mean(int(x != y) for x, y in zip(X, Y))


distance = mean_boolean_error([1,2,3], [1,4,5])
print("Mean Boolean Error Distance between (1,2,3) and (1,4,5) is", distance)

### 4.5. Mean Error (`mean_error`)

This function finds the mean absolute difference between corresponding elements of two items. For example, if the two items are `(1,0,5)` and `(3,10,5)`, the absolute differences sum to `|1-3| + |0-10| + |5-5| = 2 + 10 + 0 = 12`. The mean error distance is therefore `12/3 = 4`.

In [None]:
def mean_error(X, Y):
    return mean([abs(x - y) for x, y in zip(X, Y)])


distance = mean_error([1,0,5], [3,10,5])
print("Mean Error Distance between (1,0,5) and (3,10,5) is", distance)

### 4.6. Mean Square Error (`ms_error`)

This is very similar to the `Mean Error`, but instead of calculating the difference between elements, we are calculating the *square* of the differences.

In [None]:
def ms_error(X, Y):
    return mean([(x - y)**2 for x, y in zip(X, Y)])


distance = ms_error([1,0,5], [3,10,5])
print("Mean Square Distance between (1,0,5) and (3,10,5) is", distance)

### 4.7. Root of Mean Square Error (`rms_error`)

This is the square root of `Mean Square Error`.

In [None]:
def rms_error(X, Y):
    return math.sqrt(ms_error(X, Y))


distance = rms_error([1,0,5], [3,10,5])
print("Root Mean Square Error Distance between (1,0,5) and (3,10,5) is", distance)

## 5. K-NEAREST NEIGHBOURS CLASSIFIER

### 5.1. Overview
The k-Nearest Neighbors algorithm is a non-parametric method used for classification and regression. We are going to use this to classify Iris flowers. More about kNN on [Scholarpedia](http://www.scholarpedia.org/article/K-nearest_neighbor).

![kNN plot](images/knn_plot.png)

Let's see how kNN works with a simple plot shown in the above picture.

We have the coordinates (called **features** in Machine Learning) of this red star and we need to predict its class using the kNN algorithm. The value of **k** is not learned from the data; it is one of the **hyperparameters** of the kNN algorithm. We choose it based on our dataset, and choosing a particular value is known as **hyperparameter tuning/optimising**. We learn more about this in coming topics.

Let's put **k = 3**. This means we find the 3 nearest neighbors of the red star and classify the new point into their majority class. Observe the smaller circle, which contains three points other than the **test point** (red star). Two of them are violet, which forms the majority, so we predict the class of the red star as **violet (Class B)**.

Similarly, if we put **k = 5**, you can observe that there are three yellow points, which form the majority. So, we classify our test point as **yellow (Class A)**.

In practical tasks, we iterate through a bunch of values for k (like [1, 3, 5, 10, 20, 50, 100]), see how it performs and select the best one. 
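
As a rough sketch of that selection loop, the snippet below scores a few values of k with leave-one-out evaluation on a tiny made-up dataset. The points, the helper names `knn_predict` and `leave_one_out_accuracy`, and the choice of leave-one-out scoring are all assumptions for illustration; they are not taken from `learning.py` or from the figure above.

```python
import math
from collections import Counter

# Hypothetical 2D points with a class label.
points = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"), ((1.1, 1.3), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.9, 3.3), "B"), ((3.1, 3.1), "B"),
]


def knn_predict(train, query, k):
    """Majority class among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Classify each point using all the other points as training data."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)


scores = {k: leave_one_out_accuracy(points, k) for k in [1, 3, 5, 7]}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data every held-out point has only three neighbours of its own class, so k = 7 always lets the other class outvote them. That shows why a larger k is not automatically better.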

### 5.2. Implementation

Below follows the implementation of the kNN algorithm:

In [None]:
psource(NearestNeighborLearner)

It takes as input a dataset and k (default value is 1) and it returns a function, which we can later use to classify a new item.

To accomplish that, the function uses a heap-queue, where the items of the dataset are sorted according to their distance from *example* (the item to classify). We then take the k smallest elements from the heap-queue and we find the majority class. We classify the item to this class.
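
The same idea can be sketched independently of the module with `heapq.nsmallest`. The function name `nearest_neighbor_classify`, the use of Euclidean distance, and the five training points are assumptions made for this example rather than the module's actual code or data.

```python
import heapq
import math
from collections import Counter


def nearest_neighbor_classify(examples, query, k=1):
    """examples: list of (features, label) pairs. Returns the majority label
    among the k examples closest to query (Euclidean distance)."""
    closest = heapq.nsmallest(k, examples, key=lambda ex: math.dist(ex[0], query))
    votes = Counter(label for _, label in closest)
    return votes.most_common(1)[0][0]


# Hypothetical training points in the style of the iris measurements.
training = [
    ([5.0, 3.4, 1.5, 0.2], "setosa"),
    ([4.9, 3.1, 1.5, 0.1], "setosa"),
    ([6.4, 3.2, 4.5, 1.5], "versicolor"),
    ([6.9, 3.1, 4.9, 1.5], "versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "virginica"),
]

print(nearest_neighbor_classify(training, [5.1, 3.0, 1.1, 0.1], k=3))  # setosa
```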

### 5.3. Example

We measured a new flower with the following values: 5.1, 3.0, 1.1, 0.1. We want to classify that item/flower in a class. To do that, we write the following:

In [None]:
iris = DataSet(name="iris")

kNN = NearestNeighborLearner(iris,k=3)
print(kNN([5.1,3.0,1.1,0.1]))

The output of the above code is "setosa", which means the flower with the above measurements is of the "setosa" species.

## 6. DECISION TREE LEARNER

### 6.1. Overview

#### 6.1.1. Decision Trees
A decision tree is a flowchart that uses a tree of decisions and their possible consequences for classification. At each non-leaf node of the tree an attribute of the input is tested, based on which corresponding branch leading to a child-node is selected. At the leaf node the input is classified based on the class label of this leaf node. The paths from root to leaves represent classification rules based on which leaf nodes are assigned class labels.
![decision tree](images/decisiontree_fruit.jpg)

#### 6.1.2. Decision Tree Learning
Decision tree learning is the construction of a decision tree from class-labeled training data. The data is expected to be a tuple in which each record of the tuple is an attribute used for classification. The decision tree is built top-down, by choosing a variable at each step that best splits the set of items. There are different metrics for measuring the "best split". These generally measure the homogeneity of the target variable within the subsets.

#### 6.1.3. Gini Impurity
Gini impurity of a set is the probability of a randomly chosen element to be incorrectly labeled if it was randomly labeled according to the distribution of labels in the set.

$$I_G(p) = \sum{p_i(1 - p_i)} = 1 - \sum{p_i^2}$$

We select a split which minimizes the Gini impurity in child nodes.
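
A small worked example may help here. The helper names `gini_impurity` and `weighted_gini` and the cat/dog labels are made up for illustration; weighting each child by its size is the usual convention and is assumed here.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_i^2) over the class proportions p_i of labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Average impurity of the child nodes, weighted by their size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini_impurity(child) for child in children)


parent = ["cat"] * 4 + ["dog"] * 4
split_a = [["cat"] * 4, ["dog"] * 4]              # perfectly pure children
split_b = [["cat", "cat", "dog", "dog"]] * 2      # children as mixed as the parent

print(gini_impurity(parent))   # 0.5
print(weighted_gini(split_a))  # 0.0
print(weighted_gini(split_b))  # 0.5
```

Split A would be preferred, since it brings the impurity of the children down to zero, while split B leaves it unchanged.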

#### 6.1.4. Information Gain
Information gain is based on the concept of entropy from information theory. Entropy is defined as:

$$H(p) = -\sum{p_i \log_2{p_i}}$$

Information gain is the difference between the entropy of the parent and the weighted sum of the entropies of its children. The feature used for splitting is the one which provides the most information gain.
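
The two definitions can be checked on a toy split. The function names and the yes/no labels below are illustrative only and are not taken from `learning.py`.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the class proportions of labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    remainder = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - remainder


parent = ["yes"] * 4 + ["no"] * 4
pure_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(entropy(parent))                          # 1.0
print(information_gain(parent, pure_split))     # 1.0
print(information_gain(parent, useless_split))  # 0.0
```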

### 6.2. Implementation
The nodes of the tree constructed by our learning algorithm are stored using either `DecisionFork` or `DecisionLeaf` based on whether they are a parent node or a leaf node respectively.

In [None]:
psource(DecisionFork)

`DecisionFork` holds the attribute, which is tested at that node, and a dict of branches. The branches store the child nodes, one for each of the attribute's values. Calling an object of this class as a function with input tuple as an argument returns the next node in the classification path based on the result of the attribute test.

In [None]:
psource(DecisionLeaf)

The leaf node stores the class label in `result`. All input tuples' classification paths end on a `DecisionLeaf` whose `result` attribute decides their class.

In [None]:
psource(DecisionTreeLearner)

The implementation of `DecisionTreeLearner` provided in [learning.py](https://github.com/aimacode/aima-python/blob/master/learning.py) uses information gain as the metric for selecting which attribute to test for splitting. The function builds the tree top-down in a recursive manner. Based on the input it makes one of the four choices:
<ol>
<li>If the input at the current step has no training data we return the mode of classes of input data received in the parent step (previous level of recursion).</li>
<li>If all values in training data belong to the same class it returns a `DecisionLeaf` whose class label is the class which all the data belongs to.</li>
<li>If the data has no attributes that can be tested we return the class with highest plurality value in the training data.</li>
<li>Otherwise, we choose the attribute which gives the highest information gain and return a `DecisionFork` which splits based on this attribute. Each branch recursively calls `decision_tree_learning` to construct the sub-tree.</li>
</ol>
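
The four cases can be sketched in a few lines of plain Python. This is a simplified illustration, not the module's code: leaves are plain labels and forks are `(attribute, branches)` tuples instead of `DecisionLeaf` and `DecisionFork` objects, and the names `learn_tree`, `classify` and the toy weather data are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    n = len(rows)
    counts = Counter(row[target] for row in rows)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def plurality(rows, target):
    """Most common class label among rows."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]


def information_gain(rows, attr, target):
    n = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def learn_tree(rows, attrs, target, values, default=None):
    """Return a class label (leaf) or an (attr, {value: subtree}) pair (fork)."""
    if not rows:                                    # case 1: no data left
        return default
    if len({row[target] for row in rows}) == 1:     # case 2: a single class
        return rows[0][target]
    if not attrs:                                   # case 3: nothing to test
        return plurality(rows, target)
    # case 4: split on the attribute with the highest information gain
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in values[best]:
        subset = [row for row in rows if row[best] == value]
        branches[value] = learn_tree(subset, remaining, target, values,
                                     default=plurality(rows, target))
    return (best, branches)


def classify(tree, example):
    """Follow forks until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[example[attr]]
    return tree


# Hypothetical toy data: (outlook, windy, play)
rows = [
    ("sunny", "no", "yes"), ("sunny", "yes", "yes"),
    ("rainy", "no", "yes"), ("rainy", "yes", "no"),
]
values = {0: ["sunny", "rainy"], 1: ["no", "yes"]}
tree = learn_tree(rows, attrs=[0, 1], target=2, values=values)
print(classify(tree, ("rainy", "yes", None)))  # no
```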

### 6.3. Example

We will now use the Decision Tree Learner to classify a sample with values: 5.1, 3.0, 1.1, 0.1.

In [None]:
iris = DataSet(name="iris")

DTL = DecisionTreeLearner(iris)
print(DTL([5.1, 3.0, 1.1, 0.1]))

As expected, the Decision Tree learner classifies the sample as "setosa" as seen in the previous section.

## 7. LINEAR LEARNER

### 7.1. Overview

Linear Learner is a model that assumes a linear relationship between the input variables x and the single output variable y. More specifically, that y can be calculated from a linear combination of the input variables x. Linear learner is a quite simple model as the representation of this model is a linear equation.  

The linear equation assigns one scalar factor, called a coefficient or weight, to each input value or column. One additional coefficient, often called the intercept or the bias coefficient, is also added to give the model an extra degree of freedom.  
For example: y = ax1 + bx2 + c.

### 7.2. Implementation

Below is the implementation of Linear Learner.

In [None]:
psource(LinearLearner)

This algorithm first assigns some random weights to the input variables and then based on the error calculated updates the weight for each variable. Finally the prediction is made with the updated weights.  
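
As an independent sketch of that loop, the snippet below fits a linear model with batch gradient descent on the squared error. The function names, the hyperparameter values, the fixed random seed and the generated data are all assumptions for this example; the details of `LinearLearner` may differ.

```python
import random


def train_linear(examples, learning_rate=0.05, epochs=5000, seed=0):
    """examples: list of (features, y). Returns weights [bias, w1, ..., wn]
    fitted by batch gradient descent on the squared error."""
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    # Start from small random weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_features + 1)]
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, y in examples:
            x = [1.0] + list(features)  # the leading 1 multiplies the bias
            error = y - sum(w * xi for w, xi in zip(weights, x))
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        # Move each weight in the direction that reduces the error.
        weights = [w + learning_rate * g / len(examples)
                   for w, g in zip(weights, gradient)]
    return weights


def predict(weights, features):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical data generated from y = 2*x1 + 3*x2 + 1
data = [((x1, x2), 2 * x1 + 3 * x2 + 1) for x1 in range(4) for x2 in range(4)]
weights = train_linear(data)
print([round(w, 2) for w in weights])
print(round(predict(weights, (1, 2)), 2))
```

Because the data is noise-free, the learned weights should end up close to the bias 1 and the coefficients 2 and 3 used to generate it.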

### 7.3. Example

We will now use the Linear Learner to classify a sample with values: 5.1, 3.0, 1.1, 0.1.

In [None]:
print(f"Number of features: {len(iris.inputs)}")
print(f"Example shape: {len(iris.examples[0])}")
print(f"First example: {iris.examples[0]}")

In [None]:
iris = DataSet(name="iris")
iris.classes_to_numbers()

linear_learner = LinearLearner(iris)
print(linear_learner([5.1, 3.0, 1.1, 0.1]))

## 8. LOGISTIC LINEAR LEARNER

### 8.1. Overview

While the Linear Learner we just explored is great for regression (predicting continuous values), it's not ideal for classification tasks. The main problem is that linear regression can output any value from negative infinity to positive infinity, but for classification we want outputs between 0 and 1 that we can interpret as probabilities.

Logistic Linear Learner solves this by using the sigmoid function to "squash" the linear output into the range [0,1]:

$$\text{probability} = \text{sigmoid}(w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n)$$

Where the sigmoid function is:
$$\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$

### 8.2. Why Use Sigmoid?

The sigmoid function has some very useful properties:
- Output range: Always between 0 and 1, perfect for probabilities
- S-shaped curve: Smooth transition from 0 to 1
- Interpretable: Output can be read as "confidence" in the prediction

Let's visualize the sigmoid function:

In [None]:
# Let's visualize the sigmoid function
import numpy as np
import matplotlib.pyplot as plt

# Create a range of z values
z = np.linspace(-10, 10, 100)
# Apply sigmoid function: 1 / (1 + e^(-z))
sigmoid_values = 1 / (1 + np.exp(-z))

plt.figure(figsize=(10, 6))
plt.plot(z, sigmoid_values, 'b-', linewidth=2, label='Sigmoid Function')
plt.grid(True, alpha=0.3)
plt.xlabel('Input (z)', fontsize=12)
plt.ylabel('Output (sigmoid(z))', fontsize=12)
plt.title('The Sigmoid Function: Converting Any Number to [0,1]', fontsize=14)
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.7, label='Decision Boundary (0.5)')
plt.axvline(x=0, color='g', linestyle='--', alpha=0.7, label='z = 0')
plt.legend()
plt.ylim(-0.1, 1.1)
plt.show()

print("Key properties of the sigmoid function:")
print(f"sigmoid(-10) = {1/(1+np.exp(10)):.6f} ≈ 0")
print(f"sigmoid(0) = {1/(1+np.exp(0)):.1f}")
print(f"sigmoid(10) = {1/(1+np.exp(-10)):.6f} ≈ 1")

### 8.3. Implementation

Let's look at the Logistic Linear Learner implementation:

In [None]:
psource(LogisticLinearLearner)
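
`psource` prints the module's own code. As a rough, independent sketch of the idea from the overview, the snippet below trains a logistic model with batch gradient descent on the log loss. The function names, hyperparameters, fixed seed and one-feature toy data are assumptions for illustration and may not match the module's implementation.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(examples, learning_rate=0.1, epochs=5000, seed=0):
    """examples: list of (features, label) with label 0 or 1.
    Returns weights [bias, w1, ..., wn]."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(epochs):
        gradient = [0.0] * (n + 1)
        for features, label in examples:
            x = [1.0] + list(features)  # the leading 1 multiplies the bias
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xj in enumerate(x):
                gradient[j] += (label - p) * xj
        weights = [w + learning_rate * g / len(examples)
                   for w, g in zip(weights, gradient)]
    return weights


def predict_probability(weights, features):
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], features)))


# Hypothetical one-feature data: small values are class 0, large values class 1.
data = [((0.5,), 0), ((1.0,), 0), ((1.5,), 0), ((3.5,), 1), ((4.0,), 1), ((4.5,), 1)]
weights = train_logistic(data)
print(round(predict_probability(weights, (1.0,)), 3))
print(round(predict_probability(weights, (4.0,)), 3))
```

The first probability should fall below 0.5 and the second above it, which corresponds to predicting class 0 and class 1 respectively.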

### 8.4. Example: Binary Classification

For this example, we'll create a binary classification problem from the iris dataset by removing one of the classes. This will help us see how logistic regression works with a simpler two-class problem.

In [None]:
# Create a binary classification dataset
iris_binary = DataSet(name="iris")
iris_binary.remove_examples("virginica")  # Remove virginica class
iris_binary.classes_to_numbers()  # Convert classes to numbers (setosa=0, versicolor=1)

print("Binary iris dataset:")
print(f"Classes: {iris_binary.values[iris_binary.target]}")
print(f"Number of examples: {len(iris_binary.examples)}")
print(f"First few examples:")
for i in range(3):
    print(f"  {iris_binary.examples[i]}")

# Train logistic regression
logistic_learner = LogisticLinearLearner(iris_binary, learning_rate=0.1, epochs=1000)

# Test with the same flower as before
test_flower = [5.1, 3.0, 1.1, 0.1]
probability = logistic_learner(test_flower)

print(f"\nLogistic Regression Prediction:")
print(f"Flower features: {test_flower}")
print(f"Probability of being 'versicolor': {probability:.4f}")
print(f"Probability of being 'setosa': {1-probability:.4f}")
print(f"Predicted class: {'versicolor' if probability > 0.5 else 'setosa'}")

## 9. MODEL EVALUATION & COMPARISON

### 9.1. Why Do We Need Model Evaluation?

So far we've looked at individual algorithms, but how do we know which one is best for our problem? We need systematic ways to:

1. Measure performance: How accurate are our predictions?
2. Compare algorithms: Which algorithm works better for this specific dataset?
3. Avoid overfitting: Make sure our model works on new, unseen data

### 9.2. Train-Test Split

The most fundamental rule in machine learning evaluation is: never test on the same data you trained on. Think of it like this:

- Training data: Like study materials for an exam
- Test data: Like the actual exam questions (should be unseen!)

If you memorize the answers to practice questions, that doesn't mean you understand the subject. Similarly, if a model memorizes the training data, it might not work on new data.

In [None]:
# Let's implement a simple train-test split function
def train_test_split(dataset, test_ratio=0.3):
    """
    Split dataset into training and testing sets.
    
    Args:
        dataset: DataSet object
        test_ratio: Fraction of data to use for testing (default 0.3 = 30%)
    
    Returns:
        train_data, test_data: Two DataSet objects
    """
    import random
    import copy
    
    # Make a copy of the dataset to avoid modifying the original
    all_examples = copy.deepcopy(dataset.examples)
    random.shuffle(all_examples)  # Randomize the order
    
    # Calculate split point
    total_examples = len(all_examples)
    test_size = int(total_examples * test_ratio)
    
    # Split the data
    test_examples = all_examples[:test_size]
    train_examples = all_examples[test_size:]
    
    # Create new DataSet objects
    train_data = copy.deepcopy(dataset)
    train_data.examples = train_examples
    
    test_data = copy.deepcopy(dataset)
    test_data.examples = test_examples
    
    return train_data, test_data

# Example: Split the iris dataset
iris_full = DataSet(name="iris")
train_data, test_data = train_test_split(iris_full, test_ratio=0.3)

print(f"Original dataset: {len(iris_full.examples)} examples")
print(f"Training set: {len(train_data.examples)} examples (70%)")
print(f"Test set: {len(test_data.examples)} examples (30%)")
print(f"First training example: {train_data.examples[0]}")
print(f"First test example: {test_data.examples[0]}")

### 9.3. Measuring Accuracy

**Accuracy** is the simplest performance metric: what percentage of predictions were correct?

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$

In [None]:
def calculate_accuracy(learner, test_data):
    """
    Calculate the accuracy of a learner on test data.
    
    Args:
        learner: Trained learning function
        test_data: DataSet object with test examples
        
    Returns:
        accuracy: Float between 0 and 1
    """
    correct = 0
    total = len(test_data.examples)
    
    for example in test_data.examples:
        # Get input features (exclude target)
        features = [example[i] for i in test_data.inputs]
        # Get true label
        true_label = example[test_data.target]
        # Get prediction
        prediction = learner(features)
        
        # Check if correct
        if prediction == true_label:
            correct += 1
    
    accuracy = correct / total
    return accuracy

# Example usage: Let's test this function
# First train a simple k-NN on the training data
knn_learner = NearestNeighborLearner(train_data, k=3)

# Calculate accuracy
accuracy = calculate_accuracy(knn_learner, test_data)
print(f"k-NN (k=3) Accuracy on test set: {accuracy:.2%}")
print(f"This means the model got {accuracy*100:.1f}% of predictions correct!")

### 9.4. Algorithm Comparison

Now let's compare all our algorithms on the same dataset to see which performs best:

In [None]:
# Compare different algorithms on the iris dataset
print("ALGORITHM COMPARISON ON IRIS DATASET")

# Prepare data
iris_for_comparison = DataSet(name="iris")
train_set, test_set = train_test_split(iris_for_comparison, test_ratio=0.3)

algorithms = {}
accuracies = {}

# 1. k-Nearest Neighbors (different k values)
print("\nk-Nearest Neighbors:")
for k in [1, 3, 5, 7]:
    learner = NearestNeighborLearner(train_set, k=k)
    accuracy = calculate_accuracy(learner, test_set)
    algorithms[f"k-NN (k={k})"] = learner
    accuracies[f"k-NN (k={k})"] = accuracy
    print(f"   k={k}: {accuracy:.2%}")

# 2. Decision Tree
print("\nDecision Tree:")
dt_learner = DecisionTreeLearner(train_set)
dt_accuracy = calculate_accuracy(dt_learner, test_set)
algorithms["Decision Tree"] = dt_learner
accuracies["Decision Tree"] = dt_accuracy
print(f"   Accuracy: {dt_accuracy:.2%}")

# 3. Linear Learner (for numerical output)
print("\nLinear Learner:")
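# Note: Linear learner works better with numerical classes.
# Copy every example so that classes_to_numbers() does not also
# rewrite the labels inside train_set and test_set.
train_set_numeric = DataSet(name="iris")
train_set_numeric.examples = [example[:] for example in train_set.examples]
train_set_numeric.classes_to_numbers()

test_set_numeric = DataSet(name="iris")
test_set_numeric.examples = [example[:] for example in test_set.examples]
test_set_numeric.classes_to_numbers()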

linear_learner = LinearLearner(train_set_numeric)

# For classification, we need to round the output to nearest integer
def linear_classifier(features):
    raw_output = linear_learner(features)
    # Round to nearest integer and clip to valid range
    return max(0, min(2, round(raw_output)))

# Calculate accuracy manually since we need the special classifier function
correct = 0
for example in test_set_numeric.examples:
    features = [example[i] for i in test_set_numeric.inputs]
    true_label = example[test_set_numeric.target]
    prediction = linear_classifier(features)
    if prediction == true_label:
        correct += 1

linear_accuracy = correct / len(test_set_numeric.examples)
algorithms["Linear Learner"] = linear_classifier
accuracies["Linear Learner"] = linear_accuracy
print(f"   Accuracy: {linear_accuracy:.2%}")

# Summary
print("\nRESULTS SUMMARY:")
print("-" * 30)
sorted_results = sorted(accuracies.items(), key=lambda x: x[1], reverse=True)
for i, (algorithm, accuracy) in enumerate(sorted_results, 1):
    medal = "First!" if i == 1 else "Second!" if i == 2 else "Third!" if i == 3 else "  "
    print(f"{medal} {algorithm}: {accuracy:.2%}")
    
best_algorithm = sorted_results[0][0]
print(f"\nBest performing algorithm: {best_algorithm}")
print(f"Remember: Results may vary with different random splits!")

### 9.5. Cross-Validation: A More Robust Evaluation

The train-test split we used above has a problem: the results depend on which examples randomly end up in the test set. Cross-validation solves this by testing multiple times with different splits.

How k-fold cross-validation works:

1. Split the data into k equal parts (folds)
2. Train on k-1 folds, test on the remaining fold
3. Repeat k times, each time using a different fold for testing
4. Average the results
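
The steps above can be sketched in plain Python. This is only an illustration, not the implementation in `learning.py`: the helper names `k_fold_indices`, `k_fold_accuracy`, and `majority_class_learner` are hypothetical, and the learner interface (a function that takes `(features, label)` pairs and returns a predictor) is an assumption made for this example.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def k_fold_accuracy(train_fn, examples, k=5, seed=0):
    """Average accuracy of train_fn over k folds.

    train_fn (hypothetical interface) takes a list of (features, label)
    pairs and returns a function mapping features to a predicted label.
    """
    scores = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held = set(held_out)
        # Train on every example outside the current fold, test on the fold itself
        train = [ex for i, ex in enumerate(examples) if i not in held]
        test = [examples[i] for i in held_out]
        predict = train_fn(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def majority_class_learner(train):
    """Baseline learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda features: most_common


# Toy data: 7 examples labelled "a" and 3 labelled "b"
toy_data = [([i], "a" if i < 7 else "b") for i in range(10)]
toy_accuracy = k_fold_accuracy(majority_class_learner, toy_data, k=5)
print(f"Majority-class baseline, 5-fold accuracy: {toy_accuracy:.2f}")
```

Because every example lands in exactly one test fold, each one contributes to the final average exactly once, which is what makes the estimate less dependent on a single lucky or unlucky split.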

Let's use the built-in cross-validation function from our learning module:

In [None]:
# Cross-validation comparison
print("CROSS-VALIDATION COMPARISON (5-fold)")
print("=" * 45)

iris_cv = DataSet(name="iris")

# Test different algorithms with cross-validation
cv_results = {}

# k-NN with different k values
print("\nk-Nearest Neighbors:")
for k in [1, 3, 5]:
    def make_knn_learner(dataset):
        return NearestNeighborLearner(dataset, k=k)
    
    error_rate, std_dev = cross_validation(make_knn_learner, iris_cv, k=5)
    accuracy = 1 - error_rate  # Convert error rate to accuracy
    cv_results[f"k-NN (k={k})"] = accuracy
    print(f"   k={k}: {accuracy:.3f} ± {std_dev:.3f}")

# Decision Tree
print("\nDecision Tree:")
dt_error, dt_std = cross_validation(DecisionTreeLearner, iris_cv, k=5)
dt_accuracy = 1 - dt_error  # Convert error rate to accuracy
cv_results["Decision Tree"] = dt_accuracy
print(f"   Accuracy: {dt_accuracy:.3f} ± {dt_std:.3f}")

print("\nCross-Validation Results Summary:")
print("-" * 35)
for algorithm, accuracy in sorted(cv_results.items(), key=lambda x: x[1], reverse=True):
    print(f"{algorithm}: {accuracy:.1%}")

print(f"\nCross-validation gives us more reliable estimates!")
print(f"These results are averaged over 5 different train/test splits.")

## 10. HANDS-ON EXERCISES

Now it's time to apply what you've learned! These exercises will help you understand the concepts better through practice.

### Exercise 1: Exploring the Zoo Dataset

The zoo dataset contains information about different animals and their characteristics. Let's explore it and build a classifier!

In [None]:
# Exercise 1: Load and explore the zoo dataset
zoo = DataSet(name="zoo")

print("ZOO DATASET EXPLORATION")
print("=" * 30)
print(f"Number of animals: {len(zoo.examples)}")
print(f"Number of features: {len(zoo.inputs)}")
print(f"Animal types: {zoo.values[zoo.target]}")

print(f"\nFirst few animals:")
for i in range(3):
    animal = zoo.examples[i]
    features = [animal[j] for j in zoo.inputs]
    animal_type = animal[zoo.target]
    print(f"  Animal {i+1}: features={features}, type='{animal_type}'")

print(f"\nFeature information:")
print(f"Each animal has {len(zoo.inputs)} input features, most of them binary (0 or 1)")
print(f"These represent characteristics like 'has hair', 'has feathers', etc.")

# YOUR TASK:
print(f"\nYOUR TASK:")
print(f"1. Try different k values for k-NN on the zoo dataset")
print(f"2. Which k value works best?") 
print(f"3. How does Decision Tree perform compared to k-NN?")

# Starter code for your experiments:
print(f"\nSTARTER CODE:")
print(f"# zoo_train, zoo_test = train_test_split(zoo, test_ratio=0.3)")
print(f"# knn_zoo = NearestNeighborLearner(zoo_train, k=?)")
print(f"# accuracy = calculate_accuracy(knn_zoo, zoo_test)")
print(f"# print(f'Accuracy: {{accuracy:.2%}}')")

# Uncomment and modify the lines below to start your experiments:
# zoo_train, zoo_test = train_test_split(zoo, test_ratio=0.3)
# knn_zoo = NearestNeighborLearner(zoo_train, k=1)
# accuracy = calculate_accuracy(knn_zoo, zoo_test)
# print(f"k-NN (k=1) accuracy on zoo: {accuracy:.2%}")

### Exercise 2: Distance Function Impact

Different distance functions can significantly affect k-NN performance. Let's explore this!
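
As a small illustration of why the metric matters, the sketch below shows a query point whose nearest neighbour changes depending on the distance function. The functions `l1_distance` and `l2_distance` are defined here purely for this example (they are hypothetical stand-ins, not the module's own functions), and the points are made up.

```python
import math


def l1_distance(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance: straight-line distance between the points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


query = (0, 0)
candidates = {"A": (3, 3), "B": (0, 5)}


def nearest(distance):
    """Return the name of the candidate closest to the query under `distance`."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))


nearest_l1 = nearest(l1_distance)
nearest_l2 = nearest(l2_distance)
print(f"Nearest under Manhattan distance: {nearest_l1}")
print(f"Nearest under Euclidean distance: {nearest_l2}")
```

Under Manhattan distance, B (distance 5) is closer than A (distance 6); under Euclidean distance, A (about 4.24) is closer than B (5). With k=1, the two metrics would therefore pick different neighbours, and possibly predict different classes.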

In [None]:
# Exercise 2: Test different distance functions
print("DISTANCE FUNCTION COMPARISON")

# Create some test points
point1 = [1, 2, 3]
point2 = [4, 5, 6]

print(f"Comparing distances between {point1} and {point2}:")
print(f"Manhattan Distance: {manhattan_distance(point1, point2)}")
print(f"Euclidean Distance: {euclidean_distance(point1, point2):.3f}")
print(f"Hamming Distance: {hamming_distance(point1, point2)}")

# Test with iris dataset using different distance functions
iris_test = DataSet(name="iris")

# Create a k-NN learner with hamming distance
iris_test.distance = hamming_distance
knn_hamming = NearestNeighborLearner(iris_test, k=3)

# Test the same flower as before
test_prediction = knn_hamming([5.1, 3.0, 1.1, 0.1])

print(f"\nPrediction with Hamming distance: {test_prediction}")

print(f"\nYOUR TASK:")
print(f"1. Create different datasets with different distance functions")
print(f"2. Compare their performance using cross-validation")
print(f"3. Which distance function works best for the iris dataset?")

# Try this (cross_validation returns an error rate and a standard deviation):
# iris_euclidean = DataSet(name="iris")
# iris_euclidean.distance = euclidean_distance
# knn_euclidean = NearestNeighborLearner(iris_euclidean, k=3)
# error_euclidean, std_euclidean = cross_validation(lambda d: NearestNeighborLearner(d, k=3), iris_euclidean, k=5)
# accuracy_euclidean = 1 - error_euclidean

### Exercise 3: Create Your Own Classifier

Now it's time to put everything together! Your challenge is to build the best possible classifier for a dataset of your choice.

In [None]:
# Exercise 3: Build your best classifier
print("CLASSIFIER CHALLENGE")

print("YOUR MISSION:")
print("Build the best possible classifier for the dataset of your choice!")
print()
print("REQUIREMENTS:")
print("1. Choose a dataset (iris, zoo, or restaurant)")
print("2. Try at least 3 different algorithms")
print("3. Experiment with different parameters (k values, distance functions)")
print("4. Use cross-validation to evaluate performance")
print("5. Report your best result!")

print("\nMETHODOLOGY:")
print("1. Split your data or use cross-validation")
print("2. Try multiple algorithms:")
print("   - k-NN with different k values")
print("   - Decision Tree")
print("   - Linear/Logistic Learner (if appropriate)")
print("3. Compare results and pick the best")

print("\nTEMPLATE CODE:")
print("""
# Step 1: Choose your dataset
my_dataset = DataSet(name="???")  # Fill in: iris, zoo, or restaurant

# Step 2: Create comparison function
def compare_algorithms(dataset):
    results = {}
    
    # Try k-NN with different k values
    for k in [1, 3, 5, 7]:
        # cross_validation returns (error rate, standard deviation)
        error, _ = cross_validation(lambda d: NearestNeighborLearner(d, k=k), dataset, k=5)
        results[f'k-NN (k={k})'] = 1 - error  # Convert error rate to accuracy
    
    # Try Decision Tree
    dt_error, _ = cross_validation(DecisionTreeLearner, dataset, k=5)
    results['Decision Tree'] = 1 - dt_error
    
    # Add more algorithms here!
    
    return results

# Step 3: Run comparison
# results = compare_algorithms(my_dataset)

# Step 4: Find the best
# best_algorithm = max(results.items(), key=lambda x: x[1])
# print(f"Best algorithm: {best_algorithm[0]} with {best_algorithm[1]:.2%} accuracy")
""")

print("\nBONUS CHALLENGES:")
print("- Try different distance functions for k-NN")
print("- Create an ensemble (combine multiple algorithms)")
print("- Analyze which features are most important")
print("- Visualize your results")

print("\nStart coding below this cell!")

## 11. SUMMARY & KEY TAKEAWAYS

You've explored the fundamental concepts of machine learning. 

### Algorithms Covered
1. k-Nearest Neighbors (k-NN): Simple, intuitive, works well with small datasets
2. Decision Trees: Easy to interpret, automatically finds important features
3. Linear Learner: Fast, works well for linearly separable data
4. Logistic Linear Learner: Like linear learner but outputs probabilities

### Key Insights
- No single algorithm is always best - performance depends on the dataset
- Evaluation is crucial - always test on unseen data
- Parameters matter - choosing the right k, distance function, etc. affects performance
- Cross-validation gives more reliable estimates than a single train-test split

### When to Use Which Algorithm

| Algorithm | Best For | Pros | Cons |
|-----------|----------|------|------|
| k-NN | Small datasets, complex boundaries | Simple, no training needed | Slow for large data, sensitive to irrelevant features |
| Decision Tree | Interpretable models, mixed data types | Easy to understand, handles categorical data | Can overfit, unstable |
| Linear Learner | Large datasets, simple relationships | Fast, stable | Only works for linear relationships |
| Logistic Learner | Binary classification, probability outputs | Outputs probabilities, regularizable | Limited to linear boundaries |

### Next Steps
- Try these algorithms on your own datasets
- Learn about more advanced techniques (neural networks, ensemble methods)
- Explore feature engineering and data preprocessing
- Study more evaluation metrics (precision, recall, F1-score)

### Remember
> "The goal is not to find the perfect algorithm, but to find the algorithm that works best for your specific problem and data."


In [None]:
# EXPERIMENT CELL - Try your own ideas here!
print("EXPERIMENT ZONE")
print("Use this cell to try your own experiments!")
print()
print("Some ideas to get you started:")
print("- Test different datasets")
print("- Try unusual k values for k-NN")  
print("- Compare training vs test accuracy")
print("- Create visualizations of your results")
print()
print("Remember: The best way to learn is by doing!")

# Your experimentation code goes here:
# Example: Quick comparison of iris vs zoo performance
#
# print("Quick comparison:")
# iris_err, _ = cross_validation(lambda d: NearestNeighborLearner(d, k=3), DataSet(name="iris"), k=5)
# zoo_err, _ = cross_validation(lambda d: NearestNeighborLearner(d, k=3), DataSet(name="zoo"), k=5)
# print(f"k-NN on Iris: {1 - iris_err:.2%}")
# print(f"k-NN on Zoo: {1 - zoo_err:.2%}")