# Regular Programming vs Machine Learning

In regular programming, we write the rules that transform inputs into outputs. For example, to convert degrees Celsius to degrees Fahrenheit, I can implement the formula:

Fahrenheit = Celsius * 1.8 + 32

Then, for any temperature in degrees Celsius, I can obtain the temperature in degrees Fahrenheit.

In [None]:
def celsius_to_fahrenheit(v):
    return v * 1.8 + 32

celsius_to_fahrenheit(42)

In machine learning, by contrast, we do not know the formula, algorithm, or transformation that maps one value to the other. All we have are examples of inputs and their corresponding outputs.

In [None]:
data = [
    (-40, -40.0),
    (-10, 14.0),
    (0, 32.0),
    (8, 46.4),
    (15, 59.0),
    (22, 71.6),
    (38, 100.4)
]

Then, we select a model (in this example, a linear one) and fit it to the data.

Note: For now, ignore all the implementation details.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Data
data = [
    (-40, -40.0),
    (-10, 14.0),
    (0, 32.0),
    (8, 46.4),
    (15, 59.0),
    (22, 71.6),
    (38, 100.4)
]

# Convert data to tensors
inputs = torch.tensor([data_point[0] for data_point in data], dtype=torch.float32).view(-1, 1)  # Celsius
targets = torch.tensor([data_point[1] for data_point in data], dtype=torch.float32).view(-1, 1)  # Fahrenheit

# Define the model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)  # 1 input feature, 1 output feature (Fahrenheit)

    def forward(self, x):
        return self.linear(x)

model = LinearRegression()

# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Training loop
num_epochs = 1000
lossi = []
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    lossi.append(loss.item())

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print progress
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Test the model
test_input = torch.tensor([[30.0]])  # Celsius value to convert
predicted_fahrenheit = model(test_input).item()
print(f'Predicted Fahrenheit for 30°C: {predicted_fahrenheit:.1f}')


Let's take a look at how well the network is doing at each epoch.

In [None]:
import matplotlib.pyplot as plt
plt.xlabel('# Epoch')
plt.ylabel('loss')
plt.plot(lossi)
plt.show()

Now, let's make some predictions.

In [None]:
with torch.no_grad():
    result = model(torch.tensor([100.0]).unsqueeze(1))
print(result)

In [None]:
celsius_to_fahrenheit(100)

Let's measure the errors on some random Celsius values.

In [None]:
celsius_test = torch.randint(-30, 150, (12,))
celsius_test

In [None]:
fahrenheit_test = [celsius_to_fahrenheit(c) for c in celsius_test]
print(fahrenheit_test)

In [None]:
with torch.no_grad():
    inferred_fahrenheit = model(celsius_test.to(torch.float).unsqueeze(1))
print(inferred_fahrenheit)

In [None]:
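# Mean squared error: average squared difference between true and predicted values
mean_square_error = sum((real - inferr.item())**2 for real, inferr in zip(fahrenheit_test, inferred_fahrenheit)) / len(fahrenheit_test)
mean_square_error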

Let's see what the network has learned:

In [None]:
model.linear.weight, model.linear.bias

Compare it with the actual equation:

Fahrenheit = Celsius * 1.8 + 32
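
For a linear model with a single input, the best-fitting slope and intercept can also be computed in closed form by ordinary least squares. The following is a minimal sketch in plain Python, not the PyTorch approach used in this notebook; the names `fit_line`, `slope`, and `intercept` are illustrative. Because the seven examples lie exactly on the line, the fit should recover values very close to 1.8 and 32.

```python
# Closed-form least squares for y = slope * x + intercept (illustrative sketch).
data = [(-40, -40.0), (-10, 14.0), (0, 32.0), (8, 46.4),
        (15, 59.0), (22, 71.6), (38, 100.4)]

def fit_line(points):
    """Return (slope, intercept) minimizing the squared error over points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(data)
print(f"Fahrenheit = Celsius * {slope:.3f} + {intercept:.3f}")
```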

In conclusion, we managed to build a **model** that closely approximates the mapping from degrees Celsius to degrees Fahrenheit, based only on examples.

**This machine learning task is known as regression, and here we used a linear model**
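
What "fitting" a linear model means can be sketched without any library. The example below uses plain gradient descent on the mean squared error, which is a simpler optimizer than the Adam optimizer used in the PyTorch cell above. The function name `train_linear`, the learning rate, and the number of steps are assumptions chosen for this small dataset.

```python
# Plain gradient descent on mean squared error for y = w * x + b (illustrative sketch).
examples = [(-40, -40.0), (-10, 14.0), (0, 32.0), (8, 46.4),
            (15, 59.0), (22, 71.6), (38, 100.4)]

def train_linear(points, lr=0.001, steps=20000):
    """Start from w = b = 0 and repeatedly step against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of mean((w * x + b - y) ** 2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear(examples)
print(f"learned weight: {w:.3f}, learned bias: {b:.3f}")
```

The learning rate is kept small because the inputs are not normalized; a noticeably larger value makes the updates diverge on this data.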

Let's see another example with a different task. We have some rules that allow us to classify the type of an animal based on some characteristics:

- Weight > 100 kg:
    - Yes: Hair?
        - Yes: Bear
        - No: Lives in water?
            - Yes: Whale
            - No: Anaconda
    - No: Fly?
        - Yes: Eagle
        - No: Cat

These rules come from biology, so given an animal's description, I can automatically determine which animal it is.

In [None]:
def get_animal(weight, hair, water, fly):
    if weight > 100:
        if hair:
            return "Bear"
        else:
            if water:
                return "Whale"
            else:
                return "Anaconda"
    else:
        if fly:
            return "Eagle"
        else:
            return "Cat"

In [None]:
get_animal(120, True, False, False)

In [None]:
get_animal(5, True, False, False)

The machine learning problem appears when I only have some animal descriptions together with the actual type of each animal, and I want to create a model that can infer the type of animal from its description.

First, generate the data.

In [None]:
import random

def generate_random_animals(n):    
    result = []
    for _ in range(n):
        weight = random.randint(1, 200)
        hair = random.random() > 0.5
        water = random.random() > 0.5
        fly = random.random() > 0.5
        animal = get_animal(weight, hair, water, fly)
        result.append((weight, hair, water, fly, animal))
    return result    
        
for weight, hair, water, fly, animal in generate_random_animals(5):
    print(weight, hair, water, fly, "** Animal:",  animal)

In [None]:
data = generate_random_animals(1000)

In [None]:
# Load libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation

In [None]:
df = pd.DataFrame(data, columns=['weight', 'hair', 'water', 'fly', 'animal'])
df.head(3)

In [None]:
#split dataset in features and target variable
feature_cols = ['weight', 'hair', 'water', 'fly']
X = df[feature_cols] # Features
y = df.animal # Target variable

In [None]:
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test

In [None]:
# Create Decision Tree classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)

In [None]:
y_pred

In [None]:
[(r, p) for r, p in zip(y_test, y_pred)]

In [None]:
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Let's take a look at the model that was built.

In [None]:
from sklearn import tree
text_representation = tree.export_text(clf, feature_names=['weight', 'hair', 'water', 'fly'])
print(text_representation)

**This machine learning task is known as supervised classification**
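
Accuracy is a single number and does not show which classes get confused with each other. A minimal sketch of counting (true, predicted) pairs is shown below, in plain Python. The two label lists are hypothetical stand-ins for `y_test` and `y_pred`, and `confusion_counts` is an illustrative helper, not part of scikit-learn.

```python
from collections import Counter

# Hypothetical true and predicted labels, standing in for y_test and y_pred.
true_labels = ["Bear", "Cat", "Eagle", "Whale", "Cat", "Anaconda"]
predicted_labels = ["Bear", "Cat", "Eagle", "Anaconda", "Cat", "Anaconda"]

def confusion_counts(y_true, y_pred):
    """Count how often each (true, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))

counts = confusion_counts(true_labels, predicted_labels)
# Accuracy is the share of pairs where the prediction matches the true label
accuracy = sum(n for (t, p), n in counts.items() if t == p) / len(true_labels)

for (t, p), n in sorted(counts.items()):
    marker = "" if t == p else "  <- misclassified"
    print(f"true={t:<9} predicted={p:<9} count={n}{marker}")
print(f"accuracy: {accuracy:.2f}")
```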

Let's see another example. Here we try to predict the digit shown in a handwritten image.

In [None]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

# Define transformations to apply to the data
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
    transforms.Normalize((0.5,), (0.5,))  # Normalize the tensor image with mean and standard deviation
])

# Download and load the MNIST dataset
trainset = torchvision.datasets.MNIST(root='data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.MNIST(root='data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)



In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Function to show images with labels
def imshow_with_labels(img, labels):
    img = img / 2 + 0.5  # Unnormalize
    npimg = img.numpy()
    plt.figure(figsize=(10, 10))
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

    # Print labels near each image
    num_images = len(labels)
    grid_size = int(np.sqrt(num_images))
    for i in range(num_images):
        plt.text((i % grid_size) * 30, (i // grid_size) * 30, str(labels[i].item()), color='red', fontsize=12)

    plt.axis('off')
    plt.show()

# Get some random training images and their labels
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Show images with labels
imshow_with_labels(torchvision.utils.make_grid(images), labels)


In [None]:
trainset.data.shape

In [None]:
trainset.targets[0], trainset.data[0], 

To solve this classification problem in the traditional way, we need to design and then compute some "features" that differentiate one class from another. For example, one feature can be the total sum of the pixel values in each image.

In [None]:
import pandas as pd

data = pd.DataFrame({
    'feature': trainset.targets,
    'value': torch.sum(trainset.data, axis=(1, 2)).numpy()
})

plt.figure(figsize=(10, 6))
data.boxplot(column='value', by='feature')
plt.show()

We can see that this feature helps to differentiate between categories, but there is a lot of overlap. Let's manufacture more features, for example, the symmetries.

In [None]:
# Symmetry about the vertical axis: left half minus right half

data = pd.DataFrame({
    'feature': trainset.targets,
    'value': (torch.sum(trainset.data[:, :, :14], axis=(1, 2)) - 
              torch.sum(trainset.data[:, :, 14:], axis=(1, 2))
              ).numpy()
})

# Create the box plot
plt.figure(figsize=(10, 6))
data.boxplot(column='value', by='feature')
plt.show()

In [None]:
# Symmetry about the horizontal axis: top half minus bottom half
data = pd.DataFrame({
    'feature': trainset.targets,
    'value': (torch.sum(trainset.data[:, :14, :], axis=(1, 2)) - 
              torch.sum(trainset.data[:, 14:, :], axis=(1, 2))
              ).numpy()
})

plt.figure(figsize=(10, 6))
data.boxplot(column='value', by='feature')
plt.show()

We can now try to find some rules that allow us to classify each image, using the three manufactured features.
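
The three manufactured features can be gathered into one table, with one row per image, before searching for rules. The sketch below uses NumPy on a toy image so it runs without downloading MNIST; `handcrafted_features` is an illustrative name, and the toy image is made up for the example.

```python
import numpy as np

def handcrafted_features(images):
    """Return an (N, 3) array: total intensity, left minus right, top minus bottom."""
    images = np.asarray(images, dtype=np.int64)  # avoid uint8 overflow when summing
    half_h = images.shape[1] // 2
    half_w = images.shape[2] // 2
    total = images.sum(axis=(1, 2))
    left_right = (images[:, :, :half_w].sum(axis=(1, 2))
                  - images[:, :, half_w:].sum(axis=(1, 2)))
    top_bottom = (images[:, :half_h, :].sum(axis=(1, 2))
                  - images[:, half_h:, :].sum(axis=(1, 2)))
    return np.stack([total, left_right, top_bottom], axis=1)

# Toy 28x28 image in which only the left half is lit (hypothetical data)
toy = np.zeros((1, 28, 28), dtype=np.uint8)
toy[:, :, :14] = 1
toy_features = handcrafted_features(toy)
print(toy_features)  # [[392 392   0]]
```

On the real data, something like `handcrafted_features(trainset.data.numpy())` would give a feature table that a rule-based classifier, or a decision tree as in the animal example, could work with.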

The machine learning way is different. We create a model that can learn the relations, fit the model using available data, and then use the fitted model to classify digits.

In [None]:
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(x.shape[0], -1)  # Flatten the input tensor
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = SimpleNN()

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

In [None]:
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in trainloader:
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader)}")
print('Finished Training')


In [None]:
correct = 0
total = 0
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total}%')


In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

outputs = model(images)
_, predicted = torch.max(outputs, 1)

for l, p in zip(labels, predicted):
    print(l, p)


You can see that the network classifies most of the test digits correctly, with no need to engineer the features manually.
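
The network outputs ten raw scores (logits) per image, and `torch.max(outputs, 1)` selects the index of the largest one as the predicted digit. The sketch below shows the same idea in plain Python for a single image, and also how a softmax turns the scores into probabilities. The logits are hypothetical values, not outputs of the trained model.

```python
import math

# Hypothetical raw scores (logits) for one image, one value per digit 0-9.
logits = [0.1, -1.2, 4.0, 0.3, -0.5, 0.0, -2.0, 1.1, 0.2, -0.7]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax(logits)
# The predicted class is the index of the largest score
predicted_digit = max(range(len(logits)), key=lambda i: logits[i])
print(predicted_digit, round(probabilities[predicted_digit], 3))
```

Softmax preserves the ordering of the scores, so taking the largest logit and taking the largest probability give the same prediction.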

**This task is also supervised classification, but using images as inputs**

Let's see a final example. The manager of a chain of stores wants to understand the behavior of their customers in order to direct advertising campaigns to groups of similar customers.

The available data includes: gender, age, annual income, and a score between 0 and 100 that evaluates the magnitude of purchases.

In [None]:
df = pd.read_csv("data/Mall_Customers.csv", index_col='CustomerID')
df.head(5)

First, we simplify the names of some columns.

In [None]:
df.rename(index=str, columns={'Annual Income (k$)': 'Income',
                              'Spending Score (1-100)': 'Score'}, inplace=True)
df.head()

Now, we will try to understand the data using pair plots.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sn

sn.pairplot(df, hue='Gender', aspect=1.5)
plt.show()

From these plots, gender does not appear to separate the customers, so we remove it from the data.

In [None]:
X = df.drop(['Gender'], axis=1)
X.head()

We will use the KMeans clustering algorithm, which partitions the rows of the dataframe into groups whose members are similar to each other and dissimilar to the members of other groups. As a result, we also obtain a representative point for each group, its centroid, which is the mean of the group's members.
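
The idea behind KMeans can be sketched in a few lines of NumPy: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. This is a simplified illustration, not scikit-learn's implementation (which, among other things, uses a smarter initialization); `kmeans_sketch` and the toy points are hypothetical. The returned inertia is the sum of squared distances from each point to its centroid.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=50, seed=0):
    """Simplified k-means: returns (centroids, labels, inertia)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the group is empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment so that labels and centroids are consistent
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    inertia = float(((points - centroids[labels]) ** 2).sum())
    return centroids, labels, inertia

# Hypothetical (income, score) pairs forming two well-separated groups
toy_points = [[15, 80], [16, 82], [17, 78], [90, 10], [92, 12], [88, 9]]
centroids, labels, inertia = kmeans_sketch(toy_points, k=2)
print(labels, round(inertia, 2))
```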

KMeans takes the number of desired groups (k) as a parameter. Let's try k=3.

In [None]:
from sklearn.cluster import KMeans

km3 = KMeans(n_clusters=3, n_init="auto", random_state=314).fit(X)

X['Labels'] = km3.labels_
plt.figure(figsize=(12, 8))
sn.scatterplot(x=X['Income'], y=X['Score'], hue=X['Labels'], 
                palette=sn.color_palette('hls', 3))
plt.title('KMeans with 3 Clusters')
plt.show()

Let's see what happens with k values from 2 to 10.

In [None]:
clusters = []

for i in range(2, 11):
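    # Fit on the features only; X may already hold a 'Labels' column from an earlier cell
    km = KMeans(n_clusters=i, n_init="auto", random_state=314).fit(X.drop(columns='Labels', errors='ignore'))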
    clusters.append(km.inertia_)
    
fig, ax = plt.subplots(figsize=(12, 8))
sn.lineplot(x=list(range(2, 11)), y=clusters, ax=ax)
ax.set_title('Searching for Elbow')
ax.set_xlabel('Clusters')
ax.set_ylabel('Inertia')

# Annotate arrow
ax.annotate('Possible Elbow Point', xy=(3, 140000), xytext=(3, 50000), xycoords='data',          
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))

ax.annotate('Possible Elbow Point', xy=(5, 80000), xytext=(5, 150000), xycoords='data',          
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2))

plt.show()

So, let's try k = 5.

In [None]:
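# Fit on the features only; X may already hold a 'Labels' column from an earlier cell
km5 = KMeans(n_clusters=5, n_init="auto", random_state=314).fit(X.drop(columns='Labels', errors='ignore'))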

X['Labels'] = km5.labels_
plt.figure(figsize=(12, 8))
sn.scatterplot(x=X['Income'], y=X['Score'], hue=X['Labels'], 
                palette=sn.color_palette('hls', 5))
plt.title('KMeans with 5 Clusters')
plt.show()

The results with k=5 are better than with k=3.

**Note** the highly subjective nature of this evaluation: unlike the previous examples, here we have no known correct answers against which to evaluate the result objectively.

The 5 obtained clusters can be explained as follows:

    Label 0: high income and low spending
    Label 1: low income and low spending
    Label 2: high income and high spending
    Label 3: average income and average spending
    Label 4: low income and high spending

In conclusion, the client can see that there is a segment with high income and low spending, to which they could direct a more aggressive advertising strategy and potentially achieve good results.

Another observation is that there is a segment with low income but a high spending score, which is interesting to consider.

**This type of machine learning task is known as clustering**

## Conclusion
- In regular programming, we provide the algorithm or formula that transforms the inputs into outputs.
- In machine learning, we provide examples of inputs and their corresponding outputs, and let the algorithm figure out a good model for the transformation.
    - There are different models for each task.
    - Every model has parameters that affect the quality of the results.
    - Selecting the best option combines experience with trial and error.