<a href="https://colab.research.google.com/github/brendanpshea/computing_concepts_python/blob/main/IntroCS_12_NeuralNets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Voyage Through the History of Neural Networks: From Perceptrons to Transformers
### Computing Concepts With Python | Brendan Shea, PhD (Brendan.Shea@rctc.ed).



In the previous chapter, we discussed the two main approaches that have dominated the field of artificial intelligence: Good Old-Fashioned AI (GOFAI) and statistical machine learning. GOFAI focuses on symbolic reasoning, using handcrafted rules and logic to simulate intelligent behavior. Statistical machine learning, on the other hand, relies on extracting patterns from data using techniques like Bayesian inference and decision trees.

However, both these approaches have limitations. GOFAI struggles with handling the ambiguity and complexity of real-world situations, while statistical machine learning is limited in its ability to learn rich, hierarchical representations from raw data.

Over the past 60+ years, these issues (among others) have led researchers to seek inspiration from another source: the brain. The human brain, with its intricate network of billions of interconnected neurons, is capable of remarkable feats of perception, reasoning, and creativity. What if we could create artificial neural networks that mimic the brain's structure and function?

### Biological Neurons: Nature's Computing Devices

To understand the motivation behind artificial neural networks, let's first look at how biological neurons work. A **neuron** is a specialized cell that processes and transmits information through electrical and chemical signals. It consists of a cell body, dendrites that receive input from other neurons, and an axon that sends output to other neurons.

When a neuron receives sufficient input from other neurons, it fires an electrical impulse (called an action potential) down its axon. This impulse is then converted into a chemical signal at synapses, the junctions between neurons, and transmitted to the dendrites of the next neuron. Through this process, neurons form complex networks that enable the brain to process information and control behavior.

This neural computation is not unique to humans. Whales, with their large and complex brains, rely on neural networks for navigation, communication, and social behavior. Even simpler creatures like seagulls use neural computation for tasks like visual recognition and motor control.

The key insight behind artificial neural networks is that we can simulate this process in computers. By creating networks of artificial neurons that take input, perform computations, and produce output, we can enable machines to learn and perform intelligent tasks.
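
As a rough illustration of this idea, the sketch below models a single artificial neuron that "fires" only when its combined input crosses a threshold, loosely mirroring the action potential described above. The function name `neuron_fires`, the signal strengths, and the threshold value are assumptions chosen for illustration; they are not biological measurements.

```python
def neuron_fires(incoming_signals, threshold=1.0):
    """Return True if the combined input is strong enough to trigger firing."""
    total_input = sum(incoming_signals)
    return total_input >= threshold

weak_input = [0.2, 0.3]         # combined input 0.5, below the threshold
strong_input = [0.6, 0.5, 0.4]  # combined input 1.5, above the threshold

print(neuron_fires(weak_input))    # False
print(neuron_fires(strong_input))  # True
```

Real neurons are far more intricate than this, but the same "add up the inputs, then decide whether to fire" pattern is the starting point for the artificial neurons discussed in the rest of the chapter.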

### The Quest for the White Whale

Imagine embarking on a quest, like Captain Ahab in Herman Melville's epic novel "Moby-Dick," seeking the elusive white whale of human-level artificial intelligence. Our journey begins with the earliest artificial neural networks and follows their evolution over decades, as researchers pursued the dream of creating machines that can think and learn like humans (and whales, and seagulls).

From simple perceptrons to complex recurrent architectures, the history of neural networks is a story of ingenuity, perseverance, and the tireless pursuit of a revolutionary idea. By drawing inspiration from the brain, researchers have created powerful tools for perception, reasoning, and creativity.

In the following sections, we'll explore the key milestones in this journey, from the birth of the perceptron to the rise of multi-layer networks and the development of recurrent architectures. We'll see how each innovation brought us closer to the white whale of human-level AI, and consider the challenges and opportunities that lie ahead.


## The Dawn of Perceptrons

Our story starts in the 1950s with the birth of the perceptron, a simple artificial neuron devised by Frank Rosenblatt. Inspired by the biological neurons in the brain, Rosenblatt sought to create a mathematical model that could learn and make decisions based on input data.

A **perceptron** consists of three main components: input weights, a bias, and an activation function. The **input weights** determine the importance of each input feature, while the **bias** is an additional parameter that helps shift the decision boundary. The **activation function** determines the output of the perceptron based on the weighted sum of the inputs and the bias.

Imagine a perceptron tasked with determining whether a sailor should go fishing based on two inputs: the weather condition (sunny or rainy) and the wind speed (high or low). The perceptron assigns weights to each input, representing their importance in the decision. For example, the weather condition might have a higher weight than the wind speed, as it is more critical in determining the success of the fishing trip.

To train the perceptron, we feed it examples of situations where a sailor decided to go fishing or not, along with the corresponding input values. Let's consider a simplified training process:

1.  Initialize the perceptron with random weights (e.g., 0.5 for weather and 0.2 for wind speed) and a random bias (e.g., 0.1).
2.  Feed the first training example: a sailor went fishing on a sunny day with low wind speed. The input values are represented as (1, 0), where 1 indicates sunny weather and 0 indicates low wind speed. The target output is 1, indicating the sailor went fishing.
3.  The perceptron computes the weighted sum of the inputs and the bias (0.5 × 1 + 0.2 × 0 + 0.1 = 0.6) and applies the activation function (e.g., a step function that outputs 1 if the sum is greater than or equal to 0, and 0 otherwise). In this case, the output is 1, indicating the perceptron predicts the sailor should go fishing.
4.  Compare the perceptron's output to the actual outcome (1 for going fishing). In this example, the perceptron's prediction matches the actual outcome, so no weight adjustment is needed.
5.  Repeat steps 2-4 for more training examples, adjusting the weights and bias whenever the perceptron's output differs from the actual outcome. The size of each adjustment is controlled by the **learning rate**. For instance, if the perceptron predicts the sailor should go fishing (output 1) but the sailor actually stayed home (target 0), the bias and the weights of the inputs that were active (equal to 1) are decreased by an amount proportional to the learning rate, making the perceptron less likely to predict going fishing in similar situations.
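
The steps above can be traced in a few lines of Python. This is a minimal sketch rather than a full implementation: it reuses the example weights and bias from step 1, while the learning rate of 0.1, the helper names (`step`, `predict`, `update`), and the second training example (sunny with high wind, no fishing) are assumptions added for illustration.

```python
def step(weighted_sum):
    """Step activation: 1 if the sum is at least 0, otherwise 0."""
    return 1 if weighted_sum >= 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus the bias, passed through the step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)

def update(weights, bias, inputs, target, learning_rate):
    """Apply one perceptron update and return the new weights and bias."""
    error = target - predict(weights, bias, inputs)
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

weights, bias = [0.5, 0.2], 0.1  # starting values from step 1
learning_rate = 0.1              # assumed value for this walkthrough

# Steps 2-4: sunny, low wind -> went fishing. The prediction is already correct.
weights, bias = update(weights, bias, (1, 0), 1, learning_rate)
print(weights, bias)  # unchanged, because the error is 0

# Step 5 (hypothetical example): sunny, high wind -> did not go fishing.
# The perceptron wrongly predicts 1, so both weights and the bias shrink.
weights, bias = update(weights, bias, (1, 1), 0, learning_rate)
print(weights, bias)
```

A single update is rarely enough. After the second step the perceptron still predicts 1 for the input (1, 1), which is why training loops over the examples many times.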

After training, the perceptron can be deployed to make decisions on new, unseen data. Given the weather condition and wind speed, the perceptron computes the weighted sum, adds the bias, and applies the activation function to determine whether the sailor should go fishing or not.

### Code Example: Perceptron
Let's create a simple Python class for a Perceptron that can learn to predict whether a sailor should go fishing based on two inputs: the weather condition and the wind speed. We'll train the Perceptron on a small dataset, deploy it to make predictions, and interpret the output.

```python
import random

class Perceptron:
    def __init__(self, learning_rate=0.1):
        self.weights = [random.random() for _ in range(2)]
        self.bias = random.random()
        self.learning_rate = learning_rate

    def predict(self, inputs):
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1 if weighted_sum >= 0 else 0

    def train(self, inputs, target):
        output = self.predict(inputs)
        error = target - output
        self.weights = [w + self.learning_rate * error * x for w, x in zip(self.weights, inputs)]
        self.bias += self.learning_rate * error

# Training data
training_data = [
    ((1, 0), 1),  # (sunny, low wind) -> go fishing
    ((1, 1), 0),  # (sunny, high wind) -> don't go fishing
    ((0, 0), 0),  # (rainy, low wind) -> don't go fishing
    ((0, 1), 0)   # (rainy, high wind) -> don't go fishing
]

# Create a Perceptron
perceptron = Perceptron()

# Train the Perceptron
epochs = 10
for _ in range(epochs):
    for inputs, target in training_data:
        perceptron.train(inputs, target)

# Deploy the Perceptron
while True:
    weather = int(input("Enter the weather condition (1 for sunny, 0 for rainy): "))
    wind = int(input("Enter the wind speed (1 for high, 0 for low): "))

    prediction = perceptron.predict([weather, wind])
    if prediction == 1:
        print("The Perceptron suggests: Go fishing! 🎣")
    else:
        print("The Perceptron suggests: Don't go fishing. ⚓️")

    continue_prediction = input("Do you want to make another prediction? (yes/no): ")
    if continue_prediction.lower() != 'yes':
        break
```

Sample run:

```
Enter the weather condition (1 for sunny, 0 for rainy): 1
Enter the wind speed (1 for high, 0 for low): 0
The Perceptron suggests: Go fishing! 🎣
Do you want to make another prediction? (yes/no): no
```


In this code block:

1.  We define a `Perceptron` class with an `__init__` method that initializes the weights and bias randomly and sets the learning rate.
2.  The `predict` method calculates the weighted sum of the inputs and the bias. If the sum is greater than or equal to 0, it returns 1; otherwise, it returns 0.
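3.  The `train` method computes the prediction error (`target - output`) and then adjusts each weight by `learning_rate * error * input` and the bias by `learning_rate * error`. When the prediction is correct, the error is 0 and nothing changes.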
4.  We create a small training dataset where each data point consists of two inputs (weather condition and wind speed) and a target output (1 for go fishing, 0 for don't go fishing).
5.  We instantiate a `Perceptron` object and train it on the training data for a specified number of epochs.
6.  After training, we deploy the Perceptron to make predictions. The user is prompted to enter the weather condition and wind speed, and the Perceptron predicts whether to go fishing or not.
7.  The Perceptron's output is interpreted as a suggestion to either go fishing (output 1) or not go fishing (output 0), displayed with corresponding emojis.
8.  The user can choose to make multiple predictions by entering 'yes' when prompted, or exit the program by entering any other input.

### Exercise
Make a copy of the Perceptron code from above, and see what happens when you do the following:

1.  Modify the `learning_rate` parameter in the `Perceptron` constructor to a different value, e.g., `learning_rate=0.5`.
    -   Run the code and observe how the Perceptron's learning speed and accuracy change with a higher or lower learning rate.
2.  Add or remove training examples in the `training_data` list to see how the Perceptron adapts to different scenarios.
    -   Add `((0, 1), 1)` to the `training_data` list to represent a scenario where the sailor goes fishing on a rainy day with high wind speed. Note that the list already contains `((0, 1), 0)`, so decide whether to keep or remove that conflicting example.
    -   Run the code and notice how the Perceptron's predictions change based on the updated training data.
3.  Change the value of the `epochs` variable to a higher number, e.g., `epochs = 20`.
    -   Run the code and observe how increasing the number of training epochs affects the Perceptron's performance and accuracy.
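
Because the original program asks for keyboard input, comparing settings by hand can be slow. One possible approach, sketched below, is a small non-interactive harness that trains a fresh perceptron for each combination of learning rate and epoch count and reports its accuracy on the training data. The function names `train_perceptron` and `accuracy`, and the use of a fixed `seed` to make runs repeatable, are assumptions for this sketch and are not part of the code above.

```python
import random

def train_perceptron(training_data, learning_rate, epochs, seed=0):
    """Train a two-input perceptron and return its weights and bias."""
    rng = random.Random(seed)  # fixed seed so every run starts from the same weights
    weights = [rng.random() for _ in range(2)]
    bias = rng.random()
    for _ in range(epochs):
        for inputs, target in training_data:
            weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if weighted_sum >= 0 else 0
            error = target - output
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def accuracy(weights, bias, data):
    """Fraction of examples in data that the perceptron classifies correctly."""
    correct = 0
    for inputs, target in data:
        weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
        output = 1 if weighted_sum >= 0 else 0
        correct += (output == target)
    return correct / len(data)

training_data = [((1, 0), 1), ((1, 1), 0), ((0, 0), 0), ((0, 1), 0)]

# Print learning rate, epoch count, and training accuracy for each setting.
for learning_rate in (0.1, 0.5):
    for epochs in (1, 10, 20, 100):
        weights, bias = train_perceptron(training_data, learning_rate, epochs)
        print(learning_rate, epochs, accuracy(weights, bias, training_data))
```

Changing `seed` shows how much the result depends on the random starting weights, which is easy to miss when running the interactive version only once.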

## The Rise of Multi-Layer Perceptrons and Backpropagation

As Ishmael's sister, Isabel, eagerly awaits his letters from aboard the Pequod, she finds herself struggling to decipher his sloppy handwriting. Determined to unravel the secrets hidden within the smudged ink, Isabel turns to the power of **Multi-Layer Perceptrons (MLPs)** to create a program that can learn to read Ishmael's script.

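MLPs are the next step in the evolution of artificial neural networks, building upon the foundations laid by perceptrons. A single perceptron can only separate its inputs with a straight line (or a flat plane in higher dimensions), so it cannot learn patterns that are not linearly separable, the classic example being XOR. The key to overcoming this limitation is to stack perceptrons in multiple layers, creating a more sophisticated network capable of learning complex patterns and relationships.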

Imagine Isabel's MLP as a crew of diligent decipherers, each member responsible for a specific task. The first layer, the **input layer**, receives the raw data: the curves, loops, and lines of Ishmael's handwriting. This information is then passed on to the **hidden layers**, where teams of perceptrons work together to identify patterns and features, like the shape of a letter 'a' or the slant of an 'e'. Finally, the **output layer** combines these findings to make a final decision, decoding the handwritten word.
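
To make the layered structure concrete, here is a minimal sketch of a forward pass through a tiny network with one hidden layer. Everything specific in it is an assumption for illustration: the 3x3 "image", the layer sizes, the random (untrained) weights, and the use of a smooth sigmoid activation in place of the step function used earlier.

```python
import math
import random

def sigmoid(z):
    """Smooth activation that squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each row of weights holds the input weights of one neuron in the layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def random_layer(n_inputs, n_neurons, rng):
    """Create untrained weights and biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [rng.uniform(-1, 1) for _ in range(n_neurons)]
    return weights, biases

rng = random.Random(42)

# Input layer: a hypothetical 3x3 pixel image of a pen stroke, flattened to 9 values.
pixels = [0, 1, 0,
          0, 1, 0,
          0, 1, 0]

hidden_weights, hidden_biases = random_layer(n_inputs=9, n_neurons=4, rng=rng)
output_weights, output_biases = random_layer(n_inputs=4, n_neurons=2, rng=rng)

# Hidden layer: four neurons, each looking at all nine pixels.
hidden_activations = layer_forward(pixels, hidden_weights, hidden_biases)
# Output layer: two scores, e.g. one per candidate letter.
output_scores = layer_forward(hidden_activations, output_weights, output_biases)

print(hidden_activations)
print(output_scores)
```

With random weights the scores are meaningless. Training is what turns this pipeline into something that can actually read handwriting.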

To train her MLP, Isabel needs a dataset of Ishmael's handwriting samples, along with their correct interpretations. She carefully compiles a collection of his letters, meticulously labeling each word to create a comprehensive training set.

The training process itself is a journey of trial and error, much like the Pequod's hunt for the elusive white whale. Isabel's MLP learns through a process called **backpropagation**, which adjusts the weights of the connections between perceptrons based on the errors made during prediction.

Here's a simplified overview of how Isabel's MLP learns to read Ishmael's handwriting:

1.  Initialize the MLP with random weights.
2.  Feed a handwriting sample (e.g., the word "whale") through the network, computing the outputs of each layer.
3.  Compare the final output (e.g., the predicted word) to the correct label (the actual word).
4.  Calculate the error and backpropagate it through the network, adjusting the weights to minimize the error.
5.  Repeat steps 2-4 for the entire training set, making many passes over the data until the MLP's error stops decreasing noticeably.
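
The five steps above can be sketched in plain Python for a very small network. This is an illustrative sketch under several assumptions, not Isabel's actual program: it uses the XOR pattern (which a single perceptron cannot learn) as a toy stand-in for handwriting data, a single hidden layer of four sigmoid neurons, a squared-error measure, and a learning rate of 0.5. The names `forward`, `train_step`, and `total_error` are hypothetical helpers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, net):
    """Step 2: compute hidden and output activations for one sample."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output

def train_step(x, target, net, lr):
    """Steps 3-4: measure the error and push it backwards through the network."""
    hidden, output = forward(x, net)
    # Error signal at the output (squared error passed through the sigmoid's slope).
    delta_out = (output - target) * output * (1 - output)
    # Error signal for each hidden neuron, using the output weights before updating them.
    delta_hidden = [delta_out * w * h * (1 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Gradient-descent updates: move every weight a little against its error signal.
    net["w_out"] = [w - lr * delta_out * h for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= lr * delta_out
    for j, d in enumerate(delta_hidden):
        net["w_hidden"][j] = [w - lr * d * xi for w, xi in zip(net["w_hidden"][j], x)]
        net["b_hidden"][j] -= lr * d

def total_error(data, net):
    """Sum of squared errors over the whole dataset."""
    return sum((forward(x, net)[1] - t) ** 2 for x, t in data)

rng = random.Random(0)
n_hidden = 4
# Step 1: random initial weights.
net = {
    "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
    "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
    "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
    "b_out": rng.uniform(-1, 1),
}

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR toy dataset

initial_error = total_error(data, net)
for _ in range(5000):  # Step 5: repeat over the training set
    for x, target in data:
        train_step(x, target, net, lr=0.5)
final_error = total_error(data, net)

print(initial_error, final_error)
```

Comparing the two printed numbers shows how much the error dropped during training. Real handwriting recognition uses far larger networks and datasets, but the loop has the same shape.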

With each iteration, Isabel's MLP grows more adept at deciphering Ishmael's handwriting, like a crew becoming more skilled at navigating the treacherous waters of the ocean. The hidden layers learn to identify increasingly complex features, from individual letters to entire words and phrases.

However, training MLPs is computationally expensive, and Isabel soon realizes that she needs more advanced techniques to tackle the challenges posed by Ishmael's penmanship. The curse of dimensionality looms over her project like the shadow of Moby Dick, threatening to overwhelm her modest computational resources.

Undeterred, Isabel presses on, exploring regularization techniques such as dropout to prevent overfitting and improve the MLP's generalization. She experiments with different activation functions and optimization algorithms, each a new harpoon in her quest to conquer the white whale of illegible handwriting.
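
As one example of these techniques, the sketch below shows a common formulation of dropout, sometimes called "inverted dropout". During training it randomly silences some hidden neurons so the network cannot lean too heavily on any single one; at prediction time it leaves the activations untouched. The function name `dropout`, the drop rate of 0.5, and the sample activations are assumptions for illustration.

```python
import random

def dropout(activations, drop_rate, rng, training=True):
    """Inverted dropout: zero each activation with probability drop_rate and
    rescale the survivors so the expected total stays the same."""
    if not training or drop_rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(1)
hidden_activations = [0.8, 0.1, 0.5, 0.9, 0.3, 0.7]

# During training: some activations are zeroed, the rest are scaled up.
print(dropout(hidden_activations, drop_rate=0.5, rng=rng))
# During prediction: activations pass through unchanged.
print(dropout(hidden_activations, drop_rate=0.5, rng=rng, training=False))
```

Because a different random subset of neurons is dropped on every training step, the network is pushed toward features that remain useful even when some of its "crew members" are missing.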

In the end, Isabel's MLP proves to be a valuable tool, allowing her to decipher Ishmael's letters with increasing accuracy. The once-inscrutable scribbles now yield their secrets, revealing tales of adventure, camaraderie, and the relentless pursuit of the legendary white whale.

As Isabel reads the decoded letters, she can't help but feel a sense of kinship with the Pequod's crew, having embarked on her own journey of discovery and perseverance. The rise of Multi-Layer Perceptrons has not only unlocked the mysteries of Ishmael's handwriting but has also opened the door to a new era of artificial intelligence, one where machines can learn to navigate the complexities of the human world.

### In the Real World...
In the real world, Multi-Layer Perceptrons (MLPs) have been successfully applied to a variety of handwriting recognition tasks, much like Isabel's quest to decipher Ishmael's letters. One of the most notable early successes was the recognition of handwritten zip codes by the United States Postal Service (USPS) in the late 1980s and early 1990s.

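The USPS faced a daunting challenge: efficiently sorting and routing millions of pieces of mail each day based on handwritten zip codes. In 1989, Yann LeCun and colleagues at AT&T Bell Labs showed that a multi-layer neural network trained with backpropagation could read handwritten zip code digits taken from real U.S. mail. Automated readers such as the USPS's Remote Computer Reader (RCR) later brought this kind of handwriting recognition into day-to-day mail sorting.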

The RCR system was trained on a vast dataset of handwritten digits, learning to recognize the unique patterns and variations in each number. By breaking down the input images into smaller features and processing them through multiple layers, the MLPs learned to accurately classify the handwritten digits, even in the presence of noise, distortions, and individual writing styles.

The success of the RCR system demonstrated the potential of MLPs to tackle real-world problems involving complex pattern recognition. The system achieved impressive accuracy rates, successfully reading a significant portion of handwritten zip codes and greatly reducing the need for manual sorting.

This early success paved the way for further advancements in handwriting recognition, such as the recognition of cursive scripts and the digitization of historical documents. MLPs, along with other neural network architectures, continue to play a crucial role in these applications, enabling machines to interpret and understand human-generated text with increasing accuracy and efficiency.
