<a href="https://colab.research.google.com/github/brendanpshea/computing_concepts_python/blob/main/IntroCS_12_ArtificialIntelligence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Artificial Intelligence: What is AI?

Artificial Intelligence (AI) refers to computer systems that can perform tasks that typically require human intelligence. These systems are designed to mimic human cognitive functions such as learning, problem-solving, and pattern recognition. In today's world, AI has become increasingly integrated into our daily lives, from voice assistants on our phones to recommendation systems on streaming platforms.

* **Artificial Intelligence (AI)** is the field of computer science focused on creating machines that can perform tasks that would normally require human intelligence.
* AI systems can learn from experience, adjust to new inputs, and perform human-like tasks without being explicitly programmed for each specific task.
* Modern AI applications include voice recognition, image classification, natural language processing, and game playing.
* The goal of AI research is to create systems that can reason, learn, plan, and communicate in ways similar to humans.

## Types of AI

| Type | Description | Examples |
|------|-------------|----------|
| **Narrow AI** | Designed for a specific task | Voice assistants, chess programs |
| **General AI** | Hypothetical AI with human-level intelligence across many domains | Not yet achieved |
| **Machine Learning** | AI systems that improve through experience | Recommendation systems, spam filters |
| **Deep Learning** | Advanced machine learning using neural networks | Image recognition, language models |

# A Brief History of AI: Key Milestones

The journey of artificial intelligence spans decades of research, breakthroughs, and occasional setbacks. Understanding this history helps contextualize modern AI developments and appreciate the significant progress that has brought us to today's capabilities. The field has evolved from theoretical concepts to practical applications that impact our daily lives.

* The term **Artificial Intelligence** was coined by John McCarthy in the 1955 proposal for the Dartmouth workshop; the workshop itself, held in the summer of 1956, is considered the founding event of AI as a field.
* Early AI focused on symbolic approaches where human knowledge was encoded as rules and facts.
* **Neural networks**, inspired by the human brain, were proposed in the 1940s but gained significant traction only decades later.
* AI research has experienced cycles of optimism ("AI summers") and disappointment ("AI winters") when progress didn't meet expectations.

## Key AI Milestones

* 1950: Alan Turing proposes the **Turing Test** as a measure of machine intelligence.
* 1957: Frank Rosenblatt invents the **Perceptron**, the first implementation of a neural network algorithm.
* 1997: IBM's Deep Blue defeats chess champion Garry Kasparov.
* 2012: Deep learning breakthrough when AlexNet wins the ImageNet competition.
* 2014-2023: Rapid advances in deep learning, with models like GPT and DALL-E demonstrating impressive capabilities in language and image generation.

# The Perceptron: The Building Block of Neural Networks

A perceptron is the fundamental building block of neural networks, designed to mimic how a single neuron works in the human brain. Created by Frank Rosenblatt in 1957, this computational model takes multiple inputs, processes them, and produces a single output. The perceptron represents the simplest form of a neural network capable of learning from examples.

* A **perceptron** is a mathematical model of a biological neuron that takes multiple inputs and produces a single output.
* Each input has an associated **weight** that represents its importance to the decision-making process.
* The perceptron applies a mathematical function (called an **activation function**) to determine whether to "fire" (output a 1) or not (output a 0).
* Perceptrons can learn by adjusting their weights based on training examples.

## Components of a Perceptron

* **Inputs**: Values that the perceptron receives (x₁, x₂, ..., xₙ)
* **Weights**: Values that determine the importance of each input (w₁, w₂, ..., wₙ)
* **Bias**: An additional parameter that allows the perceptron to fit the data better
* **Summation**: The weighted sum of all inputs plus the bias
* **Activation Function**: Determines the output based on the summation

In [None]:
# @title
%%html
<svg viewBox="0 0 600 300" xmlns="http://www.w3.org/2000/svg">
  <!-- Inputs -->
  <circle cx="100" cy="80" r="20" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>
  <circle cx="100" cy="150" r="20" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>
  <circle cx="100" cy="220" r="20" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>

  <!-- Input Labels -->
  <text x="100" y="85" text-anchor="middle" font-family="Arial" font-size="16">x₁</text>
  <text x="100" y="155" text-anchor="middle" font-family="Arial" font-size="16">x₂</text>
  <text x="100" y="225" text-anchor="middle" font-family="Arial" font-size="16">x₃</text>

  <!-- Weights Labels -->
  <text x="180" y="60" text-anchor="middle" font-family="Arial" font-size="14">w₁</text>
  <text x="180" y="130" text-anchor="middle" font-family="Arial" font-size="14">w₂</text>
  <text x="180" y="200" text-anchor="middle" font-family="Arial" font-size="14">w₃</text>

  <!-- Connection Lines -->
  <line x1="120" y1="80" x2="280" y2="150" stroke="#005cbf" stroke-width="2"/>
  <line x1="120" y1="150" x2="280" y2="150" stroke="#005cbf" stroke-width="2"/>
  <line x1="120" y1="220" x2="280" y2="150" stroke="#005cbf" stroke-width="2"/>

  <!-- Bias -->
  <circle cx="200" cy="250" r="20" fill="#ffd166" stroke="#ca8a04" stroke-width="2"/>
  <text x="200" y="255" text-anchor="middle" font-family="Arial" font-size="16">Bias</text>
  <line x1="220" y1="250" x2="280" y2="150" stroke="#ca8a04" stroke-width="2"/>

  <!-- Summation -->
  <circle cx="300" cy="150" r="30" fill="#ff7a96" stroke="#be123c" stroke-width="2"/>
  <text x="300" y="155" text-anchor="middle" font-family="Arial" font-size="16">Σ</text>

  <!-- Activation Function -->
  <rect x="360" y="120" width="80" height="60" rx="10" fill="#86efac" stroke="#16a34a" stroke-width="2"/>
  <text x="400" y="155" text-anchor="middle" font-family="Arial" font-size="14">Activation</text>

  <!-- Output -->
  <circle cx="500" cy="150" r="20" fill="#d8b4fe" stroke="#7e22ce" stroke-width="2"/>
  <text x="500" y="155" text-anchor="middle" font-family="Arial" font-size="16">Output</text>

  <!-- Connection Lines -->
  <line x1="330" y1="150" x2="360" y2="150" stroke="#000" stroke-width="2"/>
  <line x1="440" y1="150" x2="480" y2="150" stroke="#000" stroke-width="2"/>

  <!-- Title -->
  <text x="300" y="30" text-anchor="middle" font-family="Arial" font-size="20" font-weight="bold">Perceptron Model</text>
</svg>

# Understanding Inputs, Weights, and Outputs

The perceptron processes information through a series of mathematical steps that transform inputs into an output. Each component plays a specific role in this transformation, and understanding these components is crucial to building a working perceptron. This section explores how inputs, weights, and the activation function work together to make decisions.

* **Inputs (x)** are the values that the perceptron receives, such as features from data (e.g., pixel values in an image or test scores for students).
* **Weights (w)** determine how important each input is to the final decision, with larger weights giving more importance to their associated inputs.
* The **weighted sum** is calculated as: (w₁ × x₁) + (w₂ × x₂) + ... + (wₙ × xₙ) + bias.
* The **activation function** transforms the weighted sum into the final output, typically using a step function for basic perceptrons.

## How a Perceptron Makes Decisions

For a simple binary classification perceptron:

1. Calculate the weighted sum of all inputs: sum = w₁x₁ + w₂x₂ + ... + wₙxₙ + bias
2. Apply the activation function:
   * If sum > 0, output = 1 (positive class)
   * If sum ≤ 0, output = 0 (negative class)

| Example: AND Logic Gate | x₁ | x₂ | Weighted Sum (w₁=1, w₂=1, bias=-1.5) | Output |
|-------------------------|----|----|-------------------------------------|--------|
| Case 1                  | 0  | 0  | 0×1 + 0×1 - 1.5 = -1.5              | 0      |
| Case 2                  | 0  | 1  | 0×1 + 1×1 - 1.5 = -0.5              | 0      |
| Case 3                  | 1  | 0  | 1×1 + 0×1 - 1.5 = -0.5              | 0      |
| Case 4                  | 1  | 1  | 1×1 + 1×1 - 1.5 = 0.5               | 1      |
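The table above can be checked with a few lines of Python. This is a minimal sketch rather than the class we build later; the helper names `step` and `perceptron_output` are made up for illustration, while the weights and bias are the ones from the table.

```python
def step(weighted_sum):
    """Step activation: output 1 only when the weighted sum is positive."""
    return 1 if weighted_sum > 0 else 0


def perceptron_output(inputs, weights, bias):
    """Compute the weighted sum of the inputs plus the bias, then apply the step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(weighted_sum)


# Weights and bias taken from the AND-gate table
AND_WEIGHTS = [1, 1]
AND_BIAS = -1.5

# Walk through all four input combinations
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron_output([x1, x2], AND_WEIGHTS, AND_BIAS))
```

Only the last combination (1, 1) pushes the weighted sum above 0, so it is the only case that outputs 1.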

# Building a Perceptron in Python: Class Structure

Now that we understand how a perceptron works conceptually, let's implement one in Python. We'll use object-oriented programming to create a Perceptron class that will help us predict whether a student will pass a test based on their study hours and previous quiz score. This simple example will make the perceptron's function easy to understand.

* Our Perceptron class will need to store the **weights** and **bias** for our model.
* We'll need methods to **predict** outputs for given inputs and to **train** the perceptron.
* The training will involve **adjusting weights** based on the error between predicted and actual outcomes.
* We'll start with a simple practical example: predicting if a student will pass a test (1) or fail (0).

## Example Training Data: Study Success Predictor

| Student | Study Hours | Previous Quiz (0-10) | Passed Test? |
|---------|------------|----------------------|--------------|
| Alex    | 2          | 6                    | 0 (No)       |
| Bailey  | 8          | 9                    | 1 (Yes)      |
| Casey   | 1          | 4                    | 0 (No)       |
| Dana    | 5          | 8                    | 1 (Yes)      |
| Ellis   | 7          | 7                    | 1 (Yes)      |

## Perceptron Class Implementation

In [None]:
import random

class Perceptron:
    def __init__(self, input_size):
        """Initialize the perceptron with random weights and a bias of 0"""
        # Initialize each weight with a random value between -1 and 1
        self.weights = [random.uniform(-1, 1) for _ in range(input_size)]
        # Initialize bias to 0
        self.bias = 0
        # Set learning rate
        self.learning_rate = 0.1

    def activation_function(self, x):
        """Step activation function"""
        return 1 if x > 0 else 0

    def predict(self, inputs):
        """Calculate the output for given inputs"""
        # Calculate the weighted sum manually
        weighted_sum = self.bias  # Start with bias
        for i in range(len(inputs)):
            weighted_sum += inputs[i] * self.weights[i]

        # Apply activation function
        return self.activation_function(weighted_sum)

This basic structure sets up our perceptron with random weights and a bias of 0, and provides methods for the activation function and for making predictions. In the following sections, we'll cover the learning rule and implement the training method.

# Training Our Perceptron: The Learning Process

Training a perceptron involves showing it examples and adjusting its weights to improve its predictions. This process, known as supervised learning, requires a dataset with inputs and their correct outputs (labels). The perceptron learns by comparing its predictions with the actual labels and making small adjustments to reduce the error.

* **Training data** consists of input features and their corresponding correct outputs (labels).
* The **learning rate** determines how quickly the perceptron's weights are adjusted during training (smaller values mean slower but more stable learning).
* **Error** is calculated as the actual output minus the predicted output, so it is positive when the perceptron should have fired but did not.
* **Weight updates** are proportional to the error and the input values.

## The Perceptron Learning Rule

The perceptron learning algorithm follows these steps:

1. Initialize the weights with small random values and the bias with a small random value or 0 (our implementation uses 0)
2. For each training example:
   * Make a prediction using current weights
   * Calculate the error: error = actual_output - predicted_output
   * Update each weight: weight_i = weight_i + learning_rate * error * input_i
   * Update bias: bias = bias + learning_rate * error
3. Repeat step 2 for multiple epochs (complete passes through the training data)

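Because the step function outputs only 0 or 1, the error is always -1, 0, or +1. Weights therefore change only when the prediction is wrong, and each change is proportional to the learning rate and the corresponding input value. Repeating these small corrections gradually improves the perceptron's ability to classify inputs correctly.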
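To make the learning rule concrete, here is a small sketch of a single update step. The function `update_once` and the starting weights `[0.5, 0.5]` are hypothetical values chosen for illustration, and the example uses a student who studied 2 hours, scored 6 on the quiz, and failed.

```python
def step(weighted_sum):
    """Step activation: output 1 only when the weighted sum is positive."""
    return 1 if weighted_sum > 0 else 0


def update_once(weights, bias, inputs, target, learning_rate=0.1):
    """Apply the perceptron learning rule to one example.

    Returns the new weights, the new bias, and the error that drove the update.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    prediction = step(weighted_sum)
    error = target - prediction  # actual minus predicted
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias, error


# Hypothetical starting point: the perceptron wrongly predicts "pass" (1)
# for a student who actually failed (0).
weights, bias = [0.5, 0.5], 0.0
new_weights, new_bias, error = update_once(weights, bias, inputs=[2, 6], target=0)

print("error:", error)
print("new weights:", new_weights)
print("new bias:", new_bias)
```

The error here is -1, so both weights and the bias decrease. The quiz-score weight drops by roughly three times as much as the study-hours weight because its input (6) is three times larger (2).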

# Coding a Simple Training Algorithm

Let's implement the training method for our Perceptron class to predict student test success. The code will adjust the perceptron's weights and bias based on the error between predicted and actual outcomes, helping the perceptron learn from examples of past student performance.

* The `train` method will process one student example at a time, updating weights based on whether the prediction was correct.
* We'll keep the code simple with a manual implementation of the perceptron learning rule.
* Our goal is to adjust weights so that more study hours and higher previous quiz scores correctly predict passing the test.
* We'll use a simple approach where we train multiple times on our examples until the perceptron can correctly classify all students.

## Implementation of Training Methods

In [None]:
def train(self, inputs, target):
    """Train the perceptron on a single example (one student)"""
    # Make prediction with current weights
    prediction = self.predict(inputs)

    # Calculate error (difference between what we expected and what we got)
    error = target - prediction

    # Update weights if there was an error
    if error != 0:  # If prediction was wrong
        # Update each weight
        for i in range(len(self.weights)):
            self.weights[i] = self.weights[i] + self.learning_rate * error * inputs[i]

        # Update bias
        self.bias = self.bias + self.learning_rate * error

        return True  # Weights were updated

    return False  # No update needed

def train_multiple_epochs(self, training_data, epochs=10):
    """Train on multiple examples for multiple epochs"""
    for epoch in range(epochs):
        mistakes = 0

        # Process all training examples (students)
        for inputs, target in training_data:
            # Update weights and count if a mistake was made
            updated = self.train(inputs, target)
            if updated:
                mistakes += 1

        # Print progress
        print(f"Epoch {epoch+1}: {mistakes} mistakes")

        # Stop if perfect prediction is achieved
        if mistakes == 0:
            print("All students correctly classified!")
            break

# Attach the training methods to the Perceptron class
Perceptron.train = train
Perceptron.train_multiple_epochs = train_multiple_epochs


With this training algorithm, our perceptron will learn to predict student success based on study hours and previous quiz scores.

# Testing and Deploying Your Perceptron

After training your perceptron to predict student test success, it's essential to test it on new student data it hasn't seen before. This helps validate that your perceptron has learned general patterns rather than just memorizing specific examples.

* **Testing** involves trying your perceptron on new students to see if it correctly predicts their test outcomes.
* **Deployment** means using your trained perceptron to help future students understand if they're studying enough.
* You can **save your model's weights** in a file so you can use your trained perceptron later without retraining.
* The perceptron creates a **decision boundary** - in our case, a line that separates students who will pass from those who will fail.
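The decision boundary is the set of points where the weighted sum equals exactly 0. With two inputs this is the line w₁ × hours + w₂ × quiz + bias = 0, which can be solved for the quiz score. The sketch below assumes hand-picked weights (`[1.0, 0.5]` and a bias of `-6.0`) purely for illustration; a trained perceptron will generally end up with different values, and `boundary_quiz_score` is a hypothetical helper name.

```python
def boundary_quiz_score(study_hours, weights, bias):
    """Return the quiz score at which the weighted sum is exactly 0.

    Students above this score (for the given study hours) are predicted to pass,
    assuming the quiz weight is positive.
    """
    w_hours, w_quiz = weights
    if w_quiz == 0:
        raise ValueError("The boundary is vertical, so it cannot be solved for quiz score")
    return -(w_hours * study_hours + bias) / w_quiz


# Hand-picked (not trained) parameters, used only to illustrate the geometry
weights, bias = [1.0, 0.5], -6.0

for hours in (1, 3, 5):
    print(f"{hours} study hours: boundary at quiz score {boundary_quiz_score(hours, weights, bias)}")
```

With these assumed weights, a student who studies 3 hours sits exactly on the line with a quiz score of 6; more study hours lower the quiz score needed to land on the "pass" side.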

## Example: Using Our Student Success Predictor

In [None]:
# Create a perceptron with 2 inputs (study hours and quiz score)
student_predictor = Perceptron(input_size=2)

# Training data: [study_hours, quiz_score], pass_or_fail
training_data = [
    ([2, 6], 0),  # Alex: 2 hours, score 6, failed
    ([8, 9], 1),  # Bailey: 8 hours, score 9, passed
    ([1, 4], 0),  # Casey: 1 hour, score 4, failed
    ([5, 8], 1),  # Dana: 5 hours, score 8, passed
    ([7, 7], 1)   # Ellis: 7 hours, score 7, passed
]

# Train the perceptron
student_predictor.train_multiple_epochs(training_data, epochs=20)

# Test on new students
new_students = [
    ([3, 7], "Finley"),  # 3 hours, score 7
    ([6, 5], "Gale"),    # 6 hours, score 5
    ([2, 9], "Harper")   # 2 hours, score 9
]

print("Predictions for new students:")
for inputs, name in new_students:
    prediction = student_predictor.predict(inputs)
    result = "pass" if prediction == 1 else "fail"
    print(f"{name} (Study: {inputs[0]}h, Quiz: {inputs[1]}/10): Predicted to {result}")

Epoch 1: 3 mistakes
Epoch 2: 3 mistakes
Epoch 3: 1 mistakes
Epoch 4: 0 mistakes
All students correctly classified!
Predictions for new students:
Finley (Study: 3h, Quiz: 7/10): Predicted to pass
Gale (Study: 6h, Quiz: 5/10): Predicted to pass
Harper (Study: 2h, Quiz: 9/10): Predicted to fail
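Once training finishes, the learned weights and bias can be written to a file and loaded again later. The following is one possible sketch using Python's standard `json` module; the function names, the file name, and the example values are assumptions for illustration. With the class above you would pass `student_predictor.weights` and `student_predictor.bias` instead.

```python
import json
import os
import tempfile


def save_model(path, weights, bias):
    """Write the weights and bias to a JSON file."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def load_model(path):
    """Read the weights and bias back from a JSON file."""
    with open(path) as f:
        model = json.load(f)
    return model["weights"], model["bias"]


# Hypothetical values standing in for a trained perceptron's parameters
weights, bias = [1.0, 0.5], -6.0

# Save to a temporary location, load the values back, then clean up
path = os.path.join(tempfile.gettempdir(), "student_predictor.json")
save_model(path, weights, bias)
loaded_weights, loaded_bias = load_model(path)
os.remove(path)

print(loaded_weights, loaded_bias)
```

JSON is used here because the model is just a short list of numbers; the file stays human-readable and needs no extra libraries.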


# From Single Neurons to Neural Networks

While our student success predictor perceptron is useful, it can only make simple yes/no predictions based on a straight line. For more complex problems, we need to connect multiple perceptrons together to form neural networks that can learn more complex patterns.

* A **neural network** is a collection of interconnected perceptrons (or neurons) organized in layers.
* The **input layer** receives the data (like study hours, quiz scores, attendance).
* The **hidden layers** help the network learn more complex relationships between inputs and outputs.
* The **output layer** produces the final predictions (like passing the test or getting a high grade).

## Neural Network Architecture

Extending our student example, a neural network could:
* Take more inputs (study hours, quiz scores, attendance, homework completion)
* Make multiple predictions (will pass test, will get A grade, needs tutoring)
* Learn complex patterns (some students do well with less study but perfect attendance)

* Unlike a single perceptron that can only separate data with a straight line, neural networks can create curved and complex decision boundaries.
* Neural networks can solve problems that single perceptrons cannot, like the classic XOR problem. In our example, XOR would mean that students who either study a lot or do well on quizzes (but not both) pass, while students who do poorly on both or well on both fail. No single straight line can separate those two groups.
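A small network of step-function neurons is enough to compute XOR. The sketch below wires three neurons together with hand-set weights (not learned ones): two hidden neurons compute OR and NAND, and an output neuron combines them with AND. The function names are illustrative.

```python
def step(weighted_sum):
    """Step activation: output 1 only when the weighted sum is positive."""
    return 1 if weighted_sum > 0 else 0


def neuron(inputs, weights, bias):
    """A single perceptron: weighted sum plus bias, then the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two-layer network with hand-set weights that computes XOR."""
    # Hidden layer
    h_or = neuron([x1, x2], [1, 1], -0.5)      # fires if at least one input is 1
    h_nand = neuron([x1, x2], [-1, -1], 1.5)   # fires unless both inputs are 1
    # Output layer: fires only if both hidden neurons fired (AND)
    return neuron([h_or, h_nand], [1, 1], -1.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))
```

Each hidden neuron still draws a single straight line; the output neuron keeps only the region between the two lines, which is something one perceptron cannot do on its own.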

In [None]:
# @title
%%html
<svg viewBox="0 0 800 400" xmlns="http://www.w3.org/2000/svg">
  <!-- Title -->
  <text x="400" y="40" text-anchor="middle" font-family="Arial" font-size="24" font-weight="bold">Simple Neural Network</text>

  <!-- Labels for layers -->
  <text x="150" y="80" text-anchor="middle" font-family="Arial" font-size="18">Input Layer</text>
  <text x="400" y="80" text-anchor="middle" font-family="Arial" font-size="18">Hidden Layer</text>
  <text x="650" y="80" text-anchor="middle" font-family="Arial" font-size="18">Output Layer</text>

  <!-- Input Layer Neurons -->
  <circle cx="150" cy="150" r="25" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>
  <circle cx="150" cy="250" r="25" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>
  <circle cx="150" cy="350" r="25" fill="#6ab7ff" stroke="#005cbf" stroke-width="2"/>

  <!-- Hidden Layer Neurons -->
  <circle cx="400" cy="130" r="25" fill="#ff7a96" stroke="#be123c" stroke-width="2"/>
  <circle cx="400" cy="210" r="25" fill="#ff7a96" stroke="#be123c" stroke-width="2"/>
  <circle cx="400" cy="290" r="25" fill="#ff7a96" stroke="#be123c" stroke-width="2"/>
  <circle cx="400" cy="370" r="25" fill="#ff7a96" stroke="#be123c" stroke-width="2"/>

  <!-- Output Layer Neurons -->
  <circle cx="650" cy="200" r="25" fill="#d8b4fe" stroke="#7e22ce" stroke-width="2"/>
  <circle cx="650" cy="300" r="25" fill="#d8b4fe" stroke="#7e22ce" stroke-width="2"/>

  <!-- Input to Hidden connections -->
  <!-- From input 1 -->
  <line x1="175" y1="150" x2="375" y2="130" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="150" x2="375" y2="210" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="150" x2="375" y2="290" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="150" x2="375" y2="370" stroke="#005cbf" stroke-width="1.5"/>

  <!-- From input 2 -->
  <line x1="175" y1="250" x2="375" y2="130" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="250" x2="375" y2="210" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="250" x2="375" y2="290" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="250" x2="375" y2="370" stroke="#005cbf" stroke-width="1.5"/>

  <!-- From input 3 -->
  <line x1="175" y1="350" x2="375" y2="130" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="350" x2="375" y2="210" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="350" x2="375" y2="290" stroke="#005cbf" stroke-width="1.5"/>
  <line x1="175" y1="350" x2="375" y2="370" stroke="#005cbf" stroke-width="1.5"/>

  <!-- Hidden to Output connections -->
  <!-- To output 1 -->
  <line x1="425" y1="130" x2="625" y2="200" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="210" x2="625" y2="200" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="290" x2="625" y2="200" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="370" x2="625" y2="200" stroke="#be123c" stroke-width="1.5"/>

  <!-- To output 2 -->
  <line x1="425" y1="130" x2="625" y2="300" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="210" x2="625" y2="300" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="290" x2="625" y2="300" stroke="#be123c" stroke-width="1.5"/>
  <line x1="425" y1="370" x2="625" y2="300" stroke="#be123c" stroke-width="1.5"/>

  <!-- Input Labels -->
  <text x="150" y="155" text-anchor="middle" font-family="Arial" font-size="16">x₁</text>
  <text x="150" y="255" text-anchor="middle" font-family="Arial" font-size="16">x₂</text>
  <text x="150" y="355" text-anchor="middle" font-family="Arial" font-size="16">x₃</text>

  <!-- Output Labels -->
  <text x="650" y="205" text-anchor="middle" font-family="Arial" font-size="16">y₁</text>
  <text x="650" y="305" text-anchor="middle" font-family="Arial" font-size="16">y₂</text>

  <!-- Example Labels -->
  <text x="75" y="150" text-anchor="end" font-family="Arial" font-size="14">Study Hours</text>
  <text x="75" y="250" text-anchor="end" font-family="Arial" font-size="14">Quiz Score</text>
  <text x="75" y="350" text-anchor="end" font-family="Arial" font-size="14">Attendance</text>

  <text x="725" y="200" text-anchor="start" font-family="Arial" font-size="14">Pass Test</text>
  <text x="725" y="300" text-anchor="start" font-family="Arial" font-size="14">High Grade</text>
</svg>

# Convolutional Neural Networks (CNNs): Image Recognition

When you take a selfie and your phone automatically focuses on your face or when a self-driving car identifies a stop sign, you're seeing Convolutional Neural Networks (CNNs) in action. These specialized neural networks revolutionized computer vision by mimicking how our own visual system processes information.

CNNs build on our simple perceptron foundation but are organized in a specialized way that makes them exceptionally good at understanding images. Let's explore how they work and why they're so important in today's technology.

### Layers
CNNs process images through a series of specialized layers that each perform different functions:
  * **Convolutional layers** act like pattern detectors, scanning small patches of images to find specific features
  * **Pooling layers** reduce image size while preserving important information, making processing more efficient
  * **Fully connected layers** (built from perceptrons) make the final classification decision

### Feature Recognition Hierarchy
CNNs learn increasingly complex features as data moves through the network:
  * Early layers detect **basic features** like edges, colors, and simple textures
  * Middle layers combine these to recognize **composite patterns** like eyes, wheels, or doors
  * Deep layers identify **complete objects** by combining all the evidence from earlier layers

## Training CNNs: From Raw Images to Smart Recognition

Training a CNN involves showing it thousands of labeled images and letting it learn through trial and error. This process works as follows:

1. **Data Collection and Preparation**: Gather thousands or millions of images labeled with their content ("cat," "car," "face")
2. **Learning Process**:
   * The network makes predictions based on current weights
   * Errors are calculated by comparing predictions to correct labels
   * Weights are adjusted gradually to reduce errors
   * This process repeats thousands of times until accuracy is high
3. **Testing and Validation**: The trained model is tested on new images it hasn't seen before

## CNNs in Your Daily Life

| Technology | How CNNs Make It Work | Real-World Impact |
|------------|------------------------|-------------------|
| **Smartphone Cameras** | Automatically detect faces, improve lighting, apply effects | Better photos without photography skills |
| **Medical Diagnostics** | Scan X-rays and MRIs to detect abnormalities | Can help clinicians spot some cancers earlier; reported accuracy varies by task and dataset |
| **Autonomous Vehicles** | Identify road elements, pedestrians, other vehicles | Safer transportation with fewer accidents |
| **Social Media** | Filter inappropriate content, suggest photo tags | Safer online environments, easier photo sharing |
| **Augmented Reality** | Track facial movements for filters, recognize objects | Interactive games, virtual try-on for glasses |

CNNs demonstrate how our simple perceptron model can evolve into sophisticated systems that perform tasks once thought impossible for computers. The fundamental principles remain the same—weighted inputs, summation, and activation—just organized in a more specialized, powerful architecture.

### Example: Does This Image Contain a Cat?

In [None]:
# @title
%%html
<svg xmlns="http://www.w3.org/2000/svg" width="900" height="240" font-family="Arial, sans-serif">
  <defs>
    <marker id="arrow" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
      <path d="M0,0 L6,3 L0,6 Z" fill="#333"/>
    </marker>
  </defs>

  <!-- Input Image -->
  <rect x="20" y="80" width="140" height="80" fill="#eeeeee" stroke="#333"/>
  <text x="90" y="105" font-size="14" text-anchor="middle"><tspan font-weight="bold">Input Image</tspan></text>
  <text x="90" y="125" font-size="12" text-anchor="middle">e.g. a photograph of a cat</text>

  <!-- Convolution + ReLU -->
  <rect x="180" y="50" width="160" height="140" fill="#bbdefb" stroke="#0288d1"/>
  <text x="260" y="85" font-size="14" text-anchor="middle"><tspan font-weight="bold">Convolution</tspan> + <tspan font-weight="bold">ReLU</tspan></text>
  <text x="260" y="105" font-size="12" text-anchor="middle">Filters detect edges and textures</text>

  <!-- Pooling -->
  <rect x="370" y="90" width="100" height="60" fill="#c8e6c9" stroke="#2e7d32"/>
  <text x="420" y="115" font-size="14" text-anchor="middle"><tspan font-weight="bold">Pooling</tspan></text>
  <text x="420" y="135" font-size="12" text-anchor="middle">Down-sample feature maps</text>

  <!-- Flatten -->
  <rect x="500" y="100" width="100" height="40" fill="#ffe082" stroke="#ef6c00"/>
  <text x="550" y="125" font-size="14" text-anchor="middle"><tspan font-weight="bold">Flatten</tspan></text>
  <text x="550" y="145" font-size="12" text-anchor="middle">To 1D feature vector</text>

  <!-- Dense Layer -->
  <rect x="630" y="50" width="140" height="140" fill="#f8bbd0" stroke="#c2185b"/>
  <text x="700" y="85" font-size="14" text-anchor="middle"><tspan font-weight="bold">Dense</tspan> + <tspan font-weight="bold">ReLU</tspan></text>
  <text x="700" y="105" font-size="12" text-anchor="middle">Combine features</text>

  <!-- Output + Sigmoid -->
  <rect x="800" y="80" width="100" height="80" fill="#ffcc80" stroke="#e65100"/>
  <text x="850" y="105" font-size="14" text-anchor="middle"><tspan font-weight="bold">Output</tspan> + <tspan font-weight="bold">Sigmoid</tspan></text>
  <text x="850" y="125" font-size="12" text-anchor="middle">Probability “Cat”</text>

  <!-- Arrows -->
  <path d="M160,120 L180,120" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M340,120 L370,120" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M470,120 L500,120" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M590,120 L630,120" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M770,120 L800,120" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
</svg>


Let's suppose we want to determine whether a given image contains a cat. Our CNN does the following:
1. The photo is turned into a grid of numbers (pixels), where each number shows how bright or dark that spot is.  

2. In the **convolution** step, small sliding windows called **filters** move over the grid and pick up simple features like edges or patches of fur. A **ReLU** step then zeroes out any negative values, so only the important features remain.  

3. In the **pooling** step, the grid of features is shrunk by taking the biggest number in each small block. This makes the network focus on the strongest signals and ignore tiny shifts.

4. The smaller grids are laid out end-to-end into one long list of numbers, called a **feature vector**, so they can be used by normal neural-network layers. This is called **flattening**.  

5. A fully connected layer mixes all those features together, learning which combinations (like whiskers + pointy ears) mean “cat.” Another **ReLU** zeroes out negatives again.  

6. A final neuron scores how much the image looks like a cat, then the **sigmoid** function turns that score into a number between 0 and 1, which is read as the probability that the picture shows a cat.
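The steps above can be sketched in plain Python on a tiny 6×6 "image". This is only an illustration under simplifying assumptions: the filter is a hand-written vertical-edge detector rather than a learned one, the hidden dense layer from step 5 is skipped, and the final neuron's weights are made up, so the resulting number is not a real cat probability.

```python
import math


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def relu(feature_map):
    """Replace every negative value with 0."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    """Lay the rows end to end to form a 1D feature vector."""
    return [v for row in feature_map for v in row]


def sigmoid(x):
    """Squash any score into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))


# Step 1: a 6x6 grid of pixels, dark (0) on the left and bright (1) on the right
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Step 2: a hand-written filter that responds to dark-to-bright vertical edges
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
features = relu(convolve2d(image, kernel))  # 4x4 feature map

# Steps 3 and 4: pooling, then flattening
pooled = max_pool(features)                 # 2x2 feature map
vector = flatten(pooled)                    # 4 numbers

# Step 6: one output neuron with made-up weights, followed by sigmoid
output_weights = [0.5, 0.5, 0.5, 0.5]
output_bias = -4.0
score = sum(w * v for w, v in zip(output_weights, vector)) + output_bias
probability = sigmoid(score)

print("feature vector:", vector)
print("probability:", round(probability, 3))
```

In a real CNN the filter values and output weights are learned during training, and there are many filters per layer rather than one.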

# Transformers: How AI Understands Language

When you ask a digital assistant a question, get writing help from an AI, or use automatic translation online, you're interacting with transformer-based AI. Transformers represent one of the most significant breakthroughs in artificial intelligence, enabling machines to process and generate human language in remarkably human-like ways.

Transformers evolved from our simple perceptron beginnings but with revolutionary architectural innovations that make them particularly adept at understanding context and relationships in language. Let's explore how these powerful systems work and why they've changed our technological landscape.

Transformers process language using several innovative mechanisms:
  * **Attention mechanism** allows the model to focus on relevant words while ignoring irrelevant ones
  * **Parallel processing** enables analyzing all words simultaneously rather than one at a time
  * **Encoder-decoder structure** separates understanding input (encoding) from generating output (decoding); many modern LLMs, including GPT-style models, use only the decoder half
  * **Positional encoding** helps the model understand word order without sequential processing
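As an illustration of positional encoding, the sketch below uses the sinusoidal scheme from the original Transformer paper. This is one common choice and an assumption here, since the text above does not name a specific method; other models learn their position vectors instead. The function name and the tiny dimension of 4 are chosen for readability.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal position vector: sine on even indices, cosine on odd indices."""
    encoding = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one wavelength
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


# Every position gets a distinct vector that is added to the word's embedding
for position in range(3):
    print(position, [round(v, 3) for v in positional_encoding(position, d_model=4)])
```

Because each position produces a different pattern of values, the model can tell "dog bites man" from "man bites dog" even though it looks at all the words at once.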

Transformers understand language by:
  * Converting words to **numerical representations** (embeddings) capturing meaning
  * Computing **relationships between words** through attention scores
  * Creating **contextual understanding** based on entire sentences
  * Generating **probability distributions** over potential next words
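The attention step can be sketched in a few lines. This simplified version assumes made-up two-dimensional embeddings and uses each embedding directly as query, key, and value; real transformers first pass the embeddings through learned projection matrices. The helper names are illustrative.

```python
import math


def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections."""
    d = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Attention scores: how strongly this word relates to every word
        scores = [dot(query, key) / math.sqrt(d) for key in embeddings]
        weights = softmax(scores)
        # Contextual vector: weighted average of all word vectors
        context = [sum(w * value[k] for w, value in zip(weights, embeddings)) for k in range(d)]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["Call", "me", "Ishmael"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up values for illustration

outputs, attention_weights = self_attention(embeddings)
for token, weights in zip(tokens, attention_weights):
    print(token, [round(w, 2) for w in weights])
```

Each row of weights sums to 1 and says how much that token draws on every token in the sentence. The same `softmax` function is what turns a model's final scores into a probability distribution over possible next words.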

## From Training to ChatGPT: How Large Language Models Work

Modern Large Language Models (LLMs) like ChatGPT and Claude are transformer-based systems trained through several stages:

1. **Pretraining**: The model learns language patterns by processing trillions of words from books, websites, and articles
   * It develops general understanding of grammar, facts, and reasoning
   * This phase requires enormous computing resources (thousands of specialized GPUs)
   
2. **Fine-tuning**: The model is specialized for particular abilities
   * Trained on carefully selected high-quality data
   * Learns to follow instructions and generate helpful responses
   
3. **Alignment**: The model is refined to be helpful, harmless, and honest
   * Human feedback helps the model understand good vs. poor responses
   * Safety mechanisms are integrated to prevent harmful outputs
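
Pretraining (stage 1) is commonly framed as next-word prediction: the model sees some text and is scored on how much probability it gave to the word that actually came next. The sketch below shows how training examples can be formed and scored with cross-entropy loss. The two probability tables are hand-written stand-ins for a model's output, not real model predictions, and all names are assumptions for illustration.

```python
import math

tokens = "call me ishmael some years ago".split()

# Each training example pairs the text so far with the word that follows it.
examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def cross_entropy(predicted_probs, target):
    """Loss is small when the correct word received high probability."""
    return -math.log(predicted_probs[target])

vocab = sorted(set(tokens))

# Stand-in for an untrained model: every word is equally likely.
uniform_guess = {word: 1 / len(vocab) for word in vocab}

# Stand-in for a trained model: most probability on the correct word.
confident_guess = {word: 0.01 for word in vocab}
confident_guess["ishmael"] = 1 - 0.01 * (len(vocab) - 1)

context, target = examples[1]   # (['call', 'me'], 'ishmael')
loss_untrained = cross_entropy(uniform_guess, target)
loss_trained = cross_entropy(confident_guess, target)
print(context, "->", target)
print(round(loss_untrained, 3), round(loss_trained, 3))   # 1.792 0.051
```

Training adjusts the weights to push this loss down across an enormous number of such examples.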

## Transformers in Real-World Applications

| Application | How It Works | Impact on Society |
|-------------|--------------|-------------------|
| **AI Assistants** | Process questions and generate helpful responses | 24/7 information access, help with tasks |
| **Language Translation** | Convert text between hundreds of languages | Breaking down language barriers globally |
| **Content Creation** | Generate articles, stories, and creative writing | Assisting writers and content creators |
| **Code Generation** | Create programming code from descriptions | Making programming more accessible |
| **Educational Tools** | Create personalized learning materials | Adaptive tutoring for different learning styles |

The line from our simple student-success perceptron to these advanced language models is direct: they still use weights, inputs, and activation functions, just arranged in more sophisticated architectures and trained on vastly more data.

### Example: Predicting the Next Word (Moby Dick)

In [None]:
# @title
%%html
<svg xmlns="http://www.w3.org/2000/svg" width="900" height="260" font-family="Arial, sans-serif">
  <defs>
    <marker id="arrow" markerWidth="8" markerHeight="8" refX="6" refY="3" orient="auto">
      <path d="M0,0 L6,3 L0,6 Z" fill="#333"/>
    </marker>
  </defs>
  <!-- Input -->
  <rect x="20" y="70" width="140" height="60" fill="#ffd54f" stroke="#333"/>
  <text x="90" y="95" font-size="14" text-anchor="middle"><tspan font-weight="bold">Input Tokens</tspan></text>
  <text x="90" y="115" font-size="12" text-anchor="middle">'Call', 'me', 'Ishmael'</text>

  <!-- Embedding + PosEnc -->
  <rect x="180" y="50" width="200" height="100" fill="#81c784" stroke="#2e7d32"/>
  <text x="280" y="85" font-size="14" text-anchor="middle">
    <tspan font-weight="bold">Embedding</tspan> +<tspan x="280" dy="1.4em" font-weight="bold">Positional Encoding</tspan>
  </text>

  <!-- Encoder Stack -->
  <g>
    <rect x="420" y="40" width="120" height="140" fill="#a5d6a7" stroke="#2e7d32"/>
    <text x="480" y="75" font-size="14" text-anchor="middle"><tspan font-weight="bold">Encoder</tspan></text>
    <text x="480" y="95" font-size="12" text-anchor="middle">(×N layers)</text>
    <!-- Sublayers -->
    <rect x="435" y="110" width="90" height="20" fill="#c8e6c9" stroke="#2e7d32"/>
    <text x="480" y="125" font-size="10" text-anchor="middle">MHA → Add &amp; Norm</text>
    <rect x="435" y="140" width="90" height="20" fill="#c8e6c9" stroke="#2e7d32"/>
    <text x="480" y="155" font-size="10" text-anchor="middle">FFN → Add &amp; Norm</text>
  </g>

  <!-- Decoder Stack -->
  <g>
    <rect x="580" y="40" width="140" height="160" fill="#90caf9" stroke="#1565c0"/>
    <text x="650" y="75" font-size="14" text-anchor="middle"><tspan font-weight="bold">Decoder</tspan></text>
    <text x="650" y="95" font-size="12" text-anchor="middle">(×N layers)</text>
    <!-- Sublayers -->
    <rect x="595" y="120" width="110" height="18" fill="#bbdefb" stroke="#1565c0"/>
    <text x="650" y="132" font-size="10" text-anchor="middle">Masked MHA</text>
    <rect x="595" y="145" width="110" height="18" fill="#bbdefb" stroke="#1565c0"/>
    <text x="650" y="157" font-size="10" text-anchor="middle">Cross-Attention</text>
    <rect x="595" y="170" width="110" height="18" fill="#bbdefb" stroke="#1565c0"/>
    <text x="650" y="182" font-size="10" text-anchor="middle">FFN → Add &amp; Norm</text>
  </g>

  <!-- Output -->
  <rect x="760" y="70" width="120" height="60" fill="#ffcc80" stroke="#e65100"/>
  <text x="820" y="95" font-size="14" text-anchor="middle"><tspan font-weight="bold">Output Token</tspan></text>
  <text x="820" y="115" font-size="12" text-anchor="middle">'Some'</text>

  <!-- Arrows -->
  <path d="M160,100 L180,100" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M380,100 L420,100" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M540,100 L580,100" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>
  <path d="M720,100 L760,100" stroke="#333" stroke-width="1.5" marker-end="url(#arrow)"/>

  <!-- Attention arrows -->
  <path d="M510,80 C540,20 590,20 620,80" stroke="#f06292" stroke-width="1.2" fill="none" marker-end="url(#arrow)"/>
  <text x="565" y="35" font-size="10" fill="#f06292" text-anchor="middle"><tspan font-weight="bold">Self-Attention</tspan></text>
  <path d="M510,140 C540,200 590,200 620,140" stroke="#4db6ac" stroke-width="1.2" fill="none" marker-end="url(#arrow)"/>
  <text x="565" y="195" font-size="10" fill="#4db6ac" text-anchor="middle"><tspan font-weight="bold">Cross-Attention</tspan></text>
</svg>


To see how our transformer predicts the next word in a familiar text, “Call me Ishmael”, we follow the steps below. The model’s task is to suggest what comes after those three words by processing them through its layers and attention mechanisms.

1. The input sentence is split into **tokens** (individual words or word pieces), so “Call”, “me”, and “Ishmael” each become a separate token.

2. Each token is mapped to a list of numbers called an **embedding**, capturing its meaning. A **positional encoding** (also a list of numbers) is added so the model knows the order of the tokens.

3. In the **encoder**, every token looks at all the others via **self-attention**, learning which words relate (for example, “Call” attends to “Ishmael”). The attention output is then passed through a small neural network to produce an encoded representation of each token.

4. The **decoder** works on the tokens generated so far. Its **masked self-attention** prevents each position from peeking at future tokens. It then applies **cross-attention** to focus on relevant encoder outputs and runs another small neural network to produce a decoder output vector.

5. That decoder vector is multiplied by a matrix to produce one number per word in the vocabulary. A **softmax** function turns these numbers into probabilities, indicating how likely each word is to follow.

6. The model selects the highest-probability word, “Some”, appends it to the sequence generated so far, and repeats the decoding steps until it produces an end-of-sequence marker.
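
The "no peeking" rule in step 4 can be sketched as a mask applied to the attention scores before softmax. In this minimal example the scores are all zero, a placeholder meaning every pair of tokens is equally related, so the effect of the mask is easy to see. The function names are assumptions for illustration.

```python
import numpy as np

def causal_mask(num_tokens):
    # True where a token may look: at itself and at earlier positions only.
    return np.tril(np.ones((num_tokens, num_tokens), dtype=bool))

def masked_attention_weights(scores):
    mask = causal_mask(scores.shape[0])
    # Future positions get a score of -infinity, so softmax gives them 0.
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((3, 3))   # placeholder scores for three tokens
weights = masked_attention_weights(scores)
print(weights)
# Row 0 attends only to token 0:        [1, 0, 0]
# Row 1 splits between tokens 0 and 1:  [0.5, 0.5, 0]
# Row 2 spreads over all three tokens:  [1/3, 1/3, 1/3]
```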

Following these steps, our transformer predicts **“Some”** as the next word in _Moby Dick_, demonstrating how attention and layered processing enable fluent text generation.
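
Steps 5 and 6 can be sketched as follows. The five-word vocabulary, the decoder vector, and the output matrix are all invented for this example, and the numbers were hand-picked so that “Some” wins; a real model would learn these values during training.

```python
import numpy as np

vocab = ["Some", "years", "ago", "whale", "<end>"]   # toy vocabulary

def softmax(logits):
    shifted = logits - np.max(logits)   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def pick_next_word(decoder_vector, output_matrix):
    logits = decoder_vector @ output_matrix   # one score per vocabulary word
    probs = softmax(logits)                   # scores -> probabilities
    return vocab[int(np.argmax(probs))], probs

# Hypothetical decoder output (2 numbers) and output matrix (2 x 5).
decoder_vector = np.array([1.0, 0.5])
output_matrix = np.array([[2.0, 0.1, 0.0, 0.3, -1.0],
                          [1.0, 0.2, 0.1, 0.0, -1.0]])

word, probs = pick_next_word(decoder_vector, output_matrix)
print(word)             # Some
print(probs.round(3))
```

Always taking the most likely word is called greedy decoding. Deployed systems often sample from the probabilities instead, which is one reason the same prompt can produce different answers.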

# From Perceptrons to ChatGPT: The Evolution of AI

The simple perceptron we built to predict student test success might seem worlds apart from sophisticated AI systems that can write essays, create artwork, or help doctors diagnose diseases. Yet all these advanced systems grew directly from the perceptron's fundamental ideas. Let's explore this remarkable journey and understand how today's AI revolution connects to the basics we've learned.

Modern AI systems evolved through several key innovations while maintaining their perceptron roots:

* The dramatic **scale expansion** from fewer than a hundred parameters to hundreds of billions has enabled much more complex pattern recognition.
* Careful **architectural specialization** organizes neurons in layers optimized for specific tasks like image processing.
* Increasingly sophisticated **advanced training methods** provide better ways to adjust weights and biases.
* Exponential **compute power growth** from simple calculations to massive data centers has made training possible.

Despite their complexity, modern systems maintain the perceptron's essential principles:

* All neural systems still use **weighted inputs** where information receives varying levels of importance.
* The fundamental **summation function** that adds weighted inputs remains at the core of even the most advanced AI.
* Non-linear **activation functions** still determine whether neurons "fire" in response to input patterns.
* The process of **learning from errors** by adjusting weights based on mistakes continues to drive improvement.
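
As a reminder of how these four principles fit together, here is a minimal perceptron sketch in plain Python. It is not necessarily the same code as the student-success predictor from earlier in the chapter; the starting weights, the learning rate, and the single training example are assumptions chosen to keep the arithmetic easy to follow.

```python
def step(value):
    # Activation function: the neuron "fires" (1) or stays quiet (0).
    return 1 if value >= 0 else 0

def predict(inputs, weights, bias):
    # Weighted inputs, summed together with the bias.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)

def update(inputs, target, weights, bias, learning_rate=0.1):
    # Learning from errors: nudge each weight in proportion to the mistake.
    error = target - predict(inputs, weights, bias)
    new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

weights, bias = [0.0, 0.0], -0.5
inputs, target = [1.0, 2.0], 1

before = predict(inputs, weights, bias)               # 0 (wrong)
weights, bias = update(inputs, target, weights, bias)
after = predict(inputs, weights, bias)                # 1 (correct)
print(before, after)
```

Modern networks repeat this predict-compare-adjust loop across billions of weights, using gradient-based updates rather than this simple rule.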

## The Growth of AI Systems Through History

| Era | Technology | Scale | Real-World Example |
|-----|------------|-------|-------------------|
| **1950s-60s** | Single Perceptron | < 100 parameters | Sorting simple patterns |
| **1980s-90s** | Multilayer Networks | 1K-100K parameters | ATM check reading |
| **2000s-2010s** | Deep Neural Networks | 1M-100M parameters | Siri voice recognition |
| **2018-2020** | Early Transformers | 100M-10B parameters | BERT, GPT-2 |
| **2021-Present** | Modern LLMs | 10B-1T+ parameters | ChatGPT, Claude |
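
To get a feel for the "Scale" column, the sketch below counts the parameters of a fully connected network, where every layer contributes one weight per connection plus one bias per neuron. The layer sizes are illustrative assumptions, not the sizes of any system named in the table.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per connection
        total += n_out          # one bias per neuron in the next layer
    return total

single_perceptron = count_parameters([2, 1])        # 2 weights + 1 bias
small_network = count_parameters([784, 128, 10])    # e.g. 28x28 pixel inputs
print(single_perceptron)   # 3
print(small_network)       # 101770
```

Adding a single hidden layer already moves the count from a handful of parameters to roughly a hundred thousand, which is how parameter counts grow so quickly as networks get wider and deeper.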

## AI in Your Future Career

The AI systems you'll encounter (or perhaps help build) in your future will impact virtually every field:

* In healthcare, advanced AI systems help diagnose diseases from medical images or predict patient outcomes.
* Throughout education, personalized learning assistants adapt to individual student needs.
* Creative industries are being transformed by tools that help generate artwork, music, and writing.
* The field of engineering now relies on AI-powered design tools that suggest improvements to complex designs.
* Environmental scientists use AI models to predict climate patterns with greater precision.

Understanding the perceptron gives you insight into the foundation of all these advanced systems. The weights, biases, and learning algorithms you've studied are the building blocks of the AI revolution that will shape your future careers.

# Ethical Implications of AI: The Four Principles Framework

As artificial intelligence becomes increasingly integrated into our society, understanding its ethical implications is just as important as knowing how it works technically. The Four Principles framework, adapted from biomedical ethics, provides a useful lens through which to consider AI ethics.

* **Beneficence**: AI systems should be designed to benefit humanity and promote human well-being.
  * AI in healthcare to improve diagnoses and treatment plans
  * Educational AI to personalize learning for students
  * AI for environmental monitoring and climate solutions

* **Non-maleficence**: AI should avoid causing harm to individuals or society.
  * Preventing algorithmic bias in hiring, lending, and criminal justice
  * Ensuring AI-powered weapons have appropriate human oversight
  * Testing autonomous systems thoroughly before deployment

* **Autonomy**: Humans should retain control and agency when interacting with AI systems.
  * Transparency in how AI makes decisions
  * Ability for humans to override AI decisions when necessary
  * Informed consent when personal data is used to train AI

* **Justice**: The benefits and risks of AI should be distributed fairly.
  * Equal access to AI technologies across socioeconomic boundaries
  * Preventing monopolization of powerful AI by a few corporations
  * Ensuring diverse representation in training data and development teams

## Case Study: Student Success Predictor Ethics

| Principle | Consideration | Example |
|-----------|--------------|---------|
| **Beneficence** | How does it help students? | Identifies those who need additional support early |
| **Non-maleficence** | Could it harm some students? | May discourage students predicted to fail |
| **Autonomy** | Do students have control? | Students should know how predictions are made |
| **Justice** | Is it fair to everyone? | Must avoid favoring certain demographic groups |
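
One simple way to start checking the Justice row is to compare how often the predictor says "will pass" for each group of students. The sketch below uses a tiny made-up dataset with groups labeled "A" and "B"; the records, the group labels, and the function name are all hypothetical. A large gap does not prove unfairness by itself, but it is a signal to investigate further.

```python
from collections import defaultdict

def predicted_pass_rate_by_group(records):
    """Return the share of students predicted to pass, per group."""
    counts = defaultdict(lambda: [0, 0])   # group -> [predicted passes, total]
    for group, predicted_pass in records:
        counts[group][0] += int(predicted_pass)
        counts[group][1] += 1
    return {group: passes / total for group, (passes, total) in counts.items()}

# Made-up predictions: (group label, did the model predict "pass"?)
records = [("A", True), ("A", True), ("A", False), ("A", True),
           ("B", True), ("B", False), ("B", False), ("B", False)]

rates = predicted_pass_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates)   # {'A': 0.75, 'B': 0.25}
print(gap)     # 0.5
```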

# Conclusion: What We've Learned

In this introduction to artificial intelligence, we've covered a remarkable journey from basic concepts to advanced applications. Let's summarize the key concepts we've explored:

## Foundations of AI
* We defined **Artificial Intelligence** as computer systems that can perform tasks typically requiring human intelligence
* We traced AI's history from its formal beginning at the Dartmouth Conference in 1956 to modern breakthroughs
* We learned the difference between narrow AI (task-specific) and general AI (hypothetical human-level intelligence)

## The Perceptron
* We understood the **perceptron** as the fundamental building block of neural networks
* We explored its components: inputs, weights, bias, summation, and activation function
* We implemented a perceptron in Python to predict student test success
* We learned how to train perceptrons by adjusting weights based on prediction errors

## From Simple to Complex Neural Networks
* We discovered how connecting multiple perceptrons creates neural networks with hidden layers
* We saw how neural networks overcome limitations of single perceptrons, such as:
  * Creating non-linear decision boundaries
  * Solving complex problems like XOR
  * Making multiple predictions simultaneously

## Advanced Neural Network Architectures
* We explored **Convolutional Neural Networks (CNNs)** for image recognition
* We examined **Transformers** for natural language processing
* We compared the capabilities of simple perceptrons to these advanced architectures

## Practical Applications
* **Perceptrons**: Simple classification tasks like our student success predictor
* **CNNs**: Image recognition, medical diagnosis, self-driving cars
* **Transformers**: Language translation, content generation, question answering

As you continue your journey in computer science, remember that all these complex AI systems build upon the simple principles we've learned. The perceptron's elegance lies in its simplicity, yet it contains the seed of today's most revolutionary AI technologies.

## Practice Your Python - Object Quest (Perceptrons!)
You can run the following cell to practice building a perceptron.

In [None]:
!wget https://github.com/brendanpshea/computing_concepts_python/raw/main/object_quest/object_quest.py -q -nc
from object_quest import QuestSystem
QuestSystem("https://github.com/brendanpshea/computing_concepts_python/raw/main/object_quest/quests_perceptron.json")

## Review With Quizlet

In [None]:
%%html
<iframe src="https://quizlet.com/1043451203/learn/embed?i=psvlh&x=1jj1" height="700" width="100%" style="border:0"></iframe>

## Glossary

| Term | Definition |
|------|------------|
| Artificial Intelligence (AI) | The field of computer science focused on creating machines that can perform tasks that would normally require human intelligence, such as learning, problem-solving, and pattern recognition. |
| Machine Learning | A subset of AI focused on systems that improve their performance through experience without being explicitly programmed for each specific task. |
| Deep Learning | An advanced form of machine learning using neural networks with multiple layers to model complex patterns in data. |
| Narrow AI | AI systems designed for specific tasks like voice recognition or chess playing, without general intelligence across multiple domains. |
| General AI | A hypothetical form of AI with human-level intelligence across many domains, capable of understanding, learning, and applying knowledge across various tasks. |
| Perceptron | The fundamental building block of neural networks, a mathematical model of a biological neuron that takes multiple inputs and produces a single output based on those inputs. |
| Neural Network | A collection of interconnected perceptrons (or neurons) organized in layers that can learn complex patterns from data. |
| Weights | Values in a neural network that determine the importance of each input to the decision-making process, adjusted during training to improve performance. |
| Bias | An additional parameter in a neural network that allows the model to fit data better by shifting the activation function. |
| Activation Function | A mathematical function in neural networks that determines whether a neuron should "fire" (output a signal) based on the weighted sum of its inputs. |
| Training | The process of adjusting the weights and biases in a neural network to improve its predictions, typically using examples with known outcomes. |
| Learning Rate | A hyperparameter that determines how quickly a neural network's weights are adjusted during training, with smaller values producing more stable but slower learning. |
| Supervised Learning | A type of machine learning where the algorithm is trained on labeled examples, learning to map inputs to correct outputs. |
| Epochs | Complete passes through the entire training dataset during the neural network training process. |
| Convolutional Neural Network (CNN) | A specialized neural network architecture designed for image processing that uses convolutional layers to detect patterns in small regions of images. |
| Convolutional Layer | A component of CNNs that applies filters to detect features in images by scanning small patches for specific patterns. |
| Pooling Layer | A component of CNNs that reduces image size while preserving important information, making processing more efficient. |
| Transformer | A neural network architecture designed for sequential data processing (particularly language) that uses attention mechanisms to focus on relevant parts of the input. |
| Attention Mechanism | A component in transformer models that allows the network to focus on relevant parts of the input while ignoring irrelevant information. |
| Large Language Model (LLM) | An advanced transformer-based AI system trained on vast amounts of text data to understand and generate human language. |
| Pretraining | The initial training phase for large language models where they learn language patterns from trillions of words. |
| Fine-tuning | The process of specializing a pretrained model for particular abilities by training it on carefully selected data. |
| Beneficence | An ethical principle stating that AI systems should be designed to benefit humanity and promote human well-being. |
| Non-maleficence | An ethical principle stating that AI should avoid causing harm to individuals or society. |
| Decision Boundary | The line or surface that separates different classes in a classification problem, determined by a model's weights and biases. |