#### Chapter 8, Q4 (a) Do Exercise 40.8 in MacKay’s book (MacKay 2003). It is cited here as follows: Estimate in bits the total sensory experience that you have had in your life – visual information, auditory information, etc. Estimate how much information you have memorized. Estimate the information content of the works of Shakespeare. Compare these with the capacity of your brain assuming you have 10<sup>11</sup> neurons each making 1000 synaptic connections and that the (information) capacity result for one neuron (two bits per connection) applies. Is your brain full yet?

##### To estimate in bits the total sensory experience of a lifetime, one would have to consider the visual information perceived, the auditory information heard, the tactile sensations felt, and so on. For simplicity, we consider only visual and auditory information here.
<li> Visual Information: The human eye can perceive a wide range of visual stimuli, including colors, shapes, textures, and movement. Estimates suggest that the human eye can distinguish around 10 million colors and resolve details down to about 1 arcminute (roughly 0.017 degrees) of visual angle. <br> 

Let's assume an average human lifespan of 80 years, which is approximately 2.52 billion seconds.

Total Visual Information = Information per Frame * Frames per Second * Duration of Lifetime

Let's assume each pixel requires 3 bytes to represent color information (24-bit color depth, with 8 bits per channel for red, green, and blue). This is a common representation for standard RGB color. Given that the resolution of both eyes is estimated at 576 million pixels, one full-resolution frame holds:

Information per Frame ≈ 576 million pixels * 3 bytes/pixel ≈ **1.728 billion bytes** ≈ 1.382 * 10^10 bits

This is the size of a single snapshot, not a lifetime total. Assuming, conservatively, one such frame per second (an assumed rate; the effective rate of the visual system is uncertain), the lifetime total becomes:

Total Visual Information ≈ 1.382 * 10^10 bits/frame * 1 frame/second * 2.52 * 10^9 seconds ≈ **3.48 * 10^19 bits** (about 4.35 exabytes)


<li> Auditory Information: The human ear can detect sounds across a broad range of frequencies and amplitudes. The dynamic range of human hearing is estimated to be around 120 decibels, and the auditory system can perceive subtle differences in pitch, timbre, and spatial location. Let's make a rough estimate using a bitrate of 1 Mbps (which is conservative compared to high-quality audio formats): <br>

Total Auditory Information = Duration of Lifetime * Bitrate

= 2.52 billion seconds * 1 Mbps

= 2.52 * 10^15 bits

Dividing by 8 gives approximately 3.15 * 10^14 bytes. To convert this to terabytes, we divide by 10<sup>12</sup> (since 1 terabyte = 10<sup>12</sup> bytes):

Total Auditory Information in Terabytes ≈ 3.15 * 10^14 bytes / 10^12 ≈ **315 terabytes**

<li> Now, the brain's capacity follows from the given figures: Number of Neurons: 10^11 neurons,
Number of Synaptic Connections per Neuron: 1000,
Information Capacity per Synaptic Connection: 2 bits (as assumed)

So, Total information capacity of the brain = Number of neurons * Number of synaptic connections per neuron * Information capacity per synaptic connection

= 10^11 neurons * 1000 connections/neuron * 2 bits/connection

= 2 * 10^14 bits

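<li> Therefore, Total sensory experience ≈ 2.52 * 10^15 bits (for auditory, from 2.52 billion seconds * 1 Mbps)

Total sensory experience ≈ 1.382 * 10^10 bits per full-resolution frame (for visual, from 576 million pixels * 24 bits), or about 3.48 * 10^19 bits over a lifetime at an assumed one frame per second

Estimated Memorized Information ≈ 8 * 10^14 bits (a rough figure, assumed rather than derived here)

The estimated sensory experience far exceeds the theoretical capacity of the brain: the auditory total alone is about 13 times the 2 * 10^14 bits available, and the visual total is larger by several more orders of magnitude. The estimated memorized information is much smaller than the sensory totals, although at about 4 times the theoretical capacity it is still of the same order of magnitude.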

<li> However, this does not necessarily mean the brain is full, because the brain responds to changes in stimuli. For example, when it encounters the same information again, it recognizes the sight or sound from previously stored information instead of storing it again. Even when a stimulus is new, we often retain only specific details of it rather than a high-fidelity record. <br>
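
The arithmetic behind these estimates can be rechecked with a short Python sketch. The rate of one visual frame per second is an assumption made purely for illustration, and the variable names are illustrative; the remaining figures are the ones quoted in the text.

```python
# Rough lifetime sensory estimate versus brain capacity (all inputs are estimates).
lifetime_s = 2.52e9                        # 80 years, about 2.52 billion seconds

# Visual: 576 million pixels at 24 bits each gives the size of one frame.
bits_per_frame = 576e6 * 24                # about 1.382e10 bits
frames_per_second = 1                      # ASSUMPTION: illustrative rate only
visual_bits = bits_per_frame * frames_per_second * lifetime_s

# Auditory: 1 Mbps sustained over the whole lifetime.
audio_bits = 1e6 * lifetime_s              # 2.52e15 bits
audio_terabytes = audio_bits / 8 / 1e12    # bits -> bytes -> terabytes

# Brain: 1e11 neurons * 1000 synapses per neuron * 2 bits per synapse.
brain_bits = 1e11 * 1000 * 2               # 2e14 bits

for name, bits in [("visual", visual_bits), ("auditory", audio_bits)]:
    print(f"{name}: {bits:.3g} bits, {bits / brain_bits:.3g} x brain capacity")
print(f"auditory in terabytes: {audio_terabytes:.0f}")
```

Changing `frames_per_second` or the bitrate shifts the totals by the same factor, so the comparison with the 2 * 10^14-bit capacity is sensitive mainly to those two assumed rates.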

##### To estimate the information content of the works of Shakespeare, we need to consider the total number of words in his collected works and then convert this into bits based on some encoding scheme.

<li> Estimates suggest that Shakespeare's complete works contain around 900,000 to 1,000,000 words. Let's assume a simple encoding scheme where each word is represented by a unique code. For simplicity, we'll assume that each word requires an average of 8 bytes (64 bits) to represent, considering the variability in word length and any additional metadata.

Total information content of Shakespeare's works ≈ Number of Words * Size of Encoding per Word

= 900,000 words * 64 bits/word

= 57,600,000 bits

Comparing this with the theoretical capacity of the brain:

Total information capacity of the brain = 2 * 10^14 bits

Information content of Shakespeare's works = 57,600,000 bits

The information content of Shakespeare's works (about 5.76 * 10^7 bits) is therefore only about 3 * 10^-7 of the theoretical capacity of the brain (2 * 10^14 bits). On this comparison, the brain is far from being "full".
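
As a cross-check, the same estimate can be computed in Python alongside a character-based encoding. The figure of 6 characters per word (including the trailing space) at 8 bits per character is an assumption introduced here for comparison only; it is not part of the estimate above.

```python
# Shakespeare estimate under two simple encodings (both are rough sketches).
words = 900_000

# Encoding used in the text: a fixed 64-bit code per word.
bits_word_code = words * 64

# ASSUMPTION: about 6 characters per word including the space, 8 bits each.
chars_per_word = 6
bits_char_code = words * chars_per_word * 8

brain_bits = 2 * 10**14
fraction = bits_word_code / brain_bits

print(f"64-bit word codes: {bits_word_code:,} bits")
print(f"8-bit characters:  {bits_char_code:,} bits")
print(f"fraction of brain capacity: {fraction:.3g}")
```

Both encodings land in the tens of millions of bits, so the conclusion does not depend on the exact scheme chosen.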

#### Chapter 8, Q4 (b) Expand Algorithm 8 to work with more than one binary classification.

import numpy as np
import math

def calculate_mec(data, labels):
    """Upper-bound Memory-Equivalent Capacity (MEC) for any number of classes."""
    # Pair each row's feature sum with its label, then sort by the sum
    data_sums_with_labels = [(sum(datum), label) for datum, label in zip(data, labels)]
    data_sums_with_labels.sort(key=lambda x: x[0])

    # Count one threshold for every class change along the sorted sums.
    # A single counter works for any label values, not only 0..k-1 integers.
    total_thresholds = 0
    prev_label = data_sums_with_labels[0][1]
    for _, label in data_sums_with_labels:
        if label != prev_label:
            total_thresholds += 1
            prev_label = label

    # Calculate the minimum number of thresholds
    min_thresholds = math.log2(total_thresholds + 1)
    # Compute the Memory-Equivalent Capacity (MEC)
    feature_dimension = len(data[0])
    mec = (min_thresholds * (feature_dimension + 1)) + (min_thresholds + 1)

    return mec

#Test data
data = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]])
labels = np.array([0, 1, 0, 1, 2, 2, 2])
print("Memory Equivalent Capacity:", calculate_mec(data, labels))

Memory Equivalent Capacity: 13.92481250360578
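
To see where this value comes from, the intermediate quantities can be printed. The sketch below is a self-contained restatement of the same steps; the helper name `class_transitions` is hypothetical and used only for illustration.

```python
import math

def class_transitions(data, labels):
    """Return the labels ordered by feature sum and the number of label changes."""
    # sorted() is stable, so rows with equal sums keep their original order
    table = sorted(zip((sum(row) for row in data), labels), key=lambda entry: entry[0])
    ordered_labels = [label for _, label in table]
    changes = sum(a != b for a, b in zip(ordered_labels, ordered_labels[1:]))
    return ordered_labels, changes

data = [[0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
labels = [0, 1, 0, 1, 2, 2, 2]

ordered_labels, changes = class_transitions(data, labels)
min_thresholds = math.log2(changes + 1)
mec = min_thresholds * (len(data[0]) + 1) + (min_thresholds + 1)

print(ordered_labels)  # [2, 0, 0, 2, 1, 2, 1]
print(changes)         # 5
print(mec)
```

With 5 class changes, the minimum number of thresholds is log2(6) ≈ 2.585, and substituting into the formula with 3 features reproduces the value printed above.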


#### Chapter 8, Q4(c) Expand Algorithm 8 to work with regression.

import numpy as np
from math import log2

def calculate_mec_regression(data, targets, epsilon=0.1):
    data = np.array(data)  # Convert data to a NumPy array
    targets = np.array(targets)  # Ensure targets is also a NumPy array for consistency

    # Sort the targets; the features enter the result only through their dimension
    sorted_targets = np.sort(targets)
    feature_dimension = data.shape[1]

    # Start a new segment whenever a target differs by more than epsilon
    # from the target that opened the current segment
    num_segments = 0
    last_target = sorted_targets[0]
    for target in sorted_targets:
        if abs(target - last_target) > epsilon:
            num_segments += 1
            last_target = target

    # Calculate the Memory-Equivalent Capacity (MEC) based on the number of segments
    min_thresholds = log2(num_segments + 1)
    mec = (min_thresholds * (feature_dimension + 1)) + (min_thresholds + 1)

    return mec

data = [
    [0.5, 0.3], [1.2, 0.8], [2.0, 1.5],
    [2.5, 2.0], [3.0, 2.5], [3.5, 3.0],
    [4.0, 3.5], [4.5, 4.0], [5.0, 4.5]
]
targets = [0.6, 1.0, 1.7, 2.3, 2.8, 3.3, 3.7, 4.2, 4.6]


mec = calculate_mec_regression(data, targets, epsilon=0.5)
print("Memory-equivalent capacity for regression:", mec)

Memory-equivalent capacity for regression: 10.287712379549449
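
The function above orders the samples by target value. A variant closer to the classification version would order them by feature sum and then count jumps in the target larger than epsilon. The sketch below is one possible reading of that idea, not a method taken from the book; the function name `mec_regression_by_feature_sum` is hypothetical. It uses only the standard library.

```python
from math import log2

def mec_regression_by_feature_sum(data, targets, epsilon=0.1):
    """Hypothetical variant: sort rows by feature sum, count target jumps > epsilon."""
    # Order the row indices by the sum of each row's features (sorted() is stable)
    order = sorted(range(len(data)), key=lambda i: sum(data[i]))
    ordered_targets = [targets[i] for i in order]

    # A new segment starts when the target moves more than epsilon away
    # from the target that opened the current segment
    num_segments = 0
    anchor = ordered_targets[0]
    for target in ordered_targets:
        if abs(target - anchor) > epsilon:
            num_segments += 1
            anchor = target

    min_thresholds = log2(num_segments + 1)
    feature_dimension = len(data[0])
    return min_thresholds * (feature_dimension + 1) + (min_thresholds + 1)

data = [
    [0.5, 0.3], [1.2, 0.8], [2.0, 1.5],
    [2.5, 2.0], [3.0, 2.5], [3.5, 3.0],
    [4.0, 3.5], [4.5, 4.0], [5.0, 4.5]
]
targets = [0.6, 1.0, 1.7, 2.3, 2.8, 3.3, 3.7, 4.2, 4.6]

result = mec_regression_by_feature_sum(data, targets, epsilon=0.5)
print("MEC (sorted by feature sum):", result)
```

On this sample, features and targets increase together, so both orderings coincide and the result matches the value printed above. The two versions differ only when the feature-sum order and the target order disagree.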
