# 7.1 Synaptic Plasticity, Hebb's Rule, and Statistical Learning

#### **Hebbian Learning**

**Key Idea**: *"Cells that fire together, wire together."* This principle suggests that the synaptic strength between two neurons increases when they are activated simultaneously.

- **Hebb's Rule**: If neuron A repeatedly and persistently excites neuron B, the synapse between them strengthens.
- **Mathematical Formulation**: 
  $$
  \Delta w_{ij} = \eta \cdot x_i \cdot x_j
  $$
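  where $w_{ij}$ is the synaptic weight between neurons $i$ and $j$, $\eta$ is the learning rate, and $x_i$ and $x_j$ are the activities of the two neurons. With non-negative activities this update can only increase the weights, which is why normalized variants such as Oja's rule are needed.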
  
- **Applications**: 
  - Unsupervised learning
  - Development of associative memories
  - Extraction of principal components (PCA), e.g., via Oja's or Sanger's rule
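
The Hebbian update above can be sketched in a few lines of NumPy. This is only an illustration: the function name `hebbian_step`, the two-by-two network, and the activity values are assumptions made for the example, with `w[i, j]` taken to be the weight from presynaptic unit `j` onto postsynaptic unit `i`.

```python
import numpy as np

def hebbian_step(w, x_pre, x_post, eta=0.1):
    """One Hebbian update: delta_w[i, j] = eta * x_post[i] * x_pre[j]."""
    return w + eta * np.outer(x_post, x_pre)

# Two presynaptic and two postsynaptic units (illustrative values).
w = np.zeros((2, 2))
x_pre = np.array([1.0, 0.0])   # only presynaptic unit 0 is active
x_post = np.array([1.0, 1.0])  # both postsynaptic units are active

for _ in range(5):
    w = hebbian_step(w, x_pre, x_post)

print(w)  # only the weights from the active presynaptic unit have grown
```

Weights from the silent presynaptic unit stay at zero, while the co-active pairs keep growing with every repetition, which shows why the plain rule is unbounded.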
  
#### **Synaptic Plasticity**

Synaptic plasticity refers to the ability of synapses to strengthen or weaken over time, based on the activity levels of the neurons.

**Key Types**:
1. **Long-Term Potentiation (LTP)**: A persistent strengthening of synapses based on recent patterns of activity. It is often regarded as a major cellular mechanism behind learning and memory.
   - *Induced by*: High-frequency stimulation
   - *Outcome*: Increases synaptic efficacy
   
2. **Long-Term Depression (LTD)**: A persistent weakening of synapses based on recent patterns of activity.
   - *Induced by*: Prolonged, low-frequency stimulation
   - *Outcome*: Decreases synaptic efficacy

**Mechanisms**:
- **Spike-Timing Dependent Plasticity (STDP)**: 
  - If the presynaptic neuron fires shortly before the postsynaptic neuron (within a window of roughly tens of milliseconds), synaptic strength increases (LTP).
  - If the postsynaptic neuron fires shortly before the presynaptic neuron, synaptic strength decreases (LTD).
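
One common way to model the STDP rule above is an exponential window over the spike-time difference. The sketch below is such a model, not something specified in these notes: the function name `stdp_delta_w`, the amplitudes `a_plus` and `a_minus`, and the time constants are all assumed values chosen for illustration.

```python
import numpy as np

def stdp_delta_w(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms.

    dt > 0 (pre before post) gives potentiation, dt < 0 gives depression.
    All parameter values are illustrative assumptions.
    """
    dt = np.asarray(dt, dtype=float)
    ltp = a_plus * np.exp(-np.abs(dt) / tau_plus)     # used where dt > 0
    ltd = -a_minus * np.exp(-np.abs(dt) / tau_minus)  # used where dt < 0
    return np.where(dt > 0, ltp, np.where(dt < 0, ltd, 0.0))

# Pre leads post by 10 ms, post leads pre by 10 ms, and a wide gap of 100 ms.
print(stdp_delta_w([10.0, -10.0, 100.0]))
```

The change is largest for nearly coincident spikes and fades as the two spikes move further apart in time.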

**Biological Mechanisms**:
- **NMDA Receptors**: Key for LTP induction; they act as coincidence detectors, allowing calcium influx only when presynaptic glutamate release coincides with postsynaptic depolarization (which relieves the receptor's magnesium block).
- **AMPA Receptors**: LTP results in more AMPA receptors being inserted into the postsynaptic membrane, increasing synaptic strength.

#### **Mathematical Models of Synaptic Plasticity**

1. **Oja's Rule**:
   - A modification of Hebbian learning that introduces weight normalization to prevent weights from growing indefinitely.
   - **Formula**:
     $$
     \Delta w_i = \eta \cdot (y \cdot x_i - y^2 \cdot w_i)
     $$
     where $y = \sum_i w_i x_i$ is the output of the (linear) postsynaptic neuron.

2. **BCM Model** (Bienenstock, Cooper, Munro):
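   - Accounts for both LTP and LTD: postsynaptic activity above a threshold strengthens the synapse, and activity below it weakens the synapse.
   - The threshold is not fixed: it slides as a function of the time-averaged postsynaptic activity (in common formulations, the average of its square), so a highly active neuron becomes harder to potentiate.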
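
To see how Oja's rule keeps the weights bounded, the following sketch applies the update to synthetic two-dimensional data and compares the learned weight vector with the leading eigenvector of the input covariance. The covariance matrix, learning rate, and sample count are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean 2-D inputs with most of their variance along the direction (1, 1).
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.005  # small learning rate (assumed value)

for x in X:
    y = w @ x                      # y = sum_i w_i x_i
    w += eta * (y * x - y**2 * w)  # Oja's rule

# Leading eigenvector of the true covariance (eigh sorts eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, -1]

weight_norm = np.linalg.norm(w)
alignment = abs(w @ pc1)  # close to 1 when w points along the first principal component
print(weight_norm, alignment)
```

The decay term $-y^2 w_i$ is what holds the norm of the weight vector near 1; dropping it recovers the plain Hebbian rule, under which the weights grow without bound.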

#### **Principal Component Analysis (PCA)**

**Key Idea**: PCA reduces the dimensionality of data by projecting it onto directions that capture the most variance.

- **Relation to Hebbian Learning**:
  - Hebbian learning can be used for unsupervised learning of the principal components of the input data.
  - **Sanger's Rule**: An extension of Oja's rule to several output neurons that extracts multiple principal components in order of decreasing variance.
  
- **Process**:
  1. Center the data by subtracting the mean.
  2. Compute the covariance matrix of the centered data.
  3. Compute its eigenvectors (the principal components) and eigenvalues, and keep the eigenvectors with the largest eigenvalues.
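
The three steps above translate directly into NumPy. This is a minimal sketch on synthetic data; the function name `pca` and the test data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    X has shape (n_samples, n_features).
    """
    X_centered = X - X.mean(axis=0)                    # step 1: center
    cov = np.cov(X_centered, rowvar=False)             # step 2: covariance
    eigvals, eigvecs = np.linalg.eigh(cov)             # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_components]   # largest variance first
    components = eigvecs[:, order]
    return X_centered @ components, components, eigvals[order]

rng = np.random.default_rng(0)
# Three features with very different spreads (standard deviations 5, 1, 0.2).
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.2])
Z, components, variances = pca(X, n_components=2)
print(Z.shape, variances)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sorting.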

---

### **Short Notes for Quick Review**

- **Hebbian Learning**: Synapses strengthen when pre- and postsynaptic neurons fire together.
- **LTP**: Synaptic strengthening due to high-frequency stimulation.
- **LTD**: Synaptic weakening due to low-frequency stimulation.
- **STDP**: Timing-dependent synaptic plasticity; pre-before-post (LTP), post-before-pre (LTD).
- **NMDA receptors**: Crucial for LTP induction, require both presynaptic activity and postsynaptic depolarization.
- **Oja's Rule**: Prevents weight explosion in Hebbian learning by introducing weight normalization.
- **BCM Model**: Balances LTP and LTD with a sliding threshold.
- **PCA**: Projects data onto axes that capture the most variance; related to Hebbian learning.



# 7.2 Introduction to Unsupervised Learning

1. **Competitive Learning:**
   - In a competitive learning neural network, each neuron is associated with a weight vector, and given an input, the neuron whose weight vector is closest to the input (i.e., the one with the highest activity) is declared the "winner."
   - The winner's weight vector is updated to move closer to the input. This amounts to a form of online clustering (closely related to k-means), where neurons self-organize around input data clusters.
   - Over time, neurons specialize in different regions of the input space, and the network partitions the data into clusters.

2. **Self-Organizing Maps (SOM):**
   - Similar to competitive learning, but with an important difference: not only the winner's weights are updated, but also the weights of neighboring neurons (those located close to the winner on a 2D grid).
   - The aim is to preserve the topological relationships of the input data, mapping a potentially high-dimensional dataset to a lower-dimensional grid while maintaining the relative structure of the data.

3. **Unsupervised Learning and Generative Models:**
   - In the generative-model view, unsupervised learning assumes that the input data is generated by hidden causes, and the goal is to infer these causes (often through a probabilistic model).
   - In the case of clustering, you can model the data as being generated by a mixture of Gaussians, with each cluster represented by a Gaussian distribution.
   - The task is to learn the parameters of the Gaussians (means, variances, and priors) that best describe the data.

4. **Expectation-Maximization (EM) Algorithm:**
   - EM is a batch learning algorithm used to estimate the parameters of generative models.
   - It alternates between two steps:
     - **E-step:** Calculate the posterior probability that each data point belongs to each cluster (soft assignment).
     - **M-step:** Update the parameters (mean, variance, and prior) of each Gaussian based on the soft assignments from the E-step.
   - Over multiple iterations, the estimates of the cluster parameters improve, allowing the model to better represent the data.
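
The E-step and M-step described above can be sketched for a one-dimensional mixture of Gaussians as follows. The function name `em_gmm_1d`, the quantile-based initialisation, and the synthetic data are assumptions of this example; practical implementations usually add safeguards such as a variance floor and a convergence check.

```python
import numpy as np

def em_gmm_1d(x, n_clusters=2, n_iter=50):
    """Fit a 1-D mixture of Gaussians with batch EM."""
    # Crude initialisation (an assumption of this sketch): spread means over quantiles.
    means = np.quantile(x, np.linspace(0.1, 0.9, n_clusters))
    variances = np.full(n_clusters, x.var())
    priors = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each data point.
        diff = x[:, None] - means[None, :]
        likelihood = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)
        weighted = likelihood * priors
        resp = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: re-estimate mean, variance, and prior from the soft assignments.
        n_k = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / n_k
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / n_k
        priors = n_k / len(x)

    return means, variances, priors

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(4.0, 0.5, 200)])
means, variances, priors = em_gmm_1d(x)
print(means, variances, priors)
```

Each data point contributes to every cluster in proportion to its posterior probability, which is what distinguishes this soft assignment from the hard winner-take-all assignment of competitive learning.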

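The winner-take-all update described under competitive learning can be sketched as an online loop. The data, the learning rate, and the hand-picked initial weights are assumptions for this example; a self-organizing map would additionally move the winner's grid neighbors toward the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D clusters (synthetic data).
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(200, 2)) for c in centers])
rng.shuffle(X)  # present the inputs in random order

# One weight vector per neuron; arbitrary starting guesses.
W = np.array([[1.0, 1.0], [4.0, 3.0]])
eta = 0.05

for x in X:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # closest weight vector wins
    W[winner] += eta * (x - W[winner])                 # move only the winner toward x

print(W)  # each row ends up near one cluster center
```

Because only the winner is updated, each neuron ends up specializing in the region of input space where it wins most often.
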

# 7.3 Sparse Coding and Predictive Coding

#### Principal Component Analysis (PCA) and Eigenfaces

- **PCA Overview**:
  - PCA can be used to represent natural images through eigenvectors of the input covariance matrix.
  - When PCA is applied to face images, the eigenvectors of the covariance matrix are known as "eigenfaces".
  - A face image can then be represented (after subtracting the mean image) as a linear combination of these eigenfaces.

- **Dimensionality Reduction**:
  - Using the top M eigenvectors (principal components) allows for significant dimensionality reduction.
  - Example: An image of size $1000 \times 1000$ pixels (1 million pixels) can be approximated by its coefficients on only a few eigenfaces (e.g., 10 numbers instead of 1 million), given that the eigenfaces themselves are shared across all images.

- **Limitations**:
  - PCA (eigenvectors) is not effective for extracting local components (e.g., eyes, nose) or for detecting features like edges in natural scenes.
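
The compression idea above can be sketched with an SVD of the centered image matrix. The data here are synthetic stand-ins rather than real face images, and the sizes (200 images of $32 \times 32$ pixels, $M = 5$) are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a face dataset: 200 "images" of 32 x 32 = 1024 pixels that lie
# close to a 5-dimensional subspace (synthetic data, not real faces).
n_images, n_pixels, true_rank = 200, 32 * 32, 5
basis = rng.normal(size=(true_rank, n_pixels))
images = rng.normal(size=(n_images, true_rank)) @ basis
images += 0.01 * rng.normal(size=images.shape)

mean_image = images.mean(axis=0)
centered = images - mean_image

# Rows of Vt are the eigenvectors of the pixel covariance matrix.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

M = 5
eigenimages = Vt[:M]                 # shape (M, n_pixels)
coeffs = centered @ eigenimages.T    # M numbers per image instead of 1024
reconstructed = mean_image + coeffs @ eigenimages

rel_error = np.linalg.norm(images - reconstructed) / np.linalg.norm(images)
print(coeffs.shape, rel_error)
```

Each image is stored as `M` coefficients plus the shared mean image and eigenimages. The reconstruction is nearly exact here only because the synthetic data were built to be low-dimensional; real images compress less cleanly.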

#### Sparse Coding

- **Linear Model**:
  - Represent natural scenes as a linear combination of basis vectors or features.
  - Basis vectors are not limited to eigenvectors; they can be more general features.

- **Generative Model**:
  - Defined by specifying a prior probability distribution for causes and a likelihood function.
  - Likelihood function: Assumes Gaussian noise, which leads to a quadratic reconstruction-error term in the log-likelihood.

- **Sparse Representation**:
  - Assumes that only a few causes (basis vectors) are active at a time.
  - Sparse (super-Gaussian) distributions have a sharp peak at zero and heavy tails, so most causes are near zero and a few are large.
  - Examples: Exponential distribution, Cauchy distribution.

- **Learning Basis Vectors (Matrix G)**:
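  - Use a Bayesian approach: find the causes $v$ and the basis matrix $G$ that maximize the posterior probability $p(v \mid u; G)$ of the causes given the input $u$.
  - The function $F$ to be maximized combines a reconstruction-error term (from the Gaussian likelihood) and a sparseness term (from the log prior on $v$).
  - Alternate between gradient ascent on $v$ with $G$ fixed and gradient ascent on $G$ with $v$ fixed, which resembles the E- and M-steps of the EM algorithm.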

- **Comparison with PCA**:
  - Sparse coding provides a sparser representation than PCA.
  - When trained on natural image patches, the learned basis vectors are localized and oriented, resembling receptive fields in the primary visual cortex.
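
The alternating optimization for sparse coding can be sketched as follows, assuming an objective of the form $F = -\tfrac{1}{2}\lVert u - Gv \rVert^2 + \lambda \sum_i \log p(v_i)$ with a Cauchy-like prior. The weighting `lam`, the step sizes, the random single input `u`, and the function names are all assumptions of this example; a real training run would average the update of `G` over many image patches.

```python
import numpy as np

def log_prior(v):
    """Log of a Cauchy-like sparse prior, up to a constant (an assumed choice)."""
    return -np.log1p(v**2)

def log_prior_grad(v):
    return -2.0 * v / (1.0 + v**2)

def objective(u, G, v, lam):
    """F = -(1/2) * ||u - G v||^2 + lam * sum_i log p(v_i)."""
    return -0.5 * np.sum((u - G @ v) ** 2) + lam * np.sum(log_prior(v))

rng = np.random.default_rng(0)
n_pixels, n_causes, lam = 16, 8, 0.1
u = rng.normal(size=n_pixels)                    # one input "patch" (synthetic)
G = 0.1 * rng.normal(size=(n_pixels, n_causes))  # columns are basis vectors
v = np.zeros(n_causes)

history = [objective(u, G, v, lam)]
for _ in range(10):
    # Inference: gradient ascent on the causes v with G held fixed.
    for _ in range(20):
        residual = u - G @ v
        v += 0.1 * (G.T @ residual + lam * log_prior_grad(v))
    # Learning: one gradient ascent step on G with v held fixed.
    residual = u - G @ v
    G += 0.01 * np.outer(residual, v)
    history.append(objective(u, G, v, lam))

print(history[0], history[-1])  # F increases as v and G are refined
```

The update for `G` has a Hebbian form, the outer product of the residual error and the causes. Both updates are driven by the residual $u - Gv$, which is the same error signal that predictive coding networks send forward.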

#### Predictive Coding Networks

- **Concept**:
  - Use feedback connections for predictions and feedforward connections for error signals.
  - Includes recurrent weights for modeling time-varying inputs.

- **Predictive Coding Model**:
  - Explains feedforward and feedback connections in the visual cortex.
  - Feedback conveys predictions; feedforward conveys the error between prediction and actual input.

- **Applications**:
  - Helps explain contextual effects, surround suppression, and other phenomena in the visual cortex.

- **Further Study**:
  - Supplementary materials and papers for more details on predictive coding networks and related concepts.