# Linear Algebra for Data Science

## Learning Objectives
* Understand vectors, matrices, the dot product, and matrix multiplication
* Apply these concepts to PCA and neural networks

## Table of Contents
1. [Scalars, Vectors, Matrices, Tensors](#section1)
2. [Matrix Operations](#section2)
3. [Eigenvalues and Eigenvectors](#section3)
4. [Programming Exercise: Matrix Operations](#section4)
5. [Programming Exercise: PCA Implementation](#section5)

<a id='section1'></a>
## 1. Scalars, Vectors, Matrices, Tensors

### Scalars
- A scalar is a single number
- Denoted in lowercase italics: $a, b, c$
- Examples: Temperature (25°C), Weight (65 kg)

### Vectors
- A vector is an ordered array of numbers
- Can be thought of as points in space or directions
- Denoted in bold lowercase: $\mathbf{a}, \mathbf{b}, \mathbf{c}$ or with an arrow: $\vec{a}$
- Examples: 
  - 2D vector: $\mathbf{v} = [3, 4]$ (a point in 2D space)
  - 3D vector: $\mathbf{v} = [3, 4, 5]$ (a point in 3D space)

#### Vector operations
- **Addition**: Add corresponding elements
  - $\mathbf{a} + \mathbf{b} = [a_1 + b_1, a_2 + b_2, ..., a_n + b_n]$
- **Scalar multiplication**: Multiply each element by scalar
  - $c\mathbf{a} = [c \cdot a_1, c \cdot a_2, ..., c \cdot a_n]$
- **Dot product**: Sum of element-wise products
  - $\mathbf{a} \cdot \mathbf{b} = a_1b_1 + a_2b_2 + ... + a_nb_n = \sum_{i=1}^{n} a_i b_i$
- **Norm (magnitude)**: Length of the vector
  - $\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + ... + a_n^2} = \sqrt{\mathbf{a} \cdot \mathbf{a}}$
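
The four vector operations above can be sketched with NumPy arrays. This is only an illustration: the vectors `a` and `b` are arbitrary example values, not data from the notebook.

```python
import numpy as np

# Two example 2D vectors (arbitrary illustrative values)
a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

vec_sum = a + b            # element-wise addition: [3+1, 4+2]
scaled = 2 * a             # scalar multiplication: [2*3, 2*4]
dot = np.dot(a, b)         # dot product: 3*1 + 4*2
norm = np.linalg.norm(a)   # Euclidean norm: sqrt(3^2 + 4^2)

print("a + b =", vec_sum)
print("2a    =", scaled)
print("a . b =", dot)
print("||a|| =", norm)
```

The norm can also be recovered as `np.sqrt(np.dot(a, a))`, which mirrors the last identity in the list.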

### Matrices
- A matrix is a 2D array of numbers
- Denoted in bold uppercase: $\mathbf{A}, \mathbf{B}, \mathbf{C}$
- Size described as rows × columns
- Example: 
  - $\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ is a 2×3 matrix

#### Matrix notation
- $a_{ij}$ refers to the element in the $i$-th row and $j$-th column
- $\mathbf{A}_{i,:}$ refers to the $i$-th row of $\mathbf{A}$
- $\mathbf{A}_{:,j}$ refers to the $j$-th column of $\mathbf{A}$
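
As a rough illustration, this notation maps onto NumPy indexing as follows. Note that the mathematical notation is 1-based while NumPy is 0-based, so $a_{13}$ corresponds to `A[0, 2]`.

```python
import numpy as np

# The 2x3 example matrix from above
A = np.array([[1, 2, 3],
              [4, 5, 6]])

element = A[0, 2]      # a_{13}: first row, third column
first_row = A[0, :]    # A_{1,:}: the first row
second_col = A[:, 1]   # A_{:,2}: the second column

print("a_13    =", element)
print("A_{1,:} =", first_row)
print("A_{:,2} =", second_col)
```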

### Tensors
- Tensors are generalizations of matrices to higher dimensions
- A scalar is a 0th-order tensor
- A vector is a 1st-order tensor
- A matrix is a 2nd-order tensor
- Higher-order tensors have more dimensions
- Common in deep learning for representing complex data like images (3D tensors: height × width × channels) or videos (4D tensors: frames × height × width × channels)
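
In NumPy, the order of a tensor corresponds to the array's `ndim`. The sketch below uses made-up shapes (a 32×32 RGB image and a 10-frame clip) purely for illustration.

```python
import numpy as np

scalar = np.array(5.0)               # 0th-order: no axes
vector = np.array([1.0, 2.0, 3.0])   # 1st-order: one axis
matrix = np.zeros((2, 3))            # 2nd-order: rows x columns
image = np.zeros((32, 32, 3))        # hypothetical image: height x width x channels
video = np.zeros((10, 32, 32, 3))    # hypothetical video: frames x height x width x channels

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("image", image), ("video", video)]:
    print(f"{name}: order={t.ndim}, shape={t.shape}")
```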

<a id='section2'></a>
## 2. Matrix Operations

### Matrix Addition and Subtraction
- Element-wise operation between matrices of the same dimensions
- $\mathbf{C} = \mathbf{A} + \mathbf{B}$ means $c_{ij} = a_{ij} + b_{ij}$

### Scalar Multiplication
- Multiply each element by a scalar
- $\mathbf{C} = c\mathbf{A}$ means $c_{ij} = c \cdot a_{ij}$

### Matrix Multiplication
- Not element-wise: each element of the result is the dot product of a row of $\mathbf{A}$ with a column of $\mathbf{B}$
- For $\mathbf{C} = \mathbf{A} \mathbf{B}$:
  - $\mathbf{A}$ must have dimensions $m \times n$
  - $\mathbf{B}$ must have dimensions $n \times p$
  - $\mathbf{C}$ will have dimensions $m \times p$
  - Each element $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$
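
A small sketch of the shape rule, using arbitrary example matrices: a 2×3 matrix times a 3×2 matrix gives a 2×2 result, while reversing the order gives a 3×3 result, so the order of the factors matters.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2x3 (m=2, n=3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])         # 3x2 (n=3, p=2)

C = A @ B                      # 2x2 result (m x p)

# c_{11} computed directly from the summation formula
c_11 = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))

print("A @ B =\n", C)
print("c_11 from the sum:", c_11)
print("Shape of B @ A:", (B @ A).shape)
```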

### Matrix Transpose
- Flips a matrix over its diagonal
- If $\mathbf{B} = \mathbf{A}^T$, then $b_{ij} = a_{ji}$
- Changes an $m \times n$ matrix to an $n \times m$ matrix

### Identity Matrix
- Denoted as $\mathbf{I}$
- Square matrix with 1's on the diagonal and 0's elsewhere
- Property: $\mathbf{A} \mathbf{I} = \mathbf{I} \mathbf{A} = \mathbf{A}$
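
The transpose and identity properties can be checked numerically. The matrices below are arbitrary examples chosen for this sketch.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])       # 2x3
A_T = A.T                       # 3x2, with A_T[j, i] == A[i, j]

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # square matrix for the identity check
I = np.eye(2)                   # 2x2 identity

print("Shape of A:", A.shape, "-> shape of A^T:", A_T.shape)
print("M I == M:", np.array_equal(M @ I, M))
print("I M == M:", np.array_equal(I @ M, M))
```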

### Matrix Inverse
- Only defined for square matrices
- $\mathbf{A}^{-1}$ is the inverse of $\mathbf{A}$ if $\mathbf{A} \mathbf{A}^{-1} = \mathbf{A}^{-1} \mathbf{A} = \mathbf{I}$
- Not all square matrices have inverses; a square matrix without an inverse is called singular

### Matrix Determinant
- Only defined for square matrices
- Denoted as $\det(\mathbf{A})$ or $|\mathbf{A}|$
- A matrix is invertible if and only if its determinant is non-zero
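
The link between the determinant and invertibility can be sketched with `np.linalg.det` and `np.linalg.inv`. The matrices `A` (invertible) and `S` (singular, since its second row is twice the first) are example values picked for this illustration.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])      # det = 4*6 - 7*2 = 10, so A is invertible
A_inv = np.linalg.inv(A)

print("det(A) =", np.linalg.det(A))
print("A @ A_inv is I:", np.allclose(A @ A_inv, np.eye(2)))

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])      # rows are linearly dependent, det = 0
det_S = np.linalg.det(S)

inverse_failed = False
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    # NumPy reports a singular matrix instead of returning an inverse
    inverse_failed = True

print("det(S) =", det_S)
print("Inverting S failed:", inverse_failed)
```

In floating-point arithmetic a determinant is rarely exactly zero for a nearly singular matrix, so in practice a tolerance check such as `np.isclose` is usually a safer test than `== 0`.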

### Trace
- Sum of diagonal elements
- $\text{tr}(\mathbf{A}) = \sum_{i} a_{ii}$
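
A minimal check of the trace formula against `np.trace`, using an arbitrary 2×2 example:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# Sum the diagonal elements a_{ii} by hand
manual_trace = sum(A[i, i] for i in range(A.shape[0]))

print("Manual trace:", manual_trace)
print("np.trace(A): ", np.trace(A))
```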

<a id='section3'></a>
## 3. Eigenvalues and Eigenvectors

### Definition
- An eigenvector $\mathbf{v}$ of a square matrix $\mathbf{A}$ is a non-zero vector that, when multiplied by $\mathbf{A}$, results in a scalar multiple of itself
- The scalar is called the eigenvalue $\lambda$
- Mathematically: $\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$

### Calculating Eigenvalues and Eigenvectors
1. Rearrange to $\mathbf{A}\mathbf{v} - \lambda \mathbf{v} = \mathbf{0}$
2. Factor out $\mathbf{v}$: $(\mathbf{A} - \lambda \mathbf{I})\mathbf{v} = \mathbf{0}$
3. For non-trivial solutions, $\det(\mathbf{A} - \lambda \mathbf{I}) = 0$
4. Solve this characteristic equation to find eigenvalues
5. For each eigenvalue, solve $(\mathbf{A} - \lambda \mathbf{I})\mathbf{v} = \mathbf{0}$ to find eigenvectors
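
The procedure above can be traced on a small example. For a 2×2 matrix the characteristic equation reduces to $\lambda^2 - \text{tr}(\mathbf{A})\lambda + \det(\mathbf{A}) = 0$, so the sketch below solves that polynomial with `np.roots` and compares the result with `np.linalg.eig`. The matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Steps 3-4: for a 2x2 matrix, det(A - lambda*I) = lambda^2 - tr(A)*lambda + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues_from_poly = np.sort(np.roots(coeffs))

# Library routine for comparison; eigenvectors are returned as columns
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Roots of characteristic polynomial:", eigenvalues_from_poly)
print("np.linalg.eig eigenvalues:         ", np.sort(eigenvalues))

# Step 5 check: each pair should satisfy A v = lambda v
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(f"lambda={lam:.1f}: A v == lambda v ->", np.allclose(A @ v, lam * v))
```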

### Importance in Data Science
- Eigendecomposition (for a diagonalizable matrix): $\mathbf{A} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{-1}$
  - Where $\mathbf{Q}$ contains eigenvectors as columns
  - $\mathbf{\Lambda}$ is a diagonal matrix of eigenvalues
- Applications:
  - Principal Component Analysis (PCA)
  - Dimensionality reduction
  - PageRank algorithm (Google's original ranking algorithm)
  - Covariance matrices in statistics
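
As a sanity check on the eigendecomposition formula, the sketch below rebuilds an example matrix from its eigenvectors and eigenvalues. The matrix is an arbitrary diagonalizable example, not taken from any dataset.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, Q = np.linalg.eig(A)   # columns of Q are eigenvectors
Lambda = np.diag(eigenvalues)       # diagonal matrix of eigenvalues

# Reconstruct A = Q Lambda Q^{-1}
A_rebuilt = Q @ Lambda @ np.linalg.inv(Q)

print("Eigenvalues:", eigenvalues)
print("Reconstruction matches A:", np.allclose(A_rebuilt, A))
```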

<a id='section4'></a>
## 4. Programming Exercise: Matrix Operations

Let's implement some basic matrix operations from scratch, without using NumPy's matrix functions.

In [None]:
# First, let's import NumPy for creating arrays, but we'll implement operations ourselves
import numpy as np

# Define some matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)

In [None]:
# Implement matrix addition
def matrix_add(A, B):
    # Check if dimensions match
    if A.shape != B.shape:
        raise ValueError("Matrices must have the same dimensions for addition")
    
    # Initialize result matrix with zeros
    result = np.zeros(A.shape)
    
    # Add corresponding elements
    for i in range(A.shape[0]):  # rows
        for j in range(A.shape[1]):  # columns
            result[i, j] = A[i, j] + B[i, j]
            
    return result

# Test matrix addition
C = matrix_add(A, B)
print("A + B =")
print(C)

# Verify with NumPy
print("\nVerify with NumPy:")
print(A + B)

In [None]:
# Implement matrix multiplication
def matrix_multiply(A, B):
    # Check if dimensions are compatible
    if A.shape[1] != B.shape[0]:
        raise ValueError("Number of columns in A must equal number of rows in B")
    
    # Initialize result matrix with zeros
    result = np.zeros((A.shape[0], B.shape[1]))
    
    # Multiply matrices
    for i in range(A.shape[0]):  # rows of A
        for j in range(B.shape[1]):  # columns of B
            for k in range(A.shape[1]):  # columns of A / rows of B
                result[i, j] += A[i, k] * B[k, j]
                
    return result

# Test matrix multiplication
D = matrix_multiply(A, B)
print("A × B =")
print(D)

# Verify with NumPy
print("\nVerify with NumPy:")
print(np.matmul(A, B))

In [None]:
# Implement matrix transpose
def matrix_transpose(A):
    # Initialize result matrix with zeros
    result = np.zeros((A.shape[1], A.shape[0]))
    
    # Transpose matrix
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            result[j, i] = A[i, j]
            
    return result

# Test matrix transpose
A_T = matrix_transpose(A)
print("A^T =")
print(A_T)

# Verify with NumPy
print("\nVerify with NumPy:")
print(A.T)

In [None]:
# Implement dot product
def dot_product(v1, v2):
    # Check if dimensions match
    if len(v1) != len(v2):
        raise ValueError("Vectors must have the same dimension")
    
    # Calculate dot product
    result = 0
    for i in range(len(v1)):
        result += v1[i] * v2[i]
            
    return result

# Test dot product
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot = dot_product(v1, v2)
print(f"Dot product of {v1} and {v2} = {dot}")

# Verify with NumPy
print(f"Verify with NumPy: {np.dot(v1, v2)}")

<a id='section5'></a>
## 5. Programming Exercise: PCA Implementation

Principal Component Analysis (PCA) is a dimensionality reduction technique widely used in machine learning. We'll implement it from scratch to understand the underlying linear algebra concepts.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load dataset for testing
iris = load_iris()
X = iris.data
y = iris.target

print(f"Dataset shape: {X.shape}")
print(f"First few samples:\n{X[:3]}")

In [None]:
def pca_from_scratch(X, num_components):
    """
    Perform Principal Component Analysis (PCA) from scratch
    
    Parameters:
    X: Input data matrix (samples × features)
    num_components: Number of principal components to return
    
    Returns:
    X_transformed: Transformed data (samples × num_components)
    eigenvectors: Principal components (features × num_components)
    eigenvalues: Variance explained by each component
    """
    # 1. Standardize the data
    X_std = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
    
    # 2. Calculate covariance matrix
    cov_matrix = np.cov(X_std.T)
    
    # 3. Calculate eigenvalues and eigenvectors
    #    (the covariance matrix is symmetric, so eigh applies and guarantees real results)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    
    # 4. Sort eigenvalues and eigenvectors in descending order
    idx = eigenvalues.argsort()[::-1]
    eigenvalues = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    
    # 5. Select top k eigenvectors
    top_k_eigenvectors = eigenvectors[:, :num_components]
    
    # 6. Transform the data
    X_transformed = np.dot(X_std, top_k_eigenvectors)
    
    return X_transformed, top_k_eigenvectors, eigenvalues[:num_components]

In [None]:
# Apply PCA to reduce to 2 dimensions
X_pca, components, explained_variance = pca_from_scratch(X, 2)

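# Calculate the variance explained ratio.
# The denominator must be the total variance over ALL components, so request
# every eigenvalue; dividing by the sum of only the top 2 would always give 100%.
_, _, all_eigenvalues = pca_from_scratch(X, X.shape[1])
total_variance = np.sum(all_eigenvalues)
explained_variance_ratio = explained_variance / total_variance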

print(f"Explained variance ratio: {explained_variance_ratio}")
print(f"Total variance explained: {np.sum(explained_variance_ratio) * 100:.2f}%")

In [None]:
# Visualize the results
plt.figure(figsize=(10, 8))

# Plot transformed data points with class colors
colors = ['navy', 'turquoise', 'darkorange']
for i, color in enumerate(colors):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], color=color, alpha=0.8, lw=2,
                label=iris.target_names[i])
    
# Add labels and title
plt.xlabel(f'PC1 ({explained_variance_ratio[0]*100:.1f}%)')
plt.ylabel(f'PC2 ({explained_variance_ratio[1]*100:.1f}%)')
plt.title('PCA of Iris Dataset (Implemented from Scratch)')
plt.legend(loc='best')
plt.grid(True)

# Show plot
plt.tight_layout()
plt.show()

### Understanding PCA Step-by-Step

1. **Standardization**: Center each feature by subtracting its mean, then scale it by dividing by its standard deviation
2. **Covariance Matrix**: Calculate how features vary with each other
3. **Eigendecomposition**: Find eigenvalues and eigenvectors of covariance matrix
4. **Sorting**: Arrange eigenvalues/vectors in descending order of eigenvalues
5. **Selection**: Choose top k eigenvectors (principal components)
6. **Transformation**: Project data onto new principal component space
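
One way to sanity-check these six steps is to confirm that the projected data has a diagonal covariance matrix whose diagonal entries are the selected eigenvalues. The sketch below does this on synthetic data; the random data, the mixing matrix, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic correlated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X_demo = rng.normal(size=(200, 3)) @ mixing

# Steps 1-2: standardize, then compute the covariance matrix
X_demo_std = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
cov = np.cov(X_demo_std.T)

# Steps 3-4: eigendecomposition (eigh suits symmetric matrices), sorted descending
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Steps 5-6: keep the top 2 components and project
Z = X_demo_std @ vecs[:, :2]

# The projected components should be uncorrelated, with variances equal to the eigenvalues
cov_Z = np.cov(Z.T)
print("Covariance of projected data:\n", np.round(cov_Z, 6))
print("Top 2 eigenvalues:", vals[:2])
print("Sum of eigenvalues equals trace of covariance:", np.isclose(vals.sum(), np.trace(cov)))
```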

### Connection to Neural Networks

- Linear layers in neural networks apply a matrix multiplication followed by a bias addition: $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$
- PCA finds directions of maximum variance, while neural networks learn representations for specific tasks
- Both rely on linear algebra operations for transforming data into more useful representations
- Auto-encoders with linear activations can learn PCA-like projections
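
A linear layer can be sketched in a few lines of NumPy. The weights, bias, and input below are made-up values, and the batched form assumes the common convention of storing one sample per row.

```python
import numpy as np

# Hypothetical layer mapping 3 input features to 2 outputs
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0, 0.0]])   # weight matrix, shape (2, 3)
b = np.array([0.1, -0.2])         # bias vector, shape (2,)
x = np.array([1.0, 2.0, 3.0])     # a single input, shape (3,)

y = W @ x + b                     # y = Wx + b, shape (2,)
print("Single input output:", y)

# For a batch stored as samples x features, the same layer is X W^T + b
X_batch = np.stack([x, 2 * x])    # shape (2, 3)
Y_batch = X_batch @ W.T + b       # shape (2, 2); b is broadcast across rows
print("Batch output:\n", Y_batch)
```

The PCA projection `np.dot(X_std, top_k_eigenvectors)` has the same form as the batched layer with no bias, which is one way to see the connection between the two.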

In [None]:
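# Compare with sklearn's PCA
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize first so the comparison matches pca_from_scratch,
# which scales every feature to unit variance before the decomposition
X_scaled = StandardScaler().fit_transform(X)

# Apply sklearn's PCA
# (the sign of each component is arbitrary, so the plot may appear mirrored)
pca = PCA(n_components=2)
X_pca_sklearn = pca.fit_transform(X_scaled)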

# Plot sklearn's PCA result
plt.figure(figsize=(10, 8))

for i, color in enumerate(colors):
    plt.scatter(X_pca_sklearn[y == i, 0], X_pca_sklearn[y == i, 1], color=color, alpha=0.8, lw=2,
                label=iris.target_names[i])
    
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)')
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)')
plt.title('PCA of Iris Dataset (Using sklearn)')
plt.legend(loc='best')
plt.grid(True)

plt.tight_layout()
plt.show()

print(f"sklearn explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Our explained variance ratio: {explained_variance_ratio}")

## Summary

In this notebook, we've covered:

1. **Linear Algebra Basics**: Vectors, matrices, tensors and their operations
2. **Matrix Operations**: Addition, multiplication, transpose, and other operations
3. **Eigenvalues and Eigenvectors**: Their calculation and significance in data science
4. **PCA Implementation**: From scratch using fundamental linear algebra concepts

### Practical Applications

- **Data Preprocessing**: PCA for dimensionality reduction
- **Feature Engineering**: Creating meaningful representations
- **Neural Networks**: Understanding the matrix operations behind layers
- **Computer Vision**: Image processing using matrix operations
- **Natural Language Processing**: Word embeddings as vectors in high-dimensional space

### Further Reading

1. "Deep Learning" by Goodfellow, Bengio, and Courville - Chapter 2 on Linear Algebra
2. "Mathematics for Machine Learning" by Deisenroth, Faisal, and Ong
3. "Linear Algebra and Its Applications" by Gilbert Strang