# Linear Algebra module


## Vectors and Vector Operations

A **vector** is a one-dimensional array of numbers. We can think of it as a point in space. Vectors are fundamental in machine learning and data science, as they are used to represent features, parameters, or even inputs and outputs of models.

Common vector operations include:

- **Vector addition**: The sum of two vectors of the same length is obtained by adding corresponding elements.
- **Scalar multiplication**: Multiplying a vector by a scalar means multiplying each component by the scalar.

### Mathematical Definition

Given two vectors $\textbf{v}$ and $\textbf{w}$ of the same dimension:

$$
\textbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \textbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}
$$

- **Vector addition**: The sum $\textbf{v} + \textbf{w}$ is defined as:

$$
\textbf{v} + \textbf{w} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{bmatrix}
$$

- **Scalar multiplication**: If $c$ is a scalar and $\textbf{v}$ is a vector, the scalar multiplication $c \textbf{v}$ is given by:

$$
c \textbf{v} = \begin{bmatrix} c v_1 \\ c v_2 \\ \vdots \\ c v_n \end{bmatrix}
$$

### Example 1: Basic Vector Operations

Let $\textbf{v} = [1, 2, 3]$ and $\textbf{w} = [4, 5, 6]$. The sum $\textbf{v} + \textbf{w}$ is calculated by adding the corresponding components:

$$
\textbf{v} + \textbf{w} = \begin{bmatrix} 1 + 4 \\ 2 + 5 \\ 3 + 6 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \\ 9 \end{bmatrix}
$$

### Example 2: Scalar Multiplication

Now, if we take $c = 3$, the scalar multiplication $c \textbf{v}$ is calculated as follows:

$$
3 \textbf{v} = \begin{bmatrix} 3 \times 1 \\ 3 \times 2 \\ 3 \times 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}
$$

### Example 3: Use Case in Machine Learning

In machine learning, vectors often represent the features of a data point. For example, in image recognition, an image can be flattened into a vector of pixel values.

Let's say we are dealing with a dataset of images, and each image is represented as a vector of pixel intensities. A common preprocessing step is to normalize the pixel values or apply transformations such as scalar multiplication to scale the vector.

#### Additional Example: Vector Addition in a Data Science Context

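As a toy illustration from recommender systems, element-wise vector addition can combine a user's preference vector with an item-feature vector of the same length. Practical collaborative-filtering models typically score a user-item pair with a dot product of learned vectors rather than a plain sum, so the example here only demonstrates the mechanics of vector addition.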



In [37]:

import numpy as np

# Example 1: Basic vector operations
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])

# Vector addition
vector_sum = v + w
print("Vector sum (v + w):", vector_sum)

# Scalar multiplication
scalar_mult = 3 * v
print("Scalar multiplication (3 * v):", scalar_mult)

# Example 2: Use case - Scaling pixel intensities in an image (represented as a vector)
image_pixel_vector = np.array([0.1, 0.2, 0.3])  # Example pixel values
scaled_pixel_vector = 2 * image_pixel_vector  # Scaling by a factor of 2
print("Scaled pixel intensities:", scaled_pixel_vector)

# Example 3: Recommender system example (Collaborative filtering)
user_preferences = np.array([5, 3, 0])  # User likes items 1 and 2, indifferent to item 3
item_features = np.array([0.5, 0.2, 0.8])  # Item features based on historical data

# Predicting a rating by adding user preferences and item features
predicted_rating = user_preferences + item_features
print("Predicted rating (user + item):", predicted_rating)


Vector sum (v + w): [5 7 9]
Scalar multiplication (3 * v): [3 6 9]
Scaled pixel intensities: [0.2 0.4 0.6]
Predicted rating (user + item): [5.5 3.2 0.8]



## Dot Product

The **dot product** of two vectors $ \textbf{v} $ and $ \textbf{w} $, both of the same length, is the sum of the products of corresponding entries. The dot product is useful in calculating projections, angles between vectors, and more.

### Mathematical Definition

Given two vectors:

$$
\textbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \textbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}
$$

The dot product $ \textbf{v} \cdot \textbf{w} $ is defined as:

$$
\textbf{v} \cdot \textbf{w} = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = \sum_{i=1}^{n} v_i w_i
$$

### Example 1: Basic Dot Product Calculation

Let $ \textbf{v} = [1, 2, 3] $ and $ \textbf{w} = [4, 5, 6] $. The dot product is calculated as follows:

$$
\textbf{v} \cdot \textbf{w} = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32
$$

### Example 2: Geometrical Interpretation

The dot product can also be used to calculate the angle between two vectors. The cosine of the angle between vectors $\textbf{v}$ and $\textbf{w}$ is given by:

$$
\cos(\theta) = \frac{\textbf{v} \cdot \textbf{w}}{\|\textbf{v}\| \|\textbf{w}\|}
$$

Where $\|\textbf{v}\|$ is the magnitude of vector $\textbf{v}$. This is widely used in computer vision and NLP to measure the similarity of data points.

### Example 3: Use Case in Machine Learning

In machine learning, the dot product is used in algorithms like linear regression and neural networks. For instance, in linear regression, the prediction is obtained by computing the dot product between feature vectors and weight vectors.

### Example 4: Cosine Similarity

In natural language processing (NLP), **cosine similarity** is a metric used to measure how similar two documents are based on their feature vectors. It uses the dot product to calculate the cosine of the angle between the vectors.

#### Formula for Cosine Similarity:

$$
\text{Cosine Similarity} = \frac{\textbf{v} \cdot \textbf{w}}{\|\textbf{v}\| \|\textbf{w}\|}
$$

This is frequently used in search engines, recommendation systems, and document clustering.


In [38]:

import numpy as np

# Example 1: Basic dot product
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
dot_product = np.dot(v, w)
print("Dot product (v · w):", dot_product)

# Example 2: Calculating cosine of the angle between two vectors
magnitude_v = np.linalg.norm(v)
magnitude_w = np.linalg.norm(w)
cos_theta = dot_product / (magnitude_v * magnitude_w)
print("Cosine of the angle between v and w:", cos_theta)

# Example 3: Use case in machine learning - Dot product in linear regression
# Assume a feature vector x and a weight vector `weights` (named so it does not overwrite w above)
x = np.array([2, 3, 4])
weights = np.array([0.5, 0.2, 0.1])
prediction = np.dot(x, weights)
print("Prediction (linear regression):", prediction)

# Example 4: Cosine similarity between two document vectors (example in NLP)
doc1 = np.array([1, 2, 1])
doc2 = np.array([0, 1, 2])
cosine_similarity = np.dot(doc1, doc2) / (np.linalg.norm(doc1) * np.linalg.norm(doc2))
print("Cosine similarity between doc1 and doc2:", cosine_similarity)


Dot product (v · w): 32
Cosine of the angle between v and w: 0.9746318461970762
Prediction (linear regression): 2.0
Cosine similarity between doc1 and doc2: 0.7302967433402214


## Matrix Multiplication

A **matrix** is a two-dimensional array of numbers arranged in rows and columns. **Matrix multiplication** is one of the most important operations in linear algebra, widely used in fields such as computer graphics, machine learning, physics, and economics.

### Why Do We Need Matrix Multiplication?

Matrix multiplication helps in solving systems of linear equations, transforming geometric objects (rotation, scaling), and modeling real-world phenomena. It’s essential for efficient computation in various algorithms, including those used in deep learning and optimization problems. Simple applications include:

- **Transformation of coordinates** in graphics
- **Linear systems** solutions in physics and engineering
- **Data compression** and **encryption algorithms**

### Mathematical Definition

Matrix multiplication involves taking the **dot product** of the rows of the first matrix with the columns of the second matrix. For two matrices to be multiplied, the number of columns in the first matrix must equal the number of rows in the second matrix.

Given an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$, the resulting matrix $C$ will have dimensions $m \times p$. Each element $ c_{ij}$ in the resulting matrix is the dot product of the $ i $-th row of $ A$ with the $ j$-th column of $B$:

$$
C = A \cdot B, \quad C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}
$$
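The summation formula translates almost directly into code. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper name `matmul` and the sample matrices are illustrative choices, not part of the original notebook.

```python
def matmul(A, B):
    """Multiply two matrices (lists of rows) using C_ij = sum_k A_ik * B_kj."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # row of A
        for j in range(p):      # column of B
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
    return C

A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 2]]      # 3 x 2

print(matmul(A, B))  # a 2 x 2 result: [[7, 8], [16, 17]]
```

The shapes follow the rule stated above: a $2 \times 3$ matrix times a $3 \times 2$ matrix gives a $2 \times 2$ matrix, while swapping the operands would give a $3 \times 3$ matrix, so matrix multiplication is not commutative in general.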

### Example

Let

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
$$

We can compute the matrix multiplication $ A \cdot B $ as follows:

$$
A \cdot B = \begin{bmatrix} (1)(5) + (2)(7) & (1)(6) + (2)(8) \\ (3)(5) + (4)(7) & (3)(6) + (4)(8) \end{bmatrix}
= \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
$$

### Step-by-Step Breakdown

- First row, first column: $ (1)(5) + (2)(7) = 5 + 14 = 19 $
- First row, second column: $ (1)(6) + (2)(8) = 6 + 16 = 22 $
- Second row, first column: $ (3)(5) + (4)(7) = 15 + 28 = 43 $
- Second row, second column: $ (3)(6) + (4)(8) = 18 + 32 = 50 $

Thus, the result of the matrix multiplication is:

$$
A \cdot B = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
$$

### Applications in Real Life

- **Machine learning**: Neural networks use matrix multiplication to propagate input data through layers of neurons.
- **Computer graphics**: Matrices are used to rotate, scale, and translate objects in 3D space.
- **Economics**: Matrices are used in input-output models to predict how industries affect each other.

In [39]:
# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(A, B)
matrix_product


array([[19, 22],
       [43, 50]])

## Determinant of a Matrix

The **determinant** of a square matrix is a scalar value that can be computed from its elements. The determinant plays a crucial role in linear algebra, as it provides important information about the matrix. It is particularly useful for understanding whether a matrix is invertible (non-zero determinant) or singular (zero determinant).

### Why Do We Need the Determinant?

- **Matrix Invertibility**: A matrix is invertible (has an inverse) if and only if its determinant is non-zero.
- **Area/Volume Interpretation**: The determinant can represent the scaling factor by which a transformation (represented by the matrix) scales area (in 2D) or volume (in 3D). A zero determinant indicates that the transformation collapses the space into a lower dimension (e.g., a line or a point).
- **Linear Independence**: The determinant provides insight into whether the rows or columns of a matrix are linearly independent. If the determinant is zero, the rows or columns are linearly dependent.
- **Solving Systems of Linear Equations**: Determinants are used in Cramer's Rule to solve systems of linear equations.

### Mathematical Definition

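For a square $ n \times n $ matrix, the determinant can be defined recursively by cofactor (Laplace) expansion: a signed sum of the entries in one row, each multiplied by the determinant of its minor (the matrix left after deleting that entry's row and column). For small matrices, this gives simple closed-form formulas.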

#### 2x2 Matrix

The determinant of a 2x2 matrix $ A $ is given by:

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \text{det}(A) = ad - bc
$$

#### 3x3 Matrix

The determinant of a 3x3 matrix $ A $ is:

$$
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
$$

$$
\text{det}(A) = a(ei - fh) - b(di - fg) + c(dh - eg)
$$

### Example: Determinant of a 2x2 Matrix

Let

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
$$

The determinant is calculated as:

$$
\text{det}(A) = (1)(4) - (2)(3) = 4 - 6 = -2
$$

Since the determinant is non-zero, the matrix is invertible.

### Example: Determinant of a 3x3 Matrix

Let

$$
B = \begin{bmatrix} 2 & 3 & 1 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
$$

The determinant is:

$$
\text{det}(B) = 2((5)(9) - (6)(8)) - 3((4)(9) - (6)(7)) + 1((4)(8) - (5)(7))
$$

Breaking it down:

$$
\text{det}(B) = 2(45 - 48) - 3(36 - 42) + 1(32 - 35)
$$

$$
\text{det}(B) = 2(-3) - 3(-6) + 1(-3)
$$

$$
\text{det}(B) = -6 + 18 - 3 = 9
$$

Thus, $ \text{det}(B) = 9 $, indicating that matrix $ B $ is invertible.
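The recursive definition can be checked against both worked examples with a short sketch. The function name `det` and the list-of-rows representation are assumptions made for illustration; the expansion is along the first row, matching the 3x3 formula above.

```python
def det(M):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 2], [3, 4]]
B = [[2, 3, 1], [4, 5, 6], [7, 8, 9]]

print(det(A))  # -2
print(det(B))  # 9
```

Cofactor expansion is convenient for small matrices but its cost grows factorially with $ n $, which is why numerical libraries compute determinants through a matrix factorization instead.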

### Applications in Real Life

- **Physics**: Determinants help describe transformations in space, such as rotations and scaling in mechanics.
- **Computer graphics**: Determinants are used in rendering techniques for determining object transformations and perspective.
- **Engineering**: Determinants are used to analyze systems of equations representing physical phenomena like electrical circuits, structural models, and mechanical systems.

Understanding the determinant allows us to assess the properties of matrices and apply them in various fields effectively.

In [40]:
# Determinant of a matrix
determinant_A = np.linalg.det(A)
determinant_A

np.float64(-2.0000000000000004)

## Inverse of a Matrix

The **inverse** of a square matrix $ A $ is denoted as $ A^{-1} $, and it's the matrix that satisfies the equation:

$$
A \cdot A^{-1} = A^{-1} \cdot A = I
$$

where $ I $ is the **identity matrix**, a matrix with 1's on the diagonal and 0's elsewhere. The identity matrix acts as the neutral element in matrix multiplication, just like 1 in scalar multiplication. Not all matrices have an inverse, and a matrix must have a **non-zero determinant** to be invertible.

### Why Do We Need the Inverse of a Matrix?

The inverse of a matrix is critical for solving systems of linear equations, finding transformations, and performing other operations in various fields like machine learning, computer graphics, and engineering:

- **Solving Linear Systems**: If $ A \textbf{x} = \textbf{b} $, then the solution can be found by multiplying both sides by $ A^{-1}$, yielding $ \textbf{x} = A^{-1} \textbf{b} $.
- **Matrix Division**: Since there is no direct "division" for matrices, we use the inverse instead.
- **Linear Transformations**: Inverse matrices reverse the effects of transformations like rotation, scaling, or shearing.
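As a small illustration of the first point, the sketch below solves $ A \textbf{x} = \textbf{b} $ in two ways. The right-hand side `b` is a made-up example. Forming the inverse explicitly mirrors the formula, while `np.linalg.solve` solves the system directly and is generally the preferred route numerically.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])   # hypothetical right-hand side

# Textbook route: x = A^{-1} b
x_inv = np.linalg.inv(A) @ b

# Direct route: solve A x = b without forming the inverse
x_solve = np.linalg.solve(A, b)

print(x_inv)                          # x = (-4, 4.5)
print(np.allclose(x_inv, x_solve))    # True
print(np.allclose(A @ x_solve, b))    # True: the solution satisfies the system
```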

### Conditions for Invertibility

For a matrix to have an inverse:
1. It must be square (same number of rows and columns).
2. Its determinant must be non-zero.

If a matrix's determinant is zero, it is said to be **singular**, and no inverse exists.

### Formula for the Inverse of a 2x2 Matrix

For a 2x2 matrix $ A $:

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

If $ \text{det}(A) = ad - bc \neq 0 $, the inverse $ A^{-1} $ is given by:

$$
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
$$

### Example: Inverse of a 2x2 Matrix

Let

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
$$

First, compute the determinant:

$$
\text{det}(A) = (1)(4) - (2)(3) = 4 - 6 = -2
$$

Since the determinant is non-zero, the matrix is invertible. The inverse is:

$$
A^{-1} = \frac{1}{-2} \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}
$$

### Verifying the Inverse

To verify that $ A^{-1} $ is indeed the inverse, we can check the multiplication:

$$
A \cdot A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

Thus, the inverse is correct.
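The 2x2 formula and the verification step can be reproduced in a few lines of plain Python. This is a sketch under the assumption that matrices are lists of rows; `inverse_2x2` and `matmul_2x2` are hypothetical helper names.

```python
def inverse_2x2(M):
    """Inverse of a 2x2 matrix via (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
A_inv = inverse_2x2(A)

print(A_inv)                 # [[-2.0, 1.0], [1.5, -0.5]]
print(matmul_2x2(A, A_inv))  # [[1.0, 0.0], [0.0, 1.0]]
print(matmul_2x2(A_inv, A))  # [[1.0, 0.0], [0.0, 1.0]]
```

Both products give the identity, which is exactly the defining property $ A \cdot A^{-1} = A^{-1} \cdot A = I $.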

### Applications in Real Life

- **Cryptography**: Matrix inverses are used in encoding and decoding messages in certain encryption algorithms.
- **Computer Graphics**: Inverse matrices are used in transformations to move objects in virtual 3D spaces.
- **Physics and Engineering**: Inverse matrices solve systems of equations in circuit analysis, mechanical systems, and more.

Understanding how to compute and use the inverse of a matrix is fundamental in many areas of mathematics and applied sciences.

In [41]:
# Inverse of a matrix
inverse_A = np.linalg.inv(A)
inverse_A


array([[-2. ,  1. ],
       [ 1.5, -0.5]])

## Eigenvalues and Eigenvectors

Given a square matrix $ A $, a non-zero vector $ \textbf{v} $ is called an **eigenvector** of $ A $ if it satisfies the equation:

$$
A \textbf{v} = \lambda \textbf{v}
$$

where $ \lambda $ is a scalar known as the **eigenvalue** corresponding to the eigenvector $ \textbf{v} $. Eigenvalues and eigenvectors provide deep insights into the properties of a matrix, particularly its linear transformations, stability, and structure.

### Why Do We Need Eigenvalues and Eigenvectors?

Eigenvalues and eigenvectors are powerful tools in linear algebra with many applications across mathematics, physics, engineering, and data science:

- **Stability Analysis**: In systems of differential equations, eigenvalues help determine the stability of equilibria.
- **Principal Component Analysis (PCA)**: In data science and machine learning, eigenvectors represent principal directions of variance, helping reduce dimensionality.
- **Quantum Mechanics**: Eigenvalues and eigenvectors describe observable quantities, such as energy levels, in quantum systems.
- **Vibration Analysis**: In mechanical and structural engineering, they help analyze natural vibration modes of systems.
- **Graph Theory**: Eigenvectors can describe properties of networks, such as the importance of nodes.

### Mathematical Definition

The equation $ A \textbf{v} = \lambda \textbf{v} $ means that the matrix $ A $ transforms the vector $ \textbf{v} $ into a new vector that is simply a scaled version of $ \textbf{v} $ itself, with the scaling factor being $ \lambda $. The eigenvector $ \textbf{v} $ therefore stays on the same line through the origin: its direction is preserved when $ \lambda > 0 $ and reversed when $ \lambda < 0 $.

### How to Find Eigenvalues and Eigenvectors

1. **Eigenvalues**: To find the eigenvalues $ \lambda $ of a matrix $ A $, we solve the **characteristic equation**:

$$
\text{det}(A - \lambda I) = 0
$$

where $ I $ is the identity matrix, and $ \text{det} $ denotes the determinant.

2. **Eigenvectors**: Once the eigenvalues $ \lambda $ are found, we substitute each eigenvalue into the equation $ A \textbf{v} = \lambda \textbf{v} $ and solve for the eigenvector $ \textbf{v} $.

### Example: Eigenvalues and Eigenvectors of a 2x2 Matrix

Let

$$
A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}
$$

1. **Find the Eigenvalues**:
   The characteristic equation is:

   $$
   \text{det}(A - \lambda I) = \text{det}\begin{bmatrix} 1 - \lambda & 2 \\ 2 & 1 - \lambda \end{bmatrix} = 0
   $$

   Expanding the determinant:

   $$
   (1 - \lambda)(1 - \lambda) - (2)(2) = 0
   $$

   $$
   (\lambda^2 - 2\lambda + 1) - 4 = 0
   $$

   $$
   \lambda^2 - 2\lambda - 3 = 0
   $$

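   Factoring the quadratic gives:

   $$
   (\lambda - 3)(\lambda + 1) = 0
   $$

   so $ \lambda_1 = 3 $ and $ \lambda_2 = -1 $.

2. **Find the Eigenvectors**:
   For $ \lambda_1 = 3 $, solve $ (A - 3I)\textbf{v} = \textbf{0} $:

   $$
   \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
   $$

   Both rows reduce to $ x = y $, so $ \textbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} $.

   For $ \lambda_2 = -1 $, solve $ (A + I)\textbf{v} = \textbf{0} $:

   $$
   \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
   $$

   Both rows reduce to $ x = -y $, so solving this system gives $ \textbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} $.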

Thus, the eigenvalues of $ A $ are $ \lambda_1 = 3 $ and $ \lambda_2 = -1 $, and the corresponding eigenvectors are $ \textbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} $ and $ \textbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} $.
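The hand-computed pairs can be checked numerically by testing the defining equation $ A \textbf{v} = \lambda \textbf{v} $ directly. The sketch below also rescales each eigenvector to unit length, since eigenvectors are only determined up to a non-zero scalar factor and numerical libraries commonly return unit-length versions (possibly with the opposite sign).

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 1.0]])

# Eigenpairs found by hand
pairs = [(3.0, np.array([1.0, 1.0])),
         (-1.0, np.array([1.0, -1.0]))]

for lam, v in pairs:
    # A v and lambda v should coincide
    print(lam, A @ v, lam * v, np.allclose(A @ v, lam * v))

# Unit-length versions of the same eigenvectors
unit_vectors = [v / np.linalg.norm(v) for _, v in pairs]
print(unit_vectors)
```

Each unit vector has entries of magnitude $ 1/\sqrt{2} \approx 0.7071 $, so a library result such as $ [-0.7071, 0.7071] $ describes the same eigen-direction as $ [1, -1] $.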

### Applications in Real Life

- **Machine Learning**: Eigenvectors are used in PCA to reduce dimensionality and extract important features.
- **Vibration Analysis**: In engineering, eigenvalues describe the natural frequencies at which structures vibrate.
- **Quantum Mechanics**: In quantum systems, eigenvalues represent measurable quantities like energy, while eigenvectors describe the state of the system.
- **Markov Chains**: In probability, a steady-state (stationary) distribution is an eigenvector of the transition matrix with eigenvalue 1, normalized so that its entries sum to 1.

Eigenvalues and eigenvectors reveal fundamental properties of linear transformations, making them essential tools in a wide variety of scientific and engineering applications.

In [42]:
# Define a matrix
C = np.array([[1, 2], [2, 1]])

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(C)
eigenvalues, eigenvectors


(array([ 3., -1.]),
 array([[ 0.70710678, -0.70710678],
        [ 0.70710678,  0.70710678]]))

# Important concepts in Linear Algebra 

## Linear Transformations
Linear transformations are functions that map vectors from one vector space to another while preserving the operations of vector addition and scalar multiplication. They play a key role in machine learning, especially in neural networks where layers of transformations are applied to input data.

### Properties of Linear Transformations:
- For a linear transformation $ T $, the following properties hold:
  - **Additivity**: $ T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) $
  - **Homogeneity**: $ T(\alpha \mathbf{v}) = \alpha T(\mathbf{v}) $, where $ \alpha $ is a scalar.

Linear transformations can be represented as matrix multiplications. For example, applying a transformation $ T $ to a vector $ \mathbf{v} $ is equivalent to multiplying a matrix $ A $ by $ \mathbf{v} $.
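Both properties can be checked numerically for a matrix-defined map. In this sketch the matrix, the test vectors, and the scalar are arbitrary illustrative values; the function `shift` is a hypothetical counter-example showing that adding a constant offset is not a linear transformation.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # arbitrary example matrix

def T(v):
    """Linear transformation represented by the matrix A."""
    return A @ v

v1 = np.array([1.0, 2.0])
v2 = np.array([-3.0, 0.5])
alpha = 4.0

additive = np.allclose(T(v1 + v2), T(v1) + T(v2))
homogeneous = np.allclose(T(alpha * v1), alpha * T(v1))
print(additive, homogeneous)   # True True

def shift(v):
    """Translation by a fixed offset: NOT a linear transformation."""
    return v + np.array([1.0, 1.0])

shift_additive = np.allclose(shift(v1 + v2), shift(v1) + shift(v2))
print(shift_additive)          # False: the offset is counted twice on the right
```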

### Applications in Machine Learning:
- Linear transformations are the foundation for layers in neural networks, where inputs are transformed through matrices (weights) and passed through activation functions.
- Data preprocessing often involves linear transformations such as scaling, rotation, and projection onto lower-dimensional spaces.

### Python Example: Linear Transformation with a Matrix

In [43]:
import numpy as np

# Define a linear transformation matrix
T = np.array([[2, 0], [0, 3]])

# Input vector
v = np.array([1, 1])

# Apply the transformation
transformed_v = np.dot(T, v)
print("Transformed vector:", transformed_v)

Transformed vector: [2 3]


## Norms
A **norm** is a measure of the length or magnitude of a vector. In machine learning, norms are used to quantify the size of vectors, which is essential in optimization problems, regularization, and measuring distances between data points.

### Common Norms:
1. **L1 Norm (Manhattan Distance)**:
   - This norm is the sum of the absolute values of the vector's components:
     $$
     \| \mathbf{v} \|_1 = \sum_{i=1}^{n} |v_i|
     $$
   - **Application**: Lasso regularization uses the L1 norm to encourage sparsity in machine learning models by penalizing large coefficients.

2. **L2 Norm (Euclidean Distance)**:
   - This norm is the square root of the sum of the squared components of the vector:
     $$
     \| \mathbf{v} \|_2 = \sqrt{\sum_{i=1}^{n} v_i^2}
     $$
   - **Application**: Ridge regression adds the squared L2 norm of the weights as a penalty term, shrinking large weights to reduce overfitting.

### Python Example: Calculating Norms


In [44]:
from numpy.linalg import norm

# Define a vector
v = np.array([3, 4])

# Calculate L1 and L2 norms
l1_norm = norm(v, 1)
l2_norm = norm(v, 2)

print(f"L1 Norm: {l1_norm}")
print(f"L2 Norm: {l2_norm}")

L1 Norm: 7.0
L2 Norm: 5.0


## Orthogonal and Orthonormal Matrices
A square matrix $ Q $ is **orthogonal** if its transpose is equal to its inverse, i.e., $ Q^T = Q^{-1} $, or equivalently $ Q^T Q = Q Q^T = I $, where $ I $ is the identity matrix. This property ensures that the matrix preserves lengths and the angles between vectors, making orthogonal matrices useful in many machine learning algorithms, such as PCA.

Equivalently, the columns (and rows) of an orthogonal matrix are **orthonormal**: mutually orthogonal and of unit length. The term "orthonormal matrix" is sometimes used informally for the same kind of matrix.

### Properties:
- **Length-preserving**: If $ Q $ is orthogonal, $ \| Q \mathbf{v} \| = \| \mathbf{v} \| $.
- **Angle-preserving**: Orthogonal matrices preserve the angles between vectors.
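A 2D rotation matrix is a standard example of an orthogonal matrix, and it makes both properties easy to check. The rotation angle and the two test vectors below are arbitrary values chosen for illustration.

```python
import numpy as np

theta = np.pi / 6   # rotate by 30 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])
w = np.array([-1.0, 2.0])

# Orthogonality: Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(2)))             # True

# Length-preserving: ||Q v|| equals ||v||
print(np.linalg.norm(v), np.linalg.norm(Q @ v))

# Dot products are preserved, and with them the angles between vectors
print(np.dot(v, w), np.dot(Q @ v, Q @ w))
```

Because lengths and dot products are both unchanged, the cosine of the angle between any two vectors is unchanged as well.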

### Applications in Machine Learning:
- **Principal Component Analysis (PCA)** uses orthogonal matrices to project data onto lower-dimensional subspaces while preserving as much variance as possible.
- **QR Decomposition**, used in solving linear systems and least squares problems, relies on orthogonal matrices.

### Python Example: Orthogonal Matrix Check

In [45]:
Q = np.array([[1, 0], [0, -1]])

# Check if Q is orthogonal
is_orthogonal = np.allclose(np.dot(Q.T, Q), np.eye(2))
print(f"Is Q orthogonal? {is_orthogonal}")

Is Q orthogonal? True


## Singular Value Decomposition (SVD)
**Singular Value Decomposition (SVD)** is a matrix factorization technique that decomposes any real $ m \times n $ matrix $ A $ into three matrices:
$$
A = U \Sigma V^T
$$
Where:
- $ U $ is an $ m \times m $ orthogonal matrix.
- $ \Sigma $ is an $ m \times n $ diagonal matrix of singular values (non-negative, conventionally sorted in decreasing order).
- $ V^T $ is the transpose of an $ n \times n $ orthogonal matrix $ V $.
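One way to see the factorization at work is to rebuild the matrix from its three factors. The sketch below uses NumPy's reduced form (`full_matrices=False`), in which `U` keeps only as many columns as there are singular values. Note that NumPy returns the singular values as a 1-D array rather than as the matrix $ \Sigma $, so `np.diag` is used to form the diagonal matrix. The rank-1 approximation at the end is an added illustration of how SVD supports compression.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Reduced SVD: U is 3x2, s has 2 entries, Vt is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A = U * diag(s) * V^T
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))   # True

# Rank-1 approximation: keep only the largest singular value
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
error = np.linalg.norm(A - A_rank1)   # Frobenius norm of what was discarded
print(error, s[1])                    # the error equals the dropped singular value
```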

### Applications in Machine Learning:
- **Dimensionality Reduction**: SVD is used in techniques like PCA for reducing the dimensionality of data while retaining the most important information.
- **Latent Semantic Analysis (LSA)**: In natural language processing, SVD is used to identify patterns in the relationships between terms and documents.

### Python Example: SVD

In [46]:
from numpy.linalg import svd

# Define a matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Perform SVD
U, Sigma, Vt = svd(A)

print("U Matrix:\n", U)
print("Sigma (Singular Values):\n", Sigma)
print("V Transpose Matrix:\n", Vt)

U Matrix:
 [[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
Sigma (Singular Values):
 [9.52551809 0.51430058]
V Transpose Matrix:
 [[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]


## Rank of a Matrix
The **rank** of a matrix is the number of linearly independent rows or columns in the matrix. It indicates the maximum number of independent vectors that can be extracted from the matrix.

### Applications in Machine Learning:
- **Linear Systems**: The rank helps determine whether a system of linear equations has a unique solution. For a square system $ A \textbf{x} = \textbf{b} $, a full-rank $ A $ guarantees exactly one solution, while a rank-deficient $ A $ means the system has either no solution or infinitely many.
- **Data Compression**: In dimensionality reduction techniques like PCA, the rank of a matrix tells us how many dimensions are truly needed to represent the data.
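To make linear dependence concrete, the sketch below uses a 3x3 matrix whose third row is a combination of the first two, and then counts the singular values that are numerically non-zero. The tolerance shown follows the default documented for `np.linalg.matrix_rank`; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Row 3 is a linear combination of rows 1 and 2: 2 * row2 - row1
print(np.array_equal(2 * A[1] - A[0], A[2]))   # True

# Rank = number of singular values above a small tolerance
s = np.linalg.svd(A, compute_uv=False)
tol = s.max() * max(A.shape) * np.finfo(float).eps
rank_from_svd = int(np.sum(s > tol))
print(s)
print(rank_from_svd)                           # 2
```

Only two rows carry independent information, so the rank is 2 even though the matrix has three rows and three columns.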

### Python Example: Matrix Rank

In [47]:
# Define a matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Compute the rank
rank = np.linalg.matrix_rank(A)
print(f"Rank of the matrix: {rank}")

Rank of the matrix: 2


## Projections
A **projection** is a linear transformation that maps a vector onto a subspace. In machine learning, projections are used for dimensionality reduction, where data is projected onto a lower-dimensional space.

### Applications in Machine Learning:
- **Principal Component Analysis (PCA)** projects data onto principal components, which are the directions of maximum variance, to reduce the dimensionality of the dataset.
- **Feature Engineering**: Projections can help remove irrelevant features by projecting data onto a subspace where only important features remain.
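For a single direction $ \textbf{u} $, the projection of $ \textbf{v} $ onto the line spanned by $ \textbf{u} $ is $ \frac{\textbf{v} \cdot \textbf{u}}{\textbf{u} \cdot \textbf{u}} \textbf{u} $. The sketch below uses a direction that is deliberately not of unit length and checks two characteristic properties: the leftover part is orthogonal to $ \textbf{u} $, and projecting twice changes nothing. The helper name `project` and the vectors are illustrative assumptions.

```python
import numpy as np

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u (u need not be unit length)."""
    return (np.dot(v, u) / np.dot(u, u)) * u

v = np.array([2.0, 3.0])
u = np.array([1.0, 1.0])   # not a unit vector

p = project(v, u)
residual = v - p

print(p)                     # components (2.5, 2.5)
print(residual)              # components (-0.5, 0.5)
print(np.dot(residual, u))   # 0.0: the residual is orthogonal to u
print(np.allclose(project(p, u), p))   # True: projecting again leaves p unchanged
```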

### Python Example: Projection of a Vector onto a Line

In [48]:
# Define a vector and a line (unit vector)
v = np.array([2, 3])
u = np.array([1, 0])  # Unit vector along x-axis

# Projection of v onto u
projection = (np.dot(v, u) / np.dot(u, u)) * u
print("Projection of v onto u:", projection)

Projection of v onto u: [2. 0.]


## QR Decomposition
**QR Decomposition** is a method of decomposing a matrix into two components:
- $ Q $, an orthogonal matrix.
- $ R $, an upper triangular matrix.

This is useful for solving linear systems and performing least squares optimization.

### Applications in Machine Learning:
- **Solving Linear Systems**: Given $ A = QR $, the system $ A \textbf{x} = \textbf{b} $ becomes $ R \textbf{x} = Q^T \textbf{b} $, which can be solved by back substitution without explicitly computing $ A^{-1} $.
- **Optimization**: In regression analysis, QR decomposition is used to compute the least squares solution to overdetermined systems (when there are more equations than unknowns).
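The least squares use case can be sketched on a tiny overdetermined system. The data below are made up for illustration (fitting a line through four points). With the reduced factorization $ A = QR $, the least squares solution satisfies $ R \textbf{x} = Q^T \textbf{b} $; the result is compared against `np.linalg.lstsq` as a cross-check.

```python
import numpy as np

# 4 equations, 2 unknowns: intercept and slope of a line through 4 points
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

# Reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b (a general solver is used here for brevity;
# a dedicated triangular solver would exploit the structure of R)
x_qr = np.linalg.solve(R, Q.T @ b)

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_qr)                          # intercept 3.5, slope 1.4
print(np.allclose(x_qr, x_lstsq))    # True
```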

### Python Example: QR Decomposition

In [49]:
from numpy.linalg import qr

# Define a matrix
A = np.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])

# Perform QR decomposition
Q, R = qr(A)

print("Q Matrix:\n", Q)
print("R Matrix:\n", R)

Q Matrix:
 [[-0.85714286  0.39428571  0.33142857]
 [-0.42857143 -0.90285714 -0.03428571]
 [ 0.28571429 -0.17142857  0.94285714]]
R Matrix:
 [[ -14.  -21.   14.]
 [   0. -175.   70.]
 [   0.    0.  -35.]]


# Linear Regression on Housing Dataset using Linear Algebra

Linear regression is a fundamental method for predicting a target variable based on one or more input features. In this notebook, we will use the California housing dataset to demonstrate linear regression. We will perform the regression both with Scikit-learn's built-in functionality and by using linear algebra operations.

Linear regression models the target as a linear function of the input features plus an error term:  
$$ y = X \beta + \epsilon $$  
where:
- $X$ is the matrix of input features,
- $\beta$ is the vector of coefficients,
- $y$ is the vector of observed target values,
- $\epsilon$ is the error term.

Once the coefficients have been estimated as $\hat{\beta}$, the predicted output is $\hat{y} = X \hat{\beta}$.

In [50]:
# Necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from numpy.linalg import inv

# Load the California housing dataset
california = fetch_california_housing()
data = pd.DataFrame(california.data, columns=california.feature_names)
data['MedHouseVal'] = california.target  # MedHouseVal is the target (house prices)
data.head()

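|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |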


## Dataset Overview

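We will be using the California housing dataset (loaded above with `fetch_california_housing`), in which each row describes one census block group in California. The target variable is the median house value of the block group (`MedHouseVal`, expressed in units of 100,000 US dollars), and there are 8 input features: median income (`MedInc`), median house age (`HouseAge`), average number of rooms and bedrooms per household (`AveRooms`, `AveBedrms`), population (`Population`), average household occupancy (`AveOccup`), and location (`Latitude`, `Longitude`).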

In [51]:
# Separate the features from the target variable
X = data.drop('MedHouseVal', axis=1)  # Features
y = data['MedHouseVal']  # Target variable

# Standardize the features for better performance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

## Linear Regression using Linear Algebra

The formula for linear regression is based on the normal equation:
$$ \hat{\beta} = (X^T X)^{-1} X^T y $$
Where:
- $X$ is the matrix of input features (with a column of ones added for the intercept),
- $y$ is the target vector,
- $\hat{\beta}$ is the vector of coefficients that we are trying to solve for.

In [52]:
# Add a column of ones to X_train for the intercept term
X_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]  # Add bias term (intercept)

# Calculate the coefficients using the normal equation
beta_hat = np.dot(inv(np.dot(X_b.T, X_b)), np.dot(X_b.T, y_train))

# Print the coefficients
print(f"Coefficients (beta_hat):\n{beta_hat}\n")

Coefficients (beta_hat):
[ 2.06786231  0.85238169  0.12238224 -0.30511591  0.37113188 -0.00229841
 -0.03662363 -0.89663505 -0.86892682]
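The normal equation can also be evaluated without forming an explicit inverse. The sketch below is not part of the original notebook run: it uses synthetic data (all names ending in `_demo` and the "true" coefficients are assumptions for illustration) and compares three routes to the same coefficients: the explicit inverse, solving the linear system $ (X^T X) \hat{\beta} = X^T y $ with `np.linalg.solve`, and NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features, known coefficients plus small noise
n_samples = 200
X_demo = rng.normal(size=(n_samples, 3))
true_beta = np.array([2.0, 0.5, -1.0, 3.0])            # intercept followed by 3 slopes
X_b_demo = np.c_[np.ones((n_samples, 1)), X_demo]      # add the intercept column
y_demo = X_b_demo @ true_beta + 0.1 * rng.normal(size=n_samples)

# Route 1: explicit inverse, as in the normal equation
beta_inv = np.linalg.inv(X_b_demo.T @ X_b_demo) @ X_b_demo.T @ y_demo

# Route 2: solve (X^T X) beta = X^T y directly
beta_solve = np.linalg.solve(X_b_demo.T @ X_b_demo, X_b_demo.T @ y_demo)

# Route 3: least squares on X itself, without forming X^T X
beta_lstsq, *_ = np.linalg.lstsq(X_b_demo, y_demo, rcond=None)

print(np.allclose(beta_inv, beta_solve))     # True
print(np.allclose(beta_solve, beta_lstsq))   # True
```

On well-conditioned data all three agree. Solving the system or calling a least squares routine is usually the safer choice when features are strongly correlated, because $ X^T X $ can then be close to singular.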



## Predictions and Evaluation

Now that we have the coefficients, we can use them to make predictions on the test set. We'll also evaluate the model's performance by calculating the Mean Squared Error (MSE) on the test data.

In [53]:
# Prepare X_test for predictions (add intercept term)
X_test_b = np.c_[np.ones((X_test.shape[0], 1)), X_test]

# Make predictions
y_pred = np.dot(X_test_b, beta_hat)

# Calculate the Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse}")

Mean Squared Error (MSE): 0.555891598695247


## Comparison with Scikit-learn's Linear Regression

Let's compare our manually computed linear regression model with Scikit-learn's built-in linear regression to ensure that the results match.

In [54]:
# Import Scikit-learn's LinearRegression model
from sklearn.linear_model import LinearRegression

# Fit the model using Scikit-learn
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Make predictions
y_pred_sklearn = lin_reg.predict(X_test)

# Calculate the Mean Squared Error for comparison
mse_sklearn = mean_squared_error(y_test, y_pred_sklearn)
print(f"Mean Squared Error (Scikit-learn): {mse_sklearn}")

# Compare coefficients
print(f"Scikit-learn coefficients:\n{np.r_[lin_reg.intercept_, lin_reg.coef_]}\n")
print(f"Manual coefficients (linear algebra):\n{beta_hat}")

Mean Squared Error (Scikit-learn): 0.5558915986952441
Scikit-learn coefficients:
[ 2.06786231  0.85238169  0.12238224 -0.30511591  0.37113188 -0.00229841
 -0.03662363 -0.89663505 -0.86892682]

Manual coefficients (linear algebra):
[ 2.06786231  0.85238169  0.12238224 -0.30511591  0.37113188 -0.00229841
 -0.03662363 -0.89663505 -0.86892682]
