# Chapter 10: Systems of Equations

- content:  265 - 296
- exercises:  297 - 300

In [1]:
# import commonly used Python libraries
import numpy as np
from matplotlib import pyplot as plt

- One of the great things about linear algebra is that it can provide compact notation for large collections of mathematical expressions, equations, variables, data, etc.
- In this chapter, we will learn how to represent systems of equations using matrices and vectors, and how to solve those systems using linear algebra operations.
- This knowledge will be central to a range of applications including matrix rank, the inverse, and least-squares statistical model fitting.

- What is an equation?  A statement of equality, typically with one or more unknowns.  E.g. $2x=6$
- A single equation doesn't require linear algebra to solve, but a system of equations does.

Example of a system of equations:
$$2x+3y-5z = 8$$
$$-2y+2z = -3$$
$$5x-4z = 3$$

- It turns out that systems like this can be represented compactly using the form $Ax=b$
- And this doesn't just save space: converting a system of many equations into 1 matrix equation leads to new and efficient ways of solving those equations.
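As a minimal sketch of the $Ax=b$ form, the example system above can be typed into NumPy and solved. The names `A`, `b`, and `solution` are chosen here for illustration, and `np.linalg.solve` is used as a black box rather than as the method this chapter develops.

```python
import numpy as np

# coefficients matrix: one row per equation, one column per variable (x, y, z)
A = np.array([[2.0,  3.0, -5.0],
              [0.0, -2.0,  2.0],
              [5.0,  0.0, -4.0]])

# constants vector: one element per equation
b = np.array([8.0, -3.0, 3.0])

# solve Ax = b for the vector of unknowns
solution = np.linalg.solve(A, b)
x, y, z = solution
print(x, y, z)

# plugging the solution back in should reproduce the constants
print(np.allclose(A @ solution, b))
```

Note the zeros in `A`: a variable that is absent from an equation still needs a column entry.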

## 10.1 Algebra and geometry of equations

- algebraic equations have an associated picture.  e.g. $2x+3=11$ can be visualized with a point on the number line where $x=4$
- An equation with 2 variables is a line in 2D space.  e.g. $2y=x+4$
- Similarly, an equation with 3 variables has an associated 3D graph, and so on.

- So far, we've talked about individual equations.  What do *systems* of equations look like?  
- Consider the following system of 2 equations with 2 variables:
$$y=x/2+2$$
$$y=-x+4$$
- If one 2-variable equation is a line, then two 2-variable equations are 2 lines.
- The point where the 2 lines intersect is the solution to both equations.
- In the above case, the solution is $(x,y)=(4/3,8/3)$

- As we know, when one side of an equation is modified, the other side of the equation must be modified as well.  Similarly, with systems of equations, if 1 equation is modified, the other equations must be modified as well.
- But having systems of equations allows you to do something more: you may add and subtract entire equations from each other (this is analogous to multiplying the left-hand side by 8 and the right-hand side by "4x2")
- Let's try this using the above equations.  (transform the 1st equation to be itself minus the 2nd equation):
$$0y=3x/2-2$$
$$y=-x+4$$
- next we'll replace the 2nd equation by itself plus 2 times the original first equation
$$0y=3x/2-2$$
$$3y=0x+8$$
- At first glance, this system of equations seems very different than the original, but what happens when we graph the lines?
- The 2 new lines (one horizontal, one vertical) intersect at the exact same point!
- So **even though the lines are different, the solution remains the same**: $(x,y)=(4/3,8/3)$
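A quick numerical check of this claim, as a sketch: both versions of the system are written with the variables on the left ($-x/2+y=2$ and $x+y=4$), the same two equation operations are applied, and each version is solved. The array names are assumptions made for this example.

```python
import numpy as np

# original system, rearranged to -x/2 + y = 2 and x + y = 4
A_orig = np.array([[-0.5, 1.0],
                   [ 1.0, 1.0]])
b_orig = np.array([2.0, 4.0])

# new equation 1 = equation 1 minus equation 2
# new equation 2 = equation 2 plus 2 times the original equation 1
A_new = np.array([A_orig[0] - A_orig[1],
                  A_orig[1] + 2 * A_orig[0]])
b_new = np.array([b_orig[0] - b_orig[1],
                  b_orig[1] + 2 * b_orig[0]])

sol_orig = np.linalg.solve(A_orig, b_orig)
sol_new = np.linalg.solve(A_new, b_new)
print(sol_orig, sol_new)
```

Both solves should print the same point, $(4/3, 8/3)$, even though the two coefficient matrices look nothing alike.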

- The conclusion here is that you can take a system of equations, scalar multiply individual equations, and add or subtract equations from each other, to your heart's delight.
- The individual equations--and the lines representing those equations--will change, but the point of intersection will remain exactly the same (excluding multiplying by 0, but let's ignore that obvious case)
- This is an extremely powerful tactic, because you can manipulate equations into ones that are much easier for humans to solve.
- That is the principle behind Gaussian elimination and row reduction, which we will cover later in this chapter.

- Do all systems have a common algebraic solution and unique geometric crossing?  No, definitely not.
- A system of 2D equations like the one above can either have 1) one intersecting point (most common), 2) no points in common (parallel lines), or 3) infinite number of points in common (identical lines).

## 10.2 Matrices representing systems of equations

- Converting a system of equations into a matrix equation is straightforward, and requires an understanding of the 3 components that together define a system of equations:
1. **Variables**: These are the unknowns that you want to solve for (x, y, etc)
2. **Coefficients:**  These are the numbers that multiply the variables. There is one coefficient per variable. If the variable is sitting by itself, then the coefficient is 1.  If the variable is not present, then the coefficient is 0.
3. **Constants:**  These are the numbers that do not multiply variables.  Every equation has one constant (which might be 0).

Example:

$2x+3y-4z=5$
- variables: x, y, z
- coefficients: 2, 3, -4
- constant: 5

### Components to matrices

Once you've identified the components of a system, you put those components into matrices:

1. The coefficients go into the coefficients matrix, with columns corresponding to variables and with rows corresponding to equations
2. The variables go into a column vector that right-multiplies the coefficients matrix.  Importantly, the order of variables in this vector must match the order of variables in the columns of the matrix.
3. The coefficients matrix and variables vector are on the left-hand side of the equation.  The constants go into a column vector on the right hand side of the equation, with the number of elements in the vector corresponding to the number of equations.  Of course, the nth element in the constants vector must correspond to the nth equation in the coefficients matrix.

Example:

$$
\begin{Bmatrix}
2x + 3y - 5z = 8 \\
-2y + 2z + 2 = -1
\end{Bmatrix}
$$

can be converted into matrix form of:
$$
\begin{bmatrix}
2 & 3 & -5 \\
0 & -2 & 2
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
z
\end{bmatrix}
=
\begin{bmatrix}
8 \\
-3
\end{bmatrix}
$$

- remember to include a coefficient of 0 for missing variables
- all loose constants should be combined into 1 constant
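The conversion can be sketched in NumPy. The point `v` below is a hand-picked solution assumed for illustration (this system has two equations and three unknowns, so it does not have a single unique solution); the check confirms that the matrix-vector product reproduces the left-hand sides of both equations.

```python
import numpy as np

# coefficients matrix (columns ordered x, y, z); the missing x in equation 2 becomes a 0
A = np.array([[2.0,  3.0, -5.0],
              [0.0, -2.0,  2.0]])

# constants vector; the loose +2 in equation 2 moves to the right-hand side: -1 - 2 = -3
b = np.array([8.0, -1.0 - 2.0])

# one hand-picked point that satisfies both equations (assumed for illustration)
v = np.array([1.75, 1.5, 0.0])

print(A.shape, v.shape, b.shape)
print(np.allclose(A @ v, b))
```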

### Reflection:
- Sometimes, the biggest challenge in data analysis and modeling is figuring out how to represent a problem using equations; the rest is usually just a matter of algebra and number crunching.
- Indeed, the translation from real-world problem to matrix equation is rarely trivial, and sometimes impossible.  In this case, representing a system of equations as matrix-vector multiplication leads to the compact and simplified notation: $Ax=b$.  And this form leads to an equally compact solution via the least-squares algorithm (covered in Ch 14).

## 10.3 Row reduction, echelon form, and pivots

Note: I've covered this before in previous Linear algebra courses so for the sake of time and brevity I'll be taking briefer notes for the rest of the chapter, only highlighting key points.

- Row reduction involves modifying rows of a matrix while leaving many key properties (i.e. relationships) of the matrix intact.
- "Transforming the matrix to facilitate analyses" basically means increasing the number of zeros in the matrix.  The more zeros a matrix has, the easier and faster it is to work with.
- So think of row-reduction as reorganizing a matrix to increase the number of zero-valued entries.

### Echelon form

- A matrix is in echelon form when the following two criteria are satisfied:
1. The first non-zero number in each row is to the right of the first non-zero numbers in the rows above.
2. Rows of all zeros are below rows with at least one non-zero element.

(i.e. aim for the non-zeros to sit on and above the diagonal, in the upper-right triangle)

Examples:
$$
\begin{bmatrix}
2 & 4 & 5 \\
0 & 1 & 3 \\
0 & 0 & 9
\end{bmatrix},
\begin{bmatrix}
4 & 3 & 0 \\
0 & 0 & 2 \\
0 & 0 & 0
\end{bmatrix},
\begin{bmatrix}
2 & 5 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 9
\end{bmatrix},
\begin{bmatrix}
4 & 1 \\
0 & 1 \\
0 & 0 \\
0 & 0
\end{bmatrix}
$$

- To get a matrix into echelon form, manipulate rows by adding or subtracting scaled versions of other rows
- note: if any rows are linearly dependent, they will end up as a row of all zeros.
  - This can be very helpful in calculating the rank of a matrix.
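The procedure can be sketched as a short forward-elimination routine. The helper `to_echelon` and its `tol` parameter are hypothetical names for this example. The routine also exchanges rows so that the largest available entry becomes the pivot, which is a common numerical choice and an assumption here, not something these notes prescribe.

```python
import numpy as np

def to_echelon(M, tol=1e-12):
    """Return one echelon form of M via forward elimination (illustrative helper)."""
    E = np.array(M, dtype=float)
    n_rows, n_cols = E.shape
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row >= n_rows:
            break
        # among the rows not yet used, pick the largest entry in this column
        best = pivot_row + np.argmax(np.abs(E[pivot_row:, col]))
        if abs(E[best, col]) < tol:
            continue  # no pivot in this column
        E[[pivot_row, best]] = E[[best, pivot_row]]  # exchange rows
        # subtract scaled copies of the pivot row from the rows below it
        for r in range(pivot_row + 1, n_rows):
            E[r] -= (E[r, col] / E[pivot_row, col]) * E[pivot_row]
            E[r, col] = 0.0  # clear round-off so the entry is exactly 0
        pivot_row += 1
    return E

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E = to_echelon(A)
print(np.round(E, 3))
```

For this `A`, the third row is a linear combination of the first two, so the last row of `E` comes out as zeros (up to floating-point round-off).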

### A few tips for row reduction:


1. Divide an entire row by a scalar to make the left-most non-zero number equal 1.
$$
\begin{bmatrix}
3 & 6 & 9 \\
...
\end{bmatrix}
\rightarrow
\begin{bmatrix}
1 & 2 & 3 \\
...
\end{bmatrix}
$$

2. Multiply a row by a scalar to facilitate eliminating elements

3. Multiply a row by a scalar to get rid of difficult fractions

### Keeping track of row reduction

- Changes to a matrix are reversible if you keep track of them
- We can consolidate all the changes to a matrix in another matrix: the $R$ matrix
- Note that information can be lost if we do not keep track of changes, but by using an $R$ matrix to track them, no info is lost
- In simple cases, we don't really need to keep track of changes, but for more complex changes and permutations, it becomes vital.
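A small sketch of the idea, assuming $R$ is the matrix that pre-multiplies $A$ to produce the row-reduced result (so $RA$ is the reduced matrix). Here `R` records a single operation, and inverting it recovers the original matrix.

```python
import numpy as np

A = np.array([[ 2.0, 3.0],
              [-2.0, 2.0]])

# R records the operation "replace row 2 by row 2 plus row 1"
R = np.array([[1.0, 0.0],
              [1.0, 1.0]])

E = R @ A                      # the row-reduced matrix
A_back = np.linalg.inv(R) @ E  # undo the recorded operation

print(E)
print(np.allclose(A_back, A))
```

Several operations in sequence could be consolidated by multiplying their individual matrices together, with the most recent operation on the left.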

### Exchanging rows in a matrix

- exchanging rows of a matrix is a linear transformation--sometimes called a *permutation* since we are permuting the row order--which means it can be expressed as matrix multiplication.
- The way to do this is to manipulate the identity matrix for the rows you want to exchange
- for example, if we want to exchange the 1st and 2nd rows of a 3x3 matrix, create matrix $P$ which is the identity matrix with the first 2 rows exchanged/swapped:
$$
PA=
\begin{bmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}
=
\begin{bmatrix}
4 & 5 & 6 \\
1 & 2 & 3 \\
7 & 8 & 9
\end{bmatrix}
$$

- note that this can be a good example of how matrix multiplication is non-commutative.
- if we take the above example and right multiply, then we exchange columns instead of rows
$$
AP=
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}
\begin{bmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
2 & 1 & 3 \\
5 & 4 & 6 \\
8 & 7 & 9
\end{bmatrix}
$$

#### Order of matrices for transforming rows vs. columns
Using the mnemonic from previous chapters will help remember order of matrices when multiplying left vs. right for rows vs. columns:

- P**R**e-multiply to transform **R**ows

- P**O**st-multiply to transform c**O**lumns
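The row and column exchanges above can be checked in a few lines. This is a sketch: the permutation matrix is built by re-ordering the rows of the identity matrix with NumPy indexing, which is one of several ways to construct it.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# identity matrix with rows 1 and 2 exchanged
P = np.eye(3, dtype=int)[[1, 0, 2]]

print(P @ A)  # pre-multiply: rows 1 and 2 of A are exchanged
print(A @ P)  # post-multiply: columns 1 and 2 of A are exchanged
```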

### Pivots

- After putting the matrix into echelon form (non-zeros on and above the diagonal), the pivots are the left-most non-zero elements in each row
- Not every row has a pivot and not every column has a pivot.
- A zero cannot be a pivot
- The pivots of a matrix are generally not known/visible before transforming into echelon form.

In [2]:
# LU (lower-upper) decomposition
# "LU" stands for lower-upper: the decomposition represents a matrix as the product of 2 triangular matrices
# scipy also returns a permutation matrix P that records row exchanges, so that A = P @ L @ U
from scipy.linalg import lu

A = np.random.randn(3,3)
P,L,U = lu(A)
print(P)
print(L)
print(U)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[ 1.          0.          0.        ]
 [-0.23713341  1.          0.        ]
 [-0.22031483 -0.54649913  1.        ]]
[[-1.25500948  0.0788026   0.76348943]
 [ 0.         -1.61259837 -0.44079618]
 [ 0.          0.         -2.22865359]]


### Pivot-counting and rank
- The rank of a matrix is the number of pivots in the echelon form of that matrix.
- Any linearly dependent rows get zeroed out via row reduction, and we are left with the linearly independent rows.
- So the echelon form "cleans up" the matrix to reveal the dimensionality of the row space and therefore the rank of the matrix.
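Pivot-counting can be sketched with sympy, whose `rref()` method returns the pivot column indices alongside the reduced matrix. It reduces all the way to RREF rather than stopping at echelon form, but the number of pivots is the same. The comparison against `np.linalg.matrix_rank` is added here as a sanity check.

```python
import numpy as np
import sympy as sym

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# rref() returns the reduced matrix and a tuple with the pivot column indices
reduced, pivot_columns = sym.Matrix(A.tolist()).rref()

rank_from_pivots = len(pivot_columns)
rank_from_numpy = np.linalg.matrix_rank(A)
print(pivot_columns, rank_from_pivots, rank_from_numpy)
```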

### Non-uniqueness of the echelon form
- Do you always get the *exact* same echelon matrices that I got in all the exercises above?  Maybe, but you might have also gotten somewhat different results.
- The echelon form of a matrix is non-unique.  This means that there are multiple equally valid versions of echelon form reduced matrices.

## 10.4 Gaussian elimination

- Gaussian elimination is an application of row reduction to solving a system of equations.
- First, we will convert the system of equations into matrix form as done in previous sections with a few modifications as follows:

1. Augment the coefficients matrix by the constants vector.
2. Row-reduce to echelon form
3. Apply back-substitution to solve the system. (i.e. revert to original equation format and solve for variables)

Example:

$$
\begin{Bmatrix}
2x + 3y = 8 \\
-2x + 2y = -3
\end{Bmatrix}
$$

1. Augment the coefficients matrix by the constants vector
$$
\begin{bmatrix}
2 & 3 & | & 8 \\
-2 & 2 & | & -3
\end{bmatrix}
$$

2. Row-reduce to echelon form
$$
\begin{bmatrix}
2 & 3 & | & 8 \\
0 & 5 & | & 5
\end{bmatrix}
$$
simplify row 2 further by dividing by 5
$$
\begin{bmatrix}
2 & 3 & | & 8 \\
0 & 1 & | & 1
\end{bmatrix}
$$

3. Apply back-substitution to solve the system. (i.e. revert to original equation format and solve for variables)
$$
\begin{Bmatrix}
2x + 3y = 8 \\
y = 1
\end{Bmatrix}
$$

**SOLUTION:**
$$(x, y) = (5/2, 1)$$
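The three steps can be sketched in code for square systems with a unique solution. The function name `gaussian_solve` is hypothetical, and the row exchange that picks the largest available pivot is an added safeguard against dividing by zero, not part of the worked example above.

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve a square system Ax = b by Gaussian elimination (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]

    # step 1: augment the coefficients matrix by the constants vector
    aug = np.hstack([A, b.reshape(-1, 1)])

    # step 2: row-reduce to echelon form, working from the top row down
    for k in range(n):
        p = k + np.argmax(np.abs(aug[k:, k]))
        if np.isclose(aug[p, k], 0.0):
            raise ValueError("no unique solution: zero pivot encountered")
        aug[[k, p]] = aug[[p, k]]
        for r in range(k + 1, n):
            aug[r] -= (aug[r, k] / aug[k, k]) * aug[k]

    # step 3: back-substitution, working from the bottom row up
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (aug[k, -1] - aug[k, k + 1:n] @ x[k + 1:]) / aug[k, k]
    return x

solution = gaussian_solve([[2, 3], [-2, 2]], [8, -3])
print(solution)
```

For the example system this reproduces $(x, y) = (5/2, 1)$.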

## 10.5 Row-reduced echelon form

- Often abbreviated as RREF, this is row reduction taken to the extreme!
- The goal is to continue row reduction until all pivots have a value of 1 and each pivot is the only non-zero element of its column

Steps to produce RREF:
1. Transform a matrix into its echelon form as described earlier.
2. For each row that contains a pivot:  
  a. Divide that row by its pivot, which converts the pivot into the number 1.  
  b. Apply row-reduction but work upwards instead of downwards.  That is, use row-reduction to produce zeros in the elements above the pivot.  Continue "upwards row-reduction" until the pivot is the only non-zero element in its column.

- it is often useful to apply step 2 from the bottom row of the matrix to the top.
- i.e. to obtain echelon form of the matrix, work from top row to bottom, then to obtain the RREF, work from the bottom row (or last row with pivots) up to the top.

Examples:

$$
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
1 & 0 & -1 \\
0 & 1 & 2 \\
0 & 0 & 0
\end{bmatrix}
$$
$$
\begin{bmatrix}
5 & 2 & 1 & 0 & 4 \\
4 & 1 & 5 & 6 & 3 \\
2 & 1 & 9 & 0 & 4
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
1 & 0 & 0 & 3.4 & -.03 \\
0 & 1 & 0 & -8.6 & 1.967 \\
0 & 0 & 1 & .2 & .233
\end{bmatrix}
$$
$$
\begin{bmatrix}
1 & 2 \\
6 & 2 \\
4 & 7 \\
1 & 7
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
1 & 0 \\
0 & 1 \\
0 & 0 \\
0 & 0
\end{bmatrix}
$$
$$
\begin{bmatrix}
1 & 2 & 3 \\
4 & 1 & 2 \\
6 & 4 & 2
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

- Each matrix has a unique RREF, but many different matrices can lead to the same RREF

In [3]:
# computing the RREF.
# note: in Python, first convert the matrix into a sympy matrix.
import sympy as sym
A = np.random.randn(2, 4)
sym.Matrix(A).rref()

(Matrix([
 [1, 0, 0.653368052064143, -1.24951016350731],
 [0, 1,  -2.1094073038511, 0.309249655951786]]),
 (0, 1))

## 10.6 Gauss-Jordan elimination

- Gauss-Jordan elimination is like a super-charged version of Gaussian elimination.
- It's basically the same procedure as above, except that you modify step 2: instead of row reducing the coefficients-constants augmented matrix to its echelon form, you row-reduce the matrix to its RREF.
- This row reduction takes a bit longer, but the back-substitution becomes *much* easier.

Example:
$$
\begin{bmatrix}
2 & 3 & | & 8 \\
0 & 1 & | & 1
\end{bmatrix}
\rightarrow
\begin{bmatrix}
2 & 0 & | & 5 \\
0 & 1 & | & 1
\end{bmatrix}
\rightarrow
\begin{bmatrix}
1 & 0 & | & 5/2 \\
0 & 1 & | & 1
\end{bmatrix}
$$

Now, the solution becomes obvious:
$$(x, y) = (5/2, 1)$$
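The same result can be reproduced with sympy, as a sketch. Integer entries are used so that the row reduction stays exact, and the variable names are chosen for this example.

```python
import sympy as sym

# augmented matrix [A | b] for 2x + 3y = 8 and -2x + 2y = -3
augmented = sym.Matrix([[ 2, 3,  8],
                        [-2, 2, -3]])

# row-reduce all the way to RREF; integer input keeps the arithmetic exact
rref_matrix, pivot_columns = augmented.rref()

# with a pivot in every coefficient column, the last column holds the solution
x, y = rref_matrix[:, -1]
print(rref_matrix)
print(x, y)
```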

## 10.7 Possibilities for solutions

As mentioned earlier, there are 3 possibilities for the solutions of a system:

1. The system has no solutions (in 2D, the lines are parallel)
2. The system has exactly one unique solution (in 2D, the lines cross at a point = any non-parallel, non-collinear lines)
3. The system has an infinite number of solutions (in 2D, the lines are collinear / identical)

### No solution (figure 10.5A on p. 292)
- In a 2D system, geometrically, the lines never touch.  That means the lines in the system are parallel to each other.
- Mapping the RREF of a system with no solution back to equations produces a nonsense result like $0=1$
- i.e. if the system provides inconsistent / illogical answers, then it has no solution

### Unique solution
- In a 2D system, the lines will cross at 1 specific point
- This is the majority of the cases and the examples we worked through in this chapter

### Infinite solutions
- In 2D systems, the lines overlap with each other (they are collinear)
- This happens when there is only 1 linearly independent row, and all other rows are simply scaled versions of that row
- This is not really a "system" anymore, it's actually just 1 equation
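One way to tell the three cases apart numerically is to compare the rank of the coefficients matrix with the rank of the augmented matrix. This rank test is a standard result (the Rouché-Capelli theorem) rather than something derived in these notes, and `classify_system` is a hypothetical helper name.

```python
import numpy as np

def classify_system(A, b):
    """Classify Ax = b by comparing rank(A) with rank([A | b]) (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    aug = np.column_stack([A, b])
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(aug)
    if rank_A < rank_aug:
        return "no solution"           # an inconsistent row such as 0 = 1
    if rank_A == A.shape[1]:
        return "unique solution"       # a pivot for every variable
    return "infinitely many solutions"

print(classify_system([[1, 1], [1, 1]], [1, 2]))    # parallel lines
print(classify_system([[2, 3], [-2, 2]], [8, -3]))  # lines crossing at one point
print(classify_system([[1, 1], [2, 2]], [1, 2]))    # identical lines
```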

## 10.8 Matrix spaces after row reduction

- Row-reduction has implications for the span of matrix subspaces
- The main take-home message here is that row reduction does not affect the row space of a matrix, but it can *drastically* affect the column space of the matrix.  On the other hand, the dimensionalities of the matrix subspaces do not change.

### Rank
- As we know, rank is a property of a matrix, not of rows or columns.
- The rank doesn't change before vs. after row-reduction.
- In fact, row-reduction makes the rank much easier to compute (count the pivots)

### Row space
- This doesn't change after row reduction
- Why? Because a row subspace includes all linear combinations of the rows. And since the rows of a row-reduced matrix are simply linear combinations of the rows of the original matrix, the subspace remains the same.
- The only characteristic of the row space that *could* change is the basis vectors: if you take the rows of the matrix (possibly sub-selecting to get a linearly independent set) as a basis for the row space, then row reduction will change the basis vectors.  But those are just different vectors that span the same space; the subspace spanned by the rows is unchanged before, during, and after row reduction.

### Column space
- The column space actually can change during row reduction
- First off, the dimensionality of a column space does not change with row reduction (since the rank doesn't change)
- But what can change is the actual subspace that is spanned by the columns.  This can happen when the column space occupies a lower-dimensional subspace of the ambient space in which the columns live.
- The reason is that *row* reduction involves changing entire *rows* at a time; individual elements of a column will change while other elements in the same column stay the same.

### Take-home message on matrix spaces after row-reduction
Row reduction...
1. does not affect the rank or the dimensionalities of the matrix subspaces
2. does not change the row space, and
3. can (but does not necessarily) change the column space.
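These three points can be checked on a small rank-1 matrix. The sketch relies on a rank test: two matrices span the same subspace if stacking their vectors together does not raise the rank. The variable names are assumptions for this example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # rank 1: row 2 is 2 times row 1

# row-reduce by hand: row 2 minus 2 times row 1
E = A.copy()
E[1] = E[1] - 2 * E[0]

rank = np.linalg.matrix_rank

# stacking the rows of both matrices adds no new directions, so the row spaces match
same_row_space = rank(np.vstack([A, E])) == rank(A) == rank(E)

# placing the columns side by side raises the rank, so the column spaces differ
same_col_space = rank(np.hstack([A, E])) == rank(A)

print(rank(A), rank(E))
print(same_row_space, same_col_space)
```

Here the columns of `A` lie along the direction $(1, 2)$ while the columns of `E` lie along $(1, 0)$: both column spaces are 1-dimensional, but they are different lines in the plane.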

## 10.9 - 10.10 Exercises

do in group discussion?

## 10.11 - 10.12 Code Challenges

do in group discussion?