# Aram Dovlatyan - Assignment 1 for PyTorch Deep Learning Course

### Exploring the fundamentals of Linear Algebra with PyTorch

A short introduction to PyTorch and to the chosen functions that relate to Linear Algebra.
- torch.det
- torch.lstsq
- torch.eig
- torch.qr
- torch.svd

In [1]:
# Import torch and other required modules
import torch

## Function 1 - torch.det

The function torch.det calculates the determinant of a *square* matrix or of a batch of *square* matrices. It takes a tensor as input; without much loss of generality, we can think of a tensor with more than two dimensions as a batch of matrices.

In [28]:
# Example 1 - working

A = torch.tensor([[4.,7.], [2.,3.]]) # the matrix must be square (n x n) and floating-point
print(A)

torch.det(A)

tensor([[4., 7.],
        [2., 3.]])


tensor(-2.)

In this example, we create a square matrix with dimensions 2x2 ahead of time, then apply the .det (determinant) function to it. The determinant function returns a value of -2, which matches the hand calculation $4 \cdot 3 - 7 \cdot 2 = -2$.
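As a quick sanity check, the 2x2 determinant can be computed by hand with the formula $ad - bc$ and compared against the library result. This is only a sketch for the matrix used above; the variable names `manual` and `library` are illustrative and not part of the original notebook.

```python
import torch

# Same 2x2 matrix as in Example 1
A = torch.tensor([[4., 7.], [2., 3.]])

# Hand formula for a 2x2 matrix [[a, b], [c, d]]: det = a*d - b*c
manual = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

# Library result for comparison
library = torch.det(A)

print(manual, library)
```

Both values should agree up to floating-point rounding.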

In [27]:
# Example 2 - working

B = torch.randn(2,3,3)
print(B)

B.det()

tensor([[[-0.3742,  2.0282,  1.9084],
         [ 0.6228,  0.1879, -1.8463],
         [ 0.7574, -3.7132, -0.6964]],

        [[-1.0723, -0.7526, -0.6909],
         [-0.5210, -1.1502,  0.1008],
         [ 1.6173,  0.3579, -0.1398]]])


tensor([-4.0272, -1.3580])

In this example, we create a tensor that consists of 2 matrices, each a 3x3 square matrix, filled with random values drawn from a standard normal distribution. Applying the determinant function to the tensor calculates one determinant per 3x3 matrix and returns them in order.
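To make the batch behaviour concrete, the sketch below compares the batched call with an explicit loop over the individual matrices. The call to `torch.manual_seed` is an addition for reproducibility and is not in the original example, so the random values will differ from the output shown above.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

B = torch.randn(2, 3, 3)  # a batch of two 3x3 matrices

# One call handles the whole batch and returns one determinant per matrix
batched = torch.det(B)

# Equivalent (but slower) loop over the matrices in the batch
looped = torch.stack([torch.det(matrix) for matrix in B])

print(batched.shape)  # one entry per matrix in the batch
print(torch.allclose(batched, looped, atol=1e-5))
```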

In [30]:
# Example 3 - breaking
C = torch.tensor([[4,7], [2,3]])
torch.det(C)

RuntimeError: Expected a floating point tensor as input

In this example we created a single 2x2 matrix, but we used integers as input and didn't specify the datatype explicitly. Therefore, PyTorch inferred the datatype to be int64, in other words, integer. The determinant function only works on tensors with a floating-point datatype. To fix it, we have to turn our tensor "C" into a floating-point tensor, which can be done by writing any one of its numbers with a decimal point (for example 4.0 instead of 4).
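Besides adding a decimal point, the datatype can be set explicitly. The sketch below shows two common ways to do that, converting an existing tensor and passing `dtype` at creation; the names `C_float` and `C_direct` are illustrative.

```python
import torch

C = torch.tensor([[4, 7], [2, 3]])   # inferred dtype: torch.int64

# Option 1: convert an existing integer tensor
C_float = C.to(torch.float32)

# Option 2: request a floating-point dtype when creating the tensor
C_direct = torch.tensor([[4, 7], [2, 3]], dtype=torch.float32)

det_value = torch.det(C_float)
print(C.dtype, C_float.dtype, det_value)
```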

### Summary of the torch.det() function

As we know from Linear Algebra, the determinant is a fundamental characteristic for describing matrices, and it obeys many important identities. Determinants are used to construct adjoint matrices, to solve a system of n linear equations in n variables with Cramer's rule, and in analytic geometry to measure how a linear transformation scales area (in 2D) or volume (in higher dimensions).

In general, you would use this function to calculate determinants and build on it for computational efficiency if necessary. Determinants have several equivalent mathematical definitions.
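As an illustration of Cramer's rule, here is a minimal sketch that solves a small system $Ax = b$ using only determinants. The helper `cramer_solve` and the right-hand side `b` are hypothetical and chosen only for demonstration; for real workloads a dedicated solver such as `torch.linalg.solve` is usually the better choice.

```python
import torch

def cramer_solve(A, b):
    """Solve A x = b via Cramer's rule (illustrative helper, small systems only)."""
    det_A = torch.det(A)
    n = A.shape[0]
    x = torch.empty(n, dtype=A.dtype)
    for i in range(n):
        A_i = A.clone()
        A_i[:, i] = b                    # replace column i with the right-hand side
        x[i] = torch.det(A_i) / det_A    # x_i = det(A_i) / det(A)
    return x

A = torch.tensor([[4., 7.], [2., 3.]])   # matrix from Example 1
b = torch.tensor([1., 2.])               # hypothetical right-hand side

x = cramer_solve(A, b)
print(x)
print(A @ x)  # should reproduce b up to rounding
```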



## Function 2 - torch.lstsq

This function computes the solution to the least squares and least norm problems for a full-rank matrix A of size (m x n) and a matrix B of size (m x k). This is fundamental for fitting linear regression models to data. Ordinary Least Squares is an estimation technique that can be derived with linear algebra or calculus.

The returned tensor X contains the solution for each column of B in its first $n$ rows. The remaining $m-n$ rows do not hold the individual residuals; instead, the sum of squares of the entries in those rows gives the residual sum of squares for that column.

In [5]:
# Example 1 - working

points = torch.tensor([[1,0], [2,1], [3,3]]) # Let these 3 vectors be 3 points in the cartesian coordinate plane.
print(points)

A = torch.tensor([[1.0,1], [1,2], [1,3]]) # design matrix A: a column of ones (intercept) and the x-coordinates, chosen so that Ax = B
B = torch.tensor([0.0,1,3]) # tensor B holds the outputs (y-coordinates) in Ax = B


print("\n The solution of the best fit where the first 2 rows contain the solution and the final row is the residual.")
X, _ = torch.lstsq(B,A) # determining the values of the X matrix which minimizes the error |Ax-B| and approximates the equality
print(X)

tensor([[1, 0],
        [2, 1],
        [3, 3]])

 The solution of the best fit where the first 2 rows contain the solution and the final row is the residual.
tensor([[-1.6667],
        [ 1.5000],
        [ 0.4082]])


In this example, we take 3 points and represent them with 2 tensors. The first tensor, A, is the design matrix: a column of ones for the intercept followed by the x-coordinates of the points. The second tensor, B, holds the corresponding y-coordinates. Finding the best approximation means satisfying the equation $Ax=B$ as closely as possible, in other words, minimizing $\|Ax-B\|^2$, the squared error of the model.

We have determined a linear equation that consists of a constant (intercept) and a coefficient for the variable x. It looks like <b>Y = B<sub>0</sub> + B<sub>1</sub>*x</b>. The first row in our solution tensor (X) is the intercept, and the 2nd row is the coefficient for the x variable.

The least squares regression line is $y = \frac{3}{2}x - \frac{5}{3}$.
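In more recent PyTorch releases, `torch.lstsq` has been deprecated in favor of `torch.linalg.lstsq`, which takes its arguments in the opposite order (`A` first, then `B`) and returns only the solution rows. The following sketch repeats the fit above with that API, assuming a PyTorch version that ships `torch.linalg`; the residual sum of squares is computed by hand rather than read from the result.

```python
import torch

# Same data as above: a column of ones plus the x-coordinates
A = torch.tensor([[1., 1.], [1., 2.], [1., 3.]])
B = torch.tensor([[0.], [1.], [3.]])   # y-coordinates as a column (m x 1)

# Note the argument order: A first, then B
X = torch.linalg.lstsq(A, B).solution  # shape (2, 1): intercept and slope

# Residual sum of squares of the fitted line
rss = ((A @ X - B) ** 2).sum()

print(X)
print(rss, rss.sqrt())
```

The square root of the residual sum of squares should be close to 0.4082, the value in the last row of the `torch.lstsq` output above.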

In [4]:
# Example 2 - working
# Fit a quadratic polynomial of the form y = Co + C1x + C2x^2

A = torch.tensor([[1.0,0,0], [1,5,25], [1,10,100], [1,15,225], [1,20,400]]) # rows are [1, x, x^2] for x = 0, 5, 10, 15, 20 years
B = torch.tensor([4.5,4.9,5.3,5.7,6.1]) # population in millions at each 5-year measurement

X, _ = torch.lstsq(B,A)
print(X)

tensor([[ 4.5000e+00],
        [ 8.0000e-02],
        [-1.1296e-10],
        [-2.0136e-07],
        [ 2.5062e-08]])


In this example, we create a new data table replicating the population of some hypothetical species that is measured every 5 years. The population is in the B tensor in millions (i.e. 4.5 million in the first year). We fit a least squares model to the data; the first 3 rows of the returned tensor hold the solution and the last 2 rows relate to the residuals. The intercept is 4.5, the coefficient of the linear term is 0.08, and the coefficient of the quadratic term is $-1.13 \times 10^{-10}$. Since the quadratic coefficient is approximately 0, we drop it, as it won't make a meaningful difference to the fitted equation.

We fit a quadratic regression polynomial of the form <b>y = C<sub>0</sub> + C<sub>1</sub>(x) + C<sub>2</sub>(x^2)</b>. After dropping the negligible quadratic term, our final polynomial is linear:
$y = 4.5 + 0.08x$, where y is the population in millions and x is the number of years. For example, if we input 5 for x (after the first 5 years), we would expect a population close to 4.9.

Indeed $y(5) = 4.5 + 0.08(5) = 4.5 + 0.40 = 4.9$
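The same fit and prediction can be sketched end to end in code. This version builds the design matrix from the year values instead of typing it by hand and uses `torch.linalg.lstsq`; the names `years`, `population`, `coeffs` and `predict` are illustrative, and double precision is an assumption made to keep rounding noise small.

```python
import torch

years = torch.tensor([0., 5., 10., 15., 20.], dtype=torch.float64)
population = torch.tensor([4.5, 4.9, 5.3, 5.7, 6.1], dtype=torch.float64)

# Design matrix with columns [1, x, x^2]
A = torch.stack([torch.ones_like(years), years, years ** 2], dim=1)

# Least squares coefficients C0, C1, C2
coeffs = torch.linalg.lstsq(A, population.unsqueeze(1)).solution.squeeze(1)

def predict(x):
    """Evaluate the fitted quadratic at x (illustrative helper)."""
    return coeffs[0] + coeffs[1] * x + coeffs[2] * x ** 2

print(coeffs)
print(predict(5.0))
```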

In [5]:
# Example 3 - breaking

A = torch.tensor([[1.0, 2, 5], [1, 4, 10,], [1, 5, 12.5], [1, 7, 16.5]])
B = torch.tensor(([5.5, 7, 9.5]))

X, _ = torch.lstsq(B,A)
print(X)

RuntimeError: Expected A and b to have same size at dim 0, but A has 4 rows and B has 3 rows

Our tensor A contains 4 rows of data but our tensor B has only 3. This means that we are missing an output: we can't fit a regression function unless the number of inputs equals the number of outputs in the data. When you use the least squares function, make sure A and B have the same number of rows.

### Summary of the torch.lstsq() function

This function is helpful for computing least squares solutions to systems of linear equations. You can also use it to fit curves that capture non-linearity, for example with quadratic terms: by linear we mean that the model is linear in its coefficients, not necessarily in the input x. Least squares as an estimator has applications in modeling *inconsistent* systems of linear equations and in approximating functions.
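For a full-rank A, the least squares solution of an inconsistent system also satisfies the normal equations $A^T A x = A^T b$. The sketch below solves them directly for the three points from Example 1. This is a textbook derivation rather than what the library does internally, and forming $A^T A$ can be numerically less stable than a dedicated least squares routine.

```python
import torch

# Data from Example 1: column of ones plus x-coordinates, and y-coordinates
A = torch.tensor([[1., 1.], [1., 2.], [1., 3.]])
b = torch.tensor([0., 1., 3.])

# Normal equations: (A^T A) x = A^T b
AtA = A.T @ A
Atb = A.T @ b
x = torch.linalg.solve(AtA, Atb)

print(x)  # intercept and slope of the regression line
```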

## Function 3 - torch.eig

This function computes the eigenvalues and eigenvectors of *real* square matrices.

In [3]:
# Example 1 - working

A = torch.tensor([[2.0,0], [0,-1]])

X = torch.eig(A, eigenvectors=True)
print(X)

torch.return_types.eig(
eigenvalues=tensor([[ 2.,  0.],
        [-1.,  0.]]),
eigenvectors=tensor([[1., 0.],
        [0., 1.]]))


Here we computed the eigenvalues and eigenvectors of a 2x2 square matrix. Each row of the returned eigenvalue tensor (the first output) holds one eigenvalue as a (real part, imaginary part) pair, so our *real* eigenvalues are 2 and -1. The eigenvectors in the second output are stored as columns, so (1,0) and (0,1) are the eigenvectors; because this eigenvector matrix is the identity, reading it by rows happens to give the same vectors.
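In more recent PyTorch releases, `torch.eig` has been deprecated in favor of `torch.linalg.eig`, which returns complex-valued eigenvalues and eigenvectors. The sketch below, assuming a PyTorch version that ships `torch.linalg`, recomputes this example and checks the defining relation $Av = \lambda v$ for every column at once.

```python
import torch

A = torch.tensor([[2.0, 0.0], [0.0, -1.0]])

# Eigenvalues have shape (n,), eigenvectors have shape (n, n); both are complex
eigenvalues, eigenvectors = torch.linalg.eig(A)

# Check A v_j = lambda_j v_j, where v_j is column j of `eigenvectors`
lhs = A.to(eigenvectors.dtype) @ eigenvectors
rhs = eigenvectors * eigenvalues   # broadcasting scales column j by eigenvalue j

print(eigenvalues)
print(torch.allclose(lhs, rhs, atol=1e-6))
```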

In [4]:
# Example 2 - working

A = torch.tensor([[1.0,0,0], [-1,1,1], [-1,-2,4]])

X = torch.eig(A, eigenvectors = True)
print(X)

torch.return_types.eig(
eigenvalues=tensor([[3., 0.],
        [2., 0.],
        [1., 0.]]),
eigenvectors=tensor([[0.0000, 0.0000, 0.5774],
        [0.4472, 0.7071, 0.5774],
        [0.8944, 0.7071, 0.5774]]))


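Here we worked on a 3x3 square matrix and computed both eigenvalues and eigenvectors. Our respective *real* eigenvalues are 3, 2 and 1, as they are the first element in each row of the eigenvalue tensor. The eigenvectors are the columns of the second output: (0, 0.4472, 0.8944) for eigenvalue 3, (0, 0.7071, 0.7071) for eigenvalue 2, and (0.5774, 0.5774, 0.5774) for eigenvalue 1. Each is normalized to unit length, which is why the decimals look familiar: they are $1/\sqrt{5}$, $2/\sqrt{5}$, $1/\sqrt{2}$ and $1/\sqrt{3}$, so the eigenvectors are scaled versions of the cleaner vectors (0,1,2), (0,1,1) and (1,1,1).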
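A quick way to see whether the eigenvectors sit in the rows or the columns of the returned tensor is to plug the printed values back into $Av = \lambda v$. The sketch below does this with the rounded numbers copied from the output above, so the comparison uses a loose tolerance.

```python
import torch

A = torch.tensor([[1.0, 0, 0], [-1, 1, 1], [-1, -2, 4]])

# Values copied (rounded) from the printed output above
lam = torch.tensor([3.0, 2.0, 1.0])
V = torch.tensor([[0.0000, 0.0000, 0.5774],
                  [0.4472, 0.7071, 0.5774],
                  [0.8944, 0.7071, 0.5774]])

# Treat the COLUMNS of V as eigenvectors: A @ V should equal V with column j scaled by lam[j]
columns_ok = torch.allclose(A @ V, V * lam, atol=1e-3)

# Treat the ROWS of V as eigenvectors instead (transpose so rows become columns)
rows_ok = torch.allclose(A @ V.T, V.T * lam, atol=1e-3)

print(columns_ok, rows_ok)
```

Only the column interpretation satisfies the eigenvector equation.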

In [5]:
# Example 3 - breaking

A = torch.tensor(([1.0,5], [2,4], [0,1]))

X = torch.eig(A, eigenvectors=True)
print(X)

RuntimeError: invalid argument 1: A should be square at /opt/conda/conda-bld/pytorch_1587428266983/work/aten/src/TH/generic/THTensorLapack.cpp:194

By definition, eigenvalues and eigenvectors are concepts of linear algebra that apply only to square matrices. Square matrices have the same number of rows and columns, i.e. (n x n). Here we input a 3x2 matrix and ask PyTorch to compute the eigenvalues and eigenvectors, and we get an error, as we should. Make sure that the matrices are square before calculating eigenvectors/eigenvalues!

### Summary of the torch.eig function

This function is an important component in lots of machine learning and mathematical code. Finding eigenvalues and eigenvectors is a fundamental problem in Linear Algebra. For a square matrix A of dimensions $n \times n$, we seek a non-zero vector x in $\mathbb{R}^n$ and a scalar &lambda; such that <b>Ax = &lambda;x</b>. This has applications in mathematical modeling, systems of linear differential equations, rotation of axes problems and more.
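Eigenvalues and determinants are closely linked: &lambda; is an eigenvalue exactly when $\det(A - \lambda I) = 0$, and the determinant of A equals the product of its eigenvalues. The sketch below checks both facts for the 3x3 matrix from Example 2, using the eigenvalues 3, 2 and 1 reported there.

```python
import torch

A = torch.tensor([[1.0, 0, 0], [-1, 1, 1], [-1, -2, 4]])
I = torch.eye(3)

# det(A - lambda * I) should vanish for each eigenvalue
dets = [torch.det(A - lam * I).item() for lam in (3.0, 2.0, 1.0)]

# det(A) should equal the product of the eigenvalues: 3 * 2 * 1
det_A = torch.det(A)

print(dets)
print(det_A)
```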

## Function 4 - torch.qr

Calculates the QR decomposition of a matrix or batch of matrices. It returns a tuple (Q, R) of tensors such that $input = QR$, where Q is an orthogonal matrix (its columns are orthonormal) and R is an upper triangular matrix.

In [4]:
# Example 1 - working

A = torch.tensor([[1.0,1,0], [1,2,1], [0,0,2]])

q,r = torch.qr(A)

print(q)
print(r)

tensor([[-0.7071, -0.7071,  0.0000],
        [-0.7071,  0.7071,  0.0000],
        [-0.0000, -0.0000,  1.0000]])
tensor([[-1.4142, -2.1213, -0.7071],
        [ 0.0000,  0.7071,  0.7071],
        [ 0.0000,  0.0000,  2.0000]])


Here we take the 3x3 matrix A, which has full rank, and express it as the *product* $A=QR$, where Q is an orthogonal matrix and R is an upper triangular matrix. The tensors are printed in the order Q, R.
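The three defining properties of the factorization can be checked numerically. The sketch below uses `torch.linalg.qr`, the newer counterpart of `torch.qr`, on the same matrix; the signs of individual entries may differ between versions, but the properties should hold either way.

```python
import torch

A = torch.tensor([[1.0, 1, 0], [1, 2, 1], [0, 0, 2]])

Q, R = torch.linalg.qr(A)

# 1) The product reproduces A
reconstructs = torch.allclose(Q @ R, A, atol=1e-5)

# 2) Q has orthonormal columns: Q^T Q = I
orthonormal = torch.allclose(Q.T @ Q, torch.eye(3), atol=1e-5)

# 3) R is upper triangular: zeroing the part below the diagonal changes nothing
upper_triangular = torch.allclose(R, torch.triu(R))

print(reconstructs, orthonormal, upper_triangular)
```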

In [8]:
# Example 2 - working

A = torch.tensor([[1.0,1,2,8], [0,1,3,7], [1,0,2,4]])

q,r = torch.qr(A)

print(q)
print(r)

tensor([[-0.7071, -0.4082, -0.5774],
        [-0.0000, -0.8165,  0.5774],
        [-0.7071,  0.4082,  0.5774]])
tensor([[-1.4142, -0.7071, -2.8284, -8.4853],
        [ 0.0000, -1.2247, -2.4495, -7.3485],
        [ 0.0000,  0.0000,  1.7321,  1.7321]])


Similar to the previous example, here we take another matrix A, this time a 3x4 rectangular one, and compute its orthogonal and upper triangular factors such that the product $A=QR$ is satisfied. Q is 3x3 and R is 3x4, with zeros below the main diagonal.

In [9]:
# Example 3 - breaking

A = torch.tensor([[3,2,1], [9,6,4]])

q,r = torch.qr(A, some=False)

print(q)
print(r)

RuntimeError: "qr_cpu" not implemented for 'Long'

There aren't many things that can go wrong with this function, since a QR decomposition can be computed by several algorithms and exists for any matrix, whether square or rectangular. Therefore I thought it would be good to show a common error that arises with many functions in PyTorch and is not specific to this one. PyTorch expects floating-point tensors as inputs for many of its functions. Here we only used integers, so PyTorch inferred an integer data type ('Long', i.e. int64), and the function will not work unless the tensor is represented as float. This can be fixed by adding .0 to any number, which makes the whole tensor floating-point without changing any information in the matrix.

### Summary of the torch.qr function

QR factorization (or decomposition) of a matrix forms the basis for many algorithms in numerical Linear Algebra. It is involved in many of the computations done in machine learning algorithms. Algorithms for computing eigenvalues and least squares solutions are built on QR factorization. Each method of computing the factorization has its pros and cons and should be evaluated for the best fit to any problem.
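To illustrate the link to least squares, here is a minimal sketch that solves the line-fitting problem from the torch.lstsq section through a QR factorization: with $A = QR$, the least squares solution satisfies $Rx = Q^T b$, which is solved by back-substitution. It assumes a PyTorch version that provides `torch.linalg.solve_triangular`.

```python
import torch

# Line-fitting data: column of ones plus x-coordinates, and y-coordinates
A = torch.tensor([[1., 1.], [1., 2.], [1., 3.]])
b = torch.tensor([[0.], [1.], [3.]])

# Reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
Q, R = torch.linalg.qr(A)

# Solve R x = Q^T b by back-substitution
x = torch.linalg.solve_triangular(R, Q.T @ b, upper=True)

print(x)  # intercept and slope
```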

## Function 5 - torch.svd

This function computes a singular value decomposition of a *real*-valued input matrix or batch of real matrices such that $input = U \cdot \operatorname{diag}(S) \cdot V^T$. For an $m \times n$ input with $k = \min(m, n)$, the default (reduced) form returns U as an $m \times k$ matrix with orthonormal columns, S as a vector of the $k$ singular values (the diagonal of the diagonal matrix), and V as an $n \times k$ matrix with orthonormal columns.

In [3]:
# Example 1 - working

A = torch.tensor([[3.0,2,2], [2,3,-2]])

u, s, v = torch.svd(A)

print(u)
print(s)
print(v)

tensor([[ 0.7071, -0.7071],
        [ 0.7071,  0.7071]])
tensor([5.0000, 3.0000])
tensor([[ 7.0711e-01, -2.3570e-01],
        [ 7.0711e-01,  2.3570e-01],
        [ 4.2130e-08, -9.4281e-01]])


Here we compute the SVD of the 2x3 matrix A. The function returns U, the singular values S as a vector, and V. Multiplying $U \cdot \operatorname{diag}(S) \cdot V^T$ gives back the matrix A (up to floating-point rounding), so we have decomposed the matrix into a product of 3 matrices.
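The reconstruction can be verified directly. The sketch below uses `torch.linalg.svd`, the newer counterpart of `torch.svd`; note that it returns the already transposed factor (called `Vh` here) instead of V, so no extra transpose is needed.

```python
import torch

A = torch.tensor([[3.0, 2, 2], [2, 3, -2]])

# full_matrices=False requests the reduced decomposition
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# S is a vector of singular values, so it has to be expanded to a diagonal matrix
A_rebuilt = U @ torch.diag(S) @ Vh

print(U.shape, S.shape, Vh.shape)
print(torch.allclose(A_rebuilt, A, atol=1e-5))
```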

In [5]:
# Example 2 - working

A = torch.tensor([[1.0,2], [2,1]])

u, s, v = torch.svd(A)

print(u)
print(s)
print(v)

tensor([[-0.7071, -0.7071],
        [-0.7071,  0.7071]])
tensor([3.0000, 1.0000])
tensor([[-0.7071,  0.7071],
        [-0.7071, -0.7071]])


Similar to the first example, we have computed the SVD of the 2x2 matrix A. The 3 tensors represent U, the singular values S, and V respectively, and the product $U \cdot \operatorname{diag}(S) \cdot V^T$ returns the matrix A.

In [31]:
# Example 3 - breaking

A = torch.tensor([[[1,1,0], [0,1,1.0], [0,0,4]], [[1.0,2], [4,3], [5,7]]])

u, s, v = torch.svd(A, some=False)

print(u)
print(s)
print(v)

ValueError: expected sequence of length 3 at dim 2 (got 2)

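Here the two matrices in our batch are inconsistent: the first is 3x3 and the second is 3x2. The error is actually raised by torch.tensor, before torch.svd is ever called, because a tensor cannot hold rows of different lengths. The input to this function must be a tensor of shape $(*, m, n)$, where $*$ is zero or more batch dimensions, so every matrix in a batch must have the same $m \times n$ shape. Ragged data like this can show up in large datasets where missing (NA) values leave some rows shorter than others.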

### Summary of the torch.svd function

The Singular Value Decomposition is a powerful mathematical technique that has many applications. It is a major component of many machine learning algorithms and should be understood well. It appears in dimensionality reduction, low-rank approximation, and many other applications of numerical linear algebra methods.
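As one concrete application, the SVD gives the best low-rank approximation of a matrix: keeping only the largest singular value yields a rank-1 matrix whose spectral-norm error equals the next singular value. The sketch below illustrates this for the matrix from Example 1, whose singular values are 5 and 3; the names `A_rank1` and `error` are illustrative.

```python
import torch

A = torch.tensor([[3.0, 2, 2], [2, 3, -2]])

U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Rank-1 approximation: largest singular value times the outer product
# of the first left and right singular vectors
A_rank1 = S[0] * torch.outer(U[:, 0], Vh[0])

# Spectral norm (largest singular value) of the approximation error
error = torch.linalg.matrix_norm(A - A_rank1, ord=2)

print(A_rank1)
print(error)  # expected to be close to the second singular value
```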

## Conclusion

In this introductory notebook, we went over some basic functions in PyTorch, focusing on Linear Algebra methods, concepts and algorithms. These are fundamental tools in Machine Learning research and applications and are helpful when conducting research or constructing new methods. They are used as components in many mathematical applications as well. Determinants and eigenvalues/eigenvectors are characteristics of matrices and help us understand matrices by revealing new associations and identities. Least Squares, QR decomposition and Singular Value Decomposition are powerful applications of linear algebra methods used across a variety of disciplines for approximating solutions and for representing matrices as products so they can be better understood.

## Reference Links
* Official documentation for `torch.Tensor`: https://pytorch.org/docs/stable/tensors.html
* Elementary Linear Algebra, Fifth Edition, by Larson / Edwards / Falvo
* https://en.wikipedia.org/wiki/Singular_value_decomposition
* https://en.wikipedia.org/wiki/QR_decomposition

