In [0]:
# Notation: vector and matrix form.
# If each element is in R and the vector has n elements, then the vector
# lies in the set formed by taking the Cartesian product of R with itself
# n times, denoted R^n.

# 1. Introduction
We will start by studying the basic concepts required to learn deep learning.

Let us start with some definitions that will be useful throughout:
<br>
$x_{i}$ - example
<br>
$y_{i}$ - target value
<br>
$x_{i} = (x_{i1}, \dots, x_{id})$ - features: for an image, the features are the intensities of every pixel in the image.
<br>
$X = ((x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{l}, y_{l}))$ - training set
<br>
$a(x)$ - model, hypothesis
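A minimal sketch of how these definitions map onto NumPy arrays, assuming the training set is stored as a feature matrix (one example per row) plus a separate target vector. The names `X_toy`, `y_toy` and the placeholder model `a` are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

# Hypothetical training set with l = 4 examples and d = 3 features each.
# Row i of X_toy holds the features x_i = (x_i1, ..., x_id).
X_toy = np.array([
    [0.2, 0.7, 0.1],
    [0.9, 0.4, 0.3],
    [0.5, 0.5, 0.8],
    [0.1, 0.6, 0.2],
])
# y_toy[i] is the target value y_i for example x_i.
y_toy = np.array([1.0, 0.0, 1.0, 0.0])

l, d = X_toy.shape            # number of examples, number of features
x_0 = X_toy[0]                # one example (NumPy indices start at 0)

# Placeholder model a(x): here it simply averages the features.
def a(x):
    return x.mean()

print(l, d)                   # 4 3
print(x_0, y_toy[0])          # [0.2 0.7 0.1] 1.0
print(a(x_0))
```

Any real model would replace `a`; the point is only that an example is a row of the feature matrix and its target sits at the same index of the target vector.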

The study of linear algebra involves several types of mathematical objects:
* Scalars:
  * A scalar is just a single number.
  * We write scalars in italics.
  * We usually give scalars lowercase variable names.
  * When we introduce them, we specify what kind of number they are. For example, we might say "Let $s \in \mathbb{R}$ be the slope of the line," while defining a real-valued scalar.
* Vectors:
  * A vector is an array of numbers.
  * The numbers are arranged in order.
  * We can identify each individual number by its index in that ordering.
  * Typically we give vectors lowercase names in bold typeface, such as $\boldsymbol{x}$.
  * The elements of the vector are identified by writing its name in italic typeface, with a subscript.
  * The first element of $\boldsymbol{x}$ is $x_{1}$, the second element is $x_{2}$, and so on.
  * We also need to say what kind of numbers are stored in the vector. If each element is in $\mathbb{R}$ and the vector has $n$ elements, then the vector lies in the set formed by taking the Cartesian product of $\mathbb{R}$ with itself $n$ times, denoted as $\mathbb{R}^{n}$.
  * We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis.
  * Sometimes we need to index a set of elements of a vector. In this case, we define a set containing the indices and write the set as a subscript. For example, to access $x_{1}$, $x_{3}$ and $x_{6}$, we define the set $S = \{1, 3, 6\}$ and write $\boldsymbol{x}_{S}$.
  * We use the $-$ sign to index the complement of a set. For example, $\boldsymbol{x}_{-1}$ is the vector containing all the elements of $\boldsymbol{x}$ except for $x_{1}$, and $\boldsymbol{x}_{-S}$ is the vector containing all elements of $\boldsymbol{x}$ except for $x_{1}$, $x_{3}$ and $x_{6}$.
* Matrices: 
  * A matrix is a 2-D array of numbers, so each element is identified by two indices instead of one.
  * We usually give matrices uppercase variable names with bold typeface, such as **A**.
  * If a real-valued matrix **A** has a height of $m$ and a width of $n$, then we say that **A** $\in \mathbb{R}^{m\times n}$.
  * We usually identify the elements of a matrix using its name in italic but not bold font, and the indices are listed with separating commas. For example, $A_{1,1}$ is the upper left entry of **A** and $A_{m,n}$ is the bottom right entry.
  * We can identify all the numbers with vertical coordinate $i$ by writing a ":" for the horizontal coordinate. For example, $A_{i,:}$ denotes the horizontal cross section of **A** with vertical coordinate $i$. This is known as the $i$-th row of **A**. Likewise, $A_{:,i}$ is the $i$-th column of **A**.
  * Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression but do not convert anything to lowercase. For example, $f(A)_{i,j}$ gives element $(i,j)$ of the matrix computed by applying the function $f$ to **A**.
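The indexing conventions above can be mirrored in NumPy roughly as follows. This is a sketch under one assumption worth stating loudly: the math notation is 1-based while NumPy is 0-based, so every index is shifted by one. The arrays hold arbitrary illustrative values.

```python
import numpy as np

# A vector x in R^7 (NumPy stores it as a 1-D array).
x = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

# Math notation is 1-based, NumPy is 0-based, so x_1 is x[0].
x_1 = x[0]

# x_S with S = {1, 3, 6}: convert the 1-based indices to 0-based ones.
S = np.array([1, 3, 6])
x_S = x[S - 1]                      # [10., 12., 15.]

# x_{-S}: every element except those indexed by S (boolean mask).
mask = np.ones(x.shape[0], dtype=bool)
mask[S - 1] = False
x_minus_S = x[mask]                 # [11., 13., 14., 16.]

# x_{-1}: every element except x_1.
x_minus_1 = np.delete(x, 0)         # [11., 12., 13., 14., 15., 16.]

# A matrix A in R^{m x n} with m = 2 rows and n = 3 columns.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
m, n = A.shape
A_11 = A[0, 0]                      # upper-left entry A_{1,1} -> 1
A_mn = A[m - 1, n - 1]              # bottom-right entry A_{m,n} -> 6
row_2 = A[1, :]                     # A_{2,:}, the second row -> [4, 5, 6]
col_2 = A[:, 1]                     # A_{:,2}, the second column -> [2, 5]

# f(A)_{i,j}: index the result of an expression, here f(A) = 2A.
twice_A_12 = (2 * A)[0, 1]          # element (1, 2) of 2A -> 4
```

A boolean mask is one convenient way to express the complement $\boldsymbol{x}_{-S}$; `np.delete` with the index list would work as well.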

________________________
One important operation on matrices is the **transpose**. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right, starting from its upper left corner. We denote the transpose of a matrix **A** as $A^{T}$, and it is defined such that
$(A^{T})_{i,j} = A_{j,i}$.
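A quick numerical check of this definition, using NumPy's `.T` attribute on an arbitrary 2 x 3 matrix (values chosen only for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # A is 2 x 3
A_T = A.T                     # A^T is 3 x 2

# Check the definition (A^T)_{i,j} = A_{j,i} entry by entry.
for i in range(A_T.shape[0]):
    for j in range(A_T.shape[1]):
        assert A_T[i, j] == A[j, i]

print(A_T)
# [[1 4]
#  [2 5]
#  [3 6]]
```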
_______________________
**Vectors** can be thought of as **matrices** that contain only **one column**. The **transpose** of a **vector** is therefore a **matrix** with only **one row**.
____________
Sometimes we define a vector by writing out its elements in the text inline as a row matrix, then using the transpose operator to turn it into a standard column vector, for example $x = [x_{1}, x_{2}, x_{3}]^{T}$.
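The row-then-transpose convention can be sketched with NumPy 2-D arrays. One assumption to keep in mind: NumPy only distinguishes rows from columns when the array is explicitly 2-D, which the last lines illustrate.

```python
import numpy as np

# Write the elements inline as a 1 x 3 row matrix, then transpose it
# to obtain the standard 3 x 1 column vector x = [x_1, x_2, x_3]^T.
row = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
x = row.T                           # shape (3, 1)

print(row.shape, x.shape)           # (1, 3) (3, 1)

# Transposing the column vector gives back a matrix with a single row.
print(x.T.shape)                    # (1, 3)

# A plain 1-D array has shape (3,), and its .T is the same 1-D array,
# so the explicit 2-D shape is needed to tell rows from columns.
v = np.array([1.0, 2.0, 3.0])
print(v.shape, v.T.shape)           # (3,) (3,)
```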

## Linear model for regression
$a(x) = b + w_{1}x_{1} + w_{2}x_{2} + \dots + w_{d}x_{d}$
where:
* $w_{1}, \dots, w_{d}$ - coefficients (weights)
* $b$ - bias
* $d+1$ parameters in total
* To simplify, we assume that every sample has an extra (fake) feature that is always equal to one. The coefficient of this feature then acts as the bias, so we do not treat the bias separately: we assume it is among the weights.
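A small sketch of this bias trick, with made-up random weights and features: the prediction computed with an explicit bias $b$ equals the one computed after appending the fake feature 1 to $x$ and $b$ to $w$. Placing the fake feature last is an assumption that matches the column of ones appended later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.random(d)          # one example with d features
w = rng.random(d)          # weights w_1, ..., w_d
b = 0.5                    # bias

# Original form: a(x) = b + w_1 x_1 + ... + w_d x_d
a_with_bias = b + w @ x

# Bias trick: append a fake feature that is always 1 and put b among the weights.
x_aug = np.append(x, 1.0)  # (x_1, ..., x_d, 1)
w_aug = np.append(w, b)    # (w_1, ..., w_d, b): d + 1 parameters
a_trick = w_aug @ x_aug

print(np.isclose(a_with_bias, a_trick))   # True
```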

In vector notation:
<br>
$a(x) = w^{T}x$

For the whole sample $X$ (one example per row):
<br>
$a(X) = Xw$
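Both forms can be checked numerically with a sketch like the one below (random, illustrative values; `X_sample` is a hypothetical name chosen so it does not clash with the `X` used later). Each entry of $Xw$ is $w^{T}x_{i}$ for the corresponding row $x_{i}$.

```python
import numpy as np

rng = np.random.default_rng(1)
l, d = 5, 4
X_sample = rng.random((l, d))   # one example per row
w = rng.random(d)               # weight vector (bias assumed already included)

# One example: a(x) = w^T x
a_first = w @ X_sample[0]

# Whole sample at once: a(X) = Xw, one prediction per row
predictions = X_sample @ w      # shape (l,)

print(predictions.shape)                    # (5,)
print(np.isclose(predictions[0], a_first))  # True
```

Computing $Xw$ in one matrix product is what makes the vectorized form practical for large samples, since it avoids a Python loop over examples.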


In [0]:
# how to write
# mathematical symbols in italic
# matrix in circular bracket

In [0]:
# analytical solution
# iterative solution
# further study of linear model

In [0]:
# cartesian products
# and R m times n

In [2]:
import numpy as np
X = np.random.rand(49000, 3072)
X

array([[0.7127627 , 0.52683909, 0.36305126, ..., 0.88511529, 0.323427  ,
        0.45963983],
       [0.21153714, 0.37889848, 0.74251371, ..., 0.39198457, 0.61947713,
        0.74909212],
       [0.04245357, 0.25263266, 0.07219145, ..., 0.36772199, 0.70291992,
        0.97831184],
       ...,
       [0.14022919, 0.561953  , 0.92810499, ..., 0.03825376, 0.87987751,
        0.41240738],
       [0.36397976, 0.66355934, 0.11178647, ..., 0.23506835, 0.88370119,
        0.41928292],
       [0.99583432, 0.70290336, 0.80128041, ..., 0.38809767, 0.97495997,
        0.57448472]])

In [3]:
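# Append a column of ones (the fake bias feature) to every row of X.
# np.hstack returns a new array, so X itself is left unchanged.
X_copy = np.hstack([X, np.ones((X.shape[0], 1))])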
X_copy

array([[0.7127627 , 0.52683909, 0.36305126, ..., 0.323427  , 0.45963983,
        1.        ],
       [0.21153714, 0.37889848, 0.74251371, ..., 0.61947713, 0.74909212,
        1.        ],
       [0.04245357, 0.25263266, 0.07219145, ..., 0.70291992, 0.97831184,
        1.        ],
       ...,
       [0.14022919, 0.561953  , 0.92810499, ..., 0.87987751, 0.41240738,
        1.        ],
       [0.36397976, 0.66355934, 0.11178647, ..., 0.88370119, 0.41928292,
        1.        ],
       [0.99583432, 0.70290336, 0.80128041, ..., 0.97495997, 0.57448472,
        1.        ]])

In [0]:
# Third: append the bias dimension of ones (i.e. the bias trick) so that our SVM
# only has to optimize a single weight matrix W.
# NOTE: X_train, X_val, X_test and X_dev are assumed to be created by an earlier
# data-loading/preprocessing cell that is not shown here; run on its own,
# this cell raises a NameError.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
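To illustrate why the column of ones lets the SVM optimize a single matrix **W**, here is a sketch with hypothetical sizes `N`, `D`, `C` (not the notebook's real data): scores computed as $XW + b$ with a separate bias vector match the scores computed from the augmented data and a weight matrix whose last row is $b$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, C = 6, 5, 3                 # hypothetical: examples, features, classes
X_small = rng.random((N, D))
W = rng.random((D, C))            # weight matrix without bias
b = rng.random(C)                 # separate bias vector, one entry per class

# Scores with an explicit bias term: XW + b (b is broadcast over the rows).
scores_explicit = X_small @ W + b

# Bias trick: append a column of ones to X and the row b to W.
X_aug = np.hstack([X_small, np.ones((N, 1))])   # shape (N, D + 1)
W_aug = np.vstack([W, b])                       # shape (D + 1, C)
scores_trick = X_aug @ W_aug

print(np.allclose(scores_explicit, scores_trick))  # True
```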