#### SETS

* sets are well-defined **collections of objects**, each of which is called an **element**.
* we denote a set with an upper case italic letter, e.g. _A_

#### BELONGING AND INCLUSION

* sets are built using the notion of **belonging**
* "_a_ belongs to _A_" is denoted as a ∈ A
* given two sets _A_ and _B_, if every element of _A_ is also an element of _B_, we say that _A_ is a subset of _B_: A ⊂ B
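
A minimal sketch of belonging and inclusion using Python's built-in `set` type. The sets `A` and `B` are made-up examples, not part of the notes above:

```python
# two illustrative sets (assumed example data)
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}

a_in_A = 2 in A        # belonging: 2 ∈ A
A_subset_B = A <= B    # inclusion: every element of A is also in B
B_subset_A = B <= A    # fails, because 4 and 5 are not in A

print(a_in_A, A_subset_B, B_subset_A)
```

The `<=` operator on Python sets tests exactly the "every element of A is also an element of B" condition.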

#### SET SPECIFICATION

* generally, anything we assert about the elements of a set results in **generating a subset**
* given for instance the set of all dogs (_D_), if we assert that _d_ is black, the assertion is true for some elements of _D_ and false for the others
* such a sentence generates a subset: B = {d ∈ D : d is black} (the colon reads 'such that')
* **axiom of specification**: _to every set A and every condition S(x) there corresponds a set B whose elements are exactly those elements x ∈ A for which S(x) holds_, where S(x) is a sentence / assertion
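
The dog example can be sketched with a Python set comprehension, which reads almost like the set-builder notation. The dogs, their colours and the helper `S` are hypothetical and only serve the illustration:

```python
# D: a hypothetical set of dogs, each stored as a (name, colour) tuple
D = {("rex", "black"), ("fido", "brown"), ("luna", "black")}

def S(d):
    """Condition S(d): 'd is black'."""
    return d[1] == "black"

# B = {d ∈ D : S(d)}
B = {d for d in D if S(d)}

print(sorted(B))
```

Whatever condition we pick, the result is always a subset of the set we started from.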

#### ORDERED PAIRS

* we have two kinds of pairs: unordered and ordered
* let us consider a pair of sets _x_ and _y_: an **unordered pair** is a set whose elements are x and y, so {x,y} = {y,x} (presentation order does not matter, the set is the same)
* in ML we usually deal with **ordered pairs**: in (x,y), x is the _first coordinate_ and y is the _second coordinate_, so (x,y) ≠ (y,x) unless x = y
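
A small sketch of the difference, assuming we model unordered pairs with `frozenset` and ordered pairs with tuples (a modelling choice made here for illustration):

```python
# unordered pairs: order of presentation is irrelevant
unordered_1 = frozenset({"x", "y"})
unordered_2 = frozenset({"y", "x"})

# ordered pairs: first and second coordinate are distinguished
ordered_1 = ("x", "y")
ordered_2 = ("y", "x")

same_unordered = unordered_1 == unordered_2
same_ordered = ordered_1 == ordered_2

print(same_unordered, same_ordered)
```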

#### RELATIONS

* from ordered pairs, we can derive the idea of **relations** among sets or between elements and sets
* relations are defined as sets of ordered pairs, and denoted as _R_
* we express the relation as _x R y_
* for any _z ∈ R_, there exist _x_ and _y_ such that _z_ = (x,y)
* the **domain** of _R_ is the set of all x such that, for at least one y, x R y holds
* the **range** of _R_ is the set of all y such that, for at least one x, x R y holds
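
As a sketch, a relation can be stored as a Python set of tuples, and its domain and range read off directly. The relation `R` below is an invented example:

```python
# R: an illustrative relation, stored as a set of ordered pairs (x, y)
R = {(1, "a"), (1, "b"), (2, "a")}

# domain: every x that is related to at least one y
domain_R = {x for x, _ in R}

# range: every y that is related to at least one x
range_R = {y for _, y in R}

print(domain_R, range_R)
```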

#### FUNCTIONS

* given a pair of sets X and Y, we say that a **function** from X to Y is a relation such that:
> dom f = X and  
> for each x in X there is a unique element of Y with (x,y) ∈ f
* we say that a function transforms / maps / sends x to y, and for each argument x there is a unique value y
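
The two conditions can be checked mechanically. The helper `is_function` is a hypothetical name introduced here, and the relations `f`, `g`, `h` are made-up examples:

```python
def is_function(f, X, Y):
    """Return True if the relation f (a set of pairs) is a function from X to Y."""
    # every pair must live in X x Y
    if not all(x in X and y in Y for x, y in f):
        return False
    # condition 1: dom f = X
    domain = {x for x, _ in f}
    if domain != X:
        return False
    # condition 2: each x appears in exactly one pair
    # (f is a set, so equal lengths mean no x is repeated)
    return len(f) == len(domain)

X = {1, 2, 3}
Y = {"a", "b"}

f = {(1, "a"), (2, "a"), (3, "b")}             # a function
g = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}   # 1 has two values
h = {(1, "a"), (2, "b")}                       # 3 has no value

print(is_function(f, X, Y), is_function(g, X, Y), is_function(h, X, Y))
```

Note that `f` sends both 1 and 2 to "a": uniqueness is required of the value for each argument, not of the argument for each value.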


The ultimate goal of ML is learning functions from data, i.e. transformations or mappings from the domain to the range of a function

In [2]:
# libraries for this section

import numpy as np
import pandas as pd
import altair as alt
alt.themes.enable('dark')

ThemeRegistry.enable('dark')

linear algebra is the study of vectors, i.e. **ordered finite lists of numbers**, which are used to **represent attributes of entities**
_______
if we have a vector x = age and a second vector y = weight we can add them together and obtain a third vector z = x + y  
we can also multiply 2\*x to obtain 2x which is, again, a vector
_____
we have three types of vectors: **geometric vectors**, **polynomials**, and **elements of Rn space**

* geometric vectors: oriented segments
* polynomials: expressions that add multiple "terms" (nomials), e.g. 3x² + 2x + 1
* elements of Rn: ordered lists of n real numbers, where n is the number of dimensions of the vector; this is arguably the most important type for applied ML

In [2]:
# in numpy the vectors are represented as n-dimensional arrays
#this is a vector of R3

x = np.array([[1],
             [2],
             [3]])
print(x.shape)

(3, 1)


There are a few special vectors worth remembering: zero vectors, unit vectors and sparse vectors:
* zero vectors are vectors composed of zeros and zeros only
* unit vectors are vectors with a single element equal to one and all other elements equal to zero
* sparse vectors are vectors with most of their elements equal to zero
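
A possible way to build these three vectors in NumPy. The sizes and the value 7 are arbitrary choices for the example:

```python
import numpy as np

# zero vector of R3
zero = np.zeros((3, 1))

# unit vector of R3: a single one (here in the second position), zeros elsewhere
unit = np.zeros((3, 1))
unit[1, 0] = 1.0

# sparse vector of R5: most entries are zero
sparse = np.array([[0], [0], [7], [0], [0]])

# fraction of entries equal to zero
sparsity = np.mean(sparse == 0)

print(zero.ravel(), unit.ravel(), sparsity)
```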

In [7]:
# vector addition
x = y = np.array([[1],
                 [2],
                 [3]])

x + y

array([[2],
       [4],
       [6]])

In [8]:
np.add(x,y)

array([[2],
       [4],
       [6]])

In [10]:
# vector-scalar multiplication

alpha = 5
x = np.array([[1],
             [2],
             [3]])

alpha * x

array([[ 5],
       [10],
       [15]])

In [13]:
# linear combinations of vectors

a, b = 2,3

x, y = np.array([[2],[3]]), np.array([[4], [5]])
a*x + b*y

array([[16],
       [21]])

In [17]:
# vector-vector multiplication is usually called dot product
# or inner product
# x . y -> [x1 x2] [y1 y2] -> x1*y1 + x2*y2
# extremely important for ML

x = np.array([[-2],
             [2]])
y = np.array([[4],
             [-3]])

#.T transpose x
x.T @y


array([[-14]])

if we consider two vectors x and y and two scalars α and β, we obtain the **span** of the vectors by taking all the possible linear combinations αx + βy

to understand the concept: if x and y are collinear (one is a scalar multiple of the other), the span is a line; if they are not collinear, the span is a plane; if we have 3 linearly independent vectors in R3, the span is the entire three-dimensional space
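
One way to check the three cases numerically is the rank of the matrix whose columns are the vectors: the rank equals the dimension of the span. This is a sketch with made-up vectors, using `np.linalg.matrix_rank`:

```python
import numpy as np

x = np.array([[1], [2], [3]])
y = 2 * x                        # same direction as x
z = np.array([[0], [1], [0]])    # a different direction
w = np.array([[0], [0], [1]])    # a third, independent direction

# stack the vectors as columns and compute the dimension of their span
dim_line = np.linalg.matrix_rank(np.hstack([x, y]))      # span is a line
dim_plane = np.linalg.matrix_rank(np.hstack([x, z]))     # span is a plane
dim_space = np.linalg.matrix_rank(np.hstack([x, z, w]))  # span is all of R3

print(dim_line, dim_plane, dim_space)
```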

a vector subspace is a vector space that lies within a larger vector space, and it must satisfy three conditions:
1. it contains the zero vector
2. closure under scalar multiplication (i.e. multiplying a vector of the subspace by a scalar does not take it outside the subspace)
3. closure under addition (i.e. adding two vectors of the subspace does not take the result outside the subspace)

**linear dependence**  
- a set of vectors is **linearly dependent** if at least one vector can be obtained as a linear combination of other vectors in the set  
- a set of vectors is **linearly independent** if no vector can be obtained as a linear combination of the other vectors in the set

important point: linearly dependent vectors contain **redundant information**, linearly independent vectors do not
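
To make the "redundant information" point concrete, here is a sketch where a third vector is built from the other two and the coefficients are then recovered with least squares. The vectors are invented for the example:

```python
import numpy as np

v1 = np.array([[1], [0], [1]])
v2 = np.array([[0], [1], [1]])
v3 = 2 * v1 - v2                 # v3 is redundant by construction

# rank 2 with 3 vectors means the set is linearly dependent
rank = np.linalg.matrix_rank(np.hstack([v1, v2, v3]))

# recover the coefficients a, b such that a*v1 + b*v2 = v3
coef, *_ = np.linalg.lstsq(np.hstack([v1, v2]), v3, rcond=None)

print(rank)
print(np.round(coef.ravel(), 3))
```

Dropping `v3` loses nothing: it can always be rebuilt from `v1` and `v2`.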

**vector null space**  
the null space of a set of vectors is the set of all coefficient choices for which the linear combination of those vectors gives the zero vector
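
A sketch of how a basis of the null space could be computed with the SVD. The matrix `A` (whose columns are the vectors) and the tolerance `tol` are assumptions made for the example:

```python
import numpy as np

# columns of A are the vectors (1,2), (2,4), (3,6): all on the same line
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# full SVD: the rows of vt beyond the rank span the null space
_, s, vt = np.linalg.svd(A)
tol = 1e-10                      # assumed threshold for "zero" singular values
rank = int(np.sum(s > tol))
null_basis = vt[rank:].T         # each column is a set of coefficients

# every column of null_basis combines the vectors into (numerically) zero
print(rank, null_basis.shape)
print(np.round(A @ null_basis, 10))
```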

**vector norms**  
the *norm* (or the *length*) of a vector is the distance between its origin and its end


In [21]:
x=np.array([[3],[4]])
#euclidean norm
eucl = np.linalg.norm(x,2)
#manhattan norm
manth = np.linalg.norm(x,1)
#max norm
maxnorm = np.linalg.norm(x, np.inf)

In [23]:
print(f"euclidean norm: {eucl}\nmanhattan norm: {manth}\nmax norm: {maxnorm}")

euclidean norm: 5.0
manhattan norm: 7.0
max norm: 4.0


In [24]:
# euclidean distance between x and y
# (y is still [[4], [-3]] from the dot product cell above)
distance = np.linalg.norm(x-y,2)
print(distance)

7.0710678118654755


**vector angles and orthogonality**  
In ML, the angle between a pair of vectors is used as a measure of vector similarity.  
The angle between vectors can be thought of as a generalization of the **law of cosines** in trigonometry, which states that for a triangle with sides _a_, _b_ and _c_, where $\theta$ is the angle between _a_ and _b_:  
$c^2 = a^2 + b^2 - 2ab\cos\theta$  
we can replace the sides with vector lengths:  
$\|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\|y\|\cos\theta$  
since $\|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x,y \rangle$, this can be solved for $\cos\theta$:  
$\cos\theta = \langle x,y \rangle / (\|x\|\|y\|)$  
this value must be greater than or equal to -1 and less than or equal to +1.

In [5]:
#calculate cos of theta between a pair of vectors using NumPy:

x, y = np.array([[1],[2]]), np.array([[5],[7]])

cos_theta = (x.T @ y) / (np.linalg.norm(x,2) * np.linalg.norm(y,2))

print(f"cos of the angle: {np.round(cos_theta, 3)}")

#to know the actual angle, we need to take the trigonometric inverse of the cosine function:

cos_inverse = np.arccos(cos_theta)
print(f"angle in radians = {np.round(cos_inverse, 3)}")

cos of the angle: [[0.988]]
angle in radians = [[0.157]]


We say that a pair of vectors x and y are **orthogonal** if their inner product is zero, in which case we write x ⊥ y (i.e., they are perpendicular). Nonzero orthogonal vectors are also linearly independent.

In [8]:
x = np.array([[2],[0]])
y = np.array([[0],[2]])

cos_theta = (x.T @ y) / ((np.linalg.norm(x, 2))*(np.linalg.norm(y,2)))
print(cos_theta)

[[0.]]


One of the main purposes of linear algebra is to **solve systems of linear equations**, i.e. multiple equations that have to be solved **simultaneously**.  
For instance, we have the two equations:  
$x + 2y = 8$ and  
$5x -3y = 1$  
The solution is x = 2 and y = 3  
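
The system above can be written as a matrix equation and solved with `np.linalg.solve`. This is a minimal sketch; the names `A`, `b` and `solution` are chosen here for illustration:

```python
import numpy as np

# coefficients of x and y, one row per equation
A = np.array([[1.0, 2.0],
              [5.0, -3.0]])

# right-hand sides of the two equations
b = np.array([[8.0],
              [1.0]])

# solve A @ [x, y] = b for both unknowns at once
solution = np.linalg.solve(A, b)

print(solution)
```

The first entry of `solution` is x and the second is y, matching the values stated above up to floating point rounding.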

