# Lecture 3: Linear Algebra
### General 
- Branch of mathematics dealing with *linear equations* and their representation in *vector spaces* using *matrices*
- Used to simplify the representation of large amounts of information
- It is the foundation of almost all machine learning algorithms
- It is important for reducing the dimensions of data and for choosing the right *hyperparameters*

## Notation, Operations, Matrix Factorization 
### Notation 
- A scalar is a measurable quantity that is entirely characterized by its magnitude
- A vector is an object that has both magnitude and direction (e.g. velocity, force, acceleration)
    - Represented by an arrow with the same direction as the quantity and a length proportional to the magnitude
    - Written as $\vec{v}$


### Operations for working with vectors and matrices 
- some common operations 
    - Multiplication 
    - Addition
    - Inversion
    - Transpose
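The four operations listed above can be sketched with NumPy. The library choice and the matrix values are assumptions made for illustration; the notes do not name a library for this part.

```python
import numpy as np

# Two small illustrative matrices (values are arbitrary)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 0.0],
              [4.0, 2.0]])

product = A @ B            # matrix multiplication
total = A + B              # element-wise addition
A_inv = np.linalg.inv(A)   # inverse (A must be square and non-singular)
A_T = A.T                  # transpose: rows become columns
```

Multiplying `A` by `A_inv` gives the identity matrix, up to floating-point rounding.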


### Matrix factorization 
- Essential for ML: the decomposition of a matrix into a product of two or three matrices
- Regression algorithms can be simplified using matrix decomposition methods like:  
    - SVD: Singular Value Decomposition 
    - QR decomposition 
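As a rough sketch of both decompositions, the NumPy calls below factor a small matrix and rebuild it from the factors. The matrix `M` and the target vector `y` are made-up values, and the least-squares step is just one common way QR is used in regression, not necessarily the method from the lecture.

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])

# SVD: M = U @ diag(s) @ Vt (three factors)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_from_svd = U @ np.diag(s) @ Vt

# QR: M = Q @ R (two factors); Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(M)
M_from_qr = Q @ R

# Least-squares regression through QR: solve R @ coef = Q.T @ y
y = np.array([1.0, 2.0, 3.0])   # hypothetical target values
coef_qr = np.linalg.solve(R, Q.T @ y)
```

Both reconstructions match `M` up to rounding, and `coef_qr` agrees with what `np.linalg.lstsq` returns for the same problem.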

### Dot products of two vectors 
- computed by multiplying corresponding elements and summing the results
- both vectors must have the same number of elements
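A minimal sketch of the dot product, first by hand and then with NumPy (an assumed library choice). The vectors are arbitrary examples.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Multiply corresponding elements, then add the results: 1*4 + 2*5 + 3*6
manual = sum(x * y for x, y in zip(a, b))
builtin = np.dot(a, b)   # same value via NumPy

# Vectors of different lengths cannot be dotted together
try:
    np.dot(a, np.array([1, 2]))
    lengths_ok = True
except ValueError:
    lengths_ok = False
```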

### Matrix representation of dot product
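One common way to write the dot product in matrix form is as a row vector times a column vector, i.e. the transpose of `a` multiplied by `b`. That this is what the heading was meant to cover is an assumption; the sketch below uses NumPy with arbitrary values.

```python
import numpy as np

a = np.array([[1], [2], [3]])   # column vector, shape (3, 1)
b = np.array([[4], [5], [6]])   # column vector, shape (3, 1)

result = a.T @ b                # (1, 3) @ (3, 1) gives shape (1, 1)
value = result.item()           # extract the single scalar
```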

### Linear independence of vectors 
- set of vectors is linearly independent if NO VECTOR in the set can be expressed as a linear combination of the other vectors in the same set 
- otherwise, the vectors are linearly dependent 
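One practical check, sketched here under the assumption that NumPy is available, is to stack the vectors into a matrix and compare its rank with the number of vectors. The helper name `is_linearly_independent` is hypothetical.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Hypothetical helper: True if no vector is a combination of the others."""
    stacked = np.array(vectors, dtype=float)   # one vector per row
    # Full rank means every vector adds a new direction
    return bool(np.linalg.matrix_rank(stacked) == len(vectors))

independent = is_linearly_independent([[1, 0], [0, 1]])
dependent = is_linearly_independent([[1, 2], [2, 4]])   # second vector = 2 * first
```

The rank is computed numerically, so nearly dependent vectors can be sensitive to the tolerance that `matrix_rank` uses.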

### Norm of a vector 
- the vector's length, denoted by ||v||
- ||v|| = sqrt(v dot v)
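The formula above can be checked numerically. This is a small sketch assuming NumPy, with an example vector chosen so the length comes out to a whole number.

```python
import numpy as np

v = np.array([3.0, 4.0])

norm_from_dot = np.sqrt(np.dot(v, v))   # sqrt(3*3 + 4*4)
norm_builtin = np.linalg.norm(v)        # NumPy's built-in Euclidean norm
```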

### Matrix 
- a rectangular array of numbers arranged in rows and columns 
- matrices can be multiplied, added, etc.

### Broadcasting rules 
- rules for how matrix operations are applied between arrays of different shapes
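As an illustration, here is how broadcasting behaves in NumPy. Both the library and the example shapes are assumptions, since the notes do not say which rules were covered.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

plus_row = M + row   # row is stretched across both rows of M
plus_col = M + col   # col is stretched across all three columns of M
```

In NumPy, shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.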


### Scalar multiplication 
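Scalar multiplication multiplies every element of a vector or matrix by the same number. A short sketch, assuming NumPy and arbitrary values:

```python
import numpy as np

v = np.array([1, -2, 3])
A = np.array([[1, 2],
              [3, 4]])

scaled_v = 3 * v     # every element of v times 3
scaled_A = 0.5 * A   # every element of A times 0.5
```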

# Pandas

In [6]:
pip install pandas

In [7]:
import pandas as pd



In [9]:
#Build a data frame from a dictionary 

data = {
    'name':['Mark',"chai","becky","mykel"],
    'age':[20,24,30,4]
}

In [10]:
df = pd.DataFrame(data)

In [11]:
df['name']

0     Mark
1     chai
2    becky
3    mykel
Name: name, dtype: object

In [12]:
#get the first row 
df.loc[0]

name    Mark
age       20
Name: 0, dtype: object

In [13]:
#get the first 3 rows (.loc slices by label and includes the end label)
df.loc[0:2]

    name  age
0   Mark   20
1   chai   24
2  becky   30


In [14]:
#get the number of dimensions in the data frame
df.ndim

2

In [15]:
#data type
type(df)

pandas.core.frame.DataFrame

In [16]:
#type of the data frame column  
type(df['name'])

pandas.core.series.Series

In [17]:
#get the row index of the dataframe (here the default RangeIndex)
df.index

RangeIndex(start=0, stop=4, step=1)

In [20]:
#another dataframe example 

data = {
    'name': ['mykel','chai','nancy','sorida'], 
    'age': [30,35,20,10],
    'city':['oakland','nola','la','tahoe']
}

df = pd.DataFrame(data)

df

     name  age     city
0   mykel   30  oakland
1    chai   35     nola
2   nancy   20       la
3  sorida   10    tahoe


In [21]:
#transpose 
df.T

            0     1      2       3
name    mykel  chai  nancy  sorida
age        30    35     20      10
city  oakland  nola     la   tahoe


In [23]:
# get full column(s)
df[['name','age']]

     name  age
0   mykel   30
1    chai   35
2   nancy   20
3  sorida   10


In [24]:
# get specific columns and rows, i.e. filtering or slicing the data frame
df[['name','age']].loc[0:2]

    name  age
0  mykel   30
1   chai   35
2  nancy   20


In [41]:
len(df.columns)

3

In [59]:
#select two columns, then rows labeled 1 through 3; the result is a DataFrame, not a Series
df[['name','age']].loc[1:3]

     name  age
1    chai   35
2   nancy   20
3  sorida   10


In [64]:
df.loc[2]

name    nancy
age        20
city       la
Name: 2, dtype: object

In [62]:
#iloc takes integer positions for both row and column. loc takes labels instead, e.g. df.loc[2, 'age']
df.iloc[2,1]

20
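To make the difference between `loc` and `iloc` concrete, the sketch below rebuilds the same data frame and reads the same cell both ways. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['mykel', 'chai', 'nancy', 'sorida'],
    'age': [30, 35, 20, 10],
    'city': ['oakland', 'nola', 'la', 'tahoe'],
})

by_label = df.loc[2, 'age']    # row label 2, column label 'age'
by_position = df.iloc[2, 1]    # third row, second column (0-based positions)

rows_loc = df.loc[0:2]         # label slice includes the end label: 3 rows
rows_iloc = df.iloc[0:2]       # position slice excludes the end: 2 rows
```

With the default `RangeIndex`, row labels and row positions happen to coincide, which is why `df.loc[2]` and `df.iloc[2]` return the same row here.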

### Iterations 
- important feature in pandas 
- apply something line by line 

#### grab the table and use its elements row by row to print out meaningful insights 


In [69]:
#method 1
for i in df.index: 
    print('student name:',df['name'][i])

print('')

for i in df.index: 
    print('student name:',df['name'][i],'\t student age:',df['age'][i],'\t age after graduation:',df['age'][i]+4)

student name: mykel
student name: chai
student name: nancy
student name: sorida

student name: mykel 	 student age: 30 	 age after graduation: 34
student name: chai 	 student age: 35 	 age after graduation: 39
student name: nancy 	 student age: 20 	 age after graduation: 24
student name: sorida 	 student age: 10 	 age after graduation: 14


In [71]:
#method 2 - iterrows 
for i, row in df.iterrows(): 
    print('student name:',df['name'][i])  #works the same way with i 

print('')

for i, row in df.iterrows(): 
    print('student name:',row['name']) #can also replace dataframe name with 'row' 

student name: mykel
student name: chai
student name: nancy
student name: sorida

student name: mykel
student name: chai
student name: nancy
student name: sorida
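Two further options, which are not from the lecture and are offered only as a sketch: `itertuples` yields one named tuple per row, and simple arithmetic such as the graduation age can be computed on the whole column without a loop.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['mykel', 'chai', 'nancy', 'sorida'],
    'age': [30, 35, 20, 10],
    'city': ['oakland', 'nola', 'la', 'tahoe'],
})

# itertuples: each row is a named tuple, so columns are read as attributes
lines = [f"student name: {row.name}" for row in df.itertuples(index=False)]
for line in lines:
    print(line)

# column arithmetic: add 4 to every age at once, no explicit loop
age_after_graduation = df['age'] + 4
```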
