# REAL Python Starts Here
Until now, all we discussed was the syntax of plain Python operations. It is important to know it so you do not fall into syntax errors, but plain Python is slower than most programming languages. That is why libraries like NumPy, Pandas, and Matplotlib matter: they not only give you ready-made functionality, their numerical operations are also **MUCH** faster than the equivalent plain Python operations.

**NOTICE** in upcoming sessions, you will be evaluated on performance, so make sure you understand these libraries very well.

# NUMPY
The first important library you **MUST** use whenever it comes to matrix/vector operations.

NumPy is a general-purpose **Array** processing package. It provides high-performance multidimensional array objects, together with computationally efficient operations over them.

Think of NumPy as providing a special kind of array called `ndarray`. These arrays are very fast, have a fixed size once created (operations that appear to resize them return a new array), and come with their associated operations like matrix multiplication, summation of elements, getting the maximum element, and many other efficient array operations.

**HOW** does NumPy achieve efficient operations? 

For a start, the core of NumPy is written in C and compiled, not interpreted like Python (that's why it is much faster than plain Python).

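But more importantly, NumPy allows for vectorization and **Parallelization**: one operation is applied to a whole array inside compiled loops instead of element by element in the interpreter, and some routines (such as matrix multiplication) can additionally use several CPU cores through the underlying linear algebra library. How much you gain HIGHLY depends on the code you write.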
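To get a feel for the difference, here is a minimal sketch that sums the squares of one million numbers twice: once with a plain Python loop and once with a single NumPy expression. The variable names and the array size are illustrative choices, and the timings depend on your machine, so no expected timings are shown.

```python
import time
import numpy as np

n = 1_000_000
values = np.arange(n, dtype=np.float64)
values_list = values.tolist()  # the same numbers as a plain Python list

# Plain Python: the interpreter handles one element at a time
start = time.perf_counter()
loop_total = 0.0
for v in values_list:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# NumPy: the multiplication and the sum both run inside compiled code
start = time.perf_counter()
numpy_total = np.sum(values * values)
numpy_seconds = time.perf_counter() - start

print("loop :", loop_total, "in", loop_seconds, "seconds")
print("numpy:", numpy_total, "in", numpy_seconds, "seconds")
```

Both versions compute the same quantity (up to floating-point rounding); only the way the work is executed differs.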

## How to use numpy
First you need to install the package. With Anaconda, use the command `conda install numpy`.

With pip, use the command `pip install numpy`.

Then you import the package modules to your work script using:

`import numpy as np`

**GOOD PRACTICE** the `as np` in the last command is not mandatory. It is called an alias (or nickname) for the package, so you can write `np` instead of `numpy`. A common practice is to use `np` to refer to `numpy`, and we will use `pd` for `pandas` and `plt` for `matplotlib.pyplot` as you will see later in the file.

In [1]:
## import numpy to the file
import numpy as np

## Arrays in NumPy
Arrays in NumPy are like matrices. 1D arrays are called `vectors`, 2D arrays are ordinary matrices, and nD arrays are their higher-dimensional generalizations.


**VERY IMPORTANT** For 2D matrices, the shape is (`rows, columns`). Each inner list becomes one row, so the vectors that make up a matrix in NumPy are **Horizontal**, not vertical column vectors like we usually write in linear algebra.

There are multiple easy ways to create array objects from numpy.

### 1. Create arrays from list values

In [2]:
## create array from values
np_arr = np.array([1, 2, 3, 4])

## print it:
print(np_arr, "is of type", type(np_arr))

## use array_name.shape to get array shape
print("Its shape is", np_arr.shape)  

## use array_name.dtype to get the type of the array elements
print("Its data type", np_arr.dtype)

[1 2 3 4] is of type <class 'numpy.ndarray'>
Its shape is (4,)
Its data type int32


**IMPORTANT** Unlike lists in Python, a NumPy array has a single data type for **ALL** of its elements.
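A small sketch of what the single-data-type rule means in practice: when the input list mixes types, NumPy converts every element to one common type. The variable names here are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # the integers are converted to floats
mixed_text = np.array([1, "a", 3.5])    # everything is converted to strings

print(ints.dtype.kind)         # 'i' means an integer type (exact width depends on the platform)
print(mixed_numbers.dtype)     # float64
print(mixed_text.dtype.kind)   # 'U' means a unicode string type
print(mixed_text)

# You can also request a type explicitly with the dtype argument
as_floats = np.array([1, 2, 3], dtype=np.float64)
print(as_floats.dtype)         # float64
```

This silent conversion is worth checking with `.dtype` whenever an array is built from data you did not create yourself.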

In [5]:
## create 2D array
two_dim_array = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9]]) ## each of the [1, 2, 3] [4, 5, 6] and [7, 8, 9] is a VECTOR

## print it
print(two_dim_array)   ## observe how the vectors above are Horizontal not vertical

## get its shape
print("Its shape is", two_dim_array.shape)  

## get its data type
print("Its data type", two_dim_array.dtype)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
Its shape is (3, 3)
Its data type int32


### 2. Create Common types of arrays

In [6]:
## create an uninitialized array of size 3*2
arr = np.empty((3,2)) 
print(arr) ## contents are arbitrary (whatever was in memory), zeros are NOT guaranteed

[[0. 0.]
 [0. 0.]
 [0. 0.]]


In [7]:
## create identity matrix of size 5*5 
#### (Remember, identity matrix has 1s on diagonals and 0s otherwise, and it is always a square matrix)
identity = np.eye(5)

print(identity)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [8]:
## create array full of 1s  of size 3*4
all_ones = np.ones((3, 4))  
print(all_ones)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [9]:
## create array full of zeros of size 2*2
all_zeros = np.zeros((2,2))
print(all_zeros)

[[0. 0.]
 [0. 0.]]


In [10]:
## create array full of specified value (here character 'a') of size 3*2
all_a = np.full((3,2), 'a')
print(all_a)

[['a' 'a']
 ['a' 'a']
 ['a' 'a']]
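Two more constructors appear later in this file, `np.arange` and `np.linspace`. A brief sketch of how they differ, with illustrative variable names:

```python
import numpy as np

# np.arange(start, stop, step): like Python's range, the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)    # the values 0, 2, 4, 6, 8

# np.linspace(start, stop, num): num evenly spaced points, both ends included
grid = np.linspace(0, 1, 5)
print(grid)     # the values 0, 0.25, 0.5, 0.75, 1
```

Use `np.arange` when you know the step and `np.linspace` when you know how many points you want.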


## Matrix Operations in NumPy

NumPy allows many built-in matrix operations that are very useful

In [11]:
## reshape matrix
mat = np.array([ [1, 2], [3, 4], [5, 6]]) 

print('old matrix ', mat)
print('old shape is ', mat.shape)

## then reshape it to 6*1 array
mat = mat.reshape(6,1)
print('new matrix', mat)
print(' new shape is', mat.shape)

old matrix  [[1 2]
 [3 4]
 [5 6]]
old shape is  (3, 2)
new matrix [[1]
 [2]
 [3]
 [4]
 [5]
 [6]]
 new shape is (6, 1)


**REMEMBER** The number of elements must be the same in the new matrix and the old matrix, so make sure the new dimensions are consistent with that.

i.e.: rows_old * columns_old = rows_new * columns_new

This rule is also useful in practice:

If you know one dimension of the new matrix but are not sure about the other, you can let NumPy infer it by placing -1 in place of the unknown dimension. **REMEMBER** you can use -1 for only one of the dimensions.

In [12]:
mat = np.ones((7,4)) ## a 7*4 matrix

## I want to reshape it to a single column! I know it would look like (x,1) but don't know what x should be
new_matrix = mat.reshape(-1,1) # reshape to one column; NumPy infers x

print(new_matrix.shape) ## it correctly reshaped it!

(28, 1)
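Note that a shape of `(28, 1)` is still a 2D array with a single column. The sketch below contrasts it with a true 1D array and shows what happens when the element counts cannot match. Variable names are illustrative.

```python
import numpy as np

mat = np.ones((7, 4))            # 28 elements

column = mat.reshape(-1, 1)      # 2D column, shape (28, 1)
flat = mat.reshape(-1)           # true 1D array, shape (28,)
wide = mat.reshape(2, -1)        # NumPy infers 14 columns, shape (2, 14)

print(column.shape, flat.shape, wide.shape)

# 28 elements cannot be split into 5 equal rows, so NumPy raises an error
try:
    mat.reshape(5, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```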


In [13]:
## Transposing a matrix is mat.T:
mat = np.array([ [1, 2], [3, 4], [5, 6]]) 

print('transpose of ', mat, 'is',  mat.T, sep='\n')

transpose of 
[[1 2]
 [3 4]
 [5 6]]
is
[[1 3 5]
 [2 4 6]]


In [14]:
## Matrix by Matrix Multiplication using np.dot:
mat1 = np.array([ [1, 2], [3, 4], [5, 6]])  ## of shape 3*2
mat2 = np.array([[1, 3, 5], [2, 4, 6]]) ## of shape 2*3

## result matrix is of shape 3*3
mat1_by_mat2 = np.dot(mat1, mat2) ## remember, the inner dimensions must be equal (here 2 and 2)

print(mat1_by_mat2)
print('result shape: ', mat1_by_mat2.shape)

[[ 5 11 17]
 [11 25 39]
 [17 39 61]]
result shape:  (3, 3)


In [15]:
## similar to np.dot, the @ operator performs true matrix multiplication of two matrices
## for NumPy arrays, @ calls np.matmul
mat1 = np.array([ [1, 2], [3, 4], [5, 6]])  ## of shape 3*2
mat2 = np.array([[1, 3, 5], [2, 4, 6]]) ## of shape 2*3

mat1_by_mat2 = mat1@mat2

print(mat1_by_mat2) ## same result
print('result shape: ', mat1_by_mat2.shape)

[[ 5 11 17]
 [11 25 39]
 [17 39 61]]
result shape:  (3, 3)


In [16]:
## the asterisk * is used for element-wise multiplication
mat1 = np.array([ [1, 2], [3, 4], [5, 6]])  ## of shape 3*2
mat2 = np.array([ [1, 2], [3, 4], [5, 6]])  ## of shape 3*2

mat1_elements_by_mat2_elements = mat1*mat2 ## shapes must be the same (or compatible under broadcasting)
print(mat1_elements_by_mat2_elements)
print('result shape: ', mat1_elements_by_mat2_elements.shape)

[[ 1  4]
 [ 9 16]
 [25 36]]
result shape:  (3, 2)
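Because `*` and `@` are easy to confuse, here is a short sketch on two small square matrices, where both operations are valid but give different results. The matrices are arbitrary illustrative values.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # multiplies matching positions: [[5, 12], [21, 32]]
matrix_product = a @ b    # rows of a times columns of b:  [[19, 22], [43, 50]]

print(elementwise)
print(matrix_product)

# For 2D arrays, np.dot and @ give the same matrix product
print(np.array_equal(np.dot(a, b), matrix_product))  # True
```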


## Broadcasting in NumPy

Broadcasting is the MOST important concept in numpy. The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.

In mathematics, an element-wise operation such as adding two matrices requires both of them to have exactly the same shape. (Matrix multiplication has its own rule: if matrix A is of shape `r1*c1` and matrix B is of shape `r2*c2`, the multiplication is valid only if `c1 is equal to r2`. Broadcasting does not relax that rule.)

However, NumPy allows element-wise operations between arrays of different shapes under certain conditions.

For example:
if we want to add 5 to all values of the array [1, 2, 3] it goes like this: 

In [17]:
arr = np.array([1, 2, 3])
arr_plus_5 = arr + 5
print(arr_plus_5) 

[6 7 8]


What happened here is that the value 5 is broadcast to all elements of the array `arr`. Broadcasting means it is (conceptually) **REPEATED** until the shapes are equivalent, so the `addition` operation can be performed.

Another example is if we have the matrix  

`[ [1, 2, 3]
   [1, 2, 3]
   [1, 2, 3]]`
   
and we want to multiply the first column by 2, the second column by 3, and the third column by 4. 

We can do that using broadcasting of array [2, 3, 4] over the matrix. See:

In [18]:
## create our matrix
mat = np.array([ [1, 2, 3], [1, 2, 3], [1, 2, 3]])

## the multiplication values
multiplication_vals = np.array([2, 3, 4])

## then perform the multiplication using broadcasting:

print(mat*multiplication_vals)

[[ 2  6 12]
 [ 2  6 12]
 [ 2  6 12]]


What happened here is that NumPy found the shape of the first matrix is (3,3) and the second one is (3,), so it (conceptually) repeated the **Smaller** array until its shape matched the first one (i.e. made it a 3 * 3 matrix), then multiplied each element by the corresponding one.
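The "certain conditions" can be stated as a rule: NumPy compares the two shapes starting from the last dimension, and each pair of dimensions must either be equal or one of them must be 1 (a missing dimension counts as 1). The sketch below applies the rule to the same matrix, using illustrative variable names.

```python
import numpy as np

mat = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])   # shape (3, 3)

row_factors = np.array([2, 3, 4])          # shape (3,): one factor per column
col_factors = np.array([[2], [3], [4]])    # shape (3, 1): one factor per row

scaled_columns = mat * row_factors   # (3, 3) with (3,)   -> compatible
scaled_rows = mat * col_factors      # (3, 3) with (3, 1) -> compatible

print(scaled_columns)
print(scaled_rows)   # rows become [2 4 6], [3 6 9], [4 8 12]

# (3, 3) with (2,): last dimensions are 3 and 2, neither is 1, so this fails
try:
    mat * np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```

Reshaping a vector to `(n, 1)` is the usual way to make it apply per row instead of per column.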

## Array Slicing
Indexing arrays in NumPy is easy and useful. We can index sub-arrays, elements and vectors (horizontally and vertically)

In [19]:
mat = np.arange(25).reshape(5,5) ## create a 5*5 array

print(mat)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


In [20]:
## get the first row
print(mat[0]) ## similar to mat[0, :]

## get the 2nd column in numpy
print(mat[:, 1])

[0 1 2 3 4]
[ 1  6 11 16 21]


In [21]:
## get the sub array of elements from index (1,1) through the index (3,4)
print(mat[1:4, 1:5]) ## we want rows 1,2 and 3. and columns 1,2,3,4

[[ 6  7  8  9]
 [11 12 13 14]
 [16 17 18 19]]
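One detail worth knowing about slicing: a basic slice does not copy the data, it returns a view onto the original array. The following sketch shows the consequence and how `.copy()` avoids it. Variable names are illustrative.

```python
import numpy as np

mat = np.arange(25).reshape(5, 5)

block = mat[1:3, 1:3]      # a view: shares memory with mat
block[0, 0] = -1           # this also changes mat[1, 1]
print(mat[1, 1])           # -1

independent = mat[1:3, 1:3].copy()   # an independent copy
independent[0, 0] = 100              # mat is NOT affected
print(mat[1, 1])                     # still -1

print(np.shares_memory(block, mat))         # True
print(np.shares_memory(independent, mat))   # False
```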


## Masked Indexing

Another important concept in NumPy is masked (boolean) indexing. We can apply a condition to all elements of an array at once, and use the result to select elements.

To check which elements of an array are positive:

In [23]:
arr = np.array([-1, 0, 5, 6, -2, 7, 9, -4]) ## defined array

print(arr>0) ## returns a boolean array (True for +ve and False for non-positive values)

[False False  True  True False  True  True False]


We can use this returned boolean array (the mask) to index the array itself.

For example:

In [24]:
array = np.array([1, 2, 3, 4, 5, 6, 7])

array[ array%2==0 ] = -1 ## the mask is True for even values, and False for odd values. 
                             ## It then indexes the array itself and assigns -1 at the positions where the mask is True
print(array)

[ 1 -1  3 -1  5 -1  7]
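Masks can also be combined and counted. The sketch below reuses the earlier array of positive and negative values; note that conditions are combined with `&` (and) or `|` (or), each wrapped in parentheses, rather than with Python's `and`/`or`.

```python
import numpy as np

arr = np.array([-1, 0, 5, 6, -2, 7, 9, -4])

positives = arr[arr > 0]                   # keeps 5, 6, 7, 9
in_range = arr[(arr > 0) & (arr < 7)]      # keeps 5, 6

# np.where(condition, a, b) picks from a where the condition is True, else from b
no_negatives = np.where(arr < 0, 0, arr)   # negatives replaced by 0

positive_count = int(np.count_nonzero(arr > 0))   # 4

print(positives, in_range, no_negatives, positive_count)
```

Unlike the in-place assignment shown earlier, `np.where` returns a new array and leaves `arr` unchanged.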


## Aggregation

NumPy provides a lot of common aggregation functions for arrays.

In [25]:
 # sum, mean, var, std and A LOT more!
arr = np.arange(5) ## similar to np.array([0, 1, 2, 3, 4])
print(arr.mean())    

# If axis is specified, the function works along that axis instead of over the whole array

arr = np.arange(10).reshape(5,2) ## creates a 5*2 array of values from 0-9
print(arr.mean(axis=0))  ## axis = 0 averages down the rows, giving one mean per column

print(arr.mean(axis=1)) ## axis = 1 averages across the columns, giving one mean per row

2.0
[4. 5.]
[0.5 2.5 4.5 6.5 8.5]
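The same `axis` argument works for the other aggregation functions. A short sketch on the same 5*2 array, with illustrative variable names:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)   # rows: [0 1], [2 3], [4 5], [6 7], [8 9]

total = arr.sum()                # 45, over the whole array
column_sums = arr.sum(axis=0)    # [20 25], one value per column
row_maxes = arr.max(axis=1)      # [1 3 5 7 9], one value per row

# Aggregation combines naturally with broadcasting:
# subtract each column's mean from that column
centered = arr - arr.mean(axis=0)

print(total, column_sums, row_maxes)
print(centered.mean(axis=0))     # each column now averages to zero
```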


---------------------

# Matplotlib

Matplotlib is one of the most famous and widely used libraries for visualization in Python. It has a very expressive API for the most commonly used types of graphs.

In [26]:
# import Matplotlib (and pandas, to load the data) to your work file
from matplotlib import pyplot as plt
import pandas as pd

iris_data = pd.read_csv('Iris.csv') ## we will use these data for plotting; Iris.csv must be in the working directory


## Line Plots
Line plots are typically used to visualize continuous data sequences like time series, e.g., readings from a sensor, stock market daily data, etc.

1. Plotting one variable 

By default, pyplot will plot that variable against its index, i.e. the sequence 0, 1, 2, ... of the same length.

In [27]:
plt.plot(iris_data.SepalLengthCm.values) 
plt.show()


2. Plotting two variables against each other

In [None]:
plt.plot(iris_data.SepalLengthCm.values, iris_data.SepalWidthCm.values)  
plt.show()

In [None]:
# Plotting Known functions: f(x) = x^2 will give a parabola centered at Zero
x = np.linspace(-100, 100)
plt.plot(x, x**2)
plt.show()

## Scatter Plots
A scatter plot is almost always the first plot to try with unordered data samples.
It places a marker at each data point's location.

In [None]:
plt.scatter(iris_data.SepalLengthCm.values, iris_data.SepalWidthCm.values)
plt.show()

**OBSERVE** how using the Scatter plot gives a more meaningful graph than the line plot. Here: scatter plot shows how the Sepal Length is distributed against the Width (dense at certain values and scarce at others). This distribution was not shown in the line plot.

Plotting your data can help you understand things about them. Using the **correct** plot is important.  

## Histogram Plots
A histogram counts how often each value occurs. For example, an array of values `[1, 1, 2, 3, 3, 5, 5, 5, 5]` will have a histogram like this: 
`[
1: 2, # because 1 is repeated twice
2: 1, # because 2 exists only once
3: 2,
5: 4
]`
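The counts above can be reproduced in NumPy with `np.unique`, as in this small sketch (variable names are illustrative):

```python
import numpy as np

data = np.array([1, 1, 2, 3, 3, 5, 5, 5, 5])

# return_counts=True also returns how many times each unique value occurs
unique_values, counts = np.unique(data, return_counts=True)

frequency = dict(zip(unique_values.tolist(), counts.tolist()))
print(frequency)   # {1: 2, 2: 1, 3: 2, 5: 4}
```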

PyPlot has a built in `hist` function to plot histogram of values easily.

In [None]:
plt.hist(iris_data.Species)
plt.show()

## Pie Charts
Pie charts can serve the same role as histograms, and are sometimes easier to read when you care about each category's share of the whole.

In [None]:
pie_data = iris_data.Species.value_counts() ## get the histogram but from pandas: value_counts counts the frequency of each value
plt.pie(pie_data.values, labels=pie_data.index) ## labels are given the species' names
plt.show()

**NOTICE** in a script, the graph will not be displayed until the function `plt.show()` is called. (Jupyter notebooks usually render the figure at the end of the cell anyway.)

## Multiple Plots on the Same Figure

It is often useful to plot multiple data series on the same figure. 

For example: to plot both the SepalLengthCm (in red) and the PetalLengthCm (in blue) versus the SepalWidthCm

In [None]:
plt.scatter(iris_data.SepalLengthCm.values, iris_data.SepalWidthCm.values, c='red', marker='s', alpha=0.8, edgecolors='none', s=25) ## marker='s' puts a square at each data point; s=25 is the marker size
plt.scatter(iris_data.PetalLengthCm.values, iris_data.SepalWidthCm.values, c='blue', marker='^',  alpha=0.8, edgecolors='none', s=25) ## marker='^' means triangles
plt.show()
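When several series share a figure, axis labels and a legend make it readable. The sketch below shows one way to add them. It is an assumption-laden example: the three arrays are random placeholder numbers standing in for the Iris columns (they are NOT real measurements), and the figure is rendered to memory so the code also runs without a display.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: random values standing in for the Iris columns (not real measurements)
rng = np.random.default_rng(0)
sepal_width = rng.uniform(2.0, 4.5, size=50)
sepal_length = rng.uniform(4.0, 8.0, size=50)
petal_length = rng.uniform(1.0, 7.0, size=50)

fig, ax = plt.subplots()
# label= gives each series the name shown in the legend
ax.scatter(sepal_length, sepal_width, c='red', marker='s', s=25, label='Sepal length')
ax.scatter(petal_length, sepal_width, c='blue', marker='^', s=25, label='Petal length')
ax.set_xlabel('Length (cm)')
ax.set_ylabel('Sepal width (cm)')
ax.set_title('Two scatter series on one figure')
ax.legend()

# Render to an in-memory PNG; in a notebook or script you would call plt.show() instead
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
plt.close(fig)
```

With the real data loaded, the placeholder arrays would be replaced by the corresponding `iris_data` columns.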