In [24]:
import numpy as np
import math

# NumPy Notes
- NumPy is the backbone of pandas
- Core package for scientific computing in Python

## Getting Started

Terms:
- vectorization
- broadcasting
- universal function (ufunc)

### Numpy Array

- NumPy lets you replace explicit Python loops with operations on whole arrays, which run in compiled code and are usually much faster
- NumPy uses N-dimensional arrays for homogeneous (same data type) data.
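
As a rough illustration of the first point, here is a minimal sketch that doubles some numbers with a Python loop and then with a NumPy array. The names `values`, `doubled_loop` and `doubled_np` are made up for this example.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Plain Python: an explicit loop over a list
doubled_loop = [v * 2 for v in values]

# NumPy: the same operation applied to the whole array at once
doubled_np = np.array(values) * 2

print(doubled_loop)  # [2.0, 4.0, 6.0, 8.0]
print(doubled_np)    # [2. 4. 6. 8.]
```

Both give the same numbers; the NumPy version just has no visible loop.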

Simple NumPy array, aka a 1-dimensional array:


In [None]:
array1 = np.array([10, 100, 1000.])
print(array1)

Nested NumPy array, aka a 2-dimensional array:

In [None]:
array2 = np.array([[1., 2., 3.], [4., 5., 6.]])
print(array2)

2-dimensional arrays let you pick an element with one index per axis (row, then column), like a point on a graph.
#### Important: each inner list is a row, and the number of elements in each inner list is the number of columns

array1 mixes integers and a float, so NumPy picks the dtype (data type) float64, which can accommodate all the elements.

In [None]:
array1.dtype

NumPy uses its own data types, which are more granular than Python's. You can convert a NumPy value to a Python type in the following manner:

In [None]:
float(array1[0])
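
A small sketch of a few related conversions, in case it helps. It assumes an array like array1 above; the names `floats`, `ints` and `first` are just for illustration.

```python
import numpy as np

floats = np.array([10, 100, 1000.])
print(floats.dtype)             # float64

# astype returns a new array with the requested dtype
ints = floats.astype(np.int64)
print(ints.dtype)               # int64

# .item() converts a single element to the matching Python type
first = ints[0].item()
print(type(first))              # <class 'int'>
```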

## Vectorization and Broadcasting
### Vectorization

- Vectorization applies an operation to every element of an array at once (an element-wise operation), with no explicit Python loop. The simplest case combines a NumPy array and a scalar (a single value, like a float or an int)

It lets you write concise code that looks like mathematical notation


In [17]:
print(array2)
array3 = array2 + 1
print(array3)

[[1. 2. 3.]
 [4. 5. 6.]]
[[2. 3. 4.]
 [5. 6. 7.]]


Also works with 2 arrays of the same shape (each element is combined with the element in the same position):

In [19]:
array4 = array2 * array2
print(array2)
print(array4)


[[1. 2. 3.]
 [4. 5. 6.]]
[[ 1.  4.  9.]
 [16. 25. 36.]]


### Broadcasting
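- Broadcasting happens when you use 2 arrays with different shapes in an arithmetic operation: where the shapes allow it, NumPy automatically extends the smaller array across the larger array so that their shapes become compatible. Below, array1 (3 elements) is applied to each row of array2 (2 rows, 3 columns)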

In [21]:
print(array1)
print(array2)

array5 = array2 * array1
print(array5)

[  10.  100. 1000.]
[[1. 2. 3.]
 [4. 5. 6.]]
[[  10.  200. 3000.]
 [  40.  500. 6000.]]
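
Not every pair of shapes can be broadcast. The sketch below shows two cases that work and one that fails; `matrix`, `row` and `column` are illustrative names, not part of the cells above.

```python
import numpy as np

matrix = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 100., 1000.])                # shape (3,)
column = np.array([[10.], [100.]])                # shape (2, 1)

print((matrix * row).shape)      # (2, 3): row is reused for every row
print((matrix * column).shape)   # (2, 3): column is reused for every column

# Shapes (2, 3) and (2,) do not line up, so NumPy raises a ValueError
try:
    matrix * np.array([10., 100.])
except ValueError as err:
    print("not broadcastable:", err)
```

Roughly speaking, shapes are compared from the last dimension backwards, and each pair of dimensions has to be equal or one of them has to be 1.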


Matrix multiplication uses the @ operator

You can transpose an array with .T (array2 is 2 rows by 3 columns, so array2.T is 3 by 2, and the product below is 2 by 2)

In [23]:
print(array2 @ array2.T)

[[14. 32.]
 [32. 77.]]


## Universal Functions (ufunc)

Universal functions are a way to apply a function to every element of an array.

math.sqrt expects a single number, so this raises a TypeError:


In [None]:
math.sqrt(array2)

The NumPy version works on every element, so this won't:

In [None]:
np.sqrt(array2)

If NumPy offers a ufunc for an operation, use it: it will usually be faster than writing the loop out the long way.
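
For comparison, a minimal sketch of "the long way" next to the ufunc. The names `data`, `roots_loop` and `roots_ufunc` are made up for this example.

```python
import math
import numpy as np

data = np.array([1., 4., 9., 16.])

# The long way: call math.sqrt on each element in a Python loop
roots_loop = np.array([math.sqrt(x) for x in data])

# The ufunc: one call that operates on the whole array
roots_ufunc = np.sqrt(data)

print(roots_ufunc)                               # [1. 2. 3. 4.]
print(np.array_equal(roots_loop, roots_ufunc))   # True
```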

Some operations, like sum, are also available as array methods:


In [28]:
array6 = array2.sum(axis=0)
print(array6)

[5. 7. 9.]


If you call .sum() without specifying an axis, it adds up every element in the whole array.
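
A quick sketch of the three cases side by side, using an illustrative array called `grid` with the same values as array2:

```python
import numpy as np

grid = np.array([[1., 2., 3.], [4., 5., 6.]])

print(grid.sum(axis=0))   # [5. 7. 9.]  one sum per column
print(grid.sum(axis=1))   # [ 6. 15.]   one sum per row
print(grid.sum())         # 21.0        sum of every element
```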

## Getting and Setting Array Elements

For matrices, you can look up elements like you look up a point on a graph:

numpy_array[row_selection, column_selection]

Remember: axis 0 runs down the rows, axis 1 runs across the columns, and indexing starts at 0!
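
A short sketch of getting and setting with this syntax. `grid` is an illustrative array with the same values as array2.

```python
import numpy as np

grid = np.array([[1., 2., 3.], [4., 5., 6.]])

print(grid[0, 0])    # 1.0         row 0, column 0
print(grid[1, 2])    # 6.0         row 1, column 2
print(grid[:, 1])    # [2. 5.]     every row, column 1
print(grid[0, :])    # [1. 2. 3.]  row 0, every column

# Setting uses the same syntax on the left of the equals sign
grid[0, 0] = 100.
print(grid[0, 0])    # 100.0
```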

### Array construction
arange (think "array range") generates evenly spaced values; np.arange(n) gives the integers 0 to n - 1

reshape rearranges those values into the dimensions you ask for

In [32]:
np.arange(2 * 5).reshape(2, 5)  # 2 rows, 5 columns

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Without reshape:

In [31]:
np.arange(2*5)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

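Random arrays take dimensions too. randn(2, 3) draws 2 rows and 3 columns from the standard normal distribution, so the values change on every run: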

In [33]:
np.random.randn(2, 3)

array([[ 1.27557772, -1.3756188 , -2.11373987],
       [-0.24304736, -2.97510021, -1.78916477]])
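
If you want the same random numbers on every run, one option is a seeded generator. This is a sketch using NumPy's newer Generator interface rather than the randn call above; the seed 42 and the names `rng`, `sample` and `sample_again` are arbitrary choices for illustration.

```python
import numpy as np

# A generator created with a fixed seed produces a repeatable sequence
rng = np.random.default_rng(42)
sample = rng.standard_normal((2, 3))   # 2 rows, 3 columns
print(sample.shape)                    # (2, 3)

# A second generator with the same seed gives the same values
sample_again = np.random.default_rng(42).standard_normal((2, 3))
print(np.array_equal(sample, sample_again))   # True
```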

### View vs copy
By default, slicing an array gives you a view, which is like a link to the original array's data

ie. subset = array2[:, :2]
- When you do this, if you modify the view it modifies the original
- To make a copy, add .copy() to the end of the expression

ie. subset = array2[:, :2].copy()
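
A minimal sketch showing the difference in practice. `original`, `view` and `independent` are illustrative names.

```python
import numpy as np

original = np.array([[1., 2., 3.], [4., 5., 6.]])

# A slice is a view: it shares data with the original
view = original[:, :2]
view[0, 0] = 100.
print(original[0, 0])    # 100.0  the original changed too

# A copy has its own data
independent = original[:, :2].copy()
independent[0, 0] = -1.
print(original[0, 0])    # 100.0  the original is unchanged

print(np.shares_memory(original, view))          # True
print(np.shares_memory(original, independent))   # False
```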