![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "boxed ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />
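
To make the boxed-versus-primitive distinction concrete, here is a minimal sketch. The variable names (`big`, `info`, `wrapped`) are illustrative and not part of the original notebook. It shows that a Python `int` can grow without limit, while a NumPy integer type has a fixed width, so array arithmetic can wrap around.

```python
import numpy as np

# A Python int is a regular object: it can grow to any size.
big = 2 ** 100
print(type(big), big.bit_length())   # <class 'int'> 101

# NumPy integer types have a fixed width and therefore a fixed range.
info = np.iinfo(np.int8)
print(info.min, info.max)            # -128 127

# Fixed-width array arithmetic wraps around on overflow.
x = np.array([127], dtype=np.int8)
y = np.array([1], dtype=np.int8)
wrapped = x + y
print(wrapped)                       # [-128]
```

The fixed width is what lets NumPy store values back to back in memory, at the cost of a limited range per type.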


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [1]:
import sys
import numpy as np

## Basic Numpy Arrays

For numeric data, NumPy arrays perform much better than Python lists.

In [2]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [3]:
a = np.array([1, 2, 3, 4])

In [4]:
b = np.array([0, .5, 1, 1.5, 2])

In [5]:
a[0], a[1]

(1, 2)

In [7]:
b[3]

1.5

In [6]:
a[0:]

array([1, 2, 3, 4])

In [7]:
a[1:3]

array([2, 3])

In [8]:
a[1:-1]

array([2, 3])

In [9]:
a[::2]

array([1, 3])

In [10]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [11]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [10]:
b[[0, 2, -1]]  #selects the elements at indexes 0, 2 and -1 and returns them as a new NumPy array

array([0., 1., 2.])
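
The difference between slicing and indexing with a list matters when you modify the result. The following sketch, which reuses `b` from above and introduces the illustrative names `view` and `copy`, assumes NumPy's standard behavior: a basic slice is a view onto the original data, while indexing with a list of positions produces an independent array.

```python
import numpy as np

b = np.array([0, .5, 1, 1.5, 2])

view = b[1:3]          # basic slice: a view onto b's memory
copy = b[[0, 2, -1]]   # list of indices: a new, independent array

print(np.shares_memory(b, view))  # True
print(np.shares_memory(b, copy))  # False

view[0] = 99.0         # writes through to b
copy[0] = -1.0         # leaves b untouched
print(b)               # b[1] is now 99.0, b[0] is still 0.0
```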

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

All the elements of an array share a single type, the array's `dtype`.

In [13]:
a

array([1, 2, 3, 4])

In [11]:
a.dtype  #the default integer type depends on the platform (int32 here, int64 on most others)

dtype('int32')

In [15]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [16]:
b.dtype

dtype('float64')

In [12]:
np.array([1, 2, 3, 4], dtype=float)  #specify the element type at creation (the np.float alias was removed from NumPy)

array([1., 2., 3., 4.])

In [18]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [19]:
c = np.array(['a', 'b', 'c'])

In [20]:
c.dtype

dtype('<U1')

In [21]:
d = np.array([{'a': 1}, sys])

In [22]:
d.dtype

dtype('O')
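
When the input mixes types, NumPy picks one dtype that can hold every element. The sketch below illustrates this under default settings; the names `mixed`, `truncated` and `as_text` are made up for the example.

```python
import numpy as np

# Mixing ints and floats promotes everything to a float dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                 # float64

# astype returns a converted copy; float -> int truncates toward zero.
truncated = np.array([1.9, -1.9]).astype(np.int64)
print(truncated)                   # [ 1 -1]

# Mixing numbers and strings turns every element into a string.
as_text = np.array([1, 'a'])
print(as_text.dtype.kind)          # U
```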

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [17]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [18]:
A.shape  #tells us the amount of rows and cols

(2, 3)

In [19]:
A.ndim

2

In [20]:
A.size  #tells us the total number of elements in the array

6

In [21]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [22]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [23]:
B.shape

(2, 2, 3)

In [24]:
B.ndim

3

In [25]:
B.size

12

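If the shape isn't consistent, NumPy can't build a regular multidimensional array. With `dtype=object` it falls back to an array of regular Python objects (recent NumPy versions raise a `ValueError` if the dtype is omitted):

In [26]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)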

In [27]:
C.dtype

dtype('O')

In [28]:
C.shape

(2,)

In [29]:
C.size

2

In [30]:
type(C[0])

list

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [42]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [43]:
A[1]

array([4, 5, 6])

In [44]:
A[1][0]

4

In [45]:
# A[d1, d2, d3, d4]

In [46]:
A[1, 0]  #same as the double index

4

In [47]:
A[0:2]  #Gets the rows 0 to 1

array([[1, 2, 3],
       [4, 5, 6]])

In [48]:
A[:, :2]  #take all rows, take columns up to 2 (0 and 1)

array([[1, 2],
       [4, 5],
       [7, 8]])

In [49]:
A[:2, :2]  #take all the rows up to 2, take all the columns up to 2

array([[1, 2],
       [4, 5]])

In [50]:
A[:2, 2:] #take all rows up to 2, take col from 2 onward

array([[3],
       [6]])

In [51]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [52]:
A[1] = np.array([10, 10, 10])

In [53]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [54]:
A[2] = 99  #the scalar is broadcast to every element of the row

In [55]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
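
Assignment works with any of the index forms shown above, not just whole rows. As a short sketch (the replacement values are arbitrary), the same idea applies to a column, a sub-block, and part of a row:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

A[:, 0] = 0                 # scalar broadcast down the first column
A[:2, 1:] = [[20, 30],      # a 2x2 block replaced by a 2x2 value
             [50, 60]]
A[2, 1:] = [80, 90]         # part of the last row

print(A)
# [[ 0 20 30]
#  [ 0 50 60]
#  [ 0 80 90]]
```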

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [62]:
a = np.array([1, 2, 3, 4])

In [63]:
a.sum()

10

In [64]:
a.mean()

2.5

In [65]:
a.std()

1.118033988749895

In [66]:
a.var()

1.25

In [67]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [68]:
A.sum()

45

In [69]:
A.mean()

5.0

In [70]:
A.std()

2.581988897471611

In [71]:
A.sum(axis=0)  #sums of the columns

array([12, 15, 18])

In [72]:
A.sum(axis=1)

array([ 6, 15, 24])

In [73]:
A.mean(axis=0)

array([4., 5., 6.])

In [74]:
A.mean(axis=1)

array([2., 5., 8.])

In [75]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [76]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...
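
Two details of these methods are easy to miss: `std()` and `var()` use the population formulas by default, and `axis` names the dimension that gets collapsed. The sketch below illustrates both with the same arrays used above; `col_sums` and `row_sums` are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# std() and var() default to the population formulas (ddof=0).
print(a.var())          # 1.25
# ddof=1 divides by n - 1 instead, giving the sample variance.
print(a.var(ddof=1))    # 1.6666666666666667

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# axis=0 collapses the rows (one result per column);
# axis=1 collapses the columns (one result per row).
col_sums = A.sum(axis=0)
row_sums = A.sum(axis=1, keepdims=True)   # keeps a (3, 1) shape
print(col_sums.shape, row_sums.shape)     # (3,) (3, 1)
```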

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

Vectorized operations apply to every element of an array at once, without an explicit Python loop, which makes them fast.

In [78]:
a = np.arange(4)  #creates an array with the integers from 0 up to (but not including) 4

In [67]:
a

array([0, 1, 2, 3])

In [79]:
a + 10  #operation applied to all elements of the array

array([10, 11, 12, 13])

In [80]:
a * 10

array([ 0, 10, 20, 30])

In [81]:
a

array([0, 1, 2, 3])

In [82]:
a += 100

In [83]:
a

array([100, 101, 102, 103])

List comprehensions produce the same result on a plain Python list, but they loop over the elements one by one in Python:

In [87]:
l = [0, 1, 2, 3]

In [85]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [86]:
a = np.arange(4)

In [77]:
a

array([0, 1, 2, 3])

In [89]:
b = np.array([10, 10, 10, 10])

In [90]:
b

array([10, 10, 10, 10])

In [91]:
a + b  #adds the arrays element by element

array([10, 11, 12, 13])

In [92]:
a * b

array([ 0, 10, 20, 30])
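
The examples above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes. Here is a minimal sketch, with the illustrative names `row`, `col` and `grid`:

```python
import numpy as np

row = np.arange(4)                     # shape (4,)
col = np.array([[0], [10], [20]])      # shape (3, 1)

# Shapes (3, 1) and (4,) broadcast to (3, 4): each size-1 axis is stretched.
grid = col + row
print(grid)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Incompatible shapes raise an error instead of guessing.
try:
    np.arange(4) + np.arange(3)
except ValueError as exc:
    print("cannot broadcast:", exc)
```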

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [82]:
a = np.arange(4)

In [83]:
a

array([0, 1, 2, 3])

In [85]:
a[0], a[-1]

(0, 3)

In [84]:
a[[0, -1]]

array([0, 3])

In [93]:
a[[True, False, False, True]]  #selects the positions marked True

array([0, 3])

In [94]:
a

array([0, 1, 2, 3])

In [96]:
a >= 2  #gives back an array with the result of the comparison for each individual element

array([False, False,  True,  True])

We can filter an array by indexing it with a boolean condition:

In [106]:
a[a >= 2]  #gives back an array that filters results according to the condition given

array([2, 3])

In [107]:
a.mean()

1.5

In [108]:
a[a > a.mean()]  #gives back an array with the values greater than the mean

array([2, 3])

In [109]:
a[~(a > a.mean())]  #gives back an array that does not have values greater than the mean

array([0, 1])

In [110]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [111]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [114]:
A = np.random.randint(100, size=(3, 3))  #random integers in [0, 100) with shape (3, 3); the values change on every run

In [113]:
A

array([[47, 99, 88],
       [77, 49, 83],
       [35, 79, 47]])

In [115]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([42, 69, 61,  1, 16])

In [116]:
A > 30

array([[ True, False,  True],
       [ True,  True,  True],
       [False,  True, False]])

In [117]:
A[A > 30]

array([42, 69, 96, 61, 87, 38])
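
Masks are useful for more than selection. The sketch below uses a fixed, made-up matrix (so the results are reproducible) to count matches, replace values conditionally with `np.where`, and assign through a mask:

```python
import numpy as np

A = np.array([
    [ 5, 40, 12],
    [33,  8, 90],
    [27, 61,  3]
])
mask = A > 30

# True counts as 1, so summing a mask counts the matching elements.
print(mask.sum())                 # 4
print(A[mask])                    # [40 33 90 61]

# np.where keeps A where the mask is True and uses 0 elsewhere.
kept = np.where(mask, A, 0)

# Assigning through a mask changes only the selected elements.
capped = A.copy()
capped[mask] = 30
print(capped)
# [[ 5 30 12]
#  [30  8 30]
#  [27 30  3]]
```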

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [118]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [119]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [120]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [121]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [122]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [123]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [124]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
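
It is easy to confuse `*` with `@`. As a small sketch with made-up matrices, `*` multiplies element by element, while `@` is the matrix product and requires the inner dimensions to match:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
I = np.eye(2, dtype=int)

print(A * I)    # element-wise: keeps only the diagonal of A
# [[1 0]
#  [0 4]]
print(A @ I)    # matrix product: multiplying by the identity returns A
# [[1 2]
#  [3 4]]

# (3, 2) @ (3, 3) is invalid: inner dimensions 2 and 3 do not match.
B = np.ones((3, 2))
C = np.ones((3, 3))
try:
    B @ C
except ValueError as exc:
    print("shape mismatch:", exc)
print((B.T @ C).shape)   # (2, 3)
```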

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

A NumPy number takes far less memory than its Python counterpart, which is one reason NumPy is so efficient.

In [125]:
# An integer in Python takes more than 24 bytes
sys.getsizeof(1)

28

In [126]:
# Large integers take even more space
sys.getsizeof(10**100)

72

In [127]:
# NumPy's default integer is much smaller (4 bytes here; 8 bytes on platforms where it is int64)
np.dtype(int).itemsize

4

In [128]:
# An 8-bit integer takes a single byte
np.dtype(np.int8).itemsize

1

In [129]:
np.dtype(float).itemsize

8

### Lists are even larger

In [None]:
# A one-element list
sys.getsizeof([1])

In [None]:
# An array of one element in numpy
np.array([1]).nbytes
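
The gap grows with the number of elements. The following rough sketch compares the two for 1,000 integers; the names `py_list`, `arr` and `list_bytes` are illustrative. The list total is an approximation, since `sys.getsizeof` reports the list's pointer storage and each integer object separately, and CPython shares small integer objects between uses.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The list stores pointers; each int it points to is a separate object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# The array stores the raw 8-byte values back to back.
print(arr.nbytes)                 # 8000
print(list_bytes > arr.nbytes)    # True
```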

### And performance is also important

In [117]:
l = list(range(100000))

In [118]:
a = np.arange(100000)

In [119]:
%time np.sum(a ** 2)

CPU times: user 1.06 ms, sys: 279 µs, total: 1.34 ms
Wall time: 701 µs


333328333350000

In [120]:
%time sum([x ** 2 for x in l])

CPU times: user 36.1 ms, sys: 0 ns, total: 36.1 ms
Wall time: 35.5 ms


333328333350000
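
`%time` only works inside IPython or Jupyter. Outside a notebook, a similar comparison can be sketched with the standard `timeit` module. The repeat count of 10 and the explicit `int64` dtype are choices made for this example; the timings themselves depend on the machine, so none are shown.

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)   # explicit 64-bit ints to avoid overflow

numpy_result = int(np.sum(a ** 2))
list_result = sum([x ** 2 for x in l])
print(numpy_result == list_result)   # True

# Time 10 runs of each version; absolute numbers depend on the machine.
numpy_time = timeit.timeit(lambda: np.sum(a ** 2), number=10)
list_time = timeit.timeit(lambda: sum([x ** 2 for x in l]), number=10)
print(f"numpy: {numpy_time:.4f}s  list: {list_time:.4f}s")
```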

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

### `random` 

In [130]:
np.random.random(size=2)

array([0.95810883, 0.01430805])

In [131]:
np.random.normal(size=2)

array([ 1.87546759, -0.40824949])

In [132]:
np.random.rand(2, 4)

array([[0.74897066, 0.08745427, 0.67301059, 0.6778958 ],
       [0.81753964, 0.19061754, 0.73729505, 0.77129554]])
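
Random output changes on every run, which makes notebooks hard to reproduce. One option, sketched here under the assumption of NumPy 1.17 or later, is the `np.random.default_rng` generator with a fixed seed. The seed value 42 and the variable names are arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.integers(0, 100, size=(3, 3))    # ints in [0, 100)
second = rng_b.integers(0, 100, size=(3, 3))
print(np.array_equal(first, second))           # True

# The generator also covers the distributions used above.
uniform = rng_a.random(size=2)     # floats in [0, 1)
gauss = rng_a.normal(size=2)       # standard normal samples
```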

---
### `arange`

In [133]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [134]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [135]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

---
### `reshape`

In [136]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [137]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
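
Two properties of `reshape` are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it, and for a contiguous array like this one the result is a view on the same data rather than a copy. The name `m` is illustrative.

```python
import numpy as np

a = np.arange(10)

# -1 asks NumPy to infer that dimension from the total size.
m = a.reshape(2, -1)
print(m.shape)          # (2, 5)

# For a contiguous array, reshape returns a view on the same data.
m[0, 0] = 100
print(a[0])             # 100

# The total number of elements must be preserved.
try:
    a.reshape(3, 4)
except ValueError as exc:
    print("cannot reshape:", exc)
```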

---
### `linspace`

In [138]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [139]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [140]:
np.linspace(0, 1, 20, False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
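
The fourth positional argument in the last call is `endpoint`. As a brief sketch, spelling it out by keyword makes the relationship to `arange` clearer, and `retstep=True` reports the spacing that was used. The names `by_step`, `by_count`, `points` and `step` are illustrative.

```python
import numpy as np

# arange takes a step; linspace takes the number of points.
by_step = np.arange(0, 1, 0.1)
by_count = np.linspace(0, 1, 10, endpoint=False)
print(np.allclose(by_step, by_count))   # True

# retstep=True also returns the spacing that linspace chose.
points, step = np.linspace(0, 1, 5, retstep=True)
print(step)                             # 0.25
```

With a fractional step, `linspace` is usually the safer choice because the number of points is explicit and does not depend on floating-point rounding.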

---
### `zeros`, `ones`, `empty`

In [141]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [142]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [143]:
np.zeros((3, 3), dtype=int)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [144]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [145]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [146]:
np.empty(5)  #allocates without initializing: the contents are whatever was already in memory

array([1., 1., 1., 1., 1.])

In [147]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])
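
Because `np.empty` leaves the memory uninitialized, the values shown above are leftovers and will differ between runs. A minimal sketch of the usual pattern, along with two related helpers (`np.full` and `np.zeros_like`), follows; the names `buf`, `sevens`, `template` and `z` are illustrative.

```python
import numpy as np

# np.empty only allocates memory; fill it before reading from it.
buf = np.empty((2, 2))
buf[:] = 0.5
print(buf)
# [[0.5 0.5]
#  [0.5 0.5]]

# np.full allocates and fills in one step.
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
#  [7 7 7]]

# The *_like helpers copy the shape and dtype of an existing array.
template = np.array([[1, 2], [3, 4]], dtype=np.int8)
z = np.zeros_like(template)
print(z.shape, z.dtype)    # (2, 2) int8
```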

---
### `identity` and `eye`

In [148]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [149]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [150]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [151]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [152]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [153]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)