![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [2]:
import sys
import numpy as np

## Basic Numpy Arrays

In [3]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [4]:
a = np.array([1, 2, 3, 4])

In [5]:
b = np.array([0, .5, 1, 1.5, 2])

In [6]:
a[0], a[1] # two separate lookups; the comma packs the results into a tuple

(1, 2)

In [7]:
a[0:]

array([1, 2, 3, 4])

In [8]:
a[1:3]

array([2, 3])

In [9]:
a[1:-1]

array([2, 3])

In [10]:
a[::3] # step of 3: every third element, here indices 0 and 3

array([1, 4])
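
The general slice form is `start:stop:step`. As a minimal sketch (the variable names are only illustrative), here are a few step values applied to the same array:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

every_second = a[::2]    # indices 0 and 2
every_third = a[::3]     # indices 0 and 3
reversed_a = a[::-1]     # a negative step walks backwards

print(every_second)  # [1 3]
print(every_third)   # [1 4]
print(reversed_a)    # [4 3 2 1]
```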

In [11]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [12]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [13]:
b[[0, 2, -1]]

array([0., 1., 2.])

In [14]:
c = np.array([2, 4, 6, 8, 10, 11, 12, 13, 14])
c[[1, 5]] # numpy.ndarray type
c[1], c[5] # tuple type

(4, 11)

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [15]:
a

array([1, 2, 3, 4])

In [16]:
a.dtype

dtype('int32')

In [17]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [18]:
b.dtype

dtype('float64')

In [19]:
np.array([1, 2, 3, 4], dtype=float)

# np.float was only an alias for the builtin float; it was deprecated in NumPy 1.20
# and removed in NumPy 1.24, so pass float (or np.float64) instead
# passing dtype sets the element type explicitly instead of letting NumPy infer it


array([1., 2., 3., 4.])

In [20]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [21]:
c = np.array(['a', 'b', 'c'])

In [22]:
c.dtype

dtype('<U1')

In [23]:
d = np.array([{'a': 1}, sys])

In [24]:
d.dtype

dtype('O')

In [25]:
#ab = np.array(['a': 1]) # SyntaxError: ['a': 1] is not a valid Python literal
ab = np.array({'a': 1}) # a dict is stored as a single generic Python object
ab.dtype

dtype('O')
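
To make the inference rules concrete, here is a small sketch of how NumPy picks a dtype when none is given and what an explicit `dtype` changes. The variable names are illustrative, and the default integer width depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # all ints -> an integer dtype (width is platform-dependent)
mixed = np.array([1, 2.5, 3])                # one float promotes every element to float64
small = np.array([1, 2, 3], dtype=np.int8)   # explicit 8-bit integers

print(ints.dtype.kind)   # i  (signed integer)
print(mixed.dtype)       # float64
print(small.itemsize)    # 1  (byte per element)
```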

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [26]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [9, 6, 8]
])

In [27]:
A.shape

(3, 3)

In [28]:
A.ndim # an attribute, not a function: the number of dimensions (axes) of the array

# useful to check whether you hold a vector (1), a matrix (2) or a higher-dimensional array

2

In [29]:
A.size

9

In [30]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [31]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [32]:
B.shape

(2, 2, 3)

In [33]:
B.ndim

3

In [34]:
B.size

12
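
The three attributes are tied together: `ndim` is the length of `shape`, and `size` is the product of its entries. A quick check on the same `B`:

```python
import numpy as np

B = np.array([
    [[12, 11, 10], [9, 8, 7]],
    [[6, 5, 4], [3, 2, 1]],
])

print(B.shape)                           # (2, 2, 3)
print(B.ndim == len(B.shape))            # True
print(B.size == int(np.prod(B.shape)))   # True
```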

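If the shape isn't consistent, NumPy can't build a regular numeric array. Older versions fell back to an array of Python objects with a warning; NumPy 1.24 and later raise a `ValueError` unless you pass `dtype=object` explicitly:

In [35]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)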


In [36]:
C.dtype # C is still an ndarray, but its elements are generic Python objects (here, lists)

dtype('O')

In [37]:
C.shape

(2,)

In [38]:
C.size

2

In [39]:
type(C[0])

list

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [40]:
# Square matrix
A = np.array([
#.   0. 1. 2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [41]:
A[1]

array([4, 5, 6])

In [42]:
A[1][0] # select row 1, then element 0 of that row

4

In [43]:
# A[d1, d2, d3, d4]

In [44]:
A[0, 1] # row 0, column 1; same result as A[0][1], but in a single indexing step

2

In [45]:
A[0:2] # the first two rows of the matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [46]:
A[:, :2] 

# the first :, is the selection of rows
# the second :2 is the selection of columns

array([[1, 2],
       [4, 5],
       [7, 8]])

In [47]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [48]:
A[:2, 2:]

# :2 selects the first 2 rows
# 2: selects from column 2 onwards, here only the third column
# slices keep both dimensions, so the result is a 2-dimensional array of shape (2, 1)

array([[3],
       [6]])
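
A short sketch of two points from the cells above: `A[0, 1]` and `A[0][1]` pick the same element, and an integer index drops a dimension while a slice keeps it. The variable names are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

same_element = A[0, 1] == A[0][1]   # both pick row 0, column 1
column = A[:, 2]                    # integer index drops the axis -> shape (3,)
column_2d = A[:, 2:]                # slice keeps the axis -> shape (3, 1)

print(same_element)      # True
print(column.shape)      # (3,)
print(column_2d.shape)   # (3, 1)
```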

In [49]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [50]:
A[1] = np.array([10, 10, 10]) # changing the values in row 1

In [51]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [52]:
A[2] = 99 # a single scalar is broadcast to every element of the row

In [53]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

In [54]:
Z = np.array([
    [10, 10, 10],
    [5, 6, 7],
    [1, 2, 3],
    [1, 0, 1]
])

In [55]:
print(Z.ndim)
print(Z.shape)
print(Z.size)

2
(4, 3)
12


In [56]:
Z[2, 2]

3

In [57]:
Z[3, 1] = 9

In [58]:
Z[0] = [3, 2, 1]

In [59]:
Z

array([[3, 2, 1],
       [5, 6, 7],
       [1, 2, 3],
       [1, 9, 1]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [60]:
a = np.array([1, 2, 3, 4])

In [61]:
a.sum()

10

In [62]:
a.mean()

2.5

In [63]:
a.std() 

# standard deviation: a measure of how much the values differ from the mean
# a std close to 0 means the data are clustered around the mean
# std is the square root of the variance

1.118033988749895

In [64]:
a.plot(kind='kde', figsize=(6,6))

# a NumPy array has no .plot method; plot it with a library such as Matplotlib,
# or wrap it in a pandas Series first

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

In [65]:
a.var()

# computes the variance of the array:
# the mean of the squared deviations from the mean,
# so it takes the spread of every data point into account

1.25

In [66]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [67]:
A.sum()

45

In [68]:
A.mean()

5.0

In [69]:
A.std()

2.581988897471611

In [70]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [71]:
A.sum(axis=0) # axis=0 collapses the rows, giving one sum per column

array([12, 15, 18])

In [72]:
A.sum(axis=1)

array([ 6, 15, 24])

In [73]:
A.mean(axis=0)

array([4., 5., 6.])

In [74]:
A.mean(axis=1)

array([2., 5., 8.])

In [75]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [76]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])
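
The `axis` argument names the dimension that gets collapsed. As a rough sketch, the same sums can be spelled out column by column and row by row (the `*_manual` names are only for illustration):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

col_sums = A.sum(axis=0)   # collapse the rows: one value per column
row_sums = A.sum(axis=1)   # collapse the columns: one value per row

# The same results computed one column / one row at a time
col_sums_manual = [int(A[:, c].sum()) for c in range(A.shape[1])]
row_sums_manual = [int(A[r, :].sum()) for r in range(A.shape[0])]

print(col_sums_manual)   # [12, 15, 18]
print(row_sums_manual)   # [6, 15, 24]
```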

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [77]:
a = np.arange(4)

# np.arange(n) returns evenly spaced values in the half-open interval [0, n),
# much like Python's range() but as an array

In [78]:
a

array([0, 1, 2, 3])

In [79]:
a + 10
# you can immediately add a value to all of the elements in the array

array([10, 11, 12, 13])

In [80]:
a * 10

array([ 0, 10, 20, 30])

In [81]:
a

array([0, 1, 2, 3])

In [82]:
a += 100

In [83]:
a

array([100, 101, 102, 103])

In [84]:
l = [0, 1, 2, 3]

In [85]:
[i * 10 for i in l]

# for every element in l, multiply the element with 10

[0, 10, 20, 30]

In [86]:
[a * 2 for a in l]

[0, 2, 4, 6]

In [87]:
a = np.arange(4)

In [88]:
a

array([0, 1, 2, 3])

In [89]:
b = np.array([5, 10, 10, 10])

In [90]:
b

array([ 5, 10, 10, 10])

In [91]:
a + b

# you can add two arrays together of the same length

array([ 5, 11, 12, 13])

In [92]:
a * b

# multiplication is element-wise too (this is not a matrix product)

array([ 0, 10, 20, 30])
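
Broadcasting also works between arrays of different shapes, as long as each pair of dimensions is equal or one of them is 1. A minimal sketch with made-up values:

```python
import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6],
])                               # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)
col = np.array([[100], [200]])   # shape (2, 1)

by_row = M + row   # row is added to every row of M
by_col = M + col   # col is added to every column of M

print(by_row)
print(by_col)
```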

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
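_(Also called masks)_

A boolean array is called a mask because, when used as an index, it lets through only the elements at its `True` positions and hides the rest.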

In [93]:
a = np.arange(5)

In [94]:
a

array([0, 1, 2, 3, 4])

In [95]:
a[0], a[-1]
# two separate lookups; the comma packs the two scalars into a tuple

(0, 4)

In [96]:
a[[0, -1]]
# indexing with a list of positions returns a new array, so vectorized operations still apply

array([0, 4])

In [97]:
a[[True, False, False, True, True]]
# a list of booleans selects the elements at the True positions

array([0, 3, 4])

In [98]:
a

array([0, 1, 2, 3, 4])

In [99]:
a >= 2
# the comparison is applied element-wise and returns a boolean array

array([False, False,  True,  True,  True])

In [100]:
a[a > 2]
# the boolean array acts as a mask, just like df[df.quantity > 3] in pandas

array([3, 4])

In [101]:
a.mean()

2.0

In [102]:
a[a > a.mean()]

array([3, 4])

In [103]:
a[~(a > a.mean())]

# the tilde (~) negates a boolean array element-wise

array([0, 1, 2])

In [104]:
a[~(a == 3)]

array([0, 1, 2, 4])

In [105]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [106]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [107]:
A = np.random.randint(100, size=(3, 3))

# size sets the shape of the returned array
# 100 is the exclusive upper bound: values are drawn from 0 to 99

In [108]:
A

array([[63, 86, 26],
       [43, 63, 39],
       [31, 18, 81]])

In [109]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([63, 26, 63, 31, 81])

In [110]:
A > 30
# returns trues and falses for values that meet the condition

array([[ True,  True, False],
       [ True,  True,  True],
       [ True, False,  True]])

In [111]:
A[A > 70]
# keeps only the values that meet the condition, flattened into a 1-D array

array([86, 81])
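
Masks are also handy for counting and for assignment. The sketch below hard-codes the random values shown above so that it is reproducible; `capped` and the threshold are illustrative choices.

```python
import numpy as np

A = np.array([
    [63, 86, 26],
    [43, 63, 39],
    [31, 18, 81],
])

mask = A > 70
count = int(mask.sum())   # each True counts as 1

capped = A.copy()         # work on a copy so A stays untouched
capped[mask] = 70         # assign through the mask

print(count)          # 2
print(capped.max())   # 70
```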

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [112]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [113]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [114]:
A.dot(B)
# matrix product: each entry is a row of A multiplied element by element with a column of B, then summed
# https://www.mathsisfun.com/algebra/matrix-multiplying.html

array([[20, 14],
       [56, 41],
       [92, 68]])

In [115]:
A @ B
# @ is the matrix multiplication operator; for 2-D arrays it gives the same result as A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [116]:
B.T
# .T is the transpose: rows become columns

array([[6, 4, 2],
       [5, 3, 1]])

In [117]:
A
# matrix products show up throughout data science, e.g. in linear regression and neural networks

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [118]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
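
To see where the numbers come from, here is a worked sketch that rebuilds two entries of `A @ B` by hand (the index choice `i, j = 2, 1` is arbitrary):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[6, 5], [4, 3], [2, 1]])

# Entry (0, 0): row 0 of A times column 0 of B, summed
entry_00 = 1 * 6 + 2 * 4 + 3 * 2            # 20

# The same rule for any entry (i, j)
i, j = 2, 1
entry_ij = int((A[i, :] * B[:, j]).sum())   # 7*5 + 8*3 + 9*1 = 68

product = A @ B
print(product.shape)                  # (3, 2)
print(product[0, 0] == entry_00)      # True
print(product[i, j] == entry_ij)      # True
```

A `(3, 3)` matrix times a `(3, 2)` matrix gives a `(3, 2)` result: the inner dimensions must match and disappear.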

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [161]:
# A small int in Python takes more than 24 bytes
sys.getsizeof(5)
# the exact size depends on the Python version and platform,
# which is why other sources may quote a different number

28

In [120]:
# Longs are even larger
sys.getsizeof(10**100)

72

In [121]:
# NumPy size is much smaller (the default int is 32-bit on this machine; on many platforms it is 64-bit, i.e. 8 bytes)
np.dtype(int).itemsize

4

In [122]:
# Numpy size is much smaller
np.dtype(np.int8).itemsize

1

In [123]:
np.dtype(float).itemsize

8

In [124]:
np.dtype(np.int16).itemsize

2

### Lists are even larger

In [125]:
# A one-element list
sys.getsizeof([1])

64

In [126]:
# An array of one element in numpy
np.array([1]).nbytes

4

### And performance is also important

In [127]:
l = list(range(100000))

In [128]:
a = np.arange(100000)
# np.arange returns evenly spaced values within a given interval

# timings measured on this machine:
# %time np.arange(100000)    -> 0 ns (too fast for the timer to resolve)
# %time list(range(100000))  -> 1.99 ms

In [129]:
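%time np.sum(a ** 2)
# careful: the result below is wrong. `a` is int32 on this machine, so the squares
# and their sum overflow; with np.arange(100000, dtype=np.int64) the result is 333328333350000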

Wall time: 2.99 ms


216474736

In [130]:
%time sum([x ** 2 for x in l])
# performing functions/operations on lists takes much longer than
# doing the same for arrays

Wall time: 40.9 ms


333328333350000
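
The two results above differ because the NumPy array was 32-bit on the machine that ran this notebook. The sketch below reproduces both numbers on any platform by fixing the dtypes explicitly (the names `a32`, `a64`, `wrapped` and `exact` are illustrative):

```python
import numpy as np

n = 100000
a32 = np.arange(n, dtype=np.int32)
a64 = np.arange(n, dtype=np.int64)

wrapped = np.sum(a32 ** 2, dtype=np.int32)   # 32-bit arithmetic wraps around silently
exact = np.sum(a64 ** 2)                     # 64-bit arithmetic has enough room here
python_sum = sum(x ** 2 for x in range(n))   # Python ints never overflow

print(int(wrapped))               # 216474736
print(int(exact))                 # 333328333350000
print(int(exact) == python_sum)   # True
```

The speed advantage of fixed-width types comes with this trade-off, so it is worth checking the dtype before summing large values.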

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

**random:** creates arrays of random numbers \
**arange:** creates an array of evenly spaced values from a start up to, but not including, a stop value\
**reshape:** gives an array a new shape without changing its data\
**linspace:** returns a given number of evenly spaced values over a specified interval (endpoint included by default)\
**zeros:** Return a new array of given shape and type, filled with zeros.\
**ones:** Return a new array of given shape and type, filled with ones.\
**empty:** Return a new array of given shape and type, without initializing entries. Skipping initialization is faster, but the contents are arbitrary until you overwrite them.\
**identity:** Return the identity array: a square array with ones on the main diagonal.\
**eye:** Return a 2-D array with ones on the diagonal and zeros elsewhere; unlike `identity`, it can be non-square and the diagonal can be offset with `k`.

In [131]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

In [132]:
np.linspace(0, 1, 9)
# 9 points including both endpoints, so there are 8 gaps and the step is 1 / 8 = 0.125

array([0.   , 0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])

### `random` 

In [133]:
np.random.random(size=5)

array([0.4376495 , 0.28221116, 0.35045334, 0.24504708, 0.54121731])

In [134]:
np.random.normal(size=2)

array([-0.58678893,  0.82918528])

In [135]:
np.random.rand(2, 4)

array([[0.55855457, 0.63737287, 0.87148374, 0.67402506],
       [0.5788011 , 0.12914476, 0.01045761, 0.69981209]])
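
The outputs above change on every run. If reproducible numbers are needed, one option is NumPy's newer `Generator` API with a fixed seed. This is a sketch only: the notebook itself uses the legacy `np.random.*` functions, and the seed `42` is an arbitrary choice.

```python
import numpy as np

rng_a = np.random.default_rng(42)   # seeded generator
rng_b = np.random.default_rng(42)   # same seed -> same sequence

x = rng_a.random(size=5)
y = rng_b.random(size=5)
print(np.array_equal(x, y))   # True

ints = rng_a.integers(100, size=(3, 3))   # like randint: 0 <= value < 100
print(ints.shape)                         # (3, 3)
```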

---
### `arange`

In [136]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [137]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [138]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

---
### `reshape`

In [139]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [140]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

---
### `linspace`

In [141]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [142]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [143]:
np.linspace(0, 1, 20, endpoint=False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
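
The spacing depends on whether the endpoint is counted. A small sketch using `retstep=True`, which makes `linspace` also return the step it used:

```python
import numpy as np

points, step = np.linspace(0, 1, 5, retstep=True)   # endpoint included: 4 gaps
print(points)   # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

open_points, open_step = np.linspace(0, 1, 5, endpoint=False, retstep=True)   # 5 gaps
print(open_points)   # [0.  0.2 0.4 0.6 0.8]
```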

---
### `zeros`, `ones`, `empty`

In [144]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [145]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [146]:
np.zeros((3, 3), dtype=int)


array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [147]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [148]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [149]:
np.empty(5)
# the values shown are whatever happened to be in that memory; do not rely on them

array([1., 1., 1., 1., 1.])

In [150]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])
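
Because `empty` skips initialization, it only makes sense when every entry is overwritten before it is read. A minimal sketch of that pattern (`buf` is an illustrative name):

```python
import numpy as np

buf = np.empty(5)            # contents are arbitrary at this point
buf[:] = np.arange(5) * 2    # every slot is overwritten before it is read

print(buf)   # [0. 2. 4. 6. 8.]
```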

---
### `identity` and `eye`

In [151]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [152]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [153]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [154]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [155]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
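
To summarize the difference between the two functions: for a square matrix they agree, while `eye` can also shift the diagonal with `k`. A quick check:

```python
import numpy as np

same = np.array_equal(np.eye(3), np.identity(3))
upper = np.eye(3, k=1)    # positive k: diagonal shifted above the main one
lower = np.eye(3, k=-1)   # negative k: diagonal shifted below the main one

print(same)   # True
print(upper)
print(lower)
```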

In [156]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)