# NumPy

NumPy and SciPy are the twin pillars of data science in Python. Early in Python's history, it became clear that Python's built-in list data structure wasn't ideal for heavy-duty number crunching on vectors and matrices.

NumPy was created to solve that problem by introducing an array data structure into Python.

References
- [Python Data Science Handbook, Ch 02.1](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html)
- [Python Data Science Handbook, Ch 02.2](https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html)

In [None]:
c=1
print(type(c))

<class 'int'>


## 1 - Creating Numpy Arrays

In [None]:
# here is the import

import numpy as np
print(np.__version__)

1.22.4


---

Notice that we have to pass in a list of numbers rather than separate arguments:

```python
np.array(1,2,3)  ## ERROR: won't work

np.array([1,2,3]) ## Correct
```

In [None]:
# integer array:
npint_arr = np.array([1,2,3,4])
print(npint_arr)
print(type(npint_arr))

[1 2 3 4]
<class 'numpy.ndarray'>


---

In [None]:
## Create a float array

npfloat_arr = np.array([1,2,3,4],dtype='float64')
print(npfloat_arr)
print(type(npfloat_arr))

[1. 2. 3. 4.]
<class 'numpy.ndarray'>
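As a small sketch of how NumPy picks the element type (the variable names below are illustrative and not part of the original notebook): when no `dtype` is given, NumPy infers one from the values, and mixing integers with a float promotes the whole array to a float type.

```python
import numpy as np

# Illustrative names; NumPy infers the dtype from the values when none is given.
ints = np.array([1, 2, 3, 4])
mixed = np.array([1, 2.5, 3])                      # one float promotes every element
forced = np.array([1, 2, 3, 4], dtype='float64')   # explicit dtype, as in the cell above

print(ints.dtype)    # a platform-dependent integer type, typically int64
print(mixed.dtype)   # float64
print(forced.dtype)  # float64
```

Printing `.dtype` is usually more informative than `type(...)`, which reports `numpy.ndarray` for every array regardless of its element type.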


---

Let's create a sequence of numbers with `arange`. The arguments are the start value, the stop value (excluded), and the step.

In [None]:
sequence_arr = np.arange(4,100,2)
print(sequence_arr)

[ 4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
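The half-open behavior of `arange` can be checked directly. The sketch below also shows `np.linspace`, which is not used elsewhere in this notebook and is included only as a contrast: it takes a number of points rather than a step.

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, just like Python's range().
sequence_arr = np.arange(4, 100, 2)
print(sequence_arr[0], sequence_arr[-1], len(sequence_arr))  # 4 98 48

# np.linspace takes a point count instead of a step and includes the endpoint by default.
points = np.linspace(0.0, 1.0, 5)
print(points)  # five evenly spaced values from 0.0 to 1.0 inclusive
```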


---

## 2 - Accessing Numpy Arrays

NumPy arrays use zero-based indexing: the first element is at index 0.

In [94]:
access_arr = np.arange(1,10)
print(access_arr)

[1 2 3 4 5 6 7 8 9]


In [None]:
# first element
access_arr[0]

1

In [None]:
# last element (the array has 9 elements, so the last valid index is 8)
access_arr[8]

9

In [None]:
# this is an error! (valid indexes are 0 to 8)
access_arr[10]

IndexError: index 10 is out of bounds for axis 0 with size 9

In [None]:
# Using negative indexes to walk backwards!
# This is very cool, try it out
# last element
access_arr[-1]

9

In [None]:
# second to last element
access_arr[-2]

8
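One way to reason about negative indexes, sketched here with the same `access_arr`: for an array of length `n`, index `-k` refers to the same element as index `n - k`.

```python
import numpy as np

access_arr = np.arange(1, 10)   # [1 2 3 4 5 6 7 8 9], valid indexes 0..8
n = len(access_arr)

# A negative index -k refers to the same element as index n - k.
for k in (1, 2, 9):
    assert access_arr[-k] == access_arr[n - k]

print(access_arr[-1], access_arr[-2], access_arr[-9])  # 9 8 1
```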

---

## 3 - Manipulating numpy Arrays

In [None]:
# Try to multiply sequence by a scalar
sequence_arr * np.pi

array([ 12.56637061,  18.84955592,  25.13274123,  31.41592654,
        37.69911184,  43.98229715,  50.26548246,  56.54866776,
        62.83185307,  69.11503838,  75.39822369,  81.68140899,
        87.9645943 ,  94.24777961, 100.53096491, 106.81415022,
       113.09733553, 119.38052084, 125.66370614, 131.94689145,
       138.23007676, 144.51326207, 150.79644737, 157.07963268,
       163.36281799, 169.64600329, 175.9291886 , 182.21237391,
       188.49555922, 194.77874452, 201.06192983, 207.34511514,
       213.62830044, 219.91148575, 226.19467106, 232.47785637,
       238.76104167, 245.04422698, 251.32741229, 257.61059759,
       263.8937829 , 270.17696821, 276.46015352, 282.74333882,
       289.02652413, 295.30970944, 301.59289474, 307.87608005])
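This element-wise behavior is where arrays differ most visibly from lists. A minimal sketch of the contrast, using small illustrative values rather than `sequence_arr`:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

repeated = py_list * 2   # list repetition: [1, 2, 3, 1, 2, 3]
scaled = np_arr * 2      # element-wise multiplication: [2 4 6]
shifted = np_arr + 10    # other arithmetic operators are element-wise too: [11 12 13]

print(repeated)
print(scaled)
print(shifted)
```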

---

## 4 - Multidimensional Arrays

We can make multi-dimensional arrays from one-dimensional arrays with `reshape`.

In [None]:
a = np.array([1,2,3,4,5,6])
a

In [None]:
b = a.reshape(3,2)
print(b)

[[1 2]
 [3 4]
 [5 6]]
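A short sketch of what `reshape` does and does not allow. The `-1` placeholder and the deliberately failing call are additions for illustration and do not appear in the original cells.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape(3, 2)
print(a.shape, b.shape)   # (6,) (3, 2)

# -1 asks NumPy to work out that dimension from the total number of elements.
c = a.reshape(2, -1)
print(c.shape)            # (2, 3)

# The new shape must hold exactly the same number of elements.
try:
    a.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```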


---

We can access elements of the 2-D array using the subscript operator with a comma-separated index (row first, then column).

Remember we do **this**:

```python
b[1,1]
```

not **this**:

```python
b[1][1] # BAD! builds an intermediate row array first
```

In [None]:
b[1,0]

3
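To illustrate the comma-separated form further, here is a small sketch on the same 3x2 array. The column and row slices are extra examples not shown in the original cells.

```python
import numpy as np

b = np.arange(1, 7).reshape(3, 2)   # [[1 2] [3 4] [5 6]]

# Both forms return the same element, but b[1, 0] does it in one indexing step,
# while b[1][0] first builds the intermediate row b[1].
assert b[1, 0] == b[1][0] == 3

first_column = b[:, 0]   # every row, column 0 -> [1 3 5]
last_row = b[-1, :]      # last row, every column -> [5 6]
print(first_column, last_row)
```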

---

## 5 - Handy NP Arrays

We can create:
- arrays of zeros
- arrays of ones
- arrays of random numbers
- etc.

In [None]:
## Zero
a = np.zeros(15)
a

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
## Ones
b = np.ones(20, dtype=int)
b

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [None]:
b = np.ones (10)
b

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [None]:
# Random numbers between 0 to 1
np.random.random(5)

array([0.56636821, 0.29170891, 0.55686808, 0.71403312, 0.29017324])

In [None]:
# Create an array of 15 random integers in the interval [0, 100)
np.random.randint(0, 100, 15)

array([62, 79, 63, 68, 68, 66, 72, 81, 64, 10, 79, 80, 14,  6, 73])

In [None]:
# Create a 2x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (2, 3))

array([[0, 6, 1],
       [9, 5, 7]])
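The random cells above give different numbers on every run. If repeatable results are wanted, one option is NumPy's `Generator` interface, sketched below. It is an alternative to the `np.random.random` and `np.random.randint` calls used in this notebook, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# The seed value 42 is arbitrary; any fixed integer gives repeatable results.
rng = np.random.default_rng(42)

floats = rng.random(5)                     # 5 floats in [0, 1)
ints = rng.integers(0, 100, size=15)       # 15 integers in [0, 100)
grid = rng.integers(0, 10, size=(2, 3))    # 2x3 array of integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(42)
assert np.array_equal(floats, rng_again.random(5))
print(grid.shape)  # (2, 3)
```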

In [115]:
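# Python's built-in array module also provides a dense, fixed-type array
import array
L = list(range(10))
L.append(10.3)  # the list now mixes ints with a float
#print(type(L))
A = array.array('i',L)  # 'i' means signed int, so the float 10.3 raises TypeError
A
#print(type(A))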

TypeError: ignored
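The `TypeError` above comes from the float `10.3` in a list that is being packed into an integer-typed array. A sketch of the working and failing cases side by side, with NumPy's behavior on the same mixed list added for comparison:

```python
import array
import numpy as np

L = list(range(10))

# All integers: the 'i' (signed int) typecode accepts them.
A = array.array('i', L)
print(A)

# Appending a float makes the same construction fail with TypeError.
L.append(10.3)
try:
    array.array('i', L)
except TypeError as err:
    print("TypeError:", err)

# NumPy instead promotes the whole array to a float dtype.
N = np.array(L)
print(N.dtype)  # float64
```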

---

## 6 - NP Datatypes

![NumPy data types](https://github.com/elephantscale/python-data-analytics/blob/main/assets/images/np-data-types.png?raw=1)
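To make fixed-size element types concrete, here is a small sketch. The dtypes chosen (`int8`, `int64`, `int32`) are standard NumPy types picked for illustration and are not taken from the image.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)
wide = np.array([1, 2, 3], dtype=np.int64)
print(small.itemsize, wide.itemsize)   # 1 8  (bytes per element)

# astype() returns a converted copy; float -> int conversion drops the fractional part.
floats = np.array([1.7, 2.2, 3.9])
as_ints = floats.astype(np.int32)
print(as_ints)                         # [1 2 3]
```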

---

## 7 - Indexing and Slicing Arrays

It is important to know how to **slice** arrays. Once you understand that a slice `a[start:stop]` includes `start` and excludes `stop`, the rest follows easily.

In [None]:
import numpy as np

a = np.arange(5,15)
a

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [None]:
# Print first element
a[0]

5

In [None]:
# print last element
a[-1]

14

In [None]:
# print everything from index 5 onward
a[5:]

array([10, 11, 12, 13, 14])

In [None]:
# print up to and including index 7
a[:8]

array([ 5,  6,  7,  8,  9, 10, 11, 12])

In [None]:
# print elements at indexes 3 to 7 (index 8 is excluded)
a[3:8]

array([ 8,  9, 10, 11, 12])

In [None]:
# or 
a[3 : 8+1]
# which is correct ? :-) 

array([ 8,  9, 10, 11, 12, 13])

In [None]:
# print everything from index 5 up to, but not including, the last element
a[5:-1]

array([10, 11, 12, 13])

In [None]:
# print everything from index 5 up to, but not including, the last two elements
a[5:-2]

array([10, 11, 12])

---

## FUN: Benchmark NP arrays vs Python lists

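So let's do a quick benchmark comparing the performance of NumPy arrays and standard Python lists. Note that the cells below build `py_list` with `range()`, which is a lazy sequence object rather than a materialized list; wrap it in `list(...)` to benchmark a true list.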

### `%timeit`

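`timeit` is a handy utility that measures how long a code snippet takes to run. It runs the snippet many times (the `%timeit` magic picks the loop count automatically, as the outputs below show) and reports the mean time per loop and its standard deviation.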

References:
- [Python `timeit` documentation](https://docs.python.org/3/library/timeit.html)
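Outside a notebook, where the `%timeit` magic is not available, the standard-library `timeit` module can be used directly. A minimal sketch, with the statement, repeat count, and loop count chosen arbitrarily for illustration:

```python
import timeit

# number= sets how many times the statement runs per measurement;
# repeat= sets how many measurements are taken.
runs = timeit.repeat(
    stmt="sum(values)",
    setup="values = list(range(1000))",
    repeat=3,
    number=1000,
)

# Each entry is the total time in seconds for `number` executions.
best_per_call = min(runs) / 1000
print(f"best of 3: {best_per_call * 1e6:.2f} microseconds per call")
```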

### Small sized arrays

Let's start with a small array of about 10 elements (here, the integers 1 to 9).

In [None]:
import random

py_list = range(1,10)
np_arr = np.arange(1,10)

In [None]:
# python small list
%timeit  random.choices(py_list, k=1)

1.36 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [None]:
# np small array
%timeit np.random.choice(np_arr, size=1)

20.3 µs ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [118]:
size=10
py_list=range(1,size)
print(py_list)
np_arr = np.arange(10,15,1)
print(type(np_arr))

np.random.choice(np_arr, size=5)

range(1, 10)
<class 'numpy.ndarray'>


array([14, 12, 11, 14, 12])

### Large Sized Arrays

In [87]:
size = 10000000
py_list = range(1,size)
np_arr = np.arange (1,size)

In [None]:
# python large list
%timeit  random.choices(py_list, k=1000)

225 µs ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [None]:
# np large array
%timeit np.random.choice(np_arr, size=1000)

59.8 µs ± 7.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### Sum Operation on large array

In [None]:
import random
size=100000
py_list = range(1,size)
np_arr = np.arange(1,size)

In [88]:
%timeit sum(py_list)

297 ms ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [91]:
%timeit np.sum(np_arr)

10.6 ms ± 723 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
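For a comparison that does not depend on notebook magics, the sum benchmark can be sketched with the `timeit` module. This version materializes a real list and fixes the array dtype to `int64`; both choices are assumptions made here and differ from the cells above. Timings will vary by machine, so none are shown.

```python
import timeit
import numpy as np

size = 100_000
py_list = list(range(1, size))                # a real list, not a lazy range
np_arr = np.arange(1, size, dtype=np.int64)   # explicit int64 avoids overflow where the default int is 32-bit

# Both approaches must agree on the answer before comparing speed.
assert sum(py_list) == int(np.sum(np_arr))

list_time = timeit.timeit(lambda: sum(py_list), number=20)
numpy_time = timeit.timeit(lambda: np.sum(np_arr), number=20)
print(f"list sum : {list_time / 20 * 1e3:.3f} ms per call")
print(f"numpy sum: {numpy_time / 20 * 1e3:.3f} ms per call")
```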


### Comparing Memory footprint

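This is an approximate measurement of memory footprint. Note that `py_list` below is a `range` object, so `sys.getsizeof(py_list)` returns the fixed size of that object (48 bytes here), not the size of a materialized list; multiplying it by `size` gives only a crude per-element estimate. For accurate measurements we should use a memory profiler such as heapy.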

In [None]:
import sys

size = 1000000
py_list = range(1,size)
np_arr = np.arange (1,size)

print ("python list of {:,} elements takes up {:,} bytes".format(size, sys.getsizeof(py_list) * size))
print ("numpy array of {:,} elements takes up {:,} bytes".format(size, np_arr.size * np_arr.itemsize))

python list of 1,000,000 elements takes up 48,000,000 bytes
numpy array of 1,000,000 elements takes up 7,999,992 bytes
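A somewhat more careful estimate, sketched under the assumption that we want to count a materialized list plus the integer objects it points to. It is still an estimate (for example, CPython shares some small integer objects), so exact figures depend on the Python version and platform and are not shown.

```python
import sys
import numpy as np

size = 1_000_000
py_list = list(range(1, size))                # materialized list of int objects
np_arr = np.arange(1, size, dtype=np.int64)   # 8 bytes per element

# sys.getsizeof(list) counts only the list's own pointer array,
# so the int objects it refers to are added separately.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes                   # same as np_arr.size * np_arr.itemsize

print("python list of {:,} elements takes up about {:,} bytes".format(len(py_list), list_bytes))
print("numpy array of {:,} elements takes up {:,} bytes".format(np_arr.size, array_bytes))
```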


### Discussion

Discuss your findings.

References
- [Speed comparison. numpy vs python standard](https://stackoverflow.com/questions/52603487/speed-comparison-numpy-vs-python-standard)
- [What are the advantages of NumPy over regular Python lists?](https://stackoverflow.com/questions/993984/what-are-the-advantages-of-numpy-over-regular-python-lists)
- [Difference between list and NumPy array memory size](https://stackoverflow.com/questions/67549486/difference-between-list-and-numpy-array-memory-size)

---

## Bonus Exercises

- [guided-machine-learning/python-data-analysis/np-1__numpy-intro.md](https://github.com/elephantscale/guided-machine-learning/blob/master/python-data-analysis/np-1__numpy-intro.md)