[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/alikn/coding_for_analytics/blob/main/numpy/1_hello_numpy.ipynb)

The NumPy notebooks are heavily based on the following resources:
- [*Python Data Science Handbook*](https://jakevdp.github.io/PythonDataScienceHandbook/), Jake VanderPlas
- [*NumPy Illustrated: The Visual Guide to NumPy*](https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d), Lev Maximov


# Why do we need NumPy?

In Python, we can assign any kind of data to any variable. In other words, types are determined dynamically at run time. Among other things, dynamic typing makes Python easy to use and popular. The downside is that a Python variable is more than just its value: it must also carry extra information about its type.

This takes a toll on memory usage when a large amount of data needs to be loaded. Processing data stored in Python lists also takes much more time than the equivalent work in a statically typed language.

Moreover, as we saw in previous lessons, processing data stored in Python lists requires us to write *for* loops, which can get complicated pretty fast. NumPy solves these problems for us.

Almost always, when we talk about data processing in Python, NumPy is involved. It is widely used on its own and many other data processing libraries, such as Pandas, are built on top of it.
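As a rough illustration of the memory overhead described above, the sketch below compares the size of a single Python integer object with the space one element takes inside a NumPy array. The variable names are made up for this example, and the exact size of a Python integer depends on the interpreter and platform, so treat the first number as indicative only.

```python
import sys
import numpy as np

# A single Python int is a full object that carries type and bookkeeping data.
py_int_bytes = sys.getsizeof(1)

# A NumPy array stores raw values back to back; itemsize is bytes per element.
arr = np.array([1, 2, 3, 4], dtype=np.int64)

print("bytes for one Python int:     ", py_int_bytes)
print("bytes per NumPy int64 element:", arr.itemsize)
print("bytes for all array data:     ", arr.nbytes)
```

The array itself has some fixed overhead too, so the saving only becomes noticeable once the array holds many elements.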

# NumPy arrays

The central concept of NumPy is the multi-dimensional array. Most operations look the same irrespective of how many dimensions the array has. At first glance, NumPy arrays are very similar to Python lists. They act as containers of elements.

In [1]:
# We need to import NumPy in order to use it
import numpy as np

In [21]:
# This makes generated random numbers repeatable. Don't worry too much about it for now.
np.random.seed(0)

In [2]:
python_list = [1, 2, 3, 4]
numpy_array = np.array([1, 2, 3, 4])
print(python_list)
print(numpy_array)

[1, 2, 3, 4]
[1 2 3 4]


But as we will see later, it is much easier to do arithmetic on NumPy arrays. Here is an example of adding two lists/arrays of the same length.

In [7]:
a = [1, 2, 3]
b = [4, 5, 6]

c = []
for i in range(3):
    c.append(a[i] + b[i])

print(c)

[5, 7, 9]


In [8]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b
print(c)

[5 7 9]


On top of being more user friendly, NumPy arrays are:

- more compact, especially when there’s more than one dimension
- faster than lists when the operation can be vectorized
- slower than lists when you append elements to the end
- usually homogeneous: can only work fast with elements of one type

![](https://miro.medium.com/max/1100/1*D-I8hK4WXC8wtpR5tvR0fw.png)

# Creating an array

Unlike Python lists, NumPy arrays contain elements of the same type. If the types do not match, NumPy will upcast if possible. For example, if an array includes both integers and floats, the array type will be float.
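A small sketch of the upcasting rule described above. The names `ints_only` and `mixed` are chosen only for this example, and the exact integer type (32-bit or 64-bit) depends on the platform.

```python
import numpy as np

ints_only = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])  # one float is enough to upcast every element

print(ints_only.dtype)  # an integer type (int64 or int32, depending on platform)
print(mixed.dtype)      # float64
print(mixed)            # [1.  2.  3.5]
```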

NumPy arrays cannot grow the way Python lists do. Adding an element is more time consuming, because there is no space reserved at the end for added elements. A routine way to create arrays is to first add elements to a Python list and then convert it to an array.
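The list-then-convert routine mentioned above might look like the following sketch. The name `squares` and the values collected are placeholders for whatever data you are gathering.

```python
import numpy as np

squares = []                 # grow a plain Python list first
for n in range(5):
    squares.append(n ** 2)

squares_arr = np.array(squares)  # convert once, when the list is complete
print(squares_arr)  # [ 0  1  4  9 16]
```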

In [11]:
arr = np.array([3, 6, 77])
arr

array([ 3,  6, 77])

In [13]:
# We can also specify the type
arr = np.array([3, 6, 77], dtype='float') 
arr

array([ 3.,  6., 77.])

![](https://miro.medium.com/max/1100/1*oQRt7v_zDO7DNRLRUlCm5w.png)

# Creating arrays from scratch

NumPy has several functions for creating arrays with placeholder values.

In [14]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [15]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [16]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [17]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [18]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
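The two sequence functions above differ in how they treat the end point, which is easy to miss. The sketch below puts them side by side; `by_step` and `by_count` are names invented for this comparison.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)   # step of 0.25, the stop value 1 is excluded
by_count = np.linspace(0, 1, 5)   # exactly 5 values, the stop value 1 is included

print(by_step)   # [0.   0.25 0.5  0.75]
print(by_count)  # [0.   0.25 0.5  0.75 1.  ]
```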

In [19]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.41105603, 0.02065706, 0.65475096],
       [0.22595096, 0.85517145, 0.92839044],
       [0.81987553, 0.14095623, 0.45615337]])

In [20]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[5, 6, 8],
       [4, 1, 2],
       [2, 8, 8]])

### Exercise
Create a 3x3 2D NumPy array which contains numbers 1 to 9 and name it *my_array*.

In [None]:
# Add your code here

# Array attributes

NumPy arrays have a few attributes that describe them: *ndim* (the number of dimensions), *shape* (the size of each dimension), and *size* (the total number of elements).

In [2]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [23]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


As mentioned before, all elements in a NumPy array have the same type. Here is how to check that type.

In [24]:
print("dtype:", x3.dtype)


dtype: int32


In [26]:
# np.info prints help for a function, class, or module. For an array, it prints a summary of its memory layout.
np.info(x3)

class:  ndarray
shape:  (3, 4, 5)
strides:  (80, 20, 4)
itemsize:  4
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x1789ce72410
byteorder:  little
byteswap:  False
type: int32


### Exercise
Check the dimension, shape and size of *my_array*.

In [None]:
# Add your code here

# Array indexing
For one-dimensional arrays, we can use Python's standard indexing, just like with lists, but NumPy extends that to multi-dimensional arrays too.

In [29]:
print(x1)
print(x1[2])
print(x1[-2])

[5 0 3 3 7 9]
3
7


In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [32]:
print(x2)
print(x2[2, 0])
print(x2[1, 1])
print(x2[-1, 2])

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
1
6
7


Elements can be modified in place using their index.

In [33]:
x2[-1, 2] = 99
print(x2)

[[ 3  5  2  4]
 [ 7  6  8  8]
 [ 1  6 99  7]]
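One thing to keep in mind when modifying elements: because an array has a single fixed type, a value assigned into it is converted to that type. The sketch below, using the hypothetical names `ints` and `floats`, shows a float being silently truncated when stored in an integer array.

```python
import numpy as np

ints = np.array([5, 0, 3])
ints[0] = 3.14159    # the array holds integers, so the fraction is dropped
print(ints)          # [3 0 3]

floats = np.array([5, 0, 3], dtype=float)
floats[0] = 3.14159  # a float array keeps the value as given
print(floats[0])     # 3.14159
```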


### Exercise
Update the element at row 2 and column 0 of *my_array* to 1000.

In [None]:
# Add your code here

# Array slicing
Just like Python lists, we can use slicing for selecting subarray elements.

>x[start:stop:step]

If start or stop is omitted, it defaults to the start or end of the array, respectively. If step is omitted, it defaults to 1.

In [34]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [35]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [36]:
x[4:7]  # middle sub-array

array([4, 5, 6])
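The examples so far leave out the step. As a short sketch of the third part of the start:stop:step pattern, here are a few slices of the same ten-element array that use it:

```python
import numpy as np

x = np.arange(10)

print(x[::2])   # every other element: [0 2 4 6 8]
print(x[1::2])  # every other element, starting at index 1: [1 3 5 7 9]
print(x[::-1])  # a negative step reverses the array: [9 8 7 6 5 4 3 2 1 0]
```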

In [37]:
x2

array([[ 3,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6, 99,  7]])

In [38]:
x2[:2, :3]

array([[3, 5, 2],
       [7, 6, 8]])

In [39]:
print(x2[:, 0])  # first column of x2

[3 7 1]


In [40]:
print(x2[0, :])  # first row of x2

[3 5 2 4]


### Exercise
Get the first row of *my_array* in a new array named *my_subarray*.

In [None]:
# Add your code here

One important, and extremely useful, thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies.

![](https://miro.medium.com/max/1100/1*4xpufyWZWcIbabsOHVlc4g.png)

### Exercise
Change the first element of *my_subarray* to 99. Print *my_array* and see if any of the elements have changed.

In [None]:
# Add your code here

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method.

In [41]:
x2.copy()

array([[ 3,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6, 99,  7]])
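To see the difference between a view and a copy in action, here is a minimal sketch. The names `data`, `view`, and `copied` are invented for this example and are not used elsewhere in the notebook.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

view = data[:1, :2]           # a slice is a view onto the data in `data`
copied = data[:1, :2].copy()  # an independent copy of the same sub-array

copied[0, 0] = 42   # changes only the copy
print(data[0, 0])   # 1

view[0, 0] = 99     # writes through to `data`
print(data[0, 0])   # 99
```

If you are unsure whether two arrays share data, `np.shares_memory(a, b)` is one way to check.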

# Reshaping of arrays
A useful operation on arrays is reshaping, which rearranges the elements of the input array into the given shape. Note that for this to work, the total number of elements in the original array must match the number of elements in the new shape.

In [42]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
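The size requirement can be checked with a quick sketch. The array name `six` is made up for this example. The use of -1 to let NumPy infer one dimension is an extra detail not covered in the text above.

```python
import numpy as np

six = np.arange(6)

print(six.reshape((2, 3)))   # works: 2 * 3 == 6
print(six.reshape((3, -1)))  # -1 asks NumPy to infer that dimension (here 2)

try:
    six.reshape((4, 2))      # fails: 4 * 2 == 8, but there are only 6 elements
except ValueError as err:
    print("reshape failed:", err)
```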


# Array concatenation
Concatenation allows us to join two or more arrays in NumPy.

In [43]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [45]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [46]:
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [47]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

# Universal functions (UFuncs)

As mentioned before, Python loops are slow. NumPy, on the other hand, provides convenient interfaces for many arithmetic and logical operations, known as vectorized operations. These operations can run very fast and process a large amount of data in a short amount of time.

Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays.
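As a small sketch of what "vectorized" means in practice, the two versions below compute the reciprocal of every element. The names `values`, `looped`, and `vectorized` are chosen for this example only. Both versions give the same result, but the second one leaves the looping to NumPy.

```python
import numpy as np

values = np.array([1, 2, 4, 8])

# Loop version: visit each element in Python, one at a time.
looped = np.empty(len(values))
for i in range(len(values)):
    looped[i] = 1.0 / values[i]

# Vectorized version: one expression, the loop runs inside NumPy.
vectorized = 1.0 / values

print(looped)      # [1.    0.5   0.25  0.125]
print(vectorized)  # [1.    0.5   0.25  0.125]
```

On an array this small the speed difference is negligible. It tends to matter once arrays hold many thousands of elements.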

## Array arithmetic

In [49]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
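The table can be read in both directions: each operator is a shorthand for the ufunc next to it. A brief sketch, reusing the same four-element array as above:

```python
import numpy as np

x = np.arange(4)

print(np.add(x, 5))           # same as x + 5  -> [5 6 7 8]
print(np.floor_divide(x, 2))  # same as x // 2 -> [0 0 1 1]
print(np.negative(x))         # same as -x     -> [ 0 -1 -2 -3]

# Operators can be chained like ordinary arithmetic.
print(-(0.5 * x + 1) ** 2)    # [-1.   -2.25 -4.   -6.25]
```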

## Mathematical functions
NumPy has many mathematical functions implemented as ufuncs.

In [50]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]
ln(x)    = [0.         0.69314718 1.09861229]
log2(x)  = [0.        1.        1.5849625]
log10(x) = [0.         0.30103    0.47712125]


## Aggregation functions

In [52]:
print(np.sum(x))
print(np.mean(x))
print(np.min(x))

6
2.0
1


|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
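To show why the NaN-safe column exists, here is a minimal sketch with one missing value. The array name `readings` and its values are made up for illustration.

```python
import numpy as np

readings = np.array([1.0, np.nan, 3.0])

print(np.sum(readings))      # nan: a single NaN propagates through the sum
print(np.nansum(readings))   # 4.0: NaN entries are ignored
print(np.nanmean(readings))  # 2.0: mean of the two non-NaN values
```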

### Exercise
Create a NumPy array of the scores out of 100 you expect to get from your courses this semester. Calculate min, max, mean and median of your scores.

In [None]:
# Add your code here