# NumPy - Tutorial

## What is NumPy?

 - NumPy, which stands for Numerical Python, is a Python library used for scientific computing.
 - It provides an efficient multi-dimensional array for storing data and a collection of routines for processing those arrays.
 - It is one of the most fundamental and powerful packages for working with data in Python.

## NumPy vs. Lists

NumPy arrays and Python lists are similar in that both can store data, be indexed, and be iterated over.

### However...

NumPy arrays typically use **less memory**, are **faster** for numerical work, and are **more convenient** than lists. Also, the arithmetic operators (add, subtract, multiply, divide and exponentiation) do not work element by element on Python lists, but they do on NumPy arrays.
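
As a rough illustration of the arithmetic point, the sketch below applies each of the operations listed above to a small array. The names `values` and `doubled_list` are made up for this example.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Arithmetic operators act on every element of the array.
added = values + 10        # array([11, 12, 13, 14])
subtracted = values - 1    # array([0, 1, 2, 3])
multiplied = values * 2    # array([2, 4, 6, 8])
divided = values / 2       # array([0.5, 1. , 1.5, 2. ])
squared = values ** 2      # array([ 1,  4,  9, 16])

# With a plain list, the same result needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in [1, 2, 3, 4]]  # [2, 4, 6, 8]

# Note that * on a list repeats it instead of multiplying its items.
repeated = [1, 2] * 2      # [1, 2, 1, 2]
```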

## Importing NumPy

In [20]:
import numpy as np

#### Create an array:

One of the most common ways to create a NumPy array is to build it from a list by passing the list to the np.array function.

In [21]:
a = np.array([1, 2, 3])

- The key difference between an array and a list is that arrays are designed for vectorized operations, while a Python list is not.
- That means an operation or NumPy function applied to an array is performed on every item in it, rather than on the array object as a whole.

Let’s suppose you want to add the number 2 to every item in the list. The intuitive way to do it is something like this:

In [22]:
listA = [1,2,3,4]
listA + 2  # error

TypeError: can only concatenate list (not "int") to list

That was not possible with a list. But you can do that on an ndarray.

In [23]:
# Add 2 to each element of the array and store the result in arrA
arrA = np.array([1,2,3,4]) + 2
arrA

array([3, 4, 5, 6])
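
The same idea extends from operators to functions. The sketch below is only an illustration, with made-up names (`arr`, `roots`, `list_plus_two`): it applies `np.sqrt` to a whole array in one call and shows the explicit loop a list needs for the "add 2" case.

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

# A NumPy function is applied to each element in a single call.
roots = np.sqrt(arr)  # array([1., 2., 3., 4.])

# The list version of "add 2" has to visit each item explicitly.
list_a = [1, 2, 3, 4]
list_plus_two = [x + 2 for x in list_a]  # [3, 4, 5, 6]
```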

#### Finding the size of an array - how many values are in the array

In [24]:
a.size

3

#### Finding the shape of an array - dimensions of the array in the format (rows, columns)
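
No example accompanies this heading in the original, so here is a minimal sketch of the `shape` attribute. The 2-D array `matrix` is a hypothetical example added for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # hypothetical 2-D example

shape_1d = a.shape       # (3,)   one dimension holding 3 items
shape_2d = matrix.shape  # (2, 3) 2 rows, 3 columns
n_dims = matrix.ndim     # 2      number of dimensions
n_items = matrix.size    # 6      rows * columns
```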

np.arange() - Create a 1-D array of evenly spaced values within a given interval

np.arange(start, stop, step, dtype) has four parameters:
- start - the first value of the array (included). Defaults to 0.
- stop - the end of the interval; the stop value itself is excluded.
- step - the spacing between consecutive values. Defaults to 1.
- dtype - the type of the output array. If dtype is not given, it is inferred from the other input arguments.

In [25]:
np.arange(0, 300, 3)

array([  0,   3,   6,   9,  12,  15,  18,  21,  24,  27,  30,  33,  36,
        39,  42,  45,  48,  51,  54,  57,  60,  63,  66,  69,  72,  75,
        78,  81,  84,  87,  90,  93,  96,  99, 102, 105, 108, 111, 114,
       117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 147, 150, 153,
       156, 159, 162, 165, 168, 171, 174, 177, 180, 183, 186, 189, 192,
       195, 198, 201, 204, 207, 210, 213, 216, 219, 222, 225, 228, 231,
       234, 237, 240, 243, 246, 249, 252, 255, 258, 261, 264, 267, 270,
       273, 276, 279, 282, 285, 288, 291, 294, 297])
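
A few smaller calls may make the role of each parameter easier to see. This is a sketch with made-up variable names, not an exhaustive tour of `np.arange`.

```python
import numpy as np

# With one argument, start defaults to 0 and step to 1.
first_five = np.arange(5)                # array([0, 1, 2, 3, 4])

# The stop value itself is never included.
evens = np.arange(0, 10, 2)              # array([0, 2, 4, 6, 8])

# A negative step counts down.
countdown = np.arange(5, 0, -1)          # array([5, 4, 3, 2, 1])

# dtype can be set explicitly.
as_float = np.arange(0, 3, dtype=float)  # array([0., 1., 2.])
```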

numpy.reshape(a, newshape, order) - returns an array with the same data in a new shape, for example turning a 1-D array into a 2-D one
 - a : array_like. Array to be reshaped.
 - newshape : int or tuple of ints. The new shape should be compatible with the original shape, meaning it must hold the same total number of elements. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
 - order : {‘C’, ‘F’, ‘A’}, optional. The index order used to read and place the elements. ‘C’ (row-major) is the default.

In [26]:
b = np.arange(0, 10, 3)
b = b.reshape(2, 2)
b

array([[0, 3],
       [6, 9]])
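
The -1 placeholder and the order argument described above can be sketched as follows. The array `c` and the result names are made up for this example.

```python
import numpy as np

c = np.arange(6)  # array([0, 1, 2, 3, 4, 5])

# -1 lets NumPy work out the missing dimension: 6 items / 2 rows = 3 columns.
two_rows = c.reshape(2, -1)
# array([[0, 1, 2],
#        [3, 4, 5]])

# order='F' fills column by column instead of row by row.
by_columns = c.reshape((2, 3), order='F')
# array([[0, 2, 4],
#        [1, 3, 5]])
```

A shape that does not hold exactly 6 items, such as (4, 2), raises a ValueError.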

np.zeros(shape, dtype, order) - returns a new array of the given shape with every element set to 0
 - shape : int or tuple of ints. Shape of the new array, e.g., (2, 3) or 2.
 - dtype : optional. The type of the output array. Default: float
 - order : Whether to store multi-dimensional data in row-major 'C' or column-major 'F' order in memory. Default: 'C'

In [27]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])
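
The output above contains floats (note the trailing dots) because float is the default dtype. As a small sketch with made-up names, passing dtype changes that:

```python
import numpy as np

# An int shape gives a 1-D array; the default dtype is float.
float_zeros = np.zeros(3)                # array([0., 0., 0.])

# dtype=int produces integer zeros instead.
int_zeros = np.zeros((2, 2), dtype=int)
# array([[0, 0],
#        [0, 0]])
```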

np.ones(shape, dtype, order) - returns a new array of the given shape with every element set to 1
 - dtype and order are optional

In [28]:
#(2,3,4) is the tuple of array dimensions.
np.ones((2,3,4))

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

np.full(shape, fill_value, dtype, order) - returns a new array of the given shape with every element set to a custom value
 - fill_value - scalar. The value to fill the array with.

In [29]:
np.full((2,2),7)

array([[7, 7],
       [7, 7]])

np.eye(N, M, k, dtype, order) - returns a 2-D array with ones on the diagonal and zeros elsewhere.
 - N : int, Number of rows in the output.
 - M : int, optional. Number of columns in the output. If None, defaults to N.
 - k : int, optional. Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.

In [30]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
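
The M and k parameters described above are easier to see with a small example. This sketch uses made-up names (`rect`, `upper`, `lower`).

```python
import numpy as np

# 2 rows, 3 columns: ones on the main diagonal.
rect = np.eye(2, 3)
# array([[1., 0., 0.],
#        [0., 1., 0.]])

# k=1 moves the ones to the first diagonal above the main one.
upper = np.eye(3, k=1)
# array([[0., 1., 0.],
#        [0., 0., 1.],
#        [0., 0., 0.]])

# k=-1 moves them to the first diagonal below the main one.
lower = np.eye(3, k=-1)
# array([[0., 0., 0.],
#        [1., 0., 0.],
#        [0., 1., 0.]])
```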

numpy.random.random(size) - returns an array of the given shape filled with random floats
 - Results are from the “continuous uniform” distribution over the half-open interval [0.0, 1.0).
 - size : int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

In [31]:
np.random.random((3,2))

array([[0.80674808, 0.24714912],
       [0.8741992 , 0.63265395],
       [0.41518862, 0.9252614 ]])
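
Because the values are random, your output will differ from the one shown above. If repeatable numbers are needed, one option is to fix the seed first. The sketch below uses `np.random.seed`; the seed value 0 and the variable names are arbitrary choices for this example.

```python
import numpy as np

np.random.seed(0)                     # fix the seed so runs repeat
first = np.random.random((3, 2))

np.random.seed(0)                     # same seed again
second = np.random.random((3, 2))

# Both draws are identical, and every value lies in [0.0, 1.0).
same = np.array_equal(first, second)                     # True
in_range = bool(((first >= 0.0) & (first < 1.0)).all())  # True
```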

numpy.random.normal(loc, scale, size) - Draw random samples from a normal (Gaussian) distribution.
 - loc : float or array_like of floats. Mean (“centre”) of the distribution.
 - scale : float or array_like of floats. Standard deviation (spread or “width”) of the distribution.
 - size : int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if loc and scale are both scalars.

In [32]:
np.random.normal(loc = 0, scale = 1, size = (3,3))

array([[-2.10484922, -0.8257435 ,  0.57735432],
       [ 1.2567431 ,  1.48748947,  0.79102098],
       [-1.24600836,  0.93402557,  0.93698332]])
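
To see what loc and scale control, it can help to draw many samples and check their statistics. This is a sketch under arbitrary assumptions: the seed, the sample count, and the values loc=10 and scale=2 are chosen only for illustration.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, only to make the run repeatable
samples = np.random.normal(loc=10, scale=2, size=100_000)

# With this many samples, the statistics land close to loc and scale.
sample_mean = samples.mean()  # close to 10
sample_std = samples.std()    # close to 2
```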

### Finding the max, min and sum of an array

In [33]:
numpy_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])


In [34]:
numpy_array.max()

9

In [35]:
numpy_array.min()

1

In [36]:
numpy_array.sum()

45

### Finding the mean, median, variance and standard deviation of an array:

In [37]:
numpy_array.mean()

5.0

In [38]:
np.median(numpy_array)

5.0

In [39]:
numpy_array.var()

6.666666666666667

In [40]:
numpy_array.std()

2.581988897471611
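
Two details are worth sketching here. First, on a 2-D array these methods accept an axis argument. Second, `var()` and `std()` divide by N by default; passing ddof=1 divides by N - 1 instead. The array `grid` is a hypothetical example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])   # hypothetical 2-D example

col_sums = grid.sum(axis=0)    # array([5, 7, 9])   one value per column
row_sums = grid.sum(axis=1)    # array([ 6, 15])    one value per row
row_means = grid.mean(axis=1)  # array([2., 5.])

# Population variance (the default) vs. sample variance (ddof=1).
numpy_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
population_var = numpy_array.var()    # 60 / 9, about 6.667
sample_var = numpy_array.var(ddof=1)  # 60 / 8 = 7.5
```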

![1_Ikn1J6siiiCSk4ivYUhdgw.png](attachment:1_Ikn1J6siiiCSk4ivYUhdgw.png)

### Subsetting

### How to extract specific items from an array?


In [41]:
a = np.array([1, 2, 3])

Extract the item located at index 2 (the third item, since indexing starts at 0)

In [42]:
a[2]

3

In [43]:
b = np.array([[1, 2, 3],[4,5,6]])

Extract the item located in the 1st index row position (2nd row) and in the 2nd index column position (3rd column)

In [44]:
b[1,2]

6

You can extract specific portions of an array using indexing that starts at 0, much as you would with Python lists.

But unlike lists, NumPy arrays accept up to as many indices in the square brackets as the array has dimensions, separated by commas.



In [45]:
arr2 = np.array([[ 1,  2,  3,  4], [ 3,  4,  5,  6], [ 5,  6,  7,  8]])

Extract the first 2 rows and the first 2 columns

In [46]:
arr2[:2, :2]

array([[1, 2],
       [3, 4]])
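
The "one index per dimension" rule carries over to higher dimensions, and negative indices count from the end as they do with lists. In this sketch, `cube` is a hypothetical 3-D array added for illustration.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # hypothetical 3-D array

# One index per dimension: block 1, row 2, column 3.
item = cube[1, 2, 3]         # 23

# Negative indices count from the end.
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])
last_row = arr2[-1]          # array([5, 6, 7, 8])
bottom_right = arr2[-1, -1]  # 8
```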

### Question 

How would you extract the 2nd column of the following array?

In [47]:
arr2d = np.array([[1,2,3],[4,5,6]])

#### Answer

Write your answer to the question in this box.

In [48]:
# Question Code Answer

### Slicing

Extract the items located from the 0th index up to (not including) 2nd index

In [49]:
a = np.array([1, 2, 3])
a[0:2]

array([1, 2])

Extract the items at column index 1 from rows 0 up to (not including) 2

In [50]:
b = np.array([[1, 2, 3],[4,5,6]])
b[0:2,1]

array([2, 5])

Extract the rows up to (not including) index 1

In [51]:
b[:1]

array([[1, 2, 3]])
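
Note the double brackets in the output above: a slice keeps the dimension it slices, while a plain integer index drops it. The sketch below contrasts the two and also shows the optional step value of a slice. The result names are made up for this example.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

# A slice keeps the dimension, an integer index drops it.
first_row_2d = b[:1]     # array([[1, 2, 3]])  shape (1, 3)
first_row_1d = b[0]      # array([1, 2, 3])    shape (3,)

# A third slice value is the step: every second item of row 0.
every_other = b[0, ::2]  # array([1, 3])
```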

### Boolean Indexing

Select elements from b less than 3

In [52]:
b[b<3] 

array([1, 2])
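
The expression `b<3` is itself an array of True/False values, which is what does the selecting. The sketch below looks at that mask, combines two conditions, and uses a mask for assignment. The names `mask`, `middle`, `n_small` and `capped` are made up for this example.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

mask = b < 3
# array([[ True,  True, False],
#        [False, False, False]])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
middle = b[(b > 1) & (b < 5)]  # array([2, 3, 4])

# Count the matches, or replace matching items on a copy.
n_small = int(mask.sum())      # 2
capped = b.copy()
capped[capped > 4] = 4
# array([[1, 2, 3],
#        [4, 4, 4]])
```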