# NumPy Crash Course

Hi there, 

Today we will take a look at the NumPy Python library. NumPy is used for numerical processing. Keep in mind that some basic Python knowledge is required. I recommend that you follow along in a Jupyter notebook so that you can see the output of the code and also experiment with different inputs.


In this article you will learn how to:
* Create numpy arrays
* Generate random numbers and set a seed
* Perform operations using arrays
* Reshape an array
* Work with an N-dimensional array:
    1. get a single element
    2. get a row and a column
    3. slice
    4. do masking


Along the way, we will see some __tips__ and __tricks__ you can use to make coding more efficient and easy.

I hope you will enjoy it 🙂

First run _pip install numpy_, then import it in your Jupyter notebook:

In [1]:
import numpy as np

In [5]:
?np.random.randint

## Creating numpy arrays
Creating arrays can be done using a list, or built-in functions. Let's see how each of them works.

### A. Creating numpy arrays using a list
We can create an array using a list. 

We first create a list

In [2]:
my_list = [0,1,2,3]
my_list

[0, 1, 2, 3]

and then convert it to a numpy array.

In [3]:
my_array = np.array(my_list)
my_array

array([0, 1, 2, 3])
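When converting a list, NumPy picks one data type for the whole array based on the values in the list. Here is a small sketch of that behaviour (the variable names are only for illustration):

```python
import numpy as np

# A list of integers becomes an integer array.
ints = np.array([0, 1, 2, 3])

# Mixing integers with a float makes NumPy store everything as floats.
mixed = np.array([0, 1.5, 2])

print(ints.dtype.kind)  # i  (a signed integer type, its exact size depends on the platform)
print(mixed.dtype)      # float64
```

You can check the data type of any array with .dtype.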

### B. Creating numpy arrays using built-in functions
Numpy has many built-in functions that provide a fast and efficient way to create arrays.

## Trick:
To see all available functions, type the name of your array followed by a dot and then press tab.

In [None]:
my_array.  # press tab

#### 1. Arange built-in function. 
Imagine you want to create an array of size 5 which has numbers from 0 to 4. 
We could do it like this:

In [4]:
np.array([0,1,2,3,4])

array([0, 1, 2, 3, 4])

But let's say that we need to create an array of size 100 which has all the numbers from 0 to 99. It would probably be very painful to do it using the previous method. This is where arange comes to save the day. 

Here we simply need to specify the starting point and the endpoint (note that the endpoint will not be included in the array).

In [5]:
np.arange(0, 100)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

We can also specify a step size. Let's say we want every second element of this array. We can simply do it by adding the step size as a third argument to the arange function:

In [6]:
np.arange(0, 100, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
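The step does not have to be positive, and the starting point can be left out. A short sketch with small numbers so the result is easy to check by hand:

```python
import numpy as np

# Counting down: start at 10, stop before 0, step -2.
countdown = np.arange(10, 0, -2)
print(countdown)   # [10  8  6  4  2]

# With a single argument, arange starts at 0 and stops before that number.
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]
```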

#### 2. Linspace built-in function, to create linearly spaced arrays
Linspace takes as input a starting point, an endpoint, and the number of evenly spaced points to generate, which is also the length of the array. Unlike arange, the endpoint is included by default.

Let's see an example

In [7]:
np.linspace(1, 10, 5) # From 1 to 10, 5 evenly spaced elements.

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])

Here the starting point is 1 and the endpoint is 10. Linspace creates an array with 5 evenly spaced numbers from the start to the endpoint, both included.
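If you want to know the spacing linspace used, or you want to leave the endpoint out, there are two optional arguments for that. A minimal sketch:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points, step = np.linspace(1, 10, 5, retstep=True)
print(step)         # 2.25

# endpoint=False leaves the endpoint out, so the spacing becomes (10 - 1) / 5.
open_points = np.linspace(1, 10, 5, endpoint=False)
print(open_points)  # [1.  2.8 4.6 6.4 8.2]
```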

#### 3. Random built-in function to create random arrays
What if we don't care about what numbers our array will have? Well, we simply put random numbers there, right? The random.randint function does exactly that. 

It takes as inputs a starting point, an endpoint (which is not included in the array), and the size or shape of the output (no worries, we will look into shapes in a bit).

In [8]:
np.random.randint(0, 4, (2,2))

array([[2, 0],
       [0, 2]])

Ok but now I want to generate the same random numbers with you! How can we do it?

Simple, we will use a seed! If we both call np.random.seed with the same number in the same cell, right before generating the numbers, we will get the same random numbers!

In [9]:
np.random.seed(12)
same_arr = np.random.randint(0, 10, 20)
same_arr

array([6, 1, 2, 3, 3, 0, 6, 1, 4, 5, 9, 2, 6, 0, 5, 8, 2, 9, 3, 4])
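To convince yourself that the seed really fixes the numbers, you can draw twice with the same seed and compare. The sketch below also shows NumPy's newer Generator interface (np.random.default_rng), which is not used elsewhere in this article and is included only as an alternative:

```python
import numpy as np

# Seeding before each draw restarts the random sequence from the same point.
np.random.seed(12)
first_draw = np.random.randint(0, 10, 20)

np.random.seed(12)
second_draw = np.random.randint(0, 10, 20)

print(np.array_equal(first_draw, second_draw))  # True

# Alternative: a Generator object carries its own seed.
rng_a = np.random.default_rng(12)
rng_b = np.random.default_rng(12)
draw_a = rng_a.integers(0, 10, 20)
draw_b = rng_b.integers(0, 10, 20)
print(np.array_equal(draw_a, draw_b))           # True
```

Note that without re-seeding, the next call to randint continues the sequence instead of repeating it.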

#### 4. Sample from a normal (Gaussian) distribution

What if we need a sample of size 100 from the Gaussian distribution, with mean 0 and standard deviation 1?
Well, we again use .random, but here we use .normal instead of .randint!

In [10]:
np.random.seed(101)
norm_arr = np.random.normal(0, 1, 100)
norm_arr

array([ 2.70684984e+00,  6.28132709e-01,  9.07969446e-01,  5.03825754e-01,
        6.51117948e-01, -3.19318045e-01, -8.48076983e-01,  6.05965349e-01,
       -2.01816824e+00,  7.40122057e-01,  5.28813494e-01, -5.89000533e-01,
        1.88695309e-01, -7.58872056e-01, -9.33237216e-01,  9.55056509e-01,
        1.90794322e-01,  1.97875732e+00,  2.60596728e+00,  6.83508886e-01,
        3.02665449e-01,  1.69372293e+00, -1.70608593e+00, -1.15911942e+00,
       -1.34840721e-01,  3.90527843e-01,  1.66904636e-01,  1.84501859e-01,
        8.07705914e-01,  7.29596753e-02,  6.38787013e-01,  3.29646299e-01,
       -4.97104023e-01, -7.54069701e-01, -9.43406403e-01,  4.84751647e-01,
       -1.16773316e-01,  1.90175480e+00,  2.38126959e-01,  1.99665229e+00,
       -9.93263500e-01,  1.96799505e-01, -1.13664459e+00,  3.66479606e-04,
        1.02598415e+00, -1.56597904e-01, -3.15791439e-02,  6.49825833e-01,
        2.15484644e+00, -6.10258856e-01, -7.55325340e-01, -3.46418504e-01,
        1.47026771e-01, -

## Trick!
Type __?np.random.normal__ and you will get the information about how to use this function. This helps a lot if you don't want to remember what inputs each function needs and what the function returns.

In [None]:
?np.random.normal

Let's check if the mean of the norm_arr array is close to 0 and the standard deviation close to 1.

In [11]:
norm_arr.mean()

0.166369880423112

In [12]:
norm_arr.std()

1.0338189430873386

If you increase the sample size, these numbers will tend to get closer and closer to 0 and 1 respectively.
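You can try this out yourself by drawing samples of growing size. The sizes below are arbitrary choices for illustration, and the exact numbers you see depend on the seed:

```python
import numpy as np

np.random.seed(101)

# Draw samples of growing size from a normal distribution with mean 0 and std 1.
for size in (100, 10_000, 1_000_000):
    sample = np.random.normal(0, 1, size)
    print(size, sample.mean(), sample.std())
```

For the largest sample, the mean and standard deviation should be very close to 0 and 1.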

Let's find the minimum and maximum number in the norm_arr array and at which index each of these are!

In [13]:
norm_arr.min() # minimum value

-2.1412122910809264

In [14]:
norm_arr.max() # maximum value

2.706849839399938

In [15]:
norm_arr.argmin() # index of the minimum value

75

In [16]:
norm_arr.argmax() # index of the maximum value

0
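The index functions and the value functions fit together: indexing the array at argmin gives back the minimum, and the same holds for argmax. A quick check, using a fresh array named values so it does not interfere with norm_arr:

```python
import numpy as np

np.random.seed(101)
values = np.random.normal(0, 1, 100)

lowest_index = values.argmin()
highest_index = values.argmax()

# Indexing with the returned positions gives back the extreme values.
print(values[lowest_index] == values.min())    # True
print(values[highest_index] == values.max())   # True
```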

The last step is to put everything in order. Let's sort the _norm_arr_ array!

In [17]:
np.sort(norm_arr)

array([-2.14121229e+00, -2.01816824e+00, -1.97260510e+00, -1.70608593e+00,
       -1.46751402e+00, -1.38292010e+00, -1.22308204e+00, -1.15911942e+00,
       -1.13664459e+00, -1.13381716e+00, -1.04677954e+00, -1.00518692e+00,
       -9.93263500e-01, -9.43406403e-01, -9.33237216e-01, -9.25874259e-01,
       -9.05099902e-01, -8.66885035e-01, -8.55196041e-01, -8.48076983e-01,
       -7.58872056e-01, -7.55325340e-01, -7.54069701e-01, -7.41789705e-01,
       -7.32845148e-01, -6.10258856e-01, -5.89000533e-01, -5.68581361e-01,
       -5.38234626e-01, -4.97104023e-01, -4.94095358e-01, -4.79448039e-01,
       -3.91156627e-01, -3.76518675e-01, -3.46418504e-01, -3.19318045e-01,
       -1.62534735e-01, -1.56597904e-01, -1.34840721e-01, -1.16773316e-01,
       -3.15791439e-02, -3.11604815e-02,  3.66479606e-04,  7.29596753e-02,
        1.47026771e-01,  1.66904636e-01,  1.84501859e-01,  1.87124522e-01,
        1.88695309e-01,  1.90794322e-01,  1.96799505e-01,  2.21490685e-01,
        2.30336344e-01,  

Now we have an array that starts with the minimum value and goes up to the maximum. Note that np.sort returns a new sorted array and leaves _norm_arr_ itself unchanged.
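There are two ways to sort: np.sort gives you a sorted copy, while the .sort() method of an array sorts it in place. A small sketch with a three-element array (values is a name made up for this example):

```python
import numpy as np

values = np.array([3, 1, 2])

sorted_copy = np.sort(values)  # returns a new, sorted array
print(values)                  # [3 1 2]  (the original is unchanged)
print(sorted_copy)             # [1 2 3]

descending = sorted_copy[::-1] # reverse the sorted copy for descending order
print(descending)              # [3 2 1]

values.sort()                  # sorts the original array in place
print(values)                  # [1 2 3]
```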

### Tip:
1. To check the dimensions of an array use the name of your array and then .ndim
2. To check the shape of an array use the name of your array and then .shape

But what are the dimensions and the shape of an array?

Let's look at an example.

Imagine that you have a nested list looking like this: 

[ [1,2], [3,4],[5,6],[7,8] ]

Here, you have one list, and inside that list, you have 4 more lists, each of them has 2 elements. Great. Now imagine having an even bigger list, containing 3 times the previous one! And we transform all of these lists to numpy arrays.

In [18]:
small_array  = np.array([1,2])
small_array

array([1, 2])

In [19]:
medium_array = np.array([[1,2],[3,4],[5,6],[7,8]])
medium_array

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [20]:
big_array = np.array([ [[1,2],[3,4],[5,6],[7,8]] , [[1,2],[3,4],[5,6],[7,8]], [[1,2],[3,4],[5,6],[7,8]] ])
big_array

array([[[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]],

       [[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]],

       [[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]]])

We learned that to get the dimensions from an array we use .ndim, so let's try it out!

In [21]:
small_array.ndim

1

In [22]:
medium_array.ndim

2

In [23]:
big_array.ndim

3

To see the shape of an array use .shape

In [24]:
small_array.shape

(2,)

In [25]:
medium_array.shape

(4, 2)

In [26]:
big_array.shape

(3, 4, 2)

What we see here is that the dimensions have to do with how nested the lists are!

The shape of an array is the length of the lists used to create the array. In the big_array, we have 3 lists, each of them has 4 lists nested in them and each of these 4 lists is of length 2. So the shape is: (3, 4, 2)
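The two attributes are directly related: the number of dimensions is the length of the shape tuple, and multiplying the entries of the shape gives the total number of elements (available as .size). A sketch that rebuilds the same big array:

```python
import numpy as np

# Three copies of the same 4 x 2 nested list, as in big_array above.
block = [[1, 2], [3, 4], [5, 6], [7, 8]]
big = np.array([block, block, block])

print(big.shape)                    # (3, 4, 2)
print(big.ndim == len(big.shape))   # True
print(big.size)                     # 24, which is 3 * 4 * 2
```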

#### 5. Arrays of zeros
To create an array which has zero at each index, we pass one argument that is either:
1. The length of the array (an integer), which gives a 1-dimensional array of shape (length,).
2. The shape (as a tuple), for an array with more dimensions.


The output is an array of floats, all equal to 0.

In [27]:
# (3,) shape numpy array with 3 zeros
array1 = np.zeros(3)
array1

array([0., 0., 0.])

In [28]:
# (2,3) shape numpy array with 6 zeros
array2 = np.zeros((2,3))
array2

array([[0., 0., 0.],
       [0., 0., 0.]])
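If you would rather have integers than floats, you can pass a dtype. NumPy also has zeros_like, which copies the shape and data type of an existing array. A minimal sketch (template is a made-up example array):

```python
import numpy as np

# dtype=int gives integer zeros instead of the default floats.
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)
# [[0 0 0]
#  [0 0 0]]

# zeros_like builds an array of zeros with the same shape and dtype as template.
template = np.array([[1, 2], [3, 4]])
like_template = np.zeros_like(template)
print(like_template.shape)  # (2, 2)
```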

#### 6. Arrays of ones
As with the .zeros function, the same things apply here. To create an array which has one at each index, we pass one argument that is either:
1. The length of the array (an integer), which gives a 1-dimensional array of shape (length,).
2. The shape (as a tuple), for an array with more dimensions.

The output is an array of floats, all equal to 1.

In [29]:
# (3,) shape array with 3 ones
np.ones(3)

array([1., 1., 1.])

In [30]:
# (2,3) shape array with 6 ones
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

#### 7. Full built-in function

0's and 1's are probably not the only numbers that we want to use. What if we want to create an array which has the number 4 at each index? We can use the built-in function full.

First, we specify the shape of the array and then the number that we want at each index.

In [31]:
all_4 = np.full((2, 3), 4)
all_4

array([[4, 4, 4],
       [4, 4, 4]])
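Unlike zeros and ones, full takes its data type from the fill value, so an integer gives an integer array and a float gives a float array. A short sketch:

```python
import numpy as np

all_4_int = np.full((2, 3), 4)      # integer fill value -> integer array
all_4_float = np.full((2, 3), 4.0)  # float fill value -> float array

print(all_4_int.dtype.kind)  # i
print(all_4_float.dtype)     # float64
```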

---
### At this point, we know two ways to create numpy arrays. 
The first way is using a list, and the second way is using built-in functions. Now it's time to discover what we can do with these arrays! First, we will take a look at how to perform operations, then how to reshape the array, and finally how to access specific elements from an array.


---


### Operations
Let's say we have two arrays called _a_ and _b_ with the same shape.

We can:
* Add the two arrays, which sums the elements at corresponding indexes.
* Add a single number to an array, which adds it to the element at each index.

The same applies to subtraction, multiplication, and division.





In [32]:
a = np.random.randint(0, 10, 3) # Array with 3 random numbers.
a

array([1, 9, 9])

In [33]:
b = np.random.randint(0, 10, 3) # Array with 3 random numbers.
b

array([2, 0, 2])

In [34]:
a + b # addition of two arrays

array([ 3,  9, 11])

In [35]:
a + 1 # adding 1 to each element of a

array([ 2, 10, 10])
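Here is a sketch of the other operations. The arrays are written out by hand instead of drawn randomly, and the values of b are chosen for this example so that there is no division by zero:

```python
import numpy as np

a = np.array([1, 9, 9])
b = np.array([2, 3, 2])  # example values, chosen to avoid zeros

print(a - b)  # [-1  6  7]
print(a * b)  # [ 2 27 18]
print(a / b)  # [0.5 3.  4.5]
print(a * 2)  # [ 2 18 18]
```

Note that division always returns floats, even when both arrays hold integers.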

### Reshape
Imagine that we have an array of length 9. 

In [36]:
first_arr = np.arange(0,9)
print(first_arr)
first_arr.shape

[0 1 2 3 4 5 6 7 8]


(9,)

The goal here is to create a new array, containing the same data but with a new shape! Let's choose (3,3) shape. We can use the reshape built-in function.

In [37]:
sec_arr = first_arr.reshape(3,3)
print(sec_arr)
sec_arr.shape

[[0 1 2]
 [3 4 5]
 [6 7 8]]


(3, 3)

So now we have split the first array into 3 parts, creating a (3,3) shaped array! Can you guess the dimension of the second array? How many levels of nesting do you see inside the array? Let's check:

In [38]:
sec_arr.ndim

2

It's like the medium_array we saw previously.
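Two details about reshape are worth trying out: you can pass -1 for one dimension and let NumPy work it out, and the new shape must hold exactly the same number of elements. A minimal sketch:

```python
import numpy as np

first_arr = np.arange(0, 9)

# -1 asks NumPy to work out that dimension from the size of the array.
inferred = first_arr.reshape(3, -1)
print(inferred.shape)  # (3, 3)

# 9 elements do not fit into a (2, 5) shape, so reshape raises a ValueError.
try:
    first_arr.reshape(2, 5)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("cannot reshape:", error)
```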

###  Access specific elements from an array
We created our arrays, of various dimensions and shapes. But how can we access the elements that we want?

First, think of a 2-dim array, having 5 nested arrays, of 5 elements each. You can have this array as a matrix in your head. 

In [3]:
mat = np.arange(0, 25).reshape(5, 5)
mat

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

#### Get the first element of the matrix.
But what _is_ the first element here? Well, we have a nested array, so the first element should be an array, right? 
Let's check. 

_Note that the indexing in python starts at 0!_

In [40]:
mat[0]

array([0, 1, 2, 3, 4])

#### Get the last element of the matrix.
Instead of counting how long your array is you can use -1 to get the last element.

In [41]:
mat[-1]

array([20, 21, 22, 23, 24])

#### Get a single element
Now let's say we want to get the value 0 of this matrix.

In [42]:
mat[0,0]

0

#### Get the 3rd column
Again, indexing starts at 0, so the 3rd column is at index 2. The colon in the first position selects every row.

In [43]:
mat[:,2]

array([ 2,  7, 12, 17, 22])

#### Get the 2nd row

In [4]:
mat[1,:]

array([5, 6, 7, 8, 9])
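It is easy to mix up the two positions, so here is a side-by-side sketch. In mat[i, j] the first index picks the row and the second picks the column, and a colon means "everything along this axis":

```python
import numpy as np

mat = np.arange(0, 25).reshape(5, 5)

third_row = mat[2, :]     # row at index 2, all columns
third_column = mat[:, 2]  # all rows, column at index 2

print(third_row)     # [10 11 12 13 14]
print(third_column)  # [ 2  7 12 17 22]
print(mat[2])        # [10 11 12 13 14]  (shorthand for mat[2, :])
```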

#### Slicing
Here we want to get a slice or a piece of the matrix. Let's say we want a 3x3 matrix with the values from the upper left corner of the original matrix. As with arange, the end of each range is not included. We can do it like this:

In [45]:
mat[0:3, 0:3]

array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])
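Slices also accept a step, and the start can be omitted when it is 0. One thing to keep in mind is that a basic slice is a view on the original data, so use .copy() if you want to change the piece without touching the matrix. A sketch:

```python
import numpy as np

mat = np.arange(0, 25).reshape(5, 5)

# Every second row and every second column.
every_other = mat[::2, ::2]
print(every_other)
# [[ 0  2  4]
#  [10 12 14]
#  [20 22 24]]

# copy() gives independent data, so changing it leaves mat untouched.
corner_copy = mat[:3, :3].copy()
corner_copy[0, 0] = 99
print(mat[0, 0])  # 0
```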

#### Masking
Masking is useful when we want to select values that satisfy a condition. Let's say we only want the values from the matrix which are lower than 5. 

In [46]:
mat < 5 # matrix with boolean values

array([[ True,  True,  True,  True,  True],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False]])

#### Get the actual values
To get the actual values instead of an array of booleans, we index the matrix with the mask:

In [47]:
my_filter = mat < 5
mat[my_filter]

array([0, 1, 2, 3, 4])

Or just

In [48]:
mat[mat<5]

array([0, 1, 2, 3, 4])
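Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses. A minimal sketch on the same matrix:

```python
import numpy as np

mat = np.arange(0, 25).reshape(5, 5)

# Values greater than 5 AND lower than 10.
between = mat[(mat > 5) & (mat < 10)]
print(between)  # [6 7 8 9]

# Values lower than 2 OR greater than 22.
edges = mat[(mat < 2) | (mat > 22)]
print(edges)    # [ 0  1 23 24]

# Summing a boolean mask counts how many values satisfy the condition.
count_below_5 = (mat < 5).sum()
print(count_below_5)  # 5
```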

# Hooray!

With that, you have completed the numpy crash course! Be proud of yourself that you did it! I hope this was useful for you. You can play around with it using different inputs, and I'm looking forward to seeing you at the pandas crash course. 

---
Thanks for reading, stay safe, and be happy.