![NumPy Logo](attachment:image-2.png)

NumPy stands for **Numerical Python**.

It is a Python library for working with arrays and carrying out complex operations on them. It is widely used in Python for linear algebra, Fourier transforms and random number generation. Operations on NumPy arrays can be many times faster than the equivalent operations on generic Python lists, largely because NumPy's performance-critical code is written in C.

NumPy GitHub repository: https://github.com/numpy/numpy



# Installation 

In [1]:
%pip install numpy

Note: you may need to restart the kernel to use updated packages.


# Import

You can import NumPy with the code below. By convention, it is imported under the alias `np`. You may use a different name, but following this common practice makes your code easier for others to read.

In [2]:
import numpy as np

We can now access NumPy's functions through the name `np`.
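A quick way to confirm that the import worked, and to see which version is installed, is to print the `__version__` attribute. The exact version string depends on your environment, so no expected value is shown here.

```python
import numpy as np

# __version__ is a string such as "1.26.4" or "2.0.0", depending on your install
print(np.__version__)
```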

Shortly, we will create a random matrix with NumPy. Throughout this tutorial, we will mainly use NumPy to handle matrices and perform operations on them.

# Why is NumPy important?

Many machine learning programs have to deal with matrices. In fact, most of the data we work with can be represented as a matrix.

Consider, for example, a simple sales register kept in an Excel spreadsheet. Its data is arranged in rows and columns, and a row-and-column layout is a 2-dimensional matrix. So if your Excel sheet has 100 rows and 5 columns, you have a matrix of size 100 x 5, or, as we would write it in NumPy, `(100, 5)`.

Here, the first number denotes the number of rows, and the second number denotes the number of columns.
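To make the `(rows, columns)` convention concrete, here is a sketch that builds a placeholder matrix the size of that sales register (100 rows, 5 columns) and reads back its `shape`. Filling it with zeros via `np.zeros` is only an assumption for illustration; real data would come from your spreadsheet.

```python
import numpy as np

# A placeholder standing in for a sales sheet with 100 rows and 5 columns
sales = np.zeros((100, 5))

print(sales.shape)     # (100, 5): rows first, then columns
print(sales.shape[0])  # 100 rows
print(sales.shape[1])  # 5 columns
print(sales.ndim)      # 2 dimensions
```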

In fact, even image and video data are stored as arrays of numbers. A color image is typically represented as a 3-dimensional array (height x width x color channels), and a video adds one more dimension for its sequence of frames. Whenever we pass input to a machine learning model or handle its output, we almost always work with some form of matrix data. This is why it is important to understand how to work with matrices in Python.
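As a rough sketch of what such data looks like in NumPy, the snippet below creates an all-black placeholder for a 480 x 640 color image and a short clip of 10 such frames. The sizes and the `uint8` pixel type (values 0 to 255) are assumptions chosen for illustration.

```python
import numpy as np

# Height x width x color channels (red, green, blue)
image = np.zeros((480, 640, 3), dtype=np.uint8)

# A video is a stack of frames: frames x height x width x channels
video = np.zeros((10, 480, 640, 3), dtype=np.uint8)

print(image.ndim, image.shape)  # 3 (480, 640, 3)
print(video.ndim, video.shape)  # 4 (10, 480, 640, 3)
```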

Keep in mind that NumPy is not your only option for handling matrix data. You could use plain Python lists (for example, a list of lists) to hold multi-dimensional data. However, NumPy was written specifically to handle large volumes of numerical data efficiently, and it performs operations much faster than the same operations on plain lists.
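For comparison, here is the same small table held as a plain Python list of lists and as a NumPy array. `np.array` copies the nested list into a single array with a fixed element type; the values themselves are made up for illustration.

```python
import numpy as np

# A 2x3 table as a list of lists: each inner list is one row
rows = [[1, 2, 3],
        [4, 5, 6]]

# The same data as a NumPy array
matrix = np.array(rows)

# Doubling every value: lists need an explicit loop
# (note that rows * 2 would repeat the list instead of doubling its values)
doubled_list = [[x * 2 for x in row] for row in rows]

# NumPy applies the operation to the whole array at once
doubled_array = matrix * 2

print(doubled_list)   # [[2, 4, 6], [8, 10, 12]]
print(doubled_array)
print(matrix.shape)   # (2, 3)
```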

In many real-world scenarios, the data is going to be very large, and using a package like NumPy can save you a lot of execution time.

# Creating your first matrix

Before we create our first array, let's understand what a **matrix** is.

A **matrix** is a collection of numbers arranged into a fixed number of rows and columns.

![](https://chortle.ccsu.edu/VectorLessons/vmch13/mtrx1.gif)

This matrix is a 3x3 matrix because it has three rows and three columns. In describing matrices, the format is:
**rows X columns** 

Each number that makes up a matrix is called an element of the matrix.
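Individual elements are addressed by their row and column position. NumPy counts from 0, so `m[0, 1]` is the element in the first row and second column. The 3x3 values below are just an example and are not necessarily the ones in the figure above.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m.shape)  # (3, 3): 3 rows x 3 columns
print(m.size)   # 9 elements in total
print(m[0, 1])  # 2: row 0, column 1
print(m[2, 2])  # 9: last row, last column
```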

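NumPy provides several built-in functions to quickly create matrices, including matrices filled with random values. Let us create our first matrix with `np.random.randn`, which fills a matrix of the requested shape with random numbers drawn from the standard normal distribution (mean 0, standard deviation 1).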

In [3]:
data = np.random.randn(3, 3)
data

array([[-0.14155633, -1.38513525,  1.12584434],
       [-2.01933397,  1.07991499, -0.62370323],
       [-0.07299109, -0.18702077, -0.74727163]])

There we go. We have successfully created a `3x3` matrix. This is a square matrix, filled with random numbers. 
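Because the values are random, your output will differ from the one shown above each time the cell runs. If you need repeatable results, for example when sharing a notebook, one option is NumPy's `Generator` API created with a fixed seed. The seed value `42` below is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# 3x3 matrices drawn from the standard normal distribution
a = rng_a.standard_normal((3, 3))
b = rng_b.standard_normal((3, 3))

print(np.array_equal(a, b))  # True
```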

Let us now create a matrix with 2 rows and 3 columns, in other words a `2x3` matrix.

In [4]:
np.random.randn(2, 3)

array([[-0.5848593 , -0.54368605,  1.28138602],
       [-0.39456216,  0.21461109, -0.51766601]])

We can also quickly try 3 rows and 2 columns, as shown below.

In [5]:
np.random.randn(3, 2)

array([[-0.21009713,  0.71181626],
       [ 0.03790045,  0.46613502],
       [-0.30579528, -0.90247215]])
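Random values are only one option. Here is a short sketch of a few other built-in constructors that take a shape in the same rows-by-columns order; `np.zeros`, `np.ones`, `np.full` and `np.eye` are all standard NumPy functions.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2x3 matrix of 0.0
ones = np.ones((3, 2))       # 3x2 matrix of 1.0
sevens = np.full((2, 2), 7)  # 2x2 matrix where every element is 7
identity = np.eye(3)         # 3x3 identity matrix: 1.0 on the diagonal, 0.0 elsewhere

print(zeros.shape, ones.shape, sevens.shape, identity.shape)
# (2, 3) (3, 2) (2, 2) (3, 3)
```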

# Performing mathematical operations

NumPy makes it easy to perform mathematical operations on a whole matrix at once. Let's try a few on randomly generated matrices.

In [6]:
data = np.random.randn(2, 3)

print('Before', data)

Before [[-0.41243102 -0.52198758 -0.43280414]
 [-1.24399283 -1.01050165 -0.01265889]]


In [7]:
print('After', data * 10)

After [[ -4.12431021  -5.21987576  -4.32804135]
 [-12.43992834 -10.1050165   -0.12658892]]


Keep in mind that the multiplication above does not change the original matrix. It returns a new matrix containing the values of `data` multiplied by 10. We can confirm this by printing `data` again.

In [8]:
data

array([[-0.41243102, -0.52198758, -0.43280414],
       [-1.24399283, -1.01050165, -0.01265889]])
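If you do want to change the matrix itself, you can either reassign the result or use an in-place operator such as `*=`. The sketch below uses a small fixed matrix instead of random values so the effect is easy to see.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

scaled = m * 10               # returns a new array; m is unchanged
print(m[0, 0], scaled[0, 0])  # 1.0 10.0

m *= 10                       # in-place: modifies m itself
print(m[0, 0])                # 10.0
```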

There are many other mathematical operations that we can perform on such matrices, and NumPy provides a simple, readable syntax for them. Let's try a few more.

## Add a matrix to itself

In [9]:
data + data

array([[-0.82486204, -1.04397515, -0.86560827],
       [-2.48798567, -2.0210033 , -0.02531778]])

## Add 1 to each element of the matrix

In [10]:
data + 1

array([[ 0.58756898,  0.47801242,  0.56719586],
       [-0.24399283, -0.01050165,  0.98734111]])

## Divide each element by 2

In [11]:
data / 2

array([[-0.20621551, -0.26099379, -0.21640207],
       [-0.62199642, -0.50525083, -0.00632945]])

## Add 2 different matrices

In [12]:
data2 = np.random.randn(2, 3)

print('Data 1 =>', data)
print('Data 2 =>', data2)
print('Data 1 + Data 2 =>', data + data2)

Data 1 => [[-0.41243102 -0.52198758 -0.43280414]
 [-1.24399283 -1.01050165 -0.01265889]]
Data 2 => [[ 0.23872176  1.87540312  0.25237765]
 [-0.3875943   0.03512294 -0.00854803]]
Data 1 + Data 2 => [[-0.17370926  1.35341554 -0.18042649]
 [-1.63158713 -0.97537871 -0.02120692]]


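There are some restrictions, of course. Element-wise operations such as addition require the shapes of the two matrices to be compatible. The simplest case is two matrices of exactly the same shape. NumPy also supports some differently shaped combinations through a mechanism called *broadcasting* (adding the single number `1` above is one example), but combining, say, a `2x3` matrix with a `3x2` matrix raises a `ValueError`.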
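The shape rule is a little more flexible than requiring identical sizes. Under NumPy's broadcasting rules, a row of three values can be added to every row of a `2x3` matrix, while a `2x3` plus a `3x2` matrix fails. The matrices below are illustrative.

```python
import numpy as np

a = np.ones((2, 3))
row = np.array([10.0, 20.0, 30.0])  # shape (3,)

# Broadcasting: row is added to each of the 2 rows of a
print(a + row)
# [[11. 21. 31.]
#  [11. 21. 31.]]

b = np.ones((3, 2))
error_message = None
try:
    a + b  # shapes (2, 3) and (3, 2) are not compatible
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```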

# NumPy is really fast

Speed is one of NumPy's biggest advantages, and a major reason why it is so widely used.

Using the code below, let us compare a NumPy array with a Python list. Both hold the same 100,000 numbers, and we perform exactly the same operation on each: doubling every element, repeated 10 times. We then measure how long each version takes to execute.

In [13]:
import numpy as np
arr = np.arange(100000)    # NumPy array holding 0 .. 99999
lis = list(range(100000))  # Python list holding the same values
%time for _ in range(10): arr2 = arr * 2

CPU times: user 0 ns, sys: 3.77 ms, total: 3.77 ms
Wall time: 3.43 ms


We have the result for NumPy above. Now let us perform the same operation using a Python list. We expect it to take noticeably longer.

In [14]:
%time for _ in range(10): my_list2 = [x * 2 for x in lis]

CPU times: user 80.7 ms, sys: 11.8 ms, total: 92.5 ms
Wall time: 90 ms


There we go. In this run, the list version took about 90 ms (milliseconds) of wall time, compared with about 3.4 ms for the NumPy array, which is more than 20 times longer.

We used a large array so that the time difference is noticeable. With a small array, both operations finish almost instantly, and the times could be very similar. You can try reducing the array size to see this for yourself.
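To try this outside of IPython, where the `%time` magic is not available, here is a sketch using the standard-library `timeit` module. The helper name `compare` and the sizes tested are arbitrary choices for illustration. Timings depend on your machine, so no expected numbers are shown.

```python
import timeit

import numpy as np


def compare(n, repeats=10):
    """Time doubling n numbers with a NumPy array and with a Python list."""
    arr = np.arange(n)
    lis = list(range(n))
    numpy_time = timeit.timeit(lambda: arr * 2, number=repeats)
    list_time = timeit.timeit(lambda: [x * 2 for x in lis], number=repeats)
    return numpy_time, list_time


for n in (10, 1_000, 100_000):
    numpy_time, list_time = compare(n)
    print(f"n={n:>7}: NumPy {numpy_time * 1000:.3f} ms, list {list_time * 1000:.3f} ms")
```

On small sizes the two times may be close, or the list may even win, because each NumPy call carries a fixed overhead; the gap should grow with `n`.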

When handling large data, NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts, and they also use significantly less memory. With large data, CPU time is not your only concern: you might also run out of RAM. A NumPy array stores its values in a single contiguous block of memory with one fixed data type, whereas a Python list stores references to separate Python objects. This is why NumPy can perform the same operations faster and with a smaller memory footprint.
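A rough way to see the memory difference is to compare `ndarray.nbytes` with an estimate for the list built from `sys.getsizeof`. The list estimate is approximate: it counts the list's internal array of references plus each integer object, and CPython shares small integers, so treat it as a ballpark rather than an exact figure.

```python
import sys

import numpy as np

n = 100_000
arr = np.arange(n, dtype=np.int64)
lis = list(range(n))

# NumPy: one contiguous buffer of 8-byte integers
numpy_bytes = arr.nbytes  # 800000

# List: the list's array of references plus a separate int object per element
list_bytes = sys.getsizeof(lis) + sum(sys.getsizeof(x) for x in lis)

print(f"NumPy array: {numpy_bytes:,} bytes")
print(f"Python list: ~{list_bytes:,} bytes")
```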