# Numpy basics
Numpy is a library that provides a convenient interface for working with N-dimensional arrays, as an alternative to the pure Python approach of nested lists.

It offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. It also supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries.

NumPy is open source software distributed under a liberal BSD license, and it is developed and maintained publicly on GitHub by a vibrant, responsive, and diverse community.
- **Author:** Travis Oliphant
- **Creation year:** 2005
- **Last stable version:** 1.19.2
- **Programme type:** Numerical analysis software
- **Programmed in:** Python; C; Fortran
- **Programming languages:** Python, C
- **Code base:** https://github.com/numpy/numpy

**The main limitation we can run into with numpy is that it is designed around numeric values** (see https://numpy.org/doc/stable/user/basics.types.html for all the available types).
*This is not a strict rule, since non-numeric types such as strings are also supported, but, as we will see when using pandas, this library is focused on working with numbers.*
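
As a small illustration of the point about types, the sketch below builds a few arrays and inspects the `dtype` NumPy picks for each. The variable names are only illustrative and the exact integer width can vary by platform.

```python
import numpy as np

ints = np.array([1, 2, 3])             # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])          # int + float -> everything becomes float64
words = np.array(["red", "green"])     # strings -> fixed-width Unicode dtype
anything = np.array([1, "a", None], dtype=object)  # arbitrary Python objects

for name, arr in [("ints", ints), ("mixed", mixed),
                  ("words", words), ("anything", anything)]:
    print(name, arr.dtype)
```

Non-numeric arrays work, but most of the mathematical routines only make sense for the numeric dtypes.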


### Why use NumPy?

NumPy aims to provide an array object that is **up to 50x faster than traditional Python lists**.

The array object in NumPy is called **ndarray**, and it comes with many supporting functions that make it very easy to work with.

Arrays are very frequently used in data science, where speed and resources are very important.

> **Data Science:** a branch of computer science that studies how to store, use, and analyze data in order to derive information from it.



### Why is NumPy faster than Python lists?

NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently.

This behavior is called locality of reference in computer science.

This is the main reason why **NumPy is faster than lists**. It is also optimized to work with the latest CPU architectures.
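
A rough way to see the difference is to time the same computation on a list and on an array. This is only a sketch: the array size and the operation (a sum of squares) are arbitrary choices, and the measured times depend on the machine, so no particular speed-up should be expected.

```python
import timeit
import numpy as np

n = 200_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Same computation, two ways: the sum of the squares of every element.
def with_list():
    return sum(x * x for x in py_list)

def with_numpy():
    return int((np_array * np_array).sum())

list_time = timeit.timeit(with_list, number=5)
numpy_time = timeit.timeit(with_numpy, number=5)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")
print("stored contiguously:", np_array.flags["C_CONTIGUOUS"])
```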


## Get Started

**Python packages are usually installed with `pip`, but other package managers such as `conda` (from Anaconda) can be used as well.**

We can check whether the numpy module is installed by running the following command in our shell. If it is, the command shows information about the package, including its version.

In [None]:
!pip show numpy

If we do not have it installed, we can install it using the following command in our shell (usually inside a virtualenv):

In [None]:
!pip install numpy

On the other hand, if the version we have installed is not the most recent one, we can update the module by running:

In [None]:
!pip install -U numpy

## First step

First we need to import the module so that the Python script can access it, renaming it with the most common convention: **np**

In [None]:
import numpy as np

## Numpy arrays

Once we have it, let's see how to create an array, access a single value, or access an entire row or column.

NumPy is used to work with arrays. The array object in NumPy is called **ndarray**.

In [None]:
pure_python_data = [
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12]
]

array = np.array(pure_python_data) # creation

print(array)
print(type(pure_python_data))
print(type(array))

We can fetch additional information about the array:

In [None]:
print("There are", array.ndim, "dimensions in the array") # 1 dim -> single row, 2 dim -> matrix, 3 dim -> cube ...
print("The shape of the array is", array.shape) # (rows, columns)
print("In total, there are", array.size, "values") # How many values form the array?
print("We have an array of", array.dtype) # What type are the values of the array?

Other interesting creation methods:

In [None]:
array2 = np.arange(1, 13) # 1 dim array with the 12 elements from 1 to 12
array2 = array2.reshape(2, 3, 2) # convert it into a 3 dim array: 2 arrays that contain 3 arrays, each with 2 elements
print(array2)

In [None]:
np.eye(10,5) # return a 2 dim array with ones on the main diagonal and zeros elsewhere

For other methods see: https://numpy.org/doc/stable/reference/routines.array-creation.html
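
As a quick sample of those routines, the sketch below uses a few of the most common ones. The shapes and values are arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array filled with 0.0
ones = np.ones((2, 3), dtype=int)   # 2x3 array filled with integer 1
filled = np.full((2, 2), 7)         # 2x2 array where every value is 7
steps = np.arange(0, 10, 2)         # 0, 2, 4, 6, 8 (the stop value is excluded)
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced values, both ends included

for arr in (zeros, ones, filled, steps, grid):
    print(arr)
```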

## Can an array be reshaped into any shape?

Yes, as long as both shapes contain the same number of elements.

We can reshape a 1 dim array of 8 elements into a 2 dim array with 2 rows of 4 elements, but we cannot reshape it into a 2 dim array with 3 rows of 3 elements, as that would require 3x3 = 9 elements.

In [None]:
# 8 elements cannot fill a 3x3 array, so this line raises a ValueError
array3 = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(3, 3)
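
For contrast, here is a small sketch of reshapes that do work on 8 elements, together with the failing case caught explicitly. The variable names are only illustrative.

```python
import numpy as np

values = np.arange(1, 9)          # 8 elements: 1..8

as_2x4 = values.reshape(2, 4)     # 2 * 4 = 8 elements, so this works
as_4x2 = values.reshape(4, -1)    # -1 asks numpy to infer the missing size (2)

try:
    values.reshape(3, 3)          # 3 * 3 = 9 != 8
except ValueError as error:
    print("reshape failed:", error)

print(as_2x4)
print(as_4x2)
```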

## Numpy axes

One of the most difficult concepts in numpy is the concept of axes (we will see why later).

It is important to have this concept clear, as it avoids trouble when using numpy functions such as `sum`, `mean`, `max`, `min`...

### Definition
- Every dimension of an array is an axis; axes matter most for arrays with more than one dimension.
- A 2 dim array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1).


Assuming two dimensions, we have the following array:

In [None]:
np.eye(5, 6, dtype=int) # (rows, columns); np.int was removed in recent numpy versions, the builtin int works everywhere

When we say that we are operating on axis 0, we mean traversing the array in the direction in which the rows are stacked (downwards):

![Source: https://www.sharpsightlabs.com/blog/numpy-axes-explained/](https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2018/12/numpy-axis0.png)

Thus, when we apply an operation over the row axis, we are collapsing the rows into a single row, while keeping all the other dimensions.

The next dimension is the columns, so when we apply an operation over the column axis (axis 1) we are collapsing the columns.

![Source: https://www.sharpsightlabs.com/blog/numpy-axes-explained/](https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2018/12/numpy-axis-1.png)
Source: https://www.sharpsightlabs.com/blog/numpy-axes-explained/
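
The collapsing behaviour described above can be checked with a small sketch on the same 5x6 array. `sum` is used here only as an example of an operation that takes an `axis` argument.

```python
import numpy as np

matrix = np.eye(5, 6, dtype=int)    # 5 rows, 6 columns

down_rows = matrix.sum(axis=0)      # collapse the rows: one value per column
across_cols = matrix.sum(axis=1)    # collapse the columns: one value per row

print(down_rows, down_rows.shape)       # shape (6,)
print(across_cols, across_cols.shape)   # shape (5,)
```

Note that the axis we pass is the one that disappears from the shape of the result.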

**We will come back to this later.**

## Accessing array values

If we want to access a single element, with python lists we would use nested indexing, such as:

In [None]:
pure_python_data[1][3] # two step operation

With numpy, we can access the value using only one indexing that combines both the first and second dimensions:

In [None]:
array[1, 3] # only one step -> more efficient

If we skip a dimension, we get all the values in the dimension we skipped.

For example, not specifying the column we get the row with all the columns:

In [None]:
array[1,]

This resembles the code that we use with pure python (to get a row we use `list[row_no]`). The advantage of using numpy is that we can also access columns, for example:

In [None]:
array[:, 2]

**Note** that we have to set the `:` indexing value (fetch all values) for all the dimensions that we skip before specifying a value, to be explicit about which dimension we are using.
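
To make the role of `:` concrete, the sketch below repeats the 3x4 array from earlier and combines single indices, full slices, and ranges. The names `row`, `column`, `block`, and `last_col` are only illustrative.

```python
import numpy as np

array = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12],
])

row = array[1]            # same as array[1, :]
column = array[:, 2]      # third column, all rows
block = array[0:2, 1:3]   # rows 0-1 and columns 1-2
last_col = array[:, -1]   # negative indices count from the end

print(row, column, last_col)
print(block)
```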

This simplifies the code when we are working with lists of data. 

For example, when implementing the algorithm KMeans, we must compute the centroid by calculating the mean point of all the points assigned to this centroid.

With pure Python loops, we would write:

In [None]:
points = np.array([
    (1, 2),
    (3, 2),
    (4, 4)
])

def mean_points(points): # two loops are needed
    n_feats = len(points[0])
    acc = [0.0] * n_feats
    for i in range(n_feats):
        for p in points:
            acc[i] += p[i]
        
        acc[i] /= len(points)
    
    return acc
print(mean_points(points))

Using numpy indexing, we can rewrite this function as:

In [None]:
def mean_points(points):
    n_points, n_feats = points.shape
    acc = [sum(points[:,i]) / n_points for i in range(n_feats)] # only one loop
    return acc

print(mean_points(points))

### Numpy convenient methods

As some operations are common in most use cases, Numpy provides methods that apply those common operations to an array. Thus, if we want to sum an entire array we do not need to write a loop; we just use the `.sum` method:

In [None]:
def mean_points(points):
    n_points, n_feats = points.shape
    return points.sum(axis=0) / n_points # no loops thanks to numpy axes
print(mean_points(points))

Other available methods that we have are:

- `.min`
- `.max`
- `.mean`
- `np.median` (available as a function, `np.median(array)`, rather than as an array method)
- ...
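
A short sketch of these aggregations on the same `points` array follows. The expected values in the comments come from working the three points through by hand.

```python
import numpy as np

points = np.array([(1, 2), (3, 2), (4, 4)])

col_min = points.min(axis=0)            # smallest value per column: [1, 2]
col_max = points.max(axis=0)            # largest value per column: [4, 4]
col_mean = points.mean(axis=0)          # mean per column: [8/3, 8/3]
col_median = np.median(points, axis=0)  # median is a function, not a method
total = points.sum()                    # no axis: aggregate over every element

print(col_min, col_max, col_mean, col_median, total)
```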

So we can improve our mean_points function even further:

In [None]:
def mean_point(points):
    # return points.mean(axis=0)
    return np.mean(points, axis=0)

print(mean_point(points))

## Operations using numpy arrays

We can apply arithmetic operators between arrays *elementwise*. This means that for example we can sum a matrix with another one directly.

**Note the "elementwise". When multiplying matrices using `a*b` in Numpy, each element is multiplied by the element in the same position. This is not the matrix product we use in maths, which is written `a @ b`.**

In [None]:
array1 = np.arange(9).reshape(3, 3)
# arange creates a single dimension array with the elements
# from 0 to 8
# with reshape we give it the desired shape (a 3x3 matrix)

array2 = np.ones((3,3), dtype=np.int32)
# ones creates an array with the desired shape (3,3) filled with ones

print(array1)
print(array2)

array1 * array2

In [None]:
print((array1 * array2) / 2.)
print((array1 * array2) / 2)   # <- also floats in python 3 (in python 2, division between ints was an int)
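
To see how the elementwise product differs from the matrix product used in maths, here is a minimal sketch with two small 2x2 arrays. The arrays `a` and `b` are arbitrary examples.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)    # [[0, 1], [2, 3]]
b = np.array([[1, 2], [3, 4]])

elementwise = a * b               # multiplies values in matching positions
matrix_product = a @ b            # rows-by-columns product, same as np.matmul(a, b)

print(elementwise)
print(matrix_product)
```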

## Conclusion

This was a very basic introduction to the numpy library. The library is really extensive and has lots of methods to help us deal with matrix operations.

Numpy is interesting by itself, but it also matters when using other frameworks such as Pandas, Tensorflow, Pytorch... you will see that their interfaces resemble the one used by this library.

For more information, check the official documentation at https://numpy.org/doc/stable/