# NumPy Basic Concepts

## 1 What is NumPy?

NumPy is the fundamental package for scientific computing with Python. It is:

* a powerful Python extension for N-dimensional arrays
* a tool for integrating C/C++ and Fortran code
* designed for scientific computation, such as linear algebra and signal analysis

If you are a MATLAB&reg; user, we recommend reading [Numpy for MATLAB Users](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html) and [Benefit of Open Source Python versus commercial packages](https://scipy.github.io/old-wiki/pages/NumPyProConPage.html). For an idea of the open source approach to science, we suggest the [Science Code Manifesto](http://sciencecodemanifesto.org/).

### 1.1 Documentation and reference:

* [Numpy Reference guide](http://docs.scipy.org/doc/numpy/reference/)
* [SciPy Reference](http://docs.scipy.org/doc/scipy/reference/)
* [Scipy Topical Software](http://www.scipy.org/Topical_Software)
* [Numpy Functions by Category](http://www.scipy.org/Numpy_Functions_by_Category)
* [Numpy Example List With Doc](http://www.scipy.org/Numpy_Example_List_With_Doc)  

Let's start by checking the NumPy version used in this notebook:

In [17]:
import numpy as np
print ('numpy version: ', np.__version__)

numpy version:  1.17.3


## 2 Array Creation

NumPy's main object is the homogeneous ***multidimensional array***. It is a table of elements (usually numbers), all of the **same type**. In NumPy, dimensions are called ***axes***, and the number of axes is called the ***rank***. The most important attributes of an ndarray object are:

* **ndarray.ndim**     - the number of axes (dimensions) of the array. 
* **ndarray.shape**    - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m). 
* **ndarray.size**     - the total number of elements of the array. 
* **ndarray.dtype**    - the type of the elements of the array; numpy.int32, numpy.int16, and numpy.float64 are some examples. 
* **ndarray.itemsize** - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8). 
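The attributes listed above can be inspected on any array. The following is a minimal sketch; the array `a` and its values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example array: 2 rows, 3 columns of 64-bit floats.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.ndim)      # 2  -> number of axes
print(a.shape)     # (2, 3)  -> (rows, columns)
print(a.size)      # 6  -> total number of elements
print(a.dtype)     # float64
print(a.itemsize)  # 8  -> bytes per element (64 bits / 8)
```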

In [18]:
# 1-d array:

When we create an array, we can set the *ndmin* parameter to specify the minimum number of dimensions the resulting array should have (and hence change its shape).
Remember that *np.shape()* returns a tuple, even when the array has a single dimension.
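As a small sketch of the effect of *ndmin* (the variable names `v` and `w` are hypothetical):

```python
import numpy as np

v = np.array([1, 2, 3])            # plain 1-d array
w = np.array([1, 2, 3], ndmin=2)   # same data, forced to have at least 2 dimensions

print(v.shape)      # (3,)   -> a tuple with one entry
print(w.shape)      # (1, 3) -> an axis of length 1 is prepended
print(np.shape(v))  # (3,)   -> the function form returns the same tuple
```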

In [19]:
# 2-d array


**Try by yourself**   the following commands *(type or paste the commands in the cell below)*:

    b.ndim                  # Number of dimensions
    b.dtype.name            # Type of data
    b.itemsize              # Size in bytes of elements
    b.size                  # Number of elements in the array

The type of the array can be specified at creation time:
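For instance, the *dtype* argument of `np.array` fixes the element type at creation. The arrays `c` and `d` below are hypothetical examples.

```python
import numpy as np

# Integers in the input, but stored as 64-bit floats because of dtype.
c = np.array([[1, 2], [3, 4]], dtype=np.float64)
# The same input stored as complex numbers.
d = np.array([[1, 2], [3, 4]], dtype=complex)

print(c.dtype.name)  # float64
print(d.dtype.name)  # complex128
```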

We can modify the type of the array using the *.astype()* method (this operation is called **casting**).


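However, pay attention when you are *downcasting*: converting a *float* array to *int* truncates the fractional part instead of rounding it.

If you want to convert a *float* array into an *int* array with values rounded to the nearest integer, apply **round** first: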
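The difference between casting directly and rounding first can be sketched as follows; the array `f` is a hypothetical example.

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5, 3.5])

# Direct casting drops the fractional part (truncation toward zero).
truncated = f.astype(np.int64)
print(truncated)   # [ 1 -1  2  3]

# Rounding first gives the nearest integer; note that NumPy rounds
# exact halves to the nearest even value (2.5 -> 2, 3.5 -> 4).
rounded = np.round(f).astype(np.int64)
print(rounded)     # [ 2 -2  2  4]
```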

### 2.1 Array creation functions

Often, the elements of an array are originally unknown, but its size is known. Hence, **NumPy** offers several functions to create arrays with initial placeholder content.

The function `zeros` creates an array full of zeros, the function `ones` creates an array full of ones, and the function `empty` creates an array whose initial content is not initialized: its values are arbitrary and depend on the state of the memory. By default, the dtype of the created array is float64.  
***Try by yourself*** the following commands:

    np.zeros((3,4))
    np.ones((3,4))
    np.empty((2,3))
    np.eye(3)
    np.diag(np.arange(5))
    np.tile(np.array([[6, 7], [8, 9]]), (2, 2))

`zeros_like`, `ones_like` and `empty_like` can be used to create arrays with the same shape and type as a given one.
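A short sketch of the `*_like` functions, using a hypothetical `template` array:

```python
import numpy as np

template = np.array([[1, 2, 3],
                     [4, 5, 6]], dtype=np.int32)

z = np.zeros_like(template)   # zeros, same shape and dtype as template
o = np.ones_like(template)    # ones, same shape and dtype as template
e = np.empty_like(template)   # uninitialized values, same shape and dtype

print(z.shape, z.dtype)       # (2, 3) int32
print(o)
```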

### 2.2 Sequences and reshaping

Arrays can be created with ***linspace***, ***logspace*** (returning numbers evenly spaced on a linear or logarithmic scale) or ***arange***, and then shaped into matrix form. **mgrid** is similar to "meshgrid" in MATLAB.
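The sequence functions mentioned above can be sketched as follows. The endpoints and step sizes are arbitrary values chosen for illustration.

```python
import numpy as np

lin = np.linspace(0, 1, 5)    # 5 points from 0 to 1, endpoint included
log = np.logspace(0, 2, 3)    # 3 points from 10**0 to 10**2
rng = np.arange(0, 10, 2)     # start, stop (excluded), step

print(lin)   # [0.   0.25 0.5  0.75 1.  ]
print(log)   # [  1.  10. 100.]
print(rng)   # [0 2 4 6 8]

# mgrid builds coordinate grids: one array per axis.
rows, cols = np.mgrid[0:2, 0:3]
print(rows)  # row index of every cell of a 2x3 grid
print(cols)  # column index of every cell of a 2x3 grid
```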

We can change the shape (and the number of dimensions) of a given array using **reshape**, but the new array must have the same *size* (total number of elements) as the original one.
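A minimal sketch of **reshape** and of the size constraint, with a hypothetical 12-element array:

```python
import numpy as np

a = np.arange(12)            # 12 elements, shape (12,)
m = a.reshape(3, 4)          # 3 * 4 == 12, so this works
auto = a.reshape(2, -1)      # -1 lets NumPy infer the missing dimension (6)

print(m.shape)     # (3, 4)
print(auto.shape)  # (2, 6)

# A shape whose total size differs from 12 is rejected.
try:
    a.reshape(5, 3)
    failed = False
except ValueError:
    failed = True
print(failed)      # True
```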

Another way to change the dimension of a given array is **newaxis**:
![](Images/newaxis_image.png)
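As a sketch, indexing with `np.newaxis` inserts an axis of length 1 at the chosen position (the vector `v` is a hypothetical example):

```python
import numpy as np

v = np.arange(3)           # shape (3,)
row = v[np.newaxis, :]     # new axis in front: shape (1, 3)
col = v[:, np.newaxis]     # new axis at the end: shape (3, 1)

print(v.shape, row.shape, col.shape)  # (3,) (1, 3) (3, 1)
```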

We can use a **list comprehension** to create a matrix:
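For example, a nested list comprehension can build the rows of a small multiplication table, which `np.array` then turns into a matrix. The name `table` and the chosen values are illustrative only.

```python
import numpy as np

# Inner comprehension builds one row; outer comprehension builds the list of rows.
table = np.array([[i * j for j in range(1, 4)] for i in range(1, 4)])

print(table)
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]
```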


### 2.3 Random numbers

The *numpy.random* module provides several functions that generate arrays filled with random numbers drawn from a given probability distribution.

Actually, when we say *random* we do not exactly mean random. Instead, we are talking about algorithms that create **pseudo-random** numbers, which means that if we give the algorithm the same **seed** (a starting point), it will return the same numbers at every run.

These algorithms are widely used in machine learning, for example when we want to generate new points from a given distribution or when we have to initialize some parameters that we want to learn.
The function *.seed()* is very useful for making results reproducible and for studying the robustness of the model we are building, especially if we are handling small datasets.
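The effect of the seed can be sketched as follows; the seed value `0` and the sample size are arbitrary choices.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)     # 3 uniform samples in [0, 1)

np.random.seed(0)             # reset the generator to the same starting point
second = np.random.rand(3)

print(np.array_equal(first, second))  # True: same seed, same numbers

third = np.random.rand(3)     # no reseeding: the sequence simply continues
print(np.array_equal(first, third))   # False
```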

In [21]:
np.random?

### 2.4 Sparse Matrices

Sparse matrices are matrices (or more generally *multidimensional arrays*), usually very large, filled mostly with zeros.

Because we are interested only in the non-zero elements (which are a tiny percentage of the total), storing a sparse matrix as a regular array wastes a lot of memory on useless information.

To solve this problem, there are many formats that store only the useful information, using far less memory. One of these formats is **Compressed Sparse Row** (CSR).

In [22]:
from scipy import sparse

# Create an array with many zeros
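A minimal sketch of converting a mostly-zero array to CSR with `scipy.sparse.csr_matrix`. The array `dense` and its three non-zero entries are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical 4x5 array with only three non-zero entries.
dense = np.zeros((4, 5))
dense[0, 1] = 3
dense[2, 4] = 7
dense[3, 0] = 1

csr = sparse.csr_matrix(dense)

print(csr.nnz)      # 3 -> number of stored (non-zero) values
print(csr.data)     # the non-zero values, row by row
print(csr.indices)  # column index of each stored value
print(csr.indptr)   # where each row starts inside data/indices

# Converting back gives the original dense array.
print(np.array_equal(csr.toarray(), dense))  # True
```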



There are several other sparse formats that can be useful for various problems:

- `CSC` (compressed sparse column)
- `BSR` (block sparse row)
- `COO` (coordinate)
- `DIA` (diagonal)
- `DOK` (dictionary of keys)

The ``scipy.sparse`` submodule also has a lot of functions for sparse matrices
including linear algebra, sparse solvers, graph algorithms, and much more.

## 3 Basic Array Operations and Linear Algebra

Let's see now how to perform some basic operations on arrays:

**Broadcasting**

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

M+2 is a simple case of broadcasting: it is equivalent to summing M with a matrix of the same shape as M whose elements are all equal to 2.

![](Images/numpy_broadcasting_image.png)
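The broadcasting cases described above can be sketched as follows. The matrix `M` here is a hypothetical 2x3 example, not necessarily the one used elsewhere in the notebook.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)       # [[0 1 2], [3 4 5]]

# Scalar broadcasting: same result as adding a full matrix of 2s.
plus_two = M + 2
explicit = M + np.full(M.shape, 2)
print(np.array_equal(plus_two, explicit))  # True

# A length-3 vector is broadcast across the rows.
row = np.array([10, 20, 30])
print(M + row)   # [[10 21 32], [13 24 35]]

# A (2, 1) column is broadcast across the columns.
col = np.array([[100], [200]])
print(M + col)   # [[100 101 102], [203 204 205]]
```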

In [23]:
 #broadcasting!

In [24]:
   #broadcasting!

But what about the classic matrix multiplication that we know from linear algebra? Use the *.dot()* method and remember that the shapes of the two arrays must be aligned, i.e. the number of columns of M must equal the number of rows of N.
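A short sketch contrasting the matrix product with the element-wise product. The arrays `A` and `B` are hypothetical examples with compatible shapes (2x3 and 3x2).

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])           # shape (3, 2)

P = A.dot(B)                     # matrix product, shape (2, 2)
print(P)                         # [[ 4  5], [10 11]]
print(np.array_equal(P, A @ B))  # True: the @ operator is equivalent here

E = A * A                        # element-wise product, not a matrix product
print(E)                         # [[ 1  4  9], [16 25 36]]
```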

In [25]:
# MxH classic multiplication between matrices 

**Transpose** of an array: rows become columns and columns become rows.
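As a sketch, the transpose is available through the `.T` attribute or `np.transpose` (the array `A` is a hypothetical example):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)

At = A.T                     # shape (3, 2)
print(At)                    # [[1 4], [2 5], [3 6]]
print(np.array_equal(At, np.transpose(A)))  # True
```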

**Trace** of a matrix: the sum of the elements on the main diagonal. If the matrix is not square, or if you want to sum a different diagonal, you can specify which diagonal to use through the *offset* parameter of *.trace()*.
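A minimal sketch of the trace over different diagonals, assuming that "choosing the diagonal" is done with the `offset` argument of `trace`. The non-square matrix `T` is a hypothetical example.

```python
import numpy as np

T = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(T.trace())            # 15 -> 0 + 5 + 10 (main diagonal)
print(T.trace(offset=1))    # 18 -> 1 + 6 + 11 (first diagonal above the main one)
print(T.trace(offset=-1))   # 13 -> 4 + 9      (first diagonal below the main one)
```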

In [26]:
 # trace over first diagonal

In [27]:
 # trace over the second diagonal

###  Determinant and inverse of a square matrix

The `scipy.linalg.det()` function computes the determinant of a square matrix:

In [34]:
from scipy import linalg

s = np.array([[1, 2],
              [3, 4]])
s


array([[1, 2],
       [3, 4]])

In [35]:
linalg.det(s)

-2.0

The `scipy.linalg.inv()` function computes the inverse of a square matrix:

In [36]:
print (linalg.inv(s))

[[-2.   1. ]
 [ 1.5 -0.5]]


### Stacking & concatenation

There are many functions that allow us to put together different arrays, provided their shapes are compatible. These functions are very useful when we manipulate data!

Other very common functions are *np.insert()* and *np.append()*.
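The following sketch shows a few of the stacking functions NumPy offers (`vstack`, `hstack`, `concatenate`) together with `append` and `insert`. The arrays `a` and `b` are hypothetical, and this is not necessarily the selection of functions the original notebook cells used.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])          # shape (2, 2)
b = np.array([[5, 6]])          # shape (1, 2)

v = np.vstack((a, b))           # stack as rows: shape (3, 2)
h = np.hstack((a, b.T))         # stack as columns: b.T has shape (2, 1)
c = np.concatenate((a, b), axis=0)   # same result as vstack here

print(v)   # [[1 2], [3 4], [5 6]]
print(h)   # [[1 2 5], [3 4 6]]

# append and insert return new arrays; the inputs are left unchanged.
appended = np.append(a, b, axis=0)
inserted = np.insert(a, 1, [9, 9], axis=0)   # new row before index 1
print(inserted)   # [[1 2], [9 9], [3 4]]
```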

### Other useful commands

In [29]:
N

In [30]:
N.min(), N.max()   # minimum and maximum elements of the matrix

In [31]:
N.mean()   # mean of all elements of the matrix

In [32]:
N.cumsum(1)   # cumulative sum of the elements along a given axis (here axis 1)

In [None]:
N.cumsum()    # the default (axis=None) computes the cumsum over the flattened array
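Since the definition of `N` is not shown here, the sketch below uses a hypothetical 2x3 matrix with the same name to illustrate what these commands return.

```python
import numpy as np

# Hypothetical stand-in for N; the notebook's own N may differ.
N = np.array([[1, 2, 3],
              [4, 5, 6]])

print(N.min(), N.max())   # 1 6
print(N.mean())           # 3.5
print(N.cumsum(1))        # running sum within each row: [[1 3 6], [4 9 15]]
print(N.cumsum())         # running sum over the flattened array: [1 3 6 10 15 21]
```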

In [33]:
G = np.array([[1, 2, 2, 2, 4], [1, 4, 2, 1, 2], [1, 2, 4, 4, 2]])
print(np.unique(G))      #return the unique values of the matrix 

[1 2 4]


## 4 Slicing - Indexing 

Remember: slices (indexed subarrays) are views onto the memory of the original array, which means that if you modify a slice, you modify the original array. In other words, a slice refers to the original data instead of copying it; use *.copy()* when you need an independent array.
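The view behaviour can be sketched as follows; the array `a` and the values written into it are hypothetical.

```python
import numpy as np

a = np.arange(10)
s = a[2:5]            # a slice: shares memory with a
s[0] = 99             # writing through the slice...
print(a[2])           # 99 -> ...changes the original array
print(np.shares_memory(a, s))   # True

c = a[2:5].copy()     # an explicit copy owns its data
c[0] = -1
print(a[2])           # still 99: the original is unaffected
print(np.shares_memory(a, c))   # False
```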

### 4.1 Indexing single elements

### 4.2 Indexing by rows and columns