Die-hard C++ or Fortran users among physicists often say that python is too slow. 

True, python is an interpreted language and it is slow.

Even python advocates like me realize it, but we think that the (lack of) speed of python is not really an issue, e.g. because:

* The time spent computing is balanced by a much smaller development time;
* Profiling is easy, which means that one can find the parts of the code that are slow, optimize them, and even write them in faster languages so that they can be compiled and used from python;

And most importantly: 

* Some python tools like numpy are as fast as plain C. 

In this tutorial, you will understand what is numpy and why it's fast, and learn just what's needed about numpy for the usual machine learning operations. 



## Installation

Numpy is the core of scientific python, so it's installed as a dependency for most of the scientific python packages. For example, you will get it if you install scikit-learn, matplotlib, or Keras. Also, numpy is installed by default on the usual platforms as a service for jupyter notebooks, such as Google Colab or FloydHub. 

If you don't have it, you can install it with [Anaconda](https://thedatafrog.com/en/install-anaconda-data-science-python/), by doing: 

```
conda install numpy
```

Then, traditionally, numpy is imported in the following way:

In [1]:
import numpy as np

## The numpy array : Why is it fast? 

The main purpose of numpy is to provide a very efficient data structure called the numpy array, and the tools to manipulate such arrays. 

Why is the numpy array so fast? 

Because, under the hood, the whole data for a given array is stored in a contiguous region of the computer memory. This makes it possible to manipulate the array with compiled code, optimized for the CPU. In particular, numpy operations are parallel as they use [SIMD](https://en.wikipedia.org/wiki/SIMD) (Single Operation Multiple Data). 

To see how fast numpy is, we can time it. 

Let us create a large list with one million integers, and a numpy array from this list: 

In [3]:
lst = range(1000000)
arr = np.array(lst)

Now let's compute the square of all integers, and see how much time it takes. 

We start by the list: 

In [15]:
%timeit squares = [math.sqrt(x) for x in lst]

116 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


And we do the same for the array:

In [13]:
%timeit squares = arr**2

616 µs ± 7.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As you can see, this is more than 300 times faster. 

We can in principle loop on the numpy array like this:

In [14]:
%timeit squares = [x**2 for x in arr]

236 ms ± 4.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


But then, we completely lose the benefits of numpy! Indeed, when we do `arr**2` we use the square function of numpy, which is intrinsically parallel. When we loop, we process the elements one by one with basic python. So: 

**Never ever loop on a numpy array, no exceptions!**