# Optimisation: NumPy

Vanilla Python is poorly suited to storing and manipulating large amounts of data. Lists, dictionaries, sets and tuples are flexible, but every element they hold is a separate Python object, so they carry significant memory and processing overhead. If you're handling large amounts of data in a performance-critical part of your code, you can make substantial gains by storing that data in more specialised data types.

One of the most popular ways of holding large amounts of data is the [NumPy](https://numpy.org/) package. NumPy provides the array data type (`ndarray`), which can store large N-dimensional arrays of data. Because NumPy is largely a layer over compiled code written in C, it sidesteps many of Python's weaknesses and delivers very fast performance for common linear algebra (and other) operations.
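
As a rough illustration of the storage difference, the sketch below compares a list of one million Python integers with an equivalent NumPy array. It assumes a 64-bit CPython build. The list estimate uses `sys.getsizeof` and slightly over-counts because CPython caches small integers, so treat the list figure as approximate.

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, each with its own header
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores raw 8-byte integers in one contiguous buffer
array_bytes = np_array.nbytes

# Arrays are N-dimensional: reshaping a contiguous array returns a view, not a copy
grid = np_array.reshape(1000, 1000)

print(f"list: ~{list_bytes / 1e6:.0f} MB, array: {array_bytes / 1e6:.0f} MB")
print(grid.shape, np.shares_memory(grid, np_array))
```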

This notebook doesn't aim to give you a working knowledge of NumPy. Instead, it offers a brief demonstration of the savings that can be made by using it.

## Dot Product Example

In the example below, we generate two random vectors, each with 1,000,000 entries, and then calculate their dot product.
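
For reference, the dot product of two vectors $\mathbf{a}$ and $\mathbf{b}$ of length $n$ is the sum of their element-wise products. Each version below therefore performs $n$ multiplications and roughly $n$ additions:

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i,
\qquad \text{e.g.}\quad (1, 2, 3) \cdot (4, 5, 6) = 4 + 10 + 18 = 32.
$$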

```python
!pip install line_profiler
import random

%load_ext line_profiler

def random_dot_product(n):
    # Build two lists of n random integers between 1 and 99
    list_1 = [random.randrange(1, 100) for _ in range(n)]
    list_2 = [random.randrange(1, 100) for _ in range(n)]

    # Multiply element-wise and sum, entirely in the Python interpreter
    return sum(x * y for x, y in zip(list_1, list_2))

%lprun -f random_dot_product random_dot_product(1000000)
```



```python
import numpy as np

# line_profiler was already installed and loaded in the previous cell

def random_dot_product(n):
    # Build two NumPy arrays of n random floats in [0, 1)
    array_1 = np.random.rand(n)
    array_2 = np.random.rand(n)

    # Compute the dot product in NumPy's compiled code
    return np.dot(array_1, array_2)

%lprun -f random_dot_product random_dot_product(1000000)
```
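
The two cells do not draw the same kind of data. The pure-Python version uses integers from 1 to 99, while `np.random.rand` produces floats in [0, 1). A like-for-like NumPy version might look like the sketch below. It assumes NumPy 1.17 or later, which provides the `np.random.default_rng` Generator API. The function name `random_dot_product_int` and its `seed` parameter are illustrative, not part of the original notebook.

```python
import numpy as np

def random_dot_product_int(n, seed=None):
    # Hypothetical helper: Generator.integers(1, 100) draws from 1..99 (upper bound exclusive),
    # matching random.randrange(1, 100, 1) in the pure-Python version
    rng = np.random.default_rng(seed)
    array_1 = rng.integers(1, 100, size=n)
    array_2 = rng.integers(1, 100, size=n)

    # The default int64 dtype is wide enough: the result is at most 99 * 99 * n
    return int(np.dot(array_1, array_2))

result = random_dot_product_int(1_000_000, seed=0)
print(result)
```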


In the profiled runs, the second version, using NumPy, runs around 300 times faster than the first. This is because the bulk of the calculation has moved out of the Python interpreter and into compiled C code, which is much faster. In addition, NumPy provides built-in functions designed specifically to generate a random array (`np.random.rand`) and to compute a dot product (`np.dot`). These optimised implementations replace the calculations that the first version writes out by hand in raw Python.
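
The profiler measures data generation and the dot product together. To time the dot product on its own, a minimal sketch could use the standard-library `timeit` module. It generates the data once and then times only the arithmetic. The repeat count (`number=5`) is an arbitrary choice. The speed-up you see will depend on your machine and NumPy build, so no particular figure is claimed here.

```python
import random
import timeit
import numpy as np

n = 1_000_000

# Generate the data once, so only the dot product itself is timed
list_1 = [random.randrange(1, 100) for _ in range(n)]
list_2 = [random.randrange(1, 100) for _ in range(n)]
array_1 = np.array(list_1, dtype=np.int64)
array_2 = np.array(list_2, dtype=np.int64)

def python_dot():
    # Pure Python: one interpreted multiply and add per element
    return sum(x * y for x, y in zip(list_1, list_2))

def numpy_dot():
    # A single call into NumPy's compiled dot routine
    return int(np.dot(array_1, array_2))

python_time = timeit.timeit(python_dot, number=5)
numpy_time = timeit.timeit(numpy_dot, number=5)
print(f"pure Python: {python_time:.3f} s, NumPy: {numpy_time:.3f} s")
print(f"speed-up: about {python_time / numpy_time:.0f}x")
```

Both functions operate on the same integers, so their results should match exactly. Any difference in timing comes from where the loop runs, not from the data.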