# Introduction to NumPy

NumPy is an ideal tool for working with numerical datasets. Operations on NumPy arrays run much faster than the equivalent loops in plain Python.

# Contents

- [raw python vs numpy](#pythonvsnumpy)
- [vectorisation](#vectorisation)
- [create](#create)
- [size](#size)
- [reshape](#resize)
- [indexing](#indexing)
- [multi axis indexing](#multiindexing)
- [boolean indexing](#boolindexing)
- [aggregation](#aggregation)
- [maths](#maths)



In [1]:
import numpy as np

# raw python vs numpy <a name="pythonvsnumpy"></a>

Let's say we want to multiply every number in a list by 2.

In [2]:
N = 10_000_000
numbers_list = list(range(N))
numbers_list[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [3]:
%%time
for i in range(N):
    numbers_list[i] = numbers_list[i] * 2

Wall time: 1.67 s


In [4]:
numbers_list[:10]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [5]:
numbers_array = np.array(list(range(N)))
numbers_array

array([      0,       1,       2, ..., 9999997, 9999998, 9999999])

In [6]:
%%time
numbers_array = numbers_array * 2

Wall time: 20 ms


In this run the NumPy version took about 20 ms against 1.67 s for the Python loop, roughly 80 times faster. The difference matters most when the array or list has many elements.
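The comparison above can also be reproduced outside a notebook, where `%%time` is not available. Here is a minimal sketch, assuming `time.perf_counter` as the timer and a smaller `N` so that it finishes quickly. The helper names `double_with_loop` and `double_with_numpy` are made up for illustration, and the timings will differ from machine to machine.

```python
import time

import numpy as np

N = 1_000_000  # assumption: smaller than the 10 million used above, to keep the run short


def double_with_loop(values):
    """Double every element of a Python list in place, one element at a time."""
    for i in range(len(values)):
        values[i] = values[i] * 2
    return values


def double_with_numpy(values):
    """Double every element of a NumPy array in one vectorised operation."""
    return values * 2


numbers_list = list(range(N))
numbers_array = np.arange(N)

start = time.perf_counter()
doubled_list = double_with_loop(numbers_list)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
doubled_array = double_with_numpy(numbers_array)
numpy_seconds = time.perf_counter() - start

print(f"python loop: {loop_seconds:.4f} s")
print(f"numpy:       {numpy_seconds:.4f} s")
```

Both versions produce the same numbers; only the time taken differs.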

# vectorisation <a name="vectorisation"></a>

We saw that operating on a NumPy array is much faster than looping over a list. One-dimensional arrays like this are also called vectors in linear algebra, and "vectorisation" means expressing a computation as operations on whole arrays rather than as an explicit loop over the elements. The speed gain comes from NumPy running that loop in compiled code. If you find yourself writing a for loop, always ask yourself whether you can do it as a vector operation instead.
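To make the idea concrete, here is a small sketch that writes the same calculation twice, once as a loop and once as a vector operation. The temperature conversion and the variable names are just an illustration, not part of the original notebook.

```python
import numpy as np

celsius = np.array([-10.0, 0.0, 25.0, 100.0])

# Loop version: one Python-level operation per element.
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorised version: the same arithmetic applied to the whole array at once.
fahrenheit_vec = celsius * 9 / 5 + 32

print(fahrenheit_loop)
print(fahrenheit_vec)
```

The vectorised line is shorter and reads like the formula it implements.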

# Creating a numpy array <a name="create"></a>


In [7]:
np.array([1,2,3])

array([1, 2, 3])

In [8]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [9]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [10]:
np.random.randn(2,3) # samples from the standard normal distribution

array([[-0.22862697, -0.27919357, -1.12036277],
       [ 0.08731768,  0.61890415, -1.06808657]])

In [11]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [12]:
np.arange(100)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
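Some of the outputs above print as `1, 2, 3` and others as `0., 0., 0.`; the trailing dot means the elements are floats. The sketch below checks the element type through the `dtype` attribute and shows that most creation functions accept a `dtype` argument. The variable names are illustrative.

```python
import numpy as np

from_list = np.array([1, 2, 3])           # integers in, integer dtype out
zeros = np.zeros((2, 3))                  # float64 by default
int_zeros = np.zeros((2, 3), dtype=int)   # the dtype can be requested explicitly
ones = np.ones((2, 3))

print(from_list.dtype)
print(zeros.dtype)
print(int_zeros.dtype)
print(ones.shape)
```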

# length & size <a name="size"></a>

In [13]:
data = np.zeros((100,200))
data.shape

(100, 200)

In [14]:
len(data) # length of the first axis, same as data.shape[0]

100
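Besides `shape` and `len`, an array also reports its number of axes and its total number of elements. A short sketch on the same `data` array:

```python
import numpy as np

data = np.zeros((100, 200))

print(data.shape)  # size of each axis
print(data.ndim)   # number of axes
print(data.size)   # total number of elements, 100 * 200
print(len(data))   # length of the first axis only
```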

# Joining numpy arrays <a name="join"></a>

In [15]:
a = np.arange(10)
b = np.arange(20)
np.hstack((a,b))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  0,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [16]:
np.vstack((a,a))

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
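Note that `hstack` accepted two 1-D arrays of different lengths, while `vstack` was only shown with two arrays of the same length. The sketch below suggests why: stacking vertically needs rows of equal length. The `vstack_failed` flag is only there for illustration.

```python
import numpy as np

a = np.arange(10)
b = np.arange(20)

side_by_side = np.hstack((a, b))  # 1-D arrays of different lengths are fine
stacked = np.vstack((a, a))       # each input becomes one row

vstack_failed = False
try:
    np.vstack((a, b))             # rows of length 10 and 20 do not line up
except ValueError as error:
    vstack_failed = True
    print("vstack failed:", error)

print(side_by_side.shape)
print(stacked.shape)
```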

# reshape <a name="resize"></a>

In [17]:
data = np.arange(9)
data = data.reshape(3,3)
data

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [18]:
data = np.arange(9)
data = data.reshape(3,-1) # -1 lets numpy work out the size of that axis from the number of elements
data

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
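The `-1` can go on either axis, and the reshape only works when the elements divide evenly into the requested shape. A minimal sketch with a 12-element array (the `reshape_failed` flag is illustrative):

```python
import numpy as np

data = np.arange(12)

three_rows = data.reshape(3, -1)  # numpy infers 4 columns
two_cols = data.reshape(-1, 2)    # numpy infers 6 rows

reshape_failed = False
try:
    data.reshape(5, -1)           # 12 elements cannot be split into 5 equal rows
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)

print(three_rows.shape)
print(two_cols.shape)
```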

# indexing <a name="indexing"></a>

In [19]:
data = np.random.randn(5)
data

array([-0.63951153,  1.19790662, -0.2884772 , -2.05606532,  0.41299889])

In [20]:
data[1]

1.1979066197623418

In [21]:
data[0:3]

array([-0.63951153,  1.19790662, -0.2884772 ])

In [22]:
data[:3]

array([-0.63951153,  1.19790662, -0.2884772 ])

In [23]:
data[-3:]

array([-0.2884772 , -2.05606532,  0.41299889])
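Random values make it hard to see at a glance which elements were picked, so here is the same set of operations on a small fixed array. The values and variable names are chosen for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

second = data[1]          # single element, counting from 0
first_three = data[:3]    # indices 0, 1, 2 (the stop index is excluded)
last_three = data[-3:]    # negative indices count from the end
every_other = data[::2]   # an optional third value sets the step

print(second, first_three, last_three, every_other)
```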

# multi axis indexing <a name="multiindexing"></a>

In [24]:
data = np.random.randn(3,3)
data

array([[ 0.87383284, -0.45744195,  0.32587163],
       [ 0.22827456, -0.51515771,  0.06818815],
       [ 0.67205407, -2.61244925, -0.73043869]])

In [25]:
data[1,2] # 2nd row, 3rd column

0.06818815415890678

In [26]:
data[:,0] # pick all elements from first column

array([0.87383284, 0.22827456, 0.67205407])

In [27]:
data[:2,0] # pick the first two elements of the first column (row index 2 is excluded)

array([0.87383284, 0.22827456])
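The row and column selections are easier to follow on an array whose values show their position. A sketch using `np.arange(9).reshape(3, 3)`, with illustrative variable names:

```python
import numpy as np

data = np.arange(9).reshape(3, 3)
# data is
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

element = data[1, 2]         # row index 1, column index 2
first_column = data[:, 0]    # every row, column index 0
first_row = data[0, :]       # row index 0, every column
top_of_column = data[:2, 0]  # rows 0 and 1 of column 0
top_left = data[:2, :2]      # 2x2 block in the top-left corner

print(element, first_column, first_row, top_of_column)
print(top_left)
```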

# boolean indexing <a name="boolindexing"></a>

In [29]:
data = np.random.randn(3,3)
data

array([[-0.27388681,  1.26552776,  0.51825834],
       [-1.00662826,  1.5810271 , -0.74117376],
       [-0.14766015,  0.07509477,  0.17826672]])

In [30]:
data > 0

array([[False,  True,  True],
       [False,  True, False],
       [False,  True,  True]])

In [31]:
data[data > 0]

array([1.26552776, 0.51825834, 1.5810271 , 0.07509477, 0.17826672])
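A boolean mask can do more than select values: it can be counted, combined with other conditions, and used on the left-hand side of an assignment. The following is a sketch on a small fixed array, with made-up values and names.

```python
import numpy as np

data = np.array([[-1.5, 2.0, 0.5],
                 [3.0, -0.2, -4.0]])

mask = data > 0                           # boolean array with the same shape as data
positives = data[mask]                    # 1-D array of the selected values
n_positive = mask.sum()                   # each True counts as 1
in_range = data[(data > 0) & (data < 1)]  # combine conditions with & and parentheses

clipped = data.copy()                     # copy so the original stays untouched
clipped[clipped < 0] = 0                  # assign through a mask

print(positives, n_positive, in_range)
print(clipped)
```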

# Aggregation <a name="aggregation"></a>

In [32]:
data = np.random.randn(3,3)
data

array([[-0.02903865,  0.49810872,  0.94314587],
       [-0.64722015, -0.75142939, -0.56008812],
       [-1.63052674, -0.45314264,  0.67072781]])

In [33]:
data.sum()

-1.959463288118376

In [34]:
np.sum(data)

-1.959463288118376

In [35]:
data.sum(axis=1)

array([ 1.41221594, -1.95873766, -1.41294157])

In [36]:
data.sum(axis=1,keepdims=True)

array([[ 1.41221594],
       [-1.95873766],
       [-1.41294157]])

In [37]:
data.sum(axis=0)

array([-2.30678553, -0.70646332,  1.05378556])

In [38]:
data.mean()

-0.217718143124264

In [39]:
data.min()

-1.6305267356363822

In [40]:
data.max()

0.943145870101194
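The `axis` argument is the axis that gets collapsed, which is easy to mix up. A small sketch with whole numbers, so the sums can be checked by eye (array values and names are illustrative):

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = data.sum()                              # all elements
column_sums = data.sum(axis=0)                  # collapse the rows: one value per column
row_sums = data.sum(axis=1)                     # collapse the columns: one value per row
row_sums_2d = data.sum(axis=1, keepdims=True)   # same values, but shape (2, 1)
row_fractions = data / row_sums_2d              # each row divided by its own sum

print(total, column_sums, row_sums)
print(row_sums_2d.shape)
print(row_fractions)
```

`keepdims=True` is handy in cases like the last line, where the result is combined with the original array again.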

# Math operations <a name="maths"></a>

In [41]:
data = np.random.randn(3,3)
data

array([[ 0.798753  ,  0.45533312,  1.31915547],
       [ 0.00775439, -0.05214384,  1.9040835 ],
       [-1.3743031 , -0.69520301,  0.40397114]])

In [42]:
np.exp(data)

array([[2.22276742, 1.57669853, 3.74026128],
       [1.00778453, 0.94919233, 6.71325209],
       [0.25301586, 0.49897314, 1.49776072]])

In [43]:
np.sin(data)

array([[ 0.71648674,  0.43976152,  0.96850516],
       [ 0.00775431, -0.05212021,  0.94497205],
       [-0.98075724, -0.64054135,  0.39307293]])
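These functions work element by element and return an array of the same shape, and ordinary arithmetic operators behave the same way. A sketch with inputs whose results are easy to check (values and names are illustrative):

```python
import numpy as np

data = np.array([0.0, 1.0, 4.0])

exponentials = np.exp(data)  # e ** x for every element
roots = np.sqrt(data)        # square root of every element
sines = np.sin(data)         # sine of every element, input in radians
shifted = data * 2 + 1       # plain arithmetic is element-wise too

print(exponentials)
print(roots)
print(sines)
print(shifted)
```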