# This tutorial is about NumPy

NumPy is one of the most important libraries in the Python ecosystem. It is a numeric computing library, used to process numbers and perform calculations on them.

- It is important because processing numbers in pure Python is generally very slow.

- NumPy is a numeric processing library that sits on top of Python: it keeps a Python API, but the work is done on typed data in compiled code.
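As a rough illustration of the speed difference mentioned above, the sketch below computes a sum of squares once with a pure Python loop and once with NumPy. The problem size and the `timeit` repetition count are arbitrary choices made for this example, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))               # plain Python list of ints
arr = np.arange(n, dtype=np.int64)    # the same numbers as a NumPy array

# Sum of squares, once with a Python loop and once vectorized in NumPy.
py_total = sum(v * v for v in values)
np_total = int((arr * arr).sum())
print(py_total == np_total)

# Time each version over a few repetitions; absolute numbers depend on the machine.
py_time = timeit.timeit(lambda: sum(v * v for v in values), number=10)
np_time = timeit.timeit(lambda: (arr * arr).sum(), number=10)
print(f"pure Python: {py_time:.4f}s, NumPy: {np_time:.4f}s")
```

Both versions give the same total; only the time taken should differ.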

Why is it not better known? We often do not use NumPy directly, but libraries such as Pandas rely on NumPy for their numeric processing.

## 1. What is NumPy and how does it work?

To understand NumPy, we first need to understand how data is stored on a computer. The computer loads data from the hard drive into RAM (random access memory) in order to process it, and memory holds a limited number of bits. So we need to understand how integers are represented in binary.

In [5]:
import numpy as np

In [6]:
n = 3

In [7]:
2 ** 3

8

The formula is 2^n, where n is the number of bits (positions): n bits can represent 2^n distinct values.

## 2. Using NumPy

To store the number 120, you can do the maths to work out how many bits are needed. Seven bits cover the values 0 to 127, so 120 fits in 7 bits.

In [8]:
2 ** 7

128

In [9]:
x = 5

In [10]:
np.int8

numpy.int8

In [11]:
np.int16

numpy.int16
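To connect the bit arithmetic to the NumPy integer types, the following sketch uses Python's `int.bit_length()` and NumPy's `np.iinfo` to show how many bits 120 needs and what range each fixed-width type can hold. Including `np.uint8` here is an addition for comparison; it does not appear in the notes above.

```python
import numpy as np

# Bits needed for 120 as an unsigned value: 2**7 = 128 > 120, so 7 bits.
bits_for_120 = (120).bit_length()
print(bits_for_120)

# np.iinfo reports the range each fixed-width integer type can hold.
for dtype in (np.int8, np.int16, np.uint8):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.bits, info.min, info.max)
```

A signed 8-bit integer spends one bit on the sign, which is why `np.int8` tops out at 127 rather than 255.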

### 2.1 NumPy Arrays

In [12]:
import sys

Create two arrays

In [14]:
np.array([1,2,3,4])

array([1, 2, 3, 4])

In [15]:
a = np.array([1,2,3,4])

In [16]:
b = np.array([0, -5, 1, 1.5, 2])

To access individual elements of an array, index them by position:

In [17]:
a[0], a[1]

(1, 2)

Slicing works the same way as it does for Python lists

In [18]:
a[0:]

array([1, 2, 3, 4])

In [19]:
a[1:3]

array([2, 3])

A stop index of -1 leaves off the last element

In [20]:
a[1:-1]

array([2, 3])

In [23]:
a[::2] # Step of 2: every second element

array([1, 3])

### Multi-indexing

In [24]:
b

array([ 0. , -5. ,  1. ,  1.5,  2. ])

Suppose you need to extract three elements from it

In [25]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [28]:
b[[0, 2, -1]] # Index with a list of positions; the result is another NumPy array

array([0., 1., 2.])

## 2.1.1 Array Types

In [29]:
a

array([1, 2, 3, 4])

In [30]:
a.dtype

dtype('int64')

In [31]:
b

array([ 0. , -5. ,  1. ,  1.5,  2. ])

In [33]:
b.dtype # Remember, it contains floats.

dtype('float64')

In [35]:
np.array([1, 2, 3, 4], dtype=float) # Note: np.float is deprecated (removed in NumPy 1.24); use the built-in float.

array([1., 2., 3., 4.])

In [37]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [38]:
c = np.array(['a', 'b', 'c']) # NumPy is not well suited to storing string data

In [39]:
c.dtype

dtype('<U1')

In [40]:
d = np.array([{'a': 1}, sys]) # Arbitrary Python objects (a dict and a module) give dtype object

In [41]:
d.dtype

dtype('O')
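One practical reason the dtype matters is memory. The sketch below compares the same four values stored as `int64` and as `int8`, using the `itemsize` and `nbytes` attributes, and converts between types with `astype`. The variable names are chosen for this example only.

```python
import numpy as np

values = [1, 2, 3, 4]
as_int64 = np.array(values, dtype=np.int64)
as_int8 = np.array(values, dtype=np.int8)

# itemsize is bytes per element; nbytes is the total for the data buffer.
print(as_int64.itemsize, as_int64.nbytes)
print(as_int8.itemsize, as_int8.nbytes)

# astype returns a converted copy; the original keeps its dtype.
as_float = as_int8.astype(np.float64)
print(as_float.dtype, as_int8.dtype)
```

The `int8` version needs one byte per element instead of eight, at the cost of a much smaller range of representable values.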

## 2.1.2 Dimensions and shapes

In [42]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [43]:
A.shape # 2 rows by 3 columns

(2, 3)

In [44]:
A.ndim #Two dimensions 

2

In [45]:
A.size

6

In [46]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [47]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [48]:
B.shape

(2, 2, 3)

In [49]:
B.ndim

3

In [50]:
B.size

12
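The three attributes are related, which the short check below tries to make explicit: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. It rebuilds the same three-dimensional array and uses `math.prod` from the standard library.

```python
import math

import numpy as np

B = np.array([
    [[12, 11, 10], [9, 8, 7]],
    [[6, 5, 4], [3, 2, 1]],
])

print(B.shape)
print(B.ndim == len(B.shape))          # one entry in shape per dimension
print(B.size == math.prod(B.shape))    # 2 * 2 * 3 elements in total
```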

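Remember that the nested lists must have matching lengths. Otherwise NumPy cannot build a regular multi-dimensional array and falls back to an array of plain Python objects. Older NumPy versions emit a deprecation warning in this case; NumPy 1.24 and later raise a ValueError unless dtype=object is passed explicitly.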

In [51]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object) # Explicit dtype=object is required for ragged input in NumPy 1.24+

In [52]:
C

array([list([[12, 11, 10], [9, 8, 7]]), list([[6, 5, 4]])], dtype=object)

In [53]:
C.dtype

dtype('O')

In [54]:
C.shape

(2,)

In [55]:
C.size

2

In [56]:
type(C[0])

list

## 2.1.3 Indexing and slicing of matrices

In [57]:
# Square Matrix
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [58]:
A[1]

array([4, 5, 6])

In [59]:
A[1][0]

4

In [60]:
A[1, 0]

4

In [61]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [63]:
A[:, :2] #Select from every row, but only cols 0 and 1

array([[1, 2],
       [4, 5],
       [7, 8]])

In [65]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [66]:
A[:2, 2:]

array([[3],
       [6]])

In [67]:
A[1] = np.array([10, 10, 10]) # Replace an entire row

In [68]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [70]:
A[2] = 99 # The scalar is broadcast across the whole row

In [71]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

## 2.1.4 Summary statistics

In [72]:
a = np.array([1, 2, 3, 4])

In [73]:
a.sum()

10

In [74]:
a.mean()

2.5

In [75]:
a.std()

1.118033988749895

In [76]:
a.var() #Variance

1.25
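By default `std()` and `var()` compute the population statistics, dividing by n. The sketch below reproduces the variance by hand to show what is being calculated, and then uses the `ddof=1` argument, which divides by n - 1 instead (the usual sample estimate).

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Population variance: the mean of squared deviations from the mean.
manual_var = ((a - a.mean()) ** 2).mean()
manual_std = np.sqrt(manual_var)
print(manual_var, manual_std)

# ddof=1 divides by n - 1 instead of n (the sample estimate).
sample_var = a.var(ddof=1)
sample_std = a.std(ddof=1)
print(sample_var, sample_std)
```

The hand-computed values match `a.var()` and `a.std()` above, while the `ddof=1` results are slightly larger.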

In [77]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [78]:
A.sum()

45

In [79]:
A.mean()

5.0

In [80]:
A.std()

2.581988897471611

In [81]:
A.sum(axis=0) #The sum of the first, second, and third cols.

array([12, 15, 18])

In [82]:
A.sum(axis=1) # The sum of the first, second and third row.

array([ 6, 15, 24])

In [83]:
A.mean(axis=0)

array([4., 5., 6.])

In [84]:
A.mean(axis=1)

array([2., 5., 8.])

In [85]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [86]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

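Note: axis=0 runs down the rows (the vertical direction) and gives one result per column; axis=1 runs across the columns (the horizontal direction) and gives one result per row.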
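The axis convention is easier to see on a matrix that is not square, because the result shapes then differ. In this sketch `M` is a made-up 2 by 3 example, not one of the arrays used earlier.

```python
import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6],
])  # 2 rows, 3 columns

col_sums = M.sum(axis=0)   # collapses the rows: one value per column
row_sums = M.sum(axis=1)   # collapses the columns: one value per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```

The axis you pass is the one that disappears from the shape: (2, 3) becomes (3,) for axis=0 and (2,) for axis=1.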

## 2.1.5 Broadcasting and vectorized operations

In [87]:
a = np.arange(4)

In [88]:
a

array([0, 1, 2, 3])

In [89]:
a + 10 

array([10, 11, 12, 13])

In [90]:
a * 10

array([ 0, 10, 20, 30])

In [91]:
a # The operations above returned new arrays; the original is unchanged

array([0, 1, 2, 3])

In [92]:
a += 100 # In-place broadcast operation: this one does modify a

In [93]:
a # a was modified

array([100, 101, 102, 103])

In [95]:
l = [0, 1, 2, 3]

In [96]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [104]:
a = np.arange(4)

In [105]:
a + 20

array([20, 21, 22, 23])

In [106]:
b = np.array([10, 10, 10, 10])

In [107]:
a + b # Element-wise; the shapes must match or be broadcast-compatible

array([10, 11, 12, 13])

In [108]:
a * b

array([ 0, 10, 20, 30])
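Broadcasting also applies between arrays of different dimensions, as long as the shapes are compatible. The sketch below adds a one-dimensional row to every row of a small matrix, then shows what happens when the shapes do not line up. The arrays here are illustrative values chosen for this example.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
row = np.array([10, 20, 30])    # shape (3,), matches A's last dimension

# The row is broadcast across both rows of A.
broadcast_sum = A + row
print(broadcast_sum)

# Shapes (4,) and (3,) are not compatible, so NumPy raises ValueError.
try:
    np.arange(4) + np.arange(3)
except ValueError as exc:
    print("shape mismatch:", exc)
```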

## 2.1.6 Boolean arrays

In [110]:
a = np.arange(4)

In [111]:
a

array([0, 1, 2, 3])

In [113]:
a[[0, -1]] # The first and last element

array([0, 3])

In [118]:
a[[True, False, False, True]]

array([0, 3])

In [119]:
a >= 2

array([False, False,  True,  True])

In [120]:
a[a >= 2]

array([2, 3])

In [122]:
a.mean()

1.5

In [123]:
a[a > a.mean()]

array([2, 3])

In [126]:
a[~(a> a.mean())] # All elements that are NOT greater than the mean.

array([0, 1])

In [127]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [128]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [131]:
A = np.random.randint(100, size=(3, 3))

In [132]:
A > 30

array([[ True, False,  True],
       [False,  True, False],
       [ True,  True,  True]])

In [133]:
A[A > 30]

array([82, 70, 46, 58, 49, 48])
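Because `np.random.randint` produces different values on every run, the sketch below uses a fixed matrix so the result can be reproduced. The six values above 30 are taken from the output above; the three smaller values (12, 5 and 30) are assumptions filled in for illustration. It also shows two common follow-ups: counting the matches and assigning through the mask.

```python
import numpy as np

A = np.array([
    [82, 12, 70],
    [ 5, 46, 30],
    [58, 49, 48],
])  # fixed values so the result is reproducible

mask = A > 30
print(mask.sum())        # True counts as 1, so this counts the matches
print(A[mask])           # matching values, flattened in row-major order

capped = A.copy()
capped[mask] = 30        # assign through the mask; A itself is untouched
print(capped.max())
```

Selecting with a boolean mask always returns a one-dimensional array, whatever the shape of the original matrix.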