<a href="https://colab.research.google.com/github/DanRHowarth/Artificial-Intelligence-Cloud-and-Edge-Implementations/blob/master/Oxford_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Numpy Tutorial

### This tutorial covers the following:
 1. Overview of Numpy 
 2. Introduction to Numpy Arrays
   *  Creating arrays 
   *  Locating elements of an array 
   *  Altering properties of an array, including its `shape` and `dtype`
 3. Performing operations on Arrays
  * Arithmetic operations
  * Boolean Operations 
  * Universal functions
 
#### Exercises
 * Each section will have an exercise to help reinforce your learning. We suggest you:
   * Write out each line of code by hand (rather than copy and paste it from the relevant example) - this will improve your understanding of code syntax
   * Write out, above each line of code, an explanation of what the code does, using a comment `#` - this will improve your understanding of how the code works

### 1. Overview of Numpy


* [Python for Data Analysis](https://www.amazon.co.uk/Python-Data-Analysis-Wrangling-IPython-ebook/dp/B075X4LT6K/ref=sr_1_1?s=digital-text&ie=UTF8&qid=1540755909&sr=1-1&keywords=python+for+data+analysis) sets out the following benefits of the Numpy library: 

  * The `ndarray`, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.

  * Mathematical functions for fast operations on entire arrays of data without having to write loops.

  * Tools for reading/writing array data to disk and working with memory-mapped files.

  * Linear algebra, random number generation, and Fourier transform capabilities.


* Numpy is designed for efficiency on large arrays of data because it internally stores data in a contiguous block of memory, independent of other built-in Python objects. Numpy operations perform complex computations on entire arrays without the need for Python `for` loops.

* See *Python for Data Analysis*, or [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) (free) for more background on the library and its benefits.


In [1]:
## this simple example will demonstrate the difference in efficiency between numpy and python 

# it is convention to import numpy this way
import numpy as np

# we create our first numpy array - we will come back to this type of operation later 
my_arr = np.arange(1000000)

# create a python list with the same values, but this is obviously not stored as a numpy array
my_list = list(range(1000000))

In [4]:
## let's time how long an operation takes in numpy
%time my_arr2 = my_arr * 2

CPU times: user 2.97 ms, sys: 3.24 ms, total: 6.21 ms
Wall time: 4.65 ms


In [5]:
# and compare it to the same operation in python
%time my_list2 = [x * 2 for x in my_list]

CPU times: user 49.4 ms, sys: 16 ms, total: 65.4 ms
Wall time: 63.9 ms
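
The `%time` magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard-library `timeit` module can be used instead. The repetition count of 10 and the variable names `numpy_seconds` and `list_seconds` are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Time 10 repetitions of each approach (10 is an arbitrary choice)
numpy_seconds = timeit.timeit(lambda: my_arr * 2, number=10)
list_seconds = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"numpy: {numpy_seconds:.4f} s for 10 runs")
print(f"list:  {list_seconds:.4f} s for 10 runs")

# Both approaches produce the same values, only the speed differs
same_values = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print("Same values:", same_values)
```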


### 2. Introduction to Numpy Arrays

* The *N-dimensional array object*, or `ndarray`, is a fast, flexible container for large datasets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using similar syntax to the equivalent operations between scalar elements.
* In this section, we will cover:
  * How to create arrays
  * How to access different parts of the array 
  * The properties of arrays, and how to modify them.

#### 2.1 Creating arrays
* There are a number of different ways to create an array. We cover some of the main ways here, including:
  * passing in values to the parameters of `np.array`
  * `zeros`: an array of zero values 
  * `ones`: an array of all ones 
  * `full`: an array of a specified constant value 
  * `random`: an array of random values
  * `eye`: an array with ones on the diagonal and zeros elsewhere
  * converting from a python list to a numpy array
  * `arange`: an array with values within the range specified 
  * `linspace`: an array of equally spaced values within the upper and lower bounds specified
  * `to_numpy`: converting from a pandas dataframe to a numpy array
 


In [17]:
# Create an array by passing in values  
a = np.array([0, 1, 2]) 
a

array([0, 1, 2])

In [7]:
# Create a 2D array
b = np.array([[0,1,2],[3,4,5]])  
b

array([[0, 1, 2],
       [3, 4, 5]])

In [8]:
# Create an array of all zeros - with three rows of three columns each 
c = np.zeros((3,3)) 
c

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [9]:
# Create a 2x2 array of all ones
d = np.ones((2,2)) 
d

array([[1., 1.],
       [1., 1.]])

In [10]:
# Create a 3x3 constant array
e = np.full((3,3), 7) 
e

array([[7, 7, 7],
       [7, 7, 7],
       [7, 7, 7]])

In [13]:
# Create a 3x3 array filled with random values
f = np.random.random((3,3)) 
f

array([[0.09183256, 0.38523297, 0.60467487],
       [0.46246678, 0.18689598, 0.90421648],
       [0.63256652, 0.08092401, 0.40403242]])
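
The random values above will be different on every run. If reproducible values are needed, one option is a seeded generator created with `np.random.default_rng`, which is the generator interface recommended in recent Numpy releases. The sketch below assumes a seed of 42, chosen only for illustration, and the names `rng_a`, `rng_b`, `f_a` and `f_b` are hypothetical.

```python
import numpy as np

# Two generators created with the same seed (42 is an arbitrary choice)
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Each generator produces a 3x3 array of floats in the interval [0, 1)
f_a = rng_a.random((3, 3))
f_b = rng_b.random((3, 3))

# Same seed, same sequence of values
print(np.array_equal(f_a, f_b))
print(f_a.shape)
```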

In [14]:
# Create a 3x3 matrix with ones on the diagonal 
# np.identity also returns the same result 
g = np.eye(3)    
g

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [18]:
# convert list to array
h = list((2, 3, 1, 0))
h = np.array(h) 
h

array([2, 3, 1, 0])

In [19]:
# arange() will create arrays with regularly incrementing values
i = np.arange(20)
i

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [20]:
# linspace() will create arrays with a specified number of items which are 
# spaced equally between the specified beginning and end values
j = np.linspace(2., 4., 5)
j

array([2. , 2.5, 3. , 3.5, 4. ])
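
A detail that is easy to miss is that `linspace` includes the end value by default, while `arange` excludes its stop value. The short sketch below contrasts the two; the variable names are illustrative only.

```python
import numpy as np

# linspace: 5 values from 2 to 4, end value included
with_endpoint = np.linspace(2.0, 4.0, 5)

# arange: step of 0.5 from 2 up to, but not including, 4
without_endpoint = np.arange(2.0, 4.0, 0.5)

# linspace can also exclude the end value
no_end = np.linspace(2.0, 4.0, 4, endpoint=False)

print(with_endpoint)     # values 2.0, 2.5, 3.0, 3.5, 4.0
print(without_endpoint)  # values 2.0, 2.5, 3.0, 3.5
print(no_end)            # values 2.0, 2.5, 3.0, 3.5
```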

In [21]:
## Create an array from a pandas dataframe

# import pandas
import pandas as pd

#initialize a dataframe
df = pd.DataFrame(
	[[21, 72, 67],
	[23, 78, 69],
	[32, 74, 56],
	[52, 54, 76]],
	columns=['a', 'b', 'c'])

# use .to_numpy() to convert a dataframe to a numpy array
df.to_numpy()

array([[21, 72, 67],
       [23, 78, 69],
       [32, 74, 56],
       [52, 54, 76]])

In [22]:
df

    a   b   c
0  21  72  67
1  23  78  69
2  32  74  56
3  52  54  76


#### EXERCISE 2.1: Creating an array
* Create an array using the `np.arange()` function. 
* Use the [documentation](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.arange.html) to add start, stop and step parameters 

In [28]:
## EXERCISE CODE HERE
# create an array whose elements start at 1 and increase by 2, stopping before 100 (the stop value is excluded)
ed_arr1 = np.arange(1,100,2)

#print the array elements.
print(ed_arr1)

#print the number of elements in the array
print(len(ed_arr1))

[ 1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95
 97 99]
50


#### 2.2 Accessing arrays

* The elements of numpy arrays can be accessed using their index. In a 2D array, Numpy (and Pandas) treats `axis 0` as running *down* the rows (vertically) and `axis 1` as running *across* the columns (horizontally). This is set out below:

![alt text](https://www.safaribooksonline.com/library/view/python-for-data/9781491957653/assets/pyda_0401.png)

* Indexing a 1D array is relatively straightforward as the index values are all on the same axis.
* Indexing a 2D array (or higher dimensions) is slightly trickier, as we need to access `axis 0` first, and then `axis 1`. In other words, we access rows, and then columns when indexing 2D arrays.
* Note that Python indexing starts with the first value being indexed as 0.
* The examples below should make this a little clearer. 
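
One way to build intuition for the two axes is to sum along each of them. The sketch below recreates the 2D array `b` from section 2.1; the names `col_totals` and `row_totals` are illustrative only.

```python
import numpy as np

b = np.array([[0, 1, 2],
              [3, 4, 5]])

# axis=0 moves down the rows, so we get one total per column
col_totals = b.sum(axis=0)

# axis=1 moves across the columns, so we get one total per row
row_totals = b.sum(axis=1)

print(col_totals)  # [3 5 7]
print(row_totals)  # [ 3 12]
```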


In [27]:
# let's go back to the first array we created
print("1D array:\n", a)

# we can access the shape of the array like this
print("\nDimensions of array: ", a.shape)

# and call the individual elements of the array using the element's index
print("\nFirst element of array: ", a[0])
print("\nSecond element of array: ", a[1])
print("\nThird element of array: ", a[2])

1D array:
 [0 1 2]

Dimensions of array:  (3,)

First element of array:  0

Second element of array:  1

Third element of array:  2


In [29]:
# let's compare this to a 2D array
print("2D array:\n", b)

# and contrast the shape of this array with the shape of a
print("\nDimensions of 2D array: ", b.shape)

# pause and make sure you understand how the values are being accessed
print("\nArray element [0,0]:", b[0, 0])
print("\nArray elements [1,2] [0,1] [1,0]: ", b[1, 2], b[0, 1], b[1, 0])

2D array:
 [[0 1 2]
 [3 4 5]]

Dimensions of 2D array:  (2, 3)

Array element [0,0]: 0

Array elements [1,2] [0,1] [1,0]:  5 1 3


In [30]:
# We can also slice arrays, using the following approach:

# a[start:end] # items start through to end-1
# a[start:]    # items start through the rest of the array
# a[:end]      # items from the beginning through to end-1
# a[:]         # the whole array (for a numpy array this is a view of the same data, not a copy)


# The key point to remember is that the :end value represents the first value that is not in the selected slice. So, the difference 
# between end and start is the number of elements selected (if the step is 1, the default).
# The other feature is that start or end may be a negative number, which means it counts from the end of the array 
# instead of the beginning: -1 refers to the last element, -2 to the second-to-last, and so on. So:

# a[-1]    # last item in the array
# a[-2:]   # last two items in the array
# a[:-2]   # everything except the last two items

# create a new array
k = np.array([1,2,3,4])

# access the first two elements
k[0:2]

array([1, 2])
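
The negative-index and step forms described in the comments above can be checked directly. This is a small sketch that recreates the array `k`; the step examples (`k[::2]` and `k[::-1]`) go slightly beyond the comments above and are included only as an illustration.

```python
import numpy as np

k = np.array([1, 2, 3, 4])

last_item = k[-1]        # last item in the array: 4
last_two = k[-2:]        # last two items: 3 and 4
all_but_last_two = k[:-2]  # everything except the last two items: 1 and 2

# A third value in the slice sets the step
every_second = k[::2]    # items at index 0 and 2: 1 and 3
reversed_k = k[::-1]     # a step of -1 walks the array backwards: 4, 3, 2, 1

print(last_item, last_two, all_but_last_two, every_second, reversed_k)
```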

In [31]:
# higher dimension slicing 
j = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# access one row of the array
j[2]

array([ 9, 10, 11, 12])

In [32]:
# access values within one row - note that we can also express this as j[2][:2]
j[2,:2]

array([ 9, 10])

In [33]:
# access all the rows (axis 0) and two of the columns (axis 1)
j[:,:2]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

#### EXERCISE 2.2: Accessing Arrays 
* Create a 2D array using one of the methods from section 2.1
* Create a new variable containing a slice of the original array
* Replace one of the values of the sliced array using the following template:
  * `array_slice[index_value] = new_value`
* Call the original 2D array and notice if the new_value has been updated  

In [56]:
## EXERCISE CODE HERE
# create a 3x4 array using the random function; the array is initialized with random values
ex_arr2 = np.random.random((3,4))
print("Original array")
print(ex_arr2)

# extract rows 0-1 of column 0
ex_slice1 = ex_arr2[0:2,0:1]
print()
print("Original Slice")
print(ex_slice1)

#Replace 1 value with 9:
ex_slice1[0] = 9
print()
print("Slice after update")
print(ex_slice1)

#printing the original array. 
print()
print('Original array after updating the slice')
print(ex_arr2)

Original array
[[0.45559035 0.05795459 0.65286311 0.52229635]
 [0.74391222 0.62391436 0.98717767 0.58060448]
 [0.10445647 0.11707758 0.96744797 0.41876451]]

Original Slice
[[0.45559035]
 [0.74391222]]

Slice after update
[[9.        ]
 [0.74391222]]

Original array after updating the slice
[[9.         0.05795459 0.65286311 0.52229635]
 [0.74391222 0.62391436 0.98717767 0.58060448]
 [0.10445647 0.11707758 0.96744797 0.41876451]]


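We can see that the value in the original array has also been updated to 9. This is because a slice of a numpy array is a view onto the original data rather than a copy.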
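
If the original array should stay untouched, the slice can be copied explicitly with `.copy()`. The sketch below uses `np.shares_memory` to check whether two arrays overlap in memory; the names `original`, `view` and `independent` are hypothetical and not part of the exercise.

```python
import numpy as np

original = np.arange(12).reshape(3, 4)

# A plain slice shares its data with the original array
view = original[0:2, 0:1]

# .copy() creates an independent array with its own data
independent = original[0:2, 0:1].copy()

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

# Writing to the copy leaves the original unchanged
independent[0] = 99
value_after_copy_write = original[0, 0]
print(value_after_copy_write)  # 0

# Writing to the view changes the original
view[0] = 99
value_after_view_write = original[0, 0]
print(value_after_view_write)  # 99
```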

#### 2.3 Other properties of arrays
* An ndarray is a generic multidimensional container for homogeneous data. Every array has a shape (which we discovered above), a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array.
* We can access information on these properties, and alter them, as we will run through below. Operations to reshape the data are common in Deep Learning, and some of the methods used are covered here. 


In [57]:
# One important thing to note is that we can change an element of the array in place
print ("a before changing an element value in place:\n", a)

# change the value of the first element
a[0] = 5                 

# and show the result 
print ("\na after changing an element value in place:\n", a)

a before changing an element value in place:
 [0 1 2]

a after changing an element value in place:
 [5 1 2]


In [58]:
# we can access the data type of an array using '.dtype'
print("Data type of a: ", a.dtype)

Data type of a:  int64


In [59]:
# note that numpy automatically assigned us the data type - but we can specify it
a_float = np.array(a, dtype=np.float64)

# print the datatype
print(a_float.dtype)

# and we can see that the array has a decimal point after the number 
a_float

float64


array([5., 1., 2.])

In [62]:
# we can explicitly convert or cast an array from one dtype to another using ndarray’s astype method
a_int32 = a_float.astype(np.int32)

# print the dtype
print(a_int32.dtype)

# we can see that the numbers have been converted back to integers 
a_int32

int32


array([5, 1, 2], dtype=int32)

In [65]:
# Converting from strings to numbers can also be useful
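# np.bytes_ is used here because the older alias np.string_ was removed in NumPy 2.0
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)

# here we convert to 'float'; we could also use np.float64 (np.float has been removed from recent numpy versions)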
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])
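
Two behaviours of `astype` are worth checking by hand: it returns a new array rather than modifying the existing one, and converting floats to integers drops the fractional part instead of rounding. The sketch below illustrates both with made-up values; the names `values`, `as_int`, `text` and `numbers` are hypothetical.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5])

# Casting to an integer type truncates towards zero: 1, -1, 2
as_int = values.astype(np.int64)
print(as_int)

# astype returned a new array; the original keeps its dtype
print(values.dtype)  # float64

# Ordinary (unicode) strings can be converted in the same way
text = np.array(['1.25', '-9.6', '42'])
numbers = text.astype(np.float64)
print(numbers)
```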

In [66]:
# earlier we looked at the .shape attribute. Let's now look at how to reshape numpy arrays 
print(j.shape)

# reshape - noting that the new shape must 'fit' the same number of values (12 in total) 
k = j.reshape(6,2)
k

(3, 4)


array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

In [67]:
# passing -1 as one of the dimensions means numpy will infer the size of that dimension from the total number of elements
# this is useful if you have a large array 

l = np.arange(1500).reshape(2,-1)
l.shape

(2, 750)
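
The `-1` placeholder can appear in any single position, and `reshape` raises an error when the requested shape cannot hold all of the elements. A brief sketch, using the same 1500-element array (the name `reshape_failed` is illustrative only):

```python
import numpy as np

l = np.arange(1500)

# -1 in the first position: numpy infers 500 rows for 3 columns
print(l.reshape(-1, 3).shape)      # (500, 3)

# -1 also works in the middle of a 3D shape: 1500 / (5 * 10) = 30
print(l.reshape(5, -1, 10).shape)  # (5, 30, 10)

# 1500 is not divisible by 7, so this reshape cannot work
reshape_failed = False
try:
    l.reshape(7, -1)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```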

In [70]:
# we can join two different arrays either vertically (axis 0) or horizontally (axis 1)
# Create 2 arrays
m = np.array([np.arange(0,3),np.arange(3,6), np.arange(6,9)])
display(m)
n = np.random.random((3,3))
display(n)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

array([[0.95910292, 0.80030274, 0.82456898],
       [0.92427204, 0.43480263, 0.10532002],
       [0.8193717 , 0.01387159, 0.65119362]])

In [71]:
# join - 'stack' vertically - note it takes a tuple as an argument 
o = np.vstack((m,n))
o

array([[0.        , 1.        , 2.        ],
       [3.        , 4.        , 5.        ],
       [6.        , 7.        , 8.        ],
       [0.95910292, 0.80030274, 0.82456898],
       [0.92427204, 0.43480263, 0.10532002],
       [0.8193717 , 0.01387159, 0.65119362]])

In [72]:
# stack horizontally
p = np.hstack((m,n))
p

array([[0.        , 1.        , 2.        , 0.95910292, 0.80030274,
        0.82456898],
       [3.        , 4.        , 5.        , 0.92427204, 0.43480263,
        0.10532002],
       [6.        , 7.        , 8.        , 0.8193717 , 0.01387159,
        0.65119362]])
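
For 2D arrays, `vstack` and `hstack` give the same result as `np.concatenate` with an explicit `axis` argument. The sketch below uses stand-in arrays rather than the random values above. It also shows why the integers in `m` were displayed as floats after stacking: the combined array takes a dtype that can hold both inputs.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)  # integer array
n = np.ones((3, 3))             # float array (stand-in for the random values)

# axis=0 joins vertically, axis=1 joins horizontally
o = np.concatenate((m, n), axis=0)
p = np.concatenate((m, n), axis=1)

print(o.shape)  # (6, 3)
print(p.shape)  # (3, 6)

# Same results as vstack and hstack
print(np.array_equal(o, np.vstack((m, n))))  # True
print(np.array_equal(p, np.hstack((m, n))))  # True

# Mixing integers and floats produces a float array
print(o.dtype)  # float64
```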

In [73]:
# we can then use .ravel() or .flatten() to return 1D arrays
q = o.ravel()
q

array([0.        , 1.        , 2.        , 3.        , 4.        ,
       5.        , 6.        , 7.        , 8.        , 0.95910292,
       0.80030274, 0.82456898, 0.92427204, 0.43480263, 0.10532002,
       0.8193717 , 0.01387159, 0.65119362])

In [74]:
# flatten always returns a copy of the array; ravel returns a view of the original data where possible - otherwise they give the same result
r = p.flatten()
r

array([0.        , 1.        , 2.        , 0.95910292, 0.80030274,
       0.82456898, 3.        , 4.        , 5.        , 0.92427204,
       0.43480263, 0.10532002, 6.        , 7.        , 8.        ,
       0.8193717 , 0.01387159, 0.65119362])
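
The difference between `ravel` and `flatten` only becomes visible when the result is modified. The sketch below uses a small stand-in array and `np.shares_memory` to show when data is shared; the variable names are illustrative only.

```python
import numpy as np

o = np.arange(6).reshape(2, 3)

raveled = o.ravel()      # a view here, because o is stored contiguously
flattened = o.flatten()  # always a copy

print(np.shares_memory(o, raveled))    # True
print(np.shares_memory(o, flattened))  # False

# Writing through the raveled view changes o
raveled[0] = 100
print(o[0, 0])  # 100

# Writing to the flattened copy does not
flattened[1] = 200
print(o[0, 1])  # 1

# ravel falls back to a copy when a view is not possible,
# for example on a transposed array
transposed_shares = np.shares_memory(o.T, o.T.ravel())
print(transposed_shares)  # False
```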

#### EXERCISE 2.3: Properties of Arrays
* Create two 2D arrays
* Reshape one or both of them 
* Stack them together
* Convert this array to a 1D array 

In [96]:
## EXERCISE CODE HERE 
ex_arr3 = np.random.random((2,3))
ex_arr4 = np.random.random((2,3))
print("ex_arr3")
print(ex_arr3)
print()
print("ex_arr4")
print(ex_arr4)
ex_arr5 = np.vstack((ex_arr3,ex_arr4))
print()
print("ex_arr5 created by vertical stacking of ex_arr3 and ex_arr4")
print(ex_arr5)

ex_arr3 = ex_arr3.reshape((3,-1))
print()
print("ex_arr3 after reshaping from 2by3 to 3by2")
print(ex_arr3)

print()
print("ex_arr5 after converting to 1D array using ravel function")
print(ex_arr5.ravel())


ex_arr3
[[0.89687164 0.3167747  0.20865634]
 [0.40696316 0.43234252 0.11801561]]

ex_arr4
[[0.71267417 0.65440634 0.03272594]
 [0.54814046 0.57282877 0.12560244]]

ex_arr5 created by vertical stacking of ex_arr3 and ex_arr4
[[0.89687164 0.3167747  0.20865634]
 [0.40696316 0.43234252 0.11801561]
 [0.71267417 0.65440634 0.03272594]
 [0.54814046 0.57282877 0.12560244]]

ex_arr3 after reshaping from 2by3 to 3by2
[[0.89687164 0.3167747 ]
 [0.20865634 0.40696316]
 [0.43234252 0.11801561]]

ex_arr5 after converting to 1D array using ravel function
[0.89687164 0.3167747  0.20865634 0.40696316 0.43234252 0.11801561
 0.71267417 0.65440634 0.03272594 0.54814046 0.57282877 0.12560244]


### 3. Performing Operations on Arrays

* We can perform a range of operations on arrays, including: 
  * Arithmetic operations between arrays
  * Boolean operations 
  * Universal functions
  * Mathematical and Statistical operations 
* We will cover examples of these below; these are not the only operations that can be performed but will provide you with a good toolkit for a range of operations.

In [97]:
# Any arithmetic operation between equal-size arrays is applied element-wise:
r + q

array([ 0.        ,  2.        ,  4.        ,  3.95910292,  4.80030274,
        5.82456898,  9.        , 11.        , 13.        ,  1.88337496,
        1.23510537,  0.92988899,  6.92427204,  7.43480263,  8.10532002,
        1.6387434 ,  0.02774319,  1.30238723])

In [98]:
# we can also use universal functions (ufuncs), which are functions that perform element-wise operations on data in ndarrays
np.add(m,n)

array([[0.95910292, 1.80030274, 2.82456898],
       [3.92427204, 4.43480263, 5.10532002],
       [6.8193717 , 7.01387159, 8.65119362]])

In [99]:
# the same applies to other arithmetic functions
m * n 

array([[0.        , 0.80030274, 1.64913796],
       [2.77281612, 1.73921051, 0.52660009],
       [4.91623021, 0.09710116, 5.20954892]])

In [100]:
# with the ufunc equivalent 
np.multiply(r,q)

array([0.00000000e+00, 1.00000000e+00, 4.00000000e+00, 2.87730877e+00,
       3.20121097e+00, 4.12284489e+00, 1.80000000e+01, 2.80000000e+01,
       4.00000000e+01, 8.86472014e-01, 3.47973736e-01, 8.68436188e-02,
       5.54563223e+00, 3.04361840e+00, 8.42560136e-01, 6.71369986e-01,
       1.92421110e-04, 4.24053125e-01])
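
The arithmetic operators and their ufunc equivalents give identical results, which can be checked directly. The sketch below uses stand-in arrays. It also shows the optional `out` argument that ufuncs accept for writing the result into an existing array; `result` is a hypothetical name.

```python
import numpy as np

m = np.arange(9).reshape(3, 3)
n = np.full((3, 3), 0.5)

# Each operator has a matching ufunc
print(np.array_equal(m + n, np.add(m, n)))       # True
print(np.array_equal(m - n, np.subtract(m, n)))  # True
print(np.array_equal(m * n, np.multiply(m, n)))  # True
print(np.array_equal(m / n, np.divide(m, n)))    # True

# ufuncs can write into a pre-allocated array via `out`
result = np.empty((3, 3))
np.add(m, n, out=result)
print(np.array_equal(result, m + n))  # True
```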

In [101]:
# the same principles apply to subtraction and division 
# Arithmetic operations with scalars apply the scalar argument to each element in the array
1 / a

array([0.2, 1. , 0.5])

In [102]:
# another example, raising a to the power 0.5
a ** 0.5

array([2.23606798, 1.        , 1.41421356])

In [103]:
# here are a couple more examples - calculate the exponential of all array elements
np.exp(a)

array([148.4131591 ,   2.71828183,   7.3890561 ])

In [104]:
# Elementwise square root
np.sqrt(r)

array([0.        , 1.        , 1.41421356, 0.979338  , 0.89459641,
       0.90805781, 1.73205081, 2.        , 2.23606798, 0.96139068,
       0.65939565, 0.32453046, 2.44948974, 2.64575131, 2.82842712,
       0.90519153, 0.11777773, 0.80696568])

In [105]:
# create two arrays
s = np.random.randn(8)
t = np.random.randn(8)

# compare the arrays 
np.maximum(s, t)

array([ 0.95285687,  1.27965742,  0.96866644,  1.65433082,  1.24401458,
        1.12387922,  0.29605999, -0.50882806])

In [106]:
# statistical methods are available, such as mean(), sum() and cumsum() - cumulative sum()
a.mean()

2.6666666666666665

In [107]:
# most of these methods can also be called as numpy functions, e.g. np.sum(a) instead of a.sum()
np.sum(a)

8
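
`cumsum()` is mentioned above but not demonstrated, and on 2D arrays the statistical methods also accept an `axis` argument. A short sketch, recreating `a` with its current values and using a small 2D array `u` for illustration:

```python
import numpy as np

a = np.array([5, 1, 2])

# Running total of the elements: 5, 5+1, 5+1+2
running_total = a.cumsum()
print(running_total)  # [5 6 8]

u = np.array([[1, 2], [3, 4], [5, 6]])

# Without an axis, the whole array is reduced to a single value
print(u.sum())  # 21

# axis=0 gives one result per column, axis=1 one result per row
column_sums = u.sum(axis=0)
row_means = u.mean(axis=1)
print(column_sums)  # column totals 9 and 12
print(row_means)    # row means 1.5, 3.5 and 5.5
```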

In [112]:
# we can use boolean operations to filter values
u = np.array([[1,2], [3, 4], [5, 6]])

# print array
print(u)

# first find the elements of u that are bigger than 2
print("Find elements of array bigger than 2")
print((u > 2))

[[1 2]
 [3 4]
 [5 6]]
Find elements of array bigger than 2
[[False False]
 [ True  True]
 [ True  True]]


In [113]:
# then use the mask to get the actual value
print("Values of array elements bigger than 2")
print(u[u > 2])

Values of array elements bigger than 2
[3 4 5 6]


#### EXERCISE: Performing Operations on Arrays
* Create two arrays
* Perform an operation between the two arrays
* Apply a scalar operation to one of the arrays
* Use boolean indexing to select a subset of one of the arrays

In [121]:
## EXERCISE CODE GOES HERE 
ex_arr6 = np.ones((3,3))
ex_arr7 = np.eye(3)

print("ex_arr6")
print(ex_arr6)
print()
print("ex_arr7")
print(ex_arr7)

print()
print("Performing addition of above 2 arrays")
print(ex_arr6 + ex_arr7)

print()
print("Performing scalar operation on array ex_arr6")
print(ex_arr6 * 2)

print()
print("Use boolean indexing to select a subset of ex_arr7")
print(ex_arr7[ex_arr7 > 0])

ex_arr6
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

ex_arr7
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Performing addition of above 2 arrays
[[2. 1. 1.]
 [1. 2. 1.]
 [1. 1. 2.]]

Performing scalar operation on array ex_arr6
[[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]

Use boolean indexing to select a subset of ex_arr7
[1. 1. 1.]


### Review and Further Reading
* We now have a good background in numpy, in particular what the library is and the benefit of vectorized operations over the approach taken by python on its own; how to create and locate values in arrays; how to change the shape and data type of arrays; and how to perform operations on arrays.
* There is of course much more to learn. We would suggest looking next at:
  * `fancy indexing`
  * [Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)
  * Exploring [ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html), mathematical and statistical operations in greater detail
  * Array-Oriented Programming with Arrays
  * Expressing Conditional Logic as Array Operations
  * Linear Algebra
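
As a small taste of these topics, the sketch below gives one minimal example each of fancy indexing, broadcasting, conditional logic with `np.where`, and a matrix product. It is an illustration under simple assumptions rather than a full treatment, and the variable names are hypothetical.

```python
import numpy as np

u = np.array([[1, 2], [3, 4], [5, 6]])

# Fancy indexing: pass a list of row indices to pick rows 2 and 0, in that order
picked_rows = u[[2, 0]]

# Broadcasting: the 1D array is applied to every row of u
shifted = u + np.array([10, 20])

# Conditional logic: keep values bigger than 2, replace the rest with 0
filtered = np.where(u > 2, u, 0)

# Linear algebra: matrix product of the transpose of u with u
gram = u.T @ u

print(picked_rows)
print(shifted)
print(filtered)
print(gram)
```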



* Here are some follow-up resources:
  * http://cs231n.github.io/python-numpy-tutorial/
  * the books linked to at the start of the tutorial

