# 03 Numpy 


## Plan for this lecture

1. Numpy Introduction 

2. Contrast Array and Python List 

3. Exercises

## Introduction to Numpy 

![numpy_logo](https://upload.wikimedia.org/wikipedia/commons/thumb/3/31/NumPy_logo_2020.svg/320px-NumPy_logo_2020.svg.png)
* NumPy (Numerical Python) is a package of functions and data structures for performing numerical operations on data.

* NumPy provides a convenient API (Application Programming Interface) for operating on data.

* It reintroduces explicit types. This means slightly more code, but it is a more efficient way to search, sort, and store data than the ‘loosely’ typed Python structures that we’ve seen so far.

* More documentation available at: https://numpy.org 


## Numpy Arrays vs Python List

* NumPy arrays are different from Python lists.

* NumPy arrays reintroduce the ‘typed’ nature of more ‘verbose’ languages (C, C++, Java), where everything is explicitly typed.

* NumPy arrays operate like arrays in C and Java: they are declared to store data of one type (for example, only integers), unlike Python lists and JavaScript arrays, which can store data of different types.

* Data placed in a NumPy array is therefore ‘cast’ to the array's type: for example, floating point numbers are truncated to integers. In some cases an error is produced instead, such as when a non-numeric string is assigned to an integer array.
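As a minimal sketch of the casting behaviour described above (the variable names `ints` and `cast_failed` are illustrative, not part of the lecture), assigning into an existing integer array either truncates the value or raises an error:

```python
import numpy as np

# An integer array keeps its dtype, so assigned values are cast to fit.
ints = np.array([1, 2, 3])
ints[0] = 3.99              # the fractional part is dropped
print(ints)                 # [3 2 3]

# A string that does not represent an integer cannot be cast.
try:
    ints[1] = "hello"
    cast_failed = False
except ValueError:
    cast_failed = True
print("Cast failed:", cast_failed)
```

The array is left unchanged by the failed assignment.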

## Getting started with Numpy

* If you're working in VS Code, you'll need to install NumPy first, using one of the commands below.

* If you're in Anaconda or Google Colab, you should have access to NumPy already; you just need to import it.

`pip install numpy`

`python3 -m pip install -U numpy --user`

In [4]:
import numpy as np 
np

<module 'numpy' from '/Users/nick/Library/Python/3.9/lib/python/site-packages/numpy/__init__.py'>

In [76]:
a = np.array([1,2,3,4,5,6]) 
a


array([1, 2, 3, 4, 5, 6])

## Upcasting to floating points

Notice below how a single floating point number upcasts all the integers to floats. A `dtype` can also be requested explicitly.

In [77]:
a = np.array([3.14,2,3,4,5]) 
a


array([3.14, 2.  , 3.  , 4.  , 5.  ])

In [78]:
a = np.array([1,2,3],dtype='float32')
a


array([1., 2., 3.], dtype=float32)
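The following sketch inspects the resulting types directly. The names `mixed`, `small`, and `as_ints` are illustrative; `astype` is one way to convert back to integers, shown here as an assumption about what you might want to do next rather than as part of the lecture.

```python
import numpy as np

# One float in the input is enough to make the whole array floating point.
mixed = np.array([3.14, 2, 3])
print(mixed.dtype)          # float64

# An explicit dtype overrides the default choice.
small = np.array([1, 2, 3], dtype="float32")
print(small.itemsize)       # 4 (bytes per element)

# Converting back to integers truncates the fractional part.
as_ints = mixed.astype(np.int64)
print(as_ints)              # [3 2 3]
```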

## 2-Dimensions!

In [79]:
a = np.array([[1,2,3],[4,5,6]]) 
a


array([[1, 2, 3],
       [4, 5, 6]])

## Array functions

In [80]:
a = np.arange(3) 
a


array([0, 1, 2])

In [81]:
a = np.arange(10) 
a


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [32]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [31]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [82]:
a = np.ones((3, 2)) 
a


array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [83]:
a = np.full((2, 2), 5) 
a


array([[5, 5],
       [5, 5]])

In [84]:
a = np.full((3, 3), 7) 
a


array([[7, 7, 7],
       [7, 7, 7],
       [7, 7, 7]])

In [30]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

## Random

In [85]:
a = np.random.random((2,2)) 
a


array([[0.64254536, 0.71292754],
       [0.27729644, 0.96306581]])

In [86]:
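# Note: a tuple passed as the first argument is read as per-element upper bounds,
# not as a shape. Use size=... (as in the cells below) to set the shape.
a = np.random.randint((2,2)) 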
a


array([0, 1])

In [87]:
a = np.random.randint(2, size=10) 
a


array([1, 1, 1, 1, 0, 1, 0, 0, 1, 0])

In [88]:
a = np.random.randint(2, size=(2,2)) 
a


array([[1, 0],
       [1, 1]])

In [89]:
a = np.random.randint(2, size=(3,2)) 
a


array([[1, 0],
       [1, 0],
       [1, 0]])

In [90]:
a = np.random.randint(2, size=(2,3)) 
a


array([[0, 1, 0],
       [1, 1, 0]])

In [91]:
a = np.random.randint(3, size=(3,3)) 
a


array([[1, 2, 2],
       [1, 2, 1],
       [0, 2, 1]])

In [92]:
a = np.random.randint(9, size=(3,3)) 
a


array([[8, 0, 1],
       [3, 7, 6],
       [8, 2, 8]])

In [93]:
a = np.random.randint(9, size=(3,3,3))
a


array([[[5, 4, 6],
        [6, 6, 6],
        [4, 8, 4]],

       [[2, 6, 8],
        [4, 4, 6],
        [7, 8, 0]],

       [[0, 8, 5],
        [4, 7, 3],
        [2, 7, 1]]])

In [94]:
a = np.linspace(0, 1, 11)
a


array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
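`np.arange` and `np.linspace` both produce evenly spaced values but take different arguments. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np

# arange: start, stop (excluded), step size
stepped = np.arange(0, 10, 2)
print(stepped)              # [0 2 4 6 8]

# linspace: start, stop (included), number of points
spaced = np.linspace(0, 10, 5)
print(spaced)               # [ 0.   2.5  5.   7.5 10. ]
```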

## Array attributes

In [95]:
a = np.random.randint(9, size=(3,3))
print("size:", a.size)
print("shape:", a.shape)
print("dimensions:", a.ndim)


size: 9
shape: (3, 3)
dimensions: 2


In [96]:
a = np.random.randint(9, size=(3,3,3))
print("size:", a.size)
print("shape:", a.shape)
print("dimensions:", a.ndim)

size: 27
shape: (3, 3, 3)
dimensions: 3


## Array slicing `[ : ]`

* `array_name[ start : stop : interval]`

In [97]:
a = np.arange(10)
a


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [101]:
print(a[3])


3


In [102]:
a[:3]


array([0, 1, 2])

In [103]:
a[3:]


array([3, 4, 5, 6, 7, 8, 9])

In [104]:
a[3:6]


array([3, 4, 5])

Steps/intervals of 2:

In [105]:
a[3:6:2]


array([3, 5])

Every third element (here, the multiples of 3):

In [106]:
a[::3]


array([0, 3, 6, 9])
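Two further points about slicing are sketched below: negative indices and steps work as they do for lists, but a NumPy slice is a view onto the original data, whereas a list slice is a copy. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)
last_three = a[-3:]         # negative start counts from the end
print(last_three)           # [7 8 9]
reversed_a = a[::-1]        # a negative step walks backwards
print(reversed_a[0])        # 9

# A NumPy slice is a view: writing to it changes the original array.
view = a[:3]
view[0] = 99
print(a[0])                 # 99

# A list slice is a copy: the original list is unaffected.
py_list = list(range(10))
list_copy = py_list[:3]
list_copy[0] = 99
print(py_list[0])           # 0
```

If an independent array is needed, `a[:3].copy()` makes one.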

## Row and Col access

In [107]:
a = np.random.randint(9, size=(3,3)) 
a


array([[2, 0, 5],
       [5, 2, 3],
       [6, 1, 1]])

All of the first column (column 0) - displayed in a one-dimensional form

In [112]:
a[:,0]


array([2, 5, 6])

All of the first row (row 0)

In [113]:
a[0,:]


array([2, 0, 5])

All of the second row (row 1)

In [114]:
a[1,:]


array([5, 2, 3])
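The same `[row, column]` notation also selects single elements and rectangular blocks. This sketch uses a fixed array (named `grid` here for illustration) rather than random values so the results are predictable:

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
# grid is:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

print(grid[1, 2])           # 6: row 1, column 2
print(grid[:2, 1:])         # rows 0-1, columns 1-2: [[2 3] [5 6]]
print(grid[:, -1])          # last column: [3 6 9]
```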

## Iteration vs Vectorisation?

* NumPy leverages vectorization, allowing operations to be performed on entire arrays without explicit Python loops. These operations are implemented in optimized C and Fortran code under the hood.

* Python Lists require explicit loops or list comprehensions for element-wise operations, which are executed in the slower Python interpreter.


In [40]:
import numpy as np
import time
import timeit

# Define the size of the data structures
size = 1000000

# Create a Python list and a NumPy array with the same elements
py_list = list(range(size))
np_array = np.arange(size)

In [41]:
def list_addition(py_list):
    return [x + 1 for x in py_list]

# Time the list addition
start_time = time.time()
list_result = list_addition(py_list)
py_add_time = time.time() - start_time
print(f"Python List Addition Time: {py_add_time:.6f} seconds")

Python List Addition Time: 0.035337 seconds


In [42]:
def numpy_addition(np_array):
    return np_array + 1

# Time the NumPy addition
start_time = time.time()
array_result = numpy_addition(np_array)
np_add_time = time.time() - start_time
print(f"NumPy Array Addition Time: {np_add_time:.6f} seconds")

NumPy Array Addition Time: 0.007479 seconds


In [43]:
def list_multiplication(py_list):
    return [x * 2 for x in py_list]

# Time the list multiplication
start_time = time.time()
list_mul_result = list_multiplication(py_list)
py_mul_time = time.time() - start_time
print(f"Python List Multiplication Time: {py_mul_time:.6f} seconds")

Python List Multiplication Time: 0.032827 seconds


In [45]:
def numpy_multiplication(np_array):
    return np_array * 2

# Time the NumPy multiplication
start_time = time.time()
array_mul_result = numpy_multiplication(np_array)
np_mul_time = time.time() - start_time
print(f"NumPy Array Multiplication Time: {np_mul_time:.6f} seconds")

NumPy Array Multiplication Time: 0.003948 seconds


In [46]:
def list_sum(py_list):
    return sum(py_list)

def list_mean(py_list):
    return sum(py_list) / len(py_list)

# Time the list sum and mean
start_time = time.time()
list_sum_result = list_sum(py_list)
list_mean_result = list_mean(py_list)
py_agg_time = time.time() - start_time
print(f"Python List Sum and Mean Time: {py_agg_time:.6f} seconds")

Python List Sum and Mean Time: 0.015815 seconds


In [47]:
def numpy_sum(np_array):
    return np.sum(np_array)

def numpy_mean(np_array):
    return np.mean(np_array)

# Time the NumPy sum and mean
start_time = time.time()
array_sum_result = numpy_sum(np_array)
array_mean_result = numpy_mean(np_array)
np_agg_time = time.time() - start_time
print(f"NumPy Array Sum and Mean Time: {np_agg_time:.6f} seconds")

NumPy Array Sum and Mean Time: 0.007957 seconds
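A single `time.time()` measurement can be noisy, so the timings above will vary from run to run. As a hedged alternative, the sketch below uses the standard library's `timeit.repeat` to run each operation several times and keep the best result. The repeat counts are arbitrary choices, and no expected timings are shown because they depend on the machine.

```python
import timeit

import numpy as np

size = 1_000_000
py_list = list(range(size))
np_array = np.arange(size)

# Run each statement 5 times per measurement, take 3 measurements, keep the fastest.
list_time = min(timeit.repeat(lambda: [x + 1 for x in py_list], number=5, repeat=3))
numpy_time = min(timeit.repeat(lambda: np_array + 1, number=5, repeat=3))

print(f"Python list: {list_time / 5:.6f} seconds per run")
print(f"NumPy array: {numpy_time / 5:.6f} seconds per run")
```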


Use NumPy Arrays When:
* Performing numerical computations on large datasets.
* Needing to leverage vectorized operations for performance.
* Requiring efficient memory usage.

Use Python Lists When:
* Dealing with heterogeneous data types.
* Performing operations that require dynamic resizing.
* Managing data that doesn’t require intensive numerical computations.

## Types 

* NumPy Arrays are homogeneous, meaning all elements are of the same data type. This uniformity allows for more efficient storage and computation.

* Python Lists are heterogeneous, capable of storing elements of different data types, which adds overhead to manage type information.
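A short sketch of the difference, using illustrative names: the list keeps each element's own type, while the array converts everything to one common type.

```python
import numpy as np

py_list = [1, 2.5, True]
np_array = np.array(py_list)

# The list stores three different Python types.
print([type(x).__name__ for x in py_list])   # ['int', 'float', 'bool']

# The array stores a single type that can represent all three values.
print(np_array.dtype)                        # float64
print(np_array)                              # [1.  2.5 1. ]
```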

## Contiguous memory allocation? 

* NumPy Arrays store data in contiguous blocks of memory, which enhances cache performance and allows for more efficient access and manipulation.

* Python Lists, on the other hand, store references to objects scattered throughout memory, leading to slower access times, especially for large datasets.

* NumPy arrays are generally faster and more efficient for numerical operations compared to Python lists, primarily due to their contiguous memory layout and vectorized operations implemented in optimized C code. 

In [35]:
py_list = [10, 20, 30, 40, 50]

# Print memory addresses of list elements
for i, element in enumerate(py_list):
    print(f"Element {i}: {element}, Memory Address: {id(element)}")

Element 0: 10, Memory Address: 4300778064
Element 1: 20, Memory Address: 4300778384
Element 2: 30, Memory Address: 4300778704
Element 3: 40, Memory Address: 4300779024
Element 4: 50, Memory Address: 4300779344


In [23]:
np_arr = np.array([10,20,30,40,50])

In [28]:
base_address = np_arr.__array_interface__['data'][0]

# Check memory address of the first element in the array
print("NumPy array memory address:", base_address)

NumPy array memory address: 5687035680


In [29]:
np_arr.dtype

dtype('int64')

In [31]:
element_size = np_arr.itemsize
print(element_size)

8


* 64 bits = 8 bytes

In [32]:

for i in range(len(np_arr)):
    element_address = base_address + i * element_size
    print(f"Element {i}: {np_arr[i]}, Memory Address: {element_address}")

Element 0: 10, Memory Address: 5687035680
Element 1: 20, Memory Address: 5687035688
Element 2: 30, Memory Address: 5687035696
Element 3: 40, Memory Address: 5687035704
Element 4: 50, Memory Address: 5687035712


* Each element is a 64-bit integer, and 64 bits = 8 bytes.

* Therefore the addresses are 8 bytes apart. 
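Rather than computing addresses by hand, the spacing can also be read from the array's `strides` attribute, which gives the number of bytes to step along each dimension. A minimal sketch, with the dtype set explicitly to `int64` so the numbers match the example above (the 2-D array `grid` is an added illustration):

```python
import numpy as np

np_arr = np.array([10, 20, 30, 40, 50], dtype=np.int64)
print(np_arr.strides)                  # (8,): 8 bytes from one element to the next
print(np_arr.flags["C_CONTIGUOUS"])    # True: stored as one contiguous block

# In two dimensions, moving down one row skips a whole row of elements.
grid = np.arange(6, dtype=np.int64).reshape(2, 3)
print(grid.strides)                    # (24, 8): 3 elements * 8 bytes per row
```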

## Differences in memory usage

* NumPy's core is implemented in C, so array elements are stored as raw fixed-size values rather than as full Python objects.

In [41]:
import numpy as np 
import sys

In [34]:
tuple_ex = tuple(range(1000))
list_ex = list(range(1000))
# Note: the extra square brackets make this a 2-D array of shape (1, 1000)
numpy_ex = np.array([range(1000)])
print("Space taken by tuple =",tuple_ex.__sizeof__()," bytes")
print("Space taken by list =",list_ex.__sizeof__()," bytes")
print("Space taken by NumPy array =",numpy_ex.__sizeof__()," bytes")

Space taken by tuple = 8024  bytes
Space taken by list = 8040  bytes
Space taken by NumPy array = 8128  bytes


In [40]:
a = np.array([1,2,3,4,5])

In [26]:
type(a[3])

numpy.int64

In [28]:
a[3].nbytes

8

In [32]:
a.size

5

In [33]:
a.nbytes

40

In [37]:
py_l = [1,2,3,4,5]

In [36]:
type(py_l[3])

int

In [49]:
sys.getsizeof(py_l[3])

28

In [50]:
sys.getsizeof(py_l)

120

In [56]:
py_l[3].bit_length()

3

In [63]:
py_l = [123456789,12]

In [64]:
py_l[0].bit_length()

27

In [38]:
py_l[1].bit_length()

2

In [39]:
import numpy as np
import sys

# a. Create structures
size = 1000000
py_list = list(range(size))
np_array = np.arange(size)

# b. Check memory usage
print("Memory size of Python list:", sys.getsizeof(py_list), "bytes")
print("Memory size of NumPy array:", np_array.nbytes, "bytes")

Memory size of Python list: 8000056 bytes
Memory size of NumPy array: 8000000 bytes
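The two figures above look almost identical because `sys.getsizeof` on a list counts only the list's own array of references, not the integer objects those references point to. The sketch below adds the element sizes as a rough estimate. It is an upper bound in some cases, since CPython may share small integer objects between references, and the exact per-object size depends on the Python version.

```python
import sys

import numpy as np

size = 1000
py_list = list(range(size))
np_array = np.arange(size, dtype=np.int64)

list_container = sys.getsizeof(py_list)                  # references only
list_elements = sum(sys.getsizeof(x) for x in py_list)   # the int objects

print("List container:", list_container, "bytes")
print("List elements (estimate):", list_elements, "bytes")
print("NumPy data:", np_array.nbytes, "bytes")
```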


In [21]:
import numpy as np
import statistics
import time

# Create a NumPy array and a Python list
data_size = 1000000
np_array = np.random.rand(data_size)
py_list = np_array.tolist()

# a. NumPy aggregation
start_time = time.time()
np_sum = np_array.sum()
np_mean = np_array.mean()
np_std = np_array.std()
np_time = time.time() - start_time
print("NumPy - Sum:", np_sum, "Mean:", np_mean, "Std Dev:", np_std)
print("NumPy Aggregation Time:", np_time, "seconds")

# b. Python list aggregation
start_time = time.time()
py_sum = sum(py_list)
py_mean = py_sum / len(py_list)
py_std = statistics.stdev(py_list)
py_time = time.time() - start_time
print("Python List - Sum:", py_sum, "Mean:", py_mean, "Std Dev:", py_std)
print("Python List Aggregation Time:", py_time, "seconds")

NumPy - Sum: 500008.33116532874 Mean: 0.5000083311653287 Std Dev: 0.28875364043318225
NumPy Aggregation Time: 0.005028247833251953 seconds
Python List - Sum: 500008.33116533293 Mean: 0.5000083311653329 Std Dev: 0.28875378481011077
Python List Aggregation Time: 1.0715599060058594 seconds


In [22]:
import numpy as np

# a. NumPy array and Python list
np_array = np.array([1, 2, 3])
py_list = [4, 5, 6]

# Concatenate using np.concatenate
combined = np.concatenate((np_array, py_list))
print("Combined with np.concatenate:", combined)
print("Data type:", combined.dtype)

# Alternative: Extend the list with array
py_list_extended = py_list.copy()
py_list_extended.extend(np_array)
print("List extended with NumPy array:", py_list_extended)
print("Data types in extended list:", [type(x) for x in py_list_extended])

Combined with np.concatenate: [1 2 3 4 5 6]
Data type: int64
List extended with NumPy array: [4, 5, 6, np.int64(1), np.int64(2), np.int64(3)]
Data types in extended list: [<class 'int'>, <class 'int'>, <class 'int'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>]


## Exercise 

Compare Python's built-in `range()` function with `np.arange()`. Create both a Python list and a numpy array with values `[0,1,2,3,4,5,6,7,8,9]`. Remember to `import numpy as np` (if you haven't run any code cells above).

## Exercise 

Create a one-dimensional numpy array of 50 random integers between 0 and 9. Furthermore, specify the integers to be of `numpy.int8` type to save space. Check whether the elements are stored 1 byte (8 bits) apart.
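One possible approach, offered as a sketch rather than the only solution, passes `dtype` to `np.random.randint` and then inspects `itemsize` and `strides`. The variable name `small` is illustrative.

```python
import numpy as np

# The upper bound is exclusive, so this draws integers from 0 to 9.
small = np.random.randint(0, 10, size=50, dtype=np.int8)

print(small.dtype)      # int8
print(small.itemsize)   # 1 byte per element
print(small.strides)    # (1,): consecutive elements are 1 byte apart
print(small.nbytes)     # 50 bytes in total
```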

## Exercise 

Perform element-wise operations on the numpy array. For example, add 5 to each of the elements. Now compare this with the same approach applied to the Python list. Do they operate in the same way? 

In [5]:
import numpy as np

np_array = np.arange(10)
np_array_plus_5 = np_array + 5

py_list = list(range(10))
py_list_plus_5 = [x + 5 for x in py_list]


In [6]:
np_array_plus_5

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [7]:
py_list_plus_5

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

In [19]:
import numpy as np
import time

# a. Create large structures
size = 1000000
py_list = list(range(size))
np_array = np.arange(size)

# Define another list and array for operations
py_list2 = list(range(size))
np_array2 = np.arange(size)

# b. Element-wise addition
start_time = time.time()
py_list_sum = [x + y for x, y in zip(py_list, py_list2)]
py_add_time = time.time() - start_time
print("Python List Addition Time:", py_add_time)

start_time = time.time()
np_sum = np_array + np_array2
np_add_time = time.time() - start_time
print("NumPy Array Addition Time:", np_add_time)

# c. Element-wise multiplication
start_time = time.time()
py_list_mul = [x * y for x, y in zip(py_list, py_list2)]
py_mul_time = time.time() - start_time
print("Python List Multiplication Time:", py_mul_time)

start_time = time.time()
np_mul = np_array * np_array2
np_mul_time = time.time() - start_time
print("NumPy Array Multiplication Time:", np_mul_time)

Python List Addition Time: 0.0452427864074707
NumPy Array Addition Time: 0.0036542415618896484
Python List Multiplication Time: 0.03597521781921387
NumPy Array Multiplication Time: 0.0010449886322021484


## Exercise 

Now create a one dimensional numpy array with 64 elements. Initialise these to empty characters (empty `str` in Python). Then `reshape` this one-dimensional array into an 8 x 8 two-dimensional array.  

In [54]:
chess = np.full(64, " ")

In [55]:
chess

array([' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
       ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
       ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
       ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
       ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
      dtype='<U1')

In [56]:
chess.reshape((8,8))

array([[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']], dtype='<U1')

## Scenario Exercise - Chess Board

<img src="https://lh3.googleusercontent.com/proxy/REMapHsJd3Wf2rVl5OLw8tYsjEJRcDeOqlh2io-YvuIboUZXn_1flhYeuiKDGXpfkr4ADD_2DBXlpp6bEkA-j7ueo1AP12ijDeVhLZXhudwGEM6gJ67QikCgccSmyk7sBL0" alt="knight_chess" width="250"> 

Write a function that moves a knight chess piece to a given space on a chess board. Remember that knights move in an L shape: two spaces in one direction, then one space at a right angle to it. 

To simulate this, add a 'K' character to any position in your 8 x 8 two-dimensional array, and apply your function. 

To start with, choose one of the possible moves and focus on getting this right. Then start to add more moves. Perhaps you could let the user select which grid coordinate they want to move to?
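If you get stuck, here is one possible sketch. The function name `move_knight`, the `(row, column)` coordinate convention, and the choice to return `True`/`False` are all assumptions made for illustration.

```python
import numpy as np

# The eight (row, column) offsets a knight can move by.
KNIGHT_OFFSETS = [(2, 1), (2, -1), (-2, 1), (-2, -1),
                  (1, 2), (1, -2), (-1, 2), (-1, -2)]

def move_knight(board, target):
    """Move the 'K' on the board to target if that is a legal knight move."""
    rows, cols = np.where(board == "K")
    if len(rows) != 1:
        raise ValueError("board must contain exactly one knight")
    row, col = int(rows[0]), int(cols[0])
    t_row, t_col = target
    on_board = 0 <= t_row < board.shape[0] and 0 <= t_col < board.shape[1]
    if not on_board or (t_row - row, t_col - col) not in KNIGHT_OFFSETS:
        return False        # illegal move: leave the board unchanged
    board[row, col] = " "
    board[t_row, t_col] = "K"
    return True

board = np.full(64, " ").reshape(8, 8)
board[0, 1] = "K"
print(move_knight(board, (2, 2)))   # True: two rows down, one column across
print(move_knight(board, (3, 3)))   # False: one row and one column is not an L shape
```

Storing the legal offsets in a list keeps the move check to a single membership test, and makes it easy to start with one offset and add the rest later.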


## Extension: Web Chess

Can you draw this chess board in Flask, using your 8 x 8 numpy array?

#### 4. Create an array of 10 fives

array([ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.])

#### 6. Create an array of all the even integers from 10 to 50

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
       44, 46, 48, 50])

#### 7. Create a 3x3 matrix with values ranging from 0 to 8

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

#### 10. Use NumPy to generate an array of 25 random numbers

array([ 1.32031013,  1.6798602 , -0.42985892, -1.53116655,  0.85753232,
        0.87339938,  0.35668636, -1.47491157,  0.15349697,  0.99530727,
       -0.94865451, -1.69174783,  1.57525349, -0.70615234,  0.10991879,
       -0.49478947,  1.08279872,  0.76488333, -2.3039931 ,  0.35401124,
       -0.45454399, -0.64754649, -0.29391671,  0.02339861,  0.38272124])
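The four outputs above can be produced in several ways. The calls below are one hedged possibility for each. The last one assumes the sample output was drawn from a standard normal distribution (it contains negative values), which the exercise text does not state.

```python
import numpy as np

fives = np.full(10, 5.0)              # 4. ten fives, stored as floats
evens = np.arange(10, 51, 2)          # 6. even integers from 10 to 50 inclusive
matrix = np.arange(9).reshape(3, 3)   # 7. 3x3 matrix with values 0 to 8
randoms = np.random.randn(25)         # 10. 25 samples (assumed standard normal)

print(fives)
print(evens)
print(matrix)
print(randoms.shape)                  # (25,)
```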

In [38]:
mat = np.arange(1,26).reshape(5,5)
mat

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

## Exercise 4: 

Without using any functions provided by libraries (e.g. `reverse()`), implement a function that will reverse the contents of an array. 
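One common approach is to swap elements from both ends while working towards the middle. This is a sketch under that assumption, and the function name `reverse_in_place` is illustrative.

```python
import numpy as np

def reverse_in_place(values):
    """Reverse a list or one-dimensional array by swapping from the ends inward."""
    left, right = 0, len(values) - 1
    while left < right:
        values[left], values[right] = values[right], values[left]
        left += 1
        right -= 1
    return values

print(reverse_in_place([1, 2, 3, 4, 5]))        # [5, 4, 3, 2, 1]
print(reverse_in_place(np.array([1, 2, 3, 4]))) # [4 3 2 1]
```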


## Exercise 7: 
Given an image represented by an NxN matrix, where each pixel in the image is 4 bytes, write a function to rotate the image by 90 degrees. 

Extension: Can you do this in place? (without additional data structures)
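A possible in-place sketch transposes the matrix and then reverses each row, which together give a 90 degree clockwise rotation. The direction of rotation and the use of `np.uint32` for 4-byte pixels are assumptions, since the exercise does not specify them.

```python
import numpy as np

def rotate_clockwise(image):
    """Rotate a square matrix 90 degrees clockwise without a second matrix."""
    n = image.shape[0]
    # Step 1: transpose by swapping elements across the main diagonal.
    for i in range(n):
        for j in range(i + 1, n):
            image[i, j], image[j, i] = image[j, i], image[i, j]
    # Step 2: reverse each row.
    for i in range(n):
        for j in range(n // 2):
            image[i, j], image[i, n - 1 - j] = image[i, n - 1 - j], image[i, j]
    return image

pixels = np.array([[1, 2], [3, 4]], dtype=np.uint32)   # 4 bytes per pixel
print(rotate_clockwise(pixels))                        # [[3 1] [4 2]]
```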

## Exercise 

Write an algorithm such that if an element in an MxN matrix is 0, its entire row and column are set to 0. 
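The main pitfall is zeroing rows and columns while still scanning, which spreads zeros across the whole matrix. One way to avoid it, sketched here with the illustrative name `zero_rows_and_cols`, is to record the positions of the original zeros first and only then modify the matrix.

```python
import numpy as np

def zero_rows_and_cols(matrix):
    """Set every row and column that contains a 0 entirely to 0."""
    # Record where the original zeros are before changing anything.
    zero_rows, zero_cols = np.where(matrix == 0)
    matrix[zero_rows, :] = 0
    matrix[:, zero_cols] = 0
    return matrix

m = np.array([[1, 2, 3],
              [4, 0, 6],
              [7, 8, 9]])
print(zero_rows_and_cols(m))
# [[1 0 3]
#  [0 0 0]
#  [7 0 9]]
```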