# C1 W1 Lab 1 - Intro NumPy Arrays (matrices)

<a name='1'></a>
# 1 - NumPy Basics #

NumPy = main package for scientific computing in Python.

<u>Key functions</u>:

- __Creating arrays, slicing, indexing, reshaping and stacking__

<a name='1-1'></a>
## 1.1 - Importing the library and 1-D array ##

In NumPy, the array object is called `ndarray`, i.e., an *'n-dimensional array'*.
- The most common array type is the __one-dimensional array ('1-D')__.

In [None]:
# 1.1 - import NumPy
import numpy as np

# 1.2 -'1-D' array.
one_dimensional_arr = np.array([1, 2, 3, 4, 5])
print(one_dimensional_arr)

<a name='1-2'></a>
## 1.2 - NumPy Advantages ##

__Array__ = core data structure of NumPy, <u>_essential for data organisation_</u>.

- __Visualisation__ - a grid of values, *all elements are of the same data type*.

- __Lists operational complexity__ - Python lists offer fewer numerical operations and cost more space and time for numerical work.
- __Arrays operational complexity__ - array objects are *much* faster and more compact than Python lists.
    + Because of their underlying implementation in <u>*optimized C code*</u> and *efficient memory management*.
- __Huge assortment of built-in functions__ - Fast/easy computing with only a few lines of code.
    + Game changer for performing math operations on <u>large datasets</u>. 
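A minimal sketch of the "few lines of code" point, assuming a small illustrative dataset of 1,000 integers (the names `py_list` and `np_arr` are made up for this example). It compares an explicit Python-level loop with a single vectorized expression and shows how to inspect the array's memory footprint. No timings are claimed here; measure on your own machine if speed matters.

```python
import numpy as np

n = 1_000                      # illustrative size, not from the lab
py_list = list(range(n))
np_arr = np.arange(n)

# Doubling every element: a Python-level loop vs. one vectorized expression.
doubled_list = [2 * x for x in py_list]
doubled_arr = 2 * np_arr

# Both approaches produce the same values.
print(doubled_list[:5])
print(doubled_arr[:5])

# Every element has the same dtype, so the data buffer is one compact block:
# total bytes = number of elements * bytes per element.
print(np_arr.itemsize, np_arr.nbytes)
```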

<a name='1-3'></a>
## 1.3 - Array creation functions - array(), arange(), linspace(), dtype ##

* __`np.array()`__ - takes a list of values as an arg and returns a 1-D array.

* __`np.arange(start, stop, step)`__ - allows you to specify the __size__ of the steps.
    + Works similarly to `range()`.
    + <u>_Stop value is exclusive._</u>
    + Non-integer steps can give an unexpected number of elements because of floating-point rounding.

* __`np.linspace(start, stop, num)`__ - allows you to specify the __number__ of samples.
    + `num` = number of samples. NumPy automatically calculates the step size needed to fit `num` evenly spaced samples into the interval.
    + <u>_Stop value is inclusive by default._</u>
    + Default value type = floating point `(np.float64)`. Thus, better suited to floats than `arange()`!
    + __Any `float` value is printed with a trailing `.` (e.g., `25.`).__ See the example below.

* __`dtype`__ - data type object.
    + `dtype=` - optional parameter of the creation functions; pass it to override the default data type.
    + `array.dtype` - attribute that returns the data type of the array.
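The floating-point caveat for `arange()` can be seen in a short sketch. The values `0.1`, `0.4` and `0.3` are chosen for illustration only. With a non-integer step, `arange()` derives the element count from `(stop - start) / step`, so rounding error may add an extra element, whereas `linspace()` is told the count directly. `endpoint=False` is the `linspace()` option that makes the stop value exclusive.

```python
import numpy as np

# Non-integer step: the length depends on a floating-point division,
# so the "exclusive" stop value can sneak in on typical hardware.
risky = np.arange(0.1, 0.4, 0.1)
print(risky, len(risky))

# linspace is given the number of samples, so the length is never in doubt.
safe = np.linspace(0.1, 0.3, 3)
print(safe, len(safe))

# endpoint=False turns the inclusive stop into an exclusive one.
half_open = np.linspace(0, 100, 5, endpoint=False)
print(half_open)  # values 0, 20, 40, 60, 80

# dtype can be inspected on any array.
print(safe.dtype, np.arange(3).dtype)
```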

In [28]:
# Create and print a NumPy array 'a' containing the elements 1, 2, 3.
a = np.array([1, 2, 3])
print(a)

# Create an array with 3 integers, starting from the default integer 0.
b = np.arange(3)
print(b)

# Create an array that starts from the int 1, stops before 20, incremented by 3.
c = np.arange(1, 20, 3)
print(c)

# Create an array with 5 samples evenly spaced between 0 and 100.
lin_spaced_arr = np.linspace(0, 100, 5)
print(lin_spaced_arr)

# Messing around with the dtype parameter.
lin_spaced_arr_int = np.linspace(0, 100, 5, dtype=int)
b_float = np.arange(3, dtype=float)
c_int = np.arange(1, 20, 3, dtype=int)
char_arr = np.array(['Welcome to Math for ML!'])

print(lin_spaced_arr_int)
print(b_float)
print(c_int)
print(char_arr)
print(f"{char_arr.dtype}\n"
      f"\t<  : Little-endian (indicates byte order). '<' is standard on x86 and ARM computer architectures\n"
      f"\t                 'byte order' = the order in which multi-byte data types are stored in memory\n"
      f"\tU  : Unicode character string\n"
      f"\t23 : 23 items (characters) long")

[1 2 3]
[0 1 2]
[ 1  4  7 10 13 16 19]
[  0.  25.  50.  75. 100.]
[  0  25  50  75 100]
[0. 1. 2.]
[ 1  4  7 10 13 16 19]
['Welcome to Math for ML!']
<U23
	<  : Little-endian (indicates byte order). '<' is standard on x86 and ARM computer architectures
	                 'byte order' = the order in which multi-byte data types are stored in memory
	U  : Unicode character string
	23 : 23 items (characters) long


<a name='1-4'></a>
## 1.4 - More on NumPy arrays ##

One of the advantages of using NumPy is that you can easily create arrays with built-in functions such as: 
- `np.ones()` - Returns a new array filled with ones.
- `np.zeros()` - Returns a new array filled with zeros.
- `np.empty()` - Returns a new uninitialized array. 
    + NB: **`np.empty()` != `np.zeros()`!**
        + The former does <u>NOT initialize</u> the contents of the array; the contents are arbitrary "garbage" values left in the allocated memory!
        + The latter <u>does initialize</u> the elements to zero. 
- `np.random.rand()` - Returns a new array with random values drawn uniformly from the interval [0, 1).

In [39]:
# Return a new array of shape 3, filled with ones. 
ones_arr = np.ones(3)
print(ones_arr)

# Return a new array of shape 3, filled with zeroes.
zeros_arr = np.zeros(3)
print(zeros_arr)

# Return a new array of shape 3, without initializing entries.
empt_arr = np.empty(3)
print(empt_arr)

# Return a new array of shape 3 with random numbers between 0 and 1.
rand_arr = np.random.rand(3)
print(rand_arr)

[1. 1. 1.]
[0. 0. 0.]
[0. 0. 0.]
[0.66206761 0.08735222 0.70532234]
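The same functions also accept a shape tuple, which gives multidimensional arrays directly. The sketch below assumes illustrative shapes `(2, 3)` and `(2, 2)`. It also uses `np.random.default_rng`, NumPy's newer random generator interface (not used elsewhere in this lab), with an arbitrary seed so that reruns give the same numbers.

```python
import numpy as np

# Pass a tuple to get a 2-D array: 2 rows, 3 columns.
ones_2d = np.ones((2, 3))
print(ones_2d)

# dtype works here as well; the default would be float64.
zeros_int = np.zeros((2, 2), dtype=int)
print(zeros_int)

# Seeded generator: the seed value 42 is arbitrary, chosen for reproducibility.
rng = np.random.default_rng(seed=42)
rand_2d = rng.random((2, 3))   # uniform samples in [0, 1)
print(rand_2d)
```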


<a name='2'></a>
# 2 - Multidimensional Arrays #

Think of a multidimensional array as an Excel sheet in which *the rows and the columns each represent one dimension*.


![1-D, 2-D, and 3-D Array Visualization](./multidimensional_arrays.png)

<a name='2-1'></a>
## 2.1 - Creating multidimensional arrays. ##

### Method #1:
Use `np.array` as usual, and add new 'rows' as nested lists separated by commas `,`.

### Method #2:
Use __`np.reshape`__ to rearrange the elements of an initial 1-D array into a new shape.

In [None]:
# 1-D array
one_d_arr = np.array([1, 2, 3, 4, 5, 6])

# Creating a 2-D array
two_d_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(two_d_arr)

# Creating a 3-D array
three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(three_d_arr)

# Multidimensional array using reshape()
multi_d_arr = np.reshape(
    one_d_arr,  # the array to be reshaped
    (2, 3),     # dimensions of the new array: 2 rows, 3 columns
)
# Print the new 2-D array with two rows and three columns
print(multi_d_arr)
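Two reshape variations that are often handy, sketched on the same six-element array. The variable names `as_method` and `inferred` are illustrative. `reshape` is also available as an array method, and a `-1` asks NumPy to infer that dimension from the total number of elements. The new shape must account for every element, otherwise a `ValueError` is raised.

```python
import numpy as np

one_d_arr = np.array([1, 2, 3, 4, 5, 6])

# Method form: equivalent to np.reshape(one_d_arr, (2, 3)).
as_method = one_d_arr.reshape(2, 3)
print(as_method)

# -1 lets NumPy work out the missing dimension: 6 elements / 3 rows = 2 columns.
inferred = one_d_arr.reshape(3, -1)
print(inferred.shape)

# 6 elements cannot fill a 4x2 grid, so this fails.
try:
    one_d_arr.reshape(4, 2)
except ValueError as err:
    print(f"ValueError: {err}")
```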

<a name='2-2'></a>
## 2.2 - Finding size, shape and dimension. ##

These are all attributes of an `ndarray` and can be accessed as follows:
- `ndarray.ndim` - Stores the number of dimensions of the array.
- `ndarray.shape` - Stores the shape of the array as a tuple. Each number in the tuple is the length of the corresponding dimension.
- `ndarray.size` - Stores the total number of elements in the array.

In [None]:
# A clean 2x3 array using nested lists
multi_dim_arr = np.array([
    [1, 2, 3], 
    [4, 5, 6]
])
# Dimension of the 2-D array multi_dim_arr
print(multi_dim_arr.ndim)

# Shape of the 2-D array multi_dim_arr; Returns shape of 2 rows and 3 columns
print(multi_dim_arr.shape)

# Size of the array multi_dim_arr; Returns total number of elements
print(multi_dim_arr.size)
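The same three attributes on a 3-D array, as a quick sketch reusing the values of the 3-D example from section 2.1. Note that `size` is always the product of the entries in `shape`.

```python
import numpy as np

three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]],
                        [[7, 8, 9], [10, 11, 12]]])

print(three_d_arr.ndim)   # 3 dimensions
print(three_d_arr.shape)  # (2, 2, 3): 2 planes, 2 rows, 3 columns
print(three_d_arr.size)   # 12 elements in total

# size equals the product of the lengths of all dimensions.
print(np.prod(three_d_arr.shape) == three_d_arr.size)
```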

<a name='3'></a>
# 3 - Array math operations #

NumPy allows you to *quickly* perform __elementwise__ addition, subtraction, multiplication and division for both 1-D and n-D arrays. 

- Use standard operators (math symbols) for each: `+`, `-`, `/` and `*`. 
- _Time and Space complexity_ = __O(n)__

<u>Comparison to Python Lists</u>:
- *Adding lists appends them!*
- Subtracting or multiplying two Python lists raises a `TypeError` (and `list * int` repeats the list instead of scaling it).

In [None]:
arr_1 = np.array([2, 4, 6])
arr_2 = np.array([1, 3, 5])

# Adding two 1-D arrays
addition = arr_1 + arr_2
print(addition)

# Subtracting two 1-D arrays
subtraction = arr_1 - arr_2
print(subtraction)

# Multiplying two 1-D arrays elementwise
multiplication = arr_1 * arr_2
print(multiplication)
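For contrast, here is a sketch of how plain Python lists behave with the same operators, followed by the elementwise division that the cell above leaves out. The names `list_1` and `list_2` are illustrative.

```python
import numpy as np

list_1 = [2, 4, 6]
list_2 = [1, 3, 5]

# '+' on lists concatenates instead of adding elementwise.
concatenated = list_1 + list_2
print(concatenated)  # [2, 4, 6, 1, 3, 5]

# 'list * int' repeats the list.
repeated = list_1 * 2
print(repeated)      # [2, 4, 6, 2, 4, 6]

# '-' between two lists is not defined at all.
try:
    list_1 - list_2
except TypeError as err:
    print(f"TypeError: {err}")

# Elementwise division works once the data is in arrays; the result is float.
division = np.array(list_1) / np.array(list_2)
print(division)
```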

<a name='3-1'></a>
## 3.1 - Multiplying vector with a scalar (broadcasting) ##

<u>**Broadcasting**</u>

Allows you to perform (element-wise) operations on *arrays of different shapes*. 
- By virtually _"stretching"_ or _"replicating"_ the smaller array to match the larger one, <u>avoiding explicit loops</u> and making code concise + efficient.

<u>E.g., you want to convert an array of distances from miles to kilometers</u>:
- Multiply the array by a single number (the conversion rate, i.e., a scalar [1 mile ≈ 1.6 km]).
- **NumPy performs the multiplication on every element**. 

![NumPy Broadcasting](./Broadcasting.jpg)
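A minimal sketch of the miles-to-kilometers example, using the rounded rate of 1.6 from the text and made-up distances. The second part is an extra illustration of broadcasting between a 2-D and a 1-D array, where the 1-D array is applied to every row.

```python
import numpy as np

# Scalar broadcasting: the scalar is applied to every element.
miles = np.array([1, 2, 5])        # illustrative distances
km = miles * 1.6                   # rounded conversion rate from the text
print(km)

# Array broadcasting: a (3,) array is "stretched" across each row of a (2, 3) array.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = grid + offsets
print(shifted)
```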

<a name='4'></a>
# 4 - Indexing and Slicing - The Bread and Butter #

## 4.1 - Indexing ##
Allows you to select specific elements from an array. 
- It also lets you select <i>__entire rows/columns or planes__</i> for multidimensional arrays. 

For multidimensional arrays with `n` dimensions, you must provide `n` indices to select a specific element. __One for each dimension!__

In [None]:
# Select the third element of the array. Remember the counting starts from 0.
a = np.array([1, 2, 3, 4, 5])
print(a[2])

# Select the first element of the array.
print(a[0])

# Indexing on a 2-D array
two_dim = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Select element number 8 from the 2-D array using indices i, j.
print(two_dim[2, 1])
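A short sketch of selecting whole rows, columns and planes, as mentioned at the start of this section. The 3-D array `three_d` is an illustrative array built with `arange` and `reshape`. Negative indices count from the end, as with Python lists.

```python
import numpy as np

two_dim = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

first_row = two_dim[0]         # a single index selects an entire row
first_col = two_dim[:, 0]      # ':' means "everything along this dimension"
last_elem = two_dim[-1, -1]    # negative indices count from the end
print(first_row, first_col, last_elem)

# 3-D array of shape (2, 3, 4): 2 planes, each with 3 rows and 4 columns.
three_d = np.arange(24).reshape(2, 3, 4)
plane = three_d[1]             # one index -> a whole 2-D plane
element = three_d[1, 2, 3]     # three indices -> a single element
print(plane.shape, element)
```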

<a name='4-2'></a>
## 4.2 - Slicing ##
Slicing gives you a **subarray of elements** that you specify from the array. 

The slice notation specifies a `start` and `end` value, and selects the elements from `start` up to but not including `end` *(end-exclusive)*. For NumPy arrays the result is a view of the original data, not a copy. Syntax:

`array[start:end:step]`

**Default Arguments** (if no value passed)
- `start = 0`
- `end = length of array`
- `step = 1`.

In [None]:
a = np.array([1, 2, 3, 4, 5])

# Slice the array a to get the array [2,3,4]
sliced_arr = a[1:4]
print(sliced_arr)

# Slice the array a to get the array [1,2,3]
sliced_arr = a[:3]
print(sliced_arr)

# Slice the array a to get the array [3,4,5]
sliced_arr = a[2:]
print(sliced_arr)

# Slice the array a to get the array [1,3,5]
sliced_arr = a[::2]
print(sliced_arr)

# Note that a, a[:] and a[::] all contain the same elements
print(np.array_equal(a, a[:]) and np.array_equal(a, a[::]))

two_dim = np.array([
    [1, 2, 3], 
    [4, 5, 6],
    [7, 8, 9]
])

# Slice the two_dim array to get the first two rows
sliced_arr_1 = two_dim[0:2]
print(sliced_arr_1)

# Slice the two_dim array to get the last two rows
sliced_two_dim_rows = two_dim[1:3]
print(sliced_two_dim_rows)

# Slice the two_dim array to get 2nd column
sliced_two_dim_cols = two_dim[:,1]
print(sliced_two_dim_cols)
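One behavior worth checking when slicing NumPy arrays: a basic slice is a view onto the original data, so writing to the slice also changes the source array. The sketch below, with illustrative variable names, shows this and how `.copy()` gives an independent array. It ends with a negative step, which walks the array backwards.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# A slice shares memory with the original array.
view = a[1:4]
view[0] = 99
print(a)                      # a now holds 1, 99, 3, 4, 5

# .copy() detaches the data, so edits stay local.
independent = a[1:4].copy()
independent[0] = -1
print(a)                      # unchanged by the edit above

# A negative step reverses the order.
reversed_arr = a[::-1]
print(reversed_arr)
```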

<a name='5'></a>
# 5 - Stacking #
A feature of NumPy to further customize arrays. 
- I.e., to <u>join</u> two or more arrays, either vertically (adding rows) or horizontally (adding columns). Joining along a **new axis** is done with `np.stack()` instead.

- `np.vstack()` - stacks vertically
- `np.hstack()` - stacks horizontally
- `np.hsplit()` - splits an array horizontally (column-wise) into several smaller arrays

![Array stacking](./Stacking.png)

In [None]:
# Stacking arrays
a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])
print(f'a1:\n{a1}')
print(f'a2:\n{a2}')

# Stack the arrays vertically
vert_stack = np.vstack((a1, a2))
print(vert_stack)

# Stack the arrays horizontally
horz_stack = np.hstack((a1, a2))
print(horz_stack)
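`np.hsplit()` is listed above but not exercised in the cell, so here is a small sketch that splits the horizontal stack back into its two halves and compares the resulting shapes. `np.stack()` is included only as a contrast: it joins along a new axis rather than extending an existing one.

```python
import numpy as np

a1 = np.array([[1, 1],
               [2, 2]])
a2 = np.array([[3, 3],
               [4, 4]])

vert_stack = np.vstack((a1, a2))   # rows are appended: shape (4, 2)
horz_stack = np.hstack((a1, a2))   # columns are appended: shape (2, 4)
print(vert_stack.shape, horz_stack.shape)

# Split the (2, 4) array into 2 equal column blocks, undoing the hstack.
left, right = np.hsplit(horz_stack, 2)
print(left)
print(right)

# np.stack creates a new leading axis: shape (2, 2, 2).
new_axis = np.stack((a1, a2))
print(new_axis.shape)
```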