# Numpy I

## Arrays and Lists

Before being introduced to the NumPy library, we need to be familiar with a widely used data structure called an 'array'. An **array** is a collection of homogeneous elements, i.e. elements of the same data type. An array can therefore be a collection of integers (*int* datatype), a collection of fractional/decimal values (*float* datatype) or a collection of characters (*char* datatype), also referred to as a *string*.

A **list** is very similar to an array in structure, differing only in that a list can be a collection of heterogeneous elements, i.e. elements of various datatypes.

<img src="../../../images/arrayvslist.png" width="800"><br>

Both arrays and lists are accessed in the same way, using an index, which is a reference to the position of the element within the collection. Indexing begins at zero and ends at n-1, where n is the length of the array or list (i.e. the number of elements it contains).
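
As a minimal sketch of the indexing described above, the snippet below accesses a Python list and a NumPy array in the same way. The variable names (`my_list`, `my_array`) are illustrative only, and the snippet uses the NumPy library that is introduced in the next section.

```python
import numpy as np

# A Python list may mix data types; a NumPy array holds a single type
my_list = [10, "twenty", 30.0]
my_array = np.array([10, 20, 30])

# Both are indexed from 0 to n-1
n = len(my_array)
first = my_array[0]       # element at index 0
last = my_array[n - 1]    # element at index n-1
print(my_list[1], first, last)   # twenty 10 30
```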

## 1. NumPy Introduction

NumPy is a Python library that supports the creation of, and operations on, large multidimensional arrays and matrices. It facilitates scientific and numeric computing by providing high-level mathematical functions that operate on these arrays.

A NumPy object is generally a multi-dimensional array. It is a table of elements with the same data type (int, float, char, etc.), indexed by a tuple of non-negative integers (also called *indices*). In NumPy, dimensions are called *axes*, and the number of axes is called the *rank*. (ref: scipy.org)

Consider a 3D space with x, y and z coordinates. A point in 3D space, [1.0, 3.0, 5.0], is an array of rank 1. If there are several such points, as shown below, the array has dimensions m by n. In the example below there are 4 rows and 3 columns (i.e. 3 values for each point/observation), which translates to a 2-dimensional array with a shape of 4 by 3.
```python
[[ 1.0, 0.3, 4.5],
 [ 0.5, 1.5, 2.3],
 [ 6.0, 4.6, 3.5],
 [ 4.5, 3.5, 6.3]]
```
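
The rank of the single point and of the collection of points can be checked with a short sketch like the one below. It relies on the `ndim` and `shape` attributes of a NumPy array; the variable names `point` and `points` are chosen here for illustration.

```python
import numpy as np

# A single point in 3D space: one axis, so rank 1
point = np.array([1.0, 3.0, 5.0])

# Four such points stacked as rows: two axes, so rank 2
points = np.array([[1.0, 0.3, 4.5],
                   [0.5, 1.5, 2.3],
                   [6.0, 4.6, 3.5],
                   [4.5, 3.5, 6.3]])

print(point.ndim, point.shape)     # 1 (3,)
print(points.ndim, points.shape)   # 2 (4, 3)
```
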
To use a library, we need to import it. Any Python library can be referenced by an alias specified during the import. For example, the NumPy library is most commonly imported under the short alias np:
```python
# Importing the numpy library
import numpy as np
```

There are several ways to initialize an array in NumPy:
* a = np.array([0,1,2,3])   ...   creates an array of rank (or dimensionality) 1
* a = np.array([[0,1,2,3],[4,5,6,7]])   ...   creates a 2x4 matrix
* a = np.ones((3,3))   ...   creates a 3x3 matrix with all 1s
* a = np.zeros((2,2))   ...   creates a 2x2 matrix with all 0s
* a = np.eye(3)   ...   creates a 3x3 matrix with 1s on the diagonal and 0s elsewhere (i.e. an identity matrix)
  
Ref: http://www.numpy.org
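
The initialization calls listed above can be tried together in a small sketch such as the following. The variable names are illustrative. The integer data type printed for the first two arrays depends on the platform, so no output is shown for the loop.

```python
import numpy as np

rank_one = np.array([0, 1, 2, 3])          # 1-D array with 4 elements
two_by_four = np.array([[0, 1, 2, 3],
                        [4, 5, 6, 7]])     # 2x4 matrix
all_ones = np.ones((3, 3))                 # 3x3 matrix filled with 1.0
all_zeros = np.zeros((2, 2))               # 2x2 matrix filled with 0.0
identity = np.eye(3)                       # 3x3 identity matrix

# Print the shape and element data type of each array
for arr in (rank_one, two_by_four, all_ones, all_zeros, identity):
    print(arr.shape, arr.dtype)
```

Note that `np.ones`, `np.zeros` and `np.eye` produce floating-point values by default, while `np.array` infers the data type from the values it is given.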

We will now look at some of the most widely used functions in the NumPy library and practice applying them.

## 2. Casting a non-array datatype into an array

The np.array() function can be used to convert a variable of another data type into an n-dimensional array. By simply passing the variable as an argument to this function, we can convert a variable that holds an integer, float, list, series or dataframe into an n-dimensional array.

For example:

```python
a = 5
type(a)
>>> int

a = np.array(a)
type(a)
>>> numpy.ndarray

b = [1.2, 3.4, 5.6]
type(b)
>>> list

b = np.array(b)
type(b)
>>> numpy.ndarray
```

### Exercise:

Given the list below, use the np.array() function to cast it into a NumPy array.


```python
import numpy as np

# Cast the below given list into an array
a = ['Hello',4,5,17.32,25.21,'c']
```

### Solution code

```python
a = np.array(a)
type(a)
```

<br>
### Wait! The above list was heterogeneous...

Remember that an array is a collection of homogeneous elements. However, in the above exercise the list 'a' was a collection of heterogeneous elements (values of different datatypes). The np.array() function worked anyway and converted the list into an array. It achieves this by converting all the list elements into a common data type, which here is the string data type. Another interesting behavior to note is that once the elements are stored as strings, they cannot be individually re-cast into integer, float or other data types: an array has a single data type, so a value assigned to one element is converted back to that type. To change the data type, all the elements of the array need to be converted together.

For instance, in the above example:

```python
print(a[2],type(a[2]))
>>> 5 <class 'numpy.str_'>

# Casting to 'int'
a[2] = int(a[2])
print(a[2],type(a[2]))
>>> 5 <class 'numpy.str_'>
```
Individually re-casting an element does not change its data type within the array.
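
The behavior described above can be checked with the sketch below. It inspects `dtype.kind` (where `'U'` is NumPy's code for a Unicode string type) and uses the `astype` method to convert all elements at once. The `digits` array is a hypothetical example that contains only numeric strings, since a value such as 'Hello' cannot be converted to an integer.

```python
import numpy as np

mixed = np.array(['Hello', 4, 5, 17.32, 25.21, 'c'])
print(mixed.dtype.kind)        # prints U: every element is stored as a string

# Assigning an int to one element stores it as a string again
mixed[2] = int(mixed[2])
print(type(mixed[2]))          # <class 'numpy.str_'>

# Converting every element at once works when all values are numeric strings
digits = np.array(['1', '2', '3'])
as_ints = digits.astype(int)
print(as_ints.tolist())        # [1, 2, 3]
```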

## 3. Casting a list of lists into a 2-dimensional array

A list of lists can be converted into a 2-D array (or matrix) using the np.array() function.

```python
my_list2 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
list_of_lists = np.array(my_list2)
print(list_of_lists,"\n",type(list_of_lists))
>>> [[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]
  <class 'numpy.ndarray'>
```

## 4. Checking the shape and data type of a numpy array

The shape of an array is the list of the lengths of each of its dimensions. <br/>
For a 2-dimensional array, the shape is (number of rows, number of columns), denoted as 'mxn' and read as 'm by n'.
The shape attribute returns the lengths of all the dimensions of the array as a 'tuple'.

<b>Note:</b> The length of the tuple returned by 'shape' attribute denotes the dimensionality of the array.

<img src="../../../images/numpy_1-shape_of_array.png">
<br>

The shape attribute is used to determine the dimensions of the array.

```python
print(list_of_lists.shape)
>>> (4, 3)
```
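
Since the shape is an ordinary tuple, it can be unpacked and measured like any other tuple. The sketch below assumes the same `list_of_lists` array as above; `n_rows` and `n_cols` are illustrative names.

```python
import numpy as np

list_of_lists = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Unpack the (rows, columns) tuple returned by the shape attribute
n_rows, n_cols = list_of_lists.shape
print(n_rows, n_cols)              # 4 3

# The length of the shape tuple is the dimensionality of the array
print(len(list_of_lists.shape))    # 2

# The total number of elements is the product of the lengths
print(list_of_lists.size)          # 12
```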

The data type of the elements in an array can be identified using the 'dtype' attribute. The integer type printed below (int32 or int64) depends on the platform.

```python
my_list2 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
list_of_lists = np.array(my_list2)
print(list_of_lists.dtype)
>>> int32
```

### Exercise:

Find the shape of the array below, assign it to the variable two_d_shape and print it out.
Also find the data type of the elements in the array, assign it to the variable two_d_datatype and print it out.

```python
[[1, 0.3, 4.5]
 [2, 4.0, 6.0]]
```

```python
import numpy as np

two_d_array = np.array([[ 1 ,  0.3,  4.5],
                        [2, 4.0, 6.0]])
```

### Solution code

```python
two_d_shape = two_d_array.shape
two_d_datatype = two_d_array.dtype
print(two_d_shape,two_d_datatype)
```

## 5. Re-shaping an array

The reshape() function in NumPy reshapes a given array into an array with a specified new shape. The new shape must hold the same total number of elements as the original array. For example,

<img src="../../../images/numpy_2-reshaping_array.png" width="500">
<br>
```python
shape_shifter = np.random.rand(12)
shape_shifter
>>> array([ 0.906423  ,  0.55807204,  0.28928162,  0.47020116,  0.27403332,
>>>         0.94178672,  0.81342077,  0.5859645 ,  0.63569185,  0.84614272,
>>>         0.36454835,  0.63664789])

shape_shifter.shape
>>> (12,)

shape_shifter.reshape(3,4)
>>> array([[ 0.906423  ,  0.55807204,  0.28928162,  0.47020116],
>>>        [ 0.27403332,  0.94178672,  0.81342077,  0.5859645 ],
>>>        [ 0.63569185,  0.84614272,  0.36454835,  0.63664789]])

shape_shifter.reshape(4,3)
>>> array([[ 0.906423  ,  0.55807204,  0.28928162],
>>>        [ 0.47020116,  0.27403332,  0.94178672],
>>>        [ 0.81342077,  0.5859645 ,  0.63569185],
>>>        [ 0.84614272,  0.36454835,  0.63664789]])
```
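
Two details of reshaping are worth checking with a quick sketch: reshape() returns a new array and leaves the original shape untouched, and it fails when the requested shape does not hold the same number of elements. The names `flat` and `grid` are illustrative.

```python
import numpy as np

flat = np.arange(12)          # 12 elements: 0, 1, ..., 11
grid = flat.reshape(3, 4)     # 3 * 4 == 12, so this works

print(flat.shape)             # (12,)
print(grid.shape)             # (3, 4)

# 5 * 3 == 15 does not match the 12 elements, so NumPy raises a ValueError
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```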

### Exercise

Change the shape of the given array to 2 rows, 5 columns

```python
# Modify the code below

twor_fivec = np.arange(10)
```

### Solution code

```python
twor_fivec = twor_fivec.reshape(2,5)
twor_fivec
```

## 6. Indexing and Selection

A NumPy array works like the list data structure, and elements can be accessed using their respective indices. The first element of an array has index '0' and subsequent elements have indices 1, 2, 3 and so on, so the nth element in the array has an index of 'n-1'.

<img src="../../../images/numpy_2-indexing_array.png" width="400">
<br>
```python
n_arr = np.array([1, 7, 4, 3, 3])

n_arr[3:5]
# Selects elements from index '3' to '4' (i.e. up to, but not including, the specified end value)
>>> array([3, 3])

n_arr[:3]
# Absence of a start value defaults to index '0' (i.e. the first element)
>>> array([1, 7, 4])

n_arr[2:]
# Absence of an end value selects through index 'n-1' (i.e. up to and including the last element)
>>> array([4, 3, 3])

n_arr[:]
>>> array([1, 7, 4, 3, 3])

n_arr[-1]
# Negative indexing counts from the end of the array
>>> 3
```

Elements of a NumPy array can also be selected (or conditionally retrieved) by using a condition in place of an index. When a condition is applied to an array (as shown below), each element is tested against the condition and a boolean array is generated, recording whether each element satisfies it. When such an 'array condition' is used in place of an index, the boolean array is passed to the outer array, and all elements that lie in the 'True' positions of the boolean array are retrieved. The examples below illustrate this concept.

```python
array_one = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_one > 10
>>> array([ True, False, False, False, True, False, False, True, False, True])

array_one[array_one>10]
>>> array([16, 17, 15, 14])

array_two = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
array_two < 5
>>> array([ True,  True,  True,  True,  True, False, False, False, False, False])

array_two[array_two < 5]
>>> array([0, 1, 2, 3, 4])
```
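
To make the intermediate boolean array explicit, the sketch below stores it in a variable (named `mask` here for illustration) before using it as an index. Summing the mask counts the matching elements, because True is treated as 1.

```python
import numpy as np

array_one = np.array([16, 1, 8, 1, 17, 10, 8, 15, 6, 14])

mask = array_one > 10          # boolean array, one entry per element
selected = array_one[mask]     # keeps the elements where mask is True

print(mask.dtype)              # bool
print(mask.sum())              # 4
print(selected.tolist())       # [16, 17, 15, 14]
```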

### Exercise

Retrieve all elements in the given array that are greater than or equal to 25.63

```python
array_three = np.array([46.56311588, 49.66285409, 28.01145694, 15.4632352, 16.36194605, 23.26915095, 36.77562698, 41.97868793, 35.6520983, 24.85098496])
```

### Solution code

```python
array_three[array_three >= 25.63]
```

## 7. Re-casting, Broadcasting and Duplicating arrays

Re-casting and broadcasting are two ways to change the values of an array. In this tutorial, changing one or more (but not all) values of an array is called **re-casting**, and changing all values of the array is called **broadcasting**. (In NumPy's own documentation, broadcasting refers more generally to how NumPy stretches a value or a smaller array to match the shape it is combined with or assigned to, which is what happens in both cases here.) The conditional selection shown in the previous section can also be used to conditionally re-cast certain elements of an array. Refer to the examples below:

<img src="../../../images/numpy_2-recasting_array.png" width="600">
<br>
* <b>Re-casting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[3:6] = 100
array_rec
>>> array([ 16,   1,   8, 100, 100, 100,   8,  15,   6,  14])
```

* <b>Broadcasting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[:] = 100
array_rec
>>> array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])
```

* <b>Conditional re-casting:</b>
```python
array_rec = np.array([16,  1,  8,  1, 17, 10,  8, 15,  6, 14])
array_rec[array_rec>10] = 100
array_rec
>>> array([100,   1,   8,   1, 100,  10,   8, 100,   6, 100])
```
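
The examples above all assign a single value to the selected positions. As a small sketch of the alternative, a sequence of matching length can be assigned element by element; the values 7, 8, 9 are arbitrary and chosen only for illustration.

```python
import numpy as np

array_rec = np.array([16, 1, 8, 1, 17, 10, 8, 15, 6, 14])

# A single value is repeated across every selected position
array_rec[3:6] = 100

# A sequence of matching length is assigned element by element
array_rec[0:3] = [7, 8, 9]

print(array_rec.tolist())   # [7, 8, 9, 100, 100, 100, 8, 15, 6, 14]
```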

**Duplicating** a NumPy array is a tricky thing. In ordinary programming, the value of one variable can be assigned to another variable, effectively creating a copy. See the example below:
```python
a = 10
b = a
print(b)
>>> 10
```
However, when the same logic is used to assign arrays, the values are not copied; instead, the new variable refers to the same underlying array as the original. For this reason, any change made through the second variable is also reflected in the first.
```python
arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1
arr_2[3:6] = 4444
print(arr_1,arr_2)
>>> [1, 2, 3, 4444, 4444, 4444, 7, 8, 9] [1, 2, 3, 4444, 4444, 4444, 7, 8, 9]

# or

arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1
arr_2[:] = [1,22,333,4444,55555,666666,7777777,88888888,999999999]
print(arr_1,arr_2)
>>> [1, 22, 333, 4444, 55555, 666666, 7777777, 88888888, 999999999] [1, 22, 333, 4444, 55555, 666666, 7777777, 88888888,
>>>  999999999]
```

Hence, when a separate copy of an array is needed, the .copy() method should be used to create a new array that can be changed without affecting the original array.

```python
arr_1 = np.array([1,2,3,4,5,6,7,8,9])
arr_2 = arr_1.copy()
arr_2[3:6] = 4444
print(arr_1,arr_2)
>>> [1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4444, 4444, 4444, 7, 8, 9]
```
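
Whether two variables refer to the same array can also be checked directly, without modifying anything. The sketch below uses Python's `is` operator and NumPy's `np.shares_memory` function; the names `alias` and `duplicate` are illustrative.

```python
import numpy as np

arr_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
alias = arr_1               # another name for the same array object
duplicate = arr_1.copy()    # an independent array with its own data

print(alias is arr_1)                        # True
print(duplicate is arr_1)                    # False
print(np.shares_memory(arr_1, alias))        # True
print(np.shares_memory(arr_1, duplicate))    # False
```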

### Exercise

Given the array 'tran_arr', create two copies of it: one that just references the original array ('tran_arr'), and another that duplicates the values of the array using the .copy() method. Set the 5th element of the first copy to 25, and the 5th element of the second copy to 50. Print all 3 arrays and observe the changes.

```python
# Modify this code

tran_arr = np.arange(1,21)
copy_1 = []
copy_2 = []
```

### Solution code

```python
copy_1 = tran_arr
copy_2 = tran_arr.copy()
copy_1[4] = 25
copy_2[4] = 50
print(tran_arr,"\n",copy_1,"\n",copy_2)
```

## 8. Arrays beyond matrices

An array is a list or collection of homogeneous elements, i.e. items of the same type. An N-dimensional array is a collection of such arrays and, in the simplest terms, can be described as an array of arrays.

A two-dimensional array, also called a matrix (plural: matrices), is very common and most of us would be familiar with it. An array of matrices can be visualized as a 3-dimensional array. An array can be defined using the 'array' function of the numpy module. A range of attributes such as dtype, shape, size, etc., are available to find out about various properties of the array.

For more details, please refer to the documentation of the ndarray class: https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

Looking at an example would help us understand in better detail.

### Exercise 1

* Create a three dimensional array, named tdarray consisting of 3 matrices - 
    1. $[[1,2,3],[a,b,c]]$
    2. $[[4,5,6],[d,e,f]]$
    3. $[[7,8,9],[g,h,i]]$
* Access the element 'h' using indices and store it into a variable called 'target'. Print target out.
* Print the data type of the array using the '.dtype' attribute, and the shape of the array using the '.shape' attribute.

```python
# Importing numpy library
import numpy as np
```

### Solution code
```python
tdarray = np.array([[['1','2','3'],['a','b','c']],[['4','5','6'],['d','e','f']],[['7','8','9'],['g','h','i']]])
target = tdarray[2][1][1]
print(target,"\nDatatype of the array is: ",tdarray.dtype,"\nShape of the array is: ",tdarray.shape)
```
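
The solution above indexes one axis at a time with chained brackets. As a sketch of an equivalent form, NumPy also accepts all the indices in a single pair of brackets, separated by commas; the names `chained` and `direct` are illustrative.

```python
import numpy as np

tdarray = np.array([[['1', '2', '3'], ['a', 'b', 'c']],
                    [['4', '5', '6'], ['d', 'e', 'f']],
                    [['7', '8', '9'], ['g', 'h', 'i']]])

chained = tdarray[2][1][1]    # index one axis at a time
direct = tdarray[2, 1, 1]     # index all three axes at once

print(chained, direct)        # h h
print(tdarray.shape)          # (3, 2, 3)
```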

It is not possible to physically represent or visualize an n-dimensional array (where n>3). However, it is simple to create one in code. Let us try.

### Exercise 2

* Create a 4-dimensional array, named fdarray, with elements 1,2,3,4...so on and with a shape 4,3,2,2.
* Print out the shape of the array to verify your answer. Also access the element '10' using appropriate indices, assign it to a variable target2 and print out target2.

```python
# Creating the array manually would be a bad approach
# Use a range function (e.g. np.arange) to create the data and the reshape function to give it the necessary dimensions
```

### Solution code

```python
# data creation
a = np.arange(1,(4*3*2*2)+1)

# Reshaping the 1-dimensional array 'a'
fdarray = np.reshape(a,(4,3,2,2))

# Retrieving target variable '10' using appropriate indices
target2 = fdarray[0][2][0][1]

# Printing results
print("Shape of the array is: ",fdarray.shape,"\n",target2)
```
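
Working out the indices of an element by hand becomes error-prone as the number of dimensions grows. As a hedged sketch of two ways to let NumPy find them, the snippet below uses `np.unravel_index` (which converts a flat position into per-axis indices) and `np.argwhere` (which searches for a value). The flat position 9 is used because the value 10 is the tenth element of the original 1-dimensional data.

```python
import numpy as np

fdarray = np.arange(1, (4 * 3 * 2 * 2) + 1).reshape(4, 3, 2, 2)

# Convert the flat (0-based) position 9 into one index per axis
indices = np.unravel_index(9, fdarray.shape)
print(indices)

# Alternatively, search for the value directly
found = np.argwhere(fdarray == 10)
print(found)

# The tuple of indices can be used to retrieve the element
target2 = fdarray[indices]
print(target2)   # 10
```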