# Math 1376: Programming for Data Science
---

## Module 02: Python basics

## Learning Objectives for Part (c)

- Understand the indexing of arrays of data.


- Be able to properly "slice" through arrays of data.


- Perform several useful operations on arrays of data.

## Notebook contents <a id='Contents'>

* <a href='#Indexing'>Part (c): The indexing of an array.</a>

  * <a href='#activity-array-arithmetic'>Activity: Experiment with array arithmetic and order of operations</a>
  
  * <a href='#activity-summary'>Activity: Summary</a>


## Part (c): The indexing of an array. <a id='Indexing'>
---
    
**Expected time to completion: 3 hours**
    
<span style='background:rgba(255,255,0, 0.25); color:black'> Run the code cell below and click the "play" button to see the recorded lecture associated with this notebook.</span> 

In [None]:
# 1. Running this cell will embed the short recorded lecture associated with this part of the notebook
# 2. Press on the "play" button to start the video.

from IPython.display import YouTubeVideo

YouTubeVideo('DJSJrxIu1lY', width=800, height=450)

### Python indexing is 0 based! 
---

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>

- The first entry in a row/column is indexed by 0. So, when we mention what would mathematically be referred to as the "$(1,1)$ component" (meaning the entry in the first row and first column) of a 2-dimensional array, we must use [0,0] (note the use of square brackets is *necessary*) to access that specific component.

  ***REMEMBER THIS!!!***
  
  More generally, this means that for the "$(i,j)$ component" of a 2-dimensional array (meaning the entry in the $i$th row and $j$th column), we must use $[i-1,j-1]$ to access that specific component.


- There is also another way to access components of an array using *negative integers*. It may seem a little weird at first, but it is very useful in practice. It works like this: if you want to access the *last* entry of an array of length $n$, you can use $-1$ instead of $n-1$ (recall that indexing starts at zero, so the last valid index is $n-1$). Use $-2$ instead of $n-2$ to get the second-to-last entry, $-3$ instead of $n-3$ to get the third-to-last entry, and so on.
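
As a minimal sketch of both conventions on a 1-D array (the name `v` and its values are made up for illustration and do not appear elsewhere in this notebook):

```python
import numpy as np

# Hypothetical 1-D array of length n = 5, used only for illustration
v = np.array([10, 20, 30, 40, 50])
n = len(v)

first = v[0]             # the "1st" entry lives at index 0
last = v[-1]             # same entry as v[n-1]
second_to_last = v[-2]   # same entry as v[n-2]

print(first, last, second_to_last)
print(v[-1] == v[n-1], v[-2] == v[n-2])
```

Note that `v[n]` would raise an `IndexError`, since the last valid index is `n-1`.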

### A visual aid
---

The following diagram may prove useful for typical 2-D arrays.


![Visual of indexing array goes here](sample_array_basics.png "Visual of indexing array")

In [None]:
import numpy as np

In [None]:
arr_2d = np.array( [ [1,2], [3,4], [5,6] ] )

print()
print(' arr_2d =\n', arr_2d )

print()
print( '(1,1) component of arr_2d is given by arr_2d[0,0] =',  arr_2d[0,0] )

print()
print( '(1,2) component of arr_2d is given by arr_2d[0,1] =', arr_2d[0,1] )

print()
print( '(2,1) component of arr_2d is given by arr_2d[1,0] =', arr_2d[1,0] )

print()
print( '(3,2) component of arr_2d is given by arr_2d[2,1] =', arr_2d[2,1] )

print()
print( '(3,2) component of arr_2d is also given by arr_2d[-1,1] =', arr_2d[-1,1] )

In [None]:
#Try uncommenting the next line to see an error. Can you explain it?
# print( arr_2d[2,2] ) 

In [None]:
arr_3d = np.array( [ [ [1,2,3], [4,5,6] ],[ [7,8,9],[10,11,12] ] ] )

print()
print(' arr_3d =\n\n', arr_3d )

print()
print( '(1,1,1) component of arr_3d is given by \n\n arr_3d[0,0,0] =\n\n',  arr_3d[0,0,0] )

print()
print( '(1,1,2) component of arr_3d is given by \n\n arr_3d[0,0,1] =\n\n', arr_3d[0,0,1] )

In [None]:
arr_2d = np.array( [ [1,2], [3,4], [5,6] ] )

print()
print(' arr_2d =\n\n', arr_2d )

# Using a colon : by itself when calling entries of an array will access all entries in the row/column
# where the colon appears.
print()
print( 'first row of arr_2d is given by arr_2d[0,:] =\n\n', arr_2d[0,:] ) #preferred 

print()
print( 'first row of arr_2d is also given by arr_2d[0] =\n\n', arr_2d[0] ) #not preferred

print()
print( 'first column of arr_2d is given by arr_2d[:,0] =\n\n', arr_2d[:,0] )

### Array Slicing
---

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>

- The colon `:` operator is used to specify a range of values in an array.


- The entry to the left of the `:` is included but the entry to the right of the `:` is not.


- Think of `i:j` being interpreted as "all entries starting at i up to, but ***not*** including, j"

    - Also, think of `i:j` being interpreted as "all entries starting at `i` up to, but ***not*** including, `j`"

    - And, think of `i:j` being interpreted as "all entries starting at `i` up to, but ***not*** including, `j`"

    - Before we forget, you should also think of `i:j` being interpreted as "all entries starting at `i` up to, but ***not*** including, `j`"
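
A small sketch may help make the "up to, but not including" rule concrete. The array `x` below is hypothetical and chosen so that each value equals its own index:

```python
import numpy as np

# Hypothetical array whose values equal their own indices
x = np.arange(10)

i, j = 2, 5
piece = x[i:j]   # entries at indices 2, 3, and 4 (index 5 is NOT included)

print(piece)
print(len(piece) == j - i)   # a slice i:j contains j - i entries
```

One convenient consequence of this convention is that `x[:k]` and `x[k:]` split an array into two pieces with no overlap and nothing left out.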


### Short quiz 

- How should you think of `i:j`?

In [None]:
A = np.reshape(range(1,13),(3,4)) # create a 3x4 array (i.e., a matrix) with the 1st 12 integers in it.

print()
print( 'The full matrix A is given by \n\n', A )

print()
print( 'To display just the 2nd and 3rd columns of A,\n' +
       'recall that we begin indexing from 0, so we use A[:,1:3] \n\n',
        A[:,1:3])   #All rows, 2nd and 3rd (but not 4th) column

In [None]:
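B = A[[0, -1], :]   # select the first and last rows of A (all columns)

C = B[:, [1, 2]]    # from those rows, select the 2nd and 3rd columns

print(C)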

## Omitting a (later) index: just because you can does not mean you should
---

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>
    
- If you have a multi-dimensional array, say `A` is a 2-dimensional array of shape $(3,4)$, then you can just use `A[0:2]` or `A[0:2,]` to perform slicing *assuming* that you wanted to have the entire first and second row of `A`. Basically, the ignored dimensions that *follow* the dimensions you have sliced are treated as if you had specified them with colons.

    If multiple rows and columns are being sliced, then this will ***generally*** give you the result that you expect. 
    
    But, if a single row/column is being sliced, then things can go a bit awry and unexpected results may ruin your code.
    
    
- A key takeaway from running the cells below is this: it is best to be explicit about what you want in your code and use "shortcuts" in code very carefully because it is easy to get strange results when you are being a bit too clever/slick in writing code. 

In [None]:
# Everything here "works as expected"
print()
print( 'A[0:2] =\n\n', A[0:2] )

print()
print( 'A[0:2,] =\n\n', A[0:2,] )

print()
print( 'A[0:2,:] =\n\n', A[0:2,:] )

print()
print( 'A[0:2]-A[0:2,:] =\n\n', A[0:2] - A[0:2,:])

In [None]:
# Everything here may look okay at first glance, but try printing the shapes
# to see that not all is as it may seem. The subtleties here are unlikely to 
# produce problems though. Check the next cell for issues with slicing single columns.
print()
print( 'A[0] =\n\n', A[0] ) # The first row

print()
print( 'A[0,] =\n\n', A[0,] ) # Also the first row? Compare shape of array to above

print()
print( 'A[0:1,:] =\n\n', A[0:1,:] )

print()
print( 'A[0]-A[0:1,:] =\n\n', A[0] - A[0:1,:])

In [None]:
# Slicing individual columns is more likely to produce strange results if you
# are not careful
print()
print( 'A[:,1] =\n\n', A[:,1] )

print()
print( 'A[:,1].shape =', A[:,1].shape )

print()
print( 'A[:,1:2] =\n\n', A[:,1:2] )

print()
print( 'A[:,1:2].shape =', A[:,1:2].shape )

print()
print( 'A[:,1]-A[:,1:2] =\n\n', A[:,1] - A[:,1:2] )

### Re-examining single column slicing
---

Looking at the output of the previous code cell, we arrive at the following conclusion:

> If we want to use slicing of a 2-D array of shape $(n,m)$ to extract the $j$th column and keep the shape as if it is a column vector of shape $(n,1)$, then we should slice the original array using `[:, j-1:j]`.
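
A minimal sketch of this conclusion, with the slice applied to the column axis as `A[:, j-1:j]`. The array `A` is rebuilt here so the example runs on its own, and the variable names `col_1d`, `col_2d`, and `diff` are only illustrative:

```python
import numpy as np

A = np.reshape(range(1, 13), (3, 4))  # same 3x4 array used in the cells above

j = 2                   # the "2nd" column in mathematical (1-based) terms
col_1d = A[:, j-1]      # integer index: the column axis is dropped
col_2d = A[:, j-1:j]    # length-one slice: the column axis is kept

print(col_1d.shape)     # (3,)
print(col_2d.shape)     # (3, 1)

# Mixing the two shapes does not give a column of zeros: numpy broadcasts
# (3,) against (3, 1) and produces a 3x3 array instead.
diff = col_1d - col_2d
print(diff.shape)       # (3, 3)
```

This is the reason the subtraction in the previous cell produced a full matrix rather than three zeros.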

### Slicing from the end of an array
---

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>

- A negative index `-n` counts backward from the end of an array, so it refers to the $n$th-to-last entry.

    - Using `-1` as an index selects the very last entry in the array.

    - This implies that if you want to slice from the 3rd entry in an array up to, *but not including*, the last entry in the array, then you would use `2:-1` in the slice. If you want to slice from the 3rd entry in an array all the way *through* the last entry (i.e., including the last entry), then use `2:`. 

In [None]:
print()
print( A )

print()
print( A[-1,:] ) #the last row

print()
print( A[:,-2:] ) #the last two columns

print()
print( A[:,1:] ) #the second column through the last

print()
print( A[:,1:-1] ) #the second column up to, but not including, the last

We can give arrays/lists as inputs to select some specific rows, columns, etc. 

In [None]:
print()
print( A )

print()
print( A[:,[1,3]] )
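
One subtlety is worth a quick sketch: passing lists for *both* the rows and the columns at once does not select a sub-block. Instead, `numpy` pairs the lists up entry by entry. The example below rebuilds `A` so that it runs on its own; the names `block`, `paired`, and `block_ix` are only illustrative, and `np.ix_` is shown as one possible way to request the sub-block in a single step:

```python
import numpy as np

A = np.reshape(range(1, 13), (3, 4))  # same 3x4 array used in the cells above

# Two-step selection: first and last rows, then 2nd and 3rd columns
block = A[[0, -1], :][:, [1, 2]]

# Lists in both positions are paired up: this picks A[0,1] and A[-1,2] only
paired = A[[0, -1], [1, 2]]

# np.ix_ builds index arrays that select the full sub-block in one step
block_ix = A[np.ix_([0, -1], [1, 2])]

print(block)
print(paired)
print(block_ix)
```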

### Elementwise vs standard operations in `numpy`
---

While matrix algebra is not a significant portion of this course, it is worth familiarizing yourself with it because it is the foundation of much of scientific computing. You do *not* need to know linear algebra in order to understand how to do basic arithmetic operations with matrices; it just requires a bit of practice. For a review, see https://en.wikipedia.org/wiki/Matrix_(mathematics)#Basic_operations

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>
 

- `numpy` has functions for both elementwise multiplication of arrays (of the same size) and standard matrix-matrix/vector multiplication (where inner dimensions agree). Arrays are basically just matrices and vectors are just special types of matrices.


- Two arrays can be added together if they are the same shape. The result is an array of the same shape with each component equal to the sum of the corresponding components from the individual arrays. 


- Use the `np.multiply` function for ***elementwise*** multiplication (you can also just use `*`). The arrays need to be the same shape. Similar to the sum operation, the elements of the result are equal to the product of the corresponding elements of the individual arrays.


- Use the `np.dot` function for standard matrix-matrix, or matrix-vector multiplication. This requires compatibility of certain parts of the shapes and the *order* matters. We'll see an example below, but we will not dwell on this here.

Array multiplication is **not** matrix multiplication.  It is an *elementwise* operation.
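
To see the difference on the smallest interesting case, here is a sketch using a hypothetical $2\times 2$ array `M` (not one of the arrays defined in this notebook):

```python
import numpy as np

# Hypothetical 2x2 array, used only for illustration
M = np.array([[1, 2],
              [3, 4]])

elementwise = M * M          # same result as np.multiply(M, M)
matrix_prod = np.dot(M, M)   # standard matrix-matrix product

print(elementwise)   # each entry is squared: [[1, 4], [9, 16]]
print(matrix_prod)   # rows times columns:    [[7, 10], [15, 22]]
```

For example, the top-left entry of the matrix product is $1\cdot 1 + 2\cdot 3 = 7$, whereas the top-left entry of the elementwise product is simply $1\cdot 1 = 1$.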

In [None]:
A = np.reshape(range(1,13),(3,4)) # create a 3x4 array (i.e., a matrix) with the 1st 12 integers in it.
B = np.reshape(range(1,13),(4,3)) # create a 4x3 array (i.e., a matrix) with the 1st 12 integers in it.
C = np.reshape(range(1,17),(4,4)) # create a 4x4 array (i.e., a square matrix) with the 1st 16 integers in it.

print()
print( A )

print()
print( B )

print()
print( C )

In [None]:
print()
print( np.multiply(A,A) ) # elementwise multiplication

print()
print( A*A ) # also elementwise multiplication

print()
print( C+C ) # elementwise addition

In [None]:
print()
print( np.dot( A, B ) ) # standard matrix-matrix multiplication. This works because (3x4)x(4x3) "makes sense"

print()
print( np.dot( B, A ) ) # standard matrix-matrix multiplication. This works because (4x3)x(3x4) "makes sense"

<hr style="border:5px solid cyan"> </hr>

## <span style='background:rgba(0,255,255, 0.5); color:black'>Activity: Experiment with array arithmetic and order of operations</span> <a id='activity-array-arithmetic'>

1. Follow the instructions in the code cells below to try taking "dot" products and elementwise products of arrays A, B, and C in various orders.

2. Use the Markdown cell below these code cells to summarize your findings.

In [None]:
# Try taking "dot" products of A or B with C in various orders. 
np.dot(A, B)

In [None]:
np.dot(B, A)

In [None]:
np.dot(B, C)

In [None]:
np.dot(C, B)

In [None]:
np.dot(A, C)

In [None]:
np.dot(C, A)

In [None]:
np.dot(A, A)

In [None]:
np.dot(B, B)

In [None]:
np.dot(C, C)

In [None]:
# Try elementwise products - KEEP GOING TO CHECK ALL COMBINATIONS IN ALL ORDERS
np.multiply(A, B)

In [None]:
np.multiply(B, A)

In [None]:
# START FILLING IN THE MISSING COMBINATIONS HERE TO SEE WHAT WORKS AND WHAT DOES NOT

<hr style="border:5px solid cyan"> </hr>


### More numpy functions and subpackages

https://docs.scipy.org/doc/numpy/reference/ is a great reference. In particular, you should check out the available documentation on the `matlib`, `linalg`, and `random` subpackages. 
These are extremely useful subpackages that can do most of your everyday computations in undergraduate/beginning graduate mathematics.
<br>

- The page on N-dimensional arrays (https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) is also very useful. Arrays inherit many methods (i.e., functions) from the `numpy` namespace that allow for quicker access to certain functionality (and shorter, more readable code).
    <br>
    
    - Many methods can be applied in one line of code where the order of operation is specified by the order in which the methods appear from left to right. ***We show some examples below.***

In [None]:
print()
print( A )

print()
print( np.mean(A) )

print()
print( A.mean() )

print()
print( np.transpose(A) )

print()
print( A.transpose() )

print()
print( A.max() )

### Thinking in terms of "axes"
---

<span style='background:rgba(255,0,255, 0.25); color:black'> ***Key Points:*** </span>
 
- We say that the rows are aligned with ***axis 0*** and the columns are aligned with ***axis 1***. If you have a 3-D array, then the third dimension is aligned with axis 2.

- ***We sometimes only want to apply a function across rows or columns.***

    - It is common to arrange a set of samples as an array where each row defines a single sample, and the columns define the various quantitative entries associated with that sample. 
For example, if you have had multivariate calculus, think of how a Jacobian matrix is ordered.
Or, if you think of how data is entered into spreadsheets, each row is typically a single individual "entity" with each column defining the various attributes associated with that individual. 
We will certainly see examples of this later when we enter into the more "data science-y" part of the course.
    
    - Assuming the "standard" way of arranging data in an array described above, then when we want to determine some "statistic" for a characteristic across a population, we want one value for each column, computed down the rows (`axis=0`). A summary statistic for each of the individuals is one value for each row, computed along the columns (`axis=1`).

- It is again useful to remember that we index from 0 and the rows are the first index (axis=0) and the columns are the second index (axis=1) in the array when you specify which axis you want to perform computations *across*.

![Visual of axis numbers for an array go here](sample_array.png "The axis numbers for an array")
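
As a rough sketch of the samples-as-rows layout described above, consider a hypothetical data set of 3 samples with 2 attributes each (the array `data` and its values are invented for illustration). The `axis` argument names the axis that gets collapsed by the computation:

```python
import numpy as np

# Hypothetical data: 3 samples (rows), each with 2 measured attributes (columns)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

per_attribute = data.mean(axis=0)   # collapses the rows: one value per column
per_sample = data.mean(axis=1)      # collapses the columns: one value per row

print(per_attribute.shape, per_attribute)   # (2,) [ 2. 20.]
print(per_sample.shape, per_sample)         # (3,) [ 5.5 11.  16.5]
```

Checking the shape of the result, as done here, is a quick way to confirm that the computation ran over the axis you intended.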

In [None]:
print()
print( A )

print()
print( np.mean(A, axis=1) )

print()
print( A.mean(axis=1) )

print()
print( A.mean(axis=0) )

# We can use multiple built-in functions all in one statement as long as we 
# remember that the functions work from left to right.
print()
print( A.transpose().mean(axis=1) ) # transpose first and then compute mean across axis=1

print()
print( A.transpose().mean(axis=0) ) # transpose first and then compute the mean across axis=0

<hr style="border:5px solid cyan"> </hr>

## <span style='background:rgba(0,255,255, 0.5); color:black'>Activity: Summary</span> <a id='activity-summary'/>

Summarize some of the key takeaways/points from this notebook in a list below and prepare a few code examples related to these takeaways/points in the code cells below. You need to have at least one example for each of your summary points and you need at least three summary points.

In this notebook, we have seen the following:

- [Your summary point 1 goes here]




- [Your summary point 2 goes here]




- [Your summary point 3 goes here]

<hr style="border:5px solid cyan"> </hr>


### <a href='#Contents'>Click here to return to Notebook Contents</a>