## CSCI 470 Activities and Case Studies

1. For all activities, you are allowed to collaborate with a partner. 
1. For case studies, you should work individually and are **not** allowed to collaborate.

By filling out this notebook and submitting it, you acknowledge that you are aware of the above policies and are agreeing to comply with them.

Some considerations with regard to how these notebooks will be graded:

1. Cells in which "# YOUR CODE HERE" is found are the cells where your graded code should be written.
2. In order to test out or debug your code you may also create notebook cells or edit existing notebook cells other than "# YOUR CODE HERE". We actually highly recommend you do so to gain a better understanding of what is happening. However, during grading, **these changes are ignored**. 
3. You must ensure that all your code for the particular task is in the cells that say "# YOUR CODE HERE".
4. Every cell that says "# YOUR CODE HERE" is followed by a "raise NotImplementedError". You need to remove that line. During grading, if an error occurs then you will lose points for your work in that section.
5. If your code passes the "assert" statements, then no output will result. If your code fails the "assert" statements, you will get an "AssertionError". Getting an assertion error means you will not receive points for that particular task.
6. If you edit the "assert" statements to make your code pass, they will still fail when they are graded since the autograder will ignore the modified "assert" statement. Make sure you don't edit the assert statements.
7. We may sometimes have "hidden" tests for grading. This means that passing the visible "assert" statements is not sufficient. The "assert" statements are there as a guide but you need to make sure you understand what you're required to do and ensure that you are doing it correctly. Passing the visible tests is necessary but not sufficient to get the grade for that cell.
8. When you are asked to define a function, make sure you **don't** use any variables outside of the parameters passed to the function. You can think of the parameters being passed to the function as a hint. Make sure you're using all of those variables.
9. The **Grading** section at the end of the document (before the **Feedback** section) contains some code for our autograder on Gradescope. This block is expected to fail in your local Jupyter environment. DO NOT edit this block of code, or you may not get points for your assignment.
10. Finally, **make sure you run "Kernel > Restart and Run All"** and pass all the asserts before submitting. If you don't restart the kernel, there may be some code that you ran and deleted that is still being used and that was why your asserts were passing.
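Items 5 and 6 describe how the visible checks behave. As a minimal sketch (the helper `passes` is hypothetical and exists only for illustration), a Python `assert` prints nothing when its condition holds and raises `AssertionError` when it does not:

```python
import numpy as np

def passes(arr, expected_shape):
    """Hypothetical helper: report whether a shape assert would pass."""
    try:
        assert arr.shape == expected_shape  # silent when the condition is True
        return True
    except AssertionError:                  # raised when the condition is False
        return False

a = np.zeros((3, 2))
print(passes(a, (3, 2)))  # True
print(passes(a, (2, 3)))  # False
```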

# Numpy

The Python module, [Numpy](https://numpy.org/doc/stable/), provides a means to efficiently work with vectors, matrices, or high-dimensional tensors (all generically termed "arrays"). Compared to raw Python, Numpy oftentimes allows for fewer lines of code and faster execution.
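As a rough illustration of the "fewer lines, faster execution" point, the sketch below computes the same sum of squares with a pure-Python loop and with one vectorized NumPy expression. Timings depend on your machine and the array size, so no particular speedup is claimed here; the block only checks that both approaches agree.

```python
import time
import numpy as np

values = np.arange(1_000_000, dtype=np.int64)

# Pure Python: visit every element one at a time
start = time.perf_counter()
total_loop = 0
for v in values.tolist():
    total_loop += v * v
loop_seconds = time.perf_counter() - start

# NumPy: one vectorized expression evaluated in compiled code
start = time.perf_counter()
total_vec = int(np.sum(values * values))
vec_seconds = time.perf_counter() - start

print(total_loop == total_vec)  # True: same answer either way
print(f"loop: {loop_seconds:.4f} s, numpy: {vec_seconds:.4f} s")
```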

In this activity you'll be asked to perform a variety of tasks using Numpy. __We'll only use matrices of integers rather than floating-point numbers, so the math will be easy for you to do in your head (more or less), and check against results.__

### There are _hidden test cells in this notebook_. You need to pass the asserts, but be sure to examine the numeric results of your code yourself, as those are what the hidden test cells will score.

For each visible test cell (except Problem 6) there is an accompanying hidden test.

In [1]:
import numpy as np
import time

In [2]:
# Problem 1: Slicing
#
# Given the matrix, X1, use slicing to extract a (3, 2) submatrix that
# is taken from the first 3 rows and the last 2 columns of X1.
# Name the results "submatrix".

np.random.seed(0)
X1 = np.random.randint(-9, 10, size=(5, 3))
print('X1:')
print(X1)

# YOUR CODE HERE
submatrix = X1[:3, -2:]

X1:
[[ 3  6 -9]
 [-6 -6 -2]
 [ 0  9 -5]
 [-3  3 -8]
 [-3 -2  5]]


In [3]:
assert submatrix.shape==(3, 2)
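A detail worth knowing about Problem 1: basic slicing such as `X1[:3, -2:]` returns a *view* that shares memory with the original array, so writing into the slice also changes the source. The sketch below uses a small hand-made matrix (not `X1`) to show this and how `.copy()` avoids it.

```python
import numpy as np

M = np.arange(1, 16).reshape(5, 3)  # rows [1, 2, 3], [4, 5, 6], ..., [13, 14, 15]

sub = M[:3, -2:]                    # first 3 rows, last 2 columns: [[2, 3], [5, 6], [8, 9]]

sub[0, 0] = 100                     # write through the view...
print(M[0, 1])                      # ...and the original changes: 100

safe = M[:3, -2:].copy()            # an independent copy
safe[0, 0] = -1
print(M[0, 1])                      # still 100

print(np.shares_memory(M, sub), np.shares_memory(M, safe))  # True False
```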

In [4]:
# Problem 2: Multiplying
#
# Given two square matrices, X2 and Y2:
#
# 1. Do an element-by-element multiply, and name the result "mult_elements".
# 2. Do a matrix multiply (X2 Y2, not Y2 X2), and name the result "mult_matrix".

np.random.seed(0)
X2 = np.random.randint(-9, 10, size=(3, 3))
Y2 = np.random.randint(-9, 10, size=(3, 3))
print('X2:')
print(X2)
print('\nY2:')
print(Y2)

# YOUR CODE HERE
mult_elements = X2 * Y2
mult_matrix = np.dot(X2, Y2)

X2:
[[ 3  6 -9]
 [-6 -6 -2]
 [ 0  9 -5]]

Y2:
[[-3  3 -8]
 [-3 -2  5]
 [ 8 -4  4]]


In [5]:
assert mult_elements.shape==(3, 3)
assert mult_matrix.shape==(3, 3)
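Problem 2 distinguishes two different products. The sketch below checks them on 2x2 matrices small enough to verify by hand; it also shows that, for 2-D arrays, the `@` operator gives the same result as `np.dot`, and that matrix multiplication is not commutative.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B  # [[1*5, 2*6], [3*7, 4*8]] = [[5, 12], [21, 32]]
matrix_ab = A @ B    # [[1*5+2*7, 1*6+2*8], [3*5+4*7, 3*6+4*8]] = [[19, 22], [43, 50]]
matrix_ba = B @ A    # [[23, 34], [31, 46]], so AB != BA

print(np.array_equal(matrix_ab, np.dot(A, B)))  # True
print(np.array_equal(matrix_ab, matrix_ba))     # False
```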

In [6]:
# Problem 3: Broadcasting
#
# You are given an (m, n) matrix, X3.
# Create a (1, n) array of numbers from 1 to n, naming it "my_array".
# Then multiply each row of the matrix by the array, using broadcasting.
# Name the result "mult_broadcast".

np.random.seed(0)
X3 = np.random.randint(-9, 10, size=(6, 4))
print('X3:')
print(X3)

# YOUR CODE HERE
n = X3.shape[1]
my_array = np.arange(1, n + 1).reshape(1, -1)
mult_broadcast = X3 * my_array

X3:
[[ 3  6 -9 -6]
 [-6 -2  0  9]
 [-5 -3  3 -8]
 [-3 -2  5  8]
 [-4  4 -1  0]
 [ 7 -4  6  6]]


In [7]:
assert my_array.shape==(1, X3.shape[1])
assert mult_broadcast.shape==X3.shape
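NumPy's broadcasting rule compares shapes starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1. The sketch below, on a small hand-made matrix, shows that a `(1, n)` row and a plain `(n,)` vector broadcast the same way against an `(m, n)` matrix, that an `(m, 1)` column scales rows instead, and what happens when shapes are incompatible.

```python
import numpy as np

X = np.array([[1, 1, 1, 1],
              [2, 2, 2, 2],
              [3, 3, 3, 3]])          # shape (3, 4)

row = np.arange(1, 5).reshape(1, -1)  # shape (1, 4): [[1, 2, 3, 4]]
flat = np.arange(1, 5)                # shape (4,):   [1, 2, 3, 4]
col = np.array([[10], [20], [30]])    # shape (3, 1)

by_row = X * row    # [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
by_flat = X * flat  # identical: a (4,) array is treated as (1, 4)
by_col = X * col    # [[10, 10, 10, 10], [40, 40, 40, 40], [90, 90, 90, 90]]

print(np.array_equal(by_row, by_flat))  # True

# Incompatible: trailing dimensions 4 and 3 differ and neither is 1
try:
    X * np.arange(3)
except ValueError as err:
    print("broadcast error:", err)
```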

In [8]:
# Problem 4: Transposition, and row- or column-wise operations
#
# Given a matrix, X4, of "true values" of size (m, n), and a matrix, Y4, of
# "predicted values" of size (n, m):
#
# 1. Use np.transpose() to transpose Y4 from (n, m) to (m, n), and name
#    the result "Y_trans".
# 2. Compute the sum-of-squares-error between X4 and Y_trans, for each row.
#    Keep the column dimension, such that the output shape is (m, 1).
#    Name the result "sse_rows".
# 3. Compute the mean-absolute-error between X4 and Y_trans, for each column.
#    Do not keep the row dimension, such that the output shape is (n,).
#    Name the result "mae_cols".

# Note that here we use numpy.random.choice, primarily to just expose you to its usage.

np.random.seed(0)
X4 = np.random.choice(np.arange(0, 3), size=(3, 5), replace=True, p=(0.1, 0.1, 0.8))
Y4 = np.random.choice(np.arange(0, 3), size=(5, 3), replace=True, p=(0.1, 0.1, 0.8))
print('X4:')
print(X4)
print('\nY4:')
print(Y4)

# YOUR CODE HERE
Y_trans = np.transpose(Y4)
sse_rows = np.sum((X4 - Y_trans) ** 2, axis=1, keepdims=True)
mae_cols = np.mean(np.abs(X4 - Y_trans), axis=0)

X4:
[[2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 0]]

Y4:
[[0 0 2]
 [2 2 2]
 [2 2 2]
 [1 2 1]
 [2 2 2]]
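Problem 4 builds its matrices with `np.random.choice`. A short sketch of its main parameters: the first argument (an array, or an int `k` meaning `np.arange(k)`, the form used for `idx_err` in Problem 6), `size`, `replace`, and `p` (probabilities, which must sum to 1). The values drawn depend on the seed, so the checks below only test properties that hold for any draw.

```python
import numpy as np

np.random.seed(0)

# Draw with replacement, weighting the value 2 much more heavily
draws = np.random.choice(np.arange(0, 3), size=(3, 5), replace=True, p=(0.1, 0.1, 0.8))
print(draws.shape)                                  # (3, 5)
print(set(np.unique(draws).tolist()) <= {0, 1, 2})  # True: only values from the input

# An int first argument k samples from np.arange(k)
idx = np.random.choice(30, size=10)
print(idx.min() >= 0 and idx.max() < 30)            # True

# Without replacement every drawn value is distinct (size must not exceed the pool)
perm = np.random.choice(10, size=10, replace=False)
print(sorted(perm.tolist()) == list(range(10)))     # True
```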


In [9]:
assert Y_trans.shape==X4.shape
assert sse_rows.shape==(X4.shape[0], 1)
assert mae_cols.shape==(X4.shape[1],)
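The `axis` and `keepdims` arguments in Problem 4 control which dimension is reduced and whether it survives as a length-1 axis. A minimal sketch on a hand-made 2x3 array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])                         # shape (2, 3)

row_sums = np.sum(a, axis=1)                      # [6, 15], shape (2,)
row_sums_kept = np.sum(a, axis=1, keepdims=True)  # [[6], [15]], shape (2, 1)
col_means = np.mean(a, axis=0)                    # [2.5, 3.5, 4.5], shape (3,)

# The (2, 1) result broadcasts back against `a`, e.g. to divide each
# element by its row total; `a / row_sums` would instead raise a
# ValueError because shapes (2, 3) and (2,) are not compatible.
fractions = a / row_sums_kept
print(np.allclose(fractions.sum(axis=1), 1.0))    # True: each row now sums to 1
```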

In [10]:
# Problem 5: Find matrix values that meet some criteria, and replace them.
#
# For matrix X5, find all element values, x, for which -5 <= x <= 5, and
# replace them with the values at the corresponding locations in matrix Y5.
# Use np.where(). Name the new matrix, "replaced".

np.random.seed(0)
X5 = np.random.randint(-9, 10, size=(4, 5))
Y5 = np.random.randint(6, 10, size=(4, 5))
print('X5:')
print(X5)
print('\nY5:')
print(Y5)

# YOUR CODE HERE
condition = (X5 >= -5) & (X5 <= 5)
replaced = np.where(condition, Y5, X5)

X5:
[[ 3  6 -9 -6 -6]
 [-2  0  9 -5 -3]
 [ 3 -8 -3 -2  5]
 [ 8 -4  4 -1  0]]

Y5:
[[6 9 6 9 7]
 [8 9 9 6 8]
 [9 6 7 9 7]
 [9 9 8 9 6]]


In [11]:
assert np.all(np.abs(replaced) >= 5)
assert np.any(replaced < -5)
assert np.any(replaced > 5)
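`np.where(condition, x, y)` picks from `x` where `condition` is True and from `y` elsewhere. The parentheses around each comparison matter: `&` binds more tightly than `>=` and `<=`, so leaving them out produces a different (and here invalid) expression. The sketch below, on small hand-made arrays, also shows an equivalent boolean-mask assignment.

```python
import numpy as np

x = np.array([-7, -5, 0, 5, 8])
y = np.array([ 6,  7, 8, 9, 6])

# Parentheses are required: without them, `&` would combine -5 and x first
in_range = (x >= -5) & (x <= 5)     # [False, True, True, True, False]
swapped = np.where(in_range, y, x)  # [-7, 7, 8, 9, 8]

# Equivalent approach: copy, then assign through the boolean mask
masked = x.copy()
masked[in_range] = y[in_range]
print(np.array_equal(swapped, masked))  # True
```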

In [12]:
# Problem 6: Compute a confusion matrix
#
# The demo code in the next cell computes a confusion matrix by iterating over every sample in
# arrays of true and predicted classes, sample by sample.
# 
# In the cell below that, you'll take a different approach. You'll create nested for loops
# that iterate over true classes (outer loop) and predicted classes (inner loop). Inside the
# inner loop you find (a) the true samples that match the true class (for that loop) and (b) the
# predicted samples that match the predicted class (for that loop). You'll then find the samples
# that meet both those criteria, sum/count them, and enter that value in the confusion matrix.

np.random.seed(0)
n_samples = 100
labels_true = np.random.choice(np.arange(0, 3), size=(n_samples,), replace=True, p=(0.2, 0.2, 0.6))
# Introduce some possible errors in the predictions
labels_pred = labels_true.copy()
idx_err = np.random.choice(30, size=n_samples//3)
labels_pred[idx_err] = np.random.choice(np.arange(0, 3), size=(n_samples//3,), replace=True, p=(0.2, 0.2, 0.6))

print('Printing just a few of the samples...')
print('\nlabels_true:')
print(labels_true[:30])
print('\nlabels_pred:')
print(labels_pred[:30])

Printing just a few of the samples...

labels_true:
[2 2 2 2 2 2 2 2 2 1 2 2 2 2 0 0 0 2 2 2 2 2 2 2 0 2 0 2 2 2]

labels_pred:
[0 2 1 1 2 2 2 2 2 1 2 0 0 2 2 0 2 2 2 2 2 2 2 2 2 2 1 2 0 1]


In [13]:
# Compute the confusion matrix by iterating over samples
cm = np.zeros((3, 3), dtype=int)
for i in range(len(labels_true)):
    cm[labels_true[i], labels_pred[i]] += 1

print('\nConfusion Matrix:')
print(cm)


Confusion Matrix:
[[21  1  3]
 [ 0 16  0]
 [ 4  3 52]]
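The sample-by-sample loop above can also be replaced by a single vectorized call. One common approach (not the one this activity asks for) encodes each (true, predicted) pair as one index, `true * n_classes + pred`, and counts with `np.bincount`; `np.add.at` is another option. Note that the tempting `cm[labels_true, labels_pred] += 1` does *not* work: with fancy indexing, a repeated index pair is incremented only once. The sketch uses small hypothetical labels rather than the notebook's arrays.

```python
import numpy as np

n_classes = 3
true_demo = np.array([0, 1, 2, 2, 1, 0])
pred_demo = np.array([0, 2, 2, 1, 1, 0])

# Reference result: the sample-by-sample loop
cm_loop = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(true_demo, pred_demo):
    cm_loop[t, p] += 1

# Vectorized 1: flatten each (true, pred) pair to one index, then count
flat_idx = true_demo * n_classes + pred_demo
cm_bincount = np.bincount(flat_idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

# Vectorized 2: unbuffered in-place add, which accumulates repeated pairs
cm_addat = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm_addat, (true_demo, pred_demo), 1)

# Pitfall: buffered fancy-index increment counts the repeated pair (0, 0) only once
cm_wrong = np.zeros((n_classes, n_classes), dtype=int)
cm_wrong[true_demo, pred_demo] += 1

print(np.array_equal(cm_loop, cm_bincount), np.array_equal(cm_loop, cm_addat))  # True True
print(cm_loop[0, 0], cm_wrong[0, 0])  # 2 1
```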


In [14]:
# Compute the confusion matrix by nested iteration over classes rather
# than over samples (as described above in the Problem 6 introduction).
#
# Inside the inner loop you could (should):
# 1. Create a Boolean array which has True where labels_true equals class_true, False elsewhere
# 2. Create a Boolean array which has True where labels_pred equals class_pred, False elsewhere
# 3. Do a logical and (np.logical_and) of those arrays, resulting in an array which has True
#    where both the above conditions are true.
# 4. Count the number of True values using np.sum().
# 5. Enter that count into the confusion matrix, "cm2".

cm2 = np.zeros((3, 3), dtype=int)
for class_true in range(3):
    for class_pred in range(3):
        # YOUR CODE HERE
        true_condition = (labels_true == class_true)
        pred_condition = (labels_pred == class_pred)
        combined_condition = np.logical_and(true_condition, pred_condition)
        count = np.sum(combined_condition)
        cm2[class_true, class_pred] = count


print('\nConfusion Matrix:')
print(cm2)


Confusion Matrix:
[[21  1  3]
 [ 0 16  0]
 [ 4  3 52]]


In [15]:
## There is no hidden cell for this problem, unlike all the previous problems.
assert np.array_equal(cm, cm2)
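The nested class loop in Problem 6 counts, for every (class_true, class_pred) pair, the samples where both Boolean conditions hold. As a sketch of the connection (on small hypothetical labels, not the notebook's arrays), the same counts come out of one matrix product of one-hot encodings, with broadcasting building all the Boolean arrays at once. The trade-off is memory: the one-hot arrays have `n_samples * n_classes` entries.

```python
import numpy as np

n_classes = 3
true_demo = np.array([0, 1, 2, 2, 1, 0])
pred_demo = np.array([0, 2, 2, 1, 1, 0])
classes = np.arange(n_classes)

# One-hot rows: true_onehot[s, i] is True where true_demo[s] == i
true_onehot = true_demo[:, None] == classes  # shape (6, 3)
pred_onehot = pred_demo[:, None] == classes  # shape (6, 3)

# Entry (i, j) sums true_onehot[s, i] * pred_onehot[s, j] over samples s,
# i.e. the number of samples where both conditions (the logical AND) hold
cm_onehot = true_onehot.T.astype(int) @ pred_onehot.astype(int)

# Same counts via the nested loop described in Problem 6
cm_nested = np.zeros((n_classes, n_classes), dtype=int)
for i in range(n_classes):
    for j in range(n_classes):
        cm_nested[i, j] = np.sum(np.logical_and(true_demo == i, pred_demo == j))

print(np.array_equal(cm_onehot, cm_nested))  # True
```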

# Grading
The following code block is used purely for grading. If it raises an error, you can ignore it. DO NOT MODIFY THE CODE BLOCK BELOW.

In [16]:
# Autograding with Otter Grader
import otter
grader = otter.Notebook()
grader.check_all()

ModuleNotFoundError: No module named 'otter'

## Feedback

In [None]:
def feedback():
    """Provide feedback on the contents of this exercise
    
    Returns:
        string
    """
    # YOUR CODE HERE
    return "No feedback"