In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lec_act_2_functions.ipynb")

# Lecture goals

1. (more) Dictionaries for data encapsulation
2. Functions for functionality encapsulation

Resources
- Slides: https://docs.google.com/presentation/d/1ykwwcQ0onMvAjUxfJmKl9tbo-rJPdB5pRwDEmpDsd-g/edit?usp=sharing


## Functions
Functions enable encapsulation of, well, functionality.

They're also a useful mental tool for organizing and structuring your thoughts on how to solve a given problem. A function:
1. Clearly defines a bit of code that takes in some inputs, does some computation, then outputs some data
2. Makes it easier to test that code with different inputs
3. Practicalities: Prevents one of the most common sources of errors - re-using variable names

It's almost never wrong to encapsulate a bit of code in a function. It can slow down (a tiny bit) computation time, but can greatly reduce debugging time, so it's usually worth it.
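
To make the variable-name point concrete, here is a minimal sketch. The names `total` and `sum_list` are made up for illustration and are not part of the assignment. A variable assigned inside a function is local to that function, so it cannot overwrite a notebook-level variable that happens to share its name.

```python
# Notebook-level variable (hypothetical)
total = 100


def sum_list(in_values):
    """Add up the values in in_values and return the sum"""
    # This 'total' is local to the function; it is a different variable
    #  from the notebook-level 'total' above
    total = 0
    for v in in_values:
        total += v
    return total


list_sum = sum_list([1, 2, 3])
print(list_sum)  # 6
print(total)     # 100 - unchanged by the function call
```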

Python's function syntax is beautifully designed to make it easy to set default values for parameters and pass back as much data as you want. We'll see more of that later; for this assignment we'll use the power of dictionaries to pass back "labeled" data.
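
As a rough illustration of both ideas (a default parameter value and "labeled" return data), consider the sketch below. The function `describe_range` and its keys are hypothetical and chosen only for this example; they are not the ones used in the assignment.

```python
def describe_range(in_values, n_digits=2):
    """Return the smallest and largest value, rounded, in a labeled dictionary
    @param in_values - a non-empty list of numbers
    @param n_digits - number of digits to round to (defaults to 2)
    @return a dictionary with the keys Min and Max"""
    return {"Min": round(min(in_values), n_digits),
            "Max": round(max(in_values), n_digits)}


# n_digits is not given, so the default value (2) is used
range_default = describe_range([0.123, 4.567, 2.0])

# Override the default by naming the parameter
range_one_digit = describe_range([0.123, 4.567, 2.0], n_digits=1)

# The caller picks out values by label rather than by position
print(range_default["Min"], range_default["Max"])
print(range_one_digit["Min"], range_one_digit["Max"])
```

Returning a dictionary means the caller does not have to remember the order of the returned values, only their labels.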

In this lecture activity you're essentially going to copy over the code you did in the previous lecture activity, right-shift it to place it in the function, replace the **test_** variable name with the input parameter to the function, then add a return to return the answer. This is a pretty common approach for turning code into a function. 
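
The copy, indent, rename, return recipe described above might look like this on a much smaller piece of code. This is only a sketch: `test_values`, `largest_gap`, and `calc_largest_gap` are hypothetical names, not the assignment's.

```python
# Before: script-style code tied to one specific variable
test_values = [3, -1, 4]
largest_gap = max(test_values) - min(test_values)


# After: the same line, indented inside a function, with test_values replaced
#  by the input parameter and a return added
def calc_largest_gap(in_values):
    """Return the difference between the largest and smallest value"""
    gap = max(in_values) - min(in_values)
    return gap


# The function now works on any list, not just test_values
print(calc_largest_gap(test_values))   # 5
print(calc_largest_gap([10, 20, 35]))  # 25
```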


In [None]:
# Access all numpy functions as np.
import numpy as np

## Question 1: Stats on a list

### Calculate stats on a list

TODO: in calc_stats_from_list function 
- Calculate the mean of the negative and positive values
- Count the total number of negative/positive values
- Store the values in a dictionary

This function calculates the given stats from the list that is passed in. There is test code below this function.

TODO: 
 - Step 1 - copy your code from lecture activity 1 into the function. Right shift it so it is indented properly
 - Step 2 - change **test_list_one** to be **in_list** - the input parameter of the function
 - Step 3 - return the dictionary **dict_save_stats**
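
The test code for this question compares the means using `np.isclose` rather than `==`. A short sketch of why: floating point arithmetic is rarely exact, so two values that should be equal can differ in the last few digits.

```python
import numpy as np

# 0.1 + 0.2 is not stored as exactly 0.3 in floating point
print(0.1 + 0.2 == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))  # True

# Counts are integers, so an exact comparison is fine for those
print(len([1.5, 2.5]) == 2)        # True
```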

In [None]:
def calc_stats_from_list(in_list):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the list into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_list : any list type
    @return - A dictionary with the desired stats"""

    # These are the stats we're calculating. This is more elegant/useful than creating four variables - it keeps all
    #  of the values in the same place and assigns a meaningful label (key) to them
    dict_save_stats = {"Mean positive": 0.0, "Mean negative": -0.0, "Count positive": 0, "Count negative": 0}

    # TODO: 
    #   Copy your code from lecture activity 1 here. Don't forget to change the name of test_list_one to be in_list
    ...
    # TODO Do the return here
    return ...    

In [None]:
# Test data
test_list_one = [-0.9, -0.2, 1.0 / 3.0, 2.0 / 3.0, 3.0 / 3.0, 4.0 / 3.0, 5.0 / 3.0]
test_list_res = calc_stats_from_list(test_list_one)

b_tests_passed = True
if not np.isclose(test_list_res["Mean positive"], 3.0 / 3.0):
    b_tests_passed = False
    print(f"Mean positive is not correct, should be {3.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.55):
    b_tests_passed = False
    print(f"Mean negative is not correct, should be -0.55, got {test_list_res['Mean negative']}")

if test_list_res["Count positive"] != 5:
    b_tests_passed = False
    print(f"Count positive is not correct, should be 5, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    b_tests_passed = False
    print(f"Count negative is not correct, should be 2, got {test_list_res['Count negative']}")

if b_tests_passed:
    print("All list tests passed!")

In [None]:
# This is an example of what the autograder tests are doing
res = calc_stats_from_list([-1, 2, -3, 4, -5])
assert np.isclose(res["Mean positive"], 3.0)

# Remember that if you are printing anything out in calc_stats_from_list then the autograder will fail

In [None]:
grader.check("list")

## Question 2: Doing it again with a numpy array

### Fill in calc_stats_from_nparray

For this function, assume the input is a numpy array.

TODO: Same as the previous question, but this time copy in the numpy array code you wrote in lecture activity 1. Again:
- NO **if** statements or **for** loops - do this all with numpy operations

As before, test code is below
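
As a reminder of what "no **if** statements or **for** loops" can look like, here is a minimal sketch of boolean indexing on made-up data. The `temps` array and the 4.0 threshold are assumptions for illustration only and are not the assignment's data.

```python
import numpy as np

# Hypothetical data: temperatures in degrees Celsius
temps = np.array([-3.0, 5.0, 12.0, -1.0, 8.0])

# Comparing an array to a number gives an array of True/False values
b_is_warm = temps > 4.0

# Indexing with that boolean array keeps only the entries that are True
warm_temps = temps[b_is_warm]

# np.count_nonzero counts the True entries; np.mean averages what was kept
n_warm = np.count_nonzero(b_is_warm)
mean_warm = np.mean(warm_temps)
print(n_warm, mean_warm)
```

The comparison, the selection, and the statistics each happen in one numpy call, which is what replaces the loop and the if statement.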

In [None]:
def calc_stats_from_nparray(in_nparray):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the array into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_nparray : numpy array
    @return - A dictionary with the desired stats"""

    # TODO: Copy in the code, change the numpy array name to be the input to this function, and return the dictionary
    ...


In [None]:
test_nparray_one = np.array(test_list_one)  # Convert the previous test list to a numpy array
test_list_res = calc_stats_from_nparray(test_nparray_one)

b_tests_passed = True
if not np.isclose(test_list_res["Mean positive"], 3.0 / 3.0):
    b_tests_passed = False
    print(f"Mean positive is not correct, should be {3.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.55):
    b_tests_passed = False
    print(f"Mean negative is not correct, should be -0.55, got {test_list_res['Mean negative']}")

if test_list_res["Count positive"] != 5:
    b_tests_passed = False
    print(f"Count positive numbers, should be 5, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    b_tests_passed = False
    print(f"Count negative is not correct, should be 2, got {test_list_res['Count negative']}")

if b_tests_passed:
    print("All numpy array tests passed!")

In [None]:
grader.check("nparray")

## Encapsulating data manipulation 

In the next few problems we'll encapsulate what you did in lecture activity 1 in a function and return the information in a dictionary. 

The first part (this problem) is to take the data creation code that made **my_test_data** and encapsulate it into a function.

The second part (next problem) extracts, eg, all of the x data, and returns it.

The third part (final problem) uses those functions to create a dictionary with the desired information, using your **calc_stats_from_nparray** from above.

In [None]:
# Making a data set to practice with before lab/homework
#  From lecture activity 1
#  Create data that consists of x, y, z data for t time steps for s samples
#    x is time data (linspace from start to stop)
#    y and z are uniform random sampling
#    Ranges are given for the x, y, and z data
#  The last column is 1 if the sample is good, 0 if it is bad. Every other sample is good
#  The data is stored in a s x [3 * t + 1] array
#  Each row (one row for each sample) looks like this
#    x0 y0 z0 x1 y1 z1 .... x9 y9 z9 1 or 0
def make_xyz_time_data(n_samples=5, n_time_steps=10, x_data_range=(0, 1), y_data_range=(-1, 0), z_data_range=(10, 20)):
    """ Make some fake data to play with, same format as the last problem in lecture activity 1
    @param n_samples - number of samples (rows)
    @param n_time_steps - number of time steps (will be 3 * n_time_steps + 1 columns)
    @param x_data_range - start and stop values for x
    @param y_data_range - min and max values for y
    @param z_data_range - min and max values for z
    @return a n_samples * (3*n_time_steps + 1) numpy array"""

    
    # Make space for all of the data and fill it with zeros
    #   zeros takes a tuple with the data sizes - in this case we are making a 2 dimensional array
    #   with 5 rows (one for each sample), 10 x,y,z values (30 columns total), and one extra column for the good/bad
    my_test_data = np.zeros((5, 3 * 10 + 1))

    # Fill in whether or not the sample is good. Every other one is good, the others are bad
    # Since zero is bad - and the array is all zeros - just set every other row, last column
    #   The -1 picks the last column, the 0::2 picks every other row
    #   Note: The left hand side has 3 elements, the right a single number - numpy interprets this to mean
    #     set all of those values to the single number
    my_test_data[0::2, -1] = 1

    # Fill in the x values for each sample with 10 evenly spaced values from 0 to 1.0
    #  np.linspace() generates uniformly-spaced samples from start to stop
    #    You can assign values to specific parameters by name if you want
    #    This would be the same as np.linspace(0, 1.0, 10)
    # In this case, the array on the left hand side is 5 x 10, so we're going to use a loop to set each row
    #  to those x values, one row at a time
    # shape is the size of the array; we want the number of rows so use .shape[0]
    x_data_for_one_row = np.linspace(start=0, stop=1.0, num=10)
    for r in range(0, my_test_data.shape[0]):
        # loop through each row r
        # Fill in column 0 to one before the end (don't overwrite the good/bad), skipping every 3
        my_test_data[r, 0:-2:3] = x_data_for_one_row

    # Fill in the y values for each sample with random values
    #  np.random.uniform() generates random samples between the two values; unlike linspace, you can set the size
    #   of the numpy array it returns.
    # The left side is all rows (5 - :) and every 3rd column starting at 1
    y_data_for_all_rows = np.random.uniform(-1.0, 0.0, size=(5, 10))
    my_test_data[:, 1::3] = y_data_for_all_rows

    # Now the z values - notice that we start at column 2 instead of 1
    my_test_data[:, 2::3] = np.random.uniform(10.0, 20.0, size=(5, 10))


In [None]:
# Test 1 - this should work as soon as you fix the return statement
# Uses default parameters, which were the same as the lecture activity 1 code
#   Reminder to re-execute the cell above once you've made a change
my_test_data_check1 = make_xyz_time_data()

# Check number of samples/rows is 5
assert my_test_data_check1.shape[0] == 5

# Check number of columns is 10 * 3 + 1
assert my_test_data_check1.shape[1] == 10 * 3 + 1

In [None]:
# Test 2 - this should work once you replace the hard-wired 5's and 10's (number of samples and number of time steps) with the 
#   input variables n_samples and n_time_steps
#  If you get an array dimensions mis-match error you probably missed replacing a 5 or a 10
#   Reminder to re-execute the function cell when you've made a change

my_test_data_check2 = make_xyz_time_data(n_samples=7, n_time_steps=13)

# Check size of numpy array
assert my_test_data_check2.shape[0] == 7
assert my_test_data_check2.shape[1] == 13 * 3 + 1

In [None]:
# Test 3 - Change the data creation ranges (the lines that actually make the x, y, z data) to use the input ranges
#  This *should* just be finding the right place to replace, eg, 0 with x_data_range[0] and 1 with x_data_range[1]

my_test_data_check3 = make_xyz_time_data(n_samples=10, n_time_steps=15, x_data_range=(10, 100), y_data_range=(-1, 2), z_data_range=(1.0, 3.0))

# Check that x range for all samples starts at 10 and ends at 100
assert np.all(np.isclose(my_test_data_check3[:, 0], 10.0))
assert np.all(np.isclose(my_test_data_check3[:, -4], 100.0)) 

# Check that all y data is between -1 and 2
n_all_data = 15 * 3 
assert np.all(my_test_data_check3[:, 1:n_all_data:3] >= -1.0)
assert np.all(my_test_data_check3[:, 1:n_all_data:3] <= 2.0)

assert np.isclose(np.min(my_test_data_check3[:, 1:n_all_data:3]), -1.0, atol=0.1 )
assert np.isclose(np.max(my_test_data_check3[:, 1:n_all_data:3]), 2.0, atol=0.1 )
# TODO Check that all z data is between 1 and 3

# Check that every value in the last column is either a 1 or a 0
# TODO Write this test yourself. It's one of the autograder checks.


In [None]:
grader.check("create_data_function")

### Data slice

This is very close to the data slice you did at the end of lab 1. **n_total_dims** will be 3 for our test data (x,y,z), but the function should be able to work with data that has, for example, 4 repeating dimensions (w,x,y,z). 

TODO: 
- Step 1a: Use the scratch cell to create each of the x y z slices, using my_test_data_check1, which you know has 10 time steps
- Step 1b: Check that the dimensions of each array that you create are correct (should be 5x10 for each of them)
- Step 2a: Now replace any hard-wired numbers (eg, 0, 1, 2, 3, 10) with the appropriate variable (start_index, n_time_steps, n_total_dims)
  - At this point you should have 3 slices that all look exactly the same
- Step 2b: Check that you still get the same answer
- Step 3: Change every my_test_data_check1 to my_test_data_check2 (or 3) and see if it all still works correctly
- Step 4: Copy your slice into the function, change my_test_data_check to all_data, and return the slice
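
If the `start:stop:step` slice notation is rusty, the sketch below shows it on a deliberately different, smaller layout: 2 channels (a, b) and 3 time steps, plus a flag column. The array `toy_data` and the hard-wired numbers are assumptions for illustration; your version needs to work for the x, y, z layout and use the variables named in the steps above.

```python
import numpy as np

# Hypothetical toy data: 2 rows, 2 channels (a, b), 3 time steps, plus one flag column
#   Each row looks like: a0 b0 a1 b1 a2 b2 flag
toy_data = np.array([[1, 10, 2, 20, 3, 30, 1],
                     [4, 40, 5, 50, 6, 60, 0]])

# start:stop:step - start at column 0, stop before column 6 (the flag), take every 2nd column
toy_a_slice = toy_data[:, 0:6:2]

# Same stop and step, but start at column 1 to get the b channel
toy_b_slice = toy_data[:, 1:6:2]

print(toy_a_slice.shape)  # (2, 3)
print(toy_b_slice.shape)  # (2, 3)
```

Checking `.shape` right after slicing, as in Step 1b, is the quickest way to catch a slice that accidentally includes the flag column.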

In [None]:
# SCRATCH CELL
# Try writing the slice here for each of x, y, and z. 
#   HINT: The code in the previous problem already has a slice that gets out just the y, or z data for testing purposes...
# The only tricky part is the x data; remember that you don't want to get the last column. If your x slice is 5x11,
#   you picked up the last (good/bad) column by mistake

n_total_dims = 3
# Remember we used different numbers of time steps - you can set this directly OR set it from my_test_data_checkX.shape[1]
n_time_steps = ...

start_index = ...
my_x_slice = ...

start_index = ...
my_y_slice = ...

start_index = ...
my_z_slice = ...


In [None]:
def get_single_channel(all_data, start_index, n_total_dims, n_time_steps):
    """ Get the data for just one channel (eg, wrist torque)
    @param all_data - the my_test_data_XX numpy array
    @param start_index - where to start getting data from (should be 0, 1, or 2 for our test data)
    @param n_total_dims - what the skip value is - the total number of channels (3 in our case)
    @param n_time_steps - number of time steps
    @return Return array should be n_rows X n_timesteps"""

    # TODO Your slice code goes here. Note that the input variables match the names in the scratch cell...
    ...

In [None]:
# Now check by calling the create data and slice data functions

n_samples = 20
n_time_steps = 40
n_total_dims = 3
my_test_data_check_slice = make_xyz_time_data(n_samples=n_samples, n_time_steps=n_time_steps)

my_x_slice_check = get_single_channel(my_test_data_check_slice, start_index=0, n_total_dims=n_total_dims, n_time_steps=n_time_steps)
my_y_slice_check = get_single_channel(my_test_data_check_slice, start_index=1, n_total_dims=n_total_dims, n_time_steps=n_time_steps)
my_z_slice_check = get_single_channel(my_test_data_check_slice, start_index=2, n_total_dims=n_total_dims, n_time_steps=n_time_steps)

# TODO: Check that all of the returned slices are 20 x 40

In [None]:
# SELF TESTS
# TODO: Check that all of the returned slices are 20 x 40

# TODO: Check that the x values start at 0 and end at 1 (see checks above for previous question)

# TODO: Check that the y values are between -1 and 0

# TODO: Check that the z values are between 10 and 20

In [None]:
grader.check("data_slice")

## Uh oh again, broken functions

The following are two pretty common errors when calling functions. Fix them. 

Note: The Lecture_2_functions.ipynb script has examples of where functions can go wrong...

In [None]:
def get_max(all_data):
    """ Return the maximum of the data
    @param all_data - the data"""
    return np.max(my_test_data_check1)

In [None]:
# This is how many time steps we used for my_test_data_check3; should be 15
n_time_steps = my_test_data_check3.shape[1] // 3

# This line should be fine - should return a 10x15 array
x_slice_check3_test = get_single_channel(my_test_data_check3, start_index=0, n_total_dims=3, n_time_steps=n_time_steps)

# This line is broken - remember that my_test_data_check3 was created with x values that went from 10 to 100. So this should
#   return 100. Why doesn't it? What is it returning? 
# HINT: double check which array get_max is finding the max of. 
max_time_step = get_max(x_slice_check3_test)

assert np.isclose(max_time_step, 100.0)

In [None]:
# Create the data
#   5 samples and 10 time steps
my_test_data_check4 = make_xyz_time_data(10, 5)

# Get the data back out
x_slice_check4 = get_single_channel(my_test_data_check4, 0, 3, 10)

# These checks work - data starts at 0 and ends at 1
assert np.isclose(x_slice_check4[0, 0], 0.0)
assert np.isclose(x_slice_check4[0, -1], 1.0)

# This one doesn't - the data should be 5x10. Why isn't it?
assert x_slice_check4.shape == (5, 10)

In [None]:
grader.check("fix it")

## Hours and collaborators
Required for every assignment - fill out before you hand-in.

Listing names and websites helps you to document who you worked with and what internet help you received in the case of any plagiarism issues. You should list names of anyone (in class or not) who has substantially helped you with an assignment - or anyone you have *helped*. You do not need to list TAs.

Listing hours helps us track if the assignments are too long.

In [None]:

# List of names (creates a set)
worked_with_names = {"not filled out"}
# List of URLs (creates a set)
websites = {"not filled out"}
# Approximate number of hours, including lab/in-class time
hours = -1.5

In [None]:
grader.check("hours_collaborators")

### To submit

- Do a restart then run all to make sure everything runs ok
- Repeat: Do a restart, run all, save, and THEN turn in
- Save the file (no black dot to the right of the filename)
- Submit just this .ipynb file through gradescope, Lecture activity 2, functions
- You do NOT need to submit the data files - we will supply those

If the Gradescope autograder fails, please check here first for common reasons for it to fail
    https://docs.google.com/presentation/d/1tYa5oycUiG4YhXUq5vHvPOpWJ4k_xUPp2rUNIL7Q9RI/edit?usp=sharing

Most likely failure for this assignment is not naming the data directory and files correctly; capitalization matters for the Gradescope grader. 