In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lec_act_1_arrays.ipynb")

# Lecture goals

1. Understand the benefit of numpy (over lists) for operating over lists of numbers
2. Introduction to numpy-style array operations
3. Dictionaries for data encapsulation
4. Functions for functionality encapsulation
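
As a small illustration of the first two goals, here is a minimal sketch of the same arithmetic done on a list and on a numpy array. The variable names are made up for this example and are not part of the assignment.

```python
import numpy as np

values_list = [1.0, -2.0, 3.0]

# With a list, doubling every value needs an explicit loop (here, a list comprehension)
doubled_list = [2.0 * v for v in values_list]

# With a numpy array, the arithmetic is applied to every element at once
values_array = np.array(values_list)
doubled_array = 2.0 * values_array

print(doubled_list)
print(doubled_array)
```

The numpy version reads like the math you would write on paper, which is most of the appeal.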

## Functions
Functions enable encapsulation of, well, functionality.

They're also a useful mental tool for organizing and structuring your thoughts on how to solve a given problem. A function:
1. Clearly defines a bit of code that takes in some inputs, does some computation, then outputs some data
2. Makes it easier to test that code with different inputs
3. Practicalities: Prevents one of the most common sources of errors - re-using variable names

It's almost never wrong to encapsulate a bit of code in a function. It can slow down (a tiny bit) computation time, but can greatly reduce debugging time, so it's usually worth it.

Python's function syntax is beautifully designed to make it easy to set default values for parameters and pass back as much data as you want. We'll see more of that later; for this assignment we'll use the power of dictionaries to pass back "labeled" data.
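
To make the default-value and "labeled data" ideas concrete, here is a minimal sketch. The function `describe_range`, its `round_to` parameter, and the dictionary keys are hypothetical names chosen for this example only.

```python
def describe_range(in_list, round_to=2):
    """ Hypothetical example: return the smallest and largest value, labeled
    @param in_list : any list of numbers
    @param round_to : number of decimal places to keep (defaults to 2)
    @return - A dictionary with the two labeled values"""
    return {"Smallest": round(min(in_list), round_to),
            "Largest": round(max(in_list), round_to)}


# round_to is not passed in, so the default value of 2 is used
range_stats = describe_range([0.123, 4.567, -1.5])
print(range_stats["Smallest"], range_stats["Largest"])

# Override the default by naming the parameter
range_stats_coarse = describe_range([0.123, 4.567, -1.5], round_to=0)
```

The caller looks values up by label instead of having to remember which order they were returned in.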


In [None]:
# Access all numpy functions as np.
import numpy as np

# Question 1: Stats on a list

## Calculate stats on a list

TODO: In the calc_stats_from_list function
- Calculate the mean of the negative and positive values
- Count the total number of negative/positive values
- Store the values in a dictionary

This function calculates the given stats from the list that is passed in. There is test code below this function.

In [10]:
def calc_stats_from_list(in_list):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the list into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_list : any list type
    @return - A dictionary with the desired stats"""

    # These are the stats we're calculating. This is more elegant/useful than creating four variables - it keeps all
    #  of the values in the same place and assigns a meaningful label (key) to them
    dict_ret_stats = {"Mean positive": 0.0, "Mean negative": 0.0, "Count positive": 0, "Count negative": 0}

    # First pass: accumulate the sum and the count of the negative and positive values (zeros are skipped)
    for num in in_list:
        if num < 0:
            dict_ret_stats["Mean negative"] += num
            dict_ret_stats["Count negative"] += 1
        elif num > 0:
            dict_ret_stats["Mean positive"] += num
            dict_ret_stats["Count positive"] += 1

    # Turn each sum into a mean, guarding against dividing by zero
    if dict_ret_stats["Count negative"] > 0:
        dict_ret_stats["Mean negative"] /= dict_ret_stats["Count negative"]
    if dict_ret_stats["Count positive"] > 0:
        dict_ret_stats["Mean positive"] /= dict_ret_stats["Count positive"]

    return dict_ret_stats


### Test code for list

Create the test data and run the tests. Here's another advantage of functions - you can create test data for yourself to make sure the code is working right. Encapsulating the code in a function means you
1. Don't accidentally change the code when switching from the test data to the real data
2. Can make more than one test
3. Can run the tests more than once/all the time to double check that you didn't "break" the code

TODO: 
- Fill in the calc_stats_from_list function above
- Run the cell below - it will print out if your values are incorrect

Note that, below, we'll test this code one last time with randomly generated data

In [14]:
# Test data

test_list_one = [-0.75, -0.25, 1.0 / 3.0, 2.0 / 3.0, 3.0 / 3.0]
test_list_res = calc_stats_from_list(test_list_one)

b_tests_passed = True
# The keys must match the ones used in calc_stats_from_list exactly (they are case-sensitive)
if not np.isclose(test_list_res["Mean positive"], 2.0 / 3.0):
    b_tests_passed = False
    print(f"Mean positive is not correct, should be {2.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.5):
    b_tests_passed = False
    print(f"Mean negative is not correct, should be -0.5, got {test_list_res['Mean negative']}")

if test_list_res["Count positive"] != 3:
    b_tests_passed = False
    print(f"Count positive numbers, should be 3, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    b_tests_passed = False
    print(f"Count negative numbers, should be 2, got {test_list_res['Count negative']}")

if b_tests_passed:
    print("All list tests passed!")


In [15]:
grader.check("list")

NameError: name 'grader' is not defined

# Question 2: Doing it again with a numpy array

## Fill in calc_stats_from_nparray

For this function, assume the input is a numpy array.

TODO: Same as the previous question, but this time do it for a numpy array in calc_stats_from_nparray
- NO **if** statements or **for** loops - do this all with numpy operations

You might find "count_nonzero" useful.
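
If boolean indexing is new to you, here is a minimal sketch of how a comparison on a numpy array can stand in for an **if** inside a **for** loop. The array and the threshold of 1.0 are made up for this example; they are not the assignment's data.

```python
import numpy as np

sample = np.array([-2.0, 0.0, 1.5, 4.0])

# Comparing an array to a number gives an array of True/False, one per element
is_big = sample > 1.0

# True counts as nonzero, so count_nonzero tells you how many elements matched
n_big = np.count_nonzero(is_big)

# Indexing with the True/False array keeps only the elements where it is True
big_values = sample[is_big]

print(is_big, n_big, big_values)
```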

As before, test code is below

In [18]:
def calc_stats_from_nparray(in_nparray):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the array into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_nparray : numpy array
    @return - A dictionary with the desired stats"""

    # Boolean masks: True wherever the element is positive (or negative). Zeros are in neither mask
    is_positive = in_nparray > 0
    is_negative = in_nparray < 0

    # Indexing with a mask keeps only the True elements. Note: np.mean of an empty selection returns nan
    mean_positive = np.mean(in_nparray[is_positive])
    mean_negative = np.mean(in_nparray[is_negative])

    # True counts as nonzero, so count_nonzero counts the elements that matched
    count_positive = np.count_nonzero(is_positive)
    count_negative = np.count_nonzero(is_negative)

    # These are the stats we're calculating. This is more elegant/useful than returning four variables - it keeps all
    #  of the values in the same place and assigns a meaningful label (key) to them
    dict_ret_stats = {"Mean positive": mean_positive,
                      "Mean negative": mean_negative,
                      "Count positive": count_positive,
                      "Count negative": count_negative}
    return dict_ret_stats

### Test code for numpy array
There is a "fancy" way to do this test with the second function without duplicating code, but it's confusing, so... we'll just duplicate it here. This will print out if your function above is returning incorrect values
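
For the curious, one possible version of the "fancy" way is to pass the function being tested in as a parameter, so a single checker works for both. This is only a sketch of that idea; `check_stats` and the stand-in `min_max_stats` are hypothetical names and are not part of the assignment.

```python
import numpy as np


def check_stats(stats_func, data, expected):
    """ Hypothetical helper: call stats_func on data and compare against expected
    @param stats_func : any function that takes data and returns a dictionary
    @param data : the list or numpy array to pass to stats_func
    @param expected : dictionary with the same keys and the correct values
    @return - True if every value matched"""
    result = stats_func(data)
    b_ok = True
    for key, value in expected.items():
        if not np.isclose(result[key], value):
            b_ok = False
            print(f"{key} is not correct, should be {value}, got {result[key]}")
    return b_ok


# A stand-in function to test, so this example runs on its own
def min_max_stats(data):
    return {"Min": min(data), "Max": max(data)}


expected_stats = {"Min": 1.0, "Max": 3.0}
# The same checker is reused for a list and for a numpy array
b_list_ok = check_stats(min_max_stats, [1.0, 2.0, 3.0], expected_stats)
b_array_ok = check_stats(min_max_stats, np.array([1.0, 2.0, 3.0]), expected_stats)
print(b_list_ok, b_array_ok)
```

Notice that `min_max_stats` is passed without parentheses: that hands over the function itself rather than calling it.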

In [19]:
test_nparray_one = np.array(test_list_one)  # Convert the previous test list to a numpy array
test_list_res = calc_stats_from_nparray(test_nparray_one)

b_tests_passed = True
if not np.isclose(test_list_res["Mean positive"], 2.0 / 3.0):
    b_tests_passed = False
    print(f"Mean positive is not correct, should be {2.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.5):
    b_tests_passed = False
    print(f"Mean negative is not correct, should be -0.5, got {test_list_res['Mean negative']}")

if test_list_res["Count positive"] != 3:
    b_tests_passed = False
    print(f"Count positive numbers, should be 3, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    b_tests_passed = False
    print(f"Count negative numbers, should be 2, got {test_list_res['Count negative']}")

if b_tests_passed:
    print("All numpy array tests passed!")

All numpy array tests passed!


In [20]:
grader.check("nparray")

NameError: name 'grader' is not defined

## Check that it all works

TODO: If both your functions are correct then the tests below should only output "Comparison test passed". Otherwise, it will output which values were different

This next cell is a function to generate random values to test with.

In [21]:
# This bit of code will generate a list or numpy array with random positive and negative values.
def create_data(n_data=10, b_ret_numpy=True):
    """ Create a random mix of positive and negative numbers
    @param n_data - how big to make the list/array
    @param b_ret_numpy - return a list or a numpy array
    @return the list or numpy array"""
    my_data = np.random.random_sample(n_data)

    n_to_convert = np.random.randint(low=1, high=n_data-1)
    n_convert = np.random.randint(low=0, high=n_data-1, size=n_to_convert)
    my_data[n_convert] *= -1.0

    if b_ret_numpy is False:
        return list(my_data)
    return my_data


## Actually do the tests

Check by comparing the results of the list-based function against the numpy-based one (doesn't guarantee it's right, 
but...). Try 10 times. 
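
The comparison uses np.isclose rather than ==. Here is a minimal sketch of why, using the classic 0.1 + 0.2 example rather than the assignment's data.

```python
import numpy as np

total = 0.1 + 0.2

# Floating point rounding means the stored sum is not exactly 0.3
exactly_equal = (total == 0.3)

# isclose allows for a tiny tolerance, so it treats the two as the same value
close_enough = np.isclose(total, 0.3)

print(total, exactly_equal, close_enough)
```

The list version adds the numbers up one at a time while numpy may add them in a different order, so the two means can differ in the last few digits even when both are correct.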

In [5]:
# We don't care what the iteration is over so use _ to say "we don't need a variable"
b_tests_passed = True
for _ in range(0, 10):
    # Get some random data
    test_data = create_data()
    test_data_list = list(test_data)  # Notice the cast to a list type

    # Call your two functions with the random data.
    res_list = calc_stats_from_list(test_data_list)
    res_np = calc_stats_from_nparray(test_data)

    # For all four stored values...
    for k, v in res_list.items():
        # Use isclose instead of == because two of these are floating point values, and == is unreliable for
        #  floating point values (tiny rounding differences make two "equal" numbers compare as different)
        # Since we used the same keys (names) in the two different dictionaries, we can pass that key to the other
        #   dictionary
        if not np.isclose(res_np[k], v):
            b_tests_passed = False
            print(f"Returned different values {k}, {v} and {res_np[k]}")
    if np.isclose(res_list["Mean positive"], 0.0):
        b_tests_passed = False
        print(f"List not implemented\n")
    if np.isclose(res_np["Mean positive"], 0.0):
        b_tests_passed = False
        print(f"Numpy not implemented\n")
        
if b_tests_passed:
    print("Comparison test passed")

NameError: name 'create_data' is not defined

In [None]:
grader.check("tests")

## Hours and collaborators
Required for every assignment - fill out before you hand-in.

Listing names and websites helps you to document who you worked with and what internet help you received in the case of any plagiarism issues. You should list names of anyone (in class or not) who has substantially helped you with an assignment - or anyone you have *helped*. You do not need to list TAs.

Listing hours helps us track if the assignments are too long.

In [None]:

# List of names (creates a set)
worked_with_names = {"not filled out"}
# List of URLS (creates a set)
websites = {"not filled out"}
# Approximate number of hours, including lab/in-class time
hours = -1.5


In [None]:
grader.check("hours_collaborators")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Submit through Gradescope, under lecture activity 1 Arrays. Be sure to read the info on the autograder before submitting.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(run_tests=True)