# Lecture goals

1. Understand the benefit of numpy (over lists) for operating over lists of numbers
2. Introduction to numpy-style array operations
3. Dictionaries for data encapsulation
4. Functions for functionality encapsulation
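Here is a rough illustration of goal 1. It doubles every number once with a list comprehension and once with a single numpy operation. The array size (100,000) and repetition count (20) are arbitrary choices. Timings vary by machine, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np

n_values = 100_000
values_list = [float(i) for i in range(n_values)]
values_np = np.arange(n_values, dtype=float)


def double_list(in_list):
    # Plain Python: the interpreter visits every element one at a time
    return [2.0 * x for x in in_list]


def double_np(in_nparray):
    # numpy: one array operation, with the loop running in compiled code
    return 2.0 * in_nparray


# Same answer either way...
assert np.allclose(double_list(values_list), double_np(values_np))

# ...but the array version is typically much faster (exact times depend on your machine)
t_list = timeit.timeit(lambda: double_list(values_list), number=20)
t_np = timeit.timeit(lambda: double_np(values_np), number=20)
print(f"list: {t_list:.4f} s, numpy: {t_np:.4f} s")
```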

## Functions
Functions enable encapsulation of, well, functionality.

They're also a useful mental tool for organizing and structuring your thoughts on how to solve a given problem:
1. They clearly define a bit of code that takes in some inputs, does some computation, then outputs some data.
2. They make it easier to test that code with different inputs.
3. Practicalities: they prevent one of the most common sources of errors, re-using variable names (variables created inside a function only exist inside that function).

It's almost never wrong to encapsulate a bit of code in a function. Each function call adds a tiny bit of overhead to computation time, but functions can greatly reduce debugging time, so it's usually worth it.

Python's function syntax is beautifully designed to make it easy to set default values for parameters and pass back as much data as you want. We'll see more of that later; for this assignment we'll use the power of dictionaries to pass back "labeled" data.
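Here is a minimal sketch of both features, using a hypothetical `summarize` function. The parameters `scale` and `label` have default values that the caller can override by name. The results come back as a dictionary, so each value carries a label.

```python
def summarize(values, scale=1.0, label="data"):
    """Hypothetical example: scale defaults to 1.0 and label to "data" unless the caller overrides them"""
    scaled = [scale * v for v in values]
    # Return several results at once, each under a meaningful key
    return {"Label": label, "Min": min(scaled), "Max": max(scaled), "Count": len(scaled)}


res_default = summarize([3, -1, 2])                               # uses scale=1.0, label="data"
res_custom = summarize([3, -1, 2], scale=2.0, label="doubled")    # overrides both defaults
print(res_default)        # {'Label': 'data', 'Min': -1.0, 'Max': 3.0, 'Count': 3}
print(res_custom["Max"])  # 6.0
```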


```python
# Access all numpy functions as np.
import numpy as np
import otter
grader = otter.Notebook()
```

### Function 1 (you fill in the body of the function)

This function calculates the stats described in its docstring from the list that is passed in. Test code follows the function.

```python
def calc_stats_from_list(in_list):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the list into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_list : any list type
    @return - A dictionary with the desired stats"""

    # These are the stats we're calculating. This is more elegant/useful than creating four variables - it keeps all
    #  of the values in the same place and assigns a meaningful label (key) to them
    dict_ret_stats = {"Mean positive": 0, "Mean negative": 0, "Count positive": 0, "Count negative": 0}

    # BEGIN SOLUTION
    # Note that I would normally do this with 4 variables, and then create the dictionary at the end and return it
    #   (which would save a lot of dictionary accesses), but I'm doing it this way because it makes setting up the
    #   automatic grading software easier
    # Zero is counted as positive (anything that is not < 0)
    for n in in_list:
        if n < 0:
            dict_ret_stats["Mean negative"] += n
            dict_ret_stats["Count negative"] += 1
        else:
            dict_ret_stats["Mean positive"] += n
            dict_ret_stats["Count positive"] += 1

    # Cool trick you can do when the dictionary keys are strings: build the key names on the fly
    for s in ["positive", "negative"]:
        # Only divide if there was at least one number of this sign; otherwise the mean stays 0
        if dict_ret_stats["Count " + s] > 0:
            dict_ret_stats["Mean " + s] /= dict_ret_stats["Count " + s]
    # END SOLUTION
    return dict_ret_stats
```
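The solution comment mentions the version the author would normally write. That version uses four local variables and builds the dictionary once at the end. Below is a sketch of that variant. The function name `calc_stats_from_list_locals` is hypothetical. It keeps the same keys and the same convention that zero counts as positive.

```python
def calc_stats_from_list_locals(in_list):
    """Same stats as calc_stats_from_list, but accumulated in local variables (hypothetical variant)"""
    sum_pos, sum_neg = 0.0, 0.0
    count_pos, count_neg = 0, 0
    for n in in_list:
        if n < 0:
            sum_neg += n
            count_neg += 1
        else:  # zero is counted as positive, as in calc_stats_from_list
            sum_pos += n
            count_pos += 1

    # Build the dictionary once at the end; a group with no numbers gets a mean of 0
    return {"Mean positive": sum_pos / count_pos if count_pos > 0 else 0,
            "Mean negative": sum_neg / count_neg if count_neg > 0 else 0,
            "Count positive": count_pos,
            "Count negative": count_neg}


print(calc_stats_from_list_locals([-1, 2, -3, 4, -5]))
# {'Mean positive': 3.0, 'Mean negative': -3.0, 'Count positive': 2, 'Count negative': 3}
```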


### Test code for list

Create the test list and check the results. Here's another advantage of functions: you can create test data for yourself to make sure the code is working right. Encapsulating the code in a function means that:
1. You don't accidentally change the code when switching from the test data to the real data
2. You can make more than one test
3. You can run the tests more than once (or all the time) to double check that you didn't "break" the code

```python
# Test data
test_list_one = [-0.75, -0.25, 1.0 / 3.0, 2.0 / 3.0, 3.0 / 3.0]
test_list_res = calc_stats_from_list(test_list_one)
if not np.isclose(test_list_res["Mean positive"], 2.0 / 3.0):
    print(f"Mean positive is not correct, should be {2.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.5):
    print(f"Mean negative is not correct, should be -0.5, got {test_list_res['Mean negative']}")

# Compare numbers with != (value), not "is not" (object identity)
if test_list_res["Count positive"] != 3:
    print(f"Count positive is not correct, should be 3, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    print(f"Count negative is not correct, should be 2, got {test_list_res['Count negative']}")

print("Done tests list")
```

Done tests list


### Function 2 (you fill in the body of the function)

For this function, assume the input is a numpy array.

NO **if** statements or **for** loops; do this all with numpy operations.

You might find `np.count_nonzero` useful.

As before, test code is below.
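This small example shows the numpy pieces that can replace the `if`/`for` logic. A comparison produces a boolean mask, `np.count_nonzero` counts its `True` entries, and indexing with the mask selects the matching elements. The array `a` is made-up example data, deliberately different from the test data.

```python
import numpy as np

a = np.array([3, -1, 0, -4, 5])

b_negative = a < 0                      # elementwise comparison: False, True, False, True, False
print(np.count_nonzero(b_negative))     # 2 (True counts as nonzero)
print(a[b_negative])                    # [-1 -4] (boolean indexing keeps the True positions)
print(a[a >= 0].mean())                 # mean of 3, 0 and 5
```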

```python
def calc_stats_from_nparray(in_nparray):
    """ Calculate mean of positive numbers, mean of negative numbers
    Separate the array into positive and negative numbers. Calculate the mean of each. Return those means, along with
     how many positive/negative numbers there were
    @param in_nparray : numpy array
    @return - A dictionary with the desired stats"""

    # These are the stats we're calculating. This is more elegant/useful than creating four variables - it keeps all
    #  of the values in the same place and assigns a meaningful label (key) to them
    dict_ret_stats = {"Mean positive": 0, "Mean negative": 0, "Count positive": 0, "Count negative": 0}

    # BEGIN SOLUTION
    # Boolean masks pick out each group; zero counts as positive, matching calc_stats_from_list
    b_positive = in_nparray >= 0
    b_negative = in_nparray < 0
    dict_ret_stats["Count positive"] = np.count_nonzero(b_positive)
    dict_ret_stats["Count negative"] = np.count_nonzero(b_negative)
    # Sum divided by count (at least 1): an empty group gives a mean of 0 like the list version, instead of
    #   the nan (and RuntimeWarning) that np.mean returns for an empty array
    dict_ret_stats["Mean positive"] = np.sum(in_nparray[b_positive]) / max(dict_ret_stats["Count positive"], 1)
    dict_ret_stats["Mean negative"] = np.sum(in_nparray[b_negative]) / max(dict_ret_stats["Count negative"], 1)
    # END SOLUTION
    return dict_ret_stats
```

### Test code for numpy array
There is a "fancy" way to run these tests on the second function without duplicating code, but it's confusing, so we'll just duplicate the test code here.
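Here is one possible version of the "fancy" approach, though it may not be the one the author had in mind. A Python function is itself a value, so the checks can live in one helper that takes the stats function as a parameter. `check_stats` and `stand_in_stats` are hypothetical names. The stand-in exists only so this cell runs on its own; in the notebook you would pass `calc_stats_from_list` or `calc_stats_from_nparray`.

```python
import numpy as np


def check_stats(calc_stats_fn, test_data):
    """Run the shared checks against any stats function; returns a list of failure messages (empty means pass)"""
    expected = {"Mean positive": 2.0 / 3.0, "Mean negative": -0.5,
                "Count positive": 3, "Count negative": 2}
    res = calc_stats_fn(test_data)
    failures = []
    for key, want in expected.items():
        if not np.isclose(res[key], want):
            failures.append(f"{key} is not correct, should be {want}, got {res[key]}")
    return failures


# Hypothetical stand-in so this cell runs by itself
def stand_in_stats(data):
    arr = np.asarray(data, dtype=float)
    return {"Mean positive": arr[arr >= 0].mean(), "Mean negative": arr[arr < 0].mean(),
            "Count positive": np.count_nonzero(arr >= 0), "Count negative": np.count_nonzero(arr < 0)}


test_list_one = [-0.75, -0.25, 1.0 / 3.0, 2.0 / 3.0, 3.0 / 3.0]
# The same checks run on a list and on a numpy array without duplicating any test code
for data in [test_list_one, np.array(test_list_one)]:
    for msg in check_stats(stand_in_stats, data):
        print(msg)
print("Done shared tests")
```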

```python
test_nparray_one = np.array(test_list_one)  # Convert the previous test list to a numpy array
test_list_res = calc_stats_from_nparray(test_nparray_one)
if not np.isclose(test_list_res["Mean positive"], 2.0 / 3.0):
    print(f"Mean positive is not correct, should be {2.0/3.0}, got {test_list_res['Mean positive']}")

if not np.isclose(test_list_res["Mean negative"], -0.5):
    print(f"Mean negative is not correct, should be -0.5, got {test_list_res['Mean negative']}")

# Compare numbers with != (value), not "is not" (object identity)
if test_list_res["Count positive"] != 3:
    print(f"Count positive is not correct, should be 3, got {test_list_res['Count positive']}")

if test_list_res["Count negative"] != 2:
    print(f"Count negative is not correct, should be 2, got {test_list_res['Count negative']}")

print("Done tests numpy array")
```

Done tests numpy array


### Create data to test with

Ok, now do it for real. This bit of code will generate a list or numpy array with random positive and negative values.

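```python
def create_data(n_data=10, b_ret_numpy=True):
    """ Create a random mix of positive and negative numbers
    @param n_data - how big to make the list/array (must be at least 3)
    @param b_ret_numpy - True returns a numpy array, False returns a list
    @return the list or numpy array"""
    # Uniform random values in [0, 1)
    my_data = np.random.random_sample(n_data)

    # Pick how many entries to negate (at least 1), then which ones. The indices are drawn with replacement,
    #   and the last index is never drawn, so the last value always stays non-negative
    n_to_convert = np.random.randint(low=1, high=n_data-1)
    n_convert = np.random.randint(low=0, high=n_data-1, size=n_to_convert)
    my_data[n_convert] *= -1.0

    if not b_ret_numpy:
        return list(my_data)
    return my_data
```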
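`create_data` draws the indices to negate with replacement, so the same index can come up more than once. The sketch below shows what that means for `my_data[n_convert] *= -1.0`. With fancy indexing, a repeated index is still negated only once, so the number of negative values can be smaller than `n_to_convert`. `np.multiply.at` appears only for contrast; `create_data` does not use it.

```python
import numpy as np

idx = np.array([0, 2, 2])        # index 2 appears twice, as randint can produce

a = np.array([1.0, 2.0, 3.0, 4.0])
a[idx] *= -1.0                   # buffered: each selected element is negated once, duplicates or not
print(a)                         # values: -1, 2, -3, 4

b = np.array([1.0, 2.0, 3.0, 4.0])
np.multiply.at(b, idx, -1.0)     # unbuffered: applied once per occurrence, so index 2 flips back
print(b)                         # values: -1, 2, 3, 4
```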


### Actually do the tests
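Check the two functions by comparing their results against each other (agreement doesn't guarantee they're right, but it's a good sign). Try 10 times with different random data.

We don't care about the iteration number, so we use _ as the loop variable to say "we don't need this variable".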

```python
for _ in range(0, 10):
    # Get some random data (a numpy array by default)
    test_data = create_data()
    # Cast the same data to a list type for the list version
    test_data_list = list(test_data)

    # Call the two functions on the same data
    res_list = calc_stats_from_list(test_data_list)
    res_np = calc_stats_from_nparray(test_data)

    # For all four stored values...
    for k, v in res_list.items():
        # Use isclose instead of == because two of these are floating point values, and == is unreliable for
        #  floating point values (round-off error)
        # Since we used the same keys (names) in the two different dictionaries, we can pass that key to the other
        #   dictionary
        if not np.isclose(res_np[k], v):
            print(f"Returned different values {k}, {v} and {res_np[k]}")

print("Done test")
```

Done test
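This short example backs up the floating-point comment in the loop above. Round-off makes `==` unreliable even for simple sums. `np.isclose` instead compares within a tolerance, which defaults to `rtol=1e-05` and `atol=1e-08`.

```python
import numpy as np

x = 0.1 + 0.2
print(x == 0.3)            # False: x is actually 0.30000000000000004
print(np.isclose(x, 0.3))  # True: equal within the default relative/absolute tolerance
```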


```python
# HIDDEN
test_data = [-1, 2, -3, 4, -5]
res = calc_stats_from_list(test_data)
assert np.isclose(res["Mean positive"], 3)
assert np.isclose(res["Mean negative"], -3)
assert np.isclose(res["Count positive"], 2)
assert np.isclose(res["Count negative"], 3)
```

```python
# HIDDEN
test_data = np.array([-1, 2, -3, 4, -5])
res = calc_stats_from_nparray(test_data)
assert np.isclose(res["Mean positive"], 3)
assert np.isclose(res["Mean negative"], -3)
assert np.isclose(res["Count positive"], 2)
assert np.isclose(res["Count negative"], 3)
```