# Functions

First concept: All of the data/variables/inputs are passed INTO the function. You can create as many variables inside the function as you want, but only the values in the return statement are passed back to the calling code.

Implications:

When you are designing your function, you should mentally (or on paper) write down:
- Input: What data do you need?
- Output: What data are you going to return?
- Body of function: How is the output related to the input?
  - Mathematical function
  - Data manipulation (e.g., reorganize the data into a dictionary, sort it, etc.)
  - Classification/decision making (e.g., does the data contain a zero? How many 1's are in the data? etc.)
- Practical: Scoping. The variables IN the function are not the same as the variables OUTSIDE the function, even if they have the same name. One way to think about it is that every variable in the function is actually called **func_name.a**, not **a**, and every variable outside the function is called **global.a**, not **a**.

This isn't quite true, but it's pretty close.
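As a concrete sketch of the Input / Output / Body design (and of scoping), here is a small classification function for the "how many 1's in the data" case. The name func_count_ones is hypothetical, chosen only for illustration. The "pretty close" caveat is this: a variable created inside a function is local to it, but a function that never assigns to a name can still *read* a global variable with that name, which is the trap covered in the global variables section at the end.

```python
import numpy as np

def func_count_ones(data):
    """ Count how many entries of the data array are exactly 1
    @param data - a numpy array with some numbers in it (Input)
    @returns the number of entries equal to 1, as an int (Output)"""
    # Body: a classification step (data == 1 gives an array of True/False values)...
    is_one = data == 1
    # ...followed by counting the True entries
    count = int(np.count_nonzero(is_one))
    return count

count = -1  # a *different* variable that just happens to share the name "count"
n_ones = func_count_ones(np.array([1, 0, 1, 2, 1]))
print(n_ones, count)  # 3 -1: the function's count did not overwrite this one
```

Note that is_one also does not exist outside the function; only the returned value comes back.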


In [2]:
# Don't forget the imports
import numpy as np

In [None]:
# TODO: We will come back to this later. This SHOULD fail the first time you execute it because, well, the variable
# doesn't exist yet.
print(f"{a_global_variable}")

## Function definition

It's a good idea to put each function in its own cell, and to put all of the functions at the top of the Jupyter Notebook.

In this case, we're going to put the function here and the "test" code in the following cell.

In [4]:
# First try it and see: input --- output, scope of variables in a function versus not
def func_calc_min_max(data):
    """ Return the min and max of the data array
    @param data - a numpy array with some numbers in it
    @returns min_value, max_value of the numpy array"""
    min_value = np.min(data)
    max_value = np.max(data)

    # TODO: Uncomment the following line. Does this generate an error? Why or why not?
    #   Hint: where is my_data defined, and is it passed in as a parameter?
    # m_v = np.min(my_data)
    return min_value, max_value

In [None]:
# Calling function - note that it's really a terrible idea to have code like this at the "top"/global level, but
#   I'm trying to keep things simple
my_data = np.random.uniform(-0.2, 0.6, (100, 2))  # Generate some random numbers between -0.2 and 0.6

out_min, out_max = func_calc_min_max(my_data)
print(f"Output {out_min} {out_max}")

TODO 1: Try uncommenting the line under the TODO in the func_calc_min_max cell, then re-evaluate both the function and the code that calls it. What happened? Why? Put it back the way it was before doing the next TODO.

In [None]:
# TODO 2: Try the following line. What error does this generate, and why?
#   Hint: What happens to variables in func_calc_min_max when the function exits? Do min_value and max_value exist
#   outside of func_calc_min_max?
print(f"Min value {min_value}, max_value {max_value}")

In [None]:
# This doesn't generate an error. What happened to the min/max return value? See the hint above.
func_calc_min_max(my_data)

In [None]:
# This will fail with the first common error you get with functions - incorrect number of parameters
#  TODO: To fix, you need to pass it only 1 argument, not 2... but the real question usually is: Why did I think it needed
#    2 parameters?
func_calc_min_max(my_data, 3)

In [18]:
# A function that will tell you that data is the "wrong" type
def func_calc_min_max_with_check(data):
    """ Return the min and max of the data array
    @param data - a numpy array with some numbers in it
    @returns min_value, max_value of the numpy array"""
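    try:
        min_value = np.min(data)
        max_value = np.max(data)
    except Exception:
        # Catch ordinary errors (e.g., numpy's type error for a string), but not Ctrl-C / KeyboardInterrupt
        min_value = 0
        max_value = 0
        print(f"Expected data to be a numpy array of numbers, but got {type(data).__name__}")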

    return min_value, max_value

In [None]:
# This will fail with the second common error you get with functions - incorrect parameter type
#  TODO: Look at the modified func_calc_min_max_with_check code above - see the "try" "except"? This
#    was added to provide you (the user) with some idea of what went wrong before you get too far into the
#    actual function
func_calc_min_max_with_check("my_data")

In [None]:
# This is what happens without the check
#  TODO: Scroll through the (very lengthy) error message below. Notice the first green arrow: that is where in this
#    cell the error occurred.
#    Find the second arrow: this is the line in func_calc_min_max where the error occurred... but that line of code is not wrong...
#    Scroll a bit further down and you'll see another File ...; this is the numpy min function.
#    There's yet another arrow below that: this is where inside min the error actually happened.
#    That line is a rather uninformative UFuncTypeError, which is a fancy way of saying "that data type doesn't work".
func_calc_min_max("my_data")
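The check in func_calc_min_max_with_check prints a warning and quietly returns 0, 0, which the calling code might not notice. A common alternative is to check the input at the top of the function and raise an error whose message says exactly what was wrong. This is a sketch: func_calc_min_max_strict is a hypothetical name, and treating only numeric numpy arrays as valid input is an assumption you may want to loosen.

```python
import numpy as np

def func_calc_min_max_strict(data):
    """ Return the min and max of the data array
    @param data - a numpy array with some numbers in it
    @returns min_value, max_value of the numpy array
    Raises a TypeError with a readable message if data is not a numeric numpy array"""
    # Check the input BEFORE doing any work, so the error points at the real problem
    if not isinstance(data, np.ndarray) or not np.issubdtype(data.dtype, np.number):
        raise TypeError(f"Expected a numeric numpy array, got {type(data).__name__}")
    return np.min(data), np.max(data)

out_min, out_max = func_calc_min_max_strict(np.array([3.0, -1.0, 2.0]))
print(f"Output {out_min} {out_max}")  # Output -1.0 3.0

try:
    func_calc_min_max_strict("my_data")
except TypeError as err:
    print(err)  # Expected a numeric numpy array, got str
```

Raising (rather than returning 0, 0) means a bad call stops immediately, and the traceback ends at a message written for you instead of deep inside numpy.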

# Parameter passing

Python's parameter specification and parameter passing are very sophisticated. One of the best things about Python is that you can assign values directly to parameters by name (keyword arguments). Side-stepping all of the fancy things you can do, I'd recommend the following two rules:

- If the function only has one parameter, just pass it in (see the examples above).
- If the function has more than one parameter, pass by name.
  - Why: One of the most common (and sometimes most difficult to debug) problems is that you *thought* you were passing zero to the **x** parameter, but you actually sent it to the **y** one.
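A minimal sketch of the x/y mix-up, using a hypothetical two-parameter function. Passing by name makes the call independent of argument order. Putting a bare `*` in the parameter list (standard Python syntax for keyword-only parameters) goes one step further: Python refuses positional calls entirely, so the mix-up cannot happen.

```python
def func_describe_point(x, y):
    """ Return a string describing the point (x, y)"""
    return f"x={x}, y={y}"

# Positional: easy to swap by accident, and Python can't tell
print(func_describe_point(0, 5))      # x=0, y=5
print(func_describe_point(5, 0))      # x=5, y=0 (oops, swapped)
# By name: the order you write them in no longer matters
print(func_describe_point(y=5, x=0))  # x=0, y=5

# A bare * makes every parameter after it keyword-only, so callers MUST pass by name
def func_describe_point_kw(*, x, y):
    """ Same as func_describe_point, but x and y can only be passed by name"""
    return f"x={x}, y={y}"

print(func_describe_point_kw(x=0, y=5))  # x=0, y=5
try:
    func_describe_point_kw(0, 5)  # positional call is rejected
except TypeError as err:
    print(f"TypeError: {err}")
```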


In [None]:
# Next try it and see, parameter ordering
def func_lots_of_params(a, b, c):
    """ Function with three parameters
    @param a - should be a string
    @param b - should be a list
    @param c - should be a dictionary
    @return None - no return value"""

    print("a should be a string: convert to upper case " + a.upper())

    print(f"b has {len(b)} elements")
    for i, b_elem in enumerate(b):
        print(f"{i}th element of b is {b_elem}, should be {b[i]}")

    for k, v in c.items():
        print(f"key {k} has value {v}")
    return None


Now call that function with the "correct" parameters...

In [None]:
a_string = "Hello, I'm a string"
b_list = [3.0, "hi", 10.0]
c_dict = {"key 1":"value 1", "Key 2":0.3}

# This one works and does what you expect (prints out the values of the variables)
func_lots_of_params(a_string, b_list, c_dict)


Now call that function with the "incorrect" parameters...

In [None]:
# TODO: Try this one and see if you can predict what error will be generated. 
#   Hint: does a list have an "upper" method like a string does?
func_lots_of_params(b_list, a_string, c_dict)

To prevent this problem you can pass by parameter name. Note that the parameters do NOT need to be in the
correct order! (Although you probably do want to keep the order the same anyway...)

In [None]:
# TODO: Rearrange the parameter order. Does it matter? What happens if you take out one of the name= parts (try removing a=, then try removing c= instead)?
func_lots_of_params(a=a_string, c=c_dict, b=b_list)

In [None]:
# You don't have to set the parameters to variables, either, you can just set them to values:
func_lots_of_params(a="mystring", b=[0.2, 0.3], c={"as":0.2, "bs":0.3})

# Global variables and breaking function encapsulation

Also known as: Why I dislike Jupyter Notebooks and writing code outside of functions

It can be really tempting to use global variables rather than passing values in as parameters. It saves typing, but it will almost always lead to debugging headaches.
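Here is one concrete headache, sketched with a hypothetical counter variable. Reading a global inside a function works, but as soon as the function also *assigns* to that name, Python treats the name as local for the entire function body, so the read fails with an UnboundLocalError. Passing the value in and returning the new value avoids the problem.

```python
counter = 0  # a global variable

def func_increment_bad():
    """ Try to add 1 to the global counter (this does NOT work)"""
    # Because counter is assigned on this line, Python treats counter as a local variable
    #   everywhere in this function, so the read on the right-hand side happens before it has a value
    counter = counter + 1
    return counter

try:
    func_increment_bad()
except UnboundLocalError as err:
    print(f"UnboundLocalError: {err}")

def func_increment_good(value):
    """ Return value + 1
    @param value - a number
    @return value plus one"""
    return value + 1

counter = func_increment_good(counter)
print(f"counter is now {counter}")  # counter is now 1
```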

In [None]:
# Every variable assigned at the top level of a notebook (outside of any function) is accessible *everywhere*
#   after it's been created. Really crazy: if you execute this cell, then go back up to the top of the notebook
#   and run a cell that has in it
#   print(f"{a_global_variable}")
#   ...it will work. Why?
a_global_variable = 3.0

# TODO: Execute this cell, then go to the top of the notebook and try the print statement. Did it work?
#   Should it, really?
#   What happens if you do Kernel -> Restart and run all? Will it still work?

TODO: Execute the next two cells. This works because **a_global_variable** is defined above, and when you call *func_bad_use_global_variable* it uses that variable.

In [20]:
def func_bad_use_global_variable(a_np_array):
    """ Multiply the numpy array by the global variable
    @param a_np_array - a numpy array
    @return a numpy array"""
    print(f"Global variable {a_global_variable}")
    return a_np_array * a_global_variable


In [None]:
a_np_array = np.linspace(0, 1, 6)
b_output_array = func_bad_use_global_variable(a_np_array)
print(f"Does what is expected {b_output_array}")

a_global_variable = 10

OK, technically this IS doing what is expected - unless you didn't notice that **a_global_variable** was set to 10 at the end of the previous cell... what if you thought it was still set to 3 like it was in the initial declaration?

In [None]:
# TODO: Understand why this call returns different values than the previous one
b_output_array_bad = func_bad_use_global_variable(a_np_array)
print(f"Oops: new output array {b_output_array_bad}")
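The fix is the one suggested at the start of this section: pass the multiplier in as a parameter, so every call shows exactly which value is used. A sketch, with func_scale_array and scale as hypothetical names:

```python
import numpy as np

def func_scale_array(a_np_array, scale):
    """ Multiply the numpy array by scale
    @param a_np_array - a numpy array
    @param scale - the number to multiply every element by
    @return a numpy array"""
    return a_np_array * scale

a_np_array = np.linspace(0, 1, 6)
# The multiplier is visible right here in the call; no hidden global can change it
out_3 = func_scale_array(a_np_array, scale=3.0)
out_10 = func_scale_array(a_np_array, scale=10.0)
print(f"Scaled by 3: {out_3}")
print(f"Scaled by 10: {out_10}")
```

Re-running these cells in any order gives the same answers, because the function no longer depends on whatever a global variable happens to hold.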