# Week 5

### Table of Contents

1. [Question 1](#bullet1)
2. [Question 2](#bullet2)
3. [How to name functions](#bullet3)
4. [Comments](#bullet4)
5. [Practice](#bullet5)
6. [Looking at function source code](#bullet6)
7. [Conditional Statements](#bullet7)
8. [Types of function args](#bullet8)
9. [Stop functions and send an error](#bullet9)
10. [Args and Kwargs](#bullet10)
11. [Return statements](#bullet11)
12. [Lazy evaluation](#bullet12)
13. [Lexical scoping](#bullet13)

In [202]:
## importing necessary libraries
import numpy as np
import pandas as pd
import scipy.stats  # makes scipy.stats available for mean_ci() below

## 1.  Question: What does the following code do?<a class="anchor" id="bullet1"></a>

In [203]:
rng = np.random.default_rng(seed=3252)

a = rng.standard_normal(10)
b = rng.standard_normal(10)
c = rng.standard_normal(10)
d = rng.standard_normal(10)

the_dict = {'a':a,'b':b,'c':c,'d':d}

df = pd.DataFrame(the_dict)

## 2. Question: What does the following code do?<a class="anchor" id="bullet2"></a>

In [204]:
df['a'] = (df['a'] - min(df['a'])) / (max(df['a']) - min(df['a']))
df['b'] = (df['b'] - min(df['b'])) / (max(df['b']) - min(df['b']))
df['c'] = (df['c'] - min(df['c'])) / (max(df['a']) - min(df['c']))
df['d'] = (df['d'] - min(df['d'])) / (max(df['d']) - min(df['d']))

In [205]:
# answer: rescaling the data to have values between 0-1
# issue: there is a mistake in the code above, can you spot it?
# point: writing a function will simplify this code and reduce the chances for an error

In [206]:
#In the previous code we are trying to rescale data to values between 0 and 1
#Let's look at one line of the code.
df['a'] = (df['a'] - min(df['a'])) / (max(df['a']) - min(df['a']))

# Question: How many inputs does the code above have?
# Answer: One! df['a']

In [207]:
# let's write a function that simplifies the above. Before we do, note the general syntax for writing functions.
# Key parts of a function:
# 1. You need to name the function (e.g. rescale01)
# 2. You need inputs (or arguments) (e.g. function(x, y, z))
# 3. You need code in the body of the function

def function_name(arg1, arg2):
    return None  # Do something with arg1 and arg2, then return the result!

Your function can have any number of inputs or arguments.  
__Tip 1:__ When we write functions, it's a good idea to give the arguments short, generic names (e.g. x), since the function should work for any input of the right type.

In [208]:
# Our function requires a single input, a pd.Series object, which we will name x
def rescale01(x):
    return (x - min(x)) / (max(x)-min(x))

This is a good start. But notice that we call min(x) twice. This is not ideal for two reasons. First, if you ever want to replace min(x) with something else, you'll have to change it everywhere it appears. Second, whenever you need the same value more than once, your code will be more readable (and faster!) if you compute it once and store the result in a variable.  
__Tip 2:__ Within your function, keep your code concise and avoid repeating the same computation.

In [209]:
# rather than calling min() and max() several times, store their results in variables to shorten the code
def rescale01(x):
    min_ = x.min()  # min and max are built-in function names, so "_" is appended to avoid shadowing them
    max_ = x.max()
    return (x - min_) / (max_ - min_)

In [210]:
# does our function work?
print(df['a'] == rescale01(df['a']))

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
Name: a, dtype: bool


In [211]:
print(df['b'] == rescale01(df['b']))

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
Name: b, dtype: bool


In [212]:
print(df['c'] == rescale01(df['c']))

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9     True
Name: c, dtype: bool


In [213]:
print(df['d'] == rescale01(df['d']))

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
Name: d, dtype: bool


In [214]:
# Compare this to the original code.
# question: is it easier to understand what the point of the code is?
# question: is it easier to read/less complex?

# answers:  YES!!
# also note, the code will be easier to change in the future
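`rescale01` takes a single Series, so pandas' `DataFrame.apply` can run it on every column at once. This is the DRY version of the four copy-pasted lines in Question 2. Below is a minimal sketch on freshly generated data. The names `df_raw` and `df_scaled` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

def rescale01(x):
    min_ = x.min()
    max_ = x.max()
    return (x - min_) / (max_ - min_)

# Fresh example data (hypothetical): 10 rows, columns a-d like df above
rng = np.random.default_rng(seed=3252)
df_raw = pd.DataFrame(rng.standard_normal((10, 4)), columns=["a", "b", "c", "d"])

# One line replaces the four copy-pasted lines from Question 2:
# DataFrame.apply calls rescale01 once per column
df_scaled = df_raw.apply(rescale01)

print(df_scaled.min())  # every column's minimum is 0.0
print(df_scaled.max())  # every column's maximum is 1.0
```

The same column name can no longer be mistyped in one of four places, which is exactly the bug hidden in Question 2.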

In [215]:
# example: the following returns incorrect values because of the np.inf element:
# max(x) is inf, so every finite value is divided by inf (giving 0.0), and inf / inf gives NaN
x = pd.Series([1, 2, 3, 4, np.inf])
rescale01(x)

0    0.0
1    0.0
2    0.0
3    0.0
4    NaN
dtype: float64

#### Challenge: Fix the function above so that if np.inf values are passed in, the function won't fail

In [216]:
# Let's go back to the original function and recode it to fix the problem:
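Here is one possible fix. It is a sketch that assumes infinite values should simply be left out when computing the range. `min_` and `max_` are computed from the finite entries only, using `np.isfinite`:

```python
import numpy as np
import pandas as pd

def rescale01(x):
    # Compute the range from finite values only, so that np.inf (or -np.inf)
    # cannot push min_ or max_ to infinity
    finite = x[np.isfinite(x)]
    min_ = finite.min()
    max_ = finite.max()
    return (x - min_) / (max_ - min_)

x = pd.Series([1, 2, 3, 4, np.inf])
print(rescale01(x))
# finite values now map to 0.0, 0.333..., 0.666..., 1.0; the inf entry stays inf
```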


In [217]:
# The above ensures that our range includes numeric values (not Infs)

Coding best practice: __"Don't Repeat Yourself" (aka "DRY")__. The more repetition there is in your code, the higher the chance of errors.

## 3. How to name functions<a class="anchor" id="bullet3"></a>

In [218]:
# Name should be as short as possible and should make clear what function does

# Verbs better than nouns if possible

# Too short
f()

# Not a verb, or descriptive
my_awesome_function()

# Long, but clear
impute_missing()
collapse_years()


#snake case versus camel case
#snake case is when you write the function name in the following way:

snake_case()

# don't do this
add.period() 

# Camel case is written in the following manner:
# but it's 2023, so not in vouge anymore
camelCase()

# whichever you choose, be consistent rather than moving back and forth
# i.e.- don't do the following:

def col_mins(x,y): 
    return min(x,y)

def rowMaxes(x,y): 
    return max(x,y)



# If you have a group of functions working towards a similar goal
# try to keep the naming conventions in the same order as arguments

# Good
input_select()
input_checkbox()
input_text()

# Not so good
select_input()
checkbox_input()
text_input()

NameError: name 'f' is not defined

In [219]:
# Moreover, try not to create functions or variables whose names overwrite
# ("shadow") functions or variables that are already part of Python.

# int = 0  # don't do this: it would hide the built-in int() for the rest of the session
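The sketch below shows why shadowing a built-in causes trouble. It uses `list` only as an illustration: once the name is reassigned, calling it fails. `del` removes our variable, and Python then falls back to the built-in.

```python
# Shadowing a built-in: after this assignment, the name `list` refers to our
# variable, not the built-in list type
list = [1, 2, 3]

try:
    list("abc")  # fails: we are now calling the variable, not the type
except TypeError as err:
    print("TypeError:", err)

# Deleting our variable makes the name fall back to the built-in again
del list
print(list("abc"))  # ['a', 'b', 'c']
```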

## 4. Comments<a class="anchor" id="bullet4"></a>

In [220]:
# Now a few comments about comments...

# Comments should explain the why of the code rather than the what or the how
# the what or the how should be obvious in the code itself

# to break code into chunks that are easy to read, use lines of "-" or "="

# Load data ------------------------------------------------------------
# (code entered here)

# Plot data ============================================================
# (code entered here)

## 5. Practice<a class="anchor" id="bullet5"></a>

In [221]:
# Practice naming functions: Read the source code for each of the following 
# three functions, puzzle out what they do, and then brainstorm better names.

In [222]:
def prefix_match(string, prefix):
    temp = string.find(prefix)  # index of the first match, or -1 if there is none
    if temp != 0:  # a prefix match must start at index 0
        return False
    else:
        return True

In [20]:
prefix_match('dog', 'do')

True

In [21]:
prefix_match('dog', 'st')

False

In [22]:
def remove_last(x):
    if len(x) <= 1:
        return None
    return x[:-1]

In [23]:
remove_last("dog")

'do'

In [24]:
remove_last([1, 2, 3, 4, 5])

[1, 2, 3, 4]

In [25]:
def match_length(x, y):
    lst = []
    
    for i in range(len(x)):
        lst.append(y)
    return lst

In [26]:
match_length("dog", "d")

['d', 'd', 'd']

## 6. Looking at function source code<a class="anchor" id="bullet6"></a>

In [89]:
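# In IPython/Jupyter, "?" shows a function's signature and docstring
?np.mean

In [90]:
# "??" also shows the source code when it is written in Python, e.g. for our own function
rescale01??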
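`?` and `??` are IPython/Jupyter features. In plain Python, the standard-library `inspect` module can retrieve the source of functions defined in `.py` files. Functions implemented in C have no Python source to show. The sketch below uses `statistics.mean`, chosen only as an example of a pure-Python function.

```python
import inspect
import statistics

# inspect.getsource returns the source text of a function defined in a .py file
source = inspect.getsource(statistics.mean)
print(source[:200])  # the first few lines of the standard library's mean()

# inspect.signature shows just the arguments
print(inspect.signature(statistics.mean))
```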

## 7. Conditional Statements<a class="anchor" id="bullet7"></a>

In [29]:
# looks like the following:

condition = True  # placeholder: any expression that evaluates to True or False

if condition:
    pass  # code executed when condition is True
else:
    pass  # code executed when condition is False

In [30]:
# example of a function using if then statements ...

def temp1(x):
    if x % 2 == 0:
        return True
    return False

# why does this work without an else clause?
# what does this function do?

In [31]:
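# can combine several conditions with `and` / `or`
# (`&` and `|` also work on plain bools, but they are mainly used for
# element-wise comparisons on NumPy arrays and pandas Series)

def temp2(x):
    if (x % 2 == 0) and (x % 5 == 0):
        return True
    return False

def temp3(x):
    if (x % 2 == 0) or (x % 5 == 0):
        return True
    return False
# what is the difference between these functions?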
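`&` and `|` matter in pandas and NumPy, where they combine conditions element by element. `and` and `or` instead expect a single True/False. Here is a small sketch on a made-up Series `s`:

```python
import pandas as pd

s = pd.Series([10, 4, 15, 7])

# & and | work element-wise on boolean Series; each comparison needs parentheses
both = (s % 2 == 0) & (s % 5 == 0)
either = (s % 2 == 0) | (s % 5 == 0)
print(both.tolist())    # [True, False, False, False]
print(either.tolist())  # [True, True, True, False]

# `and` needs a single True/False, so it raises on a multi-element Series
try:
    (s % 2 == 0) and (s % 5 == 0)
except ValueError as err:
    print("ValueError:", err)
```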

In [32]:
# multiple conditions
first_condition = False   # placeholders: any expressions that evaluate to True or False
second_condition = True

if first_condition:
    pass  # do this
elif second_condition:
    pass  # do this
else:
    pass  # do this

In [92]:
vec = [1, 3, 5, 7, 9]
if len(vec) == 0:
    print("empty list")
elif sum(vec) == 1:
    print("my elements sum to 1")
else:
    print("my elements do not sum to 1")

my elements do not sum to 1


In [34]:
# any() and all() are also helpful if your condition returns a list of booleans and
# you need to collapse it to a single True/False value

In [35]:
any([True, False, False])

True

In [36]:
all([True, False, False])

False

In [37]:
any([i > np.mean(vec) for i in vec])

True

In [38]:
all([i > np.mean(vec) for i in vec])

False
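NumPy arrays let you write the same check without a list comprehension. Comparing an array to a number gives a boolean array, and its `.any()` and `.all()` methods collapse that array to a single value. A minimal sketch reusing the values of `vec`:

```python
import numpy as np

vec = np.array([1, 3, 5, 7, 9])
above_mean = vec > vec.mean()  # element-wise comparison: an array of booleans

print(above_mean)
print(above_mean.any())  # True
print(above_mean.all())  # False
```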

##### Question: what is the difference between any and all?

## 8. Types of function args<a class="anchor" id="bullet8"></a>

In [93]:
# two broad types: data args and args that control details of the computation

# example:

x = pd.Series([1, 24, 4325, 8432, 34])

def my_mean(lst, condition=False):
    temp = lst  # use the argument, not the global x
    if condition:
        temp = [i for i in lst if i > 1000]
    return sum(temp)/len(temp)


my_mean(x, condition=True)

# lst is the data argument; condition defaults to False
# and controls a detail of the computation (whether to keep only values > 1000)

# data arguments should come first, followed by args that control details of the computation

6378.5

In [97]:
# another example:
# compute confidence interval around mean using normal approximation
def mean_ci(x, conf=0.95):
    # np.std defaults to ddof=0 (population SD); np.std(x, ddof=1) gives the sample SD
    se = np.std(x) / np.sqrt(len(x))
    alpha = 1 - conf
    return np.mean(x) + se * scipy.stats.norm.ppf([alpha/2, 1 - alpha/2])

In [115]:
print("X    Freq")
print(x)
print("\n")
print("Mean: "+  str(my_mean(x)))
print("The 95% CI is (" + str(mean_ci(x, conf=0.95)) + ")")

X    Freq
0       1
1      24
2    4325
3    8432
4      34
dtype: int64


Mean: 2563.2
The 95% CI is ([-395.13852072 5521.53852072])


##### Question: Why is this 95% CI for the mean so wide?

In [125]:
x = np.random.uniform(1,100,1000)
print("Mean: "+  str(round(my_mean(x), 3)))
print("The 95% CI is (" + str(mean_ci(x, conf=0.95)) + ")")

Mean: 50.123
The 95% CI is ([48.31894306 51.92760113])


In [126]:
x = np.random.uniform(1,100,10)
print("Mean: "+  str(round(my_mean(x), 3)))
print("The 95% CI is (" + str(mean_ci(x, conf=0.95)) + ")")

Mean: 56.15
The 95% CI is ([41.32501528 70.9754169 ])


## 9. Stop functions and send an error<a class="anchor" id="bullet9"></a>

In [130]:
len([4,5,6])

3

In [169]:
# It's good practice to check important preconditions, and throw an error if they are not true:
def wt_mean(x, w):
    assert len(x) == len(w), "x and w must be the same length!!!"
    return sum(np.multiply(x, w))/sum(w)


In [172]:
x = [1,2,3]
w = [1,1,1]
print("The weighted mean of x is: " + str(wt_mean(x, w)))

The weighted mean of x is: 2.0


In [170]:
x = [1,2,3]
w = [1,1,1,1]
wt_mean(x, w)

AssertionError: x and w must be the same length!!!
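Python skips `assert` statements when it runs with the `-O` (optimize) flag. For checks on user input, it is therefore common to raise an exception explicitly. Below is a sketch of the same precondition using `raise ValueError`. The choice of `ValueError` and the message wording are our own.

```python
import numpy as np

def wt_mean(x, w):
    # Unlike assert, an explicit raise is not skipped when Python runs with -O
    if len(x) != len(w):
        raise ValueError(
            f"x and w must be the same length (got {len(x)} and {len(w)})"
        )
    return sum(np.multiply(x, w)) / sum(w)

try:
    wt_mean([1, 2, 3], [1, 1, 1, 1])
except ValueError as err:
    print("ValueError:", err)  # ValueError: x and w must be the same length (got 3 and 4)
```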

###### Question: What does the wt_mean() call below return? 

In [None]:
x = [1,2,3]
w = [1,1,10]
wt_mean(x, w)
print(" ")

## 10. Args and Kwargs<a class="anchor" id="bullet10"></a>

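__*args__ is used when you need to pass a variable number of positional arguments into a function. In a function definition, *args collects the extra positional arguments into a tuple. At a call site, the same * works as an __unpacking operator__, spreading an iterable into separate arguments.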

In [190]:
def my_sum(*args):
    result = 0
    # Iterating over the Python args tuple
    for x in args:
        result += x
    return result
print(my_sum(1, 2, 3))
print("=======\n" + str(my_sum(1, 2, 3, 4, 5)))

6
15


__**kwargs__ works like *args, except that it accepts any number of __named__ (keyword) arguments and collects them into a dictionary. At a call site, the double asterisk ** is a __dictionary unpacking operator__.

In [193]:
def concatenate(**kwargs):
    result = ""
    # Iterating over the Python kwargs dictionary
    for arg in kwargs.values():
        result += arg
    return result

print(concatenate(b="Python ", c="Is ", d="Great", e="!"))

Python Is Great!


In [158]:
def foo1(a, *args):
    print(f"a is: {a}")
    print(f"args are: {args}") # extra positional arguments are packed into the tuple args

In [159]:
foo1(1)

a is: 1
args are: ()


In [160]:
foo1(1, 2, 3, 4)

a is: 1
args are: (2, 3, 4)


In [161]:
def foo2(a, **kwargs):
    print(f"a is: {a}")
    print(f"kwargs are: {kwargs}") # extra keyword arguments are packed into the dictionary kwargs

In [162]:
foo2(1)

a is: 1
kwargs are: {}


In [50]:
foo2(1, b=2, c=3, d=4)

a is: 1
kwargs are: {'b': 2, 'c': 3, 'd': 4}


In [51]:
# can use them together
def foo(a, *args, **kwargs):
    print(f"a is: {a}")
    print(f"args are: {args}")
    print(f"kwargs are: {kwargs}")

In [52]:
foo(1, 2, 3, d=4, e=5)

a is: 1
args are: (2, 3)
kwargs are: {'d': 4, 'e': 5}


In [194]:
# print_list.py
my_list = [1, 2, 3]
print(my_list)

[1, 2, 3]


In [199]:
my_list = [1, 2, 3]
print(my_list)   # print list
print(*my_list)  # unpacked list: equivalent to print(1, 2, 3)

[1, 2, 3]
1 2 3


In [200]:
def my_sum(*args):
    result = 0
    for x in args:
        result += x
    return result

list1 = [1, 2, 3]
list2 = [4, 5]
list3 = [6, 7, 8, 9]

print(my_sum(*list1, *list2, *list3))

45
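The ** operator also works at the call site: unpacking a dictionary passes each key-value pair as a named argument. Here is a minimal sketch. The concatenate function from above is redefined so the block is self-contained, and the dictionary words is made up.

```python
def concatenate(**kwargs):
    result = ""
    for arg in kwargs.values():
        result += arg
    return result

words = {"b": "Python ", "c": "Is ", "d": "Great", "e": "!"}

# **words expands to concatenate(b="Python ", c="Is ", d="Great", e="!")
print(concatenate(**words))  # Python Is Great!
```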


## 11. Return statements <a class="anchor" id="bullet11"></a>

Throughout all the functions we have seen, there have been return statements. What do they do?

In [53]:
# When you create a function, you usually want it to send a result back to the caller.
# return does this, and it also ends the function immediately.

In [201]:
# example
def check(x):
    result = ''
    if x > 0:
        result = 'Positive'
    elif x < 0:
        result = 'Negative'
    else:
        result = 'Zero'
    return result

In [55]:
check(1)

'Positive'
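Two details about return are easier to see in code. First, return exits the function at once, which allows an early-return version of check. Second, a function that never reaches a return statement gives back None. In the sketch below, no_return is a made-up name.

```python
def check(x):
    # Each return ends the function immediately, so no result variable is needed
    if x > 0:
        return 'Positive'
    if x < 0:
        return 'Negative'
    return 'Zero'

def no_return(x):
    x * 2  # computed, but never handed back

print(check(-3))     # Negative
print(no_return(5))  # None: a function without a return statement returns None
```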

## 12. Lazy evaluation <a class="anchor" id="bullet12"></a>

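In a nutshell, lazy evaluation means that an expression is evaluated only when its value is needed, not when it is created. Note that Python evaluates function arguments eagerly, before the function body runs. Laziness shows up instead in constructs such as generators and the short-circuiting `and`/`or` operators.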
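The sketch below contrasts eager function arguments with a lazy generator expression. The helpers `loud` and `ignore_second` are hypothetical names introduced only for this illustration.

```python
def loud(value):
    print(f"evaluating {value}")
    return value

def ignore_second(a, b):
    return a  # b is never used inside the function

# Function arguments are evaluated eagerly: "evaluating 2" is printed
# even though ignore_second never uses b
ignore_second(loud(1), loud(2))

# A generator expression is lazy: nothing is printed when it is created...
gen = (loud(i) for i in range(3))
# ...and each element is computed only when it is requested
first = next(gen)  # prints "evaluating 0"
```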

## 13. Lexical scoping <a class="anchor" id="bullet13"></a>

You don't need to understand this to write functions in Python, but it can't hurt.

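The environment (scope) of a function controls how Python finds the value associated with a name. The key concept here is "lexical scoping". Free variables in a function are variables that are used in the function but not defined in it. Python looks them up in the scopes surrounding the place where the function was *defined*, not where it is called. The search order is the local scope, then any enclosing function scopes, then the module's global scope, and finally the built-ins (the "LEGB" rule).

One consequence is that a function relying on a global variable depends on the current state of your Python session (e.g. your notebook). It can return different results if that variable changes between calls.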

Usually not a problem, but it is something to be aware of.
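The sketch below illustrates both points. `add_y` looks up the free variable `y` in the global scope, and that lookup happens each time it runs. `make_adder` shows a closure remembering a variable from its enclosing function. Both names are illustrative.

```python
y = 10

def add_y(x):
    # y is a free variable: it is not defined inside add_y, so Python looks it up
    # in the scope where add_y was defined (here, the module's global scope)
    return x + y

print(add_y(1))  # 11

# The lookup happens each time the function runs, so changing the global
# changes the result
y = 100
print(add_y(1))  # 101

def make_adder(n):
    # The inner function remembers n from the enclosing call (a closure)
    def adder(x):
        return x + n
    return adder

add_5 = make_adder(5)
print(add_5(1))  # 6
```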