# <span style="color:darkblue"> Lecture 8: Local/Global and Apply </span>

<font size = "5">

In the previous lecture we ...

- Worked through the definition of functions
- Illustrated some examples

In this lecture, we will ...

- Discuss the scope of variables in functions (local/global)
- Apply functions to multiple elements in a data frame
- Introduce ".py" files

## <span style="color:darkblue"> I. Import Libraries </span>

In [1]:
# the "pandas" library is for manipulating datasets

import pandas as pd
import numpy as np


## <span style="color:darkblue"> II. Local/Global Variables </span>

<font size="5"> 

Most of the variables we've defined so far are "global"

- Stored in working environment
- Can be referenced in other parts of the notebook



<font size = "5">
Example:

In [2]:
message_hello = "hello"
number3       = 3

# Global means that they are stored in the environment and can be accessed in other cells of this notebook

In [3]:
print(message_hello + " world")
print(number3 * 2)

hello world
6


<font size = "5">

Any "global" variable can be referenced inside functions

- However, this can lead to mistakes
- Preferably, include **all** the inputs as parameters

<font size = "5">

$f(x,y,z) = x + y + z$

In [4]:
# Correct Example:
def fn_add_recommended(x,y,z):
    return(x + y + z)

print(fn_add_recommended(x = 1, y = 2, z = 5))
print(fn_add_recommended(x = 1, y = 2, z = 10))

# Local variables are only available temporarily, while the function runs

8
13


In [5]:
# Example that runs (but not recommended)
# Python looks up any name that is not defined inside the function
# (here "z") in the working environment
def fn_add_notrecommended(x,y):
    return(x + y + z)

z = 5
print(fn_add_notrecommended(x = 1, y = 2))
z = 10
print(fn_add_notrecommended(x = 1, y = 2))

# z must be defined in the global environment, while x and y are local parameters

8
13
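One way this can go wrong: if the global variable was never defined, the function fails when it is called. The sketch below is only an illustration; the names `fn_add_missing_global` and `z_missing` are hypothetical and not part of the lecture. The error is caught so that the cell still runs.

```python
# Hypothetical example: "z_missing" is never defined anywhere
def fn_add_missing_global(x, y):
    # Python looks for z_missing inside the function, then in the
    # global environment, and finds nothing
    return x + y + z_missing

try:
    fn_add_missing_global(x = 1, y = 2)
    error_raised = False
except NameError as error:
    error_raised = True
    print("NameError:", error)
```

Passing `z` as a parameter avoids this problem, because the function then states every input it needs.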


<font size ="5">

Variables defined inside functions are "local"

- Stored "temporarily" while running
- Includes: Parameters + Intermediate variables


<font size = "5">

Local variables supersede global variables

In [6]:
# This is an example where we define a quadratic function
# (x,y) are both local variables of the function
# 
# When we call the function, only the arguments matter.
# Any intermediate value inside the function (like y) is
# discarded once the function returns.

def fn_square(x):
    y = x**2
    return(y)

x = 5
y = -5

print(fn_square(x = 1))

print(x)
print(y)

# The x and y inside "fn_square" are local to the function
# The x and y defined outside are in the global environment
# The two sets are separate, so the global values are unchanged

1
5
-5


<font size = "5">

Local variables are **not** stored in the working environment

In [7]:
# The following code assigns the global variables x and y
# Calling the function with a different x does not change them

x = 5
y = 4

print("Example 1:")
print(fn_square(x = 10))
print(x)
print(y)

print("Example 2:")
print(fn_square(x = 20))
print(x)
print(y)

# The global x stays at 5 even though we pass x = 10 and x = 20 to the function

Example 1:
100
5
4
Example 2:
400
5
4
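A quick way to check that local variables are not stored: look for the name in the working environment after the call. This is a minimal sketch; `fn_cube` and `intermediate_value` are hypothetical names chosen so they do not clash with the variables above.

```python
def fn_cube(x):
    # "intermediate_value" is a local variable of fn_cube
    intermediate_value = x**3
    return intermediate_value

result = fn_cube(x = 2)
print(result)            # 8

# globals() lists the names stored in the working environment
exists_globally = "intermediate_value" in globals()
print(exists_globally)   # False
```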


<font size = "5">

To permanently modify a global variable inside a function, use the "global" statement

In [8]:
def modify_x():
    global x
    x = x + 5

x = 1
# Now, running the function will permanently increase x by 5.
modify_x()
print(x)

# "global x" tells Python that x inside the function refers to the global variable

6


In [9]:
modify_x()
print(x)

# Be careful about what is being stored/ not being stored


11
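For comparison, here is what happens without the "global" line, plus an alternative that avoids it. This is a sketch under assumed names (`counter`, `increase_without_global`, `increase_by_five` are hypothetical). Assigning to a name inside a function makes it local, so reading it before the assignment raises an error.

```python
counter = 1  # hypothetical global variable

def increase_without_global():
    # "counter" is assigned here, so Python treats it as local.
    # Reading it on the right-hand side fails: it has no local value yet
    counter = counter + 5

try:
    increase_without_global()
    error_name = None
except UnboundLocalError as error:
    error_name = type(error).__name__
    print(error_name)    # UnboundLocalError

# Alternative: return the new value and reassign it outside
def increase_by_five(value):
    return value + 5

counter = increase_by_five(counter)
print(counter)           # 6
```

Returning the value keeps every change visible at the place where the function is called, which is often easier to follow than modifying globals.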


<font size = "5">

Try it yourself:

- What happens if we run "modify_x" twice?
- What happens if we add "global y" inside "fn_square"?

In [10]:
# Write your own code here
modify_x()
modify_x()
print(x)

# You should get a value of 21

def fn_square(x):
    global y
    y = x**2
    return(y)

fn_square(x=5)  
print(y) 

fn_square(x=25)  
print(y) 

# It changed the value of y as it is a global variable


21
25
625


## <span style="color:darkblue"> III. Operations over data frames (apply/map) </span>


<font size = "5">

Create an empty data frame

In [11]:
data  = pd.DataFrame()

# Empty dataframe

In [12]:
print(data)

Empty DataFrame
Columns: []
Index: []


<font size = "5">

Add variables

In [13]:
# The following are lists with values for different individuals
# "age" is the number of years
# "num_underage_siblings" is the total number of underage siblings
# "num_adult_siblings" is the total number of adult siblings

data["age"] = [18,29,15,32,6]
data["num_underage_siblings"] = [0,0,1,1,0]
data["num_adult_siblings"] = [1,0,0,1,0]


In [14]:
data

|   | age | num_underage_siblings | num_adult_siblings |
|---|---|---|---|
| 0 | 18 | 0 | 1 |
| 1 | 29 | 0 | 0 |
| 2 | 15 | 1 | 0 |
| 3 | 32 | 1 | 1 |
| 4 | 6 | 0 | 0 |


<font size = "5">

Define functions

In [15]:
# The first two functions return True/False depending on age constraints
# The third function returns the sum of two numbers
# The fourth function returns a string with the age bracket

fn_iseligible_vote = lambda age: age >= 18
# The argument is age and the output is the value of (age >= 18)

fn_istwenties = lambda age: (age >= 20) & (age < 30)
# Seeing whether they are in the 20s

fn_sum = lambda x,y: x + y

def fn_agebracket(age):
    if (age >= 18):
        status = "Adult"
    elif (age >= 10) & (age < 18):
        status = "Adolescent"
    else:
        status = "Child"
    return(status)
# The parameter is age
# The body checks which age bracket applies and returns it as a string

<font size = "5">
Applying functions with one argument: <br>

```python
 apply(myfunction)
 ```
 - Takes a dataframe series (a column vector) as an input
 - Computes function separately for each individual


In [16]:
# The function "apply" will pass each element to the function and collect the returned values
# It is similar to running a "for-loop" over each individual

data["can_vote"] = data["age"].apply(fn_iseligible_vote)
data["in_twenties"] = data["age"].apply(fn_istwenties)
data["age_bracket"] = data["age"].apply(fn_agebracket)

# "can_vote" is a new variable: for each age, apply checks whether the individual can vote (True/False)
# "in_twenties" is a new variable: for each age, apply checks whether the individual is in their twenties (True/False)
# In a lambda, the input is on the left of the colon and the returned expression is on the right
# We pass the function name to apply without parentheses or arguments

# NOTE: The following code also works:
# data["can_vote"]    = data["age"].apply(lambda age: age >= 18)
# data["in_twenties"] = data["age"].apply(lambda age: (age >= 20) & (age < 30))


display(data)

# It is like a loop but it is very convenient 


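|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket |
|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult |
| 1 | 29 | 0 | 0 | True | True | Adult |
| 2 | 15 | 1 | 0 | False | False | Adolescent |
| 3 | 32 | 1 | 1 | True | False | Adult |
| 4 | 6 | 0 | 0 | False | False | Child |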
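To make the comparison with a "for-loop" concrete, the sketch below computes the same True/False values twice, once with apply and once with an explicit loop. It rebuilds a small series of ages so that it runs on its own; `ages`, `result_apply` and `list_results` are names chosen for this illustration.

```python
import pandas as pd

ages = pd.Series([18, 29, 15, 32, 6])
fn_iseligible_vote = lambda age: age >= 18

# Version 1: apply
result_apply = ages.apply(fn_iseligible_vote)

# Version 2: explicit for-loop over each element
list_results = []
for age in ages:
    list_results.append(fn_iseligible_vote(age))

print(list(result_apply))   # [True, True, False, True, False]
print(list_results)         # [True, True, False, True, False]
```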


<font size = "5">

Creating a new variable

In [17]:
data['new_var'] = data['age'].apply(lambda age: age >= 18)
data

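|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | new_var |
|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True |
| 1 | 29 | 0 | 0 | True | True | Adult | True |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False |
| 3 | 32 | 1 | 1 | True | False | Adult | True |
| 4 | 6 | 0 | 0 | False | False | Child | False |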


<font size = "5">

Dropping an existing variable

In [18]:
data = data.drop(columns=['new_var'])
data

# drop(columns = ...) removes a variable from the dataset
# Note: "apply" on a column is intended for functions with one argument

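|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket |
|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult |
| 1 | 29 | 0 | 0 | True | True | Adult |
| 2 | 15 | 1 | 0 | False | False | Adolescent |
| 3 | 32 | 1 | 1 | True | False | Adult |
| 4 | 6 | 0 | 0 | False | False | Child |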


<font size = "5">

Mapping functions with one or more arguments <br>

**Definition:** The ```map()``` function executes a <br>
specified function for each item in an iterable <br>
(such as a list or an array). <br>
 The item is sent to the function as a parameter.

```python
list(map(myfunction, list1,list2, ....))
```
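As a small illustration with plain lists: map() on its own returns a lazy "map object", which is why the pattern above wraps it in list(). The names `list1`, `list2` and `list_sums` below are made up for this sketch.

```python
fn_sum = lambda x, y: x + y

list1 = [1, 2, 3]
list2 = [10, 20, 30]

# map() alone does not compute a list of values yet
map_object = map(fn_sum, list1, list2)

# list() runs through the map object and collects the results:
# fn_sum(1,10), fn_sum(2,20), fn_sum(3,30)
list_sums = list(map_object)
print(list_sums)   # [11, 22, 33]
```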

In [19]:
list(map(fn_iseligible_vote, data["age"]))

# map takes the function first, followed by the list(s) of values to use as its arguments

[True, True, False, True, False]

In [20]:
# Repeat the above example with map
# We use list() to convert the output to a list
# The first argument of map() is a function
# The remaining arguments are the lists whose elements are passed to the function

data["can_vote_map"] = list(map(fn_iseligible_vote, data["age"]))

data

|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | can_vote_map |
|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True |
| 1 | 29 | 0 | 0 | True | True | Adult | True |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False |
| 3 | 32 | 1 | 1 | True | False | Adult | True |
| 4 | 6 | 0 | 0 | False | False | Child | False |


In [21]:
# In this example, the function takes two arguments, so map receives two columns

data["num_siblings"] = list(map(fn_sum,
                                data["num_underage_siblings"],
                                data["num_adult_siblings"]))


# Use fn_sum, then pass the two columns to add element by element

data

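|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | can_vote_map | num_siblings |
|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True | 1 |
| 1 | 29 | 0 | 0 | True | True | Adult | True | 0 |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False | 1 |
| 3 | 32 | 1 | 1 | True | False | Adult | True | 2 |
| 4 | 6 | 0 | 0 | False | False | Child | False | 0 |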


In [22]:
list(map(fn_sum, data["num_underage_siblings"],
         data["num_adult_siblings"]))

[1, 0, 1, 2, 0]

In [23]:
data["num_underage_siblings"] + data["num_adult_siblings"]

# A simpler way to add them up: pandas adds the two columns element by element

0    1
1    0
2    1
3    2
4    0
dtype: int64
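The map version and the direct sum should give the same numbers. The sketch below checks this on a small rebuilt copy of the two sibling columns; `data_example`, `list_map` and `series_sum` are names assumed for this illustration.

```python
import pandas as pd

data_example = pd.DataFrame({"num_underage_siblings": [0, 0, 1, 1, 0],
                             "num_adult_siblings":    [1, 0, 0, 1, 0]})

# Version 1: map with a two-argument function
list_map = list(map(lambda x, y: x + y,
                    data_example["num_underage_siblings"],
                    data_example["num_adult_siblings"]))

# Version 2: add the columns directly
series_sum = (data_example["num_underage_siblings"] +
              data_example["num_adult_siblings"])

same_result = (list_map == series_sum.tolist())
print(same_result)   # True
```

For simple arithmetic the direct sum is shorter; map is more useful when the function has logic (such as if/else) that cannot be written as a single column operation.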

In [24]:
data["num_underage_siblings"]

0    0
1    0
2    1
3    1
4    0
Name: num_underage_siblings, dtype: int64

In [25]:
data["num_adult_siblings"]

0    1
1    0
2    0
3    1
4    0
Name: num_adult_siblings, dtype: int64

<font size = "5">

<span style="color:darkgreen"> Recommended! </span>

- Arguments can be split into multiple lines!
- Start a separate line after a comma
- The PEP 8 style guide recommends lines of at most 79 characters

In [26]:
data["num_siblings"] = list(map(fn_sum,
                                data["num_underage_siblings"],
                                data["num_adult_siblings"]))

<font size = "5">

Try it yourself!

- Write a function checking whether num_siblings $\ge$ 1
- Add a variable to the dataset called "has_siblings"
- Assign True/False to this variable using "apply()"

In [27]:
# Write your own code

data["Morethan_siblings"] = data["num_siblings"].apply(
    lambda num_siblings: num_siblings >= 1)

# OR, define the function first and then apply it

fn_has_siblings = lambda num_siblings: num_siblings >= 1

data["has_siblings"] = data["num_siblings"].apply(fn_has_siblings)

display(data)

|   | age | num_underage_siblings | num_adult_siblings | can_vote | in_twenties | age_bracket | can_vote_map | num_siblings | Morethan_siblings | has_siblings |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 1 | True | False | Adult | True | 1 | True | True |
| 1 | 29 | 0 | 0 | True | True | Adult | True | 0 | False | False |
| 2 | 15 | 1 | 0 | False | False | Adolescent | False | 1 | True | True |
| 3 | 32 | 1 | 1 | True | False | Adult | True | 2 | True | True |
| 4 | 6 | 0 | 0 | False | False | Child | False | 0 | False | False |


<font size = "5">

Try it yourself!

- Read the car dataset "data_raw/features.csv"
- Create a function that tests whether mpg $\ge$ 29
- Add a variable "mpg_above_29" which is True/False if mpg $\ge$ 29
- Store the new dataset to "data_clean/features.csv"


In [28]:
# Write your own code

car = pd.read_csv("data_raw/features.csv")

# Returns True if the fuel efficiency (mpg) is at least 29
fn_mpg_above_29 = lambda mpg: mpg >= 29

car["mpg_above_29"] = car["mpg"].apply(fn_mpg_above_29)

car.to_csv("data_clean/features.csv")

display(car)

|   | mpg | cylinders | displacement | horsepower | weight | acceleration | vehicle_id | mpg_above_29 |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | C-1689780 | False |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | B-1689791 | False |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | P-1689802 | False |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | A-1689813 | False |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | F-1689824 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | F-1694103 | False |
| 394 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | V-1694114 | True |
| 395 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | D-1694125 | True |
| 396 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | F-1694136 | False |
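For a simple threshold like this one, comparing the column directly gives the same True/False values as apply. The sketch below uses a tiny made-up data frame (not the real "features.csv" file) with a few mpg values; `car_example` and the column names ending in "_apply" and "_direct" are hypothetical.

```python
import pandas as pd

# Small made-up sample for illustration only
car_example = pd.DataFrame({"mpg": [18.0, 15.0, 44.0, 32.0, 28.0]})

# Version 1: apply with a lambda function
car_example["mpg_above_29_apply"] = car_example["mpg"].apply(
    lambda mpg: mpg >= 29)

# Version 2: compare the whole column at once
car_example["mpg_above_29_direct"] = car_example["mpg"] >= 29

columns_match = (car_example["mpg_above_29_apply"].tolist() ==
                 car_example["mpg_above_29_direct"].tolist())
print(columns_match)   # True
```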


<font size = "5">

Try it yourself!

- Map can also be applied to simple lists!
- Create a lambda function with arguments {fruit,color}.
- The function returns the string <br>
" A {fruit} is {color}"
- Create the following two lists:

``` list_fruits  = ["banana","strawberry","kiwi"] ```

``` list_colors  = ["yellow","red","green"] ```
- Use the list(map()) function to output a list with the form

In [29]:
# Write your own code
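
One possible solution, as a sketch: the function name `fn_fruitcolor` and the output name `list_sentences` are choices made here, not names required by the exercise. The sentence is written as "A {fruit} is {color}" without a leading space.

```python
list_fruits = ["banana", "strawberry", "kiwi"]
list_colors = ["yellow", "red", "green"]

# Lambda function with two arguments: fruit and color
fn_fruitcolor = lambda fruit, color: "A " + fruit + " is " + color

# map pairs up the elements of the two lists, one pair at a time
list_sentences = list(map(fn_fruitcolor, list_fruits, list_colors))
print(list_sentences)
# ['A banana is yellow', 'A strawberry is red', 'A kiwi is green']
```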






## <span style="color:darkblue"> IV. (Optional) External Scripts </span>

<font size = "5">

".ipynb" files ...

- Markdown + python code
- Great for interactive output!

".py" files ...

- Python (only) script
- Used for specific tasks
- Why? Split code into smaller, more manageable files



<font size = "5">

<table><tr>
<td style = "border:0px"> <img src="figures/screenshot_py_functions.png" alt="drawing" width="300"/>  </td>
<td style = "border:0px">

File with functions

 </td>
</tr></table>

**A module is just a Python file that ends with the <br>
.py extension, and a folder that contains <br>
 modules becomes a package!**




<font size = "5">

We can import functions into the working <br>
environment from a file

In [30]:

import scripts.example_functions as ef

<font size = "5">

We reference them using the alias

In [31]:
x = 1
print(ef.fn_quadratic(1))
print(ef.fn_quadratic(5))

ef.message_hello("Juan")



1
25


'hi Juan'
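The contents of "scripts/example_functions.py" are only shown as a screenshot, so the following is a guess at what such a file could contain, chosen to be consistent with the output above. It is not the actual file.

```python
# Hypothetical contents of a file like "scripts/example_functions.py"
# (an assumption that matches the printed output, not the real script)

def fn_quadratic(x):
    # Returns the square of x
    return x**2

def message_hello(name):
    # Returns a short greeting as a string
    return "hi " + str(name)

print(fn_quadratic(1))         # 1
print(fn_quadratic(5))         # 25
print(message_hello("Juan"))   # hi Juan
```

Saving definitions like these in a ".py" file inside a folder, and then importing that file with an alias, is what makes them available as `ef.fn_quadratic` and `ef.message_hello`.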


<font size = "5">

<table><tr>
<td style = "border:0px"> <img src="figures/screenshot_py_variables.png" alt="drawing" width="300"/>  </td>
<td style = "border:0px">

File with variables

- Storing values/settings
- Variables are global <br>
(can be referenced later)

</td>
</tr></table>

<font size = "5">

We can also import and reference variables

In [32]:
import scripts.example_variables as ev

In [33]:
# Importing with an alias does not overwrite the value of alpha:
# "alpha" (defined here) and "ev.alpha" (from the file) are separate

alpha = 1
print(alpha)
print(ev.alpha)

1
5


<font size = "5">

We can also use ``` from ``` and ``` * ``` <br>
to import variables directly into the working <br>
environment. This overwrites existing variables <br>
with the same name (such as alpha)

In [34]:
from scripts.example_variables import *

print(alpha)
print(beta)
print(gamma)
print(delta)


5
10
20
100
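The same behavior can be seen with Python's built-in "math" module, used here as a stand-in for the script file. This is a sketch: an alias-style import keeps the module's variables separate, while a "from" import replaces a variable with the same name. The name `pi_before` is only used to record the earlier value.

```python
import math

pi = 3               # a variable defined in the notebook
pi_before = pi
print(pi)            # 3
print(math.pi)       # 3.141592653589793 (kept separate under "math.")

# Importing the name directly overwrites the notebook variable
from math import pi
print(pi)            # 3.141592653589793
```

This is why importing with an alias is usually the safer choice: it is always clear which file a variable comes from.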
