### *If you are using Google Colab, first run the code cell below. To run a cell, click in the cell and then click the arrow that appears on its left side.*

In [None]:
!wget "https://raw.githubusercontent.com/aGitHasNoName/lambda/master/plots.csv"

# <br>lambda functions

## <br>Common use #1 that you shouldn't do

### Naming a lambda function - NO

In [1]:
def add_two(a, b):
    c = a + b
    return c

In [2]:
add_two(5, 6)

11

In [3]:
def add_two(a, b):
    return a + b

In [4]:
add_two(5, 6)

11

In [5]:
add_two = lambda a, b: a + b

In [6]:
add_two(5, 6)

11
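One reason to avoid naming a lambda is that the function never learns its own name, which makes error messages harder to read. Here is a small sketch of the difference; the names `add_two_def` and `add_two_lambda` are made up for this comparison and are not used elsewhere in the notebook.

```python
def add_two_def(a, b):
    return a + b

add_two_lambda = lambda a, b: a + b

# Both versions compute the same thing...
print(add_two_def(5, 6), add_two_lambda(5, 6))   # 11 11

# ...but only the def version records its name,
# which is what appears in tracebacks and in repr().
print(add_two_def.__name__)      # add_two_def
print(add_two_lambda.__name__)   # <lambda>
```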

### <br><br>Exercise 1

In [7]:
capitalize_word = lambda word: word.lower().capitalize()
capitalize_word("POTATO CHIPS")

'Potato chips'

Rewrite the lambda function above as a regular function called `capitalize_word2`:

In [8]:
def capitalize_word2(word):
    return word.lower().capitalize()

In [9]:
capitalize_word2("POTATO CHIPS")

'Potato chips'

### <br>Exercise 2

In [10]:
def remove_year(phrase):
    return phrase.rstrip("2020")
remove_year("receiptMarch2020")

'receiptMarch'

Rewrite the function above as a lambda function:

In [11]:
remove_year2 = lambda phrase: phrase.rstrip("2020")

In [12]:
remove_year2("receiptMarch2020")

'receiptMarch'
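A side note on `rstrip()`: it works for this example, but it treats its argument as a set of characters to strip, not as a suffix. The sketch below shows where that can surprise you. The helper `remove_suffix` is a hypothetical name introduced here for illustration only.

```python
# rstrip("2020") removes ANY trailing '2' or '0' characters
print("receiptMarch2020".rstrip("2020"))   # receiptMarch
print("invoice2002".rstrip("2020"))        # invoice
print("room200".rstrip("2020"))            # room

# A hypothetical helper that removes the suffix only when it matches exactly
def remove_suffix(phrase, suffix):
    if suffix and phrase.endswith(suffix):
        return phrase[:-len(suffix)]
    return phrase

print(remove_suffix("receiptMarch2020", "2020"))   # receiptMarch
print(remove_suffix("room200", "2020"))            # room200
```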

## <br><br>Common use #2 that you shouldn't do

### <br>Using `map()` and lambda to do something to every item in a list

`map()` is a function that applies the same function to every item in a group (such as a list).

In [13]:
lunchSides = ["POTATO CHIPS", "mac 'n cheese", "POTATO SALAD", "French Bread"]

In [14]:
cleaned_sides = map(lambda word: word.lower().capitalize(), lunchSides)
print(cleaned_sides)

<map object at 0x105624f98>


<br><br>The `map()` function creates a map object, which only produces its items when something asks for them. If we want a usable list, we need to pass all that code to the `list()` function:

In [15]:
cleaned_sides = list(map(lambda word: word.lower().capitalize(), lunchSides))
print(cleaned_sides)

['Potato chips', "Mac 'n cheese", 'Potato salad', 'French bread']
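One more thing to watch for with map objects: they can only be read once. The sketch below reuses `lunchSides` from above; the names `cleaned`, `first_pass`, and `second_pass` are just for this illustration.

```python
lunchSides = ["POTATO CHIPS", "mac 'n cheese", "POTATO SALAD", "French Bread"]

cleaned = map(lambda word: word.lower().capitalize(), lunchSides)

first_pass = list(cleaned)    # consumes every item in the map object
second_pass = list(cleaned)   # nothing is left to read

print(first_pass)
print(second_pass)            # []
```

A list (from `list()` or from a list comprehension) does not have this problem, because it stores its items.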


#### <br><br>Use a list comprehension instead:

In [16]:
cleaned_sides = [word.lower().capitalize() for word in lunchSides]
print(cleaned_sides)

['Potato chips', "Mac 'n cheese", 'Potato salad', 'French bread']


### <br><br>Using `filter()` and lambda to filter a list

In [17]:
lunchSides = ["POTATO CHIPS", "mac 'n cheese", "POTATO SALAD", "French Bread"]

In [18]:
potato_sides = list(filter(lambda word: "potato" in word.lower(), lunchSides))
print(potato_sides)

['POTATO CHIPS', 'POTATO SALAD']


#### <br><br>Use a list comprehension instead:

In [19]:
potato_sides = [word for word in lunchSides if "potato" in word.lower()]
print(potato_sides)

['POTATO CHIPS', 'POTATO SALAD']


#### <br>Plus, with a list comprehension, we can **filter** AND **map** in the same line of code:

In [20]:
potato_sides = [word.lower().capitalize() for word in lunchSides if "potato" in word.lower()]
print(potato_sides)

['Potato chips', 'Potato salad']
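For comparison, here is roughly what the same filter-and-map step looks like with `filter()`, `map()`, and two lambdas. The variable names `with_functions` and `with_comprehension` are made up for this sketch.

```python
lunchSides = ["POTATO CHIPS", "mac 'n cheese", "POTATO SALAD", "French Bread"]

# filter() runs first (innermost call), then map() cleans what is left
with_functions = list(
    map(lambda word: word.lower().capitalize(),
        filter(lambda word: "potato" in word.lower(), lunchSides))
)

# The list comprehension reads left to right in one expression
with_comprehension = [word.lower().capitalize()
                      for word in lunchSides
                      if "potato" in word.lower()]

print(with_functions == with_comprehension)   # True
```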


### <br>Exercise 3

In [21]:
numbers = [7, 3, 28, 47, 2, 49, 29]

In [22]:
squares = list(map(lambda num: num**2, numbers))
print(squares)

[49, 9, 784, 2209, 4, 2401, 841]


Rewrite the lambda above as a list comprehension called `squares2`. If you have not yet learned list comprehensions, take a look at the list comprehension/lambda pairs above and try to guess how you might restructure the lambda above as a list comprehension:

In [23]:
squares2 = [num**2 for num in numbers]

In [24]:
print(squares2)

[49, 9, 784, 2209, 4, 2401, 841]


## <br><br>Using lambdas with pandas Data Frames

- pandas Data Frames are Python's version of a data table (like an Excel spreadsheet).

- It's ok to use lambdas with a pandas Data Frame because there is no "dataframe comprehension", no "row comprehension", and no "column comprehension".

- We're going to use a lambda paired with the `dataframe.apply()` function.

<br>First, let's take a look at a simple pandas Data Frame, which we are going to load as the variable `df`, a commonly used variable name.

In [25]:
import pandas as pd

In [26]:
df = pd.read_csv("plots.csv")
df

Unnamed: 0,address,length,width
0,908 Washington,100,400
1,3987 Baker,50,350
2,7400 Hadley,150,375
3,89304 N. Marcello,200,550
4,782 Main,50,200
5,7391 Pontiac,60,225
6,927 Grand,125,325
7,8450 Oakhill,150,400
8,400 S. Highland,110,425


One common task is to add a new column that involves data found in other columns. When you want to do this with a simple calculation, you can do:

In [27]:
df["acreage"] = (df.length * df.width)/43560

In [28]:
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,0.918274
1,3987 Baker,50,350,0.401745
2,7400 Hadley,150,375,1.291322
3,89304 N. Marcello,200,550,2.525253
4,782 Main,50,200,0.229568
5,7391 Pontiac,60,225,0.309917
6,927 Grand,125,325,0.932622
7,8450 Oakhill,150,400,1.37741
8,400 S. Highland,110,425,1.073232


<br>It will be easier to work with fewer digits, so let's round the acreage number to 3 places after the decimal point:

In [29]:
df["acreage"] = round((df.length * df.width)/43560, 3)

In [30]:
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,0.918
1,3987 Baker,50,350,0.402
2,7400 Hadley,150,375,1.291
3,89304 N. Marcello,200,550,2.525
4,782 Main,50,200,0.23
5,7391 Pontiac,60,225,0.31
6,927 Grand,125,325,0.933
7,8450 Oakhill,150,400,1.377
8,400 S. Highland,110,425,1.073


<br><br>However, when you want to use a more complicated expression, you can get strange results if you try the same method. Here I'm taking the calculation we used above and combining it with text in a string:

In [31]:
df["acreage"] = df.address + " is " + str(round((df.length * df.width)/43560, 3))

In [32]:
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,908 Washington is 0 0.918\n1 0.402\n2 ...
1,3987 Baker,50,350,3987 Baker is 0 0.918\n1 0.402\n2 1.2...
2,7400 Hadley,150,375,7400 Hadley is 0 0.918\n1 0.402\n2 1....
3,89304 N. Marcello,200,550,89304 N. Marcello is 0 0.918\n1 0.402\n2...
4,782 Main,50,200,782 Main is 0 0.918\n1 0.402\n2 1.291...
5,7391 Pontiac,60,225,7391 Pontiac is 0 0.918\n1 0.402\n2 1...
6,927 Grand,125,325,927 Grand is 0 0.918\n1 0.402\n2 1.29...
7,8450 Oakhill,150,400,8450 Oakhill is 0 0.918\n1 0.402\n2 1...
8,400 S. Highland,110,425,400 S. Highland is 0 0.918\n1 0.402\n2 ...


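<br><br>Don't these results look strange? Every row contains a printout of the whole acreage column, because `str()` was applied to the entire column at once rather than to each value in it.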
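To see what went wrong, here is a small sketch using the first two rows of the table (rebuilt by hand so the example stands on its own). It also shows one alternative, `.astype(str)`, which converts each value separately; this is offered as an aside, not as the approach this notebook goes on to use. The names `acres`, `whole_thing`, and `per_row` are made up for this sketch.

```python
import pandas as pd

# First two rows of the plots table, typed in by hand for this sketch
df = pd.DataFrame({
    "address": ["908 Washington", "3987 Baker"],
    "length": [100, 50],
    "width": [400, 350],
})

acres = ((df.length * df.width) / 43560).round(3)

# str() turns the ENTIRE column into one single string
whole_thing = str(acres)
print(type(whole_thing))
print(whole_thing)

# .astype(str) converts each value on its own, so the text lines up row by row
per_row = df.address + " is " + acres.astype(str)
print(per_row.tolist())
```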

#### <br><br>Instead, we can use the function `df.apply()`. We pass it two arguments: 1. a lambda function, and 2. the axis we are looping through - here we are looping through rows, so we pass `axis=1`.

In [33]:
df["acreage"] = df.apply(lambda row: row.address + 
                         " is " + 
                         str(round((row.length * row.width)/43560, 3)), axis=1)

In [34]:
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,908 Washington is 0.918
1,3987 Baker,50,350,3987 Baker is 0.402
2,7400 Hadley,150,375,7400 Hadley is 1.291
3,89304 N. Marcello,200,550,89304 N. Marcello is 2.525
4,782 Main,50,200,782 Main is 0.23
5,7391 Pontiac,60,225,7391 Pontiac is 0.31
6,927 Grand,125,325,927 Grand is 0.933
7,8450 Oakhill,150,400,8450 Oakhill is 1.377
8,400 S. Highland,110,425,400 S. Highland is 1.073


#### <br><br>It works, but the lambda function is long and not easy to read. Alternatively, rather than include all the computation in the lambda function, we can define a new function, and then use the lambda to call the function:

In [35]:
def create_acreage_statement(row):
    acres = round((row.length * row.width)/43560, 3)
    return row.address + " is " + str(acres)

In [36]:
df["acreage"] = df.apply(lambda row: create_acreage_statement(row), axis=1)

In [37]:
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,908 Washington is 0.918
1,3987 Baker,50,350,3987 Baker is 0.402
2,7400 Hadley,150,375,7400 Hadley is 1.291
3,89304 N. Marcello,200,550,89304 N. Marcello is 2.525
4,782 Main,50,200,782 Main is 0.23
5,7391 Pontiac,60,225,7391 Pontiac is 0.31
6,927 Grand,125,325,927 Grand is 0.933
7,8450 Oakhill,150,400,8450 Oakhill is 1.377
8,400 S. Highland,110,425,400 S. Highland is 1.073


<br><br>As a note, the `row` in our lambda is just a variable name that we chose. It is `axis=1` that tells the `apply()` function that we are looping through rows. Here's the same code, but using a different variable name:

In [38]:
df["acreage"] = df.apply(lambda x: create_acreage_statement(x), axis=1)
df

Unnamed: 0,address,length,width,acreage
0,908 Washington,100,400,908 Washington is 0.918
1,3987 Baker,50,350,3987 Baker is 0.402
2,7400 Hadley,150,375,7400 Hadley is 1.291
3,89304 N. Marcello,200,550,89304 N. Marcello is 2.525
4,782 Main,50,200,782 Main is 0.23
5,7391 Pontiac,60,225,7391 Pontiac is 0.31
6,927 Grand,125,325,927 Grand is 0.933
7,8450 Oakhill,150,400,8450 Oakhill is 1.377
8,400 S. Highland,110,425,400 S. Highland is 1.073
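As an aside, when the lambda does nothing except hand the row to another function, `apply()` will also accept that function directly. The sketch below rebuilds two rows of the table by hand and compares the two spellings; `with_lambda` and `without_lambda` are names made up for this comparison.

```python
import pandas as pd

# Two rows of the plots table, typed in by hand for this sketch
df = pd.DataFrame({
    "address": ["908 Washington", "3987 Baker"],
    "length": [100, 50],
    "width": [400, 350],
})

def create_acreage_statement(row):
    acres = round((row.length * row.width) / 43560, 3)
    return row.address + " is " + str(acres)

# Wrapping the function in a lambda...
with_lambda = df.apply(lambda row: create_acreage_statement(row), axis=1)

# ...gives the same result as passing the function itself
without_lambda = df.apply(create_acreage_statement, axis=1)

print(with_lambda.equals(without_lambda))   # True
```

The lambda becomes necessary again as soon as you want to do something extra with the result, such as comparing it to a number.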


### <br>Exercise 4

In [39]:
def calculate_size_class(row):
    acres = (row.length * row.width)/43560
    if acres > 2:
        size = "large"
    elif acres > .75:
        size = "medium"
    else:
        size = "small"
    return size

Add a column to the `df` Data Frame called `sizeClass`. Use the `apply` function, a lambda function, and the `calculate_size_class` function above.

In [40]:
df["sizeClass"] = df.apply(lambda row: calculate_size_class(row), axis=1)

In [41]:
df

Unnamed: 0,address,length,width,acreage,sizeClass
0,908 Washington,100,400,908 Washington is 0.918,medium
1,3987 Baker,50,350,3987 Baker is 0.402,small
2,7400 Hadley,150,375,7400 Hadley is 1.291,medium
3,89304 N. Marcello,200,550,89304 N. Marcello is 2.525,large
4,782 Main,50,200,782 Main is 0.23,small
5,7391 Pontiac,60,225,7391 Pontiac is 0.31,small
6,927 Grand,125,325,927 Grand is 0.933,medium
7,8450 Oakhill,150,400,8450 Oakhill is 1.377,medium
8,400 S. Highland,110,425,400 S. Highland is 1.073,medium


## <br><br>Homework: Filtering a Data Frame on a complex condition:

First I will reload the data:

In [42]:
df = pd.read_csv("plots.csv")

In [43]:
df

Unnamed: 0,address,length,width
0,908 Washington,100,400
1,3987 Baker,50,350
2,7400 Hadley,150,375
3,89304 N. Marcello,200,550
4,782 Main,50,200
5,7391 Pontiac,60,225
6,927 Grand,125,325
7,8450 Oakhill,150,400
8,400 S. Highland,110,425


We can create a new Data Frame that only includes some rows like this:

In [44]:
new_df = df[df.length > 100]

In [45]:
new_df

Unnamed: 0,address,length,width
2,7400 Hadley,150,375
3,89304 N. Marcello,200,550
6,927 Grand,125,325
7,8450 Oakhill,150,400
8,400 S. Highland,110,425


#### <br><br> pandas does not work with conditionals for filtering Data Frames in the same way we are used to working with other objects in Python.

Instead of "and", "or", and "not", we use the `&` (and), `|` (or), and `~` (not) operators to combine conditional statements in pandas. Each condition needs its own set of parentheses:

In [46]:
new_df = df[(df.length > 100) & (df.width > 400)]

In [47]:
new_df

Unnamed: 0,address,length,width
3,89304 N. Marcello,200,550
8,400 S. Highland,110,425
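Here is a short sketch of what happens if you try the plain Python `and` on whole columns. The three-row table is typed in by hand, and `mask` is a name made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"length": [150, 200, 50], "width": [375, 550, 200]})

# "and" needs a single True or False, but each side here is a whole column
try:
    df[(df.length > 100) and (df.width > 400)]
except ValueError as err:
    print("ValueError:", err)

# "&" compares the two columns row by row instead
mask = (df.length > 100) & (df.width > 400)
print(mask.tolist())   # [False, True, False]
```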


<br><br>Other conditionals in pandas also don't work in the same way they work in basic Python:

In [48]:
new_df = df["N" in df.address]

KeyError: False
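Why `KeyError: False`? With a pandas column, `in` checks the row labels (the index), not the values. So `"N" in df.address` comes out as a single `False`, and pandas then looks for a column named `False`. A small sketch, using a hand-typed column called `addresses`:

```python
import pandas as pd

addresses = pd.Series(["908 Washington", "89304 N. Marcello"])

# "in" looks at the index labels (0 and 1 here), not at the text
print(0 in addresses)      # True
print("N" in addresses)    # False

# Checking each value one at a time gives one answer per row
print(["N" in address for address in addresses])   # [False, True]
```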

<br><br>There are often specific functions in pandas to deal with conditionals:

In [49]:
new_df = df[df.address.str.contains("N")]
new_df

Unnamed: 0,address,length,width
3,89304 N. Marcello,200,550


<br>However, these are often difficult to remember.

Instead, you can always use a lambda. To create a new Data Frame with a conditional, use `df.apply()` with a lambda. Pass a conditional statement to the lambda (`"N" in row.address`) and then include the whole thing in `df[]`.

In [50]:
new_df = df[df.apply(lambda row: "N" in row.address, axis=1)]

In [51]:
new_df

Unnamed: 0,address,length,width
3,89304 N. Marcello,200,550


In [52]:
new_df = df[df.apply(lambda row: row.width > 400 and row.length > 100, axis=1)]

In [53]:
new_df

Unnamed: 0,address,length,width
3,89304 N. Marcello,200,550
8,400 S. Highland,110,425
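Notice that plain `and` works inside the lambda. That is because `apply()` with `axis=1` hands the lambda one row at a time, so `row.width` is a single number rather than a whole column. A sketch with one hand-typed row (the name `is_big` is made up here):

```python
import pandas as pd

df = pd.DataFrame({
    "address": ["89304 N. Marcello"],
    "length": [200],
    "width": [550],
})

# Grab one row, the same kind of object the lambda receives
row = df.iloc[0]
print(type(row).__name__)   # Series

# Each comparison gives a single True or False, so "and" is fine here
is_big = row.width > 400 and row.length > 100
print(bool(is_big))         # True
```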


### <br><br>Exercise 5

In [54]:
def acreage(row):
    return (row.length * row.width)/43560

Use `df.apply()`, a lambda, and the `acreage` function above. Create a new Data Frame called `large_lots` that includes the rows from `df` that contain lots that are larger than 1 acre.

In [55]:
large_lots = df[df.apply(lambda row: acreage(row) > 1, axis=1)]

In [56]:
large_lots

Unnamed: 0,address,length,width
2,7400 Hadley,150,375
3,89304 N. Marcello,200,550
7,8450 Oakhill,150,400
8,400 S. Highland,110,425
