# Lambda functions and apply

Lambda functions are something that data scientists use all the time, often without regard for whether they are the most appropriate tool. It's important to understand how they work, not only because it will make other data scientists think you're cool, but also because without that understanding, most data science projects become an impenetrable maze of confusing code.

This notebook aims to explain, in simple-ish language with examples, exactly what lambda functions are and how to use them.

### Imports

Only one import is required for this notebook.

In [1]:
import pandas as pd

### Data sourcing

The following cells create a dummy dataframe consisting of entirely meaningless data. It's a lot easier to understand the principles using a minimal dataset, but the technique can be applied to larger, real-world data in exactly the same way.

In [2]:
data = pd.DataFrame(columns=["A", "B", "C", "D", "E"],
                    data=[["Owl", 23, "Green", True, [1, 2, 3, 4]],
                          ["Heron", 0, "Green", True, [0, 0]],
                          ["Kestrel", 8, "Red", False, [1, 1, 1]],
                          ["Stork", -7, "Yellow", False, [4, 5, 6, 1, 1]],
                          ["Puffin", 99, "Blue", False, [8, 0, 8, 0]]])

In [3]:
data.head()

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | Owl | 23 | Green | True | [1, 2, 3, 4] |
| 1 | Heron | 0 | Green | True | [0, 0] |
| 2 | Kestrel | 8 | Red | False | [1, 1, 1] |
| 3 | Stork | -7 | Yellow | False | [4, 5, 6, 1, 1] |
| 4 | Puffin | 99 | Blue | False | [8, 0, 8, 0] |


### Apply

In data science, lambda functions are most commonly found nesting inside `.apply()` calls, and it's easy to assume either that `.apply()` is never necessary or that it always is. Before discussing lambdas themselves, it's worth looking at exactly what `.apply()` does.

`.apply()` is a Pandas method that allows you to *apply* the same function to every cell of a column, rather than just applying the function to the column itself.

In the cell below, calling the `len()` function with a column as an argument returns the length of the column itself.

In [4]:
len(data["A"])

5

If, however, you are more interested in discovering the length of every string in that column, you can use `.apply()` on the column and pass in `len()` as an argument. This returns the length of each separate string.

In [5]:
data["A"].apply(len)

0    3
1    5
2    7
3    5
4    6
Name: A, dtype: int64

It's absolutely possible to get the same information by using a `for` loop, as shown in the cell below, but the code is messier and easier to get wrong. It's both ugly and inefficient; no one wants to read or debug it.

In [6]:
for i in range(len(data["A"])):
    print(len(data.iloc[i,0]))

3
5
7
5
6


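`.apply()` is the neatest and easiest way to pass every value in a column through the same function. Any function can be used with `.apply()`, as long as it can handle the individual values in the column: when called on a column (a Pandas `Series`), `.apply()` passes the values to the function one at a time.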
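As a minimal sketch of that point, the cell below passes two different functions to `.apply()`: the built-in string method `str.upper`, and a small function written for the occasion. The `birds` Series and the `count_vowels` function are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# A standalone stand-in for column "A" of the dataframe above.
birds = pd.Series(["Owl", "Heron", "Kestrel", "Stork", "Puffin"], name="A")


def count_vowels(word):
    """Return the number of vowels in a single string."""
    return sum(1 for letter in word.lower() if letter in "aeiou")


# A built-in method works, because it accepts one string at a time.
upper_birds = birds.apply(str.upper)

# So does any function you write yourself.
vowel_counts = birds.apply(count_vowels)

print(upper_birds.tolist())   # ['OWL', 'HERON', 'KESTREL', 'STORK', 'PUFFIN']
print(vowel_counts.tolist())  # [1, 2, 2, 1, 2]
```

In both cases the function only ever sees one string, never the whole column.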

### Functions & Lambda functions

A function (a standard one, not a lambda) is a **named**, **repeatable** block of code. Once you've written a function, you can call it as often as you like just by using its name.

The function `add_two()` takes one value as an argument and returns that value plus 2. `add_two(4)` would return 6, for example.

In [7]:
def add_two(x):
    return x + 2

Lambda functions are essentially the same as standard functions, but with two key distinctions.

1. lambda functions don't have names. Once you've set them loose, they're gone and cannot be re-used without writing them again.


2. lambda functions have a more spartan syntax: though they work in the same way, taking arguments and returning values, they are generally shorter and simpler. 

The lambda function below does exactly the same thing as `add_two()` above. Because it's a lambda function, it uses the keyword `lambda` instead of `def`, and does not have a name. It doesn't need brackets around the arguments, and Python assumes that you want to return the resultant value - there's no need to write `return`.

In [8]:
lambda x: x + 2

<function __main__.<lambda>(x)>

The `x` in the lambda above is just an argument name, and you can call it anything you want. `x` is the conventional choice, but a meaningful name is both acceptable and much better. If you know your lambda function is only going to be passed prices, then `lambda price: price + 2` is immediately understandable.
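A quick way to check that a lambda and a standard function behave the same is to call both with the same input. In the sketch below, the lambda is called immediately by wrapping it in parentheses. It is also bound to a variable (`add_two_lambda`, a name made up here) purely for comparison; PEP 8 recommends `def` whenever a function needs a name.

```python
def add_two(x):
    return x + 2


# Calling a lambda on the spot: wrap it in parentheses, then pass the argument.
immediate_result = (lambda x: x + 2)(4)

# Binding a lambda to a variable, for comparison only.
add_two_lambda = lambda price: price + 2

print(add_two(4))               # 6
print(immediate_result)         # 6
print(add_two_lambda(4))        # 6

# The function object itself still has no name of its own.
print(add_two_lambda.__name__)  # <lambda>
```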

### Apply and Lambda functions

As already mentioned, one of the major uses of lambda functions is within `.apply()` calls. 

In the example below, the `.apply()` method takes every value in the "B" column of the dataframe, and passes it as an argument to the lambda function within.

The result is a column of numbers, each one 2 higher than the corresponding value in the original "B" column.

In [9]:
data["B"].apply(lambda x: x + 2)

0     25
1      2
2     10
3     -5
4    101
Name: B, dtype: int64

For every row in the column, `.apply()` calls the lambda function, passing in the value - first 23, then 0, and so on. Each value takes the place of `x` in turn.
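Conceptually, this is close to looping over the values yourself and collecting the results. The sketch below rebuilds column "B" as a standalone Series (`column_b` is a name made up here) and compares `.apply()` with an equivalent list comprehension. It illustrates the behaviour and is not a description of how Pandas implements `.apply()` internally.

```python
import pandas as pd

# A standalone copy of column "B".
column_b = pd.Series([23, 0, 8, -7, 99], name="B")

applied = column_b.apply(lambda x: x + 2)

# The same calculation written out by hand: one call per value, in order.
by_hand = [(lambda x: x + 2)(value) for value in column_b]

print(applied.tolist())  # [25, 2, 10, -5, 101]
print(by_hand)           # [25, 2, 10, -5, 101]
```

The main practical difference is that `.apply()` hands back a `Series` with the original index, ready to be assigned to a column.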

Lambdas are not limited to manipulating numbers: they can do almost everything that standard functions do, as long as it fits in a single expression. In the example below, each value in column "C" is prefixed with the string "brownish-".

In [10]:
data["C"].apply(lambda x: "brownish-" + x)

0     brownish-Green
1     brownish-Green
2       brownish-Red
3    brownish-Yellow
4      brownish-Blue
Name: C, dtype: object
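Because the body of a lambda is a single expression, branching logic has to use Python's conditional expression (`a if condition else b`) rather than an `if` statement. A minimal sketch, using a standalone copy of column "B" and labels made up for illustration:

```python
import pandas as pd

# A standalone copy of column "B".
column_b = pd.Series([23, 0, 8, -7, 99], name="B")

# The conditional expression picks one of two strings for each value.
labels = column_b.apply(lambda x: "negative" if x < 0 else "non-negative")

print(labels.tolist())
# ['non-negative', 'non-negative', 'non-negative', 'negative', 'non-negative']
```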

One of the most common uses of lambdas and `.apply()` is to create a new column based on transforming an existing one. This is shown in the example below.

In [11]:
data["F"] = data["C"].apply(lambda x: "brownish-" + x)

In [12]:
data.head()

|   | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | Owl | 23 | Green | True | [1, 2, 3, 4] | brownish-Green |
| 1 | Heron | 0 | Green | True | [0, 0] | brownish-Green |
| 2 | Kestrel | 8 | Red | False | [1, 1, 1] | brownish-Red |
| 3 | Stork | -7 | Yellow | False | [4, 5, 6, 1, 1] | brownish-Yellow |
| 4 | Puffin | 99 | Blue | False | [8, 0, 8, 0] | brownish-Blue |


### Apply for whole rows

Occasionally, you may wish to apply a function to every value in a column, but also use the complementary value from a different column in the same row. This is not possible using `.apply()` on a single column as we have done above.

Luckily, we can use `.apply()` on whole dataframes, not just individual columns. When we do this and pass the argument `axis=1`, `.apply()` passes each row to the function, one by one (the default, `axis=0`, passes each column instead). You can then refer to a particular row's values by column name.
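To see what the function receives in each case, the sketch below joins the values it is given with a slash, first along the default axis and then with `axis=1`. The two-row dataframe `small` is made up for illustration.

```python
import pandas as pd

small = pd.DataFrame({"A": ["Owl", "Heron"], "C": ["Green", "Red"]})

# axis=0 (the default): one call per column, so the result is indexed by column name.
per_column = small.apply(lambda column: "/".join(column))

# axis=1: one call per row, so the result is indexed like the dataframe's rows.
per_row = small.apply(lambda row: "/".join(row), axis=1)

print(per_column.to_dict())  # {'A': 'Owl/Heron', 'C': 'Green/Red'}
print(per_row.tolist())      # ['Owl/Green', 'Heron/Red']
```

With `axis=1`, each row arrives as a `Series` indexed by column name, which is why `row["C"]` style lookups work inside the function.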

The example below shows the creation of a new column - "G" - based on the concatenation of "C" and "A".

In [13]:
data["G"] = data.apply(lambda row: row["C"] + " " + row["A"], axis=1)

In [14]:
data.head()

|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 0 | Owl | 23 | Green | True | [1, 2, 3, 4] | brownish-Green | Green Owl |
| 1 | Heron | 0 | Green | True | [0, 0] | brownish-Green | Green Heron |
| 2 | Kestrel | 8 | Red | False | [1, 1, 1] | brownish-Red | Red Kestrel |
| 3 | Stork | -7 | Yellow | False | [4, 5, 6, 1, 1] | brownish-Yellow | Yellow Stork |
| 4 | Puffin | 99 | Blue | False | [8, 0, 8, 0] | brownish-Blue | Blue Puffin |


This technique can be quite slow on large dataframes, because the function is called once for every row, but it lets you write complex transformations involving multiple columns quickly.
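When the transformation is simple, as with the concatenation of "C" and "A", ordinary Pandas column operations can often do the same job without calling a Python function for each row. A minimal sketch on a cut-down copy of the data (`birds` is a name made up here), checking that both approaches give the same values:

```python
import pandas as pd

birds = pd.DataFrame({
    "A": ["Owl", "Heron", "Kestrel"],
    "C": ["Green", "Green", "Red"],
})

# Row by row: the lambda is called once per row.
row_wise = birds.apply(lambda row: row["C"] + " " + row["A"], axis=1)

# Column by column: the + operator works on whole columns at once.
vectorised = birds["C"] + " " + birds["A"]

print(vectorised.tolist())                       # ['Green Owl', 'Green Heron', 'Red Kestrel']
print(row_wise.tolist() == vectorised.tolist())  # True
```

Row-wise `.apply()` remains the fallback for logic that cannot be expressed as operations on whole columns.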

### Possibilities

It is, again, important to emphasise that lambdas can do almost anything standard functions can, provided it can be written as a single expression, which covers most of what you are likely to want. `.apply()` on its own is really useful for situations where a pre-existing function needs to be applied to every value; combined with lambda functions, anything you can express that way can be used in the same manner.

To demonstrate this, the example below is not only more meaningless than the previous meaningless examples, but also unnecessarily complex, to show that arbitrary complexity is possible.

In [15]:
# Creates a new column in the dataframe by converting the value in column "B" to a str
# and repeating it n times, where n is the length of the value in "E", modulo 4.

data["H"] = data.apply(lambda row: str(row["B"]) * (len(row["E"]) % 4), axis=1)

In [16]:
data.head()

|   | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| 0 | Owl | 23 | Green | True | [1, 2, 3, 4] | brownish-Green | Green Owl |  |
| 1 | Heron | 0 | Green | True | [0, 0] | brownish-Green | Green Heron | 00 |
| 2 | Kestrel | 8 | Red | False | [1, 1, 1] | brownish-Red | Red Kestrel | 888 |
| 3 | Stork | -7 | Yellow | False | [4, 5, 6, 1, 1] | brownish-Yellow | Yellow Stork | -7 |
| 4 | Puffin | 99 | Blue | False | [8, 0, 8, 0] | brownish-Blue | Blue Puffin |  |
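Once a lambda gets this dense, a named function passed to `.apply()` is usually easier to read and to test on its own. The sketch below rewrites the same calculation that way on a cut-down copy of the data; `repeat_b` and `birds` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Only the two columns the calculation needs.
birds = pd.DataFrame({
    "B": [23, 0, 8, -7, 99],
    "E": [[1, 2, 3, 4], [0, 0], [1, 1, 1], [4, 5, 6, 1, 1], [8, 0, 8, 0]],
})


def repeat_b(row):
    """Repeat the text of column 'B' n times, where n is len('E') modulo 4."""
    repeats = len(row["E"]) % 4
    return str(row["B"]) * repeats


birds["H"] = birds.apply(repeat_b, axis=1)

print(birds["H"].tolist())  # ['', '00', '888', '-7', '']
```

Rows whose list in "E" has a length divisible by 4 end up with an empty string, because repeating a string zero times leaves nothing.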
