# Basics - Apply, Map and Vectorised Functions

In [1]:
import pandas as pd
import numpy as np

# Random data: the values will differ each time this cell runs
data = np.round(np.random.normal(size=(4, 3)), 2)
df = pd.DataFrame(data, columns=["A", "B", "C"])
df.head()

      A     B     C
0  0.43 -0.14 -1.14
1 -1.63 -0.58  0.39
2 -2.17 -0.30  1.06
3  0.40 -0.04  0.99


## Apply

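Used to execute an arbitrary function against an entire DataFrame or a subsection of it, such as a single column. By default (`axis=0`) the function is applied to each column in turn, receiving it as a Series, so the work inside each call can itself be vectorised. With `axis=1` it is applied to each row instead, which is usually much slower on large frames.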

In [2]:
df.apply(lambda x: 1 + np.abs(x))

      A     B     C
0  1.43  1.14  2.14
1  2.63  1.58  1.39
2  3.17  1.30  2.06
3  1.40  1.04  1.99


In [3]:
df.A.apply(np.abs)

0    0.43
1    1.63
2    2.17
3    0.40
Name: A, dtype: float64
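A function passed to `apply` does not have to return something the same shape as its input. The sketch below uses a small hypothetical `frame` with fixed values (not the random `df`) to show that a function returning a scalar gives one result per column with the default `axis=0`, or one result per row with `axis=1`.

```python
import pandas as pd

# Hypothetical fixed data so the results are predictable
frame = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [0.5, 4.0, -1.0]})

# axis=0 (default): the lambda receives each column as a Series
col_range = frame.apply(lambda col: col.max() - col.min())
print(col_range.tolist())  # [5.0, 5.0]

# axis=1: the lambda receives each row as a Series
row_sum = frame.apply(lambda row: row["A"] + row["B"], axis=1)
print(row_sum.tolist())  # [1.5, 2.0, 2.0]
```

The column results are labelled by column name (`A`, `B`) and the row results by the row index.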

In [4]:
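def double_if_positive(x):
    # Modifies the column it receives in place
    x[x > 0] *= 2
    return x

# In the pandas version used here this also changed df itself (see In [5]);
# pandas with Copy-on-Write enabled may not propagate the change
df.apply(double_if_positive)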

      A     B     C
0  0.86 -0.14 -1.14
1 -1.63 -0.58  0.78
2 -2.17 -0.30  2.12
3  0.80 -0.04  1.98


In [5]:
df

      A     B     C
0  0.86 -0.14 -1.14
1 -1.63 -0.58  0.78
2 -2.17 -0.30  2.12
3  0.80 -0.04  1.98


In [9]:
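def double_if_positive(x):
    x = x.copy()  # work on a copy so the caller's data is left untouched
    x[x > 0] *= 2
    return x

# raw=True passes each column as a NumPy ndarray instead of a Series.
# df was already modified by In [4], so positive values are now 4x the originals.
df.apply(double_if_positive, raw=True)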

      A     B     C
0  1.72 -0.14 -1.14
1 -1.63 -0.58  1.56
2 -2.17 -0.30  4.24
3  1.60 -0.04  3.96
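The same "double the positive values" transformation can be written without `apply` at all, which avoids the in-place mutation problem entirely. A minimal sketch using `DataFrame.where` on a small hypothetical `frame` (a substitution for the notebook's approach, not part of it):

```python
import pandas as pd

# Hypothetical fixed data (not the random df above)
frame = pd.DataFrame({"A": [0.5, -1.0], "B": [-0.25, 2.0]})

# where() keeps values where the condition is True and takes the
# replacement (frame * 2) elsewhere; it returns a new DataFrame
doubled = frame.where(frame <= 0, frame * 2)

print(doubled.values.tolist())  # [[1.0, -0.25], [-1.0, 4.0]]
print(frame.values.tolist())    # [[0.5, -0.25], [-1.0, 2.0]] (unchanged)
```

`frame.mask(frame > 0, frame * 2)` expresses the same thing with the condition inverted.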


## Map

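Similar to apply, but works on a Series only and transforms each element individually. It accepts either a dictionary (or Series) used as a lookup table, or a function that is called on every value. Values not found in a dictionary become NaN.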

In [10]:
series = pd.Series(["Steve", "Alex", "Jess", "Mark"])

In [11]:
series.map({"Steve": "Stephen"})

0    Stephen
1        NaN
2        NaN
3        NaN
dtype: object
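If unmatched values should be kept rather than turned into NaN, two common options are sketched below: fill the gaps from the original Series, or use `Series.replace`, which only changes values that appear in the dictionary. The `nicknames` name is illustrative.

```python
import pandas as pd

names = pd.Series(["Steve", "Alex", "Jess", "Mark"])
nicknames = {"Steve": "Stephen"}  # illustrative lookup table

# Option 1: map, then fill the NaNs with the original values
mapped = names.map(nicknames).fillna(names)

# Option 2: replace() leaves values that are not in the dict untouched
replaced = names.replace(nicknames)

print(mapped.tolist())    # ['Stephen', 'Alex', 'Jess', 'Mark']
print(replaced.tolist())  # ['Stephen', 'Alex', 'Jess', 'Mark']
```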

In [12]:
series.map(lambda d: f"I am {d}")

0    I am Steve
1     I am Alex
2     I am Jess
3     I am Mark
dtype: object
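For simple string building like this, the per-element lambda can be replaced by a whole-Series expression. A small sketch comparing the two; both produce the same values:

```python
import pandas as pd

names = pd.Series(["Steve", "Alex", "Jess", "Mark"])

# Calls the Python lambda once per value
via_map = names.map(lambda d: f"I am {d}")

# Same result expressed as a single operation on the whole Series
via_concat = "I am " + names

print(via_concat.tolist())  # ['I am Steve', 'I am Alex', 'I am Jess', 'I am Mark']
```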

## Vectorised functions

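Pandas and NumPy provide many vectorised functions, which operate on whole columns or frames at once instead of looping in Python. Here are some examples, including the `.str` accessor for string data.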

In [13]:
display(df, df.abs())

      A     B     C
0  0.86 -0.14 -1.14
1 -1.63 -0.58  0.78
2 -2.17 -0.30  2.12
3  0.80 -0.04  1.98


      A     B     C
0  0.86  0.14  1.14
1  1.63  0.58  0.78
2  2.17  0.30  2.12
3  0.80  0.04  1.98
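Beyond DataFrame methods like `abs`, NumPy ufuncs can be called directly on a DataFrame and return a DataFrame with the same labels. A short sketch on a hypothetical fixed `frame`, also showing the built-in `clip` method:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data
frame = pd.DataFrame({"A": [-4.0, 9.0], "B": [16.0, -1.0]})

# NumPy ufuncs work directly on DataFrames and keep row/column labels
roots = np.sqrt(frame.abs())

# Built-in vectorised method: limit values to the range [-2, 2]
clipped = frame.clip(lower=-2, upper=2)

print(roots.values.tolist())    # [[2.0, 4.0], [3.0, 1.0]]
print(clipped.values.tolist())  # [[-2.0, 2.0], [2.0, -1.0]]
```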


In [14]:
series = pd.Series(["Obi-Wan Kenobi", "Luke Skywalker", "Han Solo", "Leia Organa"])

In [15]:
"Luke Skywalker".split()

['Luke', 'Skywalker']

In [16]:
series.str.split(expand=True)

         0          1
0  Obi-Wan     Kenobi
1     Luke  Skywalker
2      Han       Solo
3     Leia     Organa


In [17]:
series.str.contains("Skywalker")

0    False
1     True
2    False
3    False
dtype: bool
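The boolean Series returned by `.str.contains` is most often used as a mask to filter rows. A brief sketch, also showing the `case` parameter for case-insensitive matching:

```python
import pandas as pd

series = pd.Series(["Obi-Wan Kenobi", "Luke Skywalker", "Han Solo", "Leia Organa"])

# Use the boolean result to select matching elements
skywalkers = series[series.str.contains("Skywalker")]

# case=False makes the match ignore upper/lower case
has_o = series.str.contains("o", case=False)

print(skywalkers.tolist())  # ['Luke Skywalker']
print(has_o.tolist())       # [True, False, True, True]
```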

In [18]:
series.str.upper().str.split()

0    [OBI-WAN, KENOBI]
1    [LUKE, SKYWALKER]
2          [HAN, SOLO]
3       [LEIA, ORGANA]
dtype: object
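Chaining continues to work after `split`: `.str[...]` indexes into each element, including the lists that `split` returns. A short sketch extracting first names and string lengths:

```python
import pandas as pd

series = pd.Series(["Obi-Wan Kenobi", "Luke Skywalker", "Han Solo", "Leia Organa"])

# .str[0] takes the first item of each list produced by split()
first_names = series.str.split().str[0]

# .str.len() gives the number of characters in each string
lengths = series.str.len()

print(first_names.tolist())  # ['Obi-Wan', 'Luke', 'Han', 'Leia']
print(lengths.tolist())      # [14, 14, 8, 11]
```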

## User defined functions

Let's investigate a very simple example: finding the hypotenuse given x and y distances, computed in several different ways.


In [20]:
data2 = np.random.normal(10, 2, size=(100000, 2))
df2 = pd.DataFrame(data2, columns=["x", "y"])

In [21]:
hypot = (df2.x**2 + df2.y**2)**0.5
print(hypot[0])

11.754337428186107


In [22]:
def hypot1(x, y):
    return np.sqrt(x**2 + y**2)

# Python loop over rows with iterrows (slowest approach)
h1 = []
for index, (x, y) in df2.iterrows():
    h1.append(hypot1(x, y))
print(h1[0])

11.754337428186107


In [23]:
def hypot2(row):
    return np.sqrt(row.x**2 + row.y**2)

# apply calls hypot2 once per row (axis=1)
h2 = df2.apply(hypot2, axis=1)
print(h2[0])

11.754337428186107


In [24]:
def hypot3(xs, ys):
    return np.sqrt(xs**2 + ys**2)

# Pass whole columns: NumPy does the looping in compiled code
h3 = hypot3(df2.x, df2.y)
print(h3[0])

11.754337428186107
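NumPy also ships a ready-made ufunc for exactly this calculation, `np.hypot`. A minimal sketch on a hypothetical seeded `points` frame (smaller than `df2`, and named differently so it does not overwrite it) checks that it agrees with the hand-written vectorised formula:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the data is reproducible
points = pd.DataFrame(rng.normal(10, 2, size=(1000, 2)), columns=["x", "y"])

# np.hypot computes sqrt(x**2 + y**2) element-wise
h4 = np.hypot(points.x, points.y)

# Compare with the explicit vectorised formula
h_manual = np.sqrt(points.x**2 + points.y**2)
print(np.allclose(h_manual, h4))  # True
```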


Vectorising everything you can is the key to speeding up code like this. Once you have done that, use profiling tools to find the remaining bottlenecks. PyCharm Professional has a built-in profiler, and in Jupyter the line_profiler extension provides the %lprun magic command: https://github.com/rkern/line_profiler
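The cells above show that all approaches give the same answer but do not compare their speed. A minimal timing sketch using the standard library `timeit` module follows. It uses a hypothetical seeded 10,000-row `points` frame (kept smaller than `df2` so the row loops finish quickly) and adds `itertuples`, which the notebook does not use, as a typically faster alternative to `iterrows`. Timings depend on your machine and pandas version, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
points = pd.DataFrame(rng.normal(10, 2, size=(10_000, 2)), columns=["x", "y"])


def loop_iterrows():
    # Each row arrives as a Series, unpacked into x and y
    return [np.sqrt(x**2 + y**2) for _, (x, y) in points.iterrows()]


def loop_itertuples():
    # Each row arrives as a lightweight namedtuple
    return [np.sqrt(r.x**2 + r.y**2) for r in points.itertuples(index=False)]


def row_apply():
    return points.apply(lambda row: np.sqrt(row.x**2 + row.y**2), axis=1)


def vectorised():
    return np.sqrt(points.x**2 + points.y**2)


for fn in (loop_iterrows, loop_itertuples, row_apply, vectorised):
    seconds = timeit.timeit(fn, number=1)  # run each approach once
    print(f"{fn.__name__:>16}: {seconds:.4f} s")
```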

### Recap

* apply
* map
* `.str` and similar vectorised functions