# 🎄 Python Advent Calendar 🎄
## [Day 2: Pandas Flavor Chains](https://py-advent-calendar.beehiiv.com/p/day-225-pandas-flavor-chains)
_🔗 [Read the newsletter here](https://py-advent-calendar.beehiiv.com/p/day-225-pandas-flavor-chains)_

*Every day until Christmas we’ll open a new door to Python secrets. Behind today’s door you will **discover a whole new way to use pandas** by combining the principles of method chaining with the power of **pandas_flavor**. Who doesn’t love a **[good chain](https://link.mail.beehiiv.com/ss/c/Fu4EPqIapGgImIgza8sdTSKgpWt-hQw92q171wRa0oTptuPSFhgV4q4g0YVmER7BSFYGlB27L44m2ixF8y8aEut62_hdqWSwVqAq1RwmIw3Qz3ppbmWxFKQgf_kwA1FbPK8ahAgHaYXrp19BU0X-sZ6Nu7C3Qh4KDA921QM8ls62rXXsAStV0iWTBJqQUjmBopDrtzULd8qLZ9WqfTAosQ/41s/eX7tzooLQfKLVhBO3tBJeQ/h5/eQFEwm-nXKrsFOt7sU-qSFKB-5x4qtafOSmQLMypMY0)**?*

<img src="./ac-2-door.png" width=400>

---

## 1️⃣  Modern Pandas: An idiomatic guide by a pandas maintainer
_📆 Written: April 2016_

One of our top recommended teaching resources at Coefficient is Tom Augspurger’s **[blog post on method chaining in pandas](https://link.mail.beehiiv.com/ss/c/AJNVrU_uqSEvE7VehET4E0utaP9oG-x7MFzgOFbN4I4t_oyB8jPq94HB72ggL_gi9ZpEtkTEi2L4dQ4h9An1Xjwjh3QFYmnERFzybnIKbw73CHxBrbWOsUHmaIR5c3N2chSX38DCz_j2cTzMO9uGlJoSESCmoPduMe5G8PiyqP-X9o8JVP-ujyAi1b1nGg4Moe--WNNJTyh62G3NnVTWmewSnUDDg0UWTlkUlOVGZC8/41s/eX7tzooLQfKLVhBO3tBJeQ/h7/mgJdkqqpxN3Hhv7LSrTfJ_xMLiEX0Niaqw2bMzJGBCk)**. Adapting a concept Tom **[borrowed from Jeff Allen](https://link.mail.beehiiv.com/ss/c/ZfHgEd0ozJFXp5KhRVlI31bZTWgxU5T7h5S8yVdS7zz2rBFc4MAYgmpqQk9e_gk_eFfD3fpFpfQktK3cVFKJdYUUKBDR5NEcOh2HgGmGsIfchDsQwX34Y7hcmpAoslXsmqFOjLJC04bnzdmydm1lNgZxOFf30Fd4Ont9Nx-7pDOlraRhDR1o9_JdJ90tUv729p5BqLNv_gxHPTNvbcSDLf1QXuIAb1-M13OGNDMAAmM/41s/eX7tzooLQfKLVhBO3tBJeQ/h8/yOvWwM9njumTf0JkGhYcVeYiq7uf5pGLi0xJbz7I8ho)** of **[dplyr](https://link.mail.beehiiv.com/ss/c/QryPmEUXYzWv8ArfnkS7uHG3mNWdqliTZvTMJOZoQIrqYm0q7Jqezqx-dukU-XMheAFR2F9Jh0hCoAaFLYMuhUC31MSHY133OTcIRdol7H6eptpGKLFiGK0Lp-LMfPcVvgn6QcZ33TXzfTj7zp_BrVXLANpGnBTXVIFHuo2DRHCsEV1tlY4wvYOI8hr_WIkT/41s/eX7tzooLQfKLVhBO3tBJeQ/h9/CjV-zaEzc2i_TI7OdoU7g8w2Bs0g_C8ZMEGKKV_LZjE)**, here’s a story we might tell in Python:

```python
come_to(
    find_out(
        check(make(santa_claus, "list"), n=2),
        "naughty_or_nice"
    ),
    "town"
)
```

and here’s how we can use the “pipe” operator in R, which feeds the thing on the left into the first argument of the function on the right:

```R
santa_claus %>%
    make("list") %>%
    check(n=2) %>%
    find_out("naughty_or_nice") %>%
    come_to("town")
```
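
Python has no built-in pipe operator, but the same left-to-right flow can be sketched in plain Python with `functools.reduce`. The `pipe` helper and the two toy functions below are hypothetical stand-ins for illustration only. They work on strings, not DataFrames:

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: feed `value` through each step, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Toy stand-ins for the story's verbs (strings instead of DataFrames)
def toy_make(who: str, item: str) -> str:
    return f"{who} made a {item}"


def toy_check(story: str, n: int) -> str:
    return f"{story}, checked it {n} times"


result = pipe(
    "Santa",
    lambda s: toy_make(s, "list"),
    lambda s: toy_check(s, n=2),
)
print(result)  # Santa made a list, checked it 2 times
```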

Hopefully you’re not writing Python “inside out” like the first example here. It is very common, though (especially among pandas users), to invent a new variable name for every temporary transformation step. I call this the “hype man” pandas style:

```python
made_list = make(santa_claus, "list")
checked_twice = check(made_list, n=2)
found_out = find_out(checked_twice, "naughty_or_nice")
santas_in_town = come_to(found_out, "town")
```

Unless you *really* like inventing variable names, this style of code can be avoided using method chains to create cleaner, more readable, and in some cases more performant code:

```python
import pandas as pd

# make, check, find_out and come_to are defined in the notebook below
santa_claus = pd.DataFrame()
(
    santa_claus.pipe(make, "list")
    .pipe(check, n=2)
    .pipe(find_out, "naughty_or_nice")
    .pipe(come_to, "town")
)
```
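
Under the hood, `df.pipe(func, *args, **kwargs)` just calls `func(df, *args, **kwargs)`. Each `.pipe()` step therefore hands the previous result to the next function as its first argument. If a function expects the DataFrame under a different parameter, `.pipe()` also accepts a `(func, "keyword")` tuple. Here’s a quick check that uses two hypothetical helpers (`add_column` and `label_rows`) in place of the story functions:

```python
import pandas as pd


def add_column(df: pd.DataFrame, name: str, value) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a constant column added."""
    return df.assign(**{name: value})


def label_rows(prefix: str, data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper whose DataFrame parameter is NOT the first one."""
    return data.assign(label=[f"{prefix}{i}" for i in range(len(data))])


df = pd.DataFrame({"x": [1, 2, 3]})

# .pipe(f, *args, **kwargs) is equivalent to f(df, *args, **kwargs)
piped = df.pipe(add_column, "destination", value="town")
direct = add_column(df, "destination", value="town")
print(piped.equals(direct))  # True

# The (func, "data") tuple tells .pipe() which keyword receives the DataFrame
labelled = df.pipe((label_rows, "data"), "row")
print(labelled)
```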

There are some good arguments for “hype man” style, for example when first writing a pipeline, testing the outputs of each stage, debugging, or optimising performance. In the long term, though, our code should be “***written for people to read, and only incidentally for machines to execute***” (Harold Abelson, *Structure and Interpretation of Computer Programs*).
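
A middle ground is to keep the chain and drop in a pass-through step that reports on the intermediate DataFrame, then returns it unchanged. The `peek` helper below is hypothetical (it isn’t part of pandas) and is only one sketch of the idea:

```python
import pandas as pd


def peek(df: pd.DataFrame, label: str = "") -> pd.DataFrame:
    """Hypothetical debugging step: print shape and columns, then pass df through."""
    print(f"{label} shape={df.shape} columns={list(df.columns)}")
    return df


result = (
    pd.DataFrame({"name": ["Ann", "Bob"]})
    .pipe(peek, "start")
    .assign(status=["nice", "naughty"])
    .pipe(peek, "after status")
)
```

Once the pipeline is working, you can delete the `peek` lines without touching the rest of the chain.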

It *would* be nice, however, if we didn’t need all those `.pipe()` calls. Here’s what Tom said about this back in 2016:

> *Monkeypatching on your own methods is fragile. It’s not easy to correctly subclass pandas’ DataFrame to extend it with your own methods. Composition, where you create a class that holds onto a DataFrame internally, may be fine for your own code, but it won’t interact well with the rest of the ecosystem so your code will be littered with lines extracting and repacking the underlying DataFrame.* 
> 
> *— Tom Augspurger,* [Modern Pandas \(Part 2\): Method Chaining](https://link.mail.beehiiv.com/ss/c/AJNVrU_uqSEvE7VehET4E0utaP9oG-x7MFzgOFbN4I4t_oyB8jPq94HB72ggL_gi9ZpEtkTEi2L4dQ4h9An1Xjwjh3QFYmnERFzybnIKbw73CHxBrbWOsUHmaIR5c3N2chSX38DCz_j2cTzMO9uGlJoSESCmoPduMe5G8PiyqP-X9o8JVP-ujyAi1b1nGg4Moe--WNNJTyh62G3NnVTWmewSnUDDg0UWTlkUlOVGZC8/41s/eX7tzooLQfKLVhBO3tBJeQ/h10/HI5Ad9tGgCbElzMlRXqmb_Yy2zStZvT67vEh2B4NyNY)

With this concept in place, let us introduce today’s package… 🥁🥁🥁

---

## 2️⃣  [pandas_flavor](http://pypi.org/project/pandas_flavor/): DIY custom DataFrame methods

- 📆 Last updated: July 2023
- ⬇️  Downloads: 53,651/week
- ⚖️  License: MIT
- 🐍 **[PyPI](https://link.mail.beehiiv.com/ss/c/Fu4EPqIapGgImIgza8sdTQvYhsB2waz_k6h-ZREnI3FCRd-8LH3wiQ5Bs4iH3BlEkD7YaZLPqW9LHPGkSSu_GmchoXYz8xATZsz-H7hkIU4GbJ1tj3cN0GLc27XENkpEF1X5BbcWOwNx-cwBQ4OoZQpS7Xot53USMnUPMOPiU376xSXY7MatyYrDUet8FlWvYzyTRgHuB1j_08HEwkFbbw/41s/eX7tzooLQfKLVhBO3tBJeQ/h13/NSjZ9CAsWAHrJZK8Y3BymCPWp6_n5AT7ofDa6hFPz7c)** |  ⭐ **[GitHub Stars: 288](https://link.mail.beehiiv.com/ss/c/yARi27edgic-gkALGf0FjbSX6TBbWsW4ousugOm2LmA6zByEbZn5o1frcSZ81o0KTvWpN21zV6lwKsN-c7fpL5EkLg90pfyA8lnymmzidEHpgpEysKS80YVchboMU6rcmg8uScbgDPkKkAt6XJtD0Ln8bV5TIB1GSxVKaIuUE88b_K7pGf_TCjK5MAUQE5qL7Tpi4bJNsplH8XIw3tS0nDD0o_6oyqs2Ff2q3XOcOTk/41s/eX7tzooLQfKLVhBO3tBJeQ/h14/T4ueBftVAFmASmydEiOnOocHKTbGgbvCvYjRhQc9m1o)**

### 🔍 What is it?
pandas 0.23 added a simple API for registering custom accessors on DataFrames and Series. pandas_flavor builds on it so that you can register plain methods as well. This makes it easy to add your own custom functionality to any DataFrame. You can even share a custom DataFrame analytics suite within your team by registering your analytics class under a single namespaced accessor.
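
As a sketch of the namespaced-accessor idea, here’s a tiny “analytics suite” registered with pandas’ own `pd.api.extensions.register_dataframe_accessor` decorator. pandas_flavor also ships accessor decorators. The `santa` namespace and its two methods are hypothetical:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("santa")  # hypothetical namespace
class SantaAccessor:
    """A hypothetical team analytics suite, reachable as df.santa.<method>()."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor was accessed on
        self._df = pandas_obj

    def nice_count(self, column: str = "status") -> int:
        """Count the rows whose status is "nice"."""
        return int((self._df[column] == "nice").sum())

    def summary(self, column: str = "status") -> pd.Series:
        """Count the rows for each distinct status."""
        return self._df[column].value_counts()


df = pd.DataFrame({"list": ["Ann", "Bob", "Cy"], "status": ["nice", "naughty", "nice"]})
print(df.santa.nice_count())  # 2
print(df.santa.summary())
```

Grouping methods under one namespace like this keeps them from colliding with pandas’ built-in DataFrame methods.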

### 📦 Install
`pip install pandas-flavor`

### 🛠️ Use
Because it’s Christmas (in 23 days!), let’s make a working version of our Santa Claus example. As always, you can find the full notebook in the **[GitHub repo](https://link.mail.beehiiv.com/ss/c/yARi27edgic-gkALGf0FjXY-wcTEbkNSVzDLCI7TM8YGIJvZa3uDz0zvBGmvWTpK0l_bYmVsG3j80KfbDrUewcTkhW-GPOHqSrriKu3wG2duY-4JlvTj4dEFVzJt0dmBRKQfQa_yuTJWRFgaKksBNsxegLUQM-zc92SqvUTG-pILT8vVG5j0Wv2CsxQbPdqyV9RdwxirGTRJfsUAhel2UkMHf_lOXIFL6-fojENVKLg/41s/eX7tzooLQfKLVhBO3tBJeQ/h15/U4IXcpMJ2SEkuM-pejHeIj3MMx1F9h8ctTVqk7Tq14A)** for this advent calendar. Note that we’re using the wonderful **[Faker](https://link.mail.beehiiv.com/ss/c/8qwR6WebooK0eqCcIi3gruVrhASEkmRTZnoCe8ngT2-qYcN4-SsVTohY8fTwpD5YFRaY159LVncanhtqMqJq5votRoLnIGsfGUWGH1YtPeIhtqWzDN4RfYvJbEk_-8E8fWf2s9e7s1M4uy2bi4v8Pfhw3GWrICXyH8bNO7BEK_7XEwzLihnOuRiNOOg1MSYtvAVSFGIFU8CrhMM8Ey4qaQ/41s/eX7tzooLQfKLVhBO3tBJeQ/h16/RD-AbXgLT2nfTCcjlGOGhrQANCpAVVe56deDhrnYqQg)** package to generate some random names.

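```python
# Imports & setup
import pandas as pd
from faker import Faker

fake = Faker()


# A normal Python function, for now...
def make(df: pd.DataFrame, item: str, n: int = 4) -> pd.DataFrame:
    """Return a copy of df with a column called "item" holding n fake names."""
    # Note: assign(item=...) always creates a column literally named "item",
    # whatever value is passed as `item`; the later version below uses df[item]
    return df.assign(item=[fake.name() for _ in range(n)])
```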

We can call this function on an empty DataFrame…

```python
# Pass the santa_claus DataFrame directly into the function
santa_claus = pd.DataFrame()
make(santa_claus, item="names", n=5)
```

|   | item |
|---|---|
| 0 | Joshua Porter |
| 1 | Charles Rich |
| 2 | Amanda Liu |
| 3 | Meghan Ware |
| 4 | Brenda Travis |


…or we can do the same thing using the `.pipe()` method:

```python
# Pipe the santa_claus DataFrame into the function using .pipe()
santa_claus.pipe(make, item="list")
```

|   | item |
|---|---|
| 0 | Kelly Browning |
| 1 | Sue Odom |
| 2 | Jennifer Brown |
| 3 | Renee Evans |


Time for some 🎩 **pandas_flavor magic** 🪄: let’s add a custom pandas method called **.make()** by adding a single decorator to our function:

```python
import pandas_flavor as pf


@pf.register_dataframe_method
def make(df, item, n=4):
    # As before, assign(item=...) names the new column "item" literally
    return df.assign(item=[fake.name() for _ in range(n)])
```

```python
# Use pandas_flavor to define a new .make() method on all DataFrames
santa_claus.make(item="list", n=2)
```

|   | item |
|---|---|
| 0 | Michael Harrison |
| 1 | Steven Martin |
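
How can a single decorator make `.make()` available on every DataFrame? One rough approach is to wrap the function in a tiny accessor class whose `__call__` forwards the DataFrame as the first argument, then register that class under the function’s name with pandas’ public accessor API. This is an illustrative reimplementation with a hypothetical name (`register_method`), not pandas_flavor’s actual source:

```python
from functools import wraps

import pandas as pd


def register_method(func):
    """Hypothetical decorator: expose func(df, ...) as df.<func name>(...)."""

    class _MethodAccessor:
        def __init__(self, pandas_obj: pd.DataFrame):
            # pandas passes in the DataFrame the attribute was accessed on
            self._obj = pandas_obj

        @wraps(func)
        def __call__(self, *args, **kwargs):
            # Forward the DataFrame as the first argument, just like .pipe()
            return func(self._obj, *args, **kwargs)

    pd.api.extensions.register_dataframe_accessor(func.__name__)(_MethodAccessor)
    return func  # the plain function still works on its own


@register_method
def add_destination(df: pd.DataFrame, place: str) -> pd.DataFrame:
    """Hypothetical example method."""
    return df.assign(destination=place)


df = pd.DataFrame({"name": ["Ann", "Bob"]})
print(df.add_destination("town"))
```

pandas emits a warning if the chosen name shadows an existing DataFrame attribute, which is one reason to pick method names carefully.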


#### Write functions for `make()`, `check()`, `find_out()` and `come_to()`

```python
import time
import random
from tqdm import tqdm
```

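```python
@pf.register_dataframe_method
def make(df: pd.DataFrame, item: str, n: int = 4) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is never modified
    df = df.copy()
    if item == "list":
        # A "list" is a list of fake names
        df[item] = [fake.name() for _ in range(n)]
    else:
        # Anything else gets numbered placeholders, e.g. sandwich0, sandwich1, ...
        df[item] = [f"{item}{i}" for i in range(n)]
    return df
```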


```python
santa_claus = pd.DataFrame()
```

```python
# Our new make() function generates names if you specify "list"...
make(santa_claus, "list")
```

|   | list |
|---|---|
| 0 | Taylor Rubio |
| 1 | Robert Jones |
| 2 | Tiffany Nelson |
| 3 | Heather Wade |

```python
# ...or it generates placeholders otherwise
make(santa_claus, "sandwich")
```

|   | sandwich |
|---|---|
| 0 | sandwich0 |
| 1 | sandwich1 |
| 2 | sandwich2 |
| 3 | sandwich3 |


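```python
@pf.register_dataframe_method
def check(df: pd.DataFrame, n: int = 1) -> pd.DataFrame:
    """Pretend to check every row n times, showing one progress bar per pass."""
    for _pass in range(n):
        for _row in tqdm(df.iterrows(), total=len(df)):
            time.sleep(0.2)  # purely theatrical pause per row
    return df
```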

```python
# The check() function runs through each row with theatrical rigour
check(make(santa_claus, "list"), n=2)
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.83it/s]
100%|██████████| 4/4 [00:00<00:00,  4.85it/s]
```

|   | list |
|---|---|
| 0 | Matthew Velazquez |
| 1 | Faith Reeves |
| 2 | David Arnold |
| 3 | Rodney Sanders |


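```python
@pf.register_dataframe_method
def find_out(df: pd.DataFrame, status: str) -> pd.DataFrame:
    """Randomly give each row one of the "_or_"-separated options as its status."""
    options = status.split("_or_")
    # assign() returns a new DataFrame instead of modifying the caller's one
    return df.assign(status=[random.choice(options) for _ in range(len(df))])
```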

```python
# The find_out() function randomly allocates a status to each row based on the provided string
find_out(check(make(santa_claus, "list"), n=2), status="naughty_or_nice")
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.86it/s]
100%|██████████| 4/4 [00:00<00:00,  4.84it/s]
```

|   | list | status |
|---|---|---|
| 0 | Angela Smith | naughty |
| 1 | Nicholas Baxter | nice |
| 2 | Robert Cooper | naughty |
| 3 | Nicole Robinson | nice |
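
Because `find_out()` draws from Python’s global `random` module, every run produces a different set of statuses, as the outputs in this notebook show. If you need repeatable results (for tests, say), one option is to pass in a seeded NumPy generator. The `find_out_seeded` variant below is a hypothetical alternative, not part of the original notebook:

```python
import numpy as np
import pandas as pd


def find_out_seeded(df: pd.DataFrame, status: str, seed: int = 0) -> pd.DataFrame:
    """Hypothetical variant of find_out(): same idea, but reproducible via a seed."""
    options = status.split("_or_")
    rng = np.random.default_rng(seed)
    # assign() returns a new DataFrame, leaving the input untouched
    return df.assign(status=rng.choice(options, size=len(df)))


people = pd.DataFrame({"list": ["Ann", "Bob", "Cy", "Di"]})
first = find_out_seeded(people, "naughty_or_nice", seed=42)
second = find_out_seeded(people, "naughty_or_nice", seed=42)
print(first)
```

Adding `@pf.register_dataframe_method` above it would make it chainable in the same way as the other methods.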


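```python
@pf.register_dataframe_method
def come_to(df: pd.DataFrame, place: str) -> pd.DataFrame:
    """Return a copy of df with a "destination" column holding `place` on every row."""
    return df.assign(destination=place)
```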

```python
# The come_to() function adds a "destination" column containing the specified value
come_to(find_out(check(make(santa_claus, "list"), n=2), status="naughty_or_nice"), "town")
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.84it/s]
100%|██████████| 4/4 [00:00<00:00,  4.75it/s]
```

|   | list | status | destination |
|---|---|---|---|
| 0 | Anna Edwards | nice | town |
| 1 | James Lewis | nice | town |
| 2 | Stephanie Garcia | nice | town |
| 3 | Susan Mccoy | naughty | town |


#### Let's put it all together to unlock the sweet taste of 🐼 idiomatic pandas 🐼 

```python
# Here's the SPAGHETTI code version
santa_claus = pd.DataFrame()
come_to(
    find_out(
        check(
            make(santa_claus, "list"),
            n=2,
        ),
        "naughty_or_nice",
    ),
    "town",
)
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.83it/s]
100%|██████████| 4/4 [00:00<00:00,  4.91it/s]
```

|   | list | status | destination |
|---|---|---|---|
| 0 | April Hamilton | naughty | town |
| 1 | Jeremy Morrison | naughty | town |
| 2 | Ashley Sanchez | naughty | town |
| 3 | Carl Stanley | nice | town |


```python
# Here's the HYPE-man pandas version
santa_claus = pd.DataFrame()
made_list = make(santa_claus, "list")
checked_list = check(made_list, n=2)
found_out = find_out(checked_list, "naughty_or_nice")
arrived = come_to(found_out, "town")
arrived
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.85it/s]
100%|██████████| 4/4 [00:00<00:00,  4.84it/s]
```

|   | list | status | destination |
|---|---|---|---|
| 0 | Cheryl Roberts | naughty | town |
| 1 | Douglas Rogers | nice | town |
| 2 | Brandon Hatfield | naughty | town |
| 3 | Marie Jones | naughty | town |


```python
# Here's the .pipe() version
(
    pd.DataFrame()
    .pipe(make, "list")
    .pipe(check, n=2)
    .pipe(find_out, "naughty_or_nice")
    .pipe(come_to, "town")
)
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.78it/s]
100%|██████████| 4/4 [00:00<00:00,  4.85it/s]
```

|   | list | status | destination |
|---|---|---|---|
| 0 | Kevin Newman | nice | town |
| 1 | Jeffery Moore | nice | town |
| 2 | Jennifer Vargas | nice | town |
| 3 | Joel Arias | nice | town |


```python
# Finally, here's the pandas_flavor version
santa_claus = pd.DataFrame()
(
    santa_claus
    .make("list")
    .check(n=2)
    .find_out("naughty_or_nice")
    .come_to("town")
)
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.87it/s]
100%|██████████| 4/4 [00:00<00:00,  4.82it/s]
```

|   | list | status | destination |
|---|---|---|---|
| 0 | Lindsey Smith | naughty | town |
| 1 | Hunter Robinson | naughty | town |
| 2 | Ronald Li | nice | town |
| 3 | Thomas Hoover | nice | town |


```python
# 🎅🏻 What else could Santa do with these reusable component methods? 🥪
santa_claus = pd.DataFrame()
(
    santa_claus
    .make("sandwich")
    .check(n=1)
    .find_out("ham_or_turkey_or_vegetarian")
    .come_to("PyCon")
)
```

```text
100%|██████████| 4/4 [00:00<00:00,  4.85it/s]
```

|   | sandwich | status | destination |
|---|---|---|---|
| 0 | sandwich0 | ham | PyCon |
| 1 | sandwich1 | vegetarian | PyCon |
| 2 | sandwich2 | vegetarian | PyCon |
| 3 | sandwich3 | turkey | PyCon |


---

<div class="alert alert-block alert-success">
<h1>🎊🎄  Congratulations!  🐍🚀 </h1>

<p>
    👤 This notebook has been made by <a href="https://twitter.com/john_sandall">@John_Sandall</a> and the team at <a href="https://twitter.com/CoefficientData">@CoefficientData</a>. We run training workshops in Python, data science and data engineering.
</p><br/>

<p>
    🎓 If you are interested in registering for our <strong>paid workshops in Python for data science and engineering</strong>, you can <a href="https://coefficient.ai/learn-python">sign up to our workshops mailing list here</a>.
</p><br/>

<p>
    🎬 You can follow my <a href="https://github.com/pydatabristol/workshops/tree/master/workshop_2019_10_28_first_steps"><em>First Steps with Python</em></a> and <a href="https://github.com/pydatabristol/workshops/tree/master/workshop_2020_02_27_first_steps_with_pandas"><em>First Steps with pandas</em></a> workshops for free as part of <a href="https://www.meetup.com/PyData-Bristol/">PyData Bristol's</a> Zero To Hero workshop series. If you'd like to learn more <strong>Jupyter tips &amp; tricks</strong> you may be interested in my event with Ben Sparks from <a href="http://bit.ly/Numberphile_Sub">Numberphile</a> where we explored simulating viral outbreaks with <strong>SIR models</strong>, <strong>interactive Jupyter Widgets</strong> and <strong>animated matplotlib charts</strong> in <a href="https://www.crowdcast.io/e/pydata1/register"><em>Building An Interactive Coronavirus Model In Jupyter w/ Ben Sparks</em></a>.
</p><br/>

<p>
    💼 I am the Founder of data science consultancy <a href="https://coefficient.ai/">Coefficient</a>. If you would like to work with us, our team can help you with your <a href="https://www.youtube.com/watch?v=qBvO3fyl1lk">data science</a>, <a href="https://coefficient.ai/#services-page">software engineering</a> and <a href="https://coefficient.ai/#machine-learning-page">machine learning</a> projects as an on-demand resource. We can also create <a href="https://coefficient.ai/#training-page">bespoke training workshops</a> adapted to your industry, virtual or in-person, with training clients currently including BNP Paribas, EY, the Met Police and the BBC.
</p>

</div>