![](assets/image.png)


My name is Alix Tiran-Cappello. In my day job, I work as a data scientist and MLOps pipeline maintainer at Renault Group.

But today I am here on my own, and what I say is not associated in any way with my company.

The reason I am here is to present a small open source package that I created: `pelage`.

## My goal:

- Import software engineering best practices around testing.
- These practices have proven useful for maintaining velocity and embedding quality from the start.
- Help you write better data science code with polars and TDD.


# What is TDD?

![alt text](assets/tdd-schema.png){width=500}


We will see it countless times during this talk.

Before we can start with TDD, I'd like to guide you through a brief journey:

1. Take some pandas code and make it better using refactoring tests.
1. Translate it to polars, to get used to polars syntax and specificities.
1. Use pelage to write a meaningful data test.

During this time we will see tests fail, then write some simple code, and make them pass.


# Step 1: From Pandas To Polars

![kent-hat](assets/kent-beck-hat.png){width=150}


In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import polars as pl

from load_data import data_loader

emissions_pandas = (
    data_loader().filter(pl.col.electric_range_km.is_not_null()).collect().to_pandas()
)

In [None]:
%load_ext notifier

In [None]:
df = emissions_pandas
df = df.dropna(subset=["fuel_consumption"])
df.drop(["obfcm_data_source", "registered_category"], axis=1, inplace=True)

df[df["fuel_type"] == "PETROL"]["fuel_type"] = "petrol"
df["fuel_consumption_per_100km"] = df["fuel_consumption"] * 100

grouped = []
for manufacturer in df["manufacturer_name"].unique():
    manuf_df = df[df["manufacturer_name"] == manufacturer]
    for year in df["year"].unique():
        subset_df = manuf_df[(manuf_df["year"] == year)]
        result = {
            "manufacturer_name": manufacturer,
            "year": year,
            "mean_fuel_consumption": subset_df["fuel_consumption_per_100km"].mean(),
            "mean_electric_range": subset_df["electric_range_km"].mean(),
            "vehicle_count": subset_df["vehicle_id"].nunique(),
        }
        grouped.append(result)

grouped = pd.DataFrame(grouped)
grouped = grouped.dropna()
grouped = grouped.sort_values(["mean_fuel_consumption", "year"], ascending=False)
grouped = grouped[grouped["vehicle_count"] >= 100]
grouped = grouped.reset_index(drop=True)

## Refactoring
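
The refactoring itself is done live during the talk. One possible outcome, written with method chaining, might look like the sketch below. It is an assumption about the end result, not the code shown on stage: the function name `summarize_emissions`, the `min_vehicles` parameter and the sample data are hypothetical. It leaves out the column drop and the `fuel_type` relabelling, which do not affect the aggregated result.

```python
import pandas as pd


def summarize_emissions(df: pd.DataFrame, min_vehicles: int = 100) -> pd.DataFrame:
    """Aggregate consumption and electric range per manufacturer and year."""
    return (
        df.dropna(subset=["fuel_consumption"])
        .assign(fuel_consumption_per_100km=lambda d: d["fuel_consumption"] * 100)
        # One groupby replaces the nested loops over manufacturers and years.
        .groupby(["manufacturer_name", "year"], as_index=False)
        .agg(
            mean_fuel_consumption=("fuel_consumption_per_100km", "mean"),
            mean_electric_range=("electric_range_km", "mean"),
            vehicle_count=("vehicle_id", "nunique"),
        )
        .dropna()
        .sort_values(["mean_fuel_consumption", "year"], ascending=False)
        .loc[lambda d: d["vehicle_count"] >= min_vehicles]
        .reset_index(drop=True)
    )


# Hypothetical sample data, small enough to verify by hand.
sample = pd.DataFrame(
    {
        "vehicle_id": [1, 2, 3, 4, 5, 6],
        "manufacturer_name": ["A", "A", "B", "B", "B", "C"],
        "year": [2021, 2021, 2021, 2021, 2021, 2022],
        "fuel_consumption": [0.5, 0.25, 1.0, 0.5, None, 0.125],
        "electric_range_km": [50.0, 70.0, 40.0, 60.0, 45.0, 30.0],
    }
)

summary = summarize_emissions(sample, min_vehicles=2)

# The refactoring test: compare the result with a frame computed by hand.
expected = pd.DataFrame(
    {
        "manufacturer_name": ["B", "A"],
        "year": [2021, 2021],
        "mean_fuel_consumption": [75.0, 37.5],
        "mean_electric_range": [50.0, 60.0],
        "vehicle_count": [2, 2],
    }
)
pd.testing.assert_frame_equal(summary, expected)
```

In a real refactoring session, `expected` would be the output of the original loop-based code, so that any change in behaviour is caught immediately.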


# But What About TDD?

## This is Kent Beck:

![alt text](assets/kent_beck.png){width=300}

- ##### Kent coined the term TDD
- ##### Kent created one of the first testing frameworks
- ##### Kent found that a testing framework should be written in the same language as the code
- ##### Kent says that TDD reduces developer anxiety (Better than Xanax!)
- ##### Kent usually has good ideas about software development


## Enter `pelage`

##### A testing framework to be used with polars to express data science tests clearly and easily.

##### Pass a dataframe to a testing function:

- ##### If it fails you get a nice descriptive error message!
- ##### If it passes, you get your dataframe back!


## Step 2: Now Let's Do Some Real TDD In Polars!


In [None]:
emissions_for_tdd = (
    data_loader().filter(pl.col.electric_range_km.is_not_null()).collect()
)

## TDD With Polars And Pelage


In [None]:
import pelage as ...

primary_key_columns = [
    "vehicle_id",
    "reporting_period",
    "obfcm_data_source",
    "used_in_calculation",
]
(
    emissions_for_tdd
)

## TDD With Polars And Pelage (part 2)


# Conclusion

#### Method Chaining: From horribly slow pandas code to clean, easy-to-read pandas code.

#### Easy Translation to polars: A good equivalence test + `.pipe(pl.DataFrame)`, which you then shift up the chain!

#### With TDD in data science, we quickly gain deeper insights from just a few data tests as we build up our analysis:

- **Being able to express tests in the same manner as your code is a critical part of the TDD workflow.**

- **This is already done for you in pelage: the core logic leverages polars.**

- **The result: a simple, easy-to-use package for writing the high-quality code that production contexts require.**

#### All you have to do is:

- **❌ Write a failing test!**

- **✅ Write some code to make it pass!**

- **🔄 Refactor to make it better!**


# And Now?

- ### Type `uv add pelage` ( ~~pip install pelage~~, it works but `uv` is better )
- ### Go to the website: https://alixtc.github.io/pelage/
- ### QR Code for the Presentation
  ![](assets/qr_presentation_link.png){width=200}
- ### QR Code for my LinkedIn
  ![](assets/qr_linkedin_profile.png){width=200}
