# Don't do that, do this instead

Derek Nagel

## Introduction

In Python, there are often multiple ways to solve a problem. All of them may work, but some are faster, simpler, and easier to understand than others. This presentation demonstrates alternatives to common practices in Python. All of the "don't do that" code chunks work as intended, but they are often slower and more complicated than necessary. The "do this" code chunks return the same results, but are often faster and easier to understand.

**Disclaimer: This presentation is not meant to criticize anyone's coding practices.** Like many people, I've done a lot of the "don't do that" things in the past. This presentation is meant to be informational.

## Strings

### Don't do that (concatenating strings)

In [19]:
name = 'Derek'
location = 'Savage, MN'
interests = ['cats', 'hiking', 'board games', 'Python']

greeting = "Hello I'm " + name + ", I live in " + location + ", and I like " + ', '.join(interests[:-1]) + ", and " + interests[-1] + "."

print(greeting)

Hello I'm Derek, I live in Savage, MN, and I like cats, hiking, board games, and Python.


### Do this (f-string)

In [20]:
name = 'Derek'
location = 'Savage, MN'
interests = ['cats', 'hiking', 'board games', 'Python']

greeting = f"Hello I'm {name}, I live in {location}, and I like {', '.join(interests[:-1])}, and {interests[-1]}."

print(greeting)

Hello I'm Derek, I live in Savage, MN, and I like cats, hiking, board games, and Python.


### Can also do this (format)

In [21]:
name = 'Derek'
location = 'Savage, MN'
interests = ['cats', 'hiking', 'board games', 'Python']

template = "Hello I'm {}, I live in {}, and I like {}, and {}."

greeting = template.format(name, location, ', '.join(interests[:-1]), interests[-1])

print(greeting)

Hello I'm Derek, I live in Savage, MN, and I like cats, hiking, board games, and Python.
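
One thing the slicing approach above does not handle is a list with fewer than three interests. The sketch below moves the list-joining logic into a small helper so the f-string stays short. The name `join_with_and` is made up for this example and is not part of the original presentation.

```python
def join_with_and(items: list[str]) -> str:
    """Join items into an English list, e.g. 'a, b, and c'."""
    if not items:
        return ''
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f'{items[0]} and {items[1]}'
    # Three or more items: comma-separate and put "and" before the last one
    return f"{', '.join(items[:-1])}, and {items[-1]}"

name = 'Derek'
location = 'Savage, MN'
interests = ['cats', 'hiking', 'board games', 'Python']

greeting = f"Hello I'm {name}, I live in {location}, and I like {join_with_and(interests)}."

print(greeting)
```

This keeps the formatting rules in one place, so the greeting template itself only contains plain names and one function call.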


## Mapping

### Don't do that (if-else chains)

In [22]:
def get_capital(state):
    if state == 'MN':
        capital = 'St. Paul'
    elif state == 'WI':
        capital = 'Madison'
    elif state == 'IA':
        capital = 'Des Moines'
    elif state == 'ND':
        capital = 'Bismarck'
    elif state == 'SD':
        capital = 'Pierre'

    return capital

print(get_capital('MN'))

St. Paul


### Better (match-case, Python 3.10+)

In [23]:
def get_capital(state):
    match state:
        case 'MN':
            capital = 'St. Paul'
        case 'WI':
            capital = 'Madison'
        case 'IA':
            capital = 'Des Moines'
        case 'ND':
            capital = 'Bismarck'
        case 'SD':
            capital = 'Pierre'

    return capital

print(get_capital('MN'))

St. Paul


### Do this (mapping dict)

In [24]:
def get_capital(state):
    capitals = {
        'MN': 'St. Paul',
        'WI': 'Madison',
        'IA': 'Des Moines',
        'ND': 'Bismarck',
        'SD': 'Pierre'
    }

    return capitals[state]

print(get_capital('MN'))

St. Paul
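
A possible variation on the mapping dict, sketched under two assumptions that are not in the original: the dict lives at module level (so it is built once rather than on every call), and unknown states should return a default instead of raising `KeyError`. The `default` parameter and the `CAPITALS` name are introduced here for illustration only.

```python
from typing import Optional

# Built once when the module is loaded, not on every function call
CAPITALS = {
    'MN': 'St. Paul',
    'WI': 'Madison',
    'IA': 'Des Moines',
    'ND': 'Bismarck',
    'SD': 'Pierre'
}

def get_capital(state: str, default: Optional[str] = None) -> Optional[str]:
    """Return the capital for a state abbreviation, or default if unknown."""
    return CAPITALS.get(state, default)

print(get_capital('MN'))             # St. Paul
print(get_capital('TX'))             # None
print(get_capital('TX', 'unknown'))  # unknown
```

Whether a missing key should raise an error or return a default depends on the use case; `capitals[state]` is the better choice when an unknown state is a bug.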


## For loops

### Don't do that (append)

In [25]:
import re

def count_vowels(text: str) -> int:
    return len(re.findall('[aeiou]', text, re.IGNORECASE))

words = ['Pymntos', 'Minnesota', 'Python']

counts = []
for word in words:
    counts.append(count_vowels(word))

print(counts)

[1, 4, 1]


### Do this (use list comprehension)

In [26]:
import re

def count_vowels(text: str) -> int:
    return len(re.findall('[aeiou]', text, re.IGNORECASE))

words = ['Pymntos', 'Minnesota', 'Python']

counts = [count_vowels(word) for word in words]

print(counts)

[1, 4, 1]
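
The same comprehension syntax also works for dicts and for generator expressions. The sketch below reuses `count_vowels` from above; the variable names `counts_by_word` and `total_vowels` are made up for this example.

```python
import re

def count_vowels(text: str) -> int:
    return len(re.findall('[aeiou]', text, re.IGNORECASE))

words = ['Pymntos', 'Minnesota', 'Python']

# Dict comprehension: keep each word next to its count
counts_by_word = {word: count_vowels(word) for word in words}

# Generator expression: sum the counts without building a list first
total_vowels = sum(count_vowels(word) for word in words)

print(counts_by_word)  # {'Pymntos': 1, 'Minnesota': 4, 'Python': 1}
print(total_vowels)    # 6
```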


### Don't do that (indexes)

In [27]:
import random

characters = [
    'Mario',
    'Luigi',
    'Wario',
    'Waluigi',
    'Peach',
    'Daisy',
    'Toad',
    'Bowser',
    'Bowser Jr.',
    'Yoshi',
    'Birdo',
    'Donkey Kong'
]

points = [15, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

random.seed(20251211)

#simulate 4 races
results = [
    random.sample(
        characters,
        len(characters)
    )
    for _ in range(4)
]

for i in results:
    print(i)

['Wario', 'Birdo', 'Yoshi', 'Bowser Jr.', 'Mario', 'Bowser', 'Peach', 'Luigi', 'Waluigi', 'Donkey Kong', 'Daisy', 'Toad']
['Donkey Kong', 'Mario', 'Bowser', 'Yoshi', 'Birdo', 'Toad', 'Wario', 'Waluigi', 'Peach', 'Bowser Jr.', 'Daisy', 'Luigi']
['Bowser', 'Yoshi', 'Bowser Jr.', 'Luigi', 'Donkey Kong', 'Waluigi', 'Daisy', 'Wario', 'Birdo', 'Mario', 'Peach', 'Toad']
['Bowser Jr.', 'Mario', 'Luigi', 'Birdo', 'Wario', 'Donkey Kong', 'Toad', 'Waluigi', 'Bowser', 'Peach', 'Yoshi', 'Daisy']


In [28]:
print('Course 1:\n')

for i in range(len(characters)):
    name = results[0][i]
    print(f'{i+1}: {name}')

Course 1:

1: Wario
2: Birdo
3: Yoshi
4: Bowser Jr.
5: Mario
6: Bowser
7: Peach
8: Luigi
9: Waluigi
10: Donkey Kong
11: Daisy
12: Toad


### Do this (enumerate)

In [29]:
print('Course 1:\n')

for index, name in enumerate(results[0]):
    print(f'{index+1}: {name}')

Course 1:

1: Wario
2: Birdo
3: Yoshi
4: Bowser Jr.
5: Mario
6: Bowser
7: Peach
8: Luigi
9: Waluigi
10: Donkey Kong
11: Daisy
12: Toad
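
`enumerate` also accepts a `start` argument, which removes the need for `index+1`. A minimal sketch, using only the first four finishers from the Course 1 output so that the block is self-contained; the names `course_1` and `place` are made up for this example.

```python
# First four finishers of Course 1, copied from the output above
course_1 = ['Wario', 'Birdo', 'Yoshi', 'Bowser Jr.']

# start=1 makes the counter begin at 1 instead of 0
for place, name in enumerate(course_1, start=1):
    print(f'{place}: {name}')

# 1: Wario
# 2: Birdo
# 3: Yoshi
# 4: Bowser Jr.
```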


### Don't do that (multiple lists)

In [30]:
print('Course 1:\n')

for i in range(len(characters)):
    name = results[0][i]
    score = points[i]
    print(f'{name}: {score}')

Course 1:

Wario: 15
Birdo: 12
Yoshi: 10
Bowser Jr.: 9
Mario: 8
Bowser: 7
Peach: 6
Luigi: 5
Waluigi: 4
Donkey Kong: 3
Daisy: 2
Toad: 1


### Do this (use zip)

In [31]:
print('Course 1:\n')

for name, score in zip(results[0], points):
    print(f'{name}: {score}')

Course 1:

Wario: 15
Birdo: 12
Yoshi: 10
Bowser Jr.: 9
Mario: 8
Bowser: 7
Peach: 6
Luigi: 5
Waluigi: 4
Donkey Kong: 3
Daisy: 2
Toad: 1


### Don't do that (nested for loops)

In [32]:
def results_to_dict(course, points):
    return dict(zip(course, points))

results_dicts = [results_to_dict(course, points) for course in results]

scores = {name: 0 for name in characters}

for course in results_dicts:
    for name in characters:
        scores[name] += course[name]

print(sorted(scores.items(), key = lambda item: item[1], reverse = True))

[('Bowser Jr.', 37), ('Bowser', 36), ('Mario', 35), ('Wario', 34), ('Yoshi', 33), ('Birdo', 33), ('Donkey Kong', 33), ('Luigi', 25), ('Waluigi', 21), ('Peach', 15), ('Toad', 15), ('Daisy', 11)]


### Do this (use itertools)

In [33]:
import itertools as it

def results_to_dict(course, points):
    return dict(zip(course, points))

results_dicts = [results_to_dict(course, points) for course in results]

scores = {name: 0 for name in characters}

for course, name in it.product(results_dicts, characters):
    scores[name] += course[name]

print(sorted(scores.items(), key = lambda item: item[1], reverse = True))

[('Bowser Jr.', 37), ('Bowser', 36), ('Mario', 35), ('Wario', 34), ('Yoshi', 33), ('Birdo', 33), ('Donkey Kong', 33), ('Luigi', 25), ('Waluigi', 21), ('Peach', 15), ('Toad', 15), ('Daisy', 11)]
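
Another option for this kind of tally is `collections.Counter`, which adds values for matching keys when updated with a mapping. This is an alternative technique, not the approach used above, and the small `results` and `points` lists here are toy data made up so that the block runs on its own.

```python
from collections import Counter

# Toy data for illustration only (not the simulated races above)
points = [3, 2, 1]
results = [
    ['Mario', 'Luigi', 'Peach'],
    ['Peach', 'Mario', 'Luigi'],
]

scores = Counter()
for course in results:
    # update() with a mapping adds each value to the running total
    scores.update(dict(zip(course, points)))

# most_common() returns (name, score) pairs sorted from highest to lowest
print(scores.most_common())  # [('Mario', 5), ('Peach', 4), ('Luigi', 3)]
```

With a `Counter`, there is no need to initialize every name to 0 first, and `most_common()` replaces the `sorted(..., key=..., reverse=True)` call.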


## Data analysis
Data from [NYC Taxi and Limousine Commission](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page)

### Don't use that

![pandas logo](https://pandas.pydata.org/static/img/pandas_white.svg)

In [34]:
import pandas as pd
import glob

taxi_files = glob.glob(
    'data/yellow_tripdata_*.parquet'
)

taxi_trips = pd.concat(
    pd.read_parquet(file)
    for file in taxi_files
)

print(taxi_trips.shape)

print('\n')

print(taxi_trips.head())

(11724069, 20)


   VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
0         1  2025-07-01 00:29:37   2025-07-01 00:45:30              1.0   
1         1  2025-07-01 00:23:28   2025-07-01 01:07:44              1.0   
2         2  2025-07-01 00:53:50   2025-07-01 01:27:12              1.0   
3         2  2025-07-01 00:58:49   2025-07-01 01:15:55              1.0   
4         2  2025-07-01 00:09:22   2025-07-01 00:23:54              1.0   

   trip_distance  RatecodeID store_and_fwd_flag  PULocationID  DOLocationID  \
0           7.30         1.0                  N           138            74   
1          17.70         2.0                  N           132           142   
2           9.98         1.0                  N           138            48   
3          10.27         1.0                  N           138           229   
4           2.94         1.0                  N           211            97   

   payment_type  fare_amount  extra  mta_tax  tip_amount 

### Use this instead
[![polars logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/banner/polars_github_banner.svg)](https://pola.rs/)

In [35]:
import polars as pl

taxi_trips = pl.read_parquet('data/yellow_tripdata_*.parquet')

print(taxi_trips.shape)

print('\n')

print(taxi_trips.head())

print('\n')

print(taxi_trips.head().to_pandas())

(11724069, 20)


shape: (5, 20)
┌──────────┬──────────────┬──────────────┬──────────────┬───┬──────────────┬──────────────┬─────────────┬──────────────┐
│ VendorID ┆ tpep_pickup_ ┆ tpep_dropoff ┆ passenger_co ┆ … ┆ total_amount ┆ congestion_s ┆ Airport_fee ┆ cbd_congesti │
│ ---      ┆ datetime     ┆ _datetime    ┆ unt          ┆   ┆ ---          ┆ urcharge     ┆ ---         ┆ on_fee       │
│ i32      ┆ ---          ┆ ---          ┆ ---          ┆   ┆ f64          ┆ ---          ┆ f64         ┆ ---          │
│          ┆ datetime[μs] ┆ datetime[μs] ┆ i64          ┆   ┆              ┆ f64          ┆             ┆ f64          │
╞══════════╪══════════════╪══════════════╪══════════════╪═══╪══════════════╪══════════════╪═════════════╪══════════════╡
│ 1        ┆ 2025-07-01   ┆ 2025-07-01   ┆ 1            ┆ … ┆ 54.79        ┆ 0.0          ┆ 1.75        ┆ 0.0          │
│          ┆ 00:29:37     ┆ 00:45:30     ┆              ┆   ┆              ┆              ┆             ┆              │


In [36]:
taxi_trips = pd.concat(
    pd.read_parquet(file)
    for file in taxi_files
)

#select
taxi_trips_subset = taxi_trips[['VendorID', 'passenger_count', 'trip_distance']]

print(taxi_trips_subset)

#filter
long_trips = taxi_trips[taxi_trips['trip_distance'] >= 10]

long_trips = taxi_trips.loc[
    lambda df: df['trip_distance'] >= 10
]

long_trips = taxi_trips.query('trip_distance >= 10')

print(long_trips.head())

#add column

taxi_trips['trip_time_min'] = (
    taxi_trips['tpep_dropoff_datetime']
    - taxi_trips['tpep_pickup_datetime']
).dt.total_seconds() / 60

taxi_trips = taxi_trips.assign(
    trip_time_min = lambda df: (
        df['tpep_dropoff_datetime']
        - df['tpep_pickup_datetime']
    ).dt.total_seconds() / 60
)

print(taxi_trips)

#group agg

# built-in aggregation names are simpler than lambdas and give the same result
taxi_trips_agg = taxi_trips.groupby('VendorID').agg(
    total_fare = pd.NamedAgg('fare_amount', 'sum'),
    avg_distance = pd.NamedAgg('trip_distance', 'mean')
).reset_index()

print(taxi_trips_agg)

         VendorID  passenger_count  trip_distance
0               1              1.0           7.30
1               1              1.0          17.70
2               2              1.0           9.98
3               2              1.0          10.27
4               2              1.0           2.94
...           ...              ...            ...
4251010         2              NaN           1.38
4251011         2              NaN           3.86
4251012         2              NaN           4.01
4251013         2              NaN           3.98
4251014         2              NaN           3.23

[11724069 rows x 3 columns]
    VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  \
1          1  2025-07-01 00:23:28   2025-07-01 01:07:44              1.0   
3          2  2025-07-01 00:58:49   2025-07-01 01:15:55              1.0   
5          1  2025-07-01 00:39:14   2025-07-01 00:55:21              1.0   
11         2  2025-07-01 00:39:13   2025-07-01 01:09:25            

In [37]:
taxi_trips = pl.read_parquet('data/yellow_tripdata_*.parquet')

#select
taxi_trips_subset = taxi_trips.select('VendorID', 'passenger_count', 'trip_distance')

taxi_trips_subset = taxi_trips.select(
    pl.col('VendorID'),
    pl.col('passenger_count'),
    pl.col('trip_distance')
)

print(taxi_trips_subset)

#filter
long_trips = taxi_trips.filter(
    pl.col('trip_distance') >= 10
)

print(long_trips.head())

#add column

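# total_minutes() truncates to whole minutes, so divide
# total_seconds() by 60 to match the pandas result above
taxi_trips = taxi_trips.with_columns(
    trip_time_min = (
        pl.col('tpep_dropoff_datetime')
        - pl.col('tpep_pickup_datetime')
    ).dt.total_seconds() / 60
)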

print(taxi_trips)

#group agg

taxi_trips_agg = taxi_trips.group_by('VendorID').agg(
    total_fare = pl.col('fare_amount').sum(),
    avg_distance = pl.col('trip_distance').mean()
)

print(taxi_trips_agg.sort('VendorID'))

shape: (11_724_069, 3)
┌──────────┬─────────────────┬───────────────┐
│ VendorID ┆ passenger_count ┆ trip_distance │
│ ---      ┆ ---             ┆ ---           │
│ i32      ┆ i64             ┆ f64           │
╞══════════╪═════════════════╪═══════════════╡
│ 1        ┆ 1               ┆ 7.3           │
│ 1        ┆ 1               ┆ 17.7          │
│ 2        ┆ 1               ┆ 9.98          │
│ 2        ┆ 1               ┆ 10.27         │
│ 2        ┆ 1               ┆ 2.94          │
│ …        ┆ …               ┆ …             │
│ 2        ┆ null            ┆ 1.38          │
│ 2        ┆ null            ┆ 3.86          │
│ 2        ┆ null            ┆ 4.01          │
│ 2        ┆ null            ┆ 3.98          │
│ 2        ┆ null            ┆ 3.23          │
└──────────┴─────────────────┴───────────────┘
shape: (5, 20)
┌──────────┬──────────────┬──────────────┬──────────────┬───┬──────────────┬──────────────┬─────────────┬──────────────┐
│ VendorID ┆ tpep_pickup_ ┆ tpep_dropoff ┆

## Conclusion

Questions?<br>
Do you have any "don't do that, do this" tips of your own?