When members have multiple email addresses on file, Broadstripes exports them all. By default, each email is in a different row. For many members, we even have multiple Dartmouth emails on file. Meanwhile, Mailchimp only allows one email per contact. It's a bit of a mess.

Some manual trial and error suggests that Broadstripes always lists the member's "Primary" email first. So this script prunes the exported contact list to keep only the first Dartmouth email for each member.
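To make the pruning rule concrete, here is a minimal sketch on made-up data. The IDs, nicknames, and addresses are invented for illustration, and `drop_duplicates` is used here as one simple way to keep the first matching row per member; it is not necessarily how the real script does it.

```python
import pandas as pd

# Hypothetical toy export: one row per email, in the order Broadstripes lists them
toy = pd.DataFrame({
    "Broadstripes ID": [1, 1, 1, 2, 2],
    "Nickname": ["Ana", "Ana", "Ana", "Ben", "Ben"],
    "Email": [
        "ana@gmail.com",
        "ana.x@dartmouth.edu",
        "ana.y@dartmouth.edu",
        "ben@dartmouth.edu",
        None,
    ],
})

# Flag Dartmouth addresses; missing emails count as non-matches
is_dart = toy["Email"].str.contains("dartmouth.edu", case=False, regex=False, na=False)

# Keep the first flagged row per member, preserving the export order
toy_pruned = (
    toy[is_dart]
    .drop_duplicates("Broadstripes ID", keep="first")
    .reset_index(drop=True)
)
print(toy_pruned)
```

Each member ends up with a single row, which is the shape Mailchimp needs.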

In [33]:
import pandas as pd
from pathlib import Path

infile = Path('/Users/cove/Documents/gold-stuff/listwork-scripts/data/20251006 Basic Info for Mailchimp.csv')
df = pd.read_csv(infile)

# Flag rows that have a Dartmouth email; missing emails count as non-matches.
# regex=False matches the "." literally instead of as "any character".
emails = df["Email"]
df["keep"] = emails.str.contains("dartmouth.edu", case=False, regex=False, na=False)

# Did we drop anyone?
ids_remaining = df.groupby("Broadstripes ID")["keep"].any()
ids_dropped = ids_remaining[~ids_remaining].index.tolist()
if ids_dropped:
    print("WARNING: dropped the following worker(s):")
    display(df.query("`Broadstripes ID` in @ids_dropped"))

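# Keep only the first Dartmouth email for each ID.
# Note: GroupBy.first() takes the first non-null value per column within each
# group, so a blank Phone on the first row can be filled from a later row.
df_dart = df[df["keep"]]
df_dart = df_dart.groupby("Broadstripes ID").first().reset_index()

# Sanity check: every ID with at least one Dartmouth email appears exactly once.
# (Comparing against all IDs would fail whenever the warning above fires.)
expected_ids = sorted(ids_remaining[ids_remaining].index.tolist())
assert df_dart["Broadstripes ID"].sort_values().tolist() == expected_ids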

df_dart = df_dart[[
    "Nickname",
    "Last Name",
    "Phone",
    "Email"
]]

outfile = infile.parent / (infile.stem + " - Pruned for Mailchimp.csv")
df_dart.to_csv(outfile, index=False)
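
Since Mailchimp only allows one email per contact, it may be worth checking the pruned table for missing or repeated addresses before uploading. The sketch below is one way to do that; `find_email_problems` is a hypothetical helper, not part of the original script, and it assumes that the same address appearing for two different members (ignoring case and surrounding whitespace) would also be a problem on import.

```python
import pandas as pd


def find_email_problems(frame: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose Email is missing or repeated (hypothetical helper)."""
    # Normalize so "Ben@Dartmouth.edu" and "ben@dartmouth.edu" compare equal
    emails = frame["Email"].str.strip().str.lower()
    missing = emails.isna()
    # keep=False marks every copy of a repeated address, not just the later ones
    repeated = emails.duplicated(keep=False) & ~missing
    return frame[missing | repeated]


# Made-up sample standing in for the pruned table
sample = pd.DataFrame({
    "Nickname": ["Ana", "Ben", "Cy"],
    "Email": ["ana.x@dartmouth.edu", "ben@dartmouth.edu", "Ben@Dartmouth.edu"],
})
problems = find_email_problems(sample)
print(problems)
```

In the notebook this could be called as `find_email_problems(df_dart)` just before writing the CSV; an empty result means nothing was flagged.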