> I gave it to my sister. She wound up getting a newer and more expensive carpet, so she gave it to an acquaintance of hers who collects all sorts of junk. Apparently he owns an entire set of Noah’s collectibles!

Giving it to his sister seems like a red herring, and there's not much info I can glean from that.
So instead, I'll focus on the acquaintance.

The first step is to figure out what the collectibles look like in the database.
Visually inspecting the products table shows that collectibles have a `sku` starting with "COL".

In [1]:
import polars as pl

customers = pl.read_csv("data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("data/noahs-products.csv", try_parse_dates=True)

In [13]:
collectibles = products.filter(pl.col("sku").str.starts_with("COL"))
print(collectibles)

shape: (85, 4)
┌─────────┬───────────────────────────────┬────────────────┬────────────────┐
│ sku     ┆ desc                          ┆ wholesale_cost ┆ dims_cm        │
│ ---     ┆ ---                           ┆ ---            ┆ ---            │
│ str     ┆ str                           ┆ f64            ┆ str            │
╞═════════╪═══════════════════════════════╪════════════════╪════════════════╡
│ COL0037 ┆ Noah's Jewelry (green)        ┆ 28.32          ┆ 17.4|11.2|5.7  │
│ COL0041 ┆ Noah's Ark Model (HO Scale)   ┆ 2487.35        ┆ 7.2|4.3|0.4    │
│ COL0065 ┆ Noah's Jewelry (mauve)        ┆ 33.52          ┆ 19.0|12.2|10.5 │
│ COL0166 ┆ Noah's Action Figure (blue)   ┆ 13.98          ┆ 12.1|7.7|7.2   │
│ COL0167 ┆ Noah's Bobblehead (blue)      ┆ 5.36           ┆ 8.9|5.6|0.6    │
│ …       ┆ …                             ┆ …              ┆ …              │
│ COL9349 ┆ Noah's Action Figure (orange) ┆ 15.47          ┆ 16.6|12.9|11.9 │
│ COL9420 ┆ Noah's Jewelry (amber)        ┆ 30.01          ┆ …              │
└─────────┴───────────────────────────────┴────────────────┴────────────────┘

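The `starts_with("COL")` filter above is just a prefix test on each SKU. Here is a minimal plain-Python sketch of the same idea. The first two rows are copied from the output above. `XYZ0001` is a hypothetical non-collectible SKU added for contrast.

```python
# (sku, desc) rows: the COL rows come from the output above,
# XYZ0001 is a hypothetical non-collectible added for contrast.
products = [
    ("COL0037", "Noah's Jewelry (green)"),
    ("COL0041", "Noah's Ark Model (HO Scale)"),
    ("XYZ0001", "hypothetical non-collectible"),
]

# Keep rows whose SKU begins with "COL", mirroring
# pl.col("sku").str.starts_with("COL").
collectibles = [row for row in products if row[0].startswith("COL")]
print([sku for sku, _ in collectibles])  # ['COL0037', 'COL0041']
```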
Next, I'll find every order line item that is a collectible.

In [7]:
orders_items_collectibles = orders_items.join(collectibles, on="sku", how="inner")
print(orders_items_collectibles)

shape: (28_013, 7)
┌─────────┬─────────┬─────┬────────────┬─────────────────────────┬────────────────┬────────────────┐
│ orderid ┆ sku     ┆ qty ┆ unit_price ┆ desc                    ┆ wholesale_cost ┆ dims_cm        │
│ ---     ┆ ---     ┆ --- ┆ ---        ┆ ---                     ┆ ---            ┆ ---            │
│ i64     ┆ str     ┆ i64 ┆ f64        ┆ str                     ┆ f64            ┆ str            │
╞═════════╪═════════╪═════╪════════════╪═════════════════════════╪════════════════╪════════════════╡
│ 1014    ┆ COL4117 ┆ 1   ┆ 4.55       ┆ Noah's Poster (yellow)  ┆ 3.63           ┆ 19.7|12.0|6.3  │
│ 1015    ┆ COL8357 ┆ 1   ┆ 13.48      ┆ Noah's Lunchbox (mauve) ┆ 9.21           ┆ 19.9|17.4|9.8  │
│ 1018    ┆ COL6388 ┆ 1   ┆ 3.72       ┆ Noah's Gift Box         ┆ 3.28           ┆ 19.0|9.5|2.2   │
│         ┆         ┆     ┆            ┆ (magenta)               ┆                ┆                │
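│ 1040    ┆ COL7454 ┆ 1   ┆ 10.65      ┆ Noah's Jersey (mauve)   ┆ 8.19           ┆ …              │
└─────────┴─────────┴─────┴────────────┴─────────────────────────┴────────────────┴────────────────┘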

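An inner join on `sku` does two things. It keeps only the line items whose SKU appears in `collectibles`, and it attaches that product's columns to each one. Below is a rough plain-Python equivalent: a dictionary lookup keyed by SKU. All order IDs, SKUs, and descriptions in it are hypothetical toy values.

```python
# Hypothetical collectibles keyed by SKU: sku -> description.
collectibles = {"COL0001": "Toy Collectible A", "COL0002": "Toy Collectible B"}

# Hypothetical order line items: (orderid, sku, qty).
orders_items = [
    (1, "COL0001", 1),
    (1, "XYZ0001", 2),  # not a collectible, so the inner join drops it
    (2, "COL0002", 1),
    (3, "COL0001", 3),
]

# Inner join on sku: keep only rows with a matching collectible and append
# its description, like orders_items.join(collectibles, on="sku", how="inner").
joined = [
    (orderid, sku, qty, collectibles[sku])
    for orderid, sku, qty in orders_items
    if sku in collectibles
]
for row in joined:
    print(row)
```

An order that contains several collectibles contributes several rows. So the 28_013 in the real output counts line items, not distinct orders.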
Now I'll get the corresponding customerid for each of those rows.

In [11]:
customers_who_bought_collectibles = orders_items_collectibles.join(orders, on="orderid", how="inner")
print(customers_who_bought_collectibles.select("customerid"))

shape: (28_013, 1)
┌────────────┐
│ customerid │
│ ---        │
│ i64        │
╞════════════╡
│ 4716       │
│ 3808       │
│ 2645       │
│ 2520       │
│ 8385       │
│ …          │
│ 6222       │
│ 5950       │
│ 8352       │
│ 3894       │
│ 1368       │
└────────────┘
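The join on `orderid` works the same way: each collectible line item looks up the customer who placed its order. The row count is 28_013 both before and after the join. That is consistent with every line item matching exactly one order. Here is a plain-Python sketch with hypothetical toy IDs.

```python
# Hypothetical orders table: orderid -> customerid.
orders = {1: 101, 2: 102, 3: 101}

# Hypothetical collectible line items from the previous step: (orderid, sku).
collectible_items = [(1, "COL0001"), (2, "COL0002"), (3, "COL0001")]

# Inner join on orderid: attach the customerid of each line item's order.
with_customer = [
    (orderid, sku, orders[orderid])
    for orderid, sku in collectible_items
    if orderid in orders
]
print([customerid for _, _, customerid in with_customer])  # [101, 102, 101]
```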


Next, I'll group by `customerid`, collect the list of collectible SKUs each customer bought, and count the entries in each list.

In [28]:
customers_bought_num_collectibles = (
    customers_who_bought_collectibles.group_by("customerid")
    .agg(pl.col("sku"))
    .with_columns(pl.col("sku").list.len().alias("num_collectibles"))
    .sort("num_collectibles", descending=True)
)
print(customers_bought_num_collectibles.head())

shape: (5, 3)
┌────────────┬─────────────────────────────────┬──────────────────┐
│ customerid ┆ sku                             ┆ num_collectibles │
│ ---        ┆ ---                             ┆ ---              │
│ i64        ┆ list[str]                       ┆ u32              │
╞════════════╪═════════════════════════════════╪══════════════════╡
│ 3808       ┆ ["COL8357", "COL6858", … "COL2… ┆ 111              │
│ 1787       ┆ ["COL6461", "COL1263", … "COL9… ┆ 37               │
│ 6855       ┆ ["COL8354", "COL9011", … "COL5… ┆ 36               │
│ 3580       ┆ ["COL4363", "COL5018", … "COL2… ┆ 36               │
│ 8352       ┆ ["COL9948", "COL9349", … "COL1… ┆ 34               │
└────────────┴─────────────────────────────────┴──────────────────┘
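Note that `list.len()` counts collectible line items per customer. It ignores `qty`, and a repeat purchase of the same SKU is counted twice. The clue mentions owning an "entire set", so counting distinct SKUs is arguably a closer match. Below is a plain-Python sketch of all three tallies on hypothetical rows, which could be used to check that the leader doesn't change under a different definition.

```python
from collections import defaultdict

# Hypothetical joined rows: (customerid, sku, qty).
rows = [
    (101, "COL0001", 1),
    (101, "COL0002", 1),
    (101, "COL0001", 1),  # repeat purchase of the same SKU
    (102, "COL0003", 4),
]

line_items = defaultdict(int)  # what list.len() on the aggregated sku list measures
units = defaultdict(int)       # total quantity purchased
distinct = defaultdict(set)    # distinct collectible SKUs bought

for customerid, sku, qty in rows:
    line_items[customerid] += 1
    units[customerid] += qty
    distinct[customerid].add(sku)

for customerid in sorted(line_items):
    print(customerid, line_items[customerid], units[customerid], len(distinct[customerid]))
```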


Customer 3808 has bought by far the most collectibles: 111 line items, three times the runner-up's 37.

In [30]:
print(
    customers.filter(pl.col("customerid") == 3808).select(
        "customerid", "name", "phone"
    )
)

shape: (1, 3)
┌────────────┬─────────────┬──────────────┐
│ customerid ┆ name        ┆ phone        │
│ ---        ┆ ---         ┆ ---          │
│ i64        ┆ str         ┆ str          │
╞════════════╪═════════════╪══════════════╡
│ 3808       ┆ James Smith ┆ 212-547-3518 │
└────────────┴─────────────┴──────────────┘
