> I gave it to my sister. She wound up getting a newer and more expensive carpet, so she gave it to an acquaintance of hers who collects all sorts of junk. Apparently he owns an entire set of Noah’s collectibles!

The hand-off to the sister seems like a red herring: the clue gives me nothing I can use to identify her.
So instead, I'll focus on the acquaintance.

The first step is to figure out what the collectibles look like in the database.
From visual inspection, I learn that collectibles have a `sku` starting with "COL".

In [31]:
import polars as pl

customers = pl.read_csv("harder-data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("harder-data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("harder-data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("harder-data/noahs-products.csv", try_parse_dates=True)

In [32]:
collectibles = products.filter(pl.col("sku").str.starts_with("COL"))
print(collectibles)

shape: (85, 4)
┌─────────┬──────────────────────────────┬────────────────┬────────────────┐
│ sku     ┆ desc                         ┆ wholesale_cost ┆ dims_cm        │
│ ---     ┆ ---                          ┆ ---            ┆ ---            │
│ str     ┆ str                          ┆ f64            ┆ str            │
╞═════════╪══════════════════════════════╪════════════════╪════════════════╡
│ COL0030 ┆ Noah's Action Figure (azure) ┆ 15.7           ┆ 18.2|15.2|13.6 │
│ COL0061 ┆ Noah's Jewelry (yellow)      ┆ 27.89          ┆ 9.2|2.5|2.3    │
│ COL0325 ┆ Noah's Bobblehead (amber)    ┆ 5.44           ┆ 15.7|6.3|1.5   │
│ COL0510 ┆ Noah's Poster (amber)        ┆ 3.58           ┆ 15.0|14.2|2.3  │
│ COL0582 ┆ Noah's Bobblehead (green)    ┆ 4.12           ┆ 10.3|4.9|0.6   │
│ …       ┆ …                            ┆ …              ┆ …              │
│ COL9236 ┆ Noah's Jewelry (azure)       ┆ 26.6           ┆ 19.6|6.6|5.6   │
│ COL9268 ┆ Noah's Poster (mauve)        ┆ 3.14           ┆ 1

I'll find all the orders that contain any collectibles.

In [33]:
orders_items_collectibles = orders_items.join(collectibles, on="sku", how="inner")
print(orders_items_collectibles)

shape: (38_162, 7)
┌─────────┬─────────┬─────┬────────────┬─────────────────────────┬────────────────┬────────────────┐
│ orderid ┆ sku     ┆ qty ┆ unit_price ┆ desc                    ┆ wholesale_cost ┆ dims_cm        │
│ ---     ┆ ---     ┆ --- ┆ ---        ┆ ---                     ┆ ---            ┆ ---            │
│ i64     ┆ str     ┆ i64 ┆ f64        ┆ str                     ┆ f64            ┆ str            │
╞═════════╪═════════╪═════╪════════════╪═════════════════════════╪════════════════╪════════════════╡
│ 1001    ┆ COL5420 ┆ 1   ┆ 12.27      ┆ Noah's Jersey (mauve)   ┆ 9.92           ┆ 14.4|12.9|6.7  │
│ 1006    ┆ COL7392 ┆ 1   ┆ 5.17       ┆ Noah's Bobblehead (red) ┆ 3.92           ┆ 14.9|14.5|6.5  │
│ 1007    ┆ COL5686 ┆ 2   ┆ 4.52       ┆ Noah's Poster (green)   ┆ 3.59           ┆ 14.4|14.1|12.7 │
│ 1008    ┆ COL3641 ┆ 1   ┆ 11.34      ┆ Noah's Jersey (azure)   ┆ 10.66          ┆ 10.5|10.1|3.4  │
│ 1015    ┆ COL8172 ┆ 1   ┆ 33.53      ┆ Noah's Jewelry (puce)   ┆ 29.27

Now I'll get the corresponding customerid for each of those rows.

In [34]:
customers_who_bought_collectibles = orders_items_collectibles.join(orders, on="orderid", how="inner")
print(customers_who_bought_collectibles.select("customerid"))

shape: (38_162, 1)
┌────────────┐
│ customerid │
│ ---        │
│ i64        │
╞════════════╡
│ 9576       │
│ 3564       │
│ 6323       │
│ 2602       │
│ 5093       │
│ …          │
│ 9472       │
│ 2322       │
│ 11226      │
│ 10823      │
│ 4942       │
└────────────┘


Next, I group by `customerid`, collect the list of collectible SKUs each customer bought, and count how many entries that list has.

In [35]:
customers_bought_num_collectibles = (
    customers_who_bought_collectibles.group_by("customerid")
    .agg(pl.col("sku"))
    .with_columns(pl.col("sku").list.len().alias("num_collectibles"))
    .sort("num_collectibles", descending=True)
)
print(customers_bought_num_collectibles.head())

shape: (5, 3)
┌────────────┬─────────────────────────────────┬──────────────────┐
│ customerid ┆ sku                             ┆ num_collectibles │
│ ---        ┆ ---                             ┆ ---              │
│ i64        ┆ list[str]                       ┆ u32              │
╞════════════╪═════════════════════════════════╪══════════════════╡
│ 2602       ┆ ["COL3641", "COL9236", … "COL3… ┆ 106              │
│ 8943       ┆ ["COL1322", "COL3894", … "COL4… ┆ 32               │
│ 5281       ┆ ["COL2279", "COL3967", … "COL2… ┆ 32               │
│ 6553       ┆ ["COL9150", "COL7086", … "COL7… ┆ 32               │
│ 6871       ┆ ["COL6487", "COL9592", … "COL1… ┆ 32               │
└────────────┴─────────────────────────────────┴──────────────────┘


Wow, customer 2602 bought the most collectibles by far: 106 collectible order rows, versus 32 for each of the next four customers.

In [37]:
# Row 0, column 0 of the sorted frame: the customerid with the most collectible rows
collectibles_nerd_id = customers_bought_num_collectibles.item(0, 0)
print(
    customers.filter(pl.col("customerid") == collectibles_nerd_id).select(
        "customerid", "name", "phone"
    )
)

shape: (1, 3)
┌────────────┬───────────────┬──────────────┐
│ customerid ┆ name          ┆ phone        │
│ ---        ┆ ---           ┆ ---          │
│ i64        ┆ str           ┆ str          │
╞════════════╪═══════════════╪══════════════╡
│ 2602       ┆ Daniel Wilson ┆ 516-638-9966 │
└────────────┴───────────────┴──────────────┘
