> She’s always been very frugal, and she clips every coupon and shops every sale at Noah’s Market. In fact I like to tease her that Noah actually loses money whenever she comes in the store.
>
> Once the subway fare increased, she stopped coming to visit me... I hope she remembers to invite me to the family reunion next year.

Ok, so I think this is where I'll need to pay attention to the "unit_price" vs "wholesale_cost" of products.
If the market really does lose money whenever she buys something, the items she's buying must sell for less per unit than they cost wholesale.

In other words, the market is **selling items at a loss** to Nicole Wilson's cousin.
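
To make the idea concrete before touching the real data, here is a tiny plain-Python sketch of the margin check. The two items and their prices are made up for illustration; only the column names `unit_price` and `wholesale_cost` come from the dataset.

```python
# Hypothetical items, invented for illustration only.
items = [
    {"sku": "A", "unit_price": 4.00, "wholesale_cost": 3.00},
    {"sku": "B", "unit_price": 2.00, "wholesale_cost": 2.50},
]


def margin(item):
    """Per-unit profit: what the customer pays minus what the store paid."""
    return item["unit_price"] - item["wholesale_cost"]


# An item is sold at a loss when its margin is negative.
loss_makers = [item["sku"] for item in items if margin(item) < 0]
print(loss_makers)  # ['B']
```

Item B sells for 0.50 less than it cost, so every unit of it sold is a loss for the store.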

In [30]:
import polars as pl

customers = pl.read_csv("harder-data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("harder-data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("harder-data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("harder-data/noahs-products.csv", try_parse_dates=True)

## But first, what's a subway?

Can we use the information that her cousin stopped visiting her due to the subway fare increase? This confused me at first, because all of the trains on Staten Island are aboveground. Aren't subways *below ground*?

But after poking around online, it seems like the whole system is referred to as The Subway citywide, even though many sections are aboveground.
To get technical, Staten Island Railway is "operated by the New York City Transit Authority **Department of Subways**" and "the line uses modified R44 and R211S **subway cars**" according to [its Wikipedia page](https://en.wikipedia.org/wiki/Staten_Island_Railway).

One important piece of info is that Staten Island is *not* connected to any other part of NYC by rail.
So maybe I'll get lucky and find only one cousin (someone with the last name Wilson, hopefully) who lives on Staten Island.


In [31]:
print(
    customers.filter(
        pl.col("citystatezip").str.starts_with("Staten Island"),
        pl.col("name").str.ends_with("Wilson"),
    ).select("name", "phone")
)

shape: (5, 2)
┌────────────────┬──────────────┐
│ name           ┆ phone        │
│ ---            ┆ ---          │
│ str            ┆ str          │
╞════════════════╪══════════════╡
│ Matthew Wilson ┆ 680-997-0472 │
│ Donald Wilson  ┆ 585-706-6255 │
│ Anna Wilson    ┆ 838-848-7496 │
│ Brandi Wilson  ┆ 716-729-2527 │
│ Crystal Wilson ┆ 680-234-0764 │
└────────────────┴──────────────┘


Alas, none of those people are the answer.
It was still a fun detour, though.

## Items sold at a loss

I'll make a table where I compute the profit made on every order.

In [35]:
orders_items_including_profit = products.join(
    orders_items, on="sku", how="inner"
).with_columns(profit=pl.col("unit_price") - pl.col("wholesale_cost"))

# Compute the total profit for each order
profit_per_order = orders_items_including_profit.group_by("orderid").agg(pl.col("profit").sum())
profit_per_order

orderid,profit
i64,f64
56703,45.58
95747,5.49
221579,26.0
244379,11.48
83239,42.68
…,…
56404,2.47
58634,0.84
31550,1.24
130208,0.46
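
One thing worth double-checking is that the `profit` column is a per-unit margin. If an order line can contain more than one unit of an item, each margin would need to be multiplied by the quantity before summing. The sketch below shows that variant in plain Python with made-up rows. The `qty` field, and the choice to put `unit_price` on the order line, are assumptions for illustration and are not taken from the code above.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only.
# `qty` is an assumed column name; check the real table before relying on it.
wholesale_cost = {"A": 3.00, "B": 2.50}
order_lines = [
    {"orderid": 1, "sku": "A", "qty": 1, "unit_price": 4.00},
    {"orderid": 1, "sku": "B", "qty": 3, "unit_price": 2.00},
    {"orderid": 2, "sku": "A", "qty": 2, "unit_price": 4.00},
]

# Sum (per-unit margin * quantity) over the lines of each order.
profit_by_order = defaultdict(float)
for line in order_lines:
    per_unit = line["unit_price"] - wholesale_cost[line["sku"]]
    profit_by_order[line["orderid"]] += per_unit * line["qty"]

print(dict(profit_by_order))  # {1: -0.5, 2: 2.0}
```

Without the quantity, order 1 would come out at +0.50 instead of -0.50, so the weighting can flip the sign of an order's profit.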


Then I'll find the customer with the lowest total profit summed across their orders (lowest, because a loss shows up as a negative profit).

In [38]:
# Create a table with orderid, customerid, and profit
orders_with_profits = orders.select("orderid","customerid").join(profit_per_order, on="orderid", how="inner")

# Group by customer id and take the sum of the profit
top_thriftiest_customers = orders_with_profits.group_by("customerid").agg(pl.col("profit").sum()).sort("profit").head()
print(top_thriftiest_customers)

shape: (5, 2)
┌────────────┬─────────┐
│ customerid ┆ profit  │
│ ---        ┆ ---     │
│ i64        ┆ f64     │
╞════════════╪═════════╡
│ 8884       ┆ -1392.3 │
│ 9463       ┆ -0.97   │
│ 6189       ┆ -0.06   │
│ 10044      ┆ -0.06   │
│ 3511       ┆ -0.05   │
└────────────┴─────────┘


The thriftiest customer is very obvious here. They have cost Noah's Market $85.59 ($1392.30 in the speedrun!!) over the course of their career as a penny-pincher.
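
For anyone who wants the group-and-pick-the-minimum step spelled out without polars, here is a minimal plain-Python sketch. The order ids, customer ids, and profits are made up for illustration and have nothing to do with the real data.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only.
order_customer = {1: 100, 2: 100, 3: 200}   # orderid -> customerid
order_profit = {1: -0.5, 2: -1.25, 3: 2.0}  # orderid -> profit

# Sum the profit of every order under the customer who placed it.
totals = defaultdict(float)
for orderid, profit in order_profit.items():
    totals[order_customer[orderid]] += profit

# The thriftiest customer is the one with the most negative total.
thriftiest = min(totals, key=totals.get)
print(thriftiest, totals[thriftiest])  # 100 -1.75
```

Sorting ascending and taking the first row, as the polars code does, is the same idea as `min` here.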

In [40]:
thriftiest_customer_id = top_thriftiest_customers.item(0, 0)
print(customers.filter(pl.col("customerid") == thriftiest_customer_id).select("customerid", "name", "phone"))

shape: (1, 3)
┌────────────┬───────────────┬──────────────┐
│ customerid ┆ name          ┆ phone        │
│ ---        ┆ ---           ┆ ---          │
│ i64        ┆ str           ┆ str          │
╞════════════╪═══════════════╪══════════════╡
│ 8884       ┆ Deborah Green ┆ 838-295-7143 │
└────────────┴───────────────┴──────────────┘
