# Part 6: The Bargain Hunter

> She’s always been very frugal, and she clips every coupon and shops every sale at Noah’s Market. In fact I like to tease her that Noah actually loses money whenever she comes in the store.
>
> Once the subway fare increased, she stopped coming to visit me... I hope she remembers to invite me to the family reunion next year.

If the market really does lose money whenever she buys something, then the items she buys must be priced below their wholesale cost.

In other words, the market is **selling items at a loss** to Nicole Wilson's cousin.

In [43]:
import polars as pl

customers = pl.read_csv("data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("data/noahs-products.csv", try_parse_dates=True)

## But first, what's a subway? (Red herring) 

Can we use the fact that her cousin stopped visiting because of a subway fare increase? This confused me at first, because all of the trains on Staten Island run aboveground. Aren't subways *below ground*?

But after poking around online, it seems that the whole system is called "The Subway" citywide, even though many sections are aboveground.
To get technical, the Staten Island Railway is "operated by the New York City Transit Authority **Department of Subways**" and "the line uses modified R44 and R211S **subway cars**" according to [its Wikipedia page](https://en.wikipedia.org/wiki/Staten_Island_Railway).

One important piece of info is that Staten Island is *not* connected to any other part of NYC by rail.
So maybe I'll get lucky and find only one cousin (someone with the same last name as the lady from part 5, hopefully) who lives on Staten Island.


In [44]:
print(
    customers.filter(
        pl.col("citystatezip").str.starts_with("Staten Island"),
        pl.col("name").str.ends_with("Wilson"),
    ).select("name", "phone")
)

shape: (4, 2)
┌─────────────────┬──────────────┐
│ name            ┆ phone        │
│ ---             ┆ ---          │
│ str             ┆ str          │
╞═════════════════╪══════════════╡
│ Nicole Wilson   ┆ 631-507-6048 │
│ Michaela Wilson ┆ 315-859-7694 │
│ Thomas Wilson   ┆ 914-910-1529 │
│ Edwin Wilson    ┆ 516-767-0295 │
└─────────────────┴──────────────┘


Alas, none of those people is the answer.
I guess the subway detail was just there to highlight her cheapness.

It was still a fun detour, though.

## Items sold at a loss

I'll build a table with the profit on every order: for each line item, the unit price minus the wholesale cost, summed per order.

In [45]:
# Attach each product's wholesale cost to every line item, then take the per-unit margin
orders_items_including_profit = products.join(
    orders_items, on="sku", how="inner"
).with_columns(profit=pl.col("unit_price") - pl.col("wholesale_cost"))

# Sum the per-item margins within each order
profit_per_order = orders_items_including_profit.group_by("orderid").agg(
    pl.col("profit").sum()
)
print(profit_per_order)

shape: (213_232, 2)
┌─────────┬────────┐
│ orderid ┆ profit │
│ ---     ┆ ---    │
│ i64     ┆ f64    │
╞═════════╪════════╡
│ 65819   ┆ 3.11   │
│ 31783   ┆ 0.64   │
│ 16655   ┆ 0.49   │
│ 46089   ┆ 0.97   │
│ 56649   ┆ 2.06   │
│ …       ┆ …      │
│ 2027    ┆ 0.33   │
│ 173325  ┆ 0.39   │
│ 92266   ┆ 2.21   │
│ 16890   ┆ 1.51   │
│ 165218  ┆ 1.04   │
└─────────┴────────┘


Then I'll sum the profit across each customer's orders and find the customer with the lowest total. A negative total means the store lost money on that customer overall.

In [46]:
# Create a table with orderid, customerid, and profit
orders_including_profits = orders.select("orderid", "customerid").join(profit_per_order, on="orderid", how="inner")

# Group by customer id and take the sum of the profit
print(orders_including_profits.group_by("customerid").agg(pl.col("profit").sum()).sort("profit").head())

shape: (5, 2)
┌────────────┬────────┐
│ customerid ┆ profit │
│ ---        ┆ ---    │
│ i64        ┆ f64    │
╞════════════╪════════╡
│ 4167       ┆ -85.59 │
│ 8286       ┆ -1.04  │
│ 7676       ┆ -0.17  │
│ 6309       ┆ -0.04  │
│ 2908       ┆ -0.02  │
└────────────┴────────┘


It seems that the customer with ID 4167 has cost Noah's Market $85.59 over the course of their career as a penny-pincher.

In [48]:
print(customers.filter(pl.col("customerid") == 4167).select("customerid", "name", "phone"))

shape: (1, 3)
┌────────────┬─────────────┬──────────────┐
│ customerid ┆ name        ┆ phone        │
│ ---        ┆ ---         ┆ ---          │
│ i64        ┆ str         ┆ str          │
╞════════════╪═════════════╪══════════════╡
│ 4167       ┆ Sherri Long ┆ 585-838-9161 │
└────────────┴─────────────┴──────────────┘
