# Part 2: The Contractor

Finding the phone number in Part 1 unlocks Part 2, where you need to find a contractor's phone number.

> ... **they usually talked about the project over coffee and bagels at Noah’s before handing off the item to be cleaned. The contractors would pick up the tab and expense it, along with their cleaning supplies.**
> 
> “So this rug was apparently one of those special projects. The claim ticket said ‘2017 JP’. **‘2017’ is the year the item was brought in, and ‘JP’ is the initials of the contractor.**


In [8]:
import polars as pl

customers = pl.read_csv("harder-data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("harder-data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("harder-data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("harder-data/noahs-products.csv", try_parse_dates=True)

First I'll find all the customers whose initials match the claim ticket. The excerpt above says JP, but the regex below matches the initials D and S.

In [28]:
initials_regex = r"^D.+ S.+$"
customers_with_matching_initials = customers.filter(pl.col("name").str.contains(initials_regex))
print(f"{len(customers_with_matching_initials)}/{len(customers)} customers match the initials regex of '{initials_regex}'")

71/10237 customers match the initials regex of '^D.+ S.+$'
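
A quick sanity check of what that regex accepts. This is only a sketch: it uses Python's `re` as a stand-in for the polars regex engine (I'm assuming the two agree on a pattern this simple), and the sample names are made up apart from the one that shows up later in this notebook. Note that `.+` also matches spaces, so names with a middle name or a suffix still match.

```python
import re

# Same pattern as the notebook cell: the name starts with D and a later word starts with S.
initials_regex = r"^D.+ S.+$"

# Hypothetical names for illustration; only "David Swanson Jr." appears in this notebook.
sample_names = [
    "Dana Smith",
    "David Swanson Jr.",
    "Diane Marie Stone",
    "Sam Davis",
    "dana smith",
]

# Keep the names the pattern matches (the match is case-sensitive).
matches = [name for name in sample_names if re.search(initials_regex, name)]
print(matches)
```

The middle-name case is worth knowing about: a name like "Diane Marie Stone" is counted as a match even though the S-word isn't the second word.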


How many orders did they collectively place in 2017?

In [29]:
# Get all of the orders (including items) from INITIALS in 2017
orders_2017 = orders.filter(pl.col("ordered").dt.year() == 2017).join(
    customers_with_matching_initials, on="customerid", how="inner"
)
orders_items_2017 = orders_items.join(orders_2017, on="orderid", how="inner")
print(orders_items_2017)

shape: (543, 17)
┌─────────┬─────────┬─────┬────────────┬───┬──────────────┬─────────────────┬──────────┬───────────┐
│ orderid ┆ sku     ┆ qty ┆ unit_price ┆ … ┆ phone        ┆ timezone        ┆ lat      ┆ long      │
│ ---     ┆ ---     ┆ --- ┆ ---        ┆   ┆ ---          ┆ ---             ┆ ---      ┆ ---       │
│ i64     ┆ str     ┆ i64 ┆ f64        ┆   ┆ str          ┆ str             ┆ f64      ┆ f64       │
╞═════════╪═════════╪═════╪════════════╪═══╪══════════════╪═════════════════╪══════════╪═══════════╡
│ 1256    ┆ HOM3370 ┆ 1   ┆ 9.24       ┆ … ┆ 929-724-7181 ┆ America/New_Yor ┆ 40.76445 ┆ -73.92904 │
│         ┆         ┆     ┆            ┆   ┆              ┆ k               ┆          ┆           │
│ 1310    ┆ HOM7886 ┆ 1   ┆ 134.75     ┆ … ┆ 838-591-7147 ┆ America/New_Yor ┆ 40.73383 ┆ -73.98621 │
│         ┆         ┆     ┆            ┆   ┆              ┆ k               ┆          ┆           │
│ 1321    ┆ TOY5812 ┆ 1   ┆ 15.03      ┆ … ┆ 716-950-2530 ┆ America/New_Yo

Ok, so I need to find the customer who buys coffee, bagels, and cleaning supplies.
Visually inspecting the products table shows that I only need to filter on products whose descriptions contain "Bagel", "Coffee", or "Cleaner".

In [30]:
coffee_bagels_cleaners = products.filter(pl.col("desc").str.contains("Bagel|Coffee|Cleaner"))
print(coffee_bagels_cleaners)

shape: (4, 4)
┌─────────┬───────────────┬────────────────┬───────────────┐
│ sku     ┆ desc          ┆ wholesale_cost ┆ dims_cm       │
│ ---     ┆ ---           ┆ ---            ┆ ---           │
│ str     ┆ str           ┆ f64            ┆ str           │
╞═════════╪═══════════════╪════════════════╪═══════════════╡
│ BKY0542 ┆ Caraway Bagel ┆ 1.1            ┆ 14.7|9.0|3.2  │
│ BKY6777 ┆ Sesame Bagel  ┆ 0.99           ┆ 19.7|19.3|2.3 │
│ HOM6863 ┆ Rug Cleaner   ┆ 1.64           ┆ 17.2|13.7|8.1 │
│ DLI7565 ┆ Coffee, Drip  ┆ 1.42           ┆ 16.6|7.4|7.4  │
└─────────┴───────────────┴────────────────┴───────────────┘
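
The pattern in the cell above is case-sensitive, which is fine for these four rows. If the capitalization were less consistent, an inline `(?i)` flag would be one way to loosen it. The sketch below uses Python's `re` rather than polars, and the last two descriptions are hypothetical, added only to show the difference.

```python
import re

# The first four descriptions come from the table above; the last two are hypothetical.
descriptions = [
    "Caraway Bagel",
    "Sesame Bagel",
    "Rug Cleaner",
    "Coffee, Drip",
    "Toy Truck",
    "coffee mug",
]

case_sensitive = re.compile(r"Bagel|Coffee|Cleaner")
case_insensitive = re.compile(r"(?i)bagel|coffee|cleaner")  # (?i) ignores case

strict_hits = [d for d in descriptions if case_sensitive.search(d)]
loose_hits = [d for d in descriptions if case_insensitive.search(d)]

print(strict_hits)
print(loose_hits)
```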


I want all the order ids where coffee **AND** bagels **AND** cleaner were purchased, ~~but I'm going to take a shortcut and find the orders where coffee **OR** bagels **OR** cleaner were purchased.~~ I can no longer take shortcuts in the speedrun.
I'll join the 2017 order items against those products, then keep the orders that had three or more matching items.

In [54]:
potential_contractor_orders = (
    orders_items_2017.join(coffee_bagels_cleaners, on="sku", how="inner")
    .group_by("orderid")
    .agg(pl.col("sku"))
)
print(potential_contractor_orders.filter(pl.col("sku").list.len() >= 3))

shape: (1, 2)
┌─────────┬─────────────────────────────────┐
│ orderid ┆ sku                             │
│ ---     ┆ ---                             │
│ i64     ┆ list[str]                       │
╞═════════╪═════════════════════════════════╡
│ 5618    ┆ ["DLI7565", "HOM6863", … "BKY0… │
└─────────┴─────────────────────────────────┘


FIXME: ~~This happens to work in this case, but I suspect it'll fail in larger datasets. (spoiler: it did)~~
~~I find it hard to believe there's only one person with initials JP who bought three coffees in 2017.~~
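
The cell above keeps any order with three or more matching line items, which isn't quite the same as "one of each". Here is a minimal sketch of the stricter check in plain Python, so it runs without the CSVs. The SKU-to-category mapping comes from the products output above; the order ids, the line items, and the `TOY0001` SKU are hypothetical.

```python
from collections import defaultdict

# SKU -> category, taken from the coffee_bagels_cleaners table above.
CATEGORY_BY_SKU = {
    "BKY0542": "bagel",
    "BKY6777": "bagel",
    "HOM6863": "cleaner",
    "DLI7565": "coffee",
}
REQUIRED = {"bagel", "coffee", "cleaner"}

# Hypothetical (orderid, sku) line items, not rows from the real data.
line_items = [
    (101, "DLI7565"), (101, "HOM6863"), (101, "BKY0542"),  # one of each category
    (102, "BKY0542"), (102, "BKY6777"), (102, "DLI7565"),  # three matches, no cleaner
    (103, "DLI7565"), (103, "TOY0001"),                    # coffee plus an unrelated item
]

categories_by_order = defaultdict(set)
matching_counts = defaultdict(int)
for orderid, sku in line_items:
    category = CATEGORY_BY_SKU.get(sku)
    if category is not None:
        categories_by_order[orderid].add(category)
        matching_counts[orderid] += 1

# Count-based heuristic: three or more matching line items.
count_orders = sorted(o for o, n in matching_counts.items() if n >= 3)
# Strict check: the order covers every required category.
strict_orders = sorted(o for o, cats in categories_by_order.items() if cats >= REQUIRED)

print(count_orders)
print(strict_orders)
```

In this made-up data, order 102 passes the count-based filter with two bagels and a coffee but fails the strict check, which is the kind of false positive the count heuristic can let through.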

Now I find the customer who placed that order. I'm going to copy and paste the orderid from the output above instead of carrying it over programmatically.
FIXME: DO THIS PROGRAMMATICALLY

In [57]:

print(
    orders.filter(pl.col("orderid") == 5618).join(customers, on="customerid", how="inner").select("customerid", "name", "phone")
)

shape: (1, 3)
┌────────────┬───────────────────┬──────────────┐
│ customerid ┆ name              ┆ phone        │
│ ---        ┆ ---               ┆ ---          │
│ i64        ┆ str               ┆ str          │
╞════════════╪═══════════════════╪══════════════╡
│ 5745       ┆ David Swanson Jr. ┆ 838-351-0370 │
└────────────┴───────────────────┴──────────────┘
