> I turned around to see this cute guy holding an item I had bought. He said, ‘I got the same thing!’ We laughed about it and wound up swapping items because I wanted the color he got.
> I asked him to get some food with me and we spent the rest of the day together.

In [40]:
import polars as pl

customers = pl.read_csv("data/noahs-customers.csv", try_parse_dates=True)
orders_items = pl.read_csv("data/noahs-orders_items.csv", try_parse_dates=True)
orders = pl.read_csv("data/noahs-orders.csv", try_parse_dates=True)
products = pl.read_csv("data/noahs-products.csv", try_parse_dates=True)

Step 1: Find all the items with colors in their names.
After inspecting the products table, it looks like colors are written in **lowercase**, in parentheses, at the end of the product description.
I'll split the color out into its own column.

In [46]:
# Products with colors in the name always end in e.g. `... (blue)`, so use a regex that pulls that color name out
color_name_filter = r"(.+) \(([[:lower:]]+)\)"
colorful_products = products.filter(pl.col("desc").str.contains(color_name_filter))
products_with_colors_extracted = colorful_products.select(
    pl.col("sku"),
    pl.col("desc").str.extract(color_name_filter, group_index=1).alias("desc_without_color"),
    pl.col("desc").str.extract(color_name_filter, group_index=2).alias("color")
)
print(products_with_colors_extracted.head())

shape: (5, 3)
┌─────────┬──────────────────────┬───────┐
│ sku     ┆ desc_without_color   ┆ color │
│ ---     ┆ ---                  ┆ ---   │
│ str     ┆ str                  ┆ str   │
╞═════════╪══════════════════════╪═══════╡
│ COL0037 ┆ Noah's Jewelry       ┆ green │
│ COL0065 ┆ Noah's Jewelry       ┆ mauve │
│ COL0166 ┆ Noah's Action Figure ┆ blue  │
│ COL0167 ┆ Noah's Bobblehead    ┆ blue  │
│ COL0483 ┆ Noah's Action Figure ┆ mauve │
└─────────┴──────────────────────┴───────┘
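
To sanity-check what the pattern captures, here is a small standalone sketch using Python's built-in `re` module instead of polars. It rests on two assumptions: `re` has no POSIX classes, so `[a-z]` stands in for `[[:lower:]]`, and `split_color` plus the color-less product name are made up for illustration.

```python
import re

# Assumption: Python's `re` lacks POSIX classes, so [a-z] replaces [[:lower:]].
COLOR_PATTERN = re.compile(r"(.+) \(([a-z]+)\)")


def split_color(desc):
    """Return (description_without_color, color), or None if there is no color suffix."""
    # `search` is unanchored, like the polars `str.contains` / `str.extract` calls above.
    match = COLOR_PATTERN.search(desc)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_color("Noah's Jewelry (green)"))
print(split_color("Noah's Ark Model"))  # hypothetical product with no color suffix
```

The first call should give back the description and the color as separate strings, and the second should return `None`, which is why the `filter` step runs before the two `extract` calls.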


OK, now I'll find every order line item that is one of these "colorful products" by joining `orders_items` against the extracted table.

In [54]:
print(orders_items.join(products_with_colors_extracted, on="sku", how="inner"))

shape: (28_012, 6)
┌─────────┬─────────┬─────┬────────────┬──────────────────────┬─────────┐
│ orderid ┆ sku     ┆ qty ┆ unit_price ┆ desc_without_color   ┆ color   │
│ ---     ┆ ---     ┆ --- ┆ ---        ┆ ---                  ┆ ---     │
│ i64     ┆ str     ┆ i64 ┆ f64        ┆ str                  ┆ str     │
╞═════════╪═════════╪═════╪════════════╪══════════════════════╪═════════╡
│ 1014    ┆ COL4117 ┆ 1   ┆ 4.55       ┆ Noah's Poster        ┆ yellow  │
│ 1015    ┆ COL8357 ┆ 1   ┆ 13.48      ┆ Noah's Lunchbox      ┆ mauve   │
│ 1018    ┆ COL6388 ┆ 1   ┆ 3.72       ┆ Noah's Gift Box      ┆ magenta │
│ 1040    ┆ COL7454 ┆ 1   ┆ 10.65      ┆ Noah's Jersey        ┆ mauve   │
│ 1041    ┆ COL2141 ┆ 1   ┆ 5.87       ┆ Noah's Bobblehead    ┆ puce    │
│ …       ┆ …       ┆ …   ┆ …          ┆ …                    ┆ …       │
│ 214217  ┆ COL9349 ┆ 1   ┆ 18.31      ┆ Noah's Action Figure ┆ orange  │
│ 214218  ┆ COL0837 ┆ 1   ┆ 4.97       ┆ Noah's Poster        ┆ mauve   │
│ 214227  ┆ COL1263

And now I have to find consecutive orders (the orderids auto-increment by one) which have the same item but a different color...