So we're looking for someone who has a habit of buying pastries early in the morning!

I built a fully merged version of the dataset so I don't have to redo the joins every time. Let's start by importing that.

In [1]:
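import pandas as pd
import numpy as np
# Drop the old index column that was written into the CSV.
full = pd.read_csv('full.csv').drop('Unnamed: 0', axis=1)
# Parse the timestamps so the .dt accessor works later.
# (infer_datetime_format is deprecated in pandas 2.x and no longer needed.)
full['ordered'] = pd.to_datetime(full['ordered'])
full['shipped'] = pd.to_datetime(full['shipped'])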
full.head()

Unnamed: 0,customerid,name,address,citystatezip,birthdate,phone,orderid,ordered,shipped,items,total,sku,qty,unit_price,desc,wholesale_cost
0,11145,Frederick Moss,2855 Bronx Park E,"Bronx, NY 10467",1971-07-15,917-807-7174,1589,2017-02-05 16:15:23,2017-02-05 20:00:00,,33.47,DLI0002,1,10.16,Smoked Whitefish Sandwich,9.33
1,4140,Linda Porter,559 W 139th St,"Manhattan, NY 10031",1979-06-09,516-933-2477,1767,2017-02-07 14:48:09,2017-02-07 16:15:00,,283.37,DLI0002,1,10.75,Smoked Whitefish Sandwich,9.33
2,6080,Amy Wilson,375 W 123rd St,"Manhattan, NY 10027",1976-06-08,838-660-8339,1934,2017-02-09 11:21:16,2017-02-09 12:45:00,,22.25,DLI0002,1,12.17,Smoked Whitefish Sandwich,9.33
3,5406,Tina Lauren Goodwin,1431B St Nicholas Ave,"Manhattan, NY 10033",1960-12-10,585-878-9905,2944,2017-02-19 15:52:35,2017-02-19 17:45:00,,44.52,DLI0002,1,11.69,Smoked Whitefish Sandwich,9.33
4,7635,Robert Armstrong,620 W 162nd St,"Manhattan, NY 10032",1982-07-30,516-851-2207,4207,2017-03-05 10:16:19,2017-03-05 10:16:19,,13.04,DLI0002,1,11.37,Smoked Whitefish Sandwich,9.33
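For reference, a merged table with these columns could be put together roughly like this. This is only a sketch on toy data: the frame names (`customers`, `orders`, `order_items`, `products`) and all values are hypothetical stand-ins, and only the join keys (`customerid`, `orderid`, `sku`) are taken from the columns shown above.

```python
import pandas as pd

# Toy stand-ins for the separate source tables (hypothetical names and values).
customers = pd.DataFrame({'customerid': [1, 2], 'name': ['A', 'B']})
orders = pd.DataFrame({'orderid': [10, 11, 12], 'customerid': [1, 1, 2]})
order_items = pd.DataFrame({
    'orderid': [10, 10, 11, 12],
    'sku': ['BKY0001', 'PET0002', 'BKY0001', 'DLI0003'],
    'qty': [1, 2, 1, 1],
})
products = pd.DataFrame({
    'sku': ['BKY0001', 'PET0002', 'DLI0003'],
    'desc': ['toy pastry', 'toy cat food', 'toy sandwich'],
})

# Chain inner joins: customer -> order -> line item -> product.
# The result has one row per line item, so customer fields repeat.
merged = (
    customers
    .merge(orders, on='customerid')
    .merge(order_items, on='orderid')
    .merge(products, on='sku')
)
print(merged)
```

Because the result has one row per line item, any per-customer count on it counts items rather than orders, which is worth keeping in mind further down.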


Let's figure out which SKUs are bakery-related.

In [2]:
# Keep only the three-letter category prefix of each SKU (e.g. DLI0002 -> DLI).
full['sku'] = full['sku'].str.slice(0, 3)
full['sku'].value_counts()

KIT    121726
PET    101443
HOM     66004
TOY     56562
COL     30645
DLI     27535
CMP     16190
BKY      7671
Name: sku, dtype: int64

Looks like `BKY` is probably bakery, so let's try it.

In [3]:
# Keep only the rows whose SKU prefix is BKY.
bky = full[full.sku.str.contains('BKY')]

Now, to find early-morning purchases we'll use `ordered` and `shipped`. It's possible she orders ahead and `shipped` holds the pickup time, so let's just use both. I'll set the cutoff at 6am to hedge -- maybe she doesn't always pick up before 5.

In [4]:
early = bky[(bky.ordered.dt.hour < 6) | (bky.shipped.dt.hour < 6)]
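To see what this filter keeps, here is a small sketch on made-up timestamps (the `toy` frame and `cutoff_hour` name are just for illustration). Note that the parentheses around each comparison are needed, because `|` binds more tightly than `<` in Python.

```python
import pandas as pd

# Made-up rows: only the column names mirror the real data.
toy = pd.DataFrame({
    'ordered': pd.to_datetime([
        '2021-01-01 04:30:00',   # ordered before 6am
        '2021-01-01 23:10:00',   # ordered late at night...
        '2021-01-02 12:00:00',   # ordered at noon
    ]),
    'shipped': pd.to_datetime([
        '2021-01-01 04:30:00',
        '2021-01-02 05:15:00',   # ...but picked up before 6am
        '2021-01-02 12:00:00',
    ]),
})

cutoff_hour = 6  # same hedge as above

# A row counts as "early" if either timestamp falls before the cutoff hour.
is_early = (toy['ordered'].dt.hour < cutoff_hour) | (toy['shipped'].dt.hour < cutoff_hour)
early_toy = toy[is_early]
print(early_toy)
```

The second row shows why checking both columns matters: an order placed the night before still qualifies if the pickup time is early.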

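Now, let's find returning customers.

In [5]:
# Count early-morning bakery rows per customer; the count lands in the `name` column.
num = early.groupby('customerid')[['name']].count()
# Keep customers that show up more than once, then list the most frequent.
returning = num[num.name > 1].reset_index()
returning.sort_values('name', ascending=False).head()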

Unnamed: 0,customerid,name
20,5375,10
23,6623,4
39,10374,4
31,8744,4
19,5362,3
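One caveat: since the merged table has one row per line item, the counts above are line items, not separate visits. If that distinction matters, a possible alternative is to count distinct `orderid` values per customer. A minimal sketch on toy data (the `toy_early` frame is made up):

```python
import pandas as pd

# Toy early-morning rows: customer 1 bought three items in ONE order,
# customer 2 placed two separate orders, customer 3 placed one.
toy_early = pd.DataFrame({
    'customerid': [1, 1, 1, 2, 2, 3],
    'orderid':    [10, 10, 10, 20, 21, 30],
})

# Row counts: customer 1 looks like the most frequent buyer.
line_items = toy_early.groupby('customerid').size()

# Distinct orders: only customer 2 actually came back.
distinct_orders = toy_early.groupby('customerid')['orderid'].nunique()
repeat_customers = distinct_orders[distinct_orders > 1]

print(line_items)
print(distinct_orders)
print(repeat_customers)
```

On the real data the top customer is far enough ahead that either way of counting likely points to the same person, but the distinct-order count is the safer reading of "habit".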


We've got one habitual early-morning pastry buyer! Let's try `5375`.

In [6]:
full[full.customerid == 5375].drop_duplicates(['name','phone'])

Unnamed: 0,customerid,name,address,citystatezip,birthdate,phone,orderid,ordered,shipped,items,total,sku,qty,unit_price,desc,wholesale_cost
6986,5375,Christina Booker,1127 Grinnell Pl,"Bronx, NY 10474",1981-01-08,718-649-9036,189441,2022-04-01 23:21:55,2022-04-03 08:45:00,,37.62,PET,2,5.67,"Gluten-free Adult Cat Food, Salmon & Turkey",4.49


Tada!