# ComposeML
### Getting Started
In this example, we have a data frame of transactions from different customers. To see what the data looks like, we preview the first few rows.

In [1]:
from featuretools.demo import load_mock_customer
df = load_mock_customer(return_single_table=True)
df.set_index('transaction_time', inplace=True)

df[df.columns[:8]].head()

| transaction_time | transaction_id | session_id | product_id | amount | customer_id | device | session_start | zip_code |
|---|---|---|---|---|---|---|---|---|
| 2014-01-01 00:00:00 | 298 | 1 | 5 | 127.64 | 2 | desktop | 2014-01-01 00:00:00 | 13244 |
| 2014-01-01 00:09:45 | 10 | 1 | 5 | 57.39 | 2 | desktop | 2014-01-01 00:00:00 | 13244 |
| 2014-01-01 00:14:05 | 495 | 1 | 5 | 69.45 | 2 | desktop | 2014-01-01 00:00:00 | 13244 |
| 2014-01-01 02:33:50 | 460 | 10 | 5 | 123.19 | 2 | tablet | 2014-01-01 02:31:40 | 13244 |
| 2014-01-01 02:37:05 | 302 | 10 | 5 | 64.47 | 2 | tablet | 2014-01-01 02:31:40 | 13244 |


We want to extract label times for each customer, where each label equals the total purchase amount over the next hour of transactions. First, we define a labeling function that returns the total purchase amount given an hour of transactions.

In [2]:
def my_labeling_function(df_slice):
    label = df_slice["amount"].sum()
    return label

With the labeling function, we create the `LabelMaker` for our prediction problem. We need an hour of transactions for each label, so we set `window_size` to one hour.

In [3]:
from composeml import LabelMaker

label_maker = LabelMaker(
    target_entity="customer_id",
    time_index="transaction_time",
    labeling_function=my_labeling_function,
    window_size="1h",
)

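With the label maker, we use `search` to automatically find and extract the labels from the data frame. Here, `minimum_data` skips the first hour of each customer's data before the first label, `num_examples_per_instance` limits the search to two labels per customer, and `gap` spaces consecutive label times two hours apart.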

In [4]:
labels = label_maker.search(
    df,
    minimum_data="1h",
    num_examples_per_instance=2,
    gap="2h",
)

labels.head()

| customer_id | time | my_labeling_function |
|---|---|---|
| 1 | 2014-01-01 01:45:30 | 1052.03 |
| 1 | 2014-01-01 03:45:30 | 943.28 |
| 2 | 2014-01-01 01:00:00 | 0.0 |
| 2 | 2014-01-01 03:00:00 | 1600.65 |
| 3 | 2014-01-01 02:45:05 | 0.0 |
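
To make the roles of `minimum_data`, `window_size`, and `gap` concrete, the following standard-library sketch reproduces the two label times shown above for customer 2. It is an illustration only, not how `composeml` implements `search`. The helper names `cutoff_times` and `window_total` are hypothetical, and the sketch assumes that the first label time is the customer's first transaction time plus `minimum_data` and that each window covers the half-open interval from the label time to one `window_size` later.

```python
from datetime import datetime, timedelta


def cutoff_times(first_time, minimum_data, gap, num_examples):
    """Hypothetical helper: skip `minimum_data`, then step forward by `gap`."""
    start = first_time + minimum_data
    return [start + i * gap for i in range(num_examples)]


def window_total(transactions, cutoff, window_size):
    """Hypothetical helper: sum amounts with cutoff <= time < cutoff + window_size."""
    end = cutoff + window_size
    return sum(amount for time, amount in transactions if cutoff <= time < end)


# Customer 2's transactions from the preview (only a subset of the full data).
transactions = [
    (datetime(2014, 1, 1, 0, 0, 0), 127.64),
    (datetime(2014, 1, 1, 0, 9, 45), 57.39),
    (datetime(2014, 1, 1, 0, 14, 5), 69.45),
    (datetime(2014, 1, 1, 2, 33, 50), 123.19),
    (datetime(2014, 1, 1, 2, 37, 5), 64.47),
]

hour = timedelta(hours=1)
times = cutoff_times(
    transactions[0][0], minimum_data=hour, gap=2 * hour, num_examples=2
)

# None of the previewed transactions fall in the first window (01:00 to 02:00).
first_label = window_total(transactions, times[0], window_size=hour)
```

Under these assumptions, `times` holds 01:00:00 and 03:00:00 and `first_label` is zero, which agrees with the first row for customer 2 in the output. The second label cannot be checked this way because the preview does not include the transactions in that window.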


Next, we make the labels binary by applying a `threshold`: a label is `True` when the total purchase amount is above 1000.

In [5]:
labels = labels.threshold(1000)

labels.head()

| customer_id | time | my_labeling_function |
|---|---|---|
| 1 | 2014-01-01 01:45:30 | True |
| 1 | 2014-01-01 03:45:30 | False |
| 2 | 2014-01-01 01:00:00 | False |
| 2 | 2014-01-01 03:00:00 | True |
| 3 | 2014-01-01 02:45:05 | False |
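
As a rough illustration of what the threshold does to the values, the sketch below applies a plain comparison to the five totals from the earlier `search` output. It assumes a strict greater-than comparison, in line with "above 1000"; the variable names are hypothetical and the code does not call `composeml`.

```python
# Totals from the search output, keyed by (customer_id, label time).
totals = {
    (1, "2014-01-01 01:45:30"): 1052.03,
    (1, "2014-01-01 03:45:30"): 943.28,
    (2, "2014-01-01 01:00:00"): 0.0,
    (2, "2014-01-01 03:00:00"): 1600.65,
    (3, "2014-01-01 02:45:05"): 0.0,
}

threshold = 1000

# Assumption: a label becomes True only when the total is strictly above the threshold.
binary = {key: total > threshold for key, total in totals.items()}
```

The resulting values match the `True`/`False` column in the output above.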


We can also shift the label times one hour earlier with `apply_lead` so that predictions are made in advance.

In [6]:
labels = labels.apply_lead('1h')

labels.head()

| customer_id | time | my_labeling_function |
|---|---|---|
| 1 | 2014-01-01 00:45:30 | True |
| 1 | 2014-01-01 02:45:30 | False |
| 2 | 2014-01-01 00:00:00 | False |
| 2 | 2014-01-01 02:00:00 | True |
| 3 | 2014-01-01 01:45:05 | False |
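
The effect on the timestamps can be checked by hand. The following sketch, which uses only the standard library and hypothetical variable names, subtracts one hour from the label times that were shown before the lead was applied. The label values themselves are left unchanged.

```python
from datetime import datetime, timedelta

# Label times before the lead was applied, in the order shown earlier.
label_times = [
    datetime(2014, 1, 1, 1, 45, 30),
    datetime(2014, 1, 1, 3, 45, 30),
    datetime(2014, 1, 1, 1, 0, 0),
    datetime(2014, 1, 1, 3, 0, 0),
    datetime(2014, 1, 1, 2, 45, 5),
]

lead = timedelta(hours=1)

# Shifting earlier means subtracting the lead from each label time.
shifted = [time - lead for time in label_times]
```

Each entry of `shifted` corresponds to the `time` column in the output above.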


With the labels, we can use `describe` to print the label distribution and the settings used to make the labels.

In [7]:
labels.describe()

False    7
True     3
Name: my_labeling_function, dtype: int64

name                         my_labeling_function
target_entity                         customer_id
num_examples_per_instance                       2
minimum_data                                   1h
window_size                                    1h
gap                                            2h
threshold                                    1000
lead                                           1h
dtype: object

