# Controlling cutoff times in a label search

The start time of the labeling process is known as the first cutoff time. You need data that exists before the first cutoff time to build features. In a label search, you can use `minimum_data` to define either the first cutoff time directly or the amount of data required before it. Similarly, you can use `maximum_data` to directly define the last cutoff time. Together, these parameters control when the labeling process starts and finishes.

In [None]:
from io import StringIO
from pandas import read_csv

transaction_data = """
customer_id,transaction_time,amount
3,2021-03-31 18:51:27,52.29
5,2021-03-22 06:56:05,33.81
5,2021-03-20 23:45:21,76.3
2,2021-03-30 10:06:59,32.72
1,2021-02-17 11:01:22,59.16
2,2021-01-16 10:59:44,56.33
3,2021-01-12 07:53:00,61.84
4,2021-03-15 21:00:25,34.91
2,2021-01-26 10:01:37,69.88
2,2021-02-07 05:42:14,49.7
2,2021-03-15 16:35:16,41.08
4,2021-02-06 13:17:19,32.34
2,2021-02-21 09:42:48,86.15
4,2021-03-24 00:40:24,97.08
4,2021-03-23 04:27:47,58.81
4,2021-02-23 13:32:22,59.67
4,2021-02-10 03:46:16,96.36
3,2021-03-13 09:24:54,25.4
1,2021-01-27 13:58:38,26.15
3,2021-02-23 03:26:58,28.96
1,2021-01-05 09:55:18,24.6
1,2021-03-09 07:14:27,49.64
1,2021-02-10 23:27:37,31.29
2,2021-01-23 18:19:05,42.88
1,2021-01-05 22:50:52,58.58
"""

created_account_data = """
customer_id,created_account
1,2021-01-10
2,2021-02-12
3,2021-01-23
4,2021-02-13
5,2021-01-24
"""

with StringIO(transaction_data) as data:
    transactions = read_csv(data, parse_dates=["transaction_time"])

with StringIO(created_account_data) as data:
    created_account = read_csv(
        data, parse_dates=["created_account"], index_col="customer_id"
    )["created_account"]

## Labeling customer transactions

Suppose you have customer transactions from the first quarter of 2021.

In [None]:
import composeml as cp

transactions.head()

You want to calculate the total amount that each customer spent over two weeks, *only for February*. Start by defining a labeling function that sums the transaction amounts. Then, create a label maker that labels data in two-week windows, using the transaction time as the time index.

In [None]:
def total_amount(ds):
    """Return the total transaction amount in a data slice."""
    return ds.amount.sum()


lm = cp.LabelMaker(
    labeling_function=total_amount,
    time_index="transaction_time",
    target_dataframe_index="customer_id",
    window_size="14d",
)

### Defining the first and last cutoff time

Now, use `minimum_data` in the label search to set February 1 as the first cutoff time. Because each label covers two weeks, set `maximum_data` to February 15 to make that the last cutoff time.
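
The choice of the 15th follows from simple date arithmetic: February 2021 has 28 days, so it holds exactly two 14-day windows, and the second one starts on the 15th. The sketch below uses only the standard library and treats a cutoff time as the start of a window. It illustrates the reasoning and is not how Compose computes cutoff times internally.

```python
from datetime import datetime, timedelta

window = timedelta(days=14)        # mirrors window_size="14d"
month_start = datetime(2021, 2, 1)
month_end = datetime(2021, 3, 1)   # exclusive end of February

# Step through February in two-week increments and collect window starts.
window_starts = []
start = month_start
while start + window <= month_end:
    window_starts.append(start)
    start += window

for s in window_starts:
    print(s.date(), "to", (s + window).date())
# 2021-02-01 to 2021-02-15
# 2021-02-15 to 2021-03-01
```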

In [None]:
lt = lm.search(
    df=transactions.sort_values("transaction_time"),
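    num_examples_per_instance=-1,  # no limit on examples per customer
    minimum_data="2021-02-01",  # first cutoff time
    maximum_data="2021-02-15",  # last cutoff time
    drop_empty=False,  # keep windows that contain no transactions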
    verbose=False,
)

lt
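
One way to sanity-check a row of the label times is to recompute a label by hand. The sketch below does this with the standard library for customers 2 and 4, using their February rows copied from the transaction data above. The helper `window_total` is hypothetical, and the half-open window `[cutoff, cutoff + 14 days)` is an assumption of this sketch, so compare the result against `lt` rather than treating it as authoritative.

```python
from datetime import datetime, timedelta

# February transactions for customers 2 and 4, copied from the data above.
rows = [
    (2, "2021-02-07 05:42:14", 49.70),
    (2, "2021-02-21 09:42:48", 86.15),
    (4, "2021-02-06 13:17:19", 32.34),
    (4, "2021-02-10 03:46:16", 96.36),
    (4, "2021-02-23 13:32:22", 59.67),
]


def window_total(rows, customer_id, cutoff, days=14):
    """Sum one customer's amounts in the assumed window [cutoff, cutoff + days)."""
    end = cutoff + timedelta(days=days)
    return sum(
        amount
        for cid, time, amount in rows
        if cid == customer_id and cutoff <= datetime.fromisoformat(time) < end
    )


first_cutoff = datetime(2021, 2, 1)
for customer_id in (2, 4):
    print(customer_id, round(window_total(rows, customer_id, first_cutoff), 2))
# 2 49.7
# 4 128.7
```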

### Changing the first cutoff time for each customer

Suppose you have a lookup table that contains the dates when customers created their accounts. You now want to calculate the total amount that each customer spent over two weeks, *only after creating an account*.

In [None]:
created_account

You can pass the series of sign-up dates directly as `minimum_data` to use them as the first cutoff times in the labeling process. The series is indexed by customer ID, and each customer should have only one cutoff time.

In [None]:
lt = lm.search(
    df=transactions.sort_values("transaction_time"),
    num_examples_per_instance=-1,
    minimum_data=created_account,  # per-customer first cutoff times
    drop_empty=False,
    verbose=False,
)

lt.head(10)
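
The effect of a per-customer first cutoff time can be illustrated without Compose: each customer's first window starts at their own sign-up date, so earlier transactions never contribute to a label. The following standard-library sketch uses customers 2 and 4 with rows copied from the data above. The helper name `first_window_total` and the half-open window `[cutoff, cutoff + 14 days)` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Sign-up dates for customers 2 and 4, copied from the lookup table above.
created = {2: datetime(2021, 2, 12), 4: datetime(2021, 2, 13)}

# A subset of their transactions, copied from the data above.
rows = [
    (2, "2021-01-26 10:01:37", 69.88),  # before sign-up
    (2, "2021-02-07 05:42:14", 49.70),  # before sign-up
    (2, "2021-02-21 09:42:48", 86.15),
    (4, "2021-02-10 03:46:16", 96.36),  # before sign-up
    (4, "2021-02-23 13:32:22", 59.67),
]


def first_window_total(customer_id, days=14):
    """Sum amounts in the first assumed window after the customer's sign-up date."""
    start = created[customer_id]
    end = start + timedelta(days=days)
    return sum(
        amount
        for cid, time, amount in rows
        if cid == customer_id and start <= datetime.fromisoformat(time) < end
    )


totals = {cid: round(first_window_total(cid), 2) for cid in created}
print(totals)
# {2: 86.15, 4: 59.67}
```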