# Getting Started


## Install

```
pip install composeml
```

## Load Data

In this example, we have a dataset of transactions from different customers. To get an idea of what the data looks like, we preview the first five columns of the data frame.

In [None]:
from composeml.datasets import transactions

df = transactions()

df[df.columns[:5]].head()

## Create Labeling Function

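We want to extract label times for each customer, where each label equals the total purchase amount over the next hour of transactions. First, we define a labeling function that returns the total purchase amount given one hour of transactions. Then, we pass it to a `LabelMaker` together with the customer column, the time column, and a one-hour window size.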

In [None]:
def my_labeling_function(df_slice):
    label = df_slice["amount"].sum()
    return label

In [None]:
from composeml import LabelMaker

label_maker = LabelMaker(
    target_entity="customer_id",
    time_index="transaction_time",
    labeling_function=my_labeling_function,
    window_size="1h",
)
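
To make the windowing concrete, here is a minimal sketch in plain Python of what a one-hour labeling window amounts to. It is not how `LabelMaker` is implemented. The sample rows, the helpers `total_amount` and `label_windows`, and the choice to start each customer's first window at their first transaction are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample rows: (customer_id, transaction_time, amount).
transactions = [
    (1, datetime(2014, 1, 1, 0, 5), 20.0),
    (1, datetime(2014, 1, 1, 0, 40), 90.0),
    (1, datetime(2014, 1, 1, 1, 10), 15.0),
    (2, datetime(2014, 1, 1, 0, 30), 60.0),
]


def total_amount(window_rows):
    """Plain-Python stand-in for the labeling function: sum the amounts."""
    return sum(amount for _, _, amount in window_rows)


def label_windows(rows, window=timedelta(hours=1)):
    """Return (customer_id, window_start, label) for consecutive windows."""
    # Group the rows by customer, keeping them in time order.
    by_customer = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[1]):
        by_customer[row[0]].append(row)

    labels = []
    for customer_id, customer_rows in by_customer.items():
        start = customer_rows[0][1]  # assumption: first window starts here
        last = customer_rows[-1][1]
        while start <= last:
            end = start + window
            in_window = [r for r in customer_rows if start <= r[1] < end]
            labels.append((customer_id, start, total_amount(in_window)))
            start = end  # move on to the next non-overlapping window
    return labels


example_labels = label_windows(transactions)
for customer_id, window_start, label in example_labels:
    print(customer_id, window_start, label)
```

Each customer is handled independently, which mirrors the role of the `customer_id` and `transaction_time` columns in the label maker's configuration.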

## Generate Labels

With the label maker, we call `search` to automatically find and extract the labels from the data frame. Here, `num_examples_per_instance=25` limits the search to 25 labels per customer.

In [None]:
labels = label_maker.search(
    df,
    minimum_data="1h",
    num_examples_per_instance=25,
    gap=1,
    verbose=True,
)

labels.head()

## Transform Labels


### Apply Threshold on Label Values

Next, we make the labels binary by applying `threshold`, so that a label is true when the total purchase amount is above 100.

In [None]:
labels = labels.threshold(100)

labels.head()
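
As a rough illustration of what thresholding does to the label values, the sketch below applies the same cutoff to a plain list. The sample totals are hypothetical, and treating a total of exactly 100 as false is an assumption about the boundary case.

```python
# Hypothetical total purchase amounts, one per label window.
totals = [110.0, 15.0, 60.0, 100.0]

THRESHOLD = 100

# A label is True only when the total is strictly above the threshold.
binary_labels = [total > THRESHOLD for total in totals]

print(binary_labels)
```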

### Lead Label Times

We can also shift the label times 1 hour earlier with `apply_lead`, so that predictions are made in advance.

In [None]:
labels = labels.apply_lead("1h")

labels.head()
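
Conceptually, applying a lead subtracts a fixed offset from every label time while leaving the label values untouched. The sketch below shows this with the standard library. The sample times and the variable names are hypothetical and are not part of the library.

```python
from datetime import datetime, timedelta

# Hypothetical (label_time, label) pairs.
label_times = [
    (datetime(2014, 1, 1, 1, 0), True),
    (datetime(2014, 1, 1, 2, 0), False),
]

lead = timedelta(hours=1)

# Shift each label time one hour earlier; the label values stay the same.
led_label_times = [(time - lead, label) for time, label in label_times]

for time, label in led_label_times:
    print(time, label)
```

Any features built for a model would then be computed as of the earlier time, which is what lets the prediction happen one hour ahead of the outcome.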

## Describe Labels

We can use `describe` to print the label distribution and the settings used to make the labels.

In [None]:
labels.describe()
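
For intuition about the distribution part of that summary, the following sketch counts how often each binary label occurs. The sample labels are hypothetical, and the printed layout is only an illustration, not the format that `describe` produces.

```python
from collections import Counter

# Hypothetical binary labels after thresholding.
binary_labels = [True, False, False, True, False]

# Count the occurrences of each label value.
distribution = Counter(binary_labels)

# Print each value with its count and its share of all labels.
for value, count in sorted(distribution.items()):
    share = count / len(binary_labels)
    print(f"{value}: {count} ({share:.0%})")
```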

## Plot Labels

### Label Distribution

In [None]:
%matplotlib inline

labels.plot.distribution(stacked=True)

### Label Count vs. Time

In [None]:
labels.plot.count_by_time(figsize=(7, 5))