# Compose

Compose is a Python library for automating prediction engineering. It provides a standardized way of structuring prediction problems: the end user defines the outcome of interest by writing a labelling function, and Compose then searches the historical data and automatically extracts the relevant training examples.

To read more about it, please refer to [this article](https://analyticsindiamag.com/guide-to-prediction-engineering-with-compose/).

# Prediction Engineering with Compose

Install the supporting libraries and Compose from PyPI:

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install composeml --user -q

In [None]:
import IPython
# Restart the kernel so the newly installed packages can be imported.
IPython.Application.instance().kernel.do_shutdown(True)

  Import necessary libraries and load the data.

In [None]:
import matplotlib.pyplot as plt
import composeml as cp
df = cp.demos.load_transactions()
df.head() 

  Write the labelling function. 

The labelling function will calculate the total of a customer’s transactions over a span of an hour. It will be passed groups of data points corresponding to different windows of one hour; all it needs to do is add them up.

In [None]:
def amount_spent(data):
    total = data['amount'].sum()
    return total
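
To see what the labelling function does in isolation, it can be called on a small hand-made window of transactions. This is only a sketch: the `sample_window` DataFrame and its values are made up for illustration and are not taken from the demo dataset.

```python
import pandas as pd


def amount_spent(data):
    # Same labelling function as above: sum the amounts in one window.
    total = data["amount"].sum()
    return total


# Hypothetical one-hour window for a single customer (illustrative values).
sample_window = pd.DataFrame({
    "transaction_time": pd.to_datetime([
        "2014-01-01 00:05",
        "2014-01-01 00:20",
        "2014-01-01 00:50",
    ]),
    "amount": [20.0, 35.5, 4.5],
})

window_total = amount_spent(sample_window)
print(window_total)
```

Because the function only reads the `amount` column, any other columns in the window are ignored.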

Create a LabelMaker object for the prediction problem. We intend to calculate the hourly transactions of each customer, so we set the target_entity to the customer ID, the window_size to one hour and pass our labelling function.

In [None]:
label_maker = cp.LabelMaker(
    target_entity="customer_id",
    time_index="transaction_time",
    labeling_function=amount_spent,
    window_size="1h",
)

Use the search() method on the LabelMaker object to automatically search for and extract labels.

In [None]:
labels = label_maker.search(
    df.sort_values('transaction_time'),
    num_examples_per_instance=-1,
    gap=1,
    verbose=True,
)
labels.head() 
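
To build intuition for what the search produces, the windowing can be approximated in plain pandas. This is a rough sketch, not Compose's implementation: `sketch_search` is a hypothetical helper, the toy data is invented, and it assumes that `gap=1` starts a new one-hour window at every successive row of a customer and that windows are half-open.

```python
import pandas as pd


def amount_spent(data):
    return data["amount"].sum()


def sketch_search(df, target, time_index, labeling_function, window_size):
    """Hypothetical helper, not part of Compose: one label per starting row."""
    window = pd.Timedelta(window_size)
    rows = []
    for instance, group in df.sort_values(time_index).groupby(target):
        times = group[time_index]
        for start in times:
            # Assumed half-open window [start, start + window).
            in_window = (times >= start) & (times < start + window)
            rows.append({
                target: instance,
                "time": start,
                labeling_function.__name__: labeling_function(group[in_window]),
            })
    return pd.DataFrame(rows)


# Invented toy transactions for two customers.
toy = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "transaction_time": pd.to_datetime([
        "2014-01-01 10:00", "2014-01-01 10:30",
        "2014-01-01 11:15", "2014-01-01 10:05",
    ]),
    "amount": [10, 20, 30, 5],
})

toy_labels = sketch_search(
    toy, "customer_id", "transaction_time", amount_spent, "1h"
)
print(toy_labels)
```

Each output row pairs a customer and a window start time with the value returned by the labelling function, which mirrors the general shape of the table that the search returns.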

In [None]:
labels.plot.dist()

Various transformations can be applied to the LabelTimes table to adapt the labels to the problem.

Let’s say you want binary labels that indicate whether the total transaction amount is greater than $200. This can be done using the threshold() method:

In [None]:
binary_labels = labels.threshold(200)
binary_labels.head() 

binary_labels.plot.distribution()
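
Conceptually, thresholding turns each numeric label into a boolean. The pandas sketch below illustrates the idea on invented totals; it assumes a strict "greater than" comparison, in line with the description above, and is not Compose's own code.

```python
import pandas as pd

# Invented hourly totals standing in for the amount_spent label column.
hourly_totals = pd.Series([150.0, 200.0, 250.5], name="amount_spent")

# Assumed rule: a label is positive only when it is strictly above 200.
binary = hourly_totals > 200
print(binary.tolist())
```

Under this assumption, a total of exactly 200 maps to a negative label.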

Or maybe you want to shift the label times by one hour for predicting in advance. This can be achieved using the apply_lead() method: 

In [None]:
shifted_labels = labels.apply_lead('1h')
shifted_labels.head() 
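
The effect of a lead can be pictured as subtracting a fixed offset from every label time, so that features are computed earlier than the window being labelled. The sketch below uses invented label times and assumes the shift moves each time one hour earlier; it does not call Compose.

```python
import pandas as pd

# Invented label times table with the same general layout as the labels.
label_times = pd.DataFrame({
    "customer_id": [1, 1],
    "time": pd.to_datetime(["2014-01-01 10:00", "2014-01-01 11:00"]),
    "amount_spent": [30.0, 50.0],
})

# Assumed behaviour of a one-hour lead: move every label time one hour earlier.
shifted = label_times.assign(time=label_times["time"] - pd.Timedelta("1h"))
print(shifted)
```

The label values stay the same; only the time at which the prediction would be made changes.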

You can learn more about the available methods in the Compose documentation.

  Once you’re satisfied with the labels, you can use the describe() method to print out the distribution of the labels and the settings and transformations that were used to create them.

In [None]:
binary_labels.describe()