# Example usage

The `features_creator` package was designed for use with `pandas` dataframes storing temporal data.

For example, assume that you are working with a dataset for a telecommunications company that records the weekly payments from its customers. You have a dataframe with one row per customer and one column per week, where each column holds the payment made in that week. Your task is to engineer new features from this data to support a prediction task.

In [17]:
import pandas as pd

example_data = pd.DataFrame({
    "week_payment1": [1, 2, 3],  # Columns of interest: prefix followed by the week number
    "week_payment2": [4, 5, 6],
    "week_payment3": [7, 8, 9],
    "week_payment4": [10, 11, 12],
    "week_payment5": [13, 14, 15],
    "othercolumn": [5, 6, 7],  # Unrelated column
    "week_payment_string6": [5, 6, 7],  # Ends in an integer, but the prefix is not followed directly by it
})

example_data

|   | week_payment1 | week_payment2 | week_payment3 | week_payment4 | week_payment5 | othercolumn | week_payment_string6 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 7 | 10 | 13 | 5 | 5 |
| 1 | 2 | 5 | 8 | 11 | 14 | 6 | 6 |
| 2 | 3 | 6 | 9 | 12 | 15 | 7 | 7 |


Note that the columns of interest should all share the same name prefix (here `week_payment`) and end with an incrementing integer (here the week number).

The `calculate_average` function calculates the average payment that each customer has made across all the recorded weeks. It takes two arguments: the dataframe to use and the pattern to match. The pattern is the prefix shared by the column names of interest, not including the incrementing integer at the end.

In [11]:
from features_creator.features_creator import calculate_average

calculate_average(example_data, "week_payment")

array([7., 8., 9.])
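
As a cross-check, the same values can be reproduced with plain `pandas`. This is only a sketch and not the package's implementation: the column list is written out by hand, and the names `payment_columns` and `manual_average` are illustrative.

```python
import pandas as pd

example_data = pd.DataFrame({
    "week_payment1": [1, 2, 3],
    "week_payment2": [4, 5, 6],
    "week_payment3": [7, 8, 9],
    "week_payment4": [10, 11, 12],
    "week_payment5": [13, 14, 15],
    "othercolumn": [5, 6, 7],
    "week_payment_string6": [5, 6, 7],
})

# Assumption: the columns of interest are week_payment1 ... week_payment5.
payment_columns = [f"week_payment{week}" for week in range(1, 6)]

# Row-wise mean: one average payment per customer.
manual_average = example_data[payment_columns].mean(axis=1).to_numpy()
print(manual_average)  # [7. 8. 9.]
```

For the first customer this is (1 + 4 + 7 + 10 + 13) / 5 = 7, which agrees with the output of `calculate_average`.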

Similarly, the `calculate_standard_deviation` function calculates the standard deviation of each customer's payments across all the recorded weeks.

In [12]:
from features_creator.features_creator import calculate_standard_deviation

calculate_standard_deviation(example_data, "week_payment")

|   | week_payment_std |
|---|---|
| 0 | 18.0 |
| 1 | 18.0 |
| 2 | 18.0 |
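
When interpreting these numbers, it helps to compare them against a direct `pandas` computation. The sketch below is not the package's code; it computes the row-wise sample standard deviation (`ddof=1`, the `pandas` default), the population standard deviation (`ddof=0`), and the population variance. For this example data, 18.0 coincides with the population variance of each row (the squared deviations 36, 9, 0, 9, 36 average to 18), so it is worth confirming which statistic your installed version of the package returns.

```python
import pandas as pd

# Only the five weekly payment columns from the example data.
payments = pd.DataFrame({
    "week_payment1": [1, 2, 3],
    "week_payment2": [4, 5, 6],
    "week_payment3": [7, 8, 9],
    "week_payment4": [10, 11, 12],
    "week_payment5": [13, 14, 15],
})

sample_std = payments.std(axis=1)              # ddof=1 is the pandas default
population_std = payments.std(axis=1, ddof=0)  # divide by n instead of n - 1
population_var = payments.var(axis=1, ddof=0)  # square of population_std

print(sample_std.round(3).tolist())      # [4.743, 4.743, 4.743]
print(population_std.round(3).tolist())  # [4.243, 4.243, 4.243]
print(population_var.round(3).tolist())  # [18.0, 18.0, 18.0]
```

All three customers get the same values because each row increases by 3 from week to week, so the spread around the row mean is identical.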


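The `calculate_percentage_change` function calculates the percentage change in each customer's payments between two consecutive periods. In addition to the dataframe and the pattern, it takes a `compare_period` tuple that gives the number of weeks in each period. Judging from the outputs below, the first period covers the first matching columns, the second period covers the columns that follow, and the result is (mean of first period - mean of second period) / (mean of second period) × 100. For instance, with `compare_period=(2, 1)` the first customer gets (2.5 - 7) / 7 × 100 ≈ -64.29.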

In [13]:
from features_creator.features_creator import calculate_percentage_change

calculate_percentage_change(example_data, "week_payment", compare_period=(1,1))

array([-75., -60., -50.])

In [14]:
calculate_percentage_change(example_data, "week_payment", compare_period=(2,2))

array([-70.58823529, -63.15789474, -57.14285714])

In [15]:
calculate_percentage_change(example_data, "week_payment", compare_period=(2,1))

array([-64.28571429, -56.25      , -50.        ])
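
To make the arithmetic behind these outputs concrete, the following sketch reproduces them with plain `pandas`. It is a guess at the behaviour inferred only from the three outputs above, not the package's implementation: `percentage_change_sketch` is a hypothetical helper, and it assumes that the first period is the first `compare_period[0]` matching columns, the second period is the next `compare_period[1]` columns, and that period means are compared.

```python
import re

import numpy as np
import pandas as pd


def percentage_change_sketch(df, prefix, compare_period):
    """Hypothetical helper inferred from the example outputs only."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbered = []
    for name in df.columns:
        match = pattern.match(str(name))
        if match:
            numbered.append((int(match.group(1)), name))
    # Order the matching columns by their trailing integer.
    columns = [name for _, name in sorted(numbered)]

    n_first, n_second = compare_period
    first_mean = df[columns[:n_first]].mean(axis=1)
    second_mean = df[columns[n_first:n_first + n_second]].mean(axis=1)
    # Change of the first period relative to the second, in percent.
    return ((first_mean - second_mean) / second_mean * 100).to_numpy()


example_data = pd.DataFrame({
    "week_payment1": [1, 2, 3],
    "week_payment2": [4, 5, 6],
    "week_payment3": [7, 8, 9],
    "week_payment4": [10, 11, 12],
    "week_payment5": [13, 14, 15],
    "othercolumn": [5, 6, 7],
    "week_payment_string6": [5, 6, 7],
})

result = percentage_change_sketch(example_data, "week_payment", (2, 1))
print(np.round(result, 2))  # approximately -64.29, -56.25, -50.0
```

For the first customer and `compare_period=(2, 1)`, the first period averages (1 + 4) / 2 = 2.5 and the second period is 7, giving (2.5 - 7) / 7 × 100 ≈ -64.29. The sketch does no input validation, for example for periods that are longer than the number of matching columns.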

Lastly, the `get_matching_column_names` function extracts the names of the columns in the dataframe that match the given pattern. It does not create new features, but it is public for transparency and to help with troubleshooting. Notice that only the matching column names are returned: `othercolumn` and `week_payment_string6` are excluded.

In [18]:
from features_creator.features_creator import get_matching_column_names

get_matching_column_names(example_data, "week_payment")

['week_payment1',
 'week_payment2',
 'week_payment3',
 'week_payment4',
 'week_payment5']
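
The matching rule suggested by this output can be sketched with a regular expression. This is an illustration under the assumption that a column matches when its name is the prefix followed only by digits; `matching_columns_sketch` is a hypothetical helper, not part of the package.

```python
import re


def matching_columns_sketch(columns, prefix):
    """Hypothetical helper: keep names made of the prefix plus trailing digits."""
    pattern = re.compile(rf"^{re.escape(prefix)}\d+$")
    return [name for name in columns if pattern.match(name)]


columns = [
    "week_payment1",
    "week_payment2",
    "week_payment3",
    "week_payment4",
    "week_payment5",
    "othercolumn",
    "week_payment_string6",
]

matched = matching_columns_sketch(columns, "week_payment")
print(matched)
# ['week_payment1', 'week_payment2', 'week_payment3', 'week_payment4', 'week_payment5']
```

Under this rule `week_payment_string6` is rejected because `_string` sits between the prefix and the integer, which is consistent with the output shown above.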