# Feature Pipeline Production Notebook

In [None]:
# This is a production notebook, and both 'hopsworks' and 'Faker' are already listed in
# requirements.txt, so the install commands are commented out here to save resources.

# !pip install -U hopsworks --quiet
# !pip install -U faker --quiet

### Imports

In [2]:
import pandas as pd
import datetime
import hopsworks
from helper import synthetic_data
import random
pd.options.mode.chained_assignment = None

In [3]:
# As above, print statements are not needed in production, so they are commented out.

# Read the clock once so that both bounds are derived from the same instant.
now = datetime.datetime.now()
start_time = (now - datetime.timedelta(hours=24)).strftime("%Y-%m-%d %H:%M:%S")
end_time = now.strftime("%Y-%m-%d %H:%M:%S")
# print(start_time)
# print(end_time)

2022-11-24 09:31:26
2022-11-25 09:31:26


In [4]:
synthetic_data.FRAUD_RATIO = random.uniform(0.001, 0.005)
synthetic_data.TOTAL_UNIQUE_USERS = 1000
synthetic_data.TOTAL_UNIQUE_TRANSACTIONS = 54000
synthetic_data.CASH_WITHRAWAL_CARDS_TOTAL = 2000
synthetic_data.TOTAL_UNIQUE_CASH_WITHDRAWALS = 200
synthetic_data.START_DATE=start_time
synthetic_data.END_DATE=end_time

credit_cards = synthetic_data.generate_list_credit_card_numbers()
credit_cards_df = synthetic_data.create_credit_cards_as_df(credit_cards)
profiles_df = synthetic_data.create_profiles_as_df(credit_cards)
trans_df = synthetic_data.create_transactions_as_df(credit_cards)

## Feature Engineering

One indication of a fraudulent transaction is a large number of transactions in a short period of time (a chain attack). Age can also be a decisive factor: for example, elderly people are more likely to be targeted by fraudsters. To help the model learn these patterns, we create additional features based on them. In particular, we create two types of features:
1. **Features that aggregate data from different data sources**. An example is the age of a customer at the time of a transaction, which combines the `birthdate` feature from `profiles.csv` with the `datetime` feature from `transactions.csv`.
2. **Features that aggregate data from multiple time steps**. An example is the transaction frequency of a credit card over a span of a few hours, which is computed using a window function.
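
The first feature type can be illustrated with a minimal pandas sketch. It is not the implementation of `features.card_owner_age`; the toy tables, their values, and the column name `age_at_transaction` are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-ins for the profile and transaction tables (all values are made up).
profiles = pd.DataFrame({
    "cc_num": [1111, 2222],
    "birthdate": pd.to_datetime(["1950-06-15", "1990-01-01"]),
})
transactions = pd.DataFrame({
    "tid": ["a", "b", "c"],
    "cc_num": [1111, 2222, 1111],
    "datetime": pd.to_datetime([
        "2022-11-24 09:31:27",
        "2022-11-24 10:00:00",
        "2022-11-24 11:15:00",
    ]),
})

# Join the two sources on the card number; a left join keeps every transaction.
merged = transactions.merge(profiles, on="cc_num", how="left")

# Derive the card owner's age (in years) at the moment of each transaction.
merged["age_at_transaction"] = (
    (merged["datetime"] - merged["birthdate"]).dt.days / 365.25
).round(1)

print(merged[["tid", "cc_num", "age_at_transaction"]])
```

Dividing by 365.25 is a common approximation for converting days to years; an exact calendar-based age would need a different calculation.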

In [5]:
# Similarly, commenting out the code for displaying 'fraud_labels'

fraud_labels = trans_df[["tid", "cc_num", "datetime", "fraud_label"]].copy()
# fraud_labels

Unnamed: 0,tid,cc_num,datetime,fraud_label
0,c62198fdd4903bdf05394582b70d7d06,4659295496557788,2022-11-24 09:31:27,0
1,3a099df13885d4dbbf9db27b74a84ef5,4678255098755179,2022-11-24 09:31:31,0
2,7272e97e54b398ed9b629dc4f55684b7,4690490079151957,2022-11-24 09:31:31,0
3,224981cf50b4c12e20ef46f50cea0c0c,4357371252336041,2022-11-24 09:31:36,0
4,b599fafb558109310f89bf4df5261cfc,4959839009754007,2022-11-24 09:31:37,0
...,...,...,...,...
60081,a85afe38b685cc0af501d4f15abfe6de,4796807885357879,2022-12-13 19:13:57,0
60082,6f533f68e7fe99d689bf6b5d1abad1ff,4796807885357879,2022-12-16 21:13:57,0
60083,c939108c33b4e4ee4360aba4a5aa1364,4796807885357879,2022-12-19 23:13:57,0
60084,c7ecf0539d7a28d750f5b2206142623f,4796807885357879,2022-12-23 01:13:57,0


In [6]:
from helper import features

# Convert each datetime value to an epoch timestamp; the helper can be passed directly to map().
fraud_labels["datetime"] = fraud_labels["datetime"].map(features.date_to_timestamp)
# fraud_labels

Unnamed: 0,tid,cc_num,datetime,fraud_label
0,c62198fdd4903bdf05394582b70d7d06,4659295496557788,1669282287000,0
1,3a099df13885d4dbbf9db27b74a84ef5,4678255098755179,1669282291000,0
2,7272e97e54b398ed9b629dc4f55684b7,4690490079151957,1669282291000,0
3,224981cf50b4c12e20ef46f50cea0c0c,4357371252336041,1669282296000,0
4,b599fafb558109310f89bf4df5261cfc,4959839009754007,1669282297000,0
...,...,...,...,...
60081,a85afe38b685cc0af501d4f15abfe6de,4796807885357879,1670958837000,0
60082,6f533f68e7fe99d689bf6b5d1abad1ff,4796807885357879,1671225237000,0
60083,c939108c33b4e4ee4360aba4a5aa1364,4796807885357879,1671491637000,0
60084,c7ecf0539d7a28d750f5b2206142623f,4796807885357879,1671758037000,0
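
The output above suggests that the `datetime` column is converted to milliseconds since the Unix epoch. A rough sketch of such a conversion follows. The function name `to_epoch_millis` is hypothetical, and it assumes string input interpreted as UTC; the actual `features.date_to_timestamp` helper may work differently.

```python
from datetime import datetime, timezone


def to_epoch_millis(ts: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return milliseconds since the Unix epoch."""
    # Assumption: the input is a naive timestamp string that should be read as UTC.
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


print(to_epoch_millis("2022-11-24 09:31:27"))
```

Under the UTC assumption, this reproduces the value in the first row of the output (`2022-11-24 09:31:27` becomes `1669282287000`).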


In [7]:
trans_df = trans_df.drop(columns=["fraud_label"])
trans_df = features.card_owner_age(trans_df, profiles_df)
trans_df = features.expiry_days(trans_df, credit_cards_df)
trans_df = features.activity_level(trans_df, 1)

In [8]:
window_len = 4
window_aggs_df = features.aggregate_activity_by_hour(trans_df, window_len)
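
To make the window aggregation more concrete, here is a small pandas sketch of a time-based rolling window per card. It only illustrates the general technique; the toy data, the `amount` column, and the chosen aggregates are assumptions, and `features.aggregate_activity_by_hour` may compute different statistics.

```python
import pandas as pd

# Toy transactions (all values are made up).
tx = pd.DataFrame({
    "cc_num": [1111, 2222, 1111, 1111],
    "datetime": pd.to_datetime([
        "2022-11-24 09:00:00",
        "2022-11-24 09:15:00",
        "2022-11-24 10:30:00",
        "2022-11-24 14:00:00",
    ]),
    "amount": [10.0, 5.0, 20.0, 30.0],
})

window_len = 4  # window length in hours

# A time-based rolling window needs a sorted datetime index.
tx = tx.sort_values("datetime").set_index("datetime")

# For each card, count and average the transactions in the trailing window.
rolled = (
    tx.groupby("cc_num")["amount"]
    .rolling(pd.Timedelta(hours=window_len))
    .agg(["count", "mean"])
    .reset_index()
)

print(rolled)
```

Each row of `rolled` describes the trailing window that ends at that transaction, so a sudden jump in `count` for a single card is the kind of pattern a chain attack would produce.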

### Connecting to Hopsworks

In [9]:
project = hopsworks.login()
fs = project.get_feature_store()

Copy your Api Key (first register/login): https://c.app.hopsworks.ai/account/api/generated

Paste it here: ··········
Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/3342
Connected. Call `.close()` to terminate connection gracefully.
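
The login above prompts for an API key, which does not work in an unattended scheduled run. One possible approach is to read the key from an environment variable and pass it to `hopsworks.login` through its `api_key_value` argument. The helper names `read_api_key` and `login_from_env` are hypothetical, and the sketch assumes the key has been exported as `HOPSWORKS_API_KEY` beforehand.

```python
import os


def read_api_key(env_var: str = "HOPSWORKS_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the pipeline.")
    return key


def login_from_env():
    """Log in without the interactive prompt (requires the hopsworks package)."""
    import hopsworks  # imported lazily so read_api_key() has no extra dependencies

    return hopsworks.login(api_key_value=read_api_key())
```

With these helpers, `project = login_from_env()` would replace the interactive call, and the rest of the notebook stays unchanged.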




### Inserting Synthetic data (to be used for inference)

In [10]:
trans_fg = fs.get_feature_group(name="cc_trans_fraud", version=2)
trans_fg.insert(trans_df, write_options={"wait_for_job": False})

Uploading Dataframe: 0.00% |          | Rows 0/60086 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/3342/jobs/named/cc_trans_fraud_2_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f8b2af89ed0>, None)

In [11]:
window_aggs_fg = fs.get_feature_group(name=f"cc_trans_fraud_{window_len}h", version=2)
window_aggs_fg.insert(window_aggs_df, write_options={"wait_for_job": False})

Uploading Dataframe: 0.00% |          | Rows 0/60086 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/3342/jobs/named/cc_trans_fraud_4h_2_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f8b1c4869d0>, None)

In [12]:
labels_fg = fs.get_feature_group(name="transactions_fraud_label", version=2)
labels_fg.insert(fraud_labels)

Uploading Dataframe: 0.00% |          | Rows 0/60086 | Elapsed Time: 00:00 | Remaining Time: ?

Launching offline feature group backfill job...
Backfill Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/3342/jobs/named/transactions_fraud_label_2_offline_fg_backfill/executions


(<hsfs.core.job.Job at 0x7f8b1b513610>, None)