## Training Data & Feature Views


In [1]:
%pip install -U hopsworks --quiet

Note: you may need to restart the kernel to use updated packages.




In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()

Copy your Api Key (first register/login): https://c.app.hopsworks.ai/account/api/generated

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/295
Connected. Call `.close()` to terminate connection gracefully.




## Feature Selection

We start by selecting all the features we want to include for model training/inference.

In [3]:
# Load feature groups.
trans_fg = fs.get_feature_group('transactions_fraud_batch_fg', version=1)
window_aggs_fg = fs.get_feature_group('transactions_4h_aggs_fraud_batch_fg', version=1)

In [4]:
# Select features for training data.
ds_query = trans_fg.select(
    ["fraud_label", "category", "amount", "age_at_transaction", "days_until_card_expires", "loc_delta"]
).join(window_aggs_fg.select_except(["cc_num"]))

# Uncomment the next line to preview the query results.
# ds_query.show(5)

## Transformation Functions

We will preprocess our data using *min-max scaling* on numerical features and *label encoding* on categorical features. To do this, we define a mapping between our features and transformation functions. With this mapping, transformation functions such as *min-max scaling* are fitted only on the training data (and not on the validation/test data), which prevents data leakage.

In [5]:
# Load transformation functions.
min_max_scaler = fs.get_transformation_function(name="min_max_scaler")
label_encoder = fs.get_transformation_function(name="label_encoder")

# Map features to transformations.
transformation_functions = {
    "category": label_encoder,
    "amount": min_max_scaler,
    "trans_volume_mavg": min_max_scaler,
    "trans_volume_mstd": min_max_scaler,
    "trans_freq": min_max_scaler,
    "loc_delta": min_max_scaler,
    "loc_delta_mavg": min_max_scaler,
    "age_at_transaction": min_max_scaler,
    "days_until_card_expires": min_max_scaler,
}
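
To make the leakage argument concrete, here is a minimal sketch in plain Python of what "fitted only on the training data" means. It is an illustration under assumptions, not the implementation of the built-in `min_max_scaler` or `label_encoder`: the helper names (`fit_min_max`, `apply_min_max`, `fit_label_encoder`) and the sample values are hypothetical.

```python
# Hypothetical helpers illustrating leakage-free preprocessing.
# These are NOT the Hopsworks built-ins; they only mirror the idea.

def fit_min_max(train_values):
    """Compute scaling statistics from the training split only."""
    return min(train_values), max(train_values)

def apply_min_max(values, stats):
    """Scale values using statistics that were fitted beforehand."""
    lo, hi = stats
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def fit_label_encoder(train_values):
    """Map each distinct training label to an integer code."""
    return {label: code for code, label in enumerate(sorted(set(train_values)))}

# Made-up sample data for the sketch.
train_amounts = [10.0, 20.0, 50.0]
val_amounts = [5.0, 60.0]
train_categories = ["Grocery", "Cash Withdrawal", "Grocery"]

# Fit on the training split, then reuse the same statistics everywhere.
stats = fit_min_max(train_amounts)
train_scaled = apply_min_max(train_amounts, stats)
val_scaled = apply_min_max(val_amounts, stats)

encoder = fit_label_encoder(train_categories)
train_encoded = [encoder[c] for c in train_categories]

print(train_scaled, val_scaled, train_encoded)
```

Because the statistics come from the training split alone, validation values may land outside the `[0, 1]` range. That is expected and is the price of keeping information about the validation/test data out of the fitted transformation.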

## Feature View Creation

A Feature View lets you define a schema in the form of a query with filters, specify a model target feature/label, and attach additional transformation functions.
To create a Feature View, use `fs.create_feature_view()`.

In [6]:
feature_view = fs.create_feature_view(
    name='transactions_view_fraud_batch_fv',
    query=ds_query,
    labels=["fraud_label"],
    transformation_functions=transformation_functions
)

Feature view created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/295/fs/235/fv/transactions_view_fraud_batch_fv/version/2


## Training Dataset Creation

In [7]:
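# Create a training dataset split into train, validation and test sets.
# With validation_size=0.2 and test_size=0.1, the remaining 70% is used for training.
td_version, td_job = feature_view.create_train_validation_test_split(
    description='transactions fraud batch training dataset',
    data_format='csv',
    validation_size=0.2,
    test_size=0.1,
    write_options={'wait_for_job': True},
    coalesce=True,
)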

Training dataset job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/295/jobs/named/transactions_view_fraud_batch_fv_2_1_create_fv_td_01082022125506/executions
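
The split proportions requested above can be illustrated with a small, self-contained sketch. It only shows how `validation_size` and `test_size` partition a set of rows, with the remainder going to training. The function `split_indices` is hypothetical, and the way Hopsworks performs the split internally is not shown in this notebook, so treat this as an approximation of the idea rather than the actual mechanism.

```python
import random

def split_indices(n_rows, validation_size, test_size, seed=0):
    """Hypothetical helper: randomly partition row indices into three splits."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = round(n_rows * validation_size)
    n_test = round(n_rows * test_size)
    n_train = n_rows - n_val - n_test  # whatever is left goes to training
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# Same fractions as in the cell above, applied to 1000 made-up rows.
train_idx, val_idx, test_idx = split_indices(1000, validation_size=0.2, test_size=0.1)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each row ends up in exactly one split, so the three sets are disjoint and together cover all rows.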




## Training Dataset Retrieval

In [8]:
X_train, y_train, X_val, y_val, X_test, y_test = feature_view.get_train_validation_test_split(td_version)

In [9]:
X_train

Unnamed: 0,category,amount,age_at_transaction,days_until_card_expires,loc_delta,trans_volume_mstd,trans_volume_mavg,trans_freq,loc_delta_mavg
0,0,0.000000,0.010858,0.850530,0.026437,0.000000,0.000000,0.000000,0.027016
1,0,0.000000,0.047379,0.943808,0.037840,0.000000,0.000000,0.000000,0.038668
2,0,0.000000,0.063760,0.132038,0.000046,0.000000,0.000000,0.000000,0.000047
3,0,0.000000,0.340609,0.208485,0.224487,0.000000,0.000000,0.000000,0.229403
4,0,0.000000,0.954679,0.874915,0.194834,0.000000,0.000000,0.000000,0.199101
...,...,...,...,...,...,...,...,...,...
74102,5,0.005703,0.206617,0.228954,0.103895,0.002156,0.002156,0.002156,0.101783
74103,5,0.007304,0.909348,0.729056,0.227438,0.004702,0.004702,0.004702,0.205915
74104,5,0.013629,0.516297,0.314472,0.091271,0.005832,0.005832,0.005832,0.100012
74105,8,0.000488,0.488992,0.607059,0.046052,0.000488,0.000488,0.000488,0.047060


In [10]:
X_val

Unnamed: 0,category,amount,age_at_transaction,days_until_card_expires,loc_delta,trans_volume_mstd,trans_volume_mavg,trans_freq,loc_delta_mavg
0,0,3.336858e-07,0.364082,0.664443,0.098624,3.336858e-07,3.336858e-07,3.336858e-07,0.100784
1,0,3.336858e-07,0.374577,0.458921,0.260845,3.336858e-07,3.336858e-07,3.336858e-07,0.266557
2,0,3.336858e-07,0.817831,0.368757,0.124282,3.336858e-07,3.336858e-07,3.336858e-07,0.127004
3,0,3.336858e-07,0.850191,0.948100,0.142634,3.336858e-07,3.336858e-07,3.336858e-07,0.145757
4,0,6.673716e-07,0.030266,0.437965,0.219298,6.673716e-07,6.673716e-07,6.673716e-07,0.224101
...,...,...,...,...,...,...,...,...,...
21222,4,2.564709e-03,0.560466,0.835661,0.156463,8.005345e-03,8.005345e-03,8.005345e-03,0.172731
21223,4,2.690842e-03,0.206617,0.228957,0.029113,2.856851e-03,2.856851e-03,2.856851e-03,0.032210
21224,4,3.030201e-03,0.560466,0.835661,0.176636,5.422461e-03,5.422461e-03,5.422461e-03,0.160203
21225,4,3.077918e-03,0.922027,0.287236,0.058588,2.466939e-03,2.466939e-03,2.466939e-03,0.087421
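
As a rough sanity check, the row counts visible in the two outputs above can be compared with the requested split fractions. The sketch below assumes that the first column is a contiguous zero-based row index, so the last index plus one gives the row count. That assumption is not confirmed by the notebook.

```python
# Last row indices as displayed in the X_train and X_val outputs.
last_train_index = 74105
last_val_index = 21225

# Assumption: indices are contiguous and start at 0.
train_rows = last_train_index + 1
val_rows = last_val_index + 1

# Requested fractions: validation 0.2, test 0.1, so training is 0.7.
expected_ratio = 0.2 / 0.7
observed_ratio = val_rows / train_rows

print(f"expected val/train ratio: {expected_ratio:.3f}")
print(f"observed val/train ratio: {observed_ratio:.3f}")
```

Under that assumption, both ratios come out at roughly 0.286, which is consistent with a 70/20/10 split.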
