# MCLabs Churn Analyzer - Data Preparation
Author: @cmh02

This Jupyter notebook prepares the data for the model in three main steps:
1) Anonymizing the data
2) Cleaning the data
3) Adding features and identifying the target

All three steps are now handled by the McaDataPrepare class from the mcalib package.
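
As a rough illustration of the three steps listed above, here is a minimal pandas sketch. It is not the mcalib implementation: the helper names (`anonymize`, `clean`, `add_features`), the `player_id` column, the salt, and the `has_balance` feature are all hypothetical and chosen only for the example. Only the `balance` column name is taken from the output shown later in this notebook.

```python
import hashlib

import pandas as pd


def anonymize(df, id_column="player_id", salt="example-salt"):
    """Replace a raw identifier column with a salted SHA-256 digest (illustrative only)."""
    out = df.copy()
    out[id_column] = out[id_column].astype(str).map(
        lambda value: hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    )
    return out


def clean(df):
    """Drop exact duplicate rows and fill missing numeric values with 0."""
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(0)
    return out


def add_features(df):
    """Add one hypothetical derived feature: whether the player holds any balance."""
    out = df.copy()
    out["has_balance"] = (out["balance"] > 0).astype(int)
    return out


# Tiny made-up input; the real PlayerData.csv has many more columns.
raw = pd.DataFrame(
    {
        "player_id": ["alice", "bob", "bob", "carol"],
        "balance": [100.0, 50.0, 50.0, None],
    }
)

prepared = add_features(clean(anonymize(raw)))
print(prepared)
```

Keeping each step as a separate function that returns a new DataFrame makes it easy to inspect the intermediate result after any single step.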

In [1]:
# Reload mcalib so that local code changes are picked up during interactive testing
import importlib
import mcalib
importlib.reload(mcalib)

# Import the mcalib names used below
from mcalib import McaDataPrepare, McaOutputMode, McaStorageMode, McaHashMode, McaDataCombine

In [2]:
# Create a Data Prepare instance for the first snapshot (T1) and run the preparation steps
T1_DataPreparer = McaDataPrepare(
    inputFilePath="../data/gatheringoutput/1757051828.555/PlayerData.csv",
    autoPrepare=True,
    outputMode=McaOutputMode.FINAL,
    storageMode=McaStorageMode.INSTANCE,
    hashMode=McaHashMode.NONE,
)

A New MCA Data Preparer has been created!
-> AutoPrepare Mode: True
-> Output Mode: McaOutputMode.FINAL
-> Storage Mode: McaStorageMode.INSTANCE
-> Hash Mode: McaHashMode.NONE
-> Input File Path: ../data/gatheringoutput/1757051828.555/PlayerData.csv
-> Timestamp Folder Path: ../data/gatheringoutput/1757051828.555
-> Gather Data Folder Path: ../data/gatheringoutput
-> Data Directory Path: ../data
-> Relative Input File Path: 1757051828.555/PlayerData.csv
-> Output Folder Path (Public): ../data/prepared/public/
-> Output Folder Path (Private): ../data/prepared/private/
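
The folder paths in the log above all appear to be derived from the single input file path. A minimal `pathlib` sketch of how such a derivation could work is shown below. This is an assumption about the layout based only on the printed values, not the actual mcalib code, and the variable names are hypothetical.

```python
from pathlib import PurePosixPath

# Input path copied from the cell above.
input_file = PurePosixPath("../data/gatheringoutput/1757051828.555/PlayerData.csv")

# Walk up the directory tree one level at a time.
timestamp_folder = input_file.parent          # folder named after the gathering timestamp
gather_folder = timestamp_folder.parent       # folder holding all gathering runs
data_dir = gather_folder.parent               # top-level data directory

# Path of the input file relative to the gathering folder.
relative_input = input_file.relative_to(gather_folder)

# Assumed output locations, mirroring the log output.
public_out = data_dir / "prepared" / "public"
private_out = data_dir / "prepared" / "private"

for label, path in [
    ("Timestamp Folder Path", timestamp_folder),
    ("Gather Data Folder Path", gather_folder),
    ("Data Directory Path", data_dir),
    ("Relative Input File Path", relative_input),
    ("Output Folder Path (Public)", public_out),
    ("Output Folder Path (Private)", private_out),
]:
    print(f"-> {label}: {path}")
```

`PurePosixPath` is used so the example behaves the same on any operating system; note that `pathlib` does not keep the trailing slash that the log prints for the output folders.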


In [3]:
# Print summary statistics for the prepared T1 dataframe
T1_DataPreparer.analyzeData()

                                             count            mean  \
balance                                  3155.0000     916036.5265   
lw_rev_total                             3155.0000     225594.7759   
lw_rev_phase                             3155.0000      77651.2425   
leaderboard_position_chems_all           3155.0000        117.2808   
leaderboard_position_chems_week          3155.0000          0.9458   
leaderboard_position_police_all          3155.0000          0.7078   
leaderboard_position_police_week         3155.0000          0.0010   
chemrank                                 3155.0000          3.4900   
policerank                               3155.0000          0.4276   
donorrank                                3155.0000          0.3791   
goldrank                                 3155.0000          0.0158   
current_month_votes                      3155.0000          0.4282   
plan_player_time_total_raw               3155.0000  115084827.3195   
plan_player_time_mon

In [4]:
# Create a Data Prepare instance for the second snapshot (T2) and run the preparation steps
T2_DataPreparer = McaDataPrepare(
    inputFilePath="../data/gatheringoutput/1758949983.598/PlayerData.csv",
    autoPrepare=True,
    outputMode=McaOutputMode.FINAL,
    storageMode=McaStorageMode.INSTANCE,
    hashMode=McaHashMode.NONE,
)

A New MCA Data Preparer has been created!
-> AutoPrepare Mode: True
-> Output Mode: McaOutputMode.FINAL
-> Storage Mode: McaStorageMode.INSTANCE
-> Hash Mode: McaHashMode.NONE
-> Input File Path: ../data/gatheringoutput/1758949983.598/PlayerData.csv
-> Timestamp Folder Path: ../data/gatheringoutput/1758949983.598
-> Gather Data Folder Path: ../data/gatheringoutput
-> Data Directory Path: ../data
-> Relative Input File Path: 1758949983.598/PlayerData.csv
-> Output Folder Path (Public): ../data/prepared/public/
-> Output Folder Path (Private): ../data/prepared/private/


In [5]:
# Print summary statistics for the prepared T2 dataframe
T2_DataPreparer.analyzeData()

                                             count            mean  \
balance                                  3500.0000     900126.0348   
lw_rev_total                             3500.0000     137047.8791   
lw_rev_phase                             3500.0000      49604.0000   
leaderboard_position_chems_all           3500.0000        131.9700   
leaderboard_position_chems_week          3500.0000          1.0094   
leaderboard_position_police_all          3500.0000          0.6769   
leaderboard_position_police_week         3500.0000          0.0043   
chemrank                                 3500.0000          3.4506   
policerank                               3500.0000          0.4274   
donorrank                                3500.0000          0.3583   
goldrank                                 3500.0000          0.0143   
current_month_votes                      3500.0000          2.2611   
plan_player_time_total_raw               3500.0000  116063434.6826   
plan_player_time_mon

In [6]:
# Combine the two prepared dataframes into a single dataframe for modeling
DataCombiner = McaDataCombine(
    inputFilePath1="../data/prepared/private/1757051828.555/PlayerData.csv",
    inputFilePath2="../data/prepared/private/1758949983.598/PlayerData.csv",
    outputFileName="PlayerData_SeptemberTraining.csv",
    autoCombine=True,
    outputMode=McaOutputMode.FINAL,
    storageMode=McaStorageMode.INSTANCE,
)

A New MCA Data Combiner has been created!
-> AutoCombine Mode: True
-> Output Mode: McaOutputMode.FINAL
-> Storage Mode: McaStorageMode.INSTANCE
-> Input File Path 1: ../data/prepared/private/1757051828.555/PlayerData.csv
-> Input File Path 2: ../data/prepared/private/1758949983.598/PlayerData.csv
-> Output Folder Path: ../data/combined
-> Public Output Folder Path: ../data/combined/public
-> Private Output Folder Path: ../data/combined/private
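
The combined output in this notebook contains `_t1` columns and `_change` columns. One way such a table could be built is with a pandas left merge on a player key followed by column-wise differences, as sketched below. This is only a guess at the general approach, not the McaDataCombine implementation: the `player_id` key, the `_t2` suffix, the left-join choice, and the `missing_in_t2` flag are assumptions made for the example, and the way the `churn` target is actually computed is not shown in this notebook.

```python
import pandas as pd

# Two tiny made-up snapshots; only the balance and chemrank names come from the notebook output.
t1 = pd.DataFrame(
    {"player_id": ["a", "b", "c"], "balance": [100.0, 50.0, 10.0], "chemrank": [1, 2, 3]}
)
t2 = pd.DataFrame(
    {"player_id": ["a", "b", "d"], "balance": [150.0, 50.0, 5.0], "chemrank": [2, 2, 1]}
)

# Keep every player from the first snapshot; shared columns get _t1 / _t2 suffixes.
combined = t1.merge(t2, on="player_id", how="left", suffixes=("_t1", "_t2"))

# Change between snapshots for each shared feature (NaN when the player is absent from t2).
for col in ["balance", "chemrank"]:
    combined[f"{col}_change"] = combined[f"{col}_t2"] - combined[f"{col}_t1"]

# Hypothetical helper flag: 1 if the player has no row in the second snapshot.
combined["missing_in_t2"] = combined["balance_t2"].isna().astype(int)

print(combined)
```

A left join keeps one row per first-snapshot player, which would be consistent with the combined row count matching the T1 count in the output.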


In [7]:
# Print summary statistics for the combined dataframe
DataCombiner.analyzeData()

                                        count         mean            std  \
balance_t1                          3155.0000  916036.5265  15420454.4273   
lw_rev_total_t1                     3155.0000  225594.7759   1835316.8063   
lw_rev_phase_t1                     3155.0000   77651.2425    669104.0767   
leaderboard_position_chems_all_t1   3155.0000     117.2808       231.8771   
leaderboard_position_chems_week_t1  3155.0000       0.9458         6.9609   
...                                       ...          ...            ...   
chemrank_change                     3155.0000       0.3445         4.9270   
policerank_change                   3155.0000       0.0513         2.5551   
donorrank_change                    3155.0000       0.0010         0.0398   
goldrank_change                     3155.0000       0.0067         0.0813   
churn                               3155.0000       0.3746         0.8448   

                                       min     25%        50%         75%  