# VANGUARD AB TEST


## METADATA HELP

The fields below describe each client and their web activity. Use them as a reference when analyzing client behavior and preferences.

- **client_id**: Every client’s unique ID.
- **variation**: Indicates if a client was part of the experiment.
- **visitor_id**: A unique ID for each client-device combination.
- **visit_id**: A unique ID for each web visit/session.
- **process_step**: Marks each step in the digital process.
- **date_time**: Timestamp of each web activity.
- **clnt_tenure_yr**: Represents how long the client has been with Vanguard, measured in years.
- **clnt_tenure_mnth**: The client’s tenure with Vanguard, measured in months.
- **clnt_age**: Indicates the age of the client.
- **gendr**: Specifies the client’s gender.
- **num_accts**: Denotes the number of accounts the client holds with Vanguard.
- **bal**: Gives the total balance spread across all accounts for a particular client.
- **calls_6_mnth**: Records the number of times the client reached out over a call in the past six months.
- **logons_6_mnth**: Reflects the frequency with which the client logged onto Vanguard’s platform over the last six months.
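
As a rough sketch, the field list above could be turned into a column check for the loaded tables. The split of fields across `clients`, `experiment`, and `visits` is an assumption based on the table names used later in this notebook, and `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names that are not part of the project modules.

```python
import pandas as pd

# Assumed mapping of documented fields to tables (not confirmed by the source).
EXPECTED_COLUMNS = {
    "clients": [
        "client_id", "clnt_tenure_yr", "clnt_tenure_mnth", "clnt_age", "gendr",
        "num_accts", "bal", "calls_6_mnth", "logons_6_mnth",
    ],
    "experiment": ["client_id", "variation"],
    "visits": ["client_id", "visitor_id", "visit_id", "process_step", "date_time"],
}


def missing_columns(frames, expected=EXPECTED_COLUMNS):
    """Map each table present in `frames` to the expected columns it lacks."""
    return {
        name: [col for col in cols if col not in frames[name].columns]
        for name, cols in expected.items()
        if name in frames
    }


# Toy data standing in for the real tables.
toy_frames = {
    "experiment": pd.DataFrame({"client_id": [1], "variation": ["Test"]}),
    "visits": pd.DataFrame({"client_id": [1], "visit_id": ["a"]}),
}
toy_missing = missing_columns(toy_frames)
```

An empty list for a table means every documented field assumed for it is present.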


In [None]:
%load_ext autoreload
%autoreload 2

In [729]:
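# Standard library
import os

# Third-party
import pandas as pd
from dotenv import load_dotenv

# Project modules (wildcard imports supply the helper functions used below)
from cleaning import *
from mining import *
from db_handling import *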


### Load Configuration

In [None]:
# Load config.yaml
config = parse_config()

## Data Mining

In [731]:
# TODO: adapt the import function to read from a remote URL and save to SQL, so large data files stay out of the repo

In [732]:
# Create a dictionary mapping each table name in the config to its imported DataFrame
all_dataframes = {name: import_data_from_config(config, name) for name in config['tables']}

In [None]:
display_dataFrames(all_dataframes, 'frame')

## Data Cleaning

In [734]:
# TODO: reconsider whether categories should be imposed at all

In [735]:
# Rename columns
all_dataframes = rename_columns(all_dataframes, config)

In [736]:
# Select columns
all_dataframes = select_columns(all_dataframes, config)

In [737]:
# Clean categorical data
all_dataframes = clean_categorical_data(all_dataframes, config)

In [738]:
# Convert types
all_dataframes = convert_types(all_dataframes, config)

In [None]:
display_dataFrames(all_dataframes, 'head', 'dtypes', 'cat_count')

In [740]:
# Handle duplicates
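
The duplicates cell is still a placeholder. One possible approach, sketched here under the assumption that only fully identical rows should be dropped, is to apply `DataFrame.drop_duplicates` to every table in the dictionary. `drop_exact_duplicates` is a hypothetical helper, not a function from `cleaning`, and the toy data only stands in for `all_dataframes`.

```python
import pandas as pd


def drop_exact_duplicates(frames):
    """Return a new dict with fully duplicated rows removed from each DataFrame."""
    cleaned = {}
    for name, df in frames.items():
        deduped = df.drop_duplicates().reset_index(drop=True)
        print(f"{name}: removed {len(df) - len(deduped)} duplicate row(s)")
        cleaned[name] = deduped
    return cleaned


# Toy data standing in for all_dataframes.
toy_frames = {
    "clients": pd.DataFrame(
        {"client_id": [1, 1, 2], "clnt_age": [30.0, 30.0, 45.5]}
    ),
}
toy_clean = drop_exact_duplicates(toy_frames)
```

Whether repeated rows in a table such as the visits log are true duplicates or legitimate repeated events is a judgment call this sketch does not make.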

In [741]:
# Handle missing values
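
Before deciding how to handle missing values, it may help to measure them first. The sketch below builds a per-column report of missing counts and percentages. `missing_value_report` is a hypothetical helper and the toy frame is illustrative only.

```python
import pandas as pd


def missing_value_report(df):
    """Count and percentage of missing values per column, worst column first."""
    report = pd.DataFrame(
        {
            "n_missing": df.isna().sum(),
            "pct_missing": (df.isna().mean() * 100).round(2),
        }
    )
    return report.sort_values("n_missing", ascending=False)


# Toy data standing in for the clients table.
toy_clients = pd.DataFrame(
    {
        "client_id": [1, 2, 3, 4],
        "clnt_age": [30.0, None, 45.5, 61.0],
        "gendr": ["M", "F", None, None],
    }
)
report = missing_value_report(toy_clients)
```

A report like this can inform whether to drop rows, impute values, or leave gaps in place for each column.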

### SQL EXPORT

In [742]:
# Load environment variables
load_dotenv()
db_password = os.getenv('SQL_PASSWORD')
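
`os.getenv` returns `None` when a variable is not set, so a missing `.env` entry would only surface later as a connection error. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper and not part of `db_handling`.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check the .env file")
    return value


# Possible usage in this notebook (commented out so the sketch runs without a .env file):
# db_password = require_env('SQL_PASSWORD')
```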

In [743]:
# Create database if it doesn't exist
engine = create_db(db_password, config)

In [None]:
# Export tables to database if refresh is set to true
export_dataframes_to_sql(engine, all_dataframes, config)

## Data Re-import

In [745]:
# Import data from database
clients_df = import_data_from_sql(engine, 'clients')
experiment_df = import_data_from_sql(engine, 'experiment')
visits_df = import_data_from_sql(engine, 'visits')

In [None]:
display('clients :', clients_df, 'experiment :', experiment_df, 'visits :', visits_df)

## Data Exploration

In [747]:
# Handle outliers
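
The outlier step is not implemented yet. One common option, shown here as a sketch rather than the method this project settles on, is the interquartile range (IQR) rule: flag values more than `k` times the IQR below the first quartile or above the third. `iqr_outlier_mask` is a hypothetical helper and the balances are made-up toy values.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Boolean mask that is True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy balances standing in for the bal column.
toy_bal = pd.Series([100.0, 120.0, 130.0, 150.0, 160.0, 10_000.0], name="bal")
outlier_mask = iqr_outlier_mask(toy_bal)
toy_outliers = toy_bal[outlier_mask]
```

Flagging rather than dropping keeps the decision about skewed fields such as balances open until the analysis stage.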

In [748]:
# Frequency tables
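
A frequency table for a categorical column can be built from `Series.value_counts`. The sketch below returns counts and shares side by side. `frequency_table` is a hypothetical helper, and the `variation` values `"Test"` and `"Control"` are assumed for illustration.

```python
import pandas as pd


def frequency_table(series):
    """Counts and relative shares for each distinct value, including missing values."""
    counts = series.value_counts(dropna=False)
    shares = (counts / counts.sum()).round(3)
    return pd.DataFrame({"count": counts, "share": shares})


# Toy data standing in for the variation column (values are assumed).
toy_variation = pd.Series(["Test", "Control", "Test", "Test"], name="variation")
freq = frequency_table(toy_variation)
```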

## Analysis

In [749]:
# TODO: consider binning numerical data with pd.cut / pd.qcut
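
To make the binning TODO concrete, here is a minimal sketch of both options on toy ages. `pd.cut` uses fixed edges, while `pd.qcut` picks edges so that each bin holds roughly the same number of rows. The edges and labels are illustrative assumptions, not choices made by this project.

```python
import pandas as pd

# Toy ages standing in for the clnt_age column.
toy_age = pd.Series([19.0, 27.5, 34.0, 48.0, 63.5, 71.0], name="clnt_age")

# Fixed-width style bins: intervals are [0, 30), [30, 50), [50, 120) because right=False.
age_edges = [0, 30, 50, 120]
age_labels = ["under_30", "30_to_49", "50_plus"]
age_band = pd.cut(toy_age, bins=age_edges, labels=age_labels, right=False)

# Quantile bins: three groups with roughly equal row counts.
age_tercile = pd.qcut(toy_age, q=3, labels=["low", "mid", "high"])
```

Fixed edges keep bins comparable across datasets, whereas quantile bins adapt to the data and suit skewed fields such as balances.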

## Visualizations

## Conclusions