# Predictive Model

We start with the first stage of the digital marketing process: building a predictive model.


In [None]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


These lines import the libraries needed for data analysis and modeling:
- **pandas**: for data manipulation and analysis.
- **numpy**: for numerical operations.
- **os**: for operations related to the operating system.
- **matplotlib.pylab**: for data visualization.
- **scikit-learn**: for machine learning modeling and evaluation.


## Function to Load Data

This function loads the data from a CSV file specified by the file path (`file_path`) and returns a pandas DataFrame.


In [None]:
def load_data(file_path):
    return pd.read_csv(file_path)


## Function to Inspect Data

This function prints:
- The data types for each column.
- The first five rows of the DataFrame.
- A statistical summary of the data.
- The correlation of each column with the 'BUY' column.


In [None]:
def inspect_data(data):
    print(data.dtypes)
    print(data.head())
    print(data.describe())
    # Restrict to numeric columns so that text columns do not break corr()
    print(data.corr(numeric_only=True)['BUY'])
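
The correlation check can be tried on a small hand-made table before running it on the real file. The sketch below uses hypothetical toy values (only the column names come from this notebook) and assumes pandas 1.5 or later, where `corr` accepts `numeric_only`.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "Read_Review": [1, 0, 1, 0, 1, 0],
    "View_Similar": [0, 1, 0, 1, 1, 0],
    "BUY": [1, 0, 1, 0, 1, 0],
})

# Correlation of every numeric column with the target, strongest first.
# The target's correlation with itself is always 1, so it is dropped.
corr_with_buy = (
    toy.corr(numeric_only=True)["BUY"]
    .drop("BUY")
    .sort_values(ascending=False)
)
print(corr_with_buy)
```

Sorting the result makes it easier to see which behaviors move together with a purchase and which do not.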


## Function to Prepare Data

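This function selects the predictor columns of interest, sets `BUY` as the target column, and splits the data into training and test sets using `train_test_split`, with 30% of the data held out for testing. Because no `random_state` is passed, the split (and therefore the reported metrics) will differ from run to run.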


In [None]:
def prepare_data(data):
    predictors = data[['Read_Review', 'Compare_Products', 'Add_to_List', 'Save_for_Later', 'Personalized', 'View_Similar']]
    targets = data.BUY
    return train_test_split(predictors, targets, test_size=0.3)
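
If repeatable results are wanted, one option is to fix `random_state` and stratify on the target. The sketch below is an illustration on hypothetical random data, not a change to `prepare_data`; the seed values and the 60/40 class balance are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 prospects, 60 non-buyers and 40 buyers.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Read_Review": rng.integers(0, 2, size=100),
    "Compare_Products": rng.integers(0, 2, size=100),
    "BUY": np.repeat([0, 1], [60, 40]),
})
X = toy[["Read_Review", "Compare_Products"]]
y = toy["BUY"]

# random_state makes the split repeatable; stratify keeps the share of
# buyers the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```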


## Function to Train Model

This function:
- Creates a Gaussian Naïve Bayes (GaussianNB) model.
- Trains the model with the training data.
- Returns the trained model.


In [None]:
def train_model(X_train, y_train):
    model = GaussianNB()
    model.fit(X_train, y_train)
    return model
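
To see what fitting produces, the following sketch trains `GaussianNB` on a tiny hypothetical dataset and prints a few fitted attributes: the class labels, the class priors estimated from the training labels, and the per-class feature means.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two binary features, three buyers and three non-buyers.
X = np.array([[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 1, 0, 0, 0])

model = GaussianNB().fit(X, y)

print(model.classes_)      # class labels, in sorted order
print(model.class_prior_)  # share of each class in the training labels
print(model.theta_)        # mean of each feature within each class
```

The order of `classes_` is also the column order of `predict_proba`, which matters when picking out the probability of a purchase.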


## Function to Evaluate Model

This function uses the model to make predictions on the test set and prints the confusion matrix and the accuracy score. It returns the predicted class probabilities for the test set, one row per sample and one column per class.


In [None]:
def evaluate_model(model, X_test, y_test):
    predictions = model.predict(X_test)
    print(confusion_matrix(y_test, predictions))
    print(accuracy_score(y_test, predictions))
    return model.predict_proba(X_test)
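
As a reading aid for the printed output, the sketch below builds a confusion matrix from hypothetical labels and unpacks it. For binary labels 0 and 1, scikit-learn places the true classes in rows and the predicted classes in columns, so `ravel()` yields true negatives, false positives, false negatives, and true positives in that order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions for eight prospects.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = accuracy_score(y_true, y_pred)

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy}")  # (TP + TN) / total
```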


## Function to Predict Propensity

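This function converts the feature values of a single prospect to a NumPy array, reshapes them into one row (the 2D shape the model expects), and returns the predicted probability for the second class, which is the propensity to buy when `BUY` is coded as 0/1.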


In [None]:
def predict_propensity(model, data):
    data = np.array(data).reshape(1, -1)
    return model.predict_proba(data)[:, 1]
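
A minimal usage sketch follows, assuming a model trained on three hypothetical binary features rather than the six used in this notebook. The function is repeated so the example runs on its own.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def predict_propensity(model, data):
    # One prospect -> one row with as many columns as there are features.
    data = np.array(data).reshape(1, -1)
    return model.predict_proba(data)[:, 1]


# Hypothetical training data: buyers tend to have more actions set to 1.
X = np.array([
    [1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1],  # buyers
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0],  # non-buyers
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
model = GaussianNB().fit(X, y)

high = predict_propensity(model, [1, 1, 1])[0]  # performed every action
low = predict_propensity(model, [0, 0, 0])[0]   # performed none
print(high, low)
```

The input must list the features in the same order as the training columns, since the array carries no column names.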


## Execution Pipeline

The cell below sets the path of the CSV file, loads and inspects the data, splits it into training and test sets, trains the model, and evaluates it.


In [None]:
file_path = "/path/to/market_app_correlated.csv"
prospect_data = load_data(file_path)
inspect_data(prospect_data)
X_train, X_test, y_train, y_test = prepare_data(prospect_data)
model = train_model(X_train, y_train)
probabilities = evaluate_model(model, X_test, y_test)
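
The returned `probabilities` array has one column per class. One possible next step, sketched here with hypothetical stand-in arrays instead of real model output, is to pick the column for `BUY == 1` via `classes_` and rank prospects by it.

```python
import numpy as np

# Hypothetical stand-ins for model.classes_ and the evaluate_model output.
classes = np.array([0, 1])
probabilities = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
])

# Find the column for the "buy" class instead of hard-coding index 1.
buy_col = int(np.where(classes == 1)[0][0])
buy_prob = probabilities[:, buy_col]

# Row positions ordered from most to least likely to buy.
ranking = np.argsort(-buy_prob)
print(ranking)
```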
