<a href="https://colab.research.google.com/github/MarcoMinozzo/MarcoMinozzo/blob/main/consumer_behaviour.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Predictive Model
We are going to implement the first stage of this Digital Marketing process: building a predictive model that estimates how likely a user is to buy.

In [None]:
# Imports
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

These lines import the libraries needed for data analysis and modeling:

**pandas**: for data manipulation and analysis.

**numpy**: for numerical operations.

**os**: for operations related to the operating system.

**matplotlib.pylab**: for data visualization.

**scikit-learn**: for machine learning modeling and evaluation.

In [None]:
# Function to load data

def load_data(file_path):
    return pd.read_csv(file_path)

This function loads the data from a CSV file specified by the file path (file_path) and returns a pandas DataFrame.
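Because the later steps depend on exact column names, it can help to fail early when a column is missing. The following is a minimal sketch under stated assumptions: `load_data_checked` and `REQUIRED_COLUMNS` are hypothetical names that are not part of the notebook, and the in-memory CSV contains made-up rows that only stand in for the real file.

```python
import io

import pandas as pd

# Column names taken from the notebook's prepare_data step.
REQUIRED_COLUMNS = ['Read_Review', 'Compare_Products', 'Add_to_List',
                    'Save_for_Later', 'Personalized', 'View_Similar', 'BUY']


def load_data_checked(source):
    """Hypothetical variant of load_data that fails early on missing columns."""
    data = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return data


# Tiny in-memory CSV with made-up rows, standing in for the real file.
sample_csv = io.StringIO(
    "Read_Review,Compare_Products,Add_to_List,Save_for_Later,"
    "Personalized,View_Similar,BUY\n"
    "1,0,1,0,1,0,1\n"
    "0,0,0,0,0,1,0\n"
)
sample = load_data_checked(sample_csv)
print(sample.shape)
```

`pd.read_csv` accepts both a file path and a file-like object, so the same function would work with the notebook's `file_path`.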

In [None]:
# Function to inspect data

def inspect_data(data):
    print(data.dtypes)
    print(data.head())
    print(data.describe())
    print(data.corr()['BUY'])

This function prints:

The data types for each column (data.dtypes).

The first five rows of the DataFrame (data.head()).

A statistical summary of the data (data.describe()).

The correlation of each column with the 'BUY' column (data.corr()['BUY']).
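If the CSV contains any non-numeric column, `data.corr()` raises an error in recent pandas versions (2.0 and later), where non-numeric columns are no longer dropped silently. A minimal sketch of a more defensive correlation step follows. The toy DataFrame and its `SESSION_ID` column are made up for illustration, and we do not know whether the real dataset has such a column.

```python
import pandas as pd

# Made-up toy data; SESSION_ID is a hypothetical non-numeric column.
toy = pd.DataFrame({
    'SESSION_ID': ['a', 'b', 'c', 'd'],
    'Read_Review': [1, 0, 1, 0],
    'View_Similar': [0, 1, 1, 0],
    'BUY': [1, 0, 1, 0],
})

# Keep only numeric columns before computing correlations.
numeric = toy.select_dtypes(include='number')

# Correlation of every predictor with BUY, strongest first;
# BUY itself is dropped because its self-correlation is always 1.
corr_with_buy = numeric.corr()['BUY'].drop('BUY').sort_values(ascending=False)
print(corr_with_buy)
```

Sorting the result makes it easier to see which interactions move together with purchases most strongly.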

In [None]:
# Function to prepare data

def prepare_data(data):
    # Note: the review column is named 'Read_Review', not 'Read_Reviews'
    predictors = data[['Read_Review', 'Compare_Products', 'Add_to_List',
                       'Save_for_Later', 'Personalized', 'View_Similar']]
    targets = data.BUY
    # Hold out 30% of the rows for testing
    return train_test_split(predictors, targets, test_size=0.3)

This function:

Selects the predictor columns of interest (Read_Review, Compare_Products, Add_to_List, Save_for_Later, Personalized, View_Similar).

Defines BUY as the target column.

Splits the data into training and test sets using train_test_split, where 30% of the data is used for testing (test_size=0.3).
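As written, the split is reshuffled on every run, so the metrics and propensities reported later will vary slightly between runs. The sketch below shows one way to make the split repeatable and keep the share of buyers equal in both partitions. It is an illustration only: the toy data is made up, and the `stratify` argument and the seed value 42 are assumptions that are not used in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up toy data: 100 rows, 30% buyers (not the notebook's dataset).
toy = pd.DataFrame({
    'Read_Review': [1, 0] * 50,
    'Add_to_List': [0, 1, 1, 0] * 25,
    'BUY': [1] * 30 + [0] * 70,
})
predictors = toy[['Read_Review', 'Add_to_List']]
targets = toy['BUY']


def split(seed):
    # stratify keeps the BUY ratio equal in both partitions;
    # random_state makes the shuffle repeatable.
    return train_test_split(predictors, targets, test_size=0.3,
                            stratify=targets, random_state=seed)


X_train, X_test, y_train, y_test = split(seed=42)
X_train2, X_test2, _, _ = split(seed=42)

print(len(X_train), len(X_test))           # partition sizes
print(y_train.mean(), y_test.mean())       # share of buyers in each partition
print(X_test.index.equals(X_test2.index))  # same seed, same rows
```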

In [None]:
# Function to train model

def train_model(X_train, y_train):
    model = GaussianNB()
    model.fit(X_train, y_train)
    return model

This function:

Creates a Gaussian Naive Bayes (GaussianNB) model.

Trains the model with the training data (X_train and y_train).

Returns the trained model.
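To see what "training" means here, it may help to look at what a fitted GaussianNB stores: the prior probability of each class and, for each class, the mean of every feature (plus a variance, not shown). The model combines these with Bayes' rule, treating the features as independent within a class. The sketch below uses a tiny made-up dataset with two features, not the notebook's data.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up data: two binary features, six users.
X = np.array([[1, 1], [1, 0], [0, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

model = GaussianNB().fit(X, y)

print(model.classes_)      # class labels, in the order used by predict_proba
print(model.class_prior_)  # share of each class in the training data
print(model.theta_)        # per-class mean of each feature
```

The column order of `predict_proba` follows `classes_`, which is why the notebook reads the propensity for BUY = 1 from column index 1.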

In [None]:
# Function to evaluate model

def evaluate_model(model, X_test, y_test):
    predictions = model.predict(X_test)
    print(confusion_matrix(y_test, predictions))
    print(accuracy_score(y_test, predictions))
    return model.predict_proba(X_test)

This function:

Uses the model to make predictions on the test set (X_test).

Prints the confusion matrix (confusion_matrix) and the accuracy of the model (accuracy_score).

Returns the predictive probabilities for the test set (model.predict_proba(X_test)).
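The notebook imports `classification_report` but never calls it. The sketch below shows how the confusion matrix is laid out for binary labels and how accuracy, precision, and recall relate to it. The labels are hand-made for illustration and are not the model's actual results.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Hand-made labels for illustration only (not the notebook's results).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# For binary labels 0/1, rows are actual classes and columns are predictions:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
accuracy = accuracy_score(y_true, y_pred)

print(cm)
print("accuracy:", accuracy)
print("precision for BUY=1:", tp / (tp + fp))
print("recall for BUY=1:", tp / (tp + fn))

# Per-class precision, recall, and F1 in one table.
print(classification_report(y_true, y_pred))
```

Accuracy alone can look good when buyers are rare, so the per-class recall for BUY = 1 is often the more informative number for a propensity model.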

In [None]:
# Function to predict propensity

def predict_propensity(model, data):
    data = np.array(data).reshape(1, -1)
    return model.predict_proba(data)[:,1]

This function:

Converts the input data to a numpy array and adjusts the shape to be used in the model.

Returns the probability of propensity predicted by the model.
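Since the model is trained on a DataFrame, scikit-learn 1.0 and later emits a warning about missing feature names when it is later given a plain numpy array. One way to avoid this is to wrap the new row in a DataFrame with the same columns. This is a sketch under assumptions: `predict_propensity_named` and `FEATURES` are hypothetical names, and the training rows are made up.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Predictor names in the order used by the notebook's prepare_data step.
FEATURES = ['Read_Review', 'Compare_Products', 'Add_to_List',
            'Save_for_Later', 'Personalized', 'View_Similar']

# Made-up training rows: the first four are buyers, the last four are not.
train = pd.DataFrame([
    [1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0], [1, 0, 1, 0, 0, 0],
], columns=FEATURES)
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = GaussianNB().fit(train, labels)


def predict_propensity_named(model, values, feature_names=FEATURES):
    """Hypothetical variant of predict_propensity that keeps column names."""
    row = pd.DataFrame([values], columns=feature_names)
    # Column 1 of predict_proba corresponds to the class BUY = 1.
    return model.predict_proba(row)[:, 1]


low = predict_propensity_named(model, [0, 0, 0, 0, 0, 0])
high = predict_propensity_named(model, [1, 1, 1, 1, 1, 1])
print("No interaction:", low)
print("Full interaction:", high)
```

Passing named columns also guards against supplying the six values in the wrong order, which a bare list cannot detect.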

In [None]:
# Execution Pipeline

file_path = "/…/market_app_correlated.csv"
prospect_data = load_data(file_path)
inspect_data(prospect_data)
X_train, X_test, y_train, y_test = prepare_data(prospect_data)
model = train_model(X_train, y_train)
probabilities = evaluate_model(model, X_test, y_test)

These lines:

Set the path of the CSV file.

Load the data from the CSV.

Inspect the loaded data.

Prepare the data for training and testing.

Train the model with the training data.

Evaluate the model with the test data.

# **Now let's do some simulations**

## Predict propensity for new browsing data

In [None]:
new_browsing_data = [0, 0, 0, 0, 0, 0]
print("New User: propensity:", predict_propensity(model, new_browsing_data))

# Result:

In [None]:
 New User: propensity: [0.19087601]

That is, for a user who has only opened and logged in to the website or application, without any further interaction, the estimated chance of buying is close to **19%**.
## Predict propensity after full interaction

In [None]:
full_interaction_data = [1, 1, 1, 1, 1, 1]
print("Full Interaction: propensity:", predict_propensity(model, full_interaction_data))

# Result:

In [None]:
Full Interaction: propensity: [0.80887743]

For users who have performed all six tracked interactions, the estimated chance of purchase rises to about **81%**.

To wrap up: this case study shows how data science and machine learning can improve our understanding of consumer actions and preferences in a large marketplace. By systematically collecting and analyzing user interaction data, we can estimate each user's propensity to buy, which supports targeted marketing strategies aimed at raising engagement and conversion rates. Beyond demonstrating the practical application of these techniques, it points to their potential to drive business growth and competitive advantage in e-commerce.