# Bloc 3 - Predictive analysis of structured data using artificial intelligence - Conversion rate challenge

## Introduction

www.datascienceweekly.org is a free newsletter that helps people keep up with the latest developments in data science. It is curated by independent data scientists, and anyone can register on the website with an email address to receive weekly news.

### Problem statement

www.datascienceweekly.org aims to improve the newsletter's conversion rate.

The creators of the newsletter would like to better understand the behavior of the users visiting their website. 

### Scope

To increase the number of newsletter subscribers, the creators designed a competition to find the best predictive model for conversion, with entries ranked by f1-score. They open-sourced a dataset containing information about the traffic on their website.

Competitors were asked to develop the best predictive model and to analyze its parameters in order to identify the features that explain users' behavior with respect to conversion.
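The ranking metric, the f1-score, is the harmonic mean of precision and recall for the positive class (here, a converted visitor). A minimal sketch computing it from confusion-matrix counts; the counts are hypothetical, and the result is cross-checked against `sklearn.metrics.f1_score`. True negatives do not enter the formula, which is why the f1-score stays informative when the positive class is rare.

```python
from sklearn.metrics import f1_score


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """f1-score of the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted converters who converted
    recall = tp / (tp + fn)     # share of actual converters that were found
    return 2 * precision * recall / (precision + recall)


# hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
tp, fp, fn = 8, 2, 4
manual = f1_from_counts(tp, fp, fn)

# the same counts written as label vectors, plus 10 true negatives
y_true = [1] * tp + [0] * fp + [1] * fn + [0] * 10
y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * 10
print(manual, f1_score(y_true, y_pred))
```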

### Aim and objectives

Overall aim: Build a machine learning model able to estimate the conversion rate and identify levers of action.

Objectives:
- 1 - Perform an exploratory data analysis (EDA) and preprocess the data for machine learning.
- 2 - Train a logistic regression model (baseline) and assess its performance.
- 3 - Improve the model's f1-score.
- 4 - Make predictions on the unlabelled test set.
- 5 - Make recommendations to improve the conversion rate.

## Methods

### 1 - Library import

### 2 - File reading and basic exploration

The dataset contained records for 284,580 website visitors in 6 columns: five features (the visitor's country, their age, whether they were a new user, the traffic source and the total number of pages visited) and the target variable, "converted".

An initial inspection showed that the dataset contained no missing values, but that the age variable contained outliers.

### 3 - Exploratory data analysis

As the target variable was categorical (binary), the probability distribution of each feature was plotted separately for subscribers and non-subscribers, using a random sample of 10,000 rows because the full dataset is large.
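For a categorical feature, the per-class probability distributions shown in the histograms can also be read as a table: `pd.crosstab` with `normalize="columns"` divides each column by its total, so the values sum to 1 within each class. The toy frame below is a hypothetical stand-in that only reuses the dataset's column names.

```python
import pandas as pd

# hypothetical toy sample using the dataset's column names
sample = pd.DataFrame({
    "source": ["Ads", "Seo", "Seo", "Direct", "Ads", "Seo"],
    "converted": [0, 0, 1, 0, 1, 1],
})

# P(source | converted): each column is normalized to sum to 1
dist = pd.crosstab(sample["source"], sample["converted"], normalize="columns")
print(dist)
```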

### 4 - Baseline model (logistic regression) with one feature

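From the EDA, "total_pages_visited" was identified as the most informative feature. A baseline model was first trained on this feature alone with a simple (univariate) logistic regression. Its performance was assessed with a 5-fold cross-validated f1-score on the train set, and with f1-scores and confusion matrices on the train and test sets (90/10 split).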
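The univariate baseline models the probability of conversion as a logistic function of the standardized number of pages visited, with $\mu$ and $\sigma$ estimated on the train set:

$$
P(\text{converted} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 z)}}, \qquad z = \frac{x - \mu}{\sigma}
$$

scikit-learn's `predict` returns class 1 when this probability exceeds 0.5, that is when $\beta_0 + \beta_1 z > 0$. For $\beta_1 > 0$ this reduces to $x > \mu - \sigma \beta_0 / \beta_1$, so the baseline amounts to a single cut-off on the number of pages visited.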

### 5 - Optimized model

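First, rows containing outliers in the age variable (age above mean + 3 × std) were dropped (1,017 rows). Then, new features were engineered: powers (squared, cubed, fourth power) and inverses (1/x and 1/x²) of age and of the number of pages visited, plus binary "yes"/"no" features combining two conditions with a logical OR (for example, "US_or_young": the visitor is from the US or is younger than 26). The 15 conditions covered country, three age bands, new versus returning user, three page-count bands and traffic source. The resulting dataset contained 283,563 rows and 136 columns (135 features plus the target).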
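The two steps above can be sketched as reusable functions. This is a minimal illustration, not the notebook's code: `drop_upper_outliers` and `or_features` are hypothetical names, and the masks are passed as a dictionary instead of being looked up through `globals()`. Building every mask from the same frame keeps their indices aligned. With the notebook's 15 conditions, pairing each condition with itself and every later one gives 15 × 16 / 2 = 120 OR features.

```python
from itertools import combinations_with_replacement

import pandas as pd


def drop_upper_outliers(df: pd.DataFrame, column: str, k: float = 3.0) -> pd.DataFrame:
    """Drop rows whose value exceeds mean + k * std (pandas' sample std)."""
    upper_bound = df[column].mean() + k * df[column].std()
    return df[df[column] <= upper_bound]


def or_features(masks: dict) -> pd.DataFrame:
    """Build 'yes'/'no' features from the OR of every pair of boolean masks.

    Pairs include a mask with itself (e.g. 'China_or_China'), which simply
    re-encodes that single condition.
    """
    columns = {}
    for (name_a, mask_a), (name_b, mask_b) in combinations_with_replacement(masks.items(), 2):
        combined = mask_a | mask_b
        columns[f"{name_a}_or_{name_b}"] = combined.map({True: "yes", False: "no"})
    return pd.DataFrame(columns)


# hypothetical toy data using the dataset's column names
toy = pd.DataFrame({"age": [20, 30, 50, 35], "country": ["US", "China", "UK", "US"]})
masks = {
    "US": toy["country"] == "US",
    "China": toy["country"] == "China",
    "young": toy["age"] < 26,
}
features = or_features(masks)
print(list(features.columns))
```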

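Second, the data was prepared for machine learning (duplicated features dropped, numeric features standardized, categorical features one-hot encoded), and features were selected with sequential floating forward selection (SFFS), up to 15 features: features are added one at a time, and after each addition, previously selected features can be removed if this improves the cross-validated f1-score.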
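In the notebook this step is delegated to mlxtend's `SequentialFeatureSelector` with `forward = True` and `floating = True`. The sketch below is a simplified, library-free illustration of the floating idea, not mlxtend's implementation; `toy_score` is a hypothetical scoring function standing in for the cross-validated f1-score.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sffs(features: Sequence[str],
         score: Callable[[List[str]], float],
         k: int) -> Dict[int, Tuple[float, List[str]]]:
    """Simplified sequential floating forward selection.

    Returns, for each subset size reached, the best (score, subset) found.
    """
    k = min(k, len(features))
    best: Dict[int, Tuple[float, List[str]]] = {}
    selected: List[str] = []

    def record(subset: List[str]) -> bool:
        """Store subset if it strictly beats the best score seen at its size."""
        s = score(subset)
        if len(subset) not in best or s > best[len(subset)][0]:
            best[len(subset)] = (s, list(subset))
            return True
        return False

    while len(selected) < k:
        # forward step: add the feature that maximizes the score
        remaining = [f for f in features if f not in selected]
        added = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [added]
        record(selected)

        # floating step: drop a feature (other than the one just added)
        # as long as this beats the best subset already found at that size
        while len(selected) > 2:
            candidates = [f for f in selected if f != added]
            worst = max(candidates, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if not record(reduced):
                break
            selected = reduced
    return best


def toy_score(subset: List[str]) -> int:
    """Hypothetical score: 'b' and 'c' are only valuable together."""
    weights = {"a": 10, "b": 6, "c": 6, "d": 0}
    bonus = 10 if {"b", "c"} <= set(subset) else 0
    return sum(weights[f] for f in subset) + bonus


best = sffs(["a", "b", "c", "d"], toy_score, k=3)
print(best)
```

A purely forward search would keep {a, b} (score 16) as its best pair; the floating step revisits that choice and finds {b, c} (score 22).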

Finally, a logistic regression model was trained on the selected subset of 7 features, and its performance was assessed after tuning its regularization strength by grid search.
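In scikit-learn's `LogisticRegression` (default L2 penalty), the tuned parameter `C` multiplies the data-fit term, so smaller values mean stronger regularization. Up to notation, with labels $y_i \in \{-1, 1\}$, the objective minimized is:

$$
\min_{w,\, c} \; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i \left(x_i^\top w + c\right)\right)\right)
$$

The grid search compared $C \in \{0.1, 0.5, 1, 5, 10\}$ using the 5-fold cross-validated f1-score.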

### 6 - Predictions on test data

The full labelled dataset (after outlier removal) and the unlabelled test dataset were preprocessed as during model optimization. The optimized model was then retrained on the full labelled data and used to make predictions on the unlabelled test data.
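Because the test features are rebuilt by separate code, any encoding difference between the two paths changes the features without raising an error. For example, `new_user` is stored as 0/1 in the raw files but recoded to "No"/"Yes" for the training set. A minimal sketch of the usual remedy is a single function applied to both frames. `build_features` is a hypothetical name, and it covers only a few of the notebook's features to show the pattern.

```python
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same feature construction to any frame (train or test).

    Hypothetical helper: only a few of the notebook's features are rebuilt.
    """
    out = df.copy()
    # normalize new_user to "No"/"Yes" whether it arrives as 0/1 or as strings
    if pd.api.types.is_numeric_dtype(out["new_user"]):
        out["new_user"] = out["new_user"].map({0: "No", 1: "Yes"})
    out["age_5"] = 1 / out["age"]
    out["page_5"] = 1 / out["total_pages_visited"]
    old_user = out["new_user"] == "No"
    few = out["total_pages_visited"] < 5
    out["old_user_or_few"] = (old_user | few).map({True: "yes", False: "no"})
    return out


# hypothetical raw frames with new_user coded as 0/1
train = pd.DataFrame({"age": [25, 40], "new_user": [0, 1], "total_pages_visited": [3, 10]})
test = pd.DataFrame({"age": [30, 22], "new_user": [1, 0], "total_pages_visited": [8, 12]})
print(build_features(train)["old_user_or_few"].tolist())
print(build_features(test)["old_user_or_few"].tolist())
```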

### 7 - Feature importance

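Feature importance for the optimized model was estimated from the absolute values of its coefficients. Since numeric features were standardized and categorical features one-hot encoded, coefficient magnitudes give a rough ranking of the features' contributions.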
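A minimal sketch of coefficient-based importance on synthetic data (the features `x_strong`, `x_weak` and `x_noise` are hypothetical). After standardization, ranking by absolute coefficient recovers the feature that drives the outcome. Taking the absolute value, as in Figure 3, discards the direction of the effect. `np.exp(coef)` keeps it, as the multiplicative change in the odds of conversion for a one-standard-deviation increase in the feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000

# synthetic features: x_strong drives the outcome, x_weak barely does
X = pd.DataFrame({
    "x_strong": rng.normal(size=n),
    "x_weak": rng.normal(size=n),
    "x_noise": rng.normal(size=n),
})
logit = 3.0 * X["x_strong"].to_numpy() + 0.3 * X["x_weak"].to_numpy()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# standardize so coefficient magnitudes share a common scale
X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
model = LogisticRegression().fit(X_std, y)

# importance as |coefficient|, plus odds ratios that keep the sign information
importance = pd.Series(np.abs(model.coef_[0]), index=X.columns).sort_values(ascending=False)
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=X.columns)
print(importance)
print(odds_ratios)
```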

## Conclusion

The baseline model for predicting conversion could be improved with regard to the f1-score, and a single feature was identified as the main predictor of subscription to the newsletter.

The f1-scores of the baseline model (univariate logistic regression) were 0.6938 on the train set and 0.7060 on the test set. After data cleaning, feature engineering, feature selection and fine-tuning, these scores increased to 0.7652 on the train set and 0.7712 on the test set (optimized model).

Regarding feature importance, by far the most important feature for predicting conversion was the total number of pages visited by readers: readers who visit many pages are more likely to subscribe to the newsletter.
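The association can also be checked without a model, by computing the observed conversion rate for each number of pages visited. On the notebook's `data`, the same call would return one row per page count. A sketch on hypothetical toy records:

```python
import pandas as pd

# hypothetical toy records using the dataset's column names
visits = pd.DataFrame({
    "total_pages_visited": [2, 2, 5, 5, 5, 12, 12, 15],
    "converted": [0, 0, 0, 1, 0, 1, 1, 1],
})

# empirical conversion rate ("mean") and number of visitors ("size") per page count
rate_by_pages = visits.groupby("total_pages_visited")["converted"].agg(["mean", "size"])
print(rate_by_pages)
```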

To increase the number of subscriptions, the team behind www.datascienceweekly.org could try to encourage readers to browse more pages of the website. For example, they could highlight the most-read content on the home page and suggest content based on the pages a reader has already visited. Since the subscription form is already present on every page of the website, this strategy could encourage readers to subscribe.

## Code

### 1 - Library import

In [None]:
### 1 - library import ### ----

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix
from feature_engine.selection import DropDuplicateFeatures
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# display() is built into Jupyter; import it explicitly so the code also runs elsewhere
from IPython.display import display

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots


### 2 - File reading and basic exploration

In [None]:
### 2 - file reading and basic exploration - import dataset ### ----

# read data
data = pd.read_csv("cnm_bloc3-2_data_train.csv")
data_test = pd.read_csv("cnm_bloc3-2_data_test.csv")


In [None]:
### 2 - file reading and basic exploration - get basic stats ### ----

# print shape of data
print("Number of rows: {}".format(data.shape[0]))
print("Number of columns: {}".format(data.shape[1]))
print()

# display dataset
pd.set_option('display.max_columns', None)
print("Dataset display: ")
display(data.head())
print()

# display basic statistics
print("Basic statistics: ")
data_desc = data.describe(include='all')
display(data_desc)
print()


In [None]:
### 2 - file reading and basic exploration - get percentage of missing values ### ----

# check whether some columns are full of NaNs
column_nan_full = data.columns[data.isnull().all()]
column_nb = len(column_nan_full)

# get percentage of missing values in columns
percent_nan_col = data.isnull().sum() / data.shape[0] * 100

# check whether some rows are full of NaNs (vectorized count of missing values per row)
row_nan_count = data.isnull().sum(axis = 1)
row_nan_full = row_nan_count.index[row_nan_count == data.shape[1]]
row_nb = len(row_nan_full)

# print report
print("COLUMNS")
print("{} columns out of {} are fully filled with missing values".format(column_nb,data.shape[1]))
print("Percentage of missing values per column:\n{}".format(percent_nan_col))
print()
print("ROWS")
print("{} rows out of {} are fully filled with missing values".format(row_nb,data.shape[0]))


### 3 - Exploratory data analysis

In [None]:
### 3 - exploratory data analysis - sample dataset ### ----

# the dataset is quite big: create a sample of the dataset before making any visualizations
# (fixed random_state so the figures are reproducible)
data_sample = data.sample(10000, random_state = 0)


In [None]:
### 3 - exploratory data analysis - plot univariate analysis ### ----

# set figure to make subplots
fig1 = make_subplots(
    rows = 2,
    cols = 3,
    subplot_titles = (
        "A. Age",
        "B. Total pages visited",
        "",
        "D. Country",
        "E. New user",
        "F. Source"),
    column_widths = [0.25, 0.25, 0.25],
    horizontal_spacing = 0.15)

# plot distribution of age (masks are built on data_sample itself, not on the full dataset)
fig1.add_trace(go.Histogram(
        name = "No conversion",
        x = data_sample.loc[data_sample["converted"] == 0,"age"],
        histnorm = 'probability',
        marker_color = px.colors.qualitative.Vivid[0],
        showlegend = True),
        row = 1, col = 1)
fig1.add_trace(go.Histogram(
        name = "Conversion",
        x = data_sample.loc[data_sample["converted"] == 1,"age"],
        histnorm = 'probability',
        marker_color = px.colors.qualitative.Vivid[1],
        showlegend = True),
        row = 1, col = 1)

# plot distribution of total pages visited
fig1.add_trace(go.Histogram(
        name = "No conversion",
        x = data_sample.loc[data_sample["converted"] == 0,"total_pages_visited"],
        histnorm = 'probability',
        marker_color = px.colors.qualitative.Vivid[0],
        showlegend = False),
        row = 1, col = 2)
fig1.add_trace(go.Histogram(
        name = "Conversion",
        x = data_sample.loc[data_sample["converted"] == 1,"total_pages_visited"],
        histnorm = 'probability',
        marker_color = px.colors.qualitative.Vivid[1],
        showlegend = False),
        row = 1, col = 2)

# plot categorical variables (one subplot per feature on the second row)
features_cat = ["country", "new_user", "source"]
for i, feature in enumerate(features_cat):
    fig1.add_trace(go.Histogram(
            name = "No conversion",
            x = data_sample.loc[data_sample["converted"] == 0, feature],
            histnorm = 'probability',
            marker_color = px.colors.qualitative.Vivid[0],
            showlegend = False),
            row = 2, col = i+1)
    fig1.add_trace(go.Histogram(
            name = "Conversion",
            x = data_sample.loc[data_sample["converted"] == 1, feature],
            histnorm = 'probability',
            marker_color = px.colors.qualitative.Vivid[1],
            showlegend = False),
            row = 2, col = i+1)


# update layout
fig1.update_annotations(font_size = 15)
fig1.update_xaxes(tickfont = dict(size = 10))
fig1.update_yaxes(title = "Probability", tickfont = dict(size = 10))
fig1.update_layout(
        margin = dict(l = 90, t= 120),
        title_text = "Figure 1. Univariate analysis",
        title_x = 0.5,
        title_y = 0.95,
        title_font_size = 18,
        xaxis = dict(title = "Age (Years)"),
        xaxis2 = dict(title = "Total pages visited (Count)"),
        xaxis4 = dict(title = "Country"),
        xaxis5 = dict(title = "New user"),
        xaxis6 = dict(title = "Source"),
        yaxis = dict(range = [0, 0.1], tickvals = [0, 0.02, 0.04, 0.06, 0.08]),
        yaxis2 = dict(range = [0, 0.19], tickvals = [0, 0.05, 0.10, 0.15]),
        yaxis4 = dict(range = [0, 1], tickvals = [0, 0.2, 0.4, 0.6, 0.8]),
        yaxis5 = dict(range = [0, 1], tickvals = [0, 0.2, 0.4, 0.6, 0.8]),
        yaxis6 = dict(range = [0, 0.75], tickvals = [0, 0.2, 0.4, 0.6]),
        legend = dict(
            yanchor = "top",
            y = 0.83,
            xanchor = "left",
            x = 0.77,
            font = dict(size = 11)),
        plot_bgcolor = "rgba(0,0,0,0)",
        paper_bgcolor = "rgb(232,232,232)",
        width = 800,
        height = 600)

fig1.show()


### 4 - Baseline model (logistic regression) with one feature

In [None]:
### 4 - baseline model (logistic regression) with one feature - preprocess data for machine learning ### ----

# separate target variable Y from feature X (kept as a one-column dataframe, as scikit-learn expects 2D input)
X = data[["total_pages_visited"]]
Y = data["converted"]

# divide dataset into train and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, random_state = 0)

# scale numeric feature
featureencoder = StandardScaler()
X_train = featureencoder.fit_transform(X_train)
X_test = featureencoder.transform(X_test)


In [None]:
### 4 - baseline model (logistic regression) with one feature - train model and assess performance ### ----

# train model
classifier1 = LogisticRegression()
classifier1.fit(X_train, Y_train)

# perform 5-fold cross-validation to evaluate the generalized f1-score 
scores1 = cross_val_score(classifier1, X_train, Y_train, scoring = "f1", cv = 5)
print('Cross-validated f1-score: ', scores1.mean())
print('Standard deviation: ', scores1.std())
print()

# make predictions on train and test sets
Y_train_pred1 = classifier1.predict(X_train)
Y_test_pred1 = classifier1.predict(X_test)

# assess performance and print report
f1_score_train1 = f1_score(Y_train, Y_train_pred1)
f1_score_test1 = f1_score(Y_test, Y_test_pred1)
print("f1-score on train set: ", f1_score_train1)
print("f1-score on test set: ", f1_score_test1)
print()
print("Confusion matrix on train set:")
print(confusion_matrix(Y_train, Y_train_pred1))
print()
print("Confusion matrix on test set:")
print(confusion_matrix(Y_test, Y_test_pred1))


### 5 - Optimized model

In [None]:
### 5 - optimized model - drop rows containing outliers ### ----

# copy data for safety
data1 = data.copy()

# rename content of "new_user" (for automatic detection as categorical feature)
data1["new_user"] = data1["new_user"].apply(lambda x: "No" if x == 0 else "Yes")

# set bounds to identify outliers
column_mean = data1["age"].mean()
column_std = data1["age"].std()
upper_bound = column_mean + 3 * column_std

# get index of rows to drop
mask_drop = data1["age"] > upper_bound
index_drop = data1.index[mask_drop]

# drop rows
data1 = data1.drop(index_drop, axis = 0)

# count rows that were dropped
drop_nb = len(index_drop)

# print report
print("Number of rows with outliers that were dropped: {}".format(drop_nb))


In [None]:
### 5 - optimized model - engineer features ### ----

# part 1 - create features from "age" and "total_pages_visited"

data1["age_2"] = data1["age"]**2
data1["age_3"] = data1["age"]**3
data1["age_4"] = data1["age"]**4
data1["age_5"] = 1 / data1["age"]
data1["age_6"] = 1 / data1["age"]**2

data1["page_2"] = data1["total_pages_visited"]**2
data1["page_3"] = data1["total_pages_visited"]**3
data1["page_4"] = data1["total_pages_visited"]**4
data1["page_5"] = 1 / data1["total_pages_visited"]
data1["page_6"] = 1 / data1["total_pages_visited"]**2

# part 2 - create combinations of two features

# create masks
mask1 = data1["country"] == "US"
mask2 = data1["country"] == "UK"
mask3 = data1["country"] == "Germany"
mask4 = data1["country"] == "China"

mask5 = data1["age"] < 26
mask6 = (data1["age"] >= 26) & (data1["age"] <= 46)
mask7 = data1["age"] > 46

mask8 = data1["new_user"] == "No"
mask9 = data1["new_user"] == "Yes"

mask10 = data1["total_pages_visited"] < 5
mask11 = (data1["total_pages_visited"] >= 5) & (data1["total_pages_visited"] <= 12)
mask12 = data1["total_pages_visited"] > 12

mask13 = data1["source"] == "Ads"
mask14 = data1["source"] == "Seo"
mask15 = data1["source"] == "Direct"

# create dataframe containing mask names
mask_ids = ["mask1", "mask2", "mask3", "mask4", "mask5", "mask6", "mask7", "mask8", "mask9", "mask10", "mask11",
    "mask12", "mask13", "mask14", "mask15"]
mask_names = ["US", "UK", "Germany", "China", "young", "midage", "old", "old_user", "new_user", "few", "mid", 
    "many", "Ads", "Seo", "Direct"]

mask_ids = pd.DataFrame(mask_ids, columns = ["mask"])
mask_names = pd.DataFrame(mask_names, columns = ["name"])
masks = pd.concat([mask_ids, mask_names], axis = 1)

# create combinations of features
for i in range(0, masks.shape[0]):

    mask_first = globals()[masks.loc[i,"mask"]]
    
    for j in range(i, masks.shape[0]):

        mask_second = globals()[masks.loc[j,"mask"]]
        
        mask_current = mask_first | mask_second
        feature_name = masks.loc[i,"name"] + "_or_" + masks.loc[j,"name"]

        feature_current = pd.Series("no", index = data1.index, name = feature_name)
        feature_current[mask_current] = "yes"
        data1 = pd.concat([data1, feature_current], axis = 1)
        

In [None]:
### 5 - optimized model - preprocess data for machine learning ### ----

# separate target variable Y from features X
X2 = data1.drop(["converted"], axis = 1)
Y2 = data1["converted"]

# drop duplicated features
transformer2 = DropDuplicateFeatures()
transformer2.fit(X2)
X2 = transformer2.transform(X2)

# get features by category
features_all2 = X2.columns
features_num2 = X2.select_dtypes(include = "number").columns
features_cat2 = features_all2.drop(features_num2)

# divide dataset into train and test sets
X_train2, X_test2, Y_train2, Y_test2 = train_test_split(X2, Y2, test_size = 0.1, random_state = 0)

# create preprocessor object from pipelines for numeric and categorical features
numeric_transformer2 = Pipeline(steps = [
    ("scaler", StandardScaler())
])
categorical_transformer2 = Pipeline(steps = [
    ("encoder", OneHotEncoder(drop = "first", sparse_output = False))
])
preprocessor2 = ColumnTransformer(transformers = [
    ("num", numeric_transformer2, features_num2),
    ("cat", categorical_transformer2, features_cat2)
])
preprocessor2 = preprocessor2.set_output(transform = "pandas")

# scale numeric features and encode categorical features
X_train2 = preprocessor2.fit_transform(X_train2)
X_test2 = preprocessor2.transform(X_test2)


In [None]:
### 5 - optimized model - select features ### ----

# select features sequentially (forward, floating variant)
sffs = SFS(
    LogisticRegression(max_iter = 2000), 
    k_features = 15,
    forward = True,
    floating = True, 
    scoring = 'f1',
    cv = StratifiedKFold(5),
    n_jobs = -2)
sffs = sffs.fit(X_train2, Y_train2)

# store results in a dataframe
sffs_results = pd.DataFrame.from_dict(sffs.get_metric_dict()).T

# plot results
fig2 = go.Figure()
fig2.add_trace(go.Scatter(
    x = sffs_results.index,
    y = sffs_results["avg_score"],
    mode = "lines+markers",
    marker_color = px.colors.qualitative.Vivid[1],
    error_y = dict(array = sffs_results["std_dev"], visible = True)
))

# update layout
fig2.update_xaxes(tickfont = dict(size = 10))
fig2.update_yaxes(tickfont = dict(size = 10))
fig2.update_layout(
        margin = dict(l = 120, t= 100),
        title_text = "Figure 2. Sequential Floating Forward Selection",
        title_x = 0.5,
        title_y = 0.95,
        title_font_size = 18,
        xaxis = dict(title = "Number of features", range = [0, 16], tickvals = np.arange(16), showgrid = False,
            zeroline = False),
        yaxis = dict(title = "Performance (f1-score)", range = [0.699, 0.781], 
            tickvals = [0.70, 0.72, 0.74, 0.76, 0.78]),
        showlegend = False,
        plot_bgcolor = "rgba(0,0,0,0)",
        paper_bgcolor = "rgb(232,232,232)",
        width = 800,
        height = 400)

fig2.show()


In [None]:
### 5 - optimized model - train model and assess performance ### ----

# select features and update train and test sets
features_selected = list(sffs_results.loc[7,"feature_names"])
# use the following commented command if you do not want to run the previous cell (about 80 minutes to run)
#features_selected = ['num__age_5','num__page_5','cat__US_or_young_yes','cat__China_or_China_yes',
# 'cat__old_or_Direct_yes','cat__old_user_or_few_yes','cat__old_user_or_mid_yes']
X_train2 = X_train2[features_selected]
X_test2 = X_test2[features_selected]

# tune C (inverse of regularization strength) with grid search
params = {
    'C': [0.1, 0.5, 1, 5, 10],
}
gridsearch = GridSearchCV(LogisticRegression(), param_grid = params, cv = 5, scoring = "f1", n_jobs = -2)
gridsearch.fit(X_train2, Y_train2)
print("Best hyperparameter: ", gridsearch.best_params_)
print()

# perform 5-fold cross-validation to evaluate the generalized f1-score 
scores2 = cross_val_score(LogisticRegression(C = gridsearch.best_params_["C"]), X_train2, Y_train2, 
    scoring = "f1", cv = 5)
print('Cross-validated f1-score: ', scores2.mean())
print('Standard deviation: ', scores2.std())
print()

# train model
classifier2 = LogisticRegression(C = gridsearch.best_params_["C"])
classifier2.fit(X_train2, Y_train2)

# make predictions on train and test sets
Y_train_pred2 = classifier2.predict(X_train2)
Y_test_pred2 = classifier2.predict(X_test2)

# assess performance and print report
f1_score_train2 = f1_score(Y_train2, Y_train_pred2)
f1_score_test2 = f1_score(Y_test2, Y_test_pred2)
print("f1-score on train set: ", f1_score_train2)
print("f1-score on test set: ", f1_score_test2)
print()
print("Confusion matrix on train set:")
print(confusion_matrix(Y_train2, Y_train_pred2))
print()
print("Confusion matrix on test set:")
print(confusion_matrix(Y_test2, Y_test_pred2))


### 6 - Predictions on test data

In [None]:
### 6 - predictions on test data - print selected features ### ----

print("Selected features:\n{}".format(sffs_results.loc[7,"feature_names"]))


In [None]:
### 6 - predictions on test data - preprocess test data ### ----

# part 1 - create features from "age" and "total_pages_visited"

data_test["age_5"] = 1 / data_test["age"]
data_test["page_5"] = 1 / data_test["total_pages_visited"]


# part 2 - create combinations of two features

# create masks
mask1 = data_test["country"] == "US"
mask4 = data_test["country"] == "China"

mask5 = data_test["age"] < 26
mask7 = data_test["age"] > 46

# new_user is still coded 0/1 in the raw test file (0 corresponds to "No" in the training recoding)
mask8 = data_test["new_user"] == 0

mask10 = data_test["total_pages_visited"] < 5
mask11 = (data_test["total_pages_visited"] >= 5) & (data_test["total_pages_visited"] <= 12)

mask15 = data_test["source"] == "Direct"

# create dataframe containing mask names
mask_ids = ["mask1", "mask4", "mask5", "mask7", "mask8", "mask10", "mask11", "mask15"]
mask_names = ["US", "China", "young", "old", "old_user", "few", "mid", "Direct"]

mask_ids = pd.DataFrame(mask_ids, columns = ["mask"])
mask_names = pd.DataFrame(mask_names, columns = ["name"])
masks = pd.concat([mask_ids, mask_names], axis = 1)

# create combinations of features
for i in range(0, masks.shape[0]):

    mask_first = globals()[masks.loc[i,"mask"]]
    
    for j in range(i, masks.shape[0]):

        mask_second = globals()[masks.loc[j,"mask"]]
        
        mask_current = mask_first | mask_second
        feature_name = masks.loc[i,"name"] + "_or_" + masks.loc[j,"name"]

        feature_current = pd.Series("no", index = data_test.index, name = feature_name)
        feature_current[mask_current] = "yes"
        data_test = pd.concat([data_test, feature_current], axis = 1)


In [None]:
### 6 - predictions on test data - preprocess data for machine learning ### ----

# separate target variable Y from features X
X3 = data1.drop(["converted"], axis = 1)
Y3 = data1["converted"]

# keep only the 7 selected features
selected_features3 = ['age_5', 'page_5', 'US_or_young', 'China_or_China', 'old_or_Direct', 'old_user_or_few', 
    'old_user_or_mid']
X3 = X3[selected_features3]
X_without_labels = data_test[selected_features3]

# get features by category
features_all3 = X3.columns
features_num3 = X3.select_dtypes(include = "number").columns
features_cat3 = features_all3.drop(features_num3)

# create preprocessor object from pipelines for numeric and categorical features
numeric_transformer3 = Pipeline(steps = [
    ("scaler", StandardScaler())
])
categorical_transformer3 = Pipeline(steps = [
    ("encoder", OneHotEncoder(drop = "first", sparse_output = False))
])
preprocessor3 = ColumnTransformer(transformers = [
    ("num", numeric_transformer3, features_num3),
    ("cat", categorical_transformer3, features_cat3)
])
preprocessor3 = preprocessor3.set_output(transform = "pandas")

# scale numeric features and encode categorical features
X3 = preprocessor3.fit_transform(X3)
X_without_labels = preprocessor3.transform(X_without_labels)


In [None]:
### 6 - predictions on test data - train model and make predictions ### ----

# train model
classifier3 = LogisticRegression(C = gridsearch.best_params_["C"])
classifier3.fit(X3, Y3)

# make predictions and dump to file
data3 = {
    'converted': classifier3.predict(X_without_labels)
}
Y_predictions = pd.DataFrame(columns = ['converted'], data = data3)
Y_predictions.to_csv('conversion_data_test_predictions_CELINE-model1.csv', index = False)


### 7 - Feature importance

In [None]:
### 7 - feature importance ### ----

# get column names from the preprocessor
column_names3 = []
for name, pipeline, features_list in preprocessor3.transformers_: 
    if name == 'num': 
        features = features_list 
    else: 
        features = pipeline.named_steps['encoder'].get_feature_names_out() 
    column_names3.extend(features)

# store coefficients in a dataframe
coefs3 = pd.DataFrame(index = range(0,len(classifier3.coef_[0,:])), columns = ["features", "coefficients"])
coefs3["features"] = column_names3
coefs3["coefficients"] = abs(classifier3.coef_[0,:])

# get feature importance
feature_importance3 = coefs3.sort_values("coefficients", ascending = False).reset_index(drop = True)

# plot feature importance
fig3 = go.Figure([go.Bar(
    x = feature_importance3["features"],
    y = feature_importance3["coefficients"],
    marker_color = px.colors.qualitative.Vivid)])

# update layout
fig3.update_xaxes(tickfont = dict(size = 10), tickangle = 90)
fig3.update_yaxes(tickfont = dict(size = 10))
fig3.update_layout(
        margin = dict(l = 120),
        title_text = "Figure 3. Feature importance",
        title_x = 0.5,
        title_y = 0.95,
        title_font_size = 18,
        xaxis = dict(title = "Features"),
        yaxis = dict(title = "Coefficients", range = [-1, 21], tickvals = [0, 5, 10, 15, 20]),
        showlegend = False,
        plot_bgcolor = "rgba(0,0,0,0)",
        paper_bgcolor = "rgb(232,232,232)",
        width = 800,
        height = 400)

fig3.show()
