<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#REQUIREMENTS" data-toc-modified-id="REQUIREMENTS-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>REQUIREMENTS</a></span><ul class="toc-item"><li><span><a href="#Import-Libraries" data-toc-modified-id="Import-Libraries-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import Libraries</a></span></li><li><span><a href="#Functions" data-toc-modified-id="Functions-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Functions</a></span><ul class="toc-item"><li><span><a href="#Model-Evaluation" data-toc-modified-id="Model-Evaluation-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Model Evaluation</a></span></li></ul></li></ul></li><li><span><a href="#DATA" data-toc-modified-id="DATA-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>DATA</a></span><ul class="toc-item"><li><span><a href="#Dataset" data-toc-modified-id="Dataset-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Dataset</a></span></li><li><span><a href="#Feature-Engineering" data-toc-modified-id="Feature-Engineering-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Feature Engineering</a></span></li><li><span><a href="#Target" data-toc-modified-id="Target-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Target</a></span></li><li><span><a href="#Final-data-to-feed-the-model" data-toc-modified-id="Final-data-to-feed-the-model-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Final data to feed the model</a></span></li></ul></li><li><span><a href="#BASELINE" data-toc-modified-id="BASELINE-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>BASELINE</a></span></li></ul></div>

## REQUIREMENTS

### Import Libraries

In [1]:
from category_encoders import TargetEncoder

In [2]:
import eli5
from eli5.sklearn import PermutationImportance

In [3]:
import imblearn
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.over_sampling import ADASYN

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import train_test_split, GridSearchCV, ParameterGrid
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score, precision_score, roc_auc_score, roc_curve, accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.base import BaseEstimator,TransformerMixin
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectFromModel
from scipy import stats
import statsmodels.api as sm
import os
import pickle
from collections import Counter
import multiprocessing

### Functions

#### Model Evaluation

In [5]:
def evaluate(classifier, X_train, X_test, y_train, y_test):
    # Fit on the training split and evaluate on the held-out test split
    classifier.fit(X_train, y_train)
    predictions = classifier.predict(X_test)
    probabilities = classifier.predict_proba(X_test)
    print("TRAINING SCORE: " + str(classifier.score(X_train, y_train)))  # accuracy on the training split
    print("ACCURACY: " + str(accuracy_score(y_test, predictions)))
    print("PRECISION: " + str(precision_score(y_test, predictions)))
    print("RECALL: " + str(recall_score(y_test, predictions)))
    print("F1 SCORE: " + str(f1_score(y_test, predictions)))
    print("AUC: " + str(roc_auc_score(y_test, probabilities[:, 1])))

    # ROC curve: roc_curve returns (fpr, tpr, thresholds) in this order
    fpr, tpr, thresholds = roc_curve(y_test, probabilities[:, 1])
    plt.plot(fpr, tpr)
    plt.xlabel('fpr')
    plt.ylabel('tpr')
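A quick check of the `roc_curve` return order on a tiny made-up example (the labels and scores are illustrative, not from our data): `roc_curve` returns `(fpr, tpr, thresholds)` in that order, so the first array belongs on the x axis and both rates end at 1.0.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# First array: false positive rates; second array: true positive rates
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)  # 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
```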

## DATA

### Dataset

In [6]:
data_root="../data/"
datafile=os.path.join(data_root,'no_carrito_no_pedido_df_2019_2020_jan21.csv')
df=pd.read_csv(datafile)
df.drop('Unnamed: 0', axis=1, inplace=True)
print(df.shape)
print("")
df.info()



(3628293, 21)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3628293 entries, 0 to 3628292
Data columns (total 21 columns):
 #   Column                       Dtype  
---  ------                       -----  
 0   ga:productSKU                object 
 1   ga:dateHourMinute            int64  
 2   ga:pagePath                  object 
 3   ga:pageDepth                 int64  
 4   ga:userType                  object 
 5   ga:sessionCount              int64  
 6   ga:daysSinceLastSession      int64  
 7   ga:landingPagePath           object 
 8   ga:campaign                  object 
 9   ga:sourceMedium              object 
 10  ga:city                      object 
 11  ga:deviceCategory            object 
 12  ga:operatingSystem           object 
 13  ga:productListViews          int64  
 14  ga:productListClicks         int64  
 15  ga:productDetailViews        int64  
 16  ga:productAddsToCart         int64  
 17  ga:productAddsToCart_transf  int64  
 18  pPath_clean                

### Feature Engineering

In addition to the feature engineering techniques already explained in previous notebooks, here we add a new feature, 'Final_Price': the 'Product_price' after applying the 'Web_Discount' rate, i.e. 'Product_price' * (1 - 'Web_Discount'):

In [7]:
# Product SKU as string
df['ga:productSKU']=df['ga:productSKU'].astype('str')

# dateHourMinute as SIN and COS
df['ga:dateHourMinute']=pd.to_datetime(df['ga:dateHourMinute'],format='%Y%m%d%H%M')
df['dateTime_month']=df['ga:dateHourMinute'].dt.month
df['dateTime_dayofweek']=df['ga:dateHourMinute'].dt.dayofweek
df['dateTime_hour']=df['ga:dateHourMinute'].dt.hour
df['month_sin']=np.sin((df.dateTime_month-1)*(2.*np.pi/12)) # subtract 1 because months are coded 1 to 12 instead of 0 to 11
df['month_cos']=np.cos((df.dateTime_month-1)*(2.*np.pi/12)) # subtract 1 because months are coded 1 to 12 instead of 0 to 11
df['dayofweek_sin']=np.sin(df.dateTime_dayofweek*(2.*np.pi/7))
df['dayofweek_cos']=np.cos(df.dateTime_dayofweek*(2.*np.pi/7))
df['hour_sin']=np.sin(df.dateTime_hour*(2.*np.pi/24))
df['hour_cos']=np.cos(df.dateTime_hour*(2.*np.pi/24))
df.drop(['dateTime_month','dateTime_dayofweek','dateTime_hour'],axis=1, inplace=True)

# Source Medium as 2 columns:
source_medium = df['ga:sourceMedium'].str.split('/',expand=True)
source_medium.columns=['Source','Medium']
df= df.merge(source_medium,left_index=True,right_index=True,how='left')
del(source_medium)
df['Source']=df['Source'].astype('str').str.strip()
df['Medium']=df['Medium'].astype('str').str.strip()

# City
df['ga:city']=df['ga:city'].astype('str')

# Device
df['ga:deviceCategory']=df['ga:deviceCategory'].astype('str')

# Operating System
df['ga:operatingSystem']=df['ga:operatingSystem'].astype('str')

# User Type
df['Returning_Visitor']= np.where(df['ga:userType']=='Returning Visitor',1,0)

# Page Path, Detail_View and Landing Page Path as string
df['ga:pagePath']=df['ga:pagePath'].astype('str')
df['ga:pagePath'] = df['ga:pagePath'].apply(lambda x: x[:x.find("?pag")] if "?pag" in x else x)
df['Detail_View']=df['ga:pagePath'].apply(lambda url: 1 if url[-5:]=='.html' else 0)
df['ga:landingPagePath']=df['ga:landingPagePath'].astype('str')
df['ga:landingPagePath'] = df['ga:landingPagePath'].apply(lambda x: x[:x.find("?pag")] if "?pag" in x else x)

# Final_Price = Product_price - Product_price * Web_Discount = Product_price * (1 - Web_Discount)
df['Final_Price']=df['Product_price'] * (1 - df['Web_Discount'])

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3628293 entries, 0 to 3628292
Data columns (total 32 columns):
 #   Column                       Dtype         
---  ------                       -----         
 0   ga:productSKU                object        
 1   ga:dateHourMinute            datetime64[ns]
 2   ga:pagePath                  object        
 3   ga:pageDepth                 int64         
 4   ga:userType                  object        
 5   ga:sessionCount              int64         
 6   ga:daysSinceLastSession      int64         
 7   ga:landingPagePath           object        
 8   ga:campaign                  object        
 9   ga:sourceMedium              object        
 10  ga:city                      object        
 11  ga:deviceCategory            object        
 12  ga:operatingSystem           object        
 13  ga:productListViews          int64         
 14  ga:productListClicks         int64         
 15  ga:productDetailViews        int64         
 16  
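Why the sine/cosine encoding of time: mapping the hour onto a circle makes 23:00 and 00:00 exactly as close as 00:00 and 01:00, which a raw 0 to 23 integer cannot express. A minimal check, using a hypothetical helper `encode_hour` that applies the same formula as 'hour_sin' and 'hour_cos':

```python
import numpy as np


def encode_hour(hour):
    # Hypothetical helper, same mapping as 'hour_sin' / 'hour_cos':
    # hour -> point on the unit circle
    angle = hour * (2.0 * np.pi / 24)
    return np.array([np.sin(angle), np.cos(angle)])


# Euclidean distances between encoded hours
d_23_0 = np.linalg.norm(encode_hour(23) - encode_hour(0))
d_0_1 = np.linalg.norm(encode_hour(0) - encode_hour(1))
d_0_12 = np.linalg.norm(encode_hour(0) - encode_hour(12))

print(round(d_23_0, 4), round(d_0_1, 4), round(d_0_12, 4))
```

The same reasoning applies to month and day of week. Each cyclic variable needs both a sin and a cos column, because sin alone maps two different points of the cycle to the same value.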

In [9]:
df[['Product_price', 'Web_Discount', 'Final_Price']].sample(10)

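|         | Product_price | Web_Discount | Final_Price |
|---------|---------------|--------------|-------------|
| 2370055 | 26.9 | 0.1 | 24.21 |
| 2273387 | 3.95 | 0.25 | 2.9625 |
| 3033703 | 44.9 | 0.0 | 44.9 |
| 1802243 | 93.0 | 0.25 | 69.75 |
| 2922121 | 22.7 | 0.0 | 22.7 |
| 932093 | 14.99 | 0.0 | 14.99 |
| 818624 | NaN | 0.0 | NaN |
| 2824945 | 46.0 | 0.15 | 39.1 |
| 764819 | NaN | 0.15 | NaN |
| 2484965 | 64.0 | 0.0 | 64.0 |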


The Final_Price column has been created but, as the sample shows, rows with a missing Product_price also get a missing Final_Price, so these NaNs need to be imputed. We will fill them with the median Product_price multiplied by (1 - Web_Discount), using the discount in effect during the event (which may be 0). To avoid data leakage, the median must be learned from the training data only, so we will do this further below, inside a scikit-learn Pipeline.
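A small worked example of this rule on made-up numbers (`final_price` is a hypothetical helper, not part of the notebook): the median is learned on the training rows only and then reused for both splits. For a row with a known price the result is simply the discounted price, e.g. 26.9 * (1 - 0.1) = 24.21 as in the sample above.

```python
import numpy as np
import pandas as pd

# Made-up training and test rows (not from the dataset)
train = pd.DataFrame({"Product_price": [20.0, 30.0, np.nan],
                      "Web_Discount": [0.0, 0.1, 0.25]})
test = pd.DataFrame({"Product_price": [np.nan, 50.0],
                     "Web_Discount": [0.15, 0.0]})

# Learn the median on the training split only, so no test information leaks in
train_median = train["Product_price"].median()  # median of 20.0 and 30.0 -> 25.0


def final_price(df, median):
    # Hypothetical helper: Final_Price = price * (1 - discount),
    # with missing prices replaced by the training median
    price = df["Product_price"].fillna(median)
    return price * (1 - df["Web_Discount"])


train_final = final_price(train, train_median)
test_final = final_price(test, train_median)
print(train_final.tolist())
print(test_final.tolist())
```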

### Target

In [10]:
df['ga:productAddsToCart_transf']=df["ga:productAddsToCart"].apply(lambda x: 1 if x>1 else x)

print("No. observations per class")
print(df['ga:productAddsToCart_transf'].value_counts())
print("")

print("% observations per class")
print(100*df['ga:productAddsToCart_transf'].value_counts(normalize=True))

No. observations per class
0    3586659
1      41634
Name: ga:productAddsToCart_transf, dtype: int64

% observations per class
0    98.852518
1     1.147482
Name: ga:productAddsToCart_transf, dtype: float64
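The positive class is about 1.15% of events, roughly 86 negatives per positive. For the binarization itself, `Series.clip(upper=1)` is a vectorized equivalent of the lambda above, assuming add-to-cart counts are never negative; a toy check on made-up counts:

```python
import pandas as pd

# Made-up add-to-cart counts per event
adds = pd.Series([0, 1, 3, 0, 2])

# Notebook approach: any count above 1 becomes 1
target_lambda = adds.apply(lambda x: 1 if x > 1 else x)

# Vectorized equivalent, assuming counts are never negative
target_clip = adds.clip(upper=1)

print(target_clip.tolist())
```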


### Final data to feed the model

In [11]:
# Data

features=['ga:pageDepth','ga:sessionCount','ga:daysSinceLastSession','ga:productSKU',\
          'month_sin','month_cos','dayofweek_sin','dayofweek_cos','hour_sin','hour_cos',\
          'Source','Medium','ga:city','ga:deviceCategory','ga:operatingSystem','Returning_Visitor',\
          'Product_price','Final_Price','Web_Discount','ga:pagePath','Detail_View','ga:landingPagePath']

y=df['ga:productAddsToCart_transf']
X=df[features]
X_train, X_test, y_train, y_test = train_test_split(X, y,random_state=42)
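This split is not stratified. With only about 1.15% positives, one option is to pass `stratify=y` to `train_test_split`, so that both splits keep the same class ratio. A toy sketch with 12 positives in 1,000 made-up rows (the default test size of 25% is kept):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 12 positives out of 1,000 rows (1.2%)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:12] = 1

# stratify=y_toy keeps the positive rate (almost) identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=42, stratify=y_toy)
print(y_tr.mean(), y_te.mean())
```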

## BASELINE

Unlike our previous classifiers, this one will not rely on the raw 'Product_price' but on 'Final_Price', i.e. the price after applying the 'Web_Discount' in effect during each event (observation). As mentioned before, the NaNs have to be preprocessed and imputed inside the Pipeline. A built-in transformer such as SimpleImputer() is not enough here, because the fill value for 'Final_Price' depends on each row's 'Web_Discount', so we will build our own custom transformer:

In [38]:
%%time

class FinalPriceTransfomer(BaseEstimator, TransformerMixin):
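    def fit(self, X, y=None):
        # Nothing to learn: the fill value is computed row by row in transform
        return self
    def transform(self, X, y=None):
        X_ = X.copy()
        # Fill missing Final_Price with Product_price * (1 - Web_Discount) of the same row
        X_['Final_Price'] = X_['Final_Price'].fillna(X_['Product_price'] * (1 - X_['Web_Discount']))
        return X_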


ProductPrice_preprocessing = ColumnTransformer([
    ('impute_median',SimpleImputer(strategy='median'),['Product_price']),
    ('FinalPrice_preprocessing', FinalPriceTransfomer(),['Final_Price','Product_price','Web_Discount' ])
])



pipeline=Pipeline([
    ('ProductPrice_preprocessing', ProductPrice_preprocessing)
])

X_train_transf= pipeline.fit_transform(X_train)
X_test_transf= pipeline.transform(X_test)

CPU times: user 2.08 s, sys: 95.2 ms, total: 2.18 s
Wall time: 2.19 s


In [40]:
test = pd.DataFrame(X_train_transf)
test.sample(10)

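|         | 0 | 1 | 2 | 3 |
|---------|---|---|---|---|
| 1107929 | 27.55 | NaN | NaN | 0.0 |
| 52653 | 24.9 | 24.9 | 24.9 | 0.0 |
| 608279 | 2.5 | 2.25 | 2.5 | 0.1 |
| 2239176 | 27.55 | NaN | NaN | 0.15 |
| 482918 | 63.0 | 53.55 | 63.0 | 0.15 |
| 1321012 | 56.65 | 56.65 | 56.65 | 0.0 |
| 1914742 | 17.9 | 13.425 | 17.9 | 0.25 |
| 141218 | 14.5 | 14.5 | 14.5 | 0.0 |
| 1990595 | 48.0 | 48.0 | 48.0 | 0.0 |
| 617467 | 28.08 | 28.08 | 28.08 | 0.0 |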


In [33]:
test = pd.DataFrame(X_train_transf)
test[test[0]==27.55]

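|         | 0 | 1 | 2 | 3 |
|---------|---|---|---|---|
| 14 | 27.55 | NaN | NaN | 0.00 |
| 27 | 27.55 | NaN | NaN | 0.00 |
| 30 | 27.55 | NaN | NaN | 0.00 |
| 31 | 27.55 | NaN | NaN | 0.15 |
| 46 | 27.55 | NaN | NaN | 0.00 |
| ... | ... | ... | ... | ... |
| 2721140 | 27.55 | NaN | NaN | 0.00 |
| 2721160 | 27.55 | NaN | NaN | 0.00 |
| 2721174 | 27.55 | NaN | NaN | 0.00 |
| 2721183 | 27.55 | NaN | NaN | 0.00 |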
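Why columns 1 and 2 are still NaN wherever column 0 holds the training median (27.55): a ColumnTransformer runs its transformers side by side on the original input, not one after another, so FinalPriceTransfomer receives the raw 'Product_price' with its NaNs, and NaN * (1 - Web_Discount) is still NaN. A toy reproduction, using an identity `FunctionTransformer` as a stand-in for the custom transformer:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer

# Made-up frame with one missing price
toy = pd.DataFrame({"Product_price": [10.0, np.nan, 30.0]})

ct = ColumnTransformer([
    # Branch 1 imputes the median of 10.0 and 30.0 (20.0) ...
    ("impute_median", SimpleImputer(strategy="median"), ["Product_price"]),
    # ... but branch 2 still receives the ORIGINAL column, NaN included
    ("identity", FunctionTransformer(), ["Product_price"]),
])
out = ct.fit_transform(toy)
print(out)
```

To impute first and compute Final_Price afterwards, the two steps have to be chained sequentially (for example as consecutive Pipeline steps) rather than placed inside the same ColumnTransformer.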


In [15]:
%%time

class FinalPriceTransfomer(BaseEstimator, TransformerMixin):
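    def fit(self, X, y=None):
        # Nothing to learn: the fill value is computed row by row in transform
        return self
    def transform(self, X, y=None):
        X_ = X.copy()
        # Fill missing Final_Price with Product_price * (1 - Web_Discount) of the same row
        X_['Final_Price'] = X_['Final_Price'].fillna(X_['Product_price'] * (1 - X_['Web_Discount']))
        return X_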


ProductPrice_preprocessing = ColumnTransformer([
    ('impute_median',SimpleImputer(strategy='median'),['Product_price']),
    ('FinalPrice_preprocessing', FinalPriceTransfomer(),['Final_Price','Product_price','Web_Discount'])
], remainder='passthrough')



pipeline=Pipeline([
    ('ProductPrice_preprocessing', ProductPrice_preprocessing)
])

X_train_transf= pipeline.fit_transform(X_train)
X_test_transf= pipeline.transform(X_test)

CPU times: user 12.7 s, sys: 5.03 s, total: 17.7 s
Wall time: 17.7 s


In [29]:
pipeline.named_steps['ProductPrice_preprocessing'].transformers_[1][2]

['Final_Price', 'Product_price', 'Web_Discount']

In [19]:
X_train.columns

Index(['ga:pageDepth', 'ga:sessionCount', 'ga:daysSinceLastSession',
       'ga:productSKU', 'month_sin', 'month_cos', 'dayofweek_sin',
       'dayofweek_cos', 'hour_sin', 'hour_cos', 'Source', 'Medium', 'ga:city',
       'ga:deviceCategory', 'ga:operatingSystem', 'Returning_Visitor',
       'Product_price', 'Final_Price', 'Web_Discount', 'ga:pagePath',
       'Detail_View', 'ga:landingPagePath'],
      dtype='object')

In [17]:
pd.DataFrame(X_train_transf, columns=X_train.columns)

AssertionError: Number of manager items must equal union of block items
# manager items: 22, # tot_items: 23
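The AssertionError is a column-count mismatch: 'Product_price' is listed in both transformers, and the remainder adds the 19 columns not used by either, so the output has 1 + 3 + 19 = 23 columns against 22 names. A toy version of the same situation (column names and values are made up):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

toy = pd.DataFrame({"a": [1.0, 2.0], "Product_price": [10.0, np.nan], "b": [3.0, 4.0]})

ct = ColumnTransformer([
    ("t1", SimpleImputer(strategy="median"), ["Product_price"]),
    # 'Product_price' is selected a second time here, so it appears twice in the output
    ("t2", SimpleImputer(strategy="median"), ["Product_price", "a"]),
], remainder="passthrough")  # remainder = columns used by no transformer ('b')
out = ct.fit_transform(toy)
print(toy.shape[1], out.shape[1])  # 3 input columns, 4 output columns
```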

In [21]:
pd.DataFrame(X_train_transf)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,13,14,15,16,17,18,19,20,21,22
0,3.47,3.47,3.47,0,1,1,0,1008,1,6.12323e-17,...,0.965926,google,organic,Santiago,desktop,Windows,0,/parafarmacia/es/,0,/parafarmacia/es/
1,47.2,35.4,47.2,0.25,10,1,0,6317,-0.866025,0.5,...,-0.707107,google,organic,Madrid,mobile,iOS,0,/es/903-top-ventas,0,/es/
2,160,160,160,0,2,1,0,7217,-0.5,0.866025,...,0.866025,google,organic,(not set),desktop,Windows,0,/es/,0,/es/
3,14.95,12.7075,14.95,0.15,6,25,13,6711,0,1,...,-0.965926,sendinblue,email,(not set),mobile,iOS,1,/es/,0,/es/
4,12.5,10.625,12.5,0.15,48,3,5,1127,-0.5,-0.866025,...,0.866025,cotilleando.com,referral,Valladolid,desktop,Macintosh,1,/parafarmacia/es/,0,/parafarmacia/es/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2721214,23.89,23.89,23.89,0,22,4,0,1178,-0.5,0.866025,...,-0.707107,(direct),(none),Madrid,desktop,Windows,1,/es/821-dermocosmetica,0,/es/
2721215,59,59,59,0,2,1,0,7484,0,1,...,-0.866025,google,cpc,Madrid,mobile,Android,0,/es/,0,/es/
2721216,6.05,6.05,6.05,0,2,41,7,7382,-0.866025,0.5,...,-0.866025,l.instagram.com,referral,Madrid,mobile,iOS,1,/es/,0,/es/
2721217,53.75,53.75,53.75,0,9,1,0,1141,0,1,...,-0.258819,google,organic,Badajoz,desktop,Windows,0,/es/828-serum,0,/es/


In [18]:
%%time

ProductPrice_preprocessing = ColumnTransformer([
    ('impute_median',SimpleImputer(strategy='median'),['Product_price'])], remainder='passthrough')

pipeline=Pipeline([
    ('ProductPrice_preprocessing', ProductPrice_preprocessing),
])

X_train_transf= pipeline.fit_transform(X_train)
X_test_transf= pipeline.transform(X_test)

CPU times: user 10.9 s, sys: 4.38 s, total: 15.2 s
Wall time: 15.9 s


In [20]:
pd.DataFrame(X_train_transf, columns=X_train.columns)

Unnamed: 0,ga:pageDepth,ga:sessionCount,ga:daysSinceLastSession,ga:productSKU,month_sin,month_cos,dayofweek_sin,dayofweek_cos,hour_sin,hour_cos,...,ga:city,ga:deviceCategory,ga:operatingSystem,Returning_Visitor,Product_price,Final_Price,Web_Discount,ga:pagePath,Detail_View,ga:landingPagePath
0,3.47,1,1,0,1008,1,6.12323e-17,0.974928,-0.222521,0.258819,...,organic,Santiago,desktop,Windows,0,3.47,0,/parafarmacia/es/,0,/parafarmacia/es/
1,47.2,10,1,0,6317,-0.866025,0.5,0.781831,0.62349,-0.707107,...,organic,Madrid,mobile,iOS,0,35.4,0.25,/es/903-top-ventas,0,/es/
2,160,2,1,0,7217,-0.5,0.866025,0.781831,0.62349,-0.5,...,organic,(not set),desktop,Windows,0,160,0,/es/,0,/es/
3,14.95,6,25,13,6711,0,1,-0.974928,-0.222521,-0.258819,...,email,(not set),mobile,iOS,1,12.7075,0.15,/es/,0,/es/
4,12.5,48,3,5,1127,-0.5,-0.866025,0.433884,-0.900969,-0.5,...,referral,Valladolid,desktop,Macintosh,1,10.625,0.15,/parafarmacia/es/,0,/parafarmacia/es/
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2721214,23.89,22,4,0,1178,-0.5,0.866025,0.781831,0.62349,0.707107,...,(none),Madrid,desktop,Windows,1,23.89,0,/es/821-dermocosmetica,0,/es/
2721215,59,2,1,0,7484,0,1,-0.433884,-0.900969,0.5,...,cpc,Madrid,mobile,Android,0,59,0,/es/,0,/es/
2721216,6.05,2,41,7,7382,-0.866025,0.5,-0.781831,0.62349,-0.5,...,referral,Madrid,mobile,iOS,1,6.05,0,/es/,0,/es/
2721217,53.75,9,1,0,1141,0,1,-0.433884,-0.900969,-0.965926,...,organic,Badajoz,desktop,Windows,0,53.75,0,/es/828-serum,0,/es/


In [17]:
%%time

class FinalPriceTransfomer(BaseEstimator, TransformerMixin):
    def __init__(self, Final_Price, Product_price, Web_Discount):
        # Store each constructor argument as an attribute with the same name, so the
        # other methods can use it (scikit-learn's get_params also relies on this)
        self.Final_Price = Final_Price
        self.Product_price = Product_price
        self.Web_Discount = Web_Discount
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        # X must be a DataFrame here, since columns are looked up by name
        X_ = X.copy()
        X_[self.Final_Price] = X_[self.Final_Price].fillna(X_[self.Product_price] * (1 - X_[self.Web_Discount]))
        return X_


ProductPrice_preprocessing = ColumnTransformer([
    ('impute_median',SimpleImputer(strategy='median'),['Product_price'])], remainder='passthrough')

FinalPrice_preprocessing= Pipeline([
    ('FinalPrice_preprocessing', FinalPriceTransfomer('Final_Price', 'Product_price', 'Web_Discount'))
])

pipeline=Pipeline([
    ('ProductPrice_preprocessing', ProductPrice_preprocessing),
    ('FinalPrice_preprocessing', FinalPrice_preprocessing)
])

X_train_transf= pipeline.fit_transform(X_train)
X_test_transf= pipeline.transform(X_test)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

In [7]:
class FinalPriceTransfomer(BaseEstimator, TransformerMixin):
    def __init__(self, Final_Price, Product_price, Web_Discount):
        # Store each constructor argument as an attribute with the same name, so the
        # other methods can use it (scikit-learn's get_params also relies on this)
        self.Final_Price = Final_Price
        self.Product_price = Product_price
        self.Web_Discount = Web_Discount

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X_ = X.copy()
        # Note: the median is computed on whatever data is being transformed,
        # so X_test would be filled with its own median rather than the training one
        X_[self.Final_Price] = X_[self.Final_Price].fillna(X_[self.Product_price].median() * (1 - X_[self.Web_Discount]))
        return X_

In [19]:
class FinalPriceTransfomer(BaseEstimator, TransformerMixin):
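    def __init__(self, Final_Price, Product_price, Web_Discount):
        # Store each constructor argument as an attribute with the same name, so the
        # other methods can use it (scikit-learn's get_params also relies on this)
        self.Final_Price = Final_Price
        self.Product_price = Product_price
        self.Web_Discount = Web_Discount

    def fit(self, X, y=None):
        # Fill values learned on the training data: a Series indexed like X (the training rows)
        self.map = X[self.Product_price].median() * (1 - X[self.Web_Discount])
        return self

    def transform(self, X, y=None):
        X_ = X.copy()  # avoid modifying the caller's DataFrame
        # fillna with a Series matches on index labels, so only rows whose labels appear in self.map are filled
        X_[self.Final_Price] = X_[self.Final_Price].fillna(self.map)
        return X_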

In [None]:
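class GroupMeanImputer(BaseEstimator, TransformerMixin):
    # Hypothetical class name: this cell originally only sketched fit/transform.
    # Fills NaNs in column `variable` with the mean of `variable` within the group given by column `by`.
    def __init__(self, variable, by):
        self.variable = variable
        self.by = by

    def fit(self, X, y=None):
        # self.map_ is a lookup table (group value -> mean), like a dict, learned on the training data only
        self.map_ = X.groupby(self.by)[self.variable].mean()
        return self

    def transform(self, X, y=None):
        X_ = X.copy()
        # Where `variable` is missing, use the mean learned in fit for that row's value of `by`;
        # X_[by].map(...) is indexed like X_, so this works on any split
        X_[self.variable] = X_[self.variable].fillna(X_[self.by].map(self.map_))
        return X_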

Let's see if it works:

In [20]:
%%time

pipeline=Pipeline([
    ('FinalPriceTransformer', FinalPriceTransfomer('Final_Price', 'Product_price', 'Web_Discount'))
])

X_train_transf= pipeline.fit_transform(X_train)
X_test_transf= pipeline.transform(X_test)



CPU times: user 1.11 s, sys: 43.6 ms, total: 1.15 s
Wall time: 1.16 s


In [21]:
X_train_transf['Final_Price'].isna().any()

False

In [22]:
X_train_transf[['Product_price', 'Web_Discount', 'Final_Price']].sample(10)

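|         | Product_price | Web_Discount | Final_Price |
|---------|---------------|--------------|-------------|
| 3048405 | 24.9 | 0.0 | 24.9 |
| 3094046 | 97.7 | 0.0 | 97.7 |
| 2990618 | 30.1 | 0.0 | 30.1 |
| 3556850 | 27.0 | 0.0 | 27.0 |
| 33354 | 37.0 | 0.0 | 37.0 |
| 2730598 | 35.72 | 0.1 | 32.148 |
| 779356 | NaN | 0.1 | 24.795 |
| 1030384 | 39.9 | 0.15 | 33.915 |
| 2880573 | 26.9 | 0.0 | 26.9 |
| 2622313 | 11.5 | 0.0 | 11.5 |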


In [23]:
X_train['Product_price'].median()

27.55

In [24]:
X_test['Product_price'].median()

27.65

In [25]:
X_test_transf[['Product_price', 'Web_Discount', 'Final_Price']].sample(10)

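|         | Product_price | Web_Discount | Final_Price |
|---------|---------------|--------------|-------------|
| 2044257 | 42.9 | 0.25 | 32.175 |
| 433346 | 23.0 | 0.0 | 23.0 |
| 438980 | NaN | 0.0 | NaN |
| 2186314 | 84.0 | 0.25 | 63.0 |
| 580675 | 2.9 | 0.15 | 2.465 |
| 2068108 | 13.5 | 0.0 | 13.5 |
| 1299736 | 2.5 | 0.0 | 2.5 |
| 1046871 | 9.95 | 0.15 | 8.4575 |
| 1817828 | 24.9 | 0.0 | 24.9 |
| 986470 | 39.0 | 0.15 | 33.15 |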
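In this last sample, `X_test_transf` still has a NaN 'Final_Price' (row 438980), while training rows such as 779356 were filled (27.55 * (1 - 0.1) = 24.795). The likely cause is index alignment rather than the slightly different medians (27.55 vs 27.65): in `fit`, `self.map` is a Series indexed by the training rows, and `fillna` with a Series fills each row from the value with the same index label. `train_test_split` puts every row in one split only, so no test label is found in `self.map` and the test NaNs stay as they are. A sketch of one possible fix, under the hypothetical name `FinalPriceImputer`: learn a scalar median in `fit` and build the fill values from the rows being transformed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class FinalPriceImputer(BaseEstimator, TransformerMixin):
    # Hypothetical name; same constructor parameters as FinalPriceTransfomer
    def __init__(self, Final_Price="Final_Price", Product_price="Product_price",
                 Web_Discount="Web_Discount"):
        self.Final_Price = Final_Price
        self.Product_price = Product_price
        self.Web_Discount = Web_Discount

    def fit(self, X, y=None):
        # Store a scalar, not a Series: nothing learned here depends on the training index
        self.median_ = X[self.Product_price].median()
        return self

    def transform(self, X, y=None):
        X_ = X.copy()
        # The fill values are built from the rows being transformed, so their index matches X_
        fill = self.median_ * (1 - X_[self.Web_Discount])
        X_[self.Final_Price] = X_[self.Final_Price].fillna(fill)
        return X_


# Made-up train/test frames with disjoint index labels, as after train_test_split
train = pd.DataFrame({"Product_price": [20.0, 30.0], "Web_Discount": [0.0, 0.1],
                      "Final_Price": [20.0, 27.0]}, index=[10, 11])
test = pd.DataFrame({"Product_price": [np.nan], "Web_Discount": [0.2],
                     "Final_Price": [np.nan]}, index=[99])

imputer = FinalPriceImputer().fit(train)
test_out = imputer.transform(test)
print(test_out)
```

Because `median_` is a plain number, the transformer behaves the same on any DataFrame that has the three columns; the trailing underscore follows the scikit-learn convention for attributes learned in `fit`.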
