# Stacking

The goal of this notebook is to implement stacking methods on the Kickstarter data, hopefully to reach the best possible score.

## Summary

<font color = grey>

- [With stolen models](#With-stolen-models)
 1. [Importing and preprocessing data](#Importing-and-preprocessing-data)
 2. [Training simple models without dimensionality reduction](#Training-simple-models-without-dimensionality-reduction)
    a. [Vanilla logistic regression](#Vanilla-logistic-regression)
    b. [Random forest](#Random-forest)
    c. [XGBoost](#XGBoost)
 3. [Stacking without dimensionality reduction](#Stacking-without-dimensionality-reduction)
    a. [Logistic regression](#Logistic-regression)
 

</font>

## With stolen models

First of all, we design our functions using the models stolen from a notebook found online, which saves us some time: the author already ran grid searches over her models (an extremely expensive step), so instead of starting over, we simply reuse the same models with the same parameters, without dimensionality reduction. Once our own functions are built, we will only have to start over with our own models, each tuned to perform as accurately as possible on its own, in order to get the best results from the full stack.

### Importing and preprocessing data 

In [None]:
# Importing the required libraries
import pandas as pd
pd.set_option('display.max_columns', 50) # Display up to 50 columns at a time
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import cm
plt.style.use('seaborn')
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12,5
import glob # To read all csv files in the directory
import seaborn as sns
import calendar
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_fscore_support
import itertools
import time
import xgboost as xgb

In [2]:
df = pd.concat([pd.read_csv(f) for f in glob.glob('D:/Utilisateurs/Bastien/Documents/Cours/CentraleSupelec/Electifs/Machine Learning/Projet/kickstarter-analysis/data/Kickstarter*.csv')], ignore_index = True)

In [7]:
# Dropping columns that are mostly null
df.drop(['friends', 'is_backing', 'is_starred', 'permissions'], axis=1, inplace=True)

# Dropping columns that aren't useful
df.drop(['converted_pledged_amount', 'creator', 'currency', 'currency_symbol', 'currency_trailing_code', 'current_currency', 'fx_rate', 'photo', 'pledged', 'profile', 'slug', 'source_url', 'spotlight', 'state_changed_at', 'urls', 'usd_type'], axis=1, inplace=True)

In [8]:
# Converting dates from unix to datetime
cols_to_convert = ['created_at', 'deadline', 'launched_at']
for c in cols_to_convert:
    df[c] = pd.to_datetime(df[c], origin='unix', unit='s')

In [9]:
# Count length of each blurb
df['blurb_length'] = df['blurb'].str.split().str.len()

# Drop blurb variable
df.drop('blurb', axis=1, inplace=True)

In [10]:
# Extracting the relevant sub-category section from the string
f = lambda x: x['category'].split('/')[1].split('","position')[0]
df['sub_category'] = df.apply(f, axis=1)

# Extracting the relevant category section from the string, and replacing the original category variable
f = lambda x: x['category'].split('"slug":"')[1].split('/')[0]
df['category'] = df.apply(f, axis=1)
f = lambda x: x['category'].split('","position"')[0] # Some categories do not have a sub-category, so do not have a '/' to split with
df['category'] = df.apply(f, axis=1)

In [11]:
df.drop('disable_communication', axis=1, inplace=True)

In [12]:
# Calculate new column 'usd_goal' as goal * static_usd_rate
df['usd_goal'] = round(df['goal'] * df['static_usd_rate'],2)

In [13]:
# Dropping goal and static_usd_rate
df.drop(['goal', 'static_usd_rate'], axis=1, inplace=True)

In [14]:
# Dropping location
df.drop('location', axis=1, inplace=True)

In [15]:
# Count length of each name
df['name_length'] = df['name'].str.split().str.len()
# Drop name variable
df.drop('name', axis=1, inplace=True)

In [16]:
df['usd_pledged'] = round(df['usd_pledged'],2)

In [17]:
# Time between creating and launching a project
df['creation_to_launch_days'] = df['launched_at'] - df['created_at']
df['creation_to_launch_days'] = df['creation_to_launch_days'].dt.round('d').dt.days # Rounding to nearest days, then showing as number only
# Or could show as number of hours:
# df['creation_to_launch_hours'] = df['launched_at'] - df['created_at']
# df['creation_to_launch_hours'] = df['creation_to_launch_hours'].dt.round('h') / np.timedelta64(1, 'h') 

# Campaign length
df['campaign_days'] = df['deadline'] - df['launched_at']
df['campaign_days'] = df['campaign_days'].dt.round('d').dt.days # Rounding to nearest days, then showing as number only

# Launch day of week
df['launch_day'] = df['launched_at'].dt.day_name() # weekday_name was removed in pandas 1.0

# Deadline day of week
df['deadline_day'] = df['deadline'].dt.day_name()

# Launch month
df['launch_month'] = df['launched_at'].dt.month_name()

# Deadline month
df['deadline_month'] = df['deadline'].dt.month_name()

In [18]:
# Launch time
df['launch_hour'] = df['launched_at'].dt.hour # Extracting hour from launched_at

def two_hour_launch(row):
    '''Creates two hour bins from the launch_hour column'''
    if row['launch_hour'] in (0,1):
        return '12am-2am'
    if row['launch_hour'] in (2,3):
        return '2am-4am'
    if row['launch_hour'] in (4,5):
        return '4am-6am'
    if row['launch_hour'] in (6,7):
        return '6am-8am'
    if row['launch_hour'] in (8,9):
        return '8am-10am'
    if row['launch_hour'] in (10,11):
        return '10am-12pm'
    if row['launch_hour'] in (12,13):
        return '12pm-2pm'
    if row['launch_hour'] in (14,15):
        return '2pm-4pm'
    if row['launch_hour'] in (16,17):
        return '4pm-6pm'
    if row['launch_hour'] in (18,19):
        return '6pm-8pm'
    if row['launch_hour'] in (20,21):
        return '8pm-10pm'
    if row['launch_hour'] in (22,23):
        return '10pm-12am'
    
df['launch_time'] = df.apply(two_hour_launch, axis=1) # Calculates bins from launch_hour

df.drop('launch_hour', axis=1, inplace=True)

In [19]:
# Deadline time
df['deadline_hour'] = df['deadline'].dt.hour # Extracting hour from deadline

def two_hour_deadline(row):
    '''Creates two hour bins from the deadline_hour column'''
    if row['deadline_hour'] in (0,1):
        return '12am-2am'
    if row['deadline_hour'] in (2,3):
        return '2am-4am'
    if row['deadline_hour'] in (4,5):
        return '4am-6am'
    if row['deadline_hour'] in (6,7):
        return '6am-8am'
    if row['deadline_hour'] in (8,9):
        return '8am-10am'
    if row['deadline_hour'] in (10,11):
        return '10am-12pm'
    if row['deadline_hour'] in (12,13):
        return '12pm-2pm'
    if row['deadline_hour'] in (14,15):
        return '2pm-4pm'
    if row['deadline_hour'] in (16,17):
        return '4pm-6pm'
    if row['deadline_hour'] in (18,19):
        return '6pm-8pm'
    if row['deadline_hour'] in (20,21):
        return '8pm-10pm'
    if row['deadline_hour'] in (22,23):
        return '10pm-12am'
    
df['deadline_time'] = df.apply(two_hour_deadline, axis=1) # Calculates bins from deadline_hour

df.drop('deadline_hour', axis=1, inplace=True)
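
The two functions above test each hour pair by pair. Since every bin spans exactly two hours, the bin index is simply `hour // 2`, so the same labels can be computed without a row-wise `apply`. The sketch below assumes a hypothetical helper `two_hour_bins` (not part of the original notebook) that takes an hour-of-day Series with values from 0 to 23. With the notebook's data it would be called as `two_hour_bins(df['launched_at'].dt.hour)`.

```python
import pandas as pd

# Hypothetical helper, not part of the original notebook: the same twelve
# two-hour labels as two_hour_launch / two_hour_deadline, in order.
TWO_HOUR_LABELS = [
    '12am-2am', '2am-4am', '4am-6am', '6am-8am', '8am-10am', '10am-12pm',
    '12pm-2pm', '2pm-4pm', '4pm-6pm', '6pm-8pm', '8pm-10pm', '10pm-12am',
]

def two_hour_bins(hours):
    """Map each hour (0-23) of a Series to its two-hour label, vectorised."""
    # Integer division by 2 gives the bin index: hours 0-1 -> 0, ..., 22-23 -> 11
    return (hours // 2).map(dict(enumerate(TWO_HOUR_LABELS)))

# Usage on a few example hours
hours = pd.Series([0, 1, 9, 13, 23])
print(two_hour_bins(hours).tolist())
# ['12am-2am', '12am-2am', '8am-10am', '12pm-2pm', '10pm-12am']
```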

In [20]:
# Mean pledge per backer
df['pledge_per_backer'] = round(df['usd_pledged']/df['backers_count'],2)

In [21]:
# Replacing null values for blurb_length with 0
df['blurb_length'] = df['blurb_length'].fillna(0)

In [22]:
# Dropping projects which are not successes or failures
df = df[df['state'].isin(['successful', 'failed'])].copy() # .copy() avoids SettingWithCopyWarning in the in-place drops below

In [23]:
# Checking for duplicates of individual projects, and sorting by id
duplicates = df[df.duplicated(subset='id')]
print(f"Of the {len(df)} projects in the dataset, there are {len(df[df.duplicated(subset='id')])} which are listed more than once.")
print(f"Of these, {len(df[df.duplicated()])} have every value in common between duplicates.")

Of the 192664 projects in the dataset, there are 23685 which are listed more than once.
Of these, 23674 have every value in common between duplicates.
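
The next cells deduplicate in two steps: first drop rows that are exact copies of an earlier row, then keep only the first row for each remaining `id`. The toy DataFrame below is made up purely to illustrate those two `drop_duplicates` calls.

```python
import pandas as pd

# Made-up data: project 1 appears twice with identical values,
# project 2 appears twice with a different 'state'.
toy = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3],
    'state': ['failed', 'failed', 'failed', 'successful', 'successful'],
})

# Step 1: drop rows that are exact copies of an earlier row
step1 = toy.drop_duplicates()
# Step 2: for ids that still appear several times, keep only the first row
step2 = step1.drop_duplicates(subset='id', keep='first')

print(len(toy), len(step1), len(step2))  # 5 4 3
print(step2['state'].tolist())  # ['failed', 'failed', 'successful']
```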


In [24]:
# Dropping duplicates which have every value in common
df.drop_duplicates(inplace=True)


In [25]:
duplicated = df[df.duplicated(subset='id', keep=False)].sort_values(by='id')

In [26]:
df.drop_duplicates(subset='id', keep='first', inplace=True)


In [27]:
# Setting the id column as the index
df.set_index('id', inplace=True)
df.head()

| id | backers_count | category | country | created_at | deadline | is_starrable | launched_at | staff_pick | state | usd_pledged | blurb_length | sub_category | usd_goal | name_length | creation_to_launch_days | campaign_days | launch_day | deadline_day | launch_month | deadline_month | launch_time | deadline_time | pledge_per_backer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 287514992 | 21 | music | US | 2013-12-21 21:01:30 | 2014-02-08 22:37:26 | False | 2013-12-25 22:37:26 | False | successful | 802.0 | 26.0 | rock | 200.0 | 4 | 4 | 45 | Wednesday | Saturday | December | February | 10pm-12am | 10pm-12am | 38.19 |
| 385129759 | 97 | art | US | 2019-02-08 21:02:48 | 2019-03-05 16:00:11 | False | 2019-02-13 16:00:11 | False | successful | 2259.0 | 9.0 | mixed media | 400.0 | 5 | 5 | 20 | Wednesday | Tuesday | February | March | 4pm-6pm | 4pm-6pm | 23.29 |
| 681033598 | 88 | photography | US | 2016-10-23 17:06:24 | 2016-12-01 15:58:50 | False | 2016-11-01 14:58:50 | True | successful | 29638.0 | 25.0 | photobooks | 27224.0 | 9 | 9 | 30 | Tuesday | Thursday | November | December | 2pm-4pm | 2pm-4pm | 336.8 |
| 1031782682 | 193 | fashion | IT | 2018-10-24 08:32:00 | 2018-12-08 22:59:00 | False | 2018-10-27 23:56:22 | False | successful | 49075.15 | 13.0 | footwear | 45461.0 | 5 | 4 | 42 | Saturday | Saturday | October | December | 10pm-12am | 10pm-12am | 254.28 |
| 904085819 | 20 | technology | US | 2015-03-07 05:35:17 | 2015-04-08 16:36:57 | False | 2015-03-09 16:36:57 | False | failed | 549.0 | 22.0 | software | 1000.0 | 4 | 2 | 30 | Monday | Wednesday | March | April | 4pm-6pm | 4pm-6pm | 27.45 |


In [28]:
# Dropping columns and creating new dataframe
df_transformed = df.drop(['backers_count', 'created_at', 'deadline', 'is_starrable', 'launched_at', 'usd_pledged', 'sub_category', 'pledge_per_backer'], axis=1)
df_transformed.head()

| id | category | country | staff_pick | state | blurb_length | usd_goal | name_length | creation_to_launch_days | campaign_days | launch_day | deadline_day | launch_month | deadline_month | launch_time | deadline_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 287514992 | music | US | False | successful | 26.0 | 200.0 | 4 | 4 | 45 | Wednesday | Saturday | December | February | 10pm-12am | 10pm-12am |
| 385129759 | art | US | False | successful | 9.0 | 400.0 | 5 | 5 | 20 | Wednesday | Tuesday | February | March | 4pm-6pm | 4pm-6pm |
| 681033598 | photography | US | True | successful | 25.0 | 27224.0 | 9 | 9 | 30 | Tuesday | Thursday | November | December | 2pm-4pm | 2pm-4pm |
| 1031782682 | fashion | IT | False | successful | 13.0 | 45461.0 | 5 | 4 | 42 | Saturday | Saturday | October | December | 10pm-12am | 10pm-12am |
| 904085819 | technology | US | False | failed | 22.0 | 1000.0 | 4 | 2 | 30 | Monday | Wednesday | March | April | 4pm-6pm | 4pm-6pm |


In [29]:
df_transformed['state'] = df_transformed['state'].replace({'failed': 0, 'successful': 1})

In [30]:
# Converting boolean features to string to include them in one-hot encoding
df_transformed['staff_pick'] = df_transformed['staff_pick'].astype(str)

In [73]:
# Creating dummy variables
df_transformed = pd.get_dummies(df_transformed)

Let us <font color = purple>**log-transform**</font> the skewed features (`creation_to_launch_days`, `name_length` and `usd_goal`). Taking the log compresses their long tails, which leads to better results.

In [74]:
# Assessing skewed distributions
cols_to_log = ['creation_to_launch_days', 'name_length', 'usd_goal']
# Replacing 0s with 0.01 and log-transforming
for col in cols_to_log:
    df_transformed[col] = df_transformed[col].astype('float64').replace(0.0, 0.01)
    df_transformed[col] = np.log(df_transformed[col])
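
As a quick sanity check of the cell above, here is the same replace-then-log step applied to a few made-up values. Note that the constant 0.01 maps zeros to log(0.01) ≈ -4.6, well below log(1) = 0, so zeros stay clearly separated from the smallest non-zero values.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for one skewed column such as creation_to_launch_days
col = pd.Series([0, 1, 10, 1000], dtype='float64')

# Same transformation as in the cell above: 0 becomes 0.01 so that log() stays finite
logged = np.log(col.replace(0.0, 0.01))
print(logged.round(3).tolist())  # [-4.605, 0.0, 2.303, 6.908]
```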

In [75]:
X = df_transformed.drop('state', axis=1)
y = df_transformed.state

Finally, we save the datasets (observations and labels in separate files) in the right subfolder, so as not to redo the whole data processing.

In [76]:
X.to_csv('datasets/observations.csv', header=X.columns )
y = y.values
y = pd.DataFrame(y)
y.to_csv('datasets/labels.csv')

Next, we split the dataset into three subsets of (almost) equal size, for training, validation and testing. More specifically, the first subset is used to train the level 0 models, the second to train the level 1 model, and the third to test the whole stack.
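
Note that the split in the next cell takes the rows in their original order, i.e. the order in which `glob` returned the CSV files, so the three subsets are not shuffled. If that order is not random, the subsets may differ in distribution. A shuffled variant could look like the sketch below; `three_way_split` is a hypothetical helper, and shuffling changes which projects land in each subset, so it would not reproduce the results shown further down.

```python
import numpy as np
import pandas as pd

def three_way_split(X, y, seed=0):
    """Hypothetical helper: shuffle the rows, then cut X and y into three parts
    of (almost) equal size, keeping observations and labels aligned."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))  # random order of row positions
    n_subset = len(X) // 3
    cuts = [order[:n_subset], order[n_subset:2 * n_subset], order[2 * n_subset:]]
    # The same positions are used for X and y, so each pair stays aligned
    return [(X.iloc[idx], y.iloc[idx]) for idx in cuts]

# Usage on toy data with 10 rows: the last part receives the remainder
X_toy = pd.DataFrame({'feature': range(10)})
y_toy = pd.DataFrame({0: [0, 1] * 5})
parts = three_way_split(X_toy, y_toy)
print([len(Xp) for Xp, _ in parts])  # [3, 3, 4]
```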

In [86]:
n_samples = X.values.shape[0]
n_subset = n_samples//3

X1 = X.iloc[:n_subset,:]
X2 = X.iloc[n_subset:2*n_subset,:]
X3 = X.iloc[2*n_subset:,:]

y1 = y.iloc[:n_subset,:]
y2 = y.iloc[n_subset:2*n_subset,:]
y3 = y.iloc[2*n_subset:,:];

In [87]:
X1.to_csv('datasets/observations1.csv', header=X1.columns )
X2.to_csv('datasets/observations2.csv', header=X2.columns )
X3.to_csv('datasets/observations3.csv', header=X3.columns )
y1.to_csv('datasets/labels1.csv')
y2.to_csv('datasets/labels2.csv')
y3.to_csv('datasets/labels3.csv')

### Training simple models without dimensionality reduction
Now, we train each of the models specified in the model notebook on each of the subsets. Note, however, that here the data was <font color=purple>**log-transformed**</font> to achieve better results.
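
Each cell below fits the same estimator three times with identical code. As a sketch, the repetition could be factored into a loop using scikit-learn's `sklearn.base.clone`, which returns an unfitted copy of an estimator with the same parameters. `fit_on_subsets` is a hypothetical helper; with the notebook's objects it would be called as `fit_on_subsets(LogisticRegression(), [(X1, y1), (X2, y2), (X3, y3)])`, assuming each estimator (including `xgb.XGBClassifier`) follows the scikit-learn estimator interface.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def fit_on_subsets(template, subsets):
    """Hypothetical helper: fit an independent clone of `template` on each (X, y) pair."""
    fitted = []
    for X_part, y_part in subsets:
        model = clone(template)  # fresh, unfitted copy with the same parameters
        # np.ravel turns one-column labels (DataFrame or 2-D array) into the 1-D shape sklearn expects
        model.fit(X_part, np.ravel(y_part))
        fitted.append(model)
    return fitted

# Usage on synthetic data cut into three parts (illustration only)
X_toy, y_toy = make_classification(n_samples=300, n_features=5, random_state=0)
subsets = [(X_toy[i::3], y_toy[i::3].reshape(-1, 1)) for i in range(3)]
logregs = fit_on_subsets(LogisticRegression(max_iter=1000), subsets)
print(len(logregs))  # 3
```
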
#### Vanilla logistic regression

In [88]:
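# Fitting logistic regression models with default parameters
# (.values.ravel() turns each one-column label DataFrame into the 1-D array scikit-learn expects)
logreg1 = LogisticRegression()
logreg1.fit(X1, y1.values.ravel())
logreg2 = LogisticRegression()
logreg2.fit(X2, y2.values.ravel())
logreg3 = LogisticRegression()
logreg3.fit(X3, y3.values.ravel())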

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

Now, we save the trained models in the proper subfolder.

In [91]:
import pickle as pk

In [90]:
logistic_list = [logreg1,logreg2,logreg3]

for i in range(3):
    with open('stacking_test/logreg'+str(i+1)+'.txt','wb') as fichier:
        pickler = pk.Pickler(fichier)
        pickler.dump(logistic_list[i])
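
Writing `pk.Pickler(f).dump(obj)` is equivalent to `pickle.dump(obj, f)`, and `pk.Unpickler(f).load()` to `pickle.load(f)`. The saving loops and the loading cell further down could therefore share two small helpers, sketched here as hypothetical `save_model` / `load_model` functions. As with any pickle file, only load models from a trusted source, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

def save_model(model, path):
    """Hypothetical helper: serialize a fitted model to `path` with pickle."""
    with open(path, 'wb') as f:
        pickle.dump(model, f)

def load_model(path):
    """Hypothetical helper: read back a model written by save_model."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Usage: round-trip a small model through a temporary file
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_toy, y_toy)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'logreg_toy.pkl')
    save_model(model, path)
    restored = load_model(path)
print((restored.predict(X_toy) == model.predict(X_toy)).all())  # True
```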

#### Random forest

In [92]:
# .values.ravel() passes the labels as a 1-D array, as scikit-learn expects
rf1 = RandomForestClassifier(max_depth=35, min_samples_split=0.001, n_estimators=400)
rf1.fit(X1, y1.values.ravel())
rf2 = RandomForestClassifier(max_depth=35, min_samples_split=0.001, n_estimators=400)
rf2.fit(X2, y2.values.ravel())
rf3 = RandomForestClassifier(max_depth=35, min_samples_split=0.001, n_estimators=400)
rf3.fit(X3, y3.values.ravel());

  


RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=35, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=0.001,
                       min_weight_fraction_leaf=0.0, n_estimators=400,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [93]:
rf_list = [rf1,rf2,rf3]
for i in range(3):
    with open('stacking_test/random_forest_'+str(i+1)+'.txt','wb') as fichier:
        pickler = pk.Pickler(fichier)
        pickler.dump(rf_list[i])

#### XGBoost

In [94]:
# .values.ravel() passes the labels as a 1-D array, as the scikit-learn interface expects
xgb1 = xgb.XGBClassifier(learning_rate=0.1, max_depth=35, min_child_weight=100, n_estimators=100, subsample=0.7)
xgb1.fit(X1, y1.values.ravel())
xgb2 = xgb.XGBClassifier(learning_rate=0.1, max_depth=35, min_child_weight=100, n_estimators=100, subsample=0.7)
xgb2.fit(X2, y2.values.ravel())
xgb3 = xgb.XGBClassifier(learning_rate=0.1, max_depth=35, min_child_weight=100, n_estimators=100, subsample=0.7)
xgb3.fit(X3, y3.values.ravel());



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=35,
              min_child_weight=100, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=0.7, verbosity=1)

In [96]:
xgb_list = [xgb1,xgb2,xgb3]
for i in range(3):
    with open('stacking_test/xgb_'+str(i+1)+'.txt','wb') as fichier:
        pickler = pk.Pickler(fichier)
        pickler.dump(xgb_list[i])

### Stacking without dimensionality reduction

In [97]:
import numpy as np
from sklearn.linear_model import LogisticRegression as LR

In [125]:
def stacked_dataset(models, inputX):
    """
    Input : list of learners, array-like
    Output: np.array
    The function takes a list of pretrained models and a set of observations, and returns the predictions
    of every model side by side, as an array of shape (n_samples, n_models). This output is the input of
    the level 1 model, trained with trainStack.
    """
    # One column of predicted labels (not probabilities) per level 0 model
    predictions = [model.predict(inputX) for model in models]
    # Stack the columns into an array of shape [rows, models]; also works with a single model
    stackX = np.column_stack(predictions)
    print("There are {0} models, the observations have shape {1} and the stacked observations have shape {2}".format(len(models), inputX.shape, stackX.shape))
    print("The first five rows look like this: {}".format(stackX[:5, :]))
    return stackX
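
`stacked_dataset` feeds the level 1 model with hard 0/1 labels. A common variant of stacking uses the predicted probability of the positive class instead, which can give the level 1 model more information than a simple vote. The sketch below is a hypothetical `stacked_proba_dataset`, demonstrated on synthetic data; it assumes every level 0 model exposes `predict_proba`, which is the case for `LogisticRegression`, `RandomForestClassifier` and `xgb.XGBClassifier`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def stacked_proba_dataset(models, inputX):
    """Hypothetical variant of stacked_dataset: one column per level 0 model,
    holding the predicted probability of the positive class (label 1)."""
    # predict_proba returns one column per class; column 1 is classes_[1], i.e. label 1
    return np.column_stack([model.predict_proba(inputX)[:, 1] for model in models])

# Usage on synthetic data (illustration only)
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)
models = [LogisticRegression().fit(X_toy, y_toy),
          DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_toy, y_toy)]
stacked = stacked_proba_dataset(models, X_toy)
print(stacked.shape)  # (200, 2)
```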

In [112]:
def trainStack(first_models, final_model, X_train, y_train):
    """
    Input : list of learners, learner, np.array, np.array
    Output : learner
    The function takes the level 0 trained learners, the level 1 learner to train, the training observations and the 
    training labels. It returns the level 1 trained model.
    """
    
    X_stacked = stacked_dataset(first_models, X_train)
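    # ravel turns a one-column label DataFrame into the 1-D array scikit-learn expects
    final_model.fit(X_stacked, np.asarray(y_train).ravel())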
    
    return final_model

In [128]:
def predictStack(first_models, final_model, X_test):
    """
    Input : list of learners, learner, array-like
    Output : array-like
    The function takes the first-level trained models, the top-level trained model and the test set and returns 
    the predictions of the stack on the test set.
    """
    X_stacked = stacked_dataset(first_models, X_test)
    y_predicted = final_model.predict(X_stacked)
    return y_predicted
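
For comparison, scikit-learn (version 0.22 and later) provides `sklearn.ensemble.StackingClassifier`. Instead of reserving a separate subset to train the level 1 model, as this notebook does, it builds the level 1 training set from cross-validated (out-of-fold) predictions of the level 0 models, then refits the level 0 models on the whole training set. The sketch below runs it on synthetic data; the estimators and their parameters are placeholders, not the tuned models used here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the Kickstarter features (illustration only)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

stack = StackingClassifier(
    estimators=[
        ('logreg', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # level 1 model, as in this notebook
    cv=5,  # level 1 features come from 5-fold out-of-fold predictions
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
print(y_pred.shape)  # (150,)
```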

#### Logistic regression
We will use a logistic regression as the top-level (level 1) model.

In [100]:
#Loading the pretrained models
logreg1, rf1, xgb1 = None, None, None

with open('stacking_test/logreg1.txt','rb') as fichier:
    pickler = pk.Unpickler(fichier)
    logreg1 = pickler.load()

with open('stacking_test/random_forest_1.txt','rb') as fichier:
    pickler = pk.Unpickler(fichier)
    rf1 = pickler.load()
    
with open('stacking_test/xgb_1.txt','rb') as fichier:
    pickler = pk.Unpickler(fichier)
    xgb1 = pickler.load()

In [116]:
# Loading the datasets. Since the level 0 models were trained on the first subset, the level 1 model must be
# trained on a different one, here the second subset
X_train = pd.read_csv('datasets/observations2.csv')
y_train = pd.read_csv('datasets/labels2.csv')
X_train.drop('id',axis=1,inplace=True)
y_train.drop('Unnamed: 0',axis=1,inplace=True);

In [117]:
top_model1 = LogisticRegression();

In [126]:
stack1 = trainStack([logreg1, rf1, xgb1], top_model1, X_train, y_train)

There are 3 models, the observations have shape (56326, 106) and the stacked observations have shape (56326, 3)
The first five rows look like this: [[0 0 0]
 [1 1 0]
 [1 1 1]
 [0 0 0]
 [1 1 1]]




Now, it's time to load the test set.

In [129]:
X_test = pd.read_csv('datasets/observations3.csv')
y_test = pd.read_csv('datasets/labels3.csv')
X_test.drop('id',axis=1,inplace=True)
y_test.drop('Unnamed: 0',axis=1,inplace=True);

In [130]:
y_predicted = predictStack([logreg1, rf1, xgb1], stack1, X_test)

There are 3 models, the observations have shape (56327, 106) and the stacked observations have shape (56327, 3)
The first five rows look like this: [[1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]
 [1 1 1]]


In [131]:
from sklearn.metrics import precision_recall_fscore_support

In [135]:
# Displaying the results
stack1_test_precision, stack1_test_recall, stack1_test_f1score, stack1_test_support = precision_recall_fscore_support(y_test, y_predicted, average='weighted')
stack1_results = {'Precision':[stack1_test_precision], 'Recall':[stack1_test_recall], 'F1_score': [stack1_test_f1score]}
stack1_results = pd.DataFrame(stack1_results)
stack1_results

| | Precision | Recall | F1_score |
|---|---|---|---|
| 0 | 0.747522 | 0.736698 | 0.731683 |
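
With `average='weighted'`, `precision_recall_fscore_support` computes each metric per class and then averages them, weighting each class by its support (its number of true instances). The small made-up example below checks this for the F1 score.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels: 3 true failures (0) and 5 true successes (1)
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 1])

per_class = f1_score(y_true, y_pred, average=None)  # [F1 of class 0, F1 of class 1]
support = np.bincount(y_true)                       # number of true instances per class
weighted = f1_score(y_true, y_pred, average='weighted')

# The weighted score is the support-weighted mean of the per-class scores
print(np.isclose(weighted, np.average(per_class, weights=support)))  # True
```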


We should repeat the experiment with models trained on different subsets.
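
One way to organise this is to go through every assignment of the three subsets to the three roles (level 0 training, level 1 training, test); the run above corresponds to (1, 2, 3). The hypothetical helper below only lists, for each assignment, the files that the notebook already saved; loading the models and running `trainStack` / `predictStack` on them would follow the cells above.

```python
from itertools import permutations

def stacking_runs(subsets=(1, 2, 3)):
    """Hypothetical helper: every assignment of the subsets to
    (level 0 training, level 1 training, test), with the files saved by the notebook."""
    runs = []
    for level0, level1, test in permutations(subsets):
        runs.append({
            'level0': level0, 'level1': level1, 'test': test,
            # Pickled level 0 models trained on subset `level0`
            'model_files': [f'stacking_test/logreg{level0}.txt',
                            f'stacking_test/random_forest_{level0}.txt',
                            f'stacking_test/xgb_{level0}.txt'],
            # Data used to train the level 1 model, then to test the whole stack
            'level1_data': (f'datasets/observations{level1}.csv', f'datasets/labels{level1}.csv'),
            'test_data': (f'datasets/observations{test}.csv', f'datasets/labels{test}.csv'),
        })
    return runs

runs = stacking_runs()
print(len(runs))  # 6
print(runs[0]['level0'], runs[0]['level1'], runs[0]['test'])  # 1 2 3
```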