Datasets are as follows:

1) Dataset 1 (classification) - download the RAR file from https://archive.ics.uci.edu/ml/machine-learning-databases/00470/ with detailed information here: https://archive-beta.ics.uci.edu/ml/datasets/parkinson+s+disease+classification

2) Dataset 2 (classification) - download the XLS file from https://archive.ics.uci.edu/ml/machine-learning-databases/00350/ with detailed information here: https://archive-beta.ics.uci.edu/ml/datasets/default+of+credit+card+clients

3) Dataset 3 (classification) - download the XLS file from https://archive.ics.uci.edu/ml/machine-learning-databases/00399/ with detailed information here: https://archive-beta.ics.uci.edu/ml/datasets/meu+mobile+ksd

4) Dataset 4 (classification) - download the XLS file from https://archive.ics.uci.edu/ml/machine-learning-databases/00342/ with detailed information here: https://archive-beta.ics.uci.edu/ml/datasets/mice+protein+expression

5) Dataset 5 (regression) - download the train excel/csv file from https://archive.ics.uci.edu/ml/machine-learning-databases/00464/ with detailed information here: https://archive-beta.ics.uci.edu/ml/datasets/superconductivty+data

The following DR and FS methods to be used:

1) LDA (https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html)

2) PCA (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)

3) t-SNE (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)

4) L1 based feature selection - use the code from this blog and adjust: https://towardsdatascience.com/feature-selection-using-regularisation-a3678b71e499

5) Feature Selection with Random Forest - use the code from this blog and adjust: https://towardsdatascience.com/feature-selection-using-random-forest-26d7b747597f

6) Feature Selection with XGBoost - use the code from this blog and adjust: https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/

7) Recursive Feature Elimination - use the code from this blog and adjust: https://towardsdatascience.com/feature-selection-in-python-recursive-feature-elimination-19f1c39b8d15 and https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html

Task:
Compare the classification and regression performance over the 7 different methods above. You can select any classification metrics you want, but its better to take 3-5 metrics in all (eg precsion, recall, accuracy, AUC etc.). For regression, take RMSE, MSE etc. or any other you want.Â 

For the algorithms, consider XGBoost, RF, SVM and LR for classification and MLR and Voting Regression for regression.

The number of features you want to select depends on you, e.g., you can assume 10 in the first go and select 10 from all the 7 methods and then run the classification and regression algorithms and compare. Then, you can repeat with 5 and then with 15. In this way, you will be able to acquire more information, e.g., what impact does selecting more variables or less variables has on the classification and regression performance.

For the above experiments, no hyperparameter tuning is required - just assume some default values

Write a summary of all your results (tetxual summary) by answering the following questions in detail while mentioning values of the performance metrics and graphs if feasible.
- How does the classification performance compare across the 7 DR/FS methods?
- How does the regression performance compare across the 7 DR/FS methods?
- What is the effect of increasing or decreasing the total number of your desired features on the classification and regression performance?

# Imports and Helpers

In [1]:
import pandas as pd 
import numpy as np
import seaborn as sb
import math
import warnings
import matplotlib.pyplot as plt        
get_ipython().run_line_magic('matplotlib', 'inline')

from sklearn import preprocessing
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

#import feature selection modules
from sklearn.feature_selection import mutual_info_classif,RFE,RFECV
from sklearn.feature_selection import mutual_info_regression

from sklearn.linear_model import Lasso, Ridge

#import classification modules
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier


# import regression modules
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import VotingRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

#import split methods
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler


#PCA
from sklearn.decomposition import PCA

#LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

#TSNE
from sklearn.manifold import TSNE

#L1
from sklearn.linear_model import Lasso, LogisticRegression



#import performance scores
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve 
from sklearn.metrics import f1_score
from sklearn.metrics import auc
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle




In [2]:
# CONSTANTS

CLASSIFICATION = 'Classification'
REGRESSION = 'Regression'

REG_LASSO = 'lasso'
FS_RFFS = 'rffs'
FS_REFS = 'refs'
FS_PCA = 'pca'
FS_LDA = 'lda'

In [3]:
!gdown --id "1jLOvY2smMG9UR1AoDB2JO7POQCk3ijt2" -O "dataset1.csv"
dataset1 = pd.read_csv("dataset1.csv") # pd_speech_features

!gdown --id "1Y15hapeg-OfK_rH75VLx9j9tHIq-VT6R" -O "dataset2.csv"
dataset2 = pd.read_csv("dataset2.csv") # defaulters (credit card)

!gdown --id "1yS4AvLDmeN9wF1OEw1xmcK28KGxl6V4s" -O "dataset3.csv"
dataset3 = pd.read_csv("dataset3.csv") # mobile keystroke

!gdown --id "1QaNponzzLsGTiR8I2y7At3OJAWyKkw8a" -O "dataset4.csv"
dataset4 = pd.read_csv("dataset4.csv") # mice protein

!gdown --id "1r6j7SZeqOjD7FnYV9IO_0c2YcYWDc92o" -O "dataset5.csv"
dataset5 = pd.read_csv("dataset5.csv") # superconductor

Downloading...
From: https://drive.google.com/uc?id=1jLOvY2smMG9UR1AoDB2JO7POQCk3ijt2
To: /content/dataset1.csv
100% 5.31M/5.31M [00:00<00:00, 46.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1Y15hapeg-OfK_rH75VLx9j9tHIq-VT6R
To: /content/dataset2.csv
100% 2.90M/2.90M [00:00<00:00, 92.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1yS4AvLDmeN9wF1OEw1xmcK28KGxl6V4s
To: /content/dataset3.csv
100% 1.51M/1.51M [00:00<00:00, 46.3MB/s]
Downloading...
From: https://drive.google.com/uc?id=1QaNponzzLsGTiR8I2y7At3OJAWyKkw8a
To: /content/dataset4.csv
100% 992k/992k [00:00<00:00, 63.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1r6j7SZeqOjD7FnYV9IO_0c2YcYWDc92o
To: /content/dataset5.csv
100% 17.6M/17.6M [00:00<00:00, 64.7MB/s]


In [4]:
label_dataset1 = 'class'
label_dataset2 = 'default_payment_next_month'
label_dataset3 = 'UD_l.Enter'
label_dataset4 = 'Behavior'
label_dataset5 = 'number_of_elements'

print('-----d1-----')
print(dataset1.shape)
dataset1.dropna(inplace=True)
print(dataset1.shape)

print('-----d2-----')
print(dataset2.shape)
dataset2.dropna(inplace=True)
print(dataset2.shape)

print('-----d3-----')
print(dataset3.shape)
dataset3.dropna(inplace=True)
dataset3["AvH"] = pd.to_numeric(dataset3["AvH"])
dataset3["AvP"] = pd.to_numeric(dataset3["AvP"])
dataset3["AvA"] = pd.to_numeric(dataset3["AvA"])
dataset3.dropna(inplace=True)
print(dataset3.shape)

print('-----d4-----')
print(dataset4.shape)
dataset4.dropna(inplace=True)
dataset4.drop(columns=['MouseID'], inplace=True)
print(dataset4.shape)

print('-----d5------')
print(dataset5.shape)
dataset5.dropna(inplace=True)
print(dataset5.shape)
print('------------')

-----d1-----
(756, 755)
(756, 755)
-----d2-----
(30000, 25)
(30000, 25)
-----d3-----
(2911, 72)
(2856, 72)
-----d4-----
(1080, 82)
(1080, 81)
-----d5------
(21263, 82)
(21263, 82)
------------


# Classification Algorithms

In [5]:
# Classification Algorithms

def LogReg(trainX, testX, trainY, testY, verbose=True, clf=None, multi=False):
    if not clf:
        clf  = LogisticRegression()
    clf.fit(trainX , trainY)
    return validationmetrics(clf,testX,testY,verbose=verbose,multi=multi)

def SVM(trainX, testX, trainY, testY, svmtype="SVC", verbose=True, clf=None, multi=False):
    # for one vs all
    if not clf:
        if svmtype == "Linear":
            clf = svm.LinearSVC()
        else:
            clf = svm.SVC()
    clf.fit(trainX , trainY)
    return validationmetrics(clf,testX,testY,verbose=verbose,multi=multi)


def RandomForest(trainX, testX, trainY, testY, verbose=True, clf=None, multi=False):
    if not clf:
        clf  = RandomForestClassifier()
    clf.fit(trainX , trainY)
    return validationmetrics(clf,testX,testY,verbose=verbose,multi=multi)


def XgBoost(trainX, testX, trainY, testY, verbose=True, clf=None, multi=False):
    if not clf:
      objective = 'binary:logistic'
      if multi == True:
        objective = 'multi:softmax'
      clf = XGBClassifier(random_state=1,learning_rate=0.01, silent=1, objective=objective)
    clf.fit(trainX,trainY)
    return validationmetrics(clf,testX,testY,verbose=verbose,multi=multi)

def RunClassificationAlgorithms(trainX, testX, trainY, testY, verbose=True, multi=False):
  print("Running Classification Algorithms")

  print("Running XgBoost")
  XgBoost(trainX, testX, trainY, testY, verbose=verbose, multi=multi)
  print('\n')
  print("Running LR")
  LogReg(trainX, testX, trainY, testY, verbose=verbose, multi=multi)
  print('\n')
  print("Running SVM")
  SVM(trainX, testX, trainY, testY, verbose=verbose, multi=multi)
  print('\n')
  print("Running Random Forest")
  RandomForest(trainX, testX, trainY, testY, verbose=verbose, multi=multi)
  print('\n')


# Regression Algorithms

In [6]:
# Regression Algorithms
def LinearReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf  = LinearRegression()
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def RandomForestReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf = RandomForestRegressor(n_estimators=100)
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def PolynomialReg(trainX, testX, trainY, testY, degree=3, verbose=True, clf=None):
    poly = PolynomialFeatures(degree = degree)
    X_poly = poly.fit_transform(trainX)
    poly.fit(X_poly, trainY)
    if not clf:
        clf = LinearRegression() 
    clf.fit(X_poly, trainY)
    return validationmetrics_reg(clf, poly.fit_transform(testX), testY, verbose=verbose)

def SupportVectorRegression(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf = SVR(kernel="rbf")
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def DecisionTreeReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf = DecisionTreeRegressor()
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def GradientBoostingReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf = GradientBoostingRegressor()
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def AdaBooostReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    if not clf:
        clf = AdaBoostRegressor(random_state=0, n_estimators=100)
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def VotingReg(trainX, testX, trainY, testY, verbose=True, clf=None):
    rf = RandomForestRegressor(n_estimators=100)
    sv = svm.SVR(kernel="rbf")
    ada = AdaBoostRegressor(random_state=0, n_estimators=100)
    gb = GradientBoostingRegressor()
    dt = DecisionTreeRegressor()

    if not clf:
        clf = VotingRegressor([('rf', rf), ('sv', sv), ('ada', ada), ('gb', gb), ('dt', dt)])
    clf.fit(trainX , trainY)
    return validationmetrics_reg(clf, testX, testY, verbose=verbose)

def RunRegressionAlgorithms(trainX, testX, trainY, testY):
  print("Running Regression Algorithms")

  print("Running Linear Regression")
  LinearReg(trainX, testX, trainY, testY)
  print('\n')
  
  # print("Running Polynomial Regression")
  # PolynomialReg(trainX, testX, trainY, testY, verbose=verbose)

  print("Running Voting Regressor")
  VotingReg(trainX, testX, trainY, testY)
  print('\n')


# Validation metrics


In [7]:
# Validation metrics for classification

def validationmetrics(model, testX, testY, verbose=True, multi=False):   
    predictions = model.predict(testX)
    
    if model.__class__.__module__.startswith('sklearn.linear_model._coordinate_descent') or model.__class__.__module__.startswith('sklearn.linear_model._ridge') or model.__class__.__module__.startswith('lightgbm'):

      for i in range(0, predictions.shape[0]):
        predictions[i] = 1 if predictions[i] >= 0.5 else 0

    average = 'binary'
    if multi == True:
      average='weighted'

    accuracy = accuracy_score(testY, predictions) * 100
    precision = precision_score(testY, predictions, pos_label=1, labels=[0,1], average=average) * 100
    recall = recall_score(testY, predictions,pos_label=1,labels=[0,1], average=average) * 100
    if not multi:
      fpr , tpr, _ = roc_curve(testY, predictions)
      auc_val = auc(fpr, tpr)
    f_score = f1_score(testY, predictions, average=average)

    if verbose:
        print("Prediction Vector: \n", predictions)
        print("\n Accuracy: \n", accuracy)
        print("\n Precision of event Happening: \n", precision)
        print("\n Recall of event Happening: \n", recall)
        if not multi:
          print("\n AUC: \n",auc_val)
          print("\n Confusion Matrix: \n", confusion_matrix(testY, predictions,labels=[0,1]))
        print("\n F-Score:\n", f_score)
        #confusion Matrix
    
    res_map = {
      "accuracy": accuracy,
      "precision": precision,
      "recall": recall,
      "f_score": f_score,
      "model_obj": model
    }

    if not multi:
      res_map["auc_val"]: auc_val
    
    return res_map


In [8]:
#Validation metrics for Regression algorithms

def validationmetrics_reg(model, testX, testY, verbose=True):
    predictions = model.predict(testX)
    
    r2 = r2_score(testY,predictions)
    r2_adjusted = 1-(1-r2)*(testX.shape[0]-1)/(testX.shape[0]-testX.shape[1]-1)
    mse = mean_squared_error(testY,predictions)
    rmse = math.sqrt(mse)
    
    if verbose:
      print("R-Squared Value: ", r2)
      print("Adjusted R-Squared: ", r2_adjusted)
      print("RMSE: ", rmse)
    
    res_map = {
      "r2": r2,
      "r2_adjusted": r2_adjusted,
      "rmse": rmse,
      "model_obj": model
    }

    return res_map


# Dimensionality Reduction & Feature Selection

In [9]:
#Train Test Split: splitting manually
def traintestsplit(df, label_col, split=None, random=None):
  #make a copy of the label column and store in y
  df_cpy = df.copy()
  y = df_cpy[label_col].copy()

  #now delete the original
  X = df_cpy.drop(label_col,axis=1)

  if split == None:
    return X, y

  #manual split
  trainX, testX, trainY, testY = train_test_split(X, y, test_size=split, random_state=random)
  return X, y, trainX, testX, trainY, testY

#helper function which only splits into X and y
def XYsplit(df, label_col):
    df_cpy = df.copy()
    y = df_cpy[label_col].copy()
    X = df_cpy.drop(label_col,axis=1)
    return X,y


##LDA

In [10]:
def LDAImpl(df, label_col, ncomponents, multi=False):    
    df_cpy = df.copy()
    X,y = XYsplit(df_cpy, label_col)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    
    sc = StandardScaler()
    X_scaled = sc.fit_transform(X_train)

    lda = LinearDiscriminantAnalysis(n_components=ncomponents)
    X_lda = lda.fit_transform(X_scaled, y_train)
    X_lda = pd.DataFrame(X_lda)
    X_train_lda, X_test_lda, y_train, y_test = train_test_split(X_lda, y_train, test_size=0.20, shuffle=True, random_state=2)

    RunClassificationAlgorithms(X_train_lda, X_test_lda, y_train, y_test, verbose=True, multi=multi)
    print('\n\n')
    RunRegressionAlgorithms(X_train_lda, X_test_lda, y_train, y_test)

###Running LDA with 5 features

In [11]:
print('Running on dataset1')
LDAImpl(dataset1, label_dataset1, ncomponents=5, multi=False)
print('-----')
print('Running on dataset2')
LDAImpl(dataset2, label_dataset2, ncomponents=5, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
LDAImpl(dataset3, label_dataset3, ncomponents=5, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
LDAImpl(dataset4, label_dataset4, ncomponents=5, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
LDAImpl(dataset5, label_dataset5, ncomponents=5, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')
print('-----')


Running on dataset1




Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running SVM
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1



Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.5625

 Precision of event Happening: 
 65.27545909849749

 Recall of event Happening: 
 36.61048689138577

 AUC: 
 0.6551853390657177

 Confusion Matrix: 
 [[3524  208]
 [ 677  391]]

 F-Score:
 0.46910617876424715


Running LR
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.47916666666667

 Precision of event Happening: 
 72.65822784810126

 Recall of event Happening: 
 26.872659176029963

 AUC: 
 0.619893842503944

 Confusion Matrix: 
 [[3624  108]
 [ 781  287]]

 F-Score:
 0.39234449760765555


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.52083333333333

 Precision of event Happening: 
 63.60902255639098

 Recall of event Happening: 
 39.60674157303371

 AUC: 
 0.6656114141888556

 Confusion Matrix: 
 [[3490  242]
 [ 645  423]]

 F-Score:
 0.48817080207732266


Running Random Forest
Prediction Vector: 
 [0 1 0 ... 0 0 0]

 Accuracy: 
 72.25

 Preci

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 512.  462.  887.  581. 1308.  369.  733.  410.  335.  416.  495.  600.
  733.  554.  554.  664. 1308.  512.  369.  462.  624.  442.  347.  554.
  512.  462.  495.  345.  593.  696.  495.  637.  410.  462.  410.  335.
  377.  562.  442.  590.  367. 1308. 1308.  332.  664.  495.  664.  462.
  581.  887.  733.  445.  416.  581.  462.  495.  467.  428.  369.  554.
  345.  410.  696.  664.  637.  335.  495.  335.  520.  294.  384.  593.
  512.  410.  326.  326.  887.  428.  684.  377.  410.  462.  467.  332.
  478.  369.  401.  478.  696.  512.  664.  684.  512.  696.  384.  554.
  664.  462.  505.  478.  449.  733.  462.  423.  554.  535.  887.  410.
  369.  733.  501.  712.  420.  733.  464.  554.  467.  440.  600.  369.
  462.  335.  462.  345.  520.  512.  664.  590.  887.  684.  462.  478.
  733.  887. 1308.  467.  367.  733.  696.  637.  467.  462.  467.  554.
  345.  388.  450. 1308.  495.  637. 1308.  462.  367. 1308.  733.  392.
  345.  416.  345.  712.  332.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  359.   359.  1215.   407.  4311.   359.  3040.   369.   354.   495.
   359.  1182.  2034.   407.   361.   399.  7266.   462.   369.   360.
   459.   407.   369.   384.   345.   369.   359.   369.   520.  1338.
   467.   566.   410.   361.   410.   359.   369.   359.   416.   459.
   391.  4810. 11589.   359.   459.   462.   535.   369.   474.  1505.
  7893.   369.   384.   462.   459.   359.   459.   384.   369.   459.
   359.   407.   690.   600.   599.   354.   407.   359.   495.   359.
   459.   416.   462.   410.   410.   359.   887.   354.  1695.   410.
   459.   345.   410.   410.   407.   369.   369.   410.   797.   410.
   361.  5485.   410.   690.   369.   345.   705.   410.   410.   359.
   410.  3040.   416.   359.   360.   467.  3479.   467.   345.   887.
   359.   948.   384.  6567.   369.   360.   416.   407.  5485.   384.
   384.   354.   369.   410.   462.   410.   655.   407.  4941.  1695.
   360.   467.  3040.  1234.  8450.   410.   384.  1323.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  462.   462.   664.   467.  1308.   410.   467.   410.   410.   462.
   462.   664.   467.   462.   467.   467.  2452.   462.   410.   410.
   467.   462.   410.   467.   462.   462.   462.   410.   467.   467.
   467.   467.   410.   462.   410.   410.   410.   467.   462.   467.
   410.  1308.   410.   410.   467.   462.   467.   462.   467.   664.
   467.   462.   410.   467.   462.   462.   462.   462.   410.   462.
   410.   462.   467.   467.   467.   410.   462.   410.   462.   410.
   410.   467.   462.   410.   410.   410.   664.   462.  1215.   410.
   462.   462.   462.   410.   462.   410.   410.   462.   467.   467.
   467.   664.   467.   467.   410.   467.   467.   462.   462.   462.
   462.   467.   462.   410.   467.   467.  1215.   462.   410.   467.
   462.   467.   462.   467.   462.   467.   462.   462.   467.   410.
   462.   410.   462.   410.   462.   462.   467.   467.  1215.  1215.
   462.   462.   467.  1215.  4535.   462.   410.   467.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  469.   476.  1014.   581.  1214.   345.   814.   427.   234.   403.
   449.   600.   802.   617.   558.   665.  2383.   487.   369.   467.
   588.   456.   222.   553.   505.   440.   532.   329.   636.   740.
   500.   651.   414.   588.   436.   257.   245.   532.   442.   611.
   384.  1112. 12903.   170.   752.   461.   640.   440.   565.   907.
   870.   459.   384.   564.   457.   494.   497.   426.   341.   527.
   353.   407.   649.   707.   653.   315.   534.   195.   524.   243.
   356.   666.   502.   371.   305.   334.   842.   431.   959.   374.
   405.   499.   453.   337.   509.   367.   416.   478.   748.   552.
   613.   965.   547.   688.   388.   565.   712.   445.   554.   478.
   456.   752.   424.   366.   605.   530.   975.   410.   350.   800.
   424.   681.   457.   733.   440.   546.   467.   472.  1218.   369.
   462.   296.   457.   354.   572.   495.   686.   569.   990.   975.
   505.   522.   778.   870.  5392.   456.   384.   838.



Prediction Vector: 
 [0 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1
 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 1 1
 1 0 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 0
 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 0
 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[94  0]
 [ 0 79]]

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9705333369082066
Adjusted R-Squared:  0.9703610172410032
RMSE:  0.08550605072822794


Running Voting Regressor
R-Squared Value:  0.9999128894097994
Adjusted R-Squared:  0.9999123799911432
RMSE:  0.004649078158438239


-----
-----
-----
Running on dataset5
Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 7 ... 5 6 6]

 Accuracy: 
 91.26984126984127

 Precision of event Happening: 
 96.07843137254902

 Recall of event Happening: 
 100.0

 F-Score:
 0.911348089070941


Running SVM
Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9875876039582869
Adjusted R-Squared:  0.9875693289346684
RMSE:  0.16001253164604926


Running Voting Regressor
R-Squared Value:  0.9998988482530338
Adjusted R-Squared:  0.9998986993252555
RMSE:  0.01444484474275176


-----
-----
-----
-----
-----


###Running LDA with 10 features

In [12]:
print('Running on dataset1')
LDAImpl(dataset1, label_dataset1, ncomponents=10, multi=False)
print('-----')
print('Running on dataset2')
LDAImpl(dataset2, label_dataset2, ncomponents=10, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
LDAImpl(dataset3, label_dataset3, ncomponents=10, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
LDAImpl(dataset4, label_dataset4, ncomponents=10, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
LDAImpl(dataset5, label_dataset5, ncomponents=10, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')
print('-----')


Running on dataset1




Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running SVM
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1



Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.5625

 Precision of event Happening: 
 65.27545909849749

 Recall of event Happening: 
 36.61048689138577

 AUC: 
 0.6551853390657177

 Confusion Matrix: 
 [[3524  208]
 [ 677  391]]

 F-Score:
 0.46910617876424715


Running LR
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.47916666666667

 Precision of event Happening: 
 72.65822784810126

 Recall of event Happening: 
 26.872659176029963

 AUC: 
 0.619893842503944

 Confusion Matrix: 
 [[3624  108]
 [ 781  287]]

 F-Score:
 0.39234449760765555


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.52083333333333

 Precision of event Happening: 
 63.60902255639098

 Recall of event Happening: 
 39.60674157303371

 AUC: 
 0.6656114141888556

 Confusion Matrix: 
 [[3490  242]
 [ 645  423]]

 F-Score:
 0.48817080207732266


Running Random Forest
Prediction Vector: 
 [0 1 0 ... 0 0 0]

 Accuracy: 
 72.2708333333

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 512.  462.  887.  563. 1215.  359.  733.  428.  335.  405.  495.  600.
  733.  554.  554.  590.  624.  512.  369.  462.  590.  442.  367.  554.
  512.  462.  495.  345.  684.  664.  467.  637.  410.  478.  428.  326.
  377.  505.  405.  590.  326. 1308.  624.  279.  664.  495.  637.  462.
  554.  887.  733.  410.  399.  605.  462.  467.  462.  389.  384.  554.
  359.  456.  696.  686.  637.  354.  501.  335.  480.  335.  369.  684.
  512.  420.  368.  326.  887.  399.  624.  377.  459.  462.  450.  359.
  424.  354.  404.  467.  624.  554.  664.  684.  512.  664.  384.  554.
  664.  462.  521.  478.  467.  733.  420.  423.  554.  535.  887.  426.
  374.  624.  501.  686.  428.  733.  467.  554.  467.  440.  733.  384.
  467.  335.  462.  345.  554.  467.  664.  590.  887.  684.  501.  467.
  624.  887. 1308.  449.  334.  733.  712.  637.  495.  462.  462.  554.
  359.  390.  426. 1215.  495.  637.  624.  462.  369. 1215.  733.  374.
  335.  405.  359.  600.  374.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  512.   410.  1308.   478. 12903.   359.   767.   410.   389.   495.
   384.  5485.  1033.   617.   478.   405. 11422.   467.   369.   600.
   424.   600.   369.   495.   424.   462.   410.   342.   520.  1338.
   359.   512.   359.   535.   359.   478.   410.   332.   416.   445.
   406.  1033. 12903.   410.   497.   462.   809.   445.   369.  1308.
  7893.   410.   369.   467.   459.   359.   459.   369.   410.   459.
   359.   407.  1338.   497.   599.   354.   407.   354.   524.   410.
   380.   384.   389.   405.   369.   359.  1308.   456.   957.   369.
   445.   359.   478.   410.   521.   410.   369.   369.  1033.   410.
   478.   965.   478.   664.   410.   599.   617.   405.   405.   359.
   467.  3040.   384.   359.   360.   380.  1308.   410.   407. 12903.
   359.   361.   406.  1151.   369.   360.   424.   600.  1049.   369.
   380.   354.   354.   369.   384.   467.  1308.   472.  4941.  4402.
   397.   384.  3040.  2034. 11422.   405.   406.  2034.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  462.   410.   664.   467.  1308.   410.   467.   410.   410.   410.
   462.   664.   467.   467.   467.   467.  2383.   467.   410.   467.
   467.   462.   410.   467.   467.   462.   462.   410.   467.   467.
   467.   467.   410.   462.   410.   410.   410.   467.   462.   467.
   410.  1215.   410.   410.   467.   462.   467.   462.   467.   664.
   467.   410.   410.   467.   462.   467.   462.   410.   410.   462.
   410.   462.   467.   467.   467.   410.   462.   410.   462.   410.
   410.   467.   462.   410.   410.   410.   664.   410.  1215.   410.
   410.   462.   462.   410.   462.   410.   410.   467.   467.   467.
   467.   664.   467.   467.   410.   467.   467.   462.   462.   467.
   467.   467.   462.   410.   467.   467.  1215.   410.   410.   467.
   467.   467.   410.   467.   467.   467.   467.   467.   467.   410.
   462.   410.   462.   410.   467.   467.   467.   467.   664.   664.
   467.   467.   467.  1215.  4511.   462.   410.   467.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 512.  420.  910.  581.  933.  428.  814.  434.  281.  407.  520.  730.
  814.  617.  601.  679.  866.  496.  369.  467.  588.  473.  271.  549.
  523.  462.  571.  326.  619.  822.  505.  679.  413.  459.  436.  294.
  245.  532.  372.  611.  320.  809. 6079.  272.  752.  488.  809.  440.
  584. 1683. 1234.  410.  391.  520.  382.  494.  459.  391.  310.  523.
  353.  407.  649.  702.  692.  326.  551.  195.  524.  243.  309.  621.
  480.  399.  282.  353.  733.  376. 1322.  367.  399.  499.  488.  354.
  522.  367.  377.  504.  710.  538.  684. 1012.  469.  641.  421.  554.
  708.  450.  521.  478.  491.  752.  420.  404.  566.  486.  778.  436.
  350.  866.  472. 1151.  406.  752.  473.  522.  478.  493.  539.  341.
  462.  260.  426.  354.  554.  504.  692.  578.  896.  783.  491.  454.
  813.  879. 1531.  456.  377.  838.  692.  637.  462.  440.  480.  534.
  333.  445.  459. 1602.  509.  637. 6072.  439.  368. 2297.  873.  384.
  341.  407.  360.  840.  258.



Prediction Vector: 
 [0 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1
 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 1 1
 1 0 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 0
 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 0
 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[94  0]
 [ 0 79]]

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9705333369082066
Adjusted R-Squared:  0.9703610172410032
RMSE:  0.08550605072822794


Running Voting Regressor
R-Squared Value:  0.9999128894097994
Adjusted R-Squared:  0.9999123799911432
RMSE:  0.004649078158438239


-----
-----
-----
Running on dataset5




Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 91.005291005291

 Precision of event Happening: 
 94.23076923076923

 Recall of event Happening: 
 100.0

 F-Score:
 0.9095812061960508


Running SVM
Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 99.97060552616108

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9997008295773727





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9877765644354293
Adjusted R-Squared:  0.9877477440745344
RMSE:  0.15878988279560946


Running Voting Regressor
R-Squared Value:  0.9999004482124905
Adjusted R-Squared:  0.9999002134897377
RMSE:  0.014330149315410873


-----
-----
-----
-----
-----


###Running LDA with 15 features

In [13]:
print('Running on dataset1')
LDAImpl(dataset1, label_dataset1, ncomponents=15, multi=False)
print('-----')
print('Running on dataset2')
LDAImpl(dataset2, label_dataset2, ncomponents=15, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
LDAImpl(dataset3, label_dataset3, ncomponents=15, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
LDAImpl(dataset4, label_dataset4, ncomponents=15, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
LDAImpl(dataset5, label_dataset5, ncomponents=15, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')
print('-----')


Running on dataset1




Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 0
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 1 1
 0 1 1 0 0 0 1 1 1 1]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[24  0]
 [ 0 97]]

 F-Score:
 1.0


Running SVM
Prediction Vector: 
 [1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 1



Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.5625

 Precision of event Happening: 
 65.27545909849749

 Recall of event Happening: 
 36.61048689138577

 AUC: 
 0.6551853390657177

 Confusion Matrix: 
 [[3524  208]
 [ 677  391]]

 F-Score:
 0.46910617876424715


Running LR
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.47916666666667

 Precision of event Happening: 
 72.65822784810126

 Recall of event Happening: 
 26.872659176029963

 AUC: 
 0.619893842503944

 Confusion Matrix: 
 [[3624  108]
 [ 781  287]]

 F-Score:
 0.39234449760765555


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 81.52083333333333

 Precision of event Happening: 
 63.60902255639098

 Recall of event Happening: 
 39.60674157303371

 AUC: 
 0.6656114141888556

 Confusion Matrix: 
 [[3490  242]
 [ 645  423]]

 F-Score:
 0.48817080207732266


Running Random Forest
Prediction Vector: 
 [0 1 0 ... 0 0 0]

 Accuracy: 
 72.2083333333

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 512.  462.  887.  590. 1215.  359.  654.  410.  335.  401.  491.  624.
  733.  505.  520.  590.  624.  512.  369.  467.  590.  442.  345.  554.
  512.  467.  501.  326.  684.  686.  505.  637.  426.  478.  366.  326.
  353.  505.  428.  590.  326. 1308.  624.  279.  637.  581.  637.  410.
  554.  733.  574.  445.  367.  563.  467.  440.  495.  420.  384.  581.
  359.  456.  686.  637.  664.  354.  501.  359.  520.  335.  384.  684.
  495.  420.  368.  326.  887.  410.  624.  334.  410.  462.  462.  359.
  495.  342.  404.  467.  624.  505.  637.  684.  505.  712.  384.  505.
  712.  462.  521.  495.  467.  733.  462.  410.  505.  545.  696.  410.
  384.  624.  505.  637.  410.  733.  467.  505.  467.  449.  600.  384.
  467.  335.  410.  345.  520.  467.  686.  590.  707.  684.  501.  467.
  733.  469. 1308.  440.  334.  696.  664.  637.  462.  410.  491.  589.
  345.  404.  410. 1215.  505.  637.  624.  420.  369. 1215.  531.  426.
  335.  456.  345.  600.  374.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  487.   462.  3453.   405.  2520.   426.   767.   356.   389.   442.
   384.  5485.  1033.   617.   478.   649. 12903.   434.   369.   360.
   424.   584.   367.   426.   424.   401.   351.   350.   520.  1338.
   345.   405.   397.   431.   424.   462.   389.   332.   495.   389.
   406.  1033. 12903.   423.   664.   397.   809.   389.   467.  2534.
  7893.   426.   368.   391.   459.   462.   459.   369.   423.   512.
   442.   407.  1338.   554.   391.   426.   407.   294.   524.   294.
   406.   416.   462.   405.   259.   426.  1068.   395.  2534.   423.
   389.   401.   554.   401.   521.   389.   374.   369.  2520.   426.
   664.   965.   356.  1308.   377.   462.   617.   462.   405.   359.
   369.   664.   416.   410.   410.   380.  1308.   410.   462. 12903.
   359.   361.   406.   733.   356.   360.   424.   535.  1526.   366.
   391.   426.   391.   405.   391.   410.   658.   359.  1140.  4402.
   377.   384.   959.  2520. 11422.   399.   406.   889.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  467.   462.   664.   467.  1308.   410.   467.   410.   410.   410.
   467.   664.   467.   467.   467.   467.  2383.   467.   410.   467.
   467.   410.   410.   467.   467.   462.   467.   410.   467.   467.
   467.   467.   462.   467.   462.   410.   410.   467.   462.   467.
   410.  1308.   462.   410.   467.   467.   462.   467.   467.   664.
   467.   462.   410.   467.   462.   462.   462.   467.   410.   467.
   410.   462.   467.   467.   467.   410.   462.   410.   462.   410.
   410.   467.   462.   462.   410.   410.   664.   410.   664.   410.
   410.   462.   467.   410.   462.   410.   410.   467.   467.   467.
   467.   664.   467.   467.   410.   467.   467.   462.   467.   467.
   467.   467.   462.   410.   467.   467.  1215.   410.   462.   467.
   467.   467.   410.   467.   467.   467.   467.   467.   467.   410.
   410.   410.   410.   410.   467.   467.   467.   467.   664.   664.
   467.   467.   467.  1215.  4511.   462.   410.   467.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  512.   462.  1214.   495.  1091.   398.   814.   404.   281.   372.
   449.   664.   543.   541.   529.   649.   808.   558.   369.   467.
   588.   534.   236.   545.   486.   462.   482.   366.  1393.   740.
   494.   613.   431.   443.   422.   257.   283.   518.   372.   566.
   320.   809.  5631.   160.   664.   576.   809.   463.   554.   940.
   803.   410.   329.   520.   459.   486.   497.   394.   341.   523.
   353.   407.   649.   566.   802.   356.   490.   218.   524.   224.
   345.   624.   581.   399.   305.   354.  1014.   335.  1692.   374.
   382.   421.   480.   259.   522.   367.   452.   531.   866.   505.
   853.   734.   530.   752.   421.   554.   715.   462.   523.   478.
   491.   711.   416.   394.   605.   486.  1827.   410.   398.   674.
   501.  4941.   406.   928.   480.   558.   478.   443.   840.   341.
   403.   288.   410.   354.   512.   464.   674.   566.   670.   783.
   478.   481.   733.   773.   983.   440.   384.   564.



Prediction Vector: 
 [0 1 0 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1
 0 1 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 1 0 1 1 1 1
 1 0 0 0 1 1 1 1 1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 0
 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 0 1 0
 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 AUC: 
 1.0

 Confusion Matrix: 
 [[94  0]
 [ 0 79]]

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9705333369082066
Adjusted R-Squared:  0.9703610172410032
RMSE:  0.08550605072822794


Running Voting Regressor
R-Squared Value:  0.9999128894097994
Adjusted R-Squared:  0.9999123799911432
RMSE:  0.004649078158438239


-----
-----
-----
Running on dataset5




Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 91.005291005291

 Precision of event Happening: 
 94.23076923076923

 Recall of event Happening: 
 100.0

 F-Score:
 0.9095812061960508


Running SVM
Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 99.94121105232216

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9993902164092265





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9877765644354293
Adjusted R-Squared:  0.9877477440745344
RMSE:  0.15878988279560946


Running Voting Regressor
R-Squared Value:  0.9999004482124905
Adjusted R-Squared:  0.9999002134897377
RMSE:  0.014330149315410873


-----
-----
-----
-----
-----


##PCA

In [14]:
# PCA
def PCAImpl(df, label_col, ncomponents, multi=False):    
    df_cpy = df.copy()
    X,y = XYsplit(df_cpy, label_col)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    
    sc = StandardScaler()
    X_scaled = sc.fit_transform(X_train)

    pca = PCA(n_components=6)
    X_pca = pca.fit_transform(X_scaled)
    X_pca = pd.DataFrame(X_pca)
    X_train_pca, X_test_pca, y_train, y_test = train_test_split(X_pca, y_train, test_size=0.20, shuffle=True, random_state=2)

    RunClassificationAlgorithms(X_train_pca, X_test_pca, y_train, y_test, verbose=True, multi=multi)
    print('\n\n')
    RunRegressionAlgorithms(X_train_pca, X_test_pca, y_train, y_test)

###Running PCA with 5 features

In [15]:
print('Running on dataset1')
PCAImpl(dataset1, label_dataset1, ncomponents=5, multi=False)
print('-----')
print('Running on dataset2')
PCAImpl(dataset2, label_dataset2, ncomponents=5, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
PCAImpl(dataset3, label_dataset3, ncomponents=5, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
PCAImpl(dataset4, label_dataset4, ncomponents=5, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
PCAImpl(dataset5, label_dataset5, ncomponents=5, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1
Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0
 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 0 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 80.99173553719008

 Precision of event Happening: 
 87.0

 Recall of event Happening: 
 89.69072164948454

 AUC: 
 0.6776202749140894

 Confusion Matrix: 
 [[11 13]
 [10 87]]

 F-Score:
 0.883248730964467


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1
 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 82.64462809917356

 Precision of event Happening: 
 88.0

 Recall of event Happening: 
 90.72164948453609

 AUC: 
 0.7036082474226804

 Confusion Ma

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [456. 462. 359. 512. 346. 464. 437. 410. 505. 456. 389. 503. 467. 530.
 448. 380. 503. 462. 566. 459. 480. 462. 389. 434. 462. 462. 462. 344.
 520. 350. 480. 354. 332. 530. 522. 405. 332. 350. 359. 410. 359. 405.
 462. 345. 434. 434. 405. 389. 384. 328. 467. 389. 433. 335. 405. 464.
 520. 380. 389. 462. 369. 522. 501. 566. 354. 354. 456. 464. 359. 368.
 600. 505. 433. 405. 345. 369. 399. 467. 503. 389. 344. 512. 467. 404.
 462. 389. 410. 481. 422. 464. 359. 405. 433. 433. 505. 354. 456. 405.
 423. 433. 478. 388. 512. 410. 462. 332. 433. 342. 522. 480. 359. 503.
 433. 491. 410. 464. 359. 472. 421. 366. 462. 503. 369. 428. 428. 368.
 464. 384. 504. 554. 593. 332. 380. 399. 503. 456. 433. 516. 424. 332.
 437. 369. 457. 505. 472. 423. 369. 410. 443. 505. 503. 350. 384. 462.
 368. 389. 449. 520. 405. 279. 404. 374. 328. 541. 434. 462. 377. 472.
 328. 359. 450. 503. 462. 503. 434. 410. 512. 410. 505. 443. 480. 416.
 328. 501. 467. 405. 418. 464. 368. 328. 584. 462. 279. 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  416.  424.  401.  462.  380.  796.  289.  259.  399.  389.  878.
  696. 1323. 1072.  495.  817.  410.  407.  730.  359.  621.  346.  545.
  410.  300.  600.  344.  253.  369.  359.  478.  416.  588.  520.  424.
  384.  442.  359.  371.  520. 2617. 1785.  354.  664.  371. 1112.  437.
  443.  505.  467.  384.  505.  371.  405.  417.  675.  561.  389.  554.
  354.  472.  481.  478.  462.  347.  407.  255.  359.  259.  524.  505.
  505.  459.  354.  442.  424.  437.  733.  384.  347.  382.  520.  354.
  472.  384.  443.  553.  389.  350.  681. 1057.  664.  433.  542.  462.
  530.  467.  467.  440.  371.  733.  389.  462.  410.  500.  345.  462.
  426.  531.  535.  530. 1756.  733.  354.  366.  359.  407.  399.  300.
  462.  293.  410.  371.  440.  259.  380.  344.  545.  654.  359.  384.
  696.  462.  814.  459. 1376.  733.  274.  389.  437.  410.  374.  278.
  462.  369.  410.  467.  545.  505.  497.  531.  410.  462.  354.  367.
  354.  405.  459.  366.  259.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  462.  467.  462.  462.  467.  462.  410.  345.  359.  410.  664.
  359.  424.  424.  380.  410.  410.  410.  424.  359.  462.  410.  410.
  359.  512.  462.  410.  424.  359.  359.  462.  367.  467.  467.  467.
  410.  359.  359.  367.  424.  495.  495.  410.  593.  462.  424.  445.
  410.  359.  410.  410.  433.  462.  405.  433.  380.  417.  367.  410.
  410.  407.  467.  462.  462.  367.  359.  433.  359.  345.  424.  505.
  433.  405.  410.  359.  467.  410.  505.  410.  367.  462.  467.  410.
  462.  410.  410.  359.  345.  359.  664.  424.  505.  377.  505.  410.
  359.  467.  359.  433.  405.  593.  495.  405.  410.  359.  433.  410.
  359.  359.  359.  410.  478.  664.  410.  345.  359.  359.  410.  367.
  462.  467.  359.  410.  462.  367.  467.  410.  467.  424.  359.  410.
  664.  405.  462.  405.  433.  505.  380.  367.  367.  410.  359.  433.
  462.  410.  359.  359.  410.  505.  478.  359.  359.  462.  410.  345.
  410.  405.  405.  410.  345.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  830.   422.   531.   466.   453.   426.   796.   535.   273.   592.
   337.   878.   713.   694.  1206.  1100.  1985.   428.   394.   397.
   309.   389.   319.   316.   462.   416.   520.   334.   253.   384.
   590.   613.   422.   521.   486.   405.   250.   442.   415.   222.
   354.  2147.  1785.   345.   641.   379.  1112.   463.   483.   491.
   371.   301.   346.   367.   366.   377.   453.   351.   335.   620.
   369.   407.   531.   378.   382.   224.   472.   255.   424.   394.
  1811.   293.   376.   405.   345.   369.   380.   501.   733.   289.
   390.   378.   332.   345.   694.   467.   652.   444.   866.   481.
   753.   675.   590.   368.   401.   569.   530.   405.   341.   422.
   478.   491.   367.   324.   923.   710.   433.   670.   830.   662.
   798.   963.   353.   696.   392.   315.   590.   431.   369.   707.
   472.   395.   462.   281.   395.   495.   335.   455.   545.  1308.
   476.   554.   975.   281.  6072.   619.   464.  1060.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 4 5 ... 5 7 5]

 Accuracy: 
 83.45091122868901

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.8326294983882135


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 90.8289241622575

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9071675933400235


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 97.38389182833627

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9720124616575642


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 98.94179894179894

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.9893224637813124





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9069877171260884
Adjusted R-Squared:  0.9068233360665174
RMSE:  0.4380224159802926


Running Voting Regressor
R-Squared Value:  0.9856462096481821
Adjusted R-Squared:  0.9856208421247326
RMSE:  0.17207171362577023


-----
-----
-----
-----


###Running PCA with 10 features

In [16]:
print('Running on dataset1')
PCAImpl(dataset1, label_dataset1, ncomponents=10, multi=False)
print('-----')
print('Running on dataset2')
PCAImpl(dataset2, label_dataset2, ncomponents=10, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
PCAImpl(dataset3, label_dataset3, ncomponents=10, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
PCAImpl(dataset4, label_dataset4, ncomponents=10, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
PCAImpl(dataset5, label_dataset5, ncomponents=10, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1
Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0
 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 0 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 80.99173553719008

 Precision of event Happening: 
 87.0

 Recall of event Happening: 
 89.69072164948454

 AUC: 
 0.6776202749140894

 Confusion Matrix: 
 [[11 13]
 [10 87]]

 F-Score:
 0.883248730964467


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1
 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 82.64462809917356

 Precision of event Happening: 
 88.0

 Recall of event Happening: 
 90.72164948453609

 AUC: 
 0.7036082474226804

 Confusion Ma

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [456. 462. 359. 512. 346. 397. 437. 410. 428. 456. 389. 503. 467. 530.
 448. 434. 503. 462. 566. 459. 480. 462. 389. 434. 462. 462. 462. 347.
 520. 350. 480. 354. 389. 530. 522. 405. 332. 350. 359. 410. 464. 405.
 462. 345. 590. 434. 405. 389. 384. 328. 467. 389. 433. 335. 405. 464.
 520. 359. 389. 462. 369. 522. 501. 566. 354. 354. 456. 464. 359. 368.
 600. 505. 368. 405. 345. 410. 399. 467. 503. 389. 344. 512. 328. 404.
 462. 389. 410. 481. 422. 464. 359. 405. 433. 433. 505. 354. 456. 405.
 423. 433. 478. 335. 512. 410. 462. 332. 433. 342. 522. 480. 359. 503.
 433. 491. 462. 464. 359. 472. 421. 366. 462. 503. 369. 379. 428. 368.
 464. 462. 434. 554. 593. 332. 380. 399. 503. 410. 433. 516. 424. 332.
 388. 369. 457. 505. 472. 423. 369. 410. 443. 505. 503. 410. 384. 462.
 368. 389. 449. 520. 405. 279. 404. 374. 328. 541. 434. 462. 377. 472.
 328. 359. 450. 503. 462. 503. 434. 522. 512. 410. 505. 443. 480. 416.
 328. 501. 467. 405. 590. 464. 368. 328. 584. 462. 279. 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  416.  424.  401.  462.  380.  796.  289.  259.  399.  389.  878.
  696. 1323. 1072.  495.  817.  410.  407.  730.  359.  621.  346.  545.
  410.  300.  600.  344.  253.  369.  359.  478.  416.  588.  520.  424.
  384.  442.  359.  371.  520. 2617. 1785.  354.  664.  371. 1112.  437.
  443.  505.  467.  384.  505.  371.  405.  417.  675.  561.  389.  554.
  354.  472.  481.  478.  354.  347.  407.  255.  359.  259.  524.  505.
  505.  459.  354.  369.  424.  437.  733.  384.  347.  382.  520.  354.
  472.  384.  443.  553.  389.  350.  681. 1057.  664.  433.  542.  462.
  530.  467.  467.  440.  371.  733.  389.  462.  410.  500.  345.  462.
  426.  531.  535.  530. 1756.  733.  354.  366.  359.  407.  399.  300.
  462.  293.  410.  371.  440.  259.  380.  344.  545.  654.  359.  384.
  696.  462.  814.  459. 1376.  733.  274.  389.  437.  410.  374.  278.
  462.  442.  410.  467.  545.  505.  497.  531.  410.  462.  354.  367.
  354.  405.  459.  366.  389.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  462.  467.  462.  462.  467.  462.  410.  345.  359.  410.  664.
  359.  424.  424.  380.  410.  410.  410.  424.  359.  462.  410.  410.
  359.  512.  462.  410.  424.  359.  359.  462.  367.  467.  467.  467.
  410.  359.  359.  367.  424.  495.  495.  410.  593.  462.  424.  445.
  410.  359.  410.  410.  433.  462.  405.  433.  380.  417.  367.  410.
  410.  407.  467.  462.  462.  367.  359.  433.  359.  345.  424.  505.
  433.  405.  410.  359.  467.  410.  505.  410.  367.  462.  467.  410.
  462.  410.  410.  359.  345.  359.  664.  424.  505.  377.  505.  410.
  359.  467.  359.  433.  405.  593.  495.  405.  410.  359.  433.  410.
  359.  359.  359.  410.  478.  664.  410.  345.  359.  359.  410.  367.
  462.  467.  359.  410.  462.  367.  467.  410.  467.  424.  359.  410.
  664.  405.  462.  405.  433.  505.  380.  367.  367.  410.  359.  433.
  462.  410.  359.  359.  410.  505.  478.  359.  359.  462.  410.  345.
  410.  405.  405.  410.  345.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  407.   384.  1602.   466.   461.   326.   796.   535.   273.   450.
   337.   664.  1308.   694.   535.   578.  1985.   309.   351.  1563.
   309.   389.   170.   518.   462.   460.   462.   334.  1196.   384.
   590.   566.   429.   521.   486.   840.   250.   560.   502.   288.
   354.  1196.   497.   410.   665.   543.   865.   852.   483.   412.
   434.   301.   369.   367.   399.   464.   453.   417.   335.   740.
   441.   407.   531.   661.   382.   224.   524.   255.   424.   294.
  1811.   505.   376.   467.   410.   369.   492.   371.   733.   483.
   390.   348.   332.   345.   730.   467.   520.   481.   594.   448.
   783.   675.   687.   366.   401.   569.   835.   405.   341.   599.
   495.   858.   367.   560.   923.   710.   462.   670.   830.   578.
   798.   963.   353.   733.   356.   283.   399.   431.  1218.   707.
   533.   274.   350.   281.   462.   484.   467.   462.   670.  1308.
   476.   554.   975.   388.   503.   619.   353.   874.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 4 5 ... 5 7 5]

 Accuracy: 
 83.45091122868901

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.8326294983882135


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 90.91710758377425

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9080282348360112


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 97.38389182833627

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9720124616575642


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 98.97119341563786

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.9896134389489288





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9069877107521289
Adjusted R-Squared:  0.9068233296812932
RMSE:  0.43802243098872484


Running Voting Regressor
R-Squared Value:  0.9857215757986935
Adjusted R-Squared:  0.9856963414702081
RMSE:  0.17161937838308605


-----
-----
-----
-----


###Running PCA with 15 features

In [17]:
print('Running on dataset1')
PCAImpl(dataset1, label_dataset1, ncomponents=15, multi=False)
print('-----')
print('Running on dataset2')
PCAImpl(dataset2, label_dataset2, ncomponents=15, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
PCAImpl(dataset3, label_dataset3, ncomponents=15, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
PCAImpl(dataset4, label_dataset4, ncomponents=15, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
PCAImpl(dataset5, label_dataset5, ncomponents=15, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1
Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 0
 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 0 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 80.99173553719008

 Precision of event Happening: 
 87.0

 Recall of event Happening: 
 89.69072164948454

 AUC: 
 0.6776202749140894

 Confusion Matrix: 
 [[11 13]
 [10 87]]

 F-Score:
 0.883248730964467


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1
 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 82.64462809917356

 Precision of event Happening: 
 88.0

 Recall of event Happening: 
 90.72164948453609

 AUC: 
 0.7036082474226804

 Confusion Ma

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [456. 462. 359. 512. 346. 359. 437. 410. 428. 520. 389. 503. 467. 530.
 448. 434. 503. 462. 566. 459. 480. 462. 332. 434. 462. 462. 462. 347.
 424. 350. 480. 354. 389. 530. 522. 405. 332. 350. 359. 410. 359. 405.
 462. 345. 590. 434. 405. 389. 384. 328. 467. 389. 433. 335. 405. 464.
 520. 593. 389. 462. 369. 522. 501. 566. 354. 344. 456. 464. 359. 368.
 600. 505. 433. 405. 345. 410. 399. 389. 503. 389. 344. 512. 328. 404.
 462. 389. 410. 481. 422. 464. 359. 405. 433. 464. 505. 354. 456. 405.
 423. 433. 478. 388. 512. 410. 462. 332. 433. 342. 522. 480. 410. 503.
 433. 491. 462. 464. 359. 472. 421. 366. 462. 503. 369. 379. 428. 368.
 464. 410. 434. 554. 593. 332. 380. 399. 503. 456. 433. 516. 424. 332.
 437. 369. 457. 505. 472. 423. 369. 410. 443. 505. 503. 410. 384. 462.
 368. 389. 449. 520. 405. 279. 404. 374. 328. 541. 434. 462. 377. 472.
 328. 380. 450. 503. 554. 503. 451. 522. 512. 410. 505. 443. 480. 426.
 326. 279. 467. 405. 590. 464. 368. 328. 584. 462. 279. 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  416.  424.  401.  462.  380.  796.  289.  259.  399.  389.  878.
  696. 1323. 1072.  495.  817.  410.  407.  730.  359.  621.  346.  545.
  410.  300.  600.  344.  253.  369.  359.  462.  416.  588.  520.  424.
  384.  442.  359.  371.  520. 2617. 1785.  354.  664.  371. 1112.  437.
  443.  505.  467.  384.  505.  371.  405.  417.  675.  561.  389.  554.
  354.  472.  481.  478.  509.  347.  407.  255.  359.  259.  524.  505.
  505.  459.  354.  369.  424.  437.  733.  384.  347.  382.  424.  354.
  472.  384.  443.  553.  389.  350.  681. 1057.  664.  433.  542.  462.
  530.  467.  467.  440.  371.  733.  389.  462.  410.  500.  345.  462.
  426.  531.  535.  530. 1756.  733.  354.  366.  359.  407.  399.  300.
  462.  293.  410.  371.  440.  259.  380.  344.  545.  654.  359.  384.
  696.  462.  814.  459. 1376.  733.  274.  389.  437.  410.  374.  278.
  462.  442.  410.  467.  545.  505.  497.  531.  410.  462.  354.  367.
  354.  405.  459.  366.  259.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 359.  462.  467.  462.  462.  467.  462.  410.  345.  359.  410.  664.
  359.  424.  424.  380.  410.  410.  410.  424.  359.  462.  410.  410.
  359.  512.  462.  410.  424.  359.  359.  462.  367.  467.  467.  467.
  410.  359.  359.  367.  424.  495.  495.  410.  593.  462.  424.  445.
  410.  359.  410.  410.  433.  462.  405.  433.  380.  417.  367.  410.
  410.  407.  467.  462.  462.  367.  359.  433.  359.  345.  424.  505.
  433.  405.  410.  359.  467.  410.  505.  410.  367.  462.  467.  410.
  462.  410.  410.  359.  345.  359.  664.  424.  505.  377.  505.  410.
  359.  467.  359.  433.  405.  593.  495.  405.  410.  359.  433.  410.
  359.  359.  359.  410.  478.  664.  410.  345.  359.  359.  410.  367.
  462.  467.  359.  410.  462.  367.  467.  410.  467.  424.  359.  410.
  664.  405.  462.  405.  433.  505.  380.  367.  367.  410.  359.  433.
  462.  410.  359.  359.  410.  505.  478.  359.  359.  462.  410.  345.
  410.  405.  405.  410.  345.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  830.   347.   531.   530.   378.   454.   796.   442.   273.   424.
   337.   664.   586.   694.   512.   424.  1985.   428.   505.   397.
   309.   389.   319.   316.   462.   300.   520.   334.   313.   384.
   363.   613.   422.   521.   486.   421.   283.   320.   502.   288.
   354.  1112.  1061.   603.   711.   392.  1112.   346.   289.   842.
   326.   301.   464.   367.   399.   464.   453.   425.   335.   427.
   429.   407.   531.   461.   382.   224.   524.   255.   424.   273.
  1811.   505.   376.   478.   251.   369.   477.   501.   733.   289.
   390.   348.   332.   345.   730.   301.   800.   444.   866.   448.
   783.   675.   593.   369.   270.   569.   835.   405.   341.   676.
   487.   858.   400.   324.   923.  4535.   464.   670.   830.   578.
   798.   963.   353.   696.   356.   717.   384.   456.  1218.   707.
   378.   395.   462.   281.   395.   288.   467.   477.   670.  1308.
   476.   554.   654.   360.  5392.   514.   464.  1060.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 4 5 ... 5 7 5]

 Accuracy: 
 83.45091122868901

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.8326294983882135


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 90.97589653145208

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9085852356418069


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 97.38389182833627

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 0.9720124616575642


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 98.97119341563786

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.9896608667912442





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9069876976121515
Adjusted R-Squared:  0.9068233165180934
RMSE:  0.4380224619287474


Running Voting Regressor
R-Squared Value:  0.9857216430956967
Adjusted R-Squared:  0.9856964088861456
RMSE:  0.17161897394477774


-----
-----
-----
-----


##TSNE

In [18]:
# t-SNE
def TSNEImpl(df, label_col, ncomponents, multi=False):
    df_cpy = df.copy()
    X,y = XYsplit(df_cpy, label_col)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    
    sc = StandardScaler()
    X_scaled = sc.fit_transform(X_train)

    tsne = TSNE(n_components=ncomponents)
    X_tsne = tsne.fit_transform(X_scaled)
    X_tsne = pd.DataFrame(X_tsne)
    X_train_tsne, X_test_tsne, y_train, y_test = train_test_split(X_tsne, y_train, test_size=0.20, shuffle=True, random_state=2)

    RunClassificationAlgorithms(X_train_tsne, X_test_tsne, y_train, y_test, verbose=True, multi=multi)
    print('\n\n')
    RunRegressionAlgorithms(X_train_tsne, X_test_tsne, y_train, y_test)


###Running TSNE with 3 features

In [19]:
print('Running on dataset1')
TSNEImpl(dataset1, label_dataset1, ncomponents=3, multi=False)
print('-----')
print('Running on dataset2')
TSNEImpl(dataset2, label_dataset2, ncomponents=3, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
TSNEImpl(dataset3, label_dataset3, ncomponents=3, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
TSNEImpl(dataset4, label_dataset4, ncomponents=3, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
TSNEImpl(dataset5, label_dataset5, ncomponents=3, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1
Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1]

 Accuracy: 
 83.47107438016529

 Precision of event Happening: 
 85.3211009174312

 Recall of event Happening: 
 95.87628865979381

 AUC: 
 0.6460481099656358

 Confusion Matrix: 
 [[ 8 16]
 [ 4 93]]

 F-Score:
 0.9029126213592232


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0
 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 1 0 1 1 1 1]

 Accuracy: 
 84.29752066115702

 Precision of event Happening: 
 86.11111111111111

 Recall of event Happening: 
 95.87628865979381

 AUC: 
 0.66688

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [482. 462. 332. 335. 371. 664. 405. 443. 487. 374. 482. 664. 488. 481.
 520. 505. 384. 501. 332. 332. 350. 481. 347. 316. 332. 405. 451. 384.
 520. 448. 482. 405. 664. 481. 448. 332. 350. 588. 316. 467. 448. 359.
 351. 451. 332. 482. 359. 384. 482. 428. 404. 384. 433. 371. 405. 424.
 424. 332. 347. 566. 374. 462. 407. 416. 405. 437. 487. 377. 448. 467.
 588. 520. 433. 467. 467. 423. 332. 384. 664. 482. 371. 462. 350. 467.
 482. 384. 482. 410. 405. 357. 495. 664. 664. 433. 664. 351. 481. 332.
 420. 405. 335. 664. 437. 462. 342. 481. 416. 342. 581. 410. 380. 481.
 433. 505. 581. 487. 369. 593. 359. 369. 371. 259. 359. 335. 451. 487.
 377. 462. 342. 664. 488. 374. 664. 410. 351. 467. 433. 505. 664. 259.
 404. 462. 448. 259. 405. 350. 407. 332. 326. 478. 405. 423. 369. 388.
 332. 467. 332. 451. 450. 424. 467. 733. 405. 664. 350. 416. 664. 487.
 520. 424. 462. 481. 374. 505. 404. 481. 371. 423. 377. 581. 359. 467.
 410. 482. 384. 593. 488. 359. 359. 531. 346. 482. 664. 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 531.  562.  505.  401.  382.  380.  405.  443.  397.  467.  462.  681.
  578.  545.  664.  505.  531.  501.  467.  450.  467.  724.  401.  501.
  467.  562.  401.  423.  690.  359.  480.  382.  562.  588.  708.  664.
  357.  439.  503.  383.  721.  733. 1182.  401.  505.  562.  733.  346.
  569.  359.  451.  437.  433.  462.  405.  380.  654.  542.  401.  600.
  501.  660.  531.  259.  451.  383.  637.  380. 1215.  401.  721.  733.
  433.  376.  410.  359.  505.  437.  975.  569.  462.  562.  359.  383.
  660.  451.  443.  531.  405.  359.  733.  491.  654.  433.  380.  354.
  545.  450.  481.  653.  462.  593.  437.  462.  501.  481.  416.  490.
  535.  443.  481.  690.  433.  733.  604.  377.  424. 1182.  447.  300.
  613.  259.  424.  462.  401.  240.  505.  405.  708.  593.  359.  569.
  681.  462.  654.  450.  433.  733.  274.  397.  437.  501.  359.  433.
  382.  325.  359.  410.  604.  733. 1182.  359.  359.  613.  437.  705.
  467.  401.  450.  462.  383.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [359. 462. 467. 462. 371. 380. 405. 410. 467. 462. 462. 664. 359. 442.
 593. 433. 410. 462. 467. 359. 359. 481. 467. 462. 467. 462. 462. 354.
 520. 359. 359. 371. 462. 442. 520. 380. 410. 359. 520. 467. 495. 495.
 380. 467. 505. 462. 495. 437. 410. 359. 371. 437. 397. 462. 405. 380.
 380. 505. 467. 462. 462. 462. 424. 462. 462. 467. 405. 345. 424. 467.
 495. 505. 397. 467. 467. 359. 380. 437. 505. 410. 462. 462. 359. 467.
 462. 437. 410. 359. 462. 359. 664. 593. 593. 433. 380. 462. 481. 359.
 481. 388. 462. 505. 437. 405. 462. 481. 462. 405. 462. 410. 481. 520.
 345. 664. 462. 467. 359. 520. 410. 462. 371. 345. 359. 462. 462. 462.
 433. 462. 405. 593. 359. 462. 664. 405. 380. 467. 397. 505. 380. 462.
 437. 462. 359. 345. 462. 359. 359. 410. 462. 505. 380. 359. 359. 405.
 410. 467. 462. 462. 467. 462. 467. 664. 405. 424. 505. 462. 345. 405.
 593. 380. 467. 520. 462. 505. 462. 467. 462. 410. 345. 467. 359. 462.
 437. 501. 462. 505. 359. 359. 359. 424. 371. 410. 664. 

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  798.   676.   330.   377.   472.   321.   613.   480.   270.   316.
   335.   664.   663.   521.  1112.   578.  4535.   354.   423.   514.
   299.   534.   354.   518.   619.   458.   495.   357.   313.   476.
   472.   377.   333.   530.   581.   819.   350.   578.   380.   255.
   320.   554. 11422.   366.   643.   660.   809.   463.   496.   590.
   420.   288.   369.   355.   405.   819.   703.   351.   355.   516.
   597.   407.   531.   416.   458.   398.   524.   323.   534.   603.
   665.   520.   397.   410.   296.   469.   791.   401.   733.   589.
   405.   764.   590.   296.   522.   404.   480.   652.   345.   444.
   674.  1371.   491.   376.   323.   510.   530.   461.   616.   676.
   541.   557.   418.   324.   462.   726.   487.   734.   466.   584.
   486.  4941.   645.  1542.   569.   323.   445.   617.   358.   331.
   382.   217.   523.   541.   377.   240.   406.   590.   581.  1308.
   532.   599.   959.   410.  5392.   397.   415.   542.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 5 ... 2 6 6]

 Accuracy: 
 66.2551440329218

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 91.83673469387756

 F-Score:
 0.6582514456029448


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [4 2 5 ... 5 7 4]

 Accuracy: 
 36.6549088771311

 Precision of event Happening: 
 89.28571428571429

 Recall of event Happening: 
 51.02040816326531

 F-Score:
 0.3558358047841033


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 6 6 6]

 Accuracy: 
 82.06937095825985

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 95.91836734693877

 F-Score:
 0.8182476631539937


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [2 3 6 ... 5 6 6]

 Accuracy: 
 98.91240446796003

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.95918367346938

 F-Score:
 0.9890968961149026





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.26287471224468784
Adjusted R-Squared:  0.2622239247628556
RMSE:  1.2330958597539088


Running Voting Regressor
R-Squared Value:  0.9068182376235969
Adjusted R-Squared:  0.9067359700287972
RMSE:  0.4384212989717953


-----
-----
-----
-----


##XGBoost

In [20]:
#XGBoost Classifier

def XGBImpl(df, label_col, multi=False):
  df_cpy = df.copy()
  X,y = XYsplit(df_cpy, label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  # First we need to normalize the data
  sc = StandardScaler()
  sc.fit_transform(X_train)
  sc.transform(X_test)

  scaler = StandardScaler()
  scaler.fit(X_train.fillna(0))

  # Now perform L1

  model = XGBClassifier()
  model.fit(X_train, y_train)

  # make predictions for test set
  y_pred = model.predict(X_test)
  predictions = [round(value) for value in y_pred]

  accuracy = accuracy_score(y_test, predictions)
  feature_important = model.get_booster().get_score(importance_type='weight')
  selected_feat = list(feature_important.keys())
  
  ndf = df_cpy[df_cpy.columns.intersection(selected_feat)]
  ndf[label_col] = df_cpy[label_col]

  X,y = XYsplit(ndf.copy(), label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  RunClassificationAlgorithms(X_train, X_test, y_train, y_test, verbose=True, multi=multi)
  print('\n\n')
  RunRegressionAlgorithms(X_train, X_test, y_train, y_test)

In [21]:
print('Running on dataset1')
XGBImpl(dataset1, label_dataset1, multi=False)
print('-----')
print('Running on dataset2')
XGBImpl(dataset2, label_dataset2, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
XGBImpl(dataset3, label_dataset3, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
XGBImpl(dataset4, label_dataset4, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
XGBImpl(dataset5, label_dataset5, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 83.55263157894737

 Precision of event Happening: 
 86.1788617886179

 Recall of event Happening: 
 92.98245614035088

 AUC: 
 0.7412280701754386

 Confusion Matrix: 
 [[ 21  17]
 [  8 106]]

 F-Score:
 0.8945147679324894


Running LR
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 0 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 80.26315789473685

 Preci

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression



 Accuracy: 
 75.0

 Precision of event Happening: 
 75.0

 Recall of event Happening: 
 100.0

 AUC: 
 0.5

 Confusion Matrix: 
 [[  0  38]
 [  0 114]]

 F-Score:
 0.8571428571428571


Running Random Forest
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 86.18421052631578

 Precision of event Happening: 
 87.2

 Recall of event Happening: 
 95.6140350877193

 AUC: 
 0.7675438596491229

 Confusion Matrix: 
 [[ 22  16]
 [  5 109]]

 F-Score:
 0.9121338912133891





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.19091534446076996
Adjusted R-Squared:  3.002816114531537
RMSE:  0.3894911718044526


Running Voting Regressor
R-Squared Value:  0.49031034550326147
Adjusted 

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.4

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5003855050115652

 Confusion Matrix: 
 [[4703    0]
 [1296    1]]

 F-Score:
 0.0015408320493066254


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.5

 Confusion Matrix: 
 [[4703    0]
 [1297    0]]

 F-Score:
 0.0


Running Random Forest
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.56666666666666

 Precision of event Happening: 
 67.2627235213205

 Recall of event Happening: 
 37.70239013107171

 AUC: 
 0.6632089525690307

 Confusion Matrix: 
 [[4465  238]
 [ 808  489]]

 F-Score:
 0.483201581027668





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.1269773125755176
Adjusted R-Squared:  0.12347061056745268
RMSE:  0.38460860088708904


Running Voting Regressor
R-Squared Value:  0.18774402341616248
Adjusted R-Squared:  0.1844814052675412
RMSE:  0.3709818689663022


-----
-----
Running on dataset3
Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 590.  590.  410.  456.  483.  467.  410.  357.  366.  345.  470.  463.
  590.  482.  384.  388.  463. 1308.  467.  404. 1308.  369.  566.  664.
  600.  300.  354. 1308.  410.  464.  664.  496.  467.  664.  377.  354.
  467.  532.  608.  462.  410.  855.  367. 1308.  450.  342.  597.  345.
  354.  535.  581.  495. 1308.  664.  593.  679.  664.  399.  482.  429.
  462.  320.  542. 1308.  335.  394.  520.  456.  855.  467.  410.  464.
  335.  382.  388.  366.  464.  369.  590.  366.  855.  399.  664.  306.
  350.  587.  496. 1308.  354.  422.  574.  366.  369.  797.  496.  566.
  495.  449.  494.  464.  326.  300.  417.  410. 1308.  496.  562.  566.
  410.  696.  467.  371.  377. 1308.  733.  467.  590.  494.  467.  344.
  426.  679.  576.  590.  445.  664.  505.  326.  442.  417.  442.  462.
  369.  467.  497.  369.  467.  410.  496.  512.  478.  359.  554.  467.
  369.  322.  733.  410.  377.  478.  294.  399.  410.  797.  494.  345.
 1308.  388.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 300.  517.  349.  388.  509.  466.  481.  349.  405.  389.  373.  459.
  419.  368.  387.  368.  224. 6033.  658.  361. 6033.  349.  636. 1059.
  476.  395.  347.  687.  445.  445.  464.  674. 1453.  320.  251.  421.
  767.  608.  693.  702.  251.  923.  600. 5041.  450.  315.  478.  360.
  349.  425.  906.  468.  965.  647.  684.  566.  469.  439.  552.  377.
  448.  368.  603. 6033.  407.  449.  500.  530.  612.  467.  334.  450.
  399.  437.  354.  495.  259.  589.  581.  494.  634.  504.  576.  336.
  445.  459.  619. 1201.  429.  389.  696.  445.  399.  534. 1244.  588.
  771.  679.  332.  405.  382.  300.  655.  427. 1218. 1215.  401. 1185.
  593.  587.  593.  338.  450. 6033.  612.  504. 1064.  481.  437.  345.
 1308.  887.  487.  847.  305.  416.  505.  359.  501.  679.  428.  599.
  876.  600.  664.  462.  389.  408.  489.  458.  866.  324.  550.  684.
  522.  349.  633.  593.  380.  773.  279.  440.  429.  532.  531.  366.
 1405.  655.  418.  624.  417.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  5392.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.   462.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 589.  586.  401.  426.  660.  462.  374.  356.  378.  348.  488.  463.
  549.  482.  392.  377.  434. 5392.  453.  443. 5392.  356.  864.  661.
 1206.  258.  380. 1420.  445.  440.  733.  553.  455.  665.  293.  319.
  515.  875.  677.  462.  258. 1175.  403.  814.  441.  332.  590.  332.
  347.  549.  906.  491.  816.  606.  716. 1234.  712.  359.  494.  377.
  462.  326.  888.  753.  257.  377.  584.  411.  867.  462.  401.  496.
  335.  637.  377.  388.  469.  367.  562.  380. 1291.  410.  664.  258.
  375.  855.  864. 1206.  356.  445.  887.  307.  366.  963. 1215.  559.
  518.  462.  516.  411.  279.  336.  416.  418. 4309. 1166.  558.  754.
  395.  797.  496.  390.  366.  866.  718.  504.  600.  528.  437.  294.
  448.  743. 1182.  613.  429.  602.  505.  329.  503.  394.  418.  462.
  423.  489.  589.  354.  450.  410.  576.  523.  797.  375.  556.  457.
  366.  354.  957.  410.  380.  608.  279.  399.  420.  855.  498.  355.
  912.  384.  426.  814.  341.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.61111111111111

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9858490566037736

 Confusion Matrix: 
 [[110   0]
 [  3 103]]

 F-Score:
 0.985645933014354


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 4 ... 2 4 4]

 Accuracy: 
 62.896778744415705

 Precision of event Happening: 
 94.64285714285714

 Recall of event Happening: 
 86.88524590163934

 F-Score:
 0.6008483586605413


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 4 ... 2 5 5]

 Accuracy: 
 63.53162473548084

 Precision of event Happening: 
 91.04477611940298

 Recall of event Happening: 
 100.0

 F-Score:
 0.5977668680507742


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9811554179730846
Adjusted R-Squared:  0.9809583738644382
RMSE:  0.1971520990325714


Running Voting Regressor
R-Squared Value:  0.9940497595995743
Adjusted R-Squared:  0.9939875422569843
RMSE:  0.1107835829217884


-----
-----
-----
-----


##L1

In [22]:
# L1
import pandas as pd

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

def L1(df, label_col, nestimators, multi=False):
  df_cpy = df.copy()

  X,y = XYsplit(df_cpy, label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  scaler = StandardScaler()
  scaler.fit(X_train.fillna(0))

  sel_ = SelectFromModel((RandomForestClassifier(n_estimators = nestimators)))
  sel_.fit(scaler.transform(X_train.fillna(0)), y_train)
  selected_feat = X_train.columns[(sel_.get_support())]

  X_train_selected = sel_.transform(X_train.fillna(0))
  X_test_selected = sel_.transform(X_test.fillna(0))
  X_train_selected.shape, X_test_selected.shape

  ndf = df_cpy[df_cpy.columns.intersection(selected_feat.values)]
  ndf[label_col] = df_cpy[label_col]

  X,y = XYsplit(ndf.copy(), label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  RunClassificationAlgorithms(X_train, X_test, y_train, y_test, verbose=True, multi=multi)
  print('\n\n')
  RunRegressionAlgorithms(X_train, X_test, y_train, y_test)

###Running L1 with 25 estimators

In [23]:
print('Running on dataset1')
L1(dataset1, label_dataset1, 25, multi=False)
print('-----')
print('Running on dataset2')
L1(dataset2, label_dataset2, 25, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
L1(dataset3, label_dataset3, 25, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
L1(dataset4, label_dataset4, 25, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
L1(dataset5, label_dataset5, 25, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 85.52631578947368

 Precision of event Happening: 
 87.70491803278688

 Recall of event Happening: 
 93.85964912280701

 AUC: 
 0.7719298245614036

 Confusion Matrix: 
 [[ 23  15]
 [  7 107]]

 F-Score:
 0.9067796610169491


Running LR
Prediction Vector: 
 [1 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 75.6578947368421

 Preci

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1]

 Accuracy: 
 76.31578947368422

 Precision of event Happening: 
 76.35135135135135

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5350877192982456

 Confusion Matrix: 
 [[  3  35]
 [  1 113]]

 F-Score:
 0.8625954198473282


Running Random Forest
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 85.52631578947368

 Precision of event Happening: 
 87.09677419

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.4998936848819902

 Confusion Matrix: 
 [[4702    1]
 [1297    0]]

 F-Score:
 0.0


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [1 0 0 ... 0 1 0]

 Accuracy: 
 82.35

 Precision of event Happening: 
 66.95156695156696

 Recall of event Happening: 
 36.237471087124135

 AUC: 
 0.6565222480573515

 Confusion Matrix: 
 [[4471  232]
 [ 827  470]]

 F-Score:
 0.4702351175587794





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.12464421344946319
Adjusted R-Squared:  0.12259659757449115
RMSE:  0.385122179336163


Running Voting Reg

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 590.  590.  410.  467.  487.  462.  405.  353.  366.  344.  504.  462.
  478.  482.  384.  388.  463. 1308.  428.  404. 1308.  369.  566.  664.
  600.  300.  354. 1308.  410.  478.  664.  581.  467.  637.  377.  371.
  467.  532.  855.  462.  377.  855.  399. 1308.  463.  342.  597.  345.
  354.  535.  679.  495. 1308.  664.  581.  718.  664.  399.  482.  429.
  462.  320.  542. 1308.  335.  366.  566.  456.  855.  449.  410.  464.
  335.  637.  388.  366.  512.  369.  562.  366.  855.  410.  664.  306.
  350.  608.  496. 1308.  354.  430.  497.  366.  369.  797.  712.  566.
  512.  449.  512.  512.  326.  300.  417.  410. 1308.  496.  562.  608.
  410.  608.  467.  371.  377. 1308.  733.  467.  590.  494.  467.  322.
  426.  679.  576.  590.  463.  664.  505.  326.  503.  417.  442.  462.
  369.  440.  599.  369.  467.  390.  590.  512.  523.  369.  554.  449.
  369.  322.  733.  410.  345.  478.  294.  384.  410.  797.  494.  335.
 1308.  371.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  571.   517.   349.   388.   680.   466.   410.   349.   497.   389.
   428.   404.   549.   368.   310.   368.   224. 11589.   550.  1376.
  5392.   349.   636.  1059.   476.   395.   347.   687.   445.   516.
   464.   674.   455.   320.   251.   421.   767.   608.   622.   702.
   251.   923.   600.  5041.   450.   315.   597.   360.   349.   425.
   906.   462.   965.   647.   684.   566.   805.   439.   552.   463.
   380.   368.   603.  1991.   407.   449.   500.   530.   612.   467.
   334.   488.   299.   437.   354.   495.   387.   589.   581.   383.
   932.   349.   576.   305.   445.   459.   619.  1201.   429.   389.
   696.   334.   353.   534.   569.   588.   705.   679.   639.   405.
   382.   395.   440.   427.  1215.  1215.   347.  1185.   593.   587.
   593.   563.   450.  6033.   612.   504.  1064.   481.   437.   345.
  1308.   666.   487.   459.   305.   416.   505.   359.   724.   679.
   418.   599.   876.   600.   664.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  5392.   410.   410.   410.   410.   410.   410.   462.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.  5122.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   456.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   456.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  562.   565.   406.   426.  2520.   463.   418.   354.   371.   350.
   524.   456.   552.   504.   316.   366.   471. 11504.   453.   443.
  5122.   354.   864.   677.  1206.   293.   380.  1949.   418.   429.
   733.   553.   472.   637.   305.   355.   515.   875.   847.   501.
   407.  1009.   379.   790.   420.   335.   590.   346.   329.   538.
   906.   495.  1049.   625.   615.  1234.   712.   345.   494.   366.
   456.   301.  5631.  1084.   305.   366.   567.   451.  1068.   462.
   395.   464.   293.   637.   335.   372.   541.   368.   562.   380.
   734.   404.   696.   258.   369.  2468.  1290.  1206.   350.   446.
  1218.   345.   371.   887.   963.   557.   518.   449.   523.   425.
   279.   309.   377.   427.  1216.   984.   558.  2147.   395.   824.
   501.   359.   379.   866.   731.   481.   562.   498.   467.   294.
   428.   729.   771.   613.   433.   641.   553.   326.   523.   397.
   450.   462.   375.   467.   566.   359.   388.   399.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 1
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [3 5 3 ... 3 5 5]

 Accuracy: 
 37.85563131906889

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.2449103598521767


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 5 3 ... 2 4 6]

 Accuracy: 
 65.69480366799905

 Precision of event Happening: 
 92.42424242424242

 Recall of event Happening: 
 100.0

 F-Score:
 0.646658309155232


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9540914083390023
Adjusted R-Squared:  0.9539288808726547
RMSE:  0.3077195833273474


Running Voting Regressor
R-Squared Value:  0.9926778997642013
Adjusted R-Squared:  0.9926519777666707
RMSE:  0.12289267159805733


-----
-----
-----
-----


###Running L1 with 50 estimators

In [24]:
print('Running on dataset1')
L1(dataset1, label_dataset1, 50, multi=False)
print('-----')
print('Running on dataset2')
L1(dataset2, label_dataset2, 50, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
L1(dataset3, label_dataset3, 50, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
L1(dataset4, label_dataset4, 50, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
L1(dataset5, label_dataset5, 50, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 82.23684210526315

 Precision of event Happening: 
 85.9504132231405

 Recall of event Happening: 
 91.22807017543859

 AUC: 
 0.7324561403508771

 Confusion Matrix: 
 [[ 21  17]
 [ 10 104]]

 F-Score:
 0.8851063829787233


Running LR
Prediction Vector: 
 [1 0 1 1 1 0 1 1 0 1 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1
 0 1 1 1 0 1 1 1 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 75.6578947368421

 Precis

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 77.63157894736842

 Precision of event Happening: 
 77.3972602739726

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5614035087719298

 Confusion Matrix: 
 [[  5  33]
 [  1 113]]

 F-Score:
 0.8692307692307693


Running Random Forest
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 84.86842105263158

 Precision of event Happening: 
 86.4

 Recal

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 33.33333333333333

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5001728747755456

 Confusion Matrix: 
 [[4701    2]
 [1296    1]]

 F-Score:
 0.0015384615384615382


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.6

 Prec

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 590.  590.  410.  445.  554.  449.  405.  354.  371.  344.  467.  463.
  478.  482.  366.  388.  463. 1308.  467.  404. 1308.  369.  566.  679.
  600.  300.  354. 1308.  410.  464.  664.  581.  467.  664.  377.  354.
  467.  487.  855.  462.  410.  855.  366. 1308.  463.  342.  590.  342.
  354.  535.  566.  418. 1308.  664.  593.  718.  664.  374.  482.  418.
  462.  306.  549. 1308.  279.  394.  566.  462.  855.  451.  410.  464.
  335.  382.  388.  379.  512.  369.  590.  388.  855.  404.  664.  306.
  380.  664.  496. 1308.  354.  434.  497.  366.  369.  797.  531.  566.
  495.  449.  512.  464.  326.  300.  417.  410. 1308.  496.  562.  797.
  410.  696.  467.  371.  377. 1308.  733.  467.  590.  494.  462.  344.
  426.  679.  576.  590.  463.  664.  505.  329.  442.  405.  442.  462.
  369.  462.  599.  369.  467.  410.  590.  512.  562.  369.  554.  449.
  369.  322.  733.  410.  359.  478.  279.  384.  410.  797.  494.  335.
 1308.  371.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  571.   593.   513.   542.   680.   466.   481.   455.   497.   389.
   551.   404.   549.   368.   310.   368.   391. 11589.   658.  1376.
  5392.   349.   636.  1059.   412.   395.   347.   687.   445.   446.
   464.   674.   703.   320.   251.   421.   767.   608.  1542.   522.
   251.   700.   541.  3135.   450.   411.   664.   360.   349.   425.
   906.   462.   965.   564.   684.   566.   951.   439.   552.   377.
   380.   368.   603.  1991.   407.   509.   500.   530.   612.   467.
   334.   488.   299.   437.   410.   478.   387.   384.   515.   383.
   932.   504.   576.   336.   369.   525.   619.  1201.   431.   389.
   546.   445.   379.   534.  1244.   588.   771.   679.   419.   405.
   382.   300.   341.   427.  1201.   532.   571.   527.   557.   798.
   593.   294.   418.  6033.   612.   504.   668.   481.   437.   345.
   752.   666.   487.   740.   336.   416.   745.   359.   608.   679.
   488.   599.   513.   579.   664.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  5392.   410.   410.   410.   410.   410.   410.   462.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.  5122.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   467.   410.   410.   410.
   410.   410.   410.   462.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   463.   410.   410.
   410.   456.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   456.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  589.   565.   395.   449.  2281.   462.   394.   384.   410.   350.
   491.   450.   521.   443.   341.   376.   428. 11504.   467.   421.
  5122.   371.   864.   678.  1513.   288.   347.  2254.   392.   430.
   733.   553.   467.   660.   195.   374.   589.  1134.   754.   462.
   360.  1053.   379.  1692.   456.   335.   590.   335.   334.   549.
   906.   464.  1049.   643.   608.  1234.   712.   371.   494.   366.
   342.   301.   852.   759.   305.   378.   558.   424.   960.   481.
   395.   501.   332.   637.   236.   377.   512.   398.   590.   380.
   734.   442.   696.   258.   375.   855.   799.  1323.   354.   434.
   914.   334.   354.  1275.  1059.   556.   771.   462.   505.   425.
   276.   336.   422.   418.  1147.  1174.   558.  1602.   404.   797.
   501.   359.   405.  1366.   783.   470.   624.   501.   467.   322.
   448.   729.   792.   613.   433.   602.   553.   322.   531.   394.
   450.   478.   384.   467.   589.   359.   454.   395.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 3 ... 2 7 5]

 Accuracy: 
 55.960498471667066

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.5015333633857311


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 6 3 ... 2 4 6]

 Accuracy: 
 66.09452151422526

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.6646187267663749


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9525484020669557
Adjusted R-Squared:  0.9523916483220141
RMSE:  0.31284813590268984


Running Voting Regressor
R-Squared Value:  0.9960977963734611
Adjusted R-Squared:  0.9960849056583191
RMSE:  0.08971459785456619


-----
-----
-----
-----


###Running L1 with 100 estimators

In [25]:
print('Running on dataset1')
L1(dataset1, label_dataset1, 100, multi=False)
print('-----')
print('Running on dataset2')
L1(dataset2, label_dataset2, 100, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
L1(dataset3, label_dataset3, 100, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
L1(dataset4, label_dataset4, 100, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
L1(dataset5, label_dataset5, 100, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 82.23684210526315

 Precision of event Happening: 
 85.36585365853658

 Recall of event Happening: 
 92.10526315789474

 AUC: 
 0.7236842105263158

 Confusion Matrix: 
 [[ 20  18]
 [  9 105]]

 F-Score:
 0.8860759493670887


Running LR
Prediction Vector: 
 [1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1
 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 80.26315789473685

 Prec

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 77.63157894736842

 Precision of event Happening: 
 77.3972602739726

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5614035087719298

 Confusion Matrix: 
 [[  5  33]
 [  1 113]]

 F-Score:
 0.8692307692307693


Running Random Forest
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 84.21052631578947

 Precision of event Happening: 
 86.290322580

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 33.33333333333333

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5001728747755456

 Confusion Matrix: 
 [[4701    2]
 [1296    1]]

 F-Score:
 0.0015384615384615382


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.75

 Pre

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 590.  590.  410.  456.  554.  449.  405.  367.  366.  344.  504.  463.
  478.  482.  384.  388.  463. 1308.  467.  404. 1308.  354.  552.  664.
  600.  300.  371. 1308.  410.  464.  664.  535.  467.  664.  377.  371.
  467.  532.  855.  462.  377.  855.  399. 1308.  463.  342.  597.  335.
  354.  520.  679.  495. 1308.  664.  593.  497.  664.  374.  482.  429.
  462.  320.  542. 1308.  335.  394.  554.  462.  855.  449.  410.  464.
  335.  382.  388.  366.  512.  369.  530.  379.  600.  410.  664.  306.
  380.  692.  496. 1308.  354.  430.  497.  366.  369.  797.  712.  520.
  522.  449.  512.  464.  326.  300.  417.  410. 1308.  496.  562.  797.
  410.  797.  467.  371.  377. 1308.  733.  467.  590.  494.  462.  344.
  426.  679.  576.  590.  463.  664.  505.  326.  442.  410.  442.  462.
  369.  462.  599.  369.  467.  410.  553.  512.  562.  359.  554.  449.
  369.  322.  733.  410.  345.  478.  294.  388.  410.  797.  494.  335.
 1308.  366.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  300.   513.   349.   542.   680.   664.   458.   349.   497.   389.
   476.   429.   419.   368.   310.   395.   506.  4309.  1308.  1376.
  5392.   349.   752.  1059.   708.   395.   347.   923.   522.   346.
   395.   655.   517.   320.   251.   421.   689.   608.   780.   702.
   251.   923.   600.  5041.   450.   363.   326.   360.   368.   425.
   906.   462.   965.   438.   565.   566.   951.   439.   552.   395.
   385.   368.   603.  2811.   407.   449.   501.   530.   612.   467.
   334.   450.   399.   437.   354.   294.   259.   393.   753.   470.
   634.   504.   576.   336.   424.   715.   619.   742.   578.   389.
   438.   445.   353.   453.   887.   588.   771.   504.   661.   405.
   382.   300.   655.   535.  1542.   963.   306.   253.   593.   587.
   593.   433.   450.  6182.   612.   504.   604.   481.   437.   403.
  1308.   887.   476.   876.   316.   467.   505.   354.   608.   391.
   418.   524.   777.   600.   664.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  6182.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  554.   590.   410.   440.  2281.   462.   394.   366.   376.   357.
   524.   463.   552.   504.   392.   376.   456. 11504.   453.   421.
  5122.   391.   864.   678.   965.   217.   380.  1108.   392.   445.
   733.   553.   466.   665.   293.   399.   491.   875.   641.   462.
   387.  1175.   366.   983.   420.   335.   590.   320.   335.   524.
   906.   488.  1049.   664.   608.  1244.   679.   377.   500.   369.
   456.   301.   888.   759.   257.   336.   599.   474.   742.   462.
   401.   496.   255.   712.   366.   373.   469.   356.   520.   380.
   797.   404.   696.   258.   369.  2468.  1290.  1078.   350.   445.
   811.   334.   366.  1563.   813.   553.   771.   449.   516.   425.
   279.   309.   377.   424.  1147.   959.   562.  2147.   395.   696.
   501.   361.   379.   866.   783.   453.   564.   498.   467.   294.
   451.   729.   593.   589.   440.   641.   505.   329.   530.   397.
   449.   478.   390.   467.   562.   360.   397.   399.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 1
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 3 ... 2 7 5]

 Accuracy: 
 55.960498471667066

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.5015333633857311


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 6 3 ... 2 4 6]

 Accuracy: 
 66.09452151422526

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.6646187267663749


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9525484020669557
Adjusted R-Squared:  0.9523916483220141
RMSE:  0.31284813590268984


Running Voting Regressor
R-Squared Value:  0.9960974673291897
Adjusted R-Squared:  0.9960845755270681
RMSE:  0.0897183802625382


-----
-----
-----
-----


##RFE

In [26]:
# RFE (Recursive Feature Estimator)

def RFEImpl(df, label_col, multi=False):
  df_cpy = df.copy()

  X, y = traintestsplit(df_cpy, label_col)
  clf = RandomForestClassifier()
  selector = RFE(estimator=clf, step=1)
  # selector = RFECV(estimator=rfc, step=1, cv=StratifiedKFold(10), scoring='accuracy')
  selector = selector.fit(X,y)
  feature_list = X.columns[selector.support_].tolist()

  ndf = df_cpy[df_cpy.columns.intersection(feature_list)]
  ndf[label_col] = df_cpy[label_col]

  X,y = XYsplit(ndf.copy(), label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  RunClassificationAlgorithms(X_train, X_test, y_train, y_test, verbose=True, multi=multi)
  print('\n\n')
  RunRegressionAlgorithms(X_train, X_test, y_train, y_test)

In [27]:
print('Running on dataset1')
RFEImpl(dataset1, label_dataset1, multi=False)
print('-----')
print('Running on dataset2')
RFEImpl(dataset2, label_dataset2, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
RFEImpl(dataset3, label_dataset3, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
RFEImpl(dataset4, label_dataset4, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
RFEImpl(dataset5, label_dataset5, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 0 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 83.55263157894737

 Precision of event Happening: 
 86.1788617886179

 Recall of event Happening: 
 92.98245614035088

 AUC: 
 0.7412280701754386

 Confusion Matrix: 
 [[ 21  17]
 [  8 106]]

 F-Score:
 0.8945147679324894


Running LR
Prediction Vector: 
 [0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1
 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 80.92105263157895

 Preci

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 76.97368421052632

 Precision of event Happening: 
 76.87074829931973

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5482456140350878

 Confusion Matrix: 
 [[  4  34]
 [  1 113]]

 F-Score:
 0.8659003831417624


Running Random Forest
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 85.52631578947368

 Precision of event Happening: 
 86.50793650

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.91666666666667

 Precision of event Happening: 
 73.3676975945017

 Recall of event Happening: 
 32.92212798766384

 AUC: 
 0.6481317966468031

 Confusion Matrix: 
 [[4548  155]
 [ 870  427]]

 F-Score:
 0.45449707291112296


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.5

 Confusion Matrix: 
 [[4703    0]
 [1297    0]]

 F-Score:
 0.0


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.35

 Precision of event Happening: 
 25.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5000665596575359

 Confusion Matrix: 
 [[4700    3]
 [1296    1]]

 F-Score:
 0.0015372790161414297


Running Random Forest
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.31666666666668

 Precision of event Happening: 
 66.95402298850574

 Recall of event Happening: 
 35.92906707787201

 AUC: 
 0.6551928582471105

 Confusion Matrix: 
 [[4473  230]
 [ 831  466]]

 F-Score:
 0.4676367285499248





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.12496539131304862
Adjusted R-Squared:  0.12321152204559527
RMSE:  0.3850515200289288


Running Voting Regressor
R-Squar

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 581.  590.  410.  467.  554.  449.  405.  367.  371.  344.  504.  463.
  478.  482.  366.  388.  463. 1308.  467.  404. 1308.  369.  566.  664.
  600.  306.  354. 1308.  410.  464.  664.  581.  456.  664.  377.  371.
  467.  654.  530.  462.  410.  855.  399. 1308.  450.  342.  593.  335.
  306.  535.  389.  495. 1308.  664.  590.  870.  664.  374.  482.  429.
  462.  306.  855. 1308.  306.  394.  566.  462.  855.  467.  395.  464.
  335.  562.  388.  366.  512.  369.  530.  366.  855.  410.  664.  306.
  399.  608.  496. 1308.  354.  434.  725.  366.  369.  797.  442.  566.
  535.  449.  512.  470.  326.  300.  417.  410. 1308.  496.  562.  608.
  410.  608.  467.  371.  377. 1308.  733.  467.  590.  494.  467.  344.
  426.  679.  576.  590.  463.  664.  505.  326.  503.  417.  442.  462.
  369.  467.  599.  369.  462.  399.  590.  512.  523.  369.  554.  449.
  369.  322.  733.  410.  345.  478.  294.  384.  410.  797.  495.  335.
 1308.  371.  410.  624.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  653.   593.   349.   388.   678.   466.   410.   406.   497.   389.
   551.   404.   733.   368.   387.   354.   224. 11589.   557.  1376.
  5392.   349.   636.  1059.   476.   395.   347.   923.   445.   445.
   400.   587.   455.   320.   259.   307.   689.   608.   674.   702.
   251.   923.   600.  5041.   450.   338.   478.   360.   375.   425.
   667.   468.   739.   647.   684.   566.   805.   439.   552.   307.
   385.   368.   603.  2238.   407.   449.   501.   494.   612.   467.
   300.   450.   391.   437.   354.   495.   259.   589.   498.   494.
   576.   349.   576.   294.   445.   459.   619.  1201.   578.   389.
   696.   380.   399.   534.  1244.   588.   705.   679.   332.   405.
   382.   395.   341.   427.  1201.  1215.   401.  1308.   593.   587.
   593.   306.   450.  6033.   612.   635.   604.   481.   437.   345.
  1308.   514.   407.   876.   305.   416.   707.   375.   501.   679.
   418.   599.  1068.   600.   329.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  6182.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   467.   410.   410.   410.
   410.   410.   410.  5122.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   771.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   812.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  541.   618.   406.   420.   773.   462.   394.   335.   371.   367.
   504.   467.   549.   482.   341.   366.   449. 11504.   467.   421.
   796.   372.   864.   678.   963.   288.   380.  1108.   405.   416.
   957.   553.   497.   666.   320.   374.   491.   875.   677.   462.
   410.  1053.   366.   983.   420.   332.   590.   332.   319.   588.
   839.   495.  1049.   664.   578.   873.   712.   377.   494.   366.
   456.   301.   888.   759.   279.   380.   584.   474.  1621.   467.
   401.   521.   335.   712.   366.   373.   469.   369.   566.   334.
   840.   404.   643.   258.   384.   715.   799.  1206.   354.   421.
   702.   334.   354.  1275.   963.   578.   518.   449.   509.   474.
   279.   306.   377.   410.  1022.  1174.   542.   911.   406.   797.
   501.   335.   388.  6072.   742.   467.   562.   498.   437.   294.
   451.  1059.   799.   613.   433.   602.   553.   329.   531.   397.
   444.   481.   369.   467.   562.   374.   397.   388.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.61111111111111

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9858490566037736

 Confusion Matrix: 
 [[110   0]
 [  3 103]]

 F-Score:
 0.985645933014354


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 4 ... 2 4 5]

 Accuracy: 
 54.33811427227839

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 11.475409836065573

 F-Score:
 0.5058070318262838


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 5 3 ... 2 4 5]

 Accuracy: 
 58.00611333176581

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.5519525207872491


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9840536887835358
Adjusted R-Squared:  0.9839022518299132
RMSE:  0.1813586475794066


Running Voting Regressor
R-Squared Value:  0.9938572796434351
Adjusted R-Squared:  0.9937989442174467
RMSE:  0.1125611504020774


-----
-----
-----
-----


##Random Forest

In [28]:
#Random Forest

def RFImpl(df, label_col, nestimators, multi=False):
  df_cpy = df.copy()

  X, y = XYsplit(df_cpy, label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  scaler = StandardScaler()
  scaler.fit(X_train.fillna(0))

  clf = RandomForestClassifier(n_estimators = nestimators)
  sel_ = SelectFromModel(clf)
  sel_.fit(scaler.transform(X_train.fillna(0)), y_train)

  selected_feat = X_train.columns[(sel_.get_support())]

  ndf = df_cpy[df_cpy.columns.intersection(selected_feat.values)]
  ndf[label_col] = df_cpy[label_col]

  X,y = XYsplit(ndf.copy(), label_col)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  RunClassificationAlgorithms(X_train, X_test, y_train, y_test, verbose=True, multi=multi)
  print('\n\n')
  RunRegressionAlgorithms(X_train, X_test, y_train, y_test)

###Running RF with 25 estimators

In [29]:
print('Running on dataset1')
RFImpl(dataset1, label_dataset1, 25, multi=False)
print('-----')
print('Running on dataset2')
RFImpl(dataset2, label_dataset2, 25, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
RFImpl(dataset3, label_dataset3, 25, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
RFImpl(dataset4, label_dataset4, 25, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
RFImpl(dataset5, label_dataset5, 25, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [1 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 82.23684210526315

 Precision of event Happening: 
 84.8

 Recall of event Happening: 
 92.98245614035088

 AUC: 
 0.7149122807017544

 Confusion Matrix: 
 [[ 19  19]
 [  8 106]]

 F-Score:
 0.8870292887029289


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 78.94736842105263

 Precision of even

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 77.63157894736842

 Precision of event Happening: 
 77.3972602739726

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5614035087719298

 Confusion Matrix: 
 [[  5  33]
 [  1 113]]

 F-Score:
 0.8692307692307693


Running Random Forest
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 86.8421052631579

 Precision of event Happening: 
 87.9032258064

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.4998936848819902

 Confusion Matrix: 
 [[4702    1]
 [1297    0]]

 F-Score:
 0.0


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [1 0 0 ... 0 1 0]

 Accuracy: 
 82.51666666666667

 Precision of event Happening: 
 67.41573033707866

 Recall of event Happening: 
 37.00848111025443

 AUC: 
 0.660377298173003

 Confusion Matrix: 
 [[4471  232]
 [ 817  480]]

 F-Score:
 0.47784967645594817





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.12464421344946319
Adjusted R-Squared:  0.12259659757449115
RMSE:  0.385122179336163


Running

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 505.  590.  410.  429.  487.  462.  405.  367.  371.  344.  504.  462.
  478.  482.  384.  377.  442. 1308.  467.  404. 1308.  354.  566.  664.
  600.  300.  354. 1308.  410.  464.  664.  581.  467.  637.  306.  354.
  467.  923.  608.  462.  410.  855.  399. 1308.  450.  342.  590.  335.
  354.  535.  581.  495. 1308.  664.  590.  543.  664.  374.  482.  429.
  462.  320.  855. 1308.  335.  394.  566.  462.  855.  442.  410.  464.
  335.  637.  388.  366.  512.  369.  599.  366.  855.  410.  664.  300.
  354.  692.  496. 1308.  344.  434.  497.  366.  371.  797.  553.  566.
  512.  449.  512.  512.  326.  300.  417.  410. 1308.  496.  562.  797.
  410.  696.  467.  371.  366. 1308.  733.  467.  590.  494.  467.  344.
  426.  679.  576.  590.  463.  664.  505.  326.  522.  417.  463.  462.
  369.  464.  599.  369.  462.  410.  535.  512.  562.  369.  554.  449.
  369.  354.  733.  410.  345.  478.  279.  384.  410.  797.  494.  335.
 1308.  366.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  571.   593.   349.   388.   739.   442.   446.   513.   497.   389.
   428.   459.   549.   368.   505.   354.   391.  5631.   349.  1376.
  5392.   349.   605.  1059.   708.   270.   347.   923.   445.   320.
   464.   674.   517.   718.   259.   888.   447.  1621.   693.   702.
   251.   923.   541.  3135.   450.   315.   597.   360.   418.   425.
   552.   462.   965.   647.   684.   566.   549.   424.   552.   389.
   522.   368.   603.  2811.   407.   449.   500.   516.   612.   467.
   334.   501.   391.   437.   259.   495.   259.   384.   581.   380.
   932.   504.   576.   305.   501.   459.   715.  1201.   578.   389.
   696.   380.   353.   534.   569.   588.   771.   679.   661.   405.
   375.   437.   440.   427.  1201.  1215.   467.  1185.   593.   587.
   593.   563.   450.  6182.   612.   504.   604.   450.   437.   468.
  1308.   666.   792.   459.   305.   416.   505.   359.   608.   679.
   449.   599.   876.   600.   406.   462.   395.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  5392.   410.   410.   410.   410.   410.   410.   462.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.  5041.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   467.   410.   410.   410.
   410.   410.   410.   463.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   456.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   456.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  541.   581.   426.   402.   773.   462.   394.   354.   371.   371.
   524.   463.   558.   440.   316.   366.   439. 11504.   453.   443.
  5122.   354.  1234.   672.  1206.   288.   380.   923.   392.   462.
   733.   553.   472.   676.   251.   373.   491.   875.  2764.   462.
   395.  1053.   403.  1692.   420.   332.   590.   332.   354.   549.
  1949.   491.  1049.   663.   608.   855.   669.   384.   494.   366.
   462.   301.   888.  1127.   257.   378.   599.   474.   675.   462.
   422.   493.   335.   637.   388.   377.   509.   367.   590.   379.
   734.   443.   696.   260.   369.   638.   864.  1206.   350.   446.
   660.   334.   371.  1292.   963.   578.   771.   462.   505.   436.
   320.   309.   455.   418.   839.  2149.   558.   754.   410.   855.
   501.   361.   380.  7893.   731.   464.   564.   498.   437.   294.
   434.   729.  1445.   613.   442.   643.   553.   293.   530.   405.
   418.   462.   375.   467.   562.   359.   454.   378.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 0 1 0 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 6 5 ... 2 3 6]

 Accuracy: 
 71.80813543381143

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.7040379130022956


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 6 4 ... 2 4 6]

 Accuracy: 
 78.86197977897955

 Precision of event Happening: 
 68.53932584269663

 Recall of event Happening: 
 100.0

 F-Score:
 0.7703029090942005


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9659959203010423
Adjusted R-Squared:  0.9658755376728893
RMSE:  0.2648339532149425


Running Voting Regressor
R-Squared Value:  0.9967947042842414
Adjusted R-Squared:  0.9967833567657765
RMSE:  0.08130967069632011


-----
-----
-----
-----


###Running RF with 50 estimators

In [30]:
print('Running on dataset1')
RFImpl(dataset1, label_dataset1, 50, multi=False)
print('-----')
print('Running on dataset2')
RFImpl(dataset2, label_dataset2, 50, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
RFImpl(dataset3, label_dataset3, 50, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
RFImpl(dataset4, label_dataset4, 50, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
RFImpl(dataset5, label_dataset5, 50, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 1 1]

 Accuracy: 
 84.86842105263158

 Precision of event Happening: 
 87.60330578512396

 Recall of event Happening: 
 92.98245614035088

 AUC: 
 0.7675438596491228

 Confusion Matrix: 
 [[ 23  15]
 [  8 106]]

 F-Score:
 0.902127659574468


Running LR
Prediction Vector: 
 [0 0 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1
 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 81.57894736842105

 Preci

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1]

 Accuracy: 
 76.31578947368422

 Precision of event Happening: 
 76.35135135135135

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5350877192982456

 Confusion Matrix: 
 [[  3  35]
 [  1 113]]

 F-Score:
 0.8625954198473282


Running Random Forest
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 86.8421052631579

 Precision of event Happening: 
 88.524590163

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.4998936848819902

 Confusion Matrix: 
 [[4702    1]
 [1297    0]]

 F-Score:
 0.0


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.73333333333333

 Precision of event Happening: 
 68.25174825174825

 Recall of event Happening: 
 37.62528912875867

 AUC: 
 0.6639929138555731

 Confusion Matrix: 
 [[4476  227]
 [ 809  488]]

 F-Score:
 0.485089463220676





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.12464421344946319
Adjusted R-Squared:  0.12259659757449115
RMSE:  0.385122179336163


Running 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 505.  590.  410.  429.  554.  462.  405.  354.  371.  344.  504.  463.
  478.  482.  366.  377.  442. 1308.  467.  404. 1308.  354.  566.  664.
  600.  300.  371. 1308.  410.  464.  664.  581.  467.  664.  347.  371.
  467.  532.  855.  462.  410.  855.  384. 1308.  463.  342.  593.  335.
  306.  520.  389.  495. 1308.  664.  599.  718.  664.  374.  482.  388.
  462.  306.  797. 1308.  335.  394.  566.  462.  855.  451.  410.  464.
  335.  382.  388.  379.  512.  369.  599.  388.  855.  410.  664.  306.
  352.  692.  496. 1308.  354.  434.  497.  366.  369.  797.  712.  566.
  495.  449.  512.  470.  326.  300.  417.  410. 1308.  496.  562.  797.
  410.  797.  467.  371.  377. 1308.  733.  467.  590.  494.  462.  322.
  426.  497.  576.  590.  429.  664.  505.  326.  503.  417.  449.  462.
  369.  462.  557.  369.  467.  410.  553.  512.  478.  369.  520.  449.
  369.  354.  733.  410.  377.  478.  279.  384.  410.  797.  495.  335.
 1308.  366.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  571.   513.   349.   542.   509.   664.   458.   349.   497.   389.
   476.   429.   419.   368.   310.   368.   391. 11589.  1308.  1376.
  5392.   349.   752.  1059.   708.   395.   347.   687.   445.   346.
   395.   655.   694.   320.   251.   421.   534.   664.   780.   454.
   251.   923.   388.  5041.   450.   363.   326.   360.   368.   425.
   906.   462.   965.   438.   565.   566.   951.   439.   552.   395.
   599.   368.  1348.  1991.   407.   449.   686.   530.   612.   467.
   334.   489.   259.   437.   354.   294.   259.   393.   753.   445.
  1010.   504.   576.   294.   369.   459.   619.   742.   429.   389.
   438.   445.   353.   632.   887.   588.   705.   504.   554.   405.
   382.   300.   440.   326.  1542.   963.   306.   253.   557.   587.
   593.   433.   450.  2758.   612.   504.   604.   481.   437.   411.
  1308.   887.   476.   459.   316.   467.   505.   354.   501.   391.
   418.   524.   876.   600.   664.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  6182.   410.   410.   410.   410.   410.   410.   462.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.  5041.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   456.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   487.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  541.   581.   406.   467.   773.   462.   418.   366.   371.   357.
   491.   450.   558.   488.   341.   377.   430. 11504.   453.   421.
  1308.   354.   864.   672.   965.   217.   380.  1108.   405.   429.
   733.   553.   473.   649.   293.   346.   491.   875.  2764.   462.
   388.   643.   371.   503.   441.   335.   590.   332.   319.   524.
   906.   495.  1049.   663.   618.   933.   692.   409.   494.   377.
   493.   301.   596.  2281.   257.   336.   558.   444.   783.   462.
   401.   531.   335.   712.   335.   394.   469.   371.   619.   334.
  1362.   442.   725.   258.   392.   715.   799.  1078.   354.   419.
  2520.   334.   335.   711.   870.   584.   771.   442.   520.   433.
   279.   300.   413.   445.   885.  1174.   562.  2147.   395.   797.
   496.   371.   379. 12903.   887.   467.   562.   528.   437.   294.
   410.   798.   612.   584.   426.   602.   553.   354.   531.   397.
   467.   486.   384.   467.   578.   344.   397.   424.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [6 5 3 ... 2 7 5]

 Accuracy: 
 55.960498471667066

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.5015333633857311


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 6 3 ... 2 4 6]

 Accuracy: 
 66.09452151422526

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.6646187267663749


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9525484020669557
Adjusted R-Squared:  0.9523916483220141
RMSE:  0.31284813590268984


Running Voting Regressor
R-Squared Value:  0.9960973062492998
Adjusted R-Squared:  0.9960844139150596
RMSE:  0.08972023183930915


-----
-----
-----
-----


###Running RF with 100 estimators

In [31]:
print('Running on dataset1')
RFImpl(dataset1, label_dataset1, 100, multi=False)
print('-----')
print('Running on dataset2')
RFImpl(dataset2, label_dataset2, 100, multi=False)
print('-----')
print('-----')
print('Running on dataset3')
RFImpl(dataset3, label_dataset3, 100, multi=True)
print('-----')
print('-----')
print('-----')
print('Running on dataset4')
RFImpl(dataset4, label_dataset4, 100, multi=False)
print('-----')
print('-----')
print('-----')
print('Running on dataset5')
RFImpl(dataset5, label_dataset5, 100, multi=True)
print('-----')
print('-----')
print('-----')
print('-----')

Running on dataset1


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 1 0 1]

 Accuracy: 
 84.21052631578947

 Precision of event Happening: 
 87.5

 Recall of event Happening: 
 92.10526315789474

 AUC: 
 0.7631578947368421

 Confusion Matrix: 
 [[ 23  15]
 [  9 105]]

 F-Score:
 0.8974358974358975


Running LR
Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 76.97368421052632

 Precision of even

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
 1 1 1 1]

 Accuracy: 
 77.63157894736842

 Precision of event Happening: 
 77.3972602739726

 Recall of event Happening: 
 99.12280701754386

 AUC: 
 0.5614035087719298

 Confusion Matrix: 
 [[  5  33]
 [  1 113]]

 F-Score:
 0.8692307692307693


Running Random Forest
Prediction Vector: 
 [0 0 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1
 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 0 1 1 1 0 1 0 1 1 0 1 1
 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1
 1 0 1 1]

 Accuracy: 
 87.5

 Precision of event Happening: 
 88.6178861788618

 Recall

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 ... 0 1 0]

 Accuracy: 
 82.89999999999999

 Precision of event Happening: 
 73.32185886402753

 Recall of event Happening: 
 32.84502698535081

 AUC: 
 0.647746291635238

 Confusion Matrix: 
 [[4548  155]
 [ 871  426]]

 F-Score:
 0.4536741214057508


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.36666666666666

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 AUC: 
 0.4998936848819902

 Confusion Matrix: 
 [[4702    1]
 [1297    0]]

 F-Score:
 0.0


Running SVM
Prediction Vector: 
 [0 0 0 ... 0 0 0]

 Accuracy: 
 78.38333333333334

 Precision of event Happening: 
 50.0

 Recall of event Happening: 
 0.07710100231303006

 AUC: 
 0.5002791898935554

 Confusion Matrix: 
 [[4702    1]
 [1296    1]]

 F-Score:
 0.001539645881447267


Running Random Forest
Prediction Vector: 
 [1 0 0 ... 0 1 0]

 Accuracy: 
 82.45

 Precision of event Happening: 
 67.32954545454545

 Recall of event Happening: 
 36.54587509637625

 AUC: 
 0.6582768983396317

 Confusion Matrix: 
 [[4473  230]
 [ 823  474]]

 F-Score:
 0.47376311844077956





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.12464421344946319
Adjusted R-Squared:  0.12259659757449115
RMSE:  0.385122179336163


Running Voting Reg

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [ 590.  590.  410.  456.  554.  462.  405.  353.  366.  344.  504.  462.
  478.  482.  384.  388.  463. 1308.  467.  404. 1308.  369.  552.  664.
  600.  300.  354. 1308.  410.  464.  664.  581.  467.  637.  377.  371.
  467.  532.  608.  462.  377.  855.  399. 1308.  463.  342.  597.  335.
  354.  535.  566.  495. 1308.  664.  593.  497.  664.  399.  501.  429.
  462.  326.  382. 1308.  279.  394.  566.  456.  855.  449.  410.  464.
  335.  637.  388.  366.  512.  369.  597.  366.  855.  410.  664.  306.
  350.  587.  496. 1308.  354.  410.  725.  366.  369.  797.  442.  566.
  512.  449.  512.  512.  326.  300.  417.  410. 1308.  496.  562.  797.
  410.  696.  467.  371.  377. 1308.  733.  467.  590.  494.  467.  344.
  426.  679.  576.  590.  463.  664.  505.  329.  442.  410.  442.  462.
  369.  440.  599.  369.  462.  410.  590.  512.  562.  369.  554.  449.
  369.  322.  733.  410.  345.  478.  279.  374.  410.  797.  495.  335.
 1308.  371.  410.  664.  371.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  314.   401.   334.   395.   678.   457.   429.   368.   353.   389.
   551.   459.   549.   368.   387.   342.   391.  6072.  1308.  1072.
  5392.   349.   605.  1059.   493.   307.   347.   923.   445.   446.
   505.   674.   515.   625.   222.   606.   493.   645.   640.   702.
   251.   923.   600.  2520.   450.   315.   504.   360.   375.   425.
   906.   462.   965.   733.   684.   566.   805.   439.   552.   307.
   422.   368.   802.  1991.   407.   413.   593.   530.  1085.   467.
   395.   462.   299.   611.   354.   316.   387.   440.   684.   383.
   965.   467.   616.   336.   445.   459.   634.  1206.   354.   305.
   696.   334.   335.   493.  1244.   522.   771.   679.   558.   405.
   382.   401.   420.   424.  1215.  1215.   347.   774.   388.   587.
   593.   260.   462.  6033.   612.   906.  1064.   481.   437.   342.
  1308.   718.   533.   876.   733.   416.   647.   359.   564.   679.
   608.   599.   876.   363.   428.   462.   389.   408.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410. 11504.   410.   410.
  5392.   410.   410.   410.   410.   410.   410.   467.   410.   410.
   410.   410.   410.   410.   410.   410.   463.   410.   410.   410.
   410.   410.   410.  5041.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.  1583.   410.   410.
   410.   456.   410.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   456.   410.   410.   410.   410.   410.   410.   410.
   410.   410.   410.   410.   410.   410.   410.   410.

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [  562.   586.   406.   426.   840.   462.   418.   356.   371.   346.
   491.   467.   553.   483.   392.   376.   430. 11504.   467.   421.
  5122.   384.   864.   651.  1112.   259.   380.  1108.   418.   429.
   733.   553.   455.   665.   195.   366.   503.  1134.   798.   462.
   395.   643.   366.   503.   441.   332.   590.   326.   334.   549.
   906.   488.  1182.   663.   618.  1234.   664.   388.   504.   377.
   462.   326.   611.   759.   257.   335.   567.   411.   975.   462.
   401.   496.   355.   814.   388.   378.   509.   370.   597.   334.
  1073.   423.   643.   258.   392.   715.  1290.  1064.   354.   419.
   725.   334.   358.  1275.  1505.   566.   771.   449.   516.   514.
   279.   306.   278.   424.  5392.  1166.   558.  2147.   395.   855.
   501.   359.   377.  2764.   887.   467.   624.   528.   437.   322.
   451.   729.   799.   613.   429.   654.   553.   326.   531.   405.
   449.   456.   390.   461.   566.   374.   388.   399.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


-----
-----
-----
Running on dataset4
Running Classification Algorithms
Running XgBoost
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0
 1 0 1 0 1 1 1 1 1 0 0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1
 0 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 0 0
 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0
 1 1 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 1]

 Accuracy: 
 98.14814814814815

 Precision of event Happening: 
 99.03846153846155

 Recall of event Happening: 
 97.16981132075472

 AUC: 
 0.9813036020583191

 Confusion Matrix: 
 [[109   1]
 [  3 103]]

 F-Score:
 0.9809523809523809


Running LR
Prediction Vector: 
 [0 0 0 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0 1 1 1 0 0 0 0
 0 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 1 0 0
 0 0 1 0 1 1 1 1 1 0 0 1 1 0 0 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Running Classification Algorithms
Running XgBoost


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0


Running LR


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [4 5 3 ... 2 5 5]

 Accuracy: 
 45.074065365624264

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.37260801590617393


Running SVM


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [7 5 3 ... 3 4 5]

 Accuracy: 
 43.24006583588055

 Precision of event Happening: 
 0.0

 Recall of event Happening: 
 0.0

 F-Score:
 0.35081541055373494


Running Random Forest


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


Prediction Vector: 
 [9 6 4 ... 2 3 6]

 Accuracy: 
 100.0

 Precision of event Happening: 
 100.0

 Recall of event Happening: 
 100.0

 F-Score:
 1.0





Running Regression Algorithms
Running Linear Regression
R-Squared Value:  0.9553086198818425
Adjusted R-Squared:  0.9551398139135019
RMSE:  0.30361276980676033


Running Voting Regressor
R-Squared Value:  0.984887813020904
Adjusted R-Squared:  0.9848307320502558
RMSE:  0.1765516618382713


-----
-----
-----
-----
