##**Housekeeping note:**
This problem set is due after return from spring break, on March 31.  Please remember final project proposals are also due 3/31.  Please send your project proposals via email to me at dfenster@umbc.edu.  Refer to the project guidelines [here](https://drive.google.com/file/d/183eg4ccE9QgIer8irM-Mjco-vsw27nKZ/view?usp=sharing).

# Preamble
This problem set is an extension of Problem Set 6.  You will need the following artifacts from last week:

* The MNIST 784 dataset from OpenML, with dimensionality reduced to about 75\%.
* The Support Vector Machine classifier.
* Recoded target variables, such that the target variable is 1 if the digit is less than 5, and 0 otherwise.

(You may copy these artifacts from the posted solutions if needed.)  As with last week, please use the first 60,000 observations as training data, and the remaining 10,000 images as test data.

In [1]:
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.svm import SVC

In [2]:
#fetch OpenML data
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

#split into test/training sets
N=60000
X_train, y_train = X[:N, :], y[:N]
X_test, y_test = X[N:, :], y[N:]

#scale data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [3]:
from sklearn.decomposition import PCA
pca_75= PCA(n_components=0.75,random_state=2020)
X_train= pca_75.fit_transform(X_train)
X_test = pca_75.transform(X_test)

In [4]:
clf_svc = SVC(coef0=0.3864834922270453, degree=4, kernel='poly')

In [5]:
recode_fn = lambda y: np.choose(np.isin(y, list("01234")), [-1,1])
y_test_rcd, y_train_rcd = (recode_fn(y) for y in [y_test, y_train])

In [6]:
clf_svc.fit(X_train, y_train_rcd)

SVC(coef0=0.3864834922270453, degree=4, kernel='poly')

# Problem 1 -- Classifiers

Construct 3 classifiers using different algorithms, not including the SVM model built last week, that classify the MNIST dataset with an $F_1$ score of at least 0.9.  At least one classifier must use gradient boosting (AdaBoost, Gradient Boost, or xgboost).  Show the $F_1$ score and classification report for each model.

## Using Gradient Boosting Classifier

In [60]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
gb_clf = GradientBoostingClassifier(random_state=0)
params = {
    "max_features": ["sqrt", None]
}
gscv=GridSearchCV(gb_clf,params,verbose=3,cv=2,scoring='f1')

In [62]:
gscv.fit(X_train[:10000, :], y_train_rcd[:10000])

Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.893 total time=   3.2s
[CV 2/2] END .................max_features=sqrt;, score=0.893 total time=   2.1s
[CV 1/2] END .................max_features=None;, score=0.907 total time=  21.5s
[CV 2/2] END .................max_features=None;, score=0.910 total time=  23.0s


GridSearchCV(cv=2, estimator=GradientBoostingClassifier(random_state=0),
             param_grid={'max_features': ['sqrt', None]}, scoring='f1',
             verbose=3)

In [64]:
print("Best Score using Gradient Boosting Classifier",gscv.best_score_)

Best Score using Gradient Boosting Classifier 0.9084856530206052


## Using Logistic Regression

In [18]:
from sklearn.model_selection import GridSearchCV

cvalues = np.logspace(-4, 4, 5)
cvalues
parameters = {'C':cvalues
              
              }

In [66]:
from sklearn.linear_model import LogisticRegression
model= LogisticRegression(max_iter=10000)
gscv1=GridSearchCV(model,parameters,verbose=3,cv=2,scoring='f1')

In [69]:
gscv1.fit(X_train[:20000, :], y_train_rcd[:20000])

Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.834 total time=   0.2s
[CV 2/2] END ..........................C=0.0001;, score=0.828 total time=   0.2s
[CV 1/2] END ............................C=0.01;, score=0.867 total time=   0.5s
[CV 2/2] END ............................C=0.01;, score=0.850 total time=   0.5s
[CV 1/2] END .............................C=1.0;, score=0.869 total time=   2.2s
[CV 2/2] END .............................C=1.0;, score=0.851 total time=   1.5s
[CV 1/2] END ...........................C=100.0;, score=0.869 total time=   2.2s
[CV 2/2] END ...........................C=100.0;, score=0.852 total time=   2.4s
[CV 1/2] END .........................C=10000.0;, score=0.869 total time=   2.1s
[CV 2/2] END .........................C=10000.0;, score=0.852 total time=   3.0s


GridSearchCV(cv=2, estimator=LogisticRegression(max_iter=10000),
             param_grid={'C': array([1.e-04, 1.e-02, 1.e+00, 1.e+02, 1.e+04])},
             scoring='f1', verbose=3)

In [70]:
print("Best Score using Logistic Regression:",gscv1.best_score_)

Best Score using Logistic Regression: 0.8607403447643879


In [None]:
pred_lr= cross_val_predict(gscv1,X_train[:10000, :], y_train_rcd[:10000], verbose=3, cv=5)

## Using RandomForest Classifier

In [21]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV

In [54]:
params = {
    
    "max_features": ['auto', 'sqrt', 'log2']
}



In [55]:
model1= RandomForestClassifier()

In [56]:
x=RandomizedSearchCV(model1,params,verbose=3,cv=2,scoring='f1')

In [57]:
x.fit(X_train[:10000, :], y_train_rcd[:10000])



Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.920 total time=   4.7s
[CV 2/2] END .................max_features=auto;, score=0.930 total time=   4.7s
[CV 1/2] END .................max_features=sqrt;, score=0.925 total time=   4.7s
[CV 2/2] END .................max_features=sqrt;, score=0.922 total time=   4.7s
[CV 1/2] END .................max_features=log2;, score=0.919 total time=   3.1s
[CV 2/2] END .................max_features=log2;, score=0.929 total time=   3.6s


RandomizedSearchCV(cv=2, estimator=RandomForestClassifier(),
                   param_distributions={'max_features': ['auto', 'sqrt',
                                                         'log2']},
                   scoring='f1', verbose=3)

In [71]:
print("Best Score using Random Forest Classifier:",x.best_score_)

Best Score using Random Forest Classifier: 0.9248543592513794


# Problem 2 -- Voting ensemble model

Build a voting ensemble model that combines the three classifiers from the previous problem, in addition to the SVM model developed last week.  What is the $F_1$ score of the ensemble model?

In [96]:
from sklearn.ensemble import VotingClassifier

models = [
    ("svm", clf_svc),
    ("gb",gscv),
    ("lr",gscv1),
    ("rf",x)
]

In [97]:
from sklearn.model_selection import cross_val_predict
clf_v = VotingClassifier(
    estimators=models)

In [98]:
X_train_sample,y_train_rcd_sample=X_train[:1000, :], y_train_rcd[:1000]

In [99]:
y_pred = cross_val_predict(clf_v,X_train_sample,y_train_rcd_sample, verbose=3, cv=5)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.818 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.830 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.797 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.860 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.765 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.756 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.830 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.837 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.800 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.815 total time=   0.0s
[CV 1/2] END ...........................C=100.0;, score=0.767 total tim



[CV 1/2] END .................max_features=auto;, score=0.837 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.849 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.859 total time=   0.4s
[CV 2/2] END .................max_features=sqrt;, score=0.841 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.834 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.824 total time=   0.3s


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   11.1s remaining:    0.0s


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.835 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.814 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.827 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.828 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.743 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.739 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.804 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.816 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.807 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.806 total time=   0.1s
[CV 1/2] END ...........................C=100.0;, score=0.762 total tim



[CV 1/2] END .................max_features=auto;, score=0.805 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.844 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.839 total time=   0.4s
[CV 2/2] END .................max_features=sqrt;, score=0.842 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.821 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.836 total time=   0.3s


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   22.1s remaining:    0.0s


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.839 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.822 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.820 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.811 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.756 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.734 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.816 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.812 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.782 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.833 total time=   0.0s
[CV 1/2] END ...........................C=100.0;, score=0.758 total tim



Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.852 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.804 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.850 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.828 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.834 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.836 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.891 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.851 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.899 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.831 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END



[CV 1/2] END .................max_features=auto;, score=0.870 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.820 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.873 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.850 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.883 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.827 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.873 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.814 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.896 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.804 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.752 total tim



Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.826 total time=   0.7s
[CV 2/2] END .................max_features=auto;, score=0.810 total time=   0.7s
[CV 1/2] END .................max_features=sqrt;, score=0.861 total time=   0.6s
[CV 2/2] END .................max_features=sqrt;, score=0.820 total time=   0.5s
[CV 1/2] END .................max_features=log2;, score=0.824 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.791 total time=   0.3s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   56.8s finished


In [118]:
from sklearn.metrics import f1_score
Score=f1_score(y_train_rcd_sample, y_pred)
print("F1 Score using Voting Ensemble is",Score)

F1 Score using Voting Ensemble is 0.9067201604814442


## Problem 3 -- Stacking ensemble model
Stacking uses a final classifier (often a logistic regression) that outputs an aggregate of the predictors. Repeat the previous problem using a `StackingClassifier` rather than voting to compute the final prediction.  What is the $F_1$ score of the stacking classifier?


In [106]:
from sklearn.ensemble import StackingClassifier

models = [
    ("svm", clf_svc),
    ("gb",gscv),
    ("lr",gscv1),
    ("rf",x)
]

In [111]:
from sklearn.model_selection import cross_val_predict
clf_s = StackingClassifier(
    estimators=models,
    final_estimator=LogisticRegression())

In [112]:
y_pred_2 = cross_val_predict(clf_s,X_train_sample,y_train_rcd_sample, verbose=3, cv=5)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.818 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.830 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.797 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.860 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.765 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.756 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.830 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.837 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.800 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.815 total time=   0.0s
[CV 1/2] END ...........................C=100.0;, score=0.767 total tim



[CV 1/2] END .................max_features=auto;, score=0.838 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.845 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.829 total time=   0.4s
[CV 2/2] END .................max_features=sqrt;, score=0.813 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.845 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.821 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.848 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.801 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.798 total time=   1.1s
[CV 2/2] END .................max_features=None;, score=0.854 total time=   1.1s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.842 total time



[CV 1/2] END .................max_features=auto;, score=0.823 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.791 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.818 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.837 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.804 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.839 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.846 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.832 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.827 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.871 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.849 total time=   0.2s
[CV 2/2] END .................max_features=log2;, score=0.825 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.838 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.801 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.808 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.805 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.859 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.824 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.791 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.778 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.764 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.759 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.787 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.752 total time=   0.2s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.773 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.805 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.777 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.803 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.771 total time=   0.2s
[CV 2/2] END .................max_features=log2;, score=0.777 total time=   0.2s


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   46.2s remaining:    0.0s


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.835 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.814 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.827 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.828 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.743 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.739 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.804 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.816 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.807 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.806 total time=   0.0s
[CV 1/2] END ...........................C=100.0;, score=0.762 total tim



[CV 1/2] END .................max_features=auto;, score=0.859 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.823 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.840 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.831 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.851 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.810 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.828 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.814 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.779 total time=   1.0s
[CV 2/2] END .................max_features=None;, score=0.850 total time=   1.0s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.837 total time



[CV 1/2] END .................max_features=auto;, score=0.838 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.854 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.799 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.841 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.808 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.824 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.788 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.819 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.838 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.836 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.825 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.845 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.826 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.830 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.836 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.813 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.809 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.812 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.810 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.772 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.804 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.783 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.826 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.778 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.824 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.771 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.811 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.779 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.757 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.801 total time=   0.3s


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.5min remaining:    0.0s


Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.839 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.822 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.820 total time=   1.4s
[CV 2/2] END .................max_features=None;, score=0.811 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END ..........................C=0.0001;, score=0.756 total time=   0.0s
[CV 2/2] END ..........................C=0.0001;, score=0.734 total time=   0.0s
[CV 1/2] END ............................C=0.01;, score=0.816 total time=   0.0s
[CV 2/2] END ............................C=0.01;, score=0.812 total time=   0.0s
[CV 1/2] END .............................C=1.0;, score=0.782 total time=   0.1s
[CV 2/2] END .............................C=1.0;, score=0.833 total time=   0.1s
[CV 1/2] END ...........................C=100.0;, score=0.758 total tim



[CV 1/2] END .................max_features=auto;, score=0.880 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.843 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.848 total time=   0.4s
[CV 2/2] END .................max_features=sqrt;, score=0.822 total time=   0.4s
[CV 1/2] END .................max_features=log2;, score=0.827 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.809 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.824 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.836 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.821 total time=   1.0s
[CV 2/2] END .................max_features=None;, score=0.844 total time=   1.0s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.856 total time



[CV 1/2] END .................max_features=auto;, score=0.863 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.842 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.840 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.851 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.819 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.848 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.842 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.826 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.851 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.854 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.834 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.869 total time=   0.2s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.817 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.852 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.816 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.856 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.819 total time=   0.2s
[CV 2/2] END .................max_features=log2;, score=0.820 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.830 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.814 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.813 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.799 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.799 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.807 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.863 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.823 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.828 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.820 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.866 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.799 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.891 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.851 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.899 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.831 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END



[CV 1/2] END .................max_features=auto;, score=0.884 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.830 total time=   0.4s
[CV 1/2] END .................max_features=sqrt;, score=0.874 total time=   0.4s
[CV 2/2] END .................max_features=sqrt;, score=0.841 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.870 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.835 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.865 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.794 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.859 total time=   1.0s
[CV 2/2] END .................max_features=None;, score=0.809 total time=   1.0s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.878 total time



Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.845 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.806 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.831 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.792 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.850 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.786 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.854 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.829 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.863 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.825 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.893 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.818 total time=   0.2s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.876 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.837 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.852 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.820 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.834 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.834 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.843 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.870 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.844 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.863 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.842 total time=   0.2s
[CV 2/2] END .................max_features=log2;, score=0.861 total time=   0.2s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.857 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.846 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.846 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.827 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.869 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.841 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.873 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.814 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.896 total time=   1.3s
[CV 2/2] END .................max_features=None;, score=0.804 total time=   1.3s
Fitting 2 folds for each of 5 candidates, totalling 10 fits
[CV 1/2] END



[CV 1/2] END .................max_features=auto;, score=0.830 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.799 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.858 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.788 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.846 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.816 total time=   0.3s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.809 total time=   0.2s
[CV 2/2] END .................max_features=sqrt;, score=0.792 total time=   0.2s
[CV 1/2] END .................max_features=None;, score=0.784 total time=   1.0s
[CV 2/2] END .................max_features=None;, score=0.784 total time=   1.0s
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV 1/2] END .................max_features=sqrt;, score=0.828 total time



Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.796 total time=   0.4s
[CV 2/2] END .................max_features=auto;, score=0.787 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.780 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.791 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.766 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.748 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.831 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.776 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.791 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.773 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.764 total time=   0.2s
[CV 2/2] END .................max_features=log2;, score=0.774 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.842 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.772 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.833 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.795 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.818 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.788 total time=   0.3s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.816 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.822 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.832 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.841 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.849 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.822 total time=   0.2s




Fitting 2 folds for each of 3 candidates, totalling 6 fits
[CV 1/2] END .................max_features=auto;, score=0.839 total time=   0.3s
[CV 2/2] END .................max_features=auto;, score=0.821 total time=   0.3s
[CV 1/2] END .................max_features=sqrt;, score=0.871 total time=   0.3s
[CV 2/2] END .................max_features=sqrt;, score=0.805 total time=   0.3s
[CV 1/2] END .................max_features=log2;, score=0.838 total time=   0.3s
[CV 2/2] END .................max_features=log2;, score=0.802 total time=   0.2s


[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:  3.8min finished


In [119]:
from sklearn.metrics import f1_score
Score1=f1_score(y_train_rcd_sample, y_pred_2)
print("F1 Score using Stacking Ensemble Model is:",Score1)

F1 Score using Stacking Ensemble Model is: 0.9214145383104125


### F1 Score using Gradient Boosting Classifier is: 0.9084856530206052
### F1 Score using Logistic Regression Classifer is: 0.8607403447643879
### F1 Score using Random Forest Classifier is: 0.9248543592513794
### F1 Score using Voting Ensemble Model is : 0.9067201604814442
### F1 Score using Stacking Ensemble Model is: 0.9214145383104125

## Problem 4 -- Evaluation

At this point in the assignment, you have six classifiers:

* the support vector classifier from last week,
* the three classifiers from problem 1,
* the voting classifier from problem 2, and
* the stacking classifier from problem 3

Identify the model with the highest $F_1$ score, and train this model with the full training dataset.  Finally, score the test data against this model.  Does the model demonstrate predictive validity (i.e., are the $F_1$ scores for the test data comparable to the training data)?

In [121]:
final = RandomForestClassifier()
final.fit(X_train, y_train_rcd)

RandomForestClassifier()

In [123]:
from sklearn.metrics import classification_report
pred_final = final.predict(X_test)
print(classification_report(y_test_rcd, pred_final))

              precision    recall  f1-score   support

          -1       0.95      0.96      0.96      4861
           1       0.96      0.96      0.96      5139

    accuracy                           0.96     10000
   macro avg       0.96      0.96      0.96     10000
weighted avg       0.96      0.96      0.96     10000



In [126]:
finalf1_score = f1_score(y_test_rcd, pred_final)
print("F1-score: ",finalf1_score)

F1-score:  0.9581060015588465
