### If we train diverse classification models and then combine their predictions into a single classifier, the combined classifier can predict better than the individual models. This project illustrates that point.

### Note that this is a conditional statement. For example, if the sample is very large (say 1M points), SVC alone works better than the ensemble.
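
One way to see why combining models can help, and why the gain is conditional, is to compute the accuracy of a majority vote under an idealized assumption: every model is correct with the same probability `p` and the models err independently. The function name `majority_vote_accuracy` and the values of `p` below are illustrative only and are not part of this project's results.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n voters is correct.

    Assumes (idealized) that each voter is correct with probability p
    and that voters err independently. n should be odd to avoid ties.
    """
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Three independent voters at a few illustrative accuracy levels.
for p in (0.6, 0.85, 0.9):
    print(p, round(majority_vote_accuracy(p, 3), 4))
```

Models trained on the same data tend to make correlated errors, so the real gain is usually smaller than this independence calculation suggests, and one strong model can still beat the ensemble.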

In [1]:
N=10**3

In [2]:
from sklearn.datasets import make_moons

In [3]:
X, y = make_moons(n_samples=N, noise=.3, random_state=42)

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

We will now set up three different models: logistic regression, a support vector classifier, and a random forest.

In [6]:
from sklearn.linear_model import LogisticRegression

In [7]:
from sklearn.svm import SVC

In [8]:
from sklearn.ensemble import RandomForestClassifier

In [9]:
rfc_clf=RandomForestClassifier()

In [10]:
svc_clf=SVC(probability=True)  # probability=True enables predict_proba, which soft voting needs

In [11]:
log_clf=LogisticRegression()

Now we will train an ensemble model with 'hard' voting: each model predicts a class label for every sample, and the ensemble outputs the label that receives the most votes.
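
As a minimal sketch of the majority rule itself, the snippet below counts votes by hand with NumPy. The prediction matrix is made up for illustration, and `hard_vote` is a hypothetical helper, not scikit-learn's internal code.

```python
import numpy as np

# Made-up class labels from three classifiers (rows) on five samples (columns).
preds = np.array([
    [0, 1, 1, 0, 1],  # classifier A
    [0, 1, 0, 0, 1],  # classifier B
    [1, 1, 1, 0, 0],  # classifier C
])

def hard_vote(preds):
    """Return the most-voted class per sample for integer class labels."""
    n_classes = int(preds.max()) + 1
    # counts[c, j] = number of classifiers that voted class c for sample j
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    # argmax breaks ties in favor of the lowest class label
    return counts.argmax(axis=0)

hard_votes = hard_vote(preds)
print(hard_votes)
```

With three voters and two classes a tie cannot occur; with an even number of voters the tie-breaking rule matters.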

In [12]:
from sklearn.ensemble import VotingClassifier

In [13]:
vot_clf=VotingClassifier(
    estimators=[('lr',log_clf),('rf',rfc_clf),('sv',svc_clf)],
    voting='hard'
    )

In [14]:
vot_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()),
                             ('sv', SVC(probability=True))])

In [15]:
from sklearn.metrics import accuracy_score, recall_score,precision_score

In [16]:
# Fit each model on the training set and report its test-set metrics
for clsfr in (rfc_clf, svc_clf, log_clf,vot_clf):
    clsfr.fit(X_train,y_train)
    y_pred=clsfr.predict(X_test)
    print(clsfr.__class__.__name__,'\t'
          'accuracy=', accuracy_score(y_test,y_pred),'\t' 
          'recall=',recall_score(y_test,y_pred),'\t'
          'precision=',precision_score(y_test,y_pred)
         )

RandomForestClassifier 	accuracy= 0.90448 	recall= 0.8997148288973384 	precision= 0.9100961538461538
SVC 	accuracy= 0.91332 	recall= 0.909458174904943 	precision= 0.9181127548980408
LogisticRegression 	accuracy= 0.85392 	recall= 0.8530576679340938 	precision= 0.8569950660512494
VotingClassifier 	accuracy= 0.91036 	recall= 0.9068441064638784 	precision= 0.9148885159434188


Now we will train an ensemble model with 'soft' voting: the ensemble averages the class probabilities predicted by each model and outputs the class with the highest average probability.
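
The averaging step can be sketched by hand as follows. The probabilities are made up for illustration (they are not outputs of the models in this notebook), and the last sample is chosen so that averaging probabilities and counting label votes disagree.

```python
import numpy as np

# Made-up predicted probabilities with shape (n_models, n_samples, n_classes).
probas = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55], [0.2, 0.8]],  # model A
    [[0.8, 0.2], [0.3, 0.7], [0.45, 0.55], [0.6, 0.4]],  # model B
    [[0.3, 0.7], [0.6, 0.4], [0.10, 0.90], [0.6, 0.4]],  # model C
])

# Soft voting: average the probabilities over models, then take the best class.
avg_proba = probas.mean(axis=0)
soft_votes = avg_proba.argmax(axis=1)

# For comparison, majority vote over each model's own predicted label (binary case).
labels = probas.argmax(axis=2)
hard_votes = (labels.sum(axis=0) > labels.shape[0] / 2).astype(int)

print("soft:", soft_votes)
print("hard:", hard_votes)
```

On the last sample, two models lean weakly toward class 0 while one is confident in class 1, so the averaged probability favors class 1 even though the label majority favors class 0.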

In [17]:
vot_clf_soft=VotingClassifier(
    estimators=[('lr',log_clf),('rf',rfc_clf),('sv',svc_clf)],
    voting='soft'
    )

In [20]:
for clsfr in [vot_clf_soft]:
    clsfr.fit(X_train,y_train)
    y_pred=clsfr.predict(X_test)
    print(clsfr.__class__.__name__,'\t'
          'accuracy=', accuracy_score(y_test,y_pred),'\t' 
          'recall=',recall_score(y_test,y_pred),'\t'
          'precision=',precision_score(y_test,y_pred)
         )

VotingClassifier 	accuracy= 0.91012 	recall= 0.9070025348542459 	precision= 0.9143176555138545
