A voting classifier is a machine learning estimator that combines the predictions of several base models (estimators) into a single final prediction. Aggregating the outputs of multiple estimators is intended to make the final prediction more accurate and reliable than that of a single model.

There are two main approaches to implementing a voting classifier:

1. **Majority Voting:** Each base model in the ensemble predicts a class label for a given test instance, and the final prediction is decided by counting the votes for each label. Under a strict majority rule, a label is chosen only if it receives more than half of the votes; if no label does, the ensemble has not reached a consensus for that instance. Many implementations instead use a plurality rule: the label with the most votes wins, and ties are broken by a fixed convention.

2. **Weighted Voting:** Unlike simple majority voting, weighted voting assigns each model a weight that reflects how reliable or accurate its predictions are considered to be. Each model's vote counts in proportion to its weight, so the predictions of better-performing models have more influence on the result. With integer weights, this is equivalent to counting a model's vote several times.
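The two voting rules above can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular library: the function names `majority_vote`, `plurality_vote` and `weighted_vote` are hypothetical, and the labels 2 and 4 simply mirror the values of the `Class` column in the dataset loaded below.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label backed by more than half of the votes, or None if no label is."""
    label, count = Counter(predictions).most_common(1)[0]
    # Strict majority: the leading label must hold more than half of all votes
    return label if count > len(predictions) / 2 else None


def plurality_vote(predictions):
    """Return the label with the most votes; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Add each model's weight to the label it predicted and return the top label."""
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return totals.most_common(1)[0][0]


print(majority_vote([2, 4, 2]))              # 2
print(majority_vote([2, 4, 2, 4]))           # None (2 of 4 votes is not more than half)
print(plurality_vote([2, 4, 2, 4]))          # 2 (tie broken by first label seen)
print(weighted_vote([2, 4, 4], [3, 1, 1]))   # 2 (weight 3 for label 2 vs. 2 for label 4)
```

The last call shows how a single heavily weighted model can outvote two lighter ones, which a plain majority vote would not allow.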

Ensemble methods such as the voting classifier aim to combine the strengths of several classifiers, which are often individually weak. Instead of relying on a single classifier whose ability to generalize may be limited, an ensemble trains several classifiers and combines their outputs. Ideally, the classifiers make different errors, for example because each performs best on a different part of the input space, so that the mistakes of one model are outvoted by the others. This diversity is what allows a voting classifier to improve accuracy and robustness; if all base models make the same mistakes, voting gains nothing.
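A short calculation illustrates why diverse, independent errors matter. Assuming, purely for illustration, `n` classifiers that are each correct with the same probability `p` and whose errors are independent, a strict majority vote is correct with probability $\sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}$. Real base models are usually correlated, so this is an idealized upper picture rather than a prediction for any specific ensemble. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of n independent classifiers is correct.

    Assumes every classifier is right with the same probability p_correct and
    that their errors are independent; ties (possible for even n) count as wrong.
    """
    needed = n_models // 2 + 1  # smallest vote count that is more than half
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
# prints: 1 0.7, then 3 0.784, then 5 0.837
```

With `p = 0.7`, going from one model to three raises the accuracy of the vote from 0.7 to 0.784. The same formula also shows the flip side: if each model is correct less than half the time, voting makes the ensemble worse, not better.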

# *Importing Libraries*

```python
import pandas as pd
import numpy as np
```

# *Importing Dataset*

```python
# Load the breast cancer dataset from a local CSV file and display it
Data_set = pd.read_csv("cancer.csv")
Data_set
```

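| | Sample Code Number | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 694 | 776715 | 3 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 |
| 695 | 841769 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 |
| 696 | 888820 | 5 | 10 | 10 | 3 | 7 | 3 | 8 | 10 | 2 | 4 |
| 697 | 897471 | 4 | 8 | 6 | 4 | 3 | 4 | 10 | 6 | 1 | 4 |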


```python
Data_set.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 11 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   Sample Code Number           699 non-null    int64
 1   Clump Thickness              699 non-null    int64
 2   Uniformity of Cell Size      699 non-null    int64
 3   Uniformity of Cell Shape     699 non-null    int64
 4   Marginal Adhesion            699 non-null    int64
 5   Single Epithelial Cell Size  699 non-null    int64
 6   Bare Nuclei                  699 non-null    object
 7   Bland Chromatin              699 non-null    int64
 8   Normal Nucleoli              699 non-null    int64
 9   Mitoses                      699 non-null    int64
 10  Class                        699 non-null    int64
dtypes: int64(10), object(1)
memory usage: 60.2+ KB
```
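The `info()` output lists `Bare Nuclei` as `object` while every other column is `int64`, which usually means the column contains entries that are not numbers. A minimal way to find them, sketched here on a small hypothetical stand-in Series rather than the real file, is to let `pd.to_numeric(..., errors="coerce")` turn every unparseable value into NaN and then look at the originals:

```python
import pandas as pd

# Hypothetical stand-in for Data_set["Bare Nuclei"]; read_csv stores this column as strings
bare_nuclei = pd.Series(["1", "10", "?", "3", "?"], name="Bare Nuclei")

# Values that cannot be parsed as numbers become NaN under errors="coerce"
as_numbers = pd.to_numeric(bare_nuclei, errors="coerce")
# Keep the original strings at the positions that failed to parse
non_numeric = bare_nuclei[as_numbers.isna()]

print(non_numeric.value_counts())
```

On the real data, `Data_set["Bare Nuclei"]` would take the place of the stand-in Series.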


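```python
# Select the "Bare Nuclei" values that are not the '?' missing-value marker
# (a boolean mask avoids dropping rows in place from a column taken out of Data_set)
Input = Data_set.loc[Data_set["Bare Nuclei"] != "?", "Bare Nuclei"]
# Convert to a plain Python list; the values are still strings such as '1' and '10'
Input = Input.tolist()
```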
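The cell above filters only the `Bare Nuclei` values, so the resulting list no longer lines up row by row with the other columns of `Data_set`. If the whole table is to be used for training, one option is to declare `'?'` as a missing-value marker when reading the CSV (the `na_values` parameter of `pd.read_csv`), drop the incomplete rows from the full DataFrame, and convert the column to integers. The sketch below uses a hypothetical three-row excerpt in place of `cancer.csv` and assumes `Class` is the target column.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt with a subset of the cancer.csv columns
csv_text = """Clump Thickness,Bare Nuclei,Class
5,1,2
3,?,2
6,4,4
"""

# na_values="?" makes read_csv parse '?' as NaN, so "Bare Nuclei" is read as numeric
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
# Drop incomplete rows from the whole table so features and labels stay aligned,
# then turn the remaining float values (1.0, 4.0) into integers
df = df.dropna(subset=["Bare Nuclei"]).astype({"Bare Nuclei": int})

# Assumption: "Class" is the target column to predict
X = df.drop(columns="Class")
y = df["Class"]
print(X["Bare Nuclei"].tolist())  # [1, 4]
print(y.tolist())                 # [2, 4]
```

Dropping rows from the DataFrame rather than from a single column keeps every feature value paired with its label.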