<h2 align="center">Codebasics ML Course: Ensemble Learning: Voting Classifier</h2>

### Problem Statement: Classify raisins into one of two categories:
1. Kecimen
1. Besni

### Dataset Citation
This dataset is used under citation guidelines from the original authors. For detailed study and dataset description, see the following references:

- **Citation**: Cinar, I., Koklu, M., & Tasdemir, S. (2020). Classification of Raisin Grains Using Machine Vision and Artificial Intelligence Methods. *Gazi Journal of Engineering Sciences, 6*(3), 200-209. DOI: [10.30855/gmbd.2020.03.03](https://doi.org/10.30855/gmbd.2020.03.03)
- **Dataset available at**: [Murat Koklu's Dataset Page](https://www.muratkoklu.com/datasets/)
- **Article download**: [DergiPark](https://dergipark.org.tr/tr/download/article-file/1227592)


In [1]:
import pandas as pd

df = pd.read_excel("Raisin_Dataset.xlsx")
df.sample(5)

| | Area | MajorAxisLength | MinorAxisLength | Eccentricity | ConvexArea | Extent | Perimeter | Class |
|---|---|---|---|---|---|---|---|---|
| 328 | 57310 | 351.396481 | 209.881228 | 0.802035 | 59225 | 0.699833 | 940.413 | Kecimen |
| 347 | 61967 | 364.784018 | 218.566173 | 0.800625 | 63724 | 0.6873 | 981.059 | Kecimen |
| 552 | 204864 | 596.639802 | 440.497127 | 0.674476 | 209457 | 0.751009 | 1726.246 | Besni |
| 602 | 74966 | 465.360816 | 209.279646 | 0.893172 | 78159 | 0.712415 | 1157.089 | Besni |
| 889 | 79058 | 454.437216 | 236.964252 | 0.853285 | 82555 | 0.578256 | 1175.034 | Besni |


In [2]:
df.shape

(900, 8)

The dataset has 900 records, each with 7 numeric features plus the `Class` label. Using all of the available features, we will first build a classification model with a support vector machine (SVM).

### Train Test Split

In [3]:
X = df[["Area", "MajorAxisLength", "MinorAxisLength", "Eccentricity", "ConvexArea", "Extent", "Perimeter"]]
y = df["Class"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

### Model Training Using SVM: RBF Kernel

In [5]:
from sklearn.svm import SVC

model = SVC(kernel="rbf")
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import classification_report

report = classification_report(y_test, y_pred)
print(report)

# model.n_iter_

              precision    recall  f1-score   support

       Besni       0.86      0.75      0.80        83
     Kecimen       0.81      0.90      0.85        97

    accuracy                           0.83       180
   macro avg       0.83      0.82      0.82       180
weighted avg       0.83      0.83      0.83       180



### Train different models and use majority voting

In [6]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Create the base models
log_model = LogisticRegression()
dt_model = DecisionTreeClassifier()
svm_model = SVC(probability=True)  # probability=True is only needed for soft voting

# Create a voting classifier
voting_clf = VotingClassifier(
    estimators=[('lr', log_model), ('dt', dt_model), ('svm', svm_model)],
    voting='hard')  # 'hard' = majority vote on predicted labels; use 'soft' to average predicted probabilities

voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('dt', DecisionTreeClassifier()),
                             ('svm', SVC(probability=True))])
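
To make the idea of hard (majority) voting concrete, the sketch below rebuilds the ensemble's prediction by hand from the predictions of its fitted base models. It is only an illustration: it uses synthetic data from `make_classification` instead of the raisin dataset, and the names `X_demo`, `demo_clf`, `individual_preds`, and `manual_vote` are made up for this example. The `max_iter` and `random_state` settings are also assumptions added to keep the demo reproducible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 7 features (an assumption, not the raisin dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=7, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

demo_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC()),
    ],
    voting="hard",
)
demo_clf.fit(Xd_train, yd_train)

# One row of predictions per fitted base model: shape (3, n_test_samples)
individual_preds = np.array([est.predict(Xd_test) for est in demo_clf.estimators_])

# Labels are 0/1 here, so a sample is class 1 when at least 2 of the 3 models say so
manual_vote = (individual_preds.sum(axis=0) >= 2).astype(int)

ensemble_preds = demo_clf.predict(Xd_test)
print("Manual vote matches VotingClassifier:", (manual_vote == ensemble_preds).all())
```

With three voters and two classes a tie cannot occur, which is why a simple "at least 2 of 3" rule is enough in this sketch.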

In [7]:
y_pred = voting_clf.predict(X_test)

report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

       Besni       0.93      0.83      0.88        83
     Kecimen       0.87      0.95      0.91        97

    accuracy                           0.89       180
   macro avg       0.90      0.89      0.89       180
weighted avg       0.90      0.89      0.89       180



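As the report shows, performance improved over the standalone SVC model trained earlier: accuracy rose from 0.83 to 0.89, and precision, recall, and F1-score increased for both classes. Note that `DecisionTreeClassifier()` is created without a fixed `random_state`, so the exact numbers may vary slightly between runs.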
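
One way to check a claim like this more systematically is to score every base model and the ensemble on the same split. The helper below is a minimal sketch of that idea: `compare_with_ensemble` is a hypothetical function written for this example (it is not part of scikit-learn), and the demo runs on synthetic data rather than the raisin dataset, so its numbers say nothing about the results above.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_with_ensemble(estimators, X_tr, X_te, y_tr, y_te):
    """Return test accuracy for each base estimator and for hard/soft voting."""
    scores = {}
    for name, est in estimators:
        # clone() gives a fresh, unfitted copy so the originals stay untouched
        fitted = clone(est).fit(X_tr, y_tr)
        scores[name] = accuracy_score(y_te, fitted.predict(X_te))
    for mode in ("hard", "soft"):
        ensemble = VotingClassifier(estimators=estimators, voting=mode)
        ensemble.fit(X_tr, y_tr)
        scores[f"voting_{mode}"] = accuracy_score(y_te, ensemble.predict(X_te))
    return scores


# Synthetic stand-in data (an assumption, not the raisin dataset)
X_syn, y_syn = make_classification(n_samples=400, n_features=7, random_state=1)
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=1
)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=1)),
    ("svm", SVC(probability=True, random_state=1)),  # predict_proba is needed for soft voting
]

scores = compare_with_ensemble(base_models, Xs_tr, Xs_te, ys_tr, ys_te)
for name, acc in scores.items():
    print(f"{name:12s} {acc:.3f}")
```

To try it on the notebook's data, the same call could be made with `X_train, X_test, y_train, y_test` from the split above. An ensemble is not guaranteed to beat every base model, so a side-by-side table like this is a useful sanity check.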