# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint
### Not for Grading

## Learning Objective

At the end of the experiment, you will be able to:

* classify data using a bagging classifier

## Dataset

### Description

The dataset consists of the following 7 columns:

- **species:** penguin species (Chinstrap, Adélie, or Gentoo)
- **culmen length & depth:** length and depth of the culmen, the upper ridge of a bird's beak (two columns)
- **flipper_length_mm:** flipper length in millimetres
- **body_mass_g:** body mass in grams
- **island:** island name (Dream, Torgersen, or Biscoe)
- **sex:** penguin sex

In [None]:
!wget -qq https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/penguins.zip
!unzip -qq penguins.zip

#### Loading the `penguins_size.csv` data

In [None]:
import numpy as np
import pandas as pd

df = pd.read_csv('penguins_size.csv')
df.head()

### Data Pre-Processing

#### Exercise 02: Data Cleaning

*  Count the NaN values in each column of the dataframe
*  Drop the records where the sex column has NaN values
    *   Print the unique values from the sex column
*  Drop the records where the sex column has '.' values
    * Print the unique values after removing records with '.'

In [None]:
# Count NaN values in each column of the dataframe
df.isna().sum()

In [None]:
df['island'].unique()

In [None]:
df['species'].unique()

In [None]:
# Drop the records where the sex column has NaN values
df = df.dropna(subset=['sex'])

# Print the unique values from the sex column
print("Unique values after dropping NaN values:", df.sex.unique())

# Drop the records where the sex column has '.' values;
# .copy() avoids a SettingWithCopyWarning when columns are reassigned later
df = df[df.sex != '.'].copy()

print("Unique values after removing records with '.':", df.sex.unique())

#### Converting categorical values to numerical

In [None]:
from sklearn import preprocessing 
LE = preprocessing.LabelEncoder()

In [None]:
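# LabelEncoder assigns integer codes following the sorted order of the original values.
# The same LE object is refit for each column, so afterwards it only
# remembers the classes of the last column encoded (species).
df['island'] = LE.fit_transform(df['island'])
df['sex'] = LE.fit_transform(df['sex'])
df['species'] = LE.fit_transform(df['species'])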
df.head()
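
Because the single `LE` object above is refit for every column, it cannot later translate the integer codes of `island` or `sex` back to their names. A minimal sketch of one way around this keeps one encoder per column. The toy rows and the `encoders` dictionary are illustrative assumptions and are not part of the experiment:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows for illustration only; not taken from penguins_size.csv
toy = pd.DataFrame({
    'species': ['Gentoo', 'Adelie', 'Chinstrap', 'Adelie'],
    'island': ['Biscoe', 'Torgersen', 'Dream', 'Dream'],
    'sex': ['MALE', 'FEMALE', 'FEMALE', 'MALE'],
})

# Keep a separate fitted encoder for every categorical column
encoders = {}
for col in ['species', 'island', 'sex']:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# Codes follow the sorted order of the original values
species_names = list(encoders['species'].classes_)

# Map integer codes back to the original labels
decoded = list(encoders['island'].inverse_transform(toy['island']))
```

With the encoders stored this way, predicted species codes could be turned back into names with `encoders['species'].inverse_transform(...)`.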

In [None]:
df.sex.unique()

In [None]:
df['species'].unique()

In [None]:
df['island'].unique()

In [None]:
# Alternative (not preferred): encode a temporary copy of the dataframe instead of df itself
# df_encode = df.copy()
# df_encode['island'] = LE.fit_transform(df_encode['island'])
# df_encode['sex'] = LE.fit_transform(df_encode['sex'])
# df_encode['species'] = LE.fit_transform(df_encode['species'])
# df.head()

#### Use **species** as the target label and the remaining columns as features


In [None]:
X = df.drop(['species'], axis=1)
y = df['species']

#### Split the data into train and test sets




In [None]:
from sklearn.model_selection import train_test_split

# Split data into training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
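
The split above is purely random, so the species proportions in the test set may differ somewhat from those in the full data. As a hedged aside, `train_test_split` also accepts a `stratify` argument that preserves class proportions. The sketch below uses made-up labels (`X_toy` and `y_toy` are hypothetical names) rather than the penguin data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 60 of class 0, 20 of class 1, 20 of class 2
y_toy = np.array([0] * 60 + [1] * 20 + [2] * 20)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # placeholder feature column

# stratify=y_toy asks for the same class proportions in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

# Per-class counts in each subset
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
```

Applied to this experiment, that would amount to passing `stratify=y` in the cell above; whether it changes the reported accuracies is not something the original notebook measures.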

#### Train and evaluate a bagging classifier with different base estimators




In [None]:
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

# Instantiate classifiers
knn = KNN(3)
dt = DecisionTreeClassifier(max_depth=2)
svm = SVC()

classifiers = [('KNN', knn), ('Decision_Tree', dt), ('SVM', svm)]

# Using different classifiers as the base estimator
for clf_name, clf in classifiers:
    # Instantiate bagging classifier
    # (the parameter is `estimator` in scikit-learn >= 1.2; older releases call it `base_estimator`)
    model = BaggingClassifier(estimator=clf, bootstrap=True)
    
    # Fit model on training dataset
    model.fit(X_train, y_train)
    
    # Prediction on test dataset
    y_pred = model.predict(X_test)
    
    # Evaluate the accuracy of the bagged model on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy_score(y_test, y_pred)))
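
To make explicit what bagging with `bootstrap=True` does: each base model is trained on a sample drawn with replacement from the training set, and the ensemble combines their predictions. scikit-learn averages predicted probabilities when the base estimator provides them and falls back to voting otherwise. The sketch below is a simplified pure-Python illustration that uses plain majority voting, a hypothetical 1-nearest-neighbour base learner, and made-up one-dimensional data. It is not scikit-learn's implementation:

```python
import random
from collections import Counter

def bootstrap_sample(points, labels, rng):
    """Draw len(points) training examples with replacement."""
    n = len(points)
    idx = [rng.randrange(n) for _ in range(n)]
    return [points[i] for i in idx], [labels[i] for i in idx]

def predict_1nn(train_points, train_labels, query):
    """Label of the training point closest to the query (1-nearest neighbour)."""
    nearest = min(range(len(train_points)),
                  key=lambda i: abs(train_points[i] - query))
    return train_labels[nearest]

def bagging_predict(points, labels, query, n_estimators=11, seed=0):
    """Majority vote over base learners, each fit on its own bootstrap sample."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        sample_points, sample_labels = bootstrap_sample(points, labels, rng)
        votes.append(predict_1nn(sample_points, sample_labels, query))
    return Counter(votes).most_common(1)[0][0]

# Made-up, well-separated 1-D data: class 0 near 0, class 1 near 10
points = [0.0, 0.2, 0.4, 0.6, 0.8, 10.0, 10.2, 10.4, 10.6, 10.8]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

pred_low = bagging_predict(points, labels, query=0.5)
pred_high = bagging_predict(points, labels, query=10.5)
```

Because every bootstrap sample leaves out some training points and repeats others, the individual learners differ slightly, and combining them tends to reduce the variance of the final prediction.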