**Machine Learning Lab - CSE 432**

# 4 Classification 01

**Classification** and **regression** are two types of supervised machine learning tasks. In classification, the goal is to assign a discrete label to an input, such as "spam" or "not spam" for an email. In regression, the goal is to predict a continuous value, such as the price of a house or the age of a person. Both tasks require a training dataset of input-output pairs and a learning algorithm that finds a function mapping the inputs to the outputs. A classifier is typically evaluated with metrics such as accuracy, precision, recall, or F1-score, while a regressor is evaluated with metrics such as mean squared error or R-squared.
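
To make the two metric families concrete, here is a minimal sketch that computes accuracy for a toy classification result and mean squared error for a toy regression result in plain Python. The labels and prices are invented for illustration and are not taken from the zoo dataset.

```python
# Toy classification result: discrete labels (invented for illustration)
true_labels = ["spam", "not spam", "spam", "not spam"]
pred_labels = ["spam", "spam", "spam", "not spam"]

# Accuracy = fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)

# Toy regression result: continuous values (invented house prices, in thousands)
true_prices = [200.0, 150.0, 320.0]
pred_prices = [210.0, 140.0, 300.0]

# Mean squared error = average of the squared differences
mse = sum((t - p) ** 2 for t, p in zip(true_prices, pred_prices)) / len(true_prices)

print(accuracy, mse)
```

Accuracy only makes sense when predictions are either right or wrong, whereas mean squared error measures how far off a numeric prediction is.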

**4.1 Importing pandas**

In [1]:
import pandas as pd

**4.2 Importing Data Set**

The zoo dataset (https://archive.ics.uci.edu/dataset/111/zoo) will be used for this task. The dataset describes a number of animals, each assigned to one of seven categories (types).

**Features**
   1. animal name:   Unique for each instance
   2. hair:          Boolean
   3. feathers:      Boolean
   4. eggs:          Boolean
   5. milk:          Boolean
   6. airborne:      Boolean
   7. aquatic:       Boolean
   8. predator:      Boolean
   9. toothed:       Boolean
   10. backbone:     Boolean
   11. breathes:     Boolean
   12. venomous:     Boolean
   13. fins:         Boolean
   14. legs:         Numeric (set of values: {0,2,4,5,6,8})
   15. tail:         Boolean
   16. domestic:     Boolean
   17. catsize:      Boolean

**Class**

   18. type:		        Numeric (integer values in range [1,7])
  
       1 (41) aardvark, antelope, bear, boar, buffalo, calf,
              cavy, cheetah, deer, dolphin, elephant,
              fruitbat, giraffe, girl, goat, gorilla, hamster,
              hare, leopard, lion, lynx, mink, mole, mongoose,
              opossum, oryx, platypus, polecat, pony,
              porpoise, puma, pussycat, raccoon, reindeer,
              seal, sealion, squirrel, vampire, vole, wallaby, wolf
       2 (20) chicken, crow, dove, duck, flamingo, gull, hawk,
              kiwi, lark, ostrich, parakeet, penguin, pheasant,
              rhea, skimmer, skua, sparrow, swan, vulture, wren
       3 (5)  pitviper, seasnake, slowworm, tortoise, tuatara
       4 (13) bass, carp, catfish, chub, dogfish, haddock,
              herring, pike, piranha, seahorse, sole, stingray, tuna
       5 (4)  frog, frog, newt, toad
       6 (8)  flea, gnat, honeybee, housefly, ladybird, moth, termite, wasp
       7 (10) clam, crab, crayfish, lobster, octopus,
              scorpion, seawasp, slug, starfish, worm

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Importing the data

df = pd.read_csv('/content/drive/MyDrive/ML Lab/Weak 4/zoo.csv')
df

Unnamed: 0,aardvark,1,0,0.1,1.1,0.2,0.3,1.2,1.3,1.4,1.5,0.4,0.5,4,0.6,0.7,1.6,1.7
0,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
1,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
2,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
3,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
4,buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
96,wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
97,wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
98,worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7


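The file has no header row, so pandas has used the first record (aardvark) as the column names, and that record is missing from the data rows. We shall add the column headers manually.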

In [None]:
column_headers = ['animal_name', 'hair', 'feathers', 'eggs', 'milk', 'airborne', 'aquatic', 'predator', 'toothed', 'backbone', 'breathes', 'venomous', 'fins', 'legs', 'tail', 'domestic', 'catsize', 'type']
df.columns = column_headers
df

Unnamed: 0,animal_name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,type
0,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
1,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
2,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
3,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
4,buffalo,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
96,wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
97,wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
98,worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7
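
Renaming the columns does not bring back the aardvark record that was consumed as the header. One possible alternative, sketched here on a two-record in-memory sample instead of the real file, is to pass `header=None` together with `names=` so that every record is kept as data. The names `sample_csv` and `df_sample` are assumptions for illustration; with the real file you would pass the path used above instead of `sample_csv`.

```python
import io
import pandas as pd

column_headers = ['animal_name', 'hair', 'feathers', 'eggs', 'milk', 'airborne',
                  'aquatic', 'predator', 'toothed', 'backbone', 'breathes',
                  'venomous', 'fins', 'legs', 'tail', 'domestic', 'catsize', 'type']

# Two records in the same layout as zoo.csv (no header line)
sample_csv = io.StringIO(
    "aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1\n"
    "antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1\n"
)

# header=None tells pandas that the first line is data, not column names
df_sample = pd.read_csv(sample_csv, header=None, names=column_headers)
print(df_sample[['animal_name', 'legs', 'type']])
```

With this call the first record stays in the DataFrame and no separate renaming step is needed.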


**4.3 Creating Feature and Class Set**

We will use every column except animal_name and type as a feature. Copy the contents of the column_headers list to features and remove those two column names.

In [None]:
features = ['hair', 'feathers', 'eggs', 'milk', 'airborne', 'aquatic', 'predator', 'toothed', 'backbone', 'breathes', 'venomous', 'fins', 'legs', 'tail', 'domestic', 'catsize']
X = df[features]
y = df['type']

The 'type' column is our class label.

In [None]:
y = df['type']
y

**4.4 Installing scikit-learn**

scikit-learn is a Python library that provides simple and efficient tools for predictive data analysis. It offers algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It is built on NumPy, SciPy, and matplotlib and is distributed under a BSD license. The minimum Python version depends on the scikit-learn release: older releases required Python 3.7 or newer, and recent releases require a later version, so check the installation guide for the release you install.

In [None]:
#!pip install -U scikit-learn

**4.5 Train Test Split**

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=16)
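
The class sizes listed in section 4.2 are uneven (for example, only 4 animals of type 5), so a purely random split can leave a class with very few or no test samples. One option, not used in this lab, is the `stratify` argument of `train_test_split`, which keeps the class proportions roughly equal in both parts. The sketch below uses invented toy labels; `X_toy` and `y_toy` are illustrative names.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Invented toy data: 16 samples of class 1 and 8 samples of class 2
X_toy = [[i] for i in range(24)]
y_toy = [1] * 16 + [2] * 8

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=16, stratify=y_toy
)

# Both parts keep the 2:1 class ratio of the full data
print(Counter(y_tr), Counter(y_te))
```

Stratification needs at least two samples in every class, which holds for the class sizes listed in section 4.2.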

**4.6 Classification**

Most of the classification tasks we will perform consist of the following basic steps:
1. Import the necessary library or class or method
2. Build or create model
3. Train model
4. Test model
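
The four steps can be wrapped in a small helper so that the same pattern is reused for every classifier in this section. This is only a sketch: `fit_and_predict` is a hypothetical helper name, not part of scikit-learn, and the toy data is invented.

```python
# Step 1: import the necessary class
from sklearn.tree import DecisionTreeClassifier


def fit_and_predict(model, X_train, y_train, X_test):
    """Train any scikit-learn style classifier and return its test predictions."""
    model.fit(X_train, y_train)   # Step 3: train model
    return model.predict(X_test)  # Step 4: test model


# Invented toy data: one feature that separates the two classes perfectly
X_train_toy = [[0], [0], [1], [1]]
y_train_toy = [0, 0, 1, 1]
X_test_toy = [[0], [1]]

# Step 2: build or create model
model = DecisionTreeClassifier(random_state=16)
predictions = fit_and_predict(model, X_train_toy, y_train_toy, X_test_toy)
print(predictions)
```

Because all scikit-learn classifiers share the `fit` and `predict` methods, only the import and the model construction change from one classifier to the next.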

**4.6.1 Decision Tree**

In [None]:
# Import DecisionTreeClassifier from sklearn.tree
from sklearn.tree import DecisionTreeClassifier
# Build or Create Model
model_dTree = DecisionTreeClassifier()

In [None]:
# Train Model
hist_dTree = model_dTree.fit(X_train, y_train)

In [None]:
# See the prediction for an animal of your choice.
# The 16 values must follow the order of the `features` list.
random_animal = [[1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1]]
model_dTree.predict(random_animal)



array([1])
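
Since the model was trained on a DataFrame, recent scikit-learn versions may warn that a plain list has no feature names. A possible way around this is to wrap the sample in a one-row DataFrame with the same columns, which also makes the feature order explicit. The sketch below uses a three-feature toy model so that it runs on its own; the toy rows and the names `toy_features`, `toy_df`, `toy_model`, and `new_animal` are illustrative only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented toy training data with three of the zoo features
toy_features = ['hair', 'feathers', 'milk']
toy_df = pd.DataFrame(
    [[1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 2], [0, 1, 0, 2]],
    columns=toy_features + ['type'],
)

toy_model = DecisionTreeClassifier(random_state=16)
toy_model.fit(toy_df[toy_features], toy_df['type'])

# One-row DataFrame: each value is tied to its feature name
new_animal = pd.DataFrame([{'hair': 1, 'feathers': 0, 'milk': 1}], columns=toy_features)
prediction = toy_model.predict(new_animal)
print(prediction)
```

For the zoo model, the same idea would use the `features` list as the columns of the one-row DataFrame.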

In [None]:
# Get prediction of the model for test dataset
result_dTree = model_dTree.predict(X_test)

from sklearn.metrics import classification_report
# Get classification accuracy, precision, etc.
print(classification_report(y_test, result_dTree))

              precision    recall  f1-score   support

           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00         1
           3       0.00      0.00      0.00         1
           4       1.00      1.00      1.00         5
           5       0.50      0.50      0.50         2
           6       0.75      1.00      0.86         3
           7       1.00      0.75      0.86         4

    accuracy                           0.88        25
   macro avg       0.75      0.75      0.74        25
weighted avg       0.89      0.88      0.88        25
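
The numbers in the report can be checked by hand. For class 6, a recall of 1.00 with support 3 means all 3 true class-6 animals were found, and a precision of 0.75 then implies that one other animal was wrongly labelled as class 6. The sketch below redoes this arithmetic and the two averages in plain Python; the counts are inferred from the report above, not read from a confusion matrix.

```python
# Class 6: counts inferred from the report (support 3, recall 1.00, precision 0.75)
tp, fp, fn = 3, 1, 0
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Per-class precision and support, copied from the report (classes 1 to 7)
per_class_precision = [1.00, 1.00, 0.00, 1.00, 0.50, 0.75, 1.00]
support = [9, 1, 1, 5, 2, 3, 4]

# Macro average = unweighted mean over the classes
macro_precision = sum(per_class_precision) / len(per_class_precision)

# Weighted average = mean weighted by the support of each class
weighted_precision = sum(p * s for p, s in zip(per_class_precision, support)) / sum(support)

print(round(f1, 2), round(macro_precision, 2), round(weighted_precision, 2))
```

The macro average treats the single misclassified class-3 animal as heavily as the nine class-1 animals, which is why it is lower than the weighted average.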



**4.6.2 Logistic Regression**

In [None]:
# Import LogisticRegression from sklearn.linear_model
from sklearn.linear_model import LogisticRegression

In [None]:
# Build or Create Model
model_logReg = LogisticRegression(random_state=16)

In [None]:
# Train Model
hist_logReg = model_logReg.fit(X_train, y_train)

# Get prediction of the model for test dataset
result_logReg = model_logReg.predict(X_test)

In [None]:
print(classification_report(y_test, result_logReg))
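
With its default settings, `LogisticRegression` stops after a limited number of solver iterations (`max_iter=100`) and may emit a `ConvergenceWarning` on some datasets. If that happens here, a common remedy is to raise `max_iter`. A minimal sketch on invented toy data, with `X_toy`, `y_toy`, and `model_toy` as illustrative names:

```python
from sklearn.linear_model import LogisticRegression

# Invented toy data: two well-separated groups of points
X_toy = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_toy = [0, 0, 1, 1]

# Allow the solver more iterations than the default of 100
model_toy = LogisticRegression(max_iter=1000, random_state=16)
model_toy.fit(X_toy, y_toy)

toy_pred = model_toy.predict([[0, 0], [5, 5]])
print(toy_pred)
```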

**4.6.3 Naive Bayes**

Read https://scikit-learn.org/stable/modules/naive_bayes.html and complete this section yourself.

**4.6.4 Support Vector Machine**

Read https://scikit-learn.org/stable/modules/svm.html and complete this section yourself.

**4.6.5 Random Forest**

Read https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html and complete this section yourself.