## Overview

The idea behind this project is to deploy a decision tree classification model that categorizes animals into specific categories (e.g. mammal, fish, bird) based on features such as whether the animal has hair, feathers, and so on.

In this case, the target variable for our model is the animal type described above, stored in the dataset's `class` column. Each unique number in `class` corresponds to a specific type of animal. Apart from `animal_name` (a string), every column holds integer values: most are 0/1 flags that can be treated as booleans, while `legs` is a count.
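To turn the numeric `class` values into readable names, a lookup table helps. The mapping below is an assumption based on the standard UCI Zoo dataset numbering (1 = mammal, 2 = bird, 3 = reptile, 4 = fish, 5 = amphibian, 6 = bug, 7 = invertebrate). It agrees with the rows shown later in this notebook (aardvark is class 1, bass is class 4, feathered animals are class 2), but the CSV used here may differ. `CLASS_NAMES` and `class_name` are hypothetical names.

```python
# Assumed mapping, taken from the UCI Zoo dataset numbering; verify it against your CSV.
CLASS_NAMES = {
    1: "mammal",
    2: "bird",
    3: "reptile",
    4: "fish",
    5: "amphibian",
    6: "bug",
    7: "invertebrate",
}


def class_name(label) -> str:
    """Return the animal type for a numeric class label (hypothetical helper)."""
    # int() also accepts NumPy integers such as a value returned by tree.predict
    return CLASS_NAMES.get(int(label), f"unknown ({label})")


print(class_name(2))  # bird
```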

This decision tree therefore works by testing these boolean attributes one at a time: starting at the root node, each internal node checks one attribute and sends the animal down the branch that matches its value, until a leaf node assigns a class.
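The root-to-leaf walk can be sketched with a tiny hand-built tree. This is an illustration only: the node layout, the order of the tests (`feathers`, then `milk`, then `fins`) and the `Node`/`predict_one` names are hypothetical and are not the tree scikit-learn learns from the data. The class numbers follow labels visible in the dataset (feathered animals are 2, aardvark is 1, bass is 4).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """Internal node: tests one 0/1 attribute and picks a branch."""
    feature: str
    if_true: "Tree"   # branch taken when the attribute is 1
    if_false: "Tree"  # branch taken when the attribute is 0


# A subtree is either another Node or a leaf holding a class label (None = no decision)
Tree = Union[Node, int, None]


def predict_one(tree: Tree, animal: dict) -> Optional[int]:
    """Walk from the root to a leaf, following the branch that matches each attribute."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if animal[node.feature] == 1 else node.if_false
    return node


# Hypothetical hand-written tree (not learned from data)
toy_tree = Node("feathers", 2,
                Node("milk", 1,
                     Node("fins", 4, None)))

print(predict_one(toy_tree, {"feathers": 0, "milk": 0, "fins": 1}))  # 4
```

A learned tree has the same shape; the difference is that scikit-learn chooses which attribute to test at each node from the training data.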

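```python
from sklearn.tree import DecisionTreeClassifier  # decision tree classifier
from sklearn.model_selection import train_test_split  # helper for train/test splitting
import pandas as pd  # data loading and manipulation
import os  # imported in the original notebook; not used in this section
```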

```python
ZooDF = pd.read_csv("zoocmonpls.csv")  # read the CSV file into a pandas DataFrame
ZooDF = ZooDF.fillna(0)  # replace any NaN values with 0; fillna returns a new DataFrame, so reassign it

ZooDF.dtypes  # check the data type of each column
```

```
animal_name    object
hair            int64
feathers        int64
eggs            int64
milk            int64
airbone         int64
aquatic         int64
predator        int64
toothed         int64
backbone        int64
breathes        int64
venomous        int64
fins            int64
legs            int64
tail            int64
domestic        int64
catsize         int64
class           int64
dtype: object
```
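`DataFrame.fillna` returns a new DataFrame by default rather than modifying the original, which is why its result has to be assigned back. A small self-contained check, using a made-up two-column frame rather than the zoo data:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value (not the zoo dataset)
df = pd.DataFrame({"hair": [1, np.nan, 0], "legs": [4, 2, 0]})

df.fillna(0)                   # returns a filled copy; df itself is unchanged
print(df.isna().sum().sum())   # 1

df = df.fillna(0)              # reassign to keep the filled values
print(df.isna().sum().sum())   # 0
```

`isna().sum()` per column is a quick way to check whether the real CSV has gaps at all. Since pandas reads an integer column containing NaN as `float64`, the all-`int64` output above suggests the numeric columns in this file have none.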

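```python
ZooDF.head()  # confirm the data has loaded correctly
```

|   | animal_name | hair | feathers | eggs | milk | airbone | aquatic | predator | toothed | backbone | breathes | venomous | fins | legs | tail | domestic | catsize | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aardvark | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
| 1 | antelope | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 4 | 1 | 0 | 1 | 1 |
| 2 | bass | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 4 |
| 3 | bear | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
| 4 | boar | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 1 | 0 | 1 | 1 |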


```python
# animal_name is a free-text identifier, not a feature, so drop it before training
ZooDF = ZooDF.drop("animal_name", axis=1)
```

```python
# features (independent variables) vs target (dependent variable)
features = ZooDF.drop("class", axis=1)
targets = ZooDF["class"]
```

Now that we have separated the features from the target, we can split the data into training and test sets.

Using `train_test_split`, we pass in the feature and target variables and declare how the rows should be divided between training and testing. In this case I will experiment with a 75/25 split, using 75% of the data for training and the remaining 25% for testing.

```python
train_feat, test_feat, train_targets, test_targets = train_test_split(
    features, targets, train_size=0.75
)
```
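With `train_size=0.75` and no `test_size`, scikit-learn puts the remaining 25% of rows in the test set. The split above is random, so each run trains on different rows; passing `random_state` makes it repeatable, and `stratify` keeps the class proportions similar in both halves, which may matter when some animal classes have only a handful of rows. The sketch below uses made-up arrays in place of `features` and `targets`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-ins: 100 rows, 3 binary features, 4 balanced classes
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 3))
y = np.arange(100) % 4

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 75 25

# Same random_state and same data give an identical split
X_train2, _, _, _ = train_test_split(X, y, train_size=0.75, random_state=42, stratify=y)
print(np.array_equal(X_train, X_train2))  # True
```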

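```python
# entropy criterion: splits are chosen by information gain; max_depth limits how deep the tree grows
tree = DecisionTreeClassifier(criterion="entropy", max_depth=6)
tree = tree.fit(train_feat, train_targets)  # learn the splits from the training data
```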
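To show what `criterion="entropy"` measures, here is a minimal from-scratch version of entropy and information gain for a 0/1 attribute. It is a sketch of the formula, not scikit-learn's internal code; `entropy` and `information_gain` are hypothetical helper names and the labels are made up.

```python
import numpy as np


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels) -> float:
    """Parent entropy minus the size-weighted entropy of the child groups."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    children = 0.0
    for value in np.unique(feature):
        mask = feature == value
        children += mask.mean() * entropy(labels[mask])  # weight by group size
    return float(entropy(labels) - children)


classes = [2, 2, 1, 1]   # two birds, two mammals (made-up)
feathers = [1, 1, 0, 0]  # separates the classes perfectly
tail = [1, 0, 1, 0]      # says nothing about the class

print(information_gain(feathers, classes))  # 1.0
print(information_gain(tail, classes))      # 0.0
```

At each node a greedy tree using this criterion picks the attribute test with the largest gain, so in this toy example it would split on `feathers` rather than `tail`.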

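For the depth of the model I will experiment with a maximum depth of 6. `max_depth` caps how many successive attribute tests the tree can make between the root and a leaf, and therefore how complex it can become. Setting `criterion="entropy"` makes the tree choose each split by information gain (the reduction in entropy of the class labels) instead of the default Gini impurity.

Note: increasing the depth further may cause the tree to overfit the training data.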
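Rather than fixing `max_depth=6` by hand, one option is to compare several depths with cross-validation and watch where the validation accuracy stops improving. The sketch below uses scikit-learn's bundled Iris data as a stand-in for the zoo features, so its numbers say nothing about this model; swapping in `features` and `targets` would be the natural next step.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the Iris dataset bundled with scikit-learn (not the zoo CSV)
X, y = load_iris(return_X_y=True)

results = {}
for depth in [1, 2, 3, 4, 6, None]:  # None = grow until the leaves are pure
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=depth, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
    results[depth] = scores.mean()
    print(f"max_depth={depth}: mean accuracy {scores.mean():.3f}")
```

If accuracy plateaus at a small depth, the shallower tree is usually preferable: it is easier to read and less likely to overfit.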

```python
predictions = tree.predict(test_feat)  # predicted class for each test animal
```


```python
# check the accuracy of this model on the held-out test set
accuracy = tree.score(test_feat, test_targets)
print("The prediction accuracy is: {:0.2f}%".format(accuracy * 100))
```

```
The prediction accuracy is: 96.15%
```
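For a classifier, `score` computes the same accuracy as comparing `predict` output with the true labels via `accuracy_score`, so the `predictions` computed earlier could be reused, and a confusion matrix shows which classes get mixed up. With only a quarter of a small dataset held out and no fixed `random_state`, a single misclassified animal moves the score by several percentage points, so the result may shift between runs. A sketch on the Iris stand-in data (not the zoo CSV):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (Iris), split 75/25 as in the notebook
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
acc = accuracy_score(y_test, predictions)  # fraction of correct predictions
print(acc == clf.score(X_test, y_test))    # True: score() computes the same metric

# Rows = true class, columns = predicted class
print(confusion_matrix(y_test, predictions))
```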


Now that we have a trained model, let us input a new 'animal' and let the model predict which class it falls under.

Below I have entered the raw attribute values for this 'animal'.

```python
Tfeatures = {
    "hair": 1,
    "feathers": 1,
    "eggs": 1,
    "milk": 0,
    "airbone": 1,
    "aquatic": 1,
    "predator": 0,
    "toothed": 1,
    "backbone": 1,
    "breathes": 1,
    "venomous": 0,
    "fins": 0,
    "legs": 1,
    "tail": 1,
    "domestic": 0,
    "catsize": 0
}

# build a one-row DataFrame in the training column order, then predict on this raw data
Tfeatures = pd.DataFrame([Tfeatures], columns=train_feat.columns)
prediction = tree.predict(Tfeatures)[0]
print([prediction])
```

```
[2]
```
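When a one-row DataFrame is built with `columns=train_feat.columns`, any attribute missing from the dictionary silently becomes NaN and any misspelled key is dropped; depending on the scikit-learn version, the tree may then raise an error or quietly route the NaN down a branch. A small guard helps. `make_row` is a hypothetical helper, and the three-feature training data below is made up so the example runs on its own. `predict_proba` also shows how confident the tree is in its answer.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def make_row(attributes: dict, columns) -> pd.DataFrame:
    """Build a one-row DataFrame in training-column order, refusing missing or unknown keys."""
    missing = set(columns) - set(attributes)
    unknown = set(attributes) - set(columns)
    if missing or unknown:
        raise ValueError(f"missing: {sorted(missing)}, unknown: {sorted(unknown)}")
    return pd.DataFrame([attributes], columns=list(columns))


# Made-up training data with three of the zoo attributes
train = pd.DataFrame({"feathers": [1, 0, 0], "milk": [0, 1, 0], "fins": [0, 0, 1]})
labels = [2, 1, 4]
clf = DecisionTreeClassifier(random_state=0).fit(train, labels)

row = make_row({"feathers": 1, "milk": 0, "fins": 0}, train.columns)
print(clf.predict(row)[0])     # 2
print(clf.predict_proba(row))  # one probability per class, in clf.classes_ order
```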


The model predicts class `2`, which corresponds to the animal type Bird. This makes sense when we think about how the tree decides: the new animal has `feathers` set to 1 (yes), and since every animal with feathers in the dataset is a bird, `feathers` is very likely one of the attribute tests that routes it to the bird leaf.

To check this, we can list every animal in the dataset that has feathers:

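```python
ZooDF.loc[ZooDF.feathers == 1]  # animals with feathers (all have class 2, Bird)
```

|   | hair | feathers | eggs | milk | airbone | aquatic | predator | toothed | backbone | breathes | venomous | fins | legs | tail | domestic | catsize | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | 2 |
| 16 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 20 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | 2 |
| 21 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 23 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 1 | 2 |
| 33 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 37 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 41 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 43 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 2 |
| 56 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 | 1 | 0 | 1 | 2 |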
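Listing feathered animals supports the intuition, but the fitted tree can also be printed directly to see which attributes it actually tests. `sklearn.tree.export_text` renders the splits as text, and `feature_importances_` shows how much each attribute contributed. On the real model this would be `export_text(tree, feature_names=list(train_feat.columns))`; the sketch below uses made-up data in which only `feathers` separates the classes, so it runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: feathers separates birds (2) from mammals (1); milk is deliberately uninformative
X = pd.DataFrame({"feathers": [1, 1, 0, 0], "milk": [0, 1, 0, 1]})
y = [2, 2, 1, 1]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

report = export_text(clf, feature_names=list(X.columns))
print(report)  # the only split tests "feathers"

# Importance of each attribute (importances sum to 1.0)
print(dict(zip(X.columns, clf.feature_importances_)))
```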


### Next Steps: Building a front end and image showcase using pyxl (in progress)