# Identifying Crocodiles Using Classification
### Author: Adrian Khlim

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
crocodile_data = pd.read_csv("crocodile_dataset.csv")

In [33]:
crocodile_data.head()

| | Observation ID | Common Name | Scientific Name | Family | Genus | Observed Length (m) | Observed Weight (kg) | Age Class | Sex | Date of Observation | Country/Region | Habitat Type | Conservation Status | Observer Name | Notes | Observed Length (ft) | Observed Weight (lbs) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Morelet's Crocodile | Crocodylus moreletii | Crocodylidae | Crocodylus | 1.9 | 62.0 | Adult | Male | 31-03-2018 | Belize | Swamps | Least Concern | Allison Hill | Cause bill scientist nation opportunity. | 6.233596 | 136.68644 |
| 1 | 2 | American Crocodile | Crocodylus acutus | Crocodylidae | Crocodylus | 4.09 | 334.5 | Adult | Male | 28-01-2015 | Venezuela | Mangroves | Vulnerable | Brandon Hall | Ago current practice nation determine operatio... | 13.418636 | 737.44539 |
| 2 | 3 | Orinoco Crocodile | Crocodylus intermedius | Crocodylidae | Crocodylus | 1.08 | 118.2 | Juvenile | Unknown | 07-12-2010 | Venezuela | Flooded Savannas | Critically Endangered | Melissa Peterson | Democratic shake bill here grow gas enough ana... | 3.543307 | 260.586084 |
| 3 | 4 | Morelet's Crocodile | Crocodylus moreletii | Crocodylidae | Crocodylus | 2.42 | 90.4 | Adult | Male | 01-11-2019 | Mexico | Rivers | Least Concern | Edward Fuller | Officer relate animal direction eye bag do. | 7.939633 | 199.297648 |
| 4 | 5 | Mugger Crocodile (Marsh Crocodile) | Crocodylus palustris | Crocodylidae | Crocodylus | 3.75 | 269.4 | Adult | Unknown | 15-07-2019 | India | Rivers | Vulnerable | Donald Reid | Class great prove reduce raise author play mov... | 12.30315 | 593.924628 |


Checking my features (columns)

In [5]:
crocodile_data.columns

Index(['Observation ID', 'Common Name', 'Scientific Name', 'Family', 'Genus',
       'Observed Length (m)', 'Observed Weight (kg)', 'Age Class', 'Sex',
       'Date of Observation', 'Country/Region', 'Habitat Type',
       'Conservation Status', 'Observer Name', 'Notes'],
      dtype='object')

In [27]:
print("\n".join(crocodile_data.columns))

Observation ID
Common Name
Scientific Name
Family
Genus
Observed Length (m)
Observed Weight (kg)
Age Class
Sex
Date of Observation
Country/Region
Habitat Type
Conservation Status
Observer Name
Notes


In [29]:
crocodile_data["Observed Length (ft)"] = crocodile_data["Observed Length (m)"] * 3.28084

In [30]:
crocodile_data["Observed Weight (lbs)"] = crocodile_data["Observed Weight (kg)"] * 2.20462
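A quick sanity check on the two conversions, as a minimal sketch. The `sample` frame is a stand-in built from the first two rows shown by `head()` above, and the constant names `M_TO_FT` and `KG_TO_LBS` are my own labels for the factors used in the cells above.

```python
import pandas as pd

# Conversion factors named once so both derived columns use the same values
# as the cells above (the constant names are illustrative).
M_TO_FT = 3.28084
KG_TO_LBS = 2.20462

# Stand-in frame: metric measurements from the first two rows of head().
sample = pd.DataFrame({
    "Observed Length (m)": [1.9, 4.09],
    "Observed Weight (kg)": [62.0, 334.5],
})

# Add both imperial columns in one step.
sample = sample.assign(**{
    "Observed Length (ft)": sample["Observed Length (m)"] * M_TO_FT,
    "Observed Weight (lbs)": sample["Observed Weight (kg)"] * KG_TO_LBS,
})

print(sample.round(6))
```

The rounded values should agree with the `(ft)` and `(lbs)` columns in the `head()` output. Because each imperial column is just a constant multiple of its metric counterpart, it carries no extra information for a classifier.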

In [31]:
crocodile_data.isnull().sum()

Observation ID           0
Common Name              0
Scientific Name          0
Family                   0
Genus                    0
Observed Length (m)      0
Observed Weight (kg)     0
Age Class                0
Sex                      0
Date of Observation      0
Country/Region           0
Habitat Type             0
Conservation Status      0
Observer Name            0
Notes                    0
Observed Length (ft)     0
Observed Weight (lbs)    0
dtype: int64
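One caveat: `isnull()` only counts real missing values (`NaN`/`None`), and the `head()` output shows `Sex` recorded as the string `Unknown` in some rows. A minimal sketch of counting such placeholders follows. The `sample` frame is a stand-in built from the five rows shown by `head()`, and treating `"Unknown"` as the only placeholder is an assumption on my part.

```python
import pandas as pd

# Stand-in frame: two text columns from the five rows shown by head().
sample = pd.DataFrame({
    "Sex": ["Male", "Male", "Unknown", "Male", "Unknown"],
    "Age Class": ["Adult", "Adult", "Juvenile", "Adult", "Adult"],
})

# Assumption: "Unknown" is the only placeholder string used in the dataset.
PLACEHOLDER = "Unknown"

# isnull() reports nothing here because "Unknown" is an ordinary string.
print(sample.isnull().sum())

# Compare every cell against the placeholder and count matches per column.
placeholder_counts = sample.eq(PLACEHOLDER).sum()
print(placeholder_counts)
```

Running the same comparison on the full `crocodile_data` frame would show how many rows per column are effectively missing even though `isnull()` reports zero.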

In [32]:
crocodile_data.dtypes

Observation ID             int64
Common Name               object
Scientific Name           object
Family                    object
Genus                     object
Observed Length (m)      float64
Observed Weight (kg)     float64
Age Class                 object
Sex                       object
Date of Observation       object
Country/Region            object
Habitat Type              object
Conservation Status       object
Observer Name             object
Notes                     object
Observed Length (ft)     float64
Observed Weight (lbs)    float64
dtype: object
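`Date of Observation` comes through as `object`, meaning plain strings. If I ever need it as a real date, it can be parsed roughly like this. This is a sketch that assumes every value follows the day-month-year pattern seen in `head()` (for example `31-03-2018`); the `dates` series is a stand-in holding the first three values shown there.

```python
import pandas as pd

# Stand-in series: the first three dates shown by head().
dates = pd.Series(
    ["31-03-2018", "28-01-2015", "07-12-2010"],
    name="Date of Observation",
)

# Assumption: all dates are DD-MM-YYYY. An explicit format avoids
# day/month ambiguity for values such as 07-12-2010.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

print(parsed.dtype)
print(parsed.dt.year.tolist())
```

On the full frame the same call would be applied to `crocodile_data["Date of Observation"]`. With an explicit `format`, a value that does not match raises an error instead of being silently misread.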

In [12]:
crocodile_data.describe()

| | Observation ID | Observed Length (m) | Observed Weight (kg) |
|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 |
| mean | 500.5 | 2.41511 | 155.7719 |
| std | 288.819436 | 1.097542 | 175.186788 |
| min | 1.0 | 0.14 | 4.4 |
| 25% | 250.75 | 1.6375 | 53.225 |
| 50% | 500.5 | 2.43 | 100.6 |
| 75% | 750.25 | 3.01 | 168.875 |
| max | 1000.0 | 6.12 | 1139.7 |


I want to classify what kind of crocodile each observation is, based on the data recorded for it. The genus will be my classification label, so I want to know which genera (the plural of genus, which I think is kind of cool) appear in the dataset and how many labels I will have.

In [13]:
crocodile_data["Genus"].describe()

count           1000
unique             3
top       Crocodylus
freq             784
Name: Genus, dtype: object

In [None]:
crocodile_data["Genus"].value_counts() 

Genus
Crocodylus     784
Mecistops      111
Osteolaemus    105
Name: count, dtype: int64

Okay, sweet, I have 3 genera:
- Crocodylus
- Mecistops
- Osteolaemus
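
The counts above are pretty lopsided, so here is a small sketch of what that means for a classifier. It rebuilds the `value_counts()` result by hand from the numbers printed above (the names `genus_counts`, `genus_share`, and `majority_baseline` are my own) and works out each genus's share, plus the accuracy of always guessing the most common genus.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
genus_counts = pd.Series(
    {"Crocodylus": 784, "Mecistops": 111, "Osteolaemus": 105},
    name="count",
)

# Share of each genus among all observations.
genus_share = genus_counts / genus_counts.sum()

# Accuracy of a trivial model that always predicts the most common genus.
majority_baseline = genus_share.max()

print(genus_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

On the real frame, `crocodile_data["Genus"].value_counts(normalize=True)` gives the same shares directly. Any model I train should beat this baseline before its accuracy means much.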