# Project: Multiclass Classification of Flower Species

Reference: Deep Learning with Python, chapter 10

You will learn:
- How to load data from a CSV file and make it available to Keras
- How to prepare multiclass classification data for modeling with neural networks
- How to evaluate Keras neural network models with scikit-learn

In [1]:
'''
We use the iris flowers dataset, a standard machine learning problem.
The dataset is well studied and is a good problem for practicing with neural networks
because all 4 input variables are numeric and share the same scale (centimeters).
Each instance describes the measurements of one observed flower, and
the output variable is the iris species.

The attributes for this dataset can be summarized as follows:
1. Sepal length in centimeters.
2. Sepal width in centimeters.
3. Petal length in centimeters.
4. Petal width in centimeters.
5. Class.

This is a multiclass classification problem,
meaning that there are more than two classes to predict;
in fact, there are three flower species.
'''


In [1]:
### Import Classes and Functions
import numpy
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
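# NOTE: the next two import paths come from older Keras releases and may be
# unavailable in newer ones.
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils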
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [2]:
### Initialize Random Number Generator
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

In [3]:
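### Load the Dataset
# the file has no header row, so header=None keeps the first line as data
dataframe = read_csv("../datasets/iris.csv", header=None)
dataset = dataframe.values
# columns 0-3 hold the four measurements; column 4 holds the species name
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]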
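
As a minimal sketch of the slicing step above, the snippet below parses a few iris-style rows from an in-memory string instead of `../datasets/iris.csv`. The sample rows and the `sample_csv` name are illustrative assumptions, not the contents of the actual file.

```python
from io import StringIO

from pandas import read_csv

# Hypothetical sample rows laid out like the iris CSV (no header row).
sample_csv = StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

dataframe = read_csv(sample_csv, header=None)
dataset = dataframe.values            # mixed array: numbers plus label strings

X = dataset[:, 0:4].astype(float)     # first four columns: measurements
Y = dataset[:, 4]                     # last column: species label

print(X.shape, Y.shape)
print(Y)
```

Casting only the first four columns to `float` matters because the label column forces the combined array to a generic object type.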

In [6]:
### Encode the Output Variable
# encode the class names as integers (0, 1, 2)
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert the integers to dummy variables (i.e. one-hot encoded),
# giving one output column per class
dummy_y = np_utils.to_categorical(encoded_Y)
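
To make the two encoding steps concrete, here is a small sketch on four hand-written labels. It uses an identity-matrix lookup as a stand-in for `np_utils.to_categorical` (an assumption made to keep the example free of Keras); the `labels` array is illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative labels, not read from the dataset file.
labels = np.array(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)   # classes are sorted, so this is [0, 2, 1, 0]

# One-hot encode by picking rows of an identity matrix (stand-in for to_categorical).
one_hot = np.eye(len(encoder.classes_))[encoded]

# Recover the original names from the one-hot rows.
decoded = encoder.inverse_transform(one_hot.argmax(axis=1))

print(encoder.classes_)
print(encoded)
print(one_hot)
print(decoded)
```

The round trip through `argmax` and `inverse_transform` is also how predicted class probabilities can be mapped back to species names.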

In [8]:
### Define the Neural Network Model
# define baseline model
def baseline_model():
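    # create model: 4 inputs -> 8 hidden units (ReLU) -> 3 outputs (softmax)
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    # one output unit per class; softmax turns the outputs into class probabilities
    model.add(Dense(3, activation='softmax'))
    # compile model; categorical cross-entropy matches the one-hot encoded targets
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model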

estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
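
To illustrate the shapes involved in `baseline_model`, the following is a rough NumPy sketch of the forward pass of a 4-8-3 network with ReLU and softmax. The weights are random placeholders rather than trained Keras weights, and `forward` is a hypothetical helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(7)

# Placeholder parameters with the same shapes as the two Dense layers.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # Dense(8, input_dim=4)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # Dense(3)

def forward(x):
    """Hypothetical helper: ReLU hidden layer followed by a softmax output."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    logits = hidden @ W2 + b2
    # Subtract the row maximum before exponentiating for numerical stability.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

x = np.array([[5.1, 3.5, 1.4, 0.2]])   # one flower, four measurements
probs = forward(x)
n_params = W1.size + b1.size + W2.size + b2.size

print(probs.shape)   # (1, 3): one probability per species
print(n_params)      # 67 = (4*8 + 8) + (8*3 + 3)
```

Each output row sums to 1, which is why the largest entry can be read as the predicted class.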

In [9]:
### Evaluate the Model with k-Fold Cross-Validation
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Accuracy: 97.33% (4.42%)
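
As a sketch of what the `KFold` object does before any model is trained, the snippet below splits row indices into 10 shuffled folds. It assumes the standard 150-row iris file, and the index array stands in for the real `X`.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 150                      # assumption: the standard iris file has 150 rows
indices = np.arange(n_samples)       # stand-in for the real feature matrix

kfold = KFold(n_splits=10, shuffle=True, random_state=7)

test_sizes = []
seen = []
for train_idx, test_idx in kfold.split(indices):
    # Every fold trains on 135 rows and evaluates on the 15 rows held out.
    test_sizes.append(len(test_idx))
    seen.extend(test_idx.tolist())

print(test_sizes)
print(sorted(seen) == indices.tolist())   # each row is held out exactly once
```

`cross_val_score` fits a fresh model on each training split and scores it on the matching held-out split; the reported mean and standard deviation summarize those 10 scores.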
