# Project Exercise 01

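This is the first project exercise in this book. We use the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/Iris) and build a neural network model with `Keras` to solve a multi-class classification problem.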

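In this project, we will learn how to:

- load the data set and split it into features and class labels
- encode string class labels as one-hot vectors
- define a small neural network with `Keras`
- evaluate the model with k-fold cross validation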


# 1.Iris Flowers Data Set

This project uses the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/Iris). Each sample has 4 features and 1 class label. The data set[^1] contains 3 classes with 50 samples each, 150 samples in total.

Attribute information:

- sepal length (cm)
- sepal width (cm)
- petal length (cm)
- petal width (cm)
- class:
    - Iris Setosa
    - Iris Versicolour
    - Iris Virginica

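All sample features are numeric and share the same unit (centimeters).

We build a neural network model, train it on the labeled data set to obtain suitable network parameters, and then use it to predict the class of iris plants whose class is unknown. This is a multi-class classification problem with 3 classes. We expect the network to reach a classification accuracy in the range of 95%-97%.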


[^1]:    [iris data info](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.names)

# 2.Import Classes and Functions

In [1]:
import numpy as np

import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Using Theano backend.


# 3.Initialize Random Number Generator

In [2]:
# Fix the random seed for reproducibility
seed = 7
np.random.seed(seed)
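
As a quick check of what fixing the seed buys, the following minimal sketch (not part of the original notebook) reseeds NumPy and draws the same numbers twice. It only covers NumPy's global generator; whether a full Keras training run becomes reproducible also depends on the backend, which this sketch does not test.

```python
import numpy as np

seed = 7

# Draw three numbers after seeding the global NumPy generator.
np.random.seed(seed)
first = np.random.rand(3)

# Reseed with the same value and draw again.
np.random.seed(seed)
second = np.random.rand(3)

# Both draws are identical because the generator state was reset.
print(np.array_equal(first, second))  # True
```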

# 4.Load dataset

In [3]:
# Load the data
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
df = pd.read_csv(url, header=None)
dataset = df.values

# Split into sample features (X) and class labels (Y)
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]
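
The slicing above separates the four numeric columns from the class column. The sketch below shows the same split using only the standard library, on the first three samples of the data set written in the `iris.data` layout. The names `sample_text`, `X_sample`, and `Y_sample` are illustrative and do not appear in the notebook.

```python
import csv
import io

# Three rows in the iris.data layout: four measurements, then the class name.
sample_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""

# Parse the comma-separated text, skipping any empty lines.
rows = [row for row in csv.reader(io.StringIO(sample_text)) if row]

# Columns 0-3 are the features (converted to float), column 4 is the class.
X_sample = [[float(value) for value in row[0:4]] for row in rows]
Y_sample = [row[4] for row in rows]

print(X_sample[0], Y_sample[0])
```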

In [4]:
# Take a look at the sample features
X

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5.4,  3.7,  1.5,  0.2],
       [ 4.8,  3.4,  1.6,  0.2],
       [ 4.8,  3. ,  1.4,  0.1],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 5.8,  4. ,  1.2,  0.2],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 5.4,  3.9,  1.3,  0.4],
       [ 5.1,  3.5,  1.4,  0.3],
       [ 5.7,  3.8,  1.7,  0.3],
       [ 5.1,  3.8,  1.5,  0.3],
       [ 5.4,  3.4,  1.7,  0.2],
       [ 5.1,  3.7,  1.5,  0.4],
       [ 4.6,  3.6,  1. ,  0.2],
       [ 5.1,  3.3,  1.7,  0.5],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 5. ,  3. ,  1.6,  0.2],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 5.2,  3.5,  1.5,  0.2],
       [ 5.2,  3.4,  1.4,  0.2],
       [ 4.7,  3.2,  1.6,  0.2],
       ...

In [5]:
# Take a look at the sample classes
Y

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', ...

# 5.Encode The output variable

References:

- [sklearn.preprocessing.LabelEncoder — scikit-learn 0.19.0 documentation](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)
- [4.8. Transforming the prediction target (y) — scikit-learn 0.19.0 documentation](http://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets)
- [Utils - Keras Documentation](https://keras.io/utils/#to_categorical)

In [6]:
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one-hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

In [7]:
dummy_y

array([[ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       ...
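
What `LabelEncoder` and `to_categorical` do together can be sketched with plain NumPy. This is an illustration under the assumption that classes are numbered in sorted order, not the libraries' actual code, and the four-element `labels` array is made up for the example.

```python
import numpy as np

# A tiny made-up label array using the class names from iris.data.
labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"])

# Step 1: map each class name to an integer index (classes come back sorted).
classes, encoded = np.unique(labels, return_inverse=True)

# Step 2: turn each integer into a one-hot row by indexing an identity matrix.
one_hot = np.eye(len(classes))[encoded]

# Decoding: the position of the 1 in each row gives back the class name.
decoded = classes[one_hot.argmax(axis=1)]

print(encoded)
print(one_hot)
```

Each row contains a single 1 in the column of its class, which matches the shape of `dummy_y` above: one row per sample, one column per class.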

# 6.Define NN model

Network structure: 4-4-3 (4 inputs, one hidden layer of 4 units, 3 outputs).
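
For a sense of scale, the trainable parameters of a fully connected 4-4-3 network can be counted by hand: each dense layer has one weight per input-output pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Layer widths: 4 input features, 4 hidden units, 3 output classes.
layer_sizes = [4, 4, 3]

# Pair each layer with the next one and count the parameters between them.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # [20, 15] 35
```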

In [8]:
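# Define the baseline model
def baseline_model():
    # Create the model: 4 inputs -> 4 hidden units (ReLU) -> 3 outputs (sigmoid)
    model = Sequential()
    model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    # Compile the model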
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Wrap the model so scikit-learn can use it (note: Keras 2 renamed `nb_epoch` to `epochs`)
estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
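
The loss named in `model.compile`, categorical cross-entropy, compares a one-hot target with a vector of predicted class probabilities. The NumPy sketch below is not Keras's implementation, and the `logits` values are arbitrary. It illustrates that softmax outputs sum to 1 while independent sigmoid outputs generally do not, which is why softmax is the more common output activation with this loss.

```python
import numpy as np


def sigmoid(z):
    # Squashes each value independently into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the maximum for numerical stability, then normalize to sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def categorical_crossentropy(target, probs):
    # Negative log-probability assigned to the true class of a one-hot target.
    return float(-np.sum(target * np.log(probs)))


logits = np.array([2.0, 1.0, 0.1])  # arbitrary illustrative values
target = np.array([1.0, 0.0, 0.0])  # one-hot target for class 0

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)
loss = categorical_crossentropy(target, p_softmax)

print(p_softmax.sum(), p_sigmoid.sum(), loss)
```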

# 7.Evaluate the model with k-fold cross validation

In [9]:
# Prepare the cross-validation splits
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)

# Evaluate the estimator on each fold
results = cross_val_score(estimator, X, dummy_y, cv=kfold)

print("Accuracy: {0:.2f}% ({1:.2f}%)".format(results.mean()*100, results.std()*100))



Accuracy: 46.67% (22.11%)
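
The 10-fold procedure used above can be sketched in NumPy as follows. This is a simplified illustration, not scikit-learn's implementation, so the fold membership may differ from what `KFold(n_splits=10, shuffle=True, random_state=seed)` produces. With 150 samples, each fold holds out 15 samples for testing and trains on the other 135.

```python
import numpy as np

n_samples, n_splits, seed = 150, 10, 7

# Shuffle the sample indices once with a fixed seed.
rng = np.random.RandomState(seed)
shuffled = rng.permutation(n_samples)

# Cut the shuffled indices into 10 equally sized folds.
folds = np.array_split(shuffled, n_splits)

# Each fold serves as the test set once; the remaining folds form the training set.
splits = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    splits.append((train_idx, test_idx))

print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 135 15
```

`cross_val_score` fits a fresh copy of the estimator on each training part, scores it on the held-out part, and returns the ten scores whose mean and standard deviation are printed above.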


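# 8.Summary

In this project we loaded the Iris data set, one-hot encoded the class labels, defined a 4-4-3 network with `Keras`, and evaluated it with 10-fold cross validation. The recorded run reaches a mean accuracy of 46.67% (standard deviation 22.11%), well below the 95%-97% target stated in section 1. A likely cause, assuming the run used Keras 2, is that the scikit-learn wrapper does not forward the Keras 1 argument `nb_epoch` to `fit`, so training ran for the default number of epochs rather than 200. Re-running with `epochs=200` is the first thing to check.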