# Course topic: Running Python on Google Colab

Teaching example: the Iris data set

[Anderson's Iris data set](https://zh.wikipedia.org/wiki/%E5%AE%89%E5%BE%B7%E6%A3%AE%E9%B8%A2%E5%B0%BE%E8%8A%B1%E5%8D%89%E6%95%B0%E6%8D%AE%E9%9B%86)

Now, we will use the pandas library to load the Iris data set into a DataFrame object:


```python
import pandas as pd

# The raw file has no header row, so header=None makes pandas label the columns 0-4
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)

df.tail()
```

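|     | 0   | 1   | 2   | 3   | 4              |
|-----|-----|-----|-----|-----|----------------|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |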
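The integer column labels 0 to 4 in the output above come from `header=None`. The raw `iris.data` file has no header row, so without that argument pandas would treat the first sample as column names. Below is a minimal offline sketch of this behaviour. It assumes an in-memory `io.StringIO` buffer holding three rows copied from the output above, standing in for the UCI URL.

```python
import io

import pandas as pd

# Three rows in the same comma-separated format as iris.data (no header row)
rows = ("6.7,3.0,5.2,2.3,Iris-virginica\n"
        "6.3,2.5,5.0,1.9,Iris-virginica\n"
        "6.5,3.0,5.2,2.0,Iris-virginica\n")

# header=None: every line is data and the columns are labelled 0..4
df_demo = pd.read_csv(io.StringIO(rows), header=None)
print(df_demo.columns.tolist())  # [0, 1, 2, 3, 4]
print(df_demo.shape)             # (3, 5)

# Default header='infer': the first sample is consumed as column names
df_wrong = pd.read_csv(io.StringIO(rows))
print(df_wrong.shape)            # (2, 5)
```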


```python
# Rows 145-149 and columns 0-4 (end indices are exclusive): the same rows as df.tail()
df.iloc[145:150, 0:5]
```

Next, we extract the first 100 class labels, which correspond to the 50 Iris-setosa and 50 Iris-versicolor flowers:

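```python
import matplotlib.pyplot as plt
import numpy as np

# Rows 0-99 are the 50 setosa and 50 versicolor samples; column 4 holds the class label.
# The end index of an iloc slice is exclusive, so 0:100 selects exactly 100 rows.
y = df.iloc[0:100, 4].values
y
```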
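`iloc` slices are end-exclusive, like Python list slices, so `0:100` selects rows 0 through 99. `iris.data` lists the 50 samples of each species in order (setosa, versicolor, virginica), so those 100 rows contain setosa and versicolor only. The sketch below uses a hypothetical label Series built with that same layout.

```python
import pandas as pd

# Hypothetical label column with the same layout as iris.data: 50 per class, in order
labels = pd.Series(['Iris-setosa'] * 50
                   + ['Iris-versicolor'] * 50
                   + ['Iris-virginica'] * 50)

# Rows 0..99; the end index 100 is excluded
first_100 = labels.iloc[0:100].values
print(len(first_100))          # 100
print(sorted(set(first_100)))  # ['Iris-setosa', 'Iris-versicolor']
```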

Then we convert the class labels into two integer class labels, 1 (versicolor) and -1 (setosa), and store them in the vector y. (The `values` attribute of a pandas DataFrame or Series returns the corresponding NumPy array.)

Of the three Iris species, Iris setosa is mapped to -1. Every other species (Iris versicolor, and Iris virginica if it were included) is mapped to 1.

```python
# setosa -> -1, every other label -> 1
y = np.where(y == 'Iris-setosa', -1, 1)
y
```
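`np.where(condition, a, b)` picks `a` wherever the condition is true and `b` everywhere else, which is why every label other than setosa becomes 1. Here is a tiny sketch on a hand-written label array:

```python
import numpy as np

# Hand-written example labels
labels = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])

# True -> -1 (setosa), False -> 1 (any other species)
y_demo = np.where(labels == 'Iris-setosa', -1, 1)
print(y_demo)  # [-1  1  1 -1]
```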

Also, we need to extract the first feature column (sepal length, column 0) and the third feature column (petal length, column 2) of those 100 training samples and assign them to a feature matrix X:

```python
# Columns 0 (sepal length) and 2 (petal length) of the first 100 samples
X = df.iloc[0:100, [0, 2]].values
X
```
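A list such as `[0, 2]` used as the column indexer selects non-adjacent columns, and `.values` returns a 2-D NumPy array with one row per sample. The sketch below applies this to a small DataFrame built from two rows copied from the `df.tail()` output.

```python
import pandas as pd

# Two samples from the df.tail() output (columns 0-3 are measurements, 4 is the label)
df_demo = pd.DataFrame([[6.7, 3.0, 5.2, 2.3, 'Iris-virginica'],
                        [6.3, 2.5, 5.0, 1.9, 'Iris-virginica']])

# Column 0 = sepal length, column 2 = petal length; the label column is left out
X_demo = df_demo.iloc[:, [0, 2]].values
print(X_demo.shape)  # (2, 2)
print(X_demo)
```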

We can visualize the data in a two-dimensional scatter plot using matplotlib.

(Iris virginica is ignored for now; the x-axis shows sepal length and the y-axis shows petal length.)

```python
# Rows 0-49 are setosa, rows 50-99 are versicolor
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
# Column 0 of X is sepal length, column 1 is petal length
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')
plt.show()
```

**Code: Perceptron learning algorithm**

The following code defines the perceptron as a Python class.

![Perceptron](https://www.bogotobogo.com/python/scikit-learn/images/PerceptronModel/PerceptronDiagram.png)
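For reference, `net_input`, `predict` and `fit` in the code cell below implement the classic perceptron learning rule. The math can be sketched in the notation of the code, with learning rate $\eta$ = `rate`, bias $w_0$ = `weight[0]` and $m$ features. One update for training sample $i$ is:

$$
z = w_0 + \sum_{j=1}^{m} w_j x_j^{(i)}, \qquad
\hat{y}^{(i)} = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}
$$

$$
\Delta w_j = \eta \left( y^{(i)} - \hat{y}^{(i)} \right) x_j^{(i)}, \qquad
\Delta w_0 = \eta \left( y^{(i)} - \hat{y}^{(i)} \right)
$$

Both $y^{(i)}$ and $\hat{y}^{(i)}$ are $\pm 1$. The factor $y^{(i)} - \hat{y}^{(i)}$ is therefore 0 for a correct prediction and $\pm 2$ for a wrong one, so the weights change only on misclassified samples. `err += int(delta_w != 0.0)` counts exactly those updates.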

The same cell then trains the perceptron (learning rate 0.1, 10 epochs) and plots the number of misclassifications against the number of epochs:

```python
# perceptron.py
import numpy as np
import matplotlib.pyplot as plt


class Perceptron(object):
    """Perceptron classifier.

    rate  : learning rate
    niter : number of passes (epochs) over the training set
    """

    def __init__(self, rate=0.01, niter=10):
        self.rate = rate
        self.niter = niter

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        # weight[0] is the bias; weight[1:] are the feature weights
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications in each epoch
        self.errors = []

        for i in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                # delta_w is 0 when the sample is already classified correctly
                delta_w = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                err += int(delta_w != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0, 1, -1)

#------------------------------------------
# import Perceptron from perceptron.py
# from perceptron import Perceptron
pn = Perceptron(0.1, 10)
pn.fit(X, y)
plt.plot(range(1, len(pn.errors) + 1), pn.errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')
plt.show()
```
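The sketch below performs a single update by hand, to make one pass through the inner loop of `fit` concrete. The sample and its label are made-up illustrative values, not rows taken from the output shown here. `rate = 0.1` matches the `Perceptron(0.1, 10)` call above, and the weights start at zero as in `fit`.

```python
import numpy as np

rate = 0.1
weight = np.zeros(3)            # [bias, w1, w2], zero-initialised as in fit()
xi = np.array([5.1, 1.4])       # hypothetical sample: sepal length, petal length
target = -1                     # its true label (setosa)

# Net input is 0.0 >= 0, so the unit step predicts 1: a misclassification
net = np.dot(xi, weight[1:]) + weight[0]
pred = 1 if net >= 0.0 else -1

delta_w = rate * (target - pred)  # 0.1 * (-1 - 1) = -0.2
weight[1:] += delta_w * xi
weight[0] += delta_w
print(weight)

# After this single update the same sample is classified correctly
new_net = np.dot(xi, weight[1:]) + weight[0]
print(1 if new_net >= 0.0 else -1)  # -1
```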


**Visualize the decision boundaries**

To visualize the decision boundaries for our 2D data set, let's implement a small convenience function:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map (one color per class)
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface on a grid that spans the data, padded by 1
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    # predict a class for every grid point, then restore the grid shape
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples; color= takes the single RGBA tuple returned by cmap(idx)
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                    alpha=0.8, color=cmap(idx),
                    marker=markers[idx], label=cl)
```

In the code above, we define a number of colors and markers and create a color map from the list of colors via ListedColormap.
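`colors[:len(np.unique(y))]` keeps one color per class, so with the two labels -1 and 1 the color map has exactly two entries. Calling the color map with an integer index returns that entry as an RGBA tuple, which is what `cmap(idx)` relies on when the samples are plotted. Here is a small sketch with a hypothetical label array:

```python
import numpy as np
from matplotlib.colors import ListedColormap

colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
y_demo = np.array([-1, -1, 1, 1])        # hypothetical labels: two classes

# One color per distinct class label
cmap = ListedColormap(colors[:len(np.unique(y_demo))])
print(cmap.N)                            # 2

# An integer index returns the RGBA values of that color (index 0 -> 'red')
print(tuple(float(v) for v in cmap(0)))  # (1.0, 0.0, 0.0, 1.0)
```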

Then, we determine the minimum and maximum values of the two features, pad each range by 1, and use these bounds to create a pair of grid arrays xx1 and xx2 via the NumPy meshgrid function.

Since we trained our perceptron classifier on two feature dimensions, we need to flatten the grid arrays and create a matrix with the same number of columns as the Iris training subset. That lets us use the predict method to predict the class labels Z of the corresponding grid points.
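The flatten-and-stack step is easiest to see on a tiny grid. This sketch uses a hypothetical 3 × 2 grid instead of the real feature ranges. It also swaps in a simple stand-in rule for `classifier.predict`, so no trained model is needed.

```python
import numpy as np

# Hypothetical grid: 3 values along feature 1, 2 values along feature 2
xx1, xx2 = np.meshgrid(np.arange(0.0, 3.0, 1.0), np.arange(0.0, 2.0, 1.0))
print(xx1.shape)           # (2, 3): rows follow feature 2, columns follow feature 1

# Flatten both grids and stack them as columns: one row per grid point
points = np.array([xx1.ravel(), xx2.ravel()]).T
print(points.shape)        # (6, 2): same column count as the training matrix X

# Stand-in for classifier.predict: label 1 if x1 + x2 >= 2, else -1
Z = np.where(points.sum(axis=1) >= 2.0, 1, -1)
Z = Z.reshape(xx1.shape)   # back to the grid layout that contourf expects
print(Z)
```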

After reshaping the predicted class labels Z into a grid with the same dimensions as xx1 and xx2, we can draw a contour plot via matplotlib's contourf function. It colors each decision region according to the predicted class in the grid array:


```python
# X holds sepal length (column 0) and petal length (column 2) of the first 100 samples
plot_decision_regions(X, y, classifier=pn)
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')
plt.show()
```