
# Breast Cancer Wisconsin (Diagnostic) Data Set

<https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)>

## Attribute Information:

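1. ID number
1. Diagnosis (M = malignant, B = benign)
1. Columns 3-32: thirty real-valued features (the mean, standard error, and "worst" value of each of the ten measurements listed below)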

Ten real-valued features are computed for each cell nucleus: 

* radius (mean of distances from center to points on the perimeter)
* texture (standard deviation of gray-scale values)
* perimeter
* area
* smoothness (local variation in radius lengths)
* compactness (perimeter^2 / area - 1.0)
* concavity (severity of concave portions of the contour)
* concave points (number of concave portions of the contour)
* symmetry
* fractal dimension ("coastline approximation" - 1)

http://people.idsia.ch/~juergen/deeplearningwinsMICCAIgrandchallenge.html

## Classifier
Given feature vectors $\boldsymbol{a}$, let us write a program that selects a function $C(\boldsymbol{y})$
that classifies a tissue sample as malignant or benign.

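## Hypothesis class
The classifier is chosen from a set of possible classifiers (the **hypothesis class**). Here, the hypothesis class consists of the linear functions $h(\cdot)$ from the feature-vector space $\mathbb{R}^D$ to $\mathbb{R}$. The classifier is then defined as the following function.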

$$
C(\boldsymbol{y}) = 
\left\{ \begin{array}{ccc}
+1 &  {\rm when} & h(\boldsymbol{y})\geq 0\\
-1 &  {\rm when} & h(\boldsymbol{y})<0
\end{array} \right.
$$

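For each linear function $h:\mathbb{R}^D \rightarrow \mathbb{R}$,
there exists a $D$-vector $\boldsymbol{w}$ such that
$$
h(\boldsymbol{y}) = \boldsymbol{w}\cdot \boldsymbol{y}
$$
Choosing such a linear function therefore amounts to choosing a $D$-vector $\boldsymbol{w}$.
Because choosing $\boldsymbol{w}$ is equivalent to choosing the hypothesis $h$,
we call $\boldsymbol{w}$ the **hypothesis vector**.

So the classifier itself looks like little more than a dot product. The real question is how to determine this hypothesis vector.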
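
The definition of $C(\boldsymbol{y})$ above can be sketched in a few lines of NumPy. This is only an illustration: the function name `classify` and the three-dimensional vectors are made up for the example (the real data has $D = 30$).

```python
import numpy as np

def classify(w, y):
    """Return +1 when h(y) = w . y >= 0, otherwise -1."""
    return 1 if np.dot(w, y) >= 0 else -1

# Hypothetical 3-dimensional example, not taken from the cancer data.
w_demo = np.array([0.5, -1.0, 0.25])
y_pos = np.array([2.0, 0.5, 0.0])   # w . y =  0.5
y_neg = np.array([0.0, 1.0, 2.0])   # w . y = -0.5

print(classify(w_demo, y_pos))  # 1
print(classify(w_demo, y_neg))  # -1
```

Note that a score of exactly zero falls on the $+1$ side, following the $h(\boldsymbol{y}) \geq 0$ case of the definition.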


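# Steepest descent

If we choose the loss function
$$
L(w)=\sum_{i=1}^m (a_i \cdot w - b_i)^2
$$
then its partial derivatives are
$$
\begin{aligned}
\frac{\partial L}{\partial w_j} &= 
\sum_{i=1}^m \frac{\partial}{\partial w_j}(a_i \cdot w -b_i)^2 \\
&= \sum_{i=1}^m 2(a_i \cdot w -b_i) a_{ij}
\end{aligned}
$$
Here $a_{ij}$ is the $j$-th element of $a_i$.
We use these partial derivatives as the gradient and search for a local minimum.

The sum runs over $i$ from 1 to $m$; as in the textbook, $m$ is the number of data points, so this is a sum over the training samples. The index $j$ ranges over the elements of the vector $w$.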
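
Stacking the $a_i$ as the rows of a matrix $A$, the partial derivatives above combine into the single expression $2A^T(Aw - b)$. The following sketch checks that formula against central finite differences. The helper names `loss` and `gradient` and the small random problem are assumptions made for this illustration, not part of the original code.

```python
import numpy as np

def loss(A, b, w):
    """L(w) = sum_i (a_i . w - b_i)^2, with the a_i as rows of A."""
    r = A.dot(w) - b
    return float(np.sum(r ** 2))

def gradient(A, b, w):
    """dL/dw_j = sum_i 2 (a_i . w - b_i) a_ij, i.e. 2 A^T (A w - b)."""
    return 2.0 * A.T.dot(A.dot(w) - b)

# Small synthetic problem (hypothetical data, not the cancer set).
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(5, 3))
b_demo = rng.normal(size=5)
w_demo = rng.normal(size=3)

# Central finite differences, perturbing one coordinate w_j at a time.
eps = 1e-6
numeric = np.array([
    (loss(A_demo, b_demo, w_demo + eps * e)
     - loss(A_demo, b_demo, w_demo - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, gradient(A_demo, b_demo, w_demo), atol=1e-5))
```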

# code(python)

* file:/Users/bob/Github/TeamNishitani/coding_the_matrix/codes/my_cancer_detector.py


## Printing w

In [1]:
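def print_w(w):
  # Print the 30 weights as a table: one row per feature, one column per statistic.
  # The index i*3+j assumes the three statistics of each feature are stored consecutively.
  params = ["radius", "texture","perimeter","area",
    "smoothness","compactness","concavity","concave points",
    "symmetry","fractal dimension"]
  print("    (params)      :    (mean)    (stderr)     (worst)")
  for i, param in enumerate(params):
    print("%18s:" %param, end="")
    for j in range(3):
        # w is a (30, 1) column vector, so pick out the scalar explicitly
        print("%12.8f" % w[i*3+j, 0], end="")
    print()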

In [2]:
import numpy as np
# Training features stored as space-separated text: 300 samples x 30 features
tmp = np.fromfile('./codes/train_A.data', np.float64, -1, " ")
A = tmp.reshape(300,30)
# Training labels: one value per sample
tmp = np.fromfile('./codes/train_b.data', np.float64, -1, " ")
b = tmp.reshape(300,1)
# Start from the zero hypothesis vector
w = np.zeros((30,1))

In [3]:
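# loop: number of iterations, sigma: step size
loop, sigma = 300, 3.0*10**(-9)
for i in range(loop):
  # residual A w - b; the gradient of L is 2 A^T (A w - b)
  dLw = A.dot(w)-b
  # step against the gradient (the constant factor 2 is absorbed into sigma)
  w = w - (dLw.transpose().dot(A)).transpose()*sigma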

print_w(w)

    (params)      :    (mean)    (stderr)     (worst)
            radius:  0.00042700  0.00074182  0.00254888
           texture:  0.00168795  0.00000471  0.00000013
         perimeter: -0.00000397 -0.00000208  0.00000895
              area:  0.00000360  0.00000257  0.00007032
        smoothness:  0.00000114 -0.00088178  0.00000043
       compactness:  0.00000044  0.00000072  0.00000027
         concavity:  0.00000120  0.00000019  0.00041150
    concave points:  0.00092197  0.00239514 -0.00193279
          symmetry:  0.00000593 -0.00000375 -0.00000815
 fractal dimension: -0.00000234  0.00001156  0.00000352
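
The step size used above is very small. One plausible reason is that the features are on very different scales: for the update $w \leftarrow w - \sigma A^T(Aw - b)$, the iteration is stable only when $\sigma < 2/\lambda_{\max}(A^TA)$, and large feature values make $\lambda_{\max}$ large. The sketch below illustrates this on a synthetic stand-in with badly scaled columns. The function `descend` and the data are hypothetical and are not the cancer set.

```python
import numpy as np

def descend(A, b, sigma, loops):
    """Repeat w <- w - sigma * A^T (A w - b); return w and the loss history."""
    w = np.zeros((A.shape[1], 1))
    history = []
    for _ in range(loops):
        r = A.dot(w) - b
        history.append(float(np.sum(r ** 2)))  # loss before this update
        w = w - sigma * A.T.dot(r)
    return w, history

# Synthetic stand-in with badly scaled columns (hypothetical data).
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 3)) * np.array([1.0, 20.0, 500.0])
b_demo = np.sign(rng.normal(size=(50, 1)))

# Largest eigenvalue of A^T A sets the stability limit 2 / lam_max.
lam_max = np.linalg.eigvalsh(A_demo.T.dot(A_demo)).max()
_, ok = descend(A_demo, b_demo, 1.0 / lam_max, 100)   # below the limit
_, bad = descend(A_demo, b_demo, 2.5 / lam_max, 100)  # above the limit
print(ok[-1] <= ok[0], bad[-1] > bad[0])
```

With the step below the limit the loss does not increase, while the step above it makes the loss grow. Rescaling the features would be another way to allow a larger step, but that is outside what the original code does.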


# Results

In [4]:
def show_correct_error(mA, vb, vw):
    # Labels: M = malignant (-1), B = benign (+1)
    correct,safe_error,critical_error=0,0,0
    predict = mA.dot(vw)
    n = vb.size
    for i in range(n):
        if predict[i]*vb[i]>0:
            # predicted sign matches the label
            correct += 1
        elif (predict[i]<0 and vb[i]>0):
            # benign sample flagged as malignant (false alarm)
            safe_error += 1
        elif (predict[i]>0 and vb[i]<0):
            # malignant sample classified as benign (missed case)
            critical_error += 1
    print("       correct: %4d/%4d" % (correct,n))
    print("    safe error: %4d" % safe_error)
    print("critical error: %4d" % critical_error)


In [5]:
show_correct_error(A, b, w)

       correct:  274/ 300
    safe error:    5
critical error:   21


In [6]:
# Load the validation set (260 samples); A and b are overwritten
tmp = np.fromfile('./codes/validate_A.data', np.float64, -1, " ")
A = tmp.reshape(260,30)
tmp = np.fromfile('./codes/validate_b.data', np.float64, -1, " ")
b = tmp.reshape(260,1)

show_correct_error(A, b, w)

       correct:  240/ 260
    safe error:   10
critical error:   10
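
As ratios, the counts above are 274/300 (about 91.3%) on the training set and 240/260 (about 92.3%) on the validation set. The same tallies can also be computed without an explicit loop. The sketch below uses the same sign rules as `show_correct_error`; the name `count_outcomes` and the five demo values are hypothetical.

```python
import numpy as np

def count_outcomes(predict, labels):
    """Return (correct, safe_error, critical_error) from scores and +1/-1 labels."""
    predict = np.ravel(predict)
    labels = np.ravel(labels)
    correct = int(np.sum(predict * labels > 0))
    safe = int(np.sum((predict < 0) & (labels > 0)))      # benign flagged as malignant
    critical = int(np.sum((predict > 0) & (labels < 0)))  # malignant missed
    return correct, safe, critical

# Hypothetical scores and labels, not taken from the data files.
predict_demo = np.array([0.8, -0.3, 0.2, -0.9, 0.4])
labels_demo = np.array([1, 1, -1, -1, 1])

correct, safe, critical = count_outcomes(predict_demo, labels_demo)
print(correct, safe, critical)  # 3 1 1
print("accuracy: %.1f%%" % (100.0 * correct / labels_demo.size))  # accuracy: 60.0%
```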
