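# Sonar Dataset [Download the data](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks))
### The input is 60 variables, the sonar return strengths at different angles; each sample comes from either a "mine" (a metal cylinder) or a "rock"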

### What you will learn
1. Two ways to improve accuracy
2. How to use a Pipeline
3. A shortcoming of the ReLU activation

In [1]:
import numpy as np
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [2]:
seed = 87
np.random.seed(seed)

In [3]:
df = read_csv("sonar.all-data.txt", header=None)

In [4]:
dataset = df.values
# split into input and output variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

In [5]:
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

## Building the model

#### Q: Why does the output layer use 1 unit rather than 2? The answer is also tied to the choice of loss function.
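
One way to approach the question above: for two classes, a single sigmoid unit carries the same information as a two-unit softmax, because the second probability is just one minus the first. The sketch below uses plain Python (no Keras) with a made-up logit and label to check that `sigmoid(z)` equals the class-1 probability of a softmax over `[0, z]`, and that binary cross-entropy on one output matches categorical cross-entropy on two outputs. The helper names are illustrative and not part of the notebook.

```python
import math

def sigmoid(z):
    """Single-unit output: probability of class 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Two-unit output: probabilities of class 0 and class 1."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def binary_crossentropy(y, p):
    """Loss for one sigmoid output p and a 0/1 label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_crossentropy(y, probs):
    """Loss for a two-way softmax output and an integer label y."""
    return -math.log(probs[y])

z = 1.7  # hypothetical logit, chosen only for illustration
y = 1    # hypothetical true label

p_sigmoid = sigmoid(z)
p_softmax = softmax2(0.0, z)
bce = binary_crossentropy(y, p_sigmoid)
cce = categorical_crossentropy(y, p_softmax)

print(p_sigmoid, p_softmax[1])  # the two probabilities agree
print(bce, cce)                 # and so do the two losses
```

So `Dense(1, activation='sigmoid')` with `binary_crossentropy` describes the same kind of model as a two-unit softmax with a categorical cross-entropy. The one-unit form has fewer parameters and works directly with the 0/1 labels produced by `LabelEncoder`.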


In [7]:
# define baseline model
def create_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [38]:
# evaluate the baseline model on the raw (unstandardized) dataset
estimator = KerasClassifier(build_fn=create_baseline, epochs=100, batch_size=8, verbose=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 79.35% (4.84%)


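# Improving performance, starting with data preparation

- StandardScaler()
- ReLU activation: as the figure below shows, ReLU outputs zero for every negative input. If the initial weights are drawn from a zero-mean normal distribution such as N(0, 1), about half of the hidden-unit pre-activations start out negative, so about half of the values are zeroed out (and carry no information).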

![](http://cs231n.github.io/assets/nn1/relu.jpeg)
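
The claim about ReLU can be checked numerically. The sketch below assumes random stand-in inputs in [0, 1] (not the real sonar file) and NumPy draws in place of a Keras initializer. It pushes the inputs through a 60-to-60 layer with zero-mean normal weights and measures how many outputs ReLU sets to zero, averaged over several random initializations.

```python
import numpy as np

rng = np.random.default_rng(87)

# Stand-in inputs: 200 samples with 60 features in [0, 1].
# These are random numbers used for illustration, not the sonar data.
X_fake = rng.uniform(0.0, 1.0, size=(200, 60))

fractions = []
for trial in range(50):
    # Zero-mean normal weights for a 60 -> 60 dense layer (bias left at zero).
    W = rng.normal(loc=0.0, scale=1.0, size=(60, 60))
    hidden = np.maximum(X_fake @ W, 0.0)  # ReLU applied to the pre-activations
    fractions.append(np.mean(hidden == 0.0))

zero_fraction = float(np.mean(fractions))
print("average fraction of hidden outputs zeroed by ReLU: %.3f" % zero_fraction)
```

Because the weights are symmetric around zero, each pre-activation is as likely to be negative as positive, which is where the "about half" comes from.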

### Standardization

In [8]:
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(( 'standardize' , StandardScaler()))
estimators.append(( 'mlp' , KerasClassifier(build_fn=create_baseline, epochs=100,
    batch_size=8, verbose=0)))

In [9]:
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Standardized: 85.15% (4.77%)
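
Wrapping `StandardScaler` in a `Pipeline` means that, during cross-validation, the scaler is fitted on the training folds only and then applied to the held-out fold. The NumPy sketch below imitates that for a single split. It assumes random stand-in data with the same shape as the sonar set (208 x 60), and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(87)

# Stand-in data: 208 samples x 60 features, each feature on a different scale.
# Random numbers used for illustration, not the real sonar readings.
scales = rng.uniform(0.01, 1.0, size=60)
X_fake = rng.uniform(0.0, 1.0, size=(208, 60)) * scales

# One train/validation split, roughly what a single 5-fold iteration produces.
indices = rng.permutation(len(X_fake))
val_idx, train_idx = indices[:42], indices[42:]
X_train, X_val = X_fake[train_idx], X_fake[val_idx]

# "fit": learn the mean and standard deviation from the training fold only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": apply the training statistics to both folds.
X_train_std = (X_train - mean) / std
X_val_std = (X_val - mean) / std

print("train mean is 0:", np.allclose(X_train_std.mean(axis=0), 0.0))
print("train std is 1: ", np.allclose(X_train_std.std(axis=0), 1.0))
print("largest |validation mean|:", np.abs(X_val_std.mean(axis=0)).max())
```

The validation fold ends up close to, but not exactly at, zero mean, because its statistics were never used for fitting. That is the behaviour wanted from an honest cross-validation estimate.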


### Nudging the initial weights

In [35]:
from keras import initializers

def weight_baseline():
    # create model
    model = Sequential()
    model.add(Dense(60, input_dim=60,
                    kernel_initializer=initializers.RandomNormal(mean=0.0001, stddev=0.5, seed=87),
                    activation='relu'))

    model.add(Dense(1, kernel_initializer=initializers.RandomNormal(mean=0.0001, stddev=0.5, seed=87),
                    activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [36]:
estimator = KerasClassifier(build_fn=weight_baseline, epochs=100, batch_size=8, verbose=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("InitialWeight: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

InitialWeight: 84.16% (2.75%)


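### Standardization and tweaking the initial weights both improve accuracy.

- Try changing the arguments of RandomNormal(mean=0.001, stddev=1, seed=None) yourself, and think about why the results change.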
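
As a starting point for that experiment, the sketch below compares what two initializers do to the first layer before any training. It assumes that the `'normal'` string maps to `RandomNormal(mean=0.0, stddev=0.05)` (the Keras 2.x default, worth confirming for your version) and compares it with the `RandomNormal(mean=0.0001, stddev=0.5)` used in `weight_baseline`. It uses NumPy draws and random stand-in inputs rather than Keras and the sonar file, so treat the numbers as an approximation.

```python
import numpy as np

rng = np.random.default_rng(87)

# Stand-in inputs in [0, 1]; random numbers, not the real sonar readings.
X_fake = rng.uniform(0.0, 1.0, size=(208, 60))

def describe(mean, stddev, trials=50):
    """Average pre-activation spread and zeroed fraction for a 60 -> 60 ReLU layer."""
    spreads, zeroed = [], []
    for _ in range(trials):
        W = rng.normal(loc=mean, scale=stddev, size=(60, 60))
        pre = X_fake @ W                    # pre-activations (bias left at zero)
        spreads.append(pre.std())
        zeroed.append(np.mean(pre <= 0.0))  # share that ReLU would set to zero
    return float(np.mean(spreads)), float(np.mean(zeroed))

# Assumed default behind kernel_initializer='normal'.
spread_default, zeroed_default = describe(mean=0.0, stddev=0.05)
# Values taken from weight_baseline.
spread_tweaked, zeroed_tweaked = describe(mean=0.0001, stddev=0.5)

print("stddev=0.05: pre-activation std %.3f, zeroed fraction %.3f"
      % (spread_default, zeroed_default))
print("stddev=0.5 : pre-activation std %.3f, zeroed fraction %.3f"
      % (spread_tweaked, zeroed_tweaked))
```

In this sketch, moving the mean from 0 to 0.0001 barely changes how many units ReLU zeroes out. The visible difference is the roughly tenfold larger spread of the pre-activations. That suggests `stddev` is the more interesting argument to vary, though only re-running the cross-validation can show its effect on accuracy.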



## Combining standardization with the tweaked initial weights

In [None]:
estimators = []
estimators.append(('standardize', StandardScaler()))
# use weight_baseline (not create_baseline) so the tweaked initializer is combined with standardization
estimators.append(('mlp', KerasClassifier(build_fn=weight_baseline,
                                          epochs=100,
                                          batch_size=8,
                                          verbose=0)))
pipeline = Pipeline(estimators)

kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)

print("Standardized + Weight: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

- ? Accuracy does not go up; it drops slightly instead.