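### Generating the training : tuning : test data

Total samples: 168 (7 notes × 24 recordings)  

Training data: 119  
Tuning data: 35  
Test data: 14  

Training : tuning : test = 17 : 5 : 2

The selection is fixed with `random.seed`, so the same recordings are drawn for each split on every run.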

In [22]:
import random

random.seed(5)

piano_notes = ['do', 're', 'mi', 'fa', 'so', 'ra', 'si']
piano_all_sounds = list(range(24))

piano_train_sounds = random.sample(piano_all_sounds, 17)

set_tune = set(piano_all_sounds) - set(piano_train_sounds)
piano_tune_sounds = random.sample(list(set_tune), 5)

set_test = set(piano_all_sounds) - set(piano_train_sounds) - set(piano_tune_sounds)
piano_test_sounds = random.sample(list(set_test), 2)

print("all_sounds : {}".format(sorted(piano_all_sounds)))
print("train_sounds : {}".format(sorted(piano_train_sounds)))
print("tune_sounds : {}".format(sorted(piano_tune_sounds)))
print("test_sounds : {}".format(sorted(piano_test_sounds)))




all_sounds : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
train_sounds : [0, 1, 2, 3, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 19, 20, 23]
tune_sounds : [4, 9, 13, 18, 21]
test_sounds : [10, 22]
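As a cross-check of the 17 : 5 : 2 split and the 119 / 35 / 14 counts, here is a minimal sketch of an alternative way to draw the three subsets: shuffle the indices once and slice. The helper `split_indices` is hypothetical (it is not part of the notebook), and because it uses a different sampling procedure it will not reproduce the exact indices printed above, even with the same seed.

```python
import random

def split_indices(n_items, n_train, n_tune, seed):
    """Shuffle 0..n_items-1 once and slice into train / tune / test (hypothetical helper)."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    order = list(range(n_items))
    rng.shuffle(order)
    train = sorted(order[:n_train])
    tune = sorted(order[n_train:n_train + n_tune])
    test = sorted(order[n_train + n_tune:])
    return train, tune, test

n_notes = 7  # do, re, mi, fa, so, ra, si
train, tune, test = split_indices(24, 17, 5, seed=5)

# The three subsets are disjoint and together cover all 24 recording indices.
assert sorted(train + tune + test) == list(range(24))

# File counts per split: 7 notes x (17, 5, 2) recordings each.
print(n_notes * len(train), n_notes * len(tune), n_notes * len(test))  # 119 35 14
```

Slicing a single shuffled list makes the disjointness of the subsets obvious without any set arithmetic.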


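### Preprocessing

MFCC is used as the feature.  
Each recording's MFCC has 173 entries along one axis. Since `librosa.feature.mfcc` returns an array of shape `(n_mfcc, n_frames)`, and the code below transposes it and assigns one label per frame, 173 is presumably the number of frames per recording rather than the number of coefficients.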


In [23]:
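import librosa
from sklearn.svm import SVC
import numpy

def get_mfcc(fname):
    # Load the audio (librosa resamples to 22050 Hz by default)
    y, sr = librosa.load(fname)
    # Pass keyword arguments: recent librosa releases reject positional y, sr
    return librosa.feature.mfcc(y=y, sr=sr)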

if __name__  ==  '__main__':

    piano_note_training = []
    piano_sound_training = []

    for piano_note in piano_notes:
        print('Reading data of {}...'.format(piano_note))
        for piano_sound in piano_train_sounds:
            
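            # get the MFCC matrix for this recording; transpose so each row is one frame
            mfcc = get_mfcc('{}/{}{}.wav'.format(piano_note, piano_note, piano_sound))
            piano_sound_training.append(mfcc.T)
            
            # one label per frame: the index of this note in piano_notes
            # (numpy.int was removed in NumPy 1.24; the builtin int is equivalent)
            label = numpy.full((mfcc.shape[1], ), 
                               piano_notes.index(piano_note), dtype=int)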
            piano_note_training.append(label)
    
    piano_sound_training = numpy.concatenate(piano_sound_training)
    piano_note_training = numpy.concatenate(piano_note_training)
    print('done.\n')

Reading data of do...
Reading data of re...
Reading data of mi...
Reading data of fa...
Reading data of so...
Reading data of ra...
Reading data of si...
done.
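The per-frame labelling in the cell above can be illustrated without any audio files. The sketch below uses a zero-filled stand-in array instead of a real `get_mfcc` result. The shape `(20, 173)` is an assumption: 20 is the default `n_mfcc` in librosa, and 173 is taken to be the frame count.

```python
import numpy as np

# Assumed shape of one recording's MFCC matrix: (n_mfcc, n_frames)
n_mfcc, n_frames = 20, 173
fake_mfcc = np.zeros((n_mfcc, n_frames))  # stand-in for get_mfcc(...)

# Transpose so that every frame becomes one training sample (row)
frames = fake_mfcc.T

# One label per frame; 3 is the index of 'fa' in piano_notes
labels = np.full((fake_mfcc.shape[1],), 3, dtype=int)

# Stacking two recordings, as numpy.concatenate does in the cell above
X = np.concatenate([frames, frames])
y = np.concatenate([labels, labels])
print(X.shape, y.shape)
```

Under these assumptions the SVM is trained on individual frames, not on whole recordings, which is why the evaluation step has to aggregate frame predictions back into one answer per file.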



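### Parameter tuning & test evaluation

Following feedback that validation for tuning and evaluation on the test set should be done at the same time, both are run together here.  
gamma values from 1e-1 to 1e-7 are tried to find the one that scores best.  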
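The evaluation cell that follows classifies each recording by predicting a note for every frame and taking the most frequent one. A minimal sketch of that majority vote, using a hand-written prediction array in place of `svc.predict(mfcc.T)`; the helper name `majority_note` is hypothetical:

```python
import numpy as np

piano_notes = ['do', 're', 'mi', 'fa', 'so', 'ra', 'si']

def majority_note(frame_predictions, notes=piano_notes):
    """Return the note name predicted for the largest number of frames."""
    # minlength keeps the count vector aligned with the note list
    counts = np.bincount(frame_predictions, minlength=len(notes))
    return notes[int(np.argmax(counts))]

# Made-up per-frame class indices for one recording
frame_predictions = np.array([2, 2, 0, 2, 5, 2, 0])
print(majority_note(frame_predictions))  # mi
```

On ties `np.argmax` returns the lowest index, so the vote silently favours notes that come earlier in `piano_notes`.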



In [24]:
# choosing the gamma value

gamma_list = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]

for gamma in gamma_list:
    print('\n----- gamma={} -----\n'.format(gamma))
    svc = SVC(gamma = gamma)
    svc.fit(piano_sound_training, piano_note_training)
    print('----- Learning Done -----\n')

    # accuracy counters (tuning set)
    sounds_num = 0
    correct_sounds = 0
    correct_rate = 0.0

    print("----- tune -----")
    for piano_note in piano_notes:
        for piano_sound in piano_tune_sounds:
            sounds_num += 1
            mfcc = get_mfcc('{}/{}{}.wav'.format(piano_note, piano_note,piano_sound))
            prediction = svc.predict(mfcc.T)
            counts = numpy.bincount(prediction) 
            result = piano_notes[numpy.argmax(counts)] # predicted note (majority vote over frames)
            original_title = '{}'.format(piano_note)

            if result == original_title:
                correct_sounds += 1
                
    correct_rate = correct_sounds / sounds_num
    print('{} : correct rate : {}%.'.format(gamma,correct_rate*100))
    
    # accuracy counters (test set)
    sounds_num = 0
    correct_sounds = 0
    correct_rate = 0.0
    
    print("----- test -----")
    for piano_note in piano_notes:
        for piano_sound in piano_test_sounds:
            sounds_num += 1
            mfcc = get_mfcc('{}/{}{}.wav'.format(piano_note, piano_note,piano_sound))
            prediction = svc.predict(mfcc.T)
            counts = numpy.bincount(prediction) 
            result = piano_notes[numpy.argmax(counts)] # predicted note (majority vote over frames)
            original_title = '{}'.format(piano_note)

            if result == original_title:
                correct_sounds += 1

    correct_rate = correct_sounds / sounds_num
    print('{} correct rate : {}%\n\n.'.format(gamma, correct_rate*100))


----- gamma=0.1 -----

----- Learning Done -----

----- tune -----
0.1 : correct rate : 48.57142857142857%.
----- test -----
0.1 correct rate : 42.857142857142854%

.

----- gamma=0.01 -----

----- Learning Done -----

----- tune -----
0.01 : correct rate : 74.28571428571429%.
----- test -----
0.01 correct rate : 50.0%

.

----- gamma=0.001 -----

----- Learning Done -----

----- tune -----
0.001 : correct rate : 94.28571428571428%.
----- test -----
0.001 correct rate : 100.0%

.

----- gamma=0.0001 -----

----- Learning Done -----

----- tune -----
0.0001 : correct rate : 91.42857142857143%.
----- test -----
0.0001 correct rate : 92.85714285714286%

.

----- gamma=1e-05 -----

----- Learning Done -----

----- tune -----
1e-05 : correct rate : 77.14285714285715%.
----- test -----
1e-05 correct rate : 64.28571428571429%

.

----- gamma=1e-06 -----

----- Learning Done -----

----- tune -----
1e-06 : correct rate : 42.857142857142854%.
----- test -----
1e-06 correct rate : 21.4285714285

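### Thoughts

gamma = 1e-3 gave the best results: 94.3% on the tuning set and 100% on the test set.  
Next, I would like to see whether this also works on machine-generated sounds.  
With a DNN, validation and test results can differ greatly unless the trained model is saved, so this needs care.  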
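The choice of gamma can be read straight off the tuning accuracies printed above. A small sketch, with the values rounded to two decimals and gamma = 1e-7 omitted because its result does not appear in the output:

```python
# Tuning-set accuracy (%) per gamma, rounded from the printed output
tune_accuracy = {
    1e-1: 48.57,
    1e-2: 74.29,
    1e-3: 94.29,
    1e-4: 91.43,
    1e-5: 77.14,
    1e-6: 42.86,
}

# Pick the gamma with the highest tuning accuracy
best_gamma = max(tune_accuracy, key=tune_accuracy.get)
print(best_gamma)  # 0.001
```

Selecting on the tuning accuracy alone keeps the test score as an independent check of the chosen value.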




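Evaluation is run together with validation, using the same gamma value that was just validated.  
If the two steps are separated, the model has to be saved as a pkl file.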
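If validation and test evaluation were split into separate steps, one way to keep the fitted model is Python's `pickle` module. The sketch below is an illustration under assumptions, not the notebook's own code: the training data is random stand-in data, and the file name `svc_gamma_1e-3.pkl` is made up.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Stand-in data: 3 classes, 20 features, classes shifted apart so they are separable
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 20)) + y[:, None]

svc = SVC(gamma=1e-3)
svc.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'svc_gamma_1e-3.pkl')  # hypothetical file name

    # Save the fitted model after validation ...
    with open(path, 'wb') as f:
        pickle.dump(svc, f)

    # ... and load it again later for the test evaluation
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored model gives the same predictions as the original
same = np.array_equal(svc.predict(X), restored.predict(X))
print(same)  # True
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best reloaded with the same scikit-learn version that wrote it.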