<a href="https://colab.research.google.com/github/EGEG1212/TIL_AudioSpeechProcessing/blob/main/4_%ED%99%94%EC%9E%90%EB%B6%84%EB%A6%ACSpeaker_Diarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speaker Diarization
- Speaker diarization is the task of identifying which speaker is talking in each part of an audio recording
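To make the definition concrete, the following minimal sketch collapses a per-frame label sequence into contiguous speaker segments. The function name `labels_to_segments` and the example labels are hypothetical and are not part of the uisrnn library.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-frame speaker labels into (speaker, start, end) runs.

    `end` is exclusive, so a segment covers frames start .. end - 1.
    """
    segments = []
    start = 0
    for speaker, run in groupby(labels):
        length = len(list(run))
        segments.append((speaker, start, start + length))
        start += length
    return segments


# Hypothetical per-frame labels: speaker A, then B, then A again.
frame_labels = ['A', 'A', 'A', 'B', 'B', 'A', 'A']
segments = labels_to_segments(frame_labels)
for speaker, start, end in segments:
    print(f'speaker {speaker}: frames {start}..{end - 1}')
```

A diarization model supplies the per-frame labels; grouping them into segments like this is only a presentation step.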

## UIS-RNN
 - A representative model is the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN)

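## Installing the libraries
- The uisrnn library provides a ready-made implementation of UIS-RNN
- Here we use the uisrnn library to build, train, and evaluate a UIS-RNN model
- For this exercise, install the uisrnn and easydict libraries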

In [1]:
!pip install uisrnn easydict

Collecting uisrnn
  Downloading https://files.pythonhosted.org/packages/49/ac/240e688480bbcb541458642e36b2211ac58fa17e8c9239cc55e22a244822/uisrnn-0.1.0-py3-none-any.whl
Installing collected packages: uisrnn
Successfully installed uisrnn-0.1.0


## Downloading the data
- Training and evaluation use the sample (toy) dataset provided with the uisrnn library
- The data is fetched from the following URLs with urlretrieve
- <https://github.com/google/uis-rnn/blob/master/data/toy_training_data.npz?raw=True>
- <https://github.com/google/uis-rnn/blob/master/data/toy_testing_data.npz?raw=True>

In [2]:
import urllib.request

training_url = 'https://github.com/google/uis-rnn/blob/master/data/toy_training_data.npz?raw=True'
urllib.request.urlretrieve(training_url, './toy_training_data.npz')

testing_url = 'https://github.com/google/uis-rnn/blob/master/data/toy_testing_data.npz?raw=True'
urllib.request.urlretrieve(testing_url, './toy_testing_data.npz')

('./toy_testing_data.npz', <http.client.HTTPMessage at 0x7f75a9d6e0d0>)

- Load the downloaded files and split them into sequences and labels

In [3]:
import numpy as np
import uisrnn

train_data = np.load('./toy_training_data.npz', allow_pickle=True)
test_data = np.load('./toy_testing_data.npz', allow_pickle=True)

train_sequence = train_data['train_sequence']
train_cluster_id = train_data['train_cluster_id']

test_sequences = test_data['test_sequences'].tolist()
test_cluster_ids = test_data['test_cluster_ids'].tolist()
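The loading pattern above can be tried without downloading anything. This sketch writes a small synthetic `.npz` file that reuses the key names `test_sequences` and `test_cluster_ids` and reads it back the same way. The array shapes, the label strings, and the file name are assumptions for illustration and are not taken from the real toy data.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Two synthetic sequences of different lengths, each (num_frames, 256).
# Ragged data like this has to be stored as an object array.
test_sequences = np.empty(2, dtype=object)
test_sequences[0] = rng.normal(size=(5, 256))
test_sequences[1] = rng.normal(size=(3, 256))

# One label per frame, stored as plain Python lists of strings.
test_cluster_ids = np.empty(2, dtype=object)
test_cluster_ids[0] = ['0_0', '0_0', '0_1', '0_1', '0_1']
test_cluster_ids[1] = ['1_0', '1_1', '1_1']

path = os.path.join(tempfile.mkdtemp(), 'synthetic_testing_data.npz')
np.savez(path, test_sequences=test_sequences, test_cluster_ids=test_cluster_ids)

# Object arrays are pickled on disk, so loading them needs allow_pickle=True.
loaded = np.load(path, allow_pickle=True)
sequences = loaded['test_sequences'].tolist()
cluster_ids = loaded['test_cluster_ids'].tolist()

for seq, ids in zip(sequences, cluster_ids):
    # Every frame (row) of a sequence should have exactly one label.
    print(seq.shape, len(ids))
```

Checking that each sequence has as many rows as labels is a cheap sanity test before training or inference.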

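## Setting the parameters
- The arguments needed for the model, training, and inference can normally be obtained from uisrnn.parse_arguments()
- Command-line arguments cannot be passed in a Colab environment, so EasyDict objects are used instead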

In [4]:
import easydict

model_args = easydict.EasyDict({"crp_alpha": 1.0,
                                "enable_cuda": True,
                                "observation_dim": 256,
                                "rnn_depth": 1,
                                "rnn_dropout": 0.2,
                                "rnn_hidden_size": 512,
                                "sigma2": None,
                                "transition_bias": None,
                                "verbosity": 2})

training_args = easydict.EasyDict({"batch_size": 10,
                                   "enforce_cluster_id_uniqueness": True,
                                   "grad_max_norm": 5.0,
                                   "learning_rate": 0.001,
                                   "num_permutations": 10,
                                   "optimizer": 'adam',
                                   "regularization_weight": 1e-05,
                                   "sigma_alpha": 1.0,
                                   "sigma_beta": 1.0,
                                   "train_iteration": 5000})

inference_args = easydict.EasyDict({"batchsize": 100,
                                    "look_ahead": 1,
                                    "test_iteration": 2,
                                    "beam_size": 10})
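The EasyDict objects above stand in for argparse namespaces: both expose options as attributes. The sketch below shows the equivalence using only the standard library, with `types.SimpleNamespace` swapped in for EasyDict. The parser is hypothetical and borrows just two option names from this notebook; it is not the uisrnn parser.

```python
import argparse
from types import SimpleNamespace


def build_parser():
    # Hypothetical parser; only the option names are taken from this notebook.
    parser = argparse.ArgumentParser()
    parser.add_argument('--observation_dim', default=256, type=int)
    parser.add_argument('--rnn_hidden_size', default=512, type=int)
    return parser


# In a notebook, sys.argv holds the kernel's own launch flags, so a bare
# parse_args() typically fails. An explicit empty list avoids reading sys.argv.
parsed = build_parser().parse_args([])

# A hand-built namespace gives the same attribute-style access.
manual = SimpleNamespace(observation_dim=256, rnn_hidden_size=512)

print(parsed.observation_dim, manual.observation_dim)
print(vars(parsed) == vars(manual))
```

Any object with the expected attribute names should work wherever the code only reads `args.some_option`, which is why a dictionary wrapper is a practical substitute here.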

## Training the UIS-RNN model
- Build the model with the arguments defined above
- Feed in the training data to train the model

In [5]:
model = uisrnn.UISRNN(model_args)

model.fit(train_sequence, train_cluster_id, training_args)

Iter: 0  	Training Loss: -282.8517    
    Negative Log Likelihood: 6.3097	Sigma2 Prior: -289.1620	Regularization: 0.0006
Iter: 10  	Training Loss: -298.2873    
    Negative Log Likelihood: 5.6249	Sigma2 Prior: -303.9128	Regularization: 0.0006
Iter: 20  	Training Loss: -311.7488    
    Negative Log Likelihood: 6.2804	Sigma2 Prior: -318.0299	Regularization: 0.0006
Iter: 30  	Training Loss: -328.1163    
    Negative Log Likelihood: 7.0495	Sigma2 Prior: -335.1664	Regularization: 0.0006
Iter: 40  	Training Loss: -343.1385    
    Negative Log Likelihood: 8.4133	Sigma2 Prior: -351.5525	Regularization: 0.0006
Iter: 50  	Training Loss: -365.6828    
    Negative Log Likelihood: 10.2739	Sigma2 Prior: -375.9574	Regularization: 0.0007
Iter: 60  	Training Loss: -402.6993    
    Negative Log Likelihood: 13.8049	Sigma2 Prior: -416.5049	Regularization: 0.0007
Iter: 70  	Training Loss: -439.7920    
    Negative Log Likelihood: 22.8929	Sigma2 Prior: -462.6856	Regularization: 0.0007
Iter: 80  	Tra
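In the log lines above, the reported training loss matches the sum of the three terms printed beneath it. The following sketch checks that arithmetic on the first few logged iterations. The values are copied from the log, and the tolerance only allows for four-decimal rounding.

```python
# (iteration, training_loss, negative_log_likelihood, sigma2_prior, regularization)
log_rows = [
    (0, -282.8517, 6.3097, -289.1620, 0.0006),
    (10, -298.2873, 5.6249, -303.9128, 0.0006),
    (20, -311.7488, 6.2804, -318.0299, 0.0006),
    (30, -328.1163, 7.0495, -335.1664, 0.0006),
]

differences = []
for iteration, loss, nll, sigma2_prior, regularization in log_rows:
    total = nll + sigma2_prior + regularization
    differences.append(abs(total - loss))
    print(f'iter {iteration}: sum of terms = {total:.4f}, logged loss = {loss:.4f}')
```

Over these iterations the loss falls mainly because the sigma2 prior term becomes more negative, while the negative log likelihood actually rises.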

## Evaluating the model
- Use the evaluation data to predict the speaker id for each sequence
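The predicted ids are arbitrary integers, while the ground-truth labels are strings such as '15_0', so accuracy only makes sense under the best one-to-one mapping between the two label sets. This is a brute-force sketch of that idea. It is not the uisrnn implementation, the function name `sequence_match_accuracy` is hypothetical, and trying every permutation is only practical for a handful of speakers.

```python
from itertools import permutations


def sequence_match_accuracy(truth, predicted):
    """Best accuracy over all one-to-one mappings of predicted ids to true labels."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError('sequences must be non-empty and of equal length')

    # Count how often each (true label, predicted id) pair occurs.
    counts = {}
    for t, p in zip(truth, predicted):
        counts[(t, p)] = counts.get((t, p), 0) + 1

    truth_ids = sorted(set(truth))
    pred_ids = sorted(set(predicted))

    # Pad the smaller side with None so every permutation is a valid mapping.
    size = max(len(truth_ids), len(pred_ids))
    truth_ids += [None] * (size - len(truth_ids))
    pred_ids += [None] * (size - len(pred_ids))

    best = 0
    for perm in permutations(pred_ids):
        matched = sum(counts.get((t, p), 0) for t, p in zip(truth_ids, perm))
        best = max(best, matched)
    return best / len(truth)


# Same segmentation under different names counts as fully correct.
print(sequence_match_accuracy(['15_0', '15_0', '15_1'], [0, 0, 1]))
# One of five frames is assigned to the wrong speaker.
print(sequence_match_accuracy(['a', 'a', 'b', 'b', 'b'], [1, 1, 0, 0, 1]))
```

For many speakers, an assignment solver such as the Hungarian algorithm would replace the permutation loop.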

In [10]:
predicted_cluster_ids = []
test_record = []

for (test_sequence, test_cluster_id) in zip(test_sequences, test_cluster_ids):
  predicted_cluster_id = model.predict(test_sequence, inference_args)
  predicted_cluster_ids.append(predicted_cluster_id)  # keep the predictions for later statistics
  # Accuracy after matching the ground-truth labels to the predicted ids
  accuracy = uisrnn.compute_sequence_match_accuracy(test_cluster_id, predicted_cluster_id)
  test_record.append((accuracy, len(test_cluster_id)))
  print('Ground truth labels:')
  print(test_cluster_id)
  print('Predicted labels')
  print(predicted_cluster_id)
  print('-' * 100 )

output_result = uisrnn.output_result(model_args, training_args, test_record)
print(output_result)

Ground truth labels:
['15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_0', '15_0', '15_0', '15_0', '15_0', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_2', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_0', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1', '15_1']
Predicted labels
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

NameError: ignored

In [11]:
output_result = uisrnn.output_result(model_args, training_args, test_record)
print(output_result)

Config:
  sigma_alpha: 1.0
  sigma_beta: 1.0
  crp_alpha: 1.0
  learning rate: 0.001
  regularization: 1e-05
  batch size: 10

Performance:
  averaged accuracy: 0.998022
  accuracy numbers for all testing sequences:
    1.000000
    1.000000
    1.000000
    0.989362
    1.000000
    1.000000
    0.989583
    1.000000
    1.000000
    1.000000
    0.989362
    1.000000
    1.000000
    1.000000
    1.000000
    0.991597
    0.990654
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
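The reported averaged accuracy is consistent with a plain, unweighted mean of the 25 per-sequence accuracies listed above. The sketch below reproduces the figure from the printed values. Whether the library weights sequences by length is not confirmed here, and the check only shows that an unweighted mean reproduces the number.

```python
# Per-sequence accuracies copied from the output above, in order.
accuracies = [
    1.000000, 1.000000, 1.000000, 0.989362, 1.000000,
    1.000000, 0.989583, 1.000000, 1.000000, 1.000000,
    0.989362, 1.000000, 1.000000, 1.000000, 1.000000,
    0.991597, 0.990654, 1.000000, 1.000000, 1.000000,
    1.000000, 1.000000, 1.000000, 1.000000, 1.000000,
]

mean_accuracy = sum(accuracies) / len(accuracies)
print(f'{len(accuracies)} sequences, mean accuracy = {mean_accuracy:.6f}')
```

Twenty of the 25 test sequences are labeled perfectly, and the remaining five each stay above 0.989.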

