# Installing Tacotron2

- Import the [model from GitHub](https://github.com/NVIDIA/tacotron2) and [replace the original source files](https://github.com/ViktorKrasnorutskiy/tacotron2_hifitts/tree/main/model_changed_files) with the modified versions

In [18]:
import os
project_path = os.getcwd().replace('\\', '/')
taco_path = project_path + '/tacotron2'

!git clone https://github.com/NVIDIA/tacotron2.git
os.chdir(taco_path)
!git submodule init
!git submodule update
os.chdir(project_path)

fatal: destination path 'tacotron2' already exists and is not an empty directory.
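
The `fatal: destination path 'tacotron2' already exists` message above appears when the cell is re-run after the repository has already been cloned. A minimal sketch of an idempotent clone step, assuming `git` is available on the PATH. The helper name `clone_if_missing` and its `runner` parameter are hypothetical and exist only for illustration; `--recurse-submodules` is used here in place of the separate `git submodule init` / `git submodule update` calls.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, runner=subprocess.run):
    """Clone repo_url into dest unless dest already exists.

    Returns True if a clone was attempted, False if it was skipped.
    `runner` is injectable so the function can be exercised without git.
    """
    if os.path.isdir(dest):
        # Already cloned: skip, so git does not report a "fatal" error
        return False
    # --recurse-submodules also initialises and updates the submodules
    runner(['git', 'clone', '--recurse-submodules', repo_url, dest], check=True)
    return True
```

In this notebook it could be called as `clone_if_missing('https://github.com/NVIDIA/tacotron2.git', taco_path)`.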


# Preparing the dataset

- Download the [hifitts](http://www.openslr.org/109/) dataset
- Build a single file with information about the whole dataset ( -> ./dataset.csv)
- Create mel-spectrograms from the audio files ( -> ./tacotron2/hifitts/mels/\*.npy)
- Compute the spectrogram lengths (training uses only those shorter than 1000 frames)
- Build the text filelists for training ( -> ./tacotron2/hifitts/train.txt + val.txt)

In [56]:
import json
import pandas as pd

hifitts_path = 'C:/Users/79581/Documents/Projects/final_project/hi_fi_tts_v0'

def read_json(json_path):
    dataset_type = json_path.split('_')[-1].replace('.json', '')
    with open(json_path, encoding='utf-8') as f:
        cond = "[" + f.read().replace("}\n{", "},\n{") + "]"
        json_data = json.loads(cond)
        for item in json_data:
            item['dataset_type'] = dataset_type
    return json_data
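
`read_json` above joins the objects with commas before parsing, which suggests the manifests hold one JSON object per line. An alternative sketch that parses each line separately and so does not depend on the exact `}\n{` boundary. The function name `read_jsonl` is hypothetical, and the one-object-per-line layout is an assumption inferred from the code above.

```python
import json
import os


def read_jsonl(json_path):
    """Parse a manifest that holds one JSON object per line.

    Assumes the file name ends in '_<dataset_type>.json',
    the same convention read_json relies on.
    """
    name = os.path.basename(json_path)
    dataset_type = name.split('_')[-1].replace('.json', '')
    records = []
    with open(json_path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines and a trailing newline
            item = json.loads(line)
            item['dataset_type'] = dataset_type
            records.append(item)
    return records
```

The returned list of dicts can be passed to `pd.DataFrame` in the same way as the output of `read_json`.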

In [57]:
manifests = [manifest for manifest in os.listdir(hifitts_path) if 'manifest' in manifest]
manifest_paths = [f'{hifitts_path}/{manifest}' for manifest in manifests]
manifest_jsons = [read_json(manifest_path) for manifest_path in manifest_paths]
manifest_dfs = [pd.DataFrame(manifest_json) for manifest_json in manifest_jsons]
manifests_df = pd.concat(manifest_dfs, axis=0)
manifests_df.head()

Unnamed: 0,audio_filepath,text,duration,text_no_preprocessing,text_normalized,dataset_type
0,audio/11614_other/12352/prideofjennico_01_cast...,some decision,1.03,"some decision,","some decision,",dev
1,audio/11614_other/12352/prideofjennico_01_cast...,i fear me that those around him then did not f...,7.96,I fear me that those around him then did not f...,I fear me that those around him then did not f...,dev
2,audio/11614_other/12352/prideofjennico_02_cast...,to keep myself something in countenance despit...,10.86,To keep myself something in countenance despit...,To keep myself something in countenance despit...,dev
3,audio/11614_other/12352/prideofjennico_03_cast...,under my gaze,1.06,"under my gaze,","under my gaze,",dev
4,audio/11614_other/12352/prideofjennico_04_cast...,in the vineyards,0.93,"In the vineyards,","In the vineyards,",dev


In [58]:
df = manifests_df.reset_index(drop=True).copy()
df['reader_id'] = df['audio_filepath'].apply(lambda x: x.split('/')[1].split('_')[0])
df['data_quality'] = df['audio_filepath'].apply(lambda x: x.split('/')[1].split('_')[1])
df['book_id'] = df['audio_filepath'].apply(lambda x: x.split('/')[2])
df['book_chapter'] = df['audio_filepath'].apply(lambda x: x.split('/')[3].replace('.flac', ''))
df['mel_path'] = 'mels/' + df.index.astype('string') + '_' + df['dataset_type'] + '_' + df['reader_id']
readers_list = list(df['reader_id'].unique())
readers_dict = {reader_id: str(i) for i, reader_id in enumerate(readers_list)}
df['reader_id_norm'] = df['reader_id'].apply(lambda x: readers_dict[x])
df['txt_line'] = df['mel_path'] + '|' + df['text'] + '|' + df['reader_id_norm'] + '\n'
df.head()

Unnamed: 0,audio_filepath,text,duration,text_no_preprocessing,text_normalized,dataset_type,reader_id,data_quality,book_id,book_chapter,mel_path,reader_id_norm,txt_line
0,audio/11614_other/12352/prideofjennico_01_cast...,some decision,1.03,"some decision,","some decision,",dev,11614,other,12352,prideofjennico_01_castle_0028,mels/0_dev_11614,0,mels/0_dev_11614|some decision|0\n
1,audio/11614_other/12352/prideofjennico_01_cast...,i fear me that those around him then did not f...,7.96,I fear me that those around him then did not f...,I fear me that those around him then did not f...,dev,11614,other,12352,prideofjennico_01_castle_0119,mels/1_dev_11614,0,mels/1_dev_11614|i fear me that those around h...
2,audio/11614_other/12352/prideofjennico_02_cast...,to keep myself something in countenance despit...,10.86,To keep myself something in countenance despit...,To keep myself something in countenance despit...,dev,11614,other,12352,prideofjennico_02_castle_0407,mels/2_dev_11614,0,mels/2_dev_11614|to keep myself something in c...
3,audio/11614_other/12352/prideofjennico_03_cast...,under my gaze,1.06,"under my gaze,","under my gaze,",dev,11614,other,12352,prideofjennico_03_castle_0044,mels/3_dev_11614,0,mels/3_dev_11614|under my gaze|0\n
4,audio/11614_other/12352/prideofjennico_04_cast...,in the vineyards,0.93,"In the vineyards,","In the vineyards,",dev,11614,other,12352,prideofjennico_04_castle_0087,mels/4_dev_11614,0,mels/4_dev_11614|in the vineyards|0\n
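
Each `txt_line` above follows the pattern `mel_path|text|speaker_id`, terminated by a newline. A small sketch of building and parsing such a line, with a check that the text does not itself contain the `|` separator. Both helper names are hypothetical.

```python
def make_filelist_line(mel_path, text, speaker_id):
    """Build one 'mel_path|text|speaker_id' line, ending in a newline."""
    if '|' in text or '\n' in text:
        # A separator inside the text would shift the fields when parsed back
        raise ValueError('text must not contain "|" or a newline: %r' % text)
    return f'{mel_path}|{text}|{speaker_id}\n'


def parse_filelist_line(line):
    """Split a filelist line back into (mel_path, text, speaker_id)."""
    mel_path, text, speaker_id = line.rstrip('\n').split('|')
    return mel_path, text, speaker_id
```

For the first row of the table, `make_filelist_line('mels/0_dev_11614', 'some decision', '0')` gives the same string as the `txt_line` column.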


In [59]:
import numpy as np
import soundfile as sf
import torch

import sys
sys.path.append(taco_path)
from tacotron2 import layers
from tacotron2.hparams import create_hparams

In [61]:
_hparams = create_hparams()

_stft = layers.TacotronSTFT(
    _hparams.filter_length, _hparams.hop_length, _hparams.win_length,
    _hparams.n_mel_channels, _hparams.sampling_rate, _hparams.mel_fmin,
    _hparams.mel_fmax)

# Create the output directories; no error if they already exist
os.makedirs(taco_path + '/hifitts/mels', exist_ok=True)
        
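def flac_to_mel(line_for_create_mel):
    """Convert one FLAC file to a mel-spectrogram and save it as .npy.

    The argument has the form '<audio_filepath>&<mel_path>'.
    """
    # Split on the last '&' so an '&' inside the audio path cannot break parsing
    audio_path, mel_path = line_for_create_mel.rsplit('&', 1)

    def _load_flac_to_torch(audio_path):
        # sf.read returns floating-point samples by default
        data, sampling_rate = sf.read(audio_path)
        return torch.FloatTensor(data.astype(np.float32)), sampling_rate

    def _get_mel(audio_path):
        audio, sampling_rate = _load_flac_to_torch(audio_path)
        if sampling_rate != _stft.sampling_rate:
            # Three placeholders need three arguments: path, actual SR, target SR
            raise ValueError("{} {} SR doesn't match target {} SR".format(
                audio_path, sampling_rate, _stft.sampling_rate))
        # Note: sf.read already scales samples to [-1, 1] by default, so this
        # division is only appropriate if hparams.max_wav_value matches that scale
        audio_norm = audio / _hparams.max_wav_value
        audio_norm = audio_norm.unsqueeze(0)  # add a batch dimension
        melspec = _stft.mel_spectrogram(audio_norm)
        melspec = torch.squeeze(melspec, 0)   # drop the batch dimension
        return melspec

    load_audio_path = hifitts_path + '/' + audio_path
    save_mel_path = taco_path + '/hifitts/' + mel_path

    melspec = _get_mel(load_audio_path)

    # Save as a plain NumPy array; np.save appends the '.npy' extension
    np.save(save_mel_path, melspec.cpu().numpy())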

In [None]:
df['line_for_create_mel'] = df['audio_filepath'] + '&' + df['mel_path']
df['line_for_create_mel'].apply(lambda x: flac_to_mel(x))

In [76]:
def get_mel_size(mel_path):
    melspec = np.load(taco_path + '/hifitts/' + mel_path + '.npy')
    melspec_size = melspec.shape[1]
    return melspec_size

In [77]:
df['melspec_size'] = df['mel_path'].apply(lambda x: get_mel_size(x))

In [78]:
df.head()

Unnamed: 0,audio_filepath,text,duration,text_no_preprocessing,text_normalized,dataset_type,reader_id,data_quality,book_id,book_chapter,mel_path,reader_id_norm,txt_line,line_for_create_mel,melspec_size
0,audio/11614_other/12352/prideofjennico_01_cast...,some decision,1.03,"some decision,","some decision,",dev,11614,other,12352,prideofjennico_01_castle_0028,mels/0_dev_11614,0,mels/0_dev_11614|some decision|0\n,audio/11614_other/12352/prideofjennico_01_cast...,178
1,audio/11614_other/12352/prideofjennico_01_cast...,i fear me that those around him then did not f...,7.96,I fear me that those around him then did not f...,I fear me that those around him then did not f...,dev,11614,other,12352,prideofjennico_01_castle_0119,mels/1_dev_11614,0,mels/1_dev_11614|i fear me that those around h...,audio/11614_other/12352/prideofjennico_01_cast...,1372
2,audio/11614_other/12352/prideofjennico_02_cast...,to keep myself something in countenance despit...,10.86,To keep myself something in countenance despit...,To keep myself something in countenance despit...,dev,11614,other,12352,prideofjennico_02_castle_0407,mels/2_dev_11614,0,mels/2_dev_11614|to keep myself something in c...,audio/11614_other/12352/prideofjennico_02_cast...,1871
3,audio/11614_other/12352/prideofjennico_03_cast...,under my gaze,1.06,"under my gaze,","under my gaze,",dev,11614,other,12352,prideofjennico_03_castle_0044,mels/3_dev_11614,0,mels/3_dev_11614|under my gaze|0\n,audio/11614_other/12352/prideofjennico_03_cast...,183
4,audio/11614_other/12352/prideofjennico_04_cast...,in the vineyards,0.93,"In the vineyards,","In the vineyards,",dev,11614,other,12352,prideofjennico_04_castle_0087,mels/4_dev_11614,0,mels/4_dev_11614|in the vineyards|0\n,audio/11614_other/12352/prideofjennico_04_cast...,161
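
The `melspec_size` column can be related to `duration` with simple arithmetic: a centered STFT yields roughly one frame per hop, plus one. The sketch below assumes a sampling rate of 44100 Hz and a hop length of 256 samples. Both values are inferred from the `duration` and `melspec_size` columns above, not read from `hparams`, and the function names are hypothetical. Because `duration` is rounded to two decimals, the result is only an estimate, although it matches the five rows shown.

```python
def estimate_mel_frames(duration_s, sampling_rate=44100, hop_length=256):
    """Estimate the number of mel frames for a clip of the given duration.

    Assumes a centered STFT, i.e. n_samples // hop_length + 1 frames.
    The default sampling rate and hop length are assumptions.
    """
    n_samples = round(duration_s * sampling_rate)
    return n_samples // hop_length + 1


def max_duration_for_frames(max_frames, sampling_rate=44100, hop_length=256):
    """Exclusive upper bound, in seconds, on clips with fewer than max_frames frames."""
    return (max_frames - 1) * hop_length / sampling_rate
```

Under these assumptions, the `< 1000` frame limit used later corresponds to clips shorter than roughly 5.8 seconds, which is consistent with rows 1 and 2 (7.96 s and 10.86 s) exceeding the limit.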


In [80]:
def save_df(df):
    save_path = project_path + '/dataset.csv'
    df.to_csv(save_path)

df['line_for_create_txt'] = df['dataset_type'] + '&' + df['txt_line']
save_df(df)

In [81]:
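def create_txt(line_for_create_txt):
    # Split only on the first '&' so an '&' inside the text does not truncate the line
    dataset_type, txt_line = line_for_create_txt.split('&', 1)
    # Mode 'a' appends: delete old .txt files before re-running this cell
    with open(taco_path + '/hifitts/' + dataset_type + '.txt', 'a', encoding='utf-8') as f:
        f.write(txt_line)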

df_below_1000 = df[df['melspec_size'] < 1000].copy()
df_below_1000['line_for_create_txt'].apply(lambda x: create_txt(x))

0         None
3         None
4         None
5         None
6         None
          ... 
323972    None
323973    None
323974    None
323975    None
323976    None
Name: line_for_create_txt, Length: 292622, dtype: object

In [83]:
print('Excluded', df.shape[0]-df_below_1000.shape[0], 'files')

Excluded 31356 files
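
`create_txt` above reopens the target file once per row and appends to it, so re-running the cell duplicates lines. A hedged alternative sketch that groups the lines by dataset type and writes each file once. The function name `write_filelists` is hypothetical, and the record keys mirror the `dataset_type`, `txt_line` and `melspec_size` columns of `df`.

```python
import os
from collections import defaultdict


def write_filelists(records, out_dir, max_frames=1000):
    """Write one '<dataset_type>.txt' per split, keeping clips below max_frames.

    `records` is an iterable of dicts with the keys 'dataset_type',
    'txt_line' and 'melspec_size'. Returns the number of excluded records.
    """
    lines_by_type = defaultdict(list)
    excluded = 0
    for rec in records:
        if rec['melspec_size'] < max_frames:
            lines_by_type[rec['dataset_type']].append(rec['txt_line'])
        else:
            excluded += 1
    for dataset_type, lines in lines_by_type.items():
        path = os.path.join(out_dir, dataset_type + '.txt')
        # Mode 'w' overwrites, so re-running does not duplicate lines
        with open(path, 'w', encoding='utf-8') as f:
            f.writelines(lines)
    return excluded
```

With the DataFrame from this notebook it could be called as `write_filelists(df.to_dict('records'), taco_path + '/hifitts')`.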


[Zip archive of the dataset](https://drive.google.com/file/d/1a0Xd6NxuErgdfsphZgZ7j_bUmGzZn5dX/view?usp=sharing)