
preprocessing_mel question #18

Closed
Kerry0123 opened this issue Sep 15, 2020 · 6 comments

Kerry0123 commented Sep 15, 2020

Hi, I have a doubt about the preprocessing_mel function. I use the following preprocessing method, but the generated audio file is silent.

```python
def melspectrogram(wav, hparams):
    D = _stft(preemphasis(wav, hparams.preemphasis, hparams.preemphasize), hparams)
    S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db

    if hparams.signal_normalization:
        return _normalize(S, hparams)
    return S

def _stft(y, hparams):
    if hparams.use_lws:  # use_lws is False in my setup
        return _lws_processor(hparams).stft(y).T
    else:
        return librosa.stft(y=y, n_fft=hparams.n_fft, hop_length=get_hop_size(hparams), win_length=hparams.win_size)

def _linear_to_mel(spectrogram, hparams):
    global _mel_basis
    if _mel_basis is None:
        _mel_basis = _build_mel_basis(hparams)
    return np.dot(_mel_basis, spectrogram)

def _amp_to_db(x, hparams):
    # with min_level_db = -100 this is min_level = 10 ** (-100 / 20) = 1e-5
    min_level = np.exp(hparams.min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S, hparams):
    if hparams.allow_clipping_in_normalization:  # True in my setup
        if hparams.symmetric_mels:  # True in my setup
            return np.clip((2 * hparams.max_abs_value) * ((S - hparams.min_level_db) / (-hparams.min_level_db)) - hparams.max_abs_value,
                           -hparams.max_abs_value, hparams.max_abs_value)
        else:
            return np.clip(hparams.max_abs_value * ((S - hparams.min_level_db) / (-hparams.min_level_db)), 0, hparams.max_abs_value)
```
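As a side note on `_amp_to_db`: assuming `min_level_db = -100` (the common Tacotron default, not stated explicitly above), the exponential form of the floor and the plain power-of-ten form are the same number, so any amplitude below 1e-5 is floored to -100 dB:

```python
import numpy as np

# exp((min_level_db / 20) * ln(10)) is just 10 ** (min_level_db / 20)
min_level_exp = np.exp(-100 / 20 * np.log(10))
min_level_pow = 10 ** (-100 / 20)
print(min_level_exp, min_level_pow)  # both 1e-05

# so any amplitude at or below 1e-5 is floored to -100 dB
floor_db = 20 * np.log10(np.maximum(min_level_pow, 0.0))
print(floor_db)  # -100.0
```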

The main differences are `S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db` and `_normalize`, with `hparams.ref_level_db = 20` and `hparams.max_abs_value = 4`.
My data is in [-4, 4], while your preprocessing produces data in [0, 1]. Does the data range have a great influence on the model? I don't understand; I am asking for your help. Thank you.
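To make the range question concrete: both conventions are affine maps of the same dB-scale values, so either range can work in principle as long as training and inference use the same one. A minimal sketch, assuming `min_level_db = -100` and `max_abs_value = 4` (my hparams, not necessarily the repo's):

```python
import numpy as np

min_level_db = -100.0
max_abs_value = 4.0

def normalize_symmetric(S):
    # maps [min_level_db, 0] dB onto [-4, 4], as in the tacotron2-style _normalize above
    return np.clip((2 * max_abs_value) * ((S - min_level_db) / (-min_level_db)) - max_abs_value,
                   -max_abs_value, max_abs_value)

def normalize_unit(S):
    # maps [min_level_db, 0] dB onto [0, 1]
    return np.clip((S - min_level_db) / (-min_level_db), 0.0, 1.0)

S = np.array([-100.0, -50.0, 0.0])
print(normalize_symmetric(S))  # [-4.  0.  4.]
print(normalize_unit(S))       # [0.   0.5  1. ]
```

Since one range is just a scale-and-shift of the other, a mismatch only breaks things when the vocoder was trained on one convention and fed spectrograms in the other.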

@Kerry0123

I am asking for your help. Thank you.


bshall commented Sep 15, 2020

Hi @Kerry0123,

Did you retrain the model with your preprocessing steps or did you feed your spectrograms directly to the pretrained model?

@Kerry0123

I retrained the model with my preprocessing steps. The loss at epoch 1 is 0.66, and the loss drops to 0. I am asking for your help. Thank you.


bshall commented Sep 16, 2020

@Kerry0123, something weird is going on because that loss is very low. What dataset are you using? The ZeroSpeech one? Also, could you share an example spectrogram so I can check if anything is odd?

@Kerry0123

The dataset is BZNSYP (a Chinese dataset). To align the output of the synthesizer with the input of the vocoder, I use the preprocessing of the tacotron2 synthesizer. Its GitHub link: https://github.com/cnlinxi/style-token_tacotron2

python preprocess.py --dataset=biaobei --base_dir=/tmp-data/data/ --output=/nfs/volume-340-1/tts_data_preprocess/training_data_biaobe.

Is it convenient to tell me your email address? I will send you the mel file. I am asking for your help. Thank you.


bshall commented Sep 16, 2020

Sure, you can send it to benjamin.l.van.niekerk@gmail.com

Just to check, you kept all the other preprocessing the same e.g. mu-law encoding and all the padding stuff here?
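For reference, the mu-law encoding mentioned here can be sketched as generic 8-bit mu-law companding; this is an illustrative version, not necessarily the repo's exact implementation:

```python
import numpy as np

def mulaw_encode(x, mu=255):
    # mu-law compand waveform samples in [-1, 1], then quantize to {0, ..., mu}
    fx = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((fx + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(y, mu=255):
    # map class indices back to [-1, 1] and invert the companding
    fx = 2 * y.astype(np.float64) / mu - 1
    return np.sign(fx) * np.expm1(np.abs(fx) * np.log1p(mu)) / mu
```

If the vocoder expects mu-law class indices as targets but receives raw (or differently encoded) audio, the output will be badly distorted even when the spectrograms are fine, which is why it is worth double-checking this step along with the padding.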
