preprocessing_mel question #18
Comments
I am asking for your help. Thank you.
Hi @Kerry0123, did you retrain the model with your preprocessing steps, or did you feed your spectrograms directly to the pretrained model?
I retrained the model with my preprocessing steps. The loss after epoch 1 is 0.66, and it then drops to 0. I am asking for your help. Thank you.
@Kerry0123, something weird is going on because that loss is very low. What dataset are you using? The ZeroSpeech one? Also, could you share an example spectrogram so I can check if anything is odd?
The dataset is BZNSYP (a Chinese dataset). To align the output of the synthesizer with the input of the vocoder, I use the preprocessing from the Tacotron 2 synthesizer. Its GitHub link: https://github.com/cnlinxi/style-token_tacotron2
Sure, you can send it to benjamin.l.van.niekerk@gmail.com. Just to check, did you keep all the other preprocessing the same, e.g. the mu-law encoding and all the padding stuff here?
Hi, I have a doubt about the preprocessing_mel function. I use the following preprocessing method, and the generated audio file is silent.
def melspectrogram(wav, hparams):
    D = _stft(preemphasis(wav, hparams.preemphasis, hparams.preemphasize), hparams)
    S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db

def _stft(y, hparams):
    if hparams.use_lws:  # False in my configuration
        return _lws_processor(hparams).stft(y).T
    else:
        return librosa.stft(y=y, n_fft=hparams.n_fft, hop_length=get_hop_size(hparams), win_length=hparams.win_size)

librosa.stft(y, n_fft=num_fft, hop_length=hop_length, win_length=win_length)

def _linear_to_mel(spectogram, hparams):
    global _mel_basis
    if _mel_basis is None:
        _mel_basis = _build_mel_basis(hparams)
    return np.dot(_mel_basis, spectogram)

def _amp_to_db(x, hparams):
    # min_level equals 10 ** (hparams.min_level_db / 20)
    min_level = np.exp(hparams.min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S, hparams):
    if hparams.allow_clipping_in_normalization:  # True in my configuration
        if hparams.symmetric_mels:  # True in my configuration
            return np.clip((2 * hparams.max_abs_value) * ((S - hparams.min_level_db) / (-hparams.min_level_db)) - hparams.max_abs_value,
                           -hparams.max_abs_value, hparams.max_abs_value)
        else:
            return np.clip(hparams.max_abs_value * ((S - hparams.min_level_db) / (-hparams.min_level_db)), 0, hparams.max_abs_value)
The main differences are the line “S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db” and _normalize, with hparams.ref_level_db = 20 and hparams.max_abs_value = 4.
My data is in the range [-4, 4], while your preprocessing produces data in [0, 1]. Does the data range have a great influence on the model? I don't understand. I am asking for your help. Thank you.
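To make the range question concrete, here is a minimal sketch of how a symmetric [-4, 4] spectrogram could be mapped onto [0, 1] by inverting the affine part of the symmetric branch of _normalize. The helper names symmetric_to_unit and unit_to_symmetric are hypothetical and are not part of either repository. The sketch only aligns the value range: it does not account for the ref_level_db offset or for any difference in the dB floor between the two pipelines, so it is not a guaranteed fix.

```python
import numpy as np


def symmetric_to_unit(mel, max_abs_value=4.0):
    """Map values in [-max_abs_value, max_abs_value] onto [0, 1].

    Hypothetical helper: inverts the affine part of the symmetric
    normalization, then clips to guard against out-of-range values.
    """
    mel = np.asarray(mel, dtype=np.float32)
    return np.clip((mel + max_abs_value) / (2.0 * max_abs_value), 0.0, 1.0)


def unit_to_symmetric(mel, max_abs_value=4.0):
    """Inverse mapping: [0, 1] back onto [-max_abs_value, max_abs_value]."""
    mel = np.asarray(mel, dtype=np.float32)
    return (2.0 * max_abs_value) * mel - max_abs_value


# Endpoints and midpoint of the symmetric range.
example = np.array([-4.0, 0.0, 4.0])
rescaled = symmetric_to_unit(example)
restored = unit_to_symmetric(rescaled)
```

Whichever range is used, the same normalization has to be applied both when training the vocoder and when feeding it the synthesizer's output.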