add integration test for asr egs #114

Merged
merged 74 commits into from
Nov 15, 2019
Changes from 22 commits
80e0618
fix integration with ops feat
zh794390558 Sep 26, 2019
c16f21d
fix for interation test
zh794390558 Sep 26, 2019
152c792
add speech utils
zh794390558 Sep 26, 2019
95c3477
fix path of FE shell
Sep 27, 2019
28dfbf3
add kaldi prepare
Sep 27, 2019
3cbfa83
fix kaldi tools compile
zh794390558 Sep 27, 2019
5e7131e
format
zh794390558 Sep 27, 2019
813e1ea
fix path
Sep 27, 2019
dd71db4
Merge branch 'ci' of https://github.com/didi/delta into ci
Sep 27, 2019
fc56c06
add tools for kaldi
zh794390558 Sep 27, 2019
6ecf309
Merge branch 'ci' of https://github.com/didi/delta into ci
Sep 29, 2019
fc4fbb0
Merge branch 'ci' of https://github.com/didi/delta into ci
Sep 29, 2019
1b7402f
Merge branch 'ci' of https://github.com/didi/delta into ci
Sep 29, 2019
8d88070
fix kaldi install
zh794390558 Sep 30, 2019
39e1164
update hparam.py
GaryGao99 Sep 30, 2019
13d563a
add make stft features
Oct 8, 2019
ee42e09
add cmvn
Oct 8, 2019
d098564
add snip_edges to stft
Oct 11, 2019
b4fbe3f
Merge branch 'master' into ci
zh794390558 Oct 11, 2019
7702415
change delta_delta shape
Oct 12, 2019
dd5c76c
fix fbank as kaldi
Oct 14, 2019
80826f5
Merge branch 'ci' of https://github.com/didi/delta into ci
Oct 14, 2019
3a4514b
set default fbank features as kaldi
Oct 15, 2019
79bf62b
fix spectrum_test real value
Oct 15, 2019
23b4810
rm do_preemphasis2 and fix fbank_test
Oct 15, 2019
18f5f92
fix delta_delta shape
Oct 16, 2019
666f928
fix sample rate and audio data format
Oct 16, 2019
e9d1f4e
fix sr and test
Oct 16, 2019
c39bf8b
fix high-freq
Oct 22, 2019
4813aec
Merge branch 'master' into ci
zh794390558 Oct 23, 2019
232d4af
Merge branch 'master' into ci
zh794390558 Oct 30, 2019
669f991
change MAIN_ROOT to PACKAGE_ROOT_DIR
Oct 30, 2019
0c42185
Merge branch 'master' into ci
zh794390558 Oct 30, 2019
f9ff8a5
fix make_fbank on tf2.0
Oct 30, 2019
6e70b00
Merge branch 'ci' of https://github.com/didi/delta into ci
Oct 30, 2019
d524482
fix apply-cmvn
Nov 1, 2019
329e793
fix dump.sh
Nov 7, 2019
61e1688
delete print
Nov 7, 2019
3bb78b4
add mfcc features
Nov 8, 2019
5541c8a
add mfcc FE
Nov 8, 2019
b9dbec4
add make_mfcc.sh
Nov 11, 2019
4e25833
add add_noise_rir_aecres
Nov 11, 2019
789c363
fix tf import
Nov 11, 2019
34cbd44
fix loader setting
Nov 11, 2019
badc0c7
fix import get_session()
Nov 11, 2019
751e16a
fix kaldi tools install script
zh794390558 Nov 11, 2019
70b1660
fix ci
zh794390558 Nov 11, 2019
ece530f
fix plp_test
Nov 11, 2019
ddb51df
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 11, 2019
ecc8aba
Merge branch 'master' into ci
zh794390558 Nov 11, 2019
c3a9b92
fix kaldi install
zh794390558 Nov 11, 2019
d33ab52
fix
zh794390558 Nov 11, 2019
60434fd
fix makefile
Nov 11, 2019
0e8eb0f
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 11, 2019
b8ed921
fix apt tools
zh794390558 Nov 11, 2019
07ae1dd
fix kaldi install
zh794390558 Nov 11, 2019
296139d
fix
zh794390558 Nov 11, 2019
e4290bf
fix test
Nov 11, 2019
9eac2ce
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 11, 2019
b3c7457
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 11, 2019
53d93bb
fix format && params
Nov 11, 2019
be206de
fix old params
Nov 12, 2019
1a6af63
fix path error
Nov 15, 2019
62c0db6
delete old FE test files
Nov 15, 2019
49a65a2
fix delete test files
Nov 15, 2019
b69685e
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 15, 2019
ac239e0
fix test files
Nov 15, 2019
d19bdf0
fix dpl spk examples
zh794390558 Nov 15, 2019
a040ef4
fix
Nov 15, 2019
4f5c362
fix file mode
zh794390558 Nov 15, 2019
8d98607
fix get_session import
Nov 15, 2019
0a0d826
Merge branch 'ci' of https://github.com/didi/delta into ci
Nov 15, 2019
eaecf1e
fix test
zh794390558 Nov 15, 2019
749ee7d
format
zh794390558 Nov 15, 2019
2 changes: 1 addition & 1 deletion delta/data/frontend/analyfiltbank.py
@@ -49,7 +49,7 @@ def params(cls, config=None):

     return hparams

-  def call(self, audio_data, sample_rate):
+  def call(self, audio_data, sample_rate=None):
     """
     Calculate power spectrum and phase spectrum of audio data.
     :param audio_data: the audio signal from which to compute spectrum. Should be a (1, N) tensor.
185 changes: 185 additions & 0 deletions delta/data/frontend/cmvn.py
@@ -0,0 +1,185 @@
# Copyright (C) 2017 Beijing Didi Infinity Technology and Development Co.,Ltd.
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

import io
import kaldiio
import numpy as np
from delta.utils.hparam import HParams
from delta.data.frontend.base_frontend import BaseFrontend

# This version is efficient, but without hparams.
# class CMVN(object):
# def __init__(self, stats, norm_means=True, norm_vars=False,
# utt2spk=None, spk2utt=None, reverse=False, std_floor=1.0e-20):
# self.stats_file = stats
# self.norm_means = norm_means
# self.norm_vars = norm_vars
# self.reverse = reverse
#
# if isinstance(stats, dict):
# stats_dict = dict(stats)
# else:
# self.accept_uttid = True
# stats_dict = dict(kaldiio.load_ark(stats))
#
# if utt2spk is not None:
# self.utt2spk = {}
# with io.open(utt2spk, 'r', encoding='utf-8') as f:
# for line in f:
# utt, spk = line.rstrip().split(None, 1)
# self.utt2spk[utt] = spk
# elif spk2utt is not None:
# self.utt2spk = {}
# with io.open(spk2utt, 'r', encoding='utf-8') as f:
# for line in f:
# spk, utts = line.rstrip().split(None, 1)
# for utt in utts.split():
# self.utt2spk[utt] = spk
# else:
# self.utt2spk = None
#
# self.bias = {}
# self.scale = {}
# for spk, stats in stats_dict.items():
# assert len(stats) == 2, stats.shape
#
# count = stats[0, -1]
#
# if not (np.isscalar(count) or isinstance(count, (int, float))):
# count = count.flatten()[0]
#
# mean = stats[0, :-1] / count
# var = stats[1, :-1] / count - mean * mean
# std = np.maximum(np.sqrt(var), std_floor)
# self.bias[spk] = -mean
# self.scale[spk] = 1 / std
#
# def __repr__(self):
# return ('{name}(stats_file={stats_file}, '
# 'norm_means={norm_means}, norm_vars={norm_vars}, '
# 'reverse={reverse})'
# .format(name=self.__class__.__name__,
# stats_file=self.stats_file,
# norm_means=self.norm_means,
# norm_vars=self.norm_vars,
# reverse=self.reverse))
#
# def __call__(self, x, uttid=None):
# if self.utt2spk is not None:
# spk = self.utt2spk[uttid]
# else:
# spk = uttid
#
# if not self.reverse:
# if self.norm_means:
# x = np.add(x, self.bias[spk])
# if self.norm_vars:
# x = np.multiply(x, self.scale[spk])
# else:
# if self.norm_means:
# x = np.subtract(x, self.bias[spk])
# if self.norm_vars:
# x = np.divide(x, self.scale[spk])
#
# return x

class CMVN(BaseFrontend):

  def __init__(self, config: dict):
    super().__init__(config)

  @classmethod
  def params(cls, config=None):

    norm_means = True
    norm_vars = False
    utt2spk = None
    spk2utt = None
    reverse = False
    std_floor = 1.0e-20

    hparams = HParams(cls=cls)
    hparams.add_hparam('norm_means', norm_means)
    hparams.add_hparam('norm_vars', norm_vars)
    hparams.add_hparam('utt2spk', utt2spk)
    hparams.add_hparam('spk2utt', spk2utt)
    hparams.add_hparam('reverse', reverse)
    hparams.add_hparam('std_floor', std_floor)

    if config is not None:
      hparams.override_from_dict(config)

    return hparams

  def call(self, stats, x, uttid=None):

    p = self.config

    # This is inefficient: the stats and the speaker map are reloaded on every call.
    if isinstance(stats, dict):
      stats_dict = dict(stats)
    else:
      stats_dict = dict(kaldiio.load_ark(stats))

    # Build the utterance-to-speaker map from whichever file is configured.
    if p.utt2spk is not None:
      self.utt2spk = {}
      with io.open(p.utt2spk, 'r', encoding='utf-8') as f:
        for line in f:
          utt, spk = line.rstrip().split(None, 1)
          self.utt2spk[utt] = spk
    elif p.spk2utt is not None:
      self.utt2spk = {}
      with io.open(p.spk2utt, 'r', encoding='utf-8') as f:
        for line in f:
          spk, utts = line.rstrip().split(None, 1)
          for utt in utts.split():
            self.utt2spk[utt] = spk
    else:
      self.utt2spk = None

    self.bias = {}
    self.scale = {}
    for spk, spk_stats in stats_dict.items():
      # Row 0 holds the per-dimension sums with the frame count in the last
      # column; row 1 holds the per-dimension sums of squares.
      assert len(spk_stats) == 2, spk_stats.shape

      count = spk_stats[0, -1]

      if not (np.isscalar(count) or isinstance(count, (int, float))):
        count = count.flatten()[0]

      mean = spk_stats[0, :-1] / count
      var = spk_stats[1, :-1] / count - mean * mean
      std = np.maximum(np.sqrt(var), p.std_floor)
      self.bias[spk] = -mean
      self.scale[spk] = 1 / std

    if self.utt2spk is not None:
      spk = self.utt2spk[uttid]
    else:
      spk = uttid

    if not p.reverse:
      if p.norm_means:
        x = np.add(x, self.bias[spk])
      if p.norm_vars:
        x = np.multiply(x, self.scale[spk])
    else:
      if p.norm_means:
        x = np.subtract(x, self.bias[spk])
      if p.norm_vars:
        x = np.divide(x, self.scale[spk])

    return x
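For reference, the normalization arithmetic in `CMVN.call` can be sketched with NumPy alone. This is a minimal sketch, not the DELTA implementation: `make_stats` and `apply_cmvn` are hypothetical helper names. The stats layout (row 0: per-dimension sums plus the frame count in the last column; row 1: sums of squares) is the Kaldi CMVN convention that the code above indexes into. Clamping the variance at zero before the square root is an addition of this sketch.

```python
import numpy as np


def make_stats(feats):
  """Accumulate Kaldi-style CMVN stats from (num_frames, dim) features (hypothetical helper)."""
  num_frames, dim = feats.shape
  stats = np.zeros((2, dim + 1))
  stats[0, :-1] = feats.sum(axis=0)         # per-dimension sums
  stats[0, -1] = num_frames                 # frame count in the last column
  stats[1, :-1] = (feats ** 2).sum(axis=0)  # per-dimension sums of squares
  return stats


def apply_cmvn(feats, stats, norm_means=True, norm_vars=False, std_floor=1.0e-20):
  """Apply mean (and optionally variance) normalization (hypothetical helper)."""
  count = stats[0, -1]
  mean = stats[0, :-1] / count
  var = stats[1, :-1] / count - mean * mean
  # Assumption of this sketch: clamp tiny negative variances caused by rounding.
  std = np.maximum(np.sqrt(np.maximum(var, 0.0)), std_floor)
  out = feats
  if norm_means:
    out = out - mean
  if norm_vars:
    out = out / std
  return out


feats = np.arange(20.0).reshape(5, 4)
normed = apply_cmvn(feats, make_stats(feats), norm_vars=True)
```

With both flags enabled, each feature dimension of `normed` has zero mean and unit (population) variance over the frames that produced the stats.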
2 changes: 1 addition & 1 deletion delta/data/frontend/delta_delta.py
@@ -39,7 +39,7 @@ def call(self, feat, order, window):
     :param feat: a float tensor of size (num_frames, dim_feat).
     :param order: an int.
     :param window: an int.
-    :return: A tensor with shape (num_frames, (dim_feat * (order + 1))),
+    :return: A tensor with shape (num_frames, ((order + 1) * dim_feats)),
         containing delta of features of every frame in speech.
     """

15 changes: 13 additions & 2 deletions delta/data/frontend/fbank.py
@@ -39,13 +39,19 @@ def params(cls, config=None):
     :return: An object of class HParams, which is a set of hyperparameters as name-value pairs.
     """

-    upper_frequency_limit = 4000.0
+    upper_frequency_limit = 8000.0
     lower_frequency_limit = 20.0
-    filterbank_channel_count = 40.0
+    filterbank_channel_count = 23.0
     window_length = 0.025
     frame_length = 0.010
     output_type = 2
     sample_rate = 16000.0
+    snip_edges = 2
+    raw_energy = 1
+    preEph_coeff = 0.97
+    window_type = 'povey'
+    remove_dc_offset = True
+

     hparams = HParams(cls=cls)
     hparams.add_hparam('upper_frequency_limit', upper_frequency_limit)
@@ -55,6 +61,11 @@ def params(cls, config=None):
     hparams.add_hparam('frame_length', frame_length)
     hparams.add_hparam('output_type', output_type)
     hparams.add_hparam('sample_rate', sample_rate)
+    hparams.add_hparam('snip_edges', snip_edges)
+    hparams.add_hparam('raw_energy', raw_energy)
+    hparams.add_hparam('preEph_coeff', preEph_coeff)
+    hparams.add_hparam('window_type', window_type)
+    hparams.add_hparam('remove_dc_offset', remove_dc_offset)

     if config is not None:
       hparams.override_from_dict(config)
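The `params` pattern in this hunk (declare defaults, then let a caller-supplied `config` override them) can be illustrated without DELTA's `HParams` class. This is a minimal sketch, assuming a plain dict stands in for `HParams`. `fbank_params` is a hypothetical name, and rejecting unknown keys is an assumption of this sketch, not documented `override_from_dict` behavior.

```python
def fbank_params(config=None):
  """Return the Fbank defaults from this diff, optionally overridden by `config`."""
  params = {
      'upper_frequency_limit': 8000.0,
      'lower_frequency_limit': 20.0,
      'filterbank_channel_count': 23.0,
      'window_length': 0.025,
      'frame_length': 0.010,
      'output_type': 2,
      'sample_rate': 16000.0,
      'snip_edges': 2,
      'raw_energy': 1,
      'preEph_coeff': 0.97,
      'window_type': 'povey',
      'remove_dc_offset': True,
  }
  if config is not None:
    # Assumption: treat a key that has no default as a configuration error.
    unknown = set(config) - set(params)
    if unknown:
      raise KeyError('unknown hyperparameters: %s' % sorted(unknown))
    params.update(config)
  return params


# Same override that fbank_test.py passes in this pull request.
test_params = fbank_params({'window_length': 0.025, 'output_type': 1,
                            'frame_length': 0.010, 'snip_edges': 1})
```

Keys left out of `config` keep their defaults, which is why the test only has to pass `'snip_edges': 1` to opt out of the new default of `2`.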
2 changes: 1 addition & 1 deletion delta/data/frontend/fbank_test.py
@@ -31,7 +31,7 @@ def test_fbank(self):
     with self.session():
       read_wav = ReadWav.params().instantiate()
       input_data, sample_rate = read_wav(wav_path)
-      config = {'window_length': 0.025, 'output_type': 1, 'frame_length': 0.010}
+      config = {'window_length': 0.025, 'output_type': 1, 'frame_length': 0.010, 'snip_edges': 1}
       fbank = Fbank.params(config).instantiate()
       fbank_test = fbank(input_data, sample_rate)

17 changes: 16 additions & 1 deletion delta/data/frontend/spectrum.py
@@ -39,12 +39,22 @@ def params(cls, config=None):
     frame_length = 0.010
     output_type = 2
     sample_rate = 16000.0
+    snip_edges = 2
+    raw_energy = 1
+    preEph_coeff = 0.97
+    window_type = 'povey'
+    remove_dc_offset = True

     hparams = HParams(cls=cls)
     hparams.add_hparam('window_length', window_length)
     hparams.add_hparam('frame_length', frame_length)
     hparams.add_hparam('output_type', output_type)
     hparams.add_hparam('sample_rate', sample_rate)
+    hparams.add_hparam('snip_edges', snip_edges)
+    hparams.add_hparam('raw_energy', raw_energy)
+    hparams.add_hparam('preEph_coeff', preEph_coeff)
+    hparams.add_hparam('window_type', window_type)
+    hparams.add_hparam('remove_dc_offset', remove_dc_offset)

     if config is not None:
       hparams.override_from_dict(config)
@@ -75,6 +85,11 @@ def call(self, audio_data, sample_rate=None):
           sample_rate,
           window_length=p.window_length,
           frame_length=p.frame_length,
-          output_type=p.output_type)
+          output_type=p.output_type,
+          snip_edges=p.snip_edges,
+          raw_energy=p.raw_energy,
+          preEph_coeff=p.preEph_coeff,
+          window_type=p.window_type,
+          remove_dc_offset=p.remove_dc_offset)

       return spectrum
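The new `snip_edges` parameter affects how many frames a signal yields, which is probably why the tests in this pull request pin it explicitly. The sketch below follows Kaldi's two framing conventions. It is an illustration under stated assumptions, not the DELTA op: `num_frames` is a hypothetical helper, it takes a boolean where DELTA takes an integer, and the diff does not say how the integer values `1` and `2` map to these two modes. As in the hparams above, `window_length` is the analysis window and `frame_length` is the frame shift, both in seconds.

```python
def num_frames(num_samples, sample_rate=16000, window_length=0.025,
               frame_length=0.010, snip_edges=True):
  """Count frames the way Kaldi does for the two snip-edges modes (hypothetical helper)."""
  window_size = int(round(sample_rate * window_length))  # samples per window
  frame_shift = int(round(sample_rate * frame_length))   # samples between frame starts
  if snip_edges:
    # Only frames that fit entirely inside the signal are kept.
    if num_samples < window_size:
      return 0
    return 1 + (num_samples - window_size) // frame_shift
  # Otherwise the count depends only on the shift; edge frames are completed
  # by reflecting the signal.
  return (num_samples + frame_shift // 2) // frame_shift


one_second_snipped = num_frames(16000, snip_edges=True)
one_second_padded = num_frames(16000, snip_edges=False)
```

For one second of 16 kHz audio the two modes give 98 and 100 frames respectively, so reference outputs computed under one mode will not line up with the other.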
2 changes: 1 addition & 1 deletion delta/data/frontend/spectrum_test.py
@@ -33,7 +33,7 @@ def test_spectrum(self):
       read_wav = ReadWav.params().instantiate()
       input_data, sample_rate = read_wav(wav_path)

-      spectrum = Spectrum.params({'window_length': 0.025}).instantiate()
+      spectrum = Spectrum.params({'window_length': 0.025, 'snip_edges': 1}).instantiate()
       spectrum_test = spectrum(input_data, sample_rate)

       output_true = np.array(
4 changes: 2 additions & 2 deletions delta/layers/ops/kernels/delta_delta.cc
@@ -83,7 +83,7 @@ void DeltaDelta::Compute(const Tensor& input_feats, int frame,

   int num_frames = input_feats.dim_size(0);
   int feat_dim = input_feats.dim_size(1);
-  int output_dim = feat_dim * (order_ + 1);
+  int output_dim = (order_ + 1) * feat_dim;

   output->resize(output_dim);
   auto input = input_feats.matrix<float>();
@@ -104,7 +104,7 @@ void DeltaDelta::Compute(const Tensor& input_feats, int frame,
       double scale = scales[j + max_offset];
       if (scale != 0.0) {
         for (int k = 0; k < feat_dim; k++) {
-          (*output)[i * feat_dim + k] += input(offset_frame, k) * scale;
+          (*output)[i + k * (order_ + 1)] += input(offset_frame, k) * scale;
         }
       }
     }
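The index change in this hunk keeps the output size but reorders its elements. A small sketch of the two layouts, assuming `i` indexes the derivative order (0 = static, 1 = delta, 2 = delta-delta) and `k` the feature dimension, as in Kaldi's delta computation. `flatten_order_major` and `flatten_feature_major` are hypothetical names.

```python
def flatten_order_major(per_order):
  """Old layout: out[i * feat_dim + k], one contiguous block per derivative order."""
  num_orders, feat_dim = len(per_order), len(per_order[0])
  out = [None] * (num_orders * feat_dim)
  for i in range(num_orders):
    for k in range(feat_dim):
      out[i * feat_dim + k] = per_order[i][k]
  return out


def flatten_feature_major(per_order):
  """New layout: out[i + k * num_orders], all orders of one feature are adjacent."""
  num_orders, feat_dim = len(per_order), len(per_order[0])
  out = [None] * (num_orders * feat_dim)
  for i in range(num_orders):
    for k in range(feat_dim):
      out[i + k * num_orders] = per_order[i][k]
  return out


# order_ = 2 and feat_dim = 3: static (s), delta (d), and delta-delta (a) values.
per_order = [['s0', 's1', 's2'], ['d0', 'd1', 'd2'], ['a0', 'a1', 'a2']]
old_layout = flatten_order_major(per_order)
new_layout = flatten_feature_major(per_order)
```

Here `old_layout` is `s0 s1 s2 d0 d1 d2 a0 a1 a2` and `new_layout` is `s0 d0 a0 s1 d1 a1 s2 d2 a2`, so any consumer that slices the output by order needs to be updated along with the kernel.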
5 changes: 3 additions & 2 deletions delta/layers/ops/kernels/mfcc_mel_filterbank.cc
100644 → 100755
@@ -86,7 +86,8 @@ bool MfccMelFilterbank::Initialize(int input_length, double input_sample_rate,
   // Always exclude DC; emulate HTK.
   const double hz_per_sbin =
       0.5 * sample_rate_ / static_cast<double>(input_length_ - 1);
-  start_index_ = static_cast<int>(1.5 + (lower_frequency_limit / hz_per_sbin));
+  // start_index_ = static_cast<int>(1.5 + (lower_frequency_limit / hz_per_sbin));
+  start_index_ = static_cast<int>(lower_frequency_limit / hz_per_sbin);
   end_index_ = static_cast<int>(upper_frequency_limit / hz_per_sbin);

   // Maps the input spectrum bin indices to filter bank channels/indices. For
@@ -184,7 +185,7 @@ void MfccMelFilterbank::Compute(const std::vector<double> &input,
   output->assign(num_channels_, 0.0);

   for (int i = start_index_; i <= end_index_; i++) {  // For each FFT bin
-    double spec_val = sqrt(input[i]);
+    double spec_val = input[i];
     double weighted = spec_val * weights_[i];
     int channel = band_mapper_[i];
     if (channel >= 0)
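The effect of the `start_index_` change is easiest to see with concrete numbers. The sketch below mirrors the two formulas from the first hunk. `bin_range` and its `htk_style` flag are hypothetical names, and the input length of 257 bins (a 512-point FFT) is an assumption chosen for illustration; the other values are the new Fbank defaults.

```python
def bin_range(input_length, sample_rate, lower_hz, upper_hz, htk_style):
  """Return (start_index, end_index) of the FFT bins covered by the filterbank."""
  hz_per_sbin = 0.5 * sample_rate / float(input_length - 1)
  if htk_style:
    # Old formula: rounds up and always skips at least the DC bin.
    start = int(1.5 + lower_hz / hz_per_sbin)
  else:
    # New formula: plain truncation, so the DC bin can be included.
    start = int(lower_hz / hz_per_sbin)
  end = int(upper_hz / hz_per_sbin)
  return start, end


old_range = bin_range(257, 16000.0, 20.0, 8000.0, htk_style=True)
new_range = bin_range(257, 16000.0, 20.0, 8000.0, htk_style=False)
```

With these values each bin spans 31.25 Hz, the old formula starts at bin 2, and the new one starts at bin 0, so the context comment "Always exclude DC" no longer describes the new behavior for low cutoff frequencies.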
Empty file modified delta/layers/ops/kernels/mfcc_mel_filterbank.h
100644 → 100755
Empty file.