Speech Regression Model #1629
Conversation
This pull request introduces 1 alert when merging 657e3e5 into bc5f034 - view new alerts on LGTM.com.
Hi @titu1994, I would like to merge this pull request. Could you review it? Thanks in advance.
This pull request introduces 1 alert when merging b4cbe14 into bc5f034 - view new alerts on LGTM.com.
Thanks for this great PR! There are a few concerns with the amount of code duplication occurring with the parallel EncDecClassificationModel.
PR #1617 from @fayejf is trying to unify the two already similar models (the Classification model and the Label model for speaker labels), and I think this regression model should become a third subclass in order to share the vast commonalities between the models.
That PR also greatly refactors the dataset management using helper methods that collapse configs and unify access, so that's another thing this PR would need to do to stay consistent.
I think this PR should remain in draft status and rebase on top of #1617 once that is merged, so as to subclass and share as much common code as possible.
Module which reads speech recognition with a numeric target. It accepts comma-separated
JSON manifest files describing the correspondence between wav audio files
and their target values. JSON files should be of the following format::

    {"audio_filepath": path_to_wav_0, "duration": time_in_sec_0, "target": \
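For illustration, the manifest format described in the docstring can be exercised with a short sketch. The file paths, durations, and target values below are made up, not taken from the PR:

```python
import json

# Hypothetical manifest entries: one JSON object per line, in the
# newline-delimited style the docstring above describes. All values
# here are illustrative placeholders.
entries = [
    {"audio_filepath": "/data/sample_0.wav", "duration": 3.2, "target": 4.5},
    {"audio_filepath": "/data/sample_1.wav", "duration": 2.7, "target": 1.0},
]

# Serialize to the newline-delimited manifest format...
manifest_text = "\n".join(json.dumps(e) for e in entries)

# ...and parse it back, collecting the numeric regression targets.
targets = [json.loads(line)["target"] for line in manifest_text.splitlines()]
print(targets)  # [4.5, 1.0]
```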
Where are `target` and `scalar` used inside `__getitem__`? This dataset seems to be a duplicate of the AudioToLabelDataSet.
I think it would be best to reuse functionality in the preexisting dataset classes (by making them adaptive to the input data). So if the AudioToLabelDataSet gets a `label`, assume it's a classification problem; if it gets a `regression_target`, assume it's a regression value.
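The adaptive behaviour suggested above can be sketched as a small helper that inspects a manifest entry and infers the task; the function name and key names are illustrative, not NeMo API:

```python
# Sketch of the "adaptive dataset" idea: a "label" key implies a
# classification sample, a "regression_target" key implies a regression
# sample. Names here are hypothetical, not taken from NeMo.
def infer_task(entry: dict) -> str:
    """Return the task type implied by one manifest entry."""
    if "label" in entry:
        return "classification"
    if "regression_target" in entry:
        return "regression"
    raise ValueError("entry has neither 'label' nor 'regression_target'")

print(infer_task({"audio_filepath": "a.wav", "label": "yes"}))             # classification
print(infer_task({"audio_filepath": "b.wav", "regression_target": 3.75}))  # regression
```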
__all__ = ['EncDecRegressionModel']


class EncDecRegressionModel(ASRModel, Exportable):
I think this class is almost identical to the EncDecClassificationModel, with minor semantic changes: output_types is a regression type, the loss is MSE, the dataset is slightly different, and the logs are slightly different. Its constructor, forward, train/val/test/multi_* methods, transcribe, export, etc. are all very similar, with minor differences only.
I think we can refactor a base class that operates on both classification and regression tasks and dynamically switches its behaviour. This will avoid the long-term maintenance cost of keeping two nearly identical models in sync.
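The dynamic switching proposed here could look roughly like the following. This is a pure-Python toy, not the eventual NeMo base class; the class name, the `is_regression_task` flag, and the loss names are all assumptions for illustration:

```python
class _EncDecBaseSketch:
    """Toy stand-in for a shared base model: a single task flag selects
    the loss and the metric naming, everything else is shared."""

    def __init__(self, is_regression_task: bool):
        self.is_regression_task = is_regression_task
        # The base picks the loss once; subclasses inherit the rest.
        self.loss_name = "mse" if is_regression_task else "cross_entropy"

    def metric_key(self, split: str) -> str:
        # Both tasks log a loss; accuracy-style metrics would branch
        # on self.is_regression_task in the same way.
        return f"{split}_loss"


reg = _EncDecBaseSketch(is_regression_task=True)
clf = _EncDecBaseSketch(is_regression_task=False)
print(reg.loss_name, clf.loss_name)  # mse cross_entropy
```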
    'test_loss': loss_value,
}

def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0):
For the multi_* methods, please return a log dict. The base class manages the actual logging with self.log() internally while properly attaching dataset prefix names.
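The requested shape can be sketched like this; the exact keys the framework expects (and the `'log'` wrapper) are assumptions here, not confirmed NeMo API:

```python
def multi_test_epoch_end_sketch(outputs, dataloader_idx: int = 0):
    """Aggregate per-batch dicts and RETURN a log dict, rather than
    calling self.log() directly; the caller can then attach the
    per-dataloader prefix before logging."""
    losses = [o["test_loss"] for o in outputs]
    mean_loss = sum(losses) / len(losses)
    return {"test_loss": mean_loss, "log": {"test_loss": mean_loss}}


# Two fake per-batch outputs; the mean test loss is returned, not logged.
out = multi_test_epoch_end_sketch([{"test_loss": 0.2}, {"test_loss": 0.4}])
```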
    'val_loss': loss_value,
}

def multi_validation_epoch_end(self, outputs, dataloader_idx: int = 0):
For the multi_* methods, please return a log dict. The base class manages the actual logging with self.log() internally while properly attaching dataset prefix names.
@@ -331,6 +331,57 @@ def num_classes(self):
    return self._num_classes


class ConvASRDecoderRegression(NeuralModule, Exportable):
This neural module can be refactored entirely inside ConvASRDecoderClassification by adding/removing the final activation function. I think having a shared base module between the two will greatly reduce this code duplication.
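The add/remove-the-final-activation idea can be sketched in plain Python (the real modules are PyTorch NeuralModules; the class name and `apply_softmax` flag are hypothetical):

```python
import math


def _softmax(xs):
    """Numerically stable softmax over a flat list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class SharedConvDecoderSketch:
    """Stand-in for a shared decoder base: classification applies the
    final softmax over the logits, regression returns them raw."""

    def __init__(self, apply_softmax: bool):
        self.apply_softmax = apply_softmax

    def forward(self, logits):
        return _softmax(logits) if self.apply_softmax else logits


clf_out = SharedConvDecoderSketch(apply_softmax=True).forward([1.0, 2.0])
reg_out = SharedConvDecoderSketch(apply_softmax=False).forward([3.5])
```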
I completely agree with your comments; there are clear opportunities to avoid code duplication. I've implemented it this way for the sake of clarity and to avoid conflicts. Anyway, I will watch the related PR and refactor this one after it's been merged. On the other hand, the DCO check is failing because my commits have been signed with a wrong email. I've tried to correct it by rebasing, but I haven't been able to fix it so far. Any clue about how to handle it? Thanks for your help.
Hi @diego-fustes, sorry for keeping you waiting. PR #1617 is merged now 😃. Please rebase on it, and let me know if you have any questions.
This pull request introduces 1 alert when merging 33d6fd0 into 14904fd - view new alerts on LGTM.com.
I've been looking at the current code base, and I think that some refactoring is still needed. To start with, I think that EncDecSpeakerLabelModel should be a subclass of EncDecClassificationModel. In the end, I think it's almost the same, just with the option of switching the loss to AngularSoftmaxLoss and the fact that it outputs the embeddings along with the logits. On the other hand, the class EncDecRegressionModel that I'm implementing should be a superclass of EncDecClassificationModel. After all, classification can be seen as a special type of regression with a threshold... What do you think? It's quite a major change; I'm willing to implement it if you agree.
Hey @diego-fustes, you raise some valid points, but there are other reasons they are kept separate as of now.
Yes, this is true for the present. The reason we internally decided to separate the two models is that speech classification is guaranteed to have a one-input, one-output pipeline, as in general classification models. Speaker classification, on the other hand, may use auxiliary information in the future, or might require architectures with multiple outputs. Speaker classification is also one step in the entire speaker diarization pipeline, which needs different interaction than the simpler classification model. All in all, it was therefore concluded that although they share significant similarities, they would still be separate entities.
This is something I disagree with. Simply put, while it is mathematically correct that classification is a special case of regression, that does not mean it is semantically the same task. We don't apply regression losses (MSE) to train MNIST / ImageNet, nor do we use any activation function for regression models (as an example). The dataset and the semantic task that the model performs are entirely different, even if mathematically the only difference is the application of softmax/sigmoid at the end of the model.
This pull request introduces 1 alert when merging 01e79ac into 941ef1f - view new alerts on LGTM.com.
Hi @titu1994, I agree with you; from a practical perspective it's better to implement the common code for regression and classification in an abstract class. I've just pushed a new version, trying to minimize code duplication. Apologies, as I think I've added more commits than needed; I'm kind of just starting out with GitHub. Kind regards.
This pull request introduces 1 alert when merging 1452075 into 941ef1f - view new alerts on LGTM.com.
Hey @diego-fustes, no worries about Git. Could you try rebasing your commits on top of the main branch instead of merging? That should clean up the commit history. I was looking through some of the changes, and they are pretty good. Some minor cleanup here and there and it should be good to merge. I'll post the comments after the rebase (as comments sometimes get deleted after a rebase).
Extend the current speech classification models (e.g. Speaker Recognition) to accommodate the prediction of a numeric target (regression).
This can be applied to use cases like Speech Quality Assessment.