Enhance speech recognition in speech.py #52

kamalcherala · 2025-05-05T16:32:23Z

This update improves the speech recognition accuracy in speech.py by adding support for multiple languages. I also refactored the audio processing logic to handle diverse accents more effectively.

Key Changes:

Added language detection to automatically switch between English and Spanish.
Refined audio preprocessing to improve recognition accuracy.
Updated the recognize_speech function to handle errors more gracefully.

Testing:

Run the script with an audio file in English or Spanish to test the new features.
Ensure the language is correctly detected and the speech is transcribed accurately.

No new dependencies were added.

chimosky · 2025-05-05T20:18:19Z

speech.py

@@ -1,36 +1,15 @@
-# Copyright (C) 2009, Aleksey Lim


Why delete the license?

chimosky · 2025-05-05T20:20:40Z

speech.py

+    def connect_peak(self, cb): self._cb['peak'] = self.connect('peak', cb)
+    def connect_wave(self, cb): self._cb['wave'] = self.connect('wave', cb)
+    def connect_idle(self, cb): self._cb['idle'] = self.connect('idle', cb)


This isn't consistent with the code in this activity.

chimosky · 2025-05-05T20:24:05Z

speech.py

@@ -40,162 +19,103 @@ class Speech(GstSpeechPlayer):
    }

    def __init__(self):
-        GstSpeechPlayer.__init__(self)
+        super().__init__()


Why switch to this when the previous code works, achieves the desired result and is consistent with our codebase?

chimosky · 2025-05-05T20:25:18Z

speech.py

-        # build a pipeline that makes speech
-        # and sends it to both the audio output
-        # and a fake one that we use to draw from
        cmd = 'espeak name=espeak' \
-            ' ! capsfilter name=caps' \
-            ' ! tee name=me' \
-            ' me.! queue ! autoaudiosink name=ears' \
-            ' me.! queue ! fakesink name=sink'
-        self.pipeline = Gst.parse_launch(cmd)


The comment here is explaining an obscurity, why delete it?

chimosky · 2025-05-05T20:33:59Z

speech.py

+    def restart_sound_device(self):
+        super().restart_sound_device()

+        def check_idle():
+            if self.pipeline and self.pipeline.get_state(0)[1] == Gst.State.NULL:
+                self.queue.pop(0)
+                self._speak_next()
+            return False

-_speech = None
+        GLib.timeout_add(500, check_idle)


Any particular reason why you're redefining this?

chimosky · 2025-05-05T20:36:15Z

speech.py

+SUPPORTED_LANGUAGES = {
+    'en': 'en',         # English
+    'es': 'es',         # Spanish
+    'fr': 'fr',         # French
+    'de': 'de',         # German
+    'hi': 'hi',         # Hindi
+    # Add more as needed
+}


There's a comprehensive list of voices supported by espeak which is supported in sugar3.speech, did you look at that?

chimosky · 2025-05-05T20:38:34Z

Reviewed, not tested.

Your opening comment should be part of your commit message as git stores your commit message and your opening comment is lost, see making commits.

Multiple support for different languages already exists, did you try to use the activity?

What did you notice that prompted the change? Did your PR fix it?

quozl · 2025-05-05T21:02:03Z

Deleting copyrights and license is egregious. We've never heard from this contributor before, so it may be an attack. Let's look very carefully at any response. I agree the toolkit has this support already, so I don't see why it should be added in Speak.

chimosky · 2025-05-05T21:10:13Z

Deleting copyrights and license is egregious. We've never heard from this contributor before, so it may be an attack. Let's look very carefully at any response. I agree the toolkit has this support already, so I don't see why it should be added in Speak.

I'm wondering why the copyrights and license was deleted too as it makes no sense whatsoever, also reminds us of something we've seen a lot lately, people not looking at their diffs before making a change.

amannaik247 · 2025-05-07T02:31:57Z

Deleting copyrights and license is egregious. We've never heard from this contributor before, so it may be an attack. Let's look very carefully at any response. I agree the toolkit has this support already, so I don't see why it should be added in Speak.

I'm wondering why the copyrights and license was deleted too as it makes no sense whatsoever, also reminds us of something we've seen a lot lately, people not looking at their diffs before making a change.

I think this is probably the case of people using AI code editors to modify code based on prompts. That is why many of them just open a PR without even having launched the activity yet.
Obviously, not every case could be like this.
But most of them I feel so could be falling into this category.

quozl · 2025-05-07T20:58:06Z

@kamalcherala we are waiting for your response to review comments.

(you have not previously contributed, so it is possible the account you are using is compromised, or a sock puppet of someone else trying to sway a discussion, and we may need to use the GitHub features to flag the account as a source of spam ... and look very closely or not at all at new contributors ... don't poison the well).

kamalcherala · 2025-05-07T21:34:45Z

Hi EVERYONE @chimosky @quozl @amannaik247 ;
Thank you for your detailed feedback and for taking the time to review my contribution.

First and foremost, I sincerely apologize for the confusion caused by my recent commit — especially regarding the license removal and inconsistencies with the existing codebase. That was absolutely not my intention, and I understand the seriousness of such changes.

I’m currently working on addressing all the issues raised and aligning my code with the established practices and expectations of the project. I kindly request a little more time — within the next 24 hours — to make the necessary corrections and push a revised version.

I truly appreciate your patience and the opportunity to contribute. Thank you again for your guidance and for maintaining such a high standard for this project.

Best regards,
CHERALA SAI KAMAL

Updated speech.py

d477a6d

kamalcherala mentioned this pull request May 5, 2025

[DMP 2025]: Enhance Speech Recognition in Speech.py #53

Closed

5 tasks

chimosky reviewed May 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enhance speech recognition in speech.py #52

Enhance speech recognition in speech.py #52

Uh oh!

kamalcherala commented May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky May 5, 2025

Uh oh!

chimosky commented May 5, 2025 •

edited

Loading

Uh oh!

quozl commented May 5, 2025

Uh oh!

chimosky commented May 5, 2025

Uh oh!

amannaik247 commented May 7, 2025

Uh oh!

quozl commented May 7, 2025

Uh oh!

kamalcherala commented May 7, 2025

Uh oh!

Uh oh!

Enhance speech recognition in speech.py #52

Are you sure you want to change the base?

Enhance speech recognition in speech.py #52

Uh oh!

Conversation

kamalcherala commented May 5, 2025

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky May 5, 2025

Choose a reason for hiding this comment

Uh oh!

chimosky commented May 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

quozl commented May 5, 2025

Uh oh!

chimosky commented May 5, 2025

Uh oh!

amannaik247 commented May 7, 2025

Uh oh!

quozl commented May 7, 2025

Uh oh!

kamalcherala commented May 7, 2025

Uh oh!

Uh oh!

chimosky commented May 5, 2025 •

edited

Loading