Reading Voice / Audio from Voice Channel? [For Voice Recognition AI Bot] #444

Mercurial · 2017-01-05T20:18:36Z

Hi guys, I'm wondering if the library has capability to read audio bytes from the voice channels? I'm building a Bot that will read the voice and try to convert it to text commands.

Can anyone enlighten me?

Thanks!

Fuyukai · 2017-01-05T20:20:24Z

Not yet.

Voice receive has been planned for ages. PRs welcome.

Mercurial · 2017-01-05T20:24:09Z

https://github.com/Rapptz/discord.py/blob/master/discord/voice_client.py#L266

Seems to be already reading/polling from the voice channel though?

Fuyukai · 2017-01-05T20:25:56Z

Sure, just design the API, document it fully and submit a pull request.

ghost · 2017-01-30T21:26:35Z

Just to settle this: it has been tried again and again, and almost everyone has failed. Danny wants it done in a nice way, but really, it isn't worth the time and effort, so this will not be coming anytime soon.

Mercurial · 2017-01-31T21:38:50Z

Who's Danny? And why is it hard? Isn't it just connecting to whatever socket carries the audio and reading that data?

ghost · 2017-01-31T22:01:36Z

Danny made this library (Danny = Rapptz). As for why it's hard: it is very easy to read the data from the socket, but presenting it in a usable, decent manner is hard. In essence, you have to chunk streamed data, and I don't think anybody wants to go to the trouble of doing that yet. The hard part is designing the API in a way that is useful and usable, not a quick throw-together solution.
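To illustrate the chunking step mentioned above, here is a minimal sketch that regroups arbitrarily sized byte chunks into fixed-size PCM frames. The audio format (48 kHz, stereo, 16-bit, 20 ms frames) and the name `chunk_frames` are assumptions made for the example, not part of discord.py's API.

```python
from typing import Iterable, Iterator

# Assumed audio format: 48 kHz, stereo, 16-bit samples, 20 ms frames.
SAMPLE_RATE = 48000
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * CHANNELS * SAMPLE_WIDTH


def chunk_frames(stream: Iterable[bytes], frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Regroup arbitrarily sized byte chunks into fixed-size frames."""
    pending = bytearray()
    for chunk in stream:
        pending.extend(chunk)
        while len(pending) >= frame_bytes:
            yield bytes(pending[:frame_bytes])
            del pending[:frame_bytes]
    # A trailing partial frame is dropped here; a real API would have to
    # decide whether to pad it with silence or hand it over as-is.
```

Even this small helper has to take a position on partial frames, which hints at why the API design is the hard part.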

ghost · 2017-01-31T22:11:20Z

Just to let you know, Danny has planned voice receive for the rewrite.

Mercurial · 2017-01-31T22:12:56Z

oh ok thanks for the information!

Ruuttu · 2017-03-07T12:17:04Z

I wanted my bot to play a 15 second replay on-demand, just for laughs basically, so I needed basic recording capability to start with.

I built a setup that mixes together all incoming audio and makes available a single stream of ~50 packets a second. There's no fancy synchronization or stretching; it's just in-out as fast as possible, with a latency of a few frames so there's time to get everything in order. You need to call a function to fetch a new frame 50 times a second. Each speaker can be "re-synchronized" when they don't speak, so the stream remains live and stable in the long term even if there's minor drifting. Alternatively, I'm sure you could drop or duplicate packets.
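A minimal sketch of that kind of "fetch one mixed frame per tick" mixer might look like the following. It is not Ruuttu's code: the `Mixer` class, its method names, and the frame format (20 ms of 48 kHz stereo, signed 16-bit, native byte order) are all assumptions, and it leaves out the re-synchronization described above.

```python
import array
from collections import deque
from typing import Deque, Dict

FRAME_SAMPLES = 960 * 2  # assumed: 20 ms of 48 kHz stereo, one int16 per sample


class Mixer:
    """Mixes per-speaker PCM frames into one stream, one frame per call."""

    def __init__(self) -> None:
        self.queues: Dict[int, Deque[array.array]] = {}

    def push(self, speaker_id: int, pcm: bytes) -> None:
        """Queue one decoded frame (signed 16-bit native-endian PCM)."""
        frame = array.array("h")
        frame.frombytes(pcm)
        self.queues.setdefault(speaker_id, deque()).append(frame)

    def next_frame(self) -> bytes:
        """Return the next mixed frame; call this about 50 times a second."""
        mixed = [0] * FRAME_SAMPLES
        for queue in self.queues.values():
            if not queue:
                continue  # this speaker is silent for the current tick
            frame = queue.popleft()
            for i, sample in enumerate(frame[:FRAME_SAMPLES]):
                mixed[i] += sample
        # Clamp to the int16 range so loud overlaps clip instead of wrapping.
        clipped = (max(-32768, min(32767, s)) for s in mixed)
        return array.array("h", clipped).tobytes()
```

When no speaker has queued audio, `next_frame()` returns a frame of silence, which keeps the output stream continuous.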

The code is shit, but if I could make it a little less shit, would that kind of basic "just feed me data" API be worthy if only as a starting point?

ghost · 2017-03-11T08:10:33Z

You can always open a pull request, but keep in mind that there have been more than a few failed attempts, since Danny is very strict when it comes to pull requests.

rawrzors · 2017-06-25T16:32:37Z

@Ruuttu Would you be able to share that code? Curious because I'm trying to add some voice recording (save to file)

Thanks

Ruuttu · 2017-06-25T19:11:52Z

Let's see. This was all done against version 0.13.0 at the time.

I started by copying the work from #333 for receiving decrypted opus voice packets. I wrote a "Decoder" class in opus.py, which I've only confirmed to work on Windows.

In your bot (inherited from discord.Client) you need to call enable_voice_events() for your VoiceClient after joining a channel. After that you can receive opus packets in the on_speak() method which you'll add.

I wrote a "Recorder" class that takes the packets from on_speak(), converts them to PCM and maintains continuous per-speaker audio "streams" that sync together. There's a get_replay() method for retrieving the last n seconds of audio. You get lists of tuples because the audio is still separated by speaker, plus there's some extra data. Once you figure out what's what, you can mix together the speakers using python's audioop module.
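The "last n seconds" idea can be sketched with a bounded deque. This is an illustration only, not the Recorder class from the attached zip: `ReplayBuffer`, `pcm_to_wav`, and the audio format (50 frames per second of 48 kHz stereo 16-bit PCM) are assumptions, and per-speaker streams are assumed to be mixed down already.

```python
import io
import wave
from collections import deque
from typing import Deque

FRAMES_PER_SECOND = 50  # assumed: one 20 ms frame per tick


class ReplayBuffer:
    """Keeps only the most recent `max_seconds` of frames."""

    def __init__(self, max_seconds: int = 15) -> None:
        # A bounded deque discards the oldest frame automatically.
        self.frames: Deque[bytes] = deque(maxlen=max_seconds * FRAMES_PER_SECOND)

    def append(self, frame: bytes) -> None:
        self.frames.append(frame)

    def get_replay(self, seconds: float) -> bytes:
        """Return up to the last `seconds` of audio as one PCM byte string."""
        count = min(len(self.frames), int(seconds * FRAMES_PER_SECOND))
        if count == 0:
            return b""
        return b"".join(list(self.frames)[-count:])


def pcm_to_wav(pcm: bytes, rate: int = 48000, channels: int = 2) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing `pcm_to_wav(buffer.get_replay(15))` to a file opened in binary mode would then produce a playable recording, under the format assumptions above.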

You'll need to make some edits, but this should have all you need. I added a commented out example of how you might write a mixed down PCM stream to a file. Sorry some of the code is kinda silly and poorly commented.
recording_example.zip

lasa01 · 2017-06-30T22:16:24Z

@Ruuttu Thanks for this! This is very helpful. However, after making these modifications against the latest discord.py version, the decoder doesn't seem to work: it raises an access violation error.

Ignoring exception in on_speak
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\discorde\client.py", line 307, in _run_event
    yield from getattr(self, event)(*args, **kwargs)
  File "*******************************************dbot.py", line 60, in on_speak
    await self.servermgrs[server.id].on_speak(data, ssrc, timestamp, sequence)
  File "*******************************************servermgr.py", line 86, in on_speak
    self.vrecorder.receive_packet(data, ssrc, sequence, timestamp)
  File "*******************************************recorder.py", line 124, in receive_packet
    self.streams[ user_id ].append( data, sequence, timestamp )
  File "*******************************************recorder.py", line 30, in append
    pcm = self.decoder.decode( data, self.decoder.samples_per_frame )
  File "C:\Program Files\Python36\lib\site-packages\discorde\opus.py", line 356, in decode
    ret = _lib.opus_decode(self._state, data, max_data_bytes, pcm_pointer, frame_size, 0)
OSError: exception: access violation reading 0x0000000017607EE8

The only thing that has changed in opus.py between these versions of discord.py is that it now sets the signal type to auto when encoding:

# Excerpt from opus.py; _lib, log and OpusError are defined elsewhere in that module.
CTL_SET_SIGNAL = 4024

signal_ctl = {
    'auto': -1000,
    'voice': 3001,
    'music': 3002,
}

class Encoder:
    def __init__(self):
        # (rest of the constructor omitted)
        self.set_signal_type('auto')

    def set_signal_type(self, req):
        if req not in signal_ctl:
            raise KeyError('%r is not a valid signal setting. Try one of: %s' % (req, ','.join(signal_ctl)))

        k = signal_ctl[req]
        ret = _lib.opus_encoder_ctl(self._state, CTL_SET_SIGNAL, k)

        if ret < 0:
            log.info('error has happened in set_signal_type')
            raise OpusError(ret)

(in opus.py)

I only recently started with Python, so I don't have any idea how this could be fixed. I already got decoding working before using python-opus (with some editing), but it would be nice to get this working since it doesn't need another library.

EDIT: I think I got it working; at least it doesn't error anymore. I was just messing around in opus.py and somehow got it working. Here is my opus.py that seems to be working.

Bottersnike · 2017-08-27T08:45:07Z

I've been needing voice receive for some stuff. I've had a poke around, and I think it should be possible to knock together a jitter buffer to handle receiving audio when I get home.
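For readers unfamiliar with the term, a jitter buffer reorders packets that arrive out of order and gives up on ones that never arrive. The sketch below is a hypothetical minimal version, not Bottersnike's code: `JitterBuffer`, `seq_distance`, and the `depth` parameter are names invented for the example, and it assumes 16-bit sequence numbers that wrap around.

```python
from typing import Dict, Optional


def seq_distance(a: int, b: int) -> int:
    """Signed distance from b to a for 16-bit sequence numbers that wrap."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


class JitterBuffer:
    """Reorders packets by sequence number and skips a missing packet
    once `depth` later packets are already waiting."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth
        self.pending: Dict[int, bytes] = {}
        self.next_seq: Optional[int] = None

    def push(self, seq: int, payload: bytes) -> None:
        if self.next_seq is None:
            self.next_seq = seq  # first packet defines the starting point
        if seq_distance(seq, self.next_seq) < 0:
            return  # too late: its slot has already been played or skipped
        self.pending[seq] = payload  # a duplicate simply overwrites

    def pop(self) -> Optional[bytes]:
        """Return the next payload in order, or None if there is nothing to
        play this tick (still waiting, or the packet was declared lost)."""
        if self.next_seq is None:
            return None
        if self.next_seq in self.pending:
            payload = self.pending.pop(self.next_seq)
            self.next_seq = (self.next_seq + 1) & 0xFFFF
            return payload
        if len(self.pending) >= self.depth:
            # Enough later packets have piled up: treat this one as lost.
            self.next_seq = (self.next_seq + 1) & 0xFFFF
        return None
```

A caller would typically substitute silence (or the codec's loss concealment) whenever `pop()` returns `None`.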

lasa01 · 2017-08-27T09:54:02Z

@Bottersnike Ruuttu's initial code no longer seems to work; it fails to decrypt the voice packets with some ciphertext error. If you get your code working, could you share at least the voice packet decrypting part? Thanks!

Bottersnike · 2017-08-27T10:50:15Z

I implemented it in node the other day because that was the only language I could find a good lib for receiving. It shouldn't be too hard to port it over and then make it conform to d.py.

mturley · 2018-01-05T00:51:35Z

Did you guys ever end up figuring out a reliable solution for audio receive? I would be happy to use someone's fork in the meantime if it's not good enough to be merged upstream.

My use case: I want to set up a Raspberry Pi running discord.py that will operate as a passthrough audio device to both transmit to and receive from a discord channel using the microphone and headphone jack of a USB audio adapter connected to the Pi. Then I plan to connect the mic jack to a feed coming from my Playstation 4, and the headphone jack to a line in adapter for the PS4... connect the PS4 to Party Chat and leave both it and the Pi running, and suddenly I have an official PSN Party that will allow PS4 players to chat with Discord users (who are playing the same cross-platform MMO on PC). It's for my Final Fantasy XIV group... But I imagine the 2-way Discord audio on the Pi might be useful for others too.

mturley · 2018-01-05T00:55:57Z

Looks like I might have better luck using https://discord.js.org instead.

Bottersnike · 2018-01-05T10:29:03Z

Indeed. The packet parsing that I was using was relying on the fact that Discord was not using the most up-to-date structure. Because of that, the entire RFC wasn't implemented. Due to my lack of motivation, I'm unlikely to ever fix it.
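The comment above does not name the RFC. Assuming it is RTP (RFC 3550), which matches the `ssrc`, `sequence` and `timestamp` fields in the traceback earlier in this thread, a fuller header parser has to account for the CSRC list, the header extension and padding rather than assuming a fixed 12-byte header. The sketch below covers plain RTP framing only; `RtpPacket` and `parse_rtp` are hypothetical names, and it says nothing about how Discord encrypts the payload or where the extension body sits relative to the encrypted data.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse an RTP packet as laid out in RFC 3550, section 5.1."""
    if len(data) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    first, second, sequence, timestamp, ssrc = struct.unpack_from(">BBHII", data)
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count  # skip the optional CSRC identifiers
    if has_extension:
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, ext_words = struct.unpack_from(">HH", data, offset)
        offset += 4 + 4 * ext_words
    end = len(data)
    if has_padding:
        end -= data[-1]  # last byte holds the padding length
    if offset > end:
        raise ValueError("header fields run past the end of the packet")
    return RtpPacket(
        version=first >> 6,
        payload_type=second & 0x7F,
        marker=bool(second & 0x80),
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
        payload=data[offset:end],
    )
```

A parser that ignores the extension bit will hand extension bytes to the decoder as if they were audio, which is one plausible way a receiver can break when the sender changes its packet structure.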

ghost · 2019-02-23T22:44:52Z

Sorry if I'm not up to date on this, but has there been any work on it?
I'm interested in this feature for a voice recognition attempt I'm working on.

Harmon758 · 2019-02-24T00:17:41Z

See #1094

ghost · 2019-02-24T04:26:46Z

See #1094

Thanks for this :)

discord.py doesn't yet let us simply read or listen to the audio in a voice channel. See Rapptz/discord.py#1094 and Rapptz/discord.py#444. It probably needs more work than I intend to do (websockets, ack, rec, converting audio, async, etc.). Nothing impossible, but I expected to just use play and record functions, not to have to implement one.

Jourdelune · 2023-02-27T10:14:18Z

This feature is useful, for example to transcribe audio from a channel and translate it in real time.

Rapptz added this to the Rewrite milestone May 21, 2017

Rapptz added the feature request and v1.0-alpha labels May 21, 2017

Rapptz removed this from the Rewrite milestone Oct 20, 2018

Harmon758 mentioned this issue Dec 3, 2018

More voice events like on_speak and on_speak_stop #1755

Closed

Harmon758 mentioned this issue Feb 14, 2020

Is there audio receiver now? I didn't find any news about it #2568

Closed

Harmon758 mentioned this issue Mar 18, 2020

How to listen voice channel audio??? #2598

Closed

Rapptz removed the v1.0-alpha label Apr 29, 2021

Makiyu-py mentioned this issue Sep 1, 2021

Voice Receive API Pycord-Development/pycord#52

Closed

Rapptz mentioned this issue Jun 19, 2022

Audio Recording Feature #8162

Closed
