Reading Voice / Audio from Voice Channel? [For Voice Recognition AI Bot] #444

Open
Mercurial opened this issue Jan 5, 2017 · 23 comments
Labels
feature request This is a feature request.

Comments

@Mercurial

Hi guys, I'm wondering if the library has capability to read audio bytes from the voice channels? I'm building a Bot that will read the voice and try to convert it to text commands.

Can anyone enlighten me?

Thanks!

@Fuyukai
Contributor

Fuyukai commented Jan 5, 2017

Not yet.

Voice receive has been planned for ages. PRs welcome.

@Mercurial
Author

Mercurial commented Jan 5, 2017

https://github.com/Rapptz/discord.py/blob/master/discord/voice_client.py#L266

Seems to be already reading/polling from the voice channel though?

@Fuyukai
Contributor

Fuyukai commented Jan 5, 2017

Sure, just design the API, document it fully and submit a pull request.

@ghost

ghost commented Jan 30, 2017

Just to settle this: this has been tried again and again, and nearly everyone has failed. Danny wants it done in a nice way, but really, it isn't worth the time and effort, so this will not be coming anytime soon.

@Mercurial
Author

Mercurial commented Jan 31, 2017

Who's Danny? And why is it hard? Isn't it just a matter of connecting to whatever socket carries the audio and reading that data?

@ghost

ghost commented Jan 31, 2017

Danny made this library (Danny = Rapptz). Reading the data from the websocket is very easy, but presenting usable data in a decent manner is hard. In essence, you have to chunk streamed data, and I don't think anybody wants to go to the trouble of doing that yet. The hard part is designing an API that is useful and usable, not a quick throw-together solution.
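
To illustrate what "chunking streamed data" can mean in practice, here is a minimal sketch that regroups arbitrarily sized byte chunks into fixed-size frames. It is not taken from discord.py: the name `iter_frames` is hypothetical, and the default frame size assumes 20 ms of 48 kHz, 16-bit, stereo PCM.

```python
from typing import Iterable, Iterator

# Assumption: 960 samples x 2 channels x 2 bytes = 20 ms of 48 kHz stereo PCM.
FRAME_BYTES = 3840


def iter_frames(chunks: Iterable[bytes], frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Regroup arbitrarily sized byte chunks into fixed-size frames."""
    pending = bytearray()
    for chunk in chunks:
        pending.extend(chunk)
        # Emit every complete frame that is currently buffered.
        while len(pending) >= frame_bytes:
            yield bytes(pending[:frame_bytes])
            del pending[:frame_bytes]
    # A trailing partial frame is dropped here. A real API would have to
    # decide whether to pad it, hold it, or hand it to the caller.
```

The open questions in the last comment (what to do with partial data, who owns the buffer) are the kind of API decisions the comment above is pointing at.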

@ghost

ghost commented Jan 31, 2017

Just to let you know, Danny has planned voice receive for the rewrite.

@Mercurial
Author

oh ok thanks for the information!

@Ruuttu

Ruuttu commented Mar 7, 2017

I wanted my bot to play a 15 second replay on-demand, just for laughs basically, so I needed basic recording capability to start with.

I built a setup that mixes together all incoming audio and makes available a single stream of ~50 packets a second. There's no fancy synchronization or stretching; it's just in-out as fast as possible, with a latency of a few frames so there's time to get everything in order. You need to call a function to fetch a new frame 50 times a second. Each speaker can be "re-synchronized" when they don't speak, so the stream remains live and stable in the long term even if there's minor drifting. Otherwise you could drop or duplicate packets, I'm sure.

The code is shit, but if I could make it a little less shit, would that kind of basic "just feed me data" API be worthy if only as a starting point?
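
The mixing step described above can be sketched roughly as follows. This is not Ruuttu's code: `mix_frames` is a hypothetical helper, and the constants assume 48 kHz, 16-bit, stereo PCM at 50 frames per second. It uses the `array` module rather than `audioop`, since `audioop` was removed from the standard library in Python 3.13.

```python
from array import array

SAMPLE_RATE = 48_000       # assumption: 48 kHz audio
FRAMES_PER_SECOND = 50     # from the thread: ~50 packets a second
CHANNELS = 2               # assumption: stereo
# Number of int16 values in one frame: 960 samples per channel x 2 channels.
SAMPLES_PER_FRAME = SAMPLE_RATE // FRAMES_PER_SECOND * CHANNELS


def mix_frames(frames):
    """Sum same-length 16-bit PCM frames sample by sample, clamping to int16.

    Samples are interpreted in the machine's native byte order.
    """
    if not frames:
        return b""
    decoded = [array("h", frame) for frame in frames]
    length = len(decoded[0])
    if any(len(d) != length for d in decoded):
        raise ValueError("all frames must have the same length")
    mixed = array("h", bytes(2 * length))  # start from silence
    for i in range(length):
        total = sum(d[i] for d in decoded)
        # Clamp instead of wrapping so loud overlaps clip rather than crackle.
        mixed[i] = max(-32768, min(32767, total))
    return mixed.tobytes()
```

Calling `mix_frames` once per 20 ms tick with the current frame of each active speaker would yield the single mixed stream the comment describes; speakers with nothing to say are simply left out of the list.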

@ghost

ghost commented Mar 11, 2017

You can always pull request, but keep in mind, there have been more than a few failed attempts, since Danny is very strict when it comes to pull requests.

@Rapptz Rapptz added this to the Rewrite milestone May 21, 2017
@Rapptz Rapptz added feature request This is a feature request. v1.0-alpha This pertains to the rewrite version labels May 21, 2017
@rawrzors

@Ruuttu Would you be able to share that code? Curious because I'm trying to add some voice recording (save to file)

Thanks

@Ruuttu

Ruuttu commented Jun 25, 2017

Let's see. This was all done against version 0.13.0 at the time.

I started by copying the work from #333 for receiving decrypted opus voice packets. I wrote a "Decoder" class in opus.py, which I've only confirmed to work in Windows.

In your bot (inherited from discord.Client) you need to call enable_voice_events() for your VoiceClient after joining a channel. After that you can receive opus packets in the on_speak() method which you'll add.

I wrote a "Recorder" class that takes the packets from on_speak(), converts them to PCM and maintains continuous per-speaker audio "streams" that sync together. There's a get_replay() method for retrieving the last n seconds of audio. You get lists of tuples because the audio is still separated by speaker, plus there's some extra data. Once you figure out what's what, you can mix together the speakers using Python's audioop module.

You'll need to make some edits, but this should have all you need. I added a commented out example of how you might write a mixed down PCM stream to a file. Sorry some of the code is kinda silly and poorly commented.
recording_example.zip
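
For readers without the attachment, the "last n seconds" idea can be approximated with a bounded deque. This is only a sketch and not the Recorder class from the zip: `ReplayBuffer` is a hypothetical name, it tracks a single speaker, and it assumes one decoded PCM frame per 20 ms packet.

```python
from collections import deque

FRAMES_PER_SECOND = 50  # assumption: one 20 ms frame per packet


class ReplayBuffer:
    """Keep only the most recent `seconds` of PCM frames for one speaker."""

    def __init__(self, seconds: int):
        # A deque with maxlen discards the oldest frame automatically.
        self._frames = deque(maxlen=seconds * FRAMES_PER_SECOND)

    def append(self, pcm_frame: bytes) -> None:
        self._frames.append(pcm_frame)

    def get_replay(self, seconds: int) -> bytes:
        """Return up to the last `seconds` of audio as one PCM byte string."""
        wanted = seconds * FRAMES_PER_SECOND
        # Guard against wanted == 0, because list[-0:] would return everything.
        recent = list(self._frames)[-wanted:] if wanted > 0 else []
        return b"".join(recent)
```

One buffer per speaker, fed from the decoded packets, would give the per-speaker streams described above; memory use stays constant no matter how long the bot remains in the channel.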

@lasa01

lasa01 commented Jun 30, 2017

@Ruuttu Thanks for this! This is very helpful. However, after applying these modifications to the latest discord.py version, the decoder doesn't seem to work: it raises an access violation error.

Ignoring exception in on_speak
Traceback (most recent call last):
  File "C:\Program Files\Python36\lib\site-packages\discorde\client.py", line 307, in _run_event
    yield from getattr(self, event)(*args, **kwargs)
  File "*******************************************dbot.py", line 60, in on_speak
    await self.servermgrs[server.id].on_speak(data, ssrc, timestamp, sequence)
  File "*******************************************servermgr.py", line 86, in on_speak
    self.vrecorder.receive_packet(data, ssrc, sequence, timestamp)
  File "*******************************************recorder.py", line 124, in receive_packet
    self.streams[ user_id ].append( data, sequence, timestamp )
  File "*******************************************recorder.py", line 30, in append
    pcm = self.decoder.decode( data, self.decoder.samples_per_frame )
  File "C:\Program Files\Python36\lib\site-packages\discorde\opus.py", line 356, in decode
    ret = _lib.opus_decode(self._state, data, max_data_bytes, pcm_pointer, frame_size, 0)
OSError: exception: access violation reading 0x0000000017607EE8

The only thing that has changed in opus.py between these versions of discord.py is that it now sets the signal type to auto when encoding:

CTL_SET_SIGNAL = 4024

# Values accepted by the signal-type request.
signal_ctl = {
    'auto': -1000,
    'voice': 3001,
    'music': 3002,
}

# Excerpt only: _lib, log and OpusError are defined elsewhere in opus.py.
class Encoder:
    def __init__(self):
        self.set_signal_type('auto')

    def set_signal_type(self, req):
        if req not in signal_ctl:
            raise KeyError('%r is not a valid signal setting. Try one of: %s' % (req, ','.join(signal_ctl)))

        k = signal_ctl[req]
        ret = _lib.opus_encoder_ctl(self._state, CTL_SET_SIGNAL, k)

        # Negative return values from libopus indicate an error.
        if ret < 0:
            log.info('error has happened in set_signal_type')
            raise OpusError(ret)

(in opus.py)

I only recently started with Python, so I have no idea how this could be fixed. I already got decoding working earlier using python-opus (with some editing), but it would be nice to get this working since it doesn't need another library.

EDIT: I think I got it working; at least it doesn't error anymore. I was just messing around in opus.py and somehow got it working. Here is my opus.py that seems to be working.

@Bottersnike

I've been needing voice receive for some stuff. I've had a poke around, and I think it should be possible to knock together a jitter buffer to handle receiving audio when I get home.
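
As a rough idea of what such a jitter buffer might look like, here is a minimal sketch that reorders packets by sequence number and holds a few back to absorb late arrivals. It is not Bottersnike's implementation: the `JitterBuffer` class and its `depth` parameter are hypothetical, and it deliberately ignores 16-bit sequence number wraparound and gap concealment.

```python
import heapq


class JitterBuffer:
    """Reorder packets by sequence number, releasing them after a short delay."""

    def __init__(self, depth: int = 3):
        self.depth = depth            # packets held back to absorb reordering
        self._heap = []               # min-heap of (sequence, payload)
        self._last_released = None    # sequence of the last packet handed out

    def push(self, sequence: int, payload: bytes) -> None:
        # Drop packets that arrive after their slot has already been released.
        if self._last_released is not None and sequence <= self._last_released:
            return
        heapq.heappush(self._heap, (sequence, payload))

    def pop_ready(self):
        """Return packets in sequence order while more than `depth` are buffered."""
        ready = []
        while len(self._heap) > self.depth:
            sequence, payload = heapq.heappop(self._heap)
            if sequence == self._last_released:
                continue  # duplicate of a packet that was just released
            self._last_released = sequence
            ready.append((sequence, payload))
        return ready
```

A larger `depth` tolerates more reordering at the cost of extra latency (each held packet adds one frame of delay).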

@lasa01

lasa01 commented Aug 27, 2017

@Bottersnike Ruuttu's initial code no longer seems to work; it fails to decrypt the voice packets with some ciphertext error. If you get your code working, could you share at least the voice packet decryption part? Thanks!

@Bottersnike

I implemented it in node the other day because that was the only language I could find a good lib for receiving. It shouldn't be too hard to port it over and then make it conform to d.py.

@mturley

mturley commented Jan 5, 2018

Did you guys ever end up figuring out a reliable solution for audio receive? I would be happy to use someone's fork in the meantime if it's not good enough to be merged upstream.

My use case: I want to set up a Raspberry Pi running discord.py that will operate as a passthrough audio device to both transmit to and receive from a discord channel using the microphone and headphone jack of a USB audio adapter connected to the Pi. Then I plan to connect the mic jack to a feed coming from my Playstation 4, and the headphone jack to a line in adapter for the PS4... connect the PS4 to Party Chat and leave both it and the Pi running, and suddenly I have an official PSN Party that will allow PS4 players to chat with Discord users (who are playing the same cross-platform MMO on PC). It's for my Final Fantasy XIV group... But I imagine the 2-way Discord audio on the Pi might be useful for others too.

@mturley

mturley commented Jan 5, 2018

Looks like I might have better luck using https://discord.js.org instead.

@Bottersnike

Indeed. The packet parsing that I was using was relying on the fact that Discord was not using the most up-to-date structure. Because of that, the entire RFC wasn't implemented. Due to my lack of motivation, I'm unlikely to ever fix it.
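
The comment does not name the RFC. Assuming it refers to RTP (RFC 3550), a parser that handles the optional parts of the header (the CSRC list and the header extension) could look like the sketch below. The names `RtpHeader` and `parse_rtp_header` are hypothetical, and the sketch says nothing about how Discord encrypts or lays out its payload.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    has_extension: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    header_length: int  # bytes occupied by header, CSRC list and extension


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header plus any CSRC list and header extension."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack_from(">BBHII", packet, 0)
    version = first >> 6               # top two bits
    has_extension = bool(first & 0x10)  # X bit
    csrc_count = first & 0x0F           # CC field
    offset = 12 + 4 * csrc_count        # each CSRC entry is 4 bytes
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated extension header")
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, ext_words = struct.unpack_from(">HH", packet, offset)
        offset += 4 + 4 * ext_words
    if len(packet) < offset:
        raise ValueError("packet shorter than its declared header")
    return RtpHeader(version, has_extension, second & 0x7F,
                     sequence, timestamp, ssrc, offset)
```

A parser that assumes a fixed 12-byte header would misread the payload of any packet that sets the extension bit or carries CSRC entries.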

@ghost

ghost commented Feb 23, 2019

Sorry if I'm not up to date on this, but has there been any work on this?
I'm interested in this feature for a voice recognition attempt I'm working on.

@Harmon758
Contributor

See #1094

@ghost

ghost commented Feb 24, 2019

See #1094

Thanks for this :)

PepeWork added a commit to PepeWork/Spydis that referenced this issue Jan 20, 2021
discord.py doesn't yet let us simply read/listen to audio present in a voice channel.
See Rapptz/discord.py#1094 and Rapptz/discord.py#444
It needs probably more work than I intend to do, websockets ack rec convert audio async etc
Nothing impossible but I expected to just use a play and record functions, not having to implement one.
@Rapptz Rapptz removed the v1.0-alpha This pertains to the rewrite version label Apr 29, 2021
@Jourdelune

This feature is useful, for example to transcribe audio from a channel and translate it in real time.
