
Problems reading opus files #19

Closed
snakers4 opened this issue Dec 19, 2019 · 7 comments

@snakers4

snakers4 commented Dec 19, 2019

Hi @Zuzu-Typ,

Many thanks for your library. It seems to be working, but I am facing some issues.
I managed to load the library successfully on Ubuntu 18.04 after running these commands (some of them may be redundant):

pip install PyOgg
conda install -c anaconda libopus
apt install libopus-dev
apt install libopusfile0
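Before loading PyOgg, one way to check whether Python can see the shared libraries installed by the commands above is `ctypes.util.find_library` from the standard library. This is a minimal sketch; the base names "opus" and "opusfile" are assumptions about how the libraries are named on the system, and `None` only means the standard lookup did not find them.

```python
import ctypes.util

# Assumed base names: libopus.so.* -> "opus", libopusfile.so.* -> "opusfile".
found = {name: ctypes.util.find_library(name) for name in ("opus", "opusfile")}

for name, path in found.items():
    # find_library returns a library file name (e.g. a soname) or None.
    print(f"{name}: {path or 'not found'}")
```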

After that I could open and listen to an opus file like this:

import pyogg
import numpy as np
from IPython.display import Audio

file = pyogg.OpusFile("detodos.opus")

wav = []
c = 0

for sample in file.buffer:
    wav.append(sample)
    c += 1
    # without this break the loop never ends, so stop after buffer_length values
    if c >= file.buffer_length:
        break

wav = np.array(wav)
Audio(wav, rate=file.frequency)

It looks like there are a few problems:

  • If I just iterate over the buffer without breaking, the loop never seems to end
  • The audio should be ~2.5 s long, but when I listen to it, it is ~5 s long and the second half is filled with loud artefacts

Please tell me if I am doing something wrong!

@Zuzu-Typ
Collaborator

Zuzu-Typ commented Dec 19, 2019

Hello @snakers4,

Unfortunately, the task you're trying to achieve is quite difficult.
Internally, the buffer is just an array of shorts (short is a small integer) with every even index being the left channel audio data and every odd index being the right channel audio data (thus what you hear is stereo).
That's why it appears to be 5 seconds long if you read it as a whole.

file.buffer is basically just a pointer (i.e. an address in memory) to the first short of that array.
That's why you need to stop once the end of the buffer has been reached; otherwise you'll simply keep reading whatever values lie in memory after it.
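To illustrate why the loop never ends, here is a small sketch that uses a stand-in ctypes array with made-up sample values rather than a real decoded file. A ctypes `POINTER` has no length: Python iterates it by indexing 0, 1, 2, ... and nothing ever signals the end, so the read has to be bounded explicitly.

```python
import ctypes
import itertools

# Hypothetical stand-in for decoder output: three interleaved stereo
# frames of 16-bit samples (L, R, L, R, L, R).
storage = (ctypes.c_short * 6)(100, -100, 200, -200, 300, -300)
ptr = ctypes.cast(storage, ctypes.POINTER(ctypes.c_short))

# `for x in ptr:` would keep indexing past the end of `storage` forever.
# Bound the read instead:
by_slice = ptr[:6]                          # pointer slicing needs an explicit stop
by_islice = list(itertools.islice(ptr, 6))  # same values, read lazily

print(by_slice)  # [100, -100, 200, -200, 300, -300]
```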

I think the easiest way to solve this would be as follows.
First, to improve performance, you'll need ctypes, Python's built-in module for working with values that come from C or C++ code (such as pointers).
No worries, it comes with Python, so there's nothing to install.

Next you'll need numpy, which you already have.

The first step will be to turn the pointer into a numpy array.
The following code does the trick:

import ctypes, numpy, pyogg

[...]

file = pyogg.OpusFile("detodos.opus")

target_datatype = ctypes.c_short * file.buffer_length
buffer_as_array = ctypes.cast(file.buffer, ctypes.POINTER(target_datatype)).contents
buffer_as_numpy_array = numpy.array(buffer_as_array)
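numpy also provides `numpy.ctypeslib.as_array`, which accepts a `POINTER` together with an explicit shape. The sketch below compares it with the cast-based approach on a small made-up buffer; it is an alternative, not the approach used in this thread, and in both cases the number of samples has to be known in advance.

```python
import ctypes

import numpy
import numpy.ctypeslib

# Hypothetical stand-in for a decoded buffer: 4 mono samples, exposed only
# through a pointer.
storage = (ctypes.c_short * 4)(1, 2, 3, 4)
ptr = ctypes.cast(storage, ctypes.POINTER(ctypes.c_short))
n_samples = 4  # must come from elsewhere; the pointer itself carries no length

# Approach from the comment above: cast to a fixed-size array type, then copy.
array_type = ctypes.c_short * n_samples
via_cast = numpy.array(ctypes.cast(ptr, ctypes.POINTER(array_type)).contents)

# Alternative: as_array wraps the same memory without copying;
# .copy() detaches the result so it stays valid if the C buffer is freed later.
via_ctypeslib = numpy.ctypeslib.as_array(ptr, shape=(n_samples,)).copy()

print(via_cast.tolist(), via_ctypeslib.tolist())  # [1, 2, 3, 4] [1, 2, 3, 4]
```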

Now we need to reorganize the numpy array into a 2D array, as required by the documentation I found.

left_data = buffer_as_numpy_array[0::2] # starting from 0, every second value
right_data = buffer_as_numpy_array[1::2]
final_data = numpy.array((left_data, right_data))
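The even/odd slicing generalizes to any channel count with a reshape. A minimal sketch, assuming the samples are interleaved frame by frame (one value per channel in each frame) as described above; the function name `deinterleave` is hypothetical and not part of PyOgg:

```python
import numpy


def deinterleave(samples: numpy.ndarray, channels: int) -> numpy.ndarray:
    """Split interleaved PCM into shape (channels, frames).

    Row c holds channel c, i.e. samples[c::channels].
    """
    if samples.size % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Each row of the reshape is one frame (one sample per channel);
    # transposing puts each channel on its own row.
    return samples.reshape(-1, channels).T


interleaved = numpy.array([10, -10, 20, -20, 30, -30], dtype=numpy.int16)
print(deinterleave(interleaved, 2).tolist())  # [[10, 20, 30], [-10, -20, -30]]
```

For two channels this gives the same values as `numpy.array((left_data, right_data))`; note that `.T` returns a view rather than a copy.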

I think that should do it.

I hope the code doesn't contain any typos.

And I hope it helps you!

@snakers4
Author

Hi,

Many thanks for your replies!

(i)

with every even index being the left channel audio data and every odd index being the right channel audio data

import pyogg
file = pyogg.OpusFile("detodos.opus")
print(file.channels)

It is weird: when I run this, I get 1 channel for this file.
But on the other hand, file.buffer_length "says" that it has 2 channels (it equals 2 × sample rate × duration).
Is this some Opus artefact?

(ii)

I think that should do it.

Many thanks for your example, I tried it.
When I listened to the result, it sounded really weird and sped up.
Then I tried some stereo files, and it looks like this code snippet is the solution.
It looks a bit weird to me, but I checked it by listening to the WAVs I could decode.

import ctypes, numpy, pyogg
from IPython.display import Audio

# file = pyogg.OpusFile("detodos.opus")  # mono
file = pyogg.OpusFile("ehren-paper_lights-64.opus")  # stereo

target_datatype = ctypes.c_short * (file.buffer_length // 2)  # always divide by 2 for some reason
buffer_as_array = ctypes.cast(file.buffer,
                              ctypes.POINTER(target_datatype)).contents
samples = numpy.array(buffer_as_array)
if file.channels == 1:
    wav = samples
elif file.channels == 2:
    # interleaved L/R samples -> shape (2, n_frames), the layout Audio expects for stereo
    wav = numpy.array((samples[0::2],
                       samples[1::2]))
else:
    raise NotImplementedError()

Audio(wav, rate=file.frequency)

(iii)
While we are here, a couple of questions:

  • Can I encode files using your library? If so, do I need to use the OpusEncoder class and some class like this?
  • # audio frequency (always 48000): can this be changed somehow? We are working on speech applications, and we are now selecting a codec to store vast amounts of data. Our colleagues said that Opus encoding actually improves (!) performance on downstream tasks. But for speech, 48 kHz seems excessive. Of course, I can resample downstream using some fast method, but why store 3x the data?

(iv)
Maybe it is worth adding the above Python example, along with codec installation scripts, to the wiki / README.md so that the library is easier to use?
I have an ML-themed Telegram channel with 2k readers; I could tell people about Opus and how easily they can work with it in Python!

@Zuzu-Typ Zuzu-Typ self-assigned this Dec 20, 2019
@Zuzu-Typ Zuzu-Typ added the bug label Dec 20, 2019
@Zuzu-Typ Zuzu-Typ changed the title Problems reading opus file on Ubuntu 18.04 Problems reading opus files Dec 20, 2019
@Zuzu-Typ
Collaborator

Hi again,

Okay.
(i)
You're totally right. I didn't realize the file was mono.
The reason buffer_length is always twice the actual length is that it's multiplied by two in the code (I don't know why, and I don't even know how PyOpenAL still plays my Opus files loaded by PyOgg just fine...).

(ii)
That's also why you needed to divide it here.
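A possible explanation for the factor of two, offered as an assumption rather than something confirmed in this thread: buffer_length may count bytes rather than samples. Each 16-bit sample occupies `ctypes.sizeof(ctypes.c_short)` bytes (2 on common platforms), and OpenAL's `alBufferData` takes its size argument in bytes, which would be consistent with PyOpenAL playing the files correctly. Under that assumption, the arithmetic below reproduces the ~5 s vs ~2.5 s observation for a mono file; the names are illustrative only.

```python
import ctypes

SAMPLE_RATE = 48_000  # Opus decoder output rate, as stated in this thread
BYTES_PER_SAMPLE = ctypes.sizeof(ctypes.c_short)  # 2 on common platforms


def duration_seconds(n_shorts_read: int, channels: int) -> float:
    """Playback length when n_shorts_read values are treated as interleaved PCM."""
    return n_shorts_read / channels / SAMPLE_RATE


# Hypothetical ~2.5 s mono file, as in the first report.
channels = 1
frames = int(2.5 * SAMPLE_RATE)                       # 120000 samples per channel
buffer_length = frames * channels * BYTES_PER_SAMPLE  # assumed to be a byte count

# Reading buffer_length shorts doubles the length; the second half is
# whatever memory follows the real buffer.
print(duration_seconds(buffer_length, channels))                      # 5.0
# Dividing by the sample size recovers the real length.
print(duration_seconds(buffer_length // BYTES_PER_SAMPLE, channels))  # 2.5
```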

(iii)

  • Technically, yes, you could encode files using PyOgg, but you would have to use the raw bindings to C code, which is a little cumbersome to deal with.
  • Unfortunately, no. The 48000 Hz output rate is a decoder restriction. You should nonetheless be able to decode files encoded at lower sample rates; they're simply converted to 48 kHz.

(iv)
Thank you for the offer :)
Though in its current state, using PyOgg is way too complicated and error-prone.
I originally created this library for my own needs and I barely knew what I was doing.
I'm definitely willing to give this library a cleanup and improve its functionality, but I suppose that will take some time.
When I've got the time, I'll push a quick fix for the buffer length, though that isn't really a "solution".

Cheers,
--Zuzu_Typ--

@snakers4
Author

Thank you for the offer :)

I think I will cover the available options when I write a post.
The channel is located here, btw.

Though in its current state, using PyOgg is way too complicated and error-prone.

Correct me if I am wrong, but it looks like there is no proper in-memory library to work with opus files (?). There is pysoundfile, which is really nice, but it is built on top of libsndfile, which does not support Opus (it supports Vorbis, though).

Technically libsndfile does support Opus, but there are no binaries available, etc.

I tried using the packaged version in Ubuntu 18.04, but it still has no Opus support.

I'm definitely willing to give this library a cleanup and improve its functionality, but I suppose that will take some time.
When I've got the time, I'll push a quick fix for the buffer length, though that isn't really a "solution".

Are you planning on adding the write functionality?

Though in its current state, using PyOgg is way too complicated and error-prone.

Do you think that even if there was a class for writing files, your library is not suitable for production usage, i.e. there may be memory leaks?

@Zuzu-Typ
Collaborator

Correct me if I am wrong, but it looks like there is no proper in-memory library to work with opus files (?)

I don't really know any other libraries that don't have massive overhead in terms of unnecessary frameworks and functionality.

Are you planning on adding the write functionality?

Yes, that should be part of a library that claims to give access to Ogg, FLAC and Opus' functionality.

Do you think that even if there was a class for writing files, your library is not suitable for production usage, i.e. there may be memory leaks?

If I take the time and care, I'm pretty certain that I can make it production ready. Of course, there may always be memory leaks, but none that can't be fixed.

@snakers4
Author

snakers4 commented Dec 23, 2019

Gave your library a shout-out here
https://t.me/snakers4/2385

Keep up the good work! =)

@TeamPyOgg TeamPyOgg deleted a comment from Oktai15 Jun 1, 2020
@mattgwwalker
Collaborator

Closing this issue. This repository now includes an example of how to read and play Opus-encoded audio using PyOgg (see the file examples/01-play-opus-simpleaudio.py). There is also an example of how to write Opus-encoded audio (see examples/03-write-ogg-opus.py). Both can now be done without the user even needing to be aware of the ctypes interface.
