Vectorize cdf query of custom model #31

Open
wildug opened this issue Jul 24, 2023 · 1 comment

Comments

@wildug

wildug commented Jul 24, 2023

In my use case, I want to compress a large amount of data with a custom entropy model.
Unfortunately, this takes quite some time because the CDF is called separately for each compressed symbol.
I can't use the SciPy model adapter directly because I'm using a mixture distribution, which SciPy does not implement.

Here's my dummy code:

```python
from scipy import stats
import constriction
import numpy as np

c = 0  # counts how often constriction invokes the CDF callback

def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    p = stats.norm.cdf(x, loc=mu, scale=sigma)
    return p

def inverse_cdf_likelihood_normal(q, mu, sigma):
    x = stats.norm.ppf(q, loc=mu, scale=sigma)
    return x

coder = constriction.stream.stack.AnsCoder()
# Integer symbols in the range -10 to 10 (inclusive).
entropy_model = constriction.stream.model.CustomModel(
    cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10
)

sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
message = np.random.randint(-1, 1, int(1e4), dtype=np.int32)  # symbols in {-1, 0}

p = stats.norm.cdf(message, loc=mu, scale=sigma)  # very fast: one vectorized call

coder.encode_reverse(message, entropy_model, mu, sigma)  # very slow: calls the Python CDF for each symbol
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu, sigma)

assert (message == reconstruction).all()
```

Would it be possible for the custom model adapter to support vectorized CDFs so that encoding can be sped up?
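As an illustration of the kind of CDF that vectorizes well, the sketch below evaluates a Gaussian mixture CDF for a whole message in a single NumPy/SciPy call. The function name `mixture_cdf` and the array layout (one row of mixture weights, means, and standard deviations per symbol) are assumptions made for this example, not part of constriction's API. The inverse CDF, which has no closed form for a mixture, is not shown.

```python
import numpy as np
from scipy import stats


def mixture_cdf(x, weights, mus, sigmas):
    """Gaussian-mixture CDF, evaluated for all symbols in one vectorized call.

    x:       shape (n,)   points at which to evaluate the CDF
    weights: shape (n, k) per-symbol mixture weights (each row sums to 1)
    mus:     shape (n, k) per-symbol component means
    sigmas:  shape (n, k) per-symbol component standard deviations
    Returns an array of shape (n,).
    """
    x = np.asarray(x, dtype=np.float64)[:, None]  # (n, 1) broadcasts against (n, k)
    component_cdfs = stats.norm.cdf(x, loc=mus, scale=sigmas)  # (n, k)
    return np.sum(weights * component_cdfs, axis=1)  # weighted sum over components


# Hypothetical two-component mixture per symbol, mirroring the setup in the issue.
n, k = 10_000, 2
rng = np.random.default_rng(0)
weights = np.full((n, k), 1.0 / k)
mus = rng.normal(size=(n, k))
sigmas = np.ones((n, k))
message = rng.integers(-1, 1, size=n, dtype=np.int32)  # symbols in {-1, 0}

p = mixture_cdf(message, weights, mus, sigmas)  # one call covers all n symbols
```

A callback like this is only useful to an adapter that passes all symbols and parameters at once; with a per-symbol callback, each call would still pay the full Python and SciPy overhead.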

@robamler
Collaborator

I can see how vectorizing would reduce the overhead from Python callbacks. Unfortunately, vectorizing is only possible for encoding. When decoding a symbol, the decoder cannot know where to evaluate the ppf until it has decoded all preceding symbols, because that point depends on the coder's state after those symbols have been decoded (except in the case of the ChainCoder). I'll have to think a bit about what the best API would be to reflect this asymmetry (and ideally still support vectorized decoding with a ChainCoder).
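To make the encode/decode asymmetry concrete, here is a toy, self-contained rANS coder. It is not constriction's implementation: it uses an unbounded Python integer as its state, so it needs no renormalization, and `quantized_cdf` is a hypothetical 16-bit quantization of the normal CDF. Like constriction's `AnsCoder`, it is a stack, so it encodes in reverse. The encoder evaluates the CDF for the whole message in vectorized calls before its loop. The decoder can only compute each symbol's slot after popping the previous symbol, so its inverse-CDF lookup has to run once per iteration.

```python
import numpy as np
from scipy import stats

# Toy parameters (hypothetical choices, not constriction's internals).
PRECISION = 1 << 16        # total frequency M of the quantized distribution
LO, HI = -10, 10           # symbol range, as in CustomModel(..., -10, 10)
NUM_SYMBOLS = HI - LO + 1


def quantized_cdf(x, mu, sigma):
    """Integer cumulative frequency of x under N(mu, sigma^2), restricted to [LO, HI].

    Vectorized over x, mu, sigma. Gives 0 at LO, PRECISION at HI + 1,
    and a frequency of at least 1 for every symbol in [LO, HI].
    """
    x = np.asarray(x)
    lo_mass = stats.norm.cdf(LO - 0.5, loc=mu, scale=sigma)
    hi_mass = stats.norm.cdf(HI + 0.5, loc=mu, scale=sigma)
    mass = (stats.norm.cdf(x - 0.5, loc=mu, scale=sigma) - lo_mass) / (hi_mass - lo_mass)
    return np.floor(mass * (PRECISION - NUM_SYMBOLS)).astype(np.int64) + (x - LO)


def encode(message, mu, sigma):
    # All symbols are known up front, so the CDF is evaluated for the whole
    # message in two vectorized calls; the loop below is pure integer arithmetic.
    starts = quantized_cdf(message, mu, sigma)
    freqs = quantized_cdf(message + 1, mu, sigma) - starts
    state = 1  # unbounded Python int, so this toy needs no renormalization
    for start, freq in zip(reversed(starts.tolist()), reversed(freqs.tolist())):
        state = (state // freq) * PRECISION + start + state % freq
    return state


def decode(state, mu, sigma):
    # The slot for symbol i depends on `state`, which is only known after
    # symbols 0..i-1 have been popped, so the lookup must happen per symbol.
    symbols = np.arange(LO, HI + 2)
    decoded = []
    for m, s in zip(mu.tolist(), sigma.tolist()):
        slot = state % PRECISION
        table = quantized_cdf(symbols, m, s)  # stands in for the per-symbol ppf callback
        idx = int(np.searchsorted(table, slot, side="right")) - 1
        start, freq = int(table[idx]), int(table[idx + 1] - table[idx])
        state = freq * (state // PRECISION) + slot - start
        decoded.append(LO + idx)
    return np.array(decoded, dtype=np.int32)


rng = np.random.default_rng(0)
n = 1_000
mu, sigma = np.zeros(n), np.ones(n)
message = rng.integers(-1, 1, size=n, dtype=np.int32)

state = encode(message, mu, sigma)
reconstruction = decode(state, mu, sigma)
assert (message == reconstruction).all()
```

This toy does not model the ChainCoder, which the comment names as the exception where decoding could also be vectorized.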
