Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multimethods for the random module #73

Merged
merged 4 commits into from Aug 28, 2020

Conversation

joaosferreira
Copy link
Collaborator

@joaosferreira joaosferreira commented Aug 21, 2020

This picks up the work started in #46 that added multimethods for random routines. Most of the multimethods added in the previous PR where revised and hopefully some corrections were made. These were mostly changing the argument replacers and marking arguments for dispatching. This PR also adds two classes, RandomState and Generator. The multimethods added are manifold and are listed below:

Seeding and State

  • get_state
  • set_state
  • seed

Simple random data

  • rand
  • randn
  • randint
  • random_integers
  • random_sample
  • choice
  • bytes

Permutations

  • shuffle
  • permutation

Distributions

  • beta
  • binomial
  • chisquare
  • dirichlet
  • exponential
  • f
  • gamma
  • geometric
  • gumbel
  • hypergeometric
  • laplace
  • logistic
  • lognormal
  • logseries
  • multinomial
  • multivariate_normal
  • negative_binomial
  • noncentral_chisquare
  • noncentral_f
  • normal
  • pareto
  • poisson
  • power
  • rayleigh
  • standard_cauchy
  • standard_exponential
  • standard_gamma
  • standard_normal
  • standard_t
  • triangular
  • uniform
  • vonmises
  • wald
  • weibull
  • zipf

Notes:

  1. Some aliases not included in the above list were also added, they are random, ranf and sample. Although they are documented as aliases for random_sample they all reference different objects and so they have their own multimethods.
  2. The Generator class is commented out in the tests because I'm not entirely sure on what argument I should pass it.
  3. Most of CuPy's random methods should be working since its API follows NumPy's closely but we need to check the tests (some arguments might need to be unmarked).
  4. multinomial is failing with a weird error in the Dask backend. I think it needs to be issued upstream but I'm not sure.

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Aug 21, 2020

  1. multinomial is failing with a weird error in the Dask backend. I think it needs to be issued upstream but I'm not sure.

Can you come up with a reproducer for it?

@joaosferreira
Copy link
Collaborator Author

What happened:

Calling random.multinomial with the pvals argument as a Dask array fails (there is no documentation for this function which makes it harder to understand if this is a bug or not).

Reproducing the error:

In [1]: import numpy as np                                                      
In [2]: import dask.array as da                                                 
In [3]: pvals = [1/6] * 6                                                       
In [4]: da.random.multinomial(20, da.from_array(np.asarray(pvals)))  
Traceback
TypeError                                 Traceback (most recent call last)
<ipython-input-4-eed5bfc93022> in <module>
----> 1 da.random.multinomial(20, da.from_array(np.asarray(pvals)))

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/dask/array/random.py in multinomial(self, n, pvals, size, chunks, 
**kwargs)
    336             size=size,
    337             chunks=chunks,
--> 338             extra_chunks=((len(pvals),),),
    339         )
    340 

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/dask/array/random.py in _wrap(self, funcname, size, chunks, 
extra_chunks, *args, **kwargs)
    189             (0,) * len(size),
    190             small_args,
--> 191             small_kwargs,
    192         )
    193 

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/dask/array/random.py in _apply_random(RandomState, funcname, 
state_data, size, args, kwargs)
    466     state = RandomState(state_data)
    467     func = getattr(state, funcname)
--> 468     return func(*args, size=size, **kwargs)
    469 
    470 

mtrand.pyx in numpy.random.mtrand.RandomState.multinomial()

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/dask/array/core.py in __len__(self)
   1186     def __len__(self):
   1187         if not self.chunks:
-> 1188             raise TypeError("len() of unsized object")
   1189         return sum(self.chunks[0])
   1190 

TypeError: len() of unsized object

What's expected to happen:

Return the same result as when passing a list of pvals.

In [1]: import numpy as np                                                      
In [2]: import dask.array as da                                                 
In [3]: pvals = [1/6] * 6                                                       
In [4]: da.random.multinomial(20, pvals)                                        
Out[4]: dask.array<multinomial, shape=(6,), dtype=int64, chunksize=(6,), chunktype=numpy.ndarray>

Environment:

  • Dask version: 2.14.0
  • Python version: 3.7.6
  • Operating Sytem: MacOS Catalina (Version 10.15.5)
  • Install method: conda

@joaosferreira
Copy link
Collaborator Author

  1. multinomial is failing with a weird error in the Dask backend. I think it needs to be issued upstream but I'm not sure.

The simpler fix would be to not dispatch pvals.

@joaosferreira
Copy link
Collaborator Author

More multimethods/classes:

Should we include these as well?

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Aug 22, 2020

Here's the docs for random.multinomial. They say that pvals is a "sequence of floats". Arrays aren't formally sequences, see numpy/numpy#2776. So we should not dispatch it.

@joaosferreira
Copy link
Collaborator Author

Here's the docs for random.multinomial. They say that pvals is a "sequence of floats". Arrays aren't formally sequences, see numpy/numpy#2776. So we should not dispatch it.

This is fixed, thanks for the help.

More multimethods/classes:

Should we include these as well?

I've added these as well.



class RandomState(metaclass=ClassOverrideMetaWithConstructor):
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add methods for these classes as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what you mean.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at: https://github.com/Quansight-Labs/unumpy/blob/master/unumpy/_multimethods.py#L244-L295

You will see how the ufunc class defines methods. The same needs to be done here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for RandomState or Generator as well? As I understand we just have to write the multimethods again but as class methods, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prioritize Generator, but also RandomState.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will see how the ufunc class defines methods. The same needs to be done here.

I tried calling ufunc's accumulate method but it returns a BackendNotImplementedError. I think the methods are broken and need to be rewritten in terms of overridden_class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does reduce work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works if reduce is mapped in _implementations in a given backend.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need a way to set the domain of classes to numpy.classname.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a working implementation that serves as a simple solution (although maybe not the ideal) which maps the Generator's methods. In the NumPy backend would be something like this:

for k, v in unumpy.random.Generator.__dict__.items():
    if isinstance(v, _Function):
        _implementations[v] = getattr(np.random.Generator, k)

So for each multimethod in unumpy's Generator class it maps the corresponding method in NumPy's Generator. I think this for loop is at least better than manually adding the methods to the dictionary since they are manifold.

Maybe we need a way to set the domain of classes to numpy.classname.

This sounds like a more interesting solution since the problem resides in the incomplete name of a given class method (i.e., without the class name prefix). I've looked into __qualname__ which seems to give the full name (without the modules) so we might want to use something like that.

@joaosferreira
Copy link
Collaborator Author

The last commit adds the multimethods for the Generator class. Some notes regarding the commit:

  1. The changes in the backends update the domain of class methods to include the name of the class as we discussed. I think this solution is at least better than the one I first proposed.
  2. The added multimethods needed new argument replacers because of the self argument. Since they are almost identical to the non-Generator versions I was trying to implement more general argument replacers that work for both but couldn't so far. Do you have any idea if this can be done? It would clean up the code a bit since it looks awfully repeated.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much, @joaosferreira. Any further comments, @peterbell10?

@hameerabbasi hameerabbasi merged commit 5720e11 into Quansight-Labs:master Aug 28, 2020
@hameerabbasi
Copy link
Collaborator

Thanks, @joaosferreira!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants