MARKOV: More efficient implementation for computing stationary distributions #79
Conversation
@oyamad Thanks for the PR. I don't think we've had a chance to look through this carefully yet, as some of us have just started the school year, but we will try to get to this ASAP. With respect to your questions:
-- Couldn't help myself. I just glanced through the notebook and this looks very cool. It did seem like we should be able to leverage the fact that we know we have a unit eigenvalue, and this looks like an answer to that.
This looks very promising. I think the test of irreducibility followed by the specialist algorithm is the way to go. I had a different test of irreducibility in mind but this one looks better. Speed is something of an issue because computing stationary distributions falls inside loops of certain equilibrium problems --- but accuracy is probably more important. I'm going to run some tests on all of this in a couple of days.
General Thoughts
I've been playing around with the new tools supplied by @oyamad in this PR and I think they're excellent. The code is very well written. I particularly like the following:
Requests for Comments 1
It's difficult to understand the difference between "communicating" and "recurrent" classes from the code and documentation. Moreover, definitions change slightly from source to source, so it would be helpful to be precise here. Evidently we are only interested in communicating classes that are maximal in some sense. For example, if every element of P is strictly positive, then any nonempty subset of the state space consists of points that communicate with each other. But we only care about the maximal ergodic set, which is the whole state space. Presumably this is what's meant by a "recurrent class" in the code. But then what is a communicating class? I'm away from my books right now, but my understanding is that for a finite Markov chain there's always a decomposition of the state space into a finite number of nonempty maximal recurrent/ergodic sets. (In fact there's a theorem in Stokey, Lucas and Prescott to this effect.) If I'm not mistaken, these exactly correspond to the "recurrent classes" in @oyamad's code. So my suggestion is that:

1. We adopt the notation and terminology of Stokey et al. (since it's well known in econ), and cite their theorem for the decomposition.
2. We compute only these maximal classes and get rid of the "communicating classes" (or explain the difference very clearly).

Requests for Comments 2
Some reorganization would be helpful to accommodate this code. In particular, it seems to me that we don't really need both DMarkov and StochMatrix --- particularly if we add some kind of stationary distributions attribute to the latter. At the same time, we should be mindful of trying not to break backwards compatibility. Hence functions should not be renamed or relocated unless necessary. What do you think about the following:

1. The class DMarkov is eliminated.
2. The function
3. The functions

Further Suggestions
1. The name StochMatrix sounds a bit awkward. I suggest renaming it to either StochasticMatrix or MarkovMatrix or MarkovChain instead.
2. It should perhaps be mentioned somewhere in the docs that we don't return all the stationary distributions, since that would be impossible. We return only the extremal/ergodic distributions, one for each recurrent class.
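To make the terminology concrete, here is a small illustration (my own sketch, not the PR's code, using scipy): communication classes are the strongly connected components of the transition digraph, and the recurrent classes are the communication classes that are closed, i.e., that probability mass cannot leave.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# A transient middle state and two absorbing states.
P = np.array([[1.0, 0.0, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.0, 1.0]])

# Communication classes = strongly connected components of the digraph of P.
n_comm, labels = connected_components(csr_matrix(P > 0), connection='strong')
comm_classes = [np.where(labels == k)[0] for k in range(n_comm)]

# A communication class is recurrent iff it is closed: no positive
# transition probability leads out of the class.
states = np.arange(P.shape[0])
rec_classes = [c for c in comm_classes
               if not P[np.ix_(c, np.setdiff1d(states, c))].any()]

print(comm_classes)  # three communication classes: {0}, {1}, {2}
print(rec_classes)   # two recurrent classes: {0} and {2}
```

In this example every stationary distribution is a convex combination of the two extremal ones, each concentrated on one recurrent class, which is the point above about returning only the extremal/ergodic distributions.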
@jstac Thank you very much for your comments.

Communication Classes/Recurrent Classes
I will consult Stokey-Lucas tomorrow (I don't have it with me now).
Other Requests/Suggestions
Some More
It could be too much, but I have some more code which computes the period of an irreducible stochastic matrix (and thus detects whether it is periodic or aperiodic). It uses
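For reference, one standard way to compute the period of an irreducible chain (a sketch of the general technique, not necessarily what the code mentioned above does) is via breadth-first-search levels on the transition digraph: the period is the gcd of level[u] + 1 - level[v] over all edges u -> v.

```python
import numpy as np
from math import gcd
from collections import deque

def period_sketch(P):
    """Period of an irreducible stochastic matrix P (illustration only)."""
    n = P.shape[0]
    level = np.full(n, -1)
    level[0] = 0
    queue = deque([0])
    g = 0
    while queue:
        u = queue.popleft()
        for v in np.where(P[u] > 0)[0]:
            if level[v] < 0:            # tree edge: assign BFS level
                level[v] = level[u] + 1
                queue.append(v)
            else:                       # non-tree edge: update the gcd
                g = gcd(g, int(level[u] + 1 - level[v]))
    return g

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(period_sketch(P))  # 2: the chain alternates deterministically

P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(period_sketch(P))  # 1: aperiodic
```

The BFS touches each edge once, so this runs in time proportional to the number of nonzero entries of P.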
@oyamad These explanations are 100% clear. So my suggestion now is that we keep both communication classes and recurrent classes and add the explanation above to the docstring, including the examples. (If it seems long it can also go into the docstring at the top of the module --- either is fine.) In addition, it's very useful to know if the Markov matrix is aperiodic since, as you say, it means that we have uniform ergodicity and therefore convergence of the marginal distribution of X_t from every initial condition. I think it would be great to have that code as well. @cc7768 Please let us know your thoughts.
Lots of info to process. I will try to make my response as organized as possible (the summary version is that I think this will be a good change):
That is more or less what I think right now. If I think of something else, I'll add it.
Quick comment about subclassing

Suppose we have

How about

I see the appeal of having things like

Any thoughts or comments about this?
@spencerlyon2 Very good point. I didn't think about it. Actually, finding the communication/recurrent classes as well as computing the period are all purely graph theoretic operations. So another possible implementation strategy is to implement these functions in a class, say,

(By the way, Networkx has these graph theoretic functions, but it doesn't seem to use functions from scipy such as scipy.sparse.csgraph.connected_components. I guess, although I haven't checked, that code using those scipy functions is faster than Networkx.)
@oyamad I like this suggestion of keeping DMarkov alive and setting up a Digraph class that computes communication classes and so forth. As you say, such methods might also be useful for network models --- which would be good to include --- and Digraph would be a good place to park additional graph related functionality as it comes up. On top of that, the distinction between DMarkov and Digraph is then clear, so there's no problem having both. The only thing I would say on that point is that I think DMarkov should be renamed to something a little clearer. MarkovMatrix seems a popular choice. Finally, while I was enthusiastic about subclassing ndarrays before, I found Spencer's argument quite convincing. A related point is that if you query the methods associated with a StochMatrix you get a huge list of methods, many of which, as Spencer pointed out, are not relevant. This makes it harder to find the methods that we do care about --- those written by Daisuke. Do we all have agreement on the way forward now?
Good point about not needing to subclass it as an array; I liked the idea that it would just be an array with some additional information, but I agree with @spencerlyon2 and @jstac that it adds too many things that won't be applicable.
Very interesting thread! It also appears to me that subclassing ndarray introduces many unpredictable operations for the few ones that would be useful. One option would be to have an is_stochastic() method that returns True or False depending on whether all rows sum to 1. Then you would have

About the use of properties, there could be one simple rule of thumb: if you call

On the terminology aspect, I was wondering whether we could use StochasticMatrix for a matrix whose rows sum to 1 and something like DMarkovChain for the conjunction of a StochasticMatrix and a list of nodes. This would essentially be the object we would get as a result of the discretization routines.
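To illustrate the composition-over-inheritance idea, together with the is_stochastic() suggestion above, here is a minimal sketch; the class name and API are hypothetical:

```python
import numpy as np

class MarkovMatrixSketch:
    """Hold the transition matrix as a plain attribute instead of
    subclassing ndarray, and expose only Markov-specific methods."""

    def __init__(self, P):
        self.P = np.asarray(P, dtype=float)   # a plain ndarray, not a subclass

    def is_stochastic(self, tol=1e-10):
        """True if all entries are nonnegative and every row sums to one."""
        return bool((self.P >= 0).all()) and np.allclose(self.P.sum(axis=1), 1.0, atol=tol)

mc = MarkovMatrixSketch([[0.9, 0.1],
                         [0.4, 0.6]])
print(mc.is_stochastic())  # True
print(mc.P @ mc.P)         # array operations go through .P explicitly
```

The class namespace then contains only the Markov-specific methods rather than the full ndarray interface, which was the concern raised about subclassing.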
Somewhat related: we still lack experience here, but it is becoming apparent that objects used for computational codes should be isomorphic to simple data structures. We are not even sure that we can optimize object methods as well as an (unbound) function using numba. In my view that would be a good reason for avoiding any complication due to object inheritance. |
@albop Many thanks for these thoughts. One interesting point is on the use of Numba. As you say, Numba seems happier working with less nested structures. On the other hand, there are probably ways around this ex post. For example, we can probably outsource computationally intensive tasks from methods to stand-alone functions later on if we really need to (i.e., the method remains but hot loops are outsourced). @oyamad I really liked the path you suggested above, particularly the idea of separating graph theoretic and Markov centric operations, and I think you can take this discussion as advice, to which free disposal applies.
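As a purely illustrative version of "method remains but hot loops are outsourced", assuming Numba is available (names and signatures here are hypothetical):

```python
import numpy as np
from numba import njit

@njit
def _sample_path(cdfs, init, ts_length, u):
    """Stand-alone hot loop: inverse-CDF sampling of a sample path."""
    out = np.empty(ts_length, dtype=np.int64)
    out[0] = init
    for t in range(1, ts_length):
        out[t] = np.searchsorted(cdfs[out[t - 1]], u[t])
    return out

class ChainSketch:
    """User-facing method stays; the jitted loop lives outside the class."""

    def __init__(self, P):
        self.P = np.asarray(P, dtype=float)
        self._cdfs = np.cumsum(self.P, axis=1)   # row-wise CDFs

    def simulate(self, init=0, ts_length=10, seed=None):
        u = np.random.default_rng(seed).random(ts_length)
        return _sample_path(self._cdfs, init, ts_length, u)

print(ChainSketch([[0.9, 0.1], [0.4, 0.6]]).simulate(ts_length=5))
```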
I finally updated the previous PR. Graph theoretic operations are implemented in graph_tools.py.
graph_tools.py:
gth_solve.py:
mc_tools.py:
Issues:
Please see digraph_demo01.ipynb and markovchain_demo01.ipynb for some more details.
@oyamad I think this looks really great. I'm delighted to get this kind of functionality into quantecon. I want to spend a bit more time with this and I'd like to get another set of eyes to look it over before we merge. I'll be in contact about that very soon.
This is very interesting. I can have a look at it later this week if you want.
@oyamad You have done a great job with this. Reading through the PR thread - @jstac's comments are excellent - I also agree with @spencerlyon2 re: inheritance. I initially wanted to inherit a pandas DataFrame in my own work, and I decided having control of the base-level object is a good thing (and used self.data = pd.DataFrame instead). Inheriting also means that if ndarray names something in the future that is the same as one of these methods, then there will be a name collision. The good news is that it would get overwritten by the QuantEcon method when inheriting downstream. There are a few minor things I would probably change - which are largely stylistic - like:
I do have two questions for the group:
@sanguineturtle Thank you for your comments.
I am also open to any suggestion on the class/method names for
@oyamad Please tell me when you've added the period attribute for non-irreducible chains and I'll accept the PR.
I have now made a final revision.

mc_tools.py:
I suppose this is ready to be merged. Just one thing about the implementation of some test code:
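On the test side, one generic property-based check that could complement the unit tests (a sketch, not the PR's test code): every returned distribution should be a nonnegative, normalized left eigenvector of P associated with the unit eigenvalue.

```python
import numpy as np

def check_stationary(P, dists, tol=1e-10):
    """Each row x of dists should satisfy x P = x, x >= 0, sum(x) = 1."""
    P = np.asarray(P, dtype=float)
    for x in np.atleast_2d(dists):
        assert np.allclose(x @ P, x, atol=tol)
        assert (x >= -tol).all()
        assert np.isclose(np.sum(x), 1.0, atol=tol)
```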
Great work, many thanks.
Hi,
related to #18, I wrote some code that implements an algorithm specialized in computing stationary distributions (instead of using a general routine to solve the eigenvalue problem), called the Grassmann-Taksar-Heyman (GTH) algorithm (see for example these slides, pages 46-81). It is more straightforward than the current procedure, and it turned out that it is accurate enough to give (almost) correct answers even for KMR matrices (without mpmath).
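For readers unfamiliar with the GTH algorithm, here is a minimal sketch of the elimination (an illustration only, not the PR's gth_solve): it is Gaussian elimination arranged so that only additions and divisions of nonnegative numbers occur, with no subtractions, which is why it stays accurate on hard cases such as the KMR matrices.

```python
import numpy as np

def gth_solve_sketch(A):
    """Stationary distribution x of an irreducible stochastic matrix A,
    i.e. a nonzero solution of x (A - I) = 0, normalized to sum to one.
    Sketch of the GTH algorithm: no subtractions, hence no cancellation."""
    A = np.array(A, dtype=float)          # work on a copy
    n = A.shape[0]
    x = np.zeros(n)

    # Forward reduction: fold state k into states 0, ..., k-1, using the
    # total probability of leaving k "downwards" as the pivot.
    for k in range(n - 1, 0, -1):
        scale = A[k, :k].sum()
        A[:k, k] /= scale
        A[:k, :k] += np.outer(A[:k, k], A[k, :k])

    # Back substitution, then normalization.
    x[0] = 1.0
    for k in range(1, n):
        x[k] = x[:k] @ A[:k, k]
    return x / x.sum()

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(gth_solve_sketch(P))  # [0.8, 0.2]
```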
stochmatrix.py:

- gth_solve solves for a nonzero solution to x (P - I) = 0 by the GTH algorithm for an irreducible stochastic matrix P.
- StochMatrix decomposes P into irreducible components (where scipy.sparse.csgraph.connected_components is used to find communication classes).
- stationary_dists passes these irreducible components to gth_solve to obtain all the stationary distributions, one for each irreducible component. (They are contained in the output array as its rows; a sketch of this flow follows below.)

mc_tools.py:

- mc_compute_stationary is replaced with the above stationary_dists.
- The attributes of StochMatrix are made available in DMarkov, such as comm_classes and rec_classes.
- The methods of DMarkov are renamed from "mc_compute_stationary" and "mc_sample_path" to "compute_stationary" and "simulate".

Please see stochmatrix_demo01.ipynb and dmarkov_demo01.ipynb for some more details.
I also prepared test_stochmatrix.py (and edited test_mc_tools.py).
There are some issues/questions:
- In StochMatrix I use decorators, but I am not sure if they are properly used, or whether my choices between attribute and method in implementing each functionality are appropriate (see the sketch below).
- stationary_dists is slower than the current mc_compute_stationary, but I hope it is acceptable.
- DMarkov: pi_0 is not used. Is it for any future use? There is stationary_dists in addition to the method to compute stationary distributions. Is it only for the __repr__ method?
- mc_sample_path. I could add one.
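On the attribute-versus-method question in the first item above, one common pattern (an illustrative sketch with hypothetical names, reusing stationary_dists_sketch from the snippet above) is to expose expensive results as lazily computed, cached properties:

```python
import numpy as np

class StochMatrixSketch:
    """Expose stationary_dists as a cached property: computed on first
    access, then reused (e.g. by __repr__) without recomputation."""

    def __init__(self, P):
        self.P = np.asarray(P, dtype=float)
        self._stationary_dists = None

    @property
    def stationary_dists(self):
        if self._stationary_dists is None:
            self._stationary_dists = stationary_dists_sketch(self.P)  # from the sketch above
        return self._stationary_dists

    def __repr__(self):
        return "StochMatrixSketch with {0} stationary distribution(s)".format(
            self.stationary_dists.shape[0])
```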