DiscreteDP: Issues #185

oyamad opened this Issue Sep 12, 2015 · 13 comments

oyamad (Member) commented Sep 12, 2015

Here's a list of issues for discussion regarding the DiscreteDP class.

  1. This is the major issue: there is an intrinsic limitation in the current approach of DiscreteDP, namely that one has to prepare the arrays R and Q (and s_indices and a_indices with the "state-action pairs formulation") in advance. If the problem is large, memory can be exhausted just creating those arrays (before they are even passed to DiscreteDP). For example, I tried to set up the problem given in Comparison-Programming-Languages-Economics, where there are 17,820 x 5 = 89,100 states and 17,820 actions, so there are 89,100 x 17,820 ~ 1.6 x 10^9 state-action pairs. But I have not been able to create s_indices (size 1.6 x 10^9), a_indices (size 1.6 x 10^9), R (size 1.6 x 10^9), and Q (3 nonzero entries for each of the 1.6 x 10^9 pairs) without running out of memory.

    One resolution (perhaps the only one?) would be to have DiscreteDP accept functions (callables) for R and Q.

  2. With the "state-action pairs formulation", DiscreteDP should work without the s_indices and a_indices arrays when all actions are available at all states (otherwise they remain necessary).

  3. Would we want a finite horizon algorithm?

    • The current implementation does not accept beta=1 because of the convergence/zero-division issues.

    EDIT: Done by #244.

  4. Is there other important information to store in DPSolveResult?

  5. The method operator_iteration may be replaced with compute_fixed_point, although the latter does not exactly fit this case. See also #40 and #41.

  6. Input checking is not exhaustive.

    • It is not checked that the entries of Q are nonnegative and sum to one for each state-action pair.
    • For the "state-action pairs formulation", it is not checked that there are no duplicate action indices for the same state. (For this, it may be better to write custom code to sort s_indices and a_indices (when the inputs are not sorted), which is currently delegated to scipy.sparse.csr_matrix.)

    If we check everything, it may be better to supply an option to skip the checks, for the case where the user is sure the inputs are correct and creates instances many times in a loop. Alternatively, we could collect the checking procedures in a method and assume that the user calls it manually.

  7. __repr__ and __str__ are missing. What do we want to print?
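The stochasticity check mentioned in point 6 could be sketched like this (the function name is illustrative, not part of the actual API; Q is taken in the 2-D state-action pairs form, one row per pair):

```python
import numpy as np

def check_stochastic(Q, atol=1e-10):
    """Raise if any row of Q (one row per state-action pair) is not a
    probability vector: nonnegative entries summing to one."""
    Q = np.asarray(Q)
    if (Q < 0).any():
        raise ValueError("Q has negative entries")
    if not np.allclose(Q.sum(axis=1), 1, atol=atol):
        raise ValueError("each row of Q must sum to one")

# Two state-action pairs over three states: passes silently
check_stochastic([[0.2, 0.5, 0.3],
                  [0.0, 1.0, 0.0]])
```

The duplicate-indices check would need the sorted s_indices/a_indices, so it does not fit in a one-liner like this.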

jstac (Contributor) commented Sep 13, 2015

Regarding the possibility of passing callables to the DiscreteDP class, it seems that jitted functions can call jitted functions, which should be all that's necessary.

Here's an example where this happens from the Numba documentation:

(create_fractal calls mandel)

It's an interesting issue. When the problem is of small or medium size, it should be faster to compute all the arrays and store them in memory; for the problems I've worked on, the code seems very quick. But for large problems we have a memory issue.

@mmcky When that server at SMU is up and running how much RAM will we have? It might be fun to run this same problem @oyamad references without changing the code to see how fast it runs.
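The pattern of one jitted function calling another can be sketched as below. The reward function is a made-up example, and the try/except fallback lets the sketch run even where Numba is not installed:

```python
try:
    from numba import njit
except ImportError:  # fall back to plain Python if Numba is unavailable
    def njit(f):
        return f

@njit
def reward(s, a):
    # hypothetical reward for state s and action a
    return s - 0.5 * a * a

@njit
def best_action(s, num_actions):
    # a jitted function calling the jitted reward() above
    best, best_r = 0, reward(s, 0)
    for a in range(1, num_actions):
        r = reward(s, a)
        if r > best_r:
            best, best_r = a, r
    return best

best_action(10.0, 5)  # -> 0, since this reward is decreasing in a
```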

sglyon (Member) commented Sep 14, 2015

@jstac If we want to run this on a machine with enough RAM I can put it up on the HPC resources at NYU. There we can request as much memory as we want.

Let me know if you would like to try this.

Regarding the points in the original post (disclaimer: I haven't read the current code closely enough to comment on most of the points above):

  1. I think accepting callables is a great idea.

jstac (Contributor) commented Sep 14, 2015

Thanks for the comments. We'll have a fancy machine at SMU within about 2 weeks that will have 256 GB of RAM, so we were planning to wait for that. (Not that I want to stop you if you're curious.)

Regarding passing callables, there's a lot of stuff I don't know concerning how this would work. Do we jit the callables as they come in? Or ask the user to numba-fy them? Perhaps the former, but then we have to warn the user if they can't be jit compiled. So what do we tell them?

Can internal jitted functions work with jitted functions passed in as arguments? Will they know the return types?

(Clearly this will all be smoother in Julia.)

cc7768 (Member) commented Sep 14, 2015

I think using callables would be a good idea as well -- Unfortunately, the last time I tried, passing jitted functions as arguments does not work. I believe this is on numba's radar though (numba/numba#821).

davidrpugh (Contributor) commented Sep 14, 2015

@jstac

At the end of issue #159 I think I have a partial solution showing how jitted functions can be passed as arguments to other jitted functions. I am not entirely happy with the proposed solution, but haven't had time to improve upon it yet...

oyamad (Member, Author) commented Nov 2, 2015

Related to point 3:
Should we allow DiscreteDP to accept beta=1 while raising NotImplementedError if evaluate_policy, policy_iteration, or modified_policy_iteration is called?
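A minimal sketch of that guard, with illustrative class and method names (not the actual quantecon API):

```python
class DiscreteDPSketch:
    """Illustrative stand-in for DiscreteDP that accepts beta == 1."""

    def __init__(self, R, Q, beta):
        if not 0 <= beta <= 1:
            raise ValueError("beta must be in [0, 1]")
        self.R, self.Q, self.beta = R, Q, beta

    def _require_discounting(self, method_name):
        # methods that rely on beta < 1 refuse to run when beta == 1
        if self.beta == 1:
            raise NotImplementedError(
                method_name + " is not defined for beta == 1")

    def evaluate_policy(self, sigma):
        self._require_discounting("evaluate_policy")
        # ... for beta < 1, solve (I - beta * Q_sigma) v = R_sigma ...
```

Construction with beta=1 then succeeds, and only the methods that genuinely need discounting raise.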

oyamad (Member, Author) commented Nov 2, 2015

Re 3. Another issue:
As an example of finite horizon DP with discounting, I tried the option pricing example from Miranda and Fackler, Section 7.6.4:
http://nbviewer.ipython.org/github/oyamad/mdp/blob/master/ddp_ex_MF_7_6_4.ipynb

The figure for the "Put Option Optimal Exercise Boundary" (the right panel at the bottom) looks different from the corresponding figure in Miranda and Fackler (Figure 7.4(b), p. 180). I wanted to check the output of the original code from the CompEcon Toolbox, but I do not have Matlab on my computers.

Could anybody who has CompEcon Toolbox with Matlab (and time) please compare the outputs?

oyamad referenced this issue Dec 3, 2015: Add Discrete DP #78 (merged)
oyamad (Member, Author) commented Mar 16, 2016

Re 1:
Is it a feasible option to store the arrays in disk? Would Blaze help?
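Besides Blaze, one dependency-free option would be numpy.memmap, which keeps the array on disk and pages chunks in on demand. A sketch with made-up sizes:

```python
import os
import tempfile
import numpy as np

num_sa_pairs, chunk = 1_000_000, 100_000
path = os.path.join(tempfile.mkdtemp(), "R.dat")

# disk-backed reward array: only the touched chunks live in RAM
R = np.memmap(path, dtype="float64", mode="w+", shape=(num_sa_pairs,))
for start in range(0, num_sa_pairs, chunk):
    stop = start + chunk
    # fill (or later read back) one chunk at a time
    R[start:stop] = np.arange(start, stop, dtype="float64")
R.flush()
```

Whether the repeated disk access is acceptable inside an iterative solver is the open question.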

albop (Contributor) commented Mar 16, 2016

I missed the beginning of this discussion and haven't looked closely at the code, but regarding point 1:

1/ I would be very curious to see what can be done with Blaze.
2/ Do you really need to compute all state-action pairs at once? Since the iteration ultimately involves a reduction with respect to the actions, I would hazard that it could work if you do it chunk by chunk. If so, computing all the chunks and storing them on disk would be quite suboptimal.

jstac (Contributor) commented Mar 16, 2016

This is an interesting discussion. The key issue is how advantageous it is to compute transition probabilities and rewards up front. My prior is that it is very advantageous, given that we need to repeatedly iterate using these values or solve linear systems using them. See, for example, this routine.

In the end I think it's best if we offer two flavors. One is where all arrays are computed up front. This is fast but memory intensive. Another is where all quantities are computed on the fly, and routines like the one discussed above are replaced by iterative methods (the T_sigma operator is also a contraction mapping).
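For policy evaluation, the second flavor could use successive approximation on T_sigma in place of the direct linear solve; a rough sketch:

```python
import numpy as np

def evaluate_policy_iterative(r_sigma, Q_sigma, beta, tol=1e-10,
                              max_iter=10_000):
    """Approximate v_sigma = (I - beta Q_sigma)^{-1} r_sigma by iterating
    the contraction T_sigma v = r_sigma + beta * Q_sigma @ v."""
    v = np.zeros_like(r_sigma)
    for _ in range(max_iter):
        v_new = r_sigma + beta * Q_sigma @ v
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Two-state example: the result matches the direct solve
r = np.array([1.0, 2.0])
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
v = evaluate_policy_iterative(r, Q, beta=0.95)
```

In the memory-light flavor, the matrix-vector product Q_sigma @ v would itself be computed chunk by chunk rather than from a stored Q_sigma.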

albop (Contributor) commented Mar 16, 2016

I shall add that dealing with chunks in an efficient way is precisely what Blaze is supposed to be able to do. Another option to consider is xarray + dask. Actually, getting some experience with xarray is probably a good investment, as it seems to be developed at a very fast pace.

oyamad referenced this issue Mar 17, 2016: TODO: DiscreteDP: Allow beta=1 #242 (closed)
robertdkirkby commented Mar 20, 2016

@jstac
I can confirm that the computational advantage of upfront computation of the rewards matrix is huge (orders of magnitude). I have my own code that uses a slightly different approach to pure discretized value function iteration than the one you are using (my implementation is in Matlab with GPU). Based on this experience, I would suggest offering the current implementation as the default behaviour while also allowing a 'lowmemory' option that does not pre-compute the reward matrix; that seems like a good combination.

jstac (Contributor) commented Mar 20, 2016

@robertdkirkby Great. We're going to run some horse races soon and post the results somewhere. Probably here.
