[0.2.dev3] Choice simulation without capacity constraints #43

smmaurer · 2018-09-13T21:55:29Z

This PR adds functionality for efficient Monte Carlo simulation of choices for a set of K scenarios, each having different probability distributions (and potentially different alternatives). Choices are independent and unconstrained, meaning that the same alternative can be chosen in multiple scenarios.

This is a component of issue #26. With this PR, we have full support in ChoiceModels for unconstrained choice simulation. The next PR will handle capacity constraints. A separate PR in UrbanSim Templates will provide access to this logic.

Discussion

This PR adds a tool called choicemodels.tools.monte_carlo_choices().

Using this is equivalent to applying np.random.choice() to each of K scenarios, but it's implemented as a single-pass matrix calculation. This is about 50x faster than using df.apply() or a loop. The algorithm is adapted from urbansim.urbanchoice.
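To illustrate the single-pass idea, here is a minimal sketch. It is not the code in this PR. It assumes every scenario has the same number of alternatives, so the probabilities fit in a K x J array, and the name `simulate_choices_matrix` is hypothetical. It uses cumulative sums with one uniform draw per scenario, which is a standard way to vectorize this kind of sampling.

```python
import numpy as np

def simulate_choices_matrix(probs, rng):
    """Draw one choice per row of a (K, J) probability matrix.

    Each row is assumed to sum to 1. Returns a length-K array of
    column positions (0..J-1), one per scenario.
    """
    cum = np.cumsum(probs, axis=1)              # cumulative probability per row
    draws = rng.random((probs.shape[0], 1))     # one uniform draw in [0, 1) per row
    idx = (draws >= cum).sum(axis=1)            # count thresholds at or below the draw
    return np.minimum(idx, probs.shape[1] - 1)  # guard against floating-point round-off

rng = np.random.default_rng(0)
probs = np.array([[0.2, 0.3, 0.5],
                  [0.9, 0.1, 0.0]])
print(simulate_choices_matrix(probs, rng))
```

The whole batch is handled with one `cumsum`, one comparison, and one row sum, with no Python-level loop over scenarios.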

For cases where all the choice scenarios have the same probability distribution among alternatives, you don't need this function. You can use np.random.choice() with size=K, which will be more efficient. (For example, that would work for a choice model whose expression includes only attributes of the alternatives.)
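A short sketch of that simpler case, with made-up alternatives and probabilities:

```python
import numpy as np

alternatives = np.array(["a", "b", "c"])  # illustrative alternative ids
shared_probs = np.array([0.2, 0.3, 0.5])  # one distribution shared by all scenarios
K = 5                                     # number of choice scenarios

# A single call draws all K independent choices from the shared distribution.
choices = np.random.choice(alternatives, size=K, replace=True, p=shared_probs)
print(choices)
```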

PR includes a unit test that confirms the simulated choices align with the provided probabilities.
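For readers who want a sense of what such a check can look like, here is a rough sketch. It is not the test shipped in this PR. It draws with numpy directly rather than calling `monte_carlo_choices()`, and the tolerance and sample size are arbitrary assumptions.

```python
import numpy as np

def test_choices_match_probabilities():
    rng = np.random.default_rng(42)      # fixed seed keeps the check repeatable
    alts = np.array([10, 20, 30])        # illustrative alternative ids
    probs = np.array([0.6, 0.3, 0.1])
    n_draws = 100000

    choices = rng.choice(alts, size=n_draws, p=probs)

    # The observed share of each alternative should be close to its probability.
    shares = np.array([(choices == a).mean() for a in alts])
    assert np.allclose(shares, probs, atol=0.01)

test_choices_match_probabilities()
```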

Usage

from choicemodels.tools import monte_carlo_choices

probabilities = ...  # placeholder: pd.Series indexed with observation id and alternative id
choices = monte_carlo_choices(probabilities)

This is implemented as a general-purpose function that can accept any list of indexed probabilities -- so it will work with output from our own MNL estimator, or PyLogit, or future model types. It can be called directly or used as the back end for a model template.
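As an illustration of the input format, the sketch below builds a small probability Series using pandas only. It assumes a two-level MultiIndex with the observation id first and the alternative id second. The level names `obs_id` and `alt_id` and all values are made up for the example.

```python
import numpy as np
import pandas as pd

obs_ids = [101, 102, 103]  # three choice scenarios
alt_ids = ["a", "b"]       # two alternatives per scenario

index = pd.MultiIndex.from_product([obs_ids, alt_ids],
                                   names=["obs_id", "alt_id"])
probabilities = pd.Series([0.7, 0.3, 0.5, 0.5, 0.1, 0.9], index=index)

# Sanity check: probabilities should sum to 1 within each observation.
row_sums = probabilities.groupby(level="obs_id").sum()
assert np.allclose(row_sums, 1.0)
print(probabilities)
```

A Series like this could then be passed to `monte_carlo_choices()` as in the usage snippet above.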

Performance

Overall the performance is excellent, especially compared to df.apply() as noted above.

Simulating choices is faster than calculating choice probabilities from the MNL utility equations. For 1 million choice scenarios with 10 alternatives each, calculating the probabilities takes 1.0 seconds and then simulating choices takes 0.5 seconds, on an old i5 MacBook.
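A rough way to reproduce this kind of comparison in pure numpy is sketched below. It is not the benchmark behind the numbers above. The utilities are random, the problem is smaller, the simulation step is a generic cumulative-sum draw rather than the function in this PR, and `mnl_probabilities` is a hypothetical helper that applies the standard MNL formula P(j) = exp(V_j) / sum_k exp(V_k).

```python
import time
import numpy as np

def mnl_probabilities(utilities):
    """Row-wise softmax over a (K, J) array of utilities."""
    shifted = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(shifted)
    return expu / expu.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
utilities = rng.normal(size=(100000, 10))  # 100k scenarios, 10 alternatives each

t0 = time.perf_counter()
probs = mnl_probabilities(utilities)
t1 = time.perf_counter()

# One uniform draw per scenario, compared against cumulative probabilities.
cum = probs.cumsum(axis=1)
draws = rng.random((probs.shape[0], 1))
choices = np.minimum((draws >= cum).sum(axis=1), probs.shape[1] - 1)
t2 = time.perf_counter()

print("probabilities: %.3fs, simulation: %.3fs" % (t1 - t0, t2 - t1))
```

Timings will vary by machine and do not include the pandas overhead discussed below.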

Although this seems fine in absolute terms, it's worth noting that it's a little bit slower than the 100%-numpy implementation in the original urbansim.urbanchoice codebase. It looks like this is caused by overhead from requiring the probabilities to be formatted as an indexed pandas object.

Profiling indicates that 65% of the execution time, and the vast majority of memory usage, comes from a couple of initial pandas operations. The numpy matrix math is very efficient in comparison.

I think for now, the clean data format is worth the performance hit. But I'd like to go through and do more careful profiling of other parts of the codebase in light of this.

Other changes

reorders parameters in the MultinomialLogitResults() constructor and makes the estimation_engine parameter optional
improves the efficiency of MultinomialLogitResults.probabilities()

Versioning

updates ChoiceModels version number to 0.2.dev3

coveralls · 2018-09-13T22:14:56Z

Coverage increased (+0.7%) to 66.615% when pulling 21167d7 on unconstrained-prediction into 9684d3a on master.

mxndrwgrdnr

This all seems reasonably straightforward to me. I wonder, though: if the pandas conversion is what's causing the slowdown, is it really worth it, instead of just making it pure numpy? I agree the clean API is nice to have, but do we expect many users to actually leverage it down at this level? Certainly most UrbanSim users would not. And users that want to leverage choicemodels outside of an UrbanSim context definitely might want to access these lower-level functions, but in this scenario they are probably less likely to be working with pandas objects to begin with, maybe? Ultimately I agree it's not a huge performance hit, and it shouldn't have a big impact on overall UrbanSim runtimes. Just food for thought.

smmaurer · 2018-09-13T23:48:48Z

@mxndrwgrdnr Thanks for taking a look at this. Yeah, these are good points. I guess we could also leave pandas formats as the default but provide an option for passing numpy arrays directly, which we could use in the UrbanSim templates to maximize performance. I'll look into this more as I'm building out the capacity-constrained choice simulation, which I think will have much worse performance.

smmaurer added 8 commits September 12, 2018 15:45

Cleaning up MCT imports

0191885

Updating MCT docstrings

310d3ce

Adding simulation.py

b3c3944

Fast unconstrained simulation

8570edd

Docstring cleanup

f06fdab

Choice simulations run

241afb6

Cleanup

c023b29

Better documentation

6079802

smmaurer requested a review from mxndrwgrdnr September 13, 2018 21:59

mxndrwgrdnr reviewed Sep 13, 2018


Adding tests

46108e1

smmaurer added 6 commits September 13, 2018 16:54

Division fix for python 2.7

94aaa24

Bug fix

f0310c6

Plan for finalizing unconstrained choice simulation

5ebe0b3

Revised documentation

0edc1d6

Updating tests

051dad8

Loose ends

21167d7

smmaurer requested a review from janowicz October 3, 2018 18:59

janowicz approved these changes Oct 3, 2018


smmaurer merged commit 36eb8ea into master Oct 3, 2018

smmaurer deleted the unconstrained-prediction branch October 3, 2018 19:29

smmaurer mentioned this pull request Oct 11, 2018

[0.2.dev4] Choice simulation with capacity constraints #44

Merged

smmaurer changed the title ~~Choice simulation without capacity constraints~~ [0.2.dev3] Choice simulation without capacity constraints Nov 10, 2018

smmaurer mentioned this pull request Mar 22, 2019

Sampling of alternatives: performance optimization #39

Open
