Merge branch 'master' into parallel_sim

UDST · Apr 23, 2019 · a094679
2 parents 8d61a72 + 5cf353e
commit a094679
Showing 51 changed files with 476 additions and 25,508 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,5 @@
+data/
+
 # Jupyter checkpoints
 **/.ipynb_checkpoints
 .pytest_cache/*

diff --git a/.readthedocs.yml b/.readthedocs.yml
diff --git a/.travis.yml b/.travis.yml
@@ -5,29 +5,23 @@ python:
   - "3.5"
   - "3.6"
 
-before_install:
-  - pip install --upgrade pip
-  - pip install --upgrade wheel
-  - wget http://bit.ly/miniconda -O miniconda.sh
-  - bash miniconda.sh -b -p $HOME/miniconda
-  - export PATH="$HOME/miniconda/bin:$PATH"
-  - hash -r
-  - conda config --set always_yes yes --set show_channel_urls true
-  - conda update conda
-  - conda config --add channels conda-forge --force
-  - conda config --add channels udst --force
-  - conda create --quiet --name TESTENV python=$TRAVIS_PYTHON_VERSION --file requirements.txt --file requirements-dev.txt
-  - source activate TESTENV
-  - conda info --all
-  - conda list
+matrix:
+  include:
+    - python: "3.7"  # temp solution until travis supports python 3.7 more cleanly
+      dist: xenial
+      sudo: true
 
 install:
   - pip install .
+  - pip install -r requirements-dev.txt
+  - # extra tests run if urbansim is present, but it can't install with python 3.7
+  - if [ "$TRAVIS_PYTHON_VERSION" != "3.7" ]; then pip install urbansim; fi
+  - pip list
   - pip show choicemodels
 
 script:
-  - coverage run --source choicemodels -m pytest --verbose
+  - coverage run --source choicemodels --module pytest --verbose
 
 after_success:
-  - coverage report -m
+  - coverage report --show-missing
   - coveralls
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,75 @@
+# ChoiceModels change log
+
+### 0.2.1 (2019-01-30)
+
+- fixes a distribution error that excluded the LICENSE.txt file
+
+### 0.2 (2019-01-25)
+
+- production release
+
+### 0.2.dev10 (2019-01-25)
+
+- moves the `choicemodels.tools.distancematrix` functions directly into `choicemodels.tools`
+
+### 0.2.dev9 (2019-01-22)
+
+- improves documentation and packaging
+
+### 0.2.dev8 (2019-01-21)
+
+- prevents an infinite loop in `iterative_lottery_choices()` when none of the remaining alternatives can accommodate any of the remaining choosers
+
+### 0.2.dev7 (2018-12-12)
+
+- adds a check to the `MergedChoiceTable` constructor to make sure there aren't any column names that overlap between the observations and alternatives tables
+
+### 0.2.dev6 (2018-11-23)
+
+- resolves deprecation warnings from older code
+
+- removes `choicemodels.tools.mnl_simulate()` (originally from `urbansim.urbanchoice.mnl`), because this functionality has been fully replaced
+
+- removes `choicemodels.Logit`, which wrapped a StatsModels estimator as proof of concept for MNL and didn't provide much value on its own
+
+### 0.2.dev5 (2018-11-12)
+
+- adds a `chooser_batch_size` parameter to `iterative_lottery_choices()`, to support batch simulation for very large datasets
+
+### 0.2.dev4 (2018-10-15)
+
+- adds a function `choicemodels.tools.iterative_lottery_choices()` for simulation of choices where the alternatives have limited capacity and choosers have varying probability distributions over the alternatives
+
+- in `MergedChoiceTable`, empty choosers or alternatives now produce an empty choice table (rather than an exception)
+
+- adds support for multiple tables of interaction terms in `MergedChoiceTable`
+
+### 0.2.dev3 (2018-10-03)
+
+- adds a function `choicemodels.tools.monte_carlo_choices()` for efficient simulation of choices for a list of scenarios that have differing probability distributions, but no capacity constraints on the alternatives
+
+### 0.2.dev2 (2018-09-12)
+
+- adds a `probabilities()` method to the `MultinomialLogitResults` class, which uses the fitted model coefficients to generate predicted probabilities for a table of choice scenarios
+
+- adds a required `model_expression` parameter to the `MultinomialLogitResults` constructor
+
+### 0.2.dev1 (2018-08-06)
+
+- improves the reliability of the native MNL estimator: (a) reduces the chance of a memory overflow when exponentiating utilities and (b) reports warnings from SciPy if the likelihood maximization algorithm may not have converged correctly
+
+- adds substantial functionality to the `MergedChoiceTable` utility: sampling of alternatives with or without replacement, alternative-specific weights, interaction weights that apply to combinations of choosers and alternatives, automatic joining of interaction terms onto the merged table, non-sampling (all the alternatives available for each chooser), and estimation/simulation support for all combinations
+
+- `LargeMultinomialLogit` class now optionally accepts a `MergedChoiceTable` as input
+
+### 0.2.dev0 (2018-07-09)
+
+- adds additional information to the summary table for the native MNL estimator: number of observations, df of the model, df of the residuals, rho-squared, rho-bar-squared, BIC, AIC, p values, timestamp
+
+### 0.1.1 (2018-03-08)
+
+- packaging improvements
+
+### 0.1 (2018-03-08)
+
+- initial release
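
The 0.2.dev1 entry above mentions reducing the chance of an overflow when exponentiating utilities. A common way to do that (not necessarily what the ChoiceModels estimator does) is to subtract the largest utility in each choice scenario before exponentiating. The sketch below is pure Python, and the function name `choice_probabilities` is hypothetical.

```python
import math

def choice_probabilities(utilities):
    """Convert the utilities for one choice scenario into MNL probabilities.

    Hypothetical helper for illustration; not part of the ChoiceModels API.
    """
    # Subtracting the largest utility leaves the probabilities unchanged
    # (the shift cancels in the ratio) but keeps every exponent <= 0,
    # so math.exp() cannot overflow.
    shift = max(utilities)
    exp_utils = [math.exp(u - shift) for u in utilities]
    total = sum(exp_utils)
    return [e / total for e in exp_utils]

# math.exp(1000.0) on its own would raise OverflowError
large = choice_probabilities([1000.0, 1001.0, 999.0])
small = choice_probabilities([0.0, 1.0, -1.0])
```

Because only differences in utility matter, `large` and `small` describe the same probabilities.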
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,94 @@
+Thanks for using ChoiceModels! 
+
+This is an open source project that's part of the Urban Data Science Toolkit. Development and maintenance are a collaboration between UrbanSim Inc and U.C. Berkeley's Urban Analytics Lab.
+
+You can contact Sam Maurer, the lead developer, at `maurer@urbansim.com`.
+
+
+## If you have a problem:
+
+- Take a look at the [open issues](https://github.com/UDST/choicemodels/issues) and [closed issues](https://github.com/UDST/choicemodels/issues?q=is%3Aissue+is%3Aclosed) to see if there's already a related discussion
+
+- Open a new issue describing the problem -- if possible, include any error messages, the operating system and version of python you're using, and versions of any libraries that may be relevant
+
+
+## Feature proposals:
+
+- Take a look at the [open issues](https://github.com/UDST/choicemodels/issues) and [closed issues](https://github.com/UDST/choicemodels/issues?q=is%3Aissue+is%3Aclosed) to see if there's already a related discussion
+
+- Post your proposal as a new issue, so we can discuss it (some proposals may not be a good fit for the project)
+
+
+## Contributing code:
+
+- Create a new branch of `UDST/choicemodels`, or fork the repository to your own account
+
+- Make your changes, following the existing styles for code and inline documentation
+
+- Add [tests](https://github.com/UDST/choicemodels/tree/master/tests) if possible!
+
+- Open a pull request to the `UDST/choicemodels` master branch, including a writeup of your changes -- take a look at some of the closed PRs for examples
+
+- Current maintainers will review the code, suggest changes, and hopefully merge it!
+
+
+## Updating the version number:
+
+- Each pull request that changes substantive code should increment the development version number, e.g. from `0.2.dev7` to `0.2.dev8`, so that users know exactly which version they're running
+
+- It works best to do this just before merging (in case other PRs are merged first, and so you know the release date for the changelog and documentation)
+
+- There are three places where the version number needs to be changed: 
+  - `setup.py`
+  - `choicemodels/__init__.py`
+  - `docs/source/index.rst`
+
+- Please also add a section to `CHANGELOG.md` describing the changes!
+
+
+## Updating the documentation: 
+
+- See instructions in `docs/README.md`
+
+
+## Preparing a production release:
+
+- Make a new branch for release prep
+
+- Update the version number and `CHANGELOG.md`
+
+- Make sure all the tests are passing, and check if updates are needed to `README.md` or to the documentation
+
+- Open a pull request to the master branch and merge it
+
+- Tag the release on Github
+
+
+## Distributing a release on PyPI (for pip installation):
+
+- Register an account at https://pypi.org, ask one of the current maintainers to add you to the project, and `pip install twine`
+
+- Run `python setup.py sdist bdist_wheel --universal`
+
+- This should create a `dist` directory containing two package files -- delete any old ones before the next step
+
+- Run `twine upload dist/*` -- this will prompt you for your pypi.org credentials
+
+- Check https://pypi.org/project/choicemodels/ for the new version
+
+
+## Distributing a release on Conda Forge (for conda installation):
+
+- Make a fork of the [conda-forge/choicemodels-feedstock](https://github.com/conda-forge/choicemodels-feedstock) repository -- there may already be a fork in udst
+
+- Edit `recipe/meta.yaml`: 
+  - update the version number
+  - paste a new hash matching the tar.gz file that was uploaded to pypi (it's available on the pypi.org project page)
+
+- Check that the run requirements still match `requirements.txt`
+
+- Open a pull request to the `conda-forge/choicemodels-feedstock` master branch
+
+- Automated tests will run, and after they pass, one of the current project maintainers will be able to merge the PR -- you can add your GitHub user name to the maintainers list in `meta.yaml` for the next update
+
+- Check https://anaconda.org/conda-forge/choicemodels for the new version (may take a few minutes for it to appear)
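
The "Updating the version number" section above lists three files that must carry the same version string. A small check along the following lines could catch a file that was missed. This is only a sketch: the helper name `files_missing_version` and the sample file contents are made up for illustration, and the real files may format the version differently.

```python
import re

def files_missing_version(expected, file_texts):
    """Return the names of files whose text does not contain `expected`.

    file_texts maps a file name to its contents. The lookarounds stop a
    short version such as '0.2' from matching inside '0.2.dev7'.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(expected) + r"(?!\.?\w)")
    return sorted(name for name, text in file_texts.items()
                  if not pattern.search(text))

# Made-up contents standing in for the three files named above
file_texts = {
    "setup.py": "version='0.2.dev8'",
    "choicemodels/__init__.py": "version = __version__ = '0.2.dev7'",
    "docs/source/index.rst": "v0.2.dev8",
}

stale = files_missing_version("0.2.dev8", file_texts)
```

Here `stale` names the one file that still carries the old version. In practice the dictionary would be filled by reading the files from the repository.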
diff --git a/LICENSE b/LICENSE.txt
@@ -1,4 +1,4 @@
-Copyright (c) 2018, Urban Analytics Lab. All rights reserved.
+Copyright (c) 2019, Urban Analytics Lab. All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,2 +1,2 @@
-include LICENSE
+include LICENSE.txt
 include requirements.txt
diff --git a/README.md b/README.md
@@ -1,44 +1,40 @@
 [![Build Status](https://travis-ci.org/UDST/choicemodels.svg?branch=master)](https://travis-ci.org/UDST/choicemodels)
 [![Coverage Status](https://coveralls.io/repos/github/UDST/choicemodels/badge.svg?branch=master)](https://coveralls.io/github/UDST/choicemodels?branch=master)
+[![Docs Status](https://readthedocs.org/projects/choicemodels/badge/?version=latest)](https://choicemodels.readthedocs.io)
 
 # ChoiceModels
 
-This is a package for discrete choice model estimation and simulation, with an emphasis on large choice sets and behavioral refinements to multinomial models. Most of these models are not available in Statsmodels or Scikit-learn.
+ChoiceModels is a Python library for discrete choice modeling, with utilities for sampling, simulation, and other ancillary tasks. It's part of the [Urban Data Science Toolkit](https://docs.udst.org) (UDST).
 
-The underlying estimation routines come from two main places: (1) the `urbanchoice` codebase, which has been moved into ChoiceModels, and (2) Timothy Brathwaite's PyLogit package, which handles more flexible model specifications.
 
+### Features
 
+The library currently focuses on tools to help integrate discrete choice models into larger workflows, drawing on other packages such as the excellent [PyLogit](https://github.com/timothyb0912/pylogit) for most estimation of models. 
 
-## Documentation
+ChoiceModels can automate the creation of choice tables for estimation or simulation, using uniform or weighted random sampling of alternatives, as well as interaction terms or cartesian merges. 
 
-Package documentation is available on [readthedocs](https://choicemodels.readthedocs.io/).
+It also provides general-purpose tools for Monte Carlo simulation of choices given probability distributions from fitted models, with fast algorithms for independent or capacity-constrained choices. 
 
+ChoiceModels includes a custom engine for Multinomial Logit estimation that's optimized for fast performance with large numbers of alternatives.
 
 
-## Installation
+### Installation
 
-Install with pip:
+ChoiceModels can be installed using the Pip or Conda package managers:
 
-`pip install choicemodels`
+```
+pip install choicemodels
+```
 
-or with conda-forge.
+```
+conda install choicemodels --channel conda-forge
+```
 
 
+### Documentation
 
-## Current functionality
+See the online documentation for much more: https://choicemodels.readthedocs.io
 
-`choicemodels.tools.MergedChoiceTable()`
+Some additional documentation is available within the repo in `CHANGELOG.md`, `CONTRIBUTING.md`, `/docs/README.md`, and `/tests/README.md`.
 
-- Generates a merged long-format table of choosers and alternatives.
-
-`choicemodels.MultinomialLogit()`
-
-- Fits MNL models, using either the ChoiceModels or PyLogit estimation engines.
-
-`chociemodels.MultinomialLogitResults()`
-
-- Stores and reports fitted MNL models.
-
-There's documentation in these classes' docstrings, and a usage demo in a Jupyter notebook.
-
-https://github.com/udst/choicemodels/blob/master/notebooks/Destination-choice-models-02.ipynb
+There's discussion of current and planned features in the [Pull requests](https://github.com/udst/choicemodels/pulls?utf8=✓&q=is%3Apr) and [Issues](https://github.com/udst/choicemodels/issues?utf8=✓&q=is%3Aissue), both open and closed.
diff --git a/choicemodels/__init__.py b/choicemodels/__init__.py
@@ -3,4 +3,4 @@
 
 from .mnl import MultinomialLogit, MultinomialLogitResults
 
-version = __version__ = '0.2.dev7'
+version = __version__ = '0.2.1'
diff --git a/choicemodels/mnl.py b/choicemodels/mnl.py
@@ -76,9 +76,6 @@ class MultinomialLogit(object):
     and the alternatives. Attributes of a particular alternative may vary for different
     choosers (distance, for example), but this must be set up manually in the input data.
 
-    [TO DO: comparison of the estimation engines]
-    [TO DO: testing and input validation]
-
     Note that prediction methods are in a separate class: see MultinomialLogitResults().
 
     Parameters
@@ -250,7 +247,7 @@ class MultinomialLogitResults(object):
         If not provided, these will be extracted from the raw results.
 
     estimation_engine : str, optional
-        'ChoiceModels' (default) or 'PyLogit'.  # TO DO - infer from model_expression?
+        'ChoiceModels' (default) or 'PyLogit'.
 
     """
     def __init__(self, model_expression, results=None, fitted_parameters=None, 
@@ -287,11 +284,6 @@ def probabilities(self, data):
         Generate predicted probabilities for a table of choice scenarios, using the fitted
         parameters stored in the results object.
         
-        TO DO - make sure this handles pylogit case
-        
-        TO DO - does MergedChoiceTable guarantee that alternatives for a single scenario
-        are consecutive? seems like a requirement here; should document it
-        
         Parameters
         ----------
         data : choicemodels.tools.MergedChoiceTable
@@ -307,6 +299,11 @@ def probabilities(self, data):
         pandas.Series with indexes matching the input
         
         """
+        # TO DO - make sure this handles pylogit case
+
+        # TO DO - does MergedChoiceTable guarantee that alternatives for a single scenario
+        # are consecutive? seems like a requirement here; should document it
+
         df = data.to_frame()
         numalts = data.sample_size  # TO DO - make this an official MCT param
 

diff --git a/choicemodels/tools/__init__.py b/choicemodels/tools/__init__.py
@@ -1,5 +1,6 @@
 # ChoiceModels
 # See full license in LICENSE
 
+from .distancematrix import *
 from .mergedchoicetable import *
 from .simulation import *
diff --git a/choicemodels/tools/mergedchoicetable.py b/choicemodels/tools/mergedchoicetable.py
@@ -133,15 +133,11 @@ def __init__(self, observations, alternatives, chosen_alternatives=None,
         # Check for duplicate column names
         obs_cols = list(observations.columns) + list(observations.index.names)
         alt_cols = list(alternatives.columns) + list(alternatives.index.names)
-        dupes = [c for c in obs_cols if c in alt_cols]
-
-        if len(dupes) == 1:
-            raise ValueError("Column '{}' appears in both input tables. Please ensure "
-                             "column names are unique before merging".format(dupes[0]))
-        elif len(dupes) > 1:
-            raise ValueError("Columns '{}' appear in both input tables. Please ensure "
-                             "column names are unique before merging"\
-                             .format("', '".join(dupes)))
+        dupes = set(obs_cols) & set(alt_cols)
+
+        if len(dupes) > 0:
+            raise ValueError("Both input tables contain column {}. Please ensure "
+                             "column names are unique before merging".format(dupes))
 
         # Normalize weights to a pd.Series
         if (weights is not None) & isinstance(weights, str):
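
The duplicate-name check in the hunk above relies on a set intersection. The standalone sketch below shows the same idea on plain lists of names. The helper `overlapping_names` and the sample column names are hypothetical, and sorting the result is an addition not present in the diff.

```python
def overlapping_names(obs_cols, alt_cols):
    """Return the names that appear in both lists, sorted for a stable message."""
    return sorted(set(obs_cols) & set(alt_cols))

# Hypothetical column and index names from the two input tables
obs_cols = ["income", "zone_id", "obs_id"]
alt_cols = ["price", "zone_id", "alt_id"]

dupes = overlapping_names(obs_cols, alt_cols)
if dupes:
    message = "Both input tables contain column(s) {}".format(", ".join(dupes))
```

Sets are unordered, so formatting a set directly can list the names in a different order from run to run; sorting first keeps the error text reproducible.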