Rename to glum! (#444)
* Remove QC infrastructure dependencies.

* Add install conda-build and yq where needed.

* Rename to glum.

* Rename.

Co-authored-by: Marc-Antoine Schmidt <marc-antoine.schmidt@quantco.com>
tbenthompson and MarcAntoineSchmidtQC committed Oct 7, 2021
1 parent e19de69 commit 5c06a90
Showing 73 changed files with 255 additions and 254 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -18,5 +18,5 @@ select = B,C,E,F,W,T4,B9,D
enable-extensions = flake8-docstrings
per-file-ignores =
tests/**:D101,D102,D103
-src/quantcore/glm/_glm.py:D
+src/glum/_glm.py:D
docstring-convention = numpy
4 changes: 2 additions & 2 deletions .github/CODEOWNERS
@@ -2,14 +2,14 @@
* @tbenthompson @MarcAntoineSchmidtQC

# Core
-/src/quantcore/glm/ @MarcAntoineSchmidtQC
+/src/glum/ @MarcAntoineSchmidtQC

# Cython / C++
*.pyx @tbenthompson
*.cpp @tbenthompson

# GLM benchmarks
-/src/quantcore/glm_benchmarks/ @tbenthompson
+/src/glum_benchmarks/ @tbenthompson

# Docs
/docs/ @MarcAntoineSchmidtQC
4 changes: 2 additions & 2 deletions .github/workflows/tests-win-master.yml
@@ -29,11 +29,11 @@ jobs:
miniforge-version: 4.10.0-0
use-mamba: true
environment-file: environment.yml
-activate-environment: quantcore.glm
+activate-environment: glum
- name: Install benchmark dependencies
shell: pwsh
run: |
-mamba env update -n quantcore.glm --file environment-benchmark.yml
+mamba env update -n glum --file environment-benchmark.yml
- name: Run Unit Tests
shell: pwsh
run: |
4 changes: 2 additions & 2 deletions .github/workflows/tests-win.yml
@@ -31,11 +31,11 @@ jobs:
miniforge-version: 4.10.0-0
use-mamba: true
environment-file: environment.yml
-activate-environment: quantcore.glm
+activate-environment: glum
- name: Install benchmark dependencies
shell: pwsh
run: |
-mamba env update -n quantcore.glm --file environment-benchmark.yml
+mamba env update -n glum --file environment-benchmark.yml
- name: Run Unit Tests
shell: pwsh
run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
flake8-print=4.0.0,
pep8-naming=0.11.1,
]
-exclude: ^src/quantcore/glm_benchmarks/orig_sklearn_fork/
+exclude: ^src/glum_benchmarks/orig_sklearn_fork/
- repo: https://github.com/Quantco/pre-commit-mirrors-isort
rev: 5.6.4
hooks:
45 changes: 23 additions & 22 deletions CHANGELOG.rst
@@ -12,12 +12,13 @@ Unreleased

**Breaking changes:**

-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` lose the ``fit_dispersion`` parameter.
+- Renamed the package to ``glum``!! Hurray! Celebration.
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` lose the ``fit_dispersion`` parameter.
Please use the :meth:`dispersion` method of the appropriate family instance instead.
- All functions now use ``sample_weight`` as a keyword instead of ``weights``, in line with scikit-learn.
- All functions now use ``dispersion`` as a keyword instead of ``phi``.
-- Several methods :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` that should have been private have had an underscore prefixed on their names: :meth:`tear_down_from_fit`, :meth:`_set_up_for_fit`, :meth:`_set_up_and_check_fit_args`, :meth:`_get_start_coef`, :meth:`_solve` and :meth:`_solve_regularization_path`.
-- :meth:`quantcore.glm.GeneralizedLinearRegressor.report_diagnostics` and :meth:`quantcore.glm.GeneralizedLinearRegressor.get_formatted_diagnostics` are now public.
+- Several methods :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` that should have been private have had an underscore prefixed on their names: :meth:`tear_down_from_fit`, :meth:`_set_up_for_fit`, :meth:`_set_up_and_check_fit_args`, :meth:`_get_start_coef`, :meth:`_solve` and :meth:`_solve_regularization_path`.
+- :meth:`glum.GeneralizedLinearRegressor.report_diagnostics` and :meth:`glum.GeneralizedLinearRegressor.get_formatted_diagnostics` are now public.

**New features:**

@@ -26,45 +27,45 @@ Unreleased
all with the same value.
- :class:`ExponentialDispersionModel` gains a :meth:`dispersion` method.
- :class:`BinomialDistribution` and :class:`TweedieDistribution` gain a :meth:`log_likelihood` method.
-- The :meth:`fit` method of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+- The :meth:`fit` method of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
now saves the column types of pandas data frames.
-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` gain two properties: ``family_instance`` and ``link_instance``.
-- :meth:`~quantcore.glm.GeneralizedLinearRegressor.std_errors` and :meth:`~quantcore.glm.GeneralizedLinearRegressor.covariance_matrix` have been added and support non-robust, robust (HC-1), and clustered
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` gain two properties: ``family_instance`` and ``link_instance``.
+- :meth:`~glum.GeneralizedLinearRegressor.std_errors` and :meth:`~glum.GeneralizedLinearRegressor.covariance_matrix` have been added and support non-robust, robust (HC-1), and clustered
covariance matrices.
-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` now accept ``family='gaussian'`` as an alternative to ``family='normal'``.
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` now accept ``family='gaussian'`` as an alternative to ``family='normal'``.

**Bug fix:**

-- The :meth:`score` method of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` now accepts data frames.
+- The :meth:`score` method of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` now accepts data frames.

**Other:**

- A major overhaul of the documentation. Everything is better!
- The methods of the link classes will now return scalars when given scalar inputs. Under certain circumstances, they'd return zero-dimensional arrays.
-- There is a new benchmark available ``glm_benchmarks_run`` based on the Boston housing dataset. See `here <https://github.com/Quantco/quantcore.glm/pull/376>`_.
-- ``glm_benchmarks_analyze`` now includes ``offset`` in the index. See `here <https://github.com/Quantco/quantcore.glm/issues/346>`_.
+- There is a new benchmark available ``glm_benchmarks_run`` based on the Boston housing dataset. See `here <https://github.com/Quantco/glum/pull/376>`_.
+- ``glm_benchmarks_analyze`` now includes ``offset`` in the index. See `here <https://github.com/Quantco/glum/issues/346>`_.
- ``glmnet_python`` was removed from the benchmarks suite.
-- The innermost coordinate descent was optimized. This speeds up coordinate descent dominated problems like LASSO by about 1.5-2x. See `here <https://github.com/Quantco/quantcore.glm/pull/424>`_.
+- The innermost coordinate descent was optimized. This speeds up coordinate descent dominated problems like LASSO by about 1.5-2x. See `here <https://github.com/Quantco/glum/pull/424>`_.

1.5.1 - 2021-07-22
------------------

**Bug fix:**

-* Have the :meth:`linear_predictor` and :meth:`predict` methods of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+* Have the :meth:`linear_predictor` and :meth:`predict` methods of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
honor the offset when ``alpha`` is ``None``.

1.5.0 - 2021-07-15
------------------

**New features:**

-* The :meth:`linear_predictor` and :meth:`predict` methods of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+* The :meth:`linear_predictor` and :meth:`predict` methods of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
gain an ``alpha`` parameter (in complement to ``alpha_index``). Moreover, they are now able to predict for multiple penalties.

**Other:**

-* Methods of :class:`~quantcore.glm._link.Link` now consistently return NumPy arrays, whereas they used to preserve pandas series in special cases.
+* Methods of :class:`~glum._link.Link` now consistently return NumPy arrays, whereas they used to preserve pandas series in special cases.
* Don't list ``sparse_dot_mkl`` as a runtime requirement from the conda recipe.
* The minimal ``numpy`` pin should be dependent on the ``numpy`` version in ``host`` and not fixed to ``1.16``.

@@ -89,7 +90,7 @@ Unreleased

**Other:**

-- Small improvement in documentation for the ``alpha_index`` argument to :meth:`~quantcore.glm.GeneralizedLinearRegressor.predict`.
+- Small improvement in documentation for the ``alpha_index`` argument to :meth:`~glum.GeneralizedLinearRegressor.predict`.
- Pinned pre-commit hooks versions.

1.4.1 - 2021-05-01
@@ -102,18 +103,18 @@ We now have Windows builds!

**Deprecations:**

-- Fusing the ``alpha`` and ``alphas`` arguments for :class:`~quantcore.glm.GeneralizedLinearRegressor`. ``alpha`` now also accepts array like inputs. ``alphas`` is now deprecated but can still be used for backward compatibility. The ``alphas`` argument will be removed with the next major version.
+- Fusing the ``alpha`` and ``alphas`` arguments for :class:`~glum.GeneralizedLinearRegressor`. ``alpha`` now also accepts array like inputs. ``alphas`` is now deprecated but can still be used for backward compatibility. The ``alphas`` argument will be removed with the next major version.

**Bug fix:**

-- We removed entry points to functions in ``quantcore.glm_benchmarks`` from the conda package.
+- We removed entry points to functions in ``glum_benchmarks`` from the conda package.

1.3.1 - 2021-04-12
------------------

**Bug fix:**

-- :func:`quantcore.glm._distribution.unit_variance_derivative` is
+- :func:`glum._distribution.unit_variance_derivative` is
evaluating a proper numexpr expression again (regression in 1.3.0).

1.3.0 - 2021-04-12
@@ -127,7 +128,7 @@
1.2.0 - 2021-02-04
------------------

-We removed ``quantcore.glm_benchmarks`` from the conda package.
+We removed ``glum_benchmarks`` from the conda package.

1.1.1 - 2021-01-11
------------------
@@ -144,22 +145,22 @@ Maintenance release to get a fresh build for OSX.
1.0.1 - 2020-11-12
------------------

-This is a maintenance release to be compatible with ``quantcore.matrix>=1.0.0``.
+This is a maintenance release to be compatible with ``tabmat>=1.0.0``.

1.0.0 - 2020-11-11
------------------

**Other:**

-- Renamed ``alpha_level`` attribute of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` to ``alpha_index``.
+- Renamed ``alpha_level`` attribute of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` to ``alpha_index``.
- Clarified behavior of ``scale_predictors``.

0.0.15 - 2020-11-11
-------------------

**Other:**

-- Pin ``quantcore.matrix<1.0.0`` as we are expecting a breaking change with version 1.0.0.
+- Pin ``tabmat<1.0.0`` as we are expecting a breaking change with version 1.0.0.

0.0.14 - 2020-08-06
-------------------
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1,3 +1,3 @@
# Contributing

-See the [contributing and development information in our documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/contributing.html).
+See the [contributing and development information in our documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/contributing.html).
20 changes: 10 additions & 10 deletions README.md
@@ -1,12 +1,12 @@
-# quantcore.glm
+# glum

![CI](https://github.com/Quantco/glm_benchmarks/workflows/CI/badge.svg)

-[Documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/index.html)
+[Documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/index.html)

-Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `quantcore.glm`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
+Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!

-`quantcore.glm` is at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports
+`glum` is at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports

* Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
* L1 regularization, which produces sparse and easily interpretable solutions
@@ -15,21 +15,21 @@ Generalized linear models (GLM) are a core statistical tool that include many co
* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
* Box constraints, linear inequality constraints, sample weights, offsets

-This repo also includes tools for benchmarking GLM implementations in the `quantcore.glm_benchmarks` module. For details on the benchmarking, [see here](src/quantcore/glm_benchmarks/README.md). Although the performance of `quantcore.glm` relative to `glmnet` and `h2o` depends on the specific problem, we find that it is consistently much faster for a wide range of problems.
+This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that it is consistently much faster for a wide range of problems.

![](docs/_static/headline_benchmark.png)

-For more information on `quantcore.glm`, including tutorials and API reference, please see [the documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/index.html).
+For more information on `glum`, including tutorials and API reference, please see [the documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/index.html).

# An example: predicting car insurance claim frequency using Poisson regression.

This example uses a public French car insurance dataset.
```python
>>> import pandas as pd
>>> import numpy as np
->>> from quantcore.glm_benchmarks.problems import load_data, generate_narrow_insurance_dataset
->>> from quantcore.glm_benchmarks.util import get_obj_val
->>> from quantcore.glm import GeneralizedLinearRegressor
+>>> from glum_benchmarks.problems import load_data, generate_narrow_insurance_dataset
+>>> from glum_benchmarks.util import get_obj_val
+>>> from glum import GeneralizedLinearRegressor
>>>
>>> # Load the French Motor Insurance dataset
>>> dat = load_data(generate_narrow_insurance_dataset)
@@ -62,5 +62,5 @@ n_iter

Please install the package through conda-forge:
```bash
-conda install quantcore.glm -c conda-forge
+conda install glum -c conda-forge
```
10 changes: 5 additions & 5 deletions conda.recipe/meta.yaml
@@ -2,7 +2,7 @@
{% set build_string = "py{}h{}_{}_{}".format(CONDA_PY, PKG_HASH, build_number, GLM_ARCHITECTURE) %}

package:
-name: quantcore.glm
+name: glum
version: {{ environ.get('GIT_DESCRIBE_TAG', '').lstrip('v') }}{% if environ.get('GIT_DESCRIBE_NUMBER', 0)|int != 0 %}.post{{ GIT_DESCRIBE_NUMBER }}+{{ GIT_DESCRIBE_HASH }}{% endif %}

source:
@@ -13,7 +13,7 @@ build:
number: {{ build_number }}
string: "{{ build_string }}"
track_features:
-{{ "- quantcore-glm-{}".format(GLM_ARCHITECTURE) if GLM_ARCHITECTURE != "default" else "" }}
+{{ "- glum-{}".format(GLM_ARCHITECTURE) if GLM_ARCHITECTURE != "default" else "" }}


requirements:
@@ -41,16 +41,16 @@ requirements:
- pandas
- scikit-learn >=0.23
- scipy
-- quantcore.matrix >=1.0.0
+- tabmat >=1.0.0

test:
requires:
- pip
commands:
- pip check
imports:
-- quantcore.glm
+- glum

about:
-home: https://github.com/Quantco/quantcore.glm
+home: https://github.com/Quantco/glum
license: Proprietary
8 changes: 4 additions & 4 deletions docs/background.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# An introduction to the algorithms used in `quantcore.glm`\n",
+"# An introduction to the algorithms used in `glum`\n",
"\n",
"Before continuing, please take a look at the [sklearn documentation](https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression) for a high-level intro to generalized linear models. \n",
"\n",
@@ -116,7 +116,7 @@
"source": [
"## Optimizers/solvers\n",
"\n",
-"There are three solvers implemented in `quantcore.glm`:\n",
+"There are three solvers implemented in `glum`:\n",
"\n",
"* `lbfgs` - This solver uses the scipy `fmin_l_bfgs_b` optimizer to minimize L2-penalized GLMs. The L-BFGS solver does not work with L1-penalties. Because L-BFGS does not store the full Hessian, it can be particularly effective for very high dimensional problems with several thousand or more columns. For more details, see [the scipy documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html).\n",
"* `irls-cd` and `irls-ls` - These solvers are both based on Iteratively Reweighted Least Squares (IRLS). IRLS proceeds by iteratively approximating the objective function with a quadratic, then solving that quadratic for the optimal update. For purely L2-penalized settings, the `irls-ls` uses a least squares inner solver for each quadratic subproblem. For problems that have any L1-penalty component, the `irls-cd` uses a coordinate descent inner solver for each quadratic subproblem. The IRLS-LS and IRLS-CD implementations largely follow the algorithm described in `newglmnet` (see references below).\n",
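The coordinate-descent idea behind the `irls-cd` inner solver can be sketched in pure NumPy. This is a hedged illustration of the technique on a plain least-squares + L1 objective, not glum's actual implementation (which operates on the IRLS quadratic subproblem with many additional optimizations):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    # Minimize (1 / (2n)) * ||y - X @ b||^2 + alpha * ||b||_1
    # by cyclic coordinate descent, updating one coefficient at a time.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # residual, kept up to date in O(n) per coordinate
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that
            # adds back coordinate j's current contribution.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, alpha) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_b = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_b + 0.01 * rng.normal(size=200)
b = lasso_cd(X, y, alpha=0.1)
# The L1 penalty shrinks the active coefficients slightly and drives
# the inactive ones to exactly zero.
```

Each coordinate update is a closed-form soft-thresholding step, which is why this style of solver handles L1 penalties that gradient-based methods like L-BFGS cannot.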
@@ -206,7 +206,7 @@
"\n",
"Along with the GLM solvers, this package supports dense, sparse, and categorical matrix types, and mixtures of these types. Using the most efficient matrix representations massively improves performance. \n",
"\n",
-"For more details, see the [README for quantcore.matrix](https://github.com/Quantco/quantcore.matrix)\n",
+"For more details, see the [README for tabmat](https://github.com/Quantco/tabmat)\n",
"\n",
"* We support dense matrices via standard numpy arrays. \n",
"* We support sparse CSR and CSC matrices via standard `scipy.sparse` objects. However, we have extended these operations with custom matrix-vector and sandwich product routines that are optimized and parallelized. A user does not need to modify their code to take advantage of this optimization. If a `scipy.sparse.csc_matrix` object is passed in, it will be automatically converted to a `SparseMatrix` object. This operation is almost free because no data needs to be copied.\n",
@@ -220,7 +220,7 @@
"source": [
"## Standardization\n",
"\n",
-"Internal to our solvers, all matrix types are wrapped in a `quantcore.matrix.StandardizedMatrix` which offsets columns to have mean zero and standard deviation one without modifying the matrix data itself. This avoids situations where modifying a matrix to have mean zero would result in losing the sparsity structure. It also avoids ever needing to copy or modify the input data matrix. As a result, excess memory usage is very low in `quantcore.glm`."
+"Internal to our solvers, all matrix types are wrapped in a `tabmat.StandardizedMatrix` which offsets columns to have mean zero and standard deviation one without modifying the matrix data itself. This avoids situations where modifying a matrix to have mean zero would result in losing the sparsity structure. It also avoids ever needing to copy or modify the input data matrix. As a result, excess memory usage is very low in `glum`."
]
},
{
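The standardization trick described in the cell above can be sketched in a few lines of NumPy. This is an illustration of the idea, not tabmat's actual implementation: a matrix-vector product with the standardized matrix `(X - mu) / sigma` can be computed from the *unmodified* `X` plus per-column means and standard deviations, so the data are never copied and sparsity is never destroyed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # raw design matrix (could just as well be sparse)
v = rng.normal(size=3)

mu = X.mean(axis=0)           # per-column means
sigma = X.std(axis=0)         # per-column standard deviations

# Explicit standardization: copies X and would destroy any sparsity.
explicit = ((X - mu) / sigma) @ v

# Lazy standardization: one product with the untouched X plus a scalar
# correction, since ((X - mu) / sigma) @ v == X @ (v / sigma) - mu @ (v / sigma).
w = v / sigma
lazy = X @ w - np.full(X.shape[0], mu @ w)

assert np.allclose(explicit, lazy)
```

The same algebra extends to the transpose product and sandwich products, which is why wrapping rather than rewriting the matrix keeps memory usage low.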
