Rename to glum! (#444)
* Remove QC infrastructure dependencies.

* Add install conda-build and yq where needed.

* Rename to glum.

* Rename.

Co-authored-by: Marc-Antoine Schmidt <marc-antoine.schmidt@quantco.com>
tbenthompson and MarcAntoineSchmidtQC committed Oct 7, 2021
1 parent e19de69 commit 5c06a90
Showing 73 changed files with 255 additions and 254 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -18,5 +18,5 @@ select = B,C,E,F,W,T4,B9,D
enable-extensions = flake8-docstrings
per-file-ignores =
tests/**:D101,D102,D103
-src/quantcore/glm/_glm.py:D
+src/glum/_glm.py:D
docstring-convention = numpy
4 changes: 2 additions & 2 deletions .github/CODEOWNERS
@@ -2,14 +2,14 @@
* @tbenthompson @MarcAntoineSchmidtQC

# Core
-/src/quantcore/glm/ @MarcAntoineSchmidtQC
+/src/glum/ @MarcAntoineSchmidtQC

# Cython / C++
*.pyx @tbenthompson
*.cpp @tbenthompson

# GLM benchmarks
-/src/quantcore/glm_benchmarks/ @tbenthompson
+/src/glum_benchmarks/ @tbenthompson

# Docs
/docs/ @MarcAntoineSchmidtQC
4 changes: 2 additions & 2 deletions .github/workflows/tests-win-master.yml
@@ -29,11 +29,11 @@ jobs:
miniforge-version: 4.10.0-0
use-mamba: true
environment-file: environment.yml
-activate-environment: quantcore.glm
+activate-environment: glum
- name: Install benchmark dependencies
shell: pwsh
run: |
-mamba env update -n quantcore.glm --file environment-benchmark.yml
+mamba env update -n glum --file environment-benchmark.yml
- name: Run Unit Tests
shell: pwsh
run: |
4 changes: 2 additions & 2 deletions .github/workflows/tests-win.yml
@@ -31,11 +31,11 @@ jobs:
miniforge-version: 4.10.0-0
use-mamba: true
environment-file: environment.yml
-activate-environment: quantcore.glm
+activate-environment: glum
- name: Install benchmark dependencies
shell: pwsh
run: |
-mamba env update -n quantcore.glm --file environment-benchmark.yml
+mamba env update -n glum --file environment-benchmark.yml
- name: Run Unit Tests
shell: pwsh
run: |
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -20,7 +20,7 @@ repos:
flake8-print=4.0.0,
pep8-naming=0.11.1,
]
-exclude: ^src/quantcore/glm_benchmarks/orig_sklearn_fork/
+exclude: ^src/glum_benchmarks/orig_sklearn_fork/
- repo: https://github.com/Quantco/pre-commit-mirrors-isort
rev: 5.6.4
hooks:
45 changes: 23 additions & 22 deletions CHANGELOG.rst
@@ -12,12 +12,13 @@ Unreleased

**Breaking changes:**

-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` lose the ``fit_dispersion`` parameter.
+- Renamed the package to ``glum``!! Hurray! Celebration.
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` lose the ``fit_dispersion`` parameter.
Please use the :meth:`dispersion` method of the appropriate family instance instead.
- All functions now use ``sample_weight`` as a keyword instead of ``weights``, in line with scikit-learn.
- All functions now use ``dispersion`` as a keyword instead of ``phi``.
-- Several methods :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` that should have been private have had an underscore prefixed on their names: :meth:`tear_down_from_fit`, :meth:`_set_up_for_fit`, :meth:`_set_up_and_check_fit_args`, :meth:`_get_start_coef`, :meth:`_solve` and :meth:`_solve_regularization_path`.
-- :meth:`quantcore.glm.GeneralizedLinearRegressor.report_diagnostics` and :meth:`quantcore.glm.GeneralizedLinearRegressor.get_formatted_diagnostics` are now public.
+- Several methods :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` that should have been private have had an underscore prefixed on their names: :meth:`tear_down_from_fit`, :meth:`_set_up_for_fit`, :meth:`_set_up_and_check_fit_args`, :meth:`_get_start_coef`, :meth:`_solve` and :meth:`_solve_regularization_path`.
+- :meth:`glum.GeneralizedLinearRegressor.report_diagnostics` and :meth:`glum.GeneralizedLinearRegressor.get_formatted_diagnostics` are now public.

**New features:**

@@ -26,45 +27,45 @@ Unreleased
all with the same value.
- :class:`ExponentialDispersionModel` gains a :meth:`dispersion` method.
- :class:`BinomialDistribution` and :class:`TweedieDistribution` gain a :meth:`log_likelihood` method.
-- The :meth:`fit` method of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+- The :meth:`fit` method of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
now saves the column types of pandas data frames.
-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` gain two properties: ``family_instance`` and ``link_instance``.
-- :meth:`~quantcore.glm.GeneralizedLinearRegressor.std_errors` and :meth:`~quantcore.glm.GeneralizedLinearRegressor.covariance_matrix` have been added and support non-robust, robust (HC-1), and clustered
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` gain two properties: ``family_instance`` and ``link_instance``.
+- :meth:`~glum.GeneralizedLinearRegressor.std_errors` and :meth:`~glum.GeneralizedLinearRegressor.covariance_matrix` have been added and support non-robust, robust (HC-1), and clustered
covariance matrices.
-- :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` now accept ``family='gaussian'`` as an alternative to ``family='normal'``.
+- :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` now accept ``family='gaussian'`` as an alternative to ``family='normal'``.

**Bug fix:**

-- The :meth:`score` method of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` now accepts data frames.
+- The :meth:`score` method of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` now accepts data frames.

**Other:**

- A major overhaul of the documentation. Everything is better!
- The methods of the link classes will now return scalars when given scalar inputs. Under certain circumstances, they'd return zero-dimensional arrays.
-- There is a new benchmark available ``glm_benchmarks_run`` based on the Boston housing dataset. See `here <https://github.com/Quantco/quantcore.glm/pull/376>`_.
-- ``glm_benchmarks_analyze`` now includes ``offset`` in the index. See `here <https://github.com/Quantco/quantcore.glm/issues/346>`_.
+- There is a new benchmark available ``glm_benchmarks_run`` based on the Boston housing dataset. See `here <https://github.com/Quantco/glum/pull/376>`_.
+- ``glm_benchmarks_analyze`` now includes ``offset`` in the index. See `here <https://github.com/Quantco/glum/issues/346>`_.
- ``glmnet_python`` was removed from the benchmarks suite.
-- The innermost coordinate descent was optimized. This speeds up coordinate descent dominated problems like LASSO by about 1.5-2x. See `here <https://github.com/Quantco/quantcore.glm/pull/424>`_.
+- The innermost coordinate descent was optimized. This speeds up coordinate descent dominated problems like LASSO by about 1.5-2x. See `here <https://github.com/Quantco/glum/pull/424>`_.

1.5.1 - 2021-07-22
------------------

**Bug fix:**

-* Have the :meth:`linear_predictor` and :meth:`predict` methods of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+* Have the :meth:`linear_predictor` and :meth:`predict` methods of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
honor the offset when ``alpha`` is ``None``.

1.5.0 - 2021-07-15
------------------

**New features:**

-* The :meth:`linear_predictor` and :meth:`predict` methods of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV`
+* The :meth:`linear_predictor` and :meth:`predict` methods of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
gain an ``alpha`` parameter (in complement to ``alpha_index``). Moreover, they are now able to predict for multiple penalties.

**Other:**

-* Methods of :class:`~quantcore.glm._link.Link` now consistently return NumPy arrays, whereas they used to preserve pandas series in special cases.
+* Methods of :class:`~glum._link.Link` now consistently return NumPy arrays, whereas they used to preserve pandas series in special cases.
* Don't list ``sparse_dot_mkl`` as a runtime requirement from the conda recipe.
* The minimal ``numpy`` pin should be dependent on the ``numpy`` version in ``host`` and not fixed to ``1.16``.

@@ -89,7 +90,7 @@ Unreleased

**Other:**

-- Small improvement in documentation for the ``alpha_index`` argument to :meth:`~quantcore.glm.GeneralizedLinearRegressor.predict`.
+- Small improvement in documentation for the ``alpha_index`` argument to :meth:`~glum.GeneralizedLinearRegressor.predict`.
- Pinned pre-commit hooks versions.

1.4.1 - 2021-05-01
@@ -102,18 +103,18 @@ We now have Windows builds!

**Deprecations:**

-- Fusing the ``alpha`` and ``alphas`` arguments for :class:`~quantcore.glm.GeneralizedLinearRegressor`. ``alpha`` now also accepts array like inputs. ``alphas`` is now deprecated but can still be used for backward compatibility. The ``alphas`` argument will be removed with the next major version.
+- Fusing the ``alpha`` and ``alphas`` arguments for :class:`~glum.GeneralizedLinearRegressor`. ``alpha`` now also accepts array like inputs. ``alphas`` is now deprecated but can still be used for backward compatibility. The ``alphas`` argument will be removed with the next major version.

**Bug fix:**

-- We removed entry points to functions in ``quantcore.glm_benchmarks`` from the conda package.
+- We removed entry points to functions in ``glum_benchmarks`` from the conda package.

1.3.1 - 2021-04-12
------------------

**Bug fix:**

-- :func:`quantcore.glm._distribution.unit_variance_derivative` is
+- :func:`glum._distribution.unit_variance_derivative` is
evaluating a proper numexpr expression again (regression in 1.3.0).

1.3.0 - 2021-04-12
@@ -127,7 +128,7 @@
1.2.0 - 2021-02-04
------------------

-We removed ``quantcore.glm_benchmarks`` from the conda package.
+We removed ``glum_benchmarks`` from the conda package.

1.1.1 - 2021-01-11
------------------
@@ -144,22 +145,22 @@ Maintenance release to get a fresh build for OSX.
1.0.1 - 2020-11-12
------------------

-This is a maintenance release to be compatible with ``quantcore.matrix>=1.0.0``.
+This is a maintenance release to be compatible with ``tabmat>=1.0.0``.

1.0.0 - 2020-11-11
------------------

**Other:**

-- Renamed ``alpha_level`` attribute of :class:`~quantcore.glm.GeneralizedLinearRegressor` and :class:`~quantcore.glm.GeneralizedLinearRegressorCV` to ``alpha_index``.
+- Renamed ``alpha_level`` attribute of :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV` to ``alpha_index``.
- Clarified behavior of ``scale_predictors``.

0.0.15 - 2020-11-11
-------------------

**Other:**

-- Pin ``quantcore.matrix<1.0.0`` as we are expecting a breaking change with version 1.0.0.
+- Pin ``tabmat<1.0.0`` as we are expecting a breaking change with version 1.0.0.

0.0.14 - 2020-08-06
-------------------
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -1,3 +1,3 @@
# Contributing

-See the [contributing and development information in our documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/contributing.html).
+See the [contributing and development information in our documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/contributing.html).
20 changes: 10 additions & 10 deletions README.md
@@ -1,12 +1,12 @@
-# quantcore.glm
+# glum

![CI](https://github.com/Quantco/glm_benchmarks/workflows/CI/badge.svg)

-[Documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/index.html)
+[Documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/index.html)

-Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `quantcore.glm`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!
+Generalized linear models (GLM) are a core statistical tool that include many common methods like least-squares regression, Poisson regression and logistic regression as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed `glum`, a fast Python-first GLM library. The development was based on [a fork of scikit-learn](https://github.com/scikit-learn/scikit-learn/pull/9405), so it has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in that PR!

-`quantcore.glm` is at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports
+`glum` is at least as feature-complete as existing GLM libraries like `glmnet` or `h2o`. It supports

* Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
* L1 regularization, which produces sparse and easily interpretable solutions
@@ -15,21 +15,21 @@ Generalized linear models (GLM) are a core statistical tool that include many co
* Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
* Box constraints, linear inequality constraints, sample weights, offsets

-This repo also includes tools for benchmarking GLM implementations in the `quantcore.glm_benchmarks` module. For details on the benchmarking, [see here](src/quantcore/glm_benchmarks/README.md). Although the performance of `quantcore.glm` relative to `glmnet` and `h2o` depends on the specific problem, we find that it is consistently much faster for a wide range of problems.
+This repo also includes tools for benchmarking GLM implementations in the `glum_benchmarks` module. For details on the benchmarking, [see here](src/glum_benchmarks/README.md). Although the performance of `glum` relative to `glmnet` and `h2o` depends on the specific problem, we find that it is consistently much faster for a wide range of problems.

![](docs/_static/headline_benchmark.png)

-For more information on `quantcore.glm`, including tutorials and API reference, please see [the documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/quantcore.glm/latest/index.html).
+For more information on `glum`, including tutorials and API reference, please see [the documentation](https://docs.dev.***REMOVED***/***REMOVED***/Quantco/glum/latest/index.html).

# An example: predicting car insurance claim frequency using Poisson regression.

This example uses a public French car insurance dataset.
```python
>>> import pandas as pd
>>> import numpy as np
->>> from quantcore.glm_benchmarks.problems import load_data, generate_narrow_insurance_dataset
->>> from quantcore.glm_benchmarks.util import get_obj_val
->>> from quantcore.glm import GeneralizedLinearRegressor
+>>> from glum_benchmarks.problems import load_data, generate_narrow_insurance_dataset
+>>> from glum_benchmarks.util import get_obj_val
+>>> from glum import GeneralizedLinearRegressor
>>>
>>> # Load the French Motor Insurance dataset
>>> dat = load_data(generate_narrow_insurance_dataset)
@@ -62,5 +62,5 @@ n_iter

Please install the package through conda-forge:
```bash
-conda install quantcore.glm -c conda-forge
+conda install glum -c conda-forge
```
10 changes: 5 additions & 5 deletions conda.recipe/meta.yaml
@@ -2,7 +2,7 @@
{% set build_string = "py{}h{}_{}_{}".format(CONDA_PY, PKG_HASH, build_number, GLM_ARCHITECTURE) %}

package:
-name: quantcore.glm
+name: glum
version: {{ environ.get('GIT_DESCRIBE_TAG', '').lstrip('v') }}{% if environ.get('GIT_DESCRIBE_NUMBER', 0)|int != 0 %}.post{{ GIT_DESCRIBE_NUMBER }}+{{ GIT_DESCRIBE_HASH }}{% endif %}

source:
@@ -13,7 +13,7 @@ build:
number: {{ build_number }}
string: "{{ build_string }}"
track_features:
-{{ "- quantcore-glm-{}".format(GLM_ARCHITECTURE) if GLM_ARCHITECTURE != "default" else "" }}
+{{ "- glum-{}".format(GLM_ARCHITECTURE) if GLM_ARCHITECTURE != "default" else "" }}


requirements:
@@ -41,16 +41,16 @@ requirements:
- pandas
- scikit-learn >=0.23
- scipy
-- quantcore.matrix >=1.0.0
+- tabmat >=1.0.0

test:
requires:
- pip
commands:
- pip check
imports:
-- quantcore.glm
+- glum

about:
-home: https://github.com/Quantco/quantcore.glm
+home: https://github.com/Quantco/glum
license: Proprietary
8 changes: 4 additions & 4 deletions docs/background.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# An introduction to the algorithms used in `quantcore.glm`\n",
+"# An introduction to the algorithms used in `glum`\n",
"\n",
"Before continuing, please take a look at the [sklearn documentation](https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression) for a high-level intro to generalized linear models. \n",
"\n",
@@ -116,7 +116,7 @@
"source": [
"## Optimizers/solvers\n",
"\n",
-"There are three solvers implemented in `quantcore.glm`:\n",
+"There are three solvers implemented in `glum`:\n",
"\n",
"* `lbfgs` - This solver uses the scipy `fmin_l_bfgs_b` optimizer to minimize L2-penalized GLMs. The L-BFGS solver does not work with L1-penalties. Because L-BFGS does not store the full Hessian, it can be particularly effective for very high dimensional problems with several thousand or more columns. For more details, see [the scipy documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html).\n",
"* `irls-cd` and `irls-ls` - These solvers are both based on Iteratively Reweighted Least Squares (IRLS). IRLS proceeds by iteratively approximating the objective function with a quadratic, then solving that quadratic for the optimal update. For purely L2-penalized settings, the `irls-ls` uses a least squares inner solver for each quadratic subproblem. For problems that have any L1-penalty component, the `irls-cd` uses a coordinate descent inner solver for each quadratic subproblem. The IRLS-LS and IRLS-CD implementations largely follow the algorithm described in `newglmnet` (see references below).\n",
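The coordinate-descent idea behind the `irls-cd` inner solver can be sketched in pure NumPy. This is a hedged illustration of the technique on a plain least-squares + L1 objective, not glum's actual implementation (which operates on the IRLS quadratic subproblem with many additional optimizations):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    # Minimize (1 / (2n)) * ||y - X @ b||^2 + alpha * ||b||_1
    # by cyclic coordinate descent, updating one coefficient at a time.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # residual, kept up to date in O(n) per coordinate
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that
            # adds back coordinate j's current contribution.
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, alpha) / col_sq[j]
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_b = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y = X @ true_b + 0.01 * rng.normal(size=200)
b = lasso_cd(X, y, alpha=0.1)
# The L1 penalty shrinks the active coefficients slightly and drives
# the inactive ones to exactly zero.
```

Each coordinate update is a closed-form soft-thresholding step, which is why this style of solver handles L1 penalties that gradient-based methods like L-BFGS cannot.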
@@ -206,7 +206,7 @@
"\n",
"Along with the GLM solvers, this package supports dense, sparse, and categorical matrix types, and mixtures of these types. Using the most efficient matrix representations massively improves performance. \n",
"\n",
-"For more details, see the [README for quantcore.matrix](https://github.com/Quantco/quantcore.matrix)\n",
+"For more details, see the [README for tabmat](https://github.com/Quantco/tabmat)\n",
"\n",
"* We support dense matrices via standard numpy arrays. \n",
"* We support sparse CSR and CSC matrices via standard `scipy.sparse` objects. However, we have extended these operations with custom matrix-vector and sandwich product routines that are optimized and parallelized. A user does not need to modify their code to take advantage of this optimization. If a `scipy.sparse.csc_matrix` object is passed in, it will be automatically converted to a `SparseMatrix` object. This operation is almost free because no data needs to be copied.\n",
@@ -220,7 +220,7 @@
"source": [
"## Standardization\n",
"\n",
-"Internal to our solvers, all matrix types are wrapped in a `quantcore.matrix.StandardizedMatrix` which offsets columns to have mean zero and standard deviation one without modifying the matrix data itself. This avoids situations where modifying a matrix to have mean zero would result in losing the sparsity structure. It also avoids ever needing to copy or modify the input data matrix. As a result, excess memory usage is very low in `quantcore.glm`."
+"Internal to our solvers, all matrix types are wrapped in a `tabmat.StandardizedMatrix` which offsets columns to have mean zero and standard deviation one without modifying the matrix data itself. This avoids situations where modifying a matrix to have mean zero would result in losing the sparsity structure. It also avoids ever needing to copy or modify the input data matrix. As a result, excess memory usage is very low in `glum`."
]
},
{
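The standardization trick described in the cell above can be sketched in a few lines of NumPy. This is an illustration of the idea, not tabmat's actual implementation: a matrix-vector product with the standardized matrix `(X - mu) / sigma` can be computed from the *unmodified* `X` plus per-column means and standard deviations, so the data are never copied and sparsity is never destroyed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # raw design matrix (could just as well be sparse)
v = rng.normal(size=3)

mu = X.mean(axis=0)           # per-column means
sigma = X.std(axis=0)         # per-column standard deviations

# Explicit standardization: copies X and would destroy any sparsity.
explicit = ((X - mu) / sigma) @ v

# Lazy standardization: one product with the untouched X plus a scalar
# correction, since ((X - mu) / sigma) @ v == X @ (v / sigma) - mu @ (v / sigma).
w = v / sigma
lazy = X @ w - np.full(X.shape[0], mu @ w)

assert np.allclose(explicit, lazy)
```

The same algebra extends to the transpose product and sandwich products, which is why wrapping rather than rewriting the matrix keeps memory usage low.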
