Merge b261ec2 into 52ce7ec

CamDavidsonPilon · Feb 5, 2019 · b470140
2 parents 52ce7ec + b261ec2
commit b470140
Showing 19 changed files with 1,108 additions and 959 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,13 @@
 ### Changelogs
 
+### 0.18.2
+ - New univariate fitter `PiecewiseExponentialFitter` for creating a stepwise hazard model. See docs online.
+ - Ability to create novel parametric univariate models using the new `ParametericUnivariateFitter` super class. See docs online for how to do this.
+ - Unfortunately, parametric univariate fitters are not serializable with `pickle`. The library `dill` can still be used instead.
+ - Complete overhaul of all internals for parametric univariate fitters. Moved most of them to use `autograd`.
+ - `LogNormalFitter` no longer models `log_sigma`.
+
+
 ### 0.18.1
  - bug fixes in `LogNormalFitter` variance estimates
  - improve convergence of `LogNormalFitter`. We now model the log of sigma internally, but still expose sigma externally. 
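
Note on the stepwise hazard model named in the 0.18.2 entry: a piecewise exponential model holds the hazard constant between breakpoints, so the cumulative hazard is a sum of (rate × time spent in each interval). The sketch below is plain Python and does not call *lifelines*. The function names, breakpoints and rates are hypothetical, and the hazard is parameterised directly as a rate, which may differ from how `PiecewiseExponentialFitter` parameterises it internally.

```python
import math


def piecewise_cumulative_hazard(t, breakpoints, rates):
    """Cumulative hazard H(t) for a hazard that is constant on each interval.

    breakpoints: increasing interior cut points, e.g. [40, 60]
    rates: one hazard rate per interval, so len(rates) == len(breakpoints) + 1
    """
    if len(rates) != len(breakpoints) + 1:
        raise ValueError("need exactly one rate per interval")
    # Interval edges: 0 -> first breakpoint -> ... -> infinity
    edges = [0.0] + list(breakpoints) + [math.inf]
    total = 0.0
    for lower, upper, rate in zip(edges[:-1], edges[1:], rates):
        if t <= lower:
            break  # t does not reach this interval
        # Accumulate rate * (time spent inside this interval)
        total += rate * (min(t, upper) - lower)
    return total


def piecewise_survival(t, breakpoints, rates):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-piecewise_cumulative_hazard(t, breakpoints, rates))


# Hypothetical rates for the three intervals [0, 40), [40, 60), [60, inf)
example_breakpoints = [40, 60]
example_rates = [0.01, 0.05, 0.10]
H_50 = piecewise_cumulative_hazard(50, example_breakpoints, example_rates)
S_50 = piecewise_survival(50, example_breakpoints, example_rates)
```

Here `H_50` accumulates 40 time units at the first rate and 10 at the second; the third interval contributes nothing because `t` never reaches it.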

diff --git a/docs/Survival Regression.rst b/docs/Survival Regression.rst
@@ -25,7 +25,7 @@ hazard rate :math:`h(t | x)` as a function of :math:`t` and some covariates :mat
 Cox's proportional hazard model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-lifelines has an implementation of the Cox proportional hazards regression model (implemented in
+*lifelines* has an implementation of the Cox proportional hazards regression model (implemented in
 R as ``coxph``). The idea behind the model is that the log-hazard of an individual is a linear function of their static covariates *and* a population-level baseline hazard that changes over time. Mathematically:
 
 .. math::  \underbrace{h(t | x)}_{\text{hazard}} = \overbrace{b_0(t)}^{\text{baseline hazard}} \underbrace{\exp \overbrace{\left(\sum_{i=1}^n b_i (x_i - \overline{x_i})\right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}}
@@ -36,9 +36,9 @@ The dataset for regression
 ###########################
 The dataset required for survival regression must be in the format of a Pandas DataFrame. Each row of the DataFrame should be an observation. There should be a column denoting the durations of the observations. There may be a column denoting the event status of each observation (1 if event occurred, 0 if censored). There are also the additional covariates you wish to regress against. Optionally, there could be columns in the DataFrame that are used for stratification, weights, and clusters, which will be discussed later in this tutorial.
 
-.. note:: In other regression models, a column of 1s might be added that represents that intercept or baseline. This is not necessary in the Cox model. In fact, there is no intercept in the additive Cox model - the baseline hazard represents this. _lifelines_ will will throw warnings and may experience convergence errors if a column of 1s is present in your dataset.
+.. note:: In other regression models, a column of 1s might be added that represents the intercept or baseline. This is not necessary in the Cox model. In fact, there is no intercept in the additive Cox model - the baseline hazard represents this. *lifelines* will throw warnings and may experience convergence errors if a column of 1s is present in your dataset.
 
-An example dataset is called the Rossi recidivism dataset, available in lifelines as ``datasets.load_rossi``.
+An example dataset is called the Rossi recidivism dataset, available in *lifelines* as ``datasets.load_rossi``.
 
 .. code:: python
 
@@ -57,14 +57,14 @@ An example dataset is called the Rossi recidivism dataset, available in lifeline
 The dataframe ``rossi`` contains 432 observations. The ``week`` column is the duration, the ``arrest`` column indicates whether the event occurred, and the other columns represent variables we wish to regress against.
 
 
-If you need to first clean or transform your dataset (encode categorical variables, add interation terms, etc.), that should happen *before* using lifelines. Libraries like Pandas and Patsy help with that. 
+If you need to first clean or transform your dataset (encode categorical variables, add interaction terms, etc.), that should happen *before* using *lifelines*. Libraries like Pandas and Patsy help with that.
 
 
 Running the regression
 ########################
 
 
-The implementation of the Cox model in lifelines is called ``CoxPHFitter``. Like R, it has a ``print_summary`` function that prints a tabular view of coefficients and related stats.
+The implementation of the Cox model in *lifelines* is called ``CoxPHFitter``. Like R, it has a ``print_summary`` function that prints a tabular view of coefficients and related stats.
 
 
 .. code:: python
@@ -546,7 +546,7 @@ We can incorporate changes over time into our survival analysis by using a modif
 
 .. math::  h(t | x) = \overbrace{b_0(t)}^{\text{baseline}}\underbrace{\exp \overbrace{\left(\sum_{i=1}^n \beta_i (x_i(t) - \overline{x_i}) \right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}}
 
-Note the time-varying :math:`x_i(t)` to denote that covariates can change over time. This model is implemented in lifelines as ``CoxTimeVaryingFitter``. The dataset schema required is different than previous models, so we will spend some time describing this.
+Note the time-varying :math:`x_i(t)` to denote that covariates can change over time. This model is implemented in *lifelines* as ``CoxTimeVaryingFitter``. The dataset schema required is different than previous models, so we will spend some time describing this.
 
 Dataset creation for time-varying regression
 #############################################
@@ -615,7 +615,7 @@ Lifelines requires that the dataset be in what is called the *long* format. This
 
 In the above dataset, ``start`` and ``stop`` denote the boundaries, ``id`` is the unique identifier per subject, and ``event`` denotes if the subject died at the end of that period. For example, subject ID 2 had variable ``z=0`` up to and including the end of time period 5 (we can think that measurements happen at end of the time period), after which it was set to 1. Since ``event`` is 1 in that row, we conclude that the subject died at time 8.
 
-This desired dataset can be built up from smaller datasets. To do this we can use some helper functions provided in lifelines. Typically, data will be in a format that looks like it comes out of a relational database. You may have a "base" table with ids, durations alive, and a censorsed flag, and possibly static covariates. Ex:
+This desired dataset can be built up from smaller datasets. To do this we can use some helper functions provided in *lifelines*. Typically, data will be in a format that looks like it comes out of a relational database. You may have a "base" table with ids, durations alive, and a censored flag, and possibly static covariates. Ex:
 
 .. raw:: html
 
@@ -910,7 +910,7 @@ of AUC, another common loss function, and is interpreted similarly:
 * 1.0 is perfect concordance and,
 * 0.0 is perfect anti-concordance (multiply predictions with -1 to get 1.0)
 
-A fitted model's concordance-index is present in the ``print_summary()``, but also available under the ``score_`` property. Generally, the measure is implemented in lifelines under ``lifelines.utils.concordance_index`` and accepts the actual times (along with any censorships) and the predicted times.
+A fitted model's concordance-index is shown in the ``print_summary()`` output, and is also available under the ``score_`` property. Generally, the measure is implemented in *lifelines* under ``lifelines.utils.concordance_index`` and accepts the actual times (along with any censorships) and the predicted times.
 
 .. code:: python
 

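
Note on the concordance-index hunk above: the measure counts, among all comparable pairs of subjects, how often the predicted ordering agrees with the observed ordering. The sketch below is a simplified O(n²) pure-Python version for illustration only. It is not the implementation behind ``lifelines.utils.concordance_index``; the function name and the handling of tied event times (they are skipped here) are assumptions of this sketch.

```python
def simple_concordance_index(event_times, predicted_times, event_observed):
    """Fraction of comparable pairs whose predicted order matches the observed order.

    A pair (i, j) is comparable when subject i's event was observed and
    happened strictly before subject j's recorded time. Tied predictions
    count as half-concordant. Tied event times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        if not event_observed[i]:
            continue  # a censored subject cannot be the "earlier" member of a pair
        for j in range(n):
            if event_times[i] < event_times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up data: four subjects, all events observed
times = [1, 2, 3, 4]
observed = [1, 1, 1, 1]
perfect = simple_concordance_index(times, [10, 20, 30, 40], observed)
reversed_order = simple_concordance_index(times, [40, 30, 20, 10], observed)
```

With predictions in the same order as the observed times the score reaches its maximum, and with the order fully reversed it reaches its minimum, matching the 1.0 and 0.0 endpoints listed in the hunk.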
diff --git a/docs/Survival analysis with lifelines.rst b/docs/Survival analysis with lifelines.rst
@@ -80,7 +80,7 @@ From the ``lifelines`` library, we'll need the
     from lifelines import KaplanMeierFitter
     kmf = KaplanMeierFitter()
 
-..  note:: Other ways to estimate the survival function in lifelines are discussed below. 
+..  note:: Other ways to estimate the survival function in *lifelines* are discussed below. 
 
 For this estimation, we need the duration each leader was/has been in
 office, and whether or not they were observed to have left office
@@ -488,6 +488,10 @@ here. (My advice: stick with the cumulative hazard function.)
 .. image:: images/lifelines_intro_naf_smooth_multi_2.png
 
 
+Estimating hazard rates using Parametric models
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+
 Fitting to a Weibull model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -496,7 +500,7 @@ Another very popular model for survival data is the Weibull model. In contrast t
 
 .. math::  S(t) = \exp\left(-(\lambda t)^\rho\right),   \lambda >0, \rho > 0,
 
-* A priori*, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. In fact, we actually model and estimate the cumulative hazard rate instead of the survival function (this is different than the Kaplan-Meier estimator):
+A priori, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. We model and estimate the cumulative hazard rate instead of the survival function (this is different than the Kaplan-Meier estimator):
 
 .. math::  H(t) = (\lambda t)^\rho,  \lambda >0, \rho > 0,
 
@@ -540,7 +544,7 @@ In lifelines, estimation is available using the ``WeibullFitter`` class. The ``p
 Other parametric models: Exponential, Log-Logistic & Log-Normal
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Similarly, there are other parametric models in lifelines. Generally, which parametric model to choose is determined by either knowledge of the distribution of durations, or some sort of model goodness-of-fit. Below are the three parametric models, and the Nelson-Aalen nonparametric model, of the same data.
+Similarly, there are other parametric models in *lifelines*. Generally, which parametric model to choose is determined by either knowledge of the distribution of durations, or some sort of model goodness-of-fit. Below are the built-in parametric models, and the Nelson-Aalen nonparametric model, of the same data.
 
 .. code:: python
 
@@ -549,31 +553,33 @@ Similarly, there are other parametric models in lifelines. Generally, which para
     from lifelines import LogNormalFitter
     from lifelines import LogLogisticFitter
     from lifelines import NelsonAalenFitter
+    from lifelines import PiecewiseExponentialFitter
 
     from lifelines.datasets import load_waltons
     data = load_waltons()
 
+    fig, axes = plt.subplots(2, 3, figsize=(9, 5))
 
     T = data['T']
     E = data['E']
 
-    wf = WeibullFitter().fit(T, E, label='WeibullFitter')
+    wbf = WeibullFitter().fit(T, E, label='WeibullFitter')
     exf = ExponentialFitter().fit(T, E, label='ExponentialFitter')
     lnf = LogNormalFitter().fit(T, E, label='LogNormalFitter')
     naf = NelsonAalenFitter().fit(T, E, label='NelsonAalenFitter')
     llf = LogLogisticFitter().fit(T, E, label='LogLogisticFitter')
+    pwf = PiecewiseExponentialFitter([40, 60]).fit(T, E, label='PiecewiseExponentialFitter')
 
-    ax = wf.plot(ci_show=False)
-    ax = exf.plot(ax=ax, ci_show=False)
-    ax = lnf.plot(ax=ax, ci_show=False)
-    ax = naf.plot(ax=ax, ci_show=False)
-    ax = llf.plot(ax=ax, ci_show=False)
-    plt.title("Cumulative hazard rate estimates\n of Walton's data")
-
+    wbf.plot(ax=axes[0][0])
+    exf.plot(ax=axes[0][1])
+    lnf.plot(ax=axes[0][2])
+    naf.plot(ax=axes[1][0])
+    llf.plot(ax=axes[1][1])
+    pwf.plot(ax=axes[1][2])
 
 .. image:: images/waltons_cumulative_hazard.png
 
-
+*lifelines* can also be used to define your own parametric model. A tutorial on this is available; see `Piecewise Exponential Models and Creating Custom Models`_.
 
 Other types of censorship
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -628,3 +634,7 @@ Both ``KaplanMeierFitter`` and ``NelsonAalenFitter`` have an optional argument f
  .. note:: Nothing changes in the duration array: it still measures time from "birth" to time left study (either by death or censorship). That is, durations refers to the absolute death time rather than a duration relative to the study entry.
 
  .. note:: Other types of censorship, like interval-censorship, are not implemented in *lifelines* yet.
+
+.. _Piecewise Exponential Models and Creating Custom Models: jupyter_notebooks/Piecewise%20Exponential%20Models%20and%20Creating%20Custom%20Models.html
+
+
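
Note on the Weibull hunk above: the survival function :math:`S(t) = \exp(-(\lambda t)^\rho)` and the cumulative hazard :math:`H(t) = (\lambda t)^\rho` are linked by :math:`S(t) = \exp(-H(t))`, and the hazard rate is the derivative of :math:`H(t)`. The sketch below evaluates these formulas in plain Python under that parameterisation. The function names and the example parameter values are hypothetical, and nothing here calls ``WeibullFitter``.

```python
import math


def weibull_cumulative_hazard(t, lambda_, rho):
    """H(t) = (lambda * t) ** rho, with lambda > 0 and rho > 0."""
    if lambda_ <= 0 or rho <= 0:
        raise ValueError("lambda_ and rho must be positive")
    return (lambda_ * t) ** rho


def weibull_survival(t, lambda_, rho):
    """S(t) = exp(-H(t))."""
    return math.exp(-weibull_cumulative_hazard(t, lambda_, rho))


def weibull_hazard(t, lambda_, rho):
    """h(t) = dH/dt = rho * lambda * (lambda * t) ** (rho - 1), for t > 0."""
    if lambda_ <= 0 or rho <= 0:
        raise ValueError("lambda_ and rho must be positive")
    return rho * lambda_ * (lambda_ * t) ** (rho - 1)


# Made-up parameters: with rho = 1 the model reduces to a constant hazard
H_2 = weibull_cumulative_hazard(2.0, lambda_=0.5, rho=1.0)
S_2 = weibull_survival(2.0, lambda_=0.5, rho=1.0)
h_2 = weibull_hazard(2.0, lambda_=0.5, rho=1.0)
```

Setting ``rho`` to 1 gives a hazard that does not depend on ``t``, which is the exponential model as a special case.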
diff --git a/docs/conf.py b/docs/conf.py
@@ -60,7 +60,7 @@
 #
 # The short X.Y version.
 
-version = "0.18.1"
+version = "0.18.2"
 # The full version, including dev info
 release = version
 

diff --git a/docs/images/waltons_cumulative_hazard.png b/docs/images/waltons_cumulative_hazard.png
diff --git a/docs/index.rst b/docs/index.rst
@@ -34,6 +34,7 @@ Contents:
   Survival Regression
   jupyter_notebooks/Proportional hazard assumption.ipynb
   jupyter_notebooks/Cox residuals.ipynb
+  jupyter_notebooks/Piecewise Exponential Models and Creating Custom Models.ipynb
   Examples
 
 
@@ -47,11 +48,11 @@ Dependencies are from the typical Python data-stack: Numpy, Pandas, Scipy, and o
     pip install lifelines
 
 
-Source code and Issue Tracker
+Source Code and Issue Tracker
 ------------------------------
 
-Available on Github, `CamDavidsonPilon/lifelines <https://github.com/CamDavidsonPilon/lifelines/>`_
-Please report bugs, issues and feature extensions there. We also have `Gitter channel <https://gitter.im/python-lifelines/Lobby>`_ open to discuss lifelines:
+Available on GitHub, `CamDavidsonPilon/lifelines <https://github.com/CamDavidsonPilon/lifelines/>`_.
+Please report bugs, issues and feature extensions there. We also have a `Gitter channel <https://gitter.im/python-lifelines/Lobby>`_ available to discuss survival analysis and *lifelines*.
 
 Citing lifelines
 ------------------------------

diff --git a/docs/jupyter_notebooks/Piecewise Exponential Models and Creating Custom Models.ipynb b/docs/jupyter_notebooks/Piecewise Exponential Models and Creating Custom Models.ipynb
diff --git a/lifelines/__init__.py b/lifelines/__init__.py
@@ -12,6 +12,7 @@
 from lifelines.fitters.aalen_johansen_fitter import AalenJohansenFitter
 from lifelines.fitters.log_normal_fitter import LogNormalFitter
 from lifelines.fitters.log_logistic_fitter import LogLogisticFitter
+from lifelines.fitters.piecewise_exponential_fitter import PiecewiseExponentialFitter
 
 from lifelines.version import __version__