Commit

Merge b261ec2 into 52ce7ec
CamDavidsonPilon committed Feb 5, 2019
2 parents 52ce7ec + b261ec2 commit b470140
Showing 19 changed files with 1,108 additions and 959 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
### Changelogs

### 0.18.2
- New univariate fitter `PiecewiseExponentialFitter` for creating a stepwise hazard model. See docs online.
- Ability to create novel parametric univariate models using the new `ParametericUnivariateFitter` super class. See docs online for how to do this.
- Unfortunately, parametric univariate fitters are not serializable with `pickle`. The library `dill` is still useable.
- Complete overhaul of all internals for parametric univariate fitters. Moved them all (most) to use `autograd`.
- `LogNormalFitter` no longer models `log_sigma`.


### 0.18.1
- bug fixes in `LogNormalFitter` variance estimates
- improve convergence of `LogNormalFitter`. We now model the log of sigma internally, but still expose sigma externally.
16 changes: 8 additions & 8 deletions docs/Survival Regression.rst
@@ -25,7 +25,7 @@ hazard rate :math:`h(t | x)` as a function of :math:`t` and some covariates :mat
Cox's proportional hazard model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

lifelines has an implementation of the Cox proportional hazards regression model (implemented in
*lifelines* has an implementation of the Cox proportional hazards regression model (implemented in
R as ``coxph``). The idea behind the model is that the log-hazard of an individual is a linear function of their static covariates *and* a population-level baseline hazard that changes over time. Mathematically:

.. math:: \underbrace{h(t | x)}_{\text{hazard}} = \overbrace{b_0(t)}^{\text{baseline hazard}} \underbrace{\exp \overbrace{\left(\sum_{i=1}^n b_i (x_i - \overline{x_i})\right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}}
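
The formula above can be made concrete with a minimal sketch of the partial hazard term. The coefficients, covariate means, and subjects below are hypothetical values chosen for illustration, and ``partial_hazard`` is not a *lifelines* function.

```python
import math


def partial_hazard(coefs, x, x_mean):
    """Return exp(sum_i b_i * (x_i - mean_i)), the covariate factor of h(t | x)."""
    log_partial_hazard = sum(b * (xi - m) for b, xi, m in zip(coefs, x, x_mean))
    return math.exp(log_partial_hazard)


# Hypothetical coefficients and covariate means, for illustration only.
coefs = [0.5, -0.2]
x_mean = [1.0, 30.0]

subject_a = [2.0, 30.0]  # one unit above the mean in the first covariate
subject_b = [1.0, 30.0]  # exactly at the mean of both covariates

# The baseline hazard b_0(t) cancels in this ratio, so it does not depend on t.
hazard_ratio = partial_hazard(coefs, subject_a, x_mean) / partial_hazard(coefs, subject_b, x_mean)
```

Because the baseline hazard is shared by all subjects, the ratio of two subjects' hazards depends only on their covariates. Here it equals ``exp(0.5)``, the exponentiated coefficient of the covariate that differs by one unit.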
@@ -36,9 +36,9 @@ The dataset for regression
###########################
The dataset required for survival regression must be in the format of a Pandas DataFrame. Each row of the DataFrame should be an observation. There should be a column denoting the durations of the observations. There may be a column denoting the event status of each observation (1 if event occurred, 0 if censored). There are also the additional covariates you wish to regress against. Optionally, there could be columns in the DataFrame that are used for stratification, weights, and clusters which will be discussed later in this tutorial.

.. note:: In other regression models, a column of 1s might be added that represents the intercept or baseline. This is not necessary in the Cox model. In fact, there is no intercept in the additive Cox model - the baseline hazard represents this. _lifelines_ will throw warnings and may experience convergence errors if a column of 1s is present in your dataset.
.. note:: In other regression models, a column of 1s might be added that represents the intercept or baseline. This is not necessary in the Cox model. In fact, there is no intercept in the additive Cox model - the baseline hazard represents this. *lifelines* will throw warnings and may experience convergence errors if a column of 1s is present in your dataset.

An example dataset is called the Rossi recidivism dataset, available in lifelines as ``datasets.load_rossi``.
An example dataset is called the Rossi recidivism dataset, available in *lifelines* as ``datasets.load_rossi``.

.. code:: python
@@ -57,14 +57,14 @@ An example dataset is called the Rossi recidivism dataset, available in lifeline
The DataFrame ``rossi`` contains 432 observations. The ``week`` column is the duration, the ``arrest`` column indicates whether the event occurred, and the other columns represent variables we wish to regress against.


If you need to first clean or transform your dataset (encode categorical variables, add interaction terms, etc.), that should happen *before* using lifelines. Libraries like Pandas and Patsy help with that.
If you need to first clean or transform your dataset (encode categorical variables, add interaction terms, etc.), that should happen *before* using *lifelines*. Libraries like Pandas and Patsy help with that.


Running the regression
########################


The implementation of the Cox model in lifelines is called ``CoxPHFitter``. Like R, it has a ``print_summary`` function that prints a tabular view of coefficients and related stats.
The implementation of the Cox model in *lifelines* is called ``CoxPHFitter``. Like R, it has a ``print_summary`` function that prints a tabular view of coefficients and related stats.


.. code:: python
@@ -546,7 +546,7 @@ We can incorporate changes over time into our survival analysis by using a modif

.. math:: h(t | x) = \overbrace{b_0(t)}^{\text{baseline}}\underbrace{\exp \overbrace{\left(\sum_{i=1}^n \beta_i (x_i(t) - \overline{x_i}) \right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}}

Note the time-varying :math:`x_i(t)` to denote that covariates can change over time. This model is implemented in lifelines as ``CoxTimeVaryingFitter``. The dataset schema required is different than previous models, so we will spend some time describing this.
Note the time-varying :math:`x_i(t)` to denote that covariates can change over time. This model is implemented in *lifelines* as ``CoxTimeVaryingFitter``. The dataset schema required is different than previous models, so we will spend some time describing this.

Dataset creation for time-varying regression
#############################################
@@ -615,7 +615,7 @@ Lifelines requires that the dataset be in what is called the *long* format. This

In the above dataset, ``start`` and ``stop`` denote the boundaries, ``id`` is the unique identifier per subject, and ``event`` denotes if the subject died at the end of that period. For example, subject ID 2 had variable ``z=0`` up to and including the end of time period 5 (we can think that measurements happen at end of the time period), after which it was set to 1. Since ``event`` is 1 in that row, we conclude that the subject died at time 8.

This desired dataset can be built up from smaller datasets. To do this we can use some helper functions provided in lifelines. Typically, data will be in a format that looks like it comes out of a relational database. You may have a "base" table with ids, durations alive, and a censored flag, and possibly static covariates. Ex:
This desired dataset can be built up from smaller datasets. To do this we can use some helper functions provided in *lifelines*. Typically, data will be in a format that looks like it comes out of a relational database. You may have a "base" table with ids, durations alive, and a censored flag, and possibly static covariates. Ex:

.. raw:: html

@@ -910,7 +910,7 @@ of AUC, another common loss function, and is interpreted similarly:
* 1.0 is perfect concordance and,
* 0.0 is perfect anti-concordance (multiply predictions with -1 to get 1.0)

A fitted model's concordance-index is present in the ``print_summary()``, but also available under the ``score_`` property. Generally, the measure is implemented in lifelines under ``lifelines.utils.concordance_index`` and accepts the actual times (along with any censorships) and the predicted times.
A fitted model's concordance-index is present in the ``print_summary()``, but also available under the ``score_`` property. Generally, the measure is implemented in *lifelines* under ``lifelines.utils.concordance_index`` and accepts the actual times (along with any censorships) and the predicted times.
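
To make the 0.0 to 1.0 scale concrete, here is a naive pure-Python sketch of a concordance index. It is not the *lifelines* implementation: the function name is hypothetical, the loop is quadratic, and it assumes that pairs with tied observed times are simply skipped.

```python
def naive_concordance_index(event_times, predicted_times, event_observed):
    """Fraction of comparable pairs whose predicted order matches the observed order."""
    concordant = 0.0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has the strictly shorter
            # observed time and that time is an actual event, not a censoring.
            if event_times[i] < event_times[j] and event_observed[i]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


actual = [1, 2, 3, 4]
observed = [1, 1, 1, 1]

perfect = naive_concordance_index(actual, [1, 2, 3, 4], observed)
reversed_order = naive_concordance_index(actual, [-1, -2, -3, -4], observed)
uninformative = naive_concordance_index(actual, [5, 5, 5, 5], observed)
```

In this sketch, predictions in the same order as the actual times score 1.0, the negated predictions score 0.0, and constant predictions score 0.5.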

.. code:: python
34 changes: 22 additions & 12 deletions docs/Survival analysis with lifelines.rst
@@ -80,7 +80,7 @@ From the ``lifelines`` library, we'll need the
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
.. note:: Other ways to estimate the survival function in lifelines are discussed below.
.. note:: Other ways to estimate the survival function in *lifelines* are discussed below.

For this estimation, we need the duration each leader was/has been in
office, and whether or not they were observed to have left office
@@ -488,6 +488,10 @@ here. (My advice: stick with the cumulative hazard function.)
.. image:: images/lifelines_intro_naf_smooth_multi_2.png


Estimating hazard rates using Parametric models
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''


Fitting to a Weibull model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -496,7 +500,7 @@ Another very popular model for survival data is the Weibull model. In contrast t

.. math:: S(t) = \exp\left(-(\lambda t)^\rho\right), \lambda >0, \rho > 0,

* A priori*, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. In fact, we actually model and estimate the cumulative hazard rate instead of the survival function (this is different than the Kaplan-Meier estimator):
A priori, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. We model and estimate the cumulative hazard rate instead of the survival function (this is different than the Kaplan-Meier estimator):

.. math:: H(t) = (\lambda t)^\rho, \lambda >0, \rho > 0,
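
The two formulas above are linked by :math:`S(t) = \exp(-H(t))`. The following is a minimal sketch of that relationship using only the standard library. The function names and parameter values are hypothetical and are not part of *lifelines*.

```python
import math


def weibull_cumulative_hazard(t, lambda_, rho_):
    """H(t) = (lambda * t) ** rho, with lambda > 0 and rho > 0."""
    return (lambda_ * t) ** rho_


def weibull_survival(t, lambda_, rho_):
    """S(t) = exp(-H(t)) = exp(-(lambda * t) ** rho)."""
    return math.exp(-weibull_cumulative_hazard(t, lambda_, rho_))


# Hypothetical parameters, for illustration only.
lambda_, rho_ = 0.02, 1.5

# At t = 1 / lambda the cumulative hazard is 1 for any rho.
H_at_50 = weibull_cumulative_hazard(50.0, lambda_, rho_)
S_at_50 = weibull_survival(50.0, lambda_, rho_)
```

With ``rho_ = 1`` the cumulative hazard is linear in ``t``, so the model reduces to the exponential model with a constant hazard ``lambda_``.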

@@ -540,7 +544,7 @@ In lifelines, estimation is available using the ``WeibullFitter`` class. The ``p
Other parametric models: Exponential, Log-Logistic & Log-Normal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Similarly, there are other parametric models in lifelines. Generally, which parametric model to choose is determined by either knowledge of the distribution of durations, or some sort of model goodness-of-fit. Below are the three parametric models, and the Nelson-Aalen nonparametric model, of the same data.
Similarly, there are other parametric models in *lifelines*. Generally, which parametric model to choose is determined by either knowledge of the distribution of durations, or some sort of model goodness-of-fit. Below are the built-in parametric models, and the Nelson-Aalen nonparametric model, of the same data.

.. code:: python
@@ -549,31 +553,33 @@ Similarly, there are other parametric models in lifelines. Generally, which para
from lifelines import LogNormalFitter
from lifelines import LogLogisticFitter
from lifelines import NelsonAalenFitter
from lifelines import PiecewiseExponentialFitter
from lifelines.datasets import load_waltons
data = load_waltons()
fig, axes = plt.subplots(2, 3, figsize=(9, 5))
T = data['T']
E = data['E']
wf = WeibullFitter().fit(T, E, label='WeibullFitter')
wbf = WeibullFitter().fit(T, E, label='WeibullFitter')
exf = ExponentialFitter().fit(T, E, label='ExponentialFitter')
lnf = LogNormalFitter().fit(T, E, label='LogNormalFitter')
naf = NelsonAalenFitter().fit(T, E, label='NelsonAalenFitter')
llf = LogLogisticFitter().fit(T, E, label='LogLogisticFitter')
pwf = PiecewiseExponentialFitter([40, 60]).fit(T, E, label='PiecewiseExponentialFitter')
ax = wf.plot(ci_show=False)
ax = exf.plot(ax=ax, ci_show=False)
ax = lnf.plot(ax=ax, ci_show=False)
ax = naf.plot(ax=ax, ci_show=False)
ax = llf.plot(ax=ax, ci_show=False)
plt.title("Cumulative hazard rate estimates\n of Walton's data")
wbf.plot(ax=axes[0][0])
exf.plot(ax=axes[0][1])
lnf.plot(ax=axes[0][2])
naf.plot(ax=axes[1][0])
llf.plot(ax=axes[1][1])
pwf.plot(ax=axes[1][2])
.. image:: images/waltons_cumulative_hazard.png


*lifelines* can also be used to define your own parametric model. There is a tutorial on this available, see `Piecewise Exponential Models and Creating Custom Models`_.
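
As a rough illustration of what a stepwise (piecewise-constant) hazard implies, the sketch below computes the cumulative hazard by hand for the breakpoints ``[40, 60]`` used above. It is not the ``PiecewiseExponentialFitter`` implementation: the function name is hypothetical, the rates are made-up values rather than fitted ones, and parameterizing by hazard rates is an assumption of this sketch.

```python
def piecewise_cumulative_hazard(t, breakpoints, rates):
    """Integrate a hazard that equals rates[k] on the k-th interval, from 0 to t."""
    if len(rates) != len(breakpoints) + 1:
        raise ValueError("need exactly one rate per interval")
    cumulative = 0.0
    lower = 0.0
    # The last interval is unbounded, so close it with infinity.
    for upper, rate in zip(list(breakpoints) + [float("inf")], rates):
        if t <= lower:
            break
        cumulative += rate * (min(t, upper) - lower)
        lower = upper
    return cumulative


# Hypothetical hazard rates for the intervals [0, 40), [40, 60) and [60, inf).
breakpoints = [40, 60]
rates = [0.01, 0.02, 0.05]

H_50 = piecewise_cumulative_hazard(50, breakpoints, rates)  # 0.01 * 40 + 0.02 * 10
H_70 = piecewise_cumulative_hazard(70, breakpoints, rates)  # 0.4 + 0.02 * 20 + 0.05 * 10
```

The cumulative hazard is continuous and piecewise linear, with the slope changing at each breakpoint. This is the shape to expect in the corresponding panel of the plot.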

Other types of censorship
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -628,3 +634,7 @@ Both ``KaplanMeierFitter`` and ``NelsonAalenFitter`` have an optional argument f
.. note:: Nothing changes in the duration array: it still measures time from "birth" to time left study (either by death or censorship). That is, durations refers to the absolute death time rather than a duration relative to the study entry.

.. note:: Other types of censorship, like interval-censorship, are not implemented in *lifelines* yet.

.. _Piecewise Exponential Models and Creating Custom Models: jupyter_notebooks/Piecewise%20Exponential%20Models%20and%20Creating%20Custom%20Models.html


2 changes: 1 addition & 1 deletion docs/conf.py
@@ -60,7 +60,7 @@
#
# The short X.Y version.

version = "0.18.1"
version = "0.18.2"
# The full version, including dev info
release = version

Binary file modified docs/images/waltons_cumulative_hazard.png
7 changes: 4 additions & 3 deletions docs/index.rst
@@ -34,6 +34,7 @@ Contents:
Survival Regression
jupyter_notebooks/Proportional hazard assumption.ipynb
jupyter_notebooks/Cox residuals.ipynb
jupyter_notebooks/Piecewise Exponential Models and Creating Custom Models.ipynb
Examples


@@ -47,11 +48,11 @@ Dependencies are from the typical Python data-stack: Numpy, Pandas, Scipy, and o
pip install lifelines
Source code and Issue Tracker
Source Code and Issue Tracker
------------------------------

Available on Github, `CamDavidsonPilon/lifelines <https://github.com/CamDavidsonPilon/lifelines/>`_
Please report bugs, issues and feature extensions there. We also have `Gitter channel <https://gitter.im/python-lifelines/Lobby>`_ open to discuss lifelines:
Available on Github, `CamDavidsonPilon/lifelines <https://github.com/CamDavidsonPilon/lifelines/>`_.
Please report bugs, issues and feature extensions there. We also have a `Gitter channel <https://gitter.im/python-lifelines/Lobby>`_ available to discuss survival analysis and *lifelines*:

Citing lifelines
------------------------------


1 change: 1 addition & 0 deletions lifelines/__init__.py
@@ -12,6 +12,7 @@
from lifelines.fitters.aalen_johansen_fitter import AalenJohansenFitter
from lifelines.fitters.log_normal_fitter import LogNormalFitter
from lifelines.fitters.log_logistic_fitter import LogLogisticFitter
from lifelines.fitters.piecewise_exponential_fitter import PiecewiseExponentialFitter

from lifelines.version import __version__

