Unique baseline hazard for each strata #268

CamDavidsonPilon · 2016-12-29T01:58:23Z

New PR to replace #196

CamDavidsonPilon · 2016-12-29T01:59:42Z

tests/test_estimation.py

+        cp = CoxPHFitter(normalize=False)
+        cp.fit(rossi, 'week', 'arrest', strata=['race', 'paro', 'mar', 'wexp'], include_likelihood=True)
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 0)].ix[[14, 35, 37, 43, 52]].values, [0.28665890, 0.63524149, 1.01822603, 1.48403930, 1.48403930], decimal=2)
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 1)].ix[[27, 43, 48, 52]].values, [0.35738173, 0.76415714, 1.26635373, 1.26635373], decimal=2)


The baseline hazards are only slightly off, so the errors accumulate in the cumulative hazard. I'd like to understand why my values differ slightly. @IVANBARRIENTOS, is there a way I can access the non-cumulative hazards?

Also: any other tests you would recommend?
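If only the cumulative hazard is available (R's basehaz returns the cumulative baseline hazard), the per-time hazard increments can be recovered by differencing each stratum's column. A minimal pandas sketch, using the stratum (0, 0, 0, 0) reference values from the test; because the test picks only a subset of times, each increment here is the hazard summed over the interval since the previous listed time, not a single event time's hazard.

```python
import pandas as pd

# Cumulative baseline hazard for stratum (0, 0, 0, 0) at a subset of times,
# copied from the R reference values used in the test.
cum_hazard = pd.Series(
    [0.28665890, 0.63524149, 1.01822603, 1.48403930, 1.48403930],
    index=[14, 35, 37, 43, 52],
)

# Increments: difference consecutive values. The first increment is the
# cumulative value itself, since the cumulative hazard starts at 0.
hazard_increments = cum_hazard.diff().fillna(cum_hazard.iloc[0])

# Summing the increments rebuilds the cumulative hazard, which is why small
# per-time discrepancies compound in the cumulative values.
assert ((hazard_increments.cumsum() - cum_hazard).abs() < 1e-12).all()
```

Comparing increments rather than cumulative values makes it easier to see whether the discrepancy is spread evenly or concentrated at particular times.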

CamDavidsonPilon · 2016-12-29T02:03:30Z

lifelines/fitters/coxph_fitter.py

-        s_0 = self.baseline_survival_
-        col = _get_index(X)
-        return pd.DataFrame(-np.dot(np.log(s_0), v.T), index=self.baseline_survival_.index, columns=col)
+        if self.strata:


I've moved this logic out of the predict_survival function and into the higher-level function.
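The removed lines compute each subject's cumulative hazard from the baseline survival, H(t | x_i) = -log S0(t) * v_i, and survival then follows as S(t | x_i) = exp(-H(t | x_i)). If survival is derived from the cumulative hazard, the strata handling only needs to live in one place. A sketch with made-up numbers; it assumes v holds each subject's partial hazard exp(x_i . beta), which is an inference from the removed code rather than something stated in the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical baseline survival S0(t) at three event times.
s_0 = pd.Series([0.9, 0.7, 0.5], index=[1, 2, 3])

# Hypothetical partial hazards exp(x_i . beta), one per subject (assumed meaning of v).
v = np.array([1.0, 2.0])

# Cumulative hazard per subject: H(t | x_i) = -log S0(t) * v_i.
# np.outer plays the role of np.dot(np.log(s_0), v.T) for 1-D inputs.
cumulative_hazard = pd.DataFrame(
    -np.outer(np.log(s_0.values), v),
    index=s_0.index,
    columns=["subject_1", "subject_2"],
)

# Survival derived from the cumulative hazard: S(t | x_i) = exp(-H(t | x_i)).
survival = np.exp(-cumulative_hazard)
```

With v_i = 1 the subject's survival equals the baseline; with v_i = 2 it is the baseline squared.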

CamDavidsonPilon · 2016-12-29T02:03:41Z

lifelines/fitters/coxph_fitter.py

-        return pd.DataFrame(-np.dot(np.log(s_0), v.T), index=self.baseline_survival_.index, columns=col)
+        if self.strata:
+            cumulative_hazard_ = pd.DataFrame()
+            for stratum, stratified_X in X.groupby(self.strata):


cute use of groupby here
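The groupby loop works because, in a stratified Cox model, each subject's cumulative hazard is its own stratum's baseline scaled by exp(x . beta), and grouping on several columns yields one sub-frame per stratum keyed by a tuple such as (0, 0, 0, 0). A minimal sketch of that idea, not the PR's code: the strata columns a and b, the covariate x, its coefficient, and the baseline values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical per-stratum baseline cumulative hazards, keyed by stratum tuple.
times = [1, 2, 3]
baselines = {
    (0, 0): pd.Series([0.1, 0.2, 0.4], index=times),
    (0, 1): pd.Series([0.05, 0.15, 0.3], index=times),
}
beta = np.array([0.5])  # hypothetical coefficient for the single covariate "x"

X = pd.DataFrame({"a": [0, 0, 0], "b": [0, 1, 1], "x": [1.0, 0.0, 2.0]})
strata = ["a", "b"]  # hypothetical strata columns

pieces = []
for stratum, stratified_X in X.groupby(strata):
    # Grouping on several columns yields tuple keys such as (0, 1).
    partial_hazard = np.exp(stratified_X[["x"]].values.dot(beta))
    # Each subject's cumulative hazard is its stratum's baseline times exp(x . beta).
    pieces.append(
        pd.DataFrame(
            np.outer(baselines[stratum].values, partial_hazard),
            index=times,
            columns=stratified_X.index,
        )
    )

# All strata share one time grid here, so a column-wise concat is enough;
# reorder the columns to match the input rows.
cumulative_hazard = pd.concat(pieces, axis=1)[X.index]
```

Real strata usually have different event times, so combining the pieces needs an index-aligned join rather than this shared grid.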

CamDavidsonPilon · 2016-12-29T02:47:55Z

tests/utils/test_utils.py

@@ -306,7 +308,7 @@ def test_both_concordance_index_function_deal_with_ties_the_same_way():
    actual_times = np.array([1, 1, 2])
    predicted_times = np.array([1, 2, 3])
    obs = np.ones(3)
-    assert fast_cindex(actual_times, predicted_times, obs) == slow_cindex(actual_times, predicted_times, obs) == 1.0 
+    assert fast_cindex(actual_times, predicted_times, obs) == slow_cindex(actual_times, predicted_times, obs) == 1.0



All whitespace changes in this file.

CamDavidsonPilon · 2016-12-29T03:15:18Z

tests/test_estimation.py

+        cp = CoxPHFitter(normalize=False)
+        cp.fit(rossi, 'week', 'arrest', strata=['race', 'paro', 'mar', 'wexp'])
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 0)].ix[[14, 35, 37, 43, 52]].values, [0.28665890, 0.63524149, 1.01822603, 1.48403930, 1.48403930], decimal=2)
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 1)].ix[[27, 43, 48, 52]].values, [0.35738173, 0.76415714, 1.26635373, 1.26635373], decimal=2)


The baseline hazards are only slightly off, so the errors accumulate in the cumulative hazard. I'd like to understand why my values differ slightly. @IVANBARRIENTOS, is there a way I can access the non-cumulative hazards?

Also: any other tests you would recommend?

Ah, I think it's because my estimates of beta are slightly off, and this is just a manifestation of that.

Opened an issue here: #272
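Why an error in beta shows up in the baseline: under the Breslow estimator, the hazard increment at event time t is d(t) / sum over the risk set R(t) of exp(x_j . beta), so every increment depends on the estimated beta, and the cumulative sum carries each perturbation forward. A minimal numpy sketch for a single stratum, assuming Breslow; this may not match exactly what lifelines or R compute (R's coxph, for instance, defaults to Efron tie handling). The toy data are made up.

```python
import numpy as np


def breslow_cumulative_hazard(times, events, X, beta):
    """Breslow estimate of the baseline cumulative hazard within a single stratum."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    partial_hazards = np.exp(np.asarray(X, dtype=float).dot(beta))
    event_times = np.unique(times[events == 1])
    increments = []
    for t in event_times:
        deaths = events[times == t].sum()            # events tied at time t
        at_risk = partial_hazards[times >= t].sum()  # risk set R(t), weighted by exp(x . beta)
        increments.append(deaths / at_risk)
    # The cumulative hazard is the running sum of the per-time increments.
    return event_times, np.cumsum(increments)


times, events, X = [1, 2, 3], [1, 1, 0], [[1.0], [0.0], [0.0]]
_, H_at_zero = breslow_cumulative_hazard(times, events, X, np.array([0.0]))
_, H_perturbed = breslow_cumulative_hazard(times, events, X, np.array([0.1]))
```

Nudging beta from 0.0 to 0.1 changes the first increment, and that change is still present in every later cumulative value.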


ibarrien · 2016-12-29T17:15:54Z

Thanks for the updates! Yes, it looks like it's in the betas. One possible cause: lifelines normalizes by default, whereas R does not. I'll dig in this weekend.
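To see why normalization matters when comparing against R: if the covariates are z-scored before fitting, the coefficients come out on the standardized scale, and the fitted baseline hazard describes a subject at the covariate means rather than at zero (the quantity R's basehaz(..., centered=FALSE) reports). A minimal numpy sketch, assuming lifelines' normalization is plain z-scoring and using made-up data and coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 2))  # toy covariates on their raw scale

# Assumed form of the default normalization: z-score each column.
mean, std = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean) / std

beta_norm = np.array([0.4, -0.2])  # hypothetical coefficients fit on X_norm
beta_raw = beta_norm / std         # the same model expressed on the raw scale

lp_norm = X_norm.dot(beta_norm)
lp_raw = X.dot(beta_raw)

# The two linear predictors differ by the same constant for every row; a constant
# shift cancels in the Cox partial likelihood but is absorbed by the baseline hazard.
offset = lp_raw - lp_norm
assert np.allclose(offset, mean.dot(beta_raw))
```

So coefficients and baseline hazards are only directly comparable with R's after undoing this scaling, which is consistent with the test passing normalize=False.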

On Dec 28, 2016 21:59, "Cameron Davidson-Pilon" ***@***.***> wrote:

***@***.**** commented on this pull request.

In tests/test_estimation.py (#268):

@@ -928,6 +929,19 @@ def test_strata_against_r_output(self, rossi):
         npt.assert_almost_equal(cp.summary['coef'].values, [-0.335, -0.059, 0.100], decimal=3)
         assert abs(cp._log_likelihood - -436.9339) / 436.9339 < 0.01
 
+    def test_hazard_works_as_intended_with_strata_against_R_output(self, rossi):
+        """
+        > library(survival)
+        > ross = read.csv('rossi.csv')
+        > r = coxph(formula = Surv(week, arrest) ~ fin + age + strata(race,
+            paro, mar, wexp) + prio, data = rossi)
+        > basehaz(r, centered=FALSE)
+        """
+        cp = CoxPHFitter(normalize=False)
+        cp.fit(rossi, 'week', 'arrest', strata=['race', 'paro', 'mar', 'wexp'])
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 0)].ix[[14, 35, 37, 43, 52]].values, [0.28665890, 0.63524149, 1.01822603, 1.48403930, 1.48403930], decimal=2)
+        npt.assert_almost_equal(cp.baseline_cumulative_hazard_[(0, 0, 0, 1)].ix[[27, 43, 48, 52]].values, [0.35738173, 0.76415714, 1.26635373, 1.26635373], decimal=2)

Ah, I think it's because my estimates of beta are slightly off, and this is just a manifestation of that.

CamDavidsonPilon commented Dec 29, 2016


Joey Stockermans and others added 5 commits December 29, 2016 11:37

adding functionality to calculate baseline hazards for each strata

dc654e7

fixed small bug in which nan values in separate stratas cross-contaminate into nan values in the strata in which you're estimating the survival curve

c19d501

adding test and updating code

7ab995d

fix whitespace

7ca98e8

fix predict_survival_ to merge and not append

74b865d
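The last commit combines per-stratum predictions by merging rather than appending. Different strata generally have different event times, so their time indices don't line up. A minimal pandas sketch with hypothetical subjects and times; pd.concat stands in for row-wise appending (DataFrame.append was removed in pandas 2.0), and the forward-fill step is an assumption about how to fill gaps in a step-function cumulative hazard, not necessarily what the PR does.

```python
import pandas as pd

# Hypothetical per-stratum cumulative hazards; each stratum has its own event times.
stratum_a = pd.DataFrame({"subject_1": [0.1, 0.3]}, index=[2, 5])
stratum_b = pd.DataFrame({"subject_2": [0.2, 0.4, 0.6]}, index=[1, 5, 7])

# Row-wise stacking keeps both time indices: time 5 appears twice, and each row
# is NaN for the subject belonging to the other stratum.
stacked = pd.concat([stratum_a, stratum_b])

# Aligning on the time index instead gives one row per distinct time.
merged = stratum_a.merge(stratum_b, how="outer", left_index=True, right_index=True)
merged = merged.sort_index()

# A cumulative hazard is constant between its stratum's event times, so carry the
# last value forward, and use 0 before a stratum's first event time.
merged = merged.ffill().fillna(0.0)
```

After the merge, each time appears once, and every subject has a value at every time.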

CamDavidsonPilon force-pushed the unique_baseline_hazard_for_each_strata branch from 33f680d to 74b865d December 29, 2016 16:37

update changelog

e81c8e9

CamDavidsonPilon merged commit 29cc72e into master Dec 29, 2016

CamDavidsonPilon deleted the unique_baseline_hazard_for_each_strata branch December 29, 2016 16:51
