In [10]:
from logregutils import LogRegModel

# Ebola pediatric diagnostic model

In [11]:
# Load the Ebola pediatric diagnostic model from the paper 
# https://pubmed.ncbi.nlm.nih.gov/35608611/

ebola_diag_model = LogRegModel("ebola-pediatric-diagnosis/model.csv")

# This is a simple logistic regression with only linear terms. getFormula(n) returns the formula of the model,
# where n is the number of decimal digits shown for each coefficient. This formula f(x), where x is the vector
# of features, is used to calculate the prediction with the sigmoid function:
# P(y = 1 | x) = 1 / (1 + exp(-f(x)))

print(ebola_diag_model.getFormula(4))

-3.5771 + 3.5537 EbolaContactYes + 1.8815 EbolaContactUnknown + 2.016 AnyBleedingYes - 1.1941 AbdominalPainYes
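
As an illustration of how the printed formula turns into a prediction, here is a minimal sketch that evaluates f(x) by hand and applies the sigmoid. It does not use LogRegModel; the coefficients are copied from the rounded output above, the helper names (`linear_predictor`, `sigmoid`) and the patient values are hypothetical, and it assumes each feature is a 0/1 indicator.

```python
import math

# Coefficients copied from the getFormula(4) output above (rounded to 4 digits).
INTERCEPT = -3.5771
COEFFS = {
    "EbolaContactYes": 3.5537,
    "EbolaContactUnknown": 1.8815,
    "AnyBleedingYes": 2.016,
    "AbdominalPainYes": -1.1941,
}

def linear_predictor(features):
    """f(x): intercept plus the weighted sum of the features.

    Assumption: every feature is a 0/1 indicator; missing names count as 0.
    """
    return INTERCEPT + sum(COEFFS[name] * features.get(name, 0) for name in COEFFS)

def sigmoid(z):
    """P(y = 1 | x) = 1 / (1 + exp(-f(x)))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical patient: known Ebola contact, bleeding, no abdominal pain.
patient = {"EbolaContactYes": 1, "AnyBleedingYes": 1}
f_x = linear_predictor(patient)
probability = sigmoid(f_x)
print(f"f(x) = {f_x:.4f}, P(y = 1 | x) = {probability:.4f}")
```

Because the coefficients are rounded, the result can differ slightly from what the full-precision model would give.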


# Ebola pediatric prognostic model

In [12]:
# Load the Ebola pediatric prognostic model from the paper 
# https://pubmed.ncbi.nlm.nih.gov/36223331/

ebola_prog_model = LogRegModel("ebola-pediatric-prognosis/model.csv")

# This is a more complicated model where patient age and CT (Cycle Threshold, a measure of viral load) are represented by Restricted Cubic Spline (RCS) terms.
print(ebola_prog_model.getFormula(4))

6.5164 - 0.3806 PatientAge + 0.0015 max(PatientAge - 2.0, 0)^3 - 0.0036 max(PatientAge - 10.0, 0)^3 + 0.0021 max(PatientAge - 16.0, 0)^3 - 0.2139 CT + 0.0006 max(CT - 18.6, 0)^3 - 0.001 max(CT - 25.2, 0)^3 + 0.0004 max(CT - 34.5, 0)^3 + 0.3245 AnyBleeding + 0.2672 Diarrhoea + 0.3624 Breathlessness + 0.427 SwallowingProblems


## Understanding RCS terms

In the result from getFormula(), the RCS terms are fully "expanded". For example, in the formula above, the terms for patient age are:

```- 0.3806 PatientAge + 0.0015 max(PatientAge - 2.0, 0)^3 - 0.0036 max(PatientAge - 10.0, 0)^3 + 0.0021 max(PatientAge - 16.0, 0)^3```

This expansion comes from the general form of an RCS term of order 3, written in terms of its coefficients and knots:

```RCS(x, {c0, c1}, {k0, k1, k2}) = c0 * x + c1 * ( (p3(x - k0) - p3(x - k1) * (k2 - k0)/(k2 - k1) + p3(x - k2) * (k1 - k0)/(k2 - k1)) / (k2 - k0)^2 )```

where x is the predictor variable, {c0, c1} the coefficients, {k0, k1, k2} the knots, and the p3(u) function is defined as: 

```p3(u) = max(0, u)^3```
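
The general form above can be sketched in Python to show how the expanded coefficients arise. This is an illustrative sketch, not the code used by LogRegModel: the function names are hypothetical, and since the c1 coefficient is not printed by getFormula(), it is back-computed here from the rounded value 0.0015 and is only approximate.

```python
def p3(u):
    """Truncated cubic: max(0, u)^3."""
    return max(0.0, u) ** 3

def rcs(x, coeffs, knots):
    """Three-knot RCS term, following the general form given above."""
    c0, c1 = coeffs
    k0, k1, k2 = knots
    cubic = (
        p3(x - k0)
        - p3(x - k1) * (k2 - k0) / (k2 - k1)
        + p3(x - k2) * (k1 - k0) / (k2 - k1)
    )
    return c0 * x + c1 * cubic / (k2 - k0) ** 2

def expanded_coefficients(coeffs, knots):
    """Coefficients of x and of each max(x - k, 0)^3 in the expanded formula."""
    c0, c1 = coeffs
    k0, k1, k2 = knots
    scale = c1 / (k2 - k0) ** 2
    return (
        c0,
        scale,
        -scale * (k2 - k0) / (k2 - k1),
        scale * (k1 - k0) / (k2 - k1),
    )

# PatientAge knots and linear coefficient taken from the printed formula.
age_knots = (2.0, 10.0, 16.0)
c0 = -0.3806
# Assumption: c1 recovered from the rounded 0.0015, so it is approximate.
c1 = 0.0015 * (age_knots[2] - age_knots[0]) ** 2

print(expanded_coefficients((c0, c1), age_knots))
print(rcs(12.0, (c0, c1), age_knots))
```

The expanded coefficients computed this way land close to the printed -0.0036 and 0.0021; the small gap is due to starting from a rounded value. Note also that the three cubic pieces cancel beyond the last knot, so the term is linear in x there, which is the defining restriction of an RCS.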

For more details about RCS, see the following course notes for the Regression Modeling Strategies book:

https://hbiostat.org/rmsc/genreg.html#sec-rcspline