# $\pmb{\beta}_D$

## $\pmb{\beta}_D$ Manipulation analysis:
*   low $\beta_0$: higher chances of survival
*   low $\beta_1$: higher chances of survival under t=1 (idea: perhaps change the treatment term in D to $\beta_1\cdot(1-t)$ to control monotonicity)
*   low $\beta_2$: higher chances of survival for positive X, and the opposite for negative X
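
The bullets above can be checked numerically. The following is a minimal sketch, assuming the death indicator follows a logistic model $P(D=1 \mid t, x) = 1/(1+e^{-(\beta_0 + \beta_1 t + \beta_2 x)})$, which is consistent with the "1/(1+e^0)" remark in the cell below but may not match `create_sample` exactly. `death_probability` is a hypothetical helper, not part of the notebook's modules.

```python
import math

def death_probability(beta_d, t, x):
    """P(D=1 | t, x) under an assumed logistic link (hypothetical helper)."""
    beta_0, beta_1, beta_2 = beta_d
    z = beta_0 + beta_1 * t + beta_2 * x
    return 1.0 / (1.0 + math.exp(-z))

# All-zero coefficients: death is a fair coin flip regardless of t and x.
p_zero = death_probability([0.0, 0.0, 0.0], t=1, x=0.3)

# A low beta_1 lowers the death probability only when t=1.
p_treated = death_probability([0.0, -5.0, 1.0], t=1, x=0.0)
p_control = death_probability([0.0, -5.0, 1.0], t=0, x=0.0)

# A low (negative) beta_2 helps survival for positive x and hurts for negative x.
p_pos_x = death_probability([0.0, 0.0, -1.0], t=0, x=2.0)
p_neg_x = death_probability([0.0, 0.0, -1.0], t=0, x=-2.0)

print(p_zero, p_treated, p_control, p_pos_x, p_neg_x)
```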

In [1]:
from typing import Optional, List

import numpy as np
import pandas as pd
from IPython.display import display
from numpy import random
from sklearn.linear_model import LogisticRegression

from consts import default_random_seed
from estimations import estimate_beta_d_from_realizations
from sample_generation import create_sample

random.seed(default_random_seed)

In [2]:
def manipulate_beta_d():
    beta_d_list = [
        ([0.0, 0.0, 0.0], "beta's=0 -> 1/(1+e^0) -> death is random"),
        ([0.0, -5.0, 1.0], "low beta_1 -> higher chances of survival under t=1 -> mainly P and AS"),
        ([0.0, 10.0, 1.0], "high beta_1 -> lower chances of survival under t=1 -> mainly H and D"),
        ([0.0, 10.0, 10.0], "high beta_1 + high beta_2 -> lower chances of survival under t=1 -> mainly H and D"),
        ([-10.0, 10.0, 1.0], "high beta_1 + low beta_0 -> mainly H and AS"),
        ([-10.0, -10.0, 1.0], "low beta_1 + low beta_0 -> mainly AS"),
        ([-2.0, -2.0, 1.0], "low beta_1 + low beta_0 -> mainly AS, less extreme"),
    ]

    for beta_d_i, desc in beta_d_list:
        df = create_sample(beta_d = beta_d_i)
        print(f"for beta_D {beta_d_i} ({desc}):")
        display(pd.DataFrame({"count":df.stratum.value_counts(),"%":df.stratum.value_counts(normalize=True)*100}))
        print("\n\n")

    # TODO: understand the effect of X (beta_2)

In [3]:
manipulate_beta_d()

for beta_D [0.0, 0.0, 0.0] (beta's=0 -> 1/(1+e^0) -> death is random):


| stratum | count | % |
|---|---|---|
| AS | 259 | 25.9 |
| P | 253 | 25.3 |
| D | 249 | 24.9 |
| H | 239 | 23.9 |





for beta_D [0.0, -5.0, 1.0] (low beta_1 -> higher chances of survival under t=1 -> mainly P and AS):


| stratum | count | % |
|---|---|---|
| AS | 511 | 51.1 |
| P | 481 | 48.1 |
| H | 5 | 0.5 |
| D | 3 | 0.3 |





for beta_D [0.0, 10.0, 1.0] (high beta_1 -> lower chances of survival under t=1 -> mainly H and D):


| stratum | count | % |
|---|---|---|
| H | 516 | 51.6 |
| D | 484 | 48.4 |





for beta_D [0.0, 10.0, 10.0] (high beta_1 + high beta_2 -> lower chances of survival under t=1 -> mainly H and D):


| stratum | count | % |
|---|---|---|
| D | 482 | 48.2 |
| H | 480 | 48.0 |
| AS | 38 | 3.8 |





for beta_D [-10.0, 10.0, 1.0] (high beta_1 + low beta_0 -> mainly H and AS):


| stratum | count | % |
|---|---|---|
| H | 507 | 50.7 |
| AS | 493 | 49.3 |





for beta_D [-10.0, -10.0, 1.0] (low beta_1 + low beta_0 -> mainly AS):


| stratum | count | % |
|---|---|---|
| AS | 1000 | 100.0 |





for beta_D [-2.0, -2.0, 1.0] (low beta_1 + low beta_0 -> mainly AS, less extreme):


| stratum | count | % |
|---|---|---|
| AS | 848 | 84.8 |
| P | 134 | 13.4 |
| H | 14 | 1.4 |
| D | 4 | 0.4 |
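
The stratum mixes printed above can be sanity-checked against a back-of-the-envelope calculation. This is a rough sketch under two assumptions that the notebook does not confirm: $D(0)$ and $D(1)$ are independent logistic draws given x, and x is fixed at 0. The stratum labels are read here as AS = survives under both arms, P = dies only without treatment, H = dies only under treatment, D = dies under both; this reading matches the counts above but is not stated in the notebook. `expected_strata` is a hypothetical helper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def expected_strata(beta_d, x=0.0):
    """Expected stratum shares at a fixed x (hypothetical helper).

    Assumes D(0) and D(1) are independent Bernoulli draws given x.
    """
    beta_0, beta_1, beta_2 = beta_d
    p_d0 = sigmoid(beta_0 + beta_2 * x)           # P(D(0)=1)
    p_d1 = sigmoid(beta_0 + beta_1 + beta_2 * x)  # P(D(1)=1)
    return {
        "AS": (1 - p_d0) * (1 - p_d1),  # survives under both arms
        "P": p_d0 * (1 - p_d1),         # dies only without treatment
        "H": (1 - p_d0) * p_d1,         # dies only under treatment
        "D": p_d0 * p_d1,               # dies under both arms
    }

shares = expected_strata([-2.0, -2.0, 1.0])
for stratum, share in shares.items():
    print(f"{stratum}: {100 * share:.1f}%")
```

For `[-2.0, -2.0, 1.0]` this gives roughly 86% AS and 12% P, in the same range as the simulated counts; the remaining gap is expected because x varies across units in the simulated sample.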







## $\pmb{\beta}_D$ Estimation:

### Estimating $\pmb{\beta}_D$ from the observed realizations

In [4]:
estimate_beta_d_from_realizations([0.01, -5.0, 1.0])
print("\n****\n")
estimate_beta_d_from_realizations([-2.0, -2.0, 1.0])

beta_d_hat: [-0.07, -4.1, 0.94]
(True beta_d: [0.01, -5.0, 1.0])

****

beta_d_hat: [-1.99, -1.68, 0.63]
(True beta_d: [-2.0, -2.0, 1.0])


[-1.99, -1.68, 0.63]
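
For context, "observed realizations" means that each unit reveals only one of its two potential outcomes. The following is a minimal sketch of that relationship, assuming the usual consistency rule (a unit with treatment t reveals $D(t)$). It does not reproduce `estimate_beta_d_from_realizations`, whose internals are not shown in this notebook; `observed_outcome` and the toy units are hypothetical.

```python
def observed_outcome(t, d0, d1):
    """Return the realized death indicator for a unit with treatment t."""
    return d1 if t == 1 else d0

# Toy units as (t, D(0), D(1)); illustrative values, not notebook data.
units = [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
realized = [observed_outcome(t, d0, d1) for t, d0, d1 in units]
print(realized)
```

Because t varies across units in the realized data, a single regression of the realized outcome on t and x can in principle recover all three coefficients at once.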

### Using both D's (potential outcomes)



Using sklearn's `LogisticRegression`, fitted separately on each potential outcome.<br>
Note that for each of $D(0)$ and $D(1)$, t is fixed, so the linear predictor reduces to an intercept plus $\beta_2\cdot x$:<br>


*   $D(0)$: $\beta_0 + \beta_1\cdot 0 + \beta_2\cdot x = \beta_0 + \beta_2\cdot x$<br>
*   $D(1)$: $\beta_0 + \beta_1\cdot 1 + \beta_2\cdot x = \underbrace{\beta_0 +  \beta_1}_{\beta_0'} + \beta_2\cdot x = \beta_0' + \beta_2\cdot x$

Final $\hat{\pmb{\beta}}_D$ will be assembled like so:<br>


*   $\hat{\beta}_0 = \hat{\beta}_0(0)$
*   $\hat{\beta}_1 = \underbrace{\hat{\beta}_0(1)}_{:=\beta_0'} - \hat{\beta}_0(0)$
*   $\hat{\beta}_2 = $ Average of $\hat{\beta}_2(0)$ and $\hat{\beta}_2(1)$
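
The assembly rule above can be sketched as a small function. This is a minimal illustration; `assemble_beta_d` is a hypothetical name, and each input is assumed to be an `[intercept, slope]` pair from one per-arm fit.

```python
def assemble_beta_d(fit_t0, fit_t1):
    """Combine per-arm [intercept, slope] fits into [beta_0, beta_1, beta_2]."""
    intercept_0, slope_0 = fit_t0
    intercept_1, slope_1 = fit_t1
    beta_0 = intercept_0
    beta_1 = intercept_1 - intercept_0  # beta_0' - beta_0
    beta_2 = (slope_0 + slope_1) / 2.0  # average of the two slope estimates
    return [beta_0, beta_1, beta_2]

# Illustrative per-arm fits (made-up numbers, not notebook output).
combined = assemble_beta_d([-2.0, 1.0], [-4.0, 1.0])
print(combined)
```

Averaging the two slopes treats both per-arm fits as equally reliable, which may be a poor choice when one arm has very few deaths or very few survivors.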


In [5]:
def estimate_beta_d(true_beta_d_for_estimation: List[float],
                    df: Optional[pd.DataFrame]=None):
    if df is None:  # `df == None` compares element-wise on a DataFrame and raises inside `if`
        df = create_sample(beta_d = true_beta_d_for_estimation)

    features = [[x_i] for x_i in list(df.x)]
    y = list(df.D0)
    clf = LogisticRegression(random_state=0).fit(features, y)
    beta_d_hat_t0 = [round(float(clf.intercept_[0]),2)] + [round(beta,2) for beta in list(clf.coef_[0])]
    print("for T=0:")
    print(f"beta_d_hat: {beta_d_hat_t0}")

    # unique, counts = np.unique(clf.predict(features), return_counts=True)
    # print(f"values count for y_hat: {dict(zip(unique, counts))}")
    # unique, counts = np.unique(y, return_counts=True)
    # print(f"values count for : {dict(zip(unique, counts))}")

    y = list(df.D1)
    clf = LogisticRegression(random_state=0).fit(features, y)
    beta_d_hat_t1 = [round(float(clf.intercept_[0]),2)] + [round(beta,2) for beta in list(clf.coef_[0])]
    print("for T=1:")
    print(f"beta_d_hat: {beta_d_hat_t1}")

    combined_beta_d = [beta_d_hat_t0[0], beta_d_hat_t1[0]-beta_d_hat_t0[0], np.mean([beta_d_hat_t0[1], beta_d_hat_t1[1]])]
    combined_beta_d = [round(beta,2) for beta in combined_beta_d]
    print(f"\nCombining both: {combined_beta_d}")
    print(f"(True beta_d: {true_beta_d_for_estimation})")



In [6]:
estimate_beta_d([0.01, -5.0, 1.0])
print("\n****\n")
estimate_beta_d([-2.0, -2.0, 1.0])


for T=0:
beta_d_hat: [-0.05, 0.87]
for T=1:
beta_d_hat: [-4.82, -0.0]

Combining both: [-0.05, -4.77, 0.44]
(True beta_d: [0.01, -5.0, 1.0])

****

for T=0:
beta_d_hat: [-1.97, 1.16]
for T=1:
beta_d_hat: [-4.07, 0.71]

Combining both: [-1.97, -2.1, 0.94]
(True beta_d: [-2.0, -2.0, 1.0])
