# Automatic Inference
I implement the procedure described in [Farrell, Liang, and Misra (2021)](https://arxiv.org/abs/2010.14694). I make use of R code provided in the Causal Machine Learning course offered in the Fall of 2023 by Max Farrell and Sanjog Misra.

Our data consists of $\boldsymbol{X}$ (`dnn_features`), $Y$ (`outcomes`), and $\boldsymbol{Z}$ (`structural_features`). We define a set of parameter functions $\boldsymbol{\theta}(\boldsymbol{X})$ (`structural_parameters`). The estimand is the expected value of a statistic $\boldsymbol{H}$: $\mu_0 = \mathbb{E}[\boldsymbol{H}(\boldsymbol{X},\boldsymbol{\theta}(\boldsymbol{X}); \boldsymbol{Z})]$. The outcome variable $Y$ is linked to the parameter functions $\boldsymbol{\theta}(\cdot)$ by the equality $\mathbb{E}[Y \mid \boldsymbol{X} = \boldsymbol{x}, \boldsymbol{Z} = \boldsymbol{z}] = G(\boldsymbol{\theta}(\boldsymbol{x}), \boldsymbol{z})$, where we call $G(\cdot, \cdot)$ the structural layer (`structural_layer`).
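
For reference, here is the estimator as I read it in Farrell, Liang, and Misra (2021), rewritten in this notebook's notation. Here $\ell$ is the loss used to fit $\boldsymbol{\theta}(\cdot)$, subscripts denote derivatives with respect to $\boldsymbol{\theta}$, and $\boldsymbol{W} = (Y, \boldsymbol{X}, \boldsymbol{Z})$:

$$
\boldsymbol{\Lambda}(\boldsymbol{x}) = \mathbb{E}\big[\ell_{\boldsymbol{\theta}\boldsymbol{\theta}}(Y, \boldsymbol{Z}, \boldsymbol{\theta}(\boldsymbol{X})) \mid \boldsymbol{X} = \boldsymbol{x}\big],
$$

$$
\psi(\boldsymbol{W}) = \boldsymbol{H}(\boldsymbol{X}, \boldsymbol{\theta}(\boldsymbol{X}); \boldsymbol{Z}) - \boldsymbol{H}_{\boldsymbol{\theta}}(\boldsymbol{X}, \boldsymbol{\theta}(\boldsymbol{X}); \boldsymbol{Z})\, \boldsymbol{\Lambda}(\boldsymbol{X})^{-1}\, \ell_{\boldsymbol{\theta}}(Y, \boldsymbol{Z}, \boldsymbol{\theta}(\boldsymbol{X})),
$$

$$
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} \hat{\psi}(\boldsymbol{W}_i), \qquad \widehat{\mathrm{SE}} = \sqrt{\frac{1}{N^2} \sum_{i=1}^{N} \big(\hat{\psi}(\boldsymbol{W}_i) - \hat{\mu}\big)^2}.
$$

Here $\hat{\psi}$ plugs in estimates $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\Lambda}}$ fitted on sample splits other than the one on which $\hat{\psi}(\boldsymbol{W}_i)$ is evaluated.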

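Estimating $\boldsymbol{\Lambda}(\boldsymbol{X})$ requires projecting the Hessian of the loss function (with respect to $\boldsymbol{\theta}$) onto $\boldsymbol{X}$, i.e., estimating its conditional expectation given $\boldsymbol{X}$. Sometimes part of this can be computed directly instead of estimated. For example, with a linear $G(\boldsymbol{\theta}(\boldsymbol{X}), \boldsymbol{Z})$ and squared loss, the Hessian has a closed form that depends only on $\boldsymbol{Z}$, so $\boldsymbol{\Lambda}(\boldsymbol{X})$ reduces to conditional moments of $\boldsymbol{Z}$ given $\boldsymbol{X}$ (known outright when, as in the DGP below, $\boldsymbol{Z}$ is randomized independently of $\boldsymbol{X}$). This code does _not_ exploit such shortcuts: it relies on automatic differentiation for the Hessian and on a DNN for the projection of this Hessian onto $\boldsymbol{X}$.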
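
To make the linear, squared-loss case concrete: with $G(\boldsymbol{\theta}, z) = \theta_0 + \theta_1 z$ and a scalar $z$, the per-observation Hessian of $(y - \theta_0 - \theta_1 z)^2$ with respect to $(\theta_0, \theta_1)$ is $2\begin{pmatrix} 1 & z \\ z & z^2 \end{pmatrix}$, so $\boldsymbol{\Lambda}(\boldsymbol{x})$ needs only $\mathbb{E}[Z \mid \boldsymbol{X} = \boldsymbol{x}]$ and $\mathbb{E}[Z^2 \mid \boldsymbol{X} = \boldsymbol{x}]$. The sketch below checks that closed form against `torch.autograd.functional.hessian` and evaluates $\boldsymbol{\Lambda}$ assuming $Z \sim \text{Bernoulli}(p)$ independent of $\boldsymbol{X}$, which is how the DGP below draws the treatment. The helper names are hypothetical and not part of `automatic_inference.py`.

```python
import torch


def squared_loss_one(theta, y, z):
    # Per-observation squared loss for the linear structural layer
    # G(theta, z) = theta_0 + theta_1 * z (scalar z, two parameters)
    return (y - (theta[0] + theta[1] * z)) ** 2


def hessian_closed_form(z):
    # d^2 / dtheta^2 of (y - theta_0 - theta_1 z)^2 = 2 [[1, z], [z, z^2]];
    # it depends on neither y nor theta
    return 2.0 * torch.tensor([[1.0, z], [z, z * z]])


def expected_hessian_bernoulli(p):
    # If Z ~ Bernoulli(p) independently of X, then E[Z | X] = E[Z^2 | X] = p,
    # so Lambda(x) = 2 [[1, p], [p, p]] is the same for every x
    return 2.0 * torch.tensor([[1.0, p], [p, p]])


# Compare the closed form with automatic differentiation at a few points
theta = torch.tensor([0.5, 3.0])
for y, z in [(1.2, 0.0), (-0.7, 1.0), (3.3, 1.0)]:
    h_auto = torch.autograd.functional.hessian(
        lambda t: squared_loss_one(t, torch.tensor(y), torch.tensor(z)), theta
    )
    assert torch.allclose(h_auto, hessian_closed_form(z))

print(expected_hessian_bernoulli(0.5))  # [[2, 1], [1, 1]]
```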

### To do:
| Task | Status | Notes |
|-|-|-|
| Accommodate vector statistics | Not started | Check use of `statistic_dim` in `automatic_inference.py` |
| Create a general-use function for sample splits | Not started | Need to check handling of remainder cases |
| Add diagnostics for structural parameter estimation | Not started | Start with code for CML final |


## 0. Libraries

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

import automatic_inference as auto_inf

import numpy as np
# import torch.linalg as linalg
# import matplotlib.pyplot as plt
import math

## 1. Data generating process (DGP)

In [2]:
torch.manual_seed(12345)  # Set the seed for reproducibility

N = 18000  # observation count
K = 2  # feature count

# Draw independent features from standard normal
dnn_features = torch.randn(N, K)

# Build the structural parameters from features
structural_parameters = torch.cat(
    (dnn_features[:, 0].view(N, 1), 3 + dnn_features[:, 1].view(N, 1)), dim=1
)
structural_parameters_dim = structural_parameters.shape[1]

# The structural feature is a binary treatment indicator:
# Z = 1{e > 0} with e ~ N(0, 1), i.e., Bernoulli(0.5) independent of X
structural_features = 1 * (torch.randn(N, 1) > 0).view(N, 1)


# Define the correspondence between structural parameters and structural features
# We use a linear correspondence here; let's not get too crazy
# CM: this is a pretty common structural layer, so I should move it to the .py file
def structural_layer(structural_parameters, structural_features):
    # G(theta, z) = theta_0 + sum_k z_k * theta_k: an intercept plus one
    # slope per structural feature
    structural_layer_eval = structural_parameters[:, 0:1] + torch.sum(
        (structural_features * structural_parameters[:, 1:]), dim=1, keepdim=True
    )

    return structural_layer_eval


# Calculate outcomes (structural component + noise)
outcomes_structural = structural_layer(structural_parameters, structural_features)
outcomes = outcomes_structural + torch.randn(N, 1)

In [3]:
# CM: create a function for sample splits
# CM: check N / splits != integer case

# Create splits
perm = torch.randperm(N)  # create a permutation of the indices

num_splits = 3  # number of splits

split_size = N // num_splits  # compute the size of each split

splits = []  # store splits in a list of dictionaries
for s in range(num_splits):
    indices = perm[s * split_size : (s + 1) * split_size]

    # Use indices to create a split
    split = {
        "dnn_features": dnn_features[indices],
        "structural_features": structural_features[indices],
        "outcomes": outcomes[indices],
        "structural_parameters": structural_parameters[indices],
    }

    # Add the split to the list of splits
    splits.append(split)
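
The to-do item on a general-use split function could look something like the sketch below. `make_splits` is a hypothetical helper, not part of `automatic_inference.py`. It relies on `torch.tensor_split` (PyTorch 1.8 or later), which keeps every observation when `N` is not a multiple of `num_splits`; the slicing loop above instead drops the last `N % num_splits` permuted rows (none here, since 18000 is divisible by 3).

```python
import torch


def make_splits(data, num_splits, generator=None):
    """Hypothetical helper: randomly partition the rows of every tensor in
    `data` (a dict of tensors sharing the first dimension) into `num_splits`
    folds. When N is not divisible by num_splits, torch.tensor_split gives
    the first N % num_splits folds one extra row instead of dropping rows."""
    n = next(iter(data.values())).shape[0]
    perm = torch.randperm(n, generator=generator)
    return [
        {name: tensor[idx] for name, tensor in data.items()}
        for idx in torch.tensor_split(perm, num_splits)
    ]


# Usage with 10 rows and 3 folds
gen = torch.Generator().manual_seed(0)
toy = {
    "dnn_features": torch.arange(20.0).view(10, 2),
    "outcomes": torch.arange(10.0).view(10, 1),
}
toy_splits = make_splits(toy, 3, generator=gen)
print([s["outcomes"].shape[0] for s in toy_splits])  # [4, 3, 3]
```

Drawing the permutation from an explicit `torch.Generator` keeps the fold assignment reproducible without depending on the global seed.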

## 2. Estimation

### 2.1. Estimate structural parameters

In [4]:
# Hyperparameters
hidden_sizes = [30, 30]
dropout_rate = 0.0
learning_rate = 5e-3
weight_decay = 0.0  # no L2 regularization
num_epochs = 2000

# Initialize loss function; we use mean squared error
loss_function = nn.MSELoss(reduction="mean")

# We will initialize the model and optimizer in each loop below

In [5]:
print("Estimating structural parameters")

models_structural_parameters = []  # trained models for structural parameters

for split in range(num_splits):
    print(f"Split {split + 1}")

    # Initialize neural network
    model = auto_inf.DeepNeuralNetworkReLU(
        input_dim=K,
        hidden_sizes=hidden_sizes,
        output_dim=structural_parameters_dim,
        dropout_rate=dropout_rate,
    )

    # Initialize optimizer; we use stochastic gradient descent
    optimizer = optim.SGD(
        model.parameters(), lr=learning_rate, weight_decay=weight_decay
    )

    model_fit = auto_inf.train_dnn(
        splits[split]["dnn_features"],
        splits[split]["structural_features"],
        splits[split]["outcomes"],
        structural_layer,
        model,
        loss_function,
        optimizer,
        num_epochs,
    )

    models_structural_parameters.append(model)

Estimating structural parameters
Split 1
100%|██████████| 2000/2000 [00:02<00:00, 750.75it/s]
Split 2
100%|██████████| 2000/2000 [00:02<00:00, 750.62it/s]
Split 3
100%|██████████| 2000/2000 [00:02<00:00, 749.61it/s]
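
`auto_inf.train_dnn` is not shown in this notebook, so the following is only a minimal sketch, under my own assumptions, of what fitting through the structural layer involves: the DNN maps $\boldsymbol{X}$ to $\hat{\boldsymbol{\theta}}(\boldsymbol{X})$, the structural layer turns that into a prediction of $Y$, and the loss is computed on that prediction. The names `fit_structural_dnn` and `linear_structural_layer` are hypothetical, the data are a smaller simulated draw shaped like the DGP above, and I use Adam instead of the notebook's SGD only so that the example converges in a few hundred full-batch steps.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def linear_structural_layer(theta, z):
    # G(theta(x), z) = theta_0(x) + sum_k z_k * theta_k(x)
    return theta[:, 0:1] + torch.sum(z * theta[:, 1:], dim=1, keepdim=True)


def fit_structural_dnn(x, z, y, model, layer, loss_fn, optimizer, num_epochs):
    """Hypothetical full-batch loop: model(x) gives theta(x), the structural
    layer maps it to a prediction of E[Y | X, Z], and the loss compares that
    prediction with y. Returns the per-epoch loss history."""
    history = []
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_fn(layer(model(x), z), y)
        loss.backward()  # gradients flow back through G into the DNN
        optimizer.step()
        history.append(loss.item())
    return history


torch.manual_seed(0)
n = 2000
x = torch.randn(n, 2)
z = (torch.randn(n, 1) > 0).float()
theta_true = torch.cat((x[:, 0:1], 3 + x[:, 1:2]), dim=1)
y = linear_structural_layer(theta_true, z) + torch.randn(n, 1)

model = nn.Sequential(
    nn.Linear(2, 30), nn.ReLU(), nn.Linear(30, 30), nn.ReLU(), nn.Linear(30, 2)
)
optimizer = optim.Adam(model.parameters(), lr=5e-3)
history = fit_structural_dnn(
    x, z, y, model, linear_structural_layer, nn.MSELoss(), optimizer, 500
)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The loss never sees $\hat{\boldsymbol{\theta}}$ directly; because the gradient passes through $G$, the network learns parameter functions rather than a direct prediction of $Y$.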


In [6]:
# CM: add diagnostics for structural parameter estimation

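### 2.2. Estimate the conditional expectation of the Hessian of the loss function, $\boldsymbol{\Lambda}(\boldsymbol{X})$
That is quite the mouthful. Formally, $\boldsymbol{\Lambda}(\boldsymbol{x}) = \mathbb{E}[\nabla^2_{\boldsymbol{\theta}} \ell(Y, \boldsymbol{Z}, \boldsymbol{\theta}(\boldsymbol{X})) \mid \boldsymbol{X} = \boldsymbol{x}]$, where $\ell$ is the loss used to fit $\boldsymbol{\theta}(\cdot)$. For each split, the Hessian is evaluated at a structural-parameter DNN trained on a different split, and each element of the upper triangle gets its own DNN training run before being reflected to the lower triangle, as the log below shows.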
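
For intuition about what the projection does, here is a minimal sketch under my own assumptions (per-observation squared loss, a small ReLU network trained with Adam, in-sample fitted values); the helper names are hypothetical and this is not the code in `auto_inf.estimate_expected_hessian`. Per-observation Hessians come from autograd: because $\ell_i$ depends only on row $i$ of $\boldsymbol{\theta}$, differentiating column $j$ of the summed gradient returns row $j$ of every observation's Hessian at once. Each upper-triangular element is then regressed on $\boldsymbol{X}$ and reflected, mirroring the element-by-element log.

```python
import torch
import torch.nn as nn


def per_observation_hessians(theta, z, y, layer):
    """Hessians of (y_i - G(theta_i, z_i))^2 w.r.t. theta_i, shape (n, p, p)."""
    theta = theta.detach().requires_grad_(True)
    loss = ((y - layer(theta, z)) ** 2).sum()
    (grad,) = torch.autograd.grad(loss, theta, create_graph=True)  # (n, p)
    rows = [
        torch.autograd.grad(grad[:, j].sum(), theta, retain_graph=True)[0]
        for j in range(theta.shape[1])
    ]
    return torch.stack(rows, dim=1)  # entry [i, j, k] = d2 loss_i / dtheta_ij dtheta_ik


def project_onto_features(x, targets, hidden=30, steps=300, lr=1e-2):
    """Regress each upper-triangular Hessian element on x with its own small
    ReLU network, then reflect it so the fitted Lambda(x_i) is symmetric."""
    n, p, _ = targets.shape
    lam = torch.zeros(n, p, p)
    for j in range(p):
        for k in range(j, p):
            net = nn.Sequential(nn.Linear(x.shape[1], hidden), nn.ReLU(), nn.Linear(hidden, 1))
            opt = torch.optim.Adam(net.parameters(), lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                loss = nn.functional.mse_loss(net(x).squeeze(1), targets[:, j, k])
                loss.backward()
                opt.step()
            with torch.no_grad():
                lam[:, j, k] = lam[:, k, j] = net(x).squeeze(1)
    return lam


def linear_layer(theta, z):
    return theta[:, 0:1] + torch.sum(z * theta[:, 1:], dim=1, keepdim=True)


torch.manual_seed(0)
n = 2000
x = torch.randn(n, 2)
z = (torch.randn(n, 1) > 0).float()
y = torch.randn(n, 1)
theta_hat = torch.randn(n, 2)  # stand-in for DNN output; this Hessian does not depend on it
H = per_observation_hessians(theta_hat, z, y, linear_layer)
lam_hat = project_onto_features(x, H)
print(lam_hat.mean(dim=0))
```

Under this simulated design the fitted values should average near 2 for the $(0,0)$ element and near 1 for the other elements. In the notebook's cross-fitting the projection is fit on one split and evaluated on another, rather than in-sample as here.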

In [7]:
models_expected_hessian = []  # trained models for expected hessians

# Cross-fitting: the Hessian model for split s uses the structural-parameter
# DNN from split (s - 1) % num_splits (matching the log below)
for s in range(num_splits):
    models_expected_hessian.append(
        auto_inf.estimate_expected_hessian(
            splits,
            s,
            models_structural_parameters,
            (s - 1) % num_splits,
            nn.MSELoss(reduction="mean"),
            structural_layer,
            hidden_sizes,
            dropout_rate,
            learning_rate,
            weight_decay,
            num_epochs,
        )
    )

Split 1 with structural parameter DNN 3
Element (0, 0)
100%|██████████| 2000/2000 [00:02<00:00, 826.22it/s]
Element (0, 1)
100%|██████████| 2000/2000 [00:02<00:00, 858.49it/s]
Reflecting to element (1, 0)
Element (1, 1)
100%|██████████| 2000/2000 [00:02<00:00, 838.47it/s]

Split 2 with structural parameter DNN 1
Element (0, 0)
100%|██████████| 2000/2000 [00:02<00:00, 856.54it/s]
Element (0, 1)
100%|██████████| 2000/2000 [00:02<00:00, 867.03it/s]
Reflecting to element (1, 0)
Element (1, 1)
100%|██████████| 2000/2000 [00:02<00:00, 879.74it/s]

Split 3 with structural parameter DNN 2
Element (0, 0)
100%|██████████| 2000/2000 [00:02<00:00, 865.85it/s]
Element (0, 1)
100%|██████████| 2000/2000 [00:02<00:00, 848.83it/s]
Reflecting to element (1, 0)
Element (1, 1)
100%|██████████| 2000/2000 [00:02<00:00, 863.93it/s]

### 2.3. Estimate the influence function

In [8]:
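def statistic(x, theta, z):
    # H(x, theta(x); z) = theta_1(x), the conditional effect of the binary
    # treatment, so mu_0 = E[theta_1(X)] is the average treatment effect
    return theta[:, 1:2]  # shape (n, 1), same as theta[:, 1].view(n, 1)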
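
The estimator also needs $\boldsymbol{H}_{\boldsymbol{\theta}}$, the gradient of the statistic with respect to $\boldsymbol{\theta}$, which automatic differentiation supplies. A minimal sketch, assuming a scalar statistic whose $i$-th value depends only on the $i$-th row of `theta` (true for `statistic` above); `statistic_gradient` and `treatment_effect_statistic` are hypothetical names. For $H = \theta_1$ every row of the gradient is $(0, 1)$.

```python
import torch


def treatment_effect_statistic(x, theta, z):
    # H = theta_1(x), same as `statistic` above
    return theta[:, 1:2]


def statistic_gradient(statistic, x, theta, z):
    """Per-observation gradient of a scalar statistic w.r.t. theta_i, shape (n, p).
    Each H_i depends only on row i of theta, so one backward pass through the
    summed statistic recovers every observation's gradient at once."""
    theta = theta.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(statistic(x, theta, z).sum(), theta)
    return grad


n = 5
x, theta, z = torch.randn(n, 2), torch.randn(n, 2), torch.ones(n, 1)
print(statistic_gradient(treatment_effect_statistic, x, theta, z))  # every row is [0., 1.]
```

A vector-valued statistic (the first to-do item) would need one backward pass per component, or a Jacobian routine, instead of the single summed pass.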

In [9]:
estimates_influence_function = []

# Cross-fitting: split s is evaluated with the structural-parameter DNN from
# split (s + 1) % num_splits and the expected-Hessian model from split
# (s + 2) % num_splits
for s in range(num_splits):
    estimates_influence_function.append(
        auto_inf.estimate_influence_function(
            splits,
            s,
            models_structural_parameters,
            (s + 1) % num_splits,
            nn.MSELoss(reduction="mean"),
            models_expected_hessian,
            (s + 2) % num_splits,
            structural_layer,
            statistic,
        )
    )
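
`auto_inf.estimate_influence_function` is not shown here. As a rough sketch of the per-observation computation, assuming a scalar statistic, the squared loss $\ell_i = (y_i - G(\boldsymbol{\theta}_i, \boldsymbol{z}_i))^2$, and already-fitted $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\Lambda}}$, the hypothetical function `influence_function_values` below evaluates $\psi_i = H_i - \boldsymbol{H}_{\boldsymbol{\theta},i}\, \boldsymbol{\Lambda}_i^{-1}\, \ell_{\boldsymbol{\theta},i}$. Rescaling the loss by a constant (for example the $1/N$ in `nn.MSELoss(reduction="mean")`) multiplies both $\boldsymbol{\Lambda}$ and $\ell_{\boldsymbol{\theta}}$ by that constant, so it cancels in $\boldsymbol{\Lambda}^{-1}\ell_{\boldsymbol{\theta}}$. The usage plugs in the true $\boldsymbol{\theta}$ and the closed-form $\boldsymbol{\Lambda}$, which reduces $\psi$ to $\theta_1 + 2(2Z - 1)(Y - G)$, the augmented inverse-propensity-weighting correction with propensity 0.5.

```python
import torch


def influence_function_values(x, z, y, theta, lam, layer, statistic):
    """Sketch of psi_i = H_i - H_theta,i Lambda_i^{-1} ell_theta,i for a scalar
    statistic and per-observation loss ell_i = (y_i - G(theta_i, z_i))^2.

    theta: (n, p) fitted structural parameters; lam: (n, p, p) fitted Lambda(x_i).
    """
    theta = theta.detach().requires_grad_(True)
    h = statistic(x, theta, z)  # (n, 1)
    (h_theta,) = torch.autograd.grad(h.sum(), theta)  # (n, p), row i = dH_i/dtheta_i
    loss = ((y - layer(theta, z)) ** 2).sum()
    (ell_theta,) = torch.autograd.grad(loss, theta)  # (n, p), row i = dell_i/dtheta_i
    # Solve Lambda_i v_i = ell_theta,i rather than inverting each Lambda_i
    v = torch.linalg.solve(lam, ell_theta.unsqueeze(-1)).squeeze(-1)  # (n, p)
    correction = (h_theta * v).sum(dim=1, keepdim=True)  # (n, 1)
    return h.detach() - correction


def linear_layer(theta, z):
    return theta[:, 0:1] + torch.sum(z * theta[:, 1:], dim=1, keepdim=True)


def treatment_effect(x, theta, z):
    return theta[:, 1:2]


# Usage on data simulated like the DGP above, with the true theta and the
# closed-form Lambda = [[2, 1], [1, 1]] (Bernoulli(0.5) Z independent of X)
torch.manual_seed(0)
n = 20000
x = torch.randn(n, 2)
z = (torch.randn(n, 1) > 0).float()
theta = torch.cat((x[:, 0:1], 3 + x[:, 1:2]), dim=1)
y = linear_layer(theta, z) + torch.randn(n, 1)
lam = torch.tensor([[2.0, 1.0], [1.0, 1.0]]).repeat(n, 1, 1)

psi = influence_function_values(x, z, y, theta, lam, linear_layer, treatment_effect)
print(psi.mean().item())  # close to the true mu_0 = 3
```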

## 3. Results

In [10]:
# Concatenate influence function values across splits and store as np array
estimates_influence_function_np = np.concatenate(
    [estimate.detach().numpy() for estimate in estimates_influence_function]
)

# Calculate estimate and standard error from concatenated influence function;
# divide by the number of evaluated observations, which equals N here only
# because N is divisible by num_splits
n_eval = estimates_influence_function_np.shape[0]
est = estimates_influence_function_np.mean()
se = math.sqrt(estimates_influence_function_np.var() / n_eval)

In [11]:
# Report results
print('Mean:', round(est, 4))
print('S.E.:', round(se, 4))
print('95% CI: [', round(est - 1.96 * se, 4), ', ',
      round(est + 1.96 * se, 4), ']', sep = '')

Mean: 2.9894
S.E.: 0.0168
95% CI: [2.9565, 3.0222]
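
As a sanity check on these numbers: in this DGP $\mu_0 = \mathbb{E}[3 + X_2] = 3$, which the reported 95% interval covers. With $Z \sim \text{Bernoulli}(0.5)$ independent of $\boldsymbol{X}$, the influence function evaluated at the true $\boldsymbol{\theta}$ and $\boldsymbol{\Lambda}$ reduces to $\psi = \theta_1(\boldsymbol{X}) + 2(2Z - 1)(Y - G)$, whose variance is $\mathrm{Var}(\theta_1(\boldsymbol{X})) + 4\,\mathrm{Var}(\varepsilon) = 1 + 4 = 5$. The oracle standard error is therefore about $\sqrt{5/18000} \approx 0.0167$, close to the reported 0.0168. The sketch below simulates that oracle on a fresh draw; it uses its own seed and variable names, so it does not reproduce the notebook's sample or overwrite `est` and `se`.

```python
import math

import torch

torch.manual_seed(0)
n = 18000
x = torch.randn(n, 2)
z = (torch.randn(n, 1) > 0).float()  # Bernoulli(0.5), independent of x
theta = torch.cat((x[:, 0:1], 3 + x[:, 1:2]), dim=1)
g = theta[:, 0:1] + z * theta[:, 1:2]  # linear structural layer
y = g + torch.randn(n, 1)

# Oracle influence function: true theta and closed-form Lambda = [[2, 1], [1, 1]]
psi = theta[:, 1:2] + 2 * (2 * z - 1) * (y - g)
oracle_est = psi.mean().item()
oracle_se = math.sqrt(((psi - psi.mean()) ** 2).mean().item() / n)
print(f"Oracle mean: {oracle_est:.4f}")
print(f"Oracle S.E.: {oracle_se:.4f} (analytic: {math.sqrt(5 / n):.4f})")
```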
