# Sequential Testing of Contexts
This notebook demonstrates how to use the sequential testing function and how to interpret its results. Sequential testing shows which features are most context-dependent and which contexts matter most.

We do this by iterating over every combination of context, predictor, and target, and testing whether that context matters for that predictor's effect on the target. The test for each combination is a p-value for the effect of context.
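The enumeration described above can be sketched as follows. This only illustrates the loop structure, not the library's implementation; `enumerate_tests` is a hypothetical helper, and the sizes match the synthetic data used later in this notebook (2 contexts, 2 predictors, 1 target).

```python
from itertools import product

def enumerate_tests(n_context, n_predictors, n_targets):
    """Hypothetical helper: list every (context, predictor, target) index triple."""
    return list(product(range(n_context), range(n_predictors), range(n_targets)))

combos = enumerate_tests(n_context=2, n_predictors=2, n_targets=1)
for context, predictor, target in combos:
    # One p-value would be computed for each of these triples.
    print(context, predictor, target)
```

Each triple corresponds to one row of the results table, so this setup yields four p-values.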

In [1]:
import numpy as np
import pandas as pd
from contextualized.analysis.pvals import test_sequential_contexts
from contextualized.easy import ContextualizedRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes

import logging
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)

## Gather Data for Training

In [5]:
n_samples = 1000
n_outcomes = 1
n_context = 2
n_observed = 2

C = np.random.uniform(-1, 1, size=(n_samples, n_context))
C = pd.DataFrame(C, columns=[f'context_{i}' for i in range(n_context)])

X_coefficient = 0.5
X_intercept = 0.2
X = np.random.uniform(-1, 1, size=(n_samples, n_observed))
X = pd.DataFrame(X, columns=[f'observed_{i}' for i in range(n_observed)])

# phi = np.random.uniform(-1, 1, size=(n_context, n_observed, n_outcomes)) # making a 3D tensor
phi = np.arange(n_context * n_observed * n_outcomes).reshape(n_context, n_observed, n_outcomes)
beta = np.tensordot(C, phi, axes=1) + np.random.normal(0, 0.01, size=(n_samples, n_observed, n_outcomes))
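# X is a DataFrame, so index its underlying array by row (X[i] would look up a column named i)
X_values = X.to_numpy()
Y = np.array([np.tensordot(X_values[i], beta[i], axes=1) for i in range(n_samples)])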
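To make the shapes in the cell above explicit, here is a minimal, self-contained sketch of the same data-generating process using plain NumPy arrays. It omits the small Gaussian noise on `beta` and uses a seeded generator (`rng`), both of which are simplifications made for this illustration. It also checks that the per-sample loop matches a vectorized `np.einsum` call.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
n_samples, n_context, n_observed, n_outcomes = 1000, 2, 2, 1

C = rng.uniform(-1, 1, size=(n_samples, n_context))
X = rng.uniform(-1, 1, size=(n_samples, n_observed))
phi = np.arange(n_context * n_observed * n_outcomes).reshape(
    n_context, n_observed, n_outcomes
)

# Each sample gets its own coefficients: beta[i] = sum_c C[i, c] * phi[c]
beta = np.tensordot(C, phi, axes=1)  # shape (n_samples, n_observed, n_outcomes)

# Per-sample loop, as in the notebook cell
Y_loop = np.array([np.tensordot(X[i], beta[i], axes=1) for i in range(n_samples)])

# Equivalent vectorized form: Y[i, k] = sum_j X[i, j] * beta[i, j, k]
Y_vec = np.einsum("ij,ijk->ik", X, beta)

print(beta.shape, Y_vec.shape, np.allclose(Y_loop, Y_vec))
```

Because `phi` links each context to each predictor's coefficient, it is the ground truth that the sequential test tries to recover.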

In [6]:
# Wrap the data as pandas DataFrames (C and X already are; Y is converted from a NumPy array)
C_train_df = pd.DataFrame(C)
X_train_df = pd.DataFrame(X)
Y_train_df = pd.DataFrame(Y)

## Using the sequential testing function with a ContextualizedRegressor

In [7]:
%%capture
pvals = test_sequential_contexts(
    ContextualizedRegressor,
    C_train_df,
    X_train_df,
    Y_train_df,
    encoder_type="mlp",
    max_epochs=3,
    learning_rate=1e-2,
    n_bootstraps=10,
)

## Analyzing results

The p-values displayed below come from sequentially testing each context feature against each predictor for a given target. Each p-value is based on how consistently the sign of the estimated effect agrees across bootstraps. A p-value close to 0 means the effect had the same sign in nearly every bootstrap, which is evidence that the relationship is consistently positive or consistently negative. In this way, the p-values quantify the uncertainty in the bootstrap estimates. Note that this run uses only 10 bootstraps, so the p-values are coarse.
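As an illustration of a sign-consistency p-value, here is a minimal sketch. It assumes add-one (Laplace) smoothing, which is consistent with the displayed values being multiples of 1/11 when `n_bootstraps=10`, but the library's actual computation may differ. `sign_consistency_pval` is a hypothetical helper, and the effect estimates are made up for the example.

```python
import numpy as np

def sign_consistency_pval(estimates, smoothing=1):
    """Hypothetical helper: one-sided bootstrap p-value for sign consistency."""
    estimates = np.asarray(estimates, dtype=float)
    # Orient the estimates so that the mean effect is non-negative.
    oriented = estimates * np.sign(np.mean(estimates))
    # Count bootstraps whose effect points against the mean direction.
    n_opposed = np.sum(oriented < 0)
    # Assumed add-one smoothing keeps the p-value strictly above zero.
    return (smoothing + n_opposed) / (len(estimates) + smoothing)

# Made-up effect estimates from 10 bootstraps
all_agree = sign_consistency_pval([0.4, 0.5, 0.3, 0.6, 0.2, 0.5, 0.4, 0.3, 0.6, 0.5])
one_flip = sign_consistency_pval([0.4, -0.1, 0.3, 0.6, 0.2, 0.5, 0.4, 0.3, 0.6, 0.5])
print(all_agree, one_flip)
```

Under this assumption, the smallest p-value attainable with 10 bootstraps is 1/11, so more bootstraps are needed to resolve smaller p-values.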

In [8]:
pvals

   Context  Predictor  Target     Pvals
0        0          0       0  0.090909
1        0          1       0  0.181818
2        1          0       0  0.363636
3        1          1       0  0.454545


In [10]:
print(phi[:5])

[[[0]
  [1]]

 [[2]
  [3]]]
