# Sampling

How would you choose $n$ observations from a total of $N$ to effectively estimate (say) the accuracy of a classifier? For example, imagine that our budget is limited and we can only annotate $n=100$ examples from data of size $N=10^{7}$! 

In this notebook, we show how to 
* Sample via simple random sampling (SRS) and stratified simple random sampling (SSRS) with proportional and Neyman allocation, all without replacement
* Estimate the metric of interest $\mathbb{E}[Z]$ with the Horvitz-Thompson (HT) and difference (DF) estimators

Besides estimating the value of the metric, we also compute the variance of each estimate, which allows us to build confidence intervals.
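
As a rough illustration of what the two estimators compute under SRS without replacement, here is a minimal NumPy sketch. The function names `srs_ht_estimate` and `srs_df_estimate` and the toy population are assumptions made for this example; they are not part of the `cascade` package, whose internals may differ.

```python
import numpy as np

def srs_ht_estimate(z_sample, population_size):
    """HT estimate of the population mean under SRS without replacement."""
    z = np.asarray(z_sample, dtype=float)
    n = len(z)
    estimate = z.mean()                    # under SRS the HT estimate is the sample mean
    fpc = 1.0 - n / population_size        # finite population correction
    variance = fpc * z.var(ddof=1) / n     # estimated variance of the estimate
    return estimate, variance

def srs_df_estimate(z_sample, proxy_sample, proxy_population):
    """Difference estimate: population mean of the proxy plus the mean sampled residual."""
    z = np.asarray(z_sample, dtype=float)
    proxy_sample = np.asarray(proxy_sample, dtype=float)
    proxy_population = np.asarray(proxy_population, dtype=float)
    n, N = len(z), len(proxy_population)
    residuals = z - proxy_sample           # how far the proxy is off on annotated examples
    estimate = proxy_population.mean() + residuals.mean()
    variance = (1.0 - n / N) * residuals.var(ddof=1) / n
    return estimate, variance

# Hypothetical population: the proxy is a confidence score, and each example
# is classified correctly with probability equal to that score.
rng = np.random.default_rng(0)
N, n = 10_000, 100
proxy = rng.uniform(0.05, 0.95, size=N)
z = rng.random(N) < proxy

idx = rng.choice(N, size=n, replace=False)  # SRS without replacement
ht_est, ht_var = srs_ht_estimate(z[idx], N)
df_est, df_var = srs_df_estimate(z[idx], proxy[idx], proxy)
print(f"HT: {ht_est:.3f} (variance {ht_var:.5f})")
print(f"DF: {df_est:.3f} (variance {df_var:.5f})")
```

The difference estimator only needs the residuals $Z - \hat{Z}$ on the annotated sample, so the better the proxy tracks $Z$, the smaller its variance tends to be.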

We focus on estimating the binary accuracy of a multi-class classifier. Other evaluation metrics can be estimated in a similar way.




## Load the data

Consider a multi-class classification task on ImageNet-A. Predictions are generated by a CLIP model with ViT-L-14 as visual encoder. Let's load the packages as well as predictions $(m_1(X), \dots, m_K(X))$ because that's all we have right now. You can also plug in your own data! 

In [20]:
import torch
import numpy as np
import pandas as pd
from cascade import ModelPerformanceEvaluator
from sklearn.cluster import KMeans

In [21]:
data_folder = "../data/predictions/"
data_name = "imagenet-a/ViT-L-14_zero_shot.pt"
df = torch.load(data_folder + data_name)

preds = np.array(df["PredictionProb"]) # model predictions
budget = 100 # number of examples we can afford to annotate (n)
total_sample_size = len(preds) # total size of the data (N)

We take performance to be the binary accuracy of the classifier and we will try to estimate its value on the dataset. 

### 1. Predict performance

We obtain an estimate of the expected performance for each observation, that is we construct a proxy $\hat{Z}$ for $Z$. This proxy can be based on _anything_, but, the more strongly associated it is with $Z$, the more precise our estimates of $\mathbb{E}[Z]$ will be. 

In this notebook we use the predictions of the model under evaluation to construct $\hat{Z}$. This means that we set $\hat{Z} = \max_{k\in [K]} m_k(X)$, the probability that the model assigns to its predicted class $\arg \max_{k\in [K]} m_k(X)$. Ideally, we would at least calibrate these predictions first, but we skip that step here.

In [22]:
predicted_labels = np.argmax(preds, axis=1)
proxy_performance = preds[np.arange(len(predicted_labels)), predicted_labels]
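
To make the indexing above concrete, here is the same construction on a tiny array. The probabilities in `toy_preds` are made up for illustration.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 examples and K = 3 classes.
toy_preds = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
])

toy_labels = np.argmax(toy_preds, axis=1)                      # predicted class per example
toy_proxy = toy_preds[np.arange(len(toy_labels)), toy_labels]  # probability of that class

print(toy_labels)  # [0 2 1]
print(toy_proxy)   # [0.7 0.6 0.5]

# The proxy is simply the largest probability in each row.
assert np.allclose(toy_proxy, toy_preds.max(axis=1))
```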

### 2. Stratify

SSRS requires dividing the population into strata, from which we will select which samples should be annotated. We form the strata by running k-means on the predictions, following the recommendations from the paper. However, other sample designs can be applied here as well, e.g., the strata could be formed by running a Gaussian mixture model on the feature representations of the data obtained from a neural network. 

In [23]:
evaluator = ModelPerformanceEvaluator(proxy_performance=proxy_performance, budget = budget)
evaluator.stratify_data(features=proxy_performance, clustering_algorithm=KMeans(n_clusters=10, random_state=0, n_init='auto'))
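
For intuition, the following sketch shows what k-means stratification on a one-dimensional proxy might look like with scikit-learn alone. The data, the number of clusters, and the variable names are assumptions for this example; `stratify_data` may handle these details differently.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
toy_proxy = rng.beta(2, 2, size=1_000)  # hypothetical confidence scores in (0, 1)

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10)
strata = kmeans.fit_predict(toy_proxy.reshape(-1, 1))

# Each stratum groups examples with similar proxy values.
strata_sizes = np.bincount(strata, minlength=5)
for h in range(5):
    members = toy_proxy[strata == h]
    print(f"stratum {h}: size={strata_sizes[h]}, "
          f"proxy range=[{members.min():.2f}, {members.max():.2f}]")
```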

### 3. Sample

We now sample from the data with SRS and with SSRS; for SSRS, the budget is first allocated across the strata. In practice, you would choose only one strategy.

In [24]:
srs_evaluator = ModelPerformanceEvaluator(proxy_performance=proxy_performance, budget = budget)
sample_indices_srs = srs_evaluator.sample_data(sampling_method = 'srs')

# allocate the budget across strata (proportional allocation here), then sample within each stratum
evaluator.allocate_budget(allocation_type="proportional")
sample_indices_ssrs = evaluator.sample_data(sampling_method="ssrs")
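
The difference between proportional and Neyman allocation can be sketched as follows. Proportional allocation sets $n_h \propto N_h$, while Neyman allocation sets $n_h \propto N_h S_h$, where $S_h$ is the standard deviation of $Z$ within stratum $h$. Since $Z$ is unknown before annotation, $S_h$ has to be guessed, for instance from the proxy. The helper `allocate`, the stratum sizes, and the standard deviations below are hypothetical and are not taken from `cascade`.

```python
import numpy as np

def allocate(budget, weights, strata_sizes):
    """Split `budget` across strata in proportion to `weights`."""
    weights = np.asarray(weights, dtype=float)
    strata_sizes = np.asarray(strata_sizes)
    raw = budget * weights / weights.sum()
    alloc = np.floor(raw).astype(int)
    # Give the leftover units to the strata with the largest fractional parts.
    leftover = budget - alloc.sum()
    order = np.argsort(-(raw - alloc))
    alloc[order[:leftover]] += 1
    # Never sample more than a stratum contains (this can leave budget unused).
    return np.minimum(alloc, strata_sizes)

strata_sizes = np.array([5000, 3000, 2000])  # N_h, made up
strata_std = np.array([0.10, 0.30, 0.50])    # guessed S_h, made up

proportional = allocate(100, strata_sizes, strata_sizes)
neyman = allocate(100, strata_sizes * strata_std, strata_sizes)

print(proportional)  # [50 30 20]
print(neyman)        # [21 37 42]
```

Neyman allocation shifts annotations toward the strata where the outcome is expected to vary the most, which is where extra labels reduce the variance of the estimate the most.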

### 4. Annotate

Pretend that in this step we annotate the selected samples. Here the labels are (luckily) already available in the torch file.

In [25]:
performance = (predicted_labels == np.array(df['Target']))

### 5. Estimate

Now we can estimate the performance on the full dataset from the annotated subset!
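
For a stratified sample, the Horvitz-Thompson estimate of the mean is the size-weighted average of the per-stratum sample means, and its variance is the weighted sum of the per-stratum SRS variances. The sketch below assumes at least two annotated examples per stratum; the function name `stratified_ht_estimate` and the toy numbers are illustrative and do not reflect how `compute_estimate` is implemented.

```python
import numpy as np

def stratified_ht_estimate(z_sample, strata_sample, strata_sizes):
    """HT estimate of the population mean from a stratified SRS, with its variance."""
    z = np.asarray(z_sample, dtype=float)
    strata_sample = np.asarray(strata_sample)
    strata_sizes = np.asarray(strata_sizes, dtype=float)
    N = strata_sizes.sum()

    estimate, variance = 0.0, 0.0
    for h, N_h in enumerate(strata_sizes):
        z_h = z[strata_sample == h]        # annotated outcomes in stratum h
        n_h = len(z_h)
        if n_h < 2:
            raise ValueError(f"stratum {h} needs at least 2 annotated examples")
        W_h = N_h / N                      # stratum weight
        estimate += W_h * z_h.mean()
        variance += W_h**2 * (1.0 - n_h / N_h) * z_h.var(ddof=1) / n_h
    return estimate, variance

# Hypothetical example: two strata of sizes 600 and 400, four annotations each.
z_toy = np.array([1, 1, 1, 0, 0, 0, 1, 0])
strata_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])
est, var = stratified_ht_estimate(z_toy, strata_toy, strata_sizes=[600, 400])
print(f"estimate={est:.4f}, variance={var:.5f}")
```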

In [30]:

estimates = pd.DataFrame({
    'SRS-HT': srs_evaluator.compute_estimate(performance[sample_indices_srs], estimator="ht"), 
    'SRS-DF': srs_evaluator.compute_estimate(performance[sample_indices_srs], estimator="df"), 
    'SSRS-opt-HT': evaluator.compute_estimate(performance[sample_indices_ssrs], estimator="ht")
})

estimates.index = ['estimate', 'variance']
estimates.T

| | estimate | variance |
|---|---|---|
| SRS-HT | 0.42 | 0.002428 |
| SRS-DF | 0.415192 | 0.002 |
| SSRS-opt-HT | 0.454791 | 0.002146 |
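
The variances can be turned into approximate confidence intervals. The sketch below uses a normal approximation with the 95% critical value 1.96, which is an assumption that may be rough for a sample of 100; the numbers are copied from the table above.

```python
import numpy as np

# (estimate, variance) pairs copied from the results table.
results = {
    "SRS-HT": (0.42, 0.002428),
    "SRS-DF": (0.415192, 0.002),
    "SSRS-opt-HT": (0.454791, 0.002146),
}

z_crit = 1.96  # two-sided 95% critical value of the standard normal
intervals = {}
for name, (estimate, variance) in results.items():
    half_width = z_crit * np.sqrt(variance)
    intervals[name] = (estimate - half_width, estimate + half_width)
    print(f"{name}: [{intervals[name][0]:.3f}, {intervals[name][1]:.3f}]")
```

For SRS-HT, for example, this gives an interval of roughly 0.32 to 0.52.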
