### Looking at subsets of activations with DeepScan

This notebook shows the code steps to:

1. Select hyperparameters to run DeepScan.
2. Load $H_0$ and $H_1$ activations.
3. Extract activations across several layers.
4. Run DeepScan and save output files and metrics.

In [1]:
from deepscan.util.sampler import Sampler
from deepscan.util.pvalranges_calculator import PvalueCalculator
from deepscan.util.utils import scan_write_metrics, customsort
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np

The hyperparameters defined below are:
1. Type of run (`typerun`): `individual` looks at one statement at a time and yields the activations that make that sample the most divergent; `group` yields the subset of statements that is the most divergent with a fixed subset of activations.
2. Scoring function (`scoring`): here, `hc` stands for the [Higher-Criticism test](https://arxiv.org/pdf/math/0410072). DeepScan implements several scoring functions. Compared to `bj`, `hc` tends to yield smaller subsets with very extreme p-value ranges, because such subsets produce large values in the numerator and small values in the denominator of the score.
3. Sample sizes at evaluation time: `clean_ssize` and `anom_ssize` set how many clean and anomalous samples are drawn for each run.
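
To make the numerator and denominator remark concrete, here is a minimal sketch of the two scores in the form commonly used in nonparametric scan statistics. It assumes that `bj` refers to the Berk-Jones statistic and that both scores are evaluated at a significance threshold `alpha`; the function names `hc_score` and `bj_score` are illustrative and are not part of the DeepScan API.

```python
import math


def hc_score(n_alpha: int, n: int, alpha: float) -> float:
    """Higher-Criticism score for n_alpha of n p-values falling below alpha."""
    # Numerator: excess of significant p-values over the n * alpha expected under H_0.
    # Denominator: standard deviation of that count under H_0.
    return (n_alpha - n * alpha) / math.sqrt(n * alpha * (1 - alpha))


def bj_score(n_alpha: int, n: int, alpha: float) -> float:
    """Berk-Jones score: n times the KL divergence between n_alpha / n and alpha."""
    observed = n_alpha / n
    if observed <= alpha:
        return 0.0  # no excess of small p-values, so nothing to score
    kl = observed * math.log(observed / alpha)
    if observed < 1:
        kl += (1 - observed) * math.log((1 - observed) / (1 - alpha))
    return n * kl


# Hypothetical subset: 100 p-values, 20 of them below alpha = 0.05
# (only 5 would be expected under H_0).
print(hc_score(20, 100, 0.05), bj_score(20, 100, 0.05))
```

In this form, a small `alpha` shrinks the denominator of `hc_score`, which is one way to see why it can favor small subsets with very extreme p-values.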

In [2]:
PATH_NEG = "./data/activations/llama3_negative_31_anti-immigration.npy"
PATH_POS = "./data/activations/llama3_positive_31_anti-immigration.npy"
typerun = "group"
scoring = "hc"
model = "Meta-Llama-3-8B-Instruct"
dataset = "anti-immigration"

In [3]:
size = 50
bg = np.load(PATH_NEG, allow_pickle=True)
abnormal = np.load(PATH_POS, allow_pickle=True)

clean = bg[:size]
bg = bg[size:]

print(bg.shape, abnormal.shape, clean.shape)

(250, 4096) (300, 4096) (50, 4096)


In [4]:
runs = {
    "group": {
        "clean": {"clean_ssize": 50, "anom_ssize": 0},
        "abnormal": {"clean_ssize": 25, "anom_ssize": 25},
    }
}
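
As a rough illustration of what these sizes mean, the sketch below draws test sets of `clean_ssize` clean records plus `anom_ssize` anomalous records, once per run. It is an assumption about the sampling scheme, not the implementation of `Sampler.sample()`; the helper `sample_test_sets` and the toy arrays are hypothetical.

```python
import numpy as np


def sample_test_sets(clean_records, anom_records, clean_ssize, anom_ssize,
                     number_runs, seed=0):
    """Draw number_runs test sets mixing clean and anomalous records."""
    rng = np.random.default_rng(seed)
    test_sets = []
    for _ in range(number_runs):
        # Sample without replacement within each run.
        clean_idx = rng.choice(len(clean_records), size=clean_ssize, replace=False)
        anom_idx = rng.choice(len(anom_records), size=anom_ssize, replace=False)
        test_sets.append(
            np.concatenate([clean_records[clean_idx], anom_records[anom_idx]])
        )
    return np.stack(test_sets)


# Toy stand-ins for the clean and abnormal records (8 features instead of 4096).
toy_clean = np.random.default_rng(1).normal(size=(50, 8))
toy_abnormal = np.random.default_rng(2).normal(loc=1.0, size=(300, 8))

clean_runs = sample_test_sets(toy_clean, toy_abnormal, 50, 0, number_runs=3)
mixed_runs = sample_test_sets(toy_clean, toy_abnormal, 25, 25, number_runs=3)
print(clean_runs.shape, mixed_runs.shape)
```

With the sizes above, both the `clean` and the `abnormal` configuration produce test sets of 50 records; only the share of anomalous records differs.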

The steps to run DeepScan are:
1. Create the expectation under $H_0$ with `PvalueCalculator(bg)`.
2. Obtain p-value ranges for the test-set samples from both $H_0$ and $H_1$ with `pvalcalculator.get_pvalue_ranges()`.
3. Randomly sample test sets with the sizes defined in the hyperparameters using `Sampler.sample()`.
4. Run the scanning search, scoring, and output saving with `scan_write_metrics()`.
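
Steps 1 and 2 can be pictured with the following sketch of empirical p-value ranges. It assumes a right-tailed test in which each test activation is compared, node by node, against the background activations; it is not the code behind `PvalueCalculator`, and `empirical_pvalue_ranges` is a hypothetical name.

```python
import numpy as np


def empirical_pvalue_ranges(background, test):
    """Return an array of shape (n_test, n_nodes, 2) holding [p_min, p_max]."""
    n_bg, n_nodes = background.shape
    sorted_bg = np.sort(background, axis=0)
    p_min = np.empty(test.shape, dtype=float)
    p_max = np.empty(test.shape, dtype=float)
    for j in range(n_nodes):
        col = sorted_bg[:, j]
        # Background values strictly greater than each test activation.
        greater = n_bg - np.searchsorted(col, test[:, j], side="right")
        # Background values greater than or equal to each test activation.
        greater_equal = n_bg - np.searchsorted(col, test[:, j], side="left")
        p_min[:, j] = greater / (n_bg + 1)
        p_max[:, j] = (greater_equal + 1) / (n_bg + 1)
    return np.stack([p_min, p_max], axis=-1)


# Toy example with one node: four background activations, three test activations.
toy_bg = np.array([[1.0], [2.0], [3.0], [4.0]])
toy_test = np.array([[2.5], [5.0], [2.0]])
toy_ranges = empirical_pvalue_ranges(toy_bg, toy_test)
print(toy_ranges[:, 0, :])
```

An activation larger than every background value gets a range that starts at 0, which is the kind of extreme p-value range the scoring function rewards.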

In [7]:
number_runs = 100

for key in ["clean", "abnormal"]:
    print("Run for key: {}".format(key))
    clean_ssize = runs[typerun][key]["clean_ssize"]
    anom_ssize = runs[typerun][key]["anom_ssize"]

    # Default to the clean output file, then switch to the adv file
    # whenever anomalous samples are mixed into the test set.
    if (clean_ssize != 1 and typerun == "group") or (
        clean_ssize == 1 and typerun == "individual"
    ):
        resultsfile = "./output/clean_output_{}_{}.txt".format(dataset, scoring)
    if anom_ssize != 0:
        resultsfile = "./output/adv_output_{}_{}.txt".format(dataset, scoring)

    bg = customsort(bg, conditional=False)
    pvalcalculator = PvalueCalculator(bg)

    records_pvalue_ranges = pvalcalculator.get_pvalue_ranges(clean, pvaltest="1tail")
    anom_records_pvalue_ranges = pvalcalculator.get_pvalue_ranges(
        abnormal, pvaltest="1tail"
    )

    if anom_ssize == 1 and clean_ssize == 0:
        run = anom_records_pvalue_ranges.shape[0]

    elif clean_ssize == 1 and anom_ssize == 0:
        run = records_pvalue_ranges.shape[0]

    samples, _ = Sampler.sample(
        records_pvalue_ranges,
        anom_records_pvalue_ranges,
        clean_ssize,
        anom_ssize,
        number_runs,
        conditional=False,
    )

    pool = Pool(processes=5)
    calls = []

    for r_indx in range(number_runs):
        pred_classes = None
        run_sampled_indices = None
        sampled_indices = None

        calls.append(
            pool.apply_async(
                scan_write_metrics,
                [
                    samples[r_indx],
                    pred_classes,
                    clean_ssize,
                    anom_ssize,
                    resultsfile,
                    1,
                    False,
                    None,
                    scoring,
                    -1,
                    run_sampled_indices,
                ],
            )
        )

    print("Beginning Scanning...")
    for sample in tqdm(calls):
        sample.get()

    # Release the worker processes before the next key starts a new pool.
    pool.close()
    pool.join()

Run for key: clean
Beginning Scanning...


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:02<00:00, 37.43it/s]


Run for key: abnormal
Beginning Scanning...


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:02<00:00, 39.14it/s]


In [6]:
from utils.utils_nodes import get_anom_nodes

PATH_OUTPUT_ABN = "output/adv_output_{}_{}.txt".format(dataset, scoring)
PATH_OUTPUT_CLN = "output/clean_output_{}_{}.txt".format(dataset, scoring)

_, _, precision, recall, _ = get_anom_nodes(PATH_OUTPUT_ABN)
print(
    "Precision for {} - {}".format(model, dataset),
    np.array(precision).mean(),
    np.array(precision).std(),
)
print(
    "Recall for {} - {}".format(model, dataset),
    np.array(recall).mean(),
    np.array(recall).std(),
)

Precision for Meta-Llama-3-8B-Instruct - anti-immigration 0.802633755838154 0.06394636294421967
Recall for Meta-Llama-3-8B-Instruct - anti-immigration 0.8488888888888889 0.1213467266075249
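
For reference, the sketch below shows how precision and recall are conventionally computed for a single run, given the indices of the samples flagged by the scan and the indices of the samples known to be anomalous. The internals of `get_anom_nodes` are not shown in this notebook, so this is an assumption about the metric definitions rather than a copy of that helper; `precision_recall` and the example indices are hypothetical.

```python
def precision_recall(detected, anomalous):
    """Precision and recall of a detected subset against the known anomalies."""
    detected, anomalous = set(detected), set(anomalous)
    true_positives = len(detected & anomalous)
    # Precision: share of flagged samples that are truly anomalous.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of anomalous samples that were flagged.
    recall = true_positives / len(anomalous) if anomalous else 0.0
    return precision, recall


# Hypothetical run with 50 samples: indices 25..49 are anomalous,
# and the scan flags indices 20..44.
example_precision, example_recall = precision_recall(range(20, 45), range(25, 50))
print(example_precision, example_recall)
```

The values printed by the notebook are the mean and standard deviation of these per-run metrics over the 100 runs.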
