# Batch-aware query strategies

These strategies take into account that several samples are queried at once, and they try to pick samples that do not carry the same information.

## The core set approach

A **core set** is a subset of a dataset such that a model trained on the subset is close to the model that would result from training on the entire dataset. The idea is to select only samples that belong to such a core set. Concretely, we should **choose a batch such that, when it is added to the labeled set, the maximum distance between an unlabeled example and its nearest labeled example is minimized.**
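The selection criterion above can be sketched with a greedy k-center heuristic: repeatedly pick the point that is currently farthest from every labeled or already selected point. This is only a minimal illustration under stated assumptions. The helper name `k_center_greedy`, the Euclidean distance, and the toy data are hypothetical choices made for this sketch, not part of scikit-activeml, and the greedy rule only approximates the optimal batch.

```python
import numpy as np

def k_center_greedy(X, labeled_idx, batch_size):
    """Greedily pick `batch_size` rows of X so that the largest distance
    from any point to its nearest labeled/selected point shrinks."""
    X = np.asarray(X, dtype=float)
    # Distance of every point to its nearest labeled point (inf if nothing is labeled).
    min_dist = np.full(len(X), np.inf)
    for j in labeled_idx:
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[j], axis=1))
    batch = []
    for _ in range(batch_size):
        # The point farthest from all current centers becomes the next center.
        i = int(np.argmax(min_dist))
        batch.append(i)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[i], axis=1))
    # The remaining maximum is the "covering radius" the criterion tries to minimize.
    return batch, float(min_dist.max())

# Toy 1-D pool (hypothetical data); only the first point is labeled.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
batch, radius = k_center_greedy(X_toy, labeled_idx=[0], batch_size=2)
print(batch, radius)
```

On this toy pool the two selected points come from the two regions far from the labeled point, rather than from the cluster that is already covered, which is the behavior a batch-aware strategy is after.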

In [1]:
# imports

import numpy as np

from skactiveml.classifier import SklearnClassifier
from skactiveml.pool import KLDivergenceMaximization
from skactiveml.utils import MISSING_LABEL

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.preprocessing import LabelBinarizer
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

In [4]:
# Assumption: the installed scikit-activeml version provides `skactiveml.pool.CoreSet`.
# Note: `KLDivergenceMaximization` targets regression (its `reg` argument must be a
# `ProbabilisticRegressor`), so a classification-compatible strategy is used here.
from skactiveml.pool import CoreSet

def al_batch(clf, batch_size=1, iterations=1, data_size=100):
    """Return the test accuracy after each query round, averaged over `iterations` runs."""
    base_clf = clf
    data = []
    qs = CoreSet(random_state=0)
    for rand in range(iterations):
        # Create the data and hold out a test split
        X, y_true = make_classification(n_samples=data_size*4, n_features=2, n_redundant=0, weights=[0.8,0.2], random_state=rand)
        Xf, Xt, yf, yt = train_test_split(X, y_true, random_state=rand)
        clf = SklearnClassifier(base_clf, classes=np.unique(yf))
        # Start with every training label missing
        y = np.full(shape=yf.shape, fill_value=MISSING_LABEL)

        clf.fit(Xf, y)
        out = []
        for _ in range(data_size // 5):
            # Core-set selection only needs the features and the current labels
            query_idx = qs.query(Xf, y, batch_size=batch_size)
            y[query_idx] = yf[query_idx]
            clf.fit(Xf, y)
            out.append(clf.score(Xt, yt))
        data.append(out)
    return np.mean(np.array(data), axis=0)

In [5]:
data = al_batch(LogisticRegression(), batch_size=5, data_size=150)