# Basic Usage
The falsification pooler identifies experiment conditions $\vec{x}$ under which the loss $\hat{\mathcal{L}}(M,X,Y,\vec{x})$ of the best candidate model is predicted to be highest. This loss is approximated with a multi-layer perceptron, which is trained to predict the loss of a candidate model $M$, given the experiment conditions $X$ and dependent measures $Y$ that have already been probed.

We begin by importing the relevant packages.

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression
from autora.variable import DV, IV, ValueType, VariableCollection
from autora.experimentalist.pooler.falsification import falsification_pooler

## Example 1: Sampling from a Sine Function

In this example, we will consider a dataset resembling the sine function. We will then fit a linear model to the data and use the falsification pooler to identify experiment conditions under which the model is predicted to perform the worst.

First, we define the independent variable $x$ and the dependent variable $y$. We consider the domain $x \in [0, 2\pi]$ and take 100 evenly spaced data points from it.

In [2]:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
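
The idea from the introduction can be illustrated without the pooler itself. The following is a minimal sketch, not the `falsification_pooler` implementation: it assumes squared error as the loss, uses scikit-learn's `MLPRegressor` as a stand-in for the loss-predicting network, and picks the candidate conditions with the highest predicted loss. The names `loss_net`, `pool`, and `selected` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Conditions X and measurements Y that have already been probed.
x = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(x).ravel()

# Candidate model M: a linear fit to the sine-shaped data.
model = LinearRegression().fit(x, y)

# Assumption: squared error as the per-condition loss of the candidate model.
observed_loss = (model.predict(x) - y) ** 2

# Stand-in for the loss-predicting network: a small MLP mapping x -> loss.
loss_net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
loss_net.fit(x, observed_loss)

# Score a pool of candidate conditions and keep the n with the highest predicted loss.
pool = np.arange(1, 11).reshape(-1, 1)
predicted_loss = loss_net.predict(pool)
n = 2
top = np.argsort(predicted_loss)[::-1][:n]
selected = pool[top].ravel()
print(selected)
```

The selected conditions depend on how well the network fits the observed losses, so the printed values may vary with the network settings.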

We define the candidate experimental conditions $X'$ from which we seek to sample.

In [3]:
X_prime = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Next, we need to specify how many samples we would like to collect. In this case, we pick $n=2$.

In [4]:
n = 2

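Finally, we can call the novelty sampler. Note that $X'$ is passed as the condition pool, followed by the "reference" conditions $X$ and the number of samples. As written, this cell fails: `novelty_sampler` is not among the imports in the first cell, and the reference conditions `X` have not been defined at this point, so the call raises the `NameError` shown after the cell.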

In [5]:
X_sampled = novelty_sampler(condition_pool=X_prime, reference_conditions=X, num_samples=n, metric="euclidean", integration="sum")
print(X_sampled)

NameError: name 'novelty_sampler' is not defined

The novelty sampler also works for experiments with multiple independent variables. In the following example, we define $X$ as a single experimental condition composed of three independent factors. We choose from a pool $X'$ composed of four experimental conditions.

In [None]:
X = np.array([[1, 1, 1]])
X_prime = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

Next, we sample a single experimental condition from the pool $X'$ which yields the greatest summed Euclidean distance to the existing condition in $X$.

In [None]:
X_sampled = novelty_sampler(condition_pool=X_prime, reference_conditions=X, num_samples=1, metric="euclidean", integration="sum")
print(X_sampled)
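
To make the selection criterion concrete, here is a minimal NumPy sketch of "greatest summed Euclidean distance to the reference conditions". It is an illustration of the criterion only, not the sampler's actual code; the names `distances`, `novelty`, and `most_novel` are hypothetical.

```python
import numpy as np

# Reference condition X and candidate pool X', as defined above.
X = np.array([[1, 1, 1]])
X_prime = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Pairwise Euclidean distances, shape (n_pool, n_reference).
distances = np.linalg.norm(X_prime[:, None, :] - X[None, :, :], axis=-1)

# "sum" integration: add up the distances to all reference conditions.
novelty = distances.sum(axis=1)

# Keep the single pool condition that is farthest from the references.
most_novel = X_prime[np.argsort(-novelty)[:1]]
print(most_novel)  # [[10 11 12]]
```

With a single reference condition, the summed distance reduces to the plain distance to that condition, so the farthest pool entry is selected.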

We can also obtain "novelty" scores for the sampled experiment conditions using `novelty_score_sampler`. The scores are z-scored with respect to all conditions in the pool. In the following example, we sample two conditions and return their novelty scores.

In [None]:
X_sampled, scores = novelty_score_sampler(condition_pool=X_prime, reference_conditions=X, num_samples=2, metric="euclidean", integration="sum")
print(X_sampled)
print(scores)

The novelty scores align with the sampled experiment conditions (in descending order of the novelty score).
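
The relationship between z-scored novelty and the returned ordering can be sketched in plain NumPy. This is a hedged illustration rather than the sampler's implementation: it assumes the summed Euclidean distance as the raw score and the population standard deviation for z-scoring, and the names `z_scores`, `top_conditions`, and `top_scores` are hypothetical.

```python
import numpy as np

# Reference condition and candidate pool from the example above.
X = np.array([[1, 1, 1]])
X_prime = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Raw novelty: summed Euclidean distance of each pool condition to the references.
novelty = np.linalg.norm(X_prime[:, None, :] - X[None, :, :], axis=-1).sum(axis=1)

# Assumption: z-score over the whole pool using the population standard deviation.
z_scores = (novelty - novelty.mean()) / novelty.std()

# Take the two highest-scoring conditions, in descending order of score.
order = np.argsort(-z_scores)[:2]
top_conditions = X_prime[order]
top_scores = z_scores[order]
print(top_conditions)
print(top_scores)
```

Because the scores are standardized over the entire pool, they are relative: a positive score marks a condition that is farther from the references than the pool average.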