# Quickstart


`AutoEmulate`'s goal is to make it easy to create an emulator for your simulation. Here's the basic workflow:

In [None]:
import numpy as np
import random
import torch
from autoemulate.compare import AutoEmulate
from autoemulate.experimental_design import LatinHypercube
from autoemulate.simulations.projectile import simulate_projectile

In [None]:
seed = 43 
np.random.seed(seed)
random.seed(seed)
_ = torch.manual_seed(seed)

## Design of Experiments

Before we build an emulator (also called a surrogate model), we need a set of input/output pairs from the simulation. Choosing which inputs to run is called the **Design of Experiments (DoE)**. It is currently not a key part of `AutoEmulate`, because this step is tricky to automate and, for expensive simulations, usually runs on more complex compute infrastructure. There are many sampling techniques; here we use Latin Hypercube Sampling.

Below, `simulate_projectile` simulates projectile motion with drag (see [here](https://mogp-emulator.readthedocs.io/en/latest/intro/tutorial.html) for details). It takes two inputs, the drag coefficient (on a log scale) and the velocity, and outputs the distance the projectile travelled. We sample 100 sets of inputs `X` using a Latin Hypercube Sampler and run the simulator on those inputs to get the outputs `y`.

In [None]:
# sample from a simulation
lhd = LatinHypercube([(-5., 1.), (0., 1000.)]) # (lower, upper) bounds for each parameter
X = lhd.sample(100)
y = np.array([simulate_projectile(x) for x in X])
X.shape, y.shape
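
To make the sampling step less of a black box, here is a minimal NumPy sketch of the idea behind Latin Hypercube Sampling. It is an illustration only, not the `LatinHypercube` class used above: the function `latin_hypercube` and the variable `X_demo` are hypothetical names introduced here. Each parameter range is cut into as many equal-width strata as there are samples, one point is drawn in each stratum, and the strata are shuffled independently per dimension.

```python
import numpy as np

def latin_hypercube(bounds, n_samples, rng):
    """Draw n_samples points with exactly one point per stratum in every dimension."""
    bounds = np.asarray(bounds, dtype=float)  # shape (n_dims, 2), rows are (lower, upper)
    n_dims = bounds.shape[0]
    # One uniform draw inside each of the n_samples equal-width strata of [0, 1).
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently per dimension so the columns are decoupled.
    for j in range(n_dims):
        unit[:, j] = rng.permutation(unit[:, j])
    lower, upper = bounds[:, 0], bounds[:, 1]
    # Rescale from the unit cube to the requested parameter ranges.
    return lower + unit * (upper - lower)

rng = np.random.default_rng(43)
X_demo = latin_hypercube([(-5.0, 1.0), (0.0, 1000.0)], 100, rng)
print(X_demo.shape)
```

Compared with plain uniform sampling, this guarantees that every slice of each parameter range is visited once, which matters when each simulator run is expensive.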

## Comparing emulators

This is the core of `AutoEmulate`. With a set of inputs and outputs, we can run a full machine learning pipeline, including data processing, model fitting, model selection and, optionally, hyperparameter optimisation, in just a few lines of code. First, we initialise an `AutoEmulate` object. Then, we run `setup(X, y)`, providing the simulation inputs and outputs. Lastly, `compare()` fits a range of different models to the data and evaluates them using cross-validation, returning the best emulator.

In [None]:
# compare emulator models
ae = AutoEmulate()
ae.setup(X, y)
ae.compare()
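
For intuition, the comparison step can be pictured as the following scikit-learn sketch. It is not `AutoEmulate`'s internal implementation: the toy data (`X_toy`, `y_toy`), the three candidate models and the choice of five folds with R² scoring are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for simulator data (not the projectile model).
rng = np.random.default_rng(43)
X_toy = rng.uniform(-1.0, 1.0, size=(80, 2))
y_toy = np.sin(3.0 * X_toy[:, 0]) + X_toy[:, 1] ** 2

# Candidate emulators; scaling is part of each pipeline so it is refit per fold.
candidates = {
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=43),
    "GaussianProcess": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True),
    ),
}

# Score every candidate on the same folds and keep the mean R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=43)
cv_scores = {
    name: cross_val_score(model, X_toy, y_toy, cv=cv, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores, best_name)
```

Using the same `KFold` object for every candidate keeps the comparison fair, because all models are scored on identical train/validation splits.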

We can have a look at the average cross-validation results for each model:

In [None]:
ae.summarise_cv()

And create plots comparing the models:

In [None]:
ae.plot_cv()

## Evaluating on the test set

`AutoEmulate` has already split the data into a training set and a test set. After looking at the cross-validation results, we can retrieve a fitted emulator and evaluate it on the test set. Here we pick the Gaussian process (GP), which predicts well on unseen data.

In [None]:
gp = ae.get_model("GaussianProcess")
ae.evaluate(gp)
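
As a rough picture of what a held-out evaluation involves, the sketch below splits toy data, fits on the training part only and scores on the rest. The 80/20 split, the random forest and the two metrics (RMSE and R²) are assumptions chosen for illustration; the metrics that `evaluate()` reports may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Toy stand-in for simulator data (not the projectile model).
rng = np.random.default_rng(43)
X_toy = rng.uniform(-1.0, 1.0, size=(200, 2))
y_toy = np.sin(3.0 * X_toy[:, 0]) + X_toy[:, 1] ** 2

# Hold back 20% of the samples; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=43
)
model = RandomForestRegressor(n_estimators=100, random_state=43).fit(X_train, y_train)

# Score predictions against the held-out outputs.
y_pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)
print(f"RMSE={rmse:.3f}, R2={r2:.3f}")
```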

But it's always useful to plot the predictions too.

In [None]:
ae.plot_eval(gp, input_index=[0, 1])

## Refitting the emulator

Before applying the emulator, we refit it on the entire dataset, that is, on both the training and test sets. This is done with the `refit()` method.


In [None]:
gp_final = ae.refit(gp)
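
In plain scikit-learn terms, refitting could look like the sketch below. This is an assumption about the general pattern, not a description of how `refit()` is implemented; `selected` and `final_model` are hypothetical names, and the ridge model on toy data stands in for the chosen emulator.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy stand-in for simulator data (not the projectile model).
rng = np.random.default_rng(43)
X_toy = rng.uniform(-1.0, 1.0, size=(100, 2))
y_toy = 2.0 * X_toy[:, 0] - X_toy[:, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=43
)
# Stand-in for the model chosen during comparison, fitted on training data only.
selected = Ridge(alpha=1.0).fit(X_train, y_train)

# clone() copies the hyperparameters but not the fitted state, so the final
# model is trained from scratch on every available sample.
final_model = clone(selected).fit(X_toy, y_toy)
```

After this step there is no untouched data left to measure performance on, which is why evaluation comes before refitting.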

## Predictions

We can now use the refitted model to make predictions for new inputs. Emulators in `AutoEmulate` are `scikit-learn` estimators, so we can use the `predict` method to make predictions.

In [None]:
gp_final.predict(X[:10])
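
Gaussian processes can also report how uncertain each prediction is. The sketch below uses scikit-learn's own `GaussianProcessRegressor`, whose `predict` accepts `return_std=True`. Whether the emulator returned by `AutoEmulate` accepts the same argument is not confirmed here, so treat this as an illustration on toy data with the hypothetical names `gp_toy` and `X_new`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in for simulator data (not the projectile model).
rng = np.random.default_rng(43)
X_toy = rng.uniform(-1.0, 1.0, size=(40, 2))
y_toy = np.sin(3.0 * X_toy[:, 0]) + X_toy[:, 1] ** 2

gp_toy = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True)
gp_toy.fit(X_toy, y_toy)

# New inputs must have the same number of columns as the training inputs.
X_new = np.array([[0.0, 0.0], [0.5, -0.5]])
mean, std = gp_toy.predict(X_new, return_std=True)
print(mean.shape, std.shape)
```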