# Minimising computation time

`AutoEmulate` can be slow if the input data has many observations (rows) or many output variables. By default, `AutoEmulate` cross-validates each model with 5 folds, so it computes 5 fits per model. Computation time is relatively short for datasets of up to a few thousand data points, but some models (e.g. Gaussian Processes) scale poorly, so computation time can quickly become an issue.

In this tutorial we walk through four strategies to speed up `AutoEmulate`:

1) parallelise model fits using `n_jobs`
2) restrict the range of models using `model_subset` 
3) run fewer cross validation folds using `cross_validator` 
4) for hyperparameter search:
    - all of the above
    - run fewer iterations using `param_search_iters`

In [11]:
from sklearn.datasets import make_regression
from autoemulate.compare import AutoEmulate

Let's make a dataset.

In [12]:
X, y = make_regression(n_samples=500, n_features=10, n_targets=5)
X.shape, y.shape

((500, 10), (500, 5))

And see how long `AutoEmulate` takes to run (without hyperparameter search).

In [13]:
import time

start = time.time()

em = AutoEmulate()
em.setup(X, y)
em.compare()

end = time.time()
print(f"Time taken: {end - start} seconds")

| | Values |
|---|---|
| Simulation input shape (X) | (500, 10) |
| Simulation output shape (y) | (500, 5) |
| # hold-out set samples (test_set_size) | 100 |
| Do hyperparameter search (param_search) | False |
| Type of hyperparameter search (search_type) | random |
| # sampled parameter settings (param_search_iters) | 20 |
| Scale data before fitting (scale) | True |
| Scaler (scaler) | StandardScaler |
| Dimensionality reduction before fitting (reduce_dim) | False |
| Dimensionality reduction method (dim_reducer) | PCA |


Initializing:   0%|          | 0/11 [00:00<?, ?it/s]

Time taken: 69.29615592956543 seconds
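
The start/stop timing pattern used throughout this tutorial can be wrapped in a small context manager so that every experiment is timed the same way. This is a minimal sketch using only the standard library. `timed` is a hypothetical helper name, not part of `AutoEmulate`, and the `sum(...)` workload merely stands in for calls such as `em.setup(...)` and `em.compare()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Time taken"):
    """Hypothetical helper: time the enclosed block and print the result."""
    result = {"seconds": None}
    start = time.perf_counter()  # monotonic clock, well suited to measuring durations
    try:
        yield result
    finally:
        # Record and report the elapsed time even if the block raises.
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.2f} seconds")


# Stand-in workload; in this tutorial it would be em.setup(...) and em.compare().
with timed() as t:
    total = sum(i * i for i in range(100_000))
```

The elapsed time stays available in `t["seconds"]` after the block, which makes it easy to compare runs.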


### 1) parallelise model fits using `n_jobs`
The `n_jobs` parameter specifies the number of CPU cores to use for parallel processing. Setting `n_jobs=-1` uses all available cores, which speeds up computation when working with large datasets.

Note: Maxing out all available cores might not always lead to faster computation times. Due to overhead from parallelization, memory bandwidth limitations, and potential load imbalances, using more cores can sometimes result in diminishing returns or even slower performance.
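
One way to act on this note is to leave a core free for the rest of the system instead of using every core. The sketch below is an illustration only: `pick_n_jobs` is a hypothetical helper, not part of `AutoEmulate`, and reserving one core is an assumption that should be tuned for the machine at hand.

```python
import os


def pick_n_jobs(reserve=1):
    """Hypothetical helper: use all cores except `reserve`, but always at least one."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, available - reserve)


n_jobs = pick_n_jobs()
print(f"Using n_jobs={n_jobs}")
# The value could then be passed on, e.g. em.setup(X, y, n_jobs=n_jobs)
```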

Here we achieve a speed-up by setting `n_jobs` to 5.

In [14]:
start = time.time()

em = AutoEmulate()
em.setup(X, y, n_jobs=5, print_setup=False)
em.compare()

end = time.time()
print(f"Time taken: {end - start} seconds")

Initializing:   0%|          | 0/11 [00:00<?, ?it/s]

Time taken: 23.07688879966736 seconds
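
To put the two timings in perspective, the speed-up and the parallel efficiency (speed-up divided by the number of jobs) can be computed directly. The sketch below uses the rounded timings reported above, 69.30 s without `n_jobs` and 23.08 s with `n_jobs=5`. `speedup_and_efficiency` is a hypothetical helper name, and timings from a single run are noisy, so the result is only indicative.

```python
def speedup_and_efficiency(serial_seconds, parallel_seconds, n_jobs):
    """Return (speed-up, efficiency), where efficiency = speed-up / n_jobs."""
    speedup = serial_seconds / parallel_seconds
    efficiency = speedup / n_jobs
    return speedup, efficiency


# Rounded timings from the two runs above.
speedup, efficiency = speedup_and_efficiency(69.30, 23.08, n_jobs=5)
print(f"Speed-up: {speedup:.1f}x, efficiency: {efficiency:.0%}")
```

With these numbers, five jobs give roughly a threefold speed-up, or about 60% efficiency, which is consistent with the note that adding cores yields diminishing returns.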



### 2) restrict the range of models using `model_subset` 

Another approach is to limit the range of models by selecting a subset of relevant types based on your domain and problem expertise. This selection process typically considers factors such as the nature of the problem, data characteristics or the need for interpretability. By narrowing down the types of models, you can reduce the computational burden and focus on the most promising architectures for your specific task.

In [6]:
em = AutoEmulate()
em.setup(X, y, print_setup=False)

# let's see all models
em.print_model_names()

# setup with fewer models
start = time.time()

em.setup(X, y, model_subset=["sop", "rbf", "gb"], print_setup=False)
em.compare()

end = time.time()
print(f"Time taken: {end - start} seconds")

| | short name |
|---|---|
| SecondOrderPolynomial | sop |
| RadialBasisFunctions | rbf |
| RandomForest | rf |
| GradientBoosting | gb |
| GaussianProcess | gp |
| SupportVectorMachines | svm |
| LightGBM | lgbm |
| PyTorchMultiLayerPerceptron | ptmlp |
| PyTorchRadialBasisFunctionsNetwork | ptrbfn |
| NeuralNetSk | nns |


Initializing:   0%|          | 0/3 [00:00<?, ?it/s]

Time taken: 2.798367977142334 seconds


### 3) run fewer cross validation folds using `cross_validator` 

With larger datasets, you might initially want to set the number of cross-validation folds to 3 instead of 5 (the default), so that there are fewer models to fit. `AutoEmulate`'s `setup` method takes a `cross_validator` argument, which accepts a scikit-learn cross-validator or [splitter](https://scikit-learn.org/stable/api/sklearn.model_selection.html). Let's use `KFold` with 3 splits, which saves 2 fits per model.

In [15]:
from sklearn.model_selection import KFold

start = time.time()

em = AutoEmulate()
em.setup(X, y, cross_validator=KFold(n_splits=3), print_setup=False)
em.compare()

end = time.time()
print(f"Time taken: {end - start} seconds")

Initializing:   0%|          | 0/11 [00:00<?, ?it/s]

Time taken: 32.2603919506073 seconds


### 4) modify hyperparameter search

If we want to use hyperparameter search, we suddenly have to fit many more models. For each model, we might have 20 different parameter combinations, and because we cross validate each combination, we are running 20 * 5 = 100 model fits per model. It's therefore recommended to focus on a few models of interest when using hyperparameter search.
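
The arithmetic above can be written as a small helper to see how quickly the number of fits grows. This is a sketch only: `count_fits` is a hypothetical name, and it counts only the cross-validated fits described in the text, ignoring any additional refit that a particular search implementation might perform.

```python
def count_fits(n_models, n_folds=5, n_param_combinations=1):
    """Total fits when every parameter combination is cross-validated for every model."""
    return n_models * n_folds * n_param_combinations


# One model, default 5 folds, no hyperparameter search.
print(count_fits(1))
# One model, default 5 folds, 20 parameter combinations.
print(count_fits(1, n_param_combinations=20))
# One model, 3 folds, 50 parameter combinations.
print(count_fits(1, n_folds=3, n_param_combinations=50))
```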

To get a ballpark figure for how long hyperparameter search might take, we can run `AutoEmulate` without hyperparameter search, and then multiply the time taken by the number of parameter combinations we want to try.

In [8]:
start = time.time()

em = AutoEmulate()
em.setup(X, y, print_setup=False)
em.compare()

end = time.time()
run_time = end - start
print(f"Time taken: {run_time} seconds")

Initializing:   0%|          | 0/11 [00:00<?, ?it/s]

Time taken: 48.87414813041687 seconds
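
The rule of thumb above (multiply the baseline run time by the number of parameter combinations) can be sketched as a small function. `estimate_search_seconds` is a hypothetical helper. The adjustments for folds and `n_jobs` assume that run time scales linearly with the number of folds and inversely with the number of jobs, which is optimistic, since parallel speed-ups are rarely ideal.

```python
def estimate_search_seconds(baseline_seconds, n_iters=20, baseline_folds=5,
                            folds=5, n_jobs=1):
    """Rough estimate: linear in iterations and folds, ideal scaling with n_jobs."""
    return baseline_seconds * n_iters * (folds / baseline_folds) / n_jobs


baseline = 48.87  # seconds, rounded from the run without hyperparameter search above

default_estimate = estimate_search_seconds(baseline)
optimistic_estimate = estimate_search_seconds(baseline, folds=3, n_jobs=5)

print(f"Default settings: about {default_estimate / 60:.1f} minutes")
print(f"3 folds, 5 jobs (optimistic): about {optimistic_estimate / 60:.1f} minutes")
```

Treat the second figure as a lower bound rather than a prediction.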


The default number of parameter combinations to search over (see `param_search_iters`) is 20, so we can expect hyperparameter search to take roughly `20 * run_time` seconds. This can be sped up by running in parallel or using fewer cross-validation folds, but we are usually interested in only a few emulator models anyway, and restricting the search to those reduces computation time further. To decide which models to optimise, let's inspect the cross-validation results from the training data.

In [16]:
em.print_results()

| | model | short | r2 | rmse |
|---|---|---|---|---|
| 0 | RadialBasisFunctions | rbf | 1.0 | 0.0 |
| 1 | SecondOrderPolynomial | sop | 1.0 | 0.0 |
| 2 | GaussianProcess | gp | 0.9999 | 1.3354 |
| 3 | ConditionalNeuralProcess | cnp | 0.996 | 10.4046 |
| 4 | NeuralNetSk | nns | 0.9377 | 40.0966 |
| 5 | SupportVectorMachines | svm | 0.8984 | 55.6115 |
| 6 | LightGBM | lgbm | 0.878 | 61.048 |
| 7 | GradientBoosting | gb | 0.8602 | 65.4266 |
| 8 | RandomForest | rf | 0.6716 | 100.154 |
| 9 | PyTorchMultiLayerPerceptron | ptmlp | -0.0065 | 175.3482 |
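
One possible way to shortlist models for tuning is to keep those that are reasonable but not yet near-perfect. The sketch below copies the short names and r2 values from the results above into a plain list. `tuning_candidates` and its thresholds (0.8 and 0.99) are hypothetical choices made for illustration, not something `AutoEmulate` prescribes.

```python
# (short name, r2) pairs copied from the cross-validation results above.
cv_r2 = [
    ("rbf", 1.0), ("sop", 1.0), ("gp", 0.9999), ("cnp", 0.996),
    ("nns", 0.9377), ("svm", 0.8984), ("lgbm", 0.878), ("gb", 0.8602),
    ("rf", 0.6716), ("ptmlp", -0.0065),
]


def tuning_candidates(results, lower=0.8, upper=0.99):
    """Hypothetical rule: keep models whose r2 lies in [lower, upper)."""
    return [short for short, r2 in results if lower <= r2 < upper]


candidates = tuning_candidates(cv_r2)
print(candidates)
# The list, or a hand-picked part of it, could then be passed as model_subset.
```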


Now, we might like to see whether the Support Vector Machines model could do better with tuned hyperparameters. To minimise computation time, we select only this one model and run 50 iterations (instead of the default 20), in parallel using 5 jobs, with 3-fold cross-validation (instead of the default 5).

In [31]:
start = time.time()

em_svm = AutoEmulate()
em_svm.setup(X, y, model_subset=["svm"], param_search=True, param_search_iters=50, 
         n_jobs=5, cross_validator=KFold(n_splits=3), print_setup=False)
em_svm.compare()
em_svm.print_results()

end = time.time()
print(f"Time taken: {end - start} seconds")

Initializing:   0%|          | 0/1 [00:00<?, ?it/s]

| | model | short | r2 | rmse |
|---|---|---|---|---|
| 0 | SupportVectorMachines | svm | 0.9974 | 8.807 |


Time taken: 0.40414905548095703 seconds


### todo: inspect hyperparameters

### Additional information

`AutoEmulate` version: 0.1.0.post1