# Example of Topic Model Optimization 

In this notebook, we illustrate how to optimize the hyperparameters of the LDA model through Bayesian Optimization.

In [None]:
# Colab configuration

In [None]:
!pip install octis
from pathlib import Path
Path("/content/M10").mkdir(parents=True, exist_ok=True)
!wget -P /content/M10/ https://raw.githubusercontent.com/MIND-Lab/OCTIS/master/octis/preprocessed_datasets/M10/corpus.txt
!wget -P /content/M10/ https://raw.githubusercontent.com/MIND-Lab/OCTIS/master/octis/preprocessed_datasets/M10/labels.txt
!wget -P /content/M10/ https://raw.githubusercontent.com/MIND-Lab/OCTIS/master/octis/preprocessed_datasets/M10/metadata.json
!wget -P /content/M10/ https://raw.githubusercontent.com/MIND-Lab/OCTIS/master/octis/preprocessed_datasets/M10/vocabulary.txt
import sys
sys.path.insert(0,'/content/OCTIS')

Load the libraries.

In [None]:
from octis.models.LDA import LDA
from octis.dataset.dataset import Dataset
from octis.optimization.optimizer import Optimizer
from skopt.space.space import Real,Categorical
from octis.evaluation_metrics.coherence_metrics import Coherence

Choose a dataset.

In [None]:
dataset = Dataset()
dataset.load_custom_dataset_from_folder("/content/M10")

Choose a model.

In [None]:
model = LDA(num_topics=25, iterations=200)

Choose the metric function to optimize.

In [None]:
metric_parameters = {
        'texts': dataset.get_corpus(),
        'topk': 10,
        'measure': 'c_npmi'
}
npmi = Coherence(metric_parameters)
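
To give a feel for what an NPMI-based coherence score measures, here is a minimal sketch in plain Python. It is a simplification and not the code the library runs: probabilities are estimated from whole-document co-occurrence on a hypothetical toy corpus, whereas the `c_npmi` measure may estimate them differently (for example over sliding windows). The helper names `npmi` and `topic_npmi` and the handling of the degenerate cases are assumptions made for this illustration.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, docs):
    """NPMI of two words, estimated from document-level co-occurrence."""
    n_docs = len(docs)
    p_a = sum(word_a in d for d in docs) / n_docs
    p_b = sum(word_b in d for d in docs) / n_docs
    p_ab = sum(word_a in d and word_b in d for d in docs) / n_docs
    if p_ab == 0.0:
        return -1.0  # the words never co-occur: lower bound by convention
    if p_ab == 1.0:
        return 0.0  # both words in every document: convention chosen for this sketch
    # PMI normalized by -log P(a, b), which maps the score into [-1, 1].
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_npmi(top_words, docs):
    """Average NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, docs) for a, b in pairs) / len(pairs)


# Hypothetical toy corpus: each document is a set of tokens.
toy_docs = [{"cat", "dog"}, {"cat", "dog"}, {"car", "road"}, {"car", "road"}]

coherent = topic_npmi(["cat", "dog"], toy_docs)         # words that always co-occur
mixed = topic_npmi(["cat", "dog", "car"], toy_docs)     # one unrelated word added
print(coherent, mixed)
```

A topic whose top words tend to appear together scores close to 1, while mixing in unrelated words pulls the average down. This is the quantity the optimization tries to maximize.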

Create the search space for optimization.

In [None]:
search_space = {
   "alpha": Real(low=0.001, high=5.0),
   "eta": Real(low=0.001, high=5.0)
}
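
A range such as 0.001 to 5.0 spans several orders of magnitude, so the sampling prior matters. The sketch below uses only the standard library (it does not call skopt) to compare uniform sampling, which is the default prior of `Real`, with log-uniform sampling, which `Real` also supports through `prior="log-uniform"`. The threshold of 0.1 and the sample size are arbitrary choices for this illustration.

```python
import math
import random

LOW, HIGH = 0.001, 5.0
rng = random.Random(0)  # fixed seed so the comparison is repeatable
n_samples = 10_000

# Uniform prior: every value in [LOW, HIGH] is equally likely.
uniform = [rng.uniform(LOW, HIGH) for _ in range(n_samples)]

# Log-uniform prior: sample uniformly in log space, then map back.
log_uniform = [
    math.exp(rng.uniform(math.log(LOW), math.log(HIGH)))
    for _ in range(n_samples)
]


def share_below(samples, threshold=0.1):
    """Fraction of samples that fall below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)


uniform_share = share_below(uniform)
log_uniform_share = share_below(log_uniform)
print(f"share below 0.1, uniform:     {uniform_share:.3f}")
print(f"share below 0.1, log-uniform: {log_uniform_share:.3f}")
```

Under the uniform prior only a small fraction of candidates fall below 0.1, while the log-uniform prior places roughly half of them there. If small values of `alpha` and `eta` are of interest, switching the prior is one option to consider.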

Select the path where the results (a JSON file) will be saved.

In [None]:
save_path='results/test1/'

Select the number of optimization iterations and the number of model runs per iteration.

In [None]:
number_of_call=5
model_runs=3

Launch the optimization.

In [None]:
optimizer = Optimizer()
optimization_result = optimizer.optimize(model, dataset, npmi, search_space,
                                         number_of_call=number_of_call,
                                         model_runs=model_runs,
                                         save_path=save_path)
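
The call above can be pictured as the loop below. This is a minimal sketch under stated assumptions, not the OCTIS implementation: random search stands in for the Bayesian proposal step, `toy_score` is a hypothetical noisy objective that replaces "train the model and compute coherence", and summarizing the repeated runs with the median is an assumption about how they are aggregated.

```python
import random
import statistics


def toy_score(alpha, eta, rng):
    """Hypothetical noisy objective standing in for model training plus scoring."""
    # Best value near alpha=1, eta=1; Gaussian noise mimics run-to-run variation.
    return -((alpha - 1.0) ** 2 + (eta - 1.0) ** 2) + rng.gauss(0.0, 0.05)


def sketch_optimize(number_of_call, model_runs, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(number_of_call):
        # Proposal step: uniform random sampling. A Bayesian optimizer would
        # instead fit a surrogate model on `history` to pick the next point.
        config = {
            "alpha": rng.uniform(0.001, 5.0),
            "eta": rng.uniform(0.001, 5.0),
        }
        # Evaluate the same configuration several times to damp the noise.
        runs = [
            toy_score(config["alpha"], config["eta"], rng)
            for _ in range(model_runs)
        ]
        history.append((config, statistics.median(runs)))
    best_config, best_value = max(history, key=lambda item: item[1])
    return best_config, best_value, history


best_config, best_value, history = sketch_optimize(number_of_call=5, model_runs=3)
print(len(history), best_config, best_value)
```

With `number_of_call=5` and `model_runs=3`, the sketch evaluates 5 configurations and scores each one 3 times, so the objective is computed 15 times in total.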

You can save the main results of the optimization to a CSV file.

In [None]:
optimization_result.save_to_csv("results.csv")