# ==== INTERACTIVE CLUSTERING : COMPUTATION TIME STUDY ====
> ### Stage 1 : Initialize computation environments for experiments.

------------------------------
## READ-ME BEFORE RUNNING

### Quick Description

This notebook **creates the environments needed to run the computation time study experiments**.
- Environments are represented by subdirectories in the `/experiments` folder. A full path to an experiment environment is `/experiments/[TASK]/[DATASET]/[ALGORITHM]`.
- The path is composed of A. the task concerned (among _preprocessing_, _vectorization_, _sampling_ and _clustering_), B. the dataset used (of various sizes), and C. the algorithm to inspect (implementation and settings of the task).
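
As an illustration of this layout, here is a minimal sketch that composes an environment path and splits it back into its three components. The helper names `build_env_path` and `split_env_path` are hypothetical and are not part of the study's code; the `../experiments` root and the example folder names follow the conventions used in the cells of this notebook.

```python
from pathlib import PurePosixPath
from typing import Tuple

# Tasks listed in the description above.
TASKS = ("preprocessing", "vectorization", "sampling", "clustering")


def build_env_path(task: str, dataset: str, algorithm: str, root: str = "../experiments") -> str:
    """Compose the path of an experiment environment, with a trailing slash (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError("Unknown task: " + task)
    return "/".join([root, task, dataset, algorithm]) + "/"


def split_env_path(env_path: str) -> Tuple[str, str, str]:
    """Recover (task, dataset, algorithm) from an environment path (hypothetical helper)."""
    parts = PurePosixPath(env_path).parts  # The trailing slash is ignored by pathlib.
    return parts[-3], parts[-2], parts[-1]


example_path = build_env_path(
    task="clustering",
    dataset="bank_cards_v2-size_1000-rand_1",
    algorithm="hier_ward-clusters_10-rand_1-prev_const0",
)
print(example_path)
print(split_env_path(example_path))
```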

At the beginning of the comparative study, **run this notebook to set up the experiments you want**.

Then, **go to the notebook `2_Run_algorithms_and_Estimate_computation_time.ipynb` to run and evaluate each experiment you have set up**.

### Description of each step

- 2.1. **Set up `Task` environments**:
    - _Description_: Create a subdirectory for each task to evaluate.
    - _Setting_: A dictionary defines all possible configurations of task environments.
    - _Folder content_:
        - `config.json`: a json file with all parameters.
    - _Available tasks_:
        - `preprocessing`
        - `vectorization`
        - `sampling`
        - `clustering`
        
- 2.2. **Set up `Dataset` environments**:
    - _Description_: Create a subdirectory, store the parameters of the dataset and pre-format the dataset for the next computations. To get a bigger dataset, fake data can be generated (by adding some spelling errors).
    - _Setting_: A dictionary defines all possible configurations of dataset environments.
    - _Folder content_:
        - `dict_of_texts.json`: texts from the dataset;
        - `dict_of_true_intents.json`: true intents from the dataset;
        - `config.json`: a json file with all parameters.
    - _Available datasets_:
        - [French trainset for chatbots dealing with usual requests on bank cards v2.0.0](http://doi.org/10.5281/zenodo.4769949)
    - _Available dataset settings_:
         - define the dataset size;
         - define the dataset generation random seed.

- 2.3. **Set up `Algorithm` environments**:
    - _Description_: Create a subdirectory and store the type of algorithm to test (depending on the task).
    - _Setting_: A dictionary defines all possible configurations of algorithm environments (preprocessing, vectorization, sampling or clustering settings).
    - _Folder content_:
        - `config.json`: a json file with all preprocessing parameters;
        - `computation_time.json`: a json file with estimated computation time.
    - _Available algorithm settings (depending on the task)_:
        - _preprocessing_: _simple_, _lemma_, _filtered_;
        - _vectorization_: _tfidf_, _spacy_;
        - _sampling_: _closest_, _farthest_, _2 randoms_;
        - _clustering_: _kmeans_, _spectral_, _4 hierarchicals_;
        - _random seed_.
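
To make the task/algorithm pairing explicit, the sketch below maps each task to the algorithm names used in the `_ALGORITHM` field of the configurations defined in section 2.3, and checks that a configuration is consistent. The names `ALGORITHMS_BY_TASK` and `is_consistent` are hypothetical and only serve this illustration.

```python
from typing import Any, Dict, Tuple

# Algorithm names as they appear in the `_ALGORITHM` field of the configurations of section 2.3.
ALGORITHMS_BY_TASK: Dict[str, Tuple[str, ...]] = {
    "preprocessing": ("simple_prep", "lemma_prep", "filter_prep"),
    "vectorization": ("tfidf", "spacy"),
    "sampling": ("random", "in_same", "closest", "farthest"),
    "clustering": (
        "kmeans_COP",
        "hier_ward",
        "hier_average",
        "hier_complete",
        "hier_single",
        "spectral_SPEC",
    ),
}


def is_consistent(config: Dict[str, Any]) -> bool:
    """Check that the `_ALGORITHM` of a configuration belongs to its `_TASK` (hypothetical helper)."""
    return config.get("_ALGORITHM") in ALGORITHMS_BY_TASK.get(config.get("_TASK"), ())


print(is_consistent({"_TASK": "sampling", "_ALGORITHM": "closest"}))
print(is_consistent({"_TASK": "sampling", "_ALGORITHM": "tfidf"}))
```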

------------------------------
## 1. IMPORT PYTHON DEPENDENCIES

In [None]:
import os
import faker
import listing_envs
from typing import Any, Dict, List, Tuple
from scipy.sparse import csr_matrix
import pandas as pd
import json
import random
import pickle  # noqa: S403
from cognitivefactory.interactive_clustering.utils.preprocessing import (
    preprocess,
)
from cognitivefactory.interactive_clustering.utils.vectorization import (
    vectorize,
)
from cognitivefactory.interactive_clustering.constraints.factory import (
    managing_factory
)
from cognitivefactory.interactive_clustering.sampling.factory import (
    sampling_factory
)

------------------------------
## 2. CREATE COMPUTATION ENVIRONMENTS

------------------------------
### 2.1. Set `Task` subdirectories

Define environments with different `tasks`.

In [None]:
ENVIRONMENTS_FOR_TASKS: Dict[str, Any] = {
    "preprocessing": {
        "_TYPE": "task",
        "_TASK": "preprocessing",
        "_DESCRIPTION": "cognitivefactory.interactive_clustering.utils.preprocessing",
    },
    "vectorization": {
        "_TYPE": "task",
        "_TASK": "vectorization",
        "_DESCRIPTION": "cognitivefactory.interactive_clustering.utils.vectorization",
    },
    "sampling": {
        "_TYPE": "task",
        "_TASK": "sampling",
        "_DESCRIPTION": "cognitivefactory.interactive_clustering.sampling.factory",
    },
    "clustering": {
        "_TYPE": "task",
        "_TASK": "clustering",
        "_DESCRIPTION": "cognitivefactory.interactive_clustering.clustering.factory",
    },
}

Create `task` environments using `ENVIRONMENTS_FOR_TASKS` configuration dictionary.
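
Each environment level in this notebook is created with the same pattern: skip the environment if its folder already exists, otherwise create the folder and dump the configuration into `config.json`. The sketch below isolates that pattern in a hypothetical helper `create_environment` and runs it in a temporary directory, so it does not touch `../experiments`.

```python
import json
import os
import tempfile
from typing import Any, Dict


def create_environment(env_path: str, config: Dict[str, Any]) -> bool:
    """Create `env_path` and write its `config.json`; return False if it already existed."""
    if os.path.exists(env_path):
        return False  # Never overwrite an existing environment.
    os.makedirs(env_path)
    with open(os.path.join(env_path, "config.json"), "w") as config_file:
        json.dump(config, config_file)
    return True


with tempfile.TemporaryDirectory() as tmp_root:
    demo_path = os.path.join(tmp_root, "experiments", "preprocessing")
    demo_config = {"_TYPE": "task", "_TASK": "preprocessing"}
    first_call = create_environment(demo_path, demo_config)  # Creates the environment.
    second_call = create_environment(demo_path, demo_config)  # Skipped: already exists.
    with open(os.path.join(demo_path, "config.json"), "r") as config_file:
        stored_config = json.load(config_file)

print(first_call, second_call, stored_config)
```

Because existing folders are skipped, re-running the notebook only adds the environments that are missing; an environment whose configuration changed has to be deleted by hand before it is recreated.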

In [None]:
### ### ### ### ###
### LOOP FOR ALL ENVIRONMENTS CONFIGURED...
### ### ### ### ###
for ENV_NAME_task, CONFIG_task in ENVIRONMENTS_FOR_TASKS.items():

    ### ### ### ### ###
    ### CREATE AND CONFIGURE ENVIRONMENT.
    ### ### ### ### ###

    # Name the configuration.
    CONFIG_task["_ENV_NAME"] = ENV_NAME_task
    CONFIG_task["_ENV_PATH"] = "../experiments/" + ENV_NAME_task + "/"

    # Check if the environment already exists.
    if os.path.exists(str(CONFIG_task["_ENV_PATH"])):
        continue

    # Create directory for this environment (and `../experiments` itself if it is missing).
    os.makedirs(str(CONFIG_task["_ENV_PATH"]))

    # Store configuration file.
    with open(str(CONFIG_task["_ENV_PATH"]) + "config.json", "w") as file_t1:
        json.dump(CONFIG_task, file_t1)

# End
print("\n#####")
print("END - Task environments configuration.")

------------------------------
### 2.2. Set `Dataset` subdirectories

Select the `task` environments in which to create `dataset` environments.

In [None]:
# Get list of tasks environments.
LIST_OF_TASKS_ENVIRONMENTS: List[str] = listing_envs.get_list_of_tasks_env_paths()
print(
    "There are",
    "`" + str(len(LIST_OF_TASKS_ENVIRONMENTS)) + "`",
    "created tasks environments in `../experiments`",
)
LIST_OF_TASKS_ENVIRONMENTS

Define environments with different `datasets`.

In [None]:
ENVIRONMENTS_FOR_DATASETS: Dict[str, Any] = {
    # Case of bank cards management.
    "bank_cards_v2-size_{size_str}-rand_{rand_str}".format(size_str=size, rand_str=rand): {
        "_TYPE": "dataset",
        "_DESCRIPTION": "This dataset represents examples of common customer requests relating to bank cards management. It can be used as a training set for a small chatbot intended to process these usual requests.",
        "file_name": "French_trainset_for_chatbots_dealing_with_usual_requests_on_bank_cards_v2.0.0.xlsx",
        "sheet_name": "dataset",
        "language": "fr",
        "dataset": "bank_cards_v2",
        "size": size,
        "random_seed": rand,
    }
    for size in [1000, 2000, 3000, 4000, 5000,]
    for rand in [1, 2, 3, 4, 5,]
}

Create `dataset` environments using `ENVIRONMENTS_FOR_DATASETS` configuration dictionary.
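
The next cell enlarges the dataset with `faker.fake_dataset`, a helper local to this study whose implementation is not shown in this notebook. The sketch below is only an assumption about what such a step could look like: the function `fake_dataset_sketch` and its error model (swapping two adjacent characters) are hypothetical, and only the argument names and the returned pair of dictionaries mirror the call made in the cell.

```python
import random
from typing import Dict, Tuple


def add_spelling_error(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position (hypothetical error model)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def fake_dataset_sketch(
    dict_of_texts: Dict[str, str],
    dict_of_true_intents: Dict[str, str],
    size: int,
    random_seed: int,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return `size` texts and intents, adding misspelled copies when more data is needed."""
    rng = random.Random(random_seed)  # Local generator, so the result is reproducible.
    base_ids = sorted(dict_of_texts)
    if size <= len(base_ids):
        kept_ids = rng.sample(base_ids, size)
        return (
            {data_id: dict_of_texts[data_id] for data_id in kept_ids},
            {data_id: dict_of_true_intents[data_id] for data_id in kept_ids},
        )
    texts = dict(dict_of_texts)
    intents = dict(dict_of_true_intents)
    for counter in range(size - len(base_ids)):
        source_id = rng.choice(base_ids)
        new_id = "{0}_fake_{1}".format(source_id, counter)
        texts[new_id] = add_spelling_error(dict_of_texts[source_id], rng)
        intents[new_id] = dict_of_true_intents[source_id]  # A fake text keeps the intent of its source.
    return texts, intents


demo_texts = {"0": "carte perdue", "1": "opposition carte"}
demo_intents = {"0": "perte", "1": "opposition"}
faked_texts, faked_intents = fake_dataset_sketch(demo_texts, demo_intents, size=5, random_seed=1)
print(len(faked_texts), sorted(faked_texts))
```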

In [None]:
### ### ### ### ###
### LOOP FOR ALL ENVIRONMENTS CONFIGURED...
### ### ### ### ###
for PARENT_ENV_PATH_task in LIST_OF_TASKS_ENVIRONMENTS:
    for (
        ENV_NAME_dataset,
        CONFIG_dataset,
    ) in ENVIRONMENTS_FOR_DATASETS.items():

        ### ### ### ### ###
        ### CREATE AND CONFIGURE ENVIRONMENT.
        ### ### ### ### ###

        # Name the configuration.
        CONFIG_dataset["_ENV_NAME"] = ENV_NAME_dataset
        CONFIG_dataset["_ENV_PATH"] = (
            PARENT_ENV_PATH_task + ENV_NAME_dataset + "/"
        )

        # Check if the environment already exists.
        if os.path.exists(str(CONFIG_dataset["_ENV_PATH"])):
            continue

        # Create directory for this environment.
        os.mkdir(str(CONFIG_dataset["_ENV_PATH"]))

        # Store configuration file.
        with open(str(CONFIG_dataset["_ENV_PATH"]) + "config.json", "w") as file_d1:
            json.dump(CONFIG_dataset, file_d1)

        ### ### ### ### ###
        ### LOAD DATASET.
        ### ### ### ### ###

        # Load dataset.
        df_dataset: pd.DataFrame = pd.read_excel(
            io="../../datasets/" + CONFIG_dataset["file_name"],
            sheet_name=CONFIG_dataset["sheet_name"],
            engine="openpyxl",
        )

        ### ### ### ### ###
        ### FORMAT DATASET.
        ### ### ### ### ###

        # Load base for `dict_of_texts` and `dict_of_true_intents`.
        # > Force `str` type to avoid typing errors.

        base_dict_of_texts: Dict[str, str] = {
            str(data_id): str(value["QUESTION"])
            for data_id, value in df_dataset.to_dict("index").items()
        }

        base_dict_of_true_intents: Dict[str, str] = {
            str(data_id): str(value["INTENT"])
            for data_id, value in df_dataset.to_dict("index").items()
        }
            
        # Fake dataset if needed (i.e. artificially add data by generating random spelling errors).
        faker_results: Tuple[Dict[str, str], Dict[str, str]] = faker.fake_dataset(
            dict_of_texts=base_dict_of_texts,
            dict_of_true_intents=base_dict_of_true_intents,
            size=CONFIG_dataset["size"],
            random_seed=CONFIG_dataset["random_seed"],
        )
        dict_of_texts: Dict[str, str] = faker_results[0]
        dict_of_true_intents: Dict[str, str] = faker_results[1]

        # Store `dict_of_texts` and `dict_of_true_intents`.
        with open(str(CONFIG_dataset["_ENV_PATH"]) + "dict_of_texts.json", "w") as file_d2:
            json.dump(dict_of_texts, file_d2)

        with open(
            str(CONFIG_dataset["_ENV_PATH"]) + "dict_of_true_intents.json", "w"
        ) as file_d3:
            json.dump(dict_of_true_intents, file_d3)

# End
print("\n#####")
print("END - Dataset environments configuration.")

------------------------------
### 2.3. Set `Algorithm` subdirectories

Select the `dataset` environments in which to create `algorithm` environments.

In [None]:
# Get list of dataset environments.
LIST_OF_DATASET_ENVIRONMENTS: List[str] = listing_envs.get_list_of_dataset_env_paths()
print(
    "There are",
    "`" + str(len(LIST_OF_DATASET_ENVIRONMENTS)) + "`",
    "created dataset environments in `../experiments`",
)
LIST_OF_DATASET_ENVIRONMENTS

Define environments with different uses of `algorithm`.
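
The parameter grids used in the cells of this section multiply quickly. The sketch below only redoes the arithmetic from the ranges written in those cells and from the 5 sizes and 5 random seeds of the dataset configurations. It assumes that the four task environments and all dataset environments have been created; the variable names are illustrative.

```python
# Number of random seeds used for every algorithm configuration.
nb_seeds = 5

# Number of algorithm configurations per task, read from the loops of this section.
counts = {
    "preprocessing": 3 * nb_seeds,
    "vectorization": 2 * nb_seeds,
    "sampling": (
        4  # random, in_same, closest, farthest
        * len(range(50, 251, 50))  # nb_to_select
        * len(range(0, 5001, 500))  # previous_constraints
        * len(range(10, 51, 10))  # previous_clustering
        * nb_seeds
    ),
    "clustering": (
        6  # kmeans_COP, 4 hierarchical linkages, spectral_SPEC
        * len(range(5, 51, 5))  # nb_clusters
        * len(range(0, 5001, 500))  # previous_constraints
        * nb_seeds
    ),
}

# Each task environment contains 5 sizes x 5 random seeds = 25 dataset environments.
nb_datasets_per_task = 5 * 5
total_environments = nb_datasets_per_task * sum(counts.values())

print(counts)
# {'preprocessing': 15, 'vectorization': 10, 'sampling': 5500, 'clustering': 3300}
print(total_environments)
# 220625
```

With more than two hundred thousand folders to create, it can be worth reducing the grids before running the creation loop.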

In [None]:
ENVIRONMENTS_FOR_ALGORITHM: Dict[str, Any] = {}

In [None]:
# Case of preprocessing.
for rand in [1, 2, 3, 4, 5,]:
    ENVIRONMENTS_FOR_ALGORITHM.update({
        "simple_prep-rand_{rand_str}".format(rand_str=rand): {
            "_TYPE": "algorithm",
            "_TASK": "preprocessing",
            "_ALGORITHM": "simple_prep",
            "_DESCRIPTION": "Simple preprocessing (lowercase, accents, punctuation, whitespace)",
            "preprocessing": {
                "apply_preprocessing": True,
                "apply_lemmatization": False,
                "apply_parsing_filter": False,
                "spacy_language_model": "fr_core_news_md",
            },
            "random_seed": rand,
        },
        "lemma_prep-rand_{rand_str}".format(rand_str=rand): {
            "_TYPE": "algorithm",
            "_TASK": "preprocessing",
            "_ALGORITHM": "lemma_prep",
            "_DESCRIPTION": "Lemmatized preprocessing (lowercase, accents, punctuation, whitespace, lemmatization)",
            "preprocessing": {
                "apply_preprocessing": True,
                "apply_lemmatization": True,
                "apply_parsing_filter": False,
                "spacy_language_model": "fr_core_news_md",
            },
            "random_seed": rand,
        },
        "filter_prep-rand_{rand_str}".format(rand_str=rand): {
            "_TYPE": "algorithm",
            "_TASK": "preprocessing",
            "_ALGORITHM": "filter_prep",
            "_DESCRIPTION": "Filtered preprocessing (lowercase, accents, punctuation, whitespace, dependency filter)",
            "preprocessing": {
                "apply_preprocessing": True,
                "apply_lemmatization": False,
                "apply_parsing_filter": True,
                "spacy_language_model": "fr_core_news_md",
            },
            "random_seed": rand,
        },
    })

In [None]:
# Case of vectorization.
for rand in [1, 2, 3, 4, 5,]:
    ENVIRONMENTS_FOR_ALGORITHM.update({
        "tfidf-rand_{rand_str}".format(
            rand_str=rand
        ): {
            "_TYPE": "algorithm",
            "_TASK": "vectorization",
            "_ALGORITHM": "tfidf",
            "_DESCRIPTION": "TFIDF vectorization.",
            "preprocessing": {
                "apply_preprocessing": True,
                "apply_lemmatization": False,
                "apply_parsing_filter": False,
                "spacy_language_model": "fr_core_news_md",
            },
            "vectorization": {
                "vectorizer_type": "tfidf",
                "spacy_language_model": None,
            },
            "random_seed": rand,
        },
        "spacy-rand_{rand_str}".format(
            rand_str=rand
        ): {
            "_TYPE": "algorithm",
            "_TASK": "vectorization",
            "_ALGORITHM": "spacy",
            "_DESCRIPTION": "Spacy vectorization.",
            "preprocessing": {
                "apply_preprocessing": True,
                "apply_lemmatization": False,
                "apply_parsing_filter": False,
                "spacy_language_model": "fr_core_news_md",
            },
            "vectorization": {
                "vectorizer_type": "spacy",
                "spacy_language_model": "fr_core_news_md",
            },
            "random_seed": rand,
        },
    })

In [None]:
# Case of sampling.
for nb_to_select in range(50, 251, 50):
    for previous_constraints in range(0, 5001, 500):
        for previous_clustering in range(10, 51, 10):
            for rand in [1, 2, 3, 4, 5,]:
                ENVIRONMENTS_FOR_ALGORITHM.update({
                    "random-select_{select_str}-rand_{rand_str}-prev_const{const_str}_clu{clu_str}".format(
                        select_str=nb_to_select,
                        rand_str=rand,
                        const_str=previous_constraints,
                        clu_str=previous_clustering,
                    ): {
                        "_TYPE": "algorithm",
                        "_TASK": "sampling",
                        "_ALGORITHM": "random",
                        "_DESCRIPTION": "Random sampling, {select_str} combinations to select, {const_str} previous constraints, {clu_str} clusters.".format(
                            select_str=nb_to_select,
                            const_str=previous_constraints,
                            clu_str=previous_clustering,
                        ),
                        "preprocessing": {
                            "apply_preprocessing": True,
                            "apply_lemmatization": False,
                            "apply_parsing_filter": False,
                            "spacy_language_model": "fr_core_news_md",
                        },
                        "vectorization": {
                            "vectorizer_type": "tfidf",
                            "spacy_language_model": None,
                        },
                        "sampling": {
                            "algorithm": "random",
                            "nb_to_select": nb_to_select,
                        },
                        "previous": {
                            "clustering": previous_clustering, 
                            "constraints": previous_constraints,
                        },
                        "random_seed": rand,
                    },
                    "in_same-select_{select_str}-rand_{rand_str}-prev_const{const_str}_clu{clu_str}".format(
                        select_str=nb_to_select,
                        rand_str=rand,
                        const_str=previous_constraints,
                        clu_str=previous_clustering,
                    ): {
                        "_TYPE": "algorithm",
                        "_TASK": "sampling",
                        "_ALGORITHM": "in_same",
                        "_DESCRIPTION": "Random in same cluster sampling, {select_str} combinations to select, {const_str} previous constraints, {clu_str} clusters.".format(
                            select_str=nb_to_select,
                            const_str=previous_constraints,
                            clu_str=previous_clustering,
                        ),
                        "preprocessing": {
                            "apply_preprocessing": True,
                            "apply_lemmatization": False,
                            "apply_parsing_filter": False,
                            "spacy_language_model": "fr_core_news_md",
                        },
                        "vectorization": {
                            "vectorizer_type": "tfidf",
                            "spacy_language_model": None,
                        },
                        "sampling": {
                            "algorithm": "random_in_same_cluster",
                            "nb_to_select": nb_to_select,
                        },
                        "previous": {
                            "clustering": previous_clustering, 
                            "constraints": previous_constraints,
                        },
                        "random_seed": rand,
                    },
                    "closest-select_{select_str}-rand_{rand_str}-prev_const{const_str}_clu{clu_str}".format(
                        select_str=nb_to_select,
                        rand_str=rand,
                        const_str=previous_constraints,
                        clu_str=previous_clustering,
                    ): {
                        "_TYPE": "algorithm",
                        "_TASK": "sampling",
                        "_ALGORITHM": "closest",
                        "_DESCRIPTION": "Closest in different clusters sampling, {select_str} combinations to select, {const_str} previous constraints, {clu_str} clusters.".format(
                            select_str=nb_to_select,
                            const_str=previous_constraints,
                            clu_str=previous_clustering,
                        ),
                        "preprocessing": {
                            "apply_preprocessing": True,
                            "apply_lemmatization": False,
                            "apply_parsing_filter": False,
                            "spacy_language_model": "fr_core_news_md",
                        },
                        "vectorization": {
                            "vectorizer_type": "tfidf",
                            "spacy_language_model": None,
                        },
                        "sampling": {
                            "algorithm": "closest_in_different_clusters",
                            "nb_to_select": nb_to_select,
                        },
                        "previous": {
                            "clustering": previous_clustering, 
                            "constraints": previous_constraints,
                        },
                        "random_seed": rand,
                    },
                    "farthest-select_{select_str}-rand_{rand_str}-prev_const{const_str}_clu{clu_str}".format(
                        select_str=nb_to_select,
                        rand_str=rand,
                        const_str=previous_constraints,
                        clu_str=previous_clustering,
                    ): {
                        "_TYPE": "algorithm",
                        "_TASK": "sampling",
                        "_ALGORITHM": "farthest",
                        "_DESCRIPTION": "Farthest in same cluster sampling, {select_str} combinations to select, {const_str} previous constraints, {clu_str} clusters.".format(
                            select_str=nb_to_select,
                            const_str=previous_constraints,
                            clu_str=previous_clustering,
                        ),
                        "preprocessing": {
                            "apply_preprocessing": True,
                            "apply_lemmatization": False,
                            "apply_parsing_filter": False,
                            "spacy_language_model": "fr_core_news_md",
                        },
                        "vectorization": {
                            "vectorizer_type": "tfidf",
                            "spacy_language_model": None,
                        },
                        "sampling": {
                            "algorithm": "farthest_in_same_cluster",
                            "nb_to_select": nb_to_select,
                        },
                        "previous": {
                            "clustering": previous_clustering, 
                            "constraints": previous_constraints,
                        },
                        "random_seed": rand,
                    },
                })

In [None]:
# Case of clustering.
for nb_clusters in range(5, 51, 5):
    for previous_constraints in range(0, 5001, 500):
        for rand in [1, 2, 3, 4, 5,]:
            ENVIRONMENTS_FOR_ALGORITHM.update({
                "kmeans_COP-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "kmeans_COP",
                    "_DESCRIPTION": "KMeans (COP) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "kmeans",
                        "init**kargs": {
                            "model": "COP",
                            "max_iteration": 150,
                            "tolerance": 1e-4,
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
                "hier_ward-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "hier_ward",
                    "_DESCRIPTION": "Hierarchical (WARD) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "hierarchical",
                        "init**kargs": {
                            "linkage": "ward",
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
                "hier_average-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "hier_average",
                    "_DESCRIPTION": "Hierarchical (AVERAGE) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "hierarchical",
                        "init**kargs": {
                            "linkage": "average",
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
                "hier_complete-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "hier_complete",
                    "_DESCRIPTION": "Hierarchical (COMPLETE) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "hierarchical",
                        "init**kargs": {
                            "linkage": "complete",
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
                "hier_single-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "hier_single",
                    "_DESCRIPTION": "Hierarchical (SINGLE) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "hierarchical",
                        "init**kargs": {
                            "linkage": "single",
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
                "spectral_SPEC-clusters_{nb_clusters_str}-rand_{rand_str}-prev_const{const_str}".format(
                    nb_clusters_str=nb_clusters,
                    rand_str=rand,
                    const_str=previous_constraints,
                ): {
                    "_TYPE": "algorithm",
                    "_TASK": "clustering",
                    "_ALGORITHM": "spectral_SPEC",
                    "_DESCRIPTION": "Spectral (SPEC) clustering, {nb_clusters_str} clusters, {const_str} previous constraints.".format(
                        nb_clusters_str=nb_clusters,
                        const_str=previous_constraints,
                    ),
                    "preprocessing": {
                        "apply_preprocessing": True,
                        "apply_lemmatization": False,
                        "apply_parsing_filter": False,
                        "spacy_language_model": "fr_core_news_md",
                    },
                    "vectorization": {
                        "vectorizer_type": "tfidf",
                        "spacy_language_model": None,
                    },
                    "clustering": {
                        "algorithm": "spectral",
                        "init**kargs": {
                            "model": "SPEC",
                        },
                        "nb_clusters": nb_clusters,
                    },
                    "previous": {
                        "constraints": previous_constraints,
                    },
                    "random_seed": rand,
                },
            })

Create `algorithm` environments using `ENVIRONMENTS_FOR_ALGORITHM` configuration dictionary.
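
The creation loop below keeps a configuration only when its `_TASK` equals the third component of the dataset environment path (`split("/")[2]`). This relies on the paths having the form `../experiments/[TASK]/[DATASET]/`, which is an assumption about what `listing_envs.get_list_of_dataset_env_paths()` returns. The sketch below shows the same filter with a hypothetical helper `task_of_dataset_env`, which reads the parent folder name instead of a fixed index and therefore does not depend on the depth of the root folder.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List


def task_of_dataset_env(dataset_env_path: str) -> str:
    """Return the task name, i.e. the folder that contains the dataset folder."""
    return PurePosixPath(dataset_env_path).parent.name


def select_configs_for(dataset_env_path: str, configs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Keep the names of the configurations whose `_TASK` matches the dataset environment."""
    task = task_of_dataset_env(dataset_env_path)
    return [name for name, config in configs.items() if config["_TASK"] == task]


demo_dataset_path = "../experiments/sampling/bank_cards_v2-size_1000-rand_1/"
demo_configs = {
    "tfidf-rand_1": {"_TASK": "vectorization"},
    "closest-select_50-rand_1-prev_const0_clu10": {"_TASK": "sampling"},
}

# Both ways of reading the task agree on this path layout.
print(demo_dataset_path.split("/")[2], task_of_dataset_env(demo_dataset_path))
print(select_configs_for(demo_dataset_path, demo_configs))
```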

In [None]:
### ### ### ### ###
### LOOP FOR ALL ENVIRONMENTS CONFIGURED...
### ### ### ### ###
for PARENT_ENV_PATH_dataset in LIST_OF_DATASET_ENVIRONMENTS:
    for (
        ENV_NAME_algorithm,
        CONFIG_algorithm,
    ) in ENVIRONMENTS_FOR_ALGORITHM.items():

        ### ### ### ### ###
        ### CREATE AND CONFIGURE ENVIRONMENT.
        ### ### ### ### ###
        
        # Create the environment only if its task matches the task of the parent environment.
        if CONFIG_algorithm["_TASK"] != PARENT_ENV_PATH_dataset.split("/")[2]:
            continue

        # Name the configuration.
        CONFIG_algorithm["_ENV_NAME"] = ENV_NAME_algorithm
        CONFIG_algorithm["_ENV_PATH"] = (
            PARENT_ENV_PATH_dataset + ENV_NAME_algorithm + "/"
        )

        # Check if the environment already exists.
        if os.path.exists(str(CONFIG_algorithm["_ENV_PATH"])):
            continue

        # Create directory for this environment.
        os.mkdir(str(CONFIG_algorithm["_ENV_PATH"]))

        # Store configuration file.
        with open(
            str(CONFIG_algorithm["_ENV_PATH"]) + "config.json", "w"
        ) as file_a1:
            json.dump(CONFIG_algorithm, file_a1)


# End
print("\n#####")
print("END - Algorithm environments configuration.")

------------------------------
## 3. GET ALL CREATED ENVIRONMENTS
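
The listing relies on `listing_envs`, a module local to this study whose code is not shown here. As a rough idea of how such a listing could work, the sketch below collects every folder located three levels under the root that contains a `config.json`. The function `list_algorithm_env_paths` is hypothetical and is demonstrated on a temporary directory rather than on `../experiments`.

```python
import tempfile
from pathlib import Path
from typing import List


def list_algorithm_env_paths(root: Path) -> List[str]:
    """List `[root]/[TASK]/[DATASET]/[ALGORITHM]/` folders that contain a `config.json`."""
    return sorted(
        config_file.parent.as_posix() + "/"
        for config_file in root.glob("*/*/*/config.json")
    )


with tempfile.TemporaryDirectory() as tmp_root:
    root = Path(tmp_root) / "experiments"
    dataset_env = root / "vectorization" / "bank_cards_v2-size_1000-rand_1"
    for algorithm in ("tfidf-rand_1", "spacy-rand_1"):
        algorithm_env = dataset_env / algorithm
        algorithm_env.mkdir(parents=True)
        (algorithm_env / "config.json").write_text("{}")
    # The dataset-level configuration sits one level higher and must not be listed.
    (dataset_env / "config.json").write_text("{}")
    found = list_algorithm_env_paths(root)
    found_names = [Path(env_path).name for env_path in found]

print(found_names)
```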

In [None]:
# Get list of experiment environments.
LIST_OF_EXPERIMENT_ENVIRONMENTS: List[
    str
] = listing_envs.get_list_of_algorithm_env_paths()
print(
    "There are",
    "`" + str(len(LIST_OF_EXPERIMENT_ENVIRONMENTS)) + "`",
    "created experiment environments in `../experiments`",
)
#LIST_OF_EXPERIMENT_ENVIRONMENTS