<p align="center">
  <img src="https://www.verifia.ca/assets/logo.png" width="160px" alt="VerifIA Logo"/><br>
  <strong>© 2025 VerifIA. All rights reserved.</strong>
</p>

### VerifIA - Model Verification: Hotel Cancellation Prediction with Scikit-Learn

In this notebook, we address a hotel cancellation prediction problem. Using a dataset of hotel bookings, we build classification pipelines (SVC and Random Forest), tune their hyperparameters, wrap the best model with VerifIA’s standardized interface, and verify its rule consistency against a domain configuration that is either generated automatically with AI-powered domain generation or loaded from a YAML file.

In [None]:
!pip install requests
!pip install scikit-optimize
!pip install "../../dist/verifia-0.1.0-py3-none-any.whl[genflow]"

#### 0. Download Resources

Before running any other cells, make sure you have all required resources:

In [2]:
!curl -sL https://tinyurl.com/r6m2zk87 -o downloader.py

In [3]:
from downloader import download_resource
url = 'https://www.verifia.ca/assets/use-cases/'
download_resource(url+'data/hotel_cancellation.csv', 
                  dest_dir='../data')
download_resource(url+'domains/hotel_cancellation.yaml', 
                  dest_dir='../domains')
download_resource(url+'documents/hotel_cancellation.zip', 
                  dest_dir='../documents/hotel_cancellation')

{'extracted': True,
 'files': ['articles/',
  'articles/End-to-End Hotel Booking Cancellation Machine Learning Model.pdf',
  'articles/eXplainable predictions for booking cancellation.pdf',
  'articles/Hotel Booking Cancellation Prediction Using Applied Bayesian Models.pdf',
  'articles/hotel booking cancellation prediction.pdf',
  'articles/hotel booking demand datasets.pdf',
  'articles/Modeling and Forecasting Hotel Booking Cancellations.pdf',
  'data_report.pdf',
  'domain_definition_meeting_notes.pdf',
  'domain_definition_report.pdf',
  'feature_selection_report.pdf',
  'sensitivity_analysis_meeting_notes.pdf',
  'sensitivity_analysis_report.pdf']}

#### 1. Importing Libraries and Setting Up

First, we import the necessary libraries for data processing (Pandas), model building (scikit-learn), hyperparameter tuning (skopt’s `BayesSearchCV` and scikit-learn’s `RandomizedSearchCV`), and VerifIA modules (for model wrapping, dataset handling, and verification).

In [None]:
%load_ext autoreload
%autoreload 2
import os
import getpass
import pandas as pd
from skopt import BayesSearchCV
from skopt.callbacks import DeadlineStopper, DeltaYStopper
from scipy.stats import loguniform
from sklearn import svm
from verifia.models import SKLearnModel, build_from_model_card
from verifia.verification.results import RulesViolationResult
from verifia.context.data import Dataset
from verifia.verification.verifiers import RuleConsistencyVerifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler, OneHotEncoder, RobustScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer

#### 2. Defining Constants and Loading the Data

Key constants are defined, including a random seed for reproducibility, model directory paths, and the data file path. The hotel cancellation dataset is then loaded from a CSV file, and the target variable (*is_canceled*) is separated from the feature columns. Categorical features are identified from the dataset based on data type.


In [5]:
RAND_SEED = 0
MODELS_DIRPATH = "../models"
DATA_PATH = "../data/hotel_cancellation.csv"
dataframe = pd.read_csv(DATA_PATH)
target_name = "is_canceled"
feature_names = set(dataframe.columns) - {target_name}
cat_feature_names = set(dataframe.select_dtypes(include=["object"]).columns) - {target_name}
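
Because `feature_names` and `cat_feature_names` are built as Python sets, their iteration order is not tied to the column order of the CSV. If a stable order matters downstream, an order-preserving variant could look like the sketch below. The toy DataFrame and its column names are made up for illustration, and whether VerifIA expects sets or also accepts lists is an assumption not confirmed by this notebook.

```python
import pandas as pd

# Toy stand-in for the hotel bookings table (columns are illustrative only).
toy = pd.DataFrame({
    "lead_time": [10, 200, 35],
    "hotel": pd.Series(["City", "Resort", "City"], dtype=object),
    "adr": [80.5, 120.0, 95.2],
    "is_canceled": [0, 1, 0],
})
target = "is_canceled"

# List comprehensions keep the original column order, unlike set arithmetic.
features = [c for c in toy.columns if c != target]
cat_features = [
    c for c in toy.select_dtypes(include=["object"]).columns if c != target
]

print(features)      # ['lead_time', 'hotel', 'adr']
print(cat_features)  # ['hotel']
```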

#### 3. Building the Model Wrapper

Using the `build_from_model_card` function from VerifIA, we create a `SKLearnModel` wrapper. This wrapper encapsulates essential metadata such as the model’s name, version, type (classification), feature names, target name, and local directory. The wrapper serves as a standardized interface between your model and the VerifIA verification framework.

In [7]:
model_wrapper:SKLearnModel = build_from_model_card({
    "name": "hotel_cancellation",
    "version": "2",
    "type": "classification",
    "description": "model predicts the hotel cancellations.",
    "framework": "sklearn",
    "feature_names": feature_names,
    "cat_feature_names": cat_feature_names,
    "target_name": target_name,
    "local_dirpath": MODELS_DIRPATH
})

#### 4. Preparing the Dataset

The dataset is transformed into a VerifIA `Dataset` object. This object ensures that the data is properly formatted for model training, evaluation, and subsequent rule verification. The dataset is then split into training and testing subsets (80/20 split) to allow for both hyperparameter tuning and final evaluation.

In [8]:
dataset = Dataset(dataframe, model_wrapper.target_name, 
                  model_wrapper.feature_names, 
                  model_wrapper.cat_feature_names)
train_dataset, test_dataset = dataset.split(0.8, RAND_SEED)
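
The internals of `Dataset.split` are not shown in this notebook. As a mental model only, a seeded 80/20 split can be sketched with scikit-learn's `train_test_split` on synthetic data. This is an assumed analogue, not VerifIA's implementation, and the stratification used here is a choice made for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))        # synthetic features
y_toy = np.array([0] * 70 + [1] * 30)    # imbalanced binary target

# 80% train, 20% test, with a fixed seed and class proportions preserved.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, train_size=0.8, random_state=0, stratify=y_toy
)

print(X_tr.shape, X_te.shape)  # (80, 3) (20, 3)
print(int(y_te.sum()))         # 6 positives out of 20, i.e. 30%
```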

#### 5. Creating Classification Pipelines

Two types of pipelines are constructed:

- **SVC Pipeline:**  
  A pipeline is built using a `ColumnTransformer` that scales numerical features (using RobustScaler) and one-hot encodes categorical features. An SVC classifier (with probability estimates enabled) is then added to the pipeline.

- **Random Forest Pipeline:**  
  Another pipeline is created with the same preprocessing steps followed by a RandomForestClassifier (or alternatively, a GradientBoostingClassifier).  

Hyperparameter tuning is set up for each model using different search strategies:
- For SVC, a randomized search is performed over parameters such as kernel type, regularization (C), and gamma.
- For Random Forest, a Bayesian search is defined over various hyperparameters (e.g., number of estimators, max depth, minimum samples split, etc.).


In [6]:
preprocessor = ColumnTransformer(transformers=[
    ("num", RobustScaler(), train_dataset.num_feature_idxs),
    ("cat", OneHotEncoder(handle_unknown='ignore'), train_dataset.cat_feature_idxs)
])

svc = svm.SVC(probability=True, random_state=RAND_SEED)
    
svc_pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('predictor', svc) 
])
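
To see what the preprocessor produces, the following sketch applies the same two transformers to a tiny made-up table. It selects columns by name rather than by index and sets `sparse_threshold=0` to force a dense array for printing; both are simplifications for the illustration. Note how `handle_unknown='ignore'` encodes a category never seen during fitting as all zeros.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, RobustScaler

toy = pd.DataFrame({
    "lead_time": [10.0, 200.0, 35.0, 60.0],
    "hotel": ["City", "Resort", "City", "Resort"],
})

toy_pre = ColumnTransformer(
    transformers=[
        ("num", RobustScaler(), ["lead_time"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["hotel"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

out = toy_pre.fit_transform(toy)
print(out.shape)  # (4, 3): one scaled column plus two one-hot columns

# "Hostel" was not seen during fit, so both one-hot columns are zero.
unseen = pd.DataFrame({"lead_time": [50.0], "hotel": ["Hostel"]})
row = toy_pre.transform(unseen)
print(row[0, 1:])
```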

In [8]:
cv_splits_count, max_trials = 2, 1
search_dict = {'predictor__kernel': ['linear', 'rbf'], 
               'predictor__C': loguniform(1, 1000),
               'predictor__gamma': loguniform(0.0001, 0.1)}

skf = StratifiedKFold(n_splits=cv_splits_count, shuffle=True, random_state=RAND_SEED)
search_func = RandomizedSearchCV(estimator=svc_pipeline,
                                param_distributions=search_dict,
                                n_iter=max_trials,
                                scoring="f1",
                                cv=skf,
                                verbose=10,
                                n_jobs=-1,
                                random_state=RAND_SEED)  # reproducible sampling of candidates
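
The `loguniform` distributions used for `C` and `gamma` spread candidates evenly across orders of magnitude rather than evenly across the raw range. A quick way to check this is to draw samples and count how many land in each decade, as in this small sketch (the sample size is arbitrary):

```python
import numpy as np
from scipy.stats import loguniform

# Draw candidate values for C from the same range as the search space.
c_samples = loguniform(1, 1000).rvs(size=1000, random_state=0)

# Roughly a third of the draws should fall in each decade.
counts, _ = np.histogram(c_samples, bins=[1, 10, 100, 1000])
print(counts)
```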

In [11]:
X = train_dataset.feature_data()
y = train_dataset.target_data
svc_pipeline.fit(X, y)
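
The cell above fits `svc_pipeline` directly, so `search_func` is defined but not fitted. If the randomized search is what you want, the usual scikit-learn pattern is to fit the search object and read `best_estimator_`. The sketch below uses synthetic data and a reduced pipeline so that it runs on its own; all `demo_*` names are illustrative.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

demo_pipeline = Pipeline([
    ("preprocessor", RobustScaler()),
    ("predictor", SVC(random_state=0)),
])

demo_search = RandomizedSearchCV(
    estimator=demo_pipeline,
    param_distributions={
        "predictor__kernel": ["linear", "rbf"],
        "predictor__C": loguniform(1, 1000),
        "predictor__gamma": loguniform(0.0001, 0.1),
    },
    n_iter=3,
    scoring="f1",
    cv=StratifiedKFold(n_splits=2, shuffle=True, random_state=0),
    random_state=0,  # makes the sampled candidates reproducible
)
demo_search.fit(X_demo, y_demo)

# With the default refit=True, the best candidate is refit on all the data.
best_demo_pipeline = demo_search.best_estimator_
print(demo_search.best_params_)
print(round(demo_search.best_score_, 3))
```

In the notebook itself, the equivalent would be `search_func.fit(X, y)` followed by passing `search_func.best_estimator_` to the model wrapper.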


In [9]:
preprocessor = ColumnTransformer(transformers=[
    ("num", RobustScaler(), train_dataset.num_feature_idxs),
    ("cat", OneHotEncoder(handle_unknown='ignore'), train_dataset.cat_feature_idxs)
])

rf = RandomForestClassifier(random_state=RAND_SEED)

# gb = GradientBoostingClassifier(random_state=RAND_SEED)

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('predictor', rf)  # gb
])

In [10]:
cv_splits_count, max_trials, n_hparams_at_trial = 5, 10, 3
skf = StratifiedKFold(n_splits=cv_splits_count, shuffle=True, random_state=RAND_SEED)
search_spaces = {
                    'predictor__n_estimators': [10, 50, 100, 200, 400, 600, 800],
                    'predictor__max_depth': [None, 4, 6, 8, 10, 20, 30, 40, 50],
                    'predictor__min_samples_split': [2, 5, 10, 15, 20],
                    'predictor__min_samples_leaf': [1, 3, 5, 7, 10, 15],
                    'predictor__min_weight_fraction_leaf': [0.0, 0.1, 0.2, 0.3, 0.4],
                    'predictor__max_features': ['sqrt', 'log2'],
                    'predictor__max_leaf_nodes': [None, 10, 20, 30, 40, 50],
                    'predictor__min_impurity_decrease': [0.0, 0.1, 0.2, 0.3, 0.4],
                }
hparams_tuner = BayesSearchCV(estimator=pipeline,
                    search_spaces=search_spaces,
                    scoring='f1',
                    cv=skf,                                           # stratified K-fold splitter used for cross-validation
                    n_iter=max_trials,                                # max number of trials
                    n_points=n_hparams_at_trial,                      # number of hyperparameter sets evaluated at the same time
                    iid=False,                                        # if not iid, it optimizes on the CV score
                    return_train_score=False,
                    refit=False,                                      # no automatic refit; the pipeline is refit manually afterwards
                    optimizer_kwargs={'base_estimator': 'GP'},        # optimizer parameters: we use a Gaussian Process (GP)
                    n_jobs=-1,
                    random_state=RAND_SEED)

In [11]:
counter = 1
def onstep(res):
    global counter
    x0 = res.x_iters   # List of input points
    y0 = res.func_vals # Evaluation of input points
    print(f'Last eval #{counter}: {x0[-1]}', 
          f' - Score {y0[-1]:.3f}')
    print(f' - Best Score {res.fun:.3f}',
          f' - Best Args: {res.x}')
    counter += 1

overdone_control = DeltaYStopper(delta=0.0001)               # We stop if the gain of the optimization becomes too small
time_limit_control = DeadlineStopper(total_time=60 * 45)     # We impose a time limit (45 minutes)

callbacks=[overdone_control, time_limit_control, onstep]
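
To make the two stopping rules concrete, here is a simplified, pure-Python illustration of the ideas behind them: stop when the best few scores are nearly identical, or when a time budget is exhausted. The function names, the `n_best` default, and the exact comparison are assumptions for illustration, not scikit-optimize's implementation.

```python
import time

def scores_have_plateaued(func_vals, delta, n_best=5):
    """Return True when the n_best lowest values lie within delta of each other."""
    if len(func_vals) < n_best:
        return False
    best = sorted(func_vals)[:n_best]
    return best[-1] - best[0] < delta

def deadline_passed(start_time, total_time, now=None):
    """Return True once more than total_time seconds have elapsed."""
    now = time.time() if now is None else now
    return now - start_time > total_time

# The optimizer minimizes, so an F1 of 0.653 is recorded as -0.653.
history = [-0.0, -0.653, -0.0, -0.0]
print(scores_have_plateaued(history, delta=0.0001))                 # False: fewer than 5 values
print(deadline_passed(start_time=0, total_time=60 * 45, now=3000))  # True: 3000 s > 2700 s
```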

In [12]:
X = train_dataset.feature_data()
y = train_dataset.target_data
hparams_tuner.fit(X, y, callback=callbacks)

hparams_evals_count = len(hparams_tuner.cv_results_['params'])
best_score = hparams_tuner.best_score_
best_score_std = pd.DataFrame(hparams_tuner.cv_results_).iloc[hparams_tuner.best_index_].std_test_score
best_params = hparams_tuner.best_params_
print(f"candidates checked: {hparams_evals_count}, best CV score: {best_score:.3f}, best_score_std:{best_score_std:.3f}")
print(f"best_params: {best_params}")

Last eval #1: [20, 'sqrt', 50, 0.2, 5, 15, 0.3, 100]  - Score -0.000
 - Best Score -0.000  - Best Args: [10, 'log2', 30, 0.3, 15, 20, 0.0, 10]
Last eval #2: [30, 'log2', 40, 0.0, 7, 10, 0.0, 200]  - Score -0.653
 - Best Score -0.653  - Best Args: [30, 'log2', 40, 0.0, 7, 10, 0.0, 200]
Last eval #3: [40, 'log2', None, 0.1, 10, 15, 0.1, 100]  - Score -0.000
 - Best Score -0.653  - Best Args: [30, 'log2', 40, 0.0, 7, 10, 0.0, 200]
Last eval #4: [4, 'log2', 20, 0.2, 10, 15, 0.4, 100]  - Score -0.000
 - Best Score -0.653  - Best Args: [30, 'log2', 40, 0.0, 7, 10, 0.0, 200]
candidates checked: 10, best CV score: 0.653, best_score_std:0.007
best_params: OrderedDict([('predictor__max_depth', 30), ('predictor__max_features', 'log2'), ('predictor__max_leaf_nodes', 40), ('predictor__min_impurity_decrease', 0.0), ('predictor__min_samples_leaf', 7), ('predictor__min_samples_split', 10), ('predictor__min_weight_fraction_leaf', 0.0), ('predictor__n_estimators', 200)])
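
The log shows four callback lines although ten candidates were checked. This is consistent with the tuner proposing candidates in batches of `n_points` and the callback firing once per batch; the scores in the callback lines are negated because the optimizer minimizes. The batching arithmetic can be sketched as follows (an illustration of the counts, not library code):

```python
def batch_sizes(n_iter, n_points):
    """Split n_iter candidate evaluations into batches of at most n_points."""
    sizes = []
    remaining = n_iter
    while remaining > 0:
        sizes.append(min(n_points, remaining))
        remaining -= n_points
    return sizes

sizes = batch_sizes(n_iter=10, n_points=3)
print(sizes)       # [3, 3, 3, 1]
print(len(sizes))  # 4 batches, hence 4 callback lines
```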


In [13]:
pipeline.set_params(**best_params)
pipeline.fit(X, y)

#### 6. Wrap the Model and Evaluate Its Performance

We wrap the chosen model pipeline (either the tuned Random Forest pipeline or the SVC-based pipeline) using our standardized model wrapper. Then, we calculate the model's predictive performance on the test dataset and print the resulting performance metric name and score. This step lets us check that the trained model meets the evaluation criteria before proceeding to domain rule verification.

In [14]:
model_wrapper.wrap_model(pipeline) # or svc_pipeline 
metric_name, metric_score = model_wrapper.calculate_predictive_performance(test_dataset)
print(f"Test Performance Metric : {metric_name}={metric_score}")

Test Performance Metric : F1-Score=0.6343638525564804
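
For reference, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts, assuming the reported metric is the standard binary F1; the counts are invented for illustration. It also shows why a trial scores exactly 0: that happens whenever no true positives are predicted.

```python
def f1_from_counts(tp, fp, fn):
    """Binary F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1_example = f1_from_counts(tp=60, fp=30, fn=40)
print(round(f1_example, 3))                 # 0.632
print(f1_from_counts(tp=0, fp=0, fn=100))   # 0.0: no positives predicted at all
```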


#### 7. Loading or Generating the Domain Configuration

VerifIA allows you to create a domain configuration in two ways: by loading a predefined YAML file (Option A) or by generating it automatically (Option B). With the domain configuration available, we instantiate the `RuleConsistencyVerifier`. This verifier uses the domain rules and constraints to evaluate whether the model’s predictions on the test data are consistent with our domain knowledge.

##### **Option A: Predefined Domain File:**  
A pre-defined YAML file (e.g., "hotel_cancellation.yaml") can be loaded directly to provide the domain constraints and rules.

In [15]:
DOMAIN_PATH = "../domains/hotel_cancellation.yaml"
model_verifier = RuleConsistencyVerifier(DOMAIN_PATH)

##### **Option B: AI-Powered Domain Generation:**  
Using the `DomainGenFlow` module, the domain configuration can be generated automatically by providing the training data, descriptive domain knowledge (from external documents located in a PDF directory), and the model card.
  

**Setup OpenAI and LangSmith Keys**

Note that LangSmith is optional, but it is helpful for tracing. If you want to use LangSmith, sign up for an account and set the LangSmith environment variables shown in the next cell to start logging traces.

Accessing the OpenAI API requires an API key, which you can get by creating an account. Once you have a key you'll want to set it as an environment variable by running:

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = 'true'
os.environ["LANGCHAIN_ENDPOINT"] = 'https://api.smith.langchain.com'
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass(prompt='Your LANGCHAIN_API_KEY? ')
os.environ["OPENAI_API_KEY"] = getpass.getpass(prompt='Your OPENAI_API_KEY? ')
os.environ["USER_AGENT"] = 'my_agent'
os.environ["LANGCHAIN_PROJECT"] = 'VERIFIA_TEST'
os.environ["VERIFIA_GPT_NAME"] = 'gpt-4.1'
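
Since LangSmith is optional while the OpenAI key is required, one way to keep the setup tidy is to build the environment entries conditionally and fail early when the OpenAI key is missing. The helper below is a hypothetical sketch (its name and signature are not part of VerifIA); the variable names and values are the ones used in the cell above.

```python
def build_env(openai_key, langsmith_key=None, project="VERIFIA_TEST"):
    """Return the environment variables to set; LangSmith entries are optional."""
    if not openai_key:
        raise ValueError("An OpenAI API key is required for domain generation.")
    env = {"OPENAI_API_KEY": openai_key}
    if langsmith_key:
        # Only enable tracing when a LangSmith key is actually provided.
        env.update({
            "LANGCHAIN_TRACING_V2": "true",
            "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
            "LANGCHAIN_API_KEY": langsmith_key,
            "LANGCHAIN_PROJECT": project,
        })
    return env

env_without_tracing = build_env(openai_key="placeholder-key")
env_with_tracing = build_env(openai_key="placeholder-key", langsmith_key="placeholder-ls")
print(sorted(env_without_tracing))  # ['OPENAI_API_KEY']
print(len(env_with_tracing))        # 5
```

In the notebook, the result could be applied with `os.environ.update(build_env(...))`, passing keys collected through `getpass`.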


In [None]:
from verifia.generation import DomainGenFlow

DOMAIN_PDF_DIRPATH = "../documents/hotel_cancellation"
domain_genflow = DomainGenFlow()
domain_genflow.load_ctx(dataframe=train_dataset.data, 
                        pdfs_dirpath=DOMAIN_PDF_DIRPATH,
                        model_card=model_wrapper.model_card.to_dict())
domain_cfg_dict = domain_genflow.run()
model_verifier = RuleConsistencyVerifier(domain_cfg_dict=domain_cfg_dict)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

#### 8. Running the Verification Process

The verifier is connected to the wrapped model and test dataset. In this example, the verification is executed using a Random Sampler (RS) as the search algorithm with the following parameters:
- **Population Size:** 4
- **Maximum Iterations:** 3
- **Original Seed Size:** 20

This process explores the input space to identify any instances where the model’s predictions violate the established rules.
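
VerifIA's search internals are not shown in this notebook. As a rough mental model only, a random-sampling check for a single rule might look like the sketch below: take seed inputs, apply random perturbations that the rule says should not lower the predicted risk, and record cases where the model's output moves the wrong way. The toy model, the rule, and all names are hypothetical.

```python
import random

def toy_model(lead_time):
    """Hypothetical cancellation score with a deliberate dip above 300 days."""
    score = min(lead_time / 400, 1.0)
    return score - 0.3 if lead_time > 300 else score

def find_violations(model, seeds, pop_size, max_iters, rng):
    """Rule under test: increasing lead_time must not decrease the score."""
    violations = []
    for seed in seeds:
        base = model(seed)
        for _ in range(max_iters):
            for _ in range(pop_size):
                # Perturb in the direction the rule constrains (longer lead time).
                candidate = seed + rng.uniform(0, 100)
                if model(candidate) < base:
                    violations.append((seed, candidate))
    return violations

rng = random.Random(0)
found = find_violations(toy_model, seeds=[50, 150, 290],
                        pop_size=4, max_iters=3, rng=rng)
print(len(found) > 0)                # True: the dip is detected
print({seed for seed, _ in found})   # {290}: only the seed near the dip violates the rule
```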


In [16]:
result:RulesViolationResult = model_verifier.verify(model_wrapper)\
                                            .on(test_dataset.data)\
                                            .using("RS")\
                                            .run(pop_size=4, max_iters=3, orig_seed_size=20)


#### 9. Saving the Verification Report and Model Artifacts

Finally, the verification results are saved as an HTML report, which provides a detailed summary of rule compliance and any detected inconsistencies. Additionally, the trained model and its model card are saved for future reference, ensuring that your work is reproducible and that the model’s verification status is archived.


In [17]:
result.save_as_html("../reports/hotel_cancellation.html")
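
This notebook does not state whether `save_as_html` creates missing directories. If it does not, creating the parent folder beforehand avoids a failed write. The helper below is a hypothetical convenience, demonstrated in a temporary directory so that it has no side effects.

```python
import tempfile
from pathlib import Path

def ensure_parent_dir(filepath):
    """Create the parent directory of filepath if needed and return the path."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as tmp:
    report_path = ensure_parent_dir(Path(tmp) / "reports" / "hotel_cancellation.html")
    parent_created = report_path.parent.is_dir()
    print(parent_created)  # True
```

In the notebook, this would be used as `result.save_as_html(str(ensure_parent_dir("../reports/hotel_cancellation.html")))`.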

In [18]:
model_wrapper.save_model()
model_wrapper.save_model_card("../models/hotel_cancellation.yaml")