<p align="center">
  <img src="https://www.verifia.ca/assets/logo.png" width="160px" alt="VerifIA Logo"/><br>
  <strong>© 2025 VerifIA. All rights reserved.</strong>
</p>

### VerifIA - Model Verification: House Pricing Prediction with CatBoost

In this notebook we address a house pricing prediction problem using a CatBoost regression model. We tune the model with Bayesian hyperparameter optimization (using BayesSearchCV) on our training data. Next, we wrap the trained model with VerifIA’s CBModel wrapper and generate a domain configuration—using either AI-powered domain generation or loading from a YAML file—to verify that the model’s predictions adhere to the expected domain rules.

In [None]:
!pip install requests
!pip install scikit-optimize
!pip install "../../dist/verifia-0.1.0-py3-none-any.whl[catboost, genflow]"

#### 0. Download Resources

Before running any other cells, make sure you have all required resources:

In [2]:
!curl -sL https://tinyurl.com/r6m2zk87 -o downloader.py

In [3]:
from downloader import download_resource
url = 'https://www.verifia.ca/assets/use-cases/'
download_resource(url+'data/house_price.csv', 
                  dest_dir='../data')
download_resource(url+'domains/house_price.yaml', 
                  dest_dir='../domains')
download_resource(url+'documents/house_price.zip', 
                  dest_dir='../documents/house_price')

{'extracted': True,
 'files': ['articles/',
  'articles/Housing Data.pdf',
  'articles/National comprehensive housing market analysis.pdf',
  'articles/United States Housing Market.pdf',
  'articles/US Compared to Canada 2.pdf',
  'articles/US Compared to Canada 3.pdf',
  'data_report.pdf',
  'domain_definition_meeting_notes.pdf',
  'domain_definition_report.pdf',
  'feature_selection.pdf',
  'sensitivity_analysis_meeting_notes.pdf',
  'sensitivity_analysis_report.pdf']}

#### 1. Importing Libraries and Setting Up

We start by importing the necessary libraries: pandas, CatBoost, and modules from scikit-optimize (skopt) and VerifIA.

In [None]:
%load_ext autoreload
%autoreload 2
import os
import getpass
import pandas as pd
import catboost as cb 
from skopt import BayesSearchCV
from skopt.callbacks import DeadlineStopper, DeltaYStopper
from verifia.models import CBModel, build_from_model_card
from verifia.verification.results import RulesViolationResult
from verifia.context.data import Dataset
from verifia.verification.verifiers import RuleConsistencyVerifier

#### 2. Data Loading

We define constants such as the random seed, the model directory path, and the data file path. The house pricing dataset is loaded from a CSV file, and the target variable *price* is separated from the remaining feature columns.

In [6]:
RAND_SEED = 0
MODELS_DIRPATH = "../models"
DATA_PATH = "../data/house_price.csv"
dataframe = pd.read_csv(DATA_PATH)
target_name = "price"
feature_names = set(dataframe.columns) - {target_name}
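`set(dataframe.columns) - {target_name}` yields the right feature *names*, but a Python set has no guaranteed iteration order, and because string hashing is randomized per process, that order can change between sessions. If a stable column order matters for your workflow, a list comprehension keeps the CSV's column order. The column names below are hypothetical stand-ins, not the actual columns of `house_price.csv`, and whether `build_from_model_card` accepts a list in place of a set is an assumption to check against the VerifIA documentation.

```python
# Hypothetical column names standing in for the columns of house_price.csv.
columns = ["area", "bedrooms", "bathrooms", "location", "price"]
target_name = "price"

# Order-preserving alternative to set(columns) - {target_name}:
# keeps every column except the target, in the original CSV order.
feature_names = [col for col in columns if col != target_name]

# Same membership as the set-based version, but with a deterministic order.
assert set(feature_names) == set(columns) - {target_name}
print(feature_names)  # ['area', 'bedrooms', 'bathrooms', 'location']
```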

#### 3. Building the CatBoost Model Wrapper

Using VerifIA’s `build_from_model_card`, we create a `CBModel` instance. This wrapper encapsulates essential metadata (model name, version, type, feature names, target name, and local directory) and will later be used to link our CatBoost model with the verification framework.

In [7]:
model_wrapper:CBModel = build_from_model_card({
    "name": "house_price",
    "version": "2",
    "type": "regression",
    "description": "model predicts the price of houses",
    "framework": "catboost",
    "feature_names": feature_names,
    "target_name": target_name,
    "local_dirpath": MODELS_DIRPATH
})

#### 4. Preparing the Dataset

The loaded DataFrame is converted into a VerifIA `Dataset` object, which organizes the data together with its feature, target, and categorical-feature information (the categorical feature names are taken from `model_wrapper.cat_feature_names`). We then split the dataset into training and testing subsets with an 80/20 ratio and a fixed seed (`RAND_SEED`), using the first for model tuning and the second for final evaluation.

In [8]:
dataset = Dataset(dataframe, model_wrapper.target_name, 
                  model_wrapper.feature_names, 
                  model_wrapper.cat_feature_names)
train_dataset, test_dataset = dataset.split(0.8, RAND_SEED)
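`Dataset.split(0.8, RAND_SEED)` performs the split internally, and we do not reproduce VerifIA's implementation here. As a rough illustration of what a seeded 80/20 split involves, the hypothetical helper `split_indices` below shuffles row indices with a fixed seed and cuts them at 80%, so the same seed always yields the same partition.

```python
import random

def split_indices(n_rows: int, train_frac: float, seed: int) -> tuple[list[int], list[int]]:
    """Hypothetical helper: return (train, test) row indices for a seeded split."""
    indices = list(range(n_rows))
    rng = random.Random(seed)          # local RNG, so the global random state is untouched
    rng.shuffle(indices)
    cut = int(round(n_rows * train_frac))
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(100, 0.8, seed=0)
print(len(train_idx), len(test_idx))  # 80 20
```

Using a dedicated `random.Random` instance keeps the split reproducible without affecting any other code that relies on the global random module.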

#### 5. Hyperparameter Tuning via Bayesian Optimization

We define a search space for key CatBoost hyperparameters: the number of boosting iterations, learning rate, tree depth, L2 leaf regularization, and border count.  
Using `BayesSearchCV` from skopt, we run a Bayesian search over this space (10 candidate sets, proposed 3 at a time), scoring each candidate by 5-fold cross-validation on negative mean squared error. Callbacks (`DeltaYStopper` and `DeadlineStopper`, plus a logging function) stop the search early if the best scores stop differing meaningfully or if 45 minutes elapse. The best hyperparameters are then read from the search results.

In [9]:
cv_splits_count, max_trials, n_hparams_at_trial = 5, 10, 3
search_spaces = {
                "iterations": [500, 750, 1000],               # Number of boosting iterations.
                "learning_rate": [0.01, 0.05, 0.1, 0.3],      # Step size shrinkage used in updates.
                "depth": [3, 4, 5, 6, 7, 8],                  # Depth of the trees; deeper trees can capture more complex patterns.
                "l2_leaf_reg": [1e-3, 0.01, 0.1, 1, 10],      # L2 regularization on leaf weights to prevent overfitting.
                "border_count": [16, 32, 64, 128, 256],       # Number of splits for numerical features.
                }
hparams_tuner = BayesSearchCV(estimator=cb.CatBoostRegressor(verbose=0, allow_writing_files=False, random_state=RAND_SEED),
                    search_spaces=search_spaces,
                    scoring='neg_mean_squared_error',                 # scikit-learn convention: higher is better, so MSE is negated
                    cv=cv_splits_count,                               # number of splits for cross-validation
                    n_iter=max_trials,                                # max number of trials
                    n_points=n_hparams_at_trial,                      # number of hyperparameter sets evaluated at the same time
                    return_train_score=False,
                    refit=False,                                      # only best_params_ is needed; the final model is trained in step 6
                    optimizer_kwargs={'base_estimator': 'GP'},        # optimizer parameters: we use a Gaussian Process (GP) surrogate
                    n_jobs=-1,
                    random_state=RAND_SEED)
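Because each hyperparameter is given as a list of candidate values, the search space is a finite grid. A quick back-of-the-envelope calculation shows how small a slice of it 10 trials cover, why the callback output further down shows only 4 "Last eval" lines (candidates are proposed in batches of `n_points=3`, and the callback runs once per batch), and how many CatBoost fits the 5-fold search performs. The variable names mirror the notebook; the calculation itself is our illustration.

```python
import math

# Candidate values copied from the search_spaces dict above.
search_spaces = {
    "iterations": [500, 750, 1000],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "depth": [3, 4, 5, 6, 7, 8],
    "l2_leaf_reg": [1e-3, 0.01, 0.1, 1, 10],
    "border_count": [16, 32, 64, 128, 256],
}
max_trials, n_hparams_at_trial, cv_splits_count = 10, 3, 5

grid_size = math.prod(len(values) for values in search_spaces.values())
coverage = max_trials / grid_size                            # fraction of the grid evaluated
optimizer_steps = math.ceil(max_trials / n_hparams_at_trial)  # batches of 3, 3, 3, 1
model_fits = max_trials * cv_splits_count                     # refit=False adds no final fit

print(grid_size)          # 1800
print(f"{coverage:.2%}")  # 0.56%
print(optimizer_steps)    # 4
print(model_fits)         # 50
```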


In [10]:
counter = 1
def onstep(res):
    global counter
    x0 = res.x_iters   # List of input points
    y0 = res.func_vals # Evaluation of input points
    print(f'Last eval #{counter}: {x0[-1]}', 
          f' - Score {y0[-1]:.3f}')
    print(f' - Best Score {res.fun:.3f}',
          f' - Best Args: {res.x}')
    counter += 1

overdone_control = DeltaYStopper(delta=0.0001)               # stop once the 5 best scores so far are within delta of each other
time_limit_control = DeadlineStopper(total_time=60 * 45)     # impose a time limit of 45 minutes (total_time is in seconds)

callbacks=[overdone_control, time_limit_control, onstep]
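`DeltaYStopper` ends the search once the best few objective values found so far are nearly identical. The function `delta_y_should_stop` below is a simplified re-implementation of that idea for illustration only (the real class lives in `skopt.callbacks`; `n_best=5` mirrors its default as we understand it). Since skopt minimizes, lower values are better here. With MSE values in the tens of thousands, as printed in the output further down, a `delta` of 0.0001 is unlikely to fire in a 10-trial run, so in practice the search ends when `n_iter` is exhausted.

```python
def delta_y_should_stop(func_vals: list[float], delta: float, n_best: int = 5) -> bool:
    """Simplified sketch of the DeltaYStopper criterion (minimization).

    Stop when the gap between the best and the n_best-th best objective
    values is smaller than delta.
    """
    if len(func_vals) < n_best:
        return False                     # not enough evaluations yet
    ranked = sorted(func_vals)           # lower is better for a minimizer
    return abs(ranked[n_best - 1] - ranked[0]) < delta

# A few MSE values taken from the printed log (not a full evaluation history).
sample_vals = [28644.007, 21719.784, 22801.408, 25357.934, 30418.996]
print(delta_y_should_stop(sample_vals, delta=0.0001))  # False: spread is about 8699

# Five nearly identical objective values do trigger the stop.
print(delta_y_should_stop([10.0, 10.00001, 10.00002, 10.00003, 10.00004], delta=0.0001))  # True
```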

In [11]:
X = train_dataset.feature_data()
y = train_dataset.target_data
cat_features = train_dataset.cat_feature_idxs
hparams_tuner.fit(X, y, cat_features=cat_features, callback=callbacks)

hparams_evals_count = len(hparams_tuner.cv_results_['params'])
best_score = hparams_tuner.best_score_
best_score_std = pd.DataFrame(hparams_tuner.cv_results_).iloc[hparams_tuner.best_index_].std_test_score
best_params = hparams_tuner.best_params_
print(f"candidates checked: {hparams_evals_count}, best CV score: {best_score:.3f}, best_score_std:{best_score_std:.3f}")
print(f"best_params: {best_params}")

Last eval #1: [16, 6, 500, 1, 0.3]  - Score 28644.007
 - Best Score 21719.784  - Best Args: [64, 7, 750, 1, 0.3]
Last eval #2: [64, 3, 500, 10, 0.1]  - Score 22801.408
 - Best Score 21719.784  - Best Args: [64, 7, 750, 1, 0.3]
Last eval #3: [32, 5, 500, 0.001, 0.1]  - Score 25357.934
 - Best Score 21719.784  - Best Args: [64, 7, 750, 1, 0.3]
Last eval #4: [16, 7, 750, 0.1, 0.3]  - Score 30418.996
 - Best Score 21719.784  - Best Args: [64, 7, 750, 1, 0.3]
candidates checked: 10, best CV score: -21719.784, best_score_std:5909.500
best_params: OrderedDict([('border_count', 64), ('depth', 7), ('iterations', 750), ('l2_leaf_reg', 1), ('learning_rate', 0.3)])
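The two scores in this output differ only in sign. scikit-learn scorers follow a "greater is better" convention, so `neg_mean_squared_error` reports the negated MSE and `best_score_` is negative. skopt's optimizer minimizes the negated score, which is why the callback's `res.func_vals` and `res.fun` show positive MSE values. Taking the square root gives an RMSE in the same units as `price`, which is easier to interpret. The numbers below are copied from the output; the arithmetic is our own illustration.

```python
import math

# Values copied from the output above.
best_cv_score = -21719.784   # hparams_tuner.best_score_ (negated MSE)
best_cv_std = 5909.500       # std of the fold scores for the best candidate

cv_mse = -best_cv_score      # undo scikit-learn's sign flip
cv_rmse = math.sqrt(cv_mse)  # RMSE is in the same units as the `price` column
rel_spread = best_cv_std / cv_mse

print(f"CV MSE  = {cv_mse:.3f}")             # CV MSE  = 21719.784
print(f"CV RMSE = {cv_rmse:.2f}")            # CV RMSE = 147.38
print(f"fold std / MSE = {rel_spread:.1%}")  # fold std / MSE = 27.2%
```

A fold-to-fold spread of roughly a quarter of the mean MSE suggests that the ranking between closely scoring candidates is noisy with only 10 trials.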


#### 6. Training and Wrapping the Final CatBoost Model

With the best hyperparameters identified, we create a `CatBoostRegressor` with these settings and train it on the full training set. The trained model is then assigned to the model wrapper. Finally, we evaluate it on the test dataset by computing a performance metric (e.g. mean squared error) to check that its predictive power is acceptable.

In [12]:
cb_model = cb.CatBoostRegressor(**best_params, allow_writing_files=False, random_state=RAND_SEED)
cb_model.fit(X, y, cat_features=cat_features)
model_wrapper.wrap_model(cb_model)
metric_name, metric_score = model_wrapper.calculate_predictive_performance(test_dataset)
print(f"Test Performance Metric : {metric_name}={metric_score}")

0:	learn: 310.4541290	total: 129ms	remaining: 1m 36s
1:	learn: 269.2804704	total: 134ms	remaining: 50s
2:	learn: 239.2395798	total: 139ms	remaining: 34.6s
3:	learn: 220.5397192	total: 142ms	remaining: 26.6s
4:	learn: 205.3046589	total: 146ms	remaining: 21.7s
5:	learn: 195.3995721	total: 149ms	remaining: 18.5s
6:	learn: 186.9506973	total: 152ms	remaining: 16.2s
7:	learn: 181.7736535	total: 156ms	remaining: 14.4s
8:	learn: 174.6403285	total: 159ms	remaining: 13.1s
9:	learn: 168.3094718	total: 162ms	remaining: 12s
10:	learn: 165.0062718	total: 164ms	remaining: 11s
11:	learn: 162.6705773	total: 167ms	remaining: 10.3s
12:	learn: 158.1110517	total: 171ms	remaining: 9.68s
13:	learn: 155.7887147	total: 174ms	remaining: 9.17s
14:	learn: 153.3477049	total: 177ms	remaining: 8.68s
15:	learn: 151.8274951	total: 180ms	remaining: 8.27s
16:	learn: 149.9230527	total: 183ms	remaining: 7.9s
17:	learn: 147.6962719	total: 187ms	remaining: 7.61s
18:	learn: 146.1056777	total: 190ms	remaining: 7.32s
19:	learn

#### 7. Loading or Generating the Domain Configuration

VerifIA offers two ways to obtain a domain configuration: load a predefined YAML file (Option A) or generate one with AI (Option B). Once the domain configuration is available, we instantiate the `RuleConsistencyVerifier`. This verifier uses the domain rules and constraints to evaluate whether the model’s predictions on the test data are consistent with our domain knowledge.

##### **Option A: Predefined Domain File:**  
A pre-defined YAML file (e.g., "house_price.yaml") can be loaded directly to provide the domain constraints and rules.

In [13]:
DOMAIN_PATH = "../domains/house_price.yaml"
model_verifier = RuleConsistencyVerifier(DOMAIN_PATH)

##### **Option B: AI-Powered Domain Generation:**  
VerifIA provides an AI-powered domain generation flow through the `DomainGenFlow` module. By supplying the training data, a directory of domain knowledge documents (in PDF format), and the model card details, a rich domain configuration is generated. This configuration contains variable definitions, constraints, and rules expected to hold true for the house pricing problem.
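To make "variable definitions, constraints, and rules" concrete, here is a hypothetical, plain-Python rendering of those three ingredients and of the in-domain check they enable. The variable names, ranges, constraint, and rule are all invented for illustration; they do not reflect the schema of `house_price.yaml` or of VerifIA's generated configuration.

```python
# Hypothetical domain: NOT VerifIA's schema, just the three ingredients.
domain = {
    "variables": {                               # name -> (min, max)
        "area": (20.0, 1000.0),
        "bedrooms": (0, 10),
        "price": (0.0, 5000.0),
    },
    "constraints": [                             # predicates every valid row must satisfy
        lambda row: row["bedrooms"] <= row["area"] / 10,
    ],
    "rules": [                                   # expected input/output relationships
        {"id": "R_demo", "description": "a larger area should not lower the predicted price"},
    ],
}

def in_domain(row: dict, domain: dict) -> bool:
    """Return True if every variable is within range and every constraint holds."""
    for name, (lo, hi) in domain["variables"].items():
        if name in row and not (lo <= row[name] <= hi):
            return False
    return all(check(row) for check in domain["constraints"])

print(in_domain({"area": 120.0, "bedrooms": 3, "price": 450.0}, domain))  # True
print(in_domain({"area": 30.0, "bedrooms": 6, "price": 450.0}, domain))   # False (constraint)
```

Variables and constraints decide which inputs are valid at all; rules describe how predictions should behave on those valid inputs.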

**Setup OpenAI and LangSmith Keys**

Note that LangSmith is optional but helpful for tracing. If you want to use it, sign up for LangSmith and set the `LANGCHAIN_*` environment variables below to start logging traces.

Accessing the OpenAI API requires an API key, which you can get by creating an OpenAI account. Once you have a key, set it as an environment variable by running:

In [None]:
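# OpenAI credentials and model name for the domain generation flow
os.environ["OPENAI_API_KEY"] = getpass.getpass(prompt='Your OPENAI_API_KEY? ')
os.environ["VERIFIA_GPT_NAME"] = 'gpt-4.1'
# User agent string (read by some LangChain components)
os.environ["USER_AGENT"] = 'my_agent'
# Optional: LangSmith tracing (skip these four lines if you do not use LangSmith)
os.environ["LANGCHAIN_TRACING_V2"] = 'true'
os.environ["LANGCHAIN_ENDPOINT"] = 'https://api.smith.langchain.com'
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass(prompt='Your LANGCHAIN_API_KEY? ')
os.environ["LANGCHAIN_PROJECT"] = 'VERIFIA_TEST'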
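Before launching the generation flow, it can help to confirm that the required variables are set without ever printing their values. The hypothetical helper `report_env` below treats `OPENAI_API_KEY` as required and the LangSmith variables as optional, in line with the note above; which variables VerifIA reads beyond those set in this cell is an assumption.

```python
import os

REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["LANGCHAIN_API_KEY", "LANGCHAIN_TRACING_V2", "LANGCHAIN_PROJECT"]

def report_env(required=REQUIRED, optional=OPTIONAL, env=os.environ) -> list[str]:
    """Print which variables are set (never their values); return the missing required ones."""
    missing = [name for name in required if not env.get(name)]
    for name in required + optional:
        status = "set" if env.get(name) else "missing"
        print(f"{name:<22} {status}")
    return missing

missing = report_env()
if missing:
    print("Set these before running DomainGenFlow:", ", ".join(missing))
```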

In [None]:
from verifia.generation import DomainGenFlow

DOMAIN_PDF_DIRPATH = "../documents/house_price"
domain_genflow = DomainGenFlow()
domain_genflow.load_ctx(dataframe=train_dataset.data, 
                        pdfs_dirpath=DOMAIN_PDF_DIRPATH,
                        model_card=model_wrapper.model_card.to_dict())
domain_cfg_dict = domain_genflow.run()
model_verifier = RuleConsistencyVerifier(domain_cfg_dict=domain_cfg_dict)

INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:verifia.generation.agents:Invoking agent for constraints_analysis
INFO:verifia.generation.agents:Invoking agent for rules_analysis
INFO:verifia.generation.agents:Invoking agent for variables_analysis
INFO:verifia.generation.agents:Invoking agent for constraints_retriever
INFO:verifia.generation.agents:Invoking agent for rules_retriever
INFO:verifia.generation.agents:Invoking agent for variables_retriever
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HT

#### 8. Running the Rule Consistency Verifier

Next, we connect the verifier to our wrapped CatBoost model and the test data. We run the verification with the Random Sampler ("RS") search algorithm and set its parameters: population size (`pop_size=4`), number of iterations (`max_iters=5`), and number of original rows drawn from the test data as seeds (`orig_seed_size=10`). The search explores the input space around these seeds to uncover instances where the model’s predictions violate the defined rules.

In [15]:
result:RulesViolationResult = model_verifier.verify(model_wrapper)\
                                            .on(test_dataset.data)\
                                            .using("RS")\
                                            .run(pop_size=4, max_iters=5, orig_seed_size=10)

INFO:verifia.verification.searchers:Searcher built with algorithm: RS
INFO:verifia.verification.verifiers:Rows removed because they are out of domain: 2 out of 10
INFO:verifia.verification.verifiers:In-Domain rows removed due to error > 13.900: 6 out of 8
Processing Rules:   0%|          | 0/10 [00:00<?, ?it/s]INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
INFO:verifia.verification.verifiers:Detected Rule R1 Violations: []
Processi
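The log above suggests the general shape of this process: seed rows are sampled from the test data, rows outside the domain are dropped, in-domain rows whose prediction error exceeds a threshold (13.900 here) are dropped, and the remaining seeds are perturbed to look for rule violations. The sketch below mimics that loop with a random search over a single feature and a single monotonicity rule. The toy model, feature names, rule, error threshold, and perturbation scheme are all hypothetical; this is not VerifIA's searcher.

```python
import random

def toy_model(row: dict) -> float:
    """Hypothetical model: price grows with area but dips for large houses."""
    area = row["area"]
    return 2.0 * area if area < 400 else 2.0 * area - 300.0

def rule_holds(model, row: dict, new_area: float) -> bool:
    """Hypothetical rule: increasing area must not decrease the predicted price."""
    if new_area <= row["area"]:
        return True                              # the rule only constrains increases
    return model({**row, "area": new_area}) >= model(row)

def random_search_verify(model, seeds, area_range, pop_size=4, max_iters=5,
                         error_threshold=50.0, seed=0):
    rng = random.Random(seed)
    lo, hi = area_range
    # 1) keep in-domain seeds, 2) keep seeds the model already predicts well
    kept = [r for r in seeds if lo <= r["area"] <= hi]
    kept = [r for r in kept if abs(model(r) - r["price"]) <= error_threshold]
    violations = []
    for row in kept:
        for _ in range(max_iters):
            for _ in range(pop_size):            # one random population per iteration
                candidate = rng.uniform(lo, hi)
                if not rule_holds(model, row, candidate):
                    violations.append((row["area"], candidate))
    return kept, violations

seeds = [{"area": a, "price": 2.0 * a} for a in (100.0, 350.0, 390.0)]
seeds.append({"area": 2000.0, "price": 4000.0})  # out of the assumed domain
seeds.append({"area": 200.0, "price": 900.0})    # badly predicted, filtered out
kept, violations = random_search_verify(toy_model, seeds, area_range=(20.0, 1000.0))
print(len(kept), len(violations))
```

Filtering out poorly predicted seeds first means that any violation found afterwards reflects the model's behaviour under perturbation rather than an already large error on the original row.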

#### 9. Saving the Verification Report and Model Artifacts

Finally, the verification results are saved as an HTML report, which provides a detailed summary of rule compliance and any detected inconsistencies. Additionally, we save the trained CatBoost model and its model card configuration for future reference and reproducibility.

In [16]:
result.save_as_html("../reports/house_price.html")

INFO:verifia.verification.results:HTML report saved to ..\reports\house_price.html


In [17]:
model_wrapper.save_model()
model_wrapper.save_model_card("../models/house_price.yaml")

INFO:verifia.models.base:Default model file path constructed: ../models\house_price-2.cb
INFO:verifia.models.cb:Model saved successfully to ..\models\house_price-2.cb
INFO:verifia.models.base:Model card saved to ../models/house_price.yaml
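The log shows the default model file name being built from the model card's `name` and `version` inside `local_dirpath` (`house_price-2.cb`). The hypothetical helper `default_model_path` below reproduces that naming pattern as inferred from the log line only; it is not VerifIA's code. The mixed `/` and `\` separators in the log are most likely just `os.path.join` running on Windows.

```python
import os

def default_model_path(local_dirpath: str, name: str, version: str, ext: str = "cb") -> str:
    """Hypothetical: <local_dirpath>/<name>-<version>.<ext>, as inferred from the log."""
    return os.path.join(local_dirpath, f"{name}-{version}.{ext}")

print(default_model_path("../models", "house_price", "2"))
# POSIX: ../models/house_price-2.cb    Windows: ../models\house_price-2.cb
```

Knowing the pattern makes it easy to locate the saved model for a given model card version when reloading it later.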
