<p align="center">
  <img src="https://www.verifia.ca/assets/logo.png" width="160px" alt="VerifIA Logo"/><br>
  <strong>© 2025 VerifIA. All rights reserved.</strong>
</p>

### VerifIA - Model Verification: Compressive Strength Regression with TensorFlow
In this notebook, we tackle a regression problem on a concrete compressive strength dataset. We build a TensorFlow model and tune its hyperparameters with scikit-optimize (`skopt`). We then obtain a domain configuration, either from a predefined YAML file or through AI-powered generation, and verify that the trained model complies with the expected domain rules.

In [17]:
!pip install requests
!pip install scikit-optimize
!pip install "../../dist/verifia-0.1.0-py3-none-any.whl[tensorflow,genflow]"

Processing c:\users\houssem\downloads\verifia\dist\verifia-0.1.0-py3-none-any.whl (from verifia==0.1.0)
Installing collected packages: verifia
  Attempting uninstall: verifia
    Found existing installation: verifia 0.1.0
    Uninstalling verifia-0.1.0:
      Successfully uninstalled verifia-0.1.0
Successfully installed verifia-0.1.0


#### 0. Download Resources

Before running any other cells, make sure you have all required resources:

In [18]:
!curl -sL https://tinyurl.com/r6m2zk87 -o downloader.py
!curl -sL https://tinyurl.com/4p9yxa5u -o tf_helpers.py

In [19]:
from downloader import download_resource
url = 'https://www.verifia.ca/assets/use-cases/'
download_resource(url+'data/concrete_compressive_strength.csv', 
                  dest_dir='../data')
download_resource(url+'domains/concrete_compressive_strength.yaml', 
                  dest_dir='../domains')
download_resource(url+'documents/concrete_compressive_strength.zip', 
                  dest_dir='../documents/concrete_compressive_strength')

{'extracted': True,
 'files': ['articles/',
  'articles/Compressive Strength of Concrete and calculation methods.pdf',
  'articles/Concrete characteristics according to Eurocode 2.pdf',
  'articles/Different Grades of Concrete.pdf',
  'articles/Everything You Need to Know.pdf',
  'articles/Table of concrete design properties.pdf',
  'articles/Understanding MPA in Concrete.pdf',
  'data_report.pdf',
  'domain_definition_meeting_notes.pdf',
  'domain_definition_report.pdf',
  'feature_selection_report.pdf',
  'sensitivity_analysis_meeting_notes.pdf',
  'sensitivity_analysis_report.pdf']}

#### 1. Importing Libraries and Setting Up

First, we import necessary libraries for data processing, hyperparameter tuning (skopt), TensorFlow model creation, and VerifIA functionalities such as model wrapping and rule verification.

In [None]:
%load_ext autoreload
%autoreload 2
import os
import getpass
import gc
import pandas as pd
import numpy as np
import tensorflow as tf 
from skopt.callbacks import DeadlineStopper, DeltaYStopper
from skopt.space import Categorical, Integer
from skopt.utils import use_named_args
from skopt import gp_minimize
from sklearn.model_selection import BaseCrossValidator, KFold
from verifia.models import TFModel, build_from_model_card
from verifia.verification.results import RulesViolationResult
from verifia.context.data import Dataset
from verifia.verification.verifiers import RuleConsistencyVerifier
from tf_helpers import df_to_dataset, preprocessing_layer, create_model_to_tune

#### 2. Defining Constants and Loading the Data

We define constants (e.g., random seed, directory paths) and load the compressive strength dataset from a CSV file. The dataset contains measurements along with the target variable *concrete_compressive_strength*. Feature names are inferred from the dataset (excluding the target).

In [22]:
RAND_SEED = 0
MODELS_DIRPATH = "../models"
DATA_PATH = "../data/concrete_compressive_strength.csv"
dataframe = pd.read_csv(DATA_PATH)
target_name = "concrete_compressive_strength"
# A list keeps the CSV column order; a set would not guarantee any order.
feature_names = [col for col in dataframe.columns if col != target_name]

#### 3. Building the TensorFlow Model Wrapper

Using VerifIA’s `build_from_model_card`, we create a `TFModel` wrapper. This wrapper stores essential metadata (model name, version, type, feature names, target name, and storage directory) that VerifIA requires for later verification tasks. Although the model itself is yet to be built or tuned, this wrapper acts as a container for the final model.

In [23]:
model_wrapper:TFModel = build_from_model_card({
    "name": "concrete_compressive_strength",
    "version": "2",
    "type": "regression",
    "description": "model predicts the compressive strength of concretes",
    "framework": "tensorflow",
    "feature_names": feature_names,
    "target_name": target_name,
    "local_dirpath": MODELS_DIRPATH
})

#### 4. Preparing the Dataset

We convert the loaded DataFrame into a VerifIA `Dataset` object. This object facilitates operations such as splitting the data into training and testing subsets. The training set will be used for hyperparameter tuning and model training, while the test set is reserved for final evaluation and rule verification.

In [24]:
dataset = Dataset(dataframe, model_wrapper.target_name, 
                  model_wrapper.feature_names, 
                  model_wrapper.cat_feature_names)
train_dataset, test_dataset = dataset.split(0.8, RAND_SEED)
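
As a rough picture of what a seeded 80/20 split does, here is a minimal sketch in plain Python. `split_indices` is a hypothetical helper written for illustration; it is not VerifIA's `Dataset.split`, whose internals are not shown in this notebook.

```python
import random

def split_indices(n_rows, train_fraction, seed):
    """Return (train_idx, test_idx) for a reproducible shuffled split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # same seed -> same permutation
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

train_idx, test_idx = split_indices(n_rows=10, train_fraction=0.8, seed=0)
print(len(train_idx), len(test_idx))
```

Fixing the seed means every rerun of the notebook tunes and evaluates on the same rows.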

#### 5. Hyperparameter Tuning via Cross-Validation

A custom objective function is defined to tune our TensorFlow model. This function:
- Uses a KFold cross-validation scheme on the training data.
- Constructs training and validation datasets from the DataFrame.
- Builds the model (with parameters passed from the hyperparameter space) using helper functions for preprocessing and model creation.
- Trains the model with early stopping and collects the validation score.
  
The objective function is then optimized using `gp_minimize` from skopt over a defined search space (covering the number of layers, neurons per layer, activation function, dropout, learning rate, batch size, etc.). Callback functions (for progress updates, early termination, and time limits) are also set up.

In [25]:
def make_objective(train_dataset:Dataset, cv:BaseCrossValidator, space, scoring):
    
    @use_named_args(space) 
    def objective(**params):
        curr_bs = int(params['batch_size'])
        num_feature_names = train_dataset.num_feature_names
        cat_feature_names = train_dataset.cat_feature_names
        X = train_dataset.feature_data()
        y = train_dataset.target_data
        validation_scores = list()
        for train_index, val_index in cv.split(X):
            train_ds = df_to_dataset(X.iloc[train_index,:], y.iloc[train_index], batch_size=curr_bs)
            val_ds = df_to_dataset(X.iloc[val_index,:], y.iloc[val_index], shuffle=False, batch_size=curr_bs)
            all_inputs, encoded_features = preprocessing_layer(num_feature_names, cat_feature_names, train_ds)
            tf_model = create_model_to_tune(all_inputs, encoded_features, scoring,
                                            train_dataset.target_data.mean(), 
                                            train_dataset.target_data.std(), 
                                            **params)

            early_stopping = tf.keras.callbacks.EarlyStopping(monitor=f'val_{scoring}', 
                                                                mode='min', 
                                                                patience=15, 
                                                                verbose=0)
                    
            run = tf_model.fit(train_ds, validation_data=val_ds, verbose=0,
                                callbacks=[early_stopping], 
                                epochs=1000) 
                    
            validation_scores.append(np.min(run.history[f"val_{scoring}"]))

        return np.mean(validation_scores)

    return objective

In [26]:
counter = 1
def onstep(res):
    global counter
    x0 = res.x_iters   # List of input points
    y0 = res.func_vals # Evaluation of input points
    print(f'Last eval #{counter}: {x0[-1]}', 
          f' - Score {y0[-1]:.3f}')
    print(f' - Best Score {res.fun:.3f}',
          f' - Best Args: {res.x}')
    counter += 1

overdone_control = DeltaYStopper(delta=0.0001)               # We stop if the gain of the optimization becomes too small
time_limit_control = DeadlineStopper(total_time=60 * 45)     # We impose a time limit (45 minutes)

callbacks=[overdone_control, time_limit_control, onstep]
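
To make the early-termination rule concrete, the sketch below re-implements the idea behind `DeltaYStopper` in plain Python. It assumes the stopper halts once the `n_best` lowest objective values lie within `delta` of each other, with `n_best=5` as the default; `should_stop` is a hypothetical stand-in, not scikit-optimize's own code.

```python
def should_stop(func_vals, delta, n_best=5):
    """Simplified stand-in for the DeltaYStopper criterion (assumed behaviour)."""
    if len(func_vals) < n_best:
        return False                      # not enough evaluations yet
    best_vals = sorted(func_vals)[:n_best]
    # Stop when the spread among the n_best lowest scores is below delta.
    return (best_vals[-1] - best_vals[0]) < delta

# The first five scores from the optimization log in this notebook
scores = [10.171, 3.717, 5.462, 4.520, 5.154]
print(should_stop(scores, delta=0.0001))
```

With a `delta` as small as 0.0001, the time limit or the trial budget is likely to end the search before this criterion does.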

In [27]:
cv_splits_count, max_trials, n_hparams_at_trial = 5, 10, 3
kf = KFold(n_splits=cv_splits_count, shuffle=True, random_state=RAND_SEED)

search_spaces = [
    Integer(2, 5, name='n_layers'),
    Categorical([128, 256, 512], name='layer_1'),
    Categorical([64, 128, 256], name='layer_2'),
    Categorical([32, 64, 128], name='layer_3'),
    Categorical([16, 32, 64], name='layer_4'),
    Categorical([8, 16, 32], name='layer_5'),
    Categorical(["leaky_relu", "relu"], name='activation'),
    Categorical([0.0, 0.15, 0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5], name='dropout'),
    Categorical([True, False], name='batch_normalization'),
    Categorical([0.03, 0.01, 0.003, 0.001, 3e-4], name='learning_rate'),
    Categorical([64, 128, 196, 256, 452, 512], name='batch_size')
]
objective = make_objective(train_dataset,
                           cv=kf,
                           space=search_spaces,                           
                           scoring="mean_absolute_error")

gp_round = gp_minimize(func=objective,
                       dimensions=search_spaces,
                       n_calls=max_trials,
                       n_points=n_hparams_at_trial,
                       callback=callbacks,
                       random_state=RAND_SEED)

best_params = {sp.name:x for sp, x in zip(gp_round.space, gp_round.x)}
print(f"Best score: {gp_round.fun:.3f}, Best hyperparameters: {best_params}")
gc.collect()


Last eval #1: [4, 512, 256, 128, 32, 16, 'leaky_relu', 0.0, True, 0.003, 452]  - Score 10.171
 - Best Score 10.171  - Best Args: [4, 512, 256, 128, 32, 16, 'leaky_relu', 0.0, True, 0.003, 452]
Last eval #2: [3, 256, 256, 64, 32, 16, 'relu', 0.15, False, 0.003, 452]  - Score 3.717
 - Best Score 3.717  - Best Args: [3, 256, 256, 64, 32, 16, 'relu', 0.15, False, 0.003, 452]
Last eval #3: [4, 512, 256, 64, 32, 32, 'leaky_relu', 0.25, True, 0.001, 128]  - Score 5.462
 - Best Score 3.717  - Best Args: [3, 256, 256, 64, 32, 16, 'relu', 0.15, False, 0.003, 452]
Last eval #4: [2, 128, 64, 32, 32, 32, 'leaky_relu', 0.35, False, 0.03, 512]  - Score 4.520
 - Best Score 3.717  - Best Args: [3, 256, 256, 64, 32, 16, 'relu', 0.15, False, 0.003, 452]
Last eval #5: [4, 128, 128, 128, 32, 8, 'leaky_relu', 0.35, False, 0.001, 256]  - Score 5.154
 - Best Score 3.717  - Best Args: [3, 256, 256, 64, 32, 16, 'relu', 0.15, False, 0.003, 452]
Last eval #6: [5, 256, 128, 64, 32, 16, 'relu', 0.2, False, 0.0003,

29349

#### 6. Selecting the Best Hyperparameters and Determining Training Epochs

After optimization, the best hyperparameters are extracted. Then, using cross-validation, we determine an optimal number of training epochs (by computing the median of the epoch counts where the validation loss was minimized across folds). This ensures that the final model is trained for an appropriate duration without overfitting.

In [28]:
X = train_dataset.feature_data()
y = train_dataset.target_data
num_feature_names = train_dataset.num_feature_names
cat_feature_names = train_dataset.cat_feature_names
scoring = "mean_absolute_error"
best_batch_size = int(best_params['batch_size'])
train_ds = df_to_dataset(X, y, batch_size=best_batch_size)
all_inputs, encoded_features = preprocessing_layer(num_feature_names, cat_feature_names, train_ds)
list_epochs = list()
for train_index, val_index in kf.split(X):
    train_ds = df_to_dataset(X.iloc[train_index,:], y.iloc[train_index], batch_size=best_batch_size)
    val_ds = df_to_dataset(X.iloc[val_index,:], y.iloc[val_index], shuffle=False, batch_size=best_batch_size)
    tf_model = create_model_to_tune(all_inputs, encoded_features, scoring,
                                train_dataset.target_data.mean(), 
                                train_dataset.target_data.std(), 
                                **best_params)
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor=f'val_{scoring}', 
                                                        mode='min', 
                                                        patience=15, 
                                                        verbose=0)
    run = tf_model.fit(train_ds, validation_data=val_ds, verbose=0,
                callbacks=[early_stopping], 
                epochs=1000) 
    list_epochs.append(np.argmin(run.history[f'val_{scoring}']) + 1)
best_epochs = int(np.median(list_epochs))
gc.collect()

15628
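
The epoch-selection rule can be checked by hand on a small example. The sketch below uses made-up validation curves for three folds; `best_epoch` is a hypothetical helper that mirrors the `np.argmin(...) + 1` step in the cell above.

```python
import statistics

def best_epoch(val_history):
    """1-based epoch at which the validation metric was lowest."""
    return val_history.index(min(val_history)) + 1

# Hypothetical per-fold validation MAE curves (made-up numbers)
fold_histories = [
    [5.0, 4.2, 3.9, 4.1],
    [5.5, 4.8, 4.0, 3.8, 3.9],
    [6.0, 4.5, 4.6],
]
epochs_per_fold = [best_epoch(h) for h in fold_histories]
final_epochs = int(statistics.median(epochs_per_fold))
print(epochs_per_fold, final_epochs)
```

The median is less sensitive than the mean to a single fold that stops unusually early or late.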

#### 7. Final Model Training and Evaluation

With the best hyperparameters and the number of epochs fixed, we train the final TensorFlow model on the training data and assign it back to the model wrapper. We then evaluate it on the test dataset with `calculate_predictive_performance`, which returns the metric name and score (RMSE in the output below). Note that tuning minimized mean absolute error, so this score is not directly comparable to the cross-validation scores above.

In [29]:
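tf_model = create_model_to_tune(all_inputs, encoded_features, scoring,
                                train_dataset.target_data.mean(), 
                                train_dataset.target_data.std(), 
                                **best_params)
# After the cross-validation loop, `train_ds` only holds the last fold's
# training subset, so rebuild the dataset from the full training split.
full_train_ds = df_to_dataset(X, y, batch_size=best_batch_size)
# The epoch count is already fixed, so no validation data or early stopping is needed.
tf_model.fit(full_train_ds, verbose=0, epochs=best_epochs)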
model_wrapper.model = tf_model
metric_name, metric_score = model_wrapper.calculate_predictive_performance(test_dataset)
print(f"Test Performance Metric : {metric_name}={metric_score}")

Test Performance Metric : RMSE=5.563825974279319
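
The reported metric is RMSE, whereas the tuning objective was mean absolute error. The sketch below computes both on made-up values to show how they can differ on the same predictions; the helper functions are illustrative and are not part of VerifIA.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up strengths in MPa, with one large miss on the last sample
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [31.0, 39.0, 50.0, 68.0]
print(f"MAE={mae(y_true, y_pred):.3f}, RMSE={rmse(y_true, y_pred):.3f}")
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off.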



#### 8. Generating the Domain Configuration

VerifIA provides two options for generating the domain configuration for your use case:


##### **Option A – Using a Predefined Domain YAML File**  
If you already have a domain configuration YAML file prepared for your use case, simply create a `RuleConsistencyVerifier` object using that file. 

In [31]:
DOMAIN_PATH = "../domains/concrete_compressive_strength.yaml"
model_verifier = RuleConsistencyVerifier(DOMAIN_PATH)

##### **Option B – Generating the Domain Configuration Using AI**  
Alternatively, you can automatically generate the domain configuration by leveraging VerifIA’s AI-powered `DomainGenFlow`. In this option, you provide your training data and a directory containing PDF documents with external domain knowledge. The tool then generates a rich domain dictionary that captures expected variable ranges, constraints, and rule relationships. 

**Set Up OpenAI and LangSmith Keys**

LangSmith is optional, but it is helpful for tracing. If you want to use it, sign up for a LangSmith account and set the LangSmith environment variables below to start logging traces.

Accessing the OpenAI API requires an API key, which you can get by creating an OpenAI account. Once you have a key, set it as an environment variable by running:

In [None]:
os.environ["LANGCHAIN_TRACING_V2"] = 'true'
os.environ["LANGCHAIN_ENDPOINT"] = 'https://api.smith.langchain.com'
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass(prompt='Your LANGCHAIN_API_KEY? ')
os.environ["OPENAI_API_KEY"] = getpass.getpass(prompt='Your OPENAI_API_KEY? ')
os.environ["USER_AGENT"] = 'my_agent'
os.environ["LANGCHAIN_PROJECT"] = 'VERIFIA_TEST'
os.environ["VERIFIA_GPT_NAME"] = 'gpt-4.1'
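
Since LangSmith is optional, one way to organize this setup is to add the tracing variables only when a LangSmith key is supplied. The sketch below assumes the generation flow works with just the OpenAI variables set; `build_env` is a hypothetical helper, and the variable names are the ones used in the cell above.

```python
def build_env(openai_key, langsmith_key=None, gpt_name="gpt-4.1"):
    """Return the variables to set; tracing is added only if a LangSmith key is given."""
    env = {"OPENAI_API_KEY": openai_key, "VERIFIA_GPT_NAME": gpt_name}
    if langsmith_key:
        env.update({
            "LANGCHAIN_TRACING_V2": "true",
            "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
            "LANGCHAIN_API_KEY": langsmith_key,
            "LANGCHAIN_PROJECT": "VERIFIA_TEST",
        })
    return env

env = build_env("sk-placeholder")   # placeholder value, not a real key
print(sorted(env))
# To apply the settings: os.environ.update(env)
```

In the notebook, the keys would still come from `getpass.getpass` so that they never appear in the saved output.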

In [None]:
from verifia.generation import DomainGenFlow

# Must match the directory the documents were extracted to in step 0.
DOMAIN_PDF_DIRPATH = "../documents/concrete_compressive_strength"
domain_genflow = DomainGenFlow()
domain_genflow.load_ctx(dataframe=train_dataset.data, 
                        pdfs_dirpath=DOMAIN_PDF_DIRPATH,
                        model_card=model_wrapper.model_card.to_dict())
domain_cfg_dict = domain_genflow.run(save=True, local_path="./domain.yaml")
model_verifier = RuleConsistencyVerifier(domain_cfg_dict=domain_cfg_dict)

Whichever option you choose, you now have a `RuleConsistencyVerifier` built from the domain configuration (generated or loaded from YAML). This component uses the domain rules and constraints to check whether the model's predictions conform to our domain expectations when applied to the test data.

#### 9. Running the Verification Process

The verifier is configured by connecting it with our wrapped model and the test dataset. We then run the verification using a chosen search algorithm (in this case, the Random Sampler "RS") with specified parameters (population size, number of iterations, and original seed size). The verification explores the input space to uncover any rule violations.

In [34]:
result:RulesViolationResult = model_verifier.verify(model_wrapper)\
                                            .on(test_dataset.data)\
                                            .using("RS")\
                                            .run(pop_size=4, max_iters=3, orig_seed_size=25)

Processing Original Inputs: 15it [00:04,  3.04it/s]
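
To give an intuition for rule verification by random sampling, here is a toy sketch. It is not VerifIA's "RS" algorithm, whose internals are not shown in this notebook. The model, the rule (raising cement with water held fixed must not lower the predicted strength), and all names are hypothetical.

```python
import random

def toy_model(cement, water):
    """Stand-in predictor (hypothetical); deliberately breaks the rule above 400."""
    if cement > 400:
        return 0.1 * 400 - 0.05 * (cement - 400) - 0.08 * water
    return 0.1 * cement - 0.08 * water

def find_violations(seeds, n_samples, max_increase, rng):
    """Rule: raising cement with water fixed must not lower the prediction."""
    violations = []
    for cement, water in seeds:
        base = toy_model(cement, water)
        for _ in range(n_samples):
            # Randomly sample a perturbed input that the rule applies to.
            new_cement = cement + rng.uniform(0.0, max_increase)
            if toy_model(new_cement, water) < base:
                violations.append((cement, new_cement, water))
    return violations

rng = random.Random(0)
seeds = [(200.0, 180.0), (450.0, 170.0)]   # original inputs to perturb
violations = find_violations(seeds, n_samples=4, max_increase=50.0, rng=rng)
print(f"{len(violations)} violations found")
```

Each violation records an original input and a perturbed input on which the model contradicts the rule. The HTML report saved in the next step summarizes this kind of information for the real model.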

#### 10. Saving Verification Results and Model Artifacts

Finally, the verification results are saved as an HTML report for easy sharing and further analysis. Additionally, the final model and its model card are saved, allowing you to archive your model’s configuration and verification status for future reference or reproducibility.

In [None]:
result.save_as_html("../reports/concrete_compressive_strength.html")

In [None]:
model_wrapper.save_model()
model_wrapper.save_model_card("../models/concrete_compressive_strength.yaml")