# Taxonomy Evaluation Preparation

In this notebook, we are going to prepare the data for the evaluation of each iteration of the taxonomy. 

The evaluation focuses on the metrics laconicity, lucidity, completeness, adaption, innovation, orthogonality, and soundness as outlined by Kaplan et al. [1]. An implementation for calculating these metrics is already available [here](https://github.com/Eden-06/abstraction-quality/), and we will use it. For this we need to prepare the following files, which are generated when this notebook is run:
1. An `extractions.txt` file which contains all the extractions from the literature analysis
2. A `catalog_taxonomy.txt` file which contains the taxonomy of the iteration we want to evaluate
3. A `mapping.txt` file which maps each of the extractions to the taxonomy
4. An orthogonality `.xlsx` file for each iteration, which contains a matrix with one row and one column for each taxonomy category and class.
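
To make the line formats concrete, the following minimal sketch builds one line for each of the three text files in the same way the code in this notebook assembles them. The dimension, category, type, and paper title are made-up placeholders, and we assume for illustration that the source of a type equals the paper title.

```python
# Placeholder values, purely illustrative; real entries come from the JSON artifacts.
dimension, category = "Answer Type", "Boolean"
type_name, paper_title = "Yes/No", "Some Paper Title"

# One line of the extractions file: "<type> (<paper title>)"
extraction_line = f"{type_name} ({paper_title})"
# One line of the taxonomy file: "<dimension> - <category>"
taxonomy_line = f"{dimension} - {category}"
# One line of the mapping file: "<taxonomy line> : <extraction line>"
mapping_line = f"{taxonomy_line} : {extraction_line}"

for line in (extraction_line, taxonomy_line, mapping_line):
    print(line)
```

A mapping line is therefore a taxonomy line and an extraction line joined by ` : `.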

After the files have been generated, the `metric_calculations` directory contains one folder per taxonomy iteration (`taxonomy_iteration_<n>`), each holding the text files for that iteration.
The next step is to use the `aquality.rb` script from the implementation mentioned above to calculate the metrics. For this, the script has to be placed inside the folder of the iteration and run there.
This generates a `results.txt` file which contains the results of the evaluation.

In addition, an orthogonality Excel file is generated for each iteration. These files have to be filled out manually following the explanations by Kaplan et al. [1].
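
As a rough illustration of how a filled-out matrix turns into a score, the sketch below mirrors the summary formulas that this notebook writes into the Excel sheet (overlaps divided by the number of off-diagonal cells, then subtracted from 1). The function name `orthogonality_score` and the toy matrix are assumptions made for this example; the authoritative definition is the one given by Kaplan et al. [1].

```python
def orthogonality_score(matrix):
    """Return 1 - overlaps / relevant_cells for a square 0/1 matrix.

    Diagonal cells are ignored, as they are left empty in the Excel sheet.
    """
    n = len(matrix)
    relevant = n * n - n  # all cells except the diagonal
    if relevant == 0:
        return 1.0  # a single class cannot overlap with anything
    overlaps = sum(
        matrix[i][j] for i in range(n) for j in range(n) if i != j
    )
    return 1 - overlaps / relevant


# Toy matrix: classes 0 and 1 overlap with each other, class 2 is independent.
toy_matrix = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(f"{orthogonality_score(toy_matrix):.2%}")  # 66.67%
```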

Moreover, this notebook calculates the adaption and innovation scores for each iteration and writes them to:
1. An `innovation_and_adaption.txt` file which contains the innovation and adaption scores.
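
The following small worked example shows how these two scores come about in this notebook: a taxonomy class counts as adapted if at least one mapping line starts with its name followed by ` :`, and as new otherwise. The class names and papers below are toy values chosen for illustration only.

```python
# Toy taxonomy with four classes (illustrative values, not real data).
toy_taxonomy = [
    "Answer Type - Boolean",
    "Answer Type - Temporal",
    "Answer Format - Simple",
    "Answer Format - Other Format",
]
# Toy mapping: three of the four classes are backed by an extracted type.
toy_mapping = [
    "Answer Type - Boolean : Yes/No (Paper A)",
    "Answer Type - Temporal : Date (Paper B)",
    "Answer Format - Simple : Short Answer (Paper A)",
]

# A class is "adapted" if any mapping line refers to it, otherwise "new".
adapted = sum(
    any(m.startswith(cls + " :") for m in toy_mapping) for cls in toy_taxonomy
)
new = len(toy_taxonomy) - adapted

total = len(toy_taxonomy)
print(f"innovation: {new / total:.2f} ({new}/{total})")
print(f"adaptation: {adapted / total:.2f} ({adapted}/{total})")
```

In this toy case three of four classes are adapted and one is new, so the two scores always add up to 1.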

#### Note
The generation of these files can be automated easily because we documented the taxonomy creation process with structured `JSON` artifacts that are easy to process by machine. The following code generates the files. The evaluation is then done with the Ruby script provided by the implementation mentioned above.



[1] Kaplan, A., et al.: Introducing an evaluation method for taxonomies. In: EASE. ACM (2022), accepted, to appear

In [4]:
# First we are going to load the JSON files
import os
import json

current_dir = os.getcwd()
# Paths of the input JSON artifacts, relative to the current working directory
tax_inc_dir = "../construction_and_refinement/taxonomy_increments.json"
cluster_inc_dir = "../clustering/cluster_increments.json"
extraction_dir = "../extraction/extraction.json"

with open(os.path.join(current_dir, cluster_inc_dir), "r", encoding="utf-8") as f:
    cluster_increments = json.load(f)

with open(os.path.join(current_dir, tax_inc_dir), "r", encoding="utf-8") as f:
    taxonomy_increments = json.load(f)

with open(os.path.join(current_dir, extraction_dir), "r", encoding="utf-8") as f:
    extraction = json.load(f)

# Storage directory
storage_dir = os.path.join(current_dir, "metric_calculations")

In [5]:
# Here we prepare the extractions file
extractions = []
for paper in extraction:
    title = paper.get("title")
    for p_type_dict in paper.get("types"):
        p_type = p_type_dict.get("type")
        extractions.append(f"{p_type} ({title})")

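The loop above implies a certain shape of `extraction.json`: a list of papers, each with a `title` and a list of `types`, where every entry carries a `type` field. The sketch below spells this out on toy data. The structure is inferred from the code only, and the `toy_` names and values are hypothetical.

```python
# Toy stand-in for the content of extraction.json (structure inferred from the loop above).
toy_extraction = [
    {"title": "Paper A", "types": [{"type": "Factoid"}, {"type": "List"}]},
    {"title": "Paper B", "types": []},  # a paper without extracted types adds nothing
]

# Same logic as the cell above, written as a comprehension.
toy_extractions = [
    f"{type_dict.get('type')} ({paper.get('title')})"
    for paper in toy_extraction
    for type_dict in paper.get("types", [])
]

print(toy_extractions)  # ['Factoid (Paper A)', 'List (Paper A)']
```
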
In [6]:
# Next we need to map each type from the extraction process to its (sub)cluster.
# This is necessary because the taxonomy construction can be traced back to each cluster.
cluster_id_types = {}
for cluster in cluster_increments:
    iteration = cluster.get("iteration")
    # Only the clustering of iteration 2 is used
    if iteration != 2:
        continue
    amount_of_cluster_types = 0
    for cluster_dict in cluster.get("clusters"):
        cluster_id = cluster_dict.get("id")
        for subcluster_dict in cluster_dict.get("subclusters"):
            subcluster_id = subcluster_dict.get("id")
            for type_dict in subcluster_dict.get("types"):
                if subcluster_id not in cluster_id_types:
                    cluster_id_types[subcluster_id] = []
                cluster_id_types[subcluster_id].append(type_dict)
                amount_of_cluster_types += 1
    print(f"Amount of cluster types: {amount_of_cluster_types}")

# Then we prepare a helper function that creates the taxonomy and mapping lists
def get_taxonomy_eval_data(iteration: int = 1):
    """
    Returns the catalog taxonomy and the mapping for a given iteration,
    each as a list of strings.
    """
    catalog_taxonomy = []
    mapping = []
    for iteration_dict in taxonomy_increments:
        tax_iteration = iteration_dict.get("iteration")
        if iteration != tax_iteration:
            continue

        for dimension_dict in iteration_dict.get("taxonomy"):
            dimension_name = dimension_dict.get("name", "Unknown")
            for category_dict in dimension_dict.get("categories"):
                category_name = category_dict.get("name", "Unknown")
                for subcategory_dict in category_dict.get("categories", []):
                    subcategory_name = subcategory_dict.get("name", "Unknown")
                    catalog_taxonomy.append(
                        dimension_name + " - " + category_name + " - " + subcategory_name)
                    based_on_clusters = subcategory_dict.get("based_on_clusters", [])
                    for cluster_id in based_on_clusters:
                        if cluster_id in cluster_id_types:
                            for type_dict in cluster_id_types[cluster_id]:
                                type_name = type_dict.get("name", "Unknown")
                                type_source = type_dict.get("source", "Unknown")
                                mapping.append(
                                    dimension_name + " - " + category_name + " - " + subcategory_name + " : " + type_name + " (" + type_source + ")")

                catalog_taxonomy.append(
                    dimension_name + " - " + category_name)
                based_on_clusters = category_dict.get("based_on_clusters", [])
                for cluster_id in based_on_clusters:
                    if cluster_id in cluster_id_types:
                        for type_dict in cluster_id_types[cluster_id]:
                            type_name = type_dict.get("name", "Unknown")
                            type_source = type_dict.get("source", "Unknown")
                            mapping.append(
                                dimension_name + " - " + category_name + " : " + type_name + " (" + type_source + ")")
                            
    mapping.sort()
    return catalog_taxonomy, mapping

def compute_innovation_and_adaption(catalog_taxonomy: list, mapping: list):
    """
    Computes the innovation and adaptation metrics.
    """
    adapted_count = 0
    new_count = 0
    
    for taxonomy_item in catalog_taxonomy:
        if any(m.startswith(taxonomy_item + " :") for m in mapping):
            adapted_count += 1
        else:
            new_count += 1
    
    total = len(catalog_taxonomy)
    innovation = new_count / total if total > 0 else 0.0
    adaptation = adapted_count / total if total > 0 else 0.0

    innovation_str = f"innovation: {innovation:.2f} ({new_count}/{total})"
    adaptation_str = f"adaptation: {adaptation:.2f} ({adapted_count}/{total})"
    return innovation_str, adaptation_str


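# Now we prepare a helper function to store the results in the storage directory
def save_taxonomy_eval_data(iteration: int = 1, catalog_taxonomy: list = None, mapping: list = None, extractions: list = None):
    # None defaults avoid sharing one mutable default list between calls
    catalog_taxonomy = catalog_taxonomy or []
    mapping = mapping or []
    extractions = extractions or []
    iteration_dir = os.path.join(storage_dir, f"taxonomy_iteration_{iteration}")
    os.makedirs(iteration_dir, exist_ok=True)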
    with open(os.path.join(iteration_dir, "catalog_taxonomy.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(catalog_taxonomy))

    with open(os.path.join(iteration_dir, "mapping.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(mapping))

    with open(os.path.join(iteration_dir, "extractions.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(extractions))

    innovation_score, adaptation_score = compute_innovation_and_adaption(catalog_taxonomy, mapping)

    with open(os.path.join(iteration_dir, "innovation_and_adaption.txt"), "w", encoding="utf-8") as f:
        f.write(innovation_score)
        f.write("\n")
        f.write(adaptation_score)


Amount of cluster types: 227

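To illustrate how `get_taxonomy_eval_data` traces a category back to the extracted types, the following sketch repeats the lookup on toy data. The cluster ids, names, and sources are hypothetical; only the field names `name`, `source`, and `based_on_clusters` are taken from the code above.

```python
# Toy index from (sub)cluster id to the types it contains (hypothetical values).
toy_cluster_id_types = {
    "c1": [{"name": "Yes/No", "source": "Paper A"}],
    "c2": [{"name": "Date", "source": "Paper B"}],
}
# Toy category that refers to one known and one unknown cluster id.
toy_category = {"name": "Boolean", "based_on_clusters": ["c1", "c9"]}
toy_dimension = "Answer Type"

# Cluster ids that are missing from the index are skipped silently,
# which matches the `if cluster_id in cluster_id_types` check above.
toy_lines = [
    f"{toy_dimension} - {toy_category['name']} : {t['name']} ({t['source']})"
    for cluster_id in toy_category["based_on_clusters"]
    for t in toy_cluster_id_types.get(cluster_id, [])
]

print(toy_lines)  # ['Answer Type - Boolean : Yes/No (Paper A)']
```

Because unknown cluster ids contribute no lines, the length of the mapping does not have to equal the number of cluster types printed above.
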

In [7]:
iteration = 1
catalog_taxonomy, mapping = get_taxonomy_eval_data(iteration)
print(f"Length of catalog taxonomy: {len(catalog_taxonomy)}")
display(catalog_taxonomy)
print(f"Length of mapping: {len(mapping)}")
display(mapping)
save_taxonomy_eval_data(iteration, catalog_taxonomy, mapping, extractions)

Length of catalog taxonomy: 58


['Graph Representation - Single Fact',
 'Graph Representation - Multi Fact',
 'Answer Type - Undefined',
 'Answer Type - Other',
 'Answer Type - Date',
 'Answer Type - Distance Measurement',
 'Answer Type - Actor',
 'Answer Type - Technology',
 'Answer Type - Definition',
 'Answer Type - Time',
 'Answer Type - Name',
 'Answer Type - Title',
 'Answer Type - Bibliometric Numbers',
 'Answer Type - Software System',
 'Answer Type - Monetary',
 'Answer Type - Abbreviation',
 'Answer Type - Instructional',
 'Answer Type - Procedure/Technique',
 'Answer Type - Organization',
 'Answer Type - Duration',
 'Answer Type - Boolean',
 'Answer Type - Entity',
 'Answer Type - Description',
 'Answer Type - Properties',
 'Answer Type - Human/Person',
 'Answer Type - Location',
 'Answer Type - Quantitative',
 'Answer Type - Tool/Notation',
 'Answer Type - Solution',
 'Answer Type - Theoretical Framework',
 'Question Type - Negation',
 'Question Type - Dependency',
 'Question Type - Contingency',
 'Questi

Length of mapping: 162


['Answer Credibility - Debate : Debate (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Factual : Accuracy (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Credibility - Factual : Evidence-Based (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Factual : Factual Questions (Divide and Conquer the EmpiRE: A Community-Maintainable Knowledge Graph of Empirical Research in Requirements Engineering)',
 'Answer Credibility - Opinion : Experience (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Opinion : Knowledge (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Credibility - Predictive : Predictive (Types of research questions: descriptive, predictive, or causal)',
 'Answer Type - Abbreviation : Abbreviation (Learning Question Classifiers)',
 'Answer Type - Actor : Actor (The Future of Empirical Methods in Software Engineering Research)',
 'Answer Type - B

In [8]:
iteration = 2
catalog_taxonomy, mapping = get_taxonomy_eval_data(iteration)
print(f"Length of catalog taxonomy: {len(catalog_taxonomy)}")
display(catalog_taxonomy)
print(f"Length of mapping: {len(mapping)}")
display(mapping)
save_taxonomy_eval_data(iteration, catalog_taxonomy, mapping, extractions)

Length of catalog taxonomy: 28


['Graph Representation - Single Fact',
 'Graph Representation - Multi Fact',
 'Answer Type - Named Entity',
 'Answer Type - Description',
 'Answer Type - Temporal',
 'Answer Type - Quantitative',
 'Answer Type - Boolean',
 'Answer Type - Other Type',
 'Answer Format - Simple',
 'Answer Format - Explanatory',
 'Answer Format - Enumerative',
 'Answer Format - Other Format',
 'Question Type - Negation',
 'Question Type - Relationship',
 'Question Type - Superlative',
 'Question Type - Counting',
 'Question Type - Ranking',
 'Question Type - Comparison',
 'Question Type - Multiple Intentions',
 'Question Type - Temporal',
 'Question Type - Aggregation',
 'Answer Credibility - Objective',
 'Answer Credibility - Subjective',
 'Question Goal - Reasoning',
 'Question Goal - Problem Solving',
 'Question Goal - Problematization',
 'Question Goal - Improvement',
 'Question Goal - Prediction']

Length of mapping: 157


['Answer Credibility - Objective : Accuracy (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Credibility - Objective : Evidence-Based (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Objective : Factual Questions (Divide and Conquer the EmpiRE: A Community-Maintainable Knowledge Graph of Empirical Research in Requirements Engineering)',
 'Answer Credibility - Subjective : Debate (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Subjective : Experience (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Subjective : Knowledge (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Format - Enumerative : First Order - Properties (The Classification of Research Questions)',
 'Answer Format - Enumerative : Instruction (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Format - Enumerative : List (Ripple Down Rules for question answering)',
 'Answer Format - E

In [9]:
iteration = 3
catalog_taxonomy, mapping = get_taxonomy_eval_data(iteration)
print(f"Length of catalog taxonomy: {len(catalog_taxonomy)}")
display(catalog_taxonomy)
print(f"Length of mapping: {len(mapping)}")
display(mapping)
save_taxonomy_eval_data(iteration, catalog_taxonomy, mapping, extractions)

Length of catalog taxonomy: 37


['Graph Representation - Single Fact',
 'Graph Representation - Multi Fact',
 'Answer Type - Named Entity',
 'Answer Type - Description',
 'Answer Type - Temporal',
 'Answer Type - Quantitative',
 'Answer Type - Boolean',
 'Answer Type - Other Type',
 'Condition Type - Named Entity',
 'Condition Type - Description',
 'Condition Type - Temporal',
 'Condition Type - Quantitative',
 'Condition Type - Other Type',
 'Answer Format - Simple',
 'Answer Format - Explanatory',
 'Answer Format - Enumerative',
 'Answer Format - Other Format',
 'Retrieval Operation - Basic',
 'Retrieval Operation - Negation',
 'Retrieval Operation - Relationship',
 'Retrieval Operation - Superlative',
 'Retrieval Operation - Counting',
 'Retrieval Operation - Ranking',
 'Retrieval Operation - Comparison',
 'Retrieval Operation - Aggregation',
 'Intention Count - Single Intention',
 'Intention Count - Multiple Intentions',
 'Answer Credibility - Normative',
 'Answer Credibility - Objective',
 'Answer Credibility - 

Length of mapping: 231


['Answer Credibility - Objective : Accuracy (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Credibility - Objective : Evidence-Based (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Objective : Factual Questions (Divide and Conquer the EmpiRE: A Community-Maintainable Knowledge Graph of Empirical Research in Requirements Engineering)',
 'Answer Credibility - Subjective : Debate (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Subjective : Experience (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Credibility - Subjective : Knowledge (A Taxonomy for Classifying Questions Asked in Social Question and Answering)',
 'Answer Format - Enumerative : First Order - Properties (The Classification of Research Questions)',
 'Answer Format - Enumerative : Instruction (A Non-Factoid Question-Answering Taxonomy)',
 'Answer Format - Enumerative : List (Ripple Down Rules for question answering)',
 'Answer Format - E

In [None]:
# The following function has been generated with the help of ChatGPT
# and is used to create the orthogonality matrix Excel file for each 
# taxonomy iteration. 

import os
import xlsxwriter
from xlsxwriter.utility import xl_rowcol_to_cell, xl_range

# --- Helper Functions to Extract Categories from Taxonomy ---

def get_subcategories(category, parent_prefix=""):
    """
    Recursively extract subcategory names. The full name is built by concatenating
    the parent name with the subcategory name.
    """
    subcats = []
    for subcat in category.get("categories", []):
        # Build a full name that includes hierarchy: parent > child
        subcat_full_name = f"{parent_prefix} > {subcat.get('name', '')}"
        subcats.append(subcat_full_name)
        subcats.extend(get_subcategories(subcat, parent_prefix=subcat_full_name))
    return subcats

def extract_categories_from_taxonomy(tax_it):
    """
    For one taxonomy iteration (a dictionary from taxonomy_increments),
    extract all category names. We prefix each category with its dimension name.
    """
    categories = []
    for dim in tax_it.get("taxonomy", []):
        dim_name = dim.get("name", "")
        for cat in dim.get("categories", []):
            cat_full_name = f"{dim_name}: {cat.get('name', '')}"
            categories.append(cat_full_name)
            categories.extend(get_subcategories(cat, parent_prefix=cat_full_name))
    return categories

def build_hierarchy_levels(cat_str):
    """
    Convert a category string (e.g. 'Dim1: Cat1 > Sub1 > Sub2')
    into a list of hierarchy levels.
    """
    # Split only on the first ": " so that names containing ": " stay intact
    parts = cat_str.split(": ", 1)
    if len(parts) == 1:
        levels = [parts[0]]
    else:
        dimension = parts[0]
        rest = parts[1]
        subparts = rest.split(" > ")
        levels = [dimension] + subparts
    return levels

def count_leaf_categories_in_taxonomy(tax_it):
    """
    Count the number of leaf categories (categories with no subcategories)
    in a given taxonomy iteration.
    """
    def count_leaf(cat):
        # If a category has no subcategories, count it as a leaf.
        if not cat.get("categories"):
            return 1
        # Otherwise, count the leaves in its subcategories.
        total = 0
        for sub in cat.get("categories", []):
            total += count_leaf(sub)
        return total

    leaf_count = 0
    for dim in tax_it.get("taxonomy", []):
        for cat in dim.get("categories", []):
            leaf_count += count_leaf(cat)
    return leaf_count

# --- Function to Generate Formatted Orthogonality Matrix Excel File with Summary ---

def generate_orthogonality_excel(tax_it):
    """
    For one taxonomy iteration (tax_it), create an Excel file with a hierarchical
    header (for both rows and columns), an orthogonality matrix with:
      - Off-diagonals filled with 0,
      - Diagonals left empty,
      - Conditional formatting that colors cells green if value==0,
        red if value==1, and grey if blank.
    Also adds metric calculations to the right of the matrix and a summary of
    key metrics in the bottom left of the sheet.
    """
    iteration = tax_it.get("iteration")
    if iteration is None:
        print("No iteration number found, skipping.")
        return

    # Extract category names (each as a string with hierarchy information)
    cat_names = extract_categories_from_taxonomy(tax_it)
    if not cat_names:
        print(f"No categories found for taxonomy iteration {iteration}. Skipping file creation.")
        return

    # Build hierarchical levels for each category.
    cat_levels = [build_hierarchy_levels(name) for name in cat_names]
    # Determine the maximum depth (number of header rows/columns needed)
    header_depth = max(len(levels) for levels in cat_levels)
    # Pad all lists so that each has the same length.
    cat_levels = [levels + [""]*(header_depth - len(levels)) for levels in cat_levels]

    num_cats = len(cat_names)

    # Define starting row and column for the matrix area.
    start_row = header_depth  # where matrix rows begin
    start_col = header_depth  # where matrix columns begin

    # Create a new Excel workbook and worksheet.
    output_filename = f"orthogonality_iteration{iteration}.xlsx"
    workbook = xlsxwriter.Workbook(output_filename)
    worksheet = workbook.add_worksheet("Orthogonality Matrix")

    # Define formats.
    header_format = workbook.add_format({
        'bold': True, 'align': 'center', 'valign': 'vcenter', 'border': 1
    })
    cell_format = workbook.add_format({
        'align': 'center', 'valign': 'vcenter', 'border': 1
    })
    metric_format = workbook.add_format({
        'bold': True, 'bg_color': '#D3D3D3', 'align': 'center',
        'valign': 'vcenter', 'border': 1
    })
    summary_label_format = workbook.add_format({
        'bold': True, 'align': 'left'
    })
    summary_value_format = workbook.add_format({
        'align': 'right'
    })
    summary_percent_format = workbook.add_format({
        'num_format': '0.00%', 'align': 'right'
    })
    # Formats for conditional formatting (colors can be adjusted)
    green_format = workbook.add_format({
        'bg_color': '#C6EFCE', 'border': 1, 'align': 'center', 'valign': 'vcenter'
    })
    red_format = workbook.add_format({
        'bg_color': '#FFC7CE', 'border': 1, 'align': 'center', 'valign': 'vcenter'
    })
    grey_format = workbook.add_format({
        'bg_color': '#D9D9D9', 'border': 1, 'align': 'center', 'valign': 'vcenter'
    })

    # --- Write Column Headers (Hierarchical) ---
    for level in range(header_depth):
        j = 0
        while j < num_cats:
            this_value = cat_levels[j][level]
            merge_start = j
            while j < num_cats and cat_levels[j][level] == this_value:
                j += 1
            merge_end = j - 1
            first_cell_row = level
            first_cell_col = start_col + merge_start
            last_cell_row = level
            last_cell_col = start_col + merge_end
            if merge_start == merge_end:
                worksheet.write(first_cell_row, first_cell_col, this_value, header_format)
            else:
                worksheet.merge_range(first_cell_row, first_cell_col,
                                      last_cell_row, last_cell_col,
                                      this_value, header_format)
    # Label for the metric column.
    metric_label_col = start_col + num_cats
    worksheet.merge_range(0, metric_label_col, header_depth-1, metric_label_col,
                          "Row Metric", header_format)

    # --- Write Row Headers (Hierarchical) ---
    for level in range(header_depth):
        i = 0
        while i < num_cats:
            this_value = cat_levels[i][level]
            merge_start = i
            while i < num_cats and cat_levels[i][level] == this_value:
                i += 1
            merge_end = i - 1
            first_cell_row = start_row + merge_start
            first_cell_col = level
            last_cell_row = start_row + merge_end
            last_cell_col = level
            if merge_start == merge_end:
                worksheet.write(first_cell_row, first_cell_col, this_value, header_format)
            else:
                worksheet.merge_range(first_cell_row, first_cell_col,
                                      last_cell_row, last_cell_col,
                                      this_value, header_format)
    # Label for the metric row (bottom left)
    metric_label_row = start_row + num_cats
    worksheet.merge_range(metric_label_row, 0, metric_label_row, header_depth-1,
                          "Column Metric", header_format)

    # --- Fill the Matrix Cells ---
    for i in range(num_cats):
        for j in range(num_cats):
            cell_row = start_row + i
            cell_col = start_col + j
            if i == j:
                # Diagonal cell remains empty.
                worksheet.write(cell_row, cell_col, "", cell_format)
            else:
                worksheet.write(cell_row, cell_col, 0, cell_format)

    # --- Add Conditional Formatting to Matrix Cells ---
    matrix_range = xl_range(start_row, start_col, start_row + num_cats - 1, start_col + num_cats - 1)
    # Grey for blank (diagonal) cells. This rule is added first with
    # stop_if_true because Excel treats blank cells as 0 in cell-value rules,
    # so they would otherwise also match the green rule below.
    worksheet.conditional_format(matrix_range, {
        'type': 'blanks',
        'stop_if_true': True,
        'format': grey_format
    })
    # Green for cells with 0.
    worksheet.conditional_format(matrix_range, {
        'type': 'cell',
        'criteria': '==',
        'value': 0,
        'format': green_format
    })
    # Red for cells with 1.
    worksheet.conditional_format(matrix_range, {
        'type': 'cell',
        'criteria': '==',
        'value': 1,
        'format': red_format
    })

    # --- Add Metrics Calculations Next to the Matrix ---
    # Row Metric (sum each row).
    for i in range(num_cats):
        cell_row = start_row + i
        start_cell = xl_rowcol_to_cell(cell_row, start_col)
        end_cell = xl_rowcol_to_cell(cell_row, start_col + num_cats - 1)
        formula = f"=SUM({start_cell}:{end_cell})"
        worksheet.write_formula(cell_row, start_col + num_cats, formula, metric_format)

    # Column Metric (sum each column).
    for j in range(num_cats):
        cell_col = start_col + j
        start_cell = xl_rowcol_to_cell(start_row, cell_col)
        end_cell = xl_rowcol_to_cell(start_row + num_cats - 1, cell_col)
        formula = f"=SUM({start_cell}:{end_cell})"
        worksheet.write_formula(start_row + num_cats, cell_col, formula, metric_format)
    # Grand total in the bottom-right corner.
    total_range_start = xl_rowcol_to_cell(start_row + num_cats, start_col)
    total_range_end = xl_rowcol_to_cell(start_row + num_cats, start_col + num_cats - 1)
    worksheet.write_formula(start_row + num_cats, start_col + num_cats,
                              f"=SUM({total_range_start}:{total_range_end})", metric_format)

    # --- Add Summary of Metrics at the Bottom Left ---
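    # Compute summary values based on the taxonomy and matrix location.
    # Note: these counts assume that every matrix row is a leaf category
    # (leaf_count == num_cats). With nested subcategories the matrix also
    # contains non-leaf rows and the counts below would have to be adjusted.
    leaf_count = count_leaf_categories_in_taxonomy(tax_it)
    total_fields = leaf_count * leaf_count          # Assumes a leaf_count x leaf_count matrix.
    relevant_fields = total_fields - leaf_count       # Excluding diagonal (disabled) cells.
    disabled_fields = leaf_count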

    # We'll place the summary a few rows below the matrix.
    summary_start_row = start_row + num_cats + 3
    summary_col_label = 0  # Labels in column A.
    summary_col_value = 1  # Values in column B.

    # Row 0: Leaf count.
    worksheet.write(summary_start_row + 0, summary_col_label, "Leaf count", summary_label_format)
    worksheet.write(summary_start_row + 0, summary_col_value, leaf_count, summary_value_format)

    # Row 1: Total fields.
    worksheet.write(summary_start_row + 1, summary_col_label, "Total fields", summary_label_format)
    worksheet.write(summary_start_row + 1, summary_col_value, total_fields, summary_value_format)

    # Row 2: Relevant fields.
    worksheet.write(summary_start_row + 2, summary_col_label, "Relevant fields", summary_label_format)
    worksheet.write(summary_start_row + 2, summary_col_value, relevant_fields, summary_value_format)

    # Row 3: Disabled fields.
    worksheet.write(summary_start_row + 3, summary_col_label, "Disabled fields", summary_label_format)
    worksheet.write(summary_start_row + 3, summary_col_value, disabled_fields, summary_value_format)

    # Row 4: Overlaps (using a formula that sums the entire matrix range).
    worksheet.write(summary_start_row + 4, summary_col_label, "Overlaps", summary_label_format)
    overlaps_formula = f"=SUM({matrix_range})"
    worksheet.write_formula(summary_start_row + 4, summary_col_value, overlaps_formula, summary_value_format)

    # Row 5: Orthogonal = Total fields - Disabled fields - Overlaps.
    worksheet.write(summary_start_row + 5, summary_col_label, "Orthogonal", summary_label_format)
    # Get the cell reference for Overlaps (which is in column B of summary row 4).
    overlaps_cell = xl_rowcol_to_cell(summary_start_row + 4, summary_col_value, row_abs=True, col_abs=True)
    orthogonal_formula = f"={total_fields}-{disabled_fields}-{overlaps_cell}"
    worksheet.write_formula(summary_start_row + 5, summary_col_value, orthogonal_formula, summary_value_format)

    # Row 6: Score = Overlaps / (Total fields - Disabled fields).
    worksheet.write(summary_start_row + 6, summary_col_label, "Score", summary_label_format)
    score_formula = f"={overlaps_cell}/({total_fields}-{disabled_fields})"
    worksheet.write_formula(summary_start_row + 6, summary_col_value, score_formula, summary_value_format)

    # Row 7: Percentage = 1 - Score.
    worksheet.write(summary_start_row + 7, summary_col_label, "Percentage", summary_label_format)
    # Get the cell reference for Score (in column B of summary row 6).
    score_cell = xl_rowcol_to_cell(summary_start_row + 6, summary_col_value, row_abs=True, col_abs=True)
    percentage_formula = f"=1-{score_cell}"
    worksheet.write_formula(summary_start_row + 7, summary_col_value, percentage_formula, summary_percent_format)

    # --- Set Column Widths (optional) ---
    # Adjust widths for header columns.
    for col in range(header_depth):
        worksheet.set_column(col, col, 20)
    # Adjust widths for the matrix and metric columns.
    for col in range(start_col, start_col + num_cats + 1):
        worksheet.set_column(col, col, 12)
    # Adjust widths for the summary columns.
    worksheet.set_column(summary_col_label, summary_col_label, 15)
    worksheet.set_column(summary_col_value, summary_col_value, 15)

    workbook.close()
    print(f"Created formatted file with summary: {output_filename}")


# --- Loop Over Taxonomy Iterations and Create Files ---
for tax_it in taxonomy_increments:
    generate_orthogonality_excel(tax_it)


Created formatted file with summary: orthogonality_iteration1_formatted.xlsx
Created formatted file with summary: orthogonality_iteration2_formatted.xlsx
Created formatted file with summary: orthogonality_iteration3_formatted.xlsx
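
The hierarchical headers in the function above are written by merging runs of consecutive equal labels. The following is a minimal, self-contained sketch of that grouping step. `merge_runs` is a hypothetical helper that is not part of the notebook, and it uses `itertools.groupby` instead of the explicit `while` loops.

```python
from itertools import groupby


def merge_runs(labels):
    """Group consecutive equal labels into (label, first_index, last_index) runs."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, start, start + length - 1))
        start += length
    return runs


# Toy header row: one label per matrix column (illustrative values).
toy_header = ["Answer Type", "Answer Type", "Answer Format", "Answer Type"]
for label, first, last in merge_runs(toy_header):
    # A run of length 1 would be written as a single cell, longer runs merged.
    kind = "single cell" if first == last else "merged range"
    print(f"{label}: columns {first}-{last} ({kind})")
```

Only adjacent equal labels are merged, so a label that reappears later in the row starts a new run.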
