# Condition Textual Variations Analysis

This notebook catalogs every unique value found in the condition-related columns of three sets of historical records (baptisms, burials, and marriages). The goal is to document the terminology used across record types and to understand how conditions were expressed in colonial documents.

## Overview

The analysis workflow includes:
1. Load and harmonize datasets
2. Extract unique values from condition-related columns
3. Count term frequencies and contexts
4. Analyze textual variations and mixed categories
5. Export comprehensive catalog to `data/interim` for further analysis

**Note**: This is an exploratory analysis of textual variations. Columns often mix different condition categories (e.g., marital status + age descriptors).

## Setup and Imports

Import necessary libraries for data processing and analysis.

In [1]:
# Core data manipulation
import pandas as pd
import re
import json
from collections import Counter

# Project utilities
from utils.ColumnManager import ColumnManager
from utils.LoggerHandler import setup_logger

# Setup logging
logger = setup_logger("conditionTextualVariations")

print("All imports successful!")

All imports successful!


## Configuration

Define dataset paths and existing condition mappings.

In [2]:
# Dataset configuration
DATAFRAMES_CONFIG = {
    "bautismos": {
        "csv_file": "../data/raw/bautismos.csv",
        "mapping_file": "../data/mappings/bautismosMapping.json"
    },
    "entierros": {
        "csv_file": "../data/raw/entierros.csv",
        "mapping_file": "../data/mappings/entierrosMapping.json"
    },
    "matrimonios": {
        "csv_file": "../data/raw/matrimonios.csv",
        "mapping_file": "../data/mappings/matrimoniosMapping.json"
    }
}

# Load existing condition mappings
with open('../data/mappings/conditionMapping.json', 'r', encoding='utf-8') as f:
    condition_mappings = json.load(f)

# Extract existing mapped terms by category
EXISTING_MAPPINGS = {
    "legitimacy_status": list(condition_mappings["attribute_mappings"]["legitimacy_status"].keys()),
    "social_condition": list(condition_mappings["attribute_mappings"]["social_condition"].keys()),
    "marital_status": list(condition_mappings["attribute_mappings"]["marital_status"].keys())
}

print(f"Configured {len(DATAFRAMES_CONFIG)} datasets for analysis")
print("Loaded existing mappings:")
for category, terms in EXISTING_MAPPINGS.items():
    print(f"  - {category}: {len(terms)} terms")

Configured 3 datasets for analysis
Loaded existing mappings:
  - legitimacy_status: 14 terms
  - social_condition: 18 terms
  - marital_status: 12 terms
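
The lookups in this cell imply that `conditionMapping.json` has an `attribute_mappings` object whose per-category objects are keyed by the raw term. A minimal sketch of that shape follows. Only the nesting is taken from the notebook; the sample terms and the target values such as `"legitimate"` are made up for illustration and are not the contents of the real file.

```python
import json

# Hypothetical excerpt: only the nesting (attribute_mappings -> category -> term)
# comes from the notebook; the terms and target values are illustrative.
sample_json = """
{
  "attribute_mappings": {
    "legitimacy_status": {"legítimo": "legitimate", "legítima": "legitimate"},
    "marital_status": {"soltero": "single", "viuda": "widowed"}
  }
}
"""

sample_mappings = json.loads(sample_json)

# Same extraction as EXISTING_MAPPINGS, written as a comprehension.
existing = {
    category: list(terms.keys())
    for category, terms in sample_mappings["attribute_mappings"].items()
}

for category, terms in existing.items():
    print(f"{category}: {terms}")
```

Because only the keys are kept, a term counts as "known" later on only if its cleaned spelling matches a key exactly.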


## Textual Variation Extraction Function

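This function loads and harmonizes each dataset, lowercases and trims the values of every column whose name matches a pattern, and counts the resulting terms across columns and datasets.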

In [3]:
def extract_textual_variations_from_columns(column_pattern, min_frequency=1):
    """
    Extract all textual variations from columns matching a pattern.
    This reveals how conditions were expressed in historical documents,
    including mixed categories (e.g., marital status + age descriptors).
    
    Args:
        column_pattern: Regex pattern to match column names
        min_frequency: Minimum frequency for a term to be included
        
    Returns:
        Dictionary with term frequencies and contextual metadata
    """
    term_data = {}
    
    for dataset, info in DATAFRAMES_CONFIG.items():
        print(f"\nExtracting textual variations from {dataset}...")
        csv_path = info["csv_file"]
        mapping_path = info["mapping_file"]

        # Load and harmonize data
        df = pd.read_csv(csv_path)
        column_manager = ColumnManager()
        df = column_manager.harmonize_columns(df, mapping_path)

        # Find matching columns
        if isinstance(column_pattern, str):
            pattern = re.compile(column_pattern)
        else:
            pattern = column_pattern
            
        matching_columns = [col for col in df.columns if pattern.search(col)]
        print(f"Columns found: {matching_columns}")

        if not matching_columns:
            continue
        
        # Process each column - extract all textual variations
        for col in matching_columns:
            print(f"Processing column: {col}")
            
            # Get non-null values with basic cleaning
            values = df[col].dropna().astype(str)
            values = values[values != 'nan']
            values = values[values.str.strip() != '']
            
            # Simple lowercase normalization for consistency
            cleaned_values = values.str.lower().str.strip()
            
            print(f"Found {len(cleaned_values)} non-empty entries")
            
            # Count occurrences
            value_counts = cleaned_values.value_counts()
            
            for term, count in value_counts.items():
                if term not in term_data:
                    term_data[term] = {
                        'frequency': 0, 
                        'columns': set(), 
                        'datasets': set(),
                        'raw_examples': set()
                    }
                
                term_data[term]['frequency'] += count
                term_data[term]['columns'].add(col)
                term_data[term]['datasets'].add(dataset)
                
                # Keep a few raw examples (before cleaning)
                original_values = values[cleaned_values == term].head(3)
                term_data[term]['raw_examples'].update(original_values.tolist())
    
    # Filter by frequency and convert sets to lists
    filtered_terms = {}
    for term, data in term_data.items():
        if data['frequency'] >= min_frequency:
            filtered_terms[term] = {
                'frequency': data['frequency'],
                'columns': list(data['columns']),
                'datasets': list(data['datasets']),
                'raw_examples': list(data['raw_examples'])[:3]  # Keep max 3 examples
            }
    
    return filtered_terms

print("Textual variation extraction function defined!")

Textual variation extraction function defined!
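
The core of the function is a clean-then-count step. The sketch below reproduces that step without pandas on a toy list, so the effect of the cleaning rules is easy to see. `count_variations` and `toy_column` are hypothetical names introduced here, not part of the project utilities.

```python
from collections import Counter


def count_variations(raw_values):
    """Lowercase and trim raw cell values, then count each distinct term.

    Hypothetical helper mirroring the cleaning steps of
    extract_textual_variations_from_columns, without pandas.
    """
    counts = Counter()
    examples = {}
    for raw in raw_values:
        if raw is None:
            continue  # equivalent of dropna()
        text = str(raw)
        if text == "nan" or text.strip() == "":
            continue  # skip stringified NaN and blank cells
        term = text.lower().strip()
        counts[term] += 1
        # Keep up to three distinct raw spellings per cleaned term.
        seen = examples.setdefault(term, [])
        if text not in seen and len(seen) < 3:
            seen.append(text)
    return counts, examples


toy_column = ["Hijo legitimo", "hijo legitimo", " Hija natural", None, "", "hijo legítimo"]
counts, examples = count_variations(toy_column)
print(counts.most_common())
print(examples["hijo legitimo"])
```

Case and surrounding whitespace are folded together, but accented and unaccented spellings remain separate terms, which is the behavior visible in the outputs of this notebook.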


## Textual Variation Analysis Function

Function to analyze extracted textual variations and compare with known mappings.

In [4]:
def analyze_textual_variations(terms_dict, category_name, existing_mappings, show_details=True):
    """
    Analyze textual variations and compare with existing mappings.
    Note: Results may show mixed categories as historical records often 
    combined different condition types in single columns.
    
    Args:
        terms_dict: Dictionary of extracted textual variations
        category_name: Name of the condition category being analyzed
        existing_mappings: List of already mapped terms
        show_details: Whether to show detailed output
        
    Returns:
        Tuple of (mapped_terms, unmapped_terms)
    """
    if not terms_dict:
        print(f"No textual variations found for {category_name}")
        return [], []
    
    # Sort by frequency
    sorted_terms = sorted(terms_dict.items(), key=lambda x: x[1]['frequency'], reverse=True)
    
    # Separate mapped vs unmapped terms
    mapped_terms = []
    unmapped_terms = []
    
    for term, info in sorted_terms:
        if term in existing_mappings:
            mapped_terms.append((term, info))
        else:
            unmapped_terms.append((term, info))
    
    if show_details:
        print(f"\n=== {category_name.upper()} TEXTUAL VARIATIONS ===")
        print(f"Total unique variations: {len(terms_dict)}")
        print(f"Known mappings found: {len(mapped_terms)}")
        print(f"Unmapped variations: {len(unmapped_terms)}")
        print(f"Note: Columns may contain mixed categories (e.g., marital + age descriptors)")
        
        # Show all terms with details
        df_terms = pd.DataFrame([
            {
                'term': term,
                'frequency': info['frequency'],
                'known_mapping': 'Yes' if term in existing_mappings else 'No',
                'datasets': ', '.join(info['datasets']),
                'columns': ', '.join(info['columns']),
                'raw_examples': ', '.join(info['raw_examples'])
            }
            for term, info in sorted_terms
        ])
        
        print(f"\n{category_name.title()} Textual Variations Summary:")
        display(df_terms)
        
        # Show unmapped variations for further analysis
        if unmapped_terms:
            print(f"\n--- UNMAPPED TEXTUAL VARIATIONS FOR {category_name.upper()} ---")
            for term, info in unmapped_terms[:20]:  # Top 20 unmapped terms
                freq = info['frequency']
                datasets = ', '.join(info['datasets'])
                examples = ', '.join(f"'{ex}'" for ex in info['raw_examples'])
                print(f"  '{term}' -> freq: {freq}, datasets: {datasets}, examples: {examples}")
    
    return mapped_terms, unmapped_terms

print("Textual variation analysis function defined!")

Textual variation analysis function defined!


## Extract Legitimacy Status Terms

Extract and analyze terms from legitimacy status columns.

In [5]:
# Extract legitimacy status textual variations
legitimacy_variations = extract_textual_variations_from_columns(
    column_pattern=re.compile(r".*legitimacy_status.*"),
    min_frequency=1
)

# Analyze and show results
legitimacy_mapped, legitimacy_unmapped = analyze_textual_variations(
    legitimacy_variations, 
    "legitimacy_status", 
    EXISTING_MAPPINGS["legitimacy_status"]
)


Extracting textual variations from bautismos...
Columns found: ['baptized_legitimacy_status']
Processing column: baptized_legitimacy_status
Found 6332 non-empty entries

Extracting textual variations from entierros...
Columns found: ['deceased_legitimacy_status']
Processing column: deceased_legitimacy_status
Found 1148 non-empty entries

Extracting textual variations from matrimonios...
Columns found: ['groom_legitimacy_status', 'bride_legitimacy_status']
Processing column: groom_legitimacy_status
Found 1416 non-empty entries
Processing column: bride_legitimacy_status
Found 1442 non-empty entries

=== LEGITIMACY_STATUS TEXTUAL VARIATIONS ===
Total unique variations: 315
Known mappings found: 6
Unmapped variations: 309
Note: Columns may contain mixed categories (e.g., marital + age descriptors)

Legitimacy_Status Textual Variations Summary:


|  | term | frequency | known_mapping | datasets | columns | raw_examples |
|---|---|---|---|---|---|---|
| 0 | hijo legitimo | 1495 | No | entierros, matrimonios, bautismos | deceased_legitimacy_status, groom_legitimacy_s... | Hijo legitimo |
| 1 | hija legitima | 1366 | No | entierros, matrimonios, bautismos | deceased_legitimacy_status, bride_legitimacy_s... | Hija legitima |
| 2 | hijo legítimo | 996 | No | entierros, bautismos | deceased_legitimacy_status, baptized_legitimac... | Hijo legítimo, hijo legítimo |
| 3 | legítima | 958 | Yes | entierros, matrimonios | deceased_legitimacy_status, bride_legitimacy_s... | legítima |
| 4 | legítimo | 937 | Yes | entierros, matrimonios | deceased_legitimacy_status, groom_legitimacy_s... | legítimo |
| ... | ... | ... | ... | ... | ... | ... |
| 310 | bastarda de padre no conocido | 1 | No | matrimonios | bride_legitimacy_status | bastarda de padre no conocido |
| 311 | expósita | 1 | No | matrimonios | bride_legitimacy_status | expósita |
| 312 | hija naturala | 1 | No | matrimonios | bride_legitimacy_status | Hija naturala |
| 313 | hija natural / espurio | 1 | No | matrimonios | bride_legitimacy_status | Hija natural / espurio |



--- UNMAPPED TEXTUAL VARIATIONS FOR LEGITIMACY_STATUS ---
  'hijo legitimo' -> freq: 1495, datasets: entierros, matrimonios, bautismos, examples: 'Hijo legitimo'
  'hija legitima' -> freq: 1366, datasets: entierros, matrimonios, bautismos, examples: 'Hija legitima'
  'hijo legítimo' -> freq: 996, datasets: entierros, bautismos, examples: 'Hijo legítimo', 'hijo legítimo'
  'hija legítima' -> freq: 917, datasets: entierros, bautismos, examples: 'hija legítima', 'Hija legítima'
  'hijo legítimo, indio' -> freq: 521, datasets: bautismos, examples: 'hijo legítimo, indio'
  'hija legítima, india' -> freq: 428, datasets: bautismos, examples: 'hija legítima, india'
  'hijo natural' -> freq: 383, datasets: entierros, matrimonios, bautismos, examples: 'hijo natural', 'Hijo natural'
  'hija natural' -> freq: 350, datasets: entierros, matrimonios, bautismos, examples: 'Hija natural', 'hija natural'
  'hija legítima, indígena' -> freq: 118, datasets: bautismos, examples: 'hija legítima, indígena'
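
The listing shows the same expression counted twice, once with and once without the accent (`hijo legitimo` vs. `hijo legítimo`). One possible follow-up, sketched here as an assumption rather than as part of the notebook's pipeline, is to fold accents before comparing terms. `strip_accents` is a hypothetical helper; the frequencies are copied from the output above.

```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Remove combining marks so 'legítimo' and 'legitimo' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Frequencies copied from the unmapped listing above.
frequencies = {
    "hijo legitimo": 1495,
    "hija legitima": 1366,
    "hijo legítimo": 996,
    "hija legítima": 917,
}

# Sum the counts of spellings that differ only by accents.
merged = Counter()
for term, freq in frequencies.items():
    merged[strip_accents(term)] += freq

print(merged)
```

This folding also turns `ñ` into `n` (`doña` becomes `dona`), so it is better suited to building comparison keys than to rewriting the stored terms.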


## Extract Social Condition Terms

Extract and analyze terms from social condition columns.

In [6]:
# Extract social condition textual variations
social_variations = extract_textual_variations_from_columns(
    column_pattern=re.compile(r".*social_condition.*"),
    min_frequency=1
)

# Analyze and show results
social_mapped, social_unmapped = analyze_textual_variations(
    social_variations, 
    "social_condition", 
    EXISTING_MAPPINGS["social_condition"]
)


Extracting textual variations from bautismos...
Columns found: ['father_social_condition', 'mother_social_condition', 'parents_social_condition', 'godfather_social_condition', 'godmother_social_condition']
Processing column: father_social_condition
Found 836 non-empty entries
Processing column: mother_social_condition
Found 976 non-empty entries
Processing column: parents_social_condition
Found 3845 non-empty entries
Processing column: godfather_social_condition
Found 2047 non-empty entries
Processing column: godmother_social_condition
Found 1155 non-empty entries

Extracting textual variations from entierros...
Columns found: []

Extracting textual variations from matrimonios...
Columns found: ['groom_social_condition', 'groom_father_social_condition', 'groom_mother_social_condition', 'bride_social_condition', 'bride_father_social_condition', 'bride_mother_social_condition', 'godparent_1_social_condition', 'godparent_2_social_condition', 'godparent_3_social_condition']
Processing col

|  | term | frequency | known_mapping | datasets | columns | raw_examples |
|---|---|---|---|---|---|---|
| 0 | - | 4311 | No | bautismos | mother_social_condition, father_social_conditi... | - |
| 1 | tributario/tributaria | 1382 | No | bautismos | mother_social_condition, father_social_condition | Tributario/tributaria |
| 2 | doña | 528 | Yes | matrimonios, bautismos | godparent_2_social_condition, mother_social_co... | doña |
| 3 | indigena | 482 | No | matrimonios, bautismos | bride_father_social_condition, mother_social_c... | indigena, Indigena |
| 4 | don | 332 | Yes | matrimonios, bautismos | bride_father_social_condition, godparent_1_soc... | don, don |
| ... | ... | ... | ... | ... | ... | ... |
| 951 | don. por delegación de federico pérez albela | 1 | No | matrimonios | godparent_1_social_condition | don. Por delegación de Federico Pérez Albela |
| 952 | señora capitana, doña | 1 | No | matrimonios | godparent_2_social_condition | señora capitana, doña |
| 953 | hija de jose julian bendezu | 1 | No | matrimonios | godparent_2_social_condition | hija de Jose Julian Bendezu |
| 954 | esposa del padrino. doña. vecina de aucara | 1 | No | matrimonios | godparent_2_social_condition | esposa del padrino. Doña. Vecina de Aucara |



--- UNMAPPED TEXTUAL VARIATIONS FOR SOCIAL_CONDITION ---
  '-' -> freq: 4311, datasets: bautismos, examples: '-'
  'tributario/tributaria' -> freq: 1382, datasets: bautismos, examples: 'Tributario/tributaria'
  'indigena' -> freq: 482, datasets: matrimonios, bautismos, examples: 'indigena', 'Indigena'
  'indios de pampamarca' -> freq: 244, datasets: matrimonios, bautismos, examples: 'Indios de Pampamarca', 'indios de Pampamarca'
  'indios' -> freq: 190, datasets: matrimonios, bautismos, examples: 'indios'
  'tributarios' -> freq: 172, datasets: matrimonios, bautismos, examples: 'tributarios', 'Tributarios'
  'naturales y vecinos de pampamarca' -> freq: 141, datasets: matrimonios, examples: 'naturales y vecinos de Pampamarca'
  'naturales y vecinos de esta [aucara]' -> freq: 136, datasets: matrimonios, examples: 'naturales y vecinos de esta [Aucara]'
  '"inds. [indios] de pampamarca"' -> freq: 131, datasets: bautismos, examples: '"inds. [indios] de Pampamarca"'
  'indios de ishua' -> f
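
The most frequent "unmapped" social-condition value is the placeholder `-`, which is not a term at all. A small sketch of setting such placeholders aside before reviewing unmapped terms is given below. The `PLACEHOLDERS` set and `split_placeholders` are assumptions introduced for illustration; both tokens occur in this notebook's outputs, but the list should be reviewed against the data.

```python
# Assumed placeholder tokens; the set itself is a guess and should be reviewed.
PLACEHOLDERS = {"-", "[ilegible]"}


def split_placeholders(terms):
    """Separate placeholder entries from substantive terms.

    `terms` maps each cleaned term to an info dict with a 'frequency' key,
    the same shape returned by extract_textual_variations_from_columns.
    """
    substantive = {t: info for t, info in terms.items() if t not in PLACEHOLDERS}
    placeholders = {t: info for t, info in terms.items() if t in PLACEHOLDERS}
    return substantive, placeholders


# Frequencies copied from the social-condition output above.
toy_terms = {
    "-": {"frequency": 4311},
    "tributario/tributaria": {"frequency": 1382},
    "indigena": {"frequency": 482},
}
substantive, placeholders = split_placeholders(toy_terms)
missing = sum(info["frequency"] for info in placeholders.values())
print(f"{len(substantive)} substantive terms, {missing} placeholder entries")
```

Keeping the placeholder counts, rather than dropping them, preserves the information about how often a field was left empty by the transcribers.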

## Extract Marital Status Terms

Extract and analyze terms from marital status columns.

In [7]:
# Extract marital status textual variations
marital_variations = extract_textual_variations_from_columns(
    column_pattern=re.compile(r".*marital_status.*"),
    min_frequency=1
)

# Analyze and show results
marital_mapped, marital_unmapped = analyze_textual_variations(
    marital_variations, 
    "marital_status", 
    EXISTING_MAPPINGS["marital_status"]
)


Extracting textual variations from bautismos...
Columns found: []

Extracting textual variations from entierros...
Columns found: ['deceased_marital_status']
Processing column: deceased_marital_status
Found 1557 non-empty entries

Extracting textual variations from matrimonios...
Columns found: ['groom_marital_status', 'bride_marital_status']
Processing column: groom_marital_status
Found 1505 non-empty entries
Processing column: bride_marital_status
Found 1474 non-empty entries

=== MARITAL_STATUS TEXTUAL VARIATIONS ===
Total unique variations: 35
Known mappings found: 6
Unmapped variations: 29
Note: Columns may contain mixed categories (e.g., marital + age descriptors)

Marital_Status Textual Variations Summary:

|  | term | frequency | known_mapping | datasets | columns | raw_examples |
|---|---|---|---|---|---|---|
| 0 | soltera | 1389 | Yes | entierros, matrimonios | deceased_marital_status, bride_marital_status | soltera, Soltera |
| 1 | soltero | 1321 | Yes | entierros, matrimonios | deceased_marital_status, groom_marital_status,... | Soltero, soltero |
| 2 | viuda | 406 | Yes | entierros, matrimonios | deceased_marital_status, bride_marital_status | viuda, Viuda |
| 3 | casado | 400 | Yes | entierros | deceased_marital_status | casado, Casado |
| 4 | viudo | 368 | Yes | entierros, matrimonios | deceased_marital_status, groom_marital_status | viudo, Viudo |
| 5 | - | 282 | No | entierros | deceased_marital_status | - |
| 6 | casada | 247 | Yes | entierros | deceased_marital_status | Casada |
| 7 | parvula | 24 | No | entierros | deceased_marital_status | Parvula |
| 8 | parvulo | 24 | No | entierros | deceased_marital_status | Parvulo |
| 9 | "marido que fue" | 18 | No | entierros | deceased_marital_status | "Marido que fue", "marido que fue" |



--- UNMAPPED TEXTUAL VARIATIONS FOR MARITAL_STATUS ---
  '-' -> freq: 282, datasets: entierros, examples: '-'
  'parvula' -> freq: 24, datasets: entierros, examples: 'Parvula'
  'parvulo' -> freq: 24, datasets: entierros, examples: 'Parvulo'
  '"marido que fue"' -> freq: 18, datasets: entierros, examples: '"Marido que fue"', '"marido que fue"'
  'parbulo' -> freq: 10, datasets: entierros, examples: 'Parbulo'
  '"mujer que fue"' -> freq: 8, datasets: entierros, examples: '"mujer que fue"'
  'parbula' -> freq: 6, datasets: entierros, examples: 'Parbula'
  'solteros' -> freq: 4, datasets: matrimonios, examples: 'solteros'
  'casada - viuda' -> freq: 3, datasets: entierros, examples: 'Casada - Viuda'
  '"casado"' -> freq: 2, datasets: entierros, examples: '"casado"'
  '[ilegible]' -> freq: 2, datasets: entierros, examples: '[ilegible]'
  'mujer de' -> freq: 2, datasets: entierros, examples: 'Mujer de'
  'marido que fue de' -> freq: 2, datasets: entierros, examples: 'Marido que fue de'
  '
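
Several unmapped entries in this listing look like spelling variants of other terms (`parbulo` next to `parvulo`, `solteros` next to `soltero`). As a rough sketch, and not something the notebook itself does, the standard library's `difflib.get_close_matches` can propose candidate pairs for manual review. The term lists are taken from the output above; `suggest_variants` and the `0.8` cutoff are assumptions.

```python
from difflib import get_close_matches

# Terms taken from the marital-status listing above.
known_terms = ["parvulo", "parvula", "soltero", "soltera", "casado", "casada"]
unmapped_terms = ["parbulo", "parbula", "solteros"]


def suggest_variants(term, candidates, cutoff=0.8):
    """Return candidates whose similarity ratio to `term` is at least `cutoff`."""
    return get_close_matches(term, candidates, n=3, cutoff=cutoff)


for term in unmapped_terms:
    print(f"{term!r} -> {suggest_variants(term, known_terms)}")
```

String similarity cannot tell a misspelling from a grammatical variant (gender or number), so the suggestions are a starting point for review rather than an automatic merge.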

## Analyze Mixed Categories and Variations

List the unmapped variations that occur at least twice and flag each one for manual review as a possible mixed category or alternative terminology.

In [8]:
def analyze_mixed_categories(category_unmapped, category_name, min_freq=2):
    """
    Analyze unmapped variations to identify mixed categories and complex terminology.
    """
    if not category_unmapped:
        print(f"No unmapped variations for {category_name}")
        return {}
    
    analysis_results = {}
    frequent_terms = [(term, info) for term, info in category_unmapped if info['frequency'] >= min_freq]
    
    if frequent_terms:
        print(f"\n--- MIXED CATEGORY ANALYSIS FOR {category_name.upper()} ---")
        print(f"Frequent unmapped variations (freq >= {min_freq}):")
        
        for term, info in frequent_terms:
            freq = info['frequency']
            datasets = ', '.join(info['datasets'])
            columns = ', '.join(info['columns'])
            examples = ', '.join(f"'{ex}'" for ex in info['raw_examples'])
            
            analysis_results[term] = {
                "frequency": freq,
                "datasets": info['datasets'],
                "columns": info['columns'],
                "raw_examples": info['raw_examples'],
                "analysis_note": "Review for mixed categories or alternative terminology"
            }
            
            print(f"  '{term}' (freq: {freq})")
            print(f"    -> Found in: {datasets}")
            print(f"    -> Columns: {columns}")
            print(f"    -> Examples: {examples}")
            print()
        
        print(f"Summary: {len(frequent_terms)} frequent variations need further analysis")
        print("These may represent mixed categories (e.g., 'parvulo' = age + marital status)")
    else:
        print(f"No frequent unmapped variations found for {category_name}")
    
    return analysis_results

print("=== TEXTUAL VARIATION ANALYSIS ===")
print("Examining mixed categories and complex terminology usage...")

# Analyze each category for mixed categories
legitimacy_mixed = analyze_mixed_categories(legitimacy_unmapped, "legitimacy_status")
social_mixed = analyze_mixed_categories(social_unmapped, "social_condition")
marital_mixed = analyze_mixed_categories(marital_unmapped, "marital_status")

=== TEXTUAL VARIATION ANALYSIS ===
Examining mixed categories and complex terminology usage...

--- MIXED CATEGORY ANALYSIS FOR LEGITIMACY_STATUS ---
Frequent unmapped variations (freq >= 2):
  'hijo legitimo' (freq: 1495)
    -> Found in: entierros, matrimonios, bautismos
    -> Columns: deceased_legitimacy_status, groom_legitimacy_status, baptized_legitimacy_status
    -> Examples: 'Hijo legitimo'

  'hija legitima' (freq: 1366)
    -> Found in: entierros, matrimonios, bautismos
    -> Columns: deceased_legitimacy_status, bride_legitimacy_status, baptized_legitimacy_status
    -> Examples: 'Hija legitima'

  'hijo legítimo' (freq: 996)
    -> Found in: entierros, bautismos
    -> Columns: deceased_legitimacy_status, baptized_legitimacy_status
    -> Examples: 'Hijo legítimo', 'hijo legítimo'

  'hija legítima' (freq: 917)
    -> Found in: entierros, bautismos
    -> Columns: deceased_legitimacy_status, baptized_legitimacy_status
    -> Examples: 'hija legítima', 'Hija legítima'

  'h
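
Many of the mixed-category entries are compound strings that join two descriptors with a delimiter, such as `hijo legítimo, indio` or `casada - viuda`. The sketch below splits such entries into their parts. The `DELIMITERS` pattern and `split_compound` are hypothetical and based only on the delimiters visible in this notebook's outputs.

```python
import re

# Delimiters observed in the outputs above: "," (e.g. 'hijo legítimo, indio'),
# " / " and " - " (e.g. 'casada - viuda'). The pattern itself is an assumption.
DELIMITERS = re.compile(r"\s*,\s*|\s+/\s+|\s+-\s+")


def split_compound(term):
    """Split a compound entry into its component descriptors."""
    return [part for part in DELIMITERS.split(term) if part]


for term in ["hijo legítimo, indio", "casada - viuda", "hija natural / espurio", "soltera"]:
    print(f"{term!r} -> {split_compound(term)}")
```

The slash and hyphen alternatives require surrounding whitespace, so `tributario/tributaria` and the bare placeholder `-` are left intact.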

## Export Textual Variations Catalog

Save comprehensive textual variations catalog to `data/interim` for further analysis.

In [9]:
# Create comprehensive textual variations catalog
variations_catalog = {
    "metadata": {
        "generated_date": pd.Timestamp.now().isoformat(),
        "source_notebook": "3_termExtraction.ipynb",
        "analysis_type": "condition_textual_variations",
        "note": "Comprehensive catalog of textual variations in condition-related columns. May contain mixed categories (e.g., marital status + age descriptors).",
        # Threshold used by analyze_mixed_categories (min_freq=2);
        # all_textual_variations was extracted with min_frequency=1.
        "minimum_frequency": 2
    },
    "mixed_category_analysis": {
        "legitimacy_status": legitimacy_mixed,
        "social_condition": social_mixed,
        "marital_status": marital_mixed
    },
    "all_textual_variations": {
        "legitimacy_status": legitimacy_variations,
        "social_condition": social_variations,
        "marital_status": marital_variations
    },
    "summary_statistics": {
        "legitimacy_status": {
            "total_variations": len(legitimacy_variations),
            "mapped_variations": len(legitimacy_mapped),
            "unmapped_variations": len(legitimacy_unmapped)
        },
        "social_condition": {
            "total_variations": len(social_variations),
            "mapped_variations": len(social_mapped),
            "unmapped_variations": len(social_unmapped)
        },
        "marital_status": {
            "total_variations": len(marital_variations),
            "mapped_variations": len(marital_mapped),
            "unmapped_variations": len(marital_unmapped)
        }
    }
}

# Export to interim data folder
output_file = "../data/interim/condition_textual_variations.json"

with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(variations_catalog, f, indent=2, ensure_ascii=False)

print("\n=== EXPORT COMPLETE ===")
print(f"Textual variations catalog exported to: {output_file}")
print("Summary:")
print(f"  - Legitimacy status: {len(legitimacy_variations)} variations ({len(legitimacy_unmapped)} unmapped)")
print(f"  - Social condition: {len(social_variations)} variations ({len(social_unmapped)} unmapped)")
print(f"  - Marital status: {len(marital_variations)} variations ({len(marital_unmapped)} unmapped)")
print("\nKey findings:")
print("  - Mixed categories detected (e.g., 'parvulo' in marital status columns)")
print("  - Multiple spelling variations and abbreviations identified")
print("  - Historical terminology diversity documented")
print("\nNext steps:")
print(f"1. Review mixed category analysis in: {output_file}")
print("2. Identify patterns in textual variations")
print("3. Use insights for data harmonization strategies")


=== EXPORT COMPLETE ===
Textual variations catalog exported to: ../data/interim/condition_textual_variations.json
Summary:
  - Legitimacy status: 315 variations (309 unmapped)
  - Social condition: 956 variations (939 unmapped)
  - Marital status: 35 variations (29 unmapped)

Key findings:
  - Mixed categories detected (e.g., 'parvulo' in marital status columns)
  - Multiple spelling variations and abbreviations identified
  - Historical terminology diversity documented

Next steps:
1. Review mixed category analysis in: ../data/interim/condition_textual_variations.json
2. Identify patterns in textual variations
3. Use insights for data harmonization strategies
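
For the first of these steps, a short sketch of ranking the entries of one category in the exported catalog is shown below. `top_unmapped` is a hypothetical helper; the toy catalog copies only the nesting of `variations_catalog` and a few marital-status frequencies from the output above.

```python
def top_unmapped(catalog, category, n=5):
    """Return the n most frequent entries of mixed_category_analysis[category]."""
    entries = catalog["mixed_category_analysis"].get(category, {})
    ranked = sorted(entries.items(), key=lambda item: item[1]["frequency"], reverse=True)
    return [(term, info["frequency"]) for term, info in ranked[:n]]


# Toy catalog with the same nesting as variations_catalog.
toy_catalog = {
    "mixed_category_analysis": {
        "marital_status": {
            "parvula": {"frequency": 24},
            "-": {"frequency": 282},
            "parbulo": {"frequency": 10},
        }
    }
}

print(top_unmapped(toy_catalog, "marital_status", n=2))
```

With the real export, `catalog` would come from `json.load` on `../data/interim/condition_textual_variations.json`.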
