# Prototype Notebook: DV Standardization Tool
This notebook is for testing the standardization logic, validating schema transformations, and iterating on the naming scheme used to standardize dependent variables (DVs) in HCI datasets.

## Objectives
- Load example datasets
- Apply DV standardization using the same logic as the `convert_dv.py` script
- Validate outputs and explore edge cases
- Prototype fuzzy matching and schema extension

In [None]:
# Standard libraries
import pandas as pd
import yaml
from pathlib import Path

# Optional: only needed for the fuzzy matching prototype at the end of the notebook
# from rapidfuzz import fuzz

## Load Example Dataset

In [None]:
# Adjust the path as needed
raw_path = Path('../data/raw/sample_dataset.csv')
df_raw = pd.read_csv(raw_path)
df_raw.head()

## Load Standardized DV Mapping Schema

In [None]:
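schema_path = Path('../schemas/standard_dv_mapping.yaml')
with open(schema_path, 'r', encoding='utf-8') as file:
    # Expected shape: {standard_dv_name: [alias, alias, ...], ...}
    dv_schema = yaml.safe_load(file)
dv_schema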
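Before building the alias map, it can help to confirm that the loaded schema has the shape the renaming cell relies on. The sketch below assumes the YAML maps each standard DV name to a list of alias strings; `validate_dv_schema` is a hypothetical helper, not part of `convert_dv.py`. It treats empty keys as having no aliases and rejects any alias listed under more than one standard name, because when the schema is inverted into an alias map the last standard name would otherwise win silently.

```python
from collections import Counter


def validate_dv_schema(schema):
    """Return a cleaned {standard_name: [aliases]} dict, or raise if the schema is malformed.

    Assumes the YAML maps each standard DV name to a list of alias strings.
    """
    if not isinstance(schema, dict):
        raise TypeError(f"Expected a mapping at the top level, got {type(schema).__name__}")

    cleaned = {}
    for std, aliases in schema.items():
        # A YAML key with no value loads as None; treat it as "no aliases yet".
        if aliases is None:
            aliases = []
        if not isinstance(aliases, list) or not all(isinstance(a, str) for a in aliases):
            raise TypeError(f"Aliases for {std!r} must be a list of strings")
        cleaned[std] = aliases

    # Count each alias once per standard name, so a repeat within one list is not flagged.
    counts = Counter(alias for aliases in cleaned.values() for alias in set(aliases))
    shared = sorted(alias for alias, n in counts.items() if n > 1)
    if shared:
        # In the inverted alias map, the last standard name would silently win.
        raise ValueError(f"Aliases listed under more than one standard name: {shared}")
    return cleaned
```

In the notebook this would be called as `dv_schema = validate_dv_schema(dv_schema)` directly after loading the YAML.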

## Apply Renaming Logic

In [None]:
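# Invert the schema ({standard_name: [aliases]}) into an {alias: standard_name} lookup.
# `aliases or []` tolerates YAML keys left empty, which load as None.
alias_map = {alias: std for std, aliases in dv_schema.items() for alias in (aliases or [])}

# Replace known aliases; columns not in the map keep their original names
df_renamed = df_raw.rename(columns=lambda col: alias_map.get(col, col))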
df_renamed.head()
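Renaming can silently produce duplicate column labels, for example when a dataset contains both an alias and the standard name it maps to. A minimal check, using a hypothetical `rename_report` helper, groups the raw columns by the name each one ends up with (the same keep-or-replace rule as the rename cell) and lists the columns the alias map does not cover:

```python
from collections import defaultdict


def rename_report(columns, alias_map):
    """Group raw columns by the name they will have after renaming.

    Returns (collisions, unmapped):
      collisions: {final_name: [raw columns]} for names shared by two or more columns
      unmapped:   raw columns that are not listed as an alias in alias_map
    """
    final_names = defaultdict(list)
    for col in columns:
        # Same rule as the rename cell: known aliases are replaced, others are kept.
        final_names[alias_map.get(col, col)].append(col)

    collisions = {name: cols for name, cols in final_names.items() if len(cols) > 1}
    unmapped = [col for col in columns if col not in alias_map]
    return collisions, unmapped
```

For example, `collisions, unmapped = rename_report(df_raw.columns, alias_map)`. A non-empty `collisions` dict means `df_renamed.columns.duplicated().any()` will also be true. Note that `unmapped` includes columns that already carry a standard name, unless the schema lists that name among its own aliases.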

## Save Transformed Dataset

In [None]:
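output_path = Path('../data/processed/standardized_sample.csv')
# Create ../data/processed if it does not exist yet; to_csv will not create it.
output_path.parent.mkdir(parents=True, exist_ok=True)
df_renamed.to_csv(output_path, index=False)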
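To validate the saved file, one option is to read it back and compare it with the in-memory frame. This is a sketch using a hypothetical `check_saved_output` helper that compares only the header and the row count. A useful side effect: `pd.read_csv` renames duplicate headers on read (for example to `task_time.1`), so a collision introduced by renaming shows up here as a column mismatch.

```python
from pathlib import Path

import pandas as pd


def check_saved_output(df: pd.DataFrame, path: Path) -> None:
    """Re-read a saved CSV and confirm the header and row count match `df`."""
    reloaded = pd.read_csv(path)
    # Inferred CSV headers come back as strings, so compare against str() of the originals.
    expected_cols = [str(c) for c in df.columns]
    actual_cols = list(reloaded.columns)
    if actual_cols != expected_cols:
        # Duplicate headers are renamed on read (e.g. "task_time.1"), so they land here.
        raise AssertionError(f"Column mismatch: saved {actual_cols}, expected {expected_cols}")
    if len(reloaded) != len(df):
        raise AssertionError(f"Row count mismatch: saved {len(reloaded)}, expected {len(df)}")
```

In the notebook: `check_saved_output(df_renamed, output_path)`.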

## [Optional] Prototype Fuzzy Matching Logic

In [None]:
# from rapidfuzz import process
# Example fuzzy matching implementation here
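One way to fill in the placeholder above is sketched here with the standard library's `difflib` as a dependency-free stand-in for rapidfuzz; rapidfuzz's `process.extractOne` could take its place if you prefer its scorers. The normalization rule, the 0.8 cutoff, and the helper names `normalize` and `suggest_matches` are assumptions for illustration. The suggestions are meant for review, and accepted ones would be added to the YAML schema as aliases rather than applied automatically.

```python
import difflib
import re


def normalize(name):
    """Lowercase, trim, and collapse runs of spaces, hyphens, underscores, and dots to '_'."""
    return re.sub(r"[\s\-_.]+", "_", str(name).strip().lower()).strip("_")


def suggest_matches(columns, alias_map, cutoff=0.8):
    """Suggest a standard DV name for each column, or None when nothing is close enough.

    Candidates are every alias plus every standard name, compared in normalized form.
    If two candidates normalize to the same key, the later one wins.
    """
    norm_to_std = {normalize(alias): std for alias, std in alias_map.items()}
    norm_to_std.update({normalize(std): std for std in set(alias_map.values())})

    suggestions = {}
    for col in columns:
        key = normalize(col)
        if key in norm_to_std:
            # Exact hit once case and separators are ignored.
            suggestions[col] = norm_to_std[key]
            continue
        # Otherwise take the single closest candidate at or above the similarity cutoff.
        close = difflib.get_close_matches(key, list(norm_to_std), n=1, cutoff=cutoff)
        suggestions[col] = norm_to_std[close[0]] if close else None
    return suggestions
```

For example, `suggest_matches([c for c in df_raw.columns if c not in alias_map], alias_map)` proposes names only for columns the schema does not yet cover. Raising `cutoff` trades fewer false matches for more `None` results.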

## Summary
This notebook demonstrates a schema-based DV standardization tool in an HCI context. Use it to check how well your schema covers real column names and to iterate on edge cases and unexpected naming variants.