# Duplicate Column Diagnosis - ORDER_LIST Transform Pipeline

## Problem Analysis
The ORDER_LIST transformation pipeline is failing with duplicate column errors:
- **Error**: "The column name '36X30' is specified more than once in the SET clause"
- **Additional duplicates**: '40X30', 'ONE SIZE'
- **Impact**: SQL INSERT statements cannot execute because the same column name appears more than once in the statement
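
As a quick illustration of the failure condition, the sketch below counts repeated names in a column list. The list itself is hypothetical (only the three size names come from the error report, and `ORDER_ID` is made up for the example). In the real pipeline the input would more likely be something like `list(df.columns)`.

```python
from collections import Counter

def find_duplicate_columns(columns):
    """Return {column_name: count} for every name that occurs more than once."""
    counts = Counter(columns)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical column list; ORDER_ID is an assumed placeholder column.
columns = ["ORDER_ID", "36X30", "40X30", "36X30", "ONE SIZE", "40X30", "ONE SIZE"]

duplicates = find_duplicate_columns(columns)
print(duplicates)
```

Any non-empty result means a statement built from this column list would name the same column more than once.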

## Investigation Approach
1. **DDL Schema Analysis** - Parse the DDL file to identify exact duplicate column definitions
2. **YAML Mapping Analysis** - Check for duplicate aliases or mapping conflicts
3. **DataFrame Column Analysis** - Identify where duplicates are introduced during transformation
4. **Root Cause Analysis** - Pinpoint the exact source of duplicate column names
5. **Fix Strategy Generation** - Create actionable solutions to resolve duplicates
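
Step 1 could be sketched roughly as follows, assuming the DDL declares one bracket-quoted column per line in SQL Server style. The DDL text, the regular expression, and the helper name `duplicate_ddl_columns` are all illustrative assumptions. The real ORDER_LIST DDL file may be laid out differently and need a different pattern.

```python
import re
from collections import Counter

# Hypothetical DDL fragment; the actual ORDER_LIST DDL is not reproduced here.
SAMPLE_DDL = """
CREATE TABLE [dbo].[ORDER_LIST] (
    [ORDER_ID] INT NOT NULL,
    [36X30] INT NULL,
    [40X30] INT NULL,
    [ONE SIZE] INT NULL,
    [36X30] INT NULL
);
"""

# A line that starts with a bracketed name followed by a type keyword.
COLUMN_PATTERN = re.compile(r"^[ \t]*\[([^\]]+)\][ \t]+\w+", re.MULTILINE)

def duplicate_ddl_columns(ddl_text):
    """Return the sorted column names defined more than once in the DDL text."""
    names = COLUMN_PATTERN.findall(ddl_text)
    # Normalize case and surrounding whitespace so near-identical names are grouped.
    counts = Counter(name.strip().upper() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)

print(duplicate_ddl_columns(SAMPLE_DDL))
```

Names are upper-cased before counting because column names that differ only in case often collide in the database. Drop that step if the target collation is case-sensitive.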

## Key Questions
- Are duplicates in the DDL schema itself?
- Are duplicates created by YAML alias mapping?
- Are duplicates introduced during DataFrame consolidation?
- How can we fix the schema to prevent future duplicates?
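
To make the second question concrete, here is a minimal sketch of how alias mapping could create duplicates when two distinct source columns are renamed to the same canonical name. The `alias_map` structure, its contents, and the source column names are hypothetical stand-ins for whatever the parsed YAML metadata actually contains.

```python
from collections import defaultdict

# Hypothetical mapping, shaped like a parsed YAML file:
# canonical column name -> list of source aliases.
alias_map = {
    "36X30": ["36x30"],
    "ONE SIZE": ["One Size", "OS"],
}

def rename_with_aliases(source_columns, alias_map):
    """Replace each source column with its canonical name when an alias matches."""
    lookup = {
        alias: canonical
        for canonical, aliases in alias_map.items()
        for alias in aliases
    }
    return [lookup.get(col, col) for col in source_columns]

def find_collisions(source_columns, alias_map):
    """Map each output name to the source columns that collapse into it."""
    groups = defaultdict(list)
    renamed = rename_with_aliases(source_columns, alias_map)
    for src, out in zip(source_columns, renamed):
        groups[out].append(src)
    return {out: srcs for out, srcs in groups.items() if len(srcs) > 1}

# Hypothetical source columns: both an alias and the canonical name are present.
source_columns = ["ORDER_ID", "36x30", "36X30", "One Size", "OS"]
print(find_collisions(source_columns, alias_map))
```

If a check like this reports collisions while the DDL itself is clean, the mapping step is the more likely origin of the duplicates.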

## 1. Import Required Libraries
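Import the libraries needed to analyze the DDL schema and YAML metadata and to generate fixes, then add the project's `utils` folder to the Python path so the helper modules can be imported.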

In [None]:
import sys
import re
import yaml
import pandas as pd
from pathlib import Path
from typing import Dict, List, Set, Tuple, Any
from collections import Counter, defaultdict

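# Standard import pattern for project utilities
def find_repo_root() -> Path:
    """Find the repository root by walking upward until a 'utils' folder is found."""
    # Notebooks do not define __file__, so fall back to the current working directory.
    start = Path(__file__).resolve().parent if '__file__' in globals() else Path.cwd()
    # Check the starting directory and every ancestor, including the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / "utils").is_dir():
            return candidate
    raise FileNotFoundError(f"Could not find repository root (no 'utils' folder at or above {start})")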

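repo_root = find_repo_root()
utils_path = str(repo_root / "utils")
if utils_path not in sys.path:  # avoid stacking duplicate entries when the cell is re-run
    sys.path.insert(0, utils_path)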

# Import project utilities
import db_helper as db
import logger_helper

# Initialize logger
logger = logger_helper.get_logger(__name__)

print("✅ All libraries imported successfully")
print(f"📁 Repository root: {repo_root}")
print(f"📁 Working directory: {Path.cwd()}")
print(f"🔧 Python path updated with utils: {str(repo_root / 'utils')}")