# SEX Code Standardization

This notebook standardizes the SEX codes in the AMR dataset by mapping 'f'/'m' to 'Female'/'Male' for consistency and readability. Values are stripped of surrounding whitespace and lowercased before mapping, so ' F ' and 'M' are handled as well.

## Objectives
- Map 'f' → 'Female'
- Map 'm' → 'Male'
- Validate the transformation
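
The objectives above can be sketched on a small toy Series before running them on the full dataset. This is a minimal illustration, not the notebook's actual data: the `sample` values and the `unmapped_values` name are assumptions introduced here. It shows the same normalize-then-map step and one way to separate values that failed to map from values that were already missing.

```python
import pandas as pd

# Hypothetical sample values, not taken from the AMR dataset
sample = pd.Series(["f", " M ", "F", "m", "x", None], name="SEX")

sex_mapping = {"f": "Female", "m": "Male"}

# Normalize whitespace and case, then map; anything outside the mapping becomes NaN
standardized = sample.str.strip().str.lower().map(sex_mapping)

# Values that were present before mapping but are missing afterwards
unmapped_values = sample[sample.notna() & standardized.isna()]

print(standardized.tolist())
print(unmapped_values.tolist())
```

Inspecting `unmapped_values` before overwriting the column is one way to catch unexpected codes, since `map` silently turns every value outside the dictionary into a missing value.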

In [None]:
# Import required libraries
import pandas as pd

# Load the dataset
input_file = r'c:\NATIONAL AMR DATA ANALYSIS FILES\data\processed\mapped\df_final_with_standardized_sex_2025-06-12_15-32-27.csv'
df = pd.read_csv(input_file)

print("📂 Dataset loaded successfully!")
print(f"   📊 Shape: {df.shape}")
print(f"   📋 Columns: {list(df.columns)}")

In [None]:
# Define mapping for SEX codes
sex_mapping = {"f": "Female", "m": "Male"}

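# Apply mapping to the SEX column
if "SEX" in df.columns:
    # Count values that were already missing, so they are not mistaken for mapping failures
    missing_before = df["SEX"].isna().sum()

    # Normalize whitespace and case, then map; any value outside sex_mapping becomes NaN
    df["SEX"] = df["SEX"].str.strip().str.lower().map(sex_mapping)
    
    # Validate transformation
    total_records = len(df)
    mapped_records = df["SEX"].notna().sum()
    unmapped_records = df["SEX"].isna().sum()
    # Guard against division by zero on an empty dataset
    success_rate = (mapped_records / total_records) * 100 if total_records else 0.0
    
    print(f"📊 Total records: {total_records:,}")
    print(f"✅ Successfully mapped: {mapped_records:,}")
    print(f"❌ Unmapped records: {unmapped_records:,} (including {missing_before:,} already missing)")
    print(f"📈 Success rate: {success_rate:.2f}%")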
else:
    print("❌ SEX column not found in the dataset!")

In [None]:
# Export the updated dataset
output_file = r'c:\NATIONAL AMR DATA ANALYSIS FILES\data\processed\mapped\df_final_with_standardized_sex_updated.csv'
df.to_csv(output_file, index=False)

print("💾 Dataset exported successfully!")
print(f"   📁 Location: {output_file}")