# Ensuring Consistency

**Activity Overview**: Ensure consistency by identifying and resolving conflicting values across datasets.

## Title: Customer Address Discrepancies

**Task**: Resolve customer address mismatches between the CRM and marketing databases.

**Steps**:
1. Compare customer addresses in the CRM with those in the marketing database.
2. Identify records with conflicting address information.
3. Propose a method to consolidate records with verified addresses.

In [2]:
# Inner merge on customer_id and name so each row pairs the CRM and marketing versions of one customer
merged_df = crm_df.merge(marketing_df, on=['customer_id', 'name'], suffixes=('_crm', '_marketing'))

# Find rows where addresses differ, ignoring case and surrounding whitespace.
# fillna('') keeps the comparison from failing when an address is missing.
crm_addr = merged_df['address_crm'].fillna('').str.strip().str.lower()
marketing_addr = merged_df['address_marketing'].fillna('').str.strip().str.lower()
address_conflicts = merged_df[crm_addr != marketing_addr]

print(f"⚠️ Found {len(address_conflicts)} address conflicts:")
print(address_conflicts[['customer_id', 'name', 'address_crm', 'address_marketing']])


⚠️ Found 2 address conflicts:
   customer_id     name      address_crm     address_marketing
1            2      Bob  456 Banana Blvd  456 Banana Boulevard
2            3  Charlie   789 Cherry Ave     789 Cherry Avenue
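
The merge above is an inner join, so a customer present in only one system would drop out of `merged_df` silently instead of being flagged. The following is a minimal sketch of one way to surface such records with an outer merge and pandas' `indicator=True` option. The frames `crm_demo` and `marketing_demo` and the customer `Erin` are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical demo frames (not the exercise data): Erin exists only in the CRM
crm_demo = pd.DataFrame({
    'customer_id': [1, 2, 5],
    'name': ['Alice', 'Bob', 'Erin'],
    'address': ['123 Apple St', '456 Banana Blvd', '55 Elder Rd'],
})
marketing_demo = pd.DataFrame({
    'customer_id': [1, 2],
    'name': ['Alice', 'Bob'],
    'address': ['123 Apple St', '456 Banana Boulevard'],
})

# Outer merge keeps unmatched rows; the _merge column records where each row came from
outer_df = crm_demo.merge(
    marketing_demo,
    on=['customer_id', 'name'],
    how='outer',
    suffixes=('_crm', '_marketing'),
    indicator=True,
)

# 'left_only' = CRM only, 'right_only' = marketing only, 'both' = present in both
unmatched = outer_df[outer_df['_merge'] != 'both']
print(unmatched[['customer_id', 'name', '_merge']])
```

Rows marked `left_only` or `right_only` are worth reviewing before any address comparison, since they have no counterpart to compare against.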


In [3]:
import re

def normalize_address(addr):
    """Lowercase an address and expand common street-type abbreviations."""
    if not isinstance(addr, str):
        return addr
    replacements = {
        'blvd': 'boulevard',
        'ave': 'avenue',
        'st': 'street',
        # add more rules as needed
    }
    addr = addr.lower().strip()
    for abbr, full in replacements.items():
        # \b matches whole words only, so 'ave' is not expanded inside
        # 'avenue' and 'st' is not expanded inside words such as 'west'
        addr = re.sub(rf'\b{abbr}\b', full, addr)
    return addr

merged_df['norm_address_crm'] = merged_df['address_crm'].apply(normalize_address)
merged_df['norm_address_marketing'] = merged_df['address_marketing'].apply(normalize_address)

# Check conflicts again after normalization
normalized_conflicts = merged_df[
    merged_df['norm_address_crm'] != merged_df['norm_address_marketing']
]

print(f"⚠️ Address conflicts after normalization: {len(normalized_conflicts)}")
print(normalized_conflicts[['customer_id', 'name', 'address_crm', 'address_marketing']])


⚠️ Address conflicts after normalization: 0
Empty DataFrame
Columns: [customer_id, name, address_crm, address_marketing]
Index: []
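
Step 3 asks for a method to consolidate records. The following is a rough sketch of one possible approach, not a definitive one: compare addresses in a canonical form, keep a single canonical address per customer, and flag rows that still disagree for manual verification. It assumes the CRM is the system of record, which the exercise does not state. The names `canonical`, `consolidate`, `needs_review`, and the demo customer `Zed` are hypothetical.

```python
import re
import pandas as pd

# Hypothetical abbreviation rules, mirroring the ones used in this exercise
ABBREVIATIONS = {'blvd': 'boulevard', 'ave': 'avenue', 'st': 'street'}

def canonical(addr):
    """Lowercase, trim, and expand whole-word abbreviations."""
    addr = addr.lower().strip()
    for abbr, full in ABBREVIATIONS.items():
        addr = re.sub(rf'\b{abbr}\b', full, addr)
    return addr

def consolidate(merged):
    """Return one address per customer plus a flag for unresolved conflicts.

    Assumption (not from the exercise): the CRM is the system of record.
    """
    out = merged[['customer_id', 'name']].copy()
    crm_norm = merged['address_crm'].map(canonical)
    mkt_norm = merged['address_marketing'].map(canonical)
    # Store the canonical CRM form so both systems share one spelling
    out['address'] = crm_norm
    # Rows that still disagree after normalization need a human decision
    out['needs_review'] = crm_norm != mkt_norm
    return out

# Small demo frame; the last row is a made-up genuine conflict
demo = pd.DataFrame({
    'customer_id': [2, 3, 9],
    'name': ['Bob', 'Charlie', 'Zed'],
    'address_crm': ['456 Banana Blvd', '789 Cherry Ave', '1 Old Rd'],
    'address_marketing': ['456 Banana Boulevard', '789 Cherry Avenue', '2 New Rd'],
})
result = consolidate(demo)
print(result)
```

Differences that are only formatting resolve automatically, while a genuine mismatch such as the made-up `Zed` row stays flagged until someone verifies the correct address.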


In [1]:
import pandas as pd
from io import StringIO

# CRM customer data
crm_csv = StringIO("""
customer_id,name,address
1,Alice,123 Apple St
2,Bob,456 Banana Blvd
3,Charlie,789 Cherry Ave
4,David,1010 Date Dr
""")

# Marketing customer data
marketing_csv = StringIO("""
customer_id,name,address
1,Alice,123 Apple St
2,Bob,456 Banana Boulevard
3,Charlie,789 Cherry Avenue
4,David,1010 Date Dr
""")

# Load dataframes
crm_df = pd.read_csv(crm_csv)
marketing_df = pd.read_csv(marketing_csv)
