# Ensuring Consistency

**Activity Overview**: Ensure consistency by identifying and resolving conflicting values across datasets.

## Title: Customer Address Discrepancies

**Task**: Resolve customer address mismatches between the CRM and marketing databases.

**Steps**:
1. Compare customer addresses in the CRM with those in the marketing database.
2. Identify records with conflicting address information.
3. Propose a method for consolidating each conflicting record into a single verified address.

In [1]:
import pandas as pd

# Create sample data files (run this block once)
crm_data = {
    "customer_id": [1, 2, 3, 4],
    "address": [
        "123 Main St, CityA",
        "456 Oak Rd, CityB",
        "789 Pine Ln, CityC",
        "101 Maple Dr, CityD"
    ]
}

marketing_data = {
    "customer_id": [1, 2, 3, 4],
    "address": [
        "123 Main St, CityA",
        "456 Oak Road, CityB",  # formatting difference
        "789 Pine Lane, CityC", # formatting difference
        "102 Maple Dr, CityD"   # discrepancy
    ]
}

pd.DataFrame(crm_data).to_csv("crm_customers.csv", index=False)
pd.DataFrame(marketing_data).to_csv("marketing_customers.csv", index=False)

# Load the datasets
crm_df = pd.read_csv("crm_customers.csv")
marketing_df = pd.read_csv("marketing_customers.csv")

# Merge on customer_id
merged_df = pd.merge(crm_df, marketing_df, on="customer_id", how="inner", suffixes=('_crm', '_marketing'))

# Identify mismatches with an exact string comparison
# (formatting differences such as "Rd" vs "Road" are flagged too)
merged_df['address_mismatch'] = merged_df['address_crm'] != merged_df['address_marketing']

# Filter conflicts
conflicts = merged_df[merged_df['address_mismatch']]

print("Address Discrepancies Found:")
print(conflicts[['customer_id', 'address_crm', 'address_marketing']])

Address Discrepancies Found:
   customer_id          address_crm     address_marketing
1            2    456 Oak Rd, CityB   456 Oak Road, CityB
2            3   789 Pine Ln, CityC  789 Pine Lane, CityC
3            4  101 Maple Dr, CityD   102 Maple Dr, CityD
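
The exact comparison above flags all three records, although only customer 4 has a substantive conflict. One possible approach to step 3 is to normalize both addresses before comparing them, so that formatting differences can be consolidated automatically and only true conflicts go to manual verification. The following is a minimal sketch under stated assumptions, not a definitive implementation: `SUFFIX_ABBREVIATIONS`, `normalize_address`, and `classify_pair` are hypothetical names introduced here, the abbreviation map covers only a few suffixes, and keeping the CRM value for cosmetic differences is an assumed tie-break rule.

```python
import re

# Hypothetical abbreviation map; extend it to match the street suffixes in your data.
SUFFIX_ABBREVIATIONS = {
    "road": "rd",
    "lane": "ln",
    "street": "st",
    "drive": "dr",
    "avenue": "ave",
}


def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(SUFFIX_ABBREVIATIONS.get(token, token) for token in tokens)


def classify_pair(crm_address: str, marketing_address: str) -> str:
    """Label a pair as 'match', 'formatting_only', or 'needs_review'."""
    if crm_address == marketing_address:
        return "match"
    if normalize_address(crm_address) == normalize_address(marketing_address):
        return "formatting_only"
    return "needs_review"


# Same sample records as above: (customer_id, CRM address, marketing address)
records = [
    (1, "123 Main St, CityA", "123 Main St, CityA"),
    (2, "456 Oak Rd, CityB", "456 Oak Road, CityB"),
    (3, "789 Pine Ln, CityC", "789 Pine Lane, CityC"),
    (4, "101 Maple Dr, CityD", "102 Maple Dr, CityD"),
]

statuses = {cid: classify_pair(crm, mkt) for cid, crm, mkt in records}

# Consolidate: keep the CRM value when the difference is cosmetic (an assumed
# tie-break rule), and leave true conflicts unresolved for manual verification.
consolidated = {
    cid: (crm if statuses[cid] != "needs_review" else None)
    for cid, crm, mkt in records
}

for cid, status in statuses.items():
    print(cid, status, consolidated[cid])
```

With the sample data, customers 2 and 3 are classified as `formatting_only` and customer 4 as `needs_review`. In the pandas workflow, the same function could be applied column-wise, for example with `merged_df['address_crm'].map(normalize_address)`, before comparing the two columns.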
