# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.
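"Missing data rate" can mean two related quantities, and the notebook cell in this section reports both. The notation is introduced here only for illustration:

- $N$ is the number of profiles.
- $F$ is the set of required fields.
- $m_{i,f}$ is 1 when field $f$ is missing from profile $i$, and 0 otherwise.

$$
\text{field-level missing rate} = \frac{\sum_{i=1}^{N} \sum_{f \in F} m_{i,f}}{N \cdot |F|} \times 100\%
$$

$$
\text{share of incomplete profiles} = \frac{\left|\{\, i : \max_{f \in F} m_{i,f} = 1 \,\}\right|}{N} \times 100\%
$$

The first rate counts missing cells. The second counts profiles with at least one gap, so a single badly incomplete profile moves it by only $1/N$.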

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles are missing at least one required field.
3. Calculate the percentage of required fields that are missing across all profiles.
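`isnull()` only flags values that pandas parsed as NaN. With default `read_csv` settings, a field containing only spaces is typically read as a string. A placeholder such as `unknown` is also read as a string. Both would therefore count as "present." The sketch below covers steps 2 and 3 and also treats such values as missing. The `PLACEHOLDERS` set and the `missing_report` helper are hypothetical and should be adapted to the placeholders your sources actually use.

```python
import pandas as pd
from io import StringIO

# Hypothetical placeholder tokens to treat as missing (an assumption; adjust to
# whatever your source systems actually emit). Matching is case-insensitive.
PLACEHOLDERS = {"", "-", "unknown", "n/a"}


def missing_report(df: pd.DataFrame, required_fields: list) -> dict:
    """Steps 2-3: per-field gaps, profiles with any gap, and overall missing %."""
    # True where a value is NaN, blank/whitespace-only, or a placeholder token.
    is_missing = df[required_fields].apply(
        lambda col: col.isna()
        | col.fillna("").astype(str).str.strip().str.lower().isin(PLACEHOLDERS)
    )
    total_cells = is_missing.size  # profiles x required fields
    return {
        "per_field": {k: int(v) for k, v in is_missing.sum().items()},
        "profiles_with_missing": int(is_missing.any(axis=1).sum()),
        "pct_missing": 100 * int(is_missing.to_numpy().sum()) / total_cells if total_cells else 0.0,
    }


# Example: Ben's address is whitespace only and his phone is a placeholder.
sample = """name,address,email,phone_number
Ann,1 A St,ann@example.com,555-0001
Ben,   ,ben@example.com,unknown
Cat,3 C St,,555-0003
"""
report = missing_report(pd.read_csv(StringIO(sample)), ["name", "address", "email", "phone_number"])
print(report)
```

When the data contains no blanks or placeholders, this gives the same counts as a plain `isnull()` check.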

```python
import pandas as pd
from io import StringIO

# --- Configuration ---
customer_profiles_file = 'customer_profiles.csv'  # Replace with your actual file path
required_fields = ['name', 'address', 'email', 'phone_number']  # Step 1: required fields

# --- Sample Data (for demonstration; replace with your file) ---
customer_data = """customer_id,name,address,email,phone_number,age,last_purchase
1,John Doe,123 Main St,john.doe@example.com,555-1234,30,2025-05-01
2,Jane Smith,456 Oak Ave,,555-5678,25,2025-05-05
3,Peter Jones,,peter.jones@mail.com,,40,2025-04-28
4,Alice Brown,789 Pine Ln,alice.brown@work.net,555-9012,,2025-05-10
5,Bob White,101 Elm Rd,bob.white@home.org,555-3456,35,
"""
customer_profiles_df = pd.read_csv(StringIO(customer_data))
# To analyze a real file instead:
# customer_profiles_df = pd.read_csv(customer_profiles_file)

# --- Step 2: Analyze Missing Fields ---
# isnull() flags cells pandas parsed as NaN (empty CSV fields become NaN).
# age and last_purchase are optional, so their gaps are not counted.
missing_field_counts = customer_profiles_df[required_fields].isnull().sum()
total_profiles = len(customer_profiles_df)

# Profiles with at least one missing required field
profiles_with_missing = customer_profiles_df[customer_profiles_df[required_fields].isnull().any(axis=1)]
num_profiles_with_missing = len(profiles_with_missing)

# --- Step 3: Calculate Percentage of Missing Data Fields ---
# Every profile should have every required field: profiles x fields cells in total
total_required_fields_across_profiles = total_profiles * len(required_fields)
total_missing_fields = missing_field_counts.sum()

# Both percentages guard against division by zero for an empty dataset
percentage_missing = (total_missing_fields / total_required_fields_across_profiles) * 100 if total_required_fields_across_profiles > 0 else 0
pct_profiles_with_missing = (num_profiles_with_missing / total_profiles) * 100 if total_profiles > 0 else 0

# --- Output ---
print("Missing Data Analysis for Customer Profiles:")
print(f"\nTotal Number of Customer Profiles: {total_profiles}")
print("\nNumber of Profiles with Missing Required Fields:")
print(f"{num_profiles_with_missing} ({pct_profiles_with_missing:.2f}% of total profiles)")
print("\nCount of Missing Values per Required Field:")
print(missing_field_counts)
print(f"\nTotal Number of Missing Required Fields Across All Profiles: {total_missing_fields}")
print(f"Percentage of Missing Required Data Fields: {percentage_missing:.2f}%")
```

```text
Missing Data Analysis for Customer Profiles:

Total Number of Customer Profiles: 5

Number of Profiles with Missing Required Fields:
2 (40.00% of total profiles)

Count of Missing Values per Required Field:
name            0
address         1
email           1
phone_number    1
dtype: int64

Total Number of Missing Required Fields Across All Profiles: 3
Percentage of Missing Required Data Fields: 15.00%
```
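The reported percentages follow directly from the counts:

- 3 missing values out of 5 × 4 = 20 required cells gives 15%.
- 2 of 5 profiles gives 40%.

The activity overview also calls for handling partially available records, not just counting them. The sketch below assumes a per-record completeness score and an illustrative 0.5 cut-off; both are choices made here, not part of the original exercise. It sorts records into complete, partial, and too-sparse groups, then lists what each partial record still lacks. The column values are hypothetical.

```python
import pandas as pd

# Hypothetical example data; None marks a missing value.
required_fields = ["name", "address", "email", "phone_number"]
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cat", "Dan"],
    "address": ["1 A St", None, None, None],
    "email": ["ann@example.com", "ben@example.com", None, None],
    "phone_number": ["555-0001", "555-0002", "555-0003", None],
})

# Share of required fields present in each record (1.0 = fully complete).
df["completeness"] = df[required_fields].notna().mean(axis=1)

# Assumed policy: the 0.5 cut-off is illustrative, not a standard.
MIN_COMPLETENESS = 0.5
complete = df[df["completeness"] == 1.0]                   # usable as-is
partial = df[(df["completeness"] < 1.0)
             & (df["completeness"] >= MIN_COMPLETENESS)]   # keep, flag for enrichment
too_sparse = df[df["completeness"] < MIN_COMPLETENESS]     # exclude or review manually

# Which required fields each partial record still lacks, keyed by name.
partial_gaps = {
    df.at[idx, "name"]: [f for f in required_fields if pd.isna(df.at[idx, f])]
    for idx in partial.index
}
print(partial_gaps)
```

Whether partial records should be enriched, imputed, or excluded depends on how they will be used. The threshold only makes that decision explicit and easy to audit.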
