# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Title: Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles have at least one missing required field.
3. Calculate the percentage of required field values that are missing across all profiles.
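The two quantities in steps 2 and 3 can be stated precisely. In the notation below, which is introduced here for illustration, $N$ is the number of profiles, $F$ is the set of required fields, and $m_{i,f} = 1$ when field $f$ is missing from profile $i$ (otherwise $0$):

$$
\text{profiles with missing data} = \sum_{i=1}^{N} \max_{f \in F} m_{i,f}
$$

$$
\text{missing field rate} = \frac{\sum_{i=1}^{N} \sum_{f \in F} m_{i,f}}{N \cdot |F|} \times 100\%
$$

For example, 5 missing values spread over 5 profiles with 4 required fields give $\frac{5}{5 \cdot 4} \times 100\% = 25\%$.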

```python
import pandas as pd


def calculate_missing_profile_rate(file_path, required_fields):
    """
    Calculate the missing data rate for required fields in customer profiles.

    Args:
        file_path (str): Path to the CSV file containing customer profile data.
        required_fields (list[str]): Names of the fields required for a
            complete profile.

    Returns:
        tuple: (missing_profile_count, overall_missing_percentage), where
            - missing_profile_count (int) is the number of profiles with at
              least one missing required field, and
            - overall_missing_percentage (float) is the percentage of required
              field values that are missing across all profiles.
        Returns (None, None) if the file is not found or the DataFrame is empty.
    """
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError:
        print(f"Error: File not found at '{file_path}'")
        return None, None

    if df.empty:
        print("Error: The DataFrame is empty.")
        return None, None

    # Warn once per absent column (not once per row) and exclude it from the analysis.
    present_fields = [field for field in required_fields if field in df.columns]
    for field in required_fields:
        if field not in df.columns:
            print(f"Warning: Required field '{field}' not found in the dataset.")

    if not present_fields:
        return 0, 0.0

    # Boolean mask: True where a required value is missing (None/NaN).
    missing_mask = df[present_fields].isna()

    # A profile is incomplete if any of its required fields is missing.
    missing_profile_count = int(missing_mask.any(axis=1).sum())

    # Share of all required cells (profiles x fields) that are missing.
    total_missing_fields_count = int(missing_mask.to_numpy().sum())
    total_required_fields_count = missing_mask.size
    overall_missing_percentage = total_missing_fields_count / total_required_fields_count * 100

    return missing_profile_count, overall_missing_percentage


# Example usage
file_path = 'customer_profiles.csv'
required_fields = ['name', 'address', 'email', 'phone_number']

# Create a sample CSV for demonstration ('age' is present but not a required field)
data = {'customer_id': [1, 2, 3, 4, 5],
        'name': ['Alice', 'Bob', None, 'Charlie', 'David'],
        'address': ['123 Main St', None, '789 Pine Ln', '456 Oak Ave', '101 Elm Rd'],
        'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com', None, 'david@example.com'],
        'phone_number': ['123-456-7890', None, '555-123-4567', '987-654-3210', None],
        'age': [25, 30, None, 35, 28]}
pd.DataFrame(data).to_csv(file_path, index=False)

missing_profiles, missing_percentage = calculate_missing_profile_rate(file_path, required_fields)

if missing_profiles is not None:
    print(f"Number of profiles with at least one missing required field: {missing_profiles}")
    print(f"Overall percentage of missing required fields: {missing_percentage:.2f}%")
```

Output:

```
Number of profiles with at least one missing required field: 4
Overall percentage of missing required fields: 25.00%
```
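The activity overview also mentions handling partially available records. A minimal sketch, assuming the profiles are already loaded into a DataFrame: score each profile by the fraction of required fields that are filled in, then separate complete, usable-partial, and too-sparse profiles, and report the missing rate per field. The helper names `split_by_completeness` and `missing_rate_by_field` and the 0.75 cutoff are hypothetical choices for illustration, not part of the exercise.

```python
import pandas as pd


def split_by_completeness(df, required_fields, min_completeness=0.75):
    """Hypothetical helper: split profiles into complete, partial, and sparse groups.

    min_completeness=0.75 is an illustrative cutoff, not a standard value.
    """
    # Fraction of required fields that are filled in for each profile (0.0 to 1.0).
    completeness = df[required_fields].notna().mean(axis=1)
    scored = df.assign(completeness=completeness)

    complete = scored[completeness == 1.0]
    partial = scored[(completeness < 1.0) & (completeness >= min_completeness)]
    sparse = scored[completeness < min_completeness]
    return complete, partial, sparse


def missing_rate_by_field(df, required_fields):
    """Hypothetical helper: percentage of profiles missing each required field."""
    return df[required_fields].isna().mean() * 100


# Usage with the same sample records as the exercise
required_fields = ['name', 'address', 'email', 'phone_number']
profiles = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', None, 'Charlie', 'David'],
    'address': ['123 Main St', None, '789 Pine Ln', '456 Oak Ave', '101 Elm Rd'],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com', None, 'david@example.com'],
    'phone_number': ['123-456-7890', None, '555-123-4567', '987-654-3210', None],
})

complete, partial, sparse = split_by_completeness(profiles, required_fields)
print(complete['customer_id'].tolist())  # [1]
print(partial['customer_id'].tolist())   # [3, 4, 5]
print(sparse['customer_id'].tolist())    # [2]
print(missing_rate_by_field(profiles, required_fields))
```

Keeping partial records with a completeness score, rather than dropping every incomplete row, preserves profiles that remain usable for tasks needing only some fields. In this sample, the per-field rates show `phone_number` (missing in 2 of 5 profiles) as the largest source of missing data.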
