# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Count how many profiles are missing at least one required field.
3. Calculate the percentage of missing values in each required field across all profiles.
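Steps 2 and 3 reduce to simple ratios. In the notation introduced here (not part of the original exercise), let $N$ be the number of profiles, $F$ the set of required fields, and $m_{i,f} = 1$ when field $f$ is missing from profile $i$ (0 otherwise). Profile $i$ is incomplete when $\sum_{f \in F} m_{i,f} > 0$, and the rates are:

$$
\text{missing rate}(f) = \frac{100}{N} \sum_{i=1}^{N} m_{i,f}
\qquad
\text{overall missing rate} = \frac{100}{N\,|F|} \sum_{i=1}^{N} \sum_{f \in F} m_{i,f}
$$

The per-field rate is what `isnull().mean() * 100` computes column by column in the cell below.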

```python
import pandas as pd

# Create sample customer data with missing values
data = {
    "name": ["Alice", "Bob", None, "David", "Eva"],
    "address": ["123 Lane", None, "789 Road", "456 Street", "101 Blvd"],
    "email": ["alice@example.com", None, "charlie@example.com", "david@example.com", None],
    "phone_number": ["123-4567", "234-5678", None, "345-6789", None]
}

df = pd.DataFrame(data)

# Define required fields for a complete profile
required_fields = ["name", "address", "email", "phone_number"]

# Count missing required fields for each profile (row-wise sum of nulls)
df["missing_fields_count"] = df[required_fields].isnull().sum(axis=1)

# Profiles with at least one missing required field are incomplete
incomplete_profiles = df[df["missing_fields_count"] > 0]
num_incomplete_profiles = len(incomplete_profiles)

# Percentage of missing values per field (column-wise mean of nulls)
missing_percentage_per_field = df[required_fields].isnull().mean() * 100

# Output results
print("Number of incomplete profiles:", num_incomplete_profiles)
print("\nMissing Data Percentage per Field:")
print(missing_percentage_per_field)
```

```text
Number of incomplete profiles: 3

Missing Data Percentage per Field:
name            20.0
address         20.0
email           40.0
phone_number    40.0
dtype: float64
```
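The cell above reports per-field rates but not a single overall rate or a per-profile score, and it does not yet address the "partially available records" part of the activity. The following is a minimal sketch, assuming the same pandas setup and sample data, that adds both and then handles partial records by listing each profile's missing fields instead of discarding the row. The `completeness_pct` and `missing_fields` columns and the 75% `MIN_COMPLETENESS_PCT` cut-off are hypothetical choices made for illustration, not part of the original exercise.

```python
import pandas as pd

# Same sample data as the cell above, repeated so this sketch runs on its own
data = {
    "name": ["Alice", "Bob", None, "David", "Eva"],
    "address": ["123 Lane", None, "789 Road", "456 Street", "101 Blvd"],
    "email": ["alice@example.com", None, "charlie@example.com", "david@example.com", None],
    "phone_number": ["123-4567", "234-5678", None, "345-6789", None],
}
df = pd.DataFrame(data)
required_fields = ["name", "address", "email", "phone_number"]

# Boolean mask: True where a required field is missing
missing = df[required_fields].isna()

# Overall missing data rate: missing cells / (profiles x required fields)
overall_missing_rate = missing.to_numpy().mean() * 100

# Per-profile completeness: share of required fields that are present
df["completeness_pct"] = (1 - missing.mean(axis=1)) * 100

# Hypothetical handling of partial records: keep the row, but record which
# required fields it lacks so they can be followed up later
df["missing_fields"] = [
    ", ".join(field for field in required_fields if row[field])
    for _, row in missing.iterrows()
]

# Hypothetical policy: treat profiles with >= 75% of required fields as usable
MIN_COMPLETENESS_PCT = 75.0
usable = df[df["completeness_pct"] >= MIN_COMPLETENESS_PCT]

print(f"Overall missing data rate: {overall_missing_rate:.1f}%")
print(df[["name", "completeness_pct", "missing_fields"]])
print("Profiles meeting the threshold:", len(usable))
```

With the sample data, the overall rate is 6 missing values out of 20 required cells, or 30%, and only Alice and David meet the 75% threshold; every incomplete profile here lacks two of its four fields. The threshold is a policy choice: lowering it keeps more partial records in play at the cost of less complete data.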
