# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles have missing fields.
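3. Calculate the missing data rate: the number of missing required-field values divided by the total number of required-field values (profiles × required fields), expressed as a percentage.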

```python
import pandas as pd

# Load customer data
df = pd.read_csv("customer_profiles.csv")

# Required fields for a complete customer profile
required_fields = ["name", "address", "email", "phone_number"]

# Boolean mask: True wherever a required field is missing (NaN/None)
missing_mask = df[required_fields].isnull()

# Count missing fields per record
df["missing_fields_count"] = missing_mask.sum(axis=1)

# Profiles with at least one missing required field
incomplete_profiles = df[df["missing_fields_count"] > 0]

# Total missing values across all required fields
total_missing = int(missing_mask.sum().sum())

# Total expected values (rows x required columns)
total_expected_fields = df.shape[0] * len(required_fields)

# Missing data rate as a percentage; guard against an empty dataset
missing_rate = (
    (total_missing / total_expected_fields) * 100 if total_expected_fields else 0.0
)

print(f"Total Profiles: {df.shape[0]}")
print(f"Incomplete Profiles: {len(incomplete_profiles)}")
print(f"Missing Data Rate: {missing_rate:.2f}%")

# Optional: save incomplete profiles for follow-up
incomplete_profiles.to_csv("incomplete_customer_profiles.csv", index=False)
```


```text
Total Profiles: 5
Incomplete Profiles: 3
Missing Data Rate: 30.00%
```
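
The activity overview also mentions handling partially available records. The sketch below is one possible way to approach that, not part of the original solution. It builds a small in-memory DataFrame (hypothetical sample data, not the contents of `customer_profiles.csv`), treats blank or whitespace-only strings as missing in addition to nulls (an assumption; `isnull()` alone does not flag them), and reports a per-field missing rate plus a `status` label for each record. The helper `is_missing` and the labels `complete`, `partial`, and `empty` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data for illustration only
sample = pd.DataFrame(
    {
        "name": ["Ana", "Ben", None, "Dee", "Eli"],
        "address": ["1 Main St", "   ", "3 Oak Ave", None, "5 Elm Rd"],
        "email": ["ana@example.com", "ben@example.com", None,
                  "dee@example.com", "eli@example.com"],
        "phone_number": ["555-0101", "", "555-0103", None, "555-0105"],
    }
)
required_fields = ["name", "address", "email", "phone_number"]


def is_missing(value):
    """Assumed rule: null values and blank/whitespace-only strings are missing."""
    return bool(pd.isna(value)) or (isinstance(value, str) and value.strip() == "")


# Boolean mask with the same shape as the required columns
mask = sample[required_fields].apply(lambda col: col.map(is_missing))

# Missing rate per field, as a percentage of all profiles
per_field_rate = mask.mean() * 100

# Number of missing required fields per record
missing_per_record = mask.sum(axis=1)


def label(n_missing):
    """Classify a record by how many required fields it lacks."""
    if n_missing == 0:
        return "complete"
    if n_missing < len(required_fields):
        return "partial"
    return "empty"


status = missing_per_record.map(label)

# Overall missing rate across every required value
overall_rate = mask.to_numpy().mean() * 100

print(per_field_rate.round(2).to_string())
print(status.value_counts().to_string())
print(f"Overall Missing Data Rate: {overall_rate:.2f}%")
```

The per-field breakdown shows which columns drive the overall rate, and the `status` column separates records that can still be used in part from those with no usable required fields. Whether blank strings should count as missing depends on how the data was collected, so adjust `is_missing` to match the dataset's conventions.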
