# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Title: Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles have at least one missing required field.
3. Calculate the percentage of missing data fields across all profiles.
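
The rate in step 3 can be written out explicitly. The notation here is an assumption introduced for illustration, not part of the original activity: $N$ is the number of profiles, $k$ the number of required fields, and $m_{ij}$ is 1 when required field $j$ of profile $i$ is missing and 0 otherwise.

$$
\text{missing rate} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{k} m_{ij}}{N \cdot k} \times 100\%
$$

For example, 4 missing values across 5 profiles with 4 required fields each gives $4 / 20 = 20\%$. Note that this differs from the count in step 2, which counts each incomplete profile once no matter how many of its fields are missing.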

```python
import pandas as pd
from io import StringIO

# Sample customer profiles CSV with missing values
customer_profiles_csv = StringIO("""
customer_id,name,address,email,phone
1,Alice,123 Apple St,alice@example.com,555-1234
2,Bob,,bob@example.com,
3,Charlie,789 Cherry Ave,,555-5678
4,,456 Banana Blvd,charlie@example.com,555-8765
5,Eve,1010 Date Dr,eve@example.com,555-0000
""")

# Load dataset (read_csv skips the leading blank line by default)
df = pd.read_csv(customer_profiles_csv)

# Required fields for completeness
required_fields = ['name', 'address', 'email', 'phone']

# Calculate missing count per field
missing_per_field = df[required_fields].isnull().sum()

# Total number of fields to check (profiles * required fields)
total_fields = df.shape[0] * len(required_fields)

# Total missing fields across all profiles
total_missing = missing_per_field.sum()

# Percentage of missing data fields
missing_percentage = (total_missing / total_fields) * 100

# Number of profiles with any missing required field
profiles_with_missing = df[required_fields].isnull().any(axis=1).sum()

# Output results
print("📊 Data Completeness Report")
print("--------------------------")
print(f"Total profiles: {df.shape[0]}")
print(f"Required fields: {required_fields}")
print(f"Missing values per field:\n{missing_per_field}")
print(f"Total missing fields: {total_missing} out of {total_fields} ({missing_percentage:.2f}%)")
print(f"Profiles with any missing required field: {profiles_with_missing}")
```


```text
📊 Data Completeness Report
--------------------------
Total profiles: 5
Required fields: ['name', 'address', 'email', 'phone']
Missing values per field:
name       1
address    1
email      1
phone      1
dtype: int64
Total missing fields: 4 out of 20 (20.00%)
Profiles with any missing required field: 3
```
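
The activity overview also mentions handling partially available records, which the report above does not cover. One possible approach, offered only as a sketch, is to score each profile by the fraction of required fields it has and then label it. The helper name `label_profile`, the `MIN_COMPLETENESS` threshold of 0.75, and the three status labels are all hypothetical choices made for this illustration, not part of the original exercise.

```python
import pandas as pd
from io import StringIO

# Same sample data as the activity, rebuilt here so the block runs on its own
csv_text = """customer_id,name,address,email,phone
1,Alice,123 Apple St,alice@example.com,555-1234
2,Bob,,bob@example.com,
3,Charlie,789 Cherry Ave,,555-5678
4,,456 Banana Blvd,charlie@example.com,555-8765
5,Eve,1010 Date Dr,eve@example.com,555-0000
"""
profiles = pd.read_csv(StringIO(csv_text))
required_fields = ['name', 'address', 'email', 'phone']

# Hypothetical cutoff: a profile with at least this share of required
# fields present is treated as usable but partial
MIN_COMPLETENESS = 0.75


def label_profile(score: float) -> str:
    """Map a completeness score in [0, 1] to an illustrative status label."""
    if score == 1.0:
        return 'complete'
    if score >= MIN_COMPLETENESS:
        return 'partial'
    return 'insufficient'


# Fraction of required fields present in each profile (1.0 means complete)
completeness = profiles[required_fields].notna().mean(axis=1)

# Per-profile report: id, completeness score, and status label
report = profiles[['customer_id']].assign(
    completeness=completeness,
    status=completeness.map(label_profile),
)
print(report.to_string(index=False))
```

With this threshold, customers 3 and 4 (one missing field each) are labeled `partial`, while customer 2 (two missing fields) is labeled `insufficient`. Whether partial records should be kept, repaired, or dropped depends on the downstream use, so the cutoff is worth tuning per dataset.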
