# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Title: Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles have missing fields.
3. Calculate the percentage of missing data fields across all profiles.
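
Step 3 can be written as a formula. The symbols here are introduced only for illustration: $N$ is the number of profiles, $F$ is the number of required fields, and $m_{ij}$ is 1 when required field $j$ of profile $i$ is missing and 0 otherwise.

$$
\text{missing rate} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{F} m_{ij}}{N \cdot F} \times 100\%
$$

For instance, 5 profiles with 4 required fields give $5 \cdot 4 = 20$ required values; if 5 of them are missing, the rate is $5 / 20 \times 100\% = 25\%$.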

```python
# Write your code from here
import pandas as pd

# Example customer profiles dataset
data = {
    'customer_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', None, 'Eve'],
    'address': ['123 Main St', None, '789 Park Ave', '456 Elm St', '321 Oak St'],
    'email': ['alice@example.com', 'bob@example.com', None, 'dave@example.com', 'eve@example.com'],
    'phone_number': ['555-1234', None, '555-7890', '555-4567', None]
}

df = pd.DataFrame(data)

# 1. List all required fields
required_fields = ['name', 'address', 'email', 'phone_number']

# 2. Count missing fields per profile
df['missing_fields_count'] = df[required_fields].isnull().sum(axis=1)

# Identify profiles with any missing required fields
df['has_missing_fields'] = df['missing_fields_count'] > 0

# 3. Calculate the percentage of missing data fields across all profiles
total_fields = df.shape[0] * len(required_fields)
total_missing = df[required_fields].isnull().sum().sum()
missing_percentage = (total_missing / total_fields) * 100

print("Customer Profiles Data:")
print(df)

print(f"\nTotal required fields: {total_fields}")
print(f"Total missing fields: {total_missing}")
print(f"Percentage of missing data fields: {missing_percentage:.2f}%")

# Additional insight: number and percentage of profiles with missing fields
num_profiles_with_missing = df['has_missing_fields'].sum()
percent_profiles_with_missing = (num_profiles_with_missing / df.shape[0]) * 100

print(f"\nProfiles with missing required fields: {num_profiles_with_missing} out of {df.shape[0]} "
      f"({percent_profiles_with_missing:.2f}%)")
```


```text
Customer Profiles Data:
   customer_id     name       address              email phone_number  \
0            1    Alice   123 Main St  alice@example.com     555-1234
1            2      Bob          None    bob@example.com         None
2            3  Charlie  789 Park Ave               None     555-7890
3            4     None    456 Elm St   dave@example.com     555-4567
4            5      Eve    321 Oak St    eve@example.com         None

   missing_fields_count  has_missing_fields
0                     0               False
1                     2                True
2                     1                True
3                     1                True
4                     1                True

Total required fields: 20
Total missing fields: 5
Percentage of missing data fields: 25.00%

Profiles with missing required fields: 4 out of 5 (80.00%)
```
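
The activity overview also mentions handling partially available records, which the overall rate alone does not address. The following is a minimal sketch of one possible approach, not part of the original exercise: it breaks the missing rate down per field and keeps profiles that meet a completeness threshold instead of dropping every profile with a gap. The names `profiles`, `completeness`, `usable_profiles`, and the `MIN_COMPLETENESS` value of 0.75 are assumptions chosen for illustration.

```python
import pandas as pd

# Same illustrative data as the exercise, rebuilt so this block runs on its own.
profiles = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', None, 'Eve'],
    'address': ['123 Main St', None, '789 Park Ave', '456 Elm St', '321 Oak St'],
    'email': ['alice@example.com', 'bob@example.com', None, 'dave@example.com', 'eve@example.com'],
    'phone_number': ['555-1234', None, '555-7890', '555-4567', None],
})
required_fields = ['name', 'address', 'email', 'phone_number']

# Percentage of profiles that are missing each required field.
missing_rate_by_field = profiles[required_fields].isna().mean() * 100

# Per-profile completeness: fraction of required fields that are present.
completeness = profiles[required_fields].notna().mean(axis=1)

# Hypothetical rule (an assumption, not from the exercise): keep profiles
# that have at least 75% of their required fields filled in.
MIN_COMPLETENESS = 0.75
usable_profiles = profiles[completeness >= MIN_COMPLETENESS]

print("Missing rate by field (%):")
print(missing_rate_by_field)
print(f"\nUsable profiles: {len(usable_profiles)} out of {len(profiles)}")
```

The per-field view shows which attribute drives the overall rate, and the threshold is a judgment call that depends on which fields downstream work actually needs.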
