## Check Uniqueness & Validity

**Objective**: Evaluate data quality by checking for uniqueness and validity of data entries.

For this activity, you will use a sample dataset, `students.csv`, which contains the following
columns: `ID`, `Name`, `Age`, `Grade`, `Email`.

**Steps**:
1. Check Uniqueness
    - Unique IDs
    - Unique Email Addresses
    - Unique Combination of ID and Email

2. Check Validity
    - Validate Age Range
    - Validate Grade Scale
    - Validate Name Format
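
A True/False answer to a uniqueness check does not show which rows collide. The following is a minimal sketch of one way to list them, assuming a small hypothetical frame in place of `students.csv`; the `sample` rows and the `duplicate_rows` helper are illustrative and not part of the activity's dataset.

```python
import pandas as pd

# Hypothetical rows standing in for students.csv (not the real data).
sample = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Name": ["Ann Ray", "Bo Lin", "Bo Lin", "Cy Moe"],
    "Email": ["ann@example.com", "bo@example.com",
              "bo@example.com", "ann@example.com"],
})

def duplicate_rows(frame, columns):
    """Return every row whose values in `columns` occur more than once."""
    # keep=False marks all members of a duplicate group, not just the repeats
    mask = frame.duplicated(subset=columns, keep=False)
    return frame[mask]

dup_ids = duplicate_rows(sample, ["ID"])
dup_emails = duplicate_rows(sample, ["Email"])
dup_combo = duplicate_rows(sample, ["ID", "Email"])

print("Rows sharing an ID:\n", dup_ids)
print("Rows sharing an Email:\n", dup_emails)
print("Rows sharing an ID+Email pair:\n", dup_combo)
```

An empty result from `duplicate_rows` corresponds to the column (or column combination) being unique.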

import pandas as pd
import re

# Load the dataset
df = pd.read_csv('students.csv')

# ----- Uniqueness Checks -----

# 1.1 Unique IDs
unique_ids = df['ID'].is_unique
print("Are all IDs unique?", unique_ids)

# 1.2 Unique Email Addresses
unique_emails = df['Email'].is_unique
print("Are all Email addresses unique?", unique_emails)

# 1.3 Unique Combination of ID and Email
unique_combo = not df.duplicated(subset=['ID', 'Email']).any()
print("Are ID and Email combinations unique?", unique_combo)

# ----- Validity Checks -----

# 2.1 Validate Age Range (Assume valid age is between 5 and 100)
valid_ages = df['Age'].between(5, 100)
print("\nInvalid Age entries:")
print(df[~valid_ages])

# 2.2 Validate Grade Scale (Assume grades are A to F)
valid_grades = df['Grade'].isin(['A', 'B', 'C', 'D', 'E', 'F'])
print("\nInvalid Grade entries:")
print(df[~valid_grades])

# 2.3 Validate Name Format (Only alphabetic characters and spaces allowed)
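def is_valid_name(name):
    # Treat missing values (NaN/None) as invalid; str(NaN) would become "nan" and pass the pattern
    return isinstance(name, str) and bool(re.fullmatch(r"[A-Za-z ]+", name))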

df['Valid_Name'] = df['Name'].apply(is_valid_name)
print("\nInvalid Name entries:")
print(df[~df['Valid_Name']])

# ----- Summary -----
print("\n--- Summary ---")
print("Unique IDs:", unique_ids)
print("Unique Emails:", unique_emails)
print("Unique ID+Email Combo:", unique_combo)
print("Invalid Ages Count:", (~valid_ages).sum())
print("Invalid Grades Count:", (~valid_grades).sum())
print("Invalid Names Count:", (~df['Valid_Name']).sum())


Are all IDs unique? True
Are all Email addresses unique? True
Are ID and Email combinations unique? True

Invalid Age entries:
   ID       Name  Age Grade                  Email
5   6  Eva Green  101     B  eva.green@example.com
9  10  Sarah Lee    4     B  sarah.lee@example.com

Invalid Grade entries:
   ID       Name  Age Grade                  Email
7   8  Chris Red   16     G  chris.red@example.com

Invalid Name entries:
   ID     Name  Age Grade                Email  Valid_Name
6   7  Mike123   19     A  mike123@example.com       False

--- Summary ---
Unique IDs: True
Unique Emails: True
Unique ID+Email Combo: True
Invalid Ages Count: 2
Invalid Grades Count: 1
Invalid Names Count: 1
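
The three validity checks can also be combined into a single per-row report, so that a row failing several rules appears once. The following is a minimal sketch, assuming the same rules as above (age 5 to 100, grades A to F, names with letters and spaces only) and hypothetical sample rows; `validity_report` is an illustrative helper, not a pandas function.

```python
import re
import pandas as pd

# Hypothetical rows standing in for students.csv (not the real data).
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Ann Ray", "Bo7", "Cy Moe", None],
    "Age": [12, 3, 150, 20],
    "Grade": ["A", "B", "Z", "C"],
})

NAME_PATTERN = re.compile(r"[A-Za-z ]+")

def validity_report(frame):
    """Return a Series naming the failed checks for each row ('' if none)."""
    checks = {
        "age": frame["Age"].between(5, 100),
        "grade": frame["Grade"].isin(list("ABCDEF")),
        # Missing names count as invalid rather than being converted to "nan"
        "name": frame["Name"].apply(
            lambda n: isinstance(n, str) and bool(NAME_PATTERN.fullmatch(n))
        ).astype(bool),
    }
    # One boolean column per rule: True where the rule is violated
    failed = pd.DataFrame({rule: ~ok for rule, ok in checks.items()})
    return failed.apply(
        lambda row: ",".join(rule for rule in failed.columns if row[rule]),
        axis=1,
    )

report = validity_report(sample)
print(sample.assign(Failed=report))
```

Rows with an empty `Failed` value passed every rule; the others name each rule they violated, which makes it easier to decide whether to correct or drop them.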
