## Check Uniqueness & Validity

**Objective**: Evaluate data quality by checking that entries are unique where they should be and that values fall within valid ranges and formats.

For this activity, you will use a sample dataset, `students.csv`, that contains the following
columns: `ID`, `Name`, `Age`, `Grade`, `Email`.
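
Before running any checks, it can help to confirm that the file really has the columns listed above. The following is a minimal sketch, not part of the original activity: `load_students` and `EXPECTED_COLUMNS` are hypothetical names introduced here, and an in-memory string stands in for `students.csv` so the example runs without the file.

```python
import io

import pandas as pd

# Column names taken from the activity description.
EXPECTED_COLUMNS = ["ID", "Name", "Age", "Grade", "Email"]


def load_students(source):
    """Read a CSV (path or file-like object) and fail early if a column is missing.

    Hypothetical helper for illustration only.
    """
    df = pd.read_csv(source)
    # Tolerate stray whitespace around header names.
    df.columns = df.columns.str.strip()
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df[EXPECTED_COLUMNS]


# In-memory stand-in for students.csv (assumed layout, two sample rows).
sample_csv = io.StringIO(
    "ID,Name,Age,Grade,Email\n"
    "101,Alice Smith,20,88.5,alice@example.com\n"
    "102,Bob Johnson,19,92.0,bob@example.com\n"
)
students = load_students(sample_csv)
print(students.dtypes)
```

With the real file, the call would be `load_students('students.csv')`, assuming the file sits in the working directory.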

**Steps**:
1. Check Uniqueness
    - Unique IDs
    - Unique Email Addresses
    - Unique Combination

2. Check Validity
    - Validate Age Range
    - Validate Grade Scale
    - Validate Name Format

In [2]:
# Write your code from here
import pandas as pd
import re

# Sample data - replace with: df = pd.read_csv('students.csv')
data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['Alice Smith', 'Bob Johnson', 'Charlie Rose', 'David', 'Eva Green'],
    'Age': [20, 19, 21, 25, 122],  # Note: Eva's age invalid (122)
    'Grade': [88.5, 92.0, 85.0, 90.0, 105.0],  # Grade 105 invalid
    'Email': [
        'alice@example.com',
        'bob@example.com',
        'charlie@example.com',
        'david@example.com',
        'alice@example.com'  # Duplicate email for Eva
    ]
}

df = pd.DataFrame(data)

print("=== Uniqueness Checks ===")

# 1. Unique IDs
unique_ids = df['ID'].is_unique
print(f"Are all IDs unique? {unique_ids}")

# 2. Unique Email Addresses (ignoring missing)
emails_no_null = df['Email'].dropna()
unique_emails = emails_no_null.is_unique
print(f"Are all Emails unique? {unique_emails}")

# 3. Unique Combination of Name + Email
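# duplicated(..., keep=False) marks every row whose (Name, Email) pair occurs more than once
duplicate_combos = df.duplicated(subset=['Name', 'Email'], keep=False)
if duplicate_combos.any():
    print("Non-unique (Name, Email) combinations found:")
    print(df.loc[duplicate_combos, ['Name', 'Email']])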
else:
    print("All (Name, Email) combinations are unique.")

print("\n=== Validity Checks ===")

# Age between 0 and 120
age_validity = df['Age'].between(0, 120)
print("Age validity per row:")
print(age_validity)

# Grade between 0 and 100
grade_validity = df['Grade'].between(0, 100)
print("\nGrade validity per row:")
print(grade_validity)

# Name format validation (only letters and spaces); non-string values such as NaN count as invalid
name_pattern = re.compile(r'[A-Za-z ]+')
name_validity = df['Name'].apply(lambda x: isinstance(x, str) and bool(name_pattern.fullmatch(x)))
print("\nName format validity per row:")
print(name_validity)

# Rows failing any validity check
invalid_rows = df[
    ~(age_validity & grade_validity & name_validity)
]

if not invalid_rows.empty:
    print("\nRows failing validity checks:")
    print(invalid_rows)
else:
    print("\nAll rows passed validity checks.")


=== Uniqueness Checks ===
Are all IDs unique? True
Are all Emails unique? False
All (Name, Email) combinations are unique.

=== Validity Checks ===
Age validity per row:
0     True
1     True
2     True
3     True
4    False
Name: Age, dtype: bool

Grade validity per row:
0     True
1     True
2     True
3     True
4    False
Name: Grade, dtype: bool

Name format validity per row:
0    True
1    True
2    True
3    True
4    True
Name: Name, dtype: bool

Rows failing validity checks:
    ID       Name  Age  Grade              Email
4  105  Eva Green  122  105.0  alice@example.com
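
The output above reports that emails are not unique, and that row 4 fails validity, but it does not say which rows share an email or which check each row failed. One possible way to make that explicit is sketched below. It rebuilds the same sample data, and the check names (`age_out_of_range`, and so on) and the helper `failed_checks` are illustrative choices rather than part of the original activity.

```python
import pandas as pd

# Same sample data as in the cell above.
df = pd.DataFrame({
    "ID": [101, 102, 103, 104, 105],
    "Name": ["Alice Smith", "Bob Johnson", "Charlie Rose", "David", "Eva Green"],
    "Age": [20, 19, 21, 25, 122],
    "Grade": [88.5, 92.0, 85.0, 90.0, 105.0],
    "Email": [
        "alice@example.com",
        "bob@example.com",
        "charlie@example.com",
        "david@example.com",
        "alice@example.com",
    ],
})

# One boolean column per check; True means the row FAILS that check.
checks = pd.DataFrame({
    "age_out_of_range": ~df["Age"].between(0, 120),
    "grade_out_of_range": ~df["Grade"].between(0, 100),
    "bad_name_format": ~df["Name"].str.fullmatch(r"[A-Za-z ]+", na=False),
    # keep=False flags every occurrence of a repeated email, not just the later ones.
    "duplicate_email": df["Email"].duplicated(keep=False) & df["Email"].notna(),
})


def failed_checks(row):
    """Join the names of the checks that a single row failed."""
    return ", ".join(name for name, failed in row.items() if failed)


report = df[["ID", "Name"]].assign(failed=checks.apply(failed_checks, axis=1))
# Keep only rows with at least one failed check.
report = report[checks.any(axis=1)]
print(report)
```

On this sample, the report should list Alice Smith (shared email) and Eva Green (age, grade, and shared email), which makes the source of the `False` in the email uniqueness check visible.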
