## Check Uniqueness & Validity

**Objective**: Evaluate data quality by checking for uniqueness and validity of data entries.

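For this activity, you will use a sample dataset, `students.csv`, with the columns `ID`, `Name`, `Age`, `Grade`, and `Email`. The code cell below builds an equivalent sample in memory instead of reading the file, so it runs without the CSV.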
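
If you do have the actual `students.csv` file, a minimal sketch for loading it and confirming that the five expected columns are present might look like this. `load_students` and `EXPECTED_COLUMNS` are illustrative names, not part of the activity, and the `io.StringIO` text only stands in for the real file.

```python
import io

import pandas as pd

# Columns the activity expects (illustrative constant, not from the original)
EXPECTED_COLUMNS = ["ID", "Name", "Age", "Grade", "Email"]


def load_students(source) -> pd.DataFrame:
    """Read a students CSV (path or file-like) and verify its columns.

    Raises ValueError if any expected column is missing.
    """
    # skipinitialspace drops blanks after delimiters, e.g. "ID, Name"
    df = pd.read_csv(source, skipinitialspace=True)
    # Strip stray whitespace from header names before comparing
    df.columns = df.columns.str.strip()
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"students.csv is missing columns: {missing}")
    return df


# In-memory stand-in for students.csv
sample = io.StringIO(
    "ID,Name,Age,Grade,Email\n"
    "1,John Doe,20,A,john@example.com\n"
    "2,Jane Smith,21,B,jane@example.com\n"
)
students = load_students(sample)
print(students.shape)  # (2, 5)
```

With the real file you would call `load_students("students.csv")` and pass the result to the checks in place of the inline DataFrame.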

**Steps**:
1. Check Uniqueness
    - Unique IDs
    - Unique Email Addresses
    - Unique Name and Email Combination

2. Check Validity
    - Validate Age Range (15 to 30)
    - Validate Grade Scale (A, B, C, D, F)
    - Validate Name Format (first and last name)
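
Each item in the checklist above can also be expressed as a boolean mask that returns results instead of printing them. The sketch below assumes the same thresholds as the code cell (ages 15 to 30, grades A, B, C, D, F) and swaps in a stricter, hypothetical name pattern: two or more words of letters, optionally joined inside a word by an apostrophe or hyphen, so digits and underscores (which `\w` accepts) are rejected. `quality_masks`, `quality_summary`, and the `demo` data are illustrative names, not part of the activity.

```python
import pandas as pd

# Thresholds mirroring the checks in this activity
AGE_MIN, AGE_MAX = 15, 30
VALID_GRADES = {"A", "B", "C", "D", "F"}

# Hypothetical, stricter name rule: two or more letter-only words,
# each optionally joined internally by ' or - (e.g. O'Brien, Smith-Jones)
WORD = r"[^\W\d_]+(?:['-][^\W\d_]+)*"
NAME_PATTERN = rf"{WORD}(?:\s+{WORD})+"


def quality_masks(df: pd.DataFrame) -> dict:
    """Return one boolean Series per rule; True marks a row that violates it."""
    return {
        "duplicate_id": df.duplicated("ID", keep=False),
        "duplicate_email": df.duplicated("Email", keep=False),
        "duplicate_name_email": df.duplicated(["Name", "Email"], keep=False),
        # between() is inclusive on both ends; missing ages are flagged
        "invalid_age": ~df["Age"].between(AGE_MIN, AGE_MAX),
        "invalid_grade": ~df["Grade"].isin(VALID_GRADES),
        # fullmatch must cover the whole string; missing names are flagged
        "invalid_name": ~df["Name"].str.fullmatch(NAME_PATTERN, na=False),
    }


def quality_summary(df: pd.DataFrame) -> dict:
    """Count the violating rows for each rule."""
    return {rule: int(mask.sum()) for rule, mask in quality_masks(df).items()}


# Small hypothetical sample: the last row is clean
demo = pd.DataFrame({
    "ID": [1, 2, 1, 3],
    "Name": ["John Doe", "Jane Smith", "Alice", "Bob Johnson"],
    "Age": [20, 150, 22, 19],
    "Grade": ["A", "X", "B", "C"],
    "Email": ["john@example.com", "jane@example.com",
              "john@example.com", "bob@example.com"],
})
print(quality_summary(demo))

# Combine the rules: rows with at least one problem of any kind
problems = pd.concat(quality_masks(demo), axis=1).any(axis=1)
print(demo[problems])
```

Returning masks rather than printing makes the rules easy to count, filter, or OR together, as in the last two lines.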

```python
import pandas as pd

# In-memory sample standing in for students.csv. The last row repeats ID 1 and
# john@example.com; Sarah Connor's row has an out-of-range age and an invalid grade.
data = {
    'ID': [1, 2, 3, 4, 5, 6, 1],
    'Name': ['John Doe', 'Jane Smith', 'Bob Johnson', 'Alice Brown', 'Tom Wilson', 'Sarah Connor', 'Duplicate John'],
    'Age': [20, 21, 19, 22, 23, 150, 20],
    'Grade': ['A', 'B', 'C', 'D', 'F', 'X', 'A'],
    'Email': ['john@example.com', 'jane@example.com', 'bob@example.com',
              'alice@example.com', 'tom@example.com', 'sarah@example.com', 'john@example.com']
}
df = pd.DataFrame(data)


def check_uniqueness(df):
    """Print rows that break the uniqueness rules."""
    print("\n=== UNIQUENESS CHECKS ===")

    # keep=False flags every row in a duplicate group, not just the repeats
    duplicate_ids = df[df.duplicated('ID', keep=False)]
    print("\nDuplicate IDs:")
    print(duplicate_ids[['ID', 'Name']] if not duplicate_ids.empty else "No duplicate IDs found")

    duplicate_emails = df[df.duplicated('Email', keep=False)]
    print("\nDuplicate Emails:")
    print(duplicate_emails[['Email', 'Name']] if not duplicate_emails.empty else "No duplicate emails found")

    # A row counts here only if BOTH Name and Email repeat together
    duplicate_combos = df[df.duplicated(['Name', 'Email'], keep=False)]
    print("\nDuplicate Name-Email Combinations:")
    print(duplicate_combos[['Name', 'Email']] if not duplicate_combos.empty else "No duplicate combinations found")


def check_validity(df):
    """Print rows whose values fall outside the allowed ranges or formats."""
    print("\n=== VALIDITY CHECKS ===")

    # Allowed ages: 15 to 30 inclusive; missing ages are flagged as invalid
    invalid_ages = df[~df['Age'].between(15, 30)]
    print("\nInvalid Ages (outside 15-30 range):")
    print(invalid_ages[['ID', 'Name', 'Age']] if not invalid_ages.empty else "All ages valid")

    # Letter scale without E
    valid_grades = ['A', 'B', 'C', 'D', 'F']
    invalid_grades = df[~df['Grade'].isin(valid_grades)]
    print("\nInvalid Grades (not A-F):")
    print(invalid_grades[['ID', 'Name', 'Grade']] if not invalid_grades.empty else "All grades valid")

    # Expect at least two words (first and last name); na=False makes
    # missing names count as invalid instead of silently passing
    invalid_names = df[~df['Name'].str.contains(r'^\w+\s+\w+', na=False)]
    print("\nInvalid Name Formats (missing last name):")
    print(invalid_names[['ID', 'Name']] if not invalid_names.empty else "All names valid")


check_uniqueness(df)
check_validity(df)
```


```text
=== UNIQUENESS CHECKS ===

Duplicate IDs:
   ID            Name
0   1        John Doe
6   1  Duplicate John

Duplicate Emails:
              Email            Name
0  john@example.com        John Doe
6  john@example.com  Duplicate John

Duplicate Name-Email Combinations:
No duplicate combinations found

=== VALIDITY CHECKS ===

Invalid Ages (outside 15-30 range):
   ID          Name  Age
5   6  Sarah Connor  150

Invalid Grades (not A-F):
   ID          Name Grade
5   6  Sarah Connor     X

Invalid Name Formats (missing last name):
All names valid
```
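
The uniqueness checks pass `keep=False` to `DataFrame.duplicated`, which flags every row in a duplicate group; that is why both John Doe and Duplicate John appear under ID 1. The default, `keep='first'`, flags only the repeats after the first occurrence, which is the mask you would use when dropping duplicates. A short comparison on a cut-down copy of the sample (the `ids` frame is illustrative):

```python
import pandas as pd

ids = pd.DataFrame({
    "ID": [1, 2, 3, 1],
    "Name": ["John Doe", "Jane Smith", "Bob Johnson", "Duplicate John"],
})

# keep=False: every member of a duplicate group is flagged (good for reporting)
print(ids.duplicated("ID", keep=False).tolist())  # [True, False, False, True]

# keep="first" (the default): only repeats after the first occurrence are flagged
print(ids.duplicated("ID", keep="first").tolist())  # [False, False, False, True]

# Keeping the first record per ID is one possible resolution
deduped = ids.drop_duplicates("ID", keep="first")
print(deduped["Name"].tolist())  # ['John Doe', 'Jane Smith', 'Bob Johnson']
```

Keeping the first row is only one option; whether John Doe or Duplicate John is the correct record for ID 1 has to be settled from the source data, not by `drop_duplicates`.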
