**Task 1**: Checking Null Values for Completeness

**Description**: Check whether the dataset contains any null values, which indicate incomplete data.

In [2]:
import pandas as pd

# ----------------------------
# Part 1: Detect & Handle Missing Data
# ----------------------------

# Sample dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, None, 30, None, 22],
    "Salary": [50000, 60000, None, 70000, None]
}

# Task 1: Detect Missing Data
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Detect missing values
print("\nMissing Data (True indicates missing):")
print(df.isnull())

# Count of missing values per column
print("\nMissing Value Count per Column:")
print(df.isnull().sum())

Original DataFrame:
      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  60000.0
2  Charlie  30.0      NaN
3    David   NaN  70000.0
4      Eva  22.0      NaN

Missing Data (True indicates missing):
    Name    Age  Salary
0  False  False   False
1  False   True   False
2  False  False    True
3  False   True   False
4  False  False    True

Missing Value Count per Column:
Name      0
Age       2
Salary    2
dtype: int64


**Task 2**: Checking Data Type Validity

**Description**: Ensure that columns contain data of expected types, e.g., ages are integers.

In [3]:
# Part 1 (continued): handle missing data by dropping rows with any missing values
df_dropped = df.dropna()
print("\nDataFrame after Dropping Rows with Missing Values:")
print(df_dropped)


DataFrame after Dropping Rows with Missing Values:
    Name   Age   Salary
0  Alice  25.0  50000.0
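
The cell above continues the missing-data handling from Part 1 rather than checking types. A minimal sketch of the type check described for this task might look like the following. The `people` frame and its values are hypothetical, and treating "whole number" as the validity rule for ages is an assumption.

```python
import pandas as pd

# Hypothetical sample: "Age" mixes integers with a string and a missing value.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, "thirty", 30, None],
})

# Coerce to numeric; anything that cannot be parsed becomes NaN.
age_numeric = pd.to_numeric(people["Age"], errors="coerce")

# Present in the source but not parseable as a number.
invalid_type = people["Age"].notna() & age_numeric.isna()

# Parseable, but not a whole number (e.g. 25.5).
not_whole = age_numeric.notna() & (age_numeric % 1 != 0)

print("Rows with an invalid Age type:")
print(people[invalid_type | not_whole])
```

Missing ages are deliberately not flagged here, since completeness is covered by Task 1.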


**Task 3**: Verify Uniqueness of Identifiers

**Description**: Check if a dataset has unique identifiers (e.g., emails).

In [4]:
# Part 1 (continued): fill missing Age with the column mean
age_mean = df["Age"].mean()
df["Age"] = df["Age"].fillna(age_mean)

# Fill missing Salary with mean
salary_mean = df["Salary"].mean()
df["Salary"] = df["Salary"].fillna(salary_mean)

print("\nDataFrame after Imputation (Filling Missing Values):")
print(df)


DataFrame after Imputation (Filling Missing Values):
      Name        Age   Salary
0    Alice  25.000000  50000.0
1      Bob  25.666667  60000.0
2  Charlie  30.000000  60000.0
3    David  25.666667  70000.0
4      Eva  22.000000  60000.0
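
The cell above imputes missing values and does not test uniqueness. One possible sketch of the identifier check described for this task is shown below; the `users` frame and its `Email` column are hypothetical, since the sample dataset has no identifier column.

```python
import pandas as pd

# Hypothetical sample with one repeated email address.
users = pd.DataFrame({
    "Email": [
        "alice@example.com",
        "bob@example.com",
        "alice@example.com",
        "eva@example.com",
    ],
})

# True only if every value in the column is distinct.
print("Emails unique:", users["Email"].is_unique)

# keep=False marks every occurrence of a repeated value, not just the later ones.
repeated = users[users["Email"].duplicated(keep=False)]
print("\nRows sharing an email:")
print(repeated)
```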


**Task 4**: Validate Email Format Using Regex

**Description**: Validate if email addresses in a dataset have the correct format.

In [None]:
# Write your code from here
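
One way to approach this, as a sketch: match each address against a simplified pattern with `Series.str.fullmatch`. The `contacts` frame is hypothetical, and the pattern is an assumption that covers common addresses only; it is not a full RFC 5322 validator.

```python
import pandas as pd

# Hypothetical sample data.
contacts = pd.DataFrame({
    "Email": ["alice@example.com", "bob@example", "not-an-email", None],
})

# Simplified pattern (assumption): local part, "@", domain, dot, 2+ letter TLD.
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# fullmatch requires the whole string to match; na=False treats missing values as invalid.
contacts["EmailValid"] = contacts["Email"].str.fullmatch(EMAIL_PATTERN, na=False)

print(contacts)
```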

**Task 5**: Check for Logical Age Validity

**Description**: Ensure ages are within a reasonable human range (e.g., 0-120).

In [None]:
# Write your code from here
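
A minimal sketch using `Series.between`, which is inclusive on both ends by default. The `ages` frame is hypothetical; the 0-120 bounds come from the description above.

```python
import pandas as pd

# Hypothetical sample with one negative age, one implausibly large age, and one missing.
ages = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, -3, 130, None],
})

# between() returns False for NaN, so missing ages are excluded separately below.
in_range = ages["Age"].between(0, 120)

# Rows where an age is present but falls outside 0-120.
out_of_range = ages[ages["Age"].notna() & ~in_range]

print("Ages outside the 0-120 range:")
print(out_of_range)
```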

**Task 6**: Identify and Handle Missing Data

**Description**: Identify missing values in a dataset and impute them using a simple strategy (e.g., mean).

In [None]:
# Write your code from here
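
The earlier cells fill `Age` and `Salary` one column at a time. As an alternative sketch, the same mean imputation can be applied to every numeric column at once, on a copy so the original frame stays intact. The `staff` frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing value per numeric column.
staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [20.0, None, 40.0],
    "Salary": [50000.0, 70000.0, None],
})

# Identify: count missing values per column.
missing_counts = staff.isnull().sum()
print("Missing values per column:")
print(missing_counts)

# Impute: fill each numeric column with its own mean, working on a copy.
numeric_cols = staff.select_dtypes(include="number").columns
imputed = staff.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

print("\nAfter mean imputation:")
print(imputed)
```

Mean imputation is sensitive to outliers; the median is a common substitute when a column is skewed.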

**Task 7**: Detect Duplicates

**Description**: Detect duplicate rows in the dataset.

In [None]:
# Write your code from here
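
A short sketch with `DataFrame.duplicated`, which compares entire rows by default. The `records` frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample where row 2 repeats row 0 exactly.
records = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Eva"],
    "Age": [25, 30, 25, 22],
})

# duplicated() marks the second and later occurrences of an identical row.
dup_mask = records.duplicated()
print("Duplicate rows:")
print(records[dup_mask])

# Optional follow-up: keep only the first occurrence of each row.
deduplicated = records.drop_duplicates()
print("\nAfter removing duplicates:")
print(deduplicated)
```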

**Task 8**: Validate Correctness of Numerical Values

**Description**: Ensure numerical columns are within a specified range.

In [None]:
# Write your code from here
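
One possible sketch: keep the allowed range for each column in a dictionary and flag values that fall outside it. The `payroll` frame and the bounds in `bounds` are hypothetical and would need to reflect the real business rules.

```python
import pandas as pd

# Hypothetical sample data.
payroll = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 17, 40],
    "Salary": [50000, 60000, 2000000],
})

# Hypothetical (low, high) bounds per column, inclusive on both ends.
bounds = {"Age": (18, 65), "Salary": (0, 1_000_000)}

# One boolean column per rule; True means the value violates the range.
# Note: a missing value would also be flagged, because between() is False for NaN.
violations = pd.DataFrame({
    column: ~payroll[column].between(low, high)
    for column, (low, high) in bounds.items()
})

print("Rows with at least one out-of-range value:")
print(payroll[violations.any(axis=1)])
```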

**Task 9**: Custom Completeness Rule Violation Report

**Description**: Create a report showing which rows violate specific completeness rules, such as mandatory fields being empty.

In [None]:
# Write your code from here
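
A sketch of such a report, assuming that `Name` and `Email` are the mandatory fields and that blank or whitespace-only strings count as empty. Both the `people` frame and those rules are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: nulls and an empty string in mandatory fields.
people = pd.DataFrame({
    "Name": ["Alice", "", "Charlie", None],
    "Email": ["alice@example.com", "bob@example.com", None, "dan@example.com"],
    "Age": [25, 30, None, 41],
})

# Hypothetical rule: these columns must be filled in; Age is optional.
mandatory = ["Name", "Email"]

# Treat nulls and blank/whitespace-only strings as empty.
is_empty = people[mandatory].fillna("").apply(lambda col: col.str.strip() == "")

# Keep only violating rows and list which mandatory fields each one is missing.
report = people.loc[is_empty.any(axis=1)].copy()
report["MissingFields"] = is_empty.loc[report.index].apply(
    lambda row: ", ".join(c for c in mandatory if row[c]), axis=1
)

print("Completeness violations:")
print(report)
```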

**Task 10**: Advanced Regex for Data Validity Check

**Description**: Validate complex fields using advanced regex patterns that enforce multi-level rules.

In [None]:
# Write your code from here
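
As an illustration of a multi-level rule, the sketch below first checks the overall format with named capture groups and then applies a second rule to one captured part. The `EmployeeID` field, its `DEPT-YYYY-NNNN` layout, and the 2000-2030 year window are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
employees = pd.DataFrame({
    "EmployeeID": ["ENG-2021-0042", "hr-2020-0001", "OPS-1999-0100", "ENG-2022-42", None],
})

# Level 1 (assumed format): 2-4 uppercase letters, 4-digit year, 4-digit serial.
ID_PATTERN = r"^(?P<dept>[A-Z]{2,4})-(?P<year>\d{4})-(?P<serial>\d{4})$"

# extract() returns one column per named group; rows that do not match are all NaN.
parts = employees["EmployeeID"].str.extract(ID_PATTERN)
format_ok = parts["dept"].notna()

# Level 2 (assumed rule): the year component must fall between 2000 and 2030.
year = pd.to_numeric(parts["year"], errors="coerce")
year_ok = year.between(2000, 2030)

employees["IDValid"] = format_ok & year_ok
print(employees)
```

Splitting the check into a format step and a value step keeps the regex readable and makes it easier to report which rule a row failed.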