## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Missing Column

1. Load the source DataFrame with the following schema:
    - id : Integer
    - email : String
    - signup_date : Date
2. Load the target DataFrame with the following schema:
    - id : Integer
    - email : String
3. Implement a check to identify any columns that are present in the source DataFrame but missing in the target.
4. Add the missing `signup_date` column to the target DataFrame.
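
The schemas listed in steps 1 and 2 can also be written down as plain dictionaries and compared before any data is loaded. The sketch below is one possible way to do that. The names `SOURCE_SCHEMA`, `TARGET_SCHEMA`, and `missing_columns` are hypothetical, and the pandas dtype strings are an assumed mapping of Integer, String, and Date.

```python
# Hypothetical schema declarations; the dtype strings are an assumed
# pandas mapping of the task's Integer, String, and Date types.
SOURCE_SCHEMA = {'id': 'int64', 'email': 'object', 'signup_date': 'datetime64[ns]'}
TARGET_SCHEMA = {'id': 'int64', 'email': 'object'}

def missing_columns(source_schema, target_schema):
    """Return the source columns that are absent from the target, in source order."""
    return [col for col in source_schema if col not in target_schema]

missing = missing_columns(SOURCE_SCHEMA, TARGET_SCHEMA)
print(missing)  # ['signup_date']
```

Returning a list rather than a set keeps the source column order, which makes the result stable when it is logged or compared across runs.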

In [1]:
# write your code from here
import pandas as pd

# Step 1: Load source DataFrame (id: Integer, email: String, signup_date: Date)
# The rows below are sample values for illustration
source_df = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com'],
    'signup_date': pd.to_datetime(['2023-01-15', '2023-02-20', '2023-03-10'])
})

# Step 2: Load target DataFrame (id: Integer, email: String)
target_df = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']
})

# Step 3: Function to detect columns missing on either side
def detect_column_mismatches(source_df, target_df):
    source_cols = set(source_df.columns)
    target_cols = set(target_df.columns)

    missing_in_target = source_cols - target_cols
    extra_in_target = target_cols - source_cols

    print(f"Columns in source but missing in target: {missing_in_target}")
    print(f"Columns in target but missing in source: {extra_in_target}")

    return missing_in_target, extra_in_target

# Detect mismatches
missing_cols, extra_cols = detect_column_mismatches(source_df, target_df)

# Step 4: Resolve mismatch by adding 'signup_date' to target_df,
# filled with NaT because the target has no dates yet
if 'signup_date' in missing_cols:
    target_df['signup_date'] = pd.NaT
    print("\nAdded 'signup_date' to target DataFrame.")

# Verify after adding the column
print("\nColumns in target DataFrame after adding the column:")
print(target_df.columns.tolist())

# Optional: Re-run mismatch detection after fix
detect_column_mismatches(source_df, target_df)


Columns in source but missing in target: {'signup_date'}
Columns in target but missing in source: set()

Added 'signup_date' to target DataFrame.

Columns in target DataFrame after adding the column:
['id', 'email', 'signup_date']
Columns in source but missing in target: set()
Columns in target but missing in source: set()


(set(), set())
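
Step 4 only requires that `signup_date` exist in the target. When the source and target share a key, the missing values can also be backfilled from the source instead of being left null. The sketch below assumes `id` is a unique key in the source, which the task does not state. The sample rows and the names `src`, `tgt`, and `filled` are illustrative.

```python
import pandas as pd

# Illustrative sample rows that follow the task's schemas
src = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'signup_date': pd.to_datetime(['2023-01-15', '2023-02-20', '2023-03-10']),
})
tgt = pd.DataFrame({
    'id': [1, 2, 4],
    'email': ['a@example.com', 'b@example.com', 'd@example.com'],
})

# Columns present in the source but absent from the target, in source order
missing = [c for c in src.columns if c not in tgt.columns]

# Left-join on the assumed key so every target row is kept.
# validate='m:1' raises if 'id' is not unique in the source.
# Target ids with no match in the source (here, id 4) get NaT.
filled = tgt.merge(src[['id'] + missing], on='id', how='left', validate='m:1')

print(filled.columns.tolist())
```

A left join is used so that target rows are never dropped. Rows without a matching source `id` keep a null `signup_date`, which makes unmatched records easy to find afterwards with `filled['signup_date'].isna()`.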