## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Missing Column

1. Load the source DataFrame with the following schema:
    - id : Integer
    - email : String
    - signup_date : Date
2. Load the target DataFrame with the following schema:
    - id : Integer
    - email : String
3. Implement a check to identify any columns that are present in the source DataFrame but missing in the target.
4. Add the missing `signup_date` column to the target DataFrame.

In [2]:
import pandas as pd
import numpy as np

# Step 1: Load source DataFrame
df_source = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'signup_date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])
})

# Step 2: Load target DataFrame (missing 'signup_date')
df_target = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com']
})

# Step 3: Function to identify missing columns
def find_missing_columns(source_df, target_df):
    return [col for col in source_df.columns if col not in target_df.columns]

# Step 4: Function to add missing columns, filled with NaN placeholders
def reconcile_missing_columns(source_df, target_df):
    """Return a copy of target_df with every source-only column added as NaN."""
    if source_df.empty or target_df.empty:
        raise ValueError("One of the DataFrames is empty. Cannot proceed with schema reconciliation.")

    missing_cols = find_missing_columns(source_df, target_df)
    print(f"Missing columns in target: {missing_cols}")

    # Work on a copy so the caller's DataFrame is not modified in place
    target_df = target_df.copy()
    for col in missing_cols:
        # NaN makes the new column float64, whatever the source dtype was
        target_df[col] = np.nan
        print(f"Added missing column '{col}' with NaN values.")

    return target_df

# Step 5: Reconcile schema
df_target = reconcile_missing_columns(df_source, df_target)

# Final Output
print("\n✅ Final Reconciled Target DataFrame:")
print(df_target)

Missing columns in target: ['signup_date']
Added missing column 'signup_date' with NaN values.

✅ Final Reconciled Target DataFrame:
   id          email  signup_date
0   1  a@example.com          NaN
1   2  b@example.com          NaN
2   3  c@example.com          NaN
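
The output above shows `signup_date` filled with `NaN`, which leaves the new column as `float64` instead of the `Date` type the source schema calls for. One possible way to keep the source dtype is sketched below. The helper name `add_missing_columns_typed` is hypothetical and not part of the exercise. The sketch takes an empty slice of each source column and reindexes it onto the target's index, so datetime columns come back as `NaT`. Integer columns are an exception: NumPy-backed integers cannot hold missing values, so pandas upcasts them to `float64`.

```python
import pandas as pd


def add_missing_columns_typed(source_df, target_df):
    """Hypothetical helper: add source-only columns to a copy of target_df,
    keeping the source dtype where that dtype can represent missing values."""
    result = target_df.copy()
    for col in source_df.columns:
        if col not in result.columns:
            # An empty slice keeps the source dtype; reindexing it onto the
            # target's index yields one missing value per target row.
            result[col] = source_df[col].iloc[:0].reindex(result.index)
    return result


df_source = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'signup_date': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])
})
df_target = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com']
})

df_typed = add_missing_columns_typed(df_source, df_target)
print(df_typed.dtypes)
```

Whether a typed placeholder or a plain `NaN` is preferable depends on what consumes the target downstream. The typed version avoids a later cast when real dates are backfilled.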
