## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Missing Column

1. Load the source DataFrame with the following schema:
    - `id`: Integer
    - `email`: String
    - `signup_date`: Date
2. Load the target DataFrame with the following schema:
    - `id`: Integer
    - `email`: String
3. Implement a check to identify any columns that are present in the source DataFrame but missing in the target.
4. Add the missing `signup_date` column to the target DataFrame.
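The check in step 3 can also be wrapped in a small reusable helper. The sketch below uses a hypothetical function, `find_missing_columns`, that reports columns missing from the target and, as an extra assumption beyond the task, columns that exist only in the target. It builds lists rather than returning a raw set difference so that the reported order is deterministic.

```python
import pandas as pd

def find_missing_columns(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Hypothetical helper: compare column names of two DataFrames.

    Lists preserve each DataFrame's column order, unlike a set difference.
    """
    source_cols = set(source.columns)
    target_cols = set(target.columns)
    return {
        "missing_in_target": [c for c in source.columns if c not in target_cols],
        "extra_in_target": [c for c in target.columns if c not in source_cols],
    }

# Empty frames are enough here because only column names are compared
source = pd.DataFrame(columns=["id", "email", "signup_date"])
target = pd.DataFrame(columns=["id", "email"])

report = find_missing_columns(source, target)
print(report)  # {'missing_in_target': ['signup_date'], 'extra_in_target': []}
```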

```python
import pandas as pd
import numpy as np

# Step 1: Load the source DataFrame
source_df = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'signup_date': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01'])
})

# Step 2: Load the target DataFrame
target_df = pd.DataFrame({
    'id': [4, 5, 6],
    'email': ['d@example.com', 'e@example.com', 'f@example.com']
})

# Step 3: Identify columns present in the source but missing from the target
# (a list comprehension keeps the source column order; a set difference would not)
missing_columns = [col for col in source_df.columns if col not in target_df.columns]

# Step 4: Add each missing column to the target, filled with nulls
for col in missing_columns:
    if pd.api.types.is_datetime64_any_dtype(source_df[col]):
        # Datetime columns get NaT and keep the source column's dtype
        target_df[col] = pd.Series(pd.NaT, index=target_df.index, dtype=source_df[col].dtype)
    else:
        # Other columns get NaN (note: an integer column would become float64)
        target_df[col] = np.nan

# Output the aligned DataFrames
print("Source DataFrame:\n", source_df)
print("\nTarget DataFrame after column alignment:\n", target_df)
```


```
Source DataFrame:
    id          email signup_date
0   1  a@example.com  2023-01-01
1   2  b@example.com  2023-02-01
2   3  c@example.com  2023-03-01

Target DataFrame after column alignment:
    id          email signup_date
0   4  d@example.com         NaT
1   5  e@example.com         NaT
2   6  f@example.com         NaT
```

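As an alternative to the column-by-column loop, pandas' `DataFrame.reindex(columns=...)` can add all missing columns in one call. The sketch below is one possible approach, not the only one: it returns a new DataFrame instead of mutating `target_df`, orders the columns like the source, and then converts the newly added datetime columns from NaN to NaT. Be aware that `reindex` also drops any column that exists only in the target, which may or may not be what a given pipeline wants.

```python
import pandas as pd

source_df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "signup_date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
})
target_df = pd.DataFrame({
    "id": [4, 5, 6],
    "email": ["d@example.com", "e@example.com", "f@example.com"],
})

# reindex returns a new DataFrame with exactly the source's columns, in source order:
# missing columns are added and filled with NaN; target-only columns would be dropped
aligned_df = target_df.reindex(columns=source_df.columns)

# Newly added columns hold NaN, so convert datetime ones to datetime64 (NaT)
for col in source_df.columns.difference(target_df.columns):
    if pd.api.types.is_datetime64_any_dtype(source_df[col]):
        aligned_df[col] = pd.to_datetime(aligned_df[col])

print(aligned_df.dtypes)
```

Because `target_df` itself is left unchanged, this version is easier to reason about when the same target is reused later in a pipeline.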