## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Column Name Mismatch

**Steps**:
1. Load the source DataFrame with the following schema:
    - id : Integer
    - name : String
    - age : Integer
2. Load the target DataFrame with the following schema:
    - id : Integer
    - fullname : String
    - age : Integer
3. Use a schema comparison tool or write a simple function to detect mismatches in column names.
4. Resolve the mismatch by renaming the `fullname` column in the target DataFrame to `name`.

In [2]:
import pandas as pd

# Step 1: Load source DataFrame
source_df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

# Step 2: Load target DataFrame
target_df = pd.DataFrame({
    'id': [4, 5, 6],
    'fullname': ['David', 'Eva', 'Frank'],
    'age': [40, 45, 50]
})

# Step 3: Detect column name mismatches
source_columns = set(source_df.columns)
target_columns = set(target_df.columns)

mismatches = source_columns.symmetric_difference(target_columns)
print("Column mismatches:", mismatches)

# Step 4: Resolve mismatch by renaming 'fullname' to 'name' in target DataFrame
if 'fullname' in target_df.columns:
    target_df = target_df.rename(columns={'fullname': 'name'})

# Verify the final column names
print("\nSource columns:", source_df.columns.tolist())
print("Target columns:", target_df.columns.tolist())


Column mismatches: {'fullname', 'name'}

Source columns: ['id', 'name', 'age']
Target columns: ['id', 'name', 'age']
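
The symmetric difference used in Step 3 lists the mismatched names but does not say which DataFrame each one belongs to. A minimal sketch of a helper that reports the direction is shown below. The function name `compare_columns` and the report keys are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd


def compare_columns(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Report column names that differ between two DataFrames, by direction.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    source_cols = set(source.columns)
    target_cols = set(target.columns)
    return {
        # Present in the source but absent from the target
        "missing_in_target": sorted(source_cols - target_cols),
        # Present in the target but absent from the source
        "unexpected_in_target": sorted(target_cols - source_cols),
    }


# Small frames with the same schemas as the exercise, before the rename
source_df = pd.DataFrame({"id": [1], "name": ["Alice"], "age": [25]})
target_df = pd.DataFrame({"id": [4], "fullname": ["David"], "age": [40]})

report = compare_columns(source_df, target_df)
print(report)
```

With the two schemas from this task, the report lists `name` as missing from the target and `fullname` as unexpected in it, which points to the rename performed in Step 4.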
