## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Column Name Mismatch

**Steps**:
1. Load the source DataFrame with the following schema:
    - id : Integer
    - name : String
    - age : Integer
2. Load the target DataFrame with the following schema:
    - id : Integer
    - fullname : String
    - age : Integer
3. Use a schema comparison tool or write a simple function to detect mismatches in column names.
4. Resolve the mismatch by renaming the `fullname` column in the target DataFrame to `name`.
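
The "simple function" mentioned in step 3 could look roughly like the sketch below. It reports the direction of each mismatch (which names are missing from the target and which are unexpected there), something a plain symmetric difference does not show. The function name `compare_column_names`, the report keys, and the `demo_*` DataFrames are hypothetical and chosen only for illustration.

```python
import pandas as pd


def compare_column_names(source_df: pd.DataFrame, target_df: pd.DataFrame) -> dict:
    """Report column names that appear in only one of the two DataFrames.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    source_cols = set(source_df.columns)
    target_cols = set(target_df.columns)
    return {
        # In the source but absent from the target
        "missing_in_target": sorted(source_cols - target_cols),
        # In the target but absent from the source
        "unexpected_in_target": sorted(target_cols - source_cols),
    }


# Small demo frames mirroring the schemas listed in steps 1 and 2
demo_source = pd.DataFrame({"id": [1], "name": ["Alice"], "age": [25]})
demo_target = pd.DataFrame({"id": [1], "fullname": ["Alice"], "age": [25]})

report = compare_column_names(demo_source, demo_target)
print(report)
# {'missing_in_target': ['name'], 'unexpected_in_target': ['fullname']}
```

Sorting the results keeps the report stable between runs, since Python sets have no guaranteed iteration order.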

In [1]:
import pandas as pd

# Step 1: Create source DataFrame with schema: id, name, age
source_data = {
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}
source_df = pd.DataFrame(source_data)

# Step 2: Create target DataFrame with schema: id, fullname, age
target_data = {
    'id': [1, 2, 3],
    'fullname': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
}
target_df = pd.DataFrame(target_data)

# Step 3: Detect mismatches in column names
source_cols = set(source_df.columns)
target_cols = set(target_df.columns)

# symmetric_difference returns the names that appear in exactly one of the two DataFrames
mismatched_cols = source_cols.symmetric_difference(target_cols)
print("Mismatched columns between source and target:", mismatched_cols)

# Step 4: Resolve mismatch by renaming 'fullname' to 'name' in target DataFrame
if 'fullname' in target_df.columns:
    target_df = target_df.rename(columns={'fullname': 'name'})

print("\nTarget DataFrame columns after renaming:")
print(target_df.columns)

# Optional: Verify that the column names now match (set comparison ignores column order and dtypes)
if set(source_df.columns) == set(target_df.columns):
    print("\nSchemas match after renaming.")
else:
    print("\nSchemas still do not match.")


Mismatched columns between source and target: {'fullname', 'name'}

Target DataFrame columns after renaming:
Index(['id', 'name', 'age'], dtype='object')

Schemas match after renaming.
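
The steps also list a type for each column (Integer, String), which a name-only comparison does not verify. A minimal sketch of a dtype check for columns that share a name might look like the following. The helper `find_dtype_mismatches` and the sample frames `src` and `tgt` are hypothetical, and the exact dtype strings pandas reports can vary by version and platform.

```python
import pandas as pd


def find_dtype_mismatches(source_df: pd.DataFrame, target_df: pd.DataFrame) -> dict:
    """Map each shared column whose dtype differs to (source dtype, target dtype).

    Hypothetical helper for illustration; columns present in only one
    DataFrame are ignored here because a name comparison already covers them.
    """
    shared = [col for col in source_df.columns if col in target_df.columns]
    return {
        col: (str(source_df[col].dtype), str(target_df[col].dtype))
        for col in shared
        if source_df[col].dtype != target_df[col].dtype
    }


# Assumed sample data: 'age' holds integers in the source but strings in the target
src = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": [25, 30]})
tgt = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": ["25", "30"]})

dtype_mismatches = find_dtype_mismatches(src, tgt)
print(dtype_mismatches)  # only the 'age' column is reported
```

Running a check like this after the rename would catch cases where the column names agree but the underlying types do not.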
