## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Column Name Mismatch

**Steps**:
1. Load the source DataFrame with the following schema:
    - id : Integer
    - name : String
    - age : Integer
2. Load the target DataFrame with the following schema:
    - id : Integer
    - fullname : String
    - age : Integer
3. Use a schema comparison tool or write a simple function to detect mismatches in column names.
4. Resolve the mismatch by renaming the `fullname` column in the target DataFrame to `name`.
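
The schemas in steps 1 and 2 list a type for each column as well as a name, so a comparison can reasonably cover both. The following is a minimal sketch of that idea, not a prescribed solution: `compare_schemas` is a hypothetical helper, the sample rows are made up, and the float `age` column in the target is an assumption introduced only to show what a dtype mismatch looks like.

```python
import pandas as pd


def compare_schemas(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Hypothetical helper: report column-name and dtype differences."""
    # Map each column name to its dtype, rendered as a string for easy comparison.
    source_types = source.dtypes.astype(str).to_dict()
    target_types = target.dtypes.astype(str).to_dict()

    shared = source_types.keys() & target_types.keys()
    return {
        "only_in_source": sorted(source_types.keys() - target_types.keys()),
        "only_in_target": sorted(target_types.keys() - source_types.keys()),
        # For columns present in both, record (source dtype, target dtype) when they differ.
        "dtype_mismatches": {
            col: (source_types[col], target_types[col])
            for col in sorted(shared)
            if source_types[col] != target_types[col]
        },
    }


# Illustrative data only: 'age' is deliberately float in the target.
source = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": [25, 30]})
target = pd.DataFrame({"id": [3, 4], "fullname": ["Carol", "Dan"], "age": [41.0, 29.0]})

report = compare_schemas(source, target)
print(report)
```

Returning a dictionary rather than printing inside the function keeps the check reusable, for example as a guard that runs before the two DataFrames are combined.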

In [None]:
# Detect and resolve a column name mismatch between two DataFrames
import pandas as pd

# Step 1: Load the source DataFrame
source_df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

# Step 2: Load the target DataFrame with a mismatched column name
target_df = pd.DataFrame({
    'id': [4, 5, 6],
    'fullname': ['David', 'Eve', 'Frank'],
    'age': [28, 32, 38]
})

# Step 3: Detect column name mismatches
def detect_column_mismatches(df1, df2):
    """Return two sets: columns found only in df1, and columns found only in df2."""
    cols_df1 = set(df1.columns)
    cols_df2 = set(df2.columns)
    only_in_df1 = cols_df1 - cols_df2
    only_in_df2 = cols_df2 - cols_df1
    return only_in_df1, only_in_df2

source_only, target_only = detect_column_mismatches(source_df, target_df)

print("Columns only in source:", source_only)
print("Columns only in target:", target_only)

# Step 4: Resolve the mismatch by renaming 'fullname' to 'name'
if 'fullname' in target_df.columns:
    # Reassign instead of using inplace=True, the more idiomatic pandas style
    target_df = target_df.rename(columns={'fullname': 'name'})

# Step 5: Verify schema alignment
updated_source_only, updated_target_only = detect_column_mismatches(source_df, target_df)

print("\nAfter resolving:")
print("Columns only in source:", updated_source_only)
print("Columns only in target:", updated_target_only)

# Step 6 (optional): Combine the DataFrames now that their column names match
combined_df = pd.concat([source_df, target_df], ignore_index=True)
print("\nCombined DataFrame:")
print(combined_df)


Columns only in source: {'name'}
Columns only in target: {'fullname'}

After resolving:
Columns only in source: set()
Columns only in target: set()

Combined DataFrame:
   id     name  age
0   1    Alice   25
1   2      Bob   30
2   3  Charlie   35
3   4    David   28
4   5      Eve   32
5   6    Frank   38
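
Step 4 hard-codes a single rename. In a pipeline with several mismatched columns, one possible approach is to drive the renames from a mapping and fail fast if the names still differ afterwards. The sketch below assumes this design: `align_columns`, its `rename_map` and `expected_columns` parameters, and the sample rows are all hypothetical names and data chosen for illustration.

```python
import pandas as pd


def align_columns(df, rename_map, expected_columns):
    """Hypothetical helper: rename columns, then verify names match the expected schema."""
    # Apply only the renames whose old name is actually present in the DataFrame.
    applicable = {old: new for old, new in rename_map.items() if old in df.columns}
    aligned = df.rename(columns=applicable)  # returns a new DataFrame; df is untouched

    missing = set(expected_columns) - set(aligned.columns)
    extra = set(aligned.columns) - set(expected_columns)
    if missing or extra:
        raise ValueError(
            f"Schema still mismatched: missing={sorted(missing)}, extra={sorted(extra)}"
        )

    # Reorder to the expected column order so downstream code sees one layout.
    return aligned[list(expected_columns)]


# Illustrative data only: columns arrive in a different order and with a different name.
target = pd.DataFrame({"age": [28, 32], "fullname": ["David", "Eve"], "id": [4, 5]})
aligned = align_columns(target, {"fullname": "name"}, ["id", "name", "age"])
print(list(aligned.columns))
```

Raising an error rather than silently continuing means a leftover mismatch is caught before `pd.concat` would otherwise fill the unmatched columns with `NaN`.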
