## SCD Type 3 – Limited History Tracking (Previous Value Only)

**Definition:**  
SCD Type 3 tracks historical data in a limited way by **storing the previous value of a selected attribute** in a separate column. This way, only the **current value** and **one previous value** are kept in the same row.

### Use Case:
This method is useful when you only care about **one level of change**, for example:
- Previous and current city of a customer
- Last and current department of an employee

### Business Rule:
- If a new record has a different value in the tracked column:
  - Move the current value into the `Previous_<Column>` field
  - Overwrite the `Current_<Column>` with the new value
- If the record is new, insert it with `Previous_<Column> = NULL`
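
The rule above can be sketched on a single record, independent of any database. This is only a minimal illustration: the helper name `apply_scd3` and the dict-based record layout are assumptions made for the example and are not part of the notebook code further down.

```python
from typing import Optional


def apply_scd3(existing: Optional[dict], incoming: dict) -> dict:
    """Return the SCD Type 3 row for one customer (hypothetical helper)."""
    if existing is None:
        # New record: there is no history yet, so the previous value is empty.
        return {
            "customerid": incoming["customerid"],
            "currentcity": incoming["city"],
            "previouscity": None,
        }
    if existing["currentcity"] != incoming["city"]:
        # Changed: shift current into previous, then overwrite current.
        return {
            **existing,
            "previouscity": existing["currentcity"],
            "currentcity": incoming["city"],
        }
    # Unchanged: leave the row as it is.
    return existing


row = apply_scd3(None, {"customerid": 1, "city": "Pune"})
row = apply_scd3(row, {"customerid": 1, "city": "Delhi"})
row = apply_scd3(row, {"customerid": 1, "city": "Mumbai"})
print(row)
```

After the second change, `Pune` can no longer be recovered from the row, which is the defining limitation of Type 3.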

### Example Columns (for City tracking):
- `CustomerID`
- `Name`
- `CurrentCity`
- `PreviousCity`
- `Email`
- `LastUpdated`

### Technique:
1. Load the current SCD Type 3 table (or transform the base table to SCD3 structure).
2. Load the new incoming dataset.
3. For existing customers:
   - If the tracked column (e.g., `City`) has changed:
     - Move current `City` to `PreviousCity`
     - Set new value as `CurrentCity`
     - Update `LastUpdated`
   - If not changed, do nothing.
4. For new customers:
   - Insert the record with `PreviousCity = NULL` and `CurrentCity = incoming value`
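
As a rough sketch, the four steps can be expressed over a whole batch with a plain dictionary standing in for the SCD3 table. The function name `scd3_load`, the key layout, and the returned action labels are assumptions for illustration; the notebook cells below do the same work with pandas.

```python
from datetime import date


def scd3_load(table: dict, batch: list) -> dict:
    """Apply a batch to an SCD3 table keyed by CustomerID; return the action per ID."""
    actions = {}
    for rec in batch:
        cid = rec["CustomerID"]
        row = table.get(cid)
        if row is None:
            # Step 4: new customer, no previous city yet.
            table[cid] = {
                "CurrentCity": rec["City"],
                "PreviousCity": None,
                "LastUpdated": rec["LastUpdated"],
            }
            actions[cid] = "insert"
        elif row["CurrentCity"] != rec["City"]:
            # Step 3: shift, overwrite, and stamp the change.
            row["PreviousCity"] = row["CurrentCity"]
            row["CurrentCity"] = rec["City"]
            row["LastUpdated"] = rec["LastUpdated"]
            actions[cid] = "update"
        else:
            # Unchanged: nothing is written, LastUpdated included.
            actions[cid] = "no action"
    return actions


table = {
    101: {"CurrentCity": "Hyderabad", "PreviousCity": None, "LastUpdated": date(2025, 1, 20)},
    102: {"CurrentCity": "Hyderabad", "PreviousCity": None, "LastUpdated": date(2025, 2, 22)},
}
batch = [
    {"CustomerID": 101, "City": "Bangalore", "LastUpdated": date(2025, 6, 20)},
    {"CustomerID": 102, "City": "Hyderabad", "LastUpdated": date(2025, 6, 20)},
    {"CustomerID": 111, "City": "Nagpur", "LastUpdated": date(2025, 6, 20)},
]
actions = scd3_load(table, batch)
print(actions)
```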

### Summary:
- New records: **Inserted**
- Changed records: **Update current + shift old to previous**
- Unchanged: **No action**
- Only **one previous value is retained**
- No separate rows; all history is within a single row


In [43]:
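import urllib.parse  # import the submodule explicitly; plain `import urllib` does not guarantee urllib.parse is loaded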
from sqlalchemy import create_engine
import pandas as pd
from datetime import datetime

server=r'DESKTOP-HJVSCEN\MSSQLSERVER1'  # raw string so the backslash is not read as an escape sequence
database='Python ETL'
username='sa'
password='Ka@12345678'


ConnectionString = f"""
    DRIVER={{ODBC Driver 18 for SQL Server}};
    SERVER={server};
    DATABASE={database};
    UID={username};
    PWD={password};
    TrustServerCertificate=yes;
"""
# URL-encode the connection string for SQLAlchemy
params=urllib.parse.quote_plus(ConnectionString)

engine=create_engine(f"mssql+pyodbc:///?odbc_connect={params}")
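
The `odbc_connect` query parameter carries the whole ODBC connection string, so it has to be URL-encoded first. The sketch below shows that encoding step using only the standard library, and reads the settings from environment variables instead of hardcoding them. The variable names (`MSSQL_SERVER`, `MSSQL_DATABASE`, `MSSQL_USER`, `MSSQL_PASSWORD`) and the fallback values are hypothetical placeholders, not something the notebook defines.

```python
import os
import urllib.parse

# Hypothetical environment variable names; the defaults are placeholders only.
settings = {
    "DRIVER": "{ODBC Driver 18 for SQL Server}",
    "SERVER": os.environ.get("MSSQL_SERVER", r"localhost\SQLEXPRESS"),
    "DATABASE": os.environ.get("MSSQL_DATABASE", "Python ETL"),
    "UID": os.environ.get("MSSQL_USER", "etl_user"),
    "PWD": os.environ.get("MSSQL_PASSWORD", "change-me"),
    "TrustServerCertificate": "yes",
}

# ODBC connection strings are semicolon-separated KEY=VALUE pairs.
odbc_string = ";".join(f"{key}={value}" for key, value in settings.items())

# quote_plus percent-encodes characters such as ';', '=', '{' and '\',
# and turns spaces into '+', so the string is safe inside a URL query.
params = urllib.parse.quote_plus(odbc_string)
url = f"mssql+pyodbc:///?odbc_connect={params}"
print(url)
```

The resulting `url` has the same shape as the string passed to `create_engine` above.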

In [44]:
base_df=pd.read_sql_table('customers_base',con=engine)
base_df=base_df.sort_values(by='customerid').reset_index(drop=True)
scd3_df=base_df.copy()
scd3_df.rename(columns={'city':'currentcity'},inplace=True)
scd3_df['previouscity']=None
scd3_df=scd3_df[['customerid', 'name', 'currentcity','previouscity', 'email', 'lastupdated']]
scd3_df

scd3_df.to_sql('scd3_customers',con=engine,index=False,if_exists='replace')

8

In [45]:
scd3_df=pd.read_sql_table('scd3_customers',con=engine)
scd3_df

Unnamed: 0,customerid,name,currentcity,previouscity,email,lastupdated
0,101,Tanuj,Hyderabad,,rangatanuj@gmail.com,2025-01-20
1,102,Meenu,Hyderabad,,meenu@gmail.com,2025-02-22
2,103,John,Pune,,john@gmail.com,2025-03-24
3,104,Smrithi,Mumbai,,smrithi@gmail.com,2025-04-26
4,105,Chiru,Banglore,,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,,jack@gmail.com,2025-06-20


In [46]:
incoming_df = pd.DataFrame([
    {"customerid": 101, "name": "Tanuj", "city": "Bangalore", "email": "tanuj@gmail.com", "lastupdated": datetime(2025, 6, 20)},  # Changed City
    {"customerid": 102, "name": "Meenu", "city": "Hyderabad", "email": "meenu@gmail.com", "lastupdated": datetime(2025, 6, 20)}, # No change
    {"customerid": 104, "name": "Smrithi", "city": "Chennai", "email": "smrithi@gmail.com", "lastupdated": datetime(2025, 6, 20)}, # Changed City
    {"customerid": 111, "name": "Riya", "city": "Nagpur", "email": "riya@gmail.com", "lastupdated": datetime(2025, 6, 20)}        # New Record
])
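
The comparison steps that follow implicitly assume that each `customerid` appears at most once in the incoming batch. If that cannot be guaranteed, one possible pre-check is to keep only the most recent row per customer. The sketch below uses a made-up batch, and "latest `lastupdated` wins" is an assumed policy rather than a rule stated in this notebook.

```python
import pandas as pd
from datetime import datetime

# Hypothetical batch containing two rows for customer 101.
batch = pd.DataFrame([
    {"customerid": 101, "city": "Pune", "lastupdated": datetime(2025, 6, 19)},
    {"customerid": 101, "city": "Bangalore", "lastupdated": datetime(2025, 6, 20)},
    {"customerid": 111, "city": "Nagpur", "lastupdated": datetime(2025, 6, 20)},
])

# Keep only the most recent row per customer before comparing with the SCD3 table.
deduped = (
    batch.sort_values("lastupdated")
         .drop_duplicates(subset="customerid", keep="last")
         .sort_values("customerid")
         .reset_index(drop=True)
)
print(deduped)
```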

In [47]:
existing_ids=scd3_df['customerid'].unique()
incoming_ids=incoming_df['customerid'].unique()
new_ids=set(incoming_ids)-set(existing_ids)
print(new_ids)
new_df=incoming_df[incoming_df['customerid'].isin(new_ids)].copy()  # work on a copy to avoid SettingWithCopyWarning
new_df['currentcity']=new_df['city']
new_df['previouscity']=None
new_df=new_df[['customerid','name','currentcity','previouscity','email','lastupdated']]
new_df

{np.int64(111)}


Unnamed: 0,customerid,name,currentcity,previouscity,email,lastupdated
3,111,Riya,Nagpur,,riya@gmail.com,2025-06-20
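
An alternative to the set difference on IDs is an anti-join with `merge(..., indicator=True)`. The frames below are small stand-ins for `scd3_df` and `incoming_df`, and the sketch assumes `customerid` is unique in the existing table.

```python
import pandas as pd

existing = pd.DataFrame({"customerid": [101, 102, 104]})
incoming = pd.DataFrame({
    "customerid": [101, 102, 104, 111],
    "city": ["Bangalore", "Hyderabad", "Chennai", "Nagpur"],
})

# A left merge with indicator=True labels each incoming row as 'both' or 'left_only'.
flagged = incoming.merge(existing, on="customerid", how="left", indicator=True)

# 'left_only' rows have no match in the existing table, so they are new customers.
new_rows = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(new_rows)
```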


**Common rows fetched from the existing and incoming records**

In [48]:
existing_df=pd.merge(scd3_df,incoming_df,on='customerid',suffixes=('_old',''))
existing_df

Unnamed: 0,customerid,name_old,currentcity,previouscity,email_old,lastupdated_old,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,,rangatanuj@gmail.com,2025-01-20,Tanuj,Bangalore,tanuj@gmail.com,2025-06-20
1,102,Meenu,Hyderabad,,meenu@gmail.com,2025-02-22,Meenu,Hyderabad,meenu@gmail.com,2025-06-20
2,104,Smrithi,Mumbai,,smrithi@gmail.com,2025-04-26,Smrithi,Chennai,smrithi@gmail.com,2025-06-20


From the common rows fetched above, keep only those whose city has changed.

In [49]:
changed_df=existing_df[
    existing_df['currentcity']!=existing_df['city']
].copy()
changed_df

Unnamed: 0,customerid,name_old,currentcity,previouscity,email_old,lastupdated_old,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,,rangatanuj@gmail.com,2025-01-20,Tanuj,Bangalore,tanuj@gmail.com,2025-06-20
2,104,Smrithi,Mumbai,,smrithi@gmail.com,2025-04-26,Smrithi,Chennai,smrithi@gmail.com,2025-06-20
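
One caveat with a plain `!=` filter: pandas generally treats a missing value as unequal to everything, including another missing value, so a customer whose city is missing on both sides may be reported as changed. The sample data here has no missing cities, so the result above is unaffected. A null-safe variant could look like the sketch below, which uses a made-up frame with the same column names.

```python
import pandas as pd

merged = pd.DataFrame({
    "customerid": [1, 2, 3],
    "currentcity": ["Pune", None, None],
    "city": ["Pune", None, "Delhi"],
})

# Plain comparison; rows with a missing value on either side may come out as "different".
naive = merged["currentcity"] != merged["city"]

# Null-safe variant: rows where both sides are missing do not count as a change.
both_missing = merged["currentcity"].isna() & merged["city"].isna()
changed = naive & ~both_missing
print(changed.tolist())
```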


In [50]:
# Create a lookup dictionary from changed_df
change_dict=changed_df.set_index('customerid')[['city','currentcity','lastupdated']].to_dict('index')

# Define a function that applies updates to each matching row
def update_scd3_row(row):
    cust_id=row['customerid']
    if cust_id in change_dict:
        row['previouscity']=row['currentcity']
        row['currentcity']=change_dict[cust_id]['city']
        row['lastupdated']=change_dict[cust_id]['lastupdated']
    return row

# Apply the update function row-wise to scd3_df
scd3_df=scd3_df.apply(update_scd3_row,axis=1)
scd3_df

Unnamed: 0,customerid,name,currentcity,previouscity,email,lastupdated
0,101,Tanuj,Bangalore,Hyderabad,rangatanuj@gmail.com,2025-06-20
1,102,Meenu,Hyderabad,,meenu@gmail.com,2025-02-22
2,103,John,Pune,,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,Mumbai,smrithi@gmail.com,2025-06-20
4,105,Chiru,Banglore,,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,,jack@gmail.com,2025-06-20
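
Row-wise `apply` is easy to follow but can be slow on large tables. The same shift-and-overwrite can be sketched with a boolean mask and `.loc`. The frame and the `new_city` lookup below are illustrative stand-ins, not the notebook's own variables.

```python
import pandas as pd

scd3 = pd.DataFrame({
    "customerid": [101, 102, 104],
    "currentcity": ["Hyderabad", "Hyderabad", "Mumbai"],
    "previouscity": [None, None, None],
})
# Hypothetical lookup of changed customers: customerid -> new city.
new_city = pd.Series({101: "Bangalore", 104: "Chennai"})

# Boolean mask of the rows that need the shift-and-overwrite.
mask = scd3["customerid"].isin(new_city.index)

# Shift first, then overwrite; the order matters because both steps touch currentcity.
scd3.loc[mask, "previouscity"] = scd3.loc[mask, "currentcity"]
scd3.loc[mask, "currentcity"] = scd3.loc[mask, "customerid"].map(new_city)
print(scd3)
```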


In [51]:
scd3_final_df=pd.concat([scd3_df,new_df],ignore_index=True)
scd3_final_df

Unnamed: 0,customerid,name,currentcity,previouscity,email,lastupdated
0,101,Tanuj,Bangalore,Hyderabad,rangatanuj@gmail.com,2025-06-20
1,102,Meenu,Hyderabad,,meenu@gmail.com,2025-02-22
2,103,John,Pune,,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,Mumbai,smrithi@gmail.com,2025-06-20
4,105,Chiru,Banglore,,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,,jack@gmail.com,2025-06-20
8,111,Riya,Nagpur,,riya@gmail.com,2025-06-20
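
Before writing the combined frame back, it can be worth checking two invariants that this technique should preserve: one row per `customerid`, and a `previouscity` that never equals `currentcity` (a shift only happens on a change). The helper `check_scd3` below is a hypothetical sketch of such a check.

```python
import pandas as pd


def check_scd3(df: pd.DataFrame) -> None:
    """Raise ValueError if basic SCD3 invariants are violated (hypothetical helper)."""
    if not df["customerid"].is_unique:
        raise ValueError("duplicate customerid: SCD3 keeps exactly one row per key")
    same = df["previouscity"].notna() & (df["previouscity"] == df["currentcity"])
    if same.any():
        raise ValueError("previouscity equals currentcity for some rows")


good = pd.DataFrame({
    "customerid": [101, 102, 111],
    "currentcity": ["Bangalore", "Hyderabad", "Nagpur"],
    "previouscity": ["Hyderabad", None, None],
})
check_scd3(good)  # passes silently

# Customer 101 appears twice, which the check should reject.
bad = pd.concat([good, good.iloc[[0]]], ignore_index=True)
try:
    check_scd3(bad)
except ValueError as err:
    print(err)
```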


In [52]:
scd3_final_df.to_sql('scd3_customers',con=engine,index=False,if_exists='replace')

9
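
Note that `if_exists='replace'` drops and recreates the table, so every row is rewritten on each load. A targeted `UPDATE` plus `INSERT` touches only the affected rows. The sketch below demonstrates the idea with the standard-library `sqlite3` module and an in-memory database as a stand-in for SQL Server; the table definition is an assumption, and the exact SQL may need adjusting for the real target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scd3_customers ("
    "customerid INTEGER PRIMARY KEY, currentcity TEXT, "
    "previouscity TEXT, lastupdated TEXT)"
)
conn.execute("INSERT INTO scd3_customers VALUES (101, 'Hyderabad', NULL, '2025-01-20')")

changed = [("Bangalore", "2025-06-20", 101)]   # (new city, timestamp, key)
new = [(111, "Nagpur", None, "2025-06-20")]    # full rows for new customers

with conn:  # commits on success, rolls back on error
    # SET expressions read the pre-update row, so previouscity gets the old currentcity.
    conn.executemany(
        "UPDATE scd3_customers "
        "SET previouscity = currentcity, currentcity = ?, lastupdated = ? "
        "WHERE customerid = ?",
        changed,
    )
    conn.executemany("INSERT INTO scd3_customers VALUES (?, ?, ?, ?)", new)

rows = conn.execute("SELECT * FROM scd3_customers ORDER BY customerid").fetchall()
print(rows)
```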

In [53]:
df=pd.read_sql_table('scd3_customers',con=engine)
df

Unnamed: 0,customerid,name,currentcity,previouscity,email,lastupdated
0,101,Tanuj,Bangalore,Hyderabad,rangatanuj@gmail.com,2025-06-20
1,102,Meenu,Hyderabad,,meenu@gmail.com,2025-02-22
2,103,John,Pune,,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,Mumbai,smrithi@gmail.com,2025-06-20
4,105,Chiru,Banglore,,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,,jack@gmail.com,2025-06-20
8,111,Riya,Nagpur,,riya@gmail.com,2025-06-20
