## SCD Type 4 – History Table (Separate Table for Historical Data)

**Definition:**  
SCD Type 4 maintains a **current table** with only the latest data, and a **separate history table** where each prior version of a record is inserted when changes occur.

### Use Case:
Use SCD Type 4 when:
- You want to keep the current table lightweight for performance
- You want full history, but stored separately (e.g., for audits or reporting)
- You want to avoid multiple rows per record in the main dimension

### Table Structure:
1. **Current Table (`customers_base`)**  
   Stores only the latest version of each record. The cells below work on a copy of it named `scd4_customers`.
2. **History Table (`customers_history`)**  
   Stores old versions when a change is detected (written as `scd4_customer_history` in the cells below), with:
   - `CustomerID`
   - `Name`
   - `City`
   - `Email`
   - `StartDate` (when this version became active)
   - `EndDate` (when this version was replaced)
   - `ChangeCapturedDate` (when it was moved to history; the cells below do not populate this column)
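
As a rough illustration of the history row described above, the column list can be written down as a small Python type. This is only a sketch: the class name `CustomerHistoryRow` and the Python types are assumptions, and the date check is an extra sanity rule that the original text does not state.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CustomerHistoryRow:
    """One archived version of a customer (hypothetical type, fields from the list above)."""
    CustomerID: int
    Name: str
    City: str
    Email: str
    StartDate: date               # when this version became active
    EndDate: date                 # when this version was replaced
    ChangeCapturedDate: datetime  # when the row was moved to history

    def __post_init__(self):
        # Assumed sanity rule: a version cannot end before it starts.
        if self.EndDate < self.StartDate:
            raise ValueError("EndDate must not be earlier than StartDate")


# Example row using values that appear later in this notebook.
row = CustomerHistoryRow(
    CustomerID=101, Name="Tanuj", City="Hyderabad", Email="rangatanuj@gmail.com",
    StartDate=date(2025, 1, 20), EndDate=date(2025, 6, 24),
    ChangeCapturedDate=datetime(2025, 6, 24),
)
```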

### Logic:
1. Load the current table (`customers_base`)
2. Load the incoming dataset
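3. For existing records (matched on `CustomerID`):
   - If any tracked field (`Name`, `City`, `Email`) has changed:
     - Copy the current version into `customers_history`, with `StartDate` taken from the record's last-updated date and `EndDate = today`
     - Overwrite the record in `customers_base` with the new data
   - If nothing has changed, leave the record untouched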
4. For new records:
   - Simply insert into `customers_base`
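
The steps above can be sketched as a single pandas function that works purely in memory. This is a minimal sketch under assumptions, not the notebook's exact code: the function name `apply_scd4`, the `TRACKED` list, and the lowercase column names (borrowed from the cells further down) are illustrative choices.

```python
import pandas as pd

TRACKED = ["name", "city", "email"]  # assumed set of attributes compared for changes


def apply_scd4(current, incoming, today):
    """Return (new_current, history_rows) for one load of `incoming`."""
    # Pair each existing record with its incoming version.
    merged = current.merge(incoming, on="customerid", how="inner", suffixes=("_old", ""))

    # A record counts as changed if any tracked attribute differs.
    changed_mask = pd.Series(False, index=merged.index)
    for col in TRACKED:
        changed_mask |= merged[f"{col}_old"] != merged[col]
    changed = merged[changed_mask]

    # The old versions of changed records become history rows.
    history = changed[["customerid"] + [f"{c}_old" for c in TRACKED] + ["lastupdated_old"]].copy()
    history.columns = ["customerid"] + TRACKED + ["start_date"]
    history["end_date"] = today

    # Current table = untouched rows + new versions of changed rows + brand-new rows.
    changed_ids = changed["customerid"]
    kept = current[~current["customerid"].isin(changed_ids)]
    updated = incoming[incoming["customerid"].isin(changed_ids)]
    new = incoming[~incoming["customerid"].isin(current["customerid"])]
    new_current = (
        pd.concat([kept, updated, new], ignore_index=True)
        .sort_values("customerid")
        .reset_index(drop=True)
    )
    return new_current, history.reset_index(drop=True)


current = pd.DataFrame({
    "customerid": [101, 102],
    "name": ["Tanuj", "Meenu"],
    "city": ["Hyderabad", "Hyderabad"],
    "email": ["rangatanuj@gmail.com", "meenu@gmail.com"],
    "lastupdated": pd.to_datetime(["2025-01-20", "2025-02-22"]),
})
incoming = pd.DataFrame({
    "customerid": [101, 102, 109],
    "name": ["Tanuj", "Meenu", "Aman"],
    "city": ["Bangalore", "Hyderabad", "Delhi"],
    "email": ["tanuj.new@gmail.com", "meenu@gmail.com", "aman@gmail.com"],
    "lastupdated": pd.to_datetime(["2025-06-24"] * 3),
})
new_current, history = apply_scd4(current, incoming, pd.Timestamp("2025-06-24"))
```

Here customer 101 is archived and overwritten, 102 is left alone, and 109 is inserted. Keeping the function free of database calls makes the rule easy to test before wiring it to SQL Server.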

### Summary:
- Main table contains **only current data**
- Separate table keeps **complete history**
- Avoids multiple rows per record in the main table (unlike SCD Type 2)


In [2]:
import urllib.parse  # quote_plus lives in the urllib.parse submodule
from sqlalchemy import create_engine
import pandas as pd
from datetime import datetime

server=r'DESKTOP-HJVSCEN\MSSQLSERVER1'  # raw string so the backslash is not treated as an escape
database='Python ETL'
username='sa'
password='Ka@12345678'


ConnectionString = f"""
    DRIVER={{ODBC Driver 18 for SQL Server}};
    SERVER={server};
    DATABASE={database};
    UID={username};
    PWD={password};
    TrustServerCertificate=yes;
"""
# URL-encode the connection string for SQLAlchemy
params=urllib.parse.quote_plus(ConnectionString)

engine=create_engine(f"mssql+pyodbc:///?odbc_connect={params}")
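
One possible variation on the connection setup keeps credentials out of the notebook by reading them from environment variables. The helper name `build_odbc_params` and the `ETL_DB_*` variable names are hypothetical, and the fallback values are placeholders; only `os.environ` and `urllib.parse.quote_plus` are real library calls here.

```python
import os
import urllib.parse


def build_odbc_params(server, database, username, password):
    """Build the ODBC connection string and URL-encode it for `odbc_connect=`."""
    parts = [
        "DRIVER={ODBC Driver 18 for SQL Server}",
        f"SERVER={server}",
        f"DATABASE={database}",
        f"UID={username}",
        f"PWD={password}",
        "TrustServerCertificate=yes",
    ]
    return urllib.parse.quote_plus(";".join(parts))


# Hypothetical environment variable names; the fallbacks are placeholders only.
params = build_odbc_params(
    server=os.environ.get("ETL_DB_SERVER", "localhost"),
    database=os.environ.get("ETL_DB_NAME", "Python ETL"),
    username=os.environ.get("ETL_DB_USER", "sa"),
    password=os.environ.get("ETL_DB_PASSWORD", ""),
)
connection_url = f"mssql+pyodbc:///?odbc_connect={params}"
```

The resulting `connection_url` string has the same shape as the one passed to `create_engine` above.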

In [8]:
base_df=pd.read_sql_table('customers_base',con=engine)
base_df=base_df.sort_values(by='customerid').reset_index(drop=True)
base_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,rangatanuj@gmail.com,2025-01-20
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
3,104,Smrithi,Mumbai,smrithi@gmail.com,2025-04-26
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20


In [9]:
base_df.to_sql('scd4_customers',con=engine,index=False,if_exists='replace')

8

In [22]:
scd4_df=pd.read_sql_table('scd4_customers',con=engine)
scd4_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,rangatanuj@gmail.com,2025-01-20
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
3,104,Smrithi,Mumbai,smrithi@gmail.com,2025-04-26
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20


In [23]:
incoming_df = pd.DataFrame([
    {"customerid": 101, "name": "Tanuj", "city": "Bangalore", "email": "tanuj.new@gmail.com", "lastupdated": datetime(2025, 6, 24)},  # Changed City & Email
    {"customerid": 102, "name": "Meenu", "city": "Hyderabad", "email": "meenu@gmail.com", "lastupdated": datetime(2025, 6, 24)},     # No change
    {"customerid": 104, "name": "Smrithi", "city": "Chennai", "email": "smrithi@gmail.com", "lastupdated": datetime(2025, 6, 24)},   # Changed City
    {"customerid": 109, "name": "Aman", "city": "Delhi", "email": "aman@gmail.com", "lastupdated": datetime(2025, 6, 24)}            # New
])
incoming_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-06-24
2,104,Smrithi,Chennai,smrithi@gmail.com,2025-06-24
3,109,Aman,Delhi,aman@gmail.com,2025-06-24


In [24]:
# Pair each current record with its incoming version (inner join keeps only existing customers)
existing_df=pd.merge(scd4_df,incoming_df,how='inner',on='customerid',suffixes=('_old',''))
existing_df

Unnamed: 0,customerid,name_old,city_old,email_old,lastupdated_old,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,rangatanuj@gmail.com,2025-01-20,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22,Meenu,Hyderabad,meenu@gmail.com,2025-06-24
2,104,Smrithi,Mumbai,smrithi@gmail.com,2025-04-26,Smrithi,Chennai,smrithi@gmail.com,2025-06-24


In [25]:
changed_df=existing_df[
    (existing_df['name_old']!=existing_df['name']) | 
    (existing_df['city_old']!=existing_df['city']) | 
    (existing_df['email_old']!=existing_df['email'])
]
changed_df

Unnamed: 0,customerid,name_old,city_old,email_old,lastupdated_old,name,city,email,lastupdated
0,101,Tanuj,Hyderabad,rangatanuj@gmail.com,2025-01-20,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
2,104,Smrithi,Mumbai,smrithi@gmail.com,2025-04-26,Smrithi,Chennai,smrithi@gmail.com,2025-06-24
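
The `!=` comparison used for change detection flags a row as changed when both the old and the new value are missing, because missing values never compare equal in pandas. The sample data has no missing values, so the result above is unaffected. If nulls can occur, a null-safe comparison along these lines may be preferable; the helper name `differs` is an assumption.

```python
import pandas as pd


def differs(old, new):
    """Element-wise 'value changed' test that treats two missing values as equal."""
    both_missing = old.isna() & new.isna()
    return (old != new) & ~both_missing


old = pd.Series(["a", None, "c", None])
new = pd.Series(["a", None, "d", "x"])
changed = differs(old, new)
```

Only the last two positions are reported as changed: the value that went from `"c"` to `"d"`, and the value that went from missing to `"x"`.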


In [26]:
scd4_history=changed_df[['customerid','name_old','city_old','email_old','lastupdated_old']].copy()
scd4_history.rename(columns={
    'name_old':'name',
    'city_old':'city',
    'email_old':'email',
    'lastupdated_old':'start_date'
},inplace=True)
today=pd.to_datetime('today').normalize()
scd4_history['end_date']=today
scd4_history=scd4_history.reset_index(drop=True)  # keep the re-indexed frame instead of discarding it
scd4_history


Unnamed: 0,customerid,name,city,email,start_date,end_date
0,101,Tanuj,Hyderabad,rangatanuj@gmail.com,2025-01-20,2025-06-24
1,104,Smrithi,Mumbai,smrithi@gmail.com,2025-04-26,2025-06-24
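
The table structure at the top lists a `ChangeCapturedDate` column, which the history frame built here does not have. A small helper could add it before the write. This is a sketch: the function name `stamp_capture_time` and the lowercase column name `change_captured_date` are assumptions chosen to match the notebook's naming.

```python
import pandas as pd


def stamp_capture_time(history, captured_at=None):
    """Return a copy of `history` with a change_captured_date column added."""
    stamped = history.copy()
    # Default to the current time; accept an explicit value so runs are reproducible.
    when = pd.Timestamp.now() if captured_at is None else pd.Timestamp(captured_at)
    stamped["change_captured_date"] = when
    return stamped


history = pd.DataFrame({
    "customerid": [101, 104],
    "city": ["Hyderabad", "Mumbai"],
    "start_date": pd.to_datetime(["2025-01-20", "2025-04-26"]),
    "end_date": pd.to_datetime(["2025-06-24", "2025-06-24"]),
})
stamped = stamp_capture_time(history, captured_at="2025-06-24")
```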


In [27]:
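# Append so that history from earlier loads is preserved ('replace' would drop and recreate the table)
scd4_history.to_sql('scd4_customer_history',con=engine,index=False,if_exists='append')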

2
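
The history table only stays complete if every load adds to it. In `DataFrame.to_sql`, `if_exists='append'` inserts into the existing table, whereas `if_exists='replace'` drops and recreates it. The sketch below demonstrates the append behaviour with an in-memory SQLite database standing in for SQL Server; the second batch is a hypothetical later load invented for the demonstration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server engine

# First load: the version archived in this notebook.
run1 = pd.DataFrame({
    "customerid": [101], "city": ["Hyderabad"],
    "start_date": ["2025-01-20"], "end_date": ["2025-06-24"],
})
# Hypothetical later load that archives the next version of the same customer.
run2 = pd.DataFrame({
    "customerid": [101], "city": ["Bangalore"],
    "start_date": ["2025-06-24"], "end_date": ["2025-07-01"],
})

for batch in (run1, run2):
    batch.to_sql("scd4_customer_history", con=conn, index=False, if_exists="append")

history = pd.read_sql_query(
    "SELECT * FROM scd4_customer_history ORDER BY start_date", conn
)
```

After both loads, `history` holds both archived versions of customer 101.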

In [28]:
unchanged_df=scd4_df[~scd4_df['customerid'].isin(changed_df['customerid'])]
unchanged_df

Unnamed: 0,customerid,name,city,email,lastupdated
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20


In [30]:
new_changed_vals_df=incoming_df[incoming_df['customerid'].isin(changed_df['customerid'])]
new_changed_vals_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
2,104,Smrithi,Chennai,smrithi@gmail.com,2025-06-24


In [33]:
old_val_updated_df=pd.concat([unchanged_df,new_changed_vals_df],ignore_index=True)
old_val_updated_df=old_val_updated_df.sort_values(by='customerid').reset_index(drop=True)
old_val_updated_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,smrithi@gmail.com,2025-06-24
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20


In [34]:
only_new_df=incoming_df[~incoming_df['customerid'].isin(scd4_df['customerid'])]
only_new_df

Unnamed: 0,customerid,name,city,email,lastupdated
3,109,Aman,Delhi,aman@gmail.com,2025-06-24


In [35]:
final_df=pd.concat([old_val_updated_df,only_new_df],ignore_index=True)
final_df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,smrithi@gmail.com,2025-06-24
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20
8,109,Aman,Delhi,aman@gmail.com,2025-06-24


In [36]:
final_df.to_sql('scd4_customers',con=engine,index=False,if_exists='replace')

9

In [37]:
df=pd.read_sql_table('scd4_customers',con=engine)
df

Unnamed: 0,customerid,name,city,email,lastupdated
0,101,Tanuj,Bangalore,tanuj.new@gmail.com,2025-06-24
1,102,Meenu,Hyderabad,meenu@gmail.com,2025-02-22
2,103,John,Pune,john@gmail.com,2025-03-24
3,104,Smrithi,Chennai,smrithi@gmail.com,2025-06-24
4,105,Chiru,Banglore,chiru@gmail.com,2025-05-28
5,106,Jaaaanu,Delhi,jaaanu@gmail.com,2025-06-24
6,107,Ravi,Delhi,ravi@gmail.com,2025-06-20
7,108,Jack,Delhi,jack@gmail.com,2025-06-20
8,109,Aman,Delhi,aman@gmail.com,2025-06-24
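
After a load, a few quick checks can confirm that the two tables still look like an SCD Type 4 pair. The function below is a sketch with assumed rules: one row per `customerid` in the current table, no history version that ends before it starts, and (assuming customers are never deleted) no history row for a customer missing from the current table.

```python
import pandas as pd


def check_scd4(current, history):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if current["customerid"].duplicated().any():
        problems.append("current table has more than one row for some customerid")
    if (history["end_date"] < history["start_date"]).any():
        problems.append("history has a version that ends before it starts")
    # Assumes customers are never deleted from the current table.
    orphans = set(history["customerid"]) - set(current["customerid"])
    if orphans:
        problems.append(f"history refers to customers missing from current: {sorted(orphans)}")
    return problems


current = pd.DataFrame({"customerid": [101, 102, 104], "city": ["Bangalore", "Hyderabad", "Chennai"]})
history = pd.DataFrame({
    "customerid": [101, 104],
    "start_date": pd.to_datetime(["2025-01-20", "2025-04-26"]),
    "end_date": pd.to_datetime(["2025-06-24", "2025-06-24"]),
})
problems = check_scd4(current, history)
```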
