# Automatic Change Tracking

This notebook demonstrates how pandalchemy automatically tracks all changes:
- Tracking inserts, updates, and deletes
- Getting change summaries
- Understanding tracked operations
- Inspecting what will be executed

In [1]:
import pandas as pd
from sqlalchemy import create_engine
import pandalchemy as pa

## Setup
Create an in-memory SQLite database with an `employees` table.

In [2]:
engine = create_engine('sqlite:///:memory:')
db = pa.DataBase(engine)

initial_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'department': ['Engineering', 'Sales', 'Engineering'],
    'salary': [80000, 60000, 75000]
}, index=[1, 2, 3])

employees = pa.TableDataFrame('employees', initial_data, 'id', engine)
employees.push()

print("Initial state:")
employees.to_pandas()

Initial state:


| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Engineering | 80000 |
| 2 | Bob | Sales | 60000 |
| 3 | Charlie | Engineering | 75000 |


### Check initial state

In [3]:
print(f"Has changes: {employees.has_changes()}")

Has changes: False


## Making Changes
Make several kinds of changes and see how each one is tracked.

### Add new employees (inserts)

In [4]:
employees.add_row({'id': 4, 'name': 'Diana', 'department': 'Marketing', 'salary': 70000})
employees.add_row({'id': 5, 'name': 'Eve', 'department': 'Engineering', 'salary': 85000})
print("✓ Added Diana and Eve")

✓ Added Diana and Eve


### Give raises (updates)

In [5]:
employees.update_row(1, {'salary': 85000})
employees.update_row(3, {'salary': 80000})
print("✓ Updated Alice and Charlie's salaries")

✓ Updated Alice and Charlie's salaries


### Remove employee (delete)

In [6]:
employees.delete_row(2)
print("✓ Deleted Bob")

✓ Deleted Bob


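### Department reorganization (bulk update via pandas)

This cell writes directly to the private `_data` DataFrame instead of calling a tracked method. Judging by the summary further down, the edit is not counted as a separate operation (`total_operations` stays at 5), yet the renamed department still reaches the database on push.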

In [7]:
employees._data.loc[employees._data['department'] == 'Engineering', 'department'] = 'Product'
print("✓ Renamed Engineering to Product department")

✓ Renamed Engineering to Product department


## Change Summary
Get a summary of all tracked changes

In [8]:
summary = employees.get_changes_summary()
print(f"Has changes: {employees.has_changes()}\n")
print("Change summary:")
for key, value in summary.items():
    if value:  # Only show non-zero values
        print(f"  {key}: {value}")

Has changes: True

Change summary:
  total_operations: 5
  inserts: 2
  updates: 2
  deletes: 1
  has_changes: True
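One way to read these counts is as a diff between the last synced snapshot and the current in-memory table. The sketch below is plain Python and is not pandalchemy's implementation; `diff_snapshots` is a hypothetical helper, and the two dictionaries simply restate the tables shown in this notebook. Under that assumption, Eve counts only as an insert even though her department was also renamed, which would be consistent with the summary reporting two updates rather than three.

```python
def diff_snapshots(original, current):
    """Classify rows by comparing two {primary_key: row_dict} snapshots."""
    inserts = sorted(pk for pk in current if pk not in original)
    deletes = sorted(pk for pk in original if pk not in current)
    # A row is an update only if it exists in both snapshots and differs.
    updates = sorted(
        pk for pk in current
        if pk in original and current[pk] != original[pk]
    )
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


# State after the initial push.
original = {
    1: {"name": "Alice", "department": "Engineering", "salary": 80000},
    2: {"name": "Bob", "department": "Sales", "salary": 60000},
    3: {"name": "Charlie", "department": "Engineering", "salary": 75000},
}

# State after the inserts, raises, delete, and department rename.
current = {
    1: {"name": "Alice", "department": "Product", "salary": 85000},
    3: {"name": "Charlie", "department": "Product", "salary": 80000},
    4: {"name": "Diana", "department": "Marketing", "salary": 70000},
    5: {"name": "Eve", "department": "Product", "salary": 85000},
}

changes = diff_snapshots(original, current)
print(changes)
# {'inserts': [4, 5], 'updates': [1, 3], 'deletes': [2]}
```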


## Tracked Operations
Inspect the detailed tracking information

In [9]:
tracker = employees.get_tracker()

print(f"Total operations tracked: {len(tracker.operations)}")
print(f"Has changes: {employees.has_changes()}\n")

print("Row-level changes:")
print(f"  Inserted rows: {len(tracker.get_inserts())}")
print(f"  Updated rows: {len(tracker.get_updates())}")
print(f"  Deleted rows: {len(tracker.get_deletes())}")

Total operations tracked: 5
Has changes: True

Row-level changes:
  Inserted rows: 2
  Updated rows: 2
  Deleted rows: 1


## Preview Before Push
See what the data looks like before committing

In [10]:
print("Current state (before push):")
employees.to_pandas()

Current state (before push):


| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Product | 85000 |
| 3 | Charlie | Product | 80000 |
| 4 | Diana | Marketing | 70000 |
| 5 | Eve | Product | 85000 |


### Statistics

In [11]:
print(f"Total rows in memory: {len(employees._data)}")
print(f"Rows to insert: {summary.get('inserts', 0)}")
print(f"Rows to update: {summary.get('updates', 0)}")
print(f"Rows to delete: {summary.get('deletes', 0)}")

Total rows in memory: 4
Rows to insert: 2
Rows to update: 2
Rows to delete: 1


## Push Changes
Commit all changes to the database in one transaction

In [12]:
employees.push()
print("✓ Changes pushed to database")
print(f"Has changes after push: {employees.has_changes()}")

# Verify from database
employees.pull()
print("\nVerified state from database:")
employees.to_pandas()

✓ Changes pushed to database
Has changes after push: False

Verified state from database:


| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Product | 85000 |
| 3 | Charlie | Product | 80000 |
| 4 | Diana | Marketing | 70000 |
| 5 | Eve | Product | 85000 |
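To illustrate what "one transaction" buys you, here is a minimal sketch using only the standard-library `sqlite3` module. It does not show how pandalchemy issues its SQL; `apply_changes` and `fetch_all` are hypothetical helpers. The point is that a batch of deletes, updates, and inserts either lands completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "Engineering", 80000),
        (2, "Bob", "Sales", 60000),
        (3, "Charlie", "Engineering", 75000),
    ],
)
conn.commit()


def apply_changes(conn, inserts, updates, deletes):
    """Apply all pending changes atomically (hypothetical helper)."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.executemany(
            "DELETE FROM employees WHERE id = ?", [(pk,) for pk in deletes]
        )
        conn.executemany(
            "UPDATE employees SET department = ?, salary = ? WHERE id = ?",
            updates,
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", inserts)


def fetch_all(conn):
    """Return every employee row ordered by primary key."""
    return conn.execute(
        "SELECT id, name, department, salary FROM employees ORDER BY id"
    ).fetchall()


# A batch that succeeds: all three kinds of change are committed together.
apply_changes(
    conn,
    inserts=[(4, "Diana", "Marketing", 70000), (5, "Eve", "Product", 85000)],
    updates=[("Product", 85000, 1), ("Product", 80000, 3)],
    deletes=[2],
)
rows = fetch_all(conn)

# A batch that fails: the duplicate primary key aborts the insert, and the
# delete that ran earlier in the same transaction is rolled back with it.
try:
    apply_changes(
        conn, inserts=[(1, "Duplicate", "Sales", 1)], updates=[], deletes=[3]
    )
except sqlite3.IntegrityError:
    pass
rows_after_failure = fetch_all(conn)

print(rows_after_failure == rows)  # True
```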


## More Complex Tracking
Making multiple changes in sequence

### Add another employee

In [13]:
employees.add_row({'id': 6, 'name': 'Frank', 'department': 'Sales', 'salary': 65000})
print("✓ Added Frank")

✓ Added Frank


### Update multiple employees

In [14]:
# Give raises to Product department
employees.update_where(
    employees._data['department'] == 'Product',
    {'salary': lambda x: x + 5000}
)
print("✓ Gave raises to Product department")

✓ Gave raises to Product department


### Check changes before pushing

In [15]:
summary2 = employees.get_changes_summary()
print("Change summary:")
for key, value in summary2.items():
    if value:
        print(f"  {key}: {value}")

Change summary:
  total_operations: 2
  inserts: 1
  updates: 3
  has_changes: True
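The summary reports two operations but three updates: `update_where` is a single call that touches every matching row. The following plain-Python sketch mimics that behaviour on a dictionary of rows. This `update_where` function is a hypothetical stand-in, not the library's method, and it takes a predicate function where the notebook passes a boolean mask.

```python
rows = {
    1: {"name": "Alice", "department": "Product", "salary": 85000},
    3: {"name": "Charlie", "department": "Product", "salary": 80000},
    4: {"name": "Diana", "department": "Marketing", "salary": 70000},
    5: {"name": "Eve", "department": "Product", "salary": 85000},
}


def update_where(rows, predicate, updates):
    """Apply `updates` to each row matching `predicate`; return touched keys."""
    touched = []
    for pk, row in rows.items():
        if predicate(row):
            for column, value in updates.items():
                # A callable transforms the old value; anything else replaces it.
                row[column] = value(row[column]) if callable(value) else value
            touched.append(pk)
    return touched


# One call (one operation) that updates three rows.
touched = update_where(
    rows,
    lambda row: row["department"] == "Product",
    {"salary": lambda x: x + 5000},
)
print(touched)  # [1, 3, 5]
```

The resulting salaries (90000, 85000, 90000) match the table shown after the next push.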


### Push all changes

In [16]:
employees.push()
print("✓ All changes committed")

employees.to_pandas()

✓ All changes committed


| id | name | department | salary |
|---|---|---|---|
| 1 | Alice | Product | 90000 |
| 3 | Charlie | Product | 85000 |
| 4 | Diana | Marketing | 70000 |
| 5 | Eve | Product | 90000 |
| 6 | Frank | Sales | 65000 |


## Change Tracking Lifecycle

### After pull, tracker is reset

In [17]:
employees.pull()
print(f"Has changes after pull: {employees.has_changes()}")
print(f"Change summary: {employees.get_changes_summary()}")

Has changes after pull: False
Change summary: {'total_operations': 0, 'inserts': 0, 'updates': 0, 'deletes': 0, 'columns_added': 0, 'columns_dropped': 0, 'columns_renamed': 0, 'columns_type_changed': 0, 'has_changes': False}


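### Make new changes

Alice's salary is already 90000 after the Product raise, so writing 90000 again changes nothing. As the output shows, the call is recorded (`total_operations: 1`), but no net update is detected and `has_changes()` stays `False`.
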
In [18]:
employees.update_row(1, {'salary': 90000})
print("\nAfter new change:")
print(f"  Has changes: {employees.has_changes()}")
print(f"  Changes: {employees.get_changes_summary()}")


After new change:
  Has changes: False
  Changes: {'total_operations': 1, 'inserts': 0, 'updates': 0, 'deletes': 0, 'columns_added': 0, 'columns_dropped': 0, 'columns_renamed': 0, 'columns_type_changed': 0, 'has_changes': False}
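The output above records one operation but no update. That would be expected if the tracker keeps an operation log and separately compares current values with the originals. The sketch below illustrates the distinction in plain Python. It is an assumption about the general technique, not a description of pandalchemy's internals, and this `update_row` function is a hypothetical stand-in.

```python
original = {1: {"name": "Alice", "salary": 90000}}
current = {pk: dict(row) for pk, row in original.items()}
operations = []  # log of every call, whether or not it changes anything


def update_row(pk, values):
    """Record the call, then apply it to the in-memory copy."""
    operations.append(("update", pk, values))
    current[pk].update(values)


# Writing the value the row already has is logged but produces no difference.
update_row(1, {"salary": 90000})

net_updates = [pk for pk in current if current[pk] != original[pk]]
print(len(operations), net_updates)  # 1 []
```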


## Working with Change Details

In [19]:
# Make multiple operations
employees.update_row(4, {'department': 'Sales'})
employees.update_row(5, {'salary': 88000})
employees.delete_row(6)

tracker = employees.get_tracker()

# Show affected IDs
update_ids = [rc.primary_key_value for rc in tracker.get_updates()]
delete_ids = [rc.primary_key_value for rc in tracker.get_deletes()]

print(f"Inserted rows: {len(tracker.get_inserts())}")
print(f"Updated rows: {len(tracker.get_updates())}")
print(f"Deleted rows: {len(tracker.get_deletes())}")

if update_ids:
    print(f"\nUpdated row IDs: {', '.join(map(str, update_ids))}")
if delete_ids:
    print(f"Deleted row IDs: {', '.join(map(str, delete_ids))}")

employees.push()

Inserted rows: 0
Updated rows: 2
Deleted rows: 1

Updated row IDs: 4, 5
Deleted row IDs: 6
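When there are many changes, grouping the affected IDs by change type in one pass can be more convenient than building a list per type. The sketch below uses a hypothetical `RowChange` dataclass. Only the `primary_key_value` attribute comes from the cell above; the `change_type` field and the `ids_by_type` helper are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RowChange:
    """Hypothetical record of a single row-level change."""
    change_type: str          # assumed field: "insert", "update", or "delete"
    primary_key_value: int    # attribute name taken from the notebook


def ids_by_type(changes):
    """Group primary key values by change type, preserving input order."""
    grouped = defaultdict(list)
    for rc in changes:
        grouped[rc.change_type].append(rc.primary_key_value)
    return dict(grouped)


# The same changes reported in the output above.
changes = [
    RowChange("update", 4),
    RowChange("update", 5),
    RowChange("delete", 6),
]
print(ids_by_type(changes))
# {'update': [4, 5], 'delete': [6]}
```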


## Summary

**Key Takeaways:**
- Inserts, updates, and deletes are tracked automatically as you modify the table
- Use `get_changes_summary()` to see what will be executed
- Use `has_changes()` to check if push is needed
- Tracker is reset after `push()` or `pull()`
- Access detailed change information via `get_tracker()`
- Changes are organized by type: inserts, updates, deletes, and schema changes (columns added, dropped, renamed, or retyped)