# SQLAlchemy ORM: Database Interactions with Python

This notebook covers **SQLAlchemy**, a widely used Python SQL toolkit and Object-Relational Mapper (ORM).

Topics covered:
1. **Core vs ORM**: Two ways to use SQLAlchemy
2. **Creating tables** with ORM models
3. **CRUD operations**: Create, Read, Update, Delete
4. **Querying** with filters, joins, and aggregations
5. **Relationships**: One-to-many, many-to-many
6. **Pandas integration**: read_sql, to_sql
7. **Connection pooling** and session management
8. **Migrations** with Alembic (briefly)

We'll use **SQLite** for simplicity (no server required), but the same concepts apply to PostgreSQL, MySQL, and other databases.

In [None]:
import sqlalchemy as sa
from sqlalchemy import create_engine, Column, Integer, String, Float, DateTime, ForeignKey, Table
from sqlalchemy.orm import declarative_base, Session, relationship
from sqlalchemy.pool import StaticPool
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from pathlib import Path

print(f"SQLAlchemy version: {sa.__version__}")

# Create output directory
output_dir = Path('../fixtures/output')
output_dir.mkdir(exist_ok=True, parents=True)

## 1. Create Database Engine

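The **Engine** is the starting point for any SQLAlchemy application: it holds the database URL, selects the dialect, and manages a pool of connections that sessions and queries borrow on demand.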

In [None]:
# SQLite in-memory database (for testing)
engine = create_engine(
    'sqlite:///:memory:',
    echo=False,  # Set to True to see SQL queries
    poolclass=StaticPool
)

# For persistent storage, use:
# engine = create_engine('sqlite:///my_database.db')

# For PostgreSQL:
# engine = create_engine('postgresql://user:password@localhost/dbname')

# For MySQL:
# engine = create_engine('mysql+pymysql://user:password@localhost/dbname')

print(f"Engine created: {engine}")
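
The engine can also be used directly through SQLAlchemy Core, without any ORM models. The following is a minimal sketch, assuming a separate throwaway in-memory engine (`demo_engine` is a hypothetical name, not part of this notebook's state):

```python
from sqlalchemy import create_engine, text

# Hypothetical throwaway engine, independent of the notebook's `engine`
demo_engine = create_engine("sqlite:///:memory:")

# Core usage: borrow a connection, run a textual statement, read one value
with demo_engine.connect() as conn:
    answer = conn.execute(text("SELECT 1 + 1 AS answer")).scalar_one()

print(f"1 + 1 = {answer}")
```

The connection is returned to the pool when the `with` block exits.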

## 2. Define ORM Models

ORM models are Python classes that map to database tables.

In [None]:
# Create base class for declarative models
Base = declarative_base()

# Define Customer model
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(100), nullable=False)
    email = Column(String(100), unique=True, nullable=False)
    country = Column(String(50))
    signup_date = Column(DateTime, default=datetime.utcnow)
    loyalty_tier = Column(String(20), default='Bronze')
    
    # Relationship to orders (one-to-many)
    orders = relationship('Order', back_populates='customer')
    
    def __repr__(self):
        return f"<Customer(id={self.id}, name='{self.name}', email='{self.email}')>"


# Define Product model
class Product(Base):
    __tablename__ = 'products'
    
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(200), nullable=False)
    category = Column(String(50))
    price = Column(Float, nullable=False)
    stock_quantity = Column(Integer, default=0)
    
    # Relationship to order items
    order_items = relationship('OrderItem', back_populates='product')
    
    def __repr__(self):
        return f"<Product(id={self.id}, name='{self.name}', price={self.price})>"


# Define Order model
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True, autoincrement=True)
    customer_id = Column(Integer, ForeignKey('customers.id'), nullable=False)
    order_date = Column(DateTime, default=datetime.utcnow)
    status = Column(String(20), default='Pending')
    shipping_cost = Column(Float, default=0.0)
    
    # Relationships
    customer = relationship('Customer', back_populates='orders')
    items = relationship('OrderItem', back_populates='order', cascade='all, delete-orphan')
    
    def __repr__(self):
        return f"<Order(id={self.id}, customer_id={self.customer_id}, status='{self.status}')>"


# Define OrderItem model (association object that links Orders and Products many-to-many)
class OrderItem(Base):
    __tablename__ = 'order_items'
    
    id = Column(Integer, primary_key=True, autoincrement=True)
    order_id = Column(Integer, ForeignKey('orders.id'), nullable=False)
    product_id = Column(Integer, ForeignKey('products.id'), nullable=False)
    quantity = Column(Integer, nullable=False, default=1)
    unit_price = Column(Float, nullable=False)
    
    # Relationships
    order = relationship('Order', back_populates='items')
    product = relationship('Product', back_populates='order_items')
    
    def __repr__(self):
        return f"<OrderItem(order_id={self.order_id}, product_id={self.product_id}, qty={self.quantity})>"


print("ORM Models defined:")
print(f"  - Customer")
print(f"  - Product")
print(f"  - Order")
print(f"  - OrderItem")

## 3. Create Tables

Use `Base.metadata.create_all()` to create all tables defined in our models.

In [None]:
# Create all tables
Base.metadata.create_all(engine)

print("Tables created successfully!")
print(f"\nTable names: {list(Base.metadata.tables.keys())}")
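
To check what `create_all()` actually produced, the database can be inspected at runtime. This is a small sketch under stated assumptions: `DemoBase`, `DemoCustomer`, and `demo_engine` are hypothetical stand-ins so that the example runs on its own.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    """Hypothetical model used only for this illustration."""
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    email = Column(String(100), unique=True, nullable=False)

# StaticPool keeps a single connection so the in-memory database persists
demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

# The inspector reads the schema from the database, not from the models
inspector = inspect(demo_engine)
table_names = inspector.get_table_names()
column_names = [col["name"] for col in inspector.get_columns("demo_customers")]

print(f"Tables: {table_names}")
print(f"Columns: {column_names}")
```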

## 4. CRUD Operations: Create (Insert)

Insert data using ORM instances.

In [None]:
# Create a session
session = Session(engine)

# Insert customers
customers = [
    Customer(name='Alice Johnson', email='alice@example.com', country='USA', loyalty_tier='Gold'),
    Customer(name='Bob Smith', email='bob@example.com', country='UK', loyalty_tier='Silver'),
    Customer(name='Charlie Davis', email='charlie@example.com', country='Canada', loyalty_tier='Platinum'),
    Customer(name='Diana Lee', email='diana@example.com', country='USA', loyalty_tier='Bronze'),
    Customer(name='Eve Martinez', email='eve@example.com', country='Germany', loyalty_tier='Gold'),
]

session.add_all(customers)
session.commit()

print(f"Inserted {len(customers)} customers")
for customer in customers:
    print(f"  {customer}")
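
Because `email` is declared `unique=True`, inserting a duplicate raises an `IntegrityError` at commit time, and the session must be rolled back before it can be used again. A minimal sketch, assuming hypothetical `DemoCustomer` and `demo_engine` names so that it runs independently of the cells above:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    """Hypothetical model with a unique email column."""
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    email = Column(String(100), unique=True, nullable=False)

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

duplicate_rejected = False
with Session(demo_engine) as demo_session:
    demo_session.add(DemoCustomer(email="alice@example.com"))
    demo_session.commit()

    # A second row with the same email violates the UNIQUE constraint
    demo_session.add(DemoCustomer(email="alice@example.com"))
    try:
        demo_session.commit()
    except IntegrityError:
        demo_session.rollback()  # reset the session so it is usable again
        duplicate_rejected = True

    remaining = demo_session.query(DemoCustomer).count()

print(f"Duplicate rejected: {duplicate_rejected}, rows stored: {remaining}")
```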

In [None]:
# Insert products
products = [
    Product(name='Laptop Pro 15', category='Electronics', price=1299.99, stock_quantity=50),
    Product(name='Wireless Mouse', category='Accessories', price=29.99, stock_quantity=200),
    Product(name='Mechanical Keyboard', category='Accessories', price=89.99, stock_quantity=150),
    Product(name='27" Monitor', category='Electronics', price=349.99, stock_quantity=75),
    Product(name='USB-C Hub', category='Accessories', price=49.99, stock_quantity=120),
    Product(name='Webcam HD', category='Electronics', price=79.99, stock_quantity=60),
    Product(name='Headphones', category='Accessories', price=199.99, stock_quantity=90),
]

session.add_all(products)
session.commit()

print(f"Inserted {len(products)} products")

In [None]:
# Create orders with items
np.random.seed(42)

for i in range(20):
    # Create order
    customer = np.random.choice(customers)
    order = Order(
        customer=customer,
        order_date=datetime.utcnow() - timedelta(days=np.random.randint(0, 90)),
        status=np.random.choice(['Pending', 'Shipped', 'Delivered', 'Cancelled'], p=[0.1, 0.3, 0.5, 0.1]),
        shipping_cost=round(np.random.uniform(5, 30), 2)
    )
    
    # Add items to order
    n_items = np.random.randint(1, 4)
    selected_products = np.random.choice(products, n_items, replace=False)
    
    for product in selected_products:
        item = OrderItem(
            order=order,
            product=product,
            quantity=np.random.randint(1, 4),
            unit_price=product.price
        )
        session.add(item)
    
    session.add(order)

session.commit()
print("Created 20 orders with items")

## 5. CRUD Operations: Read (Query)

In [None]:
# Get all customers
all_customers = session.query(Customer).all()
print(f"Total customers: {len(all_customers)}")
for customer in all_customers[:3]:
    print(f"  {customer}")

In [None]:
# Get customer by ID
# Session.get() replaces the legacy Query.get() as of SQLAlchemy 1.4
customer = session.get(Customer, 1)
print(f"Customer #1: {customer}")

# Alternative using filter
customer = session.query(Customer).filter(Customer.id == 1).first()
print(f"Customer #1 (using filter): {customer}")

In [None]:
# Filter queries
gold_customers = session.query(Customer).filter(Customer.loyalty_tier == 'Gold').all()
print(f"Gold tier customers: {len(gold_customers)}")
for customer in gold_customers:
    print(f"  {customer.name} - {customer.country}")

In [None]:
# Multiple filters (AND)
usa_gold = session.query(Customer).filter(
    Customer.country == 'USA',
    Customer.loyalty_tier == 'Gold'
).all()

print(f"USA Gold customers: {len(usa_gold)}")
for customer in usa_gold:
    print(f"  {customer}")

In [None]:
# OR conditions
from sqlalchemy import or_

premium_customers = session.query(Customer).filter(
    or_(
        Customer.loyalty_tier == 'Gold',
        Customer.loyalty_tier == 'Platinum'
    )
).all()

print(f"Premium customers (Gold or Platinum): {len(premium_customers)}")

In [None]:
# LIKE query
products_with_usb = session.query(Product).filter(Product.name.like('%USB%')).all()
print("Products containing 'USB':")
for product in products_with_usb:
    print(f"  {product}")

In [None]:
# Ordering
expensive_products = session.query(Product).order_by(Product.price.desc()).limit(5).all()
print("Top 5 most expensive products:")
for product in expensive_products:
    print(f"  {product.name}: ${product.price}")

In [None]:
# Count
total_orders = session.query(Order).count()
delivered_orders = session.query(Order).filter(Order.status == 'Delivered').count()

print(f"Total orders: {total_orders}")
print(f"Delivered orders: {delivered_orders}")
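
The cells above use the `Query` API, which SQLAlchemy 1.4 and later treat as legacy. The same kinds of reads can be written with `select()`. The sketch below assumes a hypothetical `DemoCustomer` model and `demo_engine` so that it is self-contained; it is one way to express these queries, not the only one.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    """Hypothetical model used only for this illustration."""
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    tier = Column(String(20))

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add_all([
        DemoCustomer(name="Alice", tier="Gold"),
        DemoCustomer(name="Bob", tier="Silver"),
        DemoCustomer(name="Eve", tier="Gold"),
    ])
    demo_session.commit()

    # Filter and order: select() builds the statement, the session executes it
    stmt = (
        select(DemoCustomer)
        .where(DemoCustomer.tier == "Gold")
        .order_by(DemoCustomer.name)
    )
    gold_names = [c.name for c in demo_session.execute(stmt).scalars()]

    # Count rows matching a filter
    count_stmt = (
        select(func.count())
        .select_from(DemoCustomer)
        .where(DemoCustomer.tier == "Gold")
    )
    gold_count = demo_session.execute(count_stmt).scalar_one()

    # Primary-key lookup
    first_name = demo_session.get(DemoCustomer, 1).name

print(gold_names, gold_count, first_name)
```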

## 6. Working with Relationships

In [None]:
# Access related objects (one-to-many)
customer = session.query(Customer).first()
print(f"Customer: {customer.name}")
print(f"Number of orders: {len(customer.orders)}")
print("\nOrders:")
for order in customer.orders:
    print(f"  Order #{order.id}: {order.status}, {len(order.items)} items")

In [None]:
# Navigate through relationships
order = session.query(Order).first()
print(f"Order #{order.id}")
print(f"Customer: {order.customer.name}")
print(f"Items in order:")
for item in order.items:
    print(f"  {item.product.name}: {item.quantity} x ${item.unit_price}")
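
Relationship attributes are lazy-loaded by default, so looping over parents and touching a collection on each one issues an extra SELECT per parent (the N+1 pattern). A sketch of eager loading with `selectinload`, assuming hypothetical `DemoCustomer` and `DemoOrder` models; the event listener is there only to count the statements sent to the database.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)
    orders = relationship("DemoOrder", back_populates="customer")

class DemoOrder(DemoBase):
    __tablename__ = "demo_orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("demo_customers.id"), nullable=False)
    customer = relationship("DemoCustomer", back_populates="orders")

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as setup_session:
    setup_session.add_all([
        DemoCustomer(name="Alice", orders=[DemoOrder(), DemoOrder()]),
        DemoCustomer(name="Bob", orders=[DemoOrder()]),
    ])
    setup_session.commit()

# Record every statement executed from here on
statements = []

@event.listens_for(demo_engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)

with Session(demo_engine) as demo_session:
    loaded = (
        demo_session.query(DemoCustomer)
        .options(selectinload(DemoCustomer.orders))  # load all orders in one extra SELECT
        .order_by(DemoCustomer.id)
        .all()
    )
    # Accessing .orders no longer triggers a query per customer
    order_counts = {c.name: len(c.orders) for c in loaded}

print(order_counts)
print(f"Statements issued: {len(statements)}")
```

Without the `selectinload` option, the same loop would issue one SELECT for the customers plus one per customer for their orders.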

## 7. Joins and Aggregations

In [None]:
# Join tables
from sqlalchemy import func

results = session.query(
    Customer.name,
    func.count(Order.id).label('order_count')
).join(Order).group_by(Customer.id, Customer.name).all()

print("Orders per customer:")
for name, count in results:
    print(f"  {name}: {count} orders")
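
An inner join drops customers who have no orders at all. If those customers should appear with a count of zero, an outer join is one option. A minimal sketch, assuming hypothetical `DemoCustomer` and `DemoOrder` models:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)

class DemoOrder(DemoBase):
    __tablename__ = "demo_orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("demo_customers.id"), nullable=False)

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add_all([
        DemoCustomer(id=1, name="Alice"),
        DemoCustomer(id=2, name="Bob"),  # Bob never orders anything
    ])
    demo_session.flush()
    demo_session.add_all([DemoOrder(customer_id=1), DemoOrder(customer_id=1)])
    demo_session.commit()

    # LEFT OUTER JOIN keeps customers without orders; COUNT ignores the NULL ids
    rows = (
        demo_session.query(DemoCustomer.name, func.count(DemoOrder.id))
        .outerjoin(DemoOrder, DemoOrder.customer_id == DemoCustomer.id)
        .group_by(DemoCustomer.id, DemoCustomer.name)
        .order_by(DemoCustomer.name)
        .all()
    )
    orders_per_customer = {name: count for name, count in rows}

print(orders_per_customer)
```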

In [None]:
# Complex join with aggregation
# Joining OrderItem yields one row per item, so count DISTINCT order ids
results = session.query(
    Customer.name,
    func.count(Order.id.distinct()).label('num_orders'),
    func.sum(OrderItem.quantity * OrderItem.unit_price).label('total_spent')
).join(Order).join(OrderItem).filter(
    Order.status == 'Delivered'
).group_by(Customer.id, Customer.name).order_by(
    func.sum(OrderItem.quantity * OrderItem.unit_price).desc()
).all()

print("Customer spending (delivered orders only):")
for name, num_orders, total in results:
    print(f"  {name}: {num_orders} orders, ${total:.2f} total")

## 8. CRUD Operations: Update

In [None]:
# Update single record
customer = session.query(Customer).filter(Customer.email == 'alice@example.com').first()
print(f"Before: {customer.name} - {customer.loyalty_tier}")

customer.loyalty_tier = 'Platinum'
session.commit()

# Verify
customer = session.query(Customer).filter(Customer.email == 'alice@example.com').first()
print(f"After: {customer.name} - {customer.loyalty_tier}")

In [None]:
# Bulk update
affected = session.query(Product).filter(Product.category == 'Accessories').update(
    {Product.stock_quantity: Product.stock_quantity + 50}
)
session.commit()

print(f"Updated stock for {affected} accessories")

## 9. CRUD Operations: Delete

In [None]:
# Delete single record
# First, create a test product
test_product = Product(name='Test Product', category='Test', price=1.0, stock_quantity=0)
session.add(test_product)
session.commit()

product_id = test_product.id
print(f"Created test product with ID: {product_id}")

# Delete it
session.delete(test_product)
session.commit()

# Verify deletion
deleted = session.get(Product, product_id)  # returns None once the row is gone
print(f"Product after deletion: {deleted}")

In [None]:
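# Bulk delete
# Query.delete() emits a single DELETE and bypasses ORM cascades, so remove
# the dependent order_items rows first to avoid leaving orphans behind
cancelled_ids = [order_id for (order_id,) in session.query(Order.id).filter(Order.status == 'Cancelled')]
session.query(OrderItem).filter(OrderItem.order_id.in_(cancelled_ids)).delete(synchronize_session=False)
affected = session.query(Order).filter(Order.status == 'Cancelled').delete(synchronize_session=False)
session.commit()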

print(f"Deleted {affected} cancelled orders")
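
Bulk `Query.delete()` sends one DELETE statement and does not apply ORM-level cascades. Deleting an object through the session does. The sketch below assumes hypothetical `DemoOrder` and `DemoItem` models that mirror the `cascade='all, delete-orphan'` setting used on `Order.items`:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoOrder(DemoBase):
    __tablename__ = "demo_orders"
    id = Column(Integer, primary_key=True)
    status = Column(String(20), default="Cancelled")
    items = relationship("DemoItem", back_populates="order", cascade="all, delete-orphan")

class DemoItem(DemoBase):
    __tablename__ = "demo_items"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("demo_orders.id"), nullable=False)
    order = relationship("DemoOrder", back_populates="items")

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    order = DemoOrder(items=[DemoItem(), DemoItem()])
    demo_session.add(order)
    demo_session.commit()
    items_before = demo_session.query(DemoItem).count()

    # session.delete() loads the children and deletes them along with the parent
    demo_session.delete(order)
    demo_session.commit()
    items_after = demo_session.query(DemoItem).count()

print(f"Items before: {items_before}, after: {items_after}")
```

The trade-off is cost: the session issues per-object statements, whereas a bulk delete is a single statement that leaves child rows for you to handle.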

## 10. Pandas Integration

pandas can read query results into a DataFrame and write a DataFrame to a table through a SQLAlchemy engine or connection.

In [None]:
# Read SQL query into DataFrame
df_customers = pd.read_sql_query(
    'SELECT * FROM customers',
    engine
)

print(f"Customers DataFrame:")
df_customers.head()

In [None]:
# Complex query to DataFrame
query = """
    SELECT 
        c.name,
        c.country,
        c.loyalty_tier,
        COUNT(DISTINCT o.id) as num_orders,
        SUM(oi.quantity * oi.unit_price) as total_spent
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    LEFT JOIN order_items oi ON o.id = oi.order_id
    GROUP BY c.id, c.name, c.country, c.loyalty_tier
    ORDER BY total_spent DESC
"""

df_customer_stats = pd.read_sql_query(query, engine)
print("Customer statistics:")
df_customer_stats.head()
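
When a query depends on user-supplied values, bound parameters are safer than formatting values into the SQL string. A minimal sketch, assuming a hypothetical `demo_customers` table created only for this example; the statement is wrapped in `text()` and the values are passed through `params`:

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)

# Build a tiny table with Core; begin() commits when the block exits
with demo_engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE demo_customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    ))
    conn.execute(
        text("INSERT INTO demo_customers (name, country) VALUES (:name, :country)"),
        [
            {"name": "Alice", "country": "USA"},
            {"name": "Bob", "country": "UK"},
            {"name": "Diana", "country": "USA"},
        ],
    )

# :country is a bound parameter; the driver handles quoting and escaping
query = text(
    "SELECT name, country FROM demo_customers WHERE country = :country ORDER BY name"
)
with demo_engine.connect() as conn:
    df_usa = pd.read_sql_query(query, conn, params={"country": "USA"})

print(df_usa)
```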

In [None]:
# Write DataFrame to database
new_products = pd.DataFrame([
    {'name': 'Laptop Stand', 'category': 'Accessories', 'price': 39.99, 'stock_quantity': 100},
    {'name': 'Cable Organizer', 'category': 'Accessories', 'price': 12.99, 'stock_quantity': 200},
    {'name': 'Desk Lamp', 'category': 'Accessories', 'price': 45.99, 'stock_quantity': 80},
])

# Append to products table
new_products.to_sql('products', engine, if_exists='append', index=False)

print(f"Added {len(new_products)} new products")

# Verify
total_products = session.query(Product).count()
print(f"Total products now: {total_products}")

## 11. Converting ORM Queries to DataFrames

In [None]:
# Convert ORM query to DataFrame
query = session.query(Product).filter(Product.price > 50)
df_expensive = pd.read_sql(query.statement, engine)

print(f"Products over $50:")
df_expensive.head(10)

## 12. Connection Pooling

Every engine owns a connection pool that reuses open database connections instead of creating a new one for each operation.

In [None]:
# Create engine with connection pool settings
from sqlalchemy.pool import QueuePool

# Example with PostgreSQL (commented out)
# pooled_engine = create_engine(
#     'postgresql://user:password@localhost/dbname',
#     poolclass=QueuePool,
#     pool_size=5,          # Number of connections to maintain
#     max_overflow=10,      # Max additional connections beyond pool_size
#     pool_timeout=30,      # Timeout for getting connection from pool
#     pool_recycle=3600,    # Recycle connections after 1 hour
# )

print("Connection pool configuration:")
print(f"  Pool class: {engine.pool.__class__.__name__}")
print(f"  Pool status: {engine.pool.status()}")
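
The pool settings in the commented-out example can be tried without a database server. The following sketch assumes a temporary file-based SQLite database (the `pool_demo.db` filename and `pooled_engine` name are hypothetical) and an explicitly requested `QueuePool`:

```python
import tempfile
from pathlib import Path

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = Path(tmp_dir) / "pool_demo.db"  # hypothetical throwaway database
    pooled_engine = create_engine(
        f"sqlite:///{db_path}",
        poolclass=QueuePool,
        pool_size=2,      # connections kept open in the pool
        max_overflow=1,   # extra connections allowed under load
        pool_timeout=5,   # seconds to wait for a free connection
    )

    with pooled_engine.connect() as conn:
        conn.execute(text("SELECT 1"))
        checked_out_during = pooled_engine.pool.checkedout()

    # Leaving the block returns the connection to the pool
    checked_out_after = pooled_engine.pool.checkedout()
    pool_size = pooled_engine.pool.size()

    pooled_engine.dispose()  # close pooled connections before the file is removed

print(f"size={pool_size}, during={checked_out_during}, after={checked_out_after}")
```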

## 13. Session Management Best Practices

In [None]:
# Context manager for automatic session cleanup
from sqlalchemy.orm import sessionmaker

SessionLocal = sessionmaker(bind=engine)

# Good practice: use context manager
def get_customer_orders(customer_id):
    with SessionLocal() as session:
        customer = session.get(Customer, customer_id)
        if customer:
            return [(order.id, order.status) for order in customer.orders]
        return []

orders = get_customer_orders(1)
print(f"Customer #1 orders: {orders}")
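
For write operations, `sessionmaker.begin()` combines session cleanup with transaction handling: the block commits on success and rolls back if an exception escapes. A minimal sketch, assuming hypothetical `DemoCustomer`, `DemoSession`, and `demo_engine` names:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)
DemoSession = sessionmaker(bind=demo_engine)

# Success path: committed automatically when the block exits
with DemoSession.begin() as demo_session:
    demo_session.add(DemoCustomer(name="Alice"))

# Failure path: the exception triggers a rollback, so Bob is never stored
try:
    with DemoSession.begin() as demo_session:
        demo_session.add(DemoCustomer(name="Bob"))
        raise ValueError("simulated failure")
except ValueError:
    pass

with DemoSession() as demo_session:
    names = [c.name for c in demo_session.query(DemoCustomer).order_by(DemoCustomer.id)]

print(names)
```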

## 14. Advanced Querying Techniques

In [None]:
# Aggregate filter with HAVING

# Find customers whose total spending exceeds twice the average order-item value
avg_spending = session.query(
    func.avg(OrderItem.quantity * OrderItem.unit_price)
).join(Order).scalar()

print(f"Average spending per order item: ${avg_spending:.2f}")

high_spenders = session.query(Customer).join(Order).join(OrderItem).group_by(
    Customer.id
).having(
    func.sum(OrderItem.quantity * OrderItem.unit_price) > avg_spending * 2
).all()

print(f"\nHigh spenders (>2x average):")
for customer in high_spenders:
    print(f"  {customer.name}")
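
The comparison above uses the average value of a single order item. To compare each customer's total against the average customer total instead, the per-customer totals can be built as a subquery. This is a sketch under stated assumptions: `DemoCustomer`, `DemoOrder`, and the integer `amount` column are hypothetical and simplified.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

DemoBase = declarative_base()

class DemoCustomer(DemoBase):
    __tablename__ = "demo_customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False)

class DemoOrder(DemoBase):
    __tablename__ = "demo_orders"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("demo_customers.id"), nullable=False)
    amount = Column(Integer, nullable=False)

demo_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool)
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add_all([
        DemoCustomer(id=1, name="Alice"),
        DemoCustomer(id=2, name="Bob"),
        DemoCustomer(id=3, name="Eve"),
    ])
    demo_session.flush()
    demo_session.add_all([
        DemoOrder(customer_id=1, amount=10),
        DemoOrder(customer_id=1, amount=20),
        DemoOrder(customer_id=2, amount=100),
        DemoOrder(customer_id=3, amount=5),
    ])
    demo_session.commit()

    # Subquery: one row per customer with that customer's total
    totals = (
        select(DemoOrder.customer_id, func.sum(DemoOrder.amount).label("total"))
        .group_by(DemoOrder.customer_id)
        .subquery()
    )

    # Average of the per-customer totals
    avg_total = demo_session.execute(select(func.avg(totals.c.total))).scalar_one()

    # Join the subquery back to customers and keep those above the average
    stmt = (
        select(DemoCustomer.name)
        .join(totals, totals.c.customer_id == DemoCustomer.id)
        .where(totals.c.total > avg_total)
        .order_by(DemoCustomer.name)
    )
    above_average = demo_session.execute(stmt).scalars().all()

print(f"Average customer total: {avg_total}")
print(f"Above average: {above_average}")
```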

In [None]:
# IN clause with list
target_countries = ['USA', 'UK', 'Canada']
customers = session.query(Customer).filter(Customer.country.in_(target_countries)).all()

print(f"Customers from {target_countries}:")
for customer in customers:
    print(f"  {customer.name} ({customer.country})")

In [None]:
# EXISTS subquery
from sqlalchemy import exists

# Find customers who have placed at least one order
customers_with_orders = session.query(Customer).filter(
    exists().where(Order.customer_id == Customer.id)
).all()

print(f"Customers with orders: {len(customers_with_orders)}")

## 15. Summary and Best Practices

### Key Concepts:

1. **Engine**: Database connection factory
2. **Session**: Unit of work for database operations
3. **Models**: Python classes mapped to tables
4. **Relationships**: Define how tables relate to each other
5. **Queries**: Composable Python expressions that compile to SQL

### Best Practices:

1. **Use context managers** for sessions
   ```python
   with Session(engine) as session:
       # Do work
       session.commit()
   ```

2. **Always commit or rollback**
   - `session.commit()` - save changes
   - `session.rollback()` - discard changes

3. **Use eager loading** for related objects to avoid N+1 queries
   ```python
   from sqlalchemy.orm import joinedload
   session.query(Customer).options(joinedload(Customer.orders))
   ```

4. **Connection pooling** for production
   - Configure `pool_size` and `max_overflow`
   - Set `pool_recycle` for long-running apps

5. **Use migrations** (Alembic) for schema changes
   - Track schema evolution
   - Safe upgrades/downgrades

6. **Index frequently queried columns**
   ```python
   email = Column(String(100), index=True)
   ```

7. **Pandas integration**
   - Use `pd.read_sql()` for queries
   - Use `df.to_sql()` for bulk inserts

8. **Choose Core vs ORM appropriately**
   - **ORM**: Application logic, relationships, complex models
   - **Core**: Performance-critical bulk operations, analytics

In [None]:
# Final statistics
stats = {
    'Customers': session.query(Customer).count(),
    'Products': session.query(Product).count(),
    'Orders': session.query(Order).count(),
    'Order Items': session.query(OrderItem).count(),
}

print("Database Statistics:")
for table, count in stats.items():
    print(f"  {table}: {count}")

# Close session
session.close()
print("\nSession closed. Notebook complete!")