# SQLAlchemy ORM

![ORM Layer](./images/orm_layer.png)

ORM stands for Object Relational Mapper: a layer that maps database rows to Python objects. When programming, we often prefer to work with objects rather than primitive types, at the cost of some flexibility and transparency into the underlying SQL.

As a general rule of thumb:
- Core is better suited for analytical queries where we expect to get back many rows 
- ORM is better suited for applications where we often only need to work with one to a handful of rows at a time
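
As a rough illustration of that difference, here is a minimal sketch against an in-memory SQLite database. The `Pet` model, the `demo_*` names and the data are made up for this example; how to define mapped classes is covered in the next section.

```python
import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

DemoBase = declarative_base()  # hypothetical base, only used in this sketch


class Pet(DemoBase):  # hypothetical model
    __tablename__ = "pets"

    pet_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)


demo_engine = sa.create_engine("sqlite://", future=True)  # in-memory database
DemoBase.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add_all([Pet(name="Rex"), Pet(name="Tom")])
    demo_session.commit()

# Core: select against the Table and get plain rows back
pets_table = Pet.__table__
with demo_engine.connect() as conn:
    core_rows = conn.execute(
        sa.select(pets_table.c.name).order_by(pets_table.c.name)
    ).all()

# ORM: select against the class and get a Pet instance back
with Session(demo_engine) as demo_session:
    rex = demo_session.execute(
        sa.select(Pet).where(Pet.name == "Rex")
    ).scalar_one()

print([row.name for row in core_rows])
print(type(rex).__name__, rex.name)
```

The Core query hands back lightweight rows, which suits bulk reads; the ORM query hands back an object we can modify and pass around.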

## Defining the tables

As we're using the ORM, we need to define the classes (**O**bjects) that will map to the database. There are a few different ways to do this [mapping](https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#mapping-python-classes) in SQLAlchemy, but the classic way is to create a Base class, and inherit from that. 

You'll see this in lots of pre-2.0 codebases, so it's important to recognize and understand it.

In [None]:
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base() # This does not make type checkers and linters happy

class MyClass(Base):
    __tablename__ = "demo_table"
    
    # Note that ORM classes must define at least one primary_key
    class_id: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)

One of the many changes on the road to SQLAlchemy 2.0 (available since 1.4) is the ability to register classes through a decorator, which can feel more in line with `dataclass` and `attrs` based classes.

In [None]:
from sqlalchemy.orm import registry

mapper_registry = registry()

Given the registry, we can now define classes for our data models. Unlike with `dataclasses` and `attrs`, the type hints are optional (for now), but they give some extra type safety.

These are regular classes (with extras), so we can add things like a `__repr__` to print them nicely. In fact, they look a lot like `dataclasses` and `attrs` classes, long before either existed!

In [None]:
@mapper_registry.mapped
class Address:
    __tablename__ = "addresses"
    
    address_id: int = sa.Column(sa.Integer, primary_key=True)
    street_name: str = sa.Column(sa.VARCHAR(50))
    street_number: int = sa.Column(sa.Integer)
    postnr: str = sa.Column(sa.VARCHAR(4))
    
    # Add a nice string representation - it's just a class!
    def __repr__(self):
        return f"<Address street_name={self.street_name} street_number={self.street_number} postnr={self.postnr}>"

One extra is that the ORM layer autogenerates a SQLAlchemy `Table` and assigns it to the `__table__` attribute. This is the same kind of `Table` instance that we saw in Core.

In [None]:
Address.__table__
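
To make the link to Core concrete, here is a small sketch showing that the generated object is an ordinary `Table` registered in the registry's metadata. The `Venue` model and `demo_registry` are hypothetical names used only for this illustration.

```python
import sqlalchemy as sa
from sqlalchemy.orm import registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Venue:  # hypothetical model
    __tablename__ = "venues"

    venue_id = sa.Column(sa.Integer, primary_key=True)
    city = sa.Column(sa.VARCHAR(50))


venue_table = Venue.__table__

# The generated object is a regular Core Table...
print(isinstance(venue_table, sa.Table))
# ...stored in the registry's MetaData under the table name...
print(demo_registry.metadata.tables["venues"] is venue_table)
# ...and it can be used in Core-style statements
print(sa.select(venue_table.c.city).where(venue_table.c.venue_id == 1))
```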

Note that in both instances, we're not defining an `__init__` - SQLAlchemy will automatically generate one, though we can always add one if we want to, usually to be able to run some extra logic.
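
A minimal sketch of both options, assuming hypothetical `Tag` and `NormalizedTag` models: the generated constructor accepts mapped attribute names as keyword arguments and rejects anything else, while a hand-written `__init__` is used as-is.

```python
import sqlalchemy as sa
from sqlalchemy.orm import registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Tag:  # relies on the generated __init__
    __tablename__ = "tags"

    tag_id = sa.Column(sa.Integer, primary_key=True)
    label = sa.Column(sa.String)


@demo_registry.mapped
class NormalizedTag:  # supplies its own __init__ to run extra logic
    __tablename__ = "normalized_tags"

    tag_id = sa.Column(sa.Integer, primary_key=True)
    label = sa.Column(sa.String)

    def __init__(self, label):
        self.label = label.strip().lower()


tag = Tag(label="Sale")
print(tag.label, tag.tag_id)  # attributes we did not pass are None until saved

try:
    Tag(colour="red")  # not a mapped attribute
    rejected = False
except TypeError:
    rejected = True
print(rejected)

print(NormalizedTag("  Sale ").label)
```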

Let's finish our models - we can add a Purchase object and a Customer object and relate them:

In [None]:
import decimal
from sqlalchemy.orm import relationship

@mapper_registry.mapped
class Purchase:
    __tablename__ = "purchases"
    __table_args__ = {"extend_existing": True}
    
    purchase_id: int = sa.Column(sa.Integer, primary_key=True)
    item_name: str = sa.Column(sa.VARCHAR(200))
    price: decimal.Decimal = sa.Column(sa.Numeric(19, 4))
    user_id: int = sa.Column(sa.Integer, sa.ForeignKey("customers.customer_id"))
    
    def __repr__(self):
        return f"<Purchase item_name={self.item_name}>"

We can also use other native Python types such as enums and decimals. SQLAlchemy handles the conversion between SQL and Python datatypes in both directions.

In [None]:
import enum

class StatusEnum(str, enum.Enum):
    gold = "gold"
    silver = "silver"
    bronze = "bronze"

In [None]:
@mapper_registry.mapped
class Customer:
    __tablename__ = "customers"
    __table_args__ = {"extend_existing": True}
    
    customer_id: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.VARCHAR(50), unique=True)
    status: StatusEnum = sa.Column(sa.Enum(StatusEnum))
    address_id: int = sa.Column(sa.Integer, sa.ForeignKey("addresses.address_id"))
    
    # Each customer has one address (the foreign key lives on customers)
    address: Address = relationship("Address", backref="customer")
    
    # One-to-many
    purchases: list[Purchase] = relationship("Purchase", backref="customer")
    
    def __repr__(self):
        return f"<Customer name={self.name}>"
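
To see the enum conversion in isolation, here is a small round-trip sketch on an in-memory SQLite database. The `Tier` enum and `Member` model are hypothetical. The member names and values deliberately differ, to show that `sa.Enum` stores the member name by default.

```python
import enum

import sqlalchemy as sa
from sqlalchemy.orm import Session, registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


class Tier(enum.Enum):  # hypothetical enum: names and values differ on purpose
    gold = "G"
    bronze = "B"


@demo_registry.mapped
class Member:  # hypothetical model
    __tablename__ = "members"

    member_id = sa.Column(sa.Integer, primary_key=True)
    tier = sa.Column(sa.Enum(Tier))


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add(Member(tier=Tier.gold))
    demo_session.commit()

# Through the ORM we get the Python enum member back
with Session(demo_engine) as demo_session:
    loaded_tier = demo_session.execute(sa.select(Member.tier)).scalar_one()

# Bypassing the type with raw SQL shows what is stored in the column
with demo_engine.connect() as conn:
    raw_value = conn.execute(sa.text("SELECT tier FROM members")).scalar_one()

print(loaded_tier, raw_value)
```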

## Relationships
Here we are taking advantage of one of the main benefits of using an ORM - we can set up attributes to represent foreign key relationships!

A relationship allows us to issue SQL to select a related collection - essentially selecting the relevant rows from the other table.

To demonstrate, let's start by creating the tables and inserting some data

In [None]:
# If you have Docker installed and haven't already started the container, uncomment the next line
# !docker run -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:14-alpine
conn_string = "postgresql://postgres:postgres@localhost:5432"

# Otherwise, use the sqlite conn_string
# conn_string = "sqlite:///parking.db"

Since we are now "hiding" away the SQL, let's turn on `echo` to see what SQL statements are being issued under the hood - this is a great way to build understanding of what your ORM is doing for you and catch it when it does something you weren't expecting!

In [None]:
engine = sa.create_engine(conn_string, future=True, echo=True)

Since ORM builds on top of Core, we still use the engine and the metadata as we did before

In [None]:
mapper_registry.metadata.create_all(engine)
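
As a small sketch of what `create_all` does, using a hypothetical `Invoice` model and an in-memory SQLite engine: the tables registered in the metadata are created, and tables that already exist are skipped, so the call can be repeated safely.

```python
import sqlalchemy as sa
from sqlalchemy.orm import registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Invoice:  # hypothetical model
    __tablename__ = "invoices"

    invoice_id = sa.Column(sa.Integer, primary_key=True)


demo_engine = sa.create_engine("sqlite://", future=True)

tables_before = sa.inspect(demo_engine).get_table_names()
demo_registry.metadata.create_all(demo_engine)
demo_registry.metadata.create_all(demo_engine)  # existing tables are skipped
tables_after = sa.inspect(demo_engine).get_table_names()

print(tables_before, tables_after)
```

This is why the notebook can call `create_all` again later, after adding new models, without touching the tables that are already there.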

Let's create two instances of Customer - note that they're plain Python objects at this point

In [None]:
john = Customer(name="John", status=StatusEnum.gold)
jane = Customer(name="Jane", status=StatusEnum.bronze)

## Session

When working with the ORM, we use a Session instead of a connection. The session knows how to work with ORM-enabled classes, and serves as a local map of the various instances, keeping track of which instances have changes to be sent to the database, which instances are new and which are current. 

In [None]:
from sqlalchemy.orm import Session

In [None]:
with Session(engine) as session:
    # Unit-of-work in action
    session.add(john)
    session.add(jane)
    # Still have to actively commit
    session.commit()
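
The bookkeeping described above can be inspected directly. The following is a minimal sketch with a hypothetical `Note` model on an in-memory SQLite database; `Session.new` and `Session.dirty` are the collections the session uses to track pending inserts and modified instances.

```python
import sqlalchemy as sa
from sqlalchemy.orm import Session, registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Note:  # hypothetical model
    __tablename__ = "notes"

    note_id = sa.Column(sa.Integer, primary_key=True)
    body = sa.Column(sa.String)


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    note = Note(body="draft")
    demo_session.add(note)
    was_pending = note in demo_session.new  # queued for INSERT, nothing sent yet
    demo_session.commit()  # flushes the INSERT, then commits

    is_clean = note not in demo_session.new and note not in demo_session.dirty
    note.body = "final"
    was_dirty = note in demo_session.dirty  # queued for UPDATE
    demo_session.commit()
    saved_id = note.note_id

print(was_pending, is_clean, was_dirty, saved_id)
```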

Let's add an address to John's account

In [None]:
address = Address(street_name="Bogholder Allè", street_number=15, postnr="2720")

In [None]:
# Assign the Address instance to the attribute defined as a relationship
john.address = address

In [None]:
with Session(engine) as session:
    session.add(john)
    session.commit()

John now goes shopping

In [None]:
potion = Purchase(item_name="Magic Potion", price=decimal.Decimal("20.00"), customer=john)

In [None]:
with Session(engine) as session:
    session.add(potion)
    session.commit()

Let's add one more purchase:

In [None]:
magic_hat = Purchase(item_name="Magic Hat", price=100)

In [None]:
with Session(engine) as session:
    # Need to connect john to this session
    session.add(john)
    # purchases is a one-to-many relationship, so SQLAlchemy represents it as a list
    john.purchases.append(magic_hat)
    # magic_hat is saved along with john, so no second add() is needed
    session.commit()

Now that we have some data, how do we select from the database? The same way as in Core!

In [None]:
sql = sa.select(Customer).filter_by(name="Jane")
print(sql)

In [None]:
with Session(engine) as session:
    jane = session.execute(sql).one_or_none()

In [None]:
jane

The result of our query is a `Row` object, same as in Core. In ORM mode, though, we're usually interested in the scalar result: the value in the first column of each row, which here is the `Customer` instance itself.

SQLAlchemy supports this through the `scalars` modifier, as well as `scalar_*` helpers such as `scalar_one_or_none`.

In [None]:
with Session(engine) as session:
    jane = session.execute(sql).scalars().one_or_none()

In [None]:
jane

In [None]:
with Session(engine) as session:
    jane = session.execute(sql).scalar_one_or_none()

In [None]:
jane

If we know the primary key, `Session.get` provides an efficient way of looking up an instance by primary key.

In [None]:
with Session(engine) as session:
    jane2 = session.get(Customer, jane.customer_id)

In [None]:
jane2
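
Part of what makes this lookup efficient is the session's identity map: if the instance is already loaded in the session, `get` returns it without emitting SQL. The sketch below counts emitted statements with an event listener; the `City` model and the `demo_*` names are hypothetical.

```python
import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.orm import Session, registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class City:  # hypothetical model
    __tablename__ = "cities"

    city_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

statements = []


@event.listens_for(demo_engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Collect every SQL string sent to the database
    statements.append(statement)


with Session(demo_engine) as demo_session:
    demo_session.add(City(name="Aarhus"))
    demo_session.commit()

with Session(demo_engine) as demo_session:
    first = demo_session.execute(
        sa.select(City).where(City.name == "Aarhus")
    ).scalar_one()
    statements.clear()

    second = demo_session.get(City, first.city_id)  # served from the identity map
    same_object = first is second
    queries_for_get = len(statements)

    missing = demo_session.get(City, 999)  # unknown key: emits a SELECT, returns None

print(same_object, queries_for_get, missing)
```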

Now that we fetched our customer, we can ask questions about related attributes

In [None]:
with Session(engine) as session:
    session.add(john)
    print("Purchases:\t", john.purchases)
    print("Address:\t", john.address)

(Notice what happens before each print statement in the SQL logs)

If we check a regular attribute, there's no SQL being emitted

In [None]:
john.status

## Relationship loading
To access the relationship attributes, we need to be inside a session, since SQLAlchemy relationships are lazy-loading by default.

A lazy-loaded relationship emits an additional SQL query the first time it is accessed, which avoids loading all related data into memory up front. The relationship can be configured to be loaded in [different](https://docs.sqlalchemy.org/en/14/orm/loading_relationships.html#relationship-loading-techniques) ways, defined either in the relationship constructor or as `select` options.

In [None]:
from sqlalchemy.orm import joinedload, selectinload

with Session(engine) as session:
    sql = sa.select(Customer).options(joinedload(Customer.address), selectinload(Customer.purchases)).where(Customer.name == "John")
    john = session.execute(sql).unique().scalar_one()

In [None]:
john

Alternatively, we can define the relationship to be something other than `lazy` - let's add a `loyalty_points` table that will make a record of how many loyalty points a given purchase has

In [None]:
@mapper_registry.mapped
class LoyaltyPoints:
    __tablename__ = "loyalty_points"
    __table_args__ = {"extend_existing": True}
    
    loyalty_point_id: int = sa.Column(sa.Integer, primary_key=True)
    customer_id: int = sa.Column(sa.Integer, sa.ForeignKey("customers.customer_id"))
    purchase_id: int = sa.Column(sa.Integer, sa.ForeignKey("purchases.purchase_id"))
    total_points: int = sa.Column(sa.Integer)
    
    # Each record points at one purchase; lazy="joined" loads it in the same SELECT via a JOIN
    purchase: Purchase = relationship("Purchase", backref="points", lazy="joined")
    
    # Each record points at one customer; lazy="selectin" loads it with a second SELECT ... IN query
    customer: Customer = relationship("Customer", backref="points", lazy="selectin")

First we have to create our new table

In [None]:
mapper_registry.metadata.create_all(engine)

Let's add some loyalty points

In [None]:
with Session(engine) as session:
    sql = sa.select(Customer).filter_by(name="John")
    john = session.execute(sql).scalar_one()
    loyalty_purchase = john.purchases[0]
    loyalty_points = LoyaltyPoints(customer=john, purchase=loyalty_purchase, total_points=1000)
    session.add(loyalty_points)
    session.commit()

Let's see what happens when we select John's loyalty points

In [None]:
with Session(engine) as session:
    sql = sa.select(LoyaltyPoints).where(LoyaltyPoints.customer.has(Customer.name == "John"))
    john_points = session.execute(sql).scalar_one()

In [None]:
john_points.purchase

Relationship loading is one of the biggest benefits of ORMs, but can also be the easiest way to shoot yourself in the foot. Be mindful of your relationship loading strategies!
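
One common way to get hurt is the "N+1" pattern: looping over a result and touching a lazy relationship on every row. The sketch below counts the statements emitted with and without `selectinload`; the `Author` and `Book` models and the `count_statements` helper are hypothetical and run against an in-memory SQLite database.

```python
import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.orm import Session, registry, relationship, selectinload

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Author:  # hypothetical model
    __tablename__ = "authors"

    author_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    books = relationship("Book", backref="author")  # lazy by default


@demo_registry.mapped
class Book:  # hypothetical model
    __tablename__ = "books"

    book_id = sa.Column(sa.Integer, primary_key=True)
    title = sa.Column(sa.String)
    author_id = sa.Column(sa.Integer, sa.ForeignKey("authors.author_id"))


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

statements = []


@event.listens_for(demo_engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(demo_engine) as demo_session:
    demo_session.add_all(
        [
            Author(name=f"author-{i}", books=[Book(title="a"), Book(title="b")])
            for i in range(3)
        ]
    )
    demo_session.commit()


def count_statements(stmt):
    """Run stmt, touch every author's books, return how many statements ran."""
    with Session(demo_engine) as demo_session:
        statements.clear()
        for author in demo_session.execute(stmt).scalars():
            len(author.books)
        return len(statements)


lazy_count = count_statements(sa.select(Author))
eager_count = count_statements(
    sa.select(Author).options(selectinload(Author.books))
)
print(lazy_count, eager_count)
```

With lazy loading the number of statements grows with the number of authors (one for the authors plus one per author), while `selectinload` stays at two regardless of how many rows come back.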

## ORMs are classes

The nice thing about working with ORM models is that they're just classes - you can add whatever methods you want, and use inheritance through mixins and other similar patterns.

We want to enforce that all our models have `last_updated` and `created_at` columns, but it can be pretty repetitive to add them manually.

In [None]:
import datetime as dt

class CreatedMixin:
    last_updated: dt.datetime = sa.Column(sa.DateTime, default=sa.func.now(), onupdate=sa.func.now())
    created_at: dt.datetime = sa.Column(sa.DateTime, default=sa.func.now())

If you haven't seen a mixin before: it's a class that exists only to add functionality to the classes that inherit from it. It extends those classes rather than being meant to be overridden.

In [None]:
@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0)

In [None]:
list(User.__table__.columns)

The User table has inherited all the columns, as we expected. This pattern can reduce boilerplate when using ORMs

In [None]:
user = User(name="Jade", role="admin")

Since `User` is a regular class, we can define a `classmethod` as an alternative constructor, for example to build an instance from a JSON payload

In [None]:
@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0) # Note that default here only applies to the table - not to the generated __init__ function
    
    @classmethod
    def from_dict(cls, data):
        return cls(name=data["UserName"], role="public", purchases=0)

In [None]:
user = User.from_dict({"UserName": "Jarvis"})
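
The note about `default` is easy to miss, so here is a minimal sketch of what it means in practice. The `Account` model is hypothetical and runs on an in-memory SQLite database: the attribute is `None` on a freshly constructed instance and only receives the default once the row is inserted.

```python
import sqlalchemy as sa
from sqlalchemy.orm import Session, registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Account:  # hypothetical model
    __tablename__ = "accounts"

    account_id = sa.Column(sa.Integer, primary_key=True)
    visits = sa.Column(sa.Integer, default=0)


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

account = Account()
before_flush = account.visits  # the generated __init__ does not apply the default

with Session(demo_engine) as demo_session:
    demo_session.add(account)
    demo_session.flush()  # the INSERT is emitted here, with the column default
    after_flush = account.visits

print(before_flush, after_flush)
```

This is why `from_dict` passes `purchases=0` explicitly: arithmetic such as `self.purchases += 1` on an unsaved instance would otherwise start from `None`.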

We can add properties to our class

In [None]:
@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0)
    
    @classmethod
    def from_dict(cls, data):
        return cls(name=data["UserName"], role="public", purchases=0)
    
    @property
    def is_admin(self):
        return self.role == "admin"

The instances have the defined property, just like we're used to

In [None]:
user = User(name="Jade", role="public")
user.is_admin

If we want to, we can also use the property in our queries by defining it as a `hybrid_property`. This lets us write `User.is_admin` to generate a SQL expression.

In [None]:
from sqlalchemy.ext.hybrid import hybrid_property

@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0)
    
    @classmethod
    def from_dict(cls, data):
        return cls(name=data["UserName"], role="public")
    
    @hybrid_property
    def is_admin(self):
        return self.role == "admin"

Let's create the tables, and try out with some SQL

In [None]:
mapper_registry.metadata.create_all(engine)

First, we create and add a user we can play with

In [None]:
user = User(name="Jade", role="admin")

In [None]:
session = Session(engine)
session.add(user)
session.commit() # using an open session for ease of demoing - always use context managers where possible!

In [None]:
sql = sa.select(User).where(User.is_admin)

In [None]:
print(sql)

Notice that the SQL statement includes the expression from our property in the WHERE clause.

Let's also verify the mixin defaults, while we're at it

In [None]:
admin_user = session.execute(sql).scalar_one_or_none()
print(f"Last updated: {admin_user.last_updated:%H:%M:%S}")

In [None]:
admin_user.name = "Jade Smith"
session.add(admin_user)
session.commit()
print(f"Last updated: {admin_user.last_updated:%H:%M:%S}")

Sometimes the SQL logic and the Python logic differ, and need to be written two different ways. Each hybrid_property can define an expression to be run when used inside a SQL statement.

In [None]:
@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0)
    
    @classmethod
    def from_dict(cls, data):
        return cls(name=data["UserName"], role="public")
    
    @hybrid_property
    def is_admin(self):
        return self.role == "admin"
    
    @hybrid_property
    def is_validated(self):
        return self.role in ["public", "admin"]
        
    @is_validated.expression # This provides the SQL definition of is_validated
    def is_validated(cls):
        return cls.role.in_(["public", "admin"])


In [None]:
user = User(name="Jade", role="public")

In [None]:
user.is_validated

In [None]:
sql = sa.select(User).where(User.is_validated)
print(sql)

In [None]:
validated_users = session.execute(sql).scalars().all()

In [None]:
validated_users[0].name

So we now have Python logic mapped to both SQL and our local Python instance. So far it's been a simple property; what about methods that take arguments?

In [None]:
from sqlalchemy.ext.hybrid import hybrid_method

@mapper_registry.mapped
class User(CreatedMixin):
    __tablename__ = "users"
    __table_args__ = {"extend_existing": True} 
    
    primary: int = sa.Column(sa.Integer, primary_key=True)
    name: str = sa.Column(sa.String)
    role: str = sa.Column(sa.String)
    purchases: int = sa.Column(sa.Integer, default=0)
    
    @classmethod
    def from_dict(cls, data):
        return cls(name=data["UserName"], role="public")
    
    @hybrid_property
    def is_admin(self):
        return self.role == "admin"
    
    @hybrid_property
    def is_validated(self):
        return self.role in ["public", "admin"]
        
    @is_validated.expression
    def is_validated(cls):
        return cls.role.in_(["public", "admin"])

    # We can define arbitrary methods - handy for logic encapsulation!
    def purchase(self, session: Session, item_cost: int) -> int:
        self.purchases += item_cost
        session.add(self)
        return self.purchases
    
    @hybrid_method # Works on instances and, at class level, inside SQL statements
    def calculate_roi(self, total_cost: int) -> float:
        return (self.purchases - total_cost) / total_cost

We've added a regular `purchase` method, so let's try that first:

In [None]:
user = User(name="Jane", role="admin", purchases=0)

In [None]:
user.purchase(session, 2_000)
session.commit()

In [None]:
user.purchases

What if we want to use a calculation inside our SQL query? That's what `hybrid_method` does. It follows the same rules as `hybrid_property`, but works with arguments.

In [None]:
user.calculate_roi(total_cost=1000)

In [None]:
total_cost = 1000
sql = sa.select(User, User.calculate_roi(total_cost=total_cost)).where(User.calculate_roi(total_cost=total_cost) >= 1)

print(sql)

In [None]:
print([(user.name, user.calculate_roi(total_cost=total_cost)) for user in session.execute(sql).scalars()])
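
For a version that can be run without the Postgres container, here is a self-contained sketch of the same idea on an in-memory SQLite database. The `Shopper` model and its `spent_over` method are hypothetical; a comparison is used instead of a division so the Python and SQL sides behave identically.

```python
import sqlalchemy as sa
from sqlalchemy.ext.hybrid import hybrid_method
from sqlalchemy.orm import Session, registry

demo_registry = registry()  # hypothetical registry, only used in this sketch


@demo_registry.mapped
class Shopper:  # hypothetical model
    __tablename__ = "shoppers"

    shopper_id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String)
    spent = sa.Column(sa.Integer)

    @hybrid_method
    def spent_over(self, threshold):
        # On an instance this is a plain Python comparison;
        # on the class it builds a SQL expression
        return self.spent > threshold


demo_engine = sa.create_engine("sqlite://", future=True)
demo_registry.metadata.create_all(demo_engine)

with Session(demo_engine) as demo_session:
    demo_session.add_all(
        [Shopper(name="Ann", spent=500), Shopper(name="Bo", spent=50)]
    )
    demo_session.commit()

    # Class level: the method becomes part of the WHERE clause
    stmt = sa.select(Shopper).where(Shopper.spent_over(100))
    big_spenders = [s.name for s in demo_session.execute(stmt).scalars()]

    # Instance level: the same method evaluates in Python
    ann = demo_session.execute(stmt).scalar_one()
    python_side = ann.spent_over(100)

print(stmt)
print(big_spenders, python_side)
```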

In [None]:
# Since we didn't use Session in a context manager - we need to close it when we're done
session.close()