# Object Relational Mapping: SQLAlchemy

SQLAlchemy provides an Object Relational Mapper (ORM) that lets us interact with databases using Pythonic, object-oriented code rather than hand-written SQL queries. It supports many database backends; SQLite is the easiest to start with because Python's standard library already ships the ```sqlite3``` driver, so nothing else has to be installed. SQLite stores the entire database in a single file and allows only one writer at a time. It is thus not recommended for production web-based applications or larger projects. For those cases we will cover PostgreSQL support in the next lecture.
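
The backend is selected by the connection URL passed to ```create_engine()```. As a rough illustration, the sketch below only parses two URLs and does not connect to anything; the PostgreSQL user, password, host and database name are placeholders, not values from this course.

```python
from sqlalchemy.engine.url import make_url

# Hypothetical connection URLs; the PostgreSQL credentials are placeholders.
sqlite_url = make_url("sqlite:///chinook.db")
postgres_url = make_url("postgresql://user:secret@localhost:5432/chinook")

for url in (sqlite_url, postgres_url):
    # drivername selects the dialect; database is a file path (SQLite)
    # or a database name on the server (PostgreSQL).
    print(url.drivername, url.host, url.port, url.database)
```

Parsing a URL does not require the database driver to be installed; the driver is only needed once an engine actually connects.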

We use the [Chinook](https://github.com/lerocha/chinook-database) database as our example. The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices and customers.

In [25]:
import sqlalchemy as db

In [29]:
engine = db.create_engine('sqlite:///chinook.db')
connection = engine.connect()
metadata = db.MetaData()

# List tables in database
print(engine.table_names())

['albums', 'artists', 'customers', 'employees', 'genres', 'invoice_items', 'invoices', 'media_types', 'playlist_track', 'playlists', 'sqlite_sequence', 'sqlite_stat1', 'tracks']
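
Note that ```engine.table_names()``` was deprecated in SQLAlchemy 1.4 and removed in 2.0. A possible alternative is the inspector, sketched here against an in-memory SQLite database with two made-up tables so that it runs without ```chinook.db```. Newer versions may also hide SQLite's internal ```sqlite_*``` tables from this list.

```python
import sqlalchemy as db

# In-memory SQLite database so the sketch runs without chinook.db.
engine = db.create_engine("sqlite://")
metadata = db.MetaData()

# Two hypothetical tables, defined only so there is something to list.
db.Table("artists", metadata, db.Column("ArtistId", db.Integer, primary_key=True))
db.Table("albums", metadata, db.Column("AlbumId", db.Integer, primary_key=True))
metadata.create_all(engine)

# The inspector exposes schema information for the connected database.
inspector = db.inspect(engine)
table_names = inspector.get_table_names()
print(table_names)
```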


In [30]:
employees = db.Table('employees', metadata, autoload=True, autoload_with=engine)

The table 'employees' contains the following columns:

In [3]:
print(employees.columns.keys())

['EmployeeId', 'LastName', 'FirstName', 'Title', 'ReportsTo', 'BirthDate', 'HireDate', 'Address', 'City', 'State', 'Country', 'PostalCode', 'Phone', 'Fax', 'Email']


The full metadata for the table contains additional information, for example the data types of the columns.

In [4]:
print(repr(metadata.tables['employees']))

Table('employees', MetaData(bind=None), Column('EmployeeId', INTEGER(), table=<employees>, primary_key=True, nullable=False), Column('LastName', NVARCHAR(length=20), table=<employees>, nullable=False), Column('FirstName', NVARCHAR(length=20), table=<employees>, nullable=False), Column('Title', NVARCHAR(length=30), table=<employees>), Column('ReportsTo', INTEGER(), ForeignKey('employees.EmployeeId'), table=<employees>), Column('BirthDate', DATETIME(), table=<employees>), Column('HireDate', DATETIME(), table=<employees>), Column('Address', NVARCHAR(length=70), table=<employees>), Column('City', NVARCHAR(length=40), table=<employees>), Column('State', NVARCHAR(length=40), table=<employees>), Column('Country', NVARCHAR(length=40), table=<employees>), Column('PostalCode', NVARCHAR(length=10), table=<employees>), Column('Phone', NVARCHAR(length=24), table=<employees>), Column('Fax', NVARCHAR(length=24), table=<employees>), Column('Email', NVARCHAR(length=60), table=<employees>), schema=None)
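
The ```repr``` above is hard to read. The same information can be pulled out column by column, as in this small sketch. It uses a hand-written, cut-down stand-in for the ```employees``` table (an assumption made so that the example needs no database file); a reflected table exposes the same attributes.

```python
import sqlalchemy as db

metadata = db.MetaData()

# Hypothetical, cut-down version of the employees table.
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
    db.Column("LastName", db.String(20), nullable=False),
    db.Column("ReportsTo", db.Integer, db.ForeignKey("employees.EmployeeId")),
)

# Each Column object carries its name, type and constraints.
for column in employees.columns:
    print(column.name, column.type, column.nullable, column.primary_key)

column_names = [column.name for column in employees.columns]
```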

### Querying the DB

Now let's see what some common SQL queries look like:

SQL:

- SELECT * FROM employees

In [5]:
query = db.select([employees])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet[:3]

[(1, 'Adams', 'Andrew', 'General Manager', None, datetime.datetime(1962, 2, 18, 0, 0), datetime.datetime(2002, 8, 14, 0, 0), '11120 Jasper Ave NW', 'Edmonton', 'AB', 'Canada', 'T5K 2N1', '+1 (780) 428-9482', '+1 (780) 428-3457', 'andrew@chinookcorp.com'),
 (2, 'Edwards', 'Nancy', 'Sales Manager', 1, datetime.datetime(1958, 12, 8, 0, 0), datetime.datetime(2002, 5, 1, 0, 0), '825 8 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 2T3', '+1 (403) 262-3443', '+1 (403) 262-3322', 'nancy@chinookcorp.com'),
 (3, 'Peacock', 'Jane', 'Sales Support Agent', 2, datetime.datetime(1973, 8, 29, 0, 0), datetime.datetime(2002, 4, 1, 0, 0), '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com')]

```ResultProxy``` here is the _object_ returned by the ```execute()``` method. It can be used to fetch rows in several ways, while ```ResultSet``` holds the actual rows we fetched from ```ResultProxy``` (in this case, all of them).
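
The result object can also be iterated directly, and each row gives access to its columns by name. The following is a minimal sketch, assuming SQLAlchemy 1.4 or newer (where ```select()``` takes the table directly rather than a list) and a made-up two-row table in an in-memory database.

```python
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
    db.Column("LastName", db.String(20)),
    db.Column("City", db.String(40)),
)
metadata.create_all(engine)

# engine.begin() commits the inserts when the block exits.
with engine.begin() as connection:
    connection.execute(employees.insert(), [
        {"EmployeeId": 1, "LastName": "Adams", "City": "Edmonton"},
        {"EmployeeId": 2, "LastName": "Edwards", "City": "Calgary"},
    ])

query = db.select(employees).order_by(employees.columns.EmployeeId)
with engine.connect() as connection:
    result = connection.execute(query)
    # Iterating the result yields one row at a time;
    # columns are available as attributes on each row.
    summary = [f"{row.LastName} ({row.City})" for row in result]

print(summary)
```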

In [11]:
# Use fetchmany to fetch a specific number of entries (useful for large db)
ResultProxy = connection.execute(query)
# fetch first 3 entries
ResultSet = ResultProxy.fetchmany(3)
print(ResultSet)
# fetch next 3 entries
ResultSet = ResultProxy.fetchmany(3)
print(ResultSet)

[(1, 'Adams', 'Andrew', 'General Manager', None, datetime.datetime(1962, 2, 18, 0, 0), datetime.datetime(2002, 8, 14, 0, 0), '11120 Jasper Ave NW', 'Edmonton', 'AB', 'Canada', 'T5K 2N1', '+1 (780) 428-9482', '+1 (780) 428-3457', 'andrew@chinookcorp.com'), (2, 'Edwards', 'Nancy', 'Sales Manager', 1, datetime.datetime(1958, 12, 8, 0, 0), datetime.datetime(2002, 5, 1, 0, 0), '825 8 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 2T3', '+1 (403) 262-3443', '+1 (403) 262-3322', 'nancy@chinookcorp.com'), (3, 'Peacock', 'Jane', 'Sales Support Agent', 2, datetime.datetime(1973, 8, 29, 0, 0), datetime.datetime(2002, 4, 1, 0, 0), '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com')]
[(4, 'Park', 'Margaret', 'Sales Support Agent', 2, datetime.datetime(1947, 9, 19, 0, 0), datetime.datetime(2003, 5, 3, 0, 0), '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com'), (5
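
For a large table, the two ```fetchmany(3)``` calls above generalise to a loop that keeps fetching until an empty batch comes back. This is a sketch under the same assumptions as before: SQLAlchemy 1.4 or newer, and seven made-up rows in an in-memory table.

```python
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
)
metadata.create_all(engine)

with engine.begin() as connection:
    # Seven hypothetical rows with ids 1..7.
    connection.execute(
        employees.insert(), [{"EmployeeId": i} for i in range(1, 8)]
    )

batch_sizes = []
query = db.select(employees).order_by(employees.columns.EmployeeId)
with engine.connect() as connection:
    result = connection.execute(query)
    while True:
        batch = result.fetchmany(3)
        if not batch:  # an empty list means the result is exhausted
            break
        batch_sizes.append(len(batch))

print(batch_sizes)
```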

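### Converting to pandas DataFrame

For those familiar with the ```pandas``` package, a result set can easily be turned into a DataFrame. Note that ```ResultProxy``` was already partly consumed by the two ```fetchmany(3)``` calls above, so ```fetchall()``` here returns only the remaining two rows: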

In [12]:
import pandas as pd
ResultSet = ResultProxy.fetchall()
df = pd.DataFrame(ResultSet)
df.columns = ResultSet[0].keys()
print(df)

   EmployeeId  LastName FirstName     Title  ReportsTo  BirthDate   HireDate  \
0           7      King    Robert  IT Staff          6 1970-05-29 2004-01-02   
1           8  Callahan     Laura  IT Staff          6 1968-01-09 2004-03-04   

                       Address        City State Country PostalCode  \
0  590 Columbia Boulevard West  Lethbridge    AB  Canada    T1K 5N8   
1                  923 7 ST NW  Lethbridge    AB  Canada    T1H 1Y8   

               Phone                Fax                   Email  
0  +1 (403) 456-9986  +1 (403) 456-8485  robert@chinookcorp.com  
1  +1 (403) 467-3351  +1 (403) 467-8772   laura@chinookcorp.com  
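
Taking the column names from ```ResultSet[0]``` fails when the query returns no rows. One way around this is to ask the result object itself for its keys. The sketch below assumes SQLAlchemy 1.4 or newer and ```pandas``` being installed, and uses a made-up two-row table rather than the Chinook data.

```python
import pandas as pd
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
    db.Column("LastName", db.String(20)),
)
metadata.create_all(engine)

with engine.begin() as connection:
    connection.execute(employees.insert(), [
        {"EmployeeId": 7, "LastName": "King"},
        {"EmployeeId": 8, "LastName": "Callahan"},
    ])

query = db.select(employees).order_by(employees.columns.EmployeeId)
with engine.connect() as connection:
    result = connection.execute(query)
    # Column names come from the result, so this also works for zero rows.
    column_names = list(result.keys())
    rows = [tuple(row) for row in result.fetchall()]

df = pd.DataFrame(rows, columns=column_names)
print(df)
```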


### Filtering

To list the city of every employee:

SQL:

- SELECT City FROM employees

In [13]:
query = db.select([employees.columns.City])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[('Edmonton',),
 ('Calgary',),
 ('Calgary',),
 ('Calgary',),
 ('Calgary',),
 ('Calgary',),
 ('Lethbridge',),
 ('Lethbridge',)]

SQL:

- SELECT * FROM employees WHERE City = 'Lethbridge';

In [14]:
query = db.select([employees]).where(employees.columns.City == 'Lethbridge')
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[(7, 'King', 'Robert', 'IT Staff', 6, datetime.datetime(1970, 5, 29, 0, 0), datetime.datetime(2004, 1, 2, 0, 0), '590 Columbia Boulevard West', 'Lethbridge', 'AB', 'Canada', 'T1K 5N8', '+1 (403) 456-9986', '+1 (403) 456-8485', 'robert@chinookcorp.com'),
 (8, 'Callahan', 'Laura', 'IT Staff', 6, datetime.datetime(1968, 1, 9, 0, 0), datetime.datetime(2004, 3, 4, 0, 0), '923 7 ST NW', 'Lethbridge', 'AB', 'Canada', 'T1H 1Y8', '+1 (403) 467-3351', '+1 (403) 467-8772', 'laura@chinookcorp.com')]

SQL:

- SELECT Title, ReportsTo FROM employees WHERE Title IN ('Sales Manager', 'IT Staff');

In [15]:
query = db.select([employees.columns.Title, employees.columns.ReportsTo]).where(employees.columns.Title.in_(['Sales Manager', 'IT Staff']))
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[('Sales Manager', 1), ('IT Staff', 6), ('IT Staff', 6)]

SQL:

- SELECT * FROM employees WHERE City = 'Calgary' AND NOT Title = 'IT Manager'

query = db.select([employees]).where(db.and_(employees.columns.City == 'Calgary', employees.columns.Title != 'IT Manager'))
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet
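
Conditions can be combined with ```db.or_()``` in the same way as with ```db.and_()```. Since no output is shown for the query above, here is a small self-contained sketch with four made-up employees (not the Chinook rows), assuming SQLAlchemy 1.4 or newer.

```python
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
    db.Column("LastName", db.String(20)),
    db.Column("Title", db.String(30)),
    db.Column("City", db.String(40)),
)
metadata.create_all(engine)

with engine.begin() as connection:
    # Hypothetical sample rows.
    connection.execute(employees.insert(), [
        {"EmployeeId": 1, "LastName": "Adams", "Title": "General Manager", "City": "Edmonton"},
        {"EmployeeId": 2, "LastName": "Edwards", "Title": "Sales Manager", "City": "Calgary"},
        {"EmployeeId": 3, "LastName": "Mitchell", "Title": "IT Manager", "City": "Calgary"},
        {"EmployeeId": 4, "LastName": "King", "Title": "IT Staff", "City": "Lethbridge"},
    ])

cols = employees.columns
base = db.select(cols.LastName).order_by(cols.EmployeeId)

# City = 'Calgary' AND Title != 'IT Manager'
and_query = base.where(db.and_(cols.City == "Calgary", cols.Title != "IT Manager"))
# City = 'Edmonton' OR Title IN ('IT Staff')
or_query = base.where(db.or_(cols.City == "Edmonton", cols.Title.in_(["IT Staff"])))

with engine.connect() as connection:
    and_names = [row[0] for row in connection.execute(and_query)]
    or_names = [row[0] for row in connection.execute(or_query)]

print(and_names, or_names)
```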

### Ordering

SQL:
    
- SELECT * FROM employees ORDER BY HireDate, LastName

In [20]:
query = db.select([employees]).order_by(db.asc(employees.columns.HireDate), employees.columns.LastName)
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[(3, 'Peacock', 'Jane', 'Sales Support Agent', 2, datetime.datetime(1973, 8, 29, 0, 0), datetime.datetime(2002, 4, 1, 0, 0), '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com'),
 (2, 'Edwards', 'Nancy', 'Sales Manager', 1, datetime.datetime(1958, 12, 8, 0, 0), datetime.datetime(2002, 5, 1, 0, 0), '825 8 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 2T3', '+1 (403) 262-3443', '+1 (403) 262-3322', 'nancy@chinookcorp.com'),
 (1, 'Adams', 'Andrew', 'General Manager', None, datetime.datetime(1962, 2, 18, 0, 0), datetime.datetime(2002, 8, 14, 0, 0), '11120 Jasper Ave NW', 'Edmonton', 'AB', 'Canada', 'T5K 2N1', '+1 (780) 428-9482', '+1 (780) 428-3457', 'andrew@chinookcorp.com'),
 (4, 'Park', 'Margaret', 'Sales Support Agent', 2, datetime.datetime(1947, 9, 19, 0, 0), datetime.datetime(2003, 5, 3, 0, 0), '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com'),
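
Descending order works the same way with ```db.desc()```. In this sketch the most recent hire comes first and ```LastName``` breaks ties. The rows are made up for illustration (one of them is invented to create a tie), and SQLAlchemy 1.4 or newer is assumed.

```python
import datetime

import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
employees = db.Table(
    "employees", metadata,
    db.Column("EmployeeId", db.Integer, primary_key=True),
    db.Column("LastName", db.String(20)),
    db.Column("HireDate", db.Date),
)
metadata.create_all(engine)

with engine.begin() as connection:
    # Hypothetical rows; "Baker" shares a hire date with "Adams".
    connection.execute(employees.insert(), [
        {"EmployeeId": 1, "LastName": "Peacock", "HireDate": datetime.date(2002, 4, 1)},
        {"EmployeeId": 2, "LastName": "Edwards", "HireDate": datetime.date(2002, 5, 1)},
        {"EmployeeId": 3, "LastName": "Baker", "HireDate": datetime.date(2002, 8, 14)},
        {"EmployeeId": 4, "LastName": "Adams", "HireDate": datetime.date(2002, 8, 14)},
    ])

cols = employees.columns
# ORDER BY HireDate DESC, LastName
query = db.select(cols.LastName).order_by(db.desc(cols.HireDate), cols.LastName)

with engine.connect() as connection:
    ordered_names = [row[0] for row in connection.execute(query)]

print(ordered_names)
```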


Let's have a look at another table for the next set of examples:

In [31]:
invoices = db.Table('invoices', metadata, autoload=True, autoload_with=engine)
print(invoices.columns.keys())

['InvoiceId', 'CustomerId', 'InvoiceDate', 'BillingAddress', 'BillingCity', 'BillingState', 'BillingCountry', 'BillingPostalCode', 'Total']


### Functions

SQL:

- SELECT SUM(Total) FROM invoices

In [32]:
query = db.select([db.func.sum(invoices.columns.Total)])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[(Decimal('2328.60'),)]
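
Other SQL functions are reached through ```db.func``` in the same way. The sketch below combines ```COUNT```, ```MIN```, ```MAX``` and ```AVG``` in one query. It assumes SQLAlchemy 1.4 or newer and uses three made-up invoices with a ```Float``` column, so the values come back as plain floats rather than ```Decimal```.

```python
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
invoices = db.Table(
    "invoices", metadata,
    db.Column("InvoiceId", db.Integer, primary_key=True),
    db.Column("Total", db.Float),
)
metadata.create_all(engine)

with engine.begin() as connection:
    # Hypothetical invoice totals.
    connection.execute(invoices.insert(), [
        {"InvoiceId": 1, "Total": 2.0},
        {"InvoiceId": 2, "Total": 4.0},
        {"InvoiceId": 3, "Total": 6.0},
    ])

cols = invoices.columns
# SELECT COUNT(InvoiceId), MIN(Total), MAX(Total), AVG(Total) FROM invoices
query = db.select(
    db.func.count(cols.InvoiceId),
    db.func.min(cols.Total),
    db.func.max(cols.Total),
    db.func.avg(cols.Total),
)

with engine.connect() as connection:
    stats = tuple(connection.execute(query).fetchone())

print(stats)
```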

### Group by

SQL:

- SELECT SUM(Total) AS Total, BillingState FROM invoices GROUP BY BillingState

In [34]:
query = db.select([db.func.sum(invoices.columns.Total).label('Total'), invoices.columns.BillingState]).group_by(invoices.columns.BillingState)
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
ResultSet

[(Decimal('1150.00'), None),
 (Decimal('37.62'), 'AB'),
 (Decimal('37.62'), 'AZ'),
 (Decimal('38.62'), 'BC'),
 (Decimal('115.86'), 'CA'),
 (Decimal('37.62'), 'DF'),
 (Decimal('45.62'), 'Dublin'),
 (Decimal('39.62'), 'FL'),
 (Decimal('43.62'), 'IL'),
 (Decimal('37.62'), 'MA'),
 (Decimal('37.62'), 'MB'),
 (Decimal('37.62'), 'NS'),
 (Decimal('37.62'), 'NSW'),
 (Decimal('37.62'), 'NT'),
 (Decimal('37.62'), 'NV'),
 (Decimal('37.62'), 'NY'),
 (Decimal('75.24'), 'ON'),
 (Decimal('39.62'), 'QC'),
 (Decimal('37.62'), 'RJ'),
 (Decimal('37.62'), 'RM'),
 (Decimal('114.86'), 'SP'),
 (Decimal('47.62'), 'TX'),
 (Decimal('43.62'), 'UT'),
 (Decimal('40.62'), 'VV'),
 (Decimal('39.62'), 'WA'),
 (Decimal('42.62'), 'WI')]
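
Grouped results can be filtered with ```having()``` and sorted by the aggregate. This is a sketch only, assuming SQLAlchemy 1.4 or newer; the four invoices and the threshold of 5 are made up for illustration.

```python
import sqlalchemy as db

engine = db.create_engine("sqlite://")  # in-memory database
metadata = db.MetaData()
invoices = db.Table(
    "invoices", metadata,
    db.Column("InvoiceId", db.Integer, primary_key=True),
    db.Column("BillingState", db.String(40)),
    db.Column("Total", db.Float),
)
metadata.create_all(engine)

with engine.begin() as connection:
    # Hypothetical invoices.
    connection.execute(invoices.insert(), [
        {"InvoiceId": 1, "BillingState": "CA", "Total": 10.0},
        {"InvoiceId": 2, "BillingState": "CA", "Total": 5.0},
        {"InvoiceId": 3, "BillingState": "TX", "Total": 8.0},
        {"InvoiceId": 4, "BillingState": "NY", "Total": 1.0},
    ])

cols = invoices.columns
total = db.func.sum(cols.Total).label("Total")

# SELECT BillingState, SUM(Total) AS Total FROM invoices
# GROUP BY BillingState HAVING SUM(Total) > 5 ORDER BY Total DESC
query = (
    db.select(cols.BillingState, total)
    .group_by(cols.BillingState)
    .having(db.func.sum(cols.Total) > 5)
    .order_by(total.desc())
)

with engine.connect() as connection:
    totals_by_state = [tuple(row) for row in connection.execute(query)]

print(totals_by_state)
```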