<a href="https://colab.research.google.com/github/LUMC/EfDS_RelDB_SQL/blob/main/orm_practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object Relational Mapper

## Summary

### The main goal

The goal of this session is to learn how to provide a Python object oriented interface to a database.  
Since this is a complex task, specialised libraries help to provide such functionality.  
Here, we will use a part of the `SQLAlchemy` library called [Object Relational Mapper](https://docs.sqlalchemy.org/en/14/orm/index.html) (ORM).

*Note:* [The PYSheet cheatsheet](https://www.pythonsheets.com/notes/python-sqlalchemy.html) is useful once you understand the basic SQLAlchemy and Object Relational Mapper concepts.


### Overview of the steps

This lecture builds up the code step by step and is intended to be followed sequentially. The steps have the following goals:
- Create a class corresponding to a single database table (relations are ignored).
- Query the database and get object(s) of the class. Build simple and more complex queries.
- Extend the class with a method for nicer content printing.
- Create another class for another related table. Learn how to declare foreign keys and add methods representing relations.
- In a new transaction add new content to the database. Commit.
- In a new transaction show the newly added content.

### Some details

Below is (again) the diagram of the `chinook` database from [the SQLite Tutorial](https://www.sqlitetutorial.net).

![chinook scheme](https://www.sqlitetutorial.net/wp-content/uploads/2015/11/sqlite-sample-database-color.jpg)

We will implement classes providing access to a small part of the database:
- It should be possible to have classes like `Album` or `Track`.
- The class `Album` should have a field like `title`.
- The class `Album` should have a method `tracks` which returns respective `Track` objects for "the current" `Album` object.
- When a field is changed, the database should get updated.

## Preparation


The following tools/sources will be used and need to be installed or downloaded:
- `chinook.db` is the database (a file in `SQLite` database format)
- `SQLAlchemy` is the Python library which:
    - provides unified SQL access to databases of many formats, including `SQLite`
    - provides Object Relational Mapper functionality


### Download of the database

#### (For Jupyter or Colab) Download chinook.zip from SQLiteTutorial and unpack it

The following lines have been tested in Jupyter (works) and Colab (mostly works):

In [None]:
import urllib.request    # needed for download of the example database
import shutil            # needed for unzipping of the example database
import os                # for checking existence/removing of a file

if not os.path.isfile("chinook.db"):
    urllib.request.urlretrieve("https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip", "chinook.zip")
    shutil.unpack_archive("chinook.zip")
    os.remove("chinook.zip")
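
To check that the download produced a usable database, the tables in the file can be listed with Python's built-in `sqlite3` module. This is only a minimal sketch; the helper name `listTables` is an assumption made for this illustration and is not used elsewhere in the notebook.

```python
import os
import sqlite3

def listTables(dbPath):
    """Return the sorted names of all tables stored in an SQLite file."""
    con = sqlite3.connect(dbPath)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]

# Only inspect the file if it exists;
# sqlite3.connect() would otherwise create a new empty file.
if os.path.isfile("chinook.db"):
    print(listTables("chinook.db"))
```

If the list contains names such as `albums`, `artists` and `tracks`, the file is ready for the next steps.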

#### (For Colab only) Download chinook.zip from your Google Drive to Colab temp drive

If you use Google Colab and the above method fails, you may first manually place the `chinook.zip` file in your Google Drive and then try this approach:

In [None]:
from google.colab import drive    # needed to access your Google drive
import shutil            # needed for unzipping of the example database
import os                # for checking existence/removing of a file

if not os.path.isfile("chinook.db"):
    drive.mount('/content/gdrive')
    shutil.unpack_archive("/content/gdrive/MyDrive/chinook.zip")  # absolute path of the mounted drive

### SQLAlchemy components

The demonstrations will use multiple functions and classes of the SQLAlchemy library.

In [None]:
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, MetaData
from sqlalchemy.orm import relationship, declarative_base, sessionmaker

### Database connection and transaction

When using the SQLAlchemy library, the following objects are used to interact with the database. 
- `engine`: Provides a main connection to a database.
- `sessionMaker`: An object which provides a method to start a new transaction.
- `session`: A newly created transaction. Changes made through this object are written to the database only when its `commit()` method is called, and the session should be closed once it is no longer needed. The typical use is therefore `with sessionMaker() as session`, but for the moment let's create a global `session`.

In [None]:
engine = create_engine("sqlite:///chinook.db",echo=False)
sessionMaker = sessionmaker(bind=engine)
session = sessionMaker()
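
The `with` pattern mentioned above might look as follows. This is a sketch under assumptions: it uses a throwaway in-memory SQLite database (so `chinook.db` is not touched), and the names `demoEngine`, `demoSessionMaker` and `demoSession` are made up for this illustration.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# Assumption: an in-memory database used only for this demonstration.
demoEngine = create_engine("sqlite:///:memory:")
demoSessionMaker = sessionmaker(bind=demoEngine)

with demoSessionMaker() as demoSession:
    # Any SQL can be sent through the session; here a trivial query.
    answer = demoSession.execute(text("SELECT 1 + 1")).scalar()
    print(answer)
# Leaving the block closes demoSession; it does not commit anything.
```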

## Class describing a table

This is a code pattern to describe a (row of a) single table in the database.  
This step is independent of the database (it does not load the table column names/types from the database).  
This code needs to be adjusted for each table which is going to be created or modified.

Let's study the following description referring to the table `albums` from the `chinook` database:
- `Base`: This object normally is created once and it internally stores all information about the structure of the database. (*Note:* In this notebook we will recreate this object multiple times to allow changes in descriptions of the tables.)
- `Album`: This is a newly created class and it will conceptually represent a single row of the table.
- `__tablename__`: This field defines the name of the table as in the database.
- `AlbumId`: This is the name of a `Column` as in the database. Moreover, it specifies that the column keeps `Integer` numbers, and that the column belongs to the `primary_key` of this table.
- `Title`: As above, but the column keeps texts (`String` of max. 160 characters).

For more info check: [Column and Data Types](https://docs.sqlalchemy.org/en/14/core/type_basics.html).

In [None]:
Base = declarative_base()

class Album(Base):
  __tablename__ = "albums"
  
  AlbumId = Column(Integer,primary_key=True)
  Title = Column(String(160))
  ArtistId = Column(Integer)
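
To see what the library derives from such a class without contacting any database, the generated table description can be inspected. The sketch below uses a separate base and a class named `DemoAlbum`; both names are assumptions made for this illustration so that the `Album` class above is left untouched.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

DemoBase = declarative_base()  # separate base, used only in this sketch

class DemoAlbum(DemoBase):
    __tablename__ = "albums"

    AlbumId = Column(Integer, primary_key=True)
    Title = Column(String(160))
    ArtistId = Column(Integer)

table = DemoAlbum.__table__    # Table object built from the class body
columnNames = [c.name for c in table.columns]
keyNames = [c.name for c in table.primary_key.columns]

print(table.name)
print(columnNames)
print(keyNames)
```

No engine or session is involved here, which illustrates that the description lives entirely in the Python code.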

## Table queries returning objects

The SQLAlchemy library provides functionality to automatically build SQL queries, execute them, and convert the results to objects of the declared tables. Study the examples below...

### SQL query is built by the library

The `session` object provides a function `query`. The arguments of this function define which table is queried. Observe how `session.query(Album)` automatically builds the SQL query based on the table description provided in the `Album` class:

In [None]:
print( session.query(Album) )

Using additionally the `limit` function we can obtain an SQL query referring to a few rows only:

In [None]:
print( session.query(Album).limit(3) )

Or, using the `filter` function we may generate SQL `where` clauses to select rows:

In [None]:
print( session.query(Album).filter(Album.AlbumId == 5) )

### Getting a single object

We know that `AlbumId` is the primary key of the `albums` table. So, there should be just one row for an album with (for example) `AlbumId` of `5`.  
The following code will run the SQL query and produce the `one` corresponding `Album` object:

In [None]:
resA = session.query(Album).filter(Album.AlbumId == 5).one()
print(type(resA))
print(resA)

You can see that an object of the class `Album` was indeed produced, although its contents are not shown (this will be discussed next). Note, though, that the values from the table columns can be accessed conveniently as the class fields:

In [None]:
resA.AlbumId

In [None]:
resA.Title

### Nice printing of object fields

Redefine the `Album` class and add a method `__repr__`. This function is used to get the text representation of an object when it gets `print`ed:

In [None]:
Base = declarative_base() # normally present once in a script!

class Album(Base):
  __tablename__ = "albums"
  
  AlbumId = Column(Integer,primary_key=True)
  Title = Column(String(160))
  ArtistId = Column(Integer)

  def __repr__(self):
    return "Album(AlbumId='%s', Title='%s', ArtistId='%s')" % (self.AlbumId, self.Title, self.ArtistId)

Let's again get the object and print it:

In [None]:
resA = session.query(Album).filter(Album.AlbumId == 5).one()
print(resA)

### Getting many objects (list)

When a query may return any number of objects, use the method `.all()` instead of `.one()`. The returned value will be a list with all query results. The elements of the list will be objects representing the table rows:

In [None]:
resAs = session.query(Album).limit(3).all()
print( type( resAs ) )
print( resAs )

Let's consider the first list element in more detail. Check its `type`, print it, and practice access to data from each table column:

In [None]:
resA = resAs[0]   # the first element of the list
print( type( resA ) )
print( resA )
print( resA.Title )
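
The difference between `.one()` and `.all()` shows up when the number of matching rows is not exactly one. The following sketch assumes a small in-memory database with a simplified table; the names `DemoBase`, `DemoAlbum`, `demoEngine` and `demoSession` are made up for this illustration.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.exc import NoResultFound, MultipleResultsFound
from sqlalchemy.orm import declarative_base, sessionmaker

DemoBase = declarative_base()

class DemoAlbum(DemoBase):
    __tablename__ = "albums"
    AlbumId = Column(Integer, primary_key=True)
    Title = Column(String(160))

demoEngine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demoEngine)      # create the empty table

with sessionmaker(bind=demoEngine)() as demoSession:
    demoSession.add_all([DemoAlbum(AlbumId=1, Title="First"),
                         DemoAlbum(AlbumId=2, Title="Second")])
    demoSession.commit()

    query = demoSession.query(DemoAlbum)
    allTitles = [a.Title for a in query.order_by(DemoAlbum.AlbumId).all()]
    noMatch = query.filter(DemoAlbum.AlbumId == 99).all()   # empty list, no error

    try:
        query.one()                  # two rows match: too many for one()
        tooMany = False
    except MultipleResultsFound:
        tooMany = True

    try:
        query.filter(DemoAlbum.AlbumId == 99).one()         # no row matches
        missing = False
    except NoResultFound:
        missing = True

print(allTitles, noMatch, tooMany, missing)
```

So `.all()` never complains about the number of rows, while `.one()` raises an exception unless exactly one row matches.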

### Getting many objects (for loop)

The following code might be used to access resulting objects one by one:

In [None]:
for r in session.query(Album).limit(3):
  print( r )

Or using a list comprehension:

In [None]:
[a.Title for a in session.query(Album).limit(3)]

### Practice



1.   Have a look into [the documentation](https://docs.sqlalchemy.org/en/14/orm/tutorial.html#returning-lists-and-scalars) and understand the functions: `one()`, `first()`, `all()`, `one_or_none()`.
1.   Calculate the number of `Album`s in the database. Construct the query and use the [`count()`](https://docs.sqlalchemy.org/en/14/orm/tutorial.html#counting) function.



## Another class describing a related table

### Class (partially) describing another table

In [None]:
Base = declarative_base() # normally present once in a script!

class Album(Base):
  __tablename__ = "albums"
  
  AlbumId = Column(Integer,primary_key=True)
  Title = Column(String(160))
  ArtistId = Column(Integer)

  def __repr__(self):
    return "Album(AlbumId='%s', Title='%s', ArtistId='%s')" % (self.AlbumId, self.Title, self.ArtistId)

class Track(Base):
  __tablename__ = "tracks"

  TrackId = Column(Integer,primary_key=True)
  Name = Column(String(200))
  AlbumId = Column(Integer)

  def __repr__(self):
    return "Track(TrackId='%s', Name='%s', AlbumId='%s')" % (self.TrackId, self.Name, self.AlbumId)

### Observing a relation between two tables

Check in the database diagram:
- a `Track` refers to zero or one album
- an `Album` refers to zero or more tracks

Let's study the `Album` with `AlbumId` of 5:

In [None]:
resA = session.query(Album).filter(Album.AlbumId == 5).one()
print(resA)

These are the `Track`s which refer to this album:

In [None]:
for resT in session.query(Track).filter(Track.AlbumId == 5).all():
  print(resT)

In [None]:
resT = session.query(Track).filter(Track.TrackId == 105).one()
resT

### Generating methods representing relations

From the object oriented programming point of view it would be useful to:
- Have a field `album` in the class `Track`.
- Have a field `tracks` in the class `Album`.

To define such a relation the following two changes are needed (look at the code below):
- In the `Track` class:

  the field `AlbumId` gets declared as a `ForeignKey` to `albums.AlbumId` (so a foreign key to the table `albums` where the key is in column `AlbumId`). With `nullable` you can specify whether it is allowed that a `Track` does not refer to any `Album`.
- In the `Album` class:

  the field `tracks` gets declared as a relationship to the class `Track`. In the class `Track` the reverse relationship should be called `album`.

See [relationship patterns](https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html#relationship-patterns) for more details.

In [None]:
Base = declarative_base() # normally present once in a script!

class Album(Base):
  __tablename__ = "albums"
  
  AlbumId = Column(Integer,primary_key=True)
  Title = Column(String(160))
  ArtistId = Column(Integer)
  tracks = relationship("Track", backref="album")               # <<<< HERE

  def __repr__(self):
    return "Album(AlbumId='%s', Title='%s', ArtistId='%s')" % (self.AlbumId, self.Title, self.ArtistId)

class Track(Base):
  __tablename__ = "tracks"

  TrackId = Column(Integer,primary_key=True)
  Name = Column(String(200))
  AlbumId = Column(ForeignKey('albums.AlbumId'), nullable=True) # <<<< HERE

  def __repr__(self):
    return "Track(TrackId='%s', Name='%s', AlbumId='%s')" % (self.TrackId, self.Name, self.AlbumId)

Let's get an `Album` object `resA` now:

In [None]:
resA = session.query(Album).filter(Album.AlbumId == 5).one()
print(resA)

And see the value of the new field `resA.tracks`:

In [None]:
for resT in resA.tracks:
  print(resT)

In [None]:
resA.tracks[1]

In [None]:
resA.tracks[1].album
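
The `backref` argument creates the reverse field implicitly. The same pair of fields can also be declared explicitly on both classes with `back_populates`. The sketch below shows this alternative spelling on an in-memory database; the class names `DemoAlbum` and `DemoTrack` and the sample titles are assumptions made for this illustration.

```python
from sqlalchemy import create_engine, Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

DemoBase = declarative_base()

class DemoAlbum(DemoBase):
    __tablename__ = "albums"
    AlbumId = Column(Integer, primary_key=True)
    Title = Column(String(160))
    tracks = relationship("DemoTrack", back_populates="album")

class DemoTrack(DemoBase):
    __tablename__ = "tracks"
    TrackId = Column(Integer, primary_key=True)
    Name = Column(String(200))
    AlbumId = Column(ForeignKey("albums.AlbumId"), nullable=True)
    album = relationship("DemoAlbum", back_populates="tracks")

demoEngine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demoEngine)

with sessionmaker(bind=demoEngine)() as demoSession:
    album = DemoAlbum(Title="Demo album")
    album.tracks = [DemoTrack(Name="Intro"), DemoTrack(Name="Outro")]
    demoSession.add(album)       # the tracks are added along with the album
    demoSession.commit()

    trackNames = sorted(t.Name for t in album.tracks)
    sameAlbum = all(t.album is album for t in album.tracks)
    keysMatch = all(t.AlbumId == album.AlbumId for t in album.tracks)

print(trackNames, sameAlbum, keysMatch)
```

Both spellings give the same two fields; the explicit form just makes the reverse field visible in the class where it lives.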

## Adding objects with new data to database

Let's consider the goal of adding a new `Album` to the database. Note (check on the database diagram) that each album must refer to exactly one artist. So it is impossible to add an album without a proper reference to an artist. Consequently, we first need to properly describe the relationship between the `albums` and `artists` tables. Then, we will choose an artist and add a new fictional album of that artist.

### Another table, another relationship

The following modifications are made here (try yourself to introduce these modifications without looking into the code below):
- New table description class `Artist`
- The `Album` gets a proper foreign key to `Artist`
- The `Artist` gets a relationship to `Album` with field names `Artist.albums` and `Album.artist`.

In [None]:
Base = declarative_base() # normally present once in a script!

class Album(Base):
  __tablename__ = "albums"
  
  AlbumId = Column(Integer,primary_key=True)
  Title = Column(String(160))
  ArtistId = Column(ForeignKey('artists.ArtistId'), nullable=False) # <<<< HERE
  tracks = relationship("Track", backref="album")

  def __repr__(self):
    return "Album(AlbumId='%s', Title='%s', ArtistId='%s')" % (self.AlbumId, self.Title, self.ArtistId)

class Track(Base):
  __tablename__ = "tracks"

  TrackId = Column(Integer,primary_key=True)
  Name = Column(String(200))
  AlbumId = Column(ForeignKey('albums.AlbumId'), nullable=True)

  def __repr__(self):
    return "Track(TrackId='%s', Name='%s', AlbumId='%s')" % (self.TrackId, self.Name, self.AlbumId)

class Artist(Base):
  __tablename__ = "artists"

  ArtistId = Column(Integer,primary_key=True)
  Name = Column(String(120))
  albums = relationship("Album", backref="artist")                  # <<<< HERE

  def __repr__(self):
    return "Artist(ArtistId='%s', Name='%s')" % (self.ArtistId, self.Name)


Let's check that the new `Artist` class can be queried:

In [None]:
print( session.query(Artist).count() )
print( session.query(Artist).first() )

### Adding a new object (related to existing data/objects)

So far the lifetime of the session object has been ignored, but it defines the transactions. Changes made to objects are written to the database when `commit()` of the session is called. Note that if multiple session objects exist simultaneously, the data between them are not synchronized. Consequently, newly added (or modified) content should not be checked after `commit()` through another session object which already existed before that `commit()`; use a new session instead.

Let's list the `Album`s of Metallica existing in the database (through a context-managed session object `querySession` created in a `with` block):

In [None]:
with sessionMaker() as querySession:
  resArtist = querySession.query(Artist).filter(Artist.Name=="Metallica").one()
  print(resArtist, "\n") 
  for a in resArtist.albums: print( a )

Here is an example of adding a new `Album`. Note that the `AlbumId` will get automatically generated and that the `ArtistId` will be automatically collected when `theArtist` is assigned to the `artist` relationship field. The changes will get written to the database at the moment of `commit()`.

In [None]:
with sessionMaker() as addSession:
  theArtist = addSession.query(Artist).filter(Artist.Name=="Metallica").one()
  newAlbum = Album(Title='Hardwired... to Self-Destruct') # new Album object, only in memory
  newAlbum.artist = theArtist                             # building relation to theArtist
  addSession.add(newAlbum)                                # adding Album to "be written"
  addSession.commit()                                     # writing to the database
  print(newAlbum)

In [None]:
with sessionMaker() as querySession:
  resArtist = querySession.query(Artist).filter(Artist.Name=="Metallica").one()
  print(resArtist, "\n") 
  for a in resArtist.albums: print( a )
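
The role of `commit()` can be checked in isolation. The sketch below assumes an in-memory database with a simplified `artists` table; the names `DemoArtist`, `demoSessionMaker` and `countArtists` are made up for this illustration. An object added in a `with` block without `commit()` is discarded when the session closes.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

DemoBase = declarative_base()

class DemoArtist(DemoBase):
    __tablename__ = "artists"
    ArtistId = Column(Integer, primary_key=True)
    Name = Column(String(120))

demoEngine = create_engine("sqlite:///:memory:")
DemoBase.metadata.create_all(demoEngine)
demoSessionMaker = sessionmaker(bind=demoEngine)

def countArtists():
    # Always count through a fresh session.
    with demoSessionMaker() as ses:
        return ses.query(DemoArtist).count()

with demoSessionMaker() as ses:
    ses.add(DemoArtist(Name="Never saved"))
    # no commit(): the pending object is dropped when the session closes
countWithoutCommit = countArtists()

with demoSessionMaker() as ses:
    ses.add(DemoArtist(Name="Saved"))
    ses.commit()                 # only now the row is written
countWithCommit = countArtists()

print(countWithoutCommit, countWithCommit)
```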

## Multistep exercise

The final goal of this session is to build a class providing access to a database.  
This class should provide all needed database content through a well defined interface.  

Follow the points below in the provided order (later points depend on earlier).  
Try to implement each step without checking the solution provided below.  
Try to type. Do not copy-paste.

1. Define a new class `Chinook`. This class will contain all top-level functions to access the database. The functions will be added gradually.
1. In the class `Chinook`: 
  - Define the class constructor `__init__(self, fileName)`.
  - The `fileName` argument will provide the name of the file with the database.
1. In the class `Chinook`:
  - In the constructor create the engine and store it in attribute `self._engine`. 
  - Similarly, in `self._sessionMaker` store the result of `sessionmaker` call.
1. Outside the class:
  - Create an object of the class `Chinook`. 
  - Use the proper `fileName` argument.
  - Store the new object in `db`.
1. In the class `Chinook`:
  - Define a new method `newSession(self)`. 
  - Use `self._sessionMaker` to create a new session and return it.
1. Outside the class:
  - Recreate the object in `db` with the updated class. 
  - Test whether you get a session object when you call the `newSession` method.
1. In the class `Chinook`:
  - Define a new method `addArtist(self, name)`. 
  - The method should add a new artist of `name` to the database. 
  - Perform adding within a new local session. 
  - Remember to call `commit` in the session. 
  - The function should return the `ArtistId` of the newly added artist.
1. In the class `Chinook`:
  - Define a new method `getArtist(self, artistId)`. 
  - The method should return an `Artist` object of the provided `artistId`.
1. Outside the class:
  - Recreate the object in `db` with the updated class. 
  - Write the code to `addArtist` with a randomly chosen name.
  - Store the returned `artistId` value in a variable.
  - Check whether you can get the artist back with `getArtist`.
1. In the class `Chinook`:
  - Define a new method `allArtists(self)`. 
  - The method should return a list of all `Artist` objects present in the database.
1. Outside the class:
  - Recreate the object in `db` with the updated class. 
  - Use the result of `db.allArtists()` in a `for` loop to print all artists from the database.
  - Find how to print only the last 10 of the returned objects.
1. Outside the class:
  - In `firstNames` create a list with several random popular first names.
  - Similar for `surNames`.
  - From package `random` use `choice` method. It randomly selects an element from a list.
  - Create a random name by concatenating a random first name and a random surname from your lists.
  - Insert above into a loop which adds to the database 10 artists with randomly generated names.
  - Print last 10 artists to check whether they were indeed added.
1. Finally, copy the code to a new Jupyter/Colab notebook:
  - Does the code run?
  - Add all necessary `import` and class definitions (e.g. `Artist`).

In [None]:
class Chinook:
  def __init__(self, fileName):
    addr = "sqlite:///" + fileName
    self._engine = create_engine(addr,echo=False)
    self._sessionMaker = sessionmaker(bind=self._engine)   # use this object's engine, not a global one

  def newSession(self):
    return self._sessionMaker()

  def addArtist(self, name):
    with self.newSession() as ses:
      a = Artist(Name=name)
      ses.add(a)
      ses.commit()
      return a.ArtistId

  def getArtist(self,artistId):
    with self.newSession() as ses:
      return ses.query(Artist).where(Artist.ArtistId == artistId).one()

  def allArtists(self):
    with self.newSession() as ses:
      return ses.query(Artist).all()

db = Chinook("chinook.db")    
aId = db.addArtist(name="The Singer")
print( "The ArtistId of the new artist is: ", aId)
print( db.getArtist(artistId=aId))

from random import choice

firstNames = [ "John", "Johan", "Jan", "Ivan" ]
surNames = [ "Smith", "Kowalski", "Kovalsky" ]

# add 10 artists with randomly generated names
for i in range(10):
  n = choice(firstNames) + " " + choice(surNames)
  db.addArtist(name=n)

# print the last 10 artists to check that they were added
for a in db.allArtists()[-10:]:
  print(a)