# Tuesday 02 April

# SQLAlchemy

## What you will learn in this course 🧐🧐

If you want to work with databases from Python, the SQLAlchemy library (installed as `sqlalchemy`) is worth knowing.

In this course, you'll learn:

- Read and write data with SQL commands
- Read data from your data lake and load it into a proper database
- Read data from your database and load it into your data lake
- Use pandas to read from and write to a SQL database

## Introduction to SQLAlchemy 🧙‍♂️🧙‍♂️

SQLAlchemy is one of the most widely used libraries for handling relational databases from Python code. As you grow in your data career, you will be working with databases, so it is worth knowing its basic principles.

### Structure of the API

SQLAlchemy is organized in layers:

![](https://docs.sqlalchemy.org/en/13/_images/sqla_arch_small.png)

You have two ways of handling SQL databases with SQLAlchemy:

- ORM: this stands for Object Relational Mapper. It maps Python classes to database tables, so you can communicate with the database through very flexible models.
- SQLAlchemy Core: this layer is more schema-centric and lets you access and query your tables very simply.

Let's check out the core functionalities of both layers.

### Nota Bene ➤ SQLite

This course was written for SQLite, a very lightweight SQL database that is built in, so you don't have to set up a production database like PostgreSQL or MySQL to follow along. The cells below were run against a PostgreSQL database instead, which is why the logs mention `pg_catalog`. Please note that SQLite is NOT made for production and that you will be using another kind of database (like PostgreSQL or MySQL) in a production environment.

### ORM

The ORM uses Python classes and instances to create and manipulate database tables. Let's see how it works. Before creating tables, let's connect to a database.

#### Create a connection

In [1]:
# Holds the connection credentials (USERNAME, PASSWORD, HOSTNAME, DBNAME)
import my_api_id as key

In [None]:
## Libraries to install for connections other than SQLite
# !pip install pymysql # For MySQL engines
# !pip install psycopg2-binary # For PostgreSQL engines
# !pip install sqlalchemy==2.0.0

In [2]:
# Install the right version of sqlalchemy
# !pip install sqlalchemy==2.0.0

# Import sqlalchemy
from sqlalchemy import create_engine, text

# create_engine sets up the connection between Python and the database
# engine = create_engine("sqlite:///:memory:", echo=True)
# engine = create_engine(f"mysql+pymysql://{DBUSER}:{DBPASS}@{DBHOST}:{PORT}/{DBNAME}", echo=True)
engine = create_engine(f"postgresql+psycopg2://{key.USERNAME}:{key.PASSWORD}@{key.HOSTNAME}/{key.DBNAME}", echo=True)
# engine = create_engine(f"postgresql+psycopg2://postgres:XXXXX@YYYYYY/postgres", echo=True)

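In the code above, the commented-out `sqlite:///:memory:` line creates a "fake" database that lives in your computer's memory and mimics what a real database would be. The active line connects to a PostgreSQL database instead, reading its credentials from the `key` module.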
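
If you just want to try the in-memory option, here is a minimal sketch. It assumes only that `sqlalchemy` is installed, and the `SELECT 1` query is simply a convenient way to check that the connection works.

```python
from sqlalchemy import create_engine, text

# "sqlite:///:memory:" asks SQLite for a throwaway database held in RAM
engine = create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    # Run a trivial query to confirm the connection works
    value = conn.execute(text("SELECT 1")).scalar()

print(value)
```

Everything stored in such a database disappears when the Python process ends, which is convenient for experiments.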

> NB: if you were to use a production database such as PostgreSQL, you would be using a connection just like this: `postgresql+psycopg2://{dbuser}:{dbpass}@{dbhost}/{dbname}`
>
> NB: if you were to use a MySQL production database, you would be using a connection just like this: `mysql+pymysql://{DBUSER}:{DBPASS}@{DBHOST}:{PORT}/{DBNAME}`

where you would need to specify:

```python
DBHOST = "HOST_FROM_AMAZON_RDS"
DBUSER = "USERNAME"
DBPASS = "PASSWORD"
DBNAME = "DBNAME"
PORT = "PORT"
# DBNAME = "postgres" --> If you are using PostgreSQL
```
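
As an alternative to formatting the connection string by hand, the sketch below builds it with `URL.create`, which takes care of escaping special characters in the password. All the credentials here are placeholder values chosen for illustration, not real ones.

```python
from sqlalchemy.engine import URL

# Placeholder credentials (hypothetical values, for illustration only)
url = URL.create(
    drivername="postgresql+psycopg2",
    username="USERNAME",
    password="p@ss/word",  # special characters are escaped when the URL is rendered
    host="HOST_FROM_AMAZON_RDS",
    port=5432,
    database="postgres",
)

# hide_password=True masks the password, which is handy for logging
masked = url.render_as_string(hide_password=True)
print(masked)
```

The resulting `url` object can be passed to `create_engine` in place of a string.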

#### Create a table

Let's now create a table. To do so, we'll define a Python class that describes it.

In [3]:
# Let's instantiate a declarative base to be able to use our Python class
# (since SQLAlchemy 1.4, declarative_base lives in sqlalchemy.orm)
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # Base is the parent class that every table class inherits from

# Let's define our table using a class
from sqlalchemy import Column, Integer, String

class User(Base):                   # User inherits from Base
    __tablename__ = "users"         # the table name; the attribute is always called __tablename__

    # Each attribute corresponds to a column in our DB table
    # (4 columns here)
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    nickname = Column(String)

    def __repr__(self):
        return "<User(name='{}', fullname='{}', nickname='{}')>".format(self.name, self.fullname, self.nickname)


Here we represented our table `users` by a class. As you can see, it contains 4 columns:

- `id` of type Integer
- `name` of type String
- `fullname` of type String
- `nickname` of type String

The `__repr__` method simply defines how a `User` instance is displayed when we print it.

Now, we need to create our table within our database. We can do this by calling the `create_all` method of the `Base.metadata` object.

In [4]:
# This is where the table is actually created on RDS

Base.metadata.create_all(engine)

2024-04-02 13:28:34,795 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2024-04-02 13:28:34,795 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-04-02 13:28:34,817 INFO sqlalchemy.engine.Engine select current_schema()
2024-04-02 13:28:34,818 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-04-02 13:28:34,827 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2024-04-02 13:28:34,827 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-04-02 13:28:34,837 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:28:34,839 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname
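
To check that `create_all` really created the table, you can ask the database which tables it contains. The sketch below assumes an in-memory SQLite engine and a shortened `User` class (only two columns) so that it runs on its own.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")

# Emits CREATE TABLE for every class that inherits from Base
Base.metadata.create_all(engine)
# Calling it again is harmless: existing tables are skipped
Base.metadata.create_all(engine)

# The inspector reads the table list back from the database itself
table_names = inspect(engine).get_table_names()
print(table_names)
```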

#### Insert values

Let's insert values in our database:

In [4]:
# Creating a new instance of User will allow us to insert a new record later on
ed_user = User(id=1, name='ed', fullname='Ed Jones', nickname='edsnickname')

# Access Full row
print(ed_user)

# Access ed_user name
name = ed_user.name
print("name: {}".format(name))

# Access ed_user nickname
nickname = ed_user.nickname
print("nickname: {}".format(nickname))

<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>
name: ed
nickname: edsnickname


We created data! As you can see, we can access each column value through the matching attribute, `.column_name`.

#### Persist values in db

Even though we created values, we haven't saved them in our database yet. We can do so by opening a `Session`:

In [5]:
# Initialize a sessionmaker
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)

Here we created a `sessionmaker`, a factory for sessions that will allow us to talk to our database. Its `bind` argument takes the `engine` that corresponds to our database.

In [6]:
# Creating a new instance of User will allow us to insert a new record later on
al_user = User(id=2, name='al', fullname='Al Jones', nickname='alsnickname')

# Access Full row
print(al_user)

<User(name='al', fullname='Al Jones', nickname='alsnickname')>


In [7]:
# Instantiate Session
session = Session()

# Add values to db
session.add(ed_user)
session.add(al_user)

# Commit the results
session.commit()

2024-04-02 13:07:03,544 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:07:03,547 INFO sqlalchemy.engine.Engine INSERT INTO users (id, name, fullname, nickname) VALUES (%(id__0)s, %(name__0)s, %(fullname__0)s, %(nickname__0)s), (%(id__1)s, %(name__1)s, %(fullname__1)s, %(nickname__1)s)
2024-04-02 13:07:03,548 INFO sqlalchemy.engine.Engine [generated in 0.00012s (insertmanyvalues)] {'name__0': 'ed', 'nickname__0': 'edsnickname', 'id__0': 1, 'fullname__0': 'Ed Jones', 'name__1': 'al', 'nickname__1': 'alsnickname', 'id__1': 2, 'fullname__1': 'Al Jones'}
2024-04-02 13:07:03,560 INFO sqlalchemy.engine.Engine COMMIT


Good job! We added our first rows to our db 👏 Note that it is very important to call the `.commit()` method to actually persist the values you inserted when you called the `.add()` method.
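
To see why `.commit()` matters, here is a small sketch that compares a rolled-back insert with a committed one. It assumes an in-memory SQLite engine and a shortened `User` class so that it is self-contained.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(id=1, name="ed"))
    session.flush()      # the INSERT is sent, but the transaction is still open
    session.rollback()   # the uncommitted INSERT is discarded
    count_after_rollback = session.query(User).count()

    session.add(User(id=2, name="al"))
    session.commit()     # this time the row is persisted
    count_after_commit = session.query(User).count()

print(count_after_rollback, count_after_commit)
```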

#### Query values from a database

Now that we have some data inside our database, we can query it simply by using the `session` instance.

In [9]:
# Query our table users
user = session.query(User)

# Output all the results
user.all()

2023-10-03 08:47:08,197 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-10-03 08:47:08,201 INFO sqlalchemy.engine.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname 
FROM users
2023-10-03 08:47:08,209 INFO sqlalchemy.engine.Engine [generated in 0.00758s] ()


[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
 <User(name='al', fullname='Al Jones', nickname='alsnickname')>]
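
Besides `session.query`, SQLAlchemy 1.4 and later also offer the `select()` construct, which lets you filter without writing SQL by hand. The sketch below is one possible way to use it, again assuming an in-memory SQLite engine and a shortened `User` class.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(id=1, name="ed"), User(id=2, name="al")])
    session.commit()

    # Build the query from the class, then keep only the rows named "ed"
    stmt = select(User).where(User.name == "ed")
    # .scalars() yields User instances instead of one-element rows
    names = [user.name for user in session.execute(stmt).scalars()]

print(names)
```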

You can now also use any SQL statement if you want to run more complex queries:

In [8]:
from sqlalchemy import text

# Create a statement
statement = text("SELECT * FROM users where name=:name")
statement

<sqlalchemy.sql.elements.TextClause object at 0x000001641D946210>

As you can see, the query looks almost like a real SQL query. The only difference is the `:name` parameter in `where name=:name`.

This placeholder lets us bind a value afterwards, in the following way:

In [9]:
session.query(User).from_statement(statement).params(name="ed").all()

2024-04-02 13:08:04,369 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:08:04,372 INFO sqlalchemy.engine.Engine SELECT * FROM users where name=%(name)s
2024-04-02 13:08:04,373 INFO sqlalchemy.engine.Engine [generated in 0.00089s] {'name': 'ed'}


[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]

As you can see, we used the `params` method, which binds a value to each named placeholder of the statement (here `:name`). The filter is then applied with that value.
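
Bound parameters are also the safe way to put user input into a query, because the value is sent separately from the SQL text. Here is a minimal sketch with the Core `execute` method, assuming an in-memory SQLite database and a hand-made `users` table. The hostile string is only an illustration.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

# Set up a tiny table to query
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(
        text("INSERT INTO users (id, name) VALUES (:id, :name)"),
        [{"id": 1, "name": "ed"}, {"id": 2, "name": "al"}],
    )

statement = text("SELECT name FROM users WHERE name = :name")

with engine.connect() as conn:
    # The dictionary supplies the value for the :name placeholder
    match = [row[0] for row in conn.execute(statement, {"name": "ed"})]
    # A hostile value is compared as a plain string, so it matches nothing
    hostile = [row[0] for row in conn.execute(statement, {"name": "ed' OR '1'='1"})]

print(match, hostile)
```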

### SQLAlchemy Core

If you don't like classes and the declarative approach, you can use the SQLAlchemy Core layer of the library. Let's start by creating a new database. The cell below is commented out here, so the `engine` created earlier is reused.

In [2]:
# from sqlalchemy import create_engine
# engine = create_engine('sqlite:///:memory:', echo=True)

#### Create Tables

You can simply create tables the following way.

In [10]:
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey

meta = MetaData()

# Define table "students"
students = Table(
    'students', 
    meta,
    Column('id', Integer, primary_key = True),
    Column('name', String),
    Column('lastname', String),
)

# Define table "addresses"
addresses = Table(
    'addresses', meta,
    Column('id', Integer, primary_key = True),
    Column('email_address', String),
    Column("student_id", None, ForeignKey("students.id"))
)

As you can see, we simply declared two tables with the `Table` class. Pay attention to `meta` as well: it holds all the metadata (additional information) needed to create the actual tables.
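
To get a feel for what `meta` stores, the sketch below declares two similar tables and then reads their description back from Python, without touching any database. The tables are shortened versions written for this example.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table

meta = MetaData()

students = Table(
    "students",
    meta,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

addresses = Table(
    "addresses",
    meta,
    Column("id", Integer, primary_key=True),
    Column("student_id", Integer, ForeignKey("students.id")),
)

# Every Table registers itself on the MetaData object it was given
table_names = sorted(meta.tables)
# Column names of one table
student_columns = students.c.keys()
# Where each foreign key of "addresses" points
fk_targets = [fk.target_fullname for fk in addresses.foreign_keys]

print(table_names, student_columns, fk_targets)
```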

Let's now call `create_all` to actually create the tables within our database.

In [11]:
meta.create_all(engine)

2024-04-02 13:08:20,420 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:08:20,421 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-04-02 13:08:20,422 INFO sqlalchemy.engine.Engine [cached since 290.9s ago] {'table_name': 'students', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-04-02 13:08:20,435 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.p

#### Insert values

If you need to insert values, you can do it the following way:

In [12]:
ins = students.insert().values(id="1", name="Jack", lastname="Johnson")
ins

<sqlalchemy.sql.dml.Insert object at 0x000001641D95B490>

This hasn't inserted any values yet. You first need to:

1. Create a connection to your db
2. Execute the query
3. Commit the transaction

In [13]:
# Connect to the db
conn = engine.connect()

# Execute the query
result = conn.execute(ins)
conn.commit()

2024-04-02 13:09:18,267 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:09:18,268 INFO sqlalchemy.engine.Engine INSERT INTO students (id, name, lastname) VALUES (%(id)s, %(name)s, %(lastname)s)
2024-04-02 13:09:18,268 INFO sqlalchemy.engine.Engine [generated in 0.00137s] {'id': '1', 'name': 'Jack', 'lastname': 'Johnson'}
2024-04-02 13:09:18,278 INFO sqlalchemy.engine.Engine COMMIT
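
A variant of the steps above uses `engine.begin()` as a context manager: it opens the connection, commits if the block succeeds and rolls back if it raises. The sketch assumes an in-memory SQLite engine, and the second and third students are made-up rows added for illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, func, select

engine = create_engine("sqlite:///:memory:")
meta = MetaData()

students = Table(
    "students",
    meta,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("lastname", String),
)
meta.create_all(engine)

# The transaction is committed automatically when the block exits without error
with engine.begin() as conn:
    # One row, using .values()
    conn.execute(students.insert().values(id=1, name="Jack", lastname="Johnson"))
    # Several rows, using a list of dictionaries
    conn.execute(
        students.insert(),
        [
            {"id": 2, "name": "Jill", "lastname": "Smith"},
            {"id": 3, "name": "Joe", "lastname": "Brown"},
        ],
    )

with engine.connect() as conn:
    row_count = conn.execute(select(func.count()).select_from(students)).scalar()

print(row_count)
```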


If you need to insert multiple values, you can do it simply by specifying a list of dictionaries:

In [14]:
values = [
    {'student_id': 1, 'email_address' : 'jack@yahoo.com'},
    {'student_id': 1, 'email_address' : 'jack@msn.com'}
]

conn.execute(addresses.insert(), values)
conn.commit()

2024-04-02 13:09:24,805 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:09:24,806 INFO sqlalchemy.engine.Engine INSERT INTO addresses (email_address, student_id) VALUES (%(email_address__0)s, %(student_id__0)s), (%(email_address__1)s, %(student_id__1)s)
2024-04-02 13:09:24,807 INFO sqlalchemy.engine.Engine [generated in 0.00140s (insertmanyvalues)] {'student_id__0': 1, 'email_address__0': 'jack@yahoo.com', 'student_id__1': 1, 'email_address__1': 'jack@msn.com'}
2024-04-02 13:09:24,817 INFO sqlalchemy.engine.Engine COMMIT


#### Query values

If you need to query your database, you can do so with the `text` function and write actual SQL queries.

In [15]:
from sqlalchemy.sql import text

# Create a statement
stmt = text("SELECT students.id, addresses.id, students.name, addresses.email_address FROM students "
            "JOIN addresses ON students.id=addresses.student_id "
            "WHERE students.id = 1")

result = conn.execute(stmt)

2024-04-02 13:09:31,493 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:09:31,493 INFO sqlalchemy.engine.Engine SELECT students.id, addresses.id, students.name, addresses.email_address FROM students JOIN addresses ON students.id=addresses.student_id WHERE students.id = 1
2024-04-02 13:09:31,493 INFO sqlalchemy.engine.Engine [generated in 0.00143s] {}


What is important to notice is:

- Each line of the query is a separate string, and Python joins adjacent strings into a single one. If your query spans several lines, you need to add a space (" ") at the end of each line before going to the next one.
- The `.columns()` method of a `text()` statement lets you declare the names and types of the columns that the query returns. The statement above does not need it.
- In the SQLAlchemy expression language, you select a column from a given table with `table_name.c.column_name`, for example `students.c.id`. The raw SQL above writes `students.id` instead.
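
For comparison, the same kind of join can be written with the expression language and the `table_name.c.column_name` notation instead of raw SQL. This is only a sketch under the assumption of an in-memory SQLite engine, with the tables redefined so that it runs on its own.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, create_engine, select,
)

engine = create_engine("sqlite:///:memory:")
meta = MetaData()

students = Table(
    "students", meta,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", meta,
    Column("id", Integer, primary_key=True),
    Column("email_address", String),
    Column("student_id", Integer, ForeignKey("students.id")),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(students.insert(), [{"id": 1, "name": "Jack"}])
    conn.execute(
        addresses.insert(),
        [
            {"student_id": 1, "email_address": "jack@yahoo.com"},
            {"student_id": 1, "email_address": "jack@msn.com"},
        ],
    )

# Columns are reached through the .c attribute of each table
stmt = (
    select(students.c.name, addresses.c.email_address)
    .select_from(students.join(addresses, students.c.id == addresses.c.student_id))
    .where(students.c.id == 1)
    .order_by(addresses.c.id)
)

with engine.connect() as conn:
    rows = [tuple(row) for row in conn.execute(stmt)]

print(rows)
```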

Now, if you want to check the actual results, you can use the `.fetchall` method, which returns them as a list.

In [16]:
result.fetchall()

[(1, 1, 'Jack', 'jack@yahoo.com'), (1, 2, 'Jack', 'jack@msn.com')]
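
A result can only be read once, and each row can be accessed by position or by column name. The sketch below illustrates both points, assuming an in-memory SQLite database with a single hand-made row.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(
        text("INSERT INTO students (id, name) VALUES (:id, :name)"),
        {"id": 1, "name": "Jack"},
    )

query = text("SELECT id, name FROM students")

with engine.connect() as conn:
    result = conn.execute(query)
    first = result.fetchone()       # one row, or None when nothing is left
    first_id = first[0]             # access by position
    first_name = first.name         # access by column name
    remaining = result.fetchall()   # only the rows that were not fetched yet

    # .mappings() returns dictionary-like rows
    as_dicts = [dict(row) for row in conn.execute(query).mappings()]

print(first_id, first_name, remaining, as_dicts)
```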

## Simplify things with Pandas 😌

SQLAlchemy is fundamental knowledge for working with SQL databases in Python, as it's broadly used by web-development libraries and others. However, there is a simpler way to insert into and query databases that works most of the time: pandas.

### Read SQL Databases with Pandas

So far you have mostly used pandas with CSV files or Excel spreadsheets, but you can also use this library with SQL databases, through the `read_sql` function (or `read_sql_query`, as in the cell below).

In [17]:
import pandas as pd

# Create a statement
# Within the text() method is a SQL query. Check out our SQL reminder course if you feel a little rusty
stmt = text("SELECT students.id, students.name, addresses.email_address FROM students "
            "JOIN addresses ON students.id=addresses.student_id "
            "WHERE students.id = 1")

df = pd.read_sql_query(con=engine.connect(), sql=stmt)

df.head()

2024-04-02 13:10:02,430 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:10:02,431 INFO sqlalchemy.engine.Engine SELECT students.id, students.name, addresses.email_address FROM students JOIN addresses ON students.id=addresses.student_id WHERE students.id = 1
2024-04-02 13:10:02,431 INFO sqlalchemy.engine.Engine [generated in 0.00116s] {}


|   | id | name | email_address |
|---|----|------|----------------|
| 0 | 1 | Jack | jack@yahoo.com |
| 1 | 1 | Jack | jack@msn.com |


Now you have a very nice DataFrame that you can easily manipulate!

Note that we still relied on SQLAlchemy: pandas needs the statement `stmt` and a connection from the `engine` to run the query and build the DataFrame.
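
`read_sql_query` also accepts bound parameters through its `params` argument, so you don't have to hard-code values in the SQL text. Here is a minimal sketch, assuming an in-memory SQLite database. The table content is made up for the example.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(
        text("INSERT INTO students (id, name) VALUES (:id, :name)"),
        [{"id": 1, "name": "Jack"}, {"id": 2, "name": "Jill"}],
    )

stmt = text("SELECT id, name FROM students WHERE id = :student_id")

# The connection is closed automatically at the end of the block
with engine.connect() as conn:
    df = pd.read_sql_query(sql=stmt, con=conn, params={"student_id": 1})

print(df)
```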

### Update your database with Pandas

Just as you can read from a SQL database, you can also write to one with pandas, using the `to_sql` method.

In [18]:
# Create a new column
df["great_new_column"] = 0
df.head()

|   | id | name | email_address | great_new_column |
|---|----|------|----------------|------------------|
| 0 | 1 | Jack | jack@yahoo.com | 0 |
| 1 | 1 | Jack | jack@msn.com | 0 |


In [19]:
# Push this new dataframe to our sql database
df.to_sql(
    "brand_new_table",
    engine
)

2024-04-02 13:10:15,990 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-02 13:10:15,992 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-04-02 13:10:15,993 INFO sqlalchemy.engine.Engine [cached since 406.5s ago] {'table_name': 'brand_new_table', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-04-02 13:10:16,003 INFO sqlalchemy.engine.Engine 
CREATE TABLE brand_new_table (
	index BIGINT, 
	id BIGINT, 
	name TEXT, 
	email_address TEXT, 
	great_new_column BIGINT
)


2024-04-02 13:10:16,004 INFO sqlalchemy.eng

2

In [20]:
# Let's query it
stmt = text("SELECT * FROM brand_new_table")
result = conn.execute(stmt)
result.fetchall()

2024-04-02 13:10:20,908 INFO sqlalchemy.engine.Engine SELECT * FROM brand_new_table
2024-04-02 13:10:20,908 INFO sqlalchemy.engine.Engine [generated in 0.00067s] {}


[(0, 1, 'Jack', 'jack@yahoo.com', 0), (1, 1, 'Jack', 'jack@msn.com', 0)]
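
The first value of each row above is the DataFrame index, which `to_sql` writes as an `index` column by default. The sketch below shows two commonly used options: `index=False` to skip that column and `if_exists="replace"` to overwrite the table when the cell is run again. It assumes an in-memory SQLite engine and a small DataFrame built by hand.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

engine = create_engine("sqlite:///:memory:")

df = pd.DataFrame(
    {"id": [1, 1], "name": ["Jack", "Jack"], "great_new_column": [0, 0]}
)

# Running the same call twice works because the table is replaced each time
df.to_sql("brand_new_table", engine, index=False, if_exists="replace")
df.to_sql("brand_new_table", engine, index=False, if_exists="replace")

# Check which columns were created: there is no "index" column
columns = [col["name"] for col in inspect(engine).get_columns("brand_new_table")]

with engine.connect() as conn:
    round_trip = pd.read_sql_query(text("SELECT * FROM brand_new_table"), conn)

print(columns, len(round_trip))
```

Without `if_exists`, a second call on an existing table raises an error, which protects you from overwriting data by accident.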

## Resources 📚📚

- <a href="https://www.youtube.com/watch?v=woKYyhLCcnU" target="_blank">Introduction to SQLAlchemy</a>
- <a href="https://docs.sqlalchemy.org/en/13/intro.html" target="_blank">SQLAlchemy Overview</a>
- <a href="https://docs.sqlalchemy.org/en/13/orm/tutorial.html" target="_blank">Object Relational Tutorial</a>
- <a href="https://docs.sqlalchemy.org/en/13/core/tutorial.html" target="_blank">SQL Expression Language Tutorial</a>
- <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql" target="_blank">Pandas Read SQL</a>
- <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html" target="_blank">Pandas To SQL</a>