<a href="https://colab.research.google.com/github/alin2025/My_Code_Example/blob/main/SQLAlchemy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [122]:
import pandas as pd
from sqlalchemy import create_engine, Table, MetaData, Column, Integer, String, Float, inspect
from sqlalchemy import select, desc
from sqlalchemy.sql import func, text
import sqlalchemy

> **Reference:** [Excellent SQLAlchemy tutorial with examples by Vinay Kudari](https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91)

# SQLAlchemy basic concepts

[**SQLAlchemy**](http://docs.sqlalchemy.org/en/latest/core/engines.html) provides a nice “Pythonic” way of interacting with databases. So rather than dealing with the differences between specific dialects of traditional SQL such as MySQL or PostgreSQL or Oracle, you can leverage the Pythonic framework of SQLAlchemy to streamline your workflow and more efficiently query your data.

In this example we will interact with an [**SQLite**](http://www.sqlitetutorial.net/) database. SQLite is a C library that provides a lightweight disk-based database. We will work with an existing DB called [`chinook`](http://www.sqlitetutorial.net/sqlite-sample-database).

## Engine

[Engine](http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Engine) is the most fundamental object of SQLAlchemy, and it defines the database we work with.

In [123]:
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

In [124]:
inspector = inspect(engine)
inspector.get_table_names()

['albums',
 'artists',
 'customers',
 'employees',
 'genres',
 'invoice_items',
 'invoices',
 'media_types',
 'playlist_track',
 'playlists',
 'tracks']
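
Besides table names, the inspector can describe the columns of a table. The following is a minimal sketch against a throwaway in-memory database; the `demo` table and all `demo_*` names are hypothetical and are not part of `chinook`.

```python
from sqlalchemy import create_engine, inspect, text

# Hypothetical in-memory database, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')
with demo_engine.begin() as demo_conn:
    demo_conn.execute(text('CREATE TABLE demo (id INTEGER PRIMARY KEY, label TEXT)'))

demo_inspector = inspect(demo_engine)
table_names = demo_inspector.get_table_names()

# get_columns() returns one dictionary per column; we keep only the names
column_names = [col['name'] for col in demo_inspector.get_columns('demo')]
print(table_names, column_names)
```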

## MetaData

Database tables in SQLAlchemy belong to (are linked to) a [metadata](https://docs.sqlalchemy.org/en/13/core/connections.html#connectionless-execution-implicit-execution) object.

In [62]:
metadata = sqlalchemy.MetaData()

In [65]:
metadata.tables.keys()

dict_keys(['albums', 'artists'])

## [`Table`](https://docs.sqlalchemy.org/en/13/core/metadata.html#sqlalchemy.schema.Table)

In [64]:
albums = Table('albums', metadata, autoload_with=engine)

In [66]:
type(albums)

sqlalchemy.sql.schema.Table

In [67]:
albums.c.keys()

['AlbumId', 'Title', 'ArtistId']

In [68]:
metadata.tables

FacadeDict({'albums': Table('albums', MetaData(), Column('AlbumId', INTEGER(), table=<albums>, primary_key=True, nullable=False), Column('Title', NVARCHAR(length=160), table=<albums>, nullable=False), Column('ArtistId', INTEGER(), ForeignKey('artists.ArtistId'), table=<albums>, nullable=False), schema=None), 'artists': Table('artists', MetaData(), Column('ArtistId', INTEGER(), table=<artists>, primary_key=True, nullable=False), Column('Name', NVARCHAR(length=120), table=<artists>), schema=None)})
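
Note that `metadata.tables` lists `artists` even though only `albums` was reflected: reflecting a table also pulls in the tables its foreign keys point to. The sketch below illustrates this on a hypothetical in-memory database (the `parent` and `child` tables are made up for the example).

```python
from sqlalchemy import create_engine, MetaData, Table, text

# Hypothetical two-table database with a foreign key from child to parent
demo_engine = create_engine('sqlite:///:memory:')
with demo_engine.begin() as demo_conn:
    demo_conn.execute(text('CREATE TABLE parent (id INTEGER PRIMARY KEY)'))
    demo_conn.execute(text(
        'CREATE TABLE child (id INTEGER PRIMARY KEY, '
        'parent_id INTEGER REFERENCES parent(id))'
    ))

demo_metadata = MetaData()
before = list(demo_metadata.tables.keys())    # nothing registered yet

# Reflect only 'child'; 'parent' is loaded as well because of the foreign key
child = Table('child', demo_metadata, autoload_with=demo_engine)
after = sorted(demo_metadata.tables.keys())
print(before, after)
```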

> **Discussion:** What is the difference between Python's Table and SQL's Table? Mention the concept of DB API.

## Connection

The `connect()` method returns a [*Connection*][con] object, through which we can send commands to the database.

[con]: http://docs.sqlalchemy.org/en/latest/core/connections.html#sqlalchemy.engine.Connection "Connection docs"

In [93]:
conn = engine.connect()
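
A connection opened this way stays open until `close()` is called. As an alternative sketch (not what the rest of this notebook does), a connection can be used as a context manager so that it is released automatically; the in-memory `demo_engine` is an assumption made for the example.

```python
from sqlalchemy import create_engine, text

# Hypothetical in-memory database, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')

# The connection is released automatically at the end of the block
with demo_engine.connect() as demo_conn:
    value = demo_conn.execute(text('SELECT 1 + 1')).scalar()

print(value, demo_conn.closed)
```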

# Executing DB operations

Using SQLAlchemy, the `Connection`'s [`execute(object_)`](https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Connection.execute) method executes SQL commands in one of two ways: either by sending explicit SQL commands, or by wrapping them with Pythonic objects.

## SQL statements

A straightforward approach would be to use our connection and "send" SQL commands.

In [70]:
query = text('''
SELECT * FROM albums
WHERE Title LIKE '%the best of%'
''')

In [71]:
result = conn.execute(query)

In [72]:
type(result)

sqlalchemy.engine.cursor.CursorResult

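> **Note:** The lowercase pattern matches 'The Best Of' because SQLite's `LIKE` operator is case-insensitive for ASCII characters by default. The case insensitivity comes from SQL, not from Python.

The result is a `CursorResult` (called [`ResultProxy`](https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.ResultProxy) before SQLAlchemy 1.4), which is an iterator.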

In [73]:
result.fetchmany(5)

[(13, 'The Best Of Billy Cobham', 10),
 (20, 'The Best Of Buddy Guy - The Millenium Collection', 15),
 (47, 'The Best of Ed Motta', 37),
 (61, "Knocking at Your Back Door: The Best Of Deep Purple in the 80's", 58),
 (83, 'My Way: The Best Of Frank Sinatra [Disc 1]', 85)]

> **Note:** The result object is an iterator. What will happen if we run `fetchmany()` again?
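
The iterator behaviour can be checked on a tiny example. This is a sketch on a hypothetical three-row query against an in-memory database, not on `chinook`: each `fetchmany()` call continues where the previous one stopped, and an exhausted result returns an empty list.

```python
from sqlalchemy import create_engine, text

# Hypothetical in-memory database, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')

with demo_engine.connect() as demo_conn:
    result = demo_conn.execute(
        text('SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 ORDER BY n')
    )
    first = [tuple(row) for row in result.fetchmany(2)]    # rows 1 and 2
    second = [tuple(row) for row in result.fetchmany(2)]   # the remaining row
    third = [tuple(row) for row in result.fetchmany(2)]    # nothing left

print(first, second, third)
```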

> **Discussion:** To execute queries this way you have to be proficient in SQL. Pros and cons...
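
One point for that discussion: when a raw SQL statement depends on a value that comes from outside the code, the value can be passed as a bound parameter instead of being formatted into the string. The sketch below assumes a made-up `demo_albums` table in an in-memory database.

```python
from sqlalchemy import create_engine, text

# Hypothetical in-memory database and table, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')

with demo_engine.begin() as demo_conn:
    demo_conn.execute(text('CREATE TABLE demo_albums (Title TEXT)'))
    # A list of dictionaries runs the statement once per dictionary
    demo_conn.execute(
        text('INSERT INTO demo_albums (Title) VALUES (:title)'),
        [{'title': 'The Best Of Something'}, {'title': 'Another Record'}],
    )
    # :pattern is a placeholder; its value is supplied separately
    query = text('SELECT Title FROM demo_albums WHERE Title LIKE :pattern')
    titles = demo_conn.execute(query, {'pattern': '%the best of%'}).scalars().all()

print(titles)
```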

## [`ClauseElement`](https://docs.sqlalchemy.org/en/latest/core/sqlelement.html#sqlalchemy.sql.expression.ClauseElement)

**The power of an API lies in its objects**, and SQLAlchemy provides "Pythonic" objects to represent SQL functionalities. More specifically, we will be interested in  [`FromClause`](https://docs.sqlalchemy.org/en/13/core/selectable.html#sqlalchemy.sql.expression.FromClause) elements.

> **Reference:** More information about the available expressions (virtually most of SQL functionality) can be found in this [SQL expression language tutorial by SQLAlchemy](https://docs.sqlalchemy.org/en/13/core/tutorial.html#sql-expression-language-tutorial).

### Example 1 - select

**Task - Show the names of the employees and their job title.**

We will demonstrate the API with the [`select()`](https://docs.sqlalchemy.org/en/13/core/selectable.html#sqlalchemy.sql.expression.select) function, which returns a [`Select`](https://docs.sqlalchemy.org/en/13/core/selectable.html?highlight=select#sqlalchemy.sql.expression.Select) object.

Let's demonstrate with the *employees* table.

In [74]:
employees = Table('employees', metadata, autoload_with=engine)

In [88]:
metadata.tables.keys()

dict_keys(['albums', 'artists', 'employees'])

In [None]:
type(employees)

sqlalchemy.sql.schema.Table

When we create a `select()` construct, SQLAlchemy looks around at the tables we’ve mentioned and then places them in the FROM clause of the statement. We can select the entire table or specific columns.

In [76]:
query = sqlalchemy.select(employees)


# To select only specific columns (as the task asks), pass them as separate arguments:

# query = select(
#     employees.c.EmployeeId,
#     employees.c.FirstName,
#     employees.c.LastName,
#     employees.c.Title
# )

In [None]:
type(query)

sqlalchemy.sql.selectable.Select

> **Discussion:** Discuss the [`Select`](https://docs.sqlalchemy.org/en/13/core/selectable.html?highlight=select#sqlalchemy.sql.expression.Select) object and its available properties.

In [77]:
print(str(query))

SELECT employees."EmployeeId", employees."LastName", employees."FirstName", employees."Title", employees."ReportsTo", employees."BirthDate", employees."HireDate", employees."Address", employees."City", employees."State", employees."Country", employees."PostalCode", employees."Phone", employees."Fax", employees."Email" 
FROM employees


In [78]:
result = conn.execute(query)

In [None]:
type(result)

sqlalchemy.engine.cursor.CursorResult

In [79]:
result.fetchall()

[(1, 'Adams', 'Andrew', 'General Manager', None, datetime.datetime(1962, 2, 18, 0, 0), datetime.datetime(2002, 8, 14, 0, 0), '11120 Jasper Ave NW', 'Edmonton', 'AB', 'Canada', 'T5K 2N1', '+1 (780) 428-9482', '+1 (780) 428-3457', 'andrew@chinookcorp.com'),
 (2, 'Edwards', 'Nancy', 'Sales Manager', 1, datetime.datetime(1958, 12, 8, 0, 0), datetime.datetime(2002, 5, 1, 0, 0), '825 8 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 2T3', '+1 (403) 262-3443', '+1 (403) 262-3322', 'nancy@chinookcorp.com'),
 (3, 'Peacock', 'Jane', 'Sales Support Agent', 2, datetime.datetime(1973, 8, 29, 0, 0), datetime.datetime(2002, 4, 1, 0, 0), '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com'),
 (4, 'Park', 'Margaret', 'Sales Support Agent', 2, datetime.datetime(1947, 9, 19, 0, 0), datetime.datetime(2003, 5, 3, 0, 0), '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com'),


### Example 2 - select-where

**Task - Show the names of the albums which contain the phrase "The best of"**.

We can chain the `where()` method of the `Select` object, selecting only the `Title` column.

In [80]:
query = select(
    albums.c.Title
).where(
    albums.c.Title.like('%the best of%')
)

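Alternatively, we can select all the columns of the table (`albums.columns` is the long form of `albums.c`). The output shown below is from the first query, which selects only `Title`.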

In [None]:
query = select(albums).where(albums.columns.Title.like('%the best of%'))

In [81]:
result = conn.execute(query)

In [82]:
result.fetchmany(5)

[('The Best Of Billy Cobham',),
 ('The Best Of Buddy Guy - The Millenium Collection',),
 ('The Best of Ed Motta',),
 ("Knocking at Your Back Door: The Best Of Deep Purple in the 80's",),
 ('My Way: The Best Of Frank Sinatra [Disc 1]',)]
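
Several conditions can be combined by chaining `where()` calls, which are joined with `AND`. The following is a small sketch on a hypothetical `demo_albums` table with made-up rows, assuming an in-memory database.

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

# Hypothetical in-memory database and table, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')
demo_metadata = MetaData()
demo_albums = Table('demo_albums', demo_metadata,
    Column('AlbumId', Integer, primary_key=True),
    Column('Title', String),
    Column('ArtistId', Integer),
)
demo_metadata.create_all(demo_engine)

with demo_engine.begin() as demo_conn:
    demo_conn.execute(demo_albums.insert(), [
        {'AlbumId': 1, 'Title': 'The Best Of A', 'ArtistId': 10},
        {'AlbumId': 2, 'Title': 'The Best Of B', 'ArtistId': 20},
        {'AlbumId': 3, 'Title': 'Something Else', 'ArtistId': 10},
    ])
    # Successive where() calls are combined with AND
    query = (
        select(demo_albums.c.Title)
        .where(demo_albums.c.Title.like('%the best of%'))
        .where(demo_albums.c.ArtistId == 10)
    )
    matches = demo_conn.execute(query).scalars().all()

print(query)
print(matches)
```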

### Example 3 - join

**Task - Show the names of all the albums and their artists.**

In [87]:
artists = Table('artists', metadata, autoload_with=engine)
albums = Table('albums', metadata, autoload_with=engine)

We use the [`join()`](https://docs.sqlalchemy.org/en/13/core/metadata.html?highlight=join#sqlalchemy.schema.Table.join) method.

In [84]:
join_stmt = artists.join(albums, artists.c.ArtistId == albums.c.ArtistId)

In [85]:
print(type(join_stmt))

<class 'sqlalchemy.sql.selectable.Join'>


> **Note:** Both `select()` and `join()` are special cases of the more general [`FromClause`](https://docs.sqlalchemy.org/en/13/core/selectable.html#sqlalchemy.sql.expression.FromClause) class, which basically means they can be used within the FROM clause of a SELECT statement. This can be illustrated by looking at

>> `join_stmt.c.keys()`

>> `conn.execute(select(join_stmt)).fetchmany(5)`

Now we can select from this Join object.

In [None]:
import sqlalchemy
from sqlalchemy import select, join

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Define the metadata
metadata = sqlalchemy.MetaData()

# Reflect the tables
metadata.reflect(bind=engine)

# Get the 'albums' and 'artists' tables from metadata
albums = metadata.tables['albums']
artists = metadata.tables['artists']

# Define the join condition
join_stmt = join(albums, artists, albums.c.ArtistId == artists.c.ArtistId)

# Establish a connection
conn = engine.connect()

# Define the query to select columns from both tables and join them
query = select(
    albums.c.Title,
    artists.c.Name
).select_from(
    join_stmt
)

# Execute the query on the connection
result = conn.execute(query)

# Fetch the results
for row in result:
    print(row)

# Keep the connection open: the cells below reuse `conn`


When we use JOINs we know what FROM clause we want, so we state it explicitly with the [`select_from()`](https://docs.sqlalchemy.org/en/13/core/selectable.html?highlight=select_from#sqlalchemy.sql.expression.Select.select_from) method, as in the query above. The same query in compact form:

In [90]:
query = select(albums.c.Title, artists.c.Name).select_from(join_stmt)

In [91]:
print(str(query))

SELECT albums."Title", artists."Name" 
FROM albums JOIN artists ON albums."ArtistId" = artists."ArtistId"


In [94]:
result = conn.execute(query)

In [None]:
result.fetchmany(5)

[('For Those About To Rock We Salute You', 'AC/DC'),
 ('Balls to the Wall', 'Accept'),
 ('Restless and Wild', 'Accept'),
 ('Let There Be Rock', 'AC/DC'),
 ('Big Ones', 'Aerosmith')]

### Example 4 - Group By and SQL functions

**Task - Show for each customer (name) the number of invoices they had.**

> **Reference:** Very often we wish to apply a function on the data. Built-in SQL functions are available through the [`func` module](https://docs.sqlalchemy.org/en/13/core/tutorial.html#functions).

See here details about the [`group_by()`](https://docs.sqlalchemy.org/en/13/core/tutorial.html#ordering-grouping-limiting-offset-ing) method.

In [96]:
invoices = Table('invoices', metadata, autoload_with=engine)
customers = Table('customers', metadata, autoload_with=engine)

In [97]:
metadata.tables.keys()

dict_keys(['albums', 'artists', 'customers', 'employees', 'genres', 'invoice_items', 'tracks', 'media_types', 'invoices', 'playlist_track', 'playlists'])

Let's try first without the names of the customers.

In [99]:
query = select(invoices.c.CustomerId, func.count(invoices.c.InvoiceId))\
            .select_from(invoices)\
            .group_by(invoices.c.CustomerId)

In [100]:
print(str(query))

SELECT invoices."CustomerId", count(invoices."InvoiceId") AS count_1 
FROM invoices GROUP BY invoices."CustomerId"


In [102]:
conn.execute(query).fetchmany(5)

[(1, 7), (2, 7), (3, 7), (4, 7), (5, 7)]

And now with the names.

In [103]:
query = select(customers.c.FirstName + " " + customers.c.LastName, func.count(invoices.c.InvoiceId))\
            .select_from(invoices.join(customers, invoices.c.CustomerId == customers.c.CustomerId))\
            .group_by(invoices.c.CustomerId)

In [104]:
conn.execute(query).fetchmany(5)

[('Luís Gonçalves', 7),
 ('Leonie Köhler', 7),
 ('François Tremblay', 7),
 ('Bjørn Hansen', 7),
 ('František Wichterlová', 7)]

## Relation to pandas

### Tables (SQLAlchemy) vs. DataFrames (pandas)

Very often we would like to save our query result as a data-frame. Luckily, the result's `fetchall()` method returns a list of rows that can be turned into a data-frame using the standard `pd.DataFrame()` constructor, with `result.keys()` supplying the column names.

In [127]:
query = select(albums)
result = conn.execute(query)

In [128]:
df_albums = pd.DataFrame(result.fetchall(), columns=result.keys())
df_albums.head()

   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3


### Direct querying

Moreover, *pandas* offers [`pd.read_sql_query(sql, con)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html) to directly send SQL commands through a given DB connection. SQLAlchemy engine is one of the options for `con`.

In [None]:
import pandas as pd
import sqlalchemy
from sqlalchemy import text

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

# Define the SQL query
sql = text('''SELECT * FROM employees ORDER BY BirthDate''')

# Use pd.read_sql_query() with the connection object
df_employees = pd.read_sql_query(sql, con=conn)

# Close the connection
conn.close()

# Display the first few rows of the DataFrame
print(df_employees.head())


In [None]:
import pandas as pd
import sqlalchemy
from sqlalchemy import text

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

# Define the SQL query
sql = text('''SELECT * FROM albums''')

# Use pd.read_sql_query() with the connection object
# (named df_albums so that the SQLAlchemy Table `albums` is not overwritten)
df_albums = pd.read_sql_query(sql, con=conn)

# Close the connection
conn.close()

# Display the first few rows of the DataFrame
print(df_albums.head())


In [141]:
import pandas as pd
import sqlalchemy

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

# Define the SQL table
table = "albums"
# Use pd.read_sql_table() with the connection object
albums1 = pd.read_sql_table(table, con=conn)
# Close the connection
#conn.close()

# Display the first few rows of the DataFrame
print(albums1.head())


   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3


In [142]:
albums1.drop('Title', axis=1).to_sql('albums_new', con=engine,if_exists='replace')

347

In [166]:
import pandas as pd
import sqlalchemy

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

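# Define the SQL query
query = text("select * from albums_new")
# Use pd.read_sql_query() with the connection object
# (the `index` column in the output exists because to_sql() writes the DataFrame index by default)
albums1 = pd.read_sql_query(query, con=conn)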
# Close the connection
conn.close()

# Display the first few rows of the DataFrame
print(albums1.head())

   index  AlbumId  ArtistId
0      0        1         1
1      1        2         2
2      2        3         2
3      3        4         1
4      4        5         3


In [167]:
albums.to_sql('new_albums', con=engine, if_exists='append')

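AttributeError: 'Table' object has no attribute 'to_sql'

> **Note:** `to_sql()` is a method of pandas DataFrames. Here `albums` is a SQLAlchemy `Table`, so the data must first be read into a DataFrame.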

In [None]:
import pandas as pd
import sqlalchemy

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

# Read the data from the 'albums' table into a Pandas DataFrame
df_albums = pd.read_sql_table('albums', con=conn)

# Write the contents of the DataFrame to the 'new_albums' table
df_albums.to_sql('new_albums', con=conn, if_exists='append', index=False)

# Close the connection
#conn.close()


In [177]:
import sqlalchemy

# Create the engine
engine = sqlalchemy.create_engine('sqlite:///chinook.db', echo=False)

# Establish a connection
conn = engine.connect()

# Define the table name you want to count
table_name = 'albums'

# Define the SQL query to count the rows in the table
sql_query = text(f"SELECT COUNT(*) FROM {table_name}")

# Execute the SQL query
result = conn.execute(sql_query)

# Fetch the count value
count = result.scalar()

# Print the count
print(f"The number of rows in the '{table_name}' table is: {count}")

# Close the connection
#conn.close()


The number of rows in the 'albums' table is: 347


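In [None]:
# NOTE: `new_df` is not defined in the cells shown. Presumably it holds the
# contents of `new_albums`, read back after the append cell above ran three times
# (3 x 347 = 1041 rows).
new_df['AlbumId'].count()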

1041

In [None]:
new_df['AlbumId']

0         1
1         2
2         3
3         4
4         5
       ... 
1036    343
1037    344
1038    345
1039    346
1040    347
Name: AlbumId, Length: 1041, dtype: int64

In [None]:
new_df['AlbumId'].duplicated().sum()

694
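
These counts are what repeated writes with `if_exists='append'` would produce, since every run adds all the rows again. The sketch below reproduces the effect on a small made-up DataFrame, assuming an in-memory database and a hypothetical `demo_albums` table, and contrasts it with `if_exists='replace'`.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical in-memory database and data, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')
demo_df = pd.DataFrame({'AlbumId': [1, 2, 3], 'ArtistId': [1, 2, 2]})

# Appending the same frame twice stores every row twice
for _ in range(2):
    demo_df.to_sql('demo_albums', con=demo_engine, if_exists='append', index=False)

read_back = pd.read_sql_table('demo_albums', con=demo_engine)
n_rows = len(read_back)
n_duplicates = int(read_back['AlbumId'].duplicated().sum())

# if_exists='replace' drops and recreates the table instead
demo_df.to_sql('demo_albums', con=demo_engine, if_exists='replace', index=False)
n_after_replace = len(pd.read_sql_table('demo_albums', con=demo_engine))

print(n_rows, n_duplicates, n_after_replace)
```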

> **Your turn:**
* Part 1 - create the tables `tracks`, `albums` and `artists` both as SQLAlchemy Tables and as pandas DataFrames.
* Part 2 - Answer the following questions in two ways - using SQLAlchemy and using pandas.
>> 1. What is the size of the table `tracks`?
>> 2. Which artist has the highest number of tracks?

> Don't hesitate to look for the answers online...

### Solution

#### Part 1

In [179]:
tracks = Table('tracks', metadata, autoload_with=engine)
query = select(tracks)
results = conn.execute(query).fetchall()
df_tracks = pd.DataFrame(results, columns=tracks.c.keys())
df_tracks.head()

In [None]:
albums = Table('albums', metadata, autoload_with=engine)
query = select(albums)
results = conn.execute(query).fetchall()
df_albums = pd.DataFrame(results, columns=albums.c.keys())
df_albums.head()

   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3


In [None]:
artists = Table('artists', metadata, autoload_with=engine)
query = select(artists)
results = conn.execute(query).fetchall()
df_artists = pd.DataFrame(results, columns=artists.c.keys())
df_artists.head()

   ArtistId               Name
0         1              AC/DC
1         2             Accept
2         3          Aerosmith
3         4  Alanis Morissette
4         5    Alice In Chains


#### Part 2

##### Question 1

Based on [`count()` documentation](https://docs.sqlalchemy.org/en/13/core/functions.html#sqlalchemy.sql.functions.count)

In [None]:
query = select(func.count()).select_from(tracks)
conn.execute(query).fetchall()

[(3503,)]

Or...

In [None]:
len(df_tracks)

3503

##### Question 2

In [None]:
join_stmt = tracks.join(albums, tracks.c.AlbumId == albums.c.AlbumId)\
    .join(artists, albums.c.ArtistId == artists.c.ArtistId)

Based on [`order_by()`](https://docs.sqlalchemy.org/en/13/core/selectable.html?highlight=order_by#sqlalchemy.sql.expression.Select.order_by), [`desc()`](https://docs.sqlalchemy.org/en/13/core/sqlelement.html?highlight=desc#sqlalchemy.sql.expression.desc) and [`label()`](https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.label)  documentation.

In [None]:
query = select(artists.c.Name, func.count(tracks.c.TrackId).label('tracks_count'))\
    .select_from(join_stmt)\
    .group_by(artists.c.ArtistId)\
    .order_by(desc('tracks_count'))

In [None]:
conn.execute(query).fetchmany(5)

[('Iron Maiden', 213),
 ('U2', 135),
 ('Led Zeppelin', 114),
 ('Metallica', 112),
 ('Deep Purple', 92)]

Or (with the help of the [`DataFrame.join()` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html))...

In [None]:
df_all = df_tracks\
    .join(df_albums.set_index('AlbumId'), on='AlbumId')\
    .join(df_artists.set_index('ArtistId'), on='ArtistId',
          lsuffix='_l', rsuffix='_r')
df_all.Name_r.value_counts()[:5]

Iron Maiden     213
U2              135
Led Zeppelin    114
Metallica       112
Lost             92
Name: Name_r, dtype: int64

# Create your own DB

## Table creation

For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere.

In [180]:
engine = create_engine('sqlite:///:memory:', echo=False)

We have to define a metadata object.

In [181]:
metadata = MetaData()

Next we define the schemas of the tables.

In [182]:
users = Table('users', metadata,
    Column('id', Integer),
    Column('name', String),
    Column('fullname', String),
)

In [183]:
addresses = Table('addresses', metadata,
    Column('id', Integer),
    Column('user_id', Integer),
    Column('email_address', String)
)

> **Note:** The metadata object makes sure there are no ambiguities in the database. Try to create another table with the same name and read the exception.

Finally, we use the metadata object to create all the tables.

In [184]:
metadata.create_all(engine)
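
The tables above declare no keys. As a sketch of how the same schema might look with a primary key on each table and a foreign key from `addresses` to `users` (an assumption added for illustration, not part of the tables defined above):

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, ForeignKey, inspect)

# Separate in-memory database so the tables defined above are not affected
demo_engine = create_engine('sqlite:///:memory:')
demo_metadata = MetaData()

demo_users = Table('users', demo_metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
demo_addresses = Table('addresses', demo_metadata,
    Column('id', Integer, primary_key=True),
    # Each address row points at the user it belongs to
    Column('user_id', Integer, ForeignKey('users.id')),
    Column('email_address', String, nullable=False),
)

demo_metadata.create_all(demo_engine)

created = sorted(inspect(demo_engine).get_table_names())
fk_targets = [fk['referred_table']
              for fk in inspect(demo_engine).get_foreign_keys('addresses')]
print(created, fk_targets)
```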

## Insert data

All operations are sent to the database through the connection object.

In [185]:
conn = engine.connect()

The [`insert()`](https://docs.sqlalchemy.org/en/13/core/dml.html#sqlalchemy.sql.expression.insert) method is a wrapper for SQL's INSERT command.

In [186]:
ins = users.insert().values(id=1234, name='jack', fullname='Jack Jones')

In [187]:
str(ins)

'INSERT INTO users (id, name, fullname) VALUES (:id, :name, :fullname)'

In [188]:
result = conn.execute(ins)

> **Note:** The result object of an `INSERT` does not contain any rows.

### Testing

In [190]:
conn.execute(select(users)).fetchall()

[(1234, 'jack', 'Jack Jones')]
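
Several rows can be inserted with a single `execute()` call by passing a list of dictionaries. The sketch below uses a separate in-memory database and made-up rows; `engine.begin()` is used here so that the insert is committed when the block ends.

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select

# Separate in-memory database so the tables defined above are not affected
demo_engine = create_engine('sqlite:///:memory:')
demo_metadata = MetaData()
demo_users = Table('users', demo_metadata,
    Column('id', Integer),
    Column('name', String),
    Column('fullname', String),
)
demo_metadata.create_all(demo_engine)

# Made-up rows, one dictionary per row
rows = [
    {'id': 1, 'name': 'jack', 'fullname': 'Jack Jones'},
    {'id': 2, 'name': 'wendy', 'fullname': 'Wendy Williams'},
]

# engine.begin() commits on success and rolls back on error
with demo_engine.begin() as demo_conn:
    demo_conn.execute(demo_users.insert(), rows)

with demo_engine.connect() as demo_conn:
    query = select(demo_users).order_by(demo_users.c.id)
    stored = [tuple(row) for row in demo_conn.execute(query)]

print(stored)
```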

# Example

In this example we do the following:
1. We insert the data of the MovieLens files into two database tables.
2. We use SQLAlchemy to find the best movie (having at least 30 viewers).

## Inspect the data

In [None]:
import sys
if 'google.colab' in sys.modules:
    from google.colab import files
    uploaded = files.upload()

In [191]:
df_movies = pd.read_csv('movies.csv')
df_movies.head()

   movieID                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy


In [193]:
df_ratings = pd.read_csv('ratings.csv')
df_ratings.head()

   userID  movieID  rating   timestamp
0       1       31     2.5  1260759144
1       1     1029     3.0  1260759179
2       1     1061     3.0  1260759182
3       1     1129     2.0  1260759185
4       1     1172     4.0  1260759205


## Creating the tables

In [194]:
engine = create_engine('sqlite:///:memory:', echo=False)
metadata = MetaData()
conn = engine.connect()

In [195]:
movies = Table('movies', metadata,
    Column('movieID', Integer),
    Column('title', String),
    Column('genres', String),
)

In [196]:
ratings = Table('ratings', metadata,
    Column('userID', Integer),
    Column('movieID', Integer),
    Column('rating', Float),
    Column('timestamp', Integer)
)

In [197]:
metadata.create_all(engine)

## Inserting the data

In [198]:
for ind, row in df_movies.iterrows():
    ins = movies.insert().values(movieID=row.movieID, title=row.title, genres=row.genres)
    conn.execute(ins)

In [200]:
conn.execute(select(movies)).fetchmany(5)

[(1, 'Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'),
 (2, 'Jumanji (1995)', 'Adventure|Children|Fantasy'),
 (3, 'Grumpier Old Men (1995)', 'Comedy|Romance'),
 (4, 'Waiting to Exhale (1995)', 'Comedy|Drama|Romance'),
 (5, 'Father of the Bride Part II (1995)', 'Comedy')]

In [201]:
for ind, row in df_ratings.iterrows():
    ins = ratings.insert().values(userID=row.userID, movieID=row.movieID, rating=row.rating, timestamp=row.timestamp)
    conn.execute(ins)

In [202]:
conn.execute(select(ratings)).fetchmany(5)

[(1, 31, 2.5, 1260759144),
 (1, 1029, 3.0, 1260759179),
 (1, 1061, 3.0, 1260759182),
 (1, 1129, 2.0, 1260759185),
 (1, 1172, 4.0, 1260759205)]

## Executing the query

In [203]:
join_stmt = movies.join(ratings, ratings.c.movieID == movies.c.movieID)

**To Be Continued**
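
One way the query might be completed is sketched below. It is not the author's solution: it assumes that "best" means the highest average rating and that the number of ratings stands in for the number of viewers. It runs on a separate in-memory database with made-up rows, and the threshold (`MIN_RATINGS`, a hypothetical name) is lowered from 30 to 2 to suit the toy data.

```python
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, String, Float, select, func, desc)

# Separate in-memory database with made-up data, used only for illustration
demo_engine = create_engine('sqlite:///:memory:')
demo_metadata = MetaData()
demo_movies = Table('movies', demo_metadata,
    Column('movieID', Integer), Column('title', String))
demo_ratings = Table('ratings', demo_metadata,
    Column('userID', Integer), Column('movieID', Integer), Column('rating', Float))
demo_metadata.create_all(demo_engine)

MIN_RATINGS = 2  # the task uses 30; lowered for the toy data

with demo_engine.begin() as demo_conn:
    demo_conn.execute(demo_movies.insert(), [
        {'movieID': 1, 'title': 'Movie A'},
        {'movieID': 2, 'title': 'Movie B'},
        {'movieID': 3, 'title': 'Movie C'},
    ])
    demo_conn.execute(demo_ratings.insert(), [
        {'userID': 1, 'movieID': 1, 'rating': 4.0},
        {'userID': 2, 'movieID': 1, 'rating': 5.0},
        {'userID': 1, 'movieID': 2, 'rating': 3.0},
        {'userID': 2, 'movieID': 2, 'rating': 3.0},
        {'userID': 1, 'movieID': 3, 'rating': 5.0},  # a single rating: filtered out
    ])
    demo_join = demo_movies.join(
        demo_ratings, demo_ratings.c.movieID == demo_movies.c.movieID)
    query = (
        select(demo_movies.c.title,
               func.avg(demo_ratings.c.rating).label('avg_rating'),
               func.count(demo_ratings.c.rating).label('n_ratings'))
        .select_from(demo_join)
        .group_by(demo_movies.c.movieID, demo_movies.c.title)
        .having(func.count(demo_ratings.c.rating) >= MIN_RATINGS)  # enough ratings
        .order_by(desc('avg_rating'))                              # best average first
    )
    best = tuple(demo_conn.execute(query).first())

print(best)
```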

In [None]:
engine = create_engine('mssql+pyodbc://{}:{}@'.format(server_user, server_password) \
                            + server_name + '/' + \
                            db_name + '?trusted_connection=no&driver=ODBC+Driver+17+for+SQL+server')