We want to be able to:

- connect Python with a MySQL database
- manage a MySQL database from Python (e.g. creating databases and tables, inserting data, deleting tables, ...)
- write Pandas DataFrames to a MySQL database
- read data from a MySQL database into a Pandas DataFrame
- use SQL queries to filter data in a MySQL database

In [1]:
from sqlalchemy import create_engine, text   # Needed to connect to the database
import pandas as pd
import os                                    # Needed to access environment variables
from dotenv import load_dotenv               # Load credentials from a .env file

## 1.1 Connect to MySQL Database

- Instead of writing your database credentials into a Python script or Jupyter Notebook, you can store them in a separate .env file. 
- This is a text file that contains environment variables in the form of name=value pairs. 
- You can then load these variables into your Python script using the dotenv package.
- Such a .env file should never be shared with others or checked into version control.

In [2]:
# Load the .env file
load_dotenv('.env', override=True)

True
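Note that `os.getenv` returns `None` when a variable is missing, which would silently end up in the connection string. A minimal sketch of a guard against that is shown here; the helper name `require_env` and the variable `DEMO_DB_USER` are hypothetical and only serve as illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Missing or empty: fail early instead of building a broken connection string
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value


# Stand-in for a value that load_dotenv() would normally provide
os.environ["DEMO_DB_USER"] = "alice"
user = require_env("DEMO_DB_USER")
print(user)
```

In the notebook, the same helper could be applied to `MYSQL_USER` and `MYSQL_PASSWORD` after `load_dotenv` has run.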

- The package SQLAlchemy allows you to connect to different SQL databases (e.g. MySQL, PostgreSQL, SQLite, ...).
- To connect to a MySQL database, you need to specify both the database dialect (mysql) and choose an appropriate driver (mysqlconnector, pymysql, ...). The driver needs to be installed separately. [MySQL Connector/Python](https://dev.mysql.com/doc/connector-python/en/connector-python-introduction.html) is the official MySQL driver for Python. 

In [3]:
# Connect to a local MySQL database
DIALECT = 'mysql'
DRIVER = 'mysqlconnector'                          # pip install mysql-connector-python
USER = os.getenv('MYSQL_USER')
PASSWORD = os.getenv('MYSQL_PASSWORD')
HOST = 'localhost'
PORT = '3306'

connection_string = f"{DIALECT}+{DRIVER}://{USER}:{PASSWORD}@{HOST}:{PORT}"
engine = create_engine(connection_string)

ModuleNotFoundError: No module named 'mysql'
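An f-string works as long as the credentials contain no special characters such as `@` or `/`. As a hedged alternative, the sketch below builds the same kind of URL with SQLAlchemy's `URL.create`, which escapes such characters. The user name and password are made-up placeholders, and no connection is opened, so the MySQL driver does not need to be installed to run it.

```python
from sqlalchemy.engine import URL

# Placeholder credentials (assumptions for illustration only)
url = URL.create(
    drivername="mysql+mysqlconnector",
    username="demo_user",
    password="p@ss/word",      # contains characters that must be escaped in a URL
    host="localhost",
    port=3306,
    database="music",
)

# Full URL with the password percent-encoded
rendered = url.render_as_string(hide_password=False)
# Safe variant for logging: the password is masked
masked = url.render_as_string(hide_password=True)
print(masked)
```

The resulting `url` object can be passed to `create_engine` in place of a connection string.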

## 1.2 Create a new database

- We can use the `execute` method of a connection to run arbitrary SQL statements.
- To make sure that the connection to the database is closed afterwards, we open it in a `with` statement.

In [None]:
with engine.connect() as connection:
    connection.execute(text('CREATE DATABASE IF NOT EXISTS music'))

DatabaseError: (mysql.connector.errors.DatabaseError) 2003 (HY000): Can't connect to MySQL server on 'localhost:3306' (10061)
(Background on this error at: https://sqlalche.me/e/20/4xp6)
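For statements that modify data (e.g. `INSERT`), the changes also need to be committed. The following sketch uses `engine.begin()`, which commits when the block exits without an error. It runs against an in-memory SQLite database as a stand-in, since it is meant to be runnable without a MySQL server; the table `genres` is a hypothetical example.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch)
engine = create_engine("sqlite://")

# engine.begin() opens a connection and a transaction; it commits on success
# and rolls back if an exception is raised inside the block
with engine.begin() as connection:
    connection.execute(text("CREATE TABLE IF NOT EXISTS genres (name TEXT)"))
    connection.execute(
        text("INSERT INTO genres (name) VALUES (:name)"),
        {"name": "rock"},
    )

# Read the data back on a separate connection
with engine.connect() as connection:
    rows = connection.execute(text("SELECT name FROM genres")).fetchall()

names = [row[0] for row in rows]
print(names)
```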

- Typically, we would not create a new database in this way, but use a database that already exists.
- In this case, we would directly specify an engine that points to the existing database.

In [None]:
connection_string = f"{DIALECT}+{DRIVER}://{USER}:{PASSWORD}@{HOST}:{PORT}/music"
engine = create_engine(connection_string)

## 1.3 Write Pandas DataFrame to database

- We use the `to_sql` method of Pandas DataFrames to write an entire DataFrame to a database table.
- The `if_exists` parameter controls what happens if the table already exists: 'fail' (default), 'replace', or 'append'.
- The `index` parameter controls whether the DataFrame index is written to the database as a column; it defaults to `True`.
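The effect of the `if_exists` options can be tried out on a small scale. This is a sketch under the assumption that an in-memory SQLite database behaves like MySQL for this purpose; the table name `demo` and its contents are made up.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch)
engine = create_engine("sqlite://")
df = pd.DataFrame({"name": ["a", "b"], "danceability": [0.5, 0.9]})

# 'replace' drops an existing table and creates it again
df.to_sql("demo", con=engine, if_exists="replace", index=False)
# 'append' adds the rows to the existing table
df.to_sql("demo", con=engine, if_exists="append", index=False)
row_count = len(pd.read_sql("SELECT * FROM demo", con=engine))

# 'fail' raises a ValueError because the table already exists
try:
    df.to_sql("demo", con=engine, if_exists="fail", index=False)
    failed = False
except ValueError:
    failed = True

# With index=False, no extra index column is written
columns = list(pd.read_sql("SELECT * FROM demo", con=engine).columns)
print(row_count, failed, columns)
```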

In [None]:
df = pd.read_csv("data/tracks.csv")
df.shape
df.head()

Unnamed: 0,id,name,album_name,artist_ids,danceability,energy,speechiness,acousticness,valence,tempo,duration_ms
0,05Mp2UJulSttxQ4E6hQPH3,Ohne mein Team,Palmen aus Plastik,"1aS5tqEs9ci5P9KD9tZWa6,0Dvx6p8JDyzeOPGmaCIH1L,...",0.766,0.8,0.0938,0.16,0.635,129.999,188504
1,4bHsxqR3GMrXTxEPLuK5ue,Don't Stop Believin',Escape,0rvjqX7ttXeg3mTy8Xscbt,0.5,0.748,0.0363,0.127,0.514,118.852,250987
2,3rdAz1fbUfZxYgaCviYhRo,Todo De Ti,VICE VERSA,1mcTU81TzQhprhouKaTkpq,0.78,0.719,0.0506,0.302,0.336,127.962,199604
3,254bXAqt3zP6P50BdQvEsq,Everywhere - 2017 Remaster,Tango In the Night (Deluxe Edition),08GQAI4eElDnROBrJRGE0X,0.73,0.487,0.0303,0.258,0.731,114.965,226653
4,2PGA1AsJal6cyMNmKyE56q,200 km/h,Platte,1qQLhymHXFPtP5U8KNKsm6,0.899,0.67,0.163,0.269,0.413,148.065,163147


In [None]:
df.to_sql(name='tracks', con=engine, if_exists='replace', index=False)

DatabaseError: (mysql.connector.errors.DatabaseError) 2003 (HY000): Can't connect to MySQL server on 'localhost:3306' (10061)
(Background on this error at: https://sqlalche.me/e/20/4xp6)

## 1.4 Read from database into Pandas DataFrame

- We can read entire database tables

In [None]:
tracks = pd.read_sql('tracks', con=engine)
tracks.head()

DatabaseError: (mysql.connector.errors.DatabaseError) 2003 (HY000): Can't connect to MySQL server on 'localhost:3306' (10061)
(Background on this error at: https://sqlalche.me/e/20/4xp6)

- We can also send arbitrary SQL queries and read only their results into a Pandas DataFrame

In [None]:
pd.read_sql('select name, album_name, danceability from tracks order by danceability desc limit 5', con=engine)

Unnamed: 0,name,album_name,danceability
0,Pure Cocaine,Street Gossip,0.964
1,Yes Indeed,Harder Than Ever,0.963
2,Low Down,My Turn (Deluxe),0.962
3,CAIRO,MAÑANA SERÁ BONITO,0.957
4,Players,Players,0.954


In [None]:
pd.read_sql('select * from tracks where danceability > 0.8', con=engine)

Unnamed: 0,id,name,album_name,artist_ids,danceability,energy,speechiness,acousticness,valence,tempo,duration_ms
0,2PGA1AsJal6cyMNmKyE56q,200 km/h,Platte,1qQLhymHXFPtP5U8KNKsm6,0.899,0.670,0.1630,0.2690,0.413,148.065,163147
1,6GomT970rCOkKAyyrwJeZi,Move Your Body,Move Your Body,"37czgDRfGMvgRiUKHvnnhj,0aOIluXr131XqrXFwFCFGT",0.848,0.821,0.0527,0.0169,0.249,125.051,157445
2,6hw1Sy9wZ8UCxYGdpKrU6M,Roller,Platte,1qQLhymHXFPtP5U8KNKsm6,0.941,0.758,0.1700,0.0256,0.683,128.017,157093
3,1F205Nl2feOSYSztLNOJAL,3 Am,Sauce Boyz,"5XJDexmWFLWOkjOEjOVX3e,00XhexlJEXQstHimpZN910",0.852,0.613,0.1080,0.4130,0.576,89.951,208237
4,5ddFjrPG8NgQQ6xlOQIVd2,Tú Me Dejaste De Querer,El Madrileño,"5TYxZTjIPqKM8K8NuP9woO,5IbUz6BcOu6IVY512oxavP,...",0.823,0.723,0.2290,0.3110,0.505,83.970,198493
...,...,...,...,...,...,...,...,...,...,...,...
180,24vDSi6wZW34oY8sTrgQf7,BRÛLURES INDIENNES,DIAMANT DU BLED,54kCbQZaZWHnwwj9VP2hn4,0.848,0.404,0.2430,0.5600,0.416,103.950,177973
181,7jbu9k6w67hWlhSinmGT3c,COEUR DE ICE (feat. Damso),DIAMANT DU BLED,"54kCbQZaZWHnwwj9VP2hn4,2UwqpfQtNuhBwviIC0f2ie",0.851,0.567,0.2310,0.2150,0.446,119.969,192213
182,4bGWT7GWjqaO1Sj9ZbUtEG,FROSTIES,DIAMANT DU BLED,54kCbQZaZWHnwwj9VP2hn4,0.883,0.785,0.2590,0.2770,0.294,104.029,110267
183,5R9ZGpFYnzW3WmVGk2uh4J,L'ARMOIRE,DIAMANT DU BLED,54kCbQZaZWHnwwj9VP2hn4,0.878,0.576,0.2080,0.5930,0.243,111.067,165573
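When a filter value comes from a variable, it is usually safer to pass it as a bound parameter than to paste it into the SQL string. A minimal sketch of this, again with in-memory SQLite as a stand-in and a tiny made-up `tracks` table:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite database with made-up data (assumptions for this sketch)
engine = create_engine("sqlite://")
pd.DataFrame(
    {"name": ["slow", "mid", "fast"], "danceability": [0.3, 0.8, 0.95]}
).to_sql("tracks", con=engine, index=False)

# :threshold is a named placeholder that is filled in by the database driver
query = text("SELECT name, danceability FROM tracks WHERE danceability > :threshold")
result = pd.read_sql(query, con=engine, params={"threshold": 0.8})
print(result)
```

The same pattern should carry over to the MySQL engine used in this notebook, because `text()` placeholders are translated by SQLAlchemy for the respective driver.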


In [None]:
pd.read_sql('select count(*), avg(danceability) from tracks', con=engine)

Unnamed: 0,count(*),avg(danceability)
0,792,0.695527


In [None]:
tracks = pd.read_sql('tracks', con=engine)
print(tracks.shape[0])
print(tracks.danceability.mean())


792
0.6955265151515151


## 1.5 Delete database and clean up

- Now we want to undo the previous steps, and clean up
- Caution: this is irreversible!
- First, we drop the newly created table "tracks" within our "music" database, then the database itself

In [None]:
# If you want to delete the table and database, then remove the quotes from this code cell
"""
with engine.connect() as connection:
    connection.execute(text("Drop table if exists tracks"))
    connection.execute(text("Drop database if exists music"))
"""

- To make sure that all open connections to the database are closed, we should explicitly close them at the end of the script

In [None]:
# If you want to close all open connections, then uncomment the next line

# engine.dispose()

# Write all music-related tables to the database

In [None]:
charts = pd.read_csv('data/charts.csv')
artists = pd.read_csv('data/artists.csv')
tracks = pd.read_csv('data/tracks.csv')
lyrics = pd.read_csv('data/lyrics.csv')

In [None]:
charts.to_sql(name='charts', con=engine, if_exists='replace', index=False)
artists.to_sql(name='artists', con=engine, if_exists='replace', index=False)
tracks.to_sql(name='tracks', con=engine, if_exists='replace', index=False)
lyrics.to_sql(name='lyrics', con=engine, if_exists='replace', index=False)

598
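The four `to_sql` calls follow the same pattern, so they could also be written as a loop over a dictionary of DataFrames. The sketch below illustrates this with two tiny made-up frames and an in-memory SQLite database as a stand-in; `inspect(engine).get_table_names()` is used to check which tables exist afterwards.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

# In-memory SQLite database used as a stand-in for MySQL (assumption for this sketch)
engine = create_engine("sqlite://")

# Made-up miniature versions of the real tables
frames = {
    "charts": pd.DataFrame({"track_id": ["t1"], "position": [1]}),
    "tracks": pd.DataFrame({"id": ["t1"], "name": ["Song"]}),
}

# Write every DataFrame to a table of the same name
for table_name, frame in frames.items():
    frame.to_sql(name=table_name, con=engine, if_exists="replace", index=False)

# List the tables that now exist in the database
table_names = sorted(inspect(engine).get_table_names())
print(table_names)
```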

We can send a more complex SQL statement (e.g. one that joins tables) and read only the result of that operation into a Pandas DataFrame.

In [None]:
query = """select * from charts c left join tracks t on c.track_id = t.id order by t.danceability desc limit 10"""
pd.read_sql(query, con=engine)

Unnamed: 0,country,date,position,track_id,streams,id,name,album_name,artist_ids,danceability,energy,speechiness,acousticness,valence,tempo,duration_ms
0,us,2023-03-28,109,577YBGuskWkVDCxZrLRB4v,371849,577YBGuskWkVDCxZrLRB4v,Pure Cocaine,Street Gossip,5f7VJjfbwm532GiveGC0ZK,0.964,0.487,0.421,0.00127,0.107,127.05,154024
1,us,2023-03-28,130,6vN77lE9LK6HP2DewaN6HZ,348685,6vN77lE9LK6HP2DewaN6HZ,Yes Indeed,Harder Than Ever,"5f7VJjfbwm532GiveGC0ZK,3TVXtAsR1Inumwj472S9r4",0.963,0.346,0.53,0.0355,0.562,119.957,142273
2,us,2023-03-28,48,5m0yZ33oOy0yYBtdTXuxQe,536204,5m0yZ33oOy0yYBtdTXuxQe,Low Down,My Turn (Deluxe),5f7VJjfbwm532GiveGC0ZK,0.962,0.619,0.405,0.0354,0.154,127.958,144652
3,es,2023-03-28,33,16dUQ4quIHDe4ZZ0wF1EMN,210791,16dUQ4quIHDe4ZZ0wF1EMN,CAIRO,MAÑANA SERÁ BONITO,"790FomKkXshlbRYZFtlgla,3m5qlPf2OkihLz3dRYnkPA",0.957,0.677,0.292,0.483,0.469,115.0,198667
4,de,2023-03-28,56,6UN73IYd0hZxLi8wFPMQij,110455,6UN73IYd0hZxLi8wFPMQij,Players,Players,6AMd49uBDJfhf30Ak2QR5s,0.954,0.516,0.16,0.03,0.624,105.001,139560
5,gb,2023-03-28,68,531KGXtBroSrOX9LVmiIgc,89990,531KGXtBroSrOX9LVmiIgc,Starlight,Starlight,6Ip8FS7vWT1uKkJSweANQK,0.954,0.367,0.288,0.341,0.372,124.026,211935
6,us,2023-03-28,33,6UN73IYd0hZxLi8wFPMQij,599242,6UN73IYd0hZxLi8wFPMQij,Players,Players,6AMd49uBDJfhf30Ak2QR5s,0.954,0.516,0.16,0.03,0.624,105.001,139560
7,gb,2023-03-28,13,6UN73IYd0hZxLi8wFPMQij,174014,6UN73IYd0hZxLi8wFPMQij,Players,Players,6AMd49uBDJfhf30Ak2QR5s,0.954,0.516,0.16,0.03,0.624,105.001,139560
8,gb,2023-03-28,86,3yfqSUWxFvZELEM4PmlwIR,79954,3yfqSUWxFvZELEM4PmlwIR,The Real Slim Shady,The Marshall Mathers LP,7dGJo4pcD2V6oG8kP0tJRR,0.949,0.661,0.0572,0.0302,0.76,104.504,284200
9,de,2023-03-28,64,28yd3NLXZkDi5p9segSvcf,103839,28yd3NLXZkDi5p9segSvcf,Vergessen wie,Vergessen wie,6rqlONGmPuP2wJVSfliLBI,0.949,0.679,0.0682,0.0467,0.126,121.025,224445
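Because of `select *`, the result above contains the track ID twice (`track_id` and `id`). Selecting columns explicitly avoids this. The sketch below also shows what a left join does with chart entries that have no matching track: they are kept, with missing values in the track columns. It uses made-up data and in-memory SQLite as a stand-in.

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite database with made-up data (assumptions for this sketch)
engine = create_engine("sqlite://")
pd.DataFrame(
    {"track_id": ["t1", "t2", "t3"], "position": [1, 2, 3]}
).to_sql("charts", con=engine, index=False)
pd.DataFrame(
    {"id": ["t1", "t2"], "name": ["Song A", "Song B"], "danceability": [0.9, 0.7]}
).to_sql("tracks", con=engine, index=False)

# Explicit column list instead of select *; t3 has no match in tracks
query = """
    select c.position, c.track_id, t.name, t.danceability
    from charts c
    left join tracks t on c.track_id = t.id
    order by c.position
"""
joined = pd.read_sql(query, con=engine)
print(joined)
```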
