## Using Python to Query MySQL
This notebook demonstrates three database connectivity libraries for connecting to and querying a MySQL database:
- **PyMySQL** library
- MySQL's native **mysql.connector** library
- **SQLAlchemy** library

### 1.0. Prerequisites

#### 1.1. First, **install** the libraries into your *Python* environment by executing the following commands in a *Terminal window*:
- python -m pip install PyMySQL
- python -m pip install mysql-connector-python
- python -m pip install sqlalchemy

#### 1.2. Next, as with all Jupyter Notebooks, **import** the libraries that you'll be working with in the notebook

In [1]:
import os
import pymysql
import mysql.connector
from sqlalchemy import create_engine

import pandas as pd
import matplotlib.pyplot as plt

#### 1.3. Then, assign connection variables for the MySQL server and database you'll be working with

In [2]:
host_name = "localhost" #"compid-mysql.mysql.database.azure.com"
host_ip = "127.0.0.1"
port = "3306"

user_id = "jtupitza"
pwd = "Passw0rd123"
db_name = "northwind"
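
Hard-coding a password in a notebook is convenient for a demo but easy to leak when the notebook is shared. As a minimal sketch, the same settings could be read from environment variables instead. The variable names (`MYSQL_HOST`, `MYSQL_PORT`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DATABASE`) and the helper `load_db_settings` are hypothetical; nothing in this notebook or in MySQL defines them.

```python
import os

def load_db_settings(environ=os.environ):
    """Collect connection settings from environment variables.

    The variable names are illustrative assumptions, not a standard.
    """
    return {
        "host": environ.get("MYSQL_HOST", "localhost"),
        # The port is converted to an integer, which is what PyMySQL expects.
        "port": int(environ.get("MYSQL_PORT", "3306")),
        "user": environ.get("MYSQL_USER", ""),
        "password": environ.get("MYSQL_PASSWORD", ""),
        "database": environ.get("MYSQL_DATABASE", "northwind"),
    }

# A plain dict stands in for the real environment in this demonstration.
settings = load_db_settings({"MYSQL_USER": "demo", "MYSQL_PASSWORD": "secret"})
print(settings["host"], settings["port"], settings["database"])
```

The resulting dictionary could then be unpacked into a connect call, for example `pymysql.connect(**settings)`, since its keys match that function's keyword arguments.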

### 2.0. Using the PyMySQL Library
#### 2.1. Using a Cursor to Iterate the Rows Returned

In [None]:
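conn = pymysql.connect(host=host_name, user=user_id, password=pwd, database=db_name)
cursor = conn.cursor()

try:
    cursor.execute('SELECT * FROM products;')

    # fetchmany() returns at most `size` rows from the result set
    for row in cursor.fetchmany(size=5):
        print(row)

except pymysql.MySQLError as err:
    print(f"Error: unable to fetch data ({err})")

finally:
    # Release the cursor and the connection even if the query fails
    cursor.close()
    conn.close()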
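
The cell above reads only the first five rows with `fetchmany()`. When the whole result set is needed but should not be loaded at once, `fetchmany()` can be called repeatedly until it returns an empty list. The sketch below shows that loop; the helper name `iter_batches` is hypothetical, and the standard-library `sqlite3` module stands in for MySQL so the example runs without a server (both drivers follow the same DB-API cursor interface).

```python
import sqlite3

def iter_batches(cursor, size=5):
    """Yield lists of at most `size` rows until the result set is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield batch

# sqlite3 is a stand-in for MySQL here; the table content is made up.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (id INTEGER, product_name TEXT)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"Product {i}") for i in range(1, 13)],
)

demo_cursor = demo_conn.cursor()
demo_cursor.execute("SELECT id, product_name FROM products ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(demo_cursor, size=5)]
print(batch_sizes)  # twelve rows arrive as batches of 5, 5 and 2
demo_conn.close()
```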

In [None]:
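conn = pymysql.connect(host=host_name, user=user_id, password=pwd, database=db_name)
# A DictCursor returns each row as a dictionary keyed by column name
cursor = conn.cursor(pymysql.cursors.DictCursor)

try:
    cursor.execute('SELECT * FROM products ORDER BY list_price DESC LIMIT 5;')

    for row in cursor.fetchall():
        print(row)

except pymysql.MySQLError as err:
    print(f"Error: unable to fetch data ({err})")

finally:
    # Release the cursor and the connection even if the query fails
    cursor.close()
    conn.close()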
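
PyMySQL's `DictCursor` returns each row as a dictionary. With a driver or cursor that only returns tuples, a similar result can be built from `cursor.description`, which the DB-API defines as a sequence whose entries start with the column name. This is a sketch under that assumption; `rows_as_dicts` is a hypothetical helper and `sqlite3` stands in for MySQL so the code runs without a server.

```python
import sqlite3

def rows_as_dicts(cursor):
    """Pair each fetched row with the column names in cursor.description."""
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# sqlite3 is a stand-in for MySQL here; the rows are made up.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (id INTEGER, list_price REAL)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, 18.0), (2, 81.0), (3, 10.0)],
)

demo_cursor = demo_conn.cursor()
demo_cursor.execute(
    "SELECT id, list_price FROM products ORDER BY list_price DESC LIMIT 2"
)
top_rows = rows_as_dicts(demo_cursor)
print(top_rows)
demo_conn.close()
```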

#### 2.2. Using the Pandas read_sql() Method to Return a DataFrame

In [None]:
conn = pymysql.connect(host=host_name, user=user_id, password=pwd, database=db_name)

df = pd.read_sql("SELECT * FROM products ORDER BY list_price DESC;", conn)

conn.close()
df.head()

### 3.0. Using the MySQL Native Connection Library
#### 3.1. Using a Cursor to Iterate the Rows Returned

In [None]:
lbound = 15.00
ubound = 20.00

sql_query = """
    SELECT id AS product_id
        , product_name
        , list_price
    FROM northwind.products
    WHERE list_price BETWEEN %s AND %s
    ORDER BY list_price DESC;
"""

In [None]:
conn = mysql.connector.connect(user=user_id, password=pwd,
                               host=host_name, database=db_name,
                               auth_plugin='mysql_native_password')

In [None]:
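cursor = conn.cursor()

try:
    # The %s placeholders are filled in by the driver from the tuple
    cursor.execute(sql_query, (lbound, ubound))

    # Iterating the cursor yields one tuple per row; unpack it by column
    for (product_id, product_name, list_price) in cursor:
        print("{}, {} was sold for {}".format(product_id, product_name, list_price))

except mysql.connector.Error as err:
    print(f"Error: unable to fetch data ({err})")

finally:
    cursor.close()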

In [None]:
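cursor = conn.cursor()

try:
    cursor.execute(sql_query, (lbound, ubound))

    # fetchone() returns None once the result set is exhausted
    row = cursor.fetchone()
    while row:
        print(row)
        row = cursor.fetchone()

except mysql.connector.Error as err:
    print(f"Error: unable to fetch data ({err})")

finally:
    cursor.close()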

In [None]:
conn.close()

#### 3.2. Using Pandas read_sql() Method to Return a DataFrame

In [4]:
lbound = 15.00
ubound = 20.00

sql_query = """
    SELECT id AS product_id
        , product_name
        , list_price
    FROM northwind.products
    WHERE list_price BETWEEN {0} AND {1}
    ORDER BY list_price DESC;
""".format(lbound, ubound)

print(sql_query)


    SELECT id AS product_id
        , product_name
        , list_price
    FROM northwind.products
    WHERE list_price BETWEEN 15.0 AND 20.0
    ORDER BY list_price DESC;
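
The query above splices the bounds into the SQL text with `str.format()`. That is harmless for two hard-coded floats, but it becomes an injection risk as soon as the values come from user input. Section 3.1 already shows the safer pattern: placeholders in the query and the values passed separately. The sketch below repeats that pattern with the standard-library `sqlite3` module as a stand-in for MySQL, so the placeholder is `?` rather than `%s`; the helper `products_in_range` and the sample rows are made up for illustration.

```python
import sqlite3

# sqlite3 is a stand-in for MySQL here; the rows are made up.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (product_name TEXT, list_price REAL)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Chai", 18.0), ("Syrup", 10.0), ("Ravioli", 19.5)],
)

def products_in_range(connection, low, high):
    """Bind the bounds as parameters instead of formatting them into the SQL."""
    query = (
        "SELECT product_name, list_price FROM products "
        "WHERE list_price BETWEEN ? AND ? ORDER BY list_price DESC"
    )
    return connection.execute(query, (low, high)).fetchall()

in_range = products_in_range(demo_conn, 15.00, 20.00)
print(in_range)
demo_conn.close()
```

`pandas.read_sql()` also accepts a `params` argument, so the same separation of query text and values should be possible when loading a DataFrame; the placeholder style depends on the driver in use.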



In [5]:
configs = {
    'user': user_id,
    'password': pwd,
    'host': host_name,
    'database': db_name,
    'auth_plugin': 'mysql_native_password',
    'raise_on_warnings': True
}

conn = mysql.connector.connect(**configs)

df = pd.read_sql(sql_query, conn)

conn.close()
df.tail()



Unnamed: 0,product_id,product_name,list_price
0,57,Northwind Traders Ravioli,19.5
1,40,Northwind Traders Crab Meat,18.4
2,1,Northwind Traders Chai,18.0
3,66,Northwind Traders Tomato Sauce,17.0
4,86,Northwind Traders Cake Mix,15.99


### 4.0. Using the SQLAlchemy Connection Library

In [None]:
conn_str = f"mysql+pymysql://{user_id}:{pwd}@{host_name}/{db_name}"

sqlEngine = create_engine(conn_str, pool_recycle=3600)
conn = sqlEngine.connect()

df = pd.read_sql(sql_query, conn)

conn.close()
df.head()
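
The connection string above is built with a plain f-string, which works as long as the user name and password contain no characters that have a special meaning in a URL, such as `@`, `/` or `:`. A minimal sketch of a safer builder follows; it percent-encodes both values with `urllib.parse.quote_plus`. The function name `build_mysql_url` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus

def build_mysql_url(user, password, host, database, port=3306):
    """Build a mysql+pymysql URL with the user and password percent-encoded."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

# The password is a made-up example containing URL-reserved characters.
demo_url = build_mysql_url("jtupitza", "p@ss/w0rd", "localhost", "northwind")
print(demo_url)
```

The resulting string can be passed to `create_engine()` in place of `conn_str`.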

### 5.0. Define Helper Functions to Encapsulate and Abstract the Implementation Details

In [None]:
sql_query = """
    SELECT id AS product_id
        , product_name
        , list_price
    FROM northwind.products
    ORDER BY list_price DESC;
"""

#### 5.1. Using Individual Connection Parameters

In [None]:
def get_pymysql_dataframe(host, user, password, database_name, sql_query_string):
    connection = pymysql.connect(host=host, user=user, password=password, database=database_name)
    dframe = pd.read_sql(sql_query_string, connection)
    connection.close()
    
    return dframe
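
In the helper above, `connection.close()` is skipped if the query raises an exception. One way to guarantee the cleanup is `contextlib.closing`, which calls `close()` when the `with` block exits for any reason. The sketch below assumes a hypothetical helper `fetch_all` that takes a zero-argument connection factory; `sqlite3` stands in for `pymysql.connect` so the example runs without a server.

```python
import sqlite3
from contextlib import closing

def fetch_all(connect, query, params=()):
    """Open a connection, run one query, and always close what was opened.

    `connect` is any zero-argument callable returning a DB-API connection.
    """
    with closing(connect()) as connection:
        with closing(connection.cursor()) as cursor:
            cursor.execute(query, params)
            return cursor.fetchall()

# Keep a reference to every connection so its state can be inspected later.
opened = []

def demo_connect():
    connection = sqlite3.connect(":memory:")
    opened.append(connection)
    return connection

print(fetch_all(demo_connect, "SELECT 1 + 1"))

try:
    fetch_all(demo_connect, "SELECT * FROM missing_table")
except sqlite3.OperationalError as err:
    print(f"Query failed, but the connection was closed: {err}")
```

With PyMySQL, the factory could be written as `lambda: pymysql.connect(host=host_name, user=user_id, password=pwd, database=db_name)`.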

In [None]:
df = get_pymysql_dataframe(host_name, user_id, pwd, db_name, sql_query)
df.head()

In [None]:
print("Shape: {}\n".format(df.shape))

#### 5.1.1. Using SQLAlchemy

In [None]:
def get_sqlalchemy_dataframe(user_id, pwd, host_name, db_name, sql_query):
    conn_str = f"mysql+pymysql://{user_id}:{pwd}@{host_name}/{db_name}"
    sqlEngine = create_engine(conn_str, pool_recycle=3600)
    connection = sqlEngine.connect()
    dframe = pd.read_sql(sql_query, connection)
    connection.close()
    
    return dframe

In [None]:
df = get_sqlalchemy_dataframe(user_id, pwd, host_name, db_name, sql_query)
df.head(3)

In [None]:
print(f"Shape: {df.shape[0]} Observations x {df.shape[1]} Features")

#### 5.2. Using a Configurations Dictionary

In [6]:
def get_mysql_dataframe(sql_query_string, args):
    connection = mysql.connector.connect(**args)
    dframe = pd.read_sql(sql_query_string, connection)
    connection.close()
    
    return dframe

In [7]:
dframe = get_mysql_dataframe(sql_query, configs)
dframe.tail(3)



Unnamed: 0,product_id,product_name,list_price
2,1,Northwind Traders Chai,18.0
3,66,Northwind Traders Tomato Sauce,17.0
4,86,Northwind Traders Cake Mix,15.99


In [None]:
print(f"Shape: {dframe.shape[0]} Observations x {dframe.shape[1]} Features")

### 6.0. Writing a Pandas DataFrame to a SQL Database

In [None]:
def insert_sqlalchemy_dataframe(user_id, pwd, host_name, db_name, df, table_name):
    conn_str = f"mysql+pymysql://{user_id}:{pwd}@{host_name}/{db_name}"
    sqlEngine = create_engine(conn_str, pool_recycle=3600)
    connection = sqlEngine.connect()
    # if_exists='replace' drops and recreates the table; use 'append' to add rows to an existing table
    df.to_sql(table_name, con=connection, if_exists='replace')
    connection.close()

In [None]:
insert_sqlalchemy_dataframe(user_id, pwd, host_name, db_name, dframe, 'dim_products')

In [None]:
df = get_sqlalchemy_dataframe(user_id, pwd, host_name, db_name, 'SELECT * FROM dim_products')
df.head()

### 7.0. Explore Pandas DataFrames' Capabilities
#### 7.1. Display the Data Type of Each Feature

In [8]:
sql_query = "SELECT * FROM northwind.products;"

df = get_mysql_dataframe(sql_query, configs)



In [9]:
df.dtypes

supplier_ids                 object
id                            int64
product_code                 object
product_name                 object
description                  object
standard_cost               float64
list_price                  float64
reorder_level                 int64
target_level                  int64
quantity_per_unit            object
discontinued                  int64
minimum_reorder_quantity    float64
category                     object
attachments                  object
dtype: object

#### 7.1.1. Inspect the Cardinality (Number of Unique Values) of Each Feature

In [10]:
df.nunique()

supplier_ids                12
id                          45
product_code                43
product_name                45
description                  0
standard_cost               29
list_price                  37
reorder_level                8
target_level                10
quantity_per_unit           32
discontinued                 2
minimum_reorder_quantity     6
category                    16
attachments                  1
dtype: int64

In [11]:
unique_values = []

for column in df.columns:
    unique_values.append(df[column].unique())
    
data = list(zip(df.columns, unique_values))    
    
pd.DataFrame(data, columns=['Feature', 'Unique Values'])

Unnamed: 0,Feature,Unique Values
0,supplier_ids,"[4, 10, 2;6, 2, 8, 6, 1, 7, 3;4, 5, 3, 9]"
1,id,"[1, 3, 4, 5, 6, 7, 8, 14, 17, 19, 20, 21, 34, ..."
2,product_code,"[NWTB-1, NWTCO-3, NWTCO-4, NWTO-5, NWTJP-6, NW..."
3,product_name,"[Northwind Traders Chai, Northwind Traders Syr..."
4,description,[None]
5,standard_cost,"[13.5, 7.5, 16.5, 16.0125, 18.75, 22.5, 30.0, ..."
6,list_price,"[18.0, 10.0, 22.0, 21.35, 25.0, 30.0, 40.0, 23..."
7,reorder_level,"[10, 25, 5, 15, 30, 20, 50, 100]"
8,target_level,"[40, 100, 20, 60, 120, 80, 75, 125, 200, 50]"
9,quantity_per_unit,"[10 boxes x 20 bags, 12 - 550 ml bottles, 48 -..."


#### 7.2. Display any Missing (NULL) values

In [12]:
df.isnull().sum().sort_values(ascending=True)

supplier_ids                 0
id                           0
product_code                 0
product_name                 0
standard_cost                0
list_price                   0
reorder_level                0
target_level                 0
discontinued                 0
category                     0
attachments                  0
quantity_per_unit            5
minimum_reorder_quantity    15
description                 45
dtype: int64

#### 7.3. Separate Numerical and Categorical Features

In [13]:
numerical_cols = [col for col in df.columns if df.dtypes[col] != 'O']
categorical_cols = [col for col in df.columns if col not in numerical_cols]

print(numerical_cols)
print(categorical_cols)

['id', 'standard_cost', 'list_price', 'reorder_level', 'target_level', 'discontinued', 'minimum_reorder_quantity']
['supplier_ids', 'product_code', 'product_name', 'description', 'quantity_per_unit', 'category', 'attachments']


#### 7.4. Evaluate the Statistical Distribution of the Numerical Features

In [14]:
df[numerical_cols].describe()

Unnamed: 0,id,standard_cost,list_price,reorder_level,target_level,discontinued,minimum_reorder_quantity
count,45.0,45.0,45.0,45.0,45.0,45.0,30.0
mean,57.933333,11.6825,15.845778,22.444444,69.555556,0.022222,15.0
std,33.750017,12.689461,16.743022,23.442924,50.506775,0.149071,8.304548
min,1.0,0.5,1.2,5.0,20.0,0.0,5.0
25%,21.0,2.0,2.99,10.0,40.0,0.0,10.0
50%,66.0,7.5,10.0,10.0,40.0,0.0,10.0
75%,88.0,16.0125,21.35,25.0,100.0,0.0,25.0
max,99.0,60.75,81.0,100.0,200.0,1.0,30.0


#### 7.5. Write the Contents of the DataFrame to a Comma-Separated Values (CSV) File

In [17]:
data_dir = os.path.join(os.getcwd(), 'data')
os.makedirs(data_dir, exist_ok=True)  # create the folder if it does not exist yet
dest_file = os.path.join(data_dir, 'northwind_products.csv')

df.to_csv(dest_file)