## Query a Database
### Saving and Loading Data
*Curtis Miller*

Here we extract data from a MySQL database and store it in Python as a pandas `DataFrame`.

First, the boilerplate: the imports, a helper that builds the connection string, and the connection itself.

In [None]:
import pymysql
from sqlalchemy import create_engine
import pandas as pd
from pandas import DataFrame

def pymysql_sqlalchemy_stringgen(user, passwd, host, dbname):
    """Generate a connection string for use with SQLAlchemy for MySQL and PyMySQL connections
    
    Args:
        user (str): The username of the connecting user
        passwd (str): The user's password
        host (str): The host where the database is located
        dbname (str): The name of the database to connect with
    
    Returns:
        (str) A SQLAlchemy connection string suitable for use with create_engine()
    
    Additional options for the connection are not supported with this function.
    """
    
    return "mysql+pymysql://" + user + ":" + passwd + "@" + host + "/" + dbname

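from getpass import getpass

pswd = getpass("MySQL password: ")    # Prompt for the password instead of hard-coding it
conn = create_engine(pymysql_sqlalchemy_stringgen("root", pswd, "localhost", "poppyramids")).connect()    # Connect to database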
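
The helper above concatenates the password directly into the URL, so a password containing characters such as `@` or `/` would produce a malformed connection string. A minimal sketch of one way around this, assuming the credentials are percent-encoded with the standard library's `quote_plus()`; the function name `escaped_mysql_url` and the sample password are hypothetical and not part of the original notebook:

```python
from urllib.parse import quote_plus

def escaped_mysql_url(user, passwd, host, dbname):
    """Hypothetical variant of the helper above that percent-encodes the credentials."""
    return "mysql+pymysql://{}:{}@{}/{}".format(
        quote_plus(user),      # Encode reserved characters in the username
        quote_plus(passwd),    # Encode reserved characters such as "@" and "/" in the password
        host,
        dbname,
    )

# Made-up password, chosen only because it contains reserved characters
url = escaped_mysql_url("root", "p@ss/word", "localhost", "poppyramids")
print(url)
```

The resulting string can be passed to `create_engine()` in the same way as the output of the original helper.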

We know in advance that the data is in the `populations` table of the database. The pandas function `read_sql()` takes a SQL query and a connection, runs the query, and returns the results as a `DataFrame`.

In [None]:
american_pop_2017 = pd.read_sql('SELECT * FROM populations WHERE country = "UnitedStates" AND year = 2017;',    # The SQL query
                                con=conn,    # The connection object
                                index_col=["country", "year", "age"])
american_pop_2017
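
To try the same call without a MySQL server, the sketch below uses an in-memory SQLite database as a stand-in. The table layout mirrors the column names used in this notebook, but the rows are made up purely for illustration. It also passes the filter values through `params` rather than writing them into the query string. The `?` placeholder is SQLite's style; with the PyMySQL connection above the placeholder syntax would likely be `%s` instead, so treat that detail as an assumption to check against your driver.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in data: column names follow the notebook, the numbers are invented
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE populations "
             "(country TEXT, year INTEGER, age TEXT, both_sexes_population INTEGER)")
demo.executemany("INSERT INTO populations VALUES (?, ?, ?, ?)", [
    ("UnitedStates", 2017, "0", 10),
    ("UnitedStates", 2017, "Total", 100),
    ("UnitedStates", 2016, "Total", 90),
    ("Canada", 2017, "Total", 50),
])

result = pd.read_sql("SELECT * FROM populations WHERE country = ? AND year = ?;",
                     con=demo,
                     params=("UnitedStates", 2017),           # Values bound to the ? placeholders
                     index_col=["country", "year", "age"])    # Builds a MultiIndex from these columns
print(result)
demo.close()
```

Because the three key columns become the index, the resulting `DataFrame` has a single data column, `both_sexes_population`.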

In [None]:
# Other queries
pd.read_sql('SELECT * FROM populations;', con=conn, index_col=["country", "year", "age"])    # Read whole table

In [None]:
pd.read_sql('SELECT country, year, age, male_population, female_population FROM populations;',    # Narrow down columns
            con=conn, index_col=["country", "year", "age"])

In [None]:
pd.read_sql('SELECT country, year, both_sexes_population FROM populations WHERE age = "Total";',     # Only population totals
            con=conn, index_col=["country", "year"])

In [None]:
pd.read_sql('SELECT country, year, both_sexes_population FROM populations WHERE age = "Total" AND (year = 2013 OR year = 2014);',    # Totals for 2013 and 2014 only
            con=conn, index_col=["country", "year"])
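
The parentheses in the last query matter because `AND` binds more tightly than `OR` in SQL. The sketch below illustrates this on an in-memory SQLite table with invented rows (a stand-in, not the `poppyramids` data) and also shows that `year IN (2013, 2014)` is an equivalent way to write the parenthesized condition.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in data with made-up numbers
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE populations "
             "(country TEXT, year INTEGER, age TEXT, both_sexes_population INTEGER)")
demo.executemany("INSERT INTO populations VALUES (?, ?, ?, ?)", [
    ("Canada", 2012, "Total", 1),
    ("Canada", 2013, "Total", 2),
    ("Canada", 2014, "Total", 3),
    ("Canada", 2014, "0", 4),
])

base = "SELECT country, year, age, both_sexes_population FROM populations WHERE "
order = " ORDER BY year, age;"

# Parenthesized OR, as in the query above
with_or = pd.read_sql(base + "age = 'Total' AND (year = 2013 OR year = 2014)" + order, con=demo)
# Equivalent formulation with IN
with_in = pd.read_sql(base + "age = 'Total' AND year IN (2013, 2014)" + order, con=demo)
# Without parentheses this parses as (age = 'Total' AND year = 2013) OR year = 2014
no_parens = pd.read_sql(base + "age = 'Total' AND year = 2013 OR year = 2014" + order, con=demo)

print(with_or.equals(with_in))
print(len(with_or), len(no_parens))
demo.close()
```

In this toy table the unparenthesized version also returns the 2014 row for age `0`, since that row satisfies `year = 2014` on its own.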

In [None]:
pd.read_sql('SELECT country FROM populations;', con=conn)    # Compare this call...

In [None]:
pd.read_sql('SELECT DISTINCT country FROM populations;', con=conn)    # ...to this call.
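
The difference between the two calls is that the first returns one row per record in the table, while `DISTINCT` collapses repeated values. A small sketch on an in-memory SQLite stand-in (invented rows, not the real table) makes the row counts concrete and shows the pandas-side alternative `drop_duplicates()`:

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in data: only the country column matters here
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE populations (country TEXT, year INTEGER)")
demo.executemany("INSERT INTO populations VALUES (?, ?)", [
    ("UnitedStates", 2016),
    ("UnitedStates", 2017),
    ("Canada", 2016),
    ("Canada", 2017),
])

all_rows = pd.read_sql("SELECT country FROM populations;", con=demo)              # One row per record
unique_rows = pd.read_sql("SELECT DISTINCT country FROM populations;", con=demo)  # One row per country
deduplicated = all_rows.drop_duplicates()    # Same effect, but done in pandas after fetching every row

print(len(all_rows), len(unique_rows), len(deduplicated))
demo.close()
```

Letting the database apply `DISTINCT` means fewer rows are transferred than when duplicates are dropped in pandas afterwards.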

In [None]:
conn.close()