Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE`/`raise NotImplementedError` or "YOUR ANSWER HERE", as well as your name and collaborators below:

## Set User Credentials

When a database such as a MySQL RDBMS is a shared resource hosted by a provider, we need credentials to authenticate ourselves to the server, and we need the logical location of the server itself.

For these notebooks, the credentials are kept in a text file named `creds.json`, stored either in the notebook's directory or in a data directory. For this notebook, the file is in the same directory as the notebook.

- Right click on the `creds.json` file and select *Open With*->*Editor*
- In the `"mysql"` dictionary, replace the value of the `"user"` key (currently `"nostudent"`) with the base part of your email address (i.e. without the `@denison.edu`). Your password on the MySQL server is, at present, the same as your user name, so change the value of the `"pass"` key from `"nostudent"` as well. The `"server"` value should already be correct, `"hadoop2.mathsci.denison.edu"`. Likewise, the `"scheme"` value should already be correct, `"mysql+mysqlconnector"`.

**Make sure to use double quotes for strings** ... this is `JSON`, not Python, and we have to follow JSON syntax.
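
As a rough sketch of the shape the connection cell expects, the snippet below parses a sample credentials document with Python's `json` module. The key names (`scheme`, `server`, `user`, `pass`, `database`) are the ones read by `getmysql_creds`; the values `"yourname"` and `"yourdatabase"` are placeholders chosen for illustration, not real settings. It also shows that single-quoted strings are rejected by a JSON parser.

```python
import json

# Hypothetical contents of creds.json; "yourname" and "yourdatabase"
# are placeholders. Note the double quotes required by JSON.
sample = """
{
  "mysql": {
    "scheme": "mysql+mysqlconnector",
    "server": "hadoop2.mathsci.denison.edu",
    "user": "yourname",
    "pass": "yourname",
    "database": "yourdatabase"
  }
}
"""

creds = json.loads(sample)
print(sorted(creds["mysql"].keys()))

# Single-quoted strings are valid Python but not valid JSON.
try:
    json.loads("{'user': 'yourname'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```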

Once this is complete, execute the following cell to connect to the database using SQLAlchemy. If you are off campus, you will need to connect through a VPN first.

In [None]:
import pandas as pd
import os
import os.path
import json
import sqlalchemy as sa

def getmysql_creds(dirname=".",filename="creds.json"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the five parts needed for a connection string to
        a remote provider using the "mysql" dictionary within
        an outer dictionary.  
        
        Return a scheme, server, user, password, and database
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    mysql = D["mysql"]
    return mysql["scheme"], mysql["server"], mysql["user"], mysql["pass"],mysql["database"]

scheme, server, u, password, database = getmysql_creds()
template = '{}://{}:{}@{}/{}'
cstring = template.format(scheme, u, password, server,database)

engine=sa.create_engine(cstring)

print(cstring) # you should be in your personal 
               # database space now, if you edited the JSON

**Q1** Write a function `create_connection_str(user,p,h,d)` that creates a connection string for connecting to the MySQL server at host `h` and database `d`, as user `user` with password `p`. Test your function with your own username and password.

In [None]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell
cstring = create_connection_str(u,u,'hadoop2.mathsci.denison.edu','book')
print(cstring)
engine = sa.create_engine(cstring)
conn = engine.connect()
# Wrap raw SQL in sa.text() so the cell also runs on SQLAlchemy 2.x
query = "SELECT * FROM indicators0"
result_proxy = conn.execute(sa.text(query))
fields = result_proxy.keys()
query2 = "SELECT * FROM topnames"
result_proxy2 = conn.execute(sa.text(query2))
fields2 = result_proxy2.keys()
conn.close()
assert len(fields) == 5
assert len(fields2) == 4

**Q2** Using the `school` database, list all courses (subject and number) that were not taught as classes during the year (hint: recall that outer joins can be used for such 'set difference' questions). Do this using your function `create_connection_str`, and use a `with` block to make sure the connection you create is closed when you are done. Save the result of your query as `result_proxy`.
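
To illustrate the outer-join idea in the hint, here is a minimal sketch on a toy in-memory SQLite database. The tables `item` and `sale` and their rows are invented for this example and are unrelated to the `school` schema; the sketch also uses Python's built-in `sqlite3` module rather than SQLAlchemy and MySQL. Rows of the left table with no match on the right come back with `NULL` in the right table's columns, so filtering on `IS NULL` keeps exactly the unmatched rows.

```python
import sqlite3
from contextlib import closing

# closing() ensures the connection is closed when the with block ends.
with closing(sqlite3.connect(":memory:")) as conn:
    # Hypothetical toy tables: items, and sales that reference items.
    conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, item_id INTEGER)")
    conn.executemany("INSERT INTO item VALUES (?, ?)",
                     [(1, "pen"), (2, "ink"), (3, "pad")])
    conn.executemany("INSERT INTO sale VALUES (?, ?)",
                     [(10, 1), (11, 1), (12, 3)])

    # Items with no matching sale: keep left rows whose right side is NULL.
    unsold = conn.execute(
        """
        SELECT i.name
        FROM item AS i
        LEFT JOIN sale AS s ON i.item_id = s.item_id
        WHERE s.sale_id IS NULL
        ORDER BY i.name
        """
    ).fetchall()

print(unsold)
```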

In [None]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
resultdf = pd.DataFrame(result_proxy.fetchall(),
                       columns = result_proxy.keys())

resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 94
assert list(resultdf.columns) == ['coursesubject','coursenum']

**Q3** Use your function `create_connection_str` to create a connection string, engine, and connection to the `nycflights13` database. Then use `read_sql_table()` to create a `pandas` dataframe `df` from the `planes` table of `nycflights13`.

In [None]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()
df.head()

In [None]:
# Testing cell
assert df.shape == (3322,9)
assert 'EMBRAER' in list(df['manufacturer'])

**Q4** Write a function `select_message(dbcon,u,f,i)` that uses a select query (that incorporates variables) to select the message from the `emails` table of the `enron` database, where the user is `u`, the folder is `f`, and the `emailid` is `i`. Return your answer as a data frame, and make use of `read_sql_query()` in `pandas`. Here `dbcon` is the connection, as you can see from the testing cell.
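
One way to incorporate variables into a query is to use bound parameters instead of pasting values into the SQL string. The sketch below shows the idea with Python's built-in `sqlite3` module on an invented `notes` table; the function `select_note`, the table, and its rows are hypothetical and are not the `enron` schema. `read_sql_query()` in `pandas` accepts a `params` argument that plays the same role, though the placeholder syntax depends on the database driver in use.

```python
import sqlite3
from contextlib import closing

def select_note(dbcon, owner, note_id):
    """Return rows of the hypothetical `notes` table for one owner and id."""
    # Named placeholders (:owner, :note_id) are filled in by the driver,
    # so the values are never spliced into the SQL text by hand.
    query = "SELECT body FROM notes WHERE owner = :owner AND note_id = :note_id"
    return dbcon.execute(query, {"owner": owner, "note_id": note_id}).fetchall()

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE notes (owner TEXT, note_id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?)",
                     [("ada", 1, "first"), ("ada", 2, "second"), ("bob", 2, "other")])
    rows = select_note(conn, "ada", 2)

print(rows)
```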

In [None]:
# Solution cell
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell
cstring = create_connection_str(u,u,'hadoop2.mathsci.denison.edu','enron')
engine = sa.create_engine(cstring)
with engine.connect() as connection:
    df = select_message(connection,'allen-p','inbox',39)

print(df['message'][0])

assert len(df) == 1
assert 'West Power Desk' in df['message'][0]
assert 'The method for distribution of the weekly reports has changed' in df['message'][0]