# Denison CS181/DA210 SW Lab #12 - Step 1

Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Import Python modules

In [1]:
```python
import pandas as pd
import os
import os.path
import json
import sqlalchemy as sa
```

#### Set credentials

In [2]:
```python
def getsqlite_creds(dirname=".", filename="creds.json", source="sqlite"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the three parts needed for a connection string to
        a local provider, read from the `source` dictionary (e.g. "sqlite")
        within an outer dictionary.

        Return a scheme, a database directory, and a database name.
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D[source]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]
```

In [3]:
```python
def buildConnectionString(source="sqlite_book"):
    scheme, dbdir, database = getsqlite_creds(source=source)
    # Assemble "<scheme>:///<dbdir>/<database>.db"
    template = '{}:///{}/{}.db'
    return template.format(scheme, dbdir, database)

cstring = buildConnectionString("sqlite_book")
print("Connection string:", cstring)
```

```
Connection string: sqlite:///../../dbfiles/book.db
```
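
`getsqlite_creds` expects a JSON file whose top-level keys name data sources, each holding `scheme`, `dbdir`, and `database` entries. The lab's actual `creds.json` is not shown, so the layout below is an assumption reconstructed from the keys the function reads and from the connection string printed above. The sketch writes that hypothetical file to a temporary directory, reads it back, and rebuilds the string with the same template.

```python
import json
import os
import tempfile

# Hypothetical creds.json layout, inferred from the keys getsqlite_creds reads
# and from the connection string printed above
creds = {
    "sqlite_book": {"scheme": "sqlite", "dbdir": "../../dbfiles", "database": "book"}
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "creds.json")
    with open(path, "w") as f:
        json.dump(creds, f, indent=2)

    # Read it back the same way getsqlite_creds does: outer dict, then the source key
    with open(path) as f:
        entry = json.load(f)["sqlite_book"]

cstring = "{}:///{}/{}.db".format(entry["scheme"], entry["dbdir"], entry["database"])
print(cstring)  # sqlite:///../../dbfiles/book.db
```

With three slashes, SQLAlchemy treats the path after `sqlite:///` as relative to the current working directory; an absolute path needs four (`sqlite:////...`).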


---

## Part A: Connecting to a SQL DB using `sqlalchemy`

For this lab, we'll use another powerful library, `sqlalchemy`, which enables us to programmatically access (query and modify) databases using several protocol schemes and database system choices.  (We'll still use local SQLite databases, though.)
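
The scheme at the front of a connection string is what selects the database system. A minimal sketch using SQLAlchemy's `make_url` parser shows how the lab's SQLite string and a hypothetical PostgreSQL string (the user, password, and host are invented for illustration) break into their parts. Parsing alone does not open a connection or require a PostgreSQL driver.

```python
from sqlalchemy.engine.url import make_url

# The lab's SQLite URL: a relative file path, no host or credentials
lite = make_url("sqlite:///../../dbfiles/book.db")
print(lite.drivername, lite.database)  # sqlite ../../dbfiles/book.db

# A hypothetical server-based URL; it is only parsed here, never connected to
pg = make_url("postgresql://analyst:secret@db.example.edu:5432/book")
print(pg.drivername, pg.username, pg.host, pg.port, pg.database)
# postgresql analyst db.example.edu 5432 book
```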

First, we need to open a _connection_.  Note that we must also close this connection when we're done; otherwise it stays open and keeps holding resources on the database!

In [4]:
```python
# Connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()
```

Now that we have a connection, we can execute queries.  This is similar to what we've done so far with "SQL Magic".

In [5]:
```python
# Build the query
query = """
SELECT * FROM indicators0
"""

# Execute the query (sa.text() marks the string as SQL; SQLAlchemy 2.0 requires it)
result_proxy = connection.execute(sa.text(query))

# Get the results from the "proxy"
result_list = result_proxy.fetchall()
result_list
```

```
[('CHN', 1386.4, 12143.5, 76.4, 1469.88),
 ('FRA', 66.87, 2586.29, 82.5, 69.02),
 ('GBR', 66.06, 2637.87, 81.2, 79.1),
 ('IND', 1338.66, 2652.55, 68.8, 1168.9),
 ('USA', 325.15, 19485.4, 78.5, 391.6)]
```

The result of this query is a list of tuples (strictly, tuple-like row objects).  Each tuple corresponds to one record in the result.
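
Each row supports ordinary tuple indexing, and SQLAlchemy rows also allow access by column name as an attribute. The sketch below assumes an in-memory SQLite database (`sqlite://`) loaded with two of the `indicators0` records shown above, so it runs without `book.db`.

```python
import sqlalchemy as sa

# In-memory stand-in for book.db, holding two of the indicators0 records above
engine = sa.create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE indicators0 (code TEXT, pop REAL, gdp REAL, life REAL, cell REAL)"
    ))
    conn.execute(
        sa.text("INSERT INTO indicators0 VALUES (:code, :pop, :gdp, :life, :cell)"),
        [
            {"code": "FRA", "pop": 66.87, "gdp": 2586.29, "life": 82.5, "cell": 69.02},
            {"code": "USA", "pop": 325.15, "gdp": 19485.4, "life": 78.5, "cell": 391.6},
        ],
    )
    rows = conn.execute(sa.text("SELECT * FROM indicators0 ORDER BY code")).fetchall()

first = rows[0]
print(first[0], first[3])      # positional access: FRA 82.5
print(first.code, first.life)  # access by column name: FRA 82.5
print(tuple(first))            # ('FRA', 66.87, 2586.29, 82.5, 69.02)

engine.dispose()
```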

We can easily convert this result to a `pandas` `DataFrame` by treating it as a list of lists (LoL), since tuples are effectively immutable lists.  The column names come from `result_proxy.keys()`.

In [6]:
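```python
# Build a DataFrame from the result, using the query's column names
fields = list(result_proxy.keys())
df1 = pd.DataFrame(result_list, columns=fields)
df1
```

|   | code | pop | gdp | life | cell |
|---|------|-----|-----|------|------|
| 0 | CHN | 1386.4 | 12143.5 | 76.4 | 1469.88 |
| 1 | FRA | 66.87 | 2586.29 | 82.5 | 69.02 |
| 2 | GBR | 66.06 | 2637.87 | 81.2 | 79.1 |
| 3 | IND | 1338.66 | 2652.55 | 68.8 | 1168.9 |
| 4 | USA | 325.15 | 19485.4 | 78.5 | 391.6 |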


Of course, don't forget to close the connection!

In [7]:
```python
# Close the connection, then release the engine's pooled resources
connection.close()
engine.dispose()
del engine
```
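
A `with` block is a convenient alternative to closing by hand: SQLAlchemy connections are context managers, so the connection is closed when the block exits, even if a query inside it raises an error. This sketch uses an in-memory database (`sqlite://`) so it is self-contained; in the lab, `cstring` would take its place.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # in-memory database, for illustration only

with engine.connect() as connection:
    value = connection.execute(sa.text("SELECT 1 + 1")).scalar()

print(value)              # 2
print(connection.closed)  # True: leaving the with-block closed the connection

engine.dispose()  # release any pooled connections held by the engine
```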

---

## Part B: Try it Yourself!

**Q1:** Write code to issue a SQL query for all rows and columns in the `countries` table of the `book` database.  Retrieve _all_ rows from the result, and use that list-of-tuples data structure (without converting to a `pandas` `DataFrame`) to determine the land area of Zimbabwe (code `ZWE`; the last record in `countries`).  Put this value in a variable `zwe_land`.

Make sure the connection is closed upon completion.

In [8]:
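```python
# Connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

# YOUR CODE HERE
# raise NotImplementedError()
# Query every row and column, as the question asks
query = """
SELECT * FROM countries
"""

# Execute the query and fetch all rows as a list of tuples
result_proxy = connection.execute(sa.text(query))
fields = list(result_proxy.keys())
result_list = result_proxy.fetchall()

# Locate the "code" and "land" columns by name instead of hard-coding positions
code_idx = fields.index("code")
land_idx = fields.index("land")

# Scan the list of tuples for Zimbabwe's record
zwe_land = [row[land_idx] for row in result_list if row[code_idx] == "ZWE"][0]

# Close the connection, then release the engine's pooled resources
connection.close()
engine.dispose()
del engine
```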

In [9]:
```python
# Testing cell
assert zwe_land == 386850
```

> You've reached the first checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 1: Would it be more efficient to update your SQL query to return just a single value?  Why or why not?  What if we wanted the largest land area?
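
As a sketch of the single-value approach (one option for the checkpoint discussion), the filter can move into SQL with a bound parameter (`:code`), and `Result.scalar()` returns the first column of the first row. The in-memory table below is an assumption that stands in for `book.db`; it holds only the two land areas quoted in this lab.

```python
import sqlalchemy as sa

# In-memory stand-in for book.db, with only the two land areas quoted in this lab
engine = sa.create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE countries (code TEXT, land REAL)"))
    conn.execute(
        sa.text("INSERT INTO countries VALUES (:code, :land)"),
        [{"code": "RUS", "land": 16376900.0}, {"code": "ZWE", "land": 386850.0}],
    )

    # The database does the filtering; a single value comes back to Python
    zwe_land = conn.execute(
        sa.text("SELECT land FROM countries WHERE code = :code"),
        {"code": "ZWE"},
    ).scalar()

print(zwe_land)  # 386850.0
engine.dispose()
```

Binding `:code` as a parameter, rather than pasting the value into the SQL string, also leaves the quoting to the driver.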

---

## Part C: Interface between `sqlalchemy` and `pandas`

The `pandas` library interfaces with `sqlalchemy`, providing two functions for even easier processing of the results of our SQL queries:
- `pandas.read_sql_table`: returns all records in a table (effectively `SELECT * FROM table`)
- `pandas.read_sql_query`: returns all records resulting from a SQL query
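
Both functions accept further options. A minimal sketch, again assuming an in-memory SQLite stand-in holding three of the `indicators0` records above: `index_col` makes a column the row index, and `params` supplies values for named placeholders in a `sa.text()` query.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory stand-in for book.db, holding three of the indicators0 records above
engine = sa.create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE indicators0 (code TEXT, pop REAL, gdp REAL, life REAL, cell REAL)"
    ))
    conn.execute(
        sa.text("INSERT INTO indicators0 VALUES (:code, :pop, :gdp, :life, :cell)"),
        [
            {"code": "FRA", "pop": 66.87, "gdp": 2586.29, "life": 82.5, "cell": 69.02},
            {"code": "GBR", "pop": 66.06, "gdp": 2637.87, "life": 81.2, "cell": 79.1},
            {"code": "USA", "pop": 325.15, "gdp": 19485.4, "life": 78.5, "cell": 391.6},
        ],
    )

    # Whole table, indexed by country code instead of 0, 1, 2, ...
    whole = pd.read_sql_table("indicators0", con=conn, index_col="code")

    # Query with a bound parameter: countries whose gdp exceeds a threshold
    big_gdp = pd.read_sql_query(
        sa.text("SELECT code, gdp FROM indicators0 WHERE gdp > :min_gdp ORDER BY code"),
        con=conn,
        params={"min_gdp": 2600},
    )

print(whole.loc["GBR", "life"])  # 81.2
print(list(big_gdp["code"]))     # ['GBR', 'USA']
engine.dispose()
```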

In [10]:
```python
# Re-connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()
```

In [11]:
```python
# Read an entire table into a pandas DataFrame
df2 = pd.read_sql_table("indicators0", con=connection)
df2
```

|   | code | pop | gdp | life | cell |
|---|------|-----|-----|------|------|
| 0 | CHN | 1386.4 | 12143.5 | 76.4 | 1469.88 |
| 1 | FRA | 66.87 | 2586.29 | 82.5 | 69.02 |
| 2 | GBR | 66.06 | 2637.87 | 81.2 | 79.1 |
| 3 | IND | 1338.66 | 2652.55 | 68.8 | 1168.9 |
| 4 | USA | 325.15 | 19485.4 | 78.5 | 391.6 |


In [12]:
```python
# Read the results of a SQL query into a pandas DataFrame
query = """
SELECT * FROM indicators0
"""

df3 = pd.read_sql_query(query, con=connection)
df3
```

|   | code | pop | gdp | life | cell |
|---|------|-----|-----|------|------|
| 0 | CHN | 1386.4 | 12143.5 | 76.4 | 1469.88 |
| 1 | FRA | 66.87 | 2586.29 | 82.5 | 69.02 |
| 2 | GBR | 66.06 | 2637.87 | 81.2 | 79.1 |
| 3 | IND | 1338.66 | 2652.55 | 68.8 | 1168.9 |
| 4 | USA | 325.15 | 19485.4 | 78.5 | 391.6 |


**Q2:** Write a SQL query to determine the country with the largest land area.  You should use `read_sql_query()`, your query result should be a single record (with fields `code`, `country`, and `land`), and you should store it in the `pandas DataFrame` `df_q2`.

In [13]:
```python
# YOUR CODE HERE
# raise NotImplementedError()
query = """
SELECT code, country, land
FROM countries
ORDER BY land DESC
LIMIT 1
"""

# Run the query and store the single-record result as a DataFrame
df_q2 = pd.read_sql_query(query, con=connection)
df_q2
```

|   | code | country | land |
|---|------|---------|------|
| 0 | RUS | Russian Federation | 16376900.0 |


In [14]:
```python
# Testing cell
assert df_q2.shape == (1,3)
assert df_q2.loc[0, "land"] > 16000000
```

In [15]:
```python
# Close the connection, then release the engine's pooled resources
connection.close()
engine.dispose()
del engine
```