# Denison CS-181/DA-210 Homework

---

## Single Table SQL Exercises

In the following set of exercises, the focus is on the SQL and not on the Python programming.  You are encouraged to use an external client, such as `TablePlus`, to design, debug, and incrementally develop your SQL answers.  As in the IC on basic SQL statements, for each question you modify the cell so that the Python variable `query` holds the string value of your SQL.

**Neatness and Readability Count:** Any SQL that has multiple clauses should be broken over multiple lines.  Keywords should be capitalized.  Indentation can and should be used to make the SQL more readable.

In [1]:
import os
import sys
import lxml
import pandas as pd
from IPython.display import Markdown as md

def add_modules():
    """
    Starting at the current directory and proceeding up the file system
    tree, search for a directory named `modules`.  If found, and if not
    already there, add to the Python module search path.
    
    Params: None
    
    Return: None
    """
    directory = "."
    levels = 0
    while not os.path.isdir(os.path.join(directory, "modules")) and \
          levels < 5:
        directory = os.path.join(directory, "..")
        levels += 1
    module_path = os.path.abspath(os.path.join(directory, "modules"))
    if os.path.isdir(module_path):
        if module_path not in sys.path:
            sys.path.append(module_path)

add_modules()
import util

datadir = util.resolve_dir("dbfiles")

-----
**Q1:** In reference to the table `indicators`, write a query to find the unique country codes that appear.  Results should be in alphabetical order.

In [4]:
query = """
SELECT DISTINCT code
FROM indicators
ORDER BY code ASC
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 218


Unnamed: 0,code
0,ABW
1,AFG
2,AGO
3,ALB
4,AND


In [5]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT DISTINCT code
FROM indicators
ORDER BY code ASC

```

In [6]:
# Testing cell
assert True

-----
**Q2:** In reference to the table `indicators`, write a query to find the set of rows for which `gdp` is missing data.  Project the year and the code columns for such rows.  In a comment line, and thinking about the result, state what you might do to better understand the instances of missing gdp.

In [14]:
query = """
SELECT year, code
FROM indicators
WHERE gdp IS NULL
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 3202


Unnamed: 0,year,code
0,1960,ABW
1,1960,AGO
2,1960,ALB
3,1960,AND
4,1960,ARE


In [15]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT year, code
FROM indicators
WHERE gdp IS NULL

```

In [16]:
# Testing cell
assert True
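
Q2 also asks what might help explain the missing `gdp` values. One possibility, sketched here with Python's built-in `sqlite3` module, is to count the missing rows per year to see whether the gaps cluster in particular years. The in-memory table and its values are made up for illustration and are not the course's `book` database; `missing_gdp` is a hypothetical column alias.

```python
import sqlite3

# Toy stand-in for the indicators table (values are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (year INTEGER, code TEXT, gdp REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [
        (1960, "ABW", None),
        (1960, "AGO", None),
        (1960, "USA", 543.3),
        (2017, "ABW", 2.7),
        (2017, "AND", None),
    ],
)

# Count the rows with missing gdp for each year.
query = """
SELECT year, COUNT(*) AS missing_gdp
FROM indicators
WHERE gdp IS NULL
GROUP BY year
ORDER BY year ASC
"""
missing_by_year = conn.execute(query).fetchall()
print(missing_by_year)
```

Grouping by `code` instead of `year` would show, in the same way, which countries account for most of the gaps.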

-----
**Q3:** In reference to the table `indicators`, write a query that obtains the rows where `cell` is **not missing** and where `cell` is non-zero.  Project columns `code`, `year`, and `cell`, ordered by increasing value of `cell`.

In [18]:
query = """
SELECT code, year, cell
FROM indicators
WHERE cell NOT NULL AND cell > 0.0
ORDER BY cell ASC
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 4623


Unnamed: 0,code,year,cell
0,JPN,1981,0.01
1,DNK,1982,0.01
2,NOR,1982,0.01
3,ARE,1985,0.01
4,AUT,1985,0.01


In [19]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT code, year, cell
FROM indicators
WHERE cell NOT NULL AND cell > 0.0
ORDER BY cell ASC

```

In [20]:
# Testing cell
assert True
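
A note on the Q3 filter: `cell NOT NULL` is a spelling that SQLite accepts, while `IS NOT NULL` is the standard SQL form. In addition, any comparison with `NULL` evaluates to `NULL`, which `WHERE` treats as not true, so `cell > 0.0` on its own already drops the missing rows. The following sketch checks this on a made-up in-memory table using Python's built-in `sqlite3` module; the rows are invented for illustration only.

```python
import sqlite3

# Toy stand-in for the indicators table (values are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (code TEXT, year INTEGER, cell REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [
        ("JPN", 1981, 0.01),
        ("USA", 1975, None),   # missing value
        ("DNK", 1980, 0.0),    # present but zero
        ("CHN", 2017, 1469.88),
    ],
)

# Explicit NULL test, written in the standard form.
explicit = """
SELECT code, year, cell
FROM indicators
WHERE cell IS NOT NULL AND cell > 0.0
ORDER BY cell ASC
"""

# Comparison only: NULL > 0.0 is NULL, so the missing row is dropped anyway.
implicit = """
SELECT code, year, cell
FROM indicators
WHERE cell > 0.0
ORDER BY cell ASC
"""

explicit_rows = conn.execute(explicit).fetchall()
implicit_rows = conn.execute(implicit).fetchall()
print(explicit_rows == implicit_rows)
```

Keeping the explicit `IS NOT NULL` test is still a reasonable choice, since it states the intent of the question directly.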

-----
**Q4:** Suppose we want to find out in which country and in which year the number of cell phones was greatest.  The answer should be apparent from the results of the query and, ideally, should consist of a single row.  Again we want the `year`, `code`, and `cell` values.

In [22]:
query = """
SELECT year, code, MAX(cell)
FROM indicators
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 1


Unnamed: 0,year,code,MAX(cell)
0,2017,CHN,1469.88


In [23]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT year, code, MAX(cell)
FROM indicators

```

In [24]:
# Testing cell
assert True
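
The Q4 query relies on SQLite-specific behavior: when an aggregate query uses `MAX()`, the bare columns `year` and `code` are taken from the row that holds the maximum. Many other database systems reject a query of that shape. A more portable alternative, sketched here on a made-up in-memory table with Python's built-in `sqlite3` module, sorts by `cell` and keeps only the first row.

```python
import sqlite3

# Toy stand-in for the indicators table (values are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (year INTEGER, code TEXT, cell REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [
        (2011, "CHN", 986.25),
        (2017, "IND", 1168.9),
        (2017, "CHN", 1469.88),
        (2018, "IND", None),
    ],
)

# Sort from largest to smallest and keep the top row.
# SQLite sorts NULL below every other value, so it lands last under DESC.
query = """
SELECT year, code, cell
FROM indicators
ORDER BY cell DESC
LIMIT 1
"""
best = conn.execute(query).fetchone()
print(best)
```

If several rows tied for the maximum, `LIMIT 1` would return only one of them, so a tie-breaking column in the `ORDER BY` may be worth adding.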

-----
**Q5:** Use a subquery to select the top ten entries in indicators by population, then select the bottom three of those by gdp.  For both the subquery and the main query, you can project all available columns.

In [37]:
query = """
SELECT *
FROM (SELECT *
      FROM indicators
      ORDER BY pop DESC
      LIMIT 10)
ORDER BY gdp ASC
LIMIT 3
"""


result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 3


Unnamed: 0,year,code,pop,gdp,life,cell,imports,exports
0,2017,IND,1338.66,2652.55,68.8,1168.9,442983.0,296212.0
1,2018,IND,1352.62,2726.32,,,,
2,2011,CHN,1344.13,7551.5,75.4,986.25,1741430.0,1899280.0


In [38]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT *
FROM (SELECT *
      FROM indicators
      ORDER BY pop DESC
      LIMIT 10)
ORDER BY gdp ASC
LIMIT 3

```

In [39]:
# Testing cell
assert True

-----
**Q6:** In reference to the table `indicators`, write a query to find all rows with no missing data for any of the numeric fields.  You may project all columns, and rows should be ordered in decreasing order of year and increasing order of country code.

In [45]:
query = """
SELECT *
FROM (SELECT *
      FROM indicators
      WHERE pop NOT NULL AND gdp NOT NULL AND life NOT NULL
            AND cell NOT NULL AND imports NOT NULL AND exports NOT NULL
      ORDER BY year DESC)
ORDER BY code ASC
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 6958


Unnamed: 0,year,code,pop,gdp,life,cell,imports,exports
0,2015,ABW,0.1,2.69,75.7,0.14,1165.33,79.45
1,2014,ABW,0.1,2.65,75.6,0.14,1261.56,110.83
2,2013,ABW,0.1,2.58,75.4,0.14,1302.98,167.75
3,2012,ABW,0.1,2.53,75.3,0.14,1257.45,172.96
4,2010,ABW,0.1,2.39,75.0,0.13,1068.99,124.52


In [46]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT *
FROM (SELECT *
      FROM indicators
      WHERE pop NOT NULL AND gdp NOT NULL AND life NOT NULL
            AND cell NOT NULL AND imports NOT NULL AND exports NOT NULL
      ORDER BY year DESC)
ORDER BY code ASC

```

In [47]:
# Testing cell
assert True
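
In the Q6 query the outer `ORDER BY code ASC` determines the final order, so the displayed rows are sorted by country code first rather than by year. One reading of the question is that year is the primary key (descending) and code breaks ties (ascending), which a single `ORDER BY` with two keys expresses without a subquery. The sketch below uses Python's built-in `sqlite3` module on a made-up in-memory table that has only two numeric columns for brevity; the rows are invented for illustration.

```python
import sqlite3

# Toy stand-in for the indicators table (fewer columns, invented values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators (year INTEGER, code TEXT, pop REAL, gdp REAL)"
)
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        (2014, "ALB", 2.89, 13.23),
        (2015, "ABW", 0.1, 2.69),
        (2015, "AFG", 34.4, None),   # dropped: gdp is missing
        (2014, "ABW", 0.1, 2.65),
        (2015, "ALB", 2.88, 11.39),
    ],
)

# Two sort keys in one ORDER BY: year descending, then code ascending.
query = """
SELECT *
FROM indicators
WHERE pop IS NOT NULL
      AND gdp IS NOT NULL
ORDER BY year DESC, code ASC
"""
ordered = conn.execute(query).fetchall()
year_code = [(row[0], row[1]) for row in ordered]
print(year_code)
```

On the full table the `WHERE` clause would list every numeric column, as the original query does.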

-----
**Q7:** Suppose, as in the book, we want a subset of the `indicators` data containing rows for the countries `CHN`, `IND`, `GBR`, `VNM`, and `RUS`, and for the years 2007 and 2017.  Construct a query that obtains all columns and just the specified rows.  (So each of the countries should have two rows, one for each of the two specified years.)  Order the result by country code and then by year.

In [54]:
query = """
SELECT *
FROM (SELECT *
      FROM indicators
      WHERE year IN ('2007', '2017')
      ORDER BY code ASC)
ORDER BY year DESC
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
print("Row Count:", len(result_df))
result_df.head()

Row Count: 436


Unnamed: 0,year,code,pop,gdp,life,cell,imports,exports
0,2017,ABW,0.11,2.7,76.0,,1188.69,89.08
1,2017,AFG,36.3,20.19,64.0,23.93,6804.74,850.38
2,2017,AGO,29.82,122.12,61.8,13.32,12010.2,30517.0
3,2017,ALB,2.87,13.03,78.5,3.63,5670.49,2463.2
4,2017,AND,0.08,3.01,,0.08,,


In [55]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT *
FROM (SELECT *
      FROM indicators
      WHERE year IN ('2007', '2017')
      ORDER BY code ASC)
ORDER BY year DESC

```

In [56]:
# Testing cell
assert True
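
The Q7 query as written filters on year only, which is consistent with the 436 rows returned (two for each of the 218 country codes found in Q1), and it sorts by year rather than by code and then year. A possible correction, sketched here with Python's built-in `sqlite3` module, combines two `IN` tests in one `WHERE` clause and uses a two-key `ORDER BY`. The in-memory table has fewer columns than the real one and its rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for the indicators table (fewer columns, invented values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (year INTEGER, code TEXT, pop REAL)")
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?)",
    [
        (2017, "GBR", 66.06),
        (2007, "USA", 301.23),   # dropped: code not in the list
        (2017, "CHN", 1386.4),
        (2010, "CHN", 1337.71),  # dropped: year not in the list
        (2007, "GBR", 61.32),
        (2007, "CHN", 1317.89),
    ],
)

# Filter on both code and year, then order by code and by year within code.
query = """
SELECT *
FROM indicators
WHERE code IN ('CHN', 'IND', 'GBR', 'VNM', 'RUS')
      AND year IN (2007, 2017)
ORDER BY code ASC, year ASC
"""
subset = conn.execute(query).fetchall()
code_year = [(row[1], row[0]) for row in subset]
print(code_year)
```

Against the full table, a query of this shape should return at most ten rows, two for each of the five countries.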