# Denison CS-181/DA-210 Homework

---

## Basic SQL Flipped Class Practice

### TablePlus

For these practice exercises, develop and debug your SQL in TablePlus. First download and install the software, then create a connection to the `book.db` database in the `dbfiles` folder of your class repository.

The link for TablePlus installation: https://tableplus.com/

### Instructions

Each of the questions below describes a query that can be solved using SQL. We have made it so you do **not** have to understand the Python programming. Simply put your SQL query as the value of the `query` string variable, inside the triple double quotes. (And, as usual, delete the two-line sequence marking where your answer goes.)

Using our own function in the `util` module, the cell submits the query to the appropriate SQLite database and, if the query is syntactically correct, creates and shows a pandas dataframe of the result.

If there are any problems with execution, the query function returns `None` and we report an empty table. If there is an exception, for instance because of invalid syntax, we print the exception message and the query before returning `None`.

In [1]:
import os
import sys
import lxml
import pandas as pd
from IPython.display import Markdown as md

def add_modules():
    """
    Starting at the current directory and proceeding up the file system
    tree, search for a directory named `modules`.  If found, and if not
    already there, add to the Python module search path.
    
    Params: None
    
    Return: None
    """
    directory = "."
    levels = 0
    while not os.path.isdir(os.path.join(directory, "modules")) and \
          levels < 5:
        directory = os.path.join(directory, "..")
        levels += 1
    module_path = os.path.abspath(os.path.join(directory, "modules"))
    if os.path.isdir(module_path):
        if module_path not in sys.path:
            sys.path.append(module_path)

add_modules()
import util

datadir = util.resolve_dir("dbfiles")

**Example:** The following shows the solution to the query question: project the `pop` column of the `indicators0` table.

In [2]:
query = """
SELECT pop FROM indicators0
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,pop
0,1386.4
1,66.87
2,66.06
3,1338.66
4,325.15


In [3]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT pop FROM indicators0

```

In [4]:
# Testing cell
assert True

assert result_df.shape == (5,1)
assert 'pop' in result_df.columns
assert result_df.loc[0, 'pop'] == 1386.40

-----
**Q1** Project the life and cell columns of the indicators0 table.

In [5]:
query = """
SELECT life, cell FROM indicators0
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,life,cell
0,76.4,1469.88
1,82.5,69.02
2,81.2,79.1
3,68.8,1168.9
4,78.5,391.6


In [6]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT life, cell FROM indicators0

```

In [7]:
# Testing cell
assert True

-----
**Q2** Project the code, year, and gdp columns from the indicators table, but restrict the result to 15 rows.

In [9]:
query = """
SELECT code, year, gdp FROM indicators LIMIT 15
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,code,year,gdp
0,ABW,1960,
1,AFG,1960,0.54
2,AGO,1960,
3,ALB,1960,
4,AND,1960,
5,ARE,1960,
6,ARG,1960,
7,ARM,1960,
8,ASM,1960,
9,ATG,1960,


In [10]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT code, year, gdp FROM indicators LIMIT 15

```

In [11]:
# Testing cell
assert True
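
The table shown above lists ten rows even though the query asks for 15, so it can help to count the returned rows directly. The following is a minimal sketch against an in-memory database; the `demo` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory database with a made-up 40-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (n INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(40)])

# LIMIT caps the number of rows returned by the query.
rows = conn.execute("SELECT n FROM demo LIMIT 15").fetchall()
row_count = len(rows)
conn.close()
```

In the notebook itself, `result_df.shape` gives the same information for the real query.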

-----
**Q3** Repeat the previous query, but create alias/renamed columns of `C` for the code, `Y` for the year, and `GDP` (all caps) for the gdp.

In [12]:
query = """
SELECT code AS C, year AS Y, gdp AS GDP FROM indicators LIMIT 15
"""


result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,C,Y,GDP
0,ABW,1960,
1,AFG,1960,0.54
2,AGO,1960,
3,ALB,1960,
4,AND,1960,
5,ARE,1960,
6,ARG,1960,
7,ARM,1960,
8,ASM,1960,
9,ATG,1960,


In [13]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT code AS C, year AS Y, gdp AS GDP FROM indicators LIMIT 15

```

In [14]:
# Testing cell
assert True
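
An alias changes only the column name reported in the result, not the stored data. A minimal sketch, assuming an in-memory table that holds a single row copied from the output above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicators (code TEXT, year INTEGER, gdp REAL)")
conn.execute("INSERT INTO indicators VALUES ('AFG', 1960, 0.54)")

cursor = conn.execute(
    "SELECT code AS C, year AS Y, gdp AS GDP FROM indicators"
)
# cursor.description carries the result column names, i.e. the aliases.
column_names = [d[0] for d in cursor.description]
first_row = cursor.fetchone()
conn.close()
```

Reordering the items in the `SELECT` list reorders both `column_names` and the values in each row.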

-----
**Q4** Use a revised column ordering to repeat the last problem, but with GDP as the first column, followed by Y and then C.

In [22]:
query = """
SELECT gdp AS GDP, year AS Y, code AS C FROM indicators LIMIT 15
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,GDP,Y,C
0,,1960,ABW
1,0.54,1960,AFG
2,,1960,AGO
3,,1960,ALB
4,,1960,AND
5,,1960,ARE
6,,1960,ARG
7,,1960,ARM
8,,1960,ASM
9,,1960,ATG


In [23]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT gdp AS GDP, year AS Y, code AS C FROM indicators LIMIT 15

```

In [24]:
# Testing cell
assert True

-----
**Q5** Generate a table with all of the columns of the countries table, limiting to a total of 12 rows.

In [26]:
query = """
SELECT * FROM countries LIMIT 12
"""


result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,code,country,region,income,land
0,ABW,Aruba,Latin America & Caribbean,High income,180.0
1,AFG,Afghanistan,South Asia,Low income,652860.0
2,AGO,Angola,Sub-Saharan Africa,Lower middle income,1246700.0
3,ALB,Albania,Europe & Central Asia,Upper middle income,27400.0
4,AND,Andorra,Europe & Central Asia,High income,470.0
5,ARE,United Arab Emirates,Middle East & North Africa,High income,71020.0
6,ARG,Argentina,Latin America & Caribbean,Upper middle income,2736690.0
7,ARM,Armenia,Europe & Central Asia,Upper middle income,28470.0
8,ASM,American Samoa,East Asia & Pacific,Upper middle income,200.0
9,ATG,Antigua and Barbuda,Latin America & Caribbean,High income,440.0


In [27]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT * FROM countries LIMIT 12

```

In [28]:
# Testing cell
assert True

-----
**Q6** Project all the columns of the countries table, but present them in descending order of the code field, and limit to a total of 10 rows.

In [30]:
query = """
SELECT * FROM countries ORDER BY code DESC LIMIT 10
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,code,country,region,income,land
0,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,386850.0
1,ZMB,Zambia,Sub-Saharan Africa,Lower middle income,743390.0
2,ZAF,South Africa,Sub-Saharan Africa,Upper middle income,1213090.0
3,YEM,"Yemen, Rep.",Middle East & North Africa,Low income,527970.0
4,XKX,Kosovo,Europe & Central Asia,Upper middle income,
5,WSM,Samoa,East Asia & Pacific,Upper middle income,2830.0
6,VUT,Vanuatu,East Asia & Pacific,Lower middle income,12190.0
7,VNM,Vietnam,East Asia & Pacific,Lower middle income,310070.0
8,VIR,Virgin Islands (U.S.),Latin America & Caribbean,High income,350.0
9,VGB,British Virgin Islands,Latin America & Caribbean,High income,150.0


In [31]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT * FROM countries ORDER BY code DESC LIMIT 10

```

In [32]:
# Testing cell
assert True

-----
**Q7** Project all columns of topnames, but order first by the sex column and then by descending order of the year.  Restrict the result to 20 rows.

In [33]:
query = """
SELECT * FROM topnames ORDER BY sex, year DESC LIMIT 20
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,year,sex,name,count
0,2018,Female,Emma,18688
1,2017,Female,Emma,19800
2,2016,Female,Emma,19496
3,2015,Female,Emma,20455
4,2014,Female,Emma,20936
...


In [34]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT * FROM topnames ORDER BY sex, year DESC LIMIT 20

```

In [35]:
# Testing cell
assert True
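
When sorting on two keys, the primary key comes first in `ORDER BY`, and each key takes its own direction (ascending unless marked `DESC`). A minimal sketch, assuming an in-memory `topnames` table that holds only four rows copied from the output above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topnames (year INTEGER, sex TEXT, name TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO topnames VALUES (?, ?, ?, ?)",
    [
        (2018, "Male", "Liam", 19837),
        (2018, "Female", "Emma", 18688),
        (2017, "Male", "Liam", 18798),
        (2017, "Female", "Emma", 19800),
    ],
)

# sex ascending groups the Female rows first; within each sex,
# year DESC puts the most recent year on top.
ordered = conn.execute(
    "SELECT * FROM topnames ORDER BY sex, year DESC"
).fetchall()
conn.close()
```

Both Female rows come back before any Male row, each group running from 2018 down to 2017.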

-----
**Q8** Generate a table from topnames with columns year, name, and count for the rows where the count exceeds 90000. Sort the result by descending count values.

In [37]:
query = """
SELECT year, name, count FROM topnames WHERE count > 90000 ORDER BY count DESC
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,year,name,count
0,1947,Linda,99689
1,1948,Linda,96211
2,1947,James,94757
3,1957,Michael,92704
4,1949,Linda,91016
5,1956,Michael,90656
6,1958,Michael,90517


In [38]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT year, name, count FROM topnames WHERE count > 90000 ORDER BY count DESC

```

In [39]:
# Testing cell
assert True

-----
**Q9** Generate a table from topnames with columns year, name, and count for the rows where the count exceeds 90000 and also the sex is Female.

In [46]:
query = """
SELECT year, name, count FROM topnames WHERE count > 90000 AND sex = 'Female'
"""

result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,year,name,count
0,1947,Linda,99689
1,1948,Linda,96211
2,1949,Linda,91016


In [47]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT year, name, count FROM topnames WHERE count > 90000 AND sex = 'Female'

```

In [48]:
# Testing cell
assert True
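
In standard SQL, single quotes delimit string literals and double quotes delimit identifiers; SQLite accepts a double-quoted string only as a fallback. The sketch below filters on two conditions with `AND`, once with a single-quoted literal and once with `?` placeholders. It assumes an in-memory `topnames` table holding four rows taken from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topnames (year INTEGER, sex TEXT, name TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO topnames VALUES (?, ?, ?, ?)",
    [
        (1947, "Female", "Linda", 99689),
        (1947, "Male", "James", 94757),
        (1948, "Female", "Linda", 96211),
        (2018, "Female", "Emma", 18688),
    ],
)

# Both conditions must hold for a row to be kept.
literal = conn.execute(
    "SELECT year, name, count FROM topnames "
    "WHERE count > 90000 AND sex = 'Female'"
).fetchall()

# Same filter, with the values bound through placeholders.
bound = conn.execute(
    "SELECT year, name, count FROM topnames WHERE count > ? AND sex = ?",
    (90000, "Female"),
).fetchall()
conn.close()
```

Only the two Linda rows satisfy both conditions; James fails the `sex` test and the 2018 Emma row fails the `count` test.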

-----
**Q10** Find the unique Female names from the topnames table.

In [51]:
query = """
SELECT DISTINCT name FROM topnames WHERE sex = 'Female'
"""


result_df = util.sqlite_query(datadir, "book", query)
if result_df is None:
    print("EMPTY TABLE RESULT")
result_df

Unnamed: 0,name
0,Mary
1,Linda
2,Lisa
3,Jennifer
4,Jessica
...


In [52]:
md(f"""\n```sql\n{query}\n```""")


```sql

SELECT DISTINCT name FROM topnames WHERE sex = 'Female'

```

In [53]:
# Testing cell
assert True
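
`DISTINCT` removes duplicate rows from whatever the `WHERE` clause lets through, so restricting to one sex and deduplicating names combine naturally. A minimal sketch, assuming an in-memory `topnames` table holding four rows taken from the outputs above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topnames (year INTEGER, sex TEXT, name TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO topnames VALUES (?, ?, ?, ?)",
    [
        (2018, "Female", "Emma", 18688),
        (2017, "Female", "Emma", 19800),
        (2018, "Male", "Liam", 19837),
        (1947, "Female", "Linda", 99689),
    ],
)

# WHERE filters the rows first; DISTINCT then collapses repeated names.
unique_female = conn.execute(
    "SELECT DISTINCT name FROM topnames WHERE sex = 'Female'"
).fetchall()
conn.close()
```

Emma appears twice in the table but only once in `unique_female`, and Liam is excluded by the filter.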