# Module 8 Assignment


A few things you should keep in mind when working on assignments:

1. Run the first code cell to import modules needed by this assignment before proceeding to problems.
2. Make sure you fill in any place that says `# YOUR CODE HERE`. Write your answer only where it says `# YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.
3. Each problem has an autograder cell below the answer cell. Run the autograder cell to check your answer. If anything is wrong with your answer, the autograder cell displays error messages.
4. Before you submit your assignment, make sure everything runs as expected. In the menubar, select Kernel → Restart & Run All. If the notebook runs through the last code cell without an error message, you've answered all problems correctly.
5. Make sure that you save your work (in the menubar, select File → Save and Checkpoint).

-----

# Run Me First!

In [1]:
import sqlite3 as sql
import pandas as pd

from nose.tools import assert_equal, assert_true

-----

## Problem 1: Establishing a connection, getting a cursor

In the code cell below, we declare a function named `create_connection` that takes one function parameter: `file_path`, which is a string containing the file path to the SQLite3 database.

To complete this problem, finish writing the function `create_connection`:
- Establish a sqlite3 connection to the database at `file_path`.
- Create a cursor using the connection to this database.
- Return the connection to the database and the database cursor.
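
As a minimal sketch of these three steps, the snippet below opens a throwaway in-memory database and creates a cursor from the connection. The path `':memory:'` and the table name `Demo` are assumptions used only for illustration; the assignment itself connects to a database file.

```python
import sqlite3 as sql

# ':memory:' asks SQLite for a temporary in-memory database (illustration only)
con = sql.connect(':memory:')

# The cursor is the object that executes SQL statements on this connection
cur = con.cursor()

# Hypothetical table, created just so there is something to query
cur.execute('CREATE TABLE Demo (Subject TEXT)')
cur.execute("INSERT INTO Demo VALUES ('ACCY')")
first_row = cur.execute('SELECT Subject FROM Demo').fetchone()
print(first_row)

cur.close()
con.close()
```

A function that needs to hand back both objects can return them as a pair, which the caller unpacks with `con, cur = ...`.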

-----

In [2]:
# Connect to a database
def create_connection(file_path):
    '''
    Creates and establishes a connection to a database
    
    Parameters
    ----------
    file_path: string containing the path to the SQLite3 database
    
    Returns
    -------
    con: sqlite3 connection
    cur: sqlite3 database cursor object
    '''
    
    # YOUR CODE HERE
    con = sql.connect(file_path)
    cur = con.cursor()
    
    return con, cur

In [3]:
con, cur = create_connection('sql_files/m8.db')

cur.execute("PRAGMA table_Info('Courses')")
result = cur.fetchall()

assert_true('Subject' in result[0], msg='Connection is not established correctly.')

-----

## Problem 2: Selecting all data from a table

In the code cell below, we declare a function named `select_all` that takes one function parameter: `cur`, which is the cursor.

For this problem, the database has a **Courses** table.

To complete this problem, finish writing the function `select_all`:
- Use the cursor represented by `cur` to execute a query that selects all data from the table **Courses**.
- Use the cursor's `fetchall()` method to fetch all the results of the query.
- Return the result returned by `fetchall()`.
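
The sketch below illustrates what `fetchall()` hands back, using a hypothetical two-row table `Demo` in an in-memory database (both are assumptions for illustration, not part of the assignment database). Each row comes back as a tuple, and all rows are collected in a list.

```python
import sqlite3 as sql

con = sql.connect(':memory:')  # temporary database, illustration only
cur = con.cursor()

# Hypothetical table with two rows
cur.execute('CREATE TABLE Demo (Subject TEXT, CourseNumber INTEGER)')
cur.executemany('INSERT INTO Demo VALUES (?, ?)',
                [('ACCY', 199), ('ART', 102)])

# Run the query, then collect every remaining row as a list of tuples
cur.execute('SELECT * FROM Demo')
rows = cur.fetchall()
print(rows)

# A second fetchall() on the same query finds nothing left to fetch
remaining = cur.fetchall()
print(remaining)

cur.close()
con.close()
```

Because each row is a tuple, a check such as `'ACCY' in rows[0]` tests whether that value appears in any column of the first row.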

-----

In [4]:
def select_all(cur):
    '''
    Fetches all rows in the table Courses
    
    Parameters
    ----------
    cur: sqlite3 cursor
    
    Returns
    -------
    All data fetched from the table
    '''
    
    # YOUR CODE HERE
    cur.execute('SELECT * FROM Courses')
    result = cur.fetchall()
    return result

In [5]:
data = select_all(cur)
assert_equal(len(data), 13, msg="Your answer does not match the solution.")
assert_true('ACCY' in data[0], msg="Your answer does not match the solution.")
assert_true(199 in data[0], msg="Your answer does not match the solution.")
assert_true(10033 in data[0], msg="Your answer does not match the solution.")
print("Courses:")
print(f'{"Subject":8s}{"CourseNumber":13s}{"CRN"}')
for row in data:
    print(f'{row[0]:8s}{row[1]:<13}{row[2]}')

Courses:
Subject CourseNumber CRN
ACCY    199          10033
ACCY    199          69998
ACCY    200          29670
ACCY    201          36478
ART     102          62794
ART     150          65459
ART     310          64968
IE      300          51898
IE      512          35414
IE      360          61503
LAW     600          30836
LAW     604          31954
LAW     634          56475


-----

## Problem 3: Selecting all data into a DataFrame

In the code cell below, we declare a function named `select_all_to_dataframe` that takes one function parameter: `con`, which is the database connection.

For this problem, the database has a **Courses** table.

To complete this problem, finish writing the function `select_all_to_dataframe`:
- Use the pandas `read_sql` function with the database connection `con` to select all data from the table **Courses** and load the result into a DataFrame.
- Return the DataFrame.

-----

In [6]:
# Select data
def select_all_to_dataframe(con):
    '''
    Selects data from table Courses to a DataFrame.
    
    Parameters
    ----------
    con: sqlite3 connection
    
    Returns
    -------
    dataframe that contains all data in the table
    '''
    
    # YOUR CODE HERE
    df = pd.read_sql_query("SELECT * FROM Courses", con)
    return df

In [7]:
result = select_all_to_dataframe(con)
assert_equal(result.shape, (13, 3), msg="Your answer does not match the solution")
result

    Subject  CourseNumber    CRN
0      ACCY           199  10033
1      ACCY           199  69998
2      ACCY           200  29670
3      ACCY           201  36478
4       ART           102  62794
5       ART           150  65459
6       ART           310  64968
7        IE           300  51898
8        IE           512  35414
9        IE           360  61503
10      LAW           600  30836
11      LAW           604  31954
12      LAW           634  56475


-----

## Problem 4: Selecting data by subject into a DataFrame

In the code cell below, we declare a function named `select_data_by_subject` that takes two function parameters: `con`, which is the database connection, and `sub`, which is the subject to select.

For this problem, the database has a **Courses** table. The **Courses** table has a TEXT column **Subject**.

To complete this problem, finish writing the function `select_data_by_subject`:
- Use the pandas `read_sql` function with the database connection `con` to read all rows from the table **Courses** whose **Subject** equals the string `sub`, and load the result into a DataFrame.
- Return the DataFrame.
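
The key point in this problem is that the *value* of `sub` has to reach the query, not the text `sub` itself. The sketch below shows the difference with plain `sqlite3`, using a hypothetical in-memory table `Demo` (an assumption for illustration). It uses the `?` placeholder, which is one common way to pass a value into a query; it is not necessarily the approach the course intends.

```python
import sqlite3 as sql

con = sql.connect(':memory:')  # temporary database, illustration only
cur = con.cursor()

# Hypothetical table with two ACCY rows and one ART row
cur.execute('CREATE TABLE Demo (Subject TEXT, CourseNumber INTEGER)')
cur.executemany('INSERT INTO Demo VALUES (?, ?)',
                [('ACCY', 199), ('ACCY', 200), ('ART', 102)])

sub = 'ACCY'

# Literal text: compares Subject with the three-character string 'sub'
literal_rows = cur.execute(
    "SELECT * FROM Demo WHERE Subject = 'sub'").fetchall()

# Placeholder: sqlite3 binds the value of the Python variable sub to the ?
bound_rows = cur.execute(
    'SELECT * FROM Demo WHERE Subject = ?', (sub,)).fetchall()

print(len(literal_rows), len(bound_rows))

cur.close()
con.close()
```

The pandas `read_sql_query` function accepts a `params` argument that is passed through to the database driver, so the same `?` placeholder can be used when loading the result into a DataFrame.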

-----

In [8]:
df = pd.read_sql_query("SELECT * FROM Courses", con)
df

    Subject  CourseNumber    CRN
0      ACCY           199  10033
1      ACCY           199  69998
2      ACCY           200  29670
3      ACCY           201  36478
4       ART           102  62794
5       ART           150  65459
6       ART           310  64968
7        IE           300  51898
8        IE           512  35414
9        IE           360  61503
10      LAW           600  30836
11      LAW           604  31954
12      LAW           634  56475


In [9]:
df2 = pd.read_sql_query("SELECT * FROM Courses \
                            WHERE Subject='ACCY'", con)
                        
df2

   Subject  CourseNumber    CRN
0     ACCY           199  10033
1     ACCY           199  69998
2     ACCY           200  29670
3     ACCY           201  36478


In [10]:
# Select data by subject
def select_data_by_subject(con, sub):
    '''
    Selects data from the Courses table by Subject.
    
    Parameters
    ----------
    con: sqlite3 connection
    sub: subject to select
    
    Returns
    -------
    dataframe that contains course info with particular subject
    '''
    
    # YOUR CODE HERE
    # Bind the value of sub to the ? placeholder instead of using the literal text 'sub'
    df = pd.read_sql_query("SELECT * FROM Courses WHERE Subject = ?",
                           con, params=(sub,))
    return df

In [11]:
accy = select_data_by_subject(con, 'ACCY')
assert_equal(accy.shape, (4, 3), msg="Your answer does not match the solution.")
accy

   Subject  CourseNumber    CRN
0     ACCY           199  10033
1     ACCY           199  69998
2     ACCY           200  29670
3     ACCY           201  36478

In [12]:
# Release the database cursor and connection
cur.close()
con.close()
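
If an earlier cell raises an exception, explicit `close()` calls like these may never run. One way to guard against that, sketched here with the standard-library `contextlib.closing` helper and an in-memory database (both are assumptions for illustration, not part of the assignment), is to let a `with` block do the closing.

```python
import sqlite3 as sql
from contextlib import closing

# closing(...) calls .close() on the object when the with block ends,
# even if the code inside the block raises an exception
with closing(sql.connect(':memory:')) as con:
    with closing(con.cursor()) as cur:
        cur.execute('SELECT 1')
        value = cur.fetchone()[0]

# After the block, the connection is closed and can no longer be used
try:
    con.execute('SELECT 1')
    still_open = True
except sql.ProgrammingError:
    still_open = False

print(value, still_open)
```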