# Denison DA210/CS181 SW Lab #12 - Step 3

Before you get your checkpoints signed off, make sure everything runs as expected: **restart the kernel**, then **run all cells**.

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Import Python modules

In [None]:
import pandas as pd
import os
import os.path
import json
import sqlalchemy as sa

#### Set credentials

In [None]:
def getsqlite_creds(dirname=".", filename="creds.json", source="sqlite"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the three parts needed for a connection string to
        a local provider, using the dictionary stored under the `source`
        key within an outer dictionary.

        Return a scheme, a database directory, and a database name
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D[source]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]

In [None]:
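def buildConnectionString(source="sqlite_book"):
    """ Build a connection string of the form scheme:///dbdir/database.db
        from the credentials stored under the `source` key.
    """
    scheme, dbdir, database = getsqlite_creds(source=source)
    template = '{}:///{}/{}.db'
    return template.format(scheme, dbdir, database)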
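
To make the two helpers above concrete, here is a minimal sketch of how one credentials entry turns into a connection string. The JSON shown is hypothetical: the key names (`scheme`, `dbdir`, `database`) are the ones `getsqlite_creds` reads, but the values, including the relative `./dbfiles` path, are assumptions, and your own `creds.json` may differ.

```python
import json

# Hypothetical contents of creds.json. The key names mirror what
# getsqlite_creds reads; the values are made up for illustration.
creds_text = """
{
    "sqlite_entertainment": {
        "scheme": "sqlite",
        "dbdir": "./dbfiles",
        "database": "entertainment"
    }
}
"""

creds = json.loads(creds_text)
entry = creds["sqlite_entertainment"]

# Same template that buildConnectionString uses
cstring_demo = "{}:///{}/{}.db".format(
    entry["scheme"], entry["dbdir"], entry["database"]
)
print(cstring_demo)  # sqlite:///./dbfiles/entertainment.db
```

The three slashes after the scheme are followed by the path to the database file, which is why the directory and the database name are stored separately.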

---

## Part E: Table Creation

Recall the syntax for SQL `CREATE TABLE` statements:

CREATE TABLE [IF NOT EXISTS] _table-name_ ( \
&nbsp;&nbsp;&nbsp;&nbsp;_field-name_ _data-type_ _constraints_ \
&nbsp;&nbsp;&nbsp;&nbsp;[, _field-name_ _data-type_ _constraints_ ]* \
&nbsp;&nbsp;&nbsp;&nbsp;[, _table-constraint_ ]* \
)

#### **Before going further, create a new database!**

In SQLiteStudio, navigate to `Database` -> `Add a database`.  Click the green round "+" button to create a new local DB file (store it in your repository's `dbfiles` directory, along with `book` and `school`).  Name it `entertainment`.

Let's connect to our new empty `entertainment` database and add a new table:

In [None]:
# Build the connection string
cstring = buildConnectionString("sqlite_entertainment")
print("Connection string:", cstring)

# Connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

In [None]:
# Write the create-table SQL statement
statement = """
CREATE TABLE IF NOT EXISTS movies (
    movieid INT NOT NULL,
    title VARCHAR(64) NOT NULL,
    release DATE NULL,
    rating FLOAT DEFAULT 0.0,
    PRIMARY KEY (movieid)
)
"""

# Execute the statement (sa.text() wraps the raw SQL string,
# which SQLAlchemy 2.0 and later require)
try:
    connection.execute(sa.text(statement))
except sa.exc.SQLAlchemyError as err:
    print("CREATE of movies failed:", str(err))

In [None]:
# Close the connection!
try:
    connection.close()
except Exception:
    pass
del engine

In SQLiteStudio, double-click on the `entertainment` database, and you should now see a single table, `movies`.  If it doesn't show up, right-click on "Tables" and choose "Refresh all database schemas".

> You've reached the second checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 2: There is no `NULL` or `NOT NULL` constraint given for the `rating` field in the SQL statement above.  Inspect the `movies` table in SQLiteStudio.  What is the default when neither variant of this field constraint is specified?

---

## Part F: Table Population

#### Using an `INSERT INTO` statement

The SQL syntax to populate data into an existing table is straightforward:

INSERT INTO _table-name_ [ ( _column-list_ ) ] \
&nbsp;&nbsp;&nbsp;&nbsp;VALUES ( _value-list_ )

We first focus on populating one record at a time.  Note that we can use this syntax in one of two ways, depending on whether we are inserting all defined fields, or only an explicit set of fields.

In [None]:
# Insert all fields -- try this in SQLiteStudio using Tools -> Open SQL editor
stmt = """
    INSERT INTO movies
    VALUES (109445, 'Frozen', '2013-11-27', 7.3)
"""

In [None]:
# Insert only some fields -- try this in SQLiteStudio using Tools -> Open SQL editor
stmt = """
    INSERT INTO movies (title, movieid)
    VALUES ('Guardians of the Galaxy', 118340)
"""

If you tried out the above commands, you can delete the rows (or not) in the table using buttons above the data in the `Data` tab in SQLiteStudio before continuing.
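
If you would rather see the effect of both forms without touching your `entertainment` database, the following sketch runs them against a throwaway in-memory SQLite database using Python's built-in `sqlite3` module. The in-memory database and the `demo_` variable names are assumptions made for illustration; the table definition and the two statements are the ones used above.

```python
import sqlite3

# Throwaway in-memory database, separate from your entertainment database
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""
    CREATE TABLE IF NOT EXISTS movies (
        movieid INT NOT NULL,
        title VARCHAR(64) NOT NULL,
        release DATE NULL,
        rating FLOAT DEFAULT 0.0,
        PRIMARY KEY (movieid)
    )
""")

# Form 1: no column list, so a value is given for every field, in table order
demo_conn.execute("""
    INSERT INTO movies
    VALUES (109445, 'Frozen', '2013-11-27', 7.3)
""")

# Form 2: explicit column list, so values follow the order of that list
demo_conn.execute("""
    INSERT INTO movies (title, movieid)
    VALUES ('Guardians of the Galaxy', 118340)
""")

demo_rows = demo_conn.execute("SELECT * FROM movies ORDER BY movieid").fetchall()
for row in demo_rows:
    print(row)

demo_conn.close()
```

In the second row, the omitted `release` field comes back as `None` (SQL `NULL`), and the omitted `rating` field takes its declared default of `0.0`.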

#### Using `pandas` to insert an entire `DataFrame`

We can hard-code such insert statements, but this will become tedious quickly.  Instead, we can use `pandas` to insert all of the rows in a `DataFrame`.

In [None]:
# Re-connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

In [None]:
# The movie data we'll insert
DoL = {
    "movieid": [109445, 118340, 299536, 301518, 420818, 424694],
    "title": ["Frozen", "Guardians of the Galaxy", "Avengers: Infinity War",
              "Toy Story 4", "The Lion King", "Bohemian Rhapsody"],
    "release": ["2013-11-27", "2014-08-01", "2018-04-27",
                "2019-06-21", "2019-07-19", "2018-11-02"],
    "rating": [7.3, 7.9, 8.3, 7.6, 7.1, 8.0]
}

df = pd.DataFrame(DoL)
df

We'll use the `pandas` `DataFrame` method `to_sql`, with the following parameters:
* `name`: the name of the table (a string)
* `con`: the connection to the database (the result of `sa.create_engine().connect()`)
* `if_exists`: the action to take if the table exists (e.g., to append or overwrite)
* `index`: a Boolean indicating whether the `DataFrame`'s index should be written to the table as a column

Note that we need to use `if_exists="append"` to append the rows to an existing table.  If the table doesn't already exist, then this will create a new table, but without the constraints (e.g., primary key) defined, so make sure to create the table first!
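
The point about creating the table first can be checked with a small sketch. It uses an in-memory SQLite database and a hypothetical table named `movies_demo` (both are assumptions for illustration, not part of the lab's databases): the table is created with a primary key, rows are appended with `to_sql`, and the primary key is then read back with SQLAlchemy's inspector.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite database, used only for this illustration
demo_engine = sa.create_engine("sqlite://")

with demo_engine.begin() as conn:
    # Create the table first so that we control its constraints
    conn.execute(sa.text("""
        CREATE TABLE IF NOT EXISTS movies_demo (
            movieid INT NOT NULL,
            title VARCHAR(64) NOT NULL,
            PRIMARY KEY (movieid)
        )
    """))

    demo_df = pd.DataFrame({
        "movieid": [109445, 118340],
        "title": ["Frozen", "Guardians of the Galaxy"],
    })

    # Append to the existing table; index=False keeps the
    # DataFrame's row index out of the table
    demo_df.to_sql("movies_demo", con=conn, if_exists="append", index=False)

    # Read back the primary key and the number of rows
    pk_columns = sa.inspect(conn).get_pk_constraint("movies_demo")["constrained_columns"]
    row_count = conn.execute(sa.text("SELECT COUNT(*) FROM movies_demo")).scalar()

print(pk_columns, row_count)
```

Because the table already existed when `to_sql` ran, the rows were added to it and the primary key declared in the `CREATE TABLE` statement is still reported by the inspector.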

In [None]:
# Inserting directly from a DataFrame
df.to_sql("movies", con=connection, if_exists="append", index=False)

Let's see how we did by querying our `movies` table.

In [None]:
df2 = pd.read_sql_table("movies", connection)
df2

In [None]:
df3 = pd.read_sql_query("SELECT * FROM movies WHERE release > '2019-06-01'", connection)
df3

In [None]:
# Close the connection!
try:
    connection.close()
except Exception:
    pass
del engine

---

## Part G: Try it Yourself!


**Q4:** Create a table named `television` (programmatically in Python, or via a SQL statement executed in SQLiteStudio) to represent the following table of data.  Make sure to use fields that make sense, and define a primary key.  Think about which fields may be allowed to be missing.

id | title    | service | episodes
:--|:--------|:---------|:-------------------------
0  | Stranger Things   | Netflix | 25
1  | The Crown | Netflix | 40
2  | Star Trek: Discovery   | Paramount+ | 55
3  | Star Trek: Lower Decks | Paramount+ | 20
4  | The Handmaid's Tale | Hulu | 46

In [None]:
# Re-connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

# YOUR CODE HERE
raise NotImplementedError() # or comment out if doing this in SQLiteStudio

**Q5:** Now that you have your table, write a SQL statement to insert the first record.  You should hard-code your solution for this question.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Debugging cell
df = pd.read_sql_table("television", connection)
df

In [None]:
# Testing cell
df = pd.read_sql_table("television", connection)
assert df.shape == (1,4)

**Q6:** Build a `pandas` `DataFrame`, named `tv_df`, for the remaining 4 rows.  The table is copied again below:

id | title    | service | episodes
:--|:--------|:---------|:-------------------------
0  | Stranger Things   | Netflix | 25
1  | The Crown | Netflix | 40
2  | Star Trek: Discovery   | Paramount+ | 55
3  | Star Trek: Lower Decks | Paramount+ | 20
4  | The Handmaid's Tale | Hulu | 46

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

# Display the DataFrame
tv_df

In [None]:
# Testing cell
assert tv_df.shape == (4,4)
assert list(tv_df["service"]) == ["Netflix", "Paramount+", "Paramount+", "Hulu"]
assert list(tv_df["episodes"]) == [40, 55, 20, 46]

**Q7:** Use your dataframe from the previous question to insert the remaining four rows into the `television` table.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Debugging cell
df = pd.read_sql_table("television", connection)
df

In [None]:
# Testing cell
df = pd.read_sql_table("television", connection)
assert df.shape == (5,4)

assert df.episodes.max() == 55
assert df.episodes.min() == 20

query = """
    SELECT title
    FROM television
    WHERE episodes = 40
"""
df2 = pd.read_sql_query(query, connection)
assert df2.shape == (1,1)
assert df2.iloc[0,0] == "The Crown"

In [None]:
# Close the connection!
try:
    connection.close()
except Exception:
    pass
del engine

> You've reached the third (and final) checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 3: What, if anything, would happen if you used `if_exists="replace"` in your bulk insert statement?  What about `if_exists="fail"`?
>
> _Hint_: You may want to take a look at the [documentation for the `DataFrame.to_sql` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html).

---


## Part H

How much time (in minutes/hours) did you spend on this lab outside of class?

YOUR ANSWER HERE