## The Movies Database

The diagram below shows a movies database consisting of 3 tables: Movie, Hall and Ticket.

* Movie: This table has 4 columns - Movie_id (unique identifier for each movie, specific to language), Movie_name (name of the movie), Language (language of the movie), Rating (average rating given by viewers)
* Hall: This table has 3 columns - Hall_id (unique identifier for each movie hall), Hall_name (name of the hall), Seating_capacity (maximum ticketed seats available in the hall)
* Ticket: This table has 3 columns - Movie_id (unique identifier for each movie, specific to language), Hall_id (unique identifier for each movie hall), Tickets_sold (number of tickets sold for the given movie at the given hall)

<img src="../../../images/movies_db.PNG" style="width:65vw"> <br>

<b>Tasks:</b>
1. Create an empty database named 'moviesdb' using the connect method.
2. Create empty tables with the relationships among the three tables shown in the diagram above, using foreign key constraints to enforce referential integrity.
3. Extract the data for the three tables from the links provided in the code block. Load each dataset into a dataframe and then into its table.
4. Write a query to filter out halls with a large number of seats. We want halls in the top 20% based on seating capacity.
5. Write a query to find which movies ran at over 80% of seating capacity and at which halls this was achieved. The query should extract all combinations of movie and hall with over 80% capacity.


In [1]:
movies_data_link = "https://raw.githubusercontent.com/colaberry/DSin100days/master/data/movie.csv"
halls_data_link = "https://raw.githubusercontent.com/colaberry/DSin100days/master/data/hall.csv"
tickets_data_link = "https://raw.githubusercontent.com/colaberry/DSin100days/master/data/ticket.csv"

In [2]:
import sqlite3
import pandas as pd 
import numpy as np 

db_name = "movies.db"


Your first task is to connect to a database with the name ```db_name```. Connecting to a database that does not exist yet creates it automatically. Once you are connected, you will need to enable foreign keys. Refer to the Data Modeling notebook to see how to enable foreign keys.
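As a minimal sketch of this step, assuming an in-memory database (```:memory:```) instead of the file named by ```db_name```, a connection can be opened and foreign key enforcement switched on with a ```PRAGMA``` statement. The name ```demo_db``` is illustrative and not part of the exercise.

```python
import sqlite3

# ":memory:" is an assumption for illustration; passing a file path
# instead would create that file on connect if it does not exist.
demo_db = sqlite3.connect(":memory:")

# Foreign key enforcement is a per-connection setting and is
# typically off until it is switched on explicitly.
demo_db.execute("PRAGMA foreign_keys = ON")

# Reading the pragma back returns one row holding 1 (on) or 0 (off).
fk_state = demo_db.execute("PRAGMA foreign_keys").fetchall()[0][0]
print(fk_state)
```

Because the setting belongs to the connection, it has to be repeated every time a new connection is opened.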


In [3]:
moviesdb = "####"
"####" # execute to set foreign keys to ON 
val = moviesdb.execute("PRAGMA foreign_keys") # get foreign key state

foreign_key =  val.fetchall()[0][0]
if foreign_key == 1: 
    fenabled = "Yes"
else: 
    fenabled = "No"
    
print("Foreign key has been enabled: {}".format(fenabled))

Foreign key has been enabled: Yes




## Creating and adding data to Movie Table

In this section you will create the movie_table table in the SQL database and then add data to it. You will have to write multiline SQL queries for this. Make sure you review how to write multiline SQL queries and how to add primary keys.

The first query drops the movie_table if it exists. The second query creates the movie table with the name ```movie_table```. Make sure you set the ```movie_id``` column as a not null primary key, and check the data type of each column.

Example: [Column Name] [Data type] 

We suggest using ```"{}".format(var)``` to populate the ```movie_table_query```, since this lets you easily switch the column names.

The last part you will have to write is a query that returns the table info. Look at how to do this [here](https://www.sqlitetutorial.net/sqlite-tutorial/sqlite-describe-table/). Table info will show whether you have correctly assigned a primary key. The last column of the first row should show a value of 1 if the primary key has been properly assigned.


Important Note: If you run this code and see ```FOREIGN KEY constraint failed```, that means you have already created a database containing the ```movie_table```. It is best to delete the database first and then rerun the code. You can also use the shell command ```!rm dbname.db``` to remove the database directly from the Jupyter notebook, so that you don't have to go into the Jupyter file system.
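The drop, create and describe steps described above can be sketched as follows. This is only an illustration on an in-memory database: the table name ```demo_table``` and the columns ```item_id``` and ```item_name``` are hypothetical and are not the ones required by the exercise.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")

# Hypothetical names, used only for this sketch.
table_name = "demo_table"
id_col, name_col = "item_id", "item_name"

drop_query = "DROP TABLE IF EXISTS {}".format(table_name)

# Multiline query; .format fills the placeholders in order.
create_query = """
CREATE TABLE {} (
    {} INTEGER NOT NULL PRIMARY KEY,
    {} TEXT
)
""".format(table_name, id_col, name_col)

demo_db.execute(drop_query)
demo_db.execute(create_query)

# Each row is (cid, name, type, notnull, dflt_value, pk).
table_info = demo_db.execute(
    "PRAGMA table_info({})".format(table_name)
).fetchall()
print(table_info)
```

In the returned rows, the final value is the primary key flag, which is the value the text above asks you to check.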


In [4]:
# Creating movie table
# Keep these column names. Do not change them!
movies_table_column_names  = ['movie_id', 'movie_name', 'langauge', 'rating']
col1, col2, col3, col4  = movies_table_column_names

# There are two queries to fill. First one to drop the table. This is approx 1 line. 
# The second query is the one to create a table called movie_table which is approx 5 lines 
# of code. Remember to use .format to substitute {} with string. 
drop_table_query = "####"
movies_table_query = "####"
# This part checks if the table creation and dropping executes without error. 
# If an exception is raised it is printed. 
try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(movies_table_query)
except Exception as e:
    print(e)

# This part describes each column of the table. Look up the sql command table info 
# for sqlite. You will need to use fetchall to show the output of query
info_var = "####"
movie_table_describe = "####"
print(movie_table_describe)

[(0, 'movie_id', 'INTEGER', 1, None, 1), (1, 'movie_name', 'TEXT', 0, None, 0), (2, 'langauge', 'TEXT', 0, None, 0), (3, 'rating', 'INTEGER', 0, None, 0)]




Next we are going to write three functions: ```get_movietable```, ```get_anytable``` and ```check_state```. The first queries the database for the table ```movie_table```, the second does the same for any table name, and the third checks whether a table is full or empty.
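The idea behind the emptiness check can be sketched on a throwaway in-memory table. The helper names ```fetch_rows``` and ```table_state``` and the table ```demo_table``` are hypothetical stand-ins, not the functions you are asked to complete.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute(
    "CREATE TABLE demo_table (item_id INTEGER PRIMARY KEY, item_name TEXT)"
)

def fetch_rows(db, table_name):
    # Table names cannot be bound as ? parameters, so the name is
    # formatted into the query; only do this with trusted names.
    cur = db.execute("SELECT * FROM {}".format(table_name))
    return cur.fetchall()

def table_state(db, table_name):
    # fetchall() returns a list, and an empty list is falsy.
    return "Full" if fetch_rows(db, table_name) else "Empty"

state_before = table_state(demo_db, "demo_table")
demo_db.execute("INSERT INTO demo_table VALUES (1, 'first')")
state_after = table_state(demo_db, "demo_table")
print(state_before, state_after)
```

Note that the check relies on the fetched rows, not on the cursor object itself, since a cursor is truthy even when the query returns nothing.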

In [5]:
# Write the function get_movietable. You will need to write a query to select the movie_table from the database. Execute the query and return the cursor.  
def get_movietable(db):
    query = "####"
    cur = "####"
    return "####"


# Write the function get_anytable. This function is very similar to get_movietable, but it takes the table name as an argument and returns the fetched rows.

def get_anytable(db, table_name): 
    query = "####"
    cur = "####"
    return cur.fetchall()

# Write the check state function. This takes two inputs, the database and table name. We define the default state as empty then we can use the 
# get_anytable function to get the table. Then using an if statement we can assign the state as full and return state.
def check_state(db, table_name): 
    state = "Empty"
    table = "####"
    if table: 
        state = "Full" 
    return state

# Get the outputs from the functions above. 
movie_table  = get_movietable(moviesdb)
state_movie_table = check_state(moviesdb, "movie_table")
print("State of movie table is {}".format(state_movie_table))

State of movie table is Empty



In [6]:
# You will need to read the movies data from movies_data_link. Use pandas read csv for this purpose. 
movies_data = "####"

# Set the column names of movies_data to the column names we have given above
movies_data.columns = movies_table_column_names

# We then have to convert the movies_data dataframe to a sql table. Remember you are appending the data to movie_table
# Check the arguments of the converting function to find out how to append. When you append, ensure that you set index=False
# Also make sure you specify con=moviesdb 
movies_data."####"

# Use check state to check the state of the movie_table table and print the result
state_movie_table = "####"
print("State of movie table is {}".format(state_movie_table))

State of movie table is Full
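As a rough illustration of the append step, the sketch below writes a small hand-made dataframe into an existing table with ```DataFrame.to_sql```. The table ```demo_table``` and its contents are invented for the example; the exercise data comes from the CSV links instead.

```python
import sqlite3
import pandas as pd

demo_db = sqlite3.connect(":memory:")
demo_db.execute(
    "CREATE TABLE demo_table (item_id INTEGER NOT NULL PRIMARY KEY, item_name TEXT)"
)

# Hypothetical data; column names must match the table's columns.
demo_df = pd.DataFrame({"item_id": [1, 2], "item_name": ["first", "second"]})

# if_exists="append" inserts into the existing table and keeps its schema;
# index=False stops pandas from writing the dataframe index as a column.
demo_df.to_sql("demo_table", con=demo_db, if_exists="append", index=False)

rows = demo_db.execute("SELECT * FROM demo_table").fetchall()
print(rows)
```

Appending rather than replacing matters here because replacing would recreate the table from the dataframe and discard the primary key definition.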



## Creating and adding data to Hall Table

In [7]:
# Same set of instructions as we did for movie table
# Keep these column names. Do not change them!
hall_table_column_names  = ['hall_id', 'hall_name', 'seating_capacity']
col1, col2, col3  = hall_table_column_names
hall_table_name = "hall_table"

# Same game as the movie table query. First write a query to drop the 
# hall table then write a query to create a new table. Remember to use .format()
# to fill the multiline query with the column names. 
drop_table_query = "####"

# This query is approx 5 lines
hall_table_query = "####"

# Similar to above, this part checks if the queries run without any issues
try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(hall_table_query)
except Exception as e:
    print(e)

In [8]:
# Here use the get any table function with the hall table name to get the hall table
# Then check the state of the hall table
hall_table  = "####"
state_hall_table = "####"
print("State of hall table is {}".format(state_hall_table))

State of hall table is Empty




In [9]:
# Here we will write a function to convert the hall table data from a pandas dataframe to a sql table 
# and add it to the movies database. We did this earlier with the movies table but we are converting
# the code above to a function, this is because we will be reusing the code again for tickets table
def df_to_table(csv_path, column_names, table_name, db):
    
    # Read the csv file given a csv path 
    df = "####"
    df.columns = "####"
    
    # Similar to what we did with movie table. Convert df to a sql table remember to use the append
    # functionality and make sure that con=db and index=False. Otherwise tables will not append
    "####" 
    
    # Check the state of the table in the db and return both the dataframe and the state.
    state = "####"
    return df, state 


# Use df to table to get the state hall table and hall_df 
hall_df , state_hall_table = "####"

print("State of hall table is {}".format(state_hall_table))

State of hall table is Full




## Creating and adding data to Ticket Table

This part is a bit tricky, since you need to reference the movie table and hall table columns as foreign keys when you create the ticket table. The rest of the code is similar to what we had above. You can look at an example of creating foreign keys [here](https://www.sqlitetutorial.net/sqlite-tutorial/sqlite-describe-table/).
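A minimal sketch of a table with two foreign keys is shown below, assuming hypothetical tables ```parent_a```, ```parent_b``` and ```child``` on an in-memory database. It also shows what enforcement looks like: an insert that points at a missing parent row is rejected.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent tables, each with its own primary key.
demo_db.execute("CREATE TABLE parent_a (a_id INTEGER NOT NULL PRIMARY KEY)")
demo_db.execute("CREATE TABLE parent_b (b_id INTEGER NOT NULL PRIMARY KEY)")

# The child table has no primary key, only two foreign keys.
demo_db.execute("""
CREATE TABLE child (
    a_id INTEGER,
    b_id INTEGER,
    amount INTEGER,
    FOREIGN KEY (a_id) REFERENCES parent_a (a_id),
    FOREIGN KEY (b_id) REFERENCES parent_b (b_id)
)
""")

demo_db.execute("INSERT INTO parent_a VALUES (1)")
demo_db.execute("INSERT INTO parent_b VALUES (10)")
demo_db.execute("INSERT INTO child VALUES (1, 10, 5)")  # both parents exist

try:
    # There is no parent_a row with a_id = 2, so this should be rejected.
    demo_db.execute("INSERT INTO child VALUES (2, 10, 5)")
    rejected = False
except sqlite3.IntegrityError as e:
    rejected = True
    print(e)

child_count = demo_db.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

This is also why the parent tables need to be created and filled before the child table is loaded.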


In [10]:
# Create a list of column names for the ticket table. We will use this later
ticket_table_column_names  = ['movie_id', 'hall_id', 'tickets_sold']
col1, col2, col3  = ticket_table_column_names
ticket_table_name = "ticket_table"

# Drop the ticket table if it exists
drop_table_query = "####"

# The ticket table has no primary keys but it has two foreign keys attached to it. One is the movie_id that 
# comes from movie table and hall_id which comes from hall table. This query is approx 8 lines
ticket_table_query = "####"
try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(ticket_table_query)
except Exception as e:
    print(e)

In [11]:
# Use the df to table function to get the ticket_df and state_ticket_table. Your inputs will be the ticket sales data,
# the columns names for ticket table column, the ticket table name and the db. 
ticket_df , state_ticket_table = "####"

## Get rows with large seating capacity 
At this stage all the data tables ```movie_table```, ```hall_table``` and ```ticket_table``` should be in the database. 
In this part, you will query the database to extract the right data. The queries you write will span multiple lines. The first query gets the halls with a large number of seats: we set the threshold at the top 20% of halls by seating capacity, so you need to filter the data accordingly. The query should select and display all of the following columns: 

- Movie name
- Hall name  
- Seating Capacity
- tickets sold

Remember you will need to use inner joins to combine the three tables on the movie id and hall id columns. Finally, you will also need to order by the seating capacity and convert the result to a dataframe.
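The join-and-filter pattern can be sketched on toy data. Everything here is an assumption made for illustration: the ```*_demo``` tables, their rows, and in particular the reading of "top 20%" as "capacity at or above the value found 80% of the way through the sorted capacities", computed in Python and passed in as a query parameter. The intended solution may define the threshold differently.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.executescript("""
CREATE TABLE movie_demo (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
CREATE TABLE hall_demo (hall_id INTEGER PRIMARY KEY, hall_name TEXT,
                        seating_capacity INTEGER);
CREATE TABLE ticket_demo (movie_id INTEGER, hall_id INTEGER, tickets_sold INTEGER);
INSERT INTO movie_demo VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO hall_demo VALUES (1, 'A', 100), (2, 'B', 120), (3, 'C', 150),
                             (4, 'D', 150), (5, 'E', 90);
INSERT INTO ticket_demo VALUES (1, 3, 100), (2, 4, 140), (1, 1, 50);
""")

# Assumed definition of the cutoff: the capacity sitting at the
# 80% position of the sorted list of hall capacities.
capacities = sorted(
    row[0] for row in demo_db.execute("SELECT seating_capacity FROM hall_demo")
)
threshold = capacities[(len(capacities) * 4) // 5]

query = """
SELECT m.movie_name, h.hall_name, h.seating_capacity, t.tickets_sold
FROM ticket_demo AS t
INNER JOIN movie_demo AS m ON t.movie_id = m.movie_id
INNER JOIN hall_demo AS h ON t.hall_id = h.hall_id
WHERE h.seating_capacity >= ?
ORDER BY h.seating_capacity DESC, m.movie_name
"""
rows = demo_db.execute(query, (threshold,)).fetchall()
print(rows)
```

The list of tuples returned by ```fetchall``` can be passed directly to ```pd.DataFrame``` to obtain the dataframe form.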

In [13]:

info_var = "####"
filtered_result = "####"
filtered_df = "####"
filtered_df.head()

Unnamed: 0,0,1,2,3
0,The Unimaginable,Wurchester Cinema,150,100
1,Ramen loving Ronin,Wang's Town Cinema,150,106
2,The Unimaginable,Princessville Cinema,150,136
3,Kobali,Quagmire Movie Hall,150,114
4,Senjiruven,VMC Browns,150,73


Your resulting dataframe should look like this: <br>
array([['The Unimaginable', 'Wurchester Cinema', 150, 100], <br>
       ['Ramen loving Ronin', "Wang's Town Cinema", 150, 106], <br>
       ['The Unimaginable', 'Princessville Cinema', 150, 136], <br>
       ['Kobali', 'Quagmire Movie Hall', 150, 114], <br>
       ['Senjiruven', 'VMC Browns', 150, 73]], dtype=object) <br>

[('The Unimaginable', 'Princessville Cinema', 150), <br>
 ('Kobali', 'Princessville Cinema', 150), <br>
 ('Lakewalker', 'VMC Browns', 150), <br>
 ('Lakewalker', 'Ardour Movie Hall', 150), <br>
 ('Senjiruven', 'VMC Frocksburry', 150), <br>
 ('Lakewalker', 'VMC Quasiland', 150), <br>
 ('La Belle', 'Ardour Movie Hall', 150), <br>
 ('Senjiruven', 'Ardour Movie Hall', 150), <br>
 ('The Unimaginable', 'Showtime Rivermoore', 120)] <br>

## Get rows with Maximum occupancy
In this query you will pull data based on maximum occupancy. We define occupancy as the ratio of tickets sold to seating capacity, and we want this ratio to be higher than 0.80. You will have to modify the query above. To do this calculation you will need to cast the tickets sold and seating capacity columns as floats; otherwise the division is carried out on integers.
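The effect of the cast can be seen in a small sketch on toy data. The ```*_demo``` tables and their rows are hypothetical, and the query is one possible shape for the occupancy filter, not necessarily the expected answer.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.executescript("""
CREATE TABLE movie_demo (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
CREATE TABLE hall_demo (hall_id INTEGER PRIMARY KEY, hall_name TEXT,
                        seating_capacity INTEGER);
CREATE TABLE ticket_demo (movie_id INTEGER, hall_id INTEGER, tickets_sold INTEGER);
INSERT INTO movie_demo VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO hall_demo VALUES (1, 'A', 100), (2, 'B', 120), (3, 'C', 150);
INSERT INTO ticket_demo VALUES (1, 1, 50), (2, 2, 120), (2, 3, 140), (1, 3, 100);
""")

# Dividing two integers in SQLite truncates the result.
int_ratio = demo_db.execute("SELECT 140 / 150").fetchone()[0]

# Casting to REAL gives a fractional ratio. The expression is repeated
# in WHERE rather than relying on the column alias there.
query = """
SELECT m.movie_name, h.hall_name, t.tickets_sold, h.seating_capacity,
       CAST(t.tickets_sold AS REAL) / CAST(h.seating_capacity AS REAL) AS occupancy
FROM ticket_demo AS t
INNER JOIN movie_demo AS m ON t.movie_id = m.movie_id
INNER JOIN hall_demo AS h ON t.hall_id = h.hall_id
WHERE CAST(t.tickets_sold AS REAL) / CAST(h.seating_capacity AS REAL) > 0.80
ORDER BY occupancy DESC
"""
rows = demo_db.execute(query).fetchall()
print(int_ratio)
print(rows)
```

Without the cast, every ratio below 1 would truncate to 0 and the filter would keep only completely sold-out screenings.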


In [14]:

info_var = "####"
filtered_result = "####"
filtered_df = "####"
filtered_df.head()

Unnamed: 0,0,1,2,3,4
0,The Unimaginable,Showtime Rivermoore,120,120,1.0
1,Ramen loving Ronin,Showtime Rivermoore,119,120,0.991667
2,Senjiruven,VMC Frocksburry,148,150,0.986667
3,La Belle,Ardour Movie Hall,147,150,0.98
4,Kobali,Showtime Shwimm's Market,116,120,0.966667


array([['The Unimaginable', 'Showtime Rivermoore', 120, 120, 1.0], <br>
       ['Ramen loving Ronin', 'Showtime Rivermoore', 119, 120, <br>
        0.9916666666666667], <br>
       ['Senjiruven', 'VMC Frocksburry', 148, 150, 0.9866666666666667], <br>
       ['La Belle', 'Ardour Movie Hall', 147, 150, 0.98], <br>
       ['Kobali', "Showtime Shwimm's Market", 116, 120, <br>
        0.9666666666666667]], dtype=object) <br>
 

In [None]:
import requests
from time import gmtime, strftime
from IPython.display import clear_output

user_ns = tuple(get_ipython().user_ns.items())
user_id = ''
lab_name = ''
import os

def display_result(result, res_code):
    from IPython.display import display, HTML
    js = "<style>p.res {background-color: lightblue;} p.pas {color: green;text-align: center;font-size: 150%;} p.fail {color: red;text-align: center;font-size: 150%;}  p.wait {color: blue;text-align: center;font-size: 150%;} p.center {text-align: center;font-size: 200%;}</style>"
    if res_code == 'PASS':
        #js = js + "<script>$('#notebook-container').after('<p id=lastidp class=pas>%s</p>')</script>"   % result
        js = js + "<script>$('#lastidp').replaceWith('<p id=lastidp class=pas>%s</p>')</script>"   % result
    elif res_code == 'FAIL':
        js = js + "<script>$('#lastidp').replaceWith('<p id=lastidp class=fail>%s</p>')</script>"   % result
    else:
        js = js + """<script>
        if($('#lastidp').length){
        $('#lastidp').replaceWith('<p id=lastidp class=wait>%s</p>')
        }else{
        $('#notebook-container').after('<p id=lastidp class=wait>%s</p>')
        }
        </script>
        """   % (result,result)
    display(HTML(js))

display_result('','WAIT')
    
# if value_type == int:
for name, value in user_ns:
    if name == 'user_id':
        user_id = value
    if name == 'nb_name':
        lab_name = value
        lab_name = os.path.basename(lab_name).split('.')[0]

input_content_file = lab_name + '.ipynb'

import random
n = random.randint(1001,3001)
f = {'file': ( 'rnd' +  str(n) + '_' + input_content_file, open(input_content_file,'rb'))}
d = {'user': user_id}
r = requests.post('http://autograde.refactored.ai/uploader', data=d, files=f)
#r = requests.post('http://localhost:5000/uploader', data=d, files=f)
tm = strftime("%Y-%m-%d %H:%M:%S", gmtime())
msg = r.json()['result']['message'][0]
res = r.json()['result']['result']
answerid = r.json()['answerid']
result = tm + ' - ' + msg + '    Answerid is : ' + str(answerid) 
#score = r.json()['result']['score']
#result = tm + ' - ' + msg + '    Your result is : ' + res + '     And the score is : ' + str(score)
#print(result)
display_result(result,'WAIT')

import time
count = 0
while True:
    #r = requests.get('http://localhost:5000/get_result?answer_id=' + str(answerid))
    r = requests.get('http://autograde.refactored.ai/get_result?answer_id=' + str(answerid))
    res = r.json()['result']['result']
    if res != 'WAIT':
        tm = strftime("%Y-%m-%d %H:%M:%S", gmtime())
        msg = r.json()['result']['message'][0]
        res = r.json()['result']['result']
        score = r.json()['result']['score']
        result = tm + ' - ' + msg + ',    Your result is : ' + res + '    And your score is : ' + str(score) 
        clear_output(wait=True)
        break
    else:
        if(count ==0):
            print('Waiting for your score......')
        count = count + 1
        time.sleep(10)
        if(count > 10):
            result = 'Error occurred during evaluation. Please resubmit.'
            break

display_result(result,res)