## The Movies Database

The diagram below shows a movies database consisting of three tables: Movie, Hall and Ticket.

* Movie: This table has 4 columns - Movie_id (unique identifier for each movie, specific to language), Movie_name (name of the movie), Language (language of the movie), Rating (average rating given by viewers)
* Hall: This table has 3 columns - Hall_id (unique identifier for each movie hall), Hall_name (name of the hall), Seating_capacity (maximum number of ticketed seats available in the hall)
* Ticket: This table has 3 columns - Movie_id (unique identifier for each movie, specific to language), Hall_id (unique identifier for each movie hall), Tickets_sold (number of tickets sold for the given movie at the given hall)

<img src="../../../images/movies_db.PNG" style="width:65vw"> <br>

<b>Tasks:</b>
1. Create an empty database named 'moviesdb' using the connect method.
2. Create empty tables for the three entities, related as shown in the diagram above. Use foreign key constraints to enforce referential integrity.
3. Extract the data for the three tables from the CSV files whose links are provided in the code block. Load each file into a dataframe and then into its table.
4. Write a query to find which movies ran at over 80% of seating capacity and at which halls this was achieved. The query should return every combination of movie and hall with over 80% capacity.
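
The 80% condition in task 4 compares each ticket row against the capacity of its own hall. Here is a minimal sketch of that arithmetic on toy data, assuming hypothetical table names (`movie`, `hall`, `ticket`) and made-up rows that are not taken from the CSV files:

```python
import sqlite3

# In-memory database with hypothetical rows, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
    CREATE TABLE hall (hall_id INTEGER PRIMARY KEY, hall_name TEXT,
                       seating_capacity INTEGER);
    CREATE TABLE ticket (movie_id INTEGER, hall_id INTEGER, tickets_sold INTEGER);
    INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
    INSERT INTO hall VALUES (10, 'Hall X', 100), (20, 'Hall Y', 200);
    INSERT INTO ticket VALUES (1, 10, 85), (1, 20, 150), (2, 10, 79), (2, 20, 180);
""")

# Join every ticket row to its movie and hall, then keep the rows where
# tickets sold exceed 80% of that hall's seating capacity.
rows = con.execute("""
    SELECT m.movie_name, h.hall_name,
           ROUND(100.0 * t.tickets_sold / h.seating_capacity, 1) AS pct_full
    FROM ticket AS t
    JOIN movie AS m ON m.movie_id = t.movie_id
    JOIN hall AS h ON h.hall_id = t.hall_id
    WHERE t.tickets_sold > 0.8 * h.seating_capacity
    ORDER BY pct_full DESC
""").fetchall()
print(rows)  # [('Movie B', 'Hall Y', 90.0), ('Movie A', 'Hall X', 85.0)]
```

Only two of the four toy rows pass: 150 of 200 seats is 75% and 79 of 100 seats is 79%, so both fall below the threshold.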

In [1]:
movies_data_link = "../../../data/movie.csv"
halls_data_link = "../../../data/hall.csv"
tickets_data_link = "../../../data/ticket.csv"

In [2]:
import sqlite3
import pandas as pd 
import numpy as np 

db_name = "movies.db"


In [3]:
# Go to the command line and create the DB
# Set PRAGMA foreign_keys = ON; in sqlite


In [4]:
moviesdb = sqlite3.connect(db_name)
moviesdb.execute("PRAGMA foreign_keys = 1")
val = moviesdb.execute("PRAGMA foreign_keys")
print(val.fetchall())

[(1,)]
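
To see what the pragma changes in practice, the following sketch uses a throwaway in-memory database with hypothetical `parent` and `child` tables (not part of the movies schema). With enforcement on, an insert that points at a missing parent row should be rejected:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The setting applies per connection, so it is issued right after connecting.
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE child (parent_id INTEGER REFERENCES parent (id))")
con.execute("INSERT INTO parent VALUES (1)")
con.execute("INSERT INTO child VALUES (1)")  # accepted: parent 1 exists

try:
    con.execute("INSERT INTO child VALUES (99)")  # no parent row with id 99
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

child_rows = con.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

After the failed insert, `child` still holds only the one valid row.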


In [5]:
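# fetch contents of movie_table 

def get_movietable(db):
    """Return all rows of movie_table as a list of tuples."""
    query = """SELECT * FROM movie_table"""
    cur = db.execute(query)
    return cur.fetchall()

def get_anytable(db, table_name): 
    """Return all rows of the named table as a list of tuples."""
    # table_name is formatted into the SQL text, so pass only trusted names
    query = """SELECT * FROM {}""".format(table_name)
    cur = db.execute(query)
    return cur.fetchall()
    
def check_state(db, table_name): 
    """Return "Full" if the table has at least one row, else "Empty"."""
    state = "Empty"
    table = get_anytable(db, table_name)
    if table: 
        state = "Full" 
    return state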
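
SQLite cannot bind a table name as a query parameter, which is why the helpers above format it into the SQL string. One possible hardening, sketched here with hypothetical helper names (`table_exists`, `check_state_safe`), is to confirm the name against `sqlite_master` first and to test for a single row instead of fetching the whole table:

```python
import sqlite3

def table_exists(db, table_name):
    """Return True if table_name is a table in the connected database."""
    row = db.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None

def check_state_safe(db, table_name):
    """Hypothetical variant of check_state that validates the name first."""
    if not table_exists(db, table_name):
        raise ValueError("unknown table: {}".format(table_name))
    # The name matches an existing table; quote it as an identifier.
    quoted = '"{}"'.format(table_name.replace('"', '""'))
    has_row = db.execute(
        "SELECT EXISTS (SELECT 1 FROM {})".format(quoted)
    ).fetchone()[0]
    return "Full" if has_row else "Empty"

# Usage on a throwaway in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (x INTEGER)")
state_before = check_state_safe(con, "demo")
con.execute("INSERT INTO demo VALUES (1)")
state_after = check_state_safe(con, "demo")
print(state_before, state_after)  # Empty Full
```

As in the original helper, "Full" here only means the table is non-empty.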


## Creating and adding data to Movie Table

In [6]:
# Creating movie table
# Create a list of column names for the movies table. We will use this later
movies_table_column_names  = ['movie_id', 'movie_name', 'language', 'rating']
col1, col2, col3, col4  = movies_table_column_names
drop_table_query = 'DROP TABLE IF EXISTS movie_table'
movies_table_query = """CREATE TABLE movie_table ( 
                      {} INTEGER NOT NULL PRIMARY KEY,
                      {} TEXT,
                      {} TEXT, 
                      {} INTEGER)""".format(col1, col2, col3, col4)

try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(movies_table_query)
except Exception as e:
    print(e)

info_var = moviesdb.execute("PRAGMA table_info([movie_table]);")
info_var.fetchall()

[(0, 'movie_id', 'INTEGER', 1, None, 1),
 (1, 'movie_name', 'TEXT', 0, None, 0),
 (2, 'language', 'TEXT', 0, None, 0),
 (3, 'rating', 'INTEGER', 0, None, 0)]

In [7]:
movie_table  = get_movietable(moviesdb)
state_movie_table = check_state(moviesdb, "movie_table")
print("State of movie table is {}".format(state_movie_table))

State of movie table is Empty


In [8]:
# Adding data to movie table
movies_data = pd.read_csv(movies_data_link)
movies_data.columns = movies_table_column_names


movies_data.to_sql(name='movie_table', con=moviesdb, index=False, if_exists='append')

# Checking if the movie table has any data
state_movie_table = check_state(moviesdb, "movie_table")
print("State of movie table is {}".format(state_movie_table))



State of movie table is Full


## Creating and adding data to Hall Table

In [9]:
# Create a list of column names for the hall table. We will use this later
hall_table_column_names  = ['hall_id', 'hall_name', 'seating_capacity']
col1, col2, col3  = hall_table_column_names
hall_table_name = "hall_table"

# drop table if it exists
drop_table_query = 'DROP TABLE IF EXISTS {}'.format(hall_table_name)
hall_table_query = """CREATE TABLE {} (
                      {} INTEGER PRIMARY KEY,
                      {} TEXT,
                      {} INTEGER
                       )""".format(hall_table_name, col1, col2, col3 )


try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(hall_table_query)
except Exception as e:
    print(e)

In [10]:
hall_table  = get_anytable(moviesdb, "hall_table")
state_hall_table = check_state(moviesdb, "hall_table")
print("State of hall table is {}".format(state_hall_table))

State of hall table is Empty


In [12]:
def df_to_table(csv_path, column_names, table_name, db ):
    df = pd.read_csv(csv_path)
    df.columns = column_names
    df.to_sql(name=table_name, con=db, index=False, if_exists='append')    
    state = check_state(db, table_name)
    return df, state 



hall_df , state_hall_table = df_to_table(halls_data_link, hall_table_column_names, hall_table_name, moviesdb)

print("State of hall table is {}".format(state_hall_table))


State of hall table is Full


## Creating and adding data to Ticket Table

In [13]:
# Create a list of column names for the ticket table. We will use this later
ticket_table_column_names  = ['movie_id', 'hall_id', 'tickets_sold']
col1, col2, col3  = ticket_table_column_names
ticket_table_name = "ticket_table"

# drop table if it exists
drop_table_query = 'DROP TABLE IF EXISTS {}'.format(ticket_table_name)
ticket_table_query = """CREATE TABLE {} (
                      {} INTEGER,
                      {} INTEGER,
                      {} INTEGER,
                       FOREIGN KEY ({})
                       REFERENCES movie_table ({}), 
                       FOREIGN KEY ({})
                       REFERENCES hall_table ({}))""".format(ticket_table_name, col1, col2, col3, col1, col1, col2, col2  )

try:
    moviesdb.execute(drop_table_query)
    moviesdb.execute(ticket_table_query)
except Exception as e:
    print(e)

In [14]:

ticket_df , state_ticket_table = df_to_table(tickets_data_link, ticket_table_column_names, ticket_table_name, moviesdb )
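
After a bulk load it can be useful to confirm that every ticket row points at an existing movie and hall. SQLite's `PRAGMA foreign_key_check` lists rows whose foreign keys have no matching parent, which matters mostly when rows were loaded on a connection where enforcement was off. A small sketch on hypothetical toy tables (not the notebook's data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Enforcement is switched off explicitly so the orphan row below is accepted.
con.execute("PRAGMA foreign_keys = OFF")
con.executescript("""
    CREATE TABLE hall (hall_id INTEGER PRIMARY KEY);
    CREATE TABLE ticket (hall_id INTEGER REFERENCES hall (hall_id),
                         tickets_sold INTEGER);
    INSERT INTO hall VALUES (1);
    INSERT INTO ticket VALUES (1, 50);
    INSERT INTO ticket VALUES (2, 70);  -- hall 2 does not exist
""")

# Each result row names the child table, the offending rowid,
# the referenced parent table and the index of the violated constraint.
violations = con.execute("PRAGMA foreign_key_check").fetchall()
for table, rowid, parent, fk_index in violations:
    print("{} row {} has no match in {}".format(table, rowid, parent))
```

An empty result would mean that every foreign key in the database resolves to a parent row.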



In [28]:

# Join each ticket row to its movie and its hall, then keep the rows where
# tickets sold exceed 80% of that hall's seating capacity.
info_var = moviesdb.execute("""SELECT movie_table.movie_name, hall_table.hall_name, ticket_table.tickets_sold 
                               FROM ticket_table
                               JOIN movie_table ON movie_table.movie_id = ticket_table.movie_id
                               JOIN hall_table ON hall_table.hall_id = ticket_table.hall_id
                               WHERE ticket_table.tickets_sold > 0.80 * hall_table.seating_capacity
                               ORDER BY ticket_table.tickets_sold DESC""")
info_var.fetchall()