# Fandango Seating Analysis

Create a heat-map of a theater seating-chart showing the average demand for a seat given its position relative to the screen. The approach is a form of Gaussian kernel density estimation: the value at a position in the theater corresponds to the average fraction of seats still open in the theater at the moment a seat at that position is reserved.
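
As a sketch of what this means in practice, the demand surface computed later in `compute_rect_heat_map` can be written as a sum of Gaussian "blips", one per reserved seat. The symbols are my own shorthand: $(x_i, y_i)$ is the center of the $i$-th reserved seat, $f_i$ is the fraction of seats still open when it was reserved, $p$ is the `"p"` plotting parameter, and $w$ is `"blip_width"`.

$$D(x, y) = \sum_{i} f_i^{\,p} \, \exp\left(-\frac{(x - x_i)^2 + (y - y_i)^2}{w^2}\right)$$

Note that this is an unnormalized sum rather than a true average, so it only approximates the "average fraction" described above.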

**Seating Convention Notes:**

* Every seat is labeled with a letter (indicating row) and a 1-indexed number (indicating column).
    * Some theaters, however, indicate wheelchair seats with the character combination "WC" instead of the row letter to which that seat belongs.
* Row letters increase from the **front** to the **back** of the theater (from closest to farthest from the screen).
* Column numbers increase from **house right** to **house left** (i.e., right to left when the screen is in front of you).
* Not every row is guaranteed to have the same number of columns.
* Sometimes, a row may have missing column numbers due to walkways, handicap spaces, or unusual seating arrangements.
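
To make the labeling convention concrete, here is a minimal sketch of splitting seat labels into a row label and a column number. It reuses the regular expression that `process_seat_config_data` applies below. The helper name `parse_seat_label` and the sample labels are hypothetical, chosen only for illustration.

```python
import re

# Same pattern the notebook uses: a lazy run of word characters (the row
# label) followed by one or more digits (the column number).
SEAT_LABEL = re.compile(r"(\w+?)(\d+)")

def parse_seat_label(label):
    """Hypothetical helper: return (row_label, column_number) for a seat label."""
    match = SEAT_LABEL.search(label)
    if match is None:
        raise ValueError("unrecognized seat label: %r" % label)
    return match.group(1), int(match.group(2))

# Made-up labels, including a wheelchair seat marked "WC"
labels = ["A1", "A2", "WC3", "B12"]
parsed = [parse_seat_label(x) for x in labels]
print(parsed)
```

The `"WC"` label comes through as a row label of its own, which is why a separate step (`fix_WC` below) is needed to guess the real row letter.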

**Visualization Notes:**

* If a column number is missing for a given row in a seating chart, I will assume that the missing number corresponds to a physical gap in the seating. Thus, I will take the number of columns in a row to be the highest column number that the seating chart specifies for that row. All resulting gaps will be treated as seats with 0 demand.
* Since theaters come in many shapes and sizes, I will map all seating charts onto a standardized square grid. This mapping takes two steps:
  1. Map the seats onto a rectangular space such that if a theater has N rows and M columns, the rectangle can be populated with N * M square-packed circles of equal radius (where each circle represents 1 seat located approximately at the circle's center).
    * We can account for leg-room by multiplying the vertical distance by some constant leg-room factor.
  2. Then stretch the rectangle (and the circles that fill it) vertically or horizontally so that the seats fill the space of a square.
* I will assume that every seat in a theater is the same size.
* I will assume that each row is centered along the axis which divides the theater into a left and right half. Thus, if a row has fewer columns than the highest possible number of columns in that theater, the permanently unoccupied space at the left and right edges of that row will be treated as seats with 0 demand.
* Choosing a colormap: https://matplotlib.org/users/colormaps.html
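
The centering and leg-room assumptions above can be sketched as a small coordinate mapping. This mirrors the arithmetic in `seat_row_col_to_rect_xy` below, but for a single seat. The function name `seat_center` and the toy theater dimensions are assumptions made for illustration.

```python
def seat_center(row, col, cols_in_row, num_rows, leg_room=2):
    """Hypothetical helper: map a 1-indexed (row, col) label to an (x, y) center.

    The chart is centered on (0, 0). Horizontally adjacent seats are 1 unit
    apart; vertically adjacent seats are leg_room units apart.
    """
    # Column 1 is at house right, so x decreases as the column number grows.
    x = -(col - 1) + (cols_in_row - 1) / 2
    # Row 1 is closest to the screen, which is drawn at the top of the plot.
    y = (-(row - 1) + (num_rows - 1) / 2) * leg_room
    return x, y

# Toy theater: 3 rows, a front row with 4 seats and a back row with 2 seats
front_right = seat_center(row=1, col=1, cols_in_row=4, num_rows=3)
front_left = seat_center(row=1, col=4, cols_in_row=4, num_rows=3)
back_right = seat_center(row=3, col=1, cols_in_row=2, num_rows=3)
print(front_right, front_left, back_right)
```

Because each row is centered using its own column count, the shorter back row sits in the middle of the chart rather than flush with one edge.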

**Exporting the resulting image:**

* How to make tight layout work? https://matplotlib.org/users/tight_layout_guide.html

In [2]:
import os
import re
import json
from datetime import datetime
from threading import Thread

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

In [4]:
answer = None

def check():
    global answer  # store the result in the module-level variable
    answer = input("Input something: ")

t = Thread(target = check)
t.daemon = True
t.start()
t.join(1)
print(t.is_alive())

# answer = input("Input something: ")
# t.join()

True
Input something: 


### Load data from file

In [12]:
def get_file_path(movie_params):
    '''
    Specify data file path given a dictionary of movie parameters.
    
    ARGUMENT: movie_params -- dictionary of movie parameters
    RETURNS: file_path -- string of path to movie data file 
    '''
    
    data_dir = movie_params["data_dir"]
    movies_dir = movie_params["movies_dir"]
    movie_dir = movie_params["movie_dir"]
    theater_dir = movie_params["theater_dir"]
    file_name = movie_params["file_name"]
    
    file_path = "../" + "/".join([data_dir, movies_dir, movie_dir, theater_dir, file_name])
    
    return file_path
    
def load_data(file_path):
    '''
    Load seating configuration data and seating chart snapshots from data file of scraped data.
    
    ARGUMENT: file_path -- string of path to movie data file
    
    RETURNS:
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"]
    snapshots -- list of seating reservation data snapshots
    '''
    
    with open(file_path, "r") as fp:
        data = fp.readlines()

    data = [json.loads(x.strip().replace("'", '"')) for x in data]
    config = data[0]
    snapshots = data[1:]
    
    return config, snapshots
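
One caveat about the quote replacement in `load_data`: it breaks if any string value contains an apostrophe. As an alternative sketch, and assuming each line of the data file is a Python-literal dictionary (which the single-quote replacement suggests but which I have not verified), `ast.literal_eval` can parse the line directly. The sample line below is made up for illustration.

```python
import ast
import json

# Made-up line in the style of a Python dict repr, with an apostrophe in a value
line = "{'auditorium': 'Auditorium 1', 'description': \"Edwards' room\", 'seats': ['A1', 'A2']}"

# Quote replacement turns the apostrophe into a stray double quote, so
# json.loads rejects the result.
try:
    json.loads(line.strip().replace("'", '"'))
    json_failed = False
except json.JSONDecodeError:
    json_failed = True

# ast.literal_eval evaluates Python literals only (no arbitrary code).
config = ast.literal_eval(line.strip())
print(json_failed, config["seats"])
```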

### Process seat configuration data

In [1]:
def fix_WC(seat_let_num):
    '''
    Convert "WC" wheelchair seating labels into proper row letter labels. 
    
    ARGUMENT: 
    seat_let_num -- list of lists [LETTER, NUMBER] in reverse alphanumeric
        order specifying all the seat labels in the theater. Some theaters
        specify wheelchair seats with the character combination "WC" rather
        than the standard row letter, but the seat is still usually placed
        in the list in the order in which the seats would appear if they
        all had standard row letter labels. 
    
    RETURNS: 
    seat_let_num -- list of lists [LETTER, NUMBER] in reverse alphanumeric
        order with any "WC" row labels replaced with the best guess proper
        row letter label.
    
    '''
    
    for i in range(len(seat_let_num)):

        if seat_let_num[i][0].upper() == "WC":

            before = seat_let_num[i - 1]

            if before[0].upper() != "WC" and before[1] == 1:
                seat_let_num[i][0] = chr(ord(before[0]) + 1)

            elif before[0].upper() != "WC" and before[1] == seat_let_num[i][1] + 1:
                seat_let_num[i][0] = before[0]
                
    return seat_let_num

def process_seat_config_data(config):
    '''
    Get seat configuration data from the scraped configuration dictionary.
    
    ARGUMENT:
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"]
    
    RETURNS:
    seat_config_data -- dictionary with the following keys:
        "row_col" -- list of lists [ROW_NUMBER, COL_NUMBER] with 1-indexed labels
        "num_seats" -- total number of seats in the auditorium
        "num_rows" -- total number of rows in the auditorium
        "num_cols" -- list of ints specifying the number of columns in each row  
    '''
    
    seat_labels = config["seats"]
    matches = [re.search(r"(\w+?)(\d+)", x) for x in seat_labels]
    seat_let_num = [[x.group(1), int(x.group(2))] for x in matches]
    seat_let_num = fix_WC(seat_let_num)
    letters = sorted(list(set([x[0] for x in seat_let_num])))
    letter_to_row = dict(zip(letters, [i + 1 for i in range(len(letters))]))

    # Re-cast seating labels as row-col numeric pairs (rather than letter-number pairs)
    seat_row_col = [[letter_to_row[x[0]], x[1]] for x in seat_let_num]

    num_seats = len(seat_row_col)
    num_rows = len(letters)
    num_cols = [max([x[1] for x in seat_row_col if x[0] == i + 1]) for i in range(num_rows)]
    
    seat_config_data = {
        "row_col": seat_row_col,
        "num_seats": num_seats,
        "num_rows": num_rows,
        "num_cols": num_cols
    }
    
    return seat_config_data

### Process seat snapshot data

In [14]:
def process_snapshots(snapshots, seat_config_data):
    '''
    Get a list of data on seats that were reserved.
    
    ARGUMENTS:
    snapshots -- list of seating reservation data snapshots
    seat_config_data -- dictionary with keys ["row_col", "num_seats", "num_rows", "num_cols"]
    
    RETURNS:
    seat_reservations -- list of lists [ROW_NUMBER, COL_NUMBER, FRAC_REMAINING]:
        ROW_NUMBER -- numeric row label of reserved seat
        COL_NUMBER -- numeric column label of reserved seat
        FRAC_REMAINING -- fraction of remaining unreserved seats in the auditorium when this 
            seat was reserved
    '''
    
    num_seats = seat_config_data["num_seats"]
    seat_row_col = seat_config_data["row_col"]
    
    reserved = [snap[1]["R"] for snap in snapshots]

    seat_sets = []
    seat_set = set()

    for r in reserved:

        new_seat_set = set(r)

        added_seats = new_seat_set - seat_set

        if added_seats:
            seat_set = seat_set.union(added_seats)
            frac_remaining = 1 - (len(seat_set) / num_seats)
            seat_sets.append([frac_remaining, added_seats])
    
    seat_reservations = np.array([[*seat_row_col[s], x[0]] for x in seat_sets for s in x[1]])
    
    return seat_reservations

def seat_row_col_to_rect_xy(seats, seat_config_data, plotting_params):
    '''
    Convert from row-column labels into xy seating positions.
    
    ARGUMENTS:
    seats -- list of lists [ROW_NUMBER, COL_NUMBER, FRAC_REMAINING]
    seat_config_data -- dictionary with keys ["row_col", "num_seats", "num_rows", "num_cols"]
    plotting_params -- dictionary containing "leg_room" and other keys
    
    RETURNS:
    pos -- np.array of seat positions converted from [ROW_NUMBER, COL_NUMBER, 
        FRAC_REMAINING] to approximate [X, Y, FRAC_REMAINING] values, assuming 
        that the seating chart is centered on the xy-point [0, 0], that 
        x-distance between horizontally adjacent seats is 1 and that y-distance 
        between vertically adjacent seats is LEG_ROOM.
    '''
    
    # seats = np.array([row, col, demand]), with 1-indexed row and col values
    
    num_rows = seat_config_data["num_rows"]
    num_cols = np.array(seat_config_data["num_cols"])
    leg_room = plotting_params["leg_room"]
    
    if len(seats.tolist()) == 0:
        return None
    
    row = seats[:, 0]
    col = seats[:, 1]
    
    x = -(col - 1) + (num_cols[row.astype("int") - 1] - 1)/2
    y = (-(row - 1) + (num_rows - 1)/2) * leg_room
    
    pos = np.empty_like(seats)
    pos[:] = seats
    pos[:, 0] = x
    pos[:, 1] = y
    
    return pos

### Compute rectangular heat-map

In [2]:
def compute_rect_heat_map(seat_positions, seat_config_data, plotting_params):
    '''
    Compute the demand values for every position in an auditorium, given the 
    seating reservation data.
    
    ARGUMENTS: 
    seat_positions -- np.array [[x, y, frac_remaining], ...]
    seat_config_data -- dictionary with keys ["row_col", "num_seats", "num_rows", "num_cols"]
    plotting_params -- dictionary with the following keys:
        "blip_width" -- size of Gaussian blip relative to x-distance between adjacent seat centers
        "seat_width" -- size of seat bounding box relative to x-distance between adjacent seat centers
        "leg_room"   -- ratio of y-dist./x-dist. between two vert./horiz. adjacent seat centers
        "resolution" -- number of points to be computed across one dimension of the heat-map domain
        "p"          -- power (exponent) for non-linearly weighting seat demand values
        "N"          -- number of contour levels to be plotted
    
    RETURNS:
    heat_map_data -- dictionary with the following keys:
        "X" -- meshgrid np.array of x domain to plot
        "Y" -- meshgrid np.array of y domain to plot
        "D" -- np.array of Gaussian kernel density demand values to plot over the xy domain
        "x_min" -- lowest value in x domain
        "x_max" -- highest value in x domain
        "y_min" -- lowest value in y domain
        "y_max" -- highest value in y domain
    '''
    
    num_rows = seat_config_data["num_rows"]
    num_cols = seat_config_data["num_cols"]
    blip_width = plotting_params["blip_width"]
    leg_room = plotting_params["leg_room"]
    resolution = plotting_params["resolution"]
    p = plotting_params["p"]
    
    seat_x_max = max(num_cols)/2
    seat_y_max = (num_rows/2) * leg_room

    x_min = -(seat_x_max + leg_room)
    x_max = -x_min
    y_min = -(seat_y_max + leg_room) 
    y_max = seat_y_max + 2 * leg_room

    x = np.linspace(x_min, x_max, resolution)
    y = np.linspace(y_min, y_max, resolution)
    
    if type(seat_positions) != np.ndarray:
        
        X, Y = np.meshgrid(x, y)
        D = 0 * X
        
    else:
        
        s = np.repeat(1, len(seat_positions))
        X, Y, _ = np.meshgrid(x, y, s)
    
        X_0 = seat_positions[:, 0]
        Y_0 = seat_positions[:, 1]
        d = seat_positions[:, 2]

        D_individual = (d ** p) * np.exp(-((X - X_0)**2 + (Y - Y_0)**2) / (blip_width ** 2))
        D = np.sum(D_individual, axis = 2)
        
        X = X[:, :, 0]
        Y = Y[:, :, 0]

    heat_map_data = {
        "X": X,
        "Y": Y,
        "D": D, 
        "x_min": x_min,
        "x_max": x_max,
        "y_min": y_min,
        "y_max": y_max
    }
    
    return heat_map_data

### Plot rectangular heat-map

In [5]:
def get_plot_title(movie_data_fp, config):
    '''
    Specify plot title given movie information.
    
    ARGUMENTS:
    movie_data_fp -- path to movie data file
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"]
    
    RETURNS:
    title -- string of title for plot, containing the movie title, movie venue, 
        auditorium number at the movie venue, and date and time of the movie showing
    '''
    
    path_dirs = movie_data_fp.split(".")[-2].split("/")

    movie_title = path_dirs[-3]
    movie_venue = path_dirs[-2]
    movie_time = "_".join(path_dirs[-1].split("_")[:2])
    auditorium = config["auditorium"]

    parsed_time = datetime.strptime(movie_time, "%Y-%m-%d_%H%M")
    formatted_time = datetime.strftime(parsed_time, "%Y/%m/%d - %I:%M %p (%A)")

    title = "%s\n%s\n%s | %s" % (movie_title, movie_venue[8:], auditorium, formatted_time)
    
    return title

def get_image_path(movie_data_fp, config):
    '''
    Specify image path given movie data filepath, where images are grouped by theater
    directory, then auditorium directory.
    
    ARGUMENTS:
    movie_data_fp -- path to movie data file
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"]
    
    RETURNS:
    image_path -- path to image file
    image_dir -- path to directory where image file is stored
    '''
    
    path_dirs = movie_data_fp.split(".")[-2].split("/")

    movie_title = path_dirs[-3]
    movie_venue = path_dirs[-2]
    movie_time = "_".join(path_dirs[-1].split("_")[:2])
    auditorium = config["auditorium"]
        
    image_dir = "../images/%s/%s" % (movie_venue, auditorium)
    image_name = "%s_%s.png" % (movie_title, movie_time)
    image_path = "/".join([image_dir, image_name])
    
    return (image_path, image_dir)

def save_fig_to_file(movie_data_fp, config):
    '''
    Export matplotlib figure as PNG image. Make appropriate directories as necessary.
    
    ARGUMENTS:
    movie_data_fp -- path to movie data file
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"]
    
    RETURNS:
    image_path -- path to image file
    '''
    
    (image_path, image_dir) = get_image_path(movie_data_fp, config)
    
    if not os.path.exists(image_dir):
        os.makedirs(image_dir)
    
    if os.path.exists(image_path):
        
        print(" -- ALREADY EXISTS: %s" % image_path)
        
    else:
        
        plt.savefig(image_path, 
                    dpi = 200, 
                    bbox_inches = "tight",
                    pad_inches = 0.5,
                    transparent = False)
    
        plt.gcf().clear()
        
        print(" |-- IMAGE EXPORTED: %s" % image_path)
    
    return image_path

def plot_rect_heat_map(heat_map_data,
                       plotting_params,
                       seat_config_pos=None,
                       plot_seating_chart=True,
                       movie_data_fp=None,
                       config=None,
                       save_fig=False,
                       display_fig=True):
    '''
    Plot rectangular heat map in a matplotlib figure. Optionally, plot the heat 
    map over boxes indicating seat locations, and/or save figure as a PNG image.
    
    ARGUMENTS:
    heat_map_data -- dictionary with keys ["X", "Y", "D", "x_min", "x_max", "y_min", "y_max"]
    plotting_params -- dictionary with keys ["blip_width", "seat_width", "leg_room", "resolution", "p", "N"]
    seat_config_pos -- np.array [[X, Y], ...] of all seat centers in auditorium (default: None)
    plot_seating_chart -- boolean, do or don't plot boxes to designate seat locations (default: True)
    movie_data_fp -- path to movie data file (default: None)
    config -- dictionary with keys ["auditorium", "description", "seats", "seat_types"] (default: None)
    save_fig -- boolean, do or don't save figure to file (default: False)
    display_fig -- boolean, do or don't display figure in notebook (default: True)
    
    RETURNS: None
    '''
    
    plt.close("all")
    
    alpha = 1
    
    x_min = heat_map_data["x_min"]
    x_max = heat_map_data["x_max"]
    y_min = heat_map_data["y_min"]
    y_max = heat_map_data["y_max"]
    
    fig = plt.figure(figsize = (15, 15))
    ax = fig.add_subplot(1, 1, 1)
    ax.set_xlim(-x_max, x_max)
    ax.set_ylim(-y_max, y_max)
    ax.axis("equal")
    ax.axis("off")
    
    title = get_plot_title(movie_data_fp, config) if movie_data_fp else "Movie Theater Heat-Map"
    
    font = {'family': 'Century Gothic', 'weight': 'normal', 'size' : 18}
    plt.rc('font', **font)
    ax.set_title(title, verticalalignment = "bottom", y = 1)
    
    X = heat_map_data["X"]
    Y = heat_map_data["Y"]
    D = heat_map_data["D"]
    N = plotting_params["N"]
    p = plotting_params["p"]
    leg_room = plotting_params["leg_room"]
    
    # Plot heat-map
    ax.contourf(X, Y, D, N, 
                cmap = "RdYlBu_r", vmin = 0, vmax = 1.8 - 0.05 * p)
    
    # Plot movie screen
    screen_color = "gray"
    screen_xy_anchor = (x_min + leg_room/2, y_max - leg_room)
    screen_width = 2 * x_max - leg_room
    screen_height = 1.25
    ax.add_patch(patches.Rectangle(screen_xy_anchor, 
                                   width = screen_width,
                                   height = screen_height,
                                   fc = screen_color))
    
    # Plot seating chart
    if plot_seating_chart and type(seat_config_pos) == np.ndarray:
        
        seat_params = {
            "standard": {
                "ec": "black",
                "hatch": None
            },
            "wheelchair": {
                "ec": "#ADD8E6",
                "hatch": "//"
            },
            "companion": {
                "ec": "#ADD8E6",
                "hatch": "..."
            },
            "unavailableSeat": {
                "ec": "black",
                "hatch": "xxx"
            }
        }
        
        seat_width = plotting_params["seat_width"]
        seat_patches_xy = seat_config_pos - np.array([seat_width, seat_width])/2
        seat_types = config["seat_types"]
        
        # Use a distinct loop variable so the exponent `p` above is not shadowed
        for i, xy in enumerate(seat_patches_xy):
            
            ax.add_patch(patches.Rectangle(tuple(xy), 
                                           width=seat_width, 
                                           height=seat_width, 
                                           ec=seat_params[seat_types[i]]["ec"],
                                           hatch=seat_params[seat_types[i]]["hatch"],
                                           alpha=alpha, 
                                           fill=None))
    
    # Save heat-map plot to file, grouped by auditorium
    if save_fig and movie_data_fp and config:
        ax.set_xlim(-x_max, x_max)
        ax.set_ylim(-y_max, y_max)
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        image_path = save_fig_to_file(movie_data_fp, config)

    # Display heat-map
    if display_fig:
        plt.show()
        plt.gcf().clear()
    
    return

### Run pipeline for one file: movie data file to heat-map

In [17]:
def create_heat_map_from_movie_data(file_path, 
                                    plotting_params,
                                    plot_seating_chart=True,
                                    save_fig=False,
                                    display_fig=True,
                                    check_image_existence=True):
    '''
    Given a data file path, run the whole pipeline of functions to compute and plot
    a heatmap of the data for that movie showing. If save_fig = True, save that
    plot to disk as a PNG.
    
    ARGUMENTS:
    file_path -- path to movie data file
    plotting_params -- dictionary with keys ["blip_width", "seat_width", "leg_room", "resolution", "p", "N"]
    plot_seating_chart -- boolean, do or don't plot boxes to designate seat locations (default: True)
    save_fig -- boolean, do or don't save figure to file (default: False)
    display_fig -- boolean, do or don't display figure in notebook (default: True)
    check_image_existence -- boolean, overwrite if false, skip if true (default: True)
    
    RETURNS:
    heat_map_data -- dictionary with keys ["X", "Y", "D", "x_min", "x_max", "y_min", "y_max"]
    '''
    
    # Load data from file
    config, snapshots = load_data(file_path)
    
    # Check if output image already exists
    if check_image_existence:
        (image_path, image_dir) = get_image_path(file_path, config)
        if os.path.exists(image_path):
            print('ALREADY EXISTS: "%s"' % image_path)
            return None

    print('\nANALYZING: "%s"' % file_path)
        
    # Process seat configuration data
    seat_config_data = process_seat_config_data(config)

    # Process seat snapshot data
    seat_reservations = process_snapshots(snapshots, seat_config_data)
    seat_positions = seat_row_col_to_rect_xy(seat_reservations, seat_config_data, plotting_params)
    seat_config_pos = seat_row_col_to_rect_xy(np.array(seat_config_data["row_col"]), seat_config_data, plotting_params)

    # Compute rectangular heat-map
    heat_map_data = compute_rect_heat_map(seat_positions, seat_config_data, plotting_params)

    # Plot rectangular heat-map
    plot_rect_heat_map(heat_map_data, 
                       plotting_params,
                       seat_config_pos = seat_config_pos,
                       plot_seating_chart = plot_seating_chart,
                       movie_data_fp = file_path,
                       config = config,
                       save_fig = save_fig,
                       display_fig = display_fig)
        
    return heat_map_data

### Generate rectangular heat-map images for many movie data files

* Specify the parameters . . .
  * (**`start_date`**, **`end_date`**): make images only for movie data files with showtimes within this time range
  * **`movies_dirs`**: make images only for movies with these titles
  * **`plotting_params`**: specify the parameters that control the plot appearance
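
The date filter relies on the fact that ISO-formatted dates (`YYYY-MM-DD`) sort correctly as plain strings. A minimal sketch of that check, using a hypothetical helper `in_date_range` that is not part of the notebook:

```python
import os

def in_date_range(file_name, start_date, end_date):
    """Hypothetical helper: True if a data file's showtime date is in range.

    Assumes file names look like "YYYY-MM-DD_HHMM_<id>.txt", so the date is
    the part before the first underscore.
    """
    base, ext = os.path.splitext(file_name)
    if ext != ".txt":
        return False
    movie_date = base.split("_")[0]
    # ISO dates compare correctly as strings, so no datetime parsing is needed.
    return start_date <= movie_date <= end_date

same_day = in_date_range("2018-07-14_1200_-784805982.txt", "2018-07-14", "2018-07-14")
next_day = in_date_range("2018-07-15_1200_-784805982.txt", "2018-07-14", "2018-07-14")
not_data = in_date_range("notes.md", "2018-07-14", "2018-07-14")
print(same_day, next_day, not_data)
```

Using `os.path.splitext` here instead of `split(".")[1]` also avoids an `IndexError` on file names without an extension.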

In [18]:
start_date = "2018-07-14"
end_date = "2018-07-14"

movies_dirs = [
    "Ant-Man and the Wasp", 
    "Ant-Man and the Wasp 3D",
    "Ant-Man and the Wasp An IMAX 3D Experience"
]

plotting_params = {
    
    "blip_width": 1,       # size of Gaussian blip relative to x-distance between adjacent seat centers
    "seat_width": 0.8,     # size of seat bounding box relative to x-distance between adjacent seat centers
    "leg_room": 2,         # ratio of y-dist./x-dist. between two vert./horiz. adjacent seat centers
    "resolution": 200,     # number of points to be computed across one dimension of the heat-map domain
    "p": 2,                # power (exponent) for non-linearly weighting seat demand values
    "N": 1000              # number of contour levels to be plotted
    
}

movies_paths = ["../data/movies/%s" % m for m in movies_dirs]

tic = datetime.now()

for movie_path in movies_paths:
    for theater in os.listdir(movie_path):
        for file in os.listdir("%s/%s" % (movie_path, theater)):
            if file.split(".")[1] == "txt":
                movie_date = file.split("_")[0]
                if movie_date >= start_date and movie_date <= end_date:
                    file_path = "%s/%s/%s" % (movie_path, theater, file)
                
                    try:
                        heat_map_data = create_heat_map_from_movie_data(file_path,
                                                                        plotting_params,
                                                                        save_fig = True,
                                                                        display_fig = False)
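                    except Exception as err:
                        # Report the actual error; an empty reservation list is only one possible cause
                        print(" |\n |------ >>>>>>>>>>>>>>>>>> ERROR: %r (possibly no tickets bought). <<<<<<<<<<<<<<<<<<\n" % err)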

dt = (datetime.now() - tic).total_seconds()/60
print("\n" + 100 * "-")
print("\nExporting complete. (%.2f minutes)" % dt)


ANALYZING: "../data/movies/Ant-Man and the Wasp/AABFB - Edwards Big Newport 6 & RPX/2018-07-14_1200_-784805982.txt"
 |-- IMAGE EXPORTED: ../images/AABFB - Edwards Big Newport 6 & RPX/Auditorium 1/Ant-Man and the Wasp_2018-07-14_1200.png
ALREADY EXISTS: "../images/AABFB - Edwards Big Newport 6 & RPX/Auditorium 1/Ant-Man and the Wasp_2018-07-14_1515.png"
ALREADY EXISTS: "../images/AABFB - Edwards Big Newport 6 & RPX/Auditorium 1/Ant-Man and the Wasp_2018-07-14_1830.png"
ALREADY EXISTS: "../images/AABTB - Edwards Irvine Spectrum 21 IMAX & RPX/Auditorium 4 (21+ On/Ant-Man and the Wasp_2018-07-14_1030.png"
ALREADY EXISTS: "../images/AABTB - Edwards Irvine Spectrum 21 IMAX & RPX/Auditorium 6 (21+ On/Ant-Man and the Wasp_2018-07-14_1230.png"
ALREADY EXISTS: "../images/AABTB - Edwards Irvine Spectrum 21 IMAX & RPX/Auditorium 4 (21+ On/Ant-Man and the Wasp_2018-07-14_1330.png"
ALREADY EXISTS: "../images/AABTB - Edwards Irvine Spectrum 21 IMAX & RPX/Auditorium 6 (21+ On/Ant-Man and the Wasp_201