#### Usage

Use this notebook to select a movie from the list and display the 10 movies with the most similar plotlines across all emotions. The distance to each movie, in arbitrary units, is given in parentheses: smaller values mean more similar plotlines, and 0 means the plotlines match.

> To run this notebook, you must first have successfully executed, in order: i) scraping_script.py, ii) emotion_script.py, and iii) dtw_script.py.
More specifically, you will need:
* the emotion counts stored in the .npy files in *data/emotions/arrays*,
* the movie information in *data/scraping/successful_files.csv*, and
* the pickled dictionary of distances in *data/distances.pkl*.
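
Before running the cells below, it can help to confirm that these inputs exist. The following is a minimal sketch, not part of the original scripts: `find_missing` is a hypothetical helper, and the `../data/...` prefixes assume the notebook is launched from the folder that the later cells use for their relative paths.

```python
import os

# Assumption: paths are relative to the notebook's folder, matching the
# '../data/...' paths used in the cells below.
REQUIRED_PATHS = [
    '../data/emotions/arrays',
    '../data/scraping/successful_files.csv',
    '../data/distances.pkl',
]


def find_missing(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]


missing = find_missing(REQUIRED_PATHS)
if missing:
    print('Missing inputs, run the scripts first: ' + ', '.join(missing))
else:
    print('All inputs found.')
```

If anything is reported as missing, rerun the corresponding script before continuing.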

### Import modules

In [1]:
import operator
import os
import sys
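try:
    import cPickle as pickle  # Python 2: faster C implementation
except ImportError:
    import pickle  # Python 3: cPickle was merged into pickle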

import ipywidgets as widgets  # replaces the older IPython.html.widgets
from ipywidgets import fixed

module_path = os.path.abspath(os.path.join('../code'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from plotline_utilities import make_title_dictionary

### Preparing data

In [15]:
# get the dictionaries that map filenames to display titles and back:
filename_to_title, title_to_filename = make_title_dictionary()

# widget for the selection of files:
path_to_file = '../data/emotions/arrays'
files = os.listdir(path_to_file)
# keep only the .npy files and strip the 4-character '.npy' extension
legit_files = [filename[:-4] for filename in files if filename.endswith('.npy')]
legit_titles = [filename_to_title[filename] for filename in legit_files]

select_widget = widgets.Select()
select_widget.options = legit_titles

# load the pickled dictionary of distances (the file is closed after reading)
with open('../data/distances.pkl', 'rb') as distance_file:
    distance_dictionary = pickle.load(distance_file)

In [18]:
def f_interactive(title, distance_dictionary, filename_to_title, title_to_filename):
    '''
    Print the 10 movies closest to the selected one, with their distances.

    parameters:
    -----------
    title: STR, the display title of the movie that is studied
    distance_dictionary: DICT, obtained in pickled form after running the dtw_script
    filename_to_title and title_to_filename: DICTs, switch easily between a filename and its properly printed title
    '''
    filename = title_to_filename[title]
    # get the relevant entry:
    entry = distance_dictionary[filename]
    # sort by increasing distance and keep the top 10,
    # as a list of tuples (filename, distance)
    top_10 = sorted(entry.items(), key=operator.itemgetter(1))[:10]
    for other_filename, distance in top_10:
        # a single parenthesized argument prints identically in Python 2 and 3
        print(filename_to_title[other_filename] + '(' + str(round(distance, 1)) + ')')
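
The sorting step can be tried without the real data. The sketch below assumes the structure implied by the function above: an outer dictionary keyed by filename, whose values map every other filename to a distance. The filenames and distances are made up for illustration, and `closest` is a hypothetical helper.

```python
import operator

# Toy data: filenames and distances are invented for illustration only.
toy_distances = {
    'movie_a': {'movie_b': 0.72, 'movie_c': 1.31, 'movie_d': 0.94},
    'movie_b': {'movie_a': 0.72, 'movie_c': 1.05, 'movie_d': 1.48},
}


def closest(entry, n):
    """Return the n (filename, distance) pairs with the smallest distance."""
    return sorted(entry.items(), key=operator.itemgetter(1))[:n]


top_2 = closest(toy_distances['movie_a'], 2)
for other_filename, distance in top_2:
    print(other_filename + '(' + str(round(distance, 1)) + ')')
# prints:
# movie_b(0.7)
# movie_d(0.9)
```

Sorting the whole entry is fine for a few thousand movies; for much larger entries, `heapq.nsmallest` from the standard library would avoid the full sort.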

### Exploring data

In [19]:
widgets.interactive(f_interactive,
                    title=select_widget,
                    distance_dictionary=fixed(distance_dictionary),
                    filename_to_title=fixed(filename_to_title),
                    title_to_filename=fixed(title_to_filename))

Chasing Amy(0.7)
Life As A House(0.8)
Island, The(0.9)
Burlesque(0.9)
Big White, The(0.9)
Scarface(0.9)
Final Destination(1.0)
Enemy of the State(1.0)
Sunset Blvd.(1.0)
Confidence(1.0)
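
The numbers in parentheses are the distances stored in *data/distances.pkl*, which is produced by dtw_script.py. Assuming "dtw" stands for dynamic time warping, the sketch below shows the textbook form of that distance between two one-dimensional series. It is an illustration only: `dtw_distance` is a hypothetical function, and the actual script may use a different local cost, normalization, or way of combining the emotions.

```python
def dtw_distance(series_a, series_b):
    """Textbook dynamic time warping cost between two 1-D sequences."""
    n, m = len(series_a), len(series_b)
    inf = float('inf')
    # cost[i][j]: cheapest alignment of the first i points of series_a
    # with the first j points of series_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # reuse the point of series_b
                                    cost[i][j - 1],      # reuse the point of series_a
                                    cost[i - 1][j - 1])  # advance in both series
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 3]))     # identical series
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # same shape, stretched in time
print(dtw_distance([0, 0], [1, 1]))           # constant offset
```

The first two calls return 0.0, which matches the reading of 0 as "matching plotlines": the warping absorbs differences in pacing, so only differences in shape contribute to the distance.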
