# Film Board Ranks the Decades of Cinema: 2000s

Hosted by Rengar18 ([RYM List](https://rateyourmusic.com/list/Rengar18/film-board-ranks-the-decades-of-cinema-2000s/))

Notebook by YasashiiDia ([GitHub](https://github.com/YasashiiDia/ModifiedBorda))

This notebook represents the third iteration of an idea I had when I made my first RYM poll. However, there is still a lot of room for improvement and additional features. Eventually I would like to publish a generic and more user-friendly version of this as a template for anyone interested in running a Borda-type poll.

**Instructions:**

**You need to be signed into your Google account if you want to interact with the widgets in this notebook.** You also need to accept a scary-looking prompt warning you that this notebook was not authored by Google. The source code may be lousy, but it is completely contained within this notebook and hence can be readily reviewed and modified. Please let me know if you find any bugs!

Also, you don't have to worry about permanently "breaking" the code: the master version is stored in my Google Drive and cannot be modified by you. If you modify this notebook and save it, a copy will be stored in your own Google Drive. You can always restore the master version by visiting [this link](https://colab.research.google.com/drive/1hOq6fSF2a7t00FXl-KBUVlYifpz9ZkHp?usp=sharing).

To run the entire notebook: **Runtime -> Run all** (easiest/recommended way to get started)

To run an individual cell: Click on the run button in the top left corner of the cell

To hide code: Double-click the rendered area to the right of the code cell.

**Regarding metadata:**

The metadata (cast, crew, language, etc.) was automatically fetched from [TMDB.org](https://www.themoviedb.org/). However, since I only have the film title and the decade as search queries, a lot of the fetched data is still incorrect (due to movies that have similar titles). Fixing this would require tedious, manual cleanup. Not today.
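The decade filter mentioned above can be sketched as follows. This is a minimal illustration, assuming a search response shaped like a list of results that each carry an `id` and a `release_date` string; the helper name `pick_decade_match` and the sample data are hypothetical, and no network request is made.

```python
def pick_decade_match(results, first_year=2000, last_year=2009):
    """Return the id of the first search result released within the decade, else None."""
    for res in results:
        year = str(res.get("release_date") or "")[:4]
        if year.isdigit() and first_year <= int(year) <= last_year:
            return res["id"]
    return None

# Hypothetical search results for a single title shared by several films.
sample_results = [
    {"id": 101, "release_date": "1974-05-01"},  # wrong decade: skipped
    {"id": 102, "release_date": ""},            # missing date: skipped
    {"id": 103, "release_date": "2003-10-17"},  # first match in the decade
    {"id": 104, "release_date": "2006-02-03"},  # same title, same decade: never reached
]

match_id = pick_decade_match(sample_results)
print(match_id)
```

The last sample entry shows the failure mode described above: when two films with similar titles fall in the same decade, the first hit wins whether or not it is the intended film.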

Once RYM gets its own API, all of this will be much simpler.

# Load Data

In [1]:
#@markdown <- Run this cell first to load the data. **This is required for any other cells to work.**

import math
import numpy as np
import pandas as pd
import requests
import json
import seaborn as sns; sns.set()
pd.set_option('display.max_rows', None) # full option name; the short form 'max_rows' is ambiguous in newer pandas

query_tmdb = False
max_queries = 10000
top_items = 100

#########################################
# Option 1: Load data from Google sheet #
#########################################

"""
The spreadsheet needs three subsheets, named "Votes", "Chart", "Titles".
The Votes and first two columns of the Titles sheets need to be filled by the user.
The notebook will update the Charts sheet and save the vote matrix as a csv file.
Reference sheet: https://docs.google.com/spreadsheets/d/12QQ6aC2SsDjtlT7u5kcbFiKBAQWU0JA_rq2CEKciwTk/edit?usp=sharing
"""

load_from_sheet = False
SHEETNAME = '2000s Movies'
default_delimiter = ". " # Example vote: 1. [Film123]
special_delimiters = [r"\) "] # Provide alternative delimiters here (as regex patterns), e.g. 1) [Film123]

###########################################
# Option 2: Load vote matrix from GitHub. #
###########################################

vote_matrix = pd.read_csv("https://raw.githubusercontent.com/YasashiiDia/ModifiedBorda/main/2000s%20Movies_vote_matrix.csv",index_col=[0,1])

titles_csv = "https://raw.githubusercontent.com/YasashiiDia/ModifiedBorda/main/2000s%20Movies%20-%20Titles.csv"
meta_df = pd.read_csv(titles_csv, index_col=0)

##############################################################
# Option 3: Paste the vote matrix directly here as a string. #
##############################################################

vote_matrix_string = ""

###############################################################
###############################################################
###############################################################

def get_votes_df(votesheet):

  votes = np.array(votesheet.get_all_values())
  votes_df = pd.DataFrame(votes[1:,:], columns=votes[0], index=range(1,len(votes[:,0])))

  # Tag unranked votes
  for voter in votes_df:
    votes_df[voter] = votes_df[voter].mask(votes_df[voter].str.startswith("["), "-1. " + votes_df[voter].astype(str))
    
    # Remove comments after the closing "]" (keep only the part before it)
    votes_df[voter] = votes_df[voter].str.split("]", n=1).str[0] + "]"

  # Replace non-default delimiters
  for sd in special_delimiters:
    votes_df.replace(sd, default_delimiter, regex=True, inplace=True)

  return votes_df.mask(votes_df=="]", "0")

def get_vote_matrix(votes_df):

  vote_matrix = pd.DataFrame()

  for voter in votes_df:
    for i, vote in enumerate(votes_df[voter]):
      if vote == "": break
      try: rank, title = vote.split(". ", 1)
      except ValueError: 
        rank, title = i+1, vote.split(". ", 1)[-1]
      title = title.lstrip()
      vote_matrix.loc[title, voter] = rank

  vote_matrix.fillna(0,inplace=True)
  vote_matrix = vote_matrix.astype(pd.SparseDtype("int", 0))
  vote_matrix = vote_matrix.drop(["0"], errors="ignore") # "0" marks empty cells; the row may be absent
  #print('Density:', vote_matrix.sparse.density, '\nvote_matrix.shape', vote_matrix.shape)
  return vote_matrix

def get_titles_df(titlessheet):

  titles_arr = np.array(titlessheet.get_all_values())
  titles_df = pd.DataFrame(titles_arr[1:,1], index=titles_arr[1:,0], columns=["Title"])
  return titles_df

def get_vote_matrix_titled(vote_matrix, titles_df):
  vote_matrix = vote_matrix.sort_index()
  titles_df = titles_df.sort_index()
  assert np.sum(vote_matrix.index != titles_df.index) == 0

  # Use a local name that does not shadow the function itself
  titled = pd.DataFrame(vote_matrix.values, columns=vote_matrix.columns, index=[vote_matrix.index,titles_df["Title"]])
  titled.index.names = ["ID", "Title"]
  return titled

def gaussian(x, mu, sig):
    return np.exp(-np.power(x - mu, 2.) / (2 * np.power(sig, 2.)))

def superellipse(x, n=2, a=1, b=1, size=1):
  return b * (size**n - np.abs(x/a)**n)**(1/n)

def linear_pop_multiplier(counts, most_votes, pop_weight):

  theta = np.linspace(-1/most_votes, 1/most_votes, 201)[pop_weight+100]
  b = (1-theta*most_votes)/2
  multipliers = theta * counts + b
  return 2*multipliers

def exp_pop_multiplier(counts, most_votes, pop_weight):

  if pop_weight == 0: return np.ones(len(counts))

  multipliers = 1 + most_votes * np.exp(-(counts-1)**2 / (2*(pop_weight*most_votes)**2))
  multipliers /= 1 + most_votes
  return multipliers

def elliptical_pop_multiplier(counts, most_votes, pop_weight):

  if pop_weight >= 0: # mirror superellipse along vertical axis
    counts = counts + 2 * (most_votes//2 - counts) + 1 + most_votes%2
  
  n = np.linspace(1, 0.1, 101)[np.abs(pop_weight)]
  multipliers = superellipse(counts-1, n=n, a=1, b=1/most_votes, size=most_votes) # counts-1 to move superellipse upwards
  return 2*multipliers

def get_results_df(vote_matrix, Weight, PopWeight, pop_multiplier, MAX_LENGTH=50):

  vote_matrix = vote_matrix.mask(vote_matrix==-1, (MAX_LENGTH+1)/2)

  results = pd.DataFrame(index=vote_matrix.index)
  results["Votes"] = vote_matrix.astype(bool).sum(axis=1)
  MOST_VOTES = max(results["Votes"])

  score_matrix = vote_matrix.mask(vote_matrix>0, superellipse(vote_matrix-1,n=Weight,a=1,b=1,size=MAX_LENGTH)) # vm-1 to move superellipse upwards
  results["Score"] = score_matrix.sum(axis=1)
  results["Score"] *= pop_multiplier(results["Votes"], MOST_VOTES, PopWeight)
  results["Score"] = results["Score"].round(1)
  results["Score"] += 0.00001*results["Votes"] # hacky way of breaking ties by number of votes AND use method="min" for tied votes
  results["Rank"] = results["Score"].rank(ascending=False,method='min').astype(int)
  results["Score"] = results["Score"].round(1)

  return results

def get_results_classic(vote_matrix, Weight, MAX_LENGTH=50):

  vote_matrix = vote_matrix.mask(vote_matrix==-1, (MAX_LENGTH+1)/2)

  results = pd.DataFrame(index=vote_matrix.index)
  results["Votes"] = vote_matrix.astype(bool).sum(axis=1)

  score_matrix = vote_matrix.mask(vote_matrix>0, superellipse(vote_matrix-1,n=Weight,a=1,b=1,size=MAX_LENGTH)) # Elliptical
  results["Score"] = score_matrix.sum(axis=1)
  results["Score"] *= results["Votes"]
  results["Score"] = results["Score"].round(1)
  results["Score"] += 0.00001*results["Votes"] # hacky way of breaking ties by number of votes AND use method="min" for tied votes
  results["Rank"] = results["Score"].rank(ascending=False,method='min').astype(int)
  results["Score"] = results["Score"].round(1)

  return results

def get_votes_df_from_vote_matrix(vote_matrix):
  all_user_votes = []
  for v in vote_matrix: 
    all_user_votes.append(pd.Series(vote_matrix[v][vote_matrix[v] > 0].sort_values().index.get_level_values(level="Title"), name=v))
  return pd.concat(all_user_votes, axis=1)

def get_chart_df(vote_matrix):

  # Count votes per title
  counts = vote_matrix.astype(bool).sum(axis=1)
  cdf = pd.DataFrame(index=vote_matrix.index)
  cdf['Votes'] = counts.values
  most_votes = max(cdf['Votes'])
  max_length = np.max(vote_matrix.values)

  # Borda count
  results = get_results_df(vote_matrix, Weight=1, PopWeight=0, pop_multiplier=linear_pop_multiplier)

  # Unique score
  results["Unique\nScore"] = results["Score"].where(results["Votes"]==1,0)
  results["Unique\nRank"] = results["Unique\nScore"].rank(ascending=False,method='first').astype(int)

  # Popular score
  results['Popular\nScore'] = results['Score']*results['Votes']
  results["Popular\nRank"] = results["Popular\nScore"].rank(ascending=False,method='first').astype(int)

  # Gold medals
  gold_medals = get_results_df(vote_matrix, Weight=0.1, PopWeight=0, pop_multiplier=linear_pop_multiplier)
  gold_medals["Score"] /= max_length
  gold_medals.drop("Votes",axis=1, inplace=True)
  gold_medals.rename({"Score":"Gold\nMedals", "Rank":"Gold\nRank"}, axis=1, inplace=True)

  # Esoteric score
  esoteric_results = get_results_df(vote_matrix, Weight=0.4, PopWeight=-50, pop_multiplier=elliptical_pop_multiplier)
  esoteric_results.drop("Votes",axis=1, inplace=True)
  esoteric_results.rename({"Score":"Esoteric\nScore", "Rank":"Esoteric\nRank"}, axis=1, inplace=True)

  return pd.concat([results, esoteric_results, gold_medals],axis=1)
  
def sheet_updater():

  # Load sheets
  votesheet = gc.open(SHEETNAME).worksheet('Votes')
  chartsheet = gc.open(SHEETNAME).worksheet('Chart')
  titlessheet = gc.open(SHEETNAME).worksheet('Titles')

  # Get votes df
  votes_df = get_votes_df(votesheet)
  display(votes_df.head())

  # Get vote matrix
  vote_matrix = get_vote_matrix(votes_df)
  display(vote_matrix.head())
  print("Vote matrix shape:", vote_matrix.shape)

  # Append titles to vote matrix index
  titles_df = get_titles_df(titlessheet)

  if len(titles_df) != len(vote_matrix):
    print(len(titles_df),len(vote_matrix))
    print(set(vote_matrix.index) - set(titles_df.index))
    print(set(titles_df.index) - set(vote_matrix.index))
    for title in vote_matrix.index:
      print(title)
    raise Exception("Update titles_df")

  vote_matrix = get_vote_matrix_titled(vote_matrix, titles_df)
  vote_matrix.to_csv(f"/content/drive/MyDrive/{SHEETNAME}_vote_matrix.csv")
  print("Saved vote matrix:", f"/content/drive/MyDrive/{SHEETNAME}_vote_matrix.csv")
  display(vote_matrix.head())

  # Make the chart df
  cdf = get_chart_df(vote_matrix)
  cdf["RYM ID"] = cdf.index.get_level_values(level="ID")
  cdf = cdf.sort_values(by="RYM ID")
  cdf["Title"] = cdf.index.get_level_values(level="Title")
  cols = ["Rank", "Title", "RYM ID", "Votes",	"Score",	"Esoteric\nRank",	"Esoteric\nScore", "Gold\nRank", "Gold\nMedals", "Popular\nRank", "Popular\nScore", "Unique\nRank", "Unique\nScore"]
  cdf = cdf[cols]

  # Update the chart sheet
  set_with_dataframe(chartsheet, cdf.sort_values("Rank"), include_index=False)
  return cdf, vote_matrix

if load_from_sheet:

  from google.colab import auth
  auth.authenticate_user()

  from google.colab import drive
  drive.mount('/content/drive')

  import gspread
  from gspread_dataframe import set_with_dataframe
  from oauth2client.client import GoogleCredentials
  gc = gspread.authorize(GoogleCredentials.get_application_default())

  cdf, vote_matrix = sheet_updater()
  display(cdf.sort_values(by="Esoteric\nRank").head(20))

elif vote_matrix_string != "":
  import io
  data = io.StringIO(vote_matrix_string)
  vote_matrix = pd.read_csv(data, sep=",", index_col=[0,1])

# For titles that end in "]", keep only the bracketed text; leave all other titles unchanged
cleaned_title = vote_matrix.index.get_level_values(level="Title")
cleaned_title = cleaned_title.where(cleaned_title.str[-1:] != ']', cleaned_title.str[:-1].str.split('[').str[1])
vote_matrix['Title'] = cleaned_title
vote_matrix.index = vote_matrix.index.droplevel(level="Title")
vote_matrix.set_index('Title', append=True, inplace=True)

nantitles = vote_matrix.index.get_level_values(level="Title").to_numpy() != vote_matrix.index.get_level_values(level="Title").to_numpy()
vote_matrix.index = pd.MultiIndex.from_tuples([(x[0], x[0] if nan else x[1]) for x, nan in zip(vote_matrix.index, nantitles)], names=["ID","Title"])

#display(vote_matrix.head())

vote_matrix_all_ranked = vote_matrix.mask(vote_matrix < 0, 25.5)
all_user_votes = get_votes_df_from_vote_matrix(vote_matrix_all_ranked)

BORDA_RANK = get_results_df(vote_matrix, 1, 0, linear_pop_multiplier).sort_values(by="Rank")
BORDA_RANK_CLASSIC = get_results_classic(vote_matrix, 1).sort_values(by="Rank")

##################################
##################################
##################################
  
def query_tmdb_func(df, max_queries):

  for i, id in enumerate(df.index):

    if i >= max_queries: break # stop after exactly max_queries items
    title = df.loc[id,"Title"]

    if df.loc[id]["TMDB_id"] == "":
      print("Fetching ID for:", title)
      # Pass the title via params so that special characters (&, #, spaces) are URL-encoded
      r = requests.get('https://api.themoviedb.org/3/search/movie', params={'api_key': api_key, 'query': title})
      parsed = json.loads(r.text)  

      try:
        for j, res in enumerate(parsed['results']):
          if 2000 <= int(res['release_date'][:4]) <= 2009:
            df.loc[id, "TMDB_id"] = int(res['id'])
            break

      except (ValueError, KeyError):
        print(json.dumps(parsed, indent=4, sort_keys=True))
        set_with_dataframe(titlessheet, df, include_index=True) # save the progress made so far (df, not the global meta_df)

    if len(set([df.loc[id, "Release"],df.loc[id,"IMGID"]]).intersection(set([""]))) > 0:

      print("Fetching metadata for", title)
      r = requests.get('https://api.themoviedb.org/3/movie/'+str(df.loc[id, "TMDB_id"])+'?api_key='+api_key)
      r_credits = requests.get('https://api.themoviedb.org/3/movie/'+str(df.loc[id, "TMDB_id"])+'/credits?api_key='+api_key)
      parsed = json.loads(r.text)
      parsed_credits = json.loads(r_credits.text)

      try:
        parsed_crew = parsed_credits['crew']
        parsed_cast = parsed_credits['cast']

        df.loc[id, "IMGID"] = parsed['poster_path']
        df.loc[id, "Release"] = parsed['release_date']
        df.loc[id, "Runtime"] = parsed['runtime']
        df.loc[id, "Language"] = parsed['original_language']
        df.loc[id, "Genres"] = ','.join([g['name'] for g in parsed['genres']])

        df.loc[id, "Cast"] = ",".join([d["name"] for d in parsed_cast if d["order"] < 5])
        df.loc[id, "Director"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Director"])
        df.loc[id, "Producer"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Producer"])
        df.loc[id, "Writer"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Writer"])
        df.loc[id, "Sound Designer"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Sound Designer"])
        df.loc[id, "Editor"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Editor"])
        df.loc[id, "Director of Photography"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Director of Photography"])
        df.loc[id, "Composer"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Original Music Composer"])
        df.loc[id, "Art Direction"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Art Direction"])
        df.loc[id, "Production Design"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Production Design"])
        df.loc[id, "Costume Design"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Costume Design"])
        df.loc[id, "Makeup Artist"] = ",".join([d["name"] for d in parsed_crew if d["job"] == "Makeup Artist"])

      except KeyError:
        print(json.dumps(parsed, indent=4, sort_keys=True))
        set_with_dataframe(titlessheet, df, include_index=True)

  return df

def get_titles_meta_df(titlessheet):

  titles_arr = np.array(titlessheet.get_all_values())
  titles_meta_df = pd.DataFrame(titles_arr[1:,1:], index=titles_arr[1:,0], columns=titles_arr[0,1:])
  return titles_meta_df

if query_tmdb:

  # from google.colab import auth
  # auth.authenticate_user()

  from google.colab import drive
  drive.mount('/content/drive')

  import gspread
  from gspread_dataframe import set_with_dataframe
  from oauth2client.client import GoogleCredentials
  gc = gspread.authorize(GoogleCredentials.get_application_default())
  titlessheet = gc.open(SHEETNAME).worksheet('Titles')

  with open("/content/drive/MyDrive/RYMAnimeChart/tmdb_api_key.txt", "r") as f:
    api_key = f.read().strip() # drop surrounding whitespace, including a trailing newline if present

  meta_df_prior = get_titles_meta_df(titlessheet).sort_index()
  meta_df_prior["Title"] = BORDA_RANK_CLASSIC.sort_index().index.get_level_values(level="Title")

  ids_to_query = BORDA_RANK_CLASSIC.sort_values(by="Rank").index.get_level_values(level="ID")
  meta_df_prior = meta_df_prior.loc[ids_to_query]
  meta_df = query_tmdb_func(meta_df_prior, max_queries)

  #test = gc.open(SHEETNAME).worksheet('test')
  set_with_dataframe(titlessheet, meta_df, include_index=True)

from IPython.display import HTML

def get_image_row(meta_df, BORDA_RANK_CLASSIC):
  image_row = '<div id="carousel">'
  for i, index in enumerate(meta_df.index):
    image_row += '<div class="slide">'
    image_row += f"<h1 style='text-align:center'>{BORDA_RANK_CLASSIC['Rank'].iloc[i]}</h1><p style='text-align:center'>Score: {BORDA_RANK_CLASSIC['Score'].iloc[i]} | Votes: {BORDA_RANK_CLASSIC['Votes'].iloc[i]}<br></p>"
    image_row += '<a href="https://www.themoviedb.org/movie/'+str(meta_df.loc[index,'TMDB_id'])+'" target="_blank">'
    image_row += '<img src="https://image.tmdb.org/t/p/w600_and_h900_bestv2'
    image_row += meta_df.loc[index,'IMGID'] + '" alt = "' + meta_df.loc[index]["Title"] + '" style="width:200px"></a>'
    image_row += "</div>"
  image_row += '</div>'
  return image_row

image_row = get_image_row(meta_df[:top_items], BORDA_RANK_CLASSIC)

statprint = f"""
<h3>
Voters: {len(vote_matrix.columns)} |
Films: {len(vote_matrix)} |
Votes: {np.sum(np.sum(vote_matrix.astype(bool)))} <br>
</h3>
"""

HTML("""

<style>

  #carousel {
    background-color: #ffffff00;    
    overflow: visible;
    white-space:nowrap;
}

#carousel .slide {
    display: inline-block;
    padding: 5px;
}

</style>"""+image_row+statprint)

# Interactive Chart

__Score Calculation:__ With default settings, the raw score is calculated according to a Borda count. That is, for a poll with a maximum list size of N, the top item gets N points, the second item gets N-1 points, etc. The points are summed over all lists to determine the total raw score. Ties are broken by the number of votes.
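As a small worked example of the Borda count described above, the sketch below scores three hypothetical ballots with a maximum list size of N = 5. The film names, voter names, and ranks are made up for illustration; 0 means the film is not on that voter's list.

```python
import pandas as pd

N = 5  # hypothetical maximum list size

# Hypothetical ballots: rank given by each voter (0 = not on the list).
ballots = pd.DataFrame(
    {"voter_a": [1, 2, 3, 0], "voter_b": [2, 1, 0, 3], "voter_c": [1, 0, 2, 0]},
    index=["Film W", "Film X", "Film Y", "Film Z"],
)

# Rank r earns N - r + 1 points; films left off a list earn nothing.
points = (N + 1 - ballots).where(ballots > 0, 0)

borda = pd.DataFrame({"Score": points.sum(axis=1), "Votes": (ballots > 0).sum(axis=1)})
# Sort by raw score, breaking ties by the number of votes.
borda = borda.sort_values(["Score", "Votes"], ascending=False)
print(borda)
```

Film W, for instance, is ranked 1st, 2nd, and 1st, so it collects 5 + 4 + 5 = 14 points from 3 votes.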

---

The chart can be customized using two weights:

__Top Weight:__ Determines the distribution of points for each item in a ranked list of votes. Negative weights give greater emphasis to the items ranked at the top of the list. The top ranked item always gets N points. (See plots below)
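To make the effect of the Top Weight concrete, here is a minimal sketch of the superellipse point curve used by the notebook's `superellipse` function, reduced to the two parameters that matter here. The slider is mapped internally to an exponent n between 0.1 and 5, with negative slider values giving n < 1; the helper name `superellipse_points` and the list size of 10 are assumptions for illustration.

```python
import numpy as np

def superellipse_points(rank, n, size):
    """Points for a 1-based list rank; n = 1 reproduces the Borda count."""
    return (size**n - np.abs(rank - 1) ** n) ** (1 / n)

N = 10  # hypothetical maximum list size
ranks = np.arange(1, N + 1)

# n < 1: points drop off quickly after the top ranks.
# n = 1: linear Borda points.
# n > 1: points stay high for most of the list.
for n in (0.5, 1.0, 2.0):
    print(n, np.round(superellipse_points(ranks, n, N), 1))
```

In every case the top-ranked item receives N points; only the shape of the decline below it changes.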

__Pop Weight:__ Determines the popularity multiplier by which the raw score is multiplied to get the final score. Negative popularity weights emphasize items with few votes, positive weights emphasize items with many votes.
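The linear part of the popularity multiplier can be sketched as below. This is an algebraically simplified restatement of the notebook's `linear_pop_multiplier`, written on its internal scale of -100 to 100 (the slider value is multiplied by 10, and slider values beyond ±10 switch to a different, elliptical multiplier that is not shown here). The function name `linear_multiplier` and the vote counts are assumptions for illustration.

```python
import numpy as np

def linear_multiplier(counts, most_votes, pop_weight):
    """Linear popularity multiplier; pop_weight is an integer in [-100, 100]."""
    theta = np.linspace(-1 / most_votes, 1 / most_votes, 201)[pop_weight + 100]
    # Equals 1 for every film when theta is 0, and always equals 1
    # for a film that has half as many votes as the most popular one.
    return 1 + theta * (2 * counts - most_votes)

most_votes = 10  # hypothetical vote count of the most popular film
counts = np.array([1, 5, 10])

for pw in (-100, 0, 100):
    print(pw, np.round(linear_multiplier(counts, most_votes, pw), 2))
```

At the positive extreme the most-voted film has its raw score doubled, while at the negative extreme its multiplier falls to zero and single-vote films are boosted instead.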

---

Further explanations:

__Input__: At this stage, only one input is available, "Full". Further down in the notebook, we will cluster the voters into subgroups, which will allow us to create charts based on a chosen subgroup.

__Diff.__: Difference between the classic rank (see below) and the current rank.

---

Some example charts:

__Borda Count__: Top Weight = Pop Weight = 0. The default chart.

__Esoteric Chart__: Top Weight = -5, Pop Weight = -15. Highlights items with few voters but high placements.

__Unique Items__: Top Weight = 0, Pop Weight = -20. Ranking of items that have received only one vote.

__Gold Medals__: Top Weight = -10, Pop Weight = 0. Only the top-ranked items get any points. Dividing the score by N yields the number of "gold medals" received by the respective item.

__Classic Chart__: Top Weight = 0, Classic = True. Setting this checkbox to true will multiply the raw score by the number of votes, disabling custom popularity weighting. The Top Weight can still be adjusted.
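For readers who want to see what the example settings mean internally, the sketch below reproduces how the widget cell translates slider positions into an exponent and a popularity-multiplier family. The lookup table and the thresholds are taken from the code in this notebook; the helper name `resolve_sliders` is hypothetical.

```python
import numpy as np

# Same lookup table as in the widget cell: slider position -> superellipse exponent n.
WEIGHT_DISTRIBUTION = list(np.linspace(0.1, 0.9, 10)) + list(np.linspace(1, 5, 11))

def resolve_sliders(top_weight, pop_weight):
    """Translate slider positions into (exponent n, multiplier family, internal pop weight)."""
    n = WEIGHT_DISTRIBUTION[top_weight + 10]
    pw = pop_weight * 10
    if abs(pw) <= 100:
        return n, "linear", pw
    # Beyond +/-10 on the slider, the elliptical multiplier takes over.
    return n, "elliptical", pw - int(np.sign(pw)) * 100

presets = {
    "Borda Count": (0, 0),
    "Esoteric Chart": (-5, -15),
    "Unique Items": (0, -20),
    "Gold Medals": (-10, 0),
}
for name, (tw, pw) in presets.items():
    n, family, internal = resolve_sliders(tw, pw)
    print(f"{name}: n={n:.2f}, {family}, pop_weight={internal}")
```

The Classic checkbox bypasses this popularity mapping entirely and multiplies the raw score by the vote count.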

In [2]:
#@markdown <- Run this cell to activate the widgets. Re-run if the widgets disappear. If anything else breaks, try re-running all the cells from the top, else contact me on RYM (YasashiiDia).

import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display
from collections import Counter

params = {
   'axes.labelsize': 16,
   'font.size': 16,
   'legend.fontsize': 16,
   'xtick.labelsize': 14,
   'ytick.labelsize': 14,
   'font.family': 'sans-serif'
   }
plt.rcParams.update(params)
palette = sns.color_palette()

class Clusters:

  def __init__(self, clusters_df):
    self.clusters_df = clusters_df

  def set_cluster_list(self, cluster_list):
    self.cluster_list = cluster_list

class Charts:

  def __init__(self, custom_chart):
    self.custom_chart = custom_chart

  def set_default_chart(self, default_chart):
    self.default_chart = default_chart

def hover(hover_color='silver'):
    return dict(selector="tr:hover", props=[("background-color", "%s" % hover_color)])

def alternate_row_colors(background_color='gainsboro'):
  return dict(selector='tr:nth-child(even)',props=[("background-color", "%s" % background_color)])

def color_signs(s):
    '''
    Color positive values green, negative values red, zero blue
    '''
    zeros=np.where(s==0)
    s=np.where(s>0, "color: green", "color: red")
    s[zeros]="color: blue"
    return s

def style_df(a):

    a_styled = a.style.set_properties(**{'text-align': 'center'})#.hide_index("ID") outdated pandas
    a_styled = a_styled.set_table_styles([dict(selector='th', props=[('text-align', 'center')])]) # centering index name
    a_styled = a_styled.format("{:.1f}",subset=['Score'])
    a_styled = a_styled.format("{:+.0f}",subset=['Diff.'])
    a_styled = a_styled.format("{:.0f}",subset=['Rank','Votes','Runtime'])
    #styles = [alternate_row_colors(),hover()]   
    #a_styled.set_table_styles(styles)
    a_styled.apply(color_signs, axis=0, subset=['Diff.'])
    return a_styled

def filt(vote_matrix, Results, Weight, PopWeight, Classic,pop_multiplier):

    if Classic:
      results = get_results_classic(vote_matrix,Weight,MAX_LENGTH)
    
    else:
      results = get_results_df(vote_matrix, Weight, PopWeight, pop_multiplier)
    
    results = pd.concat([results,meta_df_display],axis=1)
    results = results.sort_values(by="Rank")
    results["Diff."] = BORDA_RANK_CLASSIC["Rank"] - results["Rank"]
    results = results[["Rank","Score","Votes","Diff."]+metacols]
    charts.custom_chart = results
    return results[:Results]

def plot_weights(Weight,PopWeight,Classic,pop_multiplier,most_votes,most_votes_title):

    fig, ax = plt.subplots(1,2,figsize=(15,5))

    # Point distribution
    x = np.arange(1,MAX_LENGTH+1) # one point per possible list rank
    y = superellipse(x-1,n=Weight,a=1,b=1,size=MAX_LENGTH) # x-1 to move superellipse upwards
    ax[0].scatter(x, y)
    ax[0].plot([1,MAX_LENGTH],[MAX_LENGTH,1], label="Borda", ls="--",c="darkgrey")
    ax[0].set_xlabel("List Rank")
    ax[0].set_ylabel("Points")
    ax[0].set_ylim(0,1.1*MAX_LENGTH)

    # Popularity multipliers
    x = np.arange(1,most_votes+1)
    y = x if Classic else pop_multiplier(x, most_votes, PopWeight)
    ax[1].scatter(x,y)
    ax[1].set_xlabel("Film Votes")
    ax[1].set_ylabel("Popularity Multiplier")
    if Classic: ax[1].set_ylim(0,1.1*most_votes) 
    else: ax[1].set_ylim(0,2.1)
    ax[1].axvline(most_votes,ls="--",c="darkgrey",label=f"Most Votes ({most_votes}): {most_votes_title}")
    
    for a in ax: a.legend(loc="best")#"upper center")
    fig.tight_layout()
    plt.show()

def plot_stats(a,Stats,Results,Display):

  fig, ax = plt.subplots(1,1,figsize=(15,5))
  ax = np.array(ax).flatten()

  if Stats == "Release":
    stats = (a["Release"].str[:4].values)

  elif Stats == "Runtime":
    ax[0].hist(a[Stats].dropna(),bins=40)
    fig.tight_layout()
    plt.show()
    return
  else:
    stats = list(a[Stats].dropna().astype(str))
  stats = [s.split(",")[0] for s in stats]

  director_counts = Counter(stats)
  try: del director_counts["nan"]
  except KeyError: pass
  directors = sorted([(k,v) for k, v in director_counts.items()], key=lambda tup: -tup[1])[:20]
  directors_labels=[d[0] for d in directors]
  directors_counts = [d[1] for d in directors]

  sns.barplot(x=directors_labels, y=directors_counts,palette=palette, ax=ax[0])
  ax[0].tick_params(
    axis='x',          # changes apply to the x-axis
    which='both',      # both major and minor ticks are affected
    bottom=False,      # ticks along the bottom edge are off
    top=False,         # ticks along the top edge are off
    labelbottom=False) # labels along the bottom edge are off

  ax[0].set_title(f"{Stats} (Top {len(a)} Films)",fontsize="large")
  def autolabel(ax, labels):
    rects = ax.patches
    n = len(rects)
    for i, (label, rect) in enumerate(zip(labels,rects)):
      height = 2
      ax.text((i+0.55)/n,0.1,label,transform=ax.transAxes,
              ha='center', va='bottom', rotation=90, color='black')

  autolabel(ax[0],directors_labels)

  fig.tight_layout()
  plt.show()

  if Display != "DataFrame":
    d = pd.DataFrame(directors_counts,index=directors_labels,columns=["Count"])
    d["Rank"] = d["Count"].rank(ascending=False,method='min').astype(int)
    for i in d.index:
      print(f"{d.loc[i,'Rank']}. {i} ({d.loc[i,'Count']})")
    print("\n")

def display_df(Input,Results,Weight,PopWeight,Plot,Display,Classic,PlotStats,Stats):

    classic_w.observe(observe_classic_w, 'value')

    Weight = WEIGHT_DISTRIBUTION[Weight+10]

    PopWeight *= 10
    if np.abs(PopWeight) <= 100:
      pop_multiplier = linear_pop_multiplier
    else:
      PopWeight -= np.sign(PopWeight)*100
      pop_multiplier = elliptical_pop_multiplier
    
    if Input == "Full": 
      vm = vote_matrix
      most_votes = MOST_VOTES
      most_votes_title = MOST_VOTES_TITLE
    else:
      cluster_number = int(Input.split(" ")[-1])
      voters = clusters.clusters_df.xs(cluster_number, level="Cluster",axis=1).columns
      vm = vote_matrix[voters]
      vote_counts = vm.astype(bool).sum(axis=1) # count unranked (-1) votes as votes too
      most_votes = max(vote_counts)
      most_votes_title = vote_counts.sort_values().index[-1][1]
      vm = vm[vote_counts > 0]

    a = filt(vm, Results, Weight,PopWeight,Classic,pop_multiplier)

    if Plot:
      plot_weights(Weight,PopWeight,Classic,pop_multiplier,most_votes,most_votes_title)

    if PlotStats:
      plot_stats(a,Stats,Results,Display)

    if Display == "DataFrame":
      a_styled = style_df(a)
      display(a_styled)

    elif Display in ["RYM Print", "RYM Print Diff."]:
      for i in a.index:
        diff = a.loc[i,'Diff.']
        if diff > 0: color = "green"
        elif diff == 0: color = "blue"
        else: color = "red"
        string = f"[b]{a.loc[i,'Rank']:.0f}.[/b]"
        string += f" {i[0]} | Score: {a.loc[i,'Score']:.1f} | Votes: {a.loc[i,'Votes']:.0f}"
        if Display == "RYM Print Diff.": string += f" | [color {color}]{diff:+.0f}[/color]"
        print(string)

MAX_LENGTH = np.max(vote_matrix.values) # Maximum list length
COUNTS = vote_matrix.astype(bool).sum(axis=1)
MOST_VOTES_TITLE = COUNTS.sort_values().index[-1][1]
MOST_VOTES = max(COUNTS) # Number of votes of most voted entry
WEIGHT_DISTRIBUTION = list(np.linspace(0.1,0.9,10)) + list(np.linspace(1,5,11))

metacols = ['Release','Runtime','Genres','Language','Cast','Director','Producer','Writer','Director of Photography','Editor','Composer','Sound Designer','Art Direction','Production Design','Costume Design','Makeup Artist']
meta_df_display = pd.read_csv(titles_csv, index_col=[0,1],header=0,usecols=["ID","Title"]+metacols)[metacols]
meta_df_display["Release"] = meta_df_display["Release"].astype(str)

clusters = Clusters(None)
charts = Charts(None)
charts.set_default_chart(BORDA_RANK_CLASSIC)

# Widgets
layout={'width': '350px'}
res_w = widgets.IntSlider(min=10, max=len(vote_matrix)+10, step=10,layout=layout,value=100,description='Results:',continuous_update=False)
disable_pop_weight=False
pop_weight = widgets.IntSlider(min=-20, max=20, step=1,layout=layout,value=0,description='Pop Weight:',continuous_update=False, disabled=disable_pop_weight)

def observe_classic_w(*args):
  """Freeze pop weight when classic true"""
  pop_weight.disabled = classic_w.value

top_weight = widgets.IntSlider(min=-10, max=10, step=1,layout=layout,value=0,description='Top Weight:',continuous_update=False)
plot_w = widgets.Checkbox(value=True,description='Plot Point Distribution')
classic_w = widgets.Checkbox(value=False,description='Classic')
display_w = widgets.Dropdown(options=['DataFrame','RYM Print','RYM Print Diff.'], value='DataFrame', description='Display:', disabled=False)
input_w = widgets.Dropdown(options=['Full'], value='Full', description='Input:', disabled=False)
plot_stats_w = widgets.Checkbox(value=True,description='Plot Statistics')
stats_w = widgets.Dropdown(options=metacols, value='Director', description='Stats:', disabled=False)

ws = [res_w,plot_w,input_w,display_w,top_weight,pop_weight,plot_stats_w,classic_w,stats_w]

out = widgets.interactive_output(display_df,{'Input':input_w,'Results':res_w,'Weight':top_weight,'PopWeight':pop_weight,"Plot":plot_w,"Display":display_w,"Classic":classic_w,"PlotStats":plot_stats_w,'Stats':stats_w})
ui = widgets.GridBox(ws, layout=widgets.Layout(grid_template_columns="repeat(2, 400px)"))

display(ui, out)

# Voter Correlations

Select a voter to display how closely their list correlates with the lists of all other voters. Votes shared with the selected voter are highlighted in green; the closer the two voters ranked a shared film, the more opaque the highlight.
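The ranked correlation can be sketched as follows: ranks are first converted to Borda points, films missing from a list count as 0, and the Pearson correlation is then taken between voter columns. The ballots, voter names, and list size below are hypothetical.

```python
import pandas as pd

MAX_LENGTH = 3  # hypothetical maximum list size

# Hypothetical ranks (0 = film not on the voter's list).
ranks = pd.DataFrame(
    {"voter_a": [1, 2, 3, 0, 0], "voter_b": [1, 2, 0, 3, 0], "voter_c": [0, 0, 1, 2, 3]},
    index=["Film V", "Film W", "Film X", "Film Y", "Film Z"],
)

# Convert ranks to points so that shared top picks weigh most; unlisted films stay at 0.
points = ranks.mask(ranks > 0, MAX_LENGTH + 1 - ranks)

# Pearson correlation between every pair of voter columns.
corr = points.corr()
print(corr.round(2))
```

Here voter_a and voter_b share their top two picks and correlate positively, while voter_c, whose list barely overlaps with theirs, correlates negatively with both.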

In [3]:
#@markdown <- Run this cell to activate the widgets. Re-run if the widgets disappear. If anything else breaks, try re-running all the cells from the top, else contact me on RYM (YasashiiDia).

import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

def add_voter_to_vote_matrix(vote_matrix, vote_list, name=""):

  vote_matrix_expanded = vote_matrix.copy()
  vote_matrix_expanded[name] = np.zeros(len(vote_matrix_expanded))
  vote_matrix_expanded.loc[vote_list.values, name] = vote_list.index.to_list()
  return vote_matrix_expanded

def add_voter_to_all_user_votes(vote_matrix, all_user_votes, vote_list, name=""):

  titles = vote_list.values
  titles = vote_matrix.loc[titles].index.get_level_values(level="Title")
  all_user_votes_expanded = all_user_votes.copy()
  all_user_votes_expanded[name] = titles
  return all_user_votes_expanded

def highlight_common(col, titles, props=''):

    mask = np.isin(col.values, titles)
    return np.where(mask, props, '')

def highlight_common_diff(col, titles, props=''):
    """
    Highlight elements in col that are also in titles
    Alpha of highlight color increases with proximity of common elements along row axis
    """
    mask = np.isin(col.values, titles)
    titles_pos = [np.where(titles==c)[0][0] for c in col[mask]]
    col_pos = np.arange(len(col))[mask]
    diff = list(1-np.abs(col_pos-titles_pos)/MAX_LENGTH)
    props = ["" if not m else props+f"{diff.pop(0):.2f})" for m in mask]  
    return props

def style_df2(a, Voter, Shade):
    
    a_styled = a.style.set_properties(**{'text-align': 'center'})
    a_styled = a_styled.set_table_styles([dict(selector='th', props=[('text-align', 'center')])]) # centering index name
    titles = a[Voter].values

    if Shade:
      a_styled.apply(highlight_common_diff, args=(titles,), axis=0, props='color:black;background-color:rgba(60, 179, 113, ')  # mediumseagreen; alpha is appended per cell
    # else:
    #   a_styled.apply(highlight_common, args=(titles,), axis=0, props='color:black;background-color:mediumseagreen')

    return a_styled

def filt2(vmc, Voter):

  c = vmc[Voter].sort_values(ascending=False)
  col_arrays = [all_user_votes_expanded[c.index].columns, c.values.round(2)]
  multi_cols = pd.MultiIndex.from_arrays(col_arrays, names=["Voter","Correlation"])
  votes = all_user_votes_expanded[c.index].values
  results = pd.DataFrame(votes,index=range(1,len(all_user_votes_expanded)+1),columns=multi_cols,copy=True)
  results.index.name = "Rank"
  return results

def display_df2(Voter, Display, RankedCorr, Shade, IncludeCharts=False):

  vmc = vmc_ranked if RankedCorr else vmc_unranked

  if Display == "DataFrame":
    if (not IncludeCharts) and Voter in chart_voters: Voter = voters[0]
    a = filt2(vmc, Voter)
    #if not IncludeCharts: a.drop(chart_voters, axis=1, level=0, inplace=True)
    a_styled = style_df2(a, Voter, Shade)
    display(a_styled)

  elif Display == "RYM Print":
    a=vmc[Voter].sort_values(ascending=False)
    for v in a.index:
      print(f"{v}: {a.loc[v]:.2f}")

MAX_LENGTH = np.max(vote_matrix.values) # Maximum list length
COUNTS = vote_matrix.astype(bool).sum(axis=1)
MOST_VOTES_TITLE = COUNTS.sort_values().index[-1][1]
MOST_VOTES = max(COUNTS) # Number of votes of most voted item

vmc_ranked = vote_matrix_all_ranked.mask(vote_matrix_all_ranked>0, MAX_LENGTH+1-vote_matrix_all_ranked).corr()
vmc_unranked = vote_matrix_all_ranked.mask(vote_matrix_all_ranked>0, 1).corr()
voters = list(vote_matrix_all_ranked.columns)
vmc = vmc_ranked
vm = vote_matrix_all_ranked
all_user_votes_expanded = all_user_votes

# Make chart voters
# classic_voter = charts.default_chart[:MAX_LENGTH].index.get_level_values(level="ID").to_list()
# classic_voter = pd.Series(classic_voter,index=(range(1,len(classic_voter)+1)))
# vote_matrix_expanded = add_voter_to_vote_matrix(vote_matrix_all_ranked, classic_voter, name=f"Top {MAX_LENGTH} Classic")
# all_user_votes_expanded = add_voter_to_all_user_votes(vote_matrix_all_ranked, all_user_votes, classic_voter, name=f"Top {MAX_LENGTH} Classic")

# custom_voter = charts.custom_chart[:MAX_LENGTH].index.get_level_values(level="ID").to_list()
# custom_voter = pd.Series(custom_voter,index=(range(1,len(custom_voter)+1)))
# vote_matrix_expanded = add_voter_to_vote_matrix(vote_matrix_expanded, custom_voter, name=f"Top {MAX_LENGTH} Custom")
# all_user_votes_expanded = add_voter_to_all_user_votes(vote_matrix_expanded, all_user_votes_expanded, custom_voter, name=f"Top {MAX_LENGTH} Custom")

# voters = list(vote_matrix_expanded.columns)
# vm = vote_matrix_expanded
# vmc_ranked = vote_matrix_expanded.mask(vote_matrix_expanded>0, MAX_LENGTH+1-vote_matrix_expanded).corr()
# vmc_unranked = vote_matrix_expanded.mask(vote_matrix_expanded>0, 1).corr()
chart_voters = [f"Top {MAX_LENGTH} Classic", f"Top {MAX_LENGTH} Custom"]

# Widgets
layout={'width': '350px'}
voter_w = widgets.Dropdown(options=voters, value=voters[0], description='Voter:', disabled=False)
display_w_n = widgets.Dropdown(options=['DataFrame','RYM Print'], value='DataFrame', description='Display:', disabled=False)
ranked_correlations_w = widgets.Checkbox(value=True,description='Use rank info for correlations')
shading_w = widgets.Checkbox(value=True,description='Highlight common votes')
include_charts_w = widgets.Checkbox(value=False,description='Include Charts')

ws_n = [voter_w,display_w_n,ranked_correlations_w,shading_w]#,include_charts_w]

out_n = widgets.interactive_output(display_df2,{'Voter':voter_w,'Display':display_w_n,'RankedCorr':ranked_correlations_w,'Shade':shading_w,'IncludeCharts':include_charts_w})
ui_n = widgets.GridBox(ws_n, layout=widgets.Layout(grid_template_columns="repeat(2, 400px)"))

display(ui_n, out_n)
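As a rough illustration of how the highlight opacity is derived in `highlight_common_diff`, here is a simplified sketch in plain Python. The film titles and the helper name `common_vote_alphas` are hypothetical; the formula `1 - |position difference| / MAX_LENGTH` is the one used in the cell above.

```python
def common_vote_alphas(selected, other, max_length):
    """
    For each film on `other`, return the highlight alpha (1.0 = same rank as on
    `selected`) or None if the film is not on the selected voter's list.
    Illustrative helper, not part of the notebook.
    """
    selected = list(selected)
    alphas = []
    for pos, title in enumerate(other):
        if title in selected:
            diff = abs(pos - selected.index(title))
            alphas.append(round(1 - diff / max_length, 2))
        else:
            alphas.append(None)
    return alphas


selected_list = ["Film A", "Film B", "Film C", "Film D"]
other_list = ["Film A", "Film D", "Film X", "Film B"]
alphas = common_vote_alphas(selected_list, other_list, max_length=4)

# Build the CSS strings the same way the styler does: rgba prefix plus alpha
css = ["" if a is None else f"background-color:rgba(60, 179, 113, {a:.2f})" for a in alphas]
print(alphas)
print(css)
```

A film ranked identically on both lists gets full opacity, and the highlight fades linearly as the two ranks drift apart.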


# Voter Network

Plot a network of voters, where voters are connected if they share a correlation above a chosen threshold.
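The thresholding step can be sketched as follows, assuming a small hand-written correlation matrix. The voter names, the values and the helper name `build_voter_graph` are invented for illustration; the stack/filter/`from_pandas_edgelist` sequence follows the cell below.

```python
import networkx as nx
import pandas as pd

# Hypothetical correlation matrix for four voters
voters = ["ann", "bob", "cat", "dan"]
corr = pd.DataFrame(
    [
        [1.00, 0.80, 0.10, 0.05],
        [0.80, 1.00, 0.30, 0.05],
        [0.10, 0.30, 1.00, 0.05],
        [0.05, 0.05, 0.05, 1.00],
    ],
    index=voters,
    columns=voters,
)


def build_voter_graph(corr, thresh):
    """Connect two voters if their correlation exceeds `thresh` (illustrative helper)."""
    # Flatten the matrix into one row per (voter, voter) pair
    links = corr.stack().reset_index()
    links.columns = ["var1", "var2", "value"]
    # Keep strong links only and drop self-correlations (cor(A, A) = 1)
    keep = (links["value"] > thresh) & (links["var1"] != links["var2"])
    return nx.from_pandas_edgelist(links[keep], "var1", "var2")


graph = build_voter_graph(corr, thresh=0.25)
print(sorted(graph.nodes()))
print(graph.number_of_edges())
```

Note that the graph is built from the filtered edge list only, so a voter without any correlation above the threshold (dan in this toy example) does not appear in the network at all.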

In [4]:
#@markdown <- Run this cell to activate the widgets. Re-run if the widgets disappear. If anything else breaks, try re-running all the cells from the top, else contact me on RYM (YasashiiDia).

import networkx as nx

def display_df_net(Plot, Width, Height, Thresh, SelfCorr):

  # Keep only correlation over a threshold and remove self correlation (cor(A,A)=1)
  links_filtered=links.loc[links['value'] > Thresh]
  if not SelfCorr:
    links_filtered=links_filtered.loc[links_filtered['var1'] != links_filtered['var2']]

  # Build graph
  G=nx.from_pandas_edgelist(links_filtered, 'var1', 'var2')

  plt.figure(1,figsize=(Width,Height))

  if Plot == "Default":
    nx.draw(G, with_labels=True, node_color='cornflowerblue', node_size=400, edge_color='grey', linewidths=0.01, font_size=15, font_color="black")
  elif Plot == "Kamada-Kawai":
    nx.draw_kamada_kawai(G, with_labels=True, node_color='cornflowerblue', node_size=400, edge_color='grey', linewidths=0.01, font_size=15, font_color="black")
  elif Plot == "Circular":
    nx.draw_circular(G, with_labels=True, node_color='cornflowerblue', node_size=400, edge_color='grey', linewidths=0.01, font_size=15, font_color="black")

vmc_ranked = vote_matrix_all_ranked.mask(vote_matrix_all_ranked>0, MAX_LENGTH+1-vote_matrix_all_ranked).corr()
vmc_unranked = vote_matrix_all_ranked.mask(vote_matrix_all_ranked>0, 1).corr()
voters = list(vote_matrix_all_ranked.columns)
vmc = vmc_ranked
vm = vote_matrix_all_ranked
all_user_votes_expanded = all_user_votes

# Flatten the correlation matrix (vmc, set above) into one row per voter pair
links = vmc.stack().reset_index()
links.columns = ['var1', 'var2', 'value']

MAX_CORRELATION = round(vmc[vmc<1].max().max(), 2)

# Widgets
layout={'width': '350px'}
net_w = widgets.Dropdown(options=["Default","Kamada-Kawai","Circular"], value="Default", description='Network:', disabled=False)
width_w = widgets.IntSlider(min=5, max=40, step=1, layout=layout, value=12,description='Width:')
height_w = widgets.IntSlider(min=5, max=40, step=1, layout=layout, value=12,description='Height:')
thresh_w = widgets.FloatSlider(min=0, max=MAX_CORRELATION, step=0.01, layout=layout, value=2*MAX_CORRELATION/3,description='Threshold:')
self_corr_w = widgets.Checkbox(value=False,description='Self-Correlation')

ws_net = [net_w,thresh_w,width_w,height_w] #self_corr_w

out_net = widgets.interactive_output(display_df_net,{'Plot':net_w,'Width':width_w,'Height':height_w,'Thresh':thresh_w,'SelfCorr':self_corr_w})
ui_net = widgets.GridBox(ws_net, layout=widgets.Layout(grid_template_columns="repeat(2, 400px)"))

display(ui_net, out_net)


# Voter Clustermap

A visualization of the voter correlation matrix. The rows and columns of the matrix are reordered according to voter similarity. By "cutting" the dendrogram on the left/top of the clustermap, the voters can be clustered into subgroups. The cutoff (red dashed line) can be adjusted via the threshold widget, and the members of each cluster are printed below the clustermap. The image size needs to be set fairly high (>30) if you want to render all voter labels in the clustermap (this takes a few seconds).

__Note:__ Running this cell for the first time will upgrade SciPy (this also takes a few seconds). Afterwards, you will have to manually restart the runtime and re-run all the cells before you can interact with the clustermap:

Runtime -> Restart and run all
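Here is a minimal sketch of what "cutting" the dendrogram at a given level means, assuming toy one-dimensional observations instead of the voter correlation matrix (in the notebook, each row of the correlation matrix is passed to `hierarchy.linkage` as one observation). The helper name `cut_dendrogram` and the data are hypothetical; the threshold rule is the one used in the cell below.

```python
import numpy as np
from scipy.cluster import hierarchy

# Hypothetical observations: two tight pairs and one outlier
points = np.array([[0.0], [1.0], [10.0], [11.0], [50.0]])


def cut_dendrogram(observations, level=0, method="ward"):
    """Cut just below the (level+1)-th highest merge (illustrative helper)."""
    linkage = hierarchy.linkage(observations, method=method)
    # Merge heights from the top of the dendrogram downwards
    heights = np.sort(linkage[:, 2])[::-1]
    level = min(level, len(heights) - 1)
    # Place the cutoff slightly below the chosen merge so that merge is undone
    thresh = 0.999 * heights[level]
    labels = hierarchy.fcluster(linkage, t=thresh, criterion="distance")
    return labels, thresh


labels_level0, thresh0 = cut_dendrogram(points, level=0)
labels_level1, thresh1 = cut_dendrogram(points, level=1)
print(labels_level0, round(thresh0, 2))
print(labels_level1, round(thresh1, 2))
```

As long as the merge heights are well separated, a threshold level of `k` undoes the top `k+1` merges and therefore yields `k+2` clusters: level 0 splits the voters into two groups, level 1 into three, and so on.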

In [5]:
#@markdown <- Run this cell to activate the widgets.  Re-run if the widgets disappear. If anything else breaks, try restarting the runtime and re-running all the cells from the top, else contact me on RYM (YasashiiDia).

# Needed for latest version of hierarchy.dendrogram with dn["leaves_color_list"]
!pip install scipy --upgrade
from scipy.cluster import hierarchy

def sort_cluster_colors(dn):
    """
    Sort leaves_color_list by index so that they can be clustered again by
    sns.clustermap and used as row colors
    """
    # Pair each leaf color with its original row index, then sort by that index
    pairs = sorted(zip(dn["leaves_color_list"], dn["leaves"]), key=lambda pair: pair[1])
    return [color for color, _ in pairs]
    
def clustermap(df_corr, level=0, method="ward", **kwargs):

    linkage = hierarchy.linkage(df_corr, method=method)
    if level >= len(linkage[:,2]): level = len(linkage[:,2])-1  # clamp to the last valid merge index
    thresh = 0.999*sorted(linkage[:,2], key=lambda x: -x)[level]
    
    fclusters = hierarchy.fcluster(linkage, t=thresh, criterion="distance")
        
    dn = hierarchy.dendrogram(
        linkage,
        leaf_rotation=90.,  # rotates the x axis labels
        leaf_font_size=8.,  # font size for the x axis labels
        color_threshold=thresh,
        labels = df_corr.columns,
        get_leaves = True,
        no_plot = True
    )
    row_colors_cl = sort_cluster_colors(dn)

    col_arrays = [df_corr.columns.to_numpy(), fclusters]
    multi_cols = pd.MultiIndex.from_arrays(col_arrays, names=["Voter","Cluster"])
    df_cl = pd.DataFrame(df_corr.values,index=df_corr.index,columns=multi_cols,copy=True)

    return sns.clustermap(df_corr,row_linkage=linkage,col_linkage=linkage,robust=True,row_colors=row_colors_cl,col_colors=row_colors_cl, method=method, **kwargs), df_cl, thresh

def display_cm(Level, Size):

  cm, df_cl, thresh = clustermap(vmc, Level, figsize=(Size, Size))
  cm.ax_col_dendrogram.axhline(thresh, c='red', linestyle='--',lw=1.5, alpha=0.7)
  cm.ax_row_dendrogram.axvline(thresh, c='red', linestyle='--',lw=1.5, alpha=0.7)

  plt.show()

  for cluster in set(df_cl.columns.get_level_values(level="Cluster")):

    print(f"Cluster: {cluster}")
    xs = df_cl.xs(cluster,level="Cluster",axis=1)
    #display(xs.loc[xs.columns])
    print(xs.columns,"\n")

  clusters.clusters_df = df_cl
  cluster_list = ["Cluster " + str(n) for n in set(df_cl.columns.get_level_values(level="Cluster"))]
  clusters.set_cluster_list(cluster_list)

vmc = vmc_ranked
clusters = Clusters(None)

# Widgets
layout={'width': '350px'}
level_w = widgets.IntSlider(min=0, max=10, step=1, layout=layout, value=4,description='Threshold:', continuous_update=False)
size_w = widgets.IntSlider(min=5, max=40, step=1, layout=layout, value=12,description='Size:', continuous_update=False)

ws_cm = [level_w, size_w]

out_cm = widgets.interactive_output(display_cm,{'Level':level_w,'Size':size_w})
ui_cm = widgets.GridBox(ws_cm, layout=widgets.Layout(grid_template_columns="repeat(2, 400px)"))

display(ui_cm, out_cm)





# Cluster Charts

Now that we have clustered the voters using the previous cell, we can create charts based on the subset of voters in our chosen cluster (available in the "Input" dropdown menu).

__Note:__ This will only work if you have clustered the voters using the previous cell. If you re-cluster using a different threshold, you need to re-run the cell below to update the clusters (this will be fixed in a future update).
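The idea of a cluster chart can be sketched with a plain Borda count restricted to a subset of voters. This is a simplified stand-in: it ignores the Top Weight and Pop Weight modifications used by the actual charts, and the vote matrix, cluster membership and the helper name `borda_chart` are made up for illustration.

```python
import pandas as pd

# Hypothetical vote matrix: ranks per voter, 0 = film not on the list
votes = pd.DataFrame(
    {
        "ann": [1, 2, 3, 4, 0],
        "bob": [1, 3, 2, 0, 4],
        "cat": [0, 3, 4, 2, 1],
    },
    index=["Film A", "Film B", "Film C", "Film D", "Film E"],
)


def borda_chart(vote_matrix, max_length, voters=None):
    """Plain Borda count over all voters or a subset (illustrative helper)."""
    subset = vote_matrix if voters is None else vote_matrix[list(voters)]
    # Rank 1 earns max_length points, the last rank earns 1, unranked films earn 0
    points = subset.where(subset == 0, max_length + 1 - subset)
    return points.sum(axis=1).sort_values(ascending=False)


max_length = int(votes.values.max())
full_chart = borda_chart(votes, max_length)
# Assume the clustermap put ann and bob into the same cluster
cluster_chart = borda_chart(votes, max_length, voters=["ann", "bob"])
print(full_chart)
print(cluster_chart)
```

Film E does well in the full chart thanks to cat's first-place vote, but drops to the bottom once the chart is restricted to the ann/bob cluster.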

In [6]:
#@markdown <- Run this cell to activate the widgets.  Re-run if the widgets disappear. If anything else breaks, try re-running all the cells from the top, else contact me on RYM (YasashiiDia).

#@markdown If you get an AttributeError, select Runtime -> Restart and run all (and click yes to confirm).

input_w2 = widgets.Dropdown(options=['Full']+clusters.cluster_list, value='Full', description='Input:', disabled=False)
res_w2 = widgets.IntSlider(min=10, max=len(vote_matrix)+10, step=10,layout=layout,value=100,description='Results:',continuous_update=False)
pop_weight2 = widgets.IntSlider(min=-20, max=20, step=1,layout=layout,value=0,description='Pop Weight:',continuous_update=False)
top_weight2 = widgets.IntSlider(min=-10, max=10, step=1,layout=layout,value=0,description='Top Weight:',continuous_update=False)
plot_w2 = widgets.Checkbox(value=True,description='Plot Point Distribution')
classic_w2 = widgets.Checkbox(value=False,description='Classic')
display_w2 = widgets.Dropdown(options=['DataFrame','RYM Print','RYM Print Diff.'], value='DataFrame', description='Display:', disabled=False)
plot_stats_w2 = widgets.Checkbox(value=True,description='Plot Statistics')
stats_w2 = widgets.Dropdown(options=metacols, value='Director', description='Stats:', disabled=False)

ws2 = [res_w2,plot_w2,input_w2,display_w2,top_weight2,pop_weight2,plot_stats_w2,classic_w2,stats_w2]

out2 = widgets.interactive_output(display_df,{'Input':input_w2,'Results':res_w2,'Weight':top_weight2,'PopWeight':pop_weight2,"Plot":plot_w2,"Display":display_w2,"Classic":classic_w2,"PlotStats":plot_stats_w2,'Stats':stats_w2})
ui2 = widgets.GridBox(ws2, layout=widgets.Layout(grid_template_columns="repeat(2, 400px)"))

display(ui2, out2)


# Changelog

#### Features

- 3.12.2021: Initial release

#### Fixes

- 4.12.2021: Cluster Charts: Fixed broken elliptical Pop Weight; film with most votes now specific to cluster

#### Minor improvements

- 4.12.2021: Interactive Chart: Pop Weight widget is now disabled when using Classic = True

# To do

#### Features

- Proper chart display with images

- Reset chart button

- Personal recommendations

- Save dataframe as image

- Metadata rankings

- Print full metadata stats

- Voter correlation with charts

- Weighted rank correlation

#### Fixes

#### Minor improvements

- Normalize pop multiplier

#### Code improvements

- Commenting

- Generally make code less lousy

- Remove chart voters before clustering

- Encapsulate expanded voter matrix and cluster list in charts class

- Profiling

- Make Python package