# Task C: Visualisation of Movie Category, Votes, and Year

This demonstration visualises the top 99 highest-rated IMDb movies. It shows how a movie's category relates to the total votes it received, which we use as a measure of popularity, and lets the user filter the movies by year of release.

## Instructions

1. By default, the full range of years and votes is selected. Use the sliders in the sidebar on the left to restrict either range; the charts update to reflect the new range.
2. There is also a slider that changes the height of the charts.
3. Hovering over the points will show related information on tooltips.
4. Drag on the scatter plot to draw a rectangular selection (brush). Points inside the selection stay highlighted while the rest fade, and the pie chart uses the same selection for its opacity. Drag the selection area to move it, and scroll over it to resize it.
5. Click outside the selection area to deselect and reset the plot.

In [None]:
# %%capture
# Code comments are not visible to the end-user
# !pip install altair
# !pip install mercury
# !pip install pandas

In [2]:
# Import the necessary packages
import altair as alt  # For interactive and custom visuals
import pandas as pd   # For working with the dataset
import mercury as mr  # For additional input and filtering capabilities

In [3]:
# We load in the data set from the CSV file
movies = pd.read_csv('data/imdb100/movies.csv')
# movies

In [4]:
def clean_year(year):
    """
    Removes the surrounding parentheses, e.g. "(1972)", and returns the year as an integer
    """
    return int(year[1:-1])

In [5]:
def clean_votes(votes):
    """
    Removes the commas and converts votes to usable integer value
    """
    return int(votes.replace(",", ""))
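
The two cleaning helpers above assume every value has exactly the expected shape. As a hedged sketch only, the variant below extracts the first four-digit run from the year string and strips thousands separators from the vote count. `parse_year`, `parse_votes`, and the sample strings are hypothetical and are not taken from the dataset.

```python
import re


def parse_year(raw):
    """Return the first four-digit run in raw as an int (hypothetical helper)."""
    match = re.search(r"\d{4}", raw)
    if match is None:
        raise ValueError(f"no four-digit year found in {raw!r}")
    return int(match.group())


def parse_votes(raw):
    """Drop thousands separators and convert to int (hypothetical helper)."""
    return int(raw.replace(",", "").strip())


# Sample strings made up for illustration
print(parse_year("(1972)"))
print(parse_votes("1,860,471"))
```

Raising a `ValueError` on an unexpected year string makes a malformed row visible during cleaning rather than at the later `astype` step.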

In [6]:
# Assign the cleaned year value to a new column
movies['Year'] = movies['year_of_release'].apply(clean_year)

# Do the same for votes
movies['Votes'] = movies['votes'].apply(clean_votes)
# movies

In [7]:
# Remove the extra columns
movies.drop(columns=[
    'gross_total', 'year_of_release', 'run_time', 'genre', 'votes'
], inplace=True)

# Rename the rest of the columns to cleaner names
movies.rename(columns={
    'index': 'Rank', 'movie_name': 'Movie Name', 'category': 'Category', 'imdb_rating': 'Rating'
}, inplace=True)

# Assign types for easier type inference
movies = movies.astype({
    'Rank': 'int',
    'Movie Name': 'string',
    'Category': 'string',
    'Votes': 'int',
    'Year': 'int',
    'Rating': 'float'
})
# movies.dtypes

In [8]:
# Extract all individual categories present in the Category column values
categories = list(set(movies['Category']))
# categories

In [9]:
app = mr.App(
    description="A visualisation with interactive plots showing the relationship between category, votes, and year.",
    show_code=False,
    show_prompt=False,
    continuous_update=True,
    static_notebook=False,
    show_sidebar=True,
    full_screen=True,
    allow_download=False,
)

In [10]:
# Chart configuration parameters
chart_height = mr.Slider(value=380, min=350, max=600, label="Chart height", step=20)
chart_width = 600

In [11]:
# Year selection tool
min_year = min(movies['Year'])
max_year = max(movies['Year'])
selected_years = mr.Range(value=[min_year, max_year], min=min_year, max=max_year, label="Year of release", step=1)

In [12]:
# Vote selection tool
min_votes = min(movies['Votes'])
min_votes = int(min_votes / 1000) * 1000
max_votes = max(movies['Votes'])
max_votes = int(1 + max_votes / 1000) * 1000
selected_votes = mr.Range(value=[min_votes, max_votes], min=min_votes, max=max_votes, label="Votes", step=1000)
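
The rounding above widens the vote range to whole thousands so that it lines up with the slider step. A minimal sketch of the same idea using floor and ceiling division follows. `round_bounds` is a hypothetical helper and the input values are illustrative. Unlike the `int(1 + max_votes / 1000)` form, it does not add an extra step when the maximum is already a multiple of the step.

```python
def round_bounds(lowest, highest, step=1000):
    """Widen [lowest, highest] outward to multiples of step (hypothetical helper)."""
    lower = (lowest // step) * step        # round down to a multiple of step
    upper = -(-highest // step) * step     # round up using ceiling division
    return lower, upper


# Illustrative values, not the dataset's actual minimum and maximum
print(round_bounds(178888, 2683302))
print(round_bounds(1000, 2000))
```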

In [13]:
# selected_categories.value

In [14]:
# Filter based on the year and votes
y_min = selected_years.value[0]
y_max = selected_years.value[1]
v_min = selected_votes.value[0]
v_max = selected_votes.value[1]
filtered_movies = movies.query(f'Year >= {y_min} and Year <= {y_max} and Votes >= {v_min} and Votes <= {v_max}')
# filtered_movies
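
The `query` string above keeps rows whose year and votes fall inside both slider ranges, bounds included. As an alternative sketch, the same inclusive filter can be written as a boolean mask with `Series.between`. The `filter_movies` helper and the small sample frame are assumptions for illustration only.

```python
import pandas as pd


def filter_movies(frame, year_range, vote_range):
    """Keep rows whose Year and Votes lie within the inclusive ranges (hypothetical helper)."""
    mask = frame["Year"].between(*year_range) & frame["Votes"].between(*vote_range)
    return frame[mask]


# Small sample frame for illustration
sample = pd.DataFrame({
    "Movie Name": ["The Godfather", "The Shining", "City Lights"],
    "Votes": [1860471, 1025560, 186059],
    "Year": [1972, 1980, 1931],
})
subset = filter_movies(sample, (1970, 1990), (1_000_000, 2_000_000))
print(subset["Movie Name"].tolist())
```

A mask avoids building the filter expression as a formatted string, which can be easier to extend if more sliders are added.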

In [15]:
# The information we want to display on tooltips
tooltip_cols = ['Movie Name', 'Rank', 'Category', 'Rating', 'Votes', 'Year']

# The selection mechanisms on the graphs
brush = alt.selection(type='interval', resolve='global')
# single = alt.selection_single()

# The graph opacity condition for local selection (brush and click)
# opacity = alt.condition(brush | single, alt.value(0.9), alt.value(0.1))
opacity = alt.condition(brush, alt.value(0.9), alt.value(0.05))

# Scatter plot of votes per category; points outside the slider ranges are greyed out
points = alt.Chart(movies).mark_point(filled=True).encode(
    alt.X('Category'),
    alt.Y('Votes'),
    tooltip=tooltip_cols,
    color=alt.condition(
        (alt.datum.Year >= y_min) & (alt.datum.Year <= y_max) &
        (alt.datum.Votes >= v_min) & (alt.datum.Votes <= v_max),
        alt.Color('Rank:Q',
             scale=alt.Scale(scheme='lighttealblue')
         ),
        alt.value('lightgrey')
    ),
    size='Rating:Q',
    opacity=opacity
).add_selection(
    brush,
#     single
).properties(
    width=chart_width,
    height=chart_height.value
)

filter_present = (selected_years.value[1] - selected_years.value[0] > 0) and \
                (selected_votes.value[1] - selected_votes.value[0] > 0) 
if filter_present:
    # Pie chart of total votes per category for the filtered movies
    pie = alt.Chart(filtered_movies).mark_arc().encode(
        color=alt.Color('Category', scale=alt.Scale(scheme="category20")),
        theta='sum(Votes)',
        tooltip=tooltip_cols,
        opacity=opacity,
#     ).add_selection(
#         brush,
#     #     single
    ).properties(
        width=chart_width,
        height=chart_height.value
    )
    

# Vertically stack the two charts
final = (points & pie) if filter_present else points
final.configure_axis(
    labelFontSize=14,
    titleFontSize=20,
)

In [16]:
# Display the filtered table of movies
if filter_present:
    display_movies = filtered_movies.set_index('Rank')
display_movies[['Movie Name', 'Category', 'Votes','Year']] if filter_present else None

| Rank | Movie Name | Category | Votes | Year |
|---|---|---|---|---|
| 1 | The Godfather | R | 1860471 | 1972 |
| 2 | The Silence of the Lambs | R | 1435344 | 1991 |
| 3 | Star Wars: Episode V - The Empire Strikes Back | PG | 1294805 | 1980 |
| 4 | The Shawshank Redemption | R | 2683302 | 1994 |
| 5 | The Shining | R | 1025560 | 1980 |
| ... | ... | ... | ... | ... |
| 95 | The Usual Suspects | R | 1087832 | 1995 |
| 96 | Cool Hand Luke | GP | 178888 | 1967 |
| 97 | Eternal Sunshine of the Spotless Mind | R | 1011004 | 2004 |
| 98 | City Lights | G | 186059 | 1931 |


Made with ❤️ by Group 18.

February 2023