## Assignment: Box Office Winner

In this assignment, your task is to reverse engineer a provided visualization from raw data. Specifically, we will visualize the daily box office winners in 2023. The raw data comes from [BoxOfficeMojo](https://www.boxofficemojo.com/daily/2023/?view=year). The target visualization is the following.

![Box Office Winner 2023](https://private-user-images.githubusercontent.com/3606672/304127856-f404debd-b1bf-4a98-933e-d3b27e3b3921.svg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc3NTAyNjAsIm5iZiI6MTcwNzc0OTk2MCwicGF0aCI6Ii8zNjA2NjcyLzMwNDEyNzg1Ni1mNDA0ZGViZC1iMWJmLTRhOTgtOTMzZS1kM2IyN2UzYjM5MjEuc3ZnP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIxMiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMTJUMTQ1OTIwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZGZlZTU1OTc4YWNhMDEyNGM5MGM5NjcxOGY4NDM1ZGQ1YTQyZmM1MjY3MzYzNzhiNzUzODc3N2VjZGY0N2Q0MyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Dj9k8qT8S0xRrVT6TM6OoNaAVerrwJLHRbzd9VAvbWs)

The X-axis is the time axis, spanning January 1, 2023 to December 31, 2023. The Y-axis has one row per top release. Rounded bars mark the stretches of days during which a release held the top spot at the box office. Each top release has its own color, and its title is shown just to the left of its first bar. Releases are ordered by the date on which they first reached the top position.
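The row-ordering rule can be sketched in a few lines of plain Python. This is only an illustration on a small hand-written sample (the `daily_winners` list and the `row_order` name are assumptions, not part of the assignment data); the notebook itself relies on Altair's sorting instead.

```python
# Hypothetical sample of (date, top release) pairs, already sorted by date.
daily_winners = [
    ("2023-01-05", "Avatar: The Way of Water"),
    ("2023-01-06", "M3GAN"),
    ("2023-01-08", "Avatar: The Way of Water"),
    ("2023-01-25", "Pathaan"),
]

# A dict preserves insertion order, so fromkeys keeps each title at the
# position of its first appearance and ignores later repeats.
row_order = list(dict.fromkeys(title for _, title in daily_winners))
print(row_order)
```

A release that returns to the top later, as Avatar does in this sample, keeps the row it was given when it first appeared.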

In [1]:
import altair as alt
import pandas as pd

url = "https://github.com/qnzhou/practical_data_visualization_in_python/files/14239903/box_office_2023.csv"
df = pd.read_csv(url)

In [2]:
# Reformat and sort by date
df['Date'] = pd.to_datetime(df['Date'], format='%b %d %Y')
df = df.sort_values(by='Date').reset_index(drop=True)
df.head()

Unnamed: 0,Date,Holiday,Day of Week,Top 10 Gross,Number of Releases,Top Release,Gross
0,2023-01-01,New Year's Day,Sunday,36210982,31,Avatar: The Way of Water,24519161
1,2023-01-02,,Monday,32548656,30,Avatar: The Way of Water,21411622
2,2023-01-03,,Tuesday,16965068,31,Avatar: The Way of Water,10544729
3,2023-01-04,,Wednesday,12131291,30,Avatar: The Way of Water,7475308
4,2023-01-05,,Thursday,10864987,30,Avatar: The Way of Water,6830651
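The chunking step that follows compares each date with the previous one, so it helps to know whether any calendar day is absent from the data. The check below is a minimal, dependency-free sketch; `missing_days` is a hypothetical helper and the sample dates are made up for illustration.

```python
from datetime import date, timedelta

def missing_days(dates, start, end):
    """Return the calendar days in [start, end] that are absent from dates."""
    present = set(dates)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in present]

# Hypothetical sample in which January 3rd is missing.
sample = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 4)]
gaps = missing_days(sample, date(2023, 1, 1), date(2023, 1, 4))
print(gaps)
```

With the DataFrame loaded above, the dates could presumably be passed in as `df['Date'].dt.date`, with `date(2023, 1, 1)` and `date(2023, 12, 31)` as the bounds.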


In [3]:
# Get "date chunks" for each release
top_releases = df['Top Release'].unique()
date_chunks = []

for release in top_releases:
    release_df = df[df['Top Release'] == release]
    release_df = release_df.sort_values(by='Date')

    # Find consecutive date chunks for the current release
    start_date, end_date, previous_date = None, None, None

    for _, row in release_df.iterrows():
        current_date = row['Date']
        if start_date is None:
            # Start of a new chunk
            start_date = current_date
            end_date = current_date + pd.Timedelta(days=1)
        elif current_date == previous_date + pd.Timedelta(days=1):
            # Continuation of the current chunk
            end_date = current_date + pd.Timedelta(days=1)
        else:
            # End of the current chunk: save it, then start a new chunk at the current row
            date_chunks.append({'Top Release': release, 'Start Date': start_date, 'End Date': end_date})
            start_date = current_date
            end_date = current_date + pd.Timedelta(days=1)
        previous_date = current_date

    # Add the last chunk for the movie
    if start_date is not None:
        date_chunks.append({'Top Release': release, 'Start Date': start_date, 'End Date': end_date})

# Convert the chunks list to a DataFrame
df_chunks = pd.DataFrame(date_chunks)
# Sort the chunks chronologically and renumber the rows
df_chunks = df_chunks.sort_values(by='Start Date').reset_index(drop=True)
df_chunks.head()

Unnamed: 0,Top Release,Start Date,End Date
0,Avatar: The Way of Water,2023-01-01,2023-01-06
1,M3GAN,2023-01-06,2023-01-07
2,Avatar: The Way of Water,2023-01-07,2023-01-25
3,Pathaan,2023-01-25,2023-01-26
4,Avatar: The Way of Water,2023-01-26,2023-02-02
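The same "date chunk" idea can also be expressed as run-length grouping. The sketch below is not the notebook's method: it uses `itertools.groupby` from the standard library and assumes the input has exactly one row per calendar day, sorted by date, so that a change of title is the only way a run can end. The `date_chunks_by_run` name and the sample rows are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

def date_chunks_by_run(daily_winners):
    """Collapse (day, title) pairs, sorted by day, into (title, start, end) runs.

    The end date is exclusive (last day of the run plus one day), matching
    the 'End Date' convention used above.
    """
    chunks = []
    # groupby only merges adjacent items, so each group is one unbroken run.
    for title, run in groupby(daily_winners, key=lambda pair: pair[1]):
        days = [day for day, _ in run]
        chunks.append((title, days[0], days[-1] + timedelta(days=1)))
    return chunks

# Hypothetical sample: one row per day, sorted by date.
sample = [
    (date(2023, 1, 5), "Avatar: The Way of Water"),
    (date(2023, 1, 6), "M3GAN"),
    (date(2023, 1, 7), "Avatar: The Way of Water"),
    (date(2023, 1, 8), "Avatar: The Way of Water"),
]
chunks = date_chunks_by_run(sample)
for chunk in chunks:
    print(chunk)
```

Because consecutive chunks share a boundary (one ends on the day the next begins), the bars in the chart tile the year without gaps when every day has a winner.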


In [4]:
# Create the base chart with the date chunks
base_chart = alt.Chart(df_chunks).mark_bar().encode(
    x=alt.X('Start Date:T',
            axis=alt.Axis(title=None, orient='top'),
            scale=alt.Scale(
                domain=[pd.to_datetime('2023-01-01'), pd.to_datetime('2024-01-01')])),
    x2='End Date:T',
    y=alt.Y('Top Release:N', sort='x', axis=alt.Axis(labels=False, ticks=False,
                                                     title=None)),
    color=alt.Color('Top Release:N', legend=None)).properties(width=1000, height=800)

# Add text labels for the first chunk of each release
first_chunks = df_chunks.drop_duplicates(subset=['Top Release'], keep='first')
first_chunk_text = alt.Chart(first_chunks).mark_text(
    fontSize=10,
    fontWeight=500,
    align='right',
    baseline='middle',
    dx=-5,
).encode(x='Start Date:T', y=alt.Y('Top Release:N', sort='x'), text='Top Release:N', color='Top Release:N')

# Create the title for the chart
title = alt.TitleParams(
    text='Box Office Winners',
    fontSize=32,
    color='grey',
    fontWeight=700,
    align='center',
    anchor='end',
    subtitle='Daily top release in 2023',
    subtitleColor='grey',
    subtitleFontSize=14,
    subtitleFontWeight=500,
    dx=-300,
    dy=205
)

# Combine all elements to create the final chart
final_chart = (base_chart + first_chunk_text).properties(title=title).configure_mark(
    cornerRadiusTopLeft=6, cornerRadiusBottomLeft=6, cornerRadiusTopRight=6, cornerRadiusBottomRight=6)

# Display the final chart
final_chart




In [5]:
final_chart.save('chart.png')

