<a href="https://drive.google.com/file/d/14gzedf2oP9FE340sD1AGBhti0HNjDrqv/view?usp=sharing">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This tutorial needs data, so if you are working in Colab, follow the data setup instructions below.

# Data Setup Instructions

These instructions explain how to mount the data from Google Drive in Colab and access it from the notebook.

STEP 1 - After opening the tutorial in your Colab, click the folder button and then click on the option to mount Google Drive.

STEP 2 - The drive folder will be mounted under the current directory, /content. You can access it as shown below.

In [None]:
# print current directory
%pwd

'/content'

In [None]:
%ls

[0m[01;34mdrive[0m/  [01;34msample_data[0m/


STEP 3 - Find the folder where you saved the data and symlink it into the /content folder to simplify data access.

In this case the Data folder is located at the following path in Google Drive (use your own data path):

/content/drive/Othercomputers/My MacBook Pro/Data/

We can symlink it into the /content folder using the following command:

In [1]:
# symlink the original Data folder into /content (this creates /content/Data)
!ln -s "/content/drive/Othercomputers/My MacBook Pro/Data" "/content"

Now we can access the data through /content/Data by giving the file path after /Data.
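
To confirm that the link was created as expected, a quick check like the one below can help. It is a minimal sketch: the function name `describe_data_link` is hypothetical, and the default path `/content/Data` assumes the `ln -s` command above was run unchanged.

```python
import os


def describe_data_link(link_path="/content/Data"):
    """Return (is_symlink, resolved_target, sorted_entries) for link_path."""
    if not os.path.islink(link_path):
        return False, None, []
    # follow the link to the real folder in the mounted drive
    target = os.path.realpath(link_path)
    entries = sorted(os.listdir(link_path)) if os.path.isdir(link_path) else []
    return True, target, entries


is_link, target, entries = describe_data_link()
print(is_link, target, entries)
```

If the first value printed is `False`, the link does not exist at that path and the `ln -s` step should be checked.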

# **Install the jupyter-dash and dash libraries**

In [2]:
# install dash and jupyter dash
!pip install jupyter-dash
!pip install dash

Collecting jupyter-dash
  Downloading jupyter_dash-0.4.2-py3-none-any.whl (23 kB)
Collecting dash
  Downloading dash-2.6.1-py3-none-any.whl (9.9 MB)
Collecting ansi2html
  Downloading ansi2html-1.8.0-py3-none-any.whl (16 kB)
Collecting retrying
  Downloading retrying-1.3.3.tar.gz (10 kB)
Collecting plotly>=5.0.0
  Downloading plotly-5.10.0-py2.py3-none-any.whl (15.2 MB)
Collecting flask-compress
  Downloading Flask_Compress-1.12-py3-none-any.whl (7.9 kB)
Collecting dash-core-components==2.0.0
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-table==5.0.0
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting dash-html-components==2.0.0
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting tenacity>=6.2.0
  Downloading tenacity-8.0.1-py3-none-any.whl (24 kB)
Collecting brotli
  Downloading Brotli-1.0.9-cp39-cp39-win_amd64.whl (383 kB)
Building wheels for collected packages: retrying
  Building wheel for retrying (setu

# **Building an interactive dashboard**

The question we should ask as data analysts is:

What would business stakeholders like to see in order to assess movie preferences on their own?

The answer to this question is an interactive dashboard.

The reason for having a dashboard in this case is that we want to analyse the whole dataset based on two different rating systems across different dimensions.

Building a separate plot for each dimension and each specific requirement would take a lot of effort and delay the results. So we want to build a single dashboard from which we can analyse these two rating systems across dimensions such as genre, year released and money earned.


We will build a dashboard showing the IMDb and Rotten Tomatoes scores for movies on a scatter plot, filterable across the following dimensions:
* Year released
* Genre
* Income earned

This dashboard will help business people filter movies on their own and explore those movies' ratings.

Using this preliminary analysis, they can do their own further research.

Below we have written the code to build such an interactive dashboard.

For this dashboard we will use the combined_deployment_data.csv file.

# Dashboard code

In [3]:
# import the libraries
import pandas as pd
import plotly.express as px
from dash import dcc, html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash


In [4]:
# read the file
# when running from a local clone of the repo, the final dashboard file is saved in the repo's Data folder: 'Data/combined_deployment_data.csv'
dashboard_data = pd.read_csv('../Data/combined_deployment_data.csv') 
dashboard_data.head()

Unnamed: 0,original_title,year,genre,duration,country,language,imdb_score,worldwide_gross_income,tomatometer_rating,imdb_scaled
0,The Kid,1921,"Comedy, Drama, Family",68,USA,"English, None",8.3,0.026916,100.0,83.0
1,A Woman of Paris: A Drama of Fate,1923,"Drama, Romance",82,USA,"None, English",7.0,0.011233,92.0,70.0
2,The Gold Rush,1925,"Adventure, Comedy, Drama",95,USA,"English, None",8.2,0.026916,100.0,82.0
3,Metropolis,1927,"Drama, Sci-Fi",153,Germany,German,8.3,1.349711,97.0,83.0
4,Sunrise: A Song of Two Humans,1927,"Drama, Romance",94,USA,English,8.1,0.121107,98.0,81.0


In [5]:
dashboard_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7148 entries, 0 to 7147
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   original_title          7148 non-null   object 
 1   year                    7148 non-null   int64  
 2   genre                   7148 non-null   object 
 3   duration                7148 non-null   int64  
 4   country                 7148 non-null   object 
 5   language                7142 non-null   object 
 6   imdb_score              7148 non-null   float64
 7   worldwide_gross_income  7148 non-null   float64
 8   tomatometer_rating      7143 non-null   float64
 9   imdb_scaled             7148 non-null   float64
dtypes: float64(4), int64(2), object(4)
memory usage: 558.6+ KB
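
The `info()` output shows 7143 non-null `tomatometer_rating` values and 7142 non-null `language` values out of 7148 rows, so a few rows have missing values. The sketch below uses a small made-up frame (not the tutorial's data) to show one way to count missing values per column and to drop rows without a Rotten Tomatoes rating before plotting. Whether to drop them is a judgment call that the original notebook does not make.

```python
import numpy as np
import pandas as pd

# made-up rows for illustration only
sample = pd.DataFrame({
    "original_title": ["A", "B", "C"],
    "imdb_scaled": [83.0, 70.0, 82.0],
    "tomatometer_rating": [100.0, np.nan, 97.0],
})

# number of missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# keep only the rows that have a tomatometer rating
plottable = sample.dropna(subset=["tomatometer_rating"])
print(len(plottable))
```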


In [6]:
# create genre list
def convert_genre_list(genre):
    """Split a comma-separated genre string into a list of stripped genre names."""
    return [x.strip() for x in genre.split(',')]

# build one row per (movie, genre) pair so the distinct genres can be collected
list_genre = dashboard_data[['genre']].copy()
list_genre['genre_list'] = list_genre['genre'].apply(convert_genre_list)
list_genre_explode = list_genre.drop(columns=['genre']).explode('genre_list')

# final genre list: sorted unique genres plus an 'All Genre' option for the dropdown
genre = sorted(list_genre_explode['genre_list'].unique()) + ['All Genre']
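
To see what the split-and-explode step produces, here is a small self-contained sketch. The three genre strings are copied from the `head()` output above; the variable names `sample` and `genre_options` are illustrative.

```python
import pandas as pd

# three genre strings taken from the first rows of the dataset
sample = pd.DataFrame({
    "genre": ["Comedy, Drama, Family", "Drama, Romance", "Drama, Sci-Fi"],
})

# split on commas, put each genre on its own row, strip spaces, de-duplicate
genre_options = sorted(
    sample["genre"].str.split(",").explode().str.strip().unique()
) + ["All Genre"]
print(genre_options)
```

Each distinct genre appears once, in alphabetical order, followed by the extra `All Genre` entry used by the dropdown.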

In [9]:
# Create the Dash app
app = JupyterDash(__name__)

# Set up the app layout
app.layout = html.Div(children=[
    html.H1(children='IMDb vs Rotten Tomatoes Ratings Dashboard'),
    html.H2(children='Year Released'),
    dcc.RangeSlider(
            id='year-released-range-slider',
            min=dashboard_data.year.min(),
            max=dashboard_data.year.max(),
            marks={str(y): str(y) for y in range(int(dashboard_data.year.min()), int(dashboard_data.year.max()), 5)},
            value=[dashboard_data.year.min(), dashboard_data.year.max()]
        ),
    html.Br(),
    html.H2(children='Box Office Earnings (in millions)'),
    dcc.RangeSlider(
            id='box-office-range-slider',
            min=dashboard_data.worldwide_gross_income.min(),
            max=dashboard_data.worldwide_gross_income.max(),
            marks={str(y): str(y) for y in range(int(dashboard_data.worldwide_gross_income.min()), \
                                                 int(dashboard_data.worldwide_gross_income.max()), 200)},
            value=[dashboard_data.worldwide_gross_income.min(), dashboard_data.worldwide_gross_income.max()]
        ),
    html.Br(),
    html.H2(children='Genre'),
    dcc.Dropdown(
        id = 'genre-dropdown',
        options=[{'label':i,'value':i} for i in genre],
        value='All Genre'
    ),
    html.Br(),
    dcc.Graph(id='rating-graph')
])


# Set up the callback function
@app.callback(
    Output(component_id='rating-graph', component_property='figure'),
    [
     Input(component_id='year-released-range-slider', component_property='value'),
     Input(component_id='box-office-range-slider',component_property='value'),
     Input(component_id='genre-dropdown',component_property='value')
    ]
)
def update_graph(selected_year, gross_income, genre_name):
    year_released_start, year_released_end = selected_year
    gross_income_start, gross_income_end = gross_income
    # keep movies released within the selected year range
    filtered_df1 = dashboard_data.loc[(dashboard_data['year'] >= year_released_start) & (dashboard_data['year'] <= year_released_end)]
    # keep movies whose worldwide gross income falls within the selected range
    filtered_df2 = filtered_df1.loc[(filtered_df1['worldwide_gross_income'] >= gross_income_start) & (filtered_df1['worldwide_gross_income'] <= gross_income_end)]
    # an empty string matches every row, so 'All Genre' (or a cleared dropdown) applies no genre filter
    if genre_name is None or genre_name == 'All Genre':
        genre_name_select = ''
    else:
        genre_name_select = genre_name
    # regex=False treats the genre name as a literal substring rather than a pattern
    filtered_final = filtered_df2.loc[filtered_df2['genre'].str.contains(genre_name_select, regex=False)]
    scatter_fig = px.scatter(filtered_final,
                       x='imdb_scaled', y='tomatometer_rating', hover_name='original_title',
                       hover_data=['genre', 'worldwide_gross_income', 'year'],
                       range_x=[0, 100], range_y=[-10, 110],
                       title=f'Rating comparison - years selected {selected_year} - box office range {gross_income} - genre {genre_name}')
    return scatter_fig


# Run local server
app.run_server(mode='inline')
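
The filtering inside `update_graph` can also be pulled out into a plain function so that it can be checked without starting a Dash server. The sketch below is one possible refactoring, not the notebook's code: the function name `filter_movies` is hypothetical, and the sample rows are copied from the `head()` output. It matches genres exactly against each movie's split genre list, which differs from the substring match in the callback, where a name such as `Music` would also match `Musical` if both occur in the data.

```python
import pandas as pd


def filter_movies(df, year_range, income_range, genre_name="All Genre"):
    """Return the rows of df inside both ranges that carry the chosen genre."""
    year_start, year_end = year_range
    income_start, income_end = income_range
    # both range checks are inclusive, like the >= and <= in the callback
    mask = df["year"].between(year_start, year_end) & df[
        "worldwide_gross_income"
    ].between(income_start, income_end)
    if genre_name not in (None, "All Genre"):
        # exact match against each movie's list of genres
        genre_lists = df["genre"].str.split(",").apply(
            lambda names: [name.strip() for name in names]
        )
        mask &= genre_lists.apply(lambda names: genre_name in names)
    return df.loc[mask]


sample = pd.DataFrame({
    "original_title": ["The Kid", "Metropolis", "Sunrise: A Song of Two Humans"],
    "year": [1921, 1927, 1927],
    "genre": ["Comedy, Drama, Family", "Drama, Sci-Fi", "Drama, Romance"],
    "worldwide_gross_income": [0.026916, 1.349711, 0.121107],
})

result = filter_movies(sample, (1925, 1930), (0.0, 2.0), "Sci-Fi")
print(result["original_title"].tolist())
```

With a function like this, the callback body would reduce to calling `filter_movies` and passing the result to `px.scatter`.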

We will deploy this final dashboard to a public webpage.