<a href="https://colab.research.google.com/drive/1n1JQIbrbVZFE4JpATo5xw4-6A0t6Kd7Y?usp=sharing">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This tutorial needs data, so if you are working in Colab, follow the data setup instructions below.

# Data Setup Instructions

These instructions show how to mount your Google Drive in Colab and access the data from the notebook.

STEP 1 - After opening the tutorial in Colab, click the folder button and then click the option to mount Google Drive.

STEP 2 - The drive folder will be mounted in the current directory, /content. You can access it as shown below.

In [None]:
# print current directory
%pwd

'/content'

In [None]:
%ls

[0m[01;34mdrive[0m/  [01;34msample_data[0m/


STEP 3 - Find the folder where you saved the data and symlink it into the /content folder to simplify data access.

In this case the Data folder is located at the following path in Google Drive (use your own data path in your case):

/content/drive/Othercomputers/My MacBook Pro/Data/

We can symlink it into the /content folder using the following command:

In [2]:
# symlink the original Data folder into /content (this creates /content/Data)
!ln -s "/content/drive/Othercomputers/My MacBook Pro/Data" "/content"

Now we can access the data through this link by giving file paths that start with Data/.
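To make the effect of `ln -s` concrete, here is a minimal Python sketch that reproduces the same idea in a temporary directory rather than in /content. The folder names `drive_data` and `content` and the file `example.csv` are hypothetical stand-ins, and only the standard library is used.

```python
import os
import tempfile

# Work in a throwaway directory so nothing on the real filesystem changes
root = tempfile.mkdtemp()

# Hypothetical stand-in for the long Google Drive data path
original = os.path.join(root, "drive_data", "Data")
os.makedirs(original)
with open(os.path.join(original, "example.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# Hypothetical stand-in for /content
content = os.path.join(root, "content")
os.makedirs(content)

# Equivalent of: ln -s "<original>" "<content>", which creates <content>/Data
link = os.path.join(content, "Data")
os.symlink(original, link)

# The file is now reachable through the short path
with open(os.path.join(link, "example.csv")) as f:
    first_line = f.readline().strip()

print(os.path.islink(link), first_line)
```

The link is only a pointer: the files stay in the original folder, and anything read through `Data/` comes from there.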

# Importing pandas library and data loading

In [3]:
import pandas as pd

In this lesson we will be using the movies_cleaned.csv file.

The lesson instructions for Pandas - Advanced Real World Data Analysis mention that you need to rename the file:

Movies_cleaned_lesson2.csv (created in lesson 2 of Pandas - Data Cleaning) -> movies_cleaned.csv

The file is saved in the same path as the rest of the IMDB dataset, i.e.

"Data/IMDB_rotten_tomato_dataset/IMDB/movies_cleaned.csv"

You can read this file as shown below.

In [4]:
# if you are working with this tutorial on local machine use the file path where the data is saved in your computer
movies_cleaned = pd.read_csv("Data/IMDB_rotten_tomato_dataset/IMDB/movies_cleaned.csv")
# We can use .head command to quickly observe the first 5 rows of the dataset
movies_cleaned.head()

Unnamed: 0,imdb_title_id,original_title,year,date_published,genre,duration,country,language,imdb_score,votes,budget,usa_gross_income,worldwide_gross_income,metascore,movie_age
0,tt0000009,Miss Jerry,1894,1894-10-09,Romance,45,USA,,5.9,154,,,,,127
1,tt0000574,The Story of the Kelly Gang,1906,1906-12-26,"Biography, Crime, Drama",70,Australia,,6.1,589,$ 2250,,,,115
2,tt0001892,Den sorte drøm,1911,1911-08-19,Drama,53,"Germany, Denmark",,5.8,188,,,,,110
3,tt0002101,Cleopatra,1912,1912-11-13,"Drama, History",100,USA,English,5.2,446,$ 45000,,,,109
4,tt0002130,L'Inferno,1911,1911-03-06,"Adventure, Drama, Fantasy",68,Italy,Italian,7.0,2237,,,,,110
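The `Unnamed: 0` column in the output above usually appears when a DataFrame's index was written to the CSV and then read back as an ordinary column. A minimal sketch, assuming that is what happened here: passing `index_col=0` to `pd.read_csv` uses that column as the index instead. The inline CSV reuses three rows and a subset of the columns from the output above, so the example runs without the data file.

```python
import io
import pandas as pd

# Three rows copied from the output above (subset of columns), with the
# unnamed leading index column that DataFrame.to_csv writes by default
csv_text = """,imdb_title_id,original_title,year,genre,duration,imdb_score
0,tt0000009,Miss Jerry,1894,Romance,45,5.9
1,tt0000574,The Story of the Kelly Gang,1906,"Biography, Crime, Drama",70,6.1
2,tt0001892,Den sorte drøm,1911,Drama,53,5.8
"""

# Default read: the leading column shows up as 'Unnamed: 0'
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns[0])
print(list(sample.columns))
```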


# **Installing important dashboard libraries**

In this tutorial we jump directly into the code for creating a dashboard with dash.

We will not cover every detail of the dash library; we only walk through the high-level flow.

Before we start building the dashboard, we need to pip install the following two libraries in Colab.

In [5]:
!pip install dash
!pip install jupyter-dash

Collecting dash
  Downloading dash-2.0.0-py3-none-any.whl (7.3 MB)
Collecting plotly>=5.0.0
  Downloading plotly-5.3.1-py2.py3-none-any.whl (23.9 MB)
Collecting dash-core-components==2.0.0
  Downloading dash_core_components-2.0.0.tar.gz (3.4 kB)
Collecting dash-table==5.0.0
  Downloading dash_table-5.0.0.tar.gz (3.4 kB)
Collecting flask-compress
  Downloading Flask_Compress-1.10.1-py3-none-any.whl (7.9 kB)
Collecting dash-html-components==2.0.0
  Downloading dash_html_components-2.0.0.tar.gz (3.8 kB)
Collecting tenacity>=6.2.0
  Downloading tenacity-8.0.1-py3-none-any.whl (24 kB)
Collecting brotli
  Downloading Brotli-1.0.9-cp37-cp37m-manylinux1_x86_64.whl (357 kB)
Building wheels for collected packages: dash-core-components, dash-html-components, dash-table
  Building wheel for dash-core-

Above we installed two libraries:
* jupyter-dash
* dash

**dash** is the main library we will use to make the final dashboard. A dashboard can be built in the following places:
* In a Jupyter notebook or Colab
* In your local browser, using a .py script file
* Online, as a deployed dashboard

In this tutorial we focus only on the first option, i.e. building the dashboard in a Jupyter notebook or Colab. We will cover the second and third options in the final chapter of the course.

To work with a dashboard inside a Jupyter notebook, we need the **jupyter-dash** library.

# **Dashboard - Building Simple Layout**

Below is the code to create our first simple dashboard using dash. This dashboard does not contain any graph yet; we are only building the layout, i.e. how the dashboard will look.

In [6]:
# Import libraries
from jupyter_dash import JupyterDash
#import dash_html_components as html
from dash import html
#import dash_core_components as dcc
from dash import dcc
from dash.dependencies import Input, Output
import pandas as pd
import plotly.express as px

# Have the dataset loaded - movies_cleaned is the dataframe

# create a new object for the dash app as shown below
app = JupyterDash(__name__)

# Layout section of dashboard basically how the app would look like
app.layout = html.Div(children=[
    html.H1(children='Imdb Ratings Dashboard'),
    html.H2(children='Year Released'),
    dcc.RangeSlider(
            id='year-released-range-slider',
            min=movies_cleaned.year.min(),
            max=movies_cleaned.year.max(),
            marks={str(y): str(y) for y in range(int(movies_cleaned.year.min()), int(movies_cleaned.year.max()), 5)},
            value=[movies_cleaned.year.min(), movies_cleaned.year.max()]
        )
]
)

# run the app
app.run_server(mode='inline')



Let's understand each part of the above code one by one.

![dash_simple_code.png](https://drive.google.com/uc?export=view&id=1gVJqlhEDcEKbNvwg9xoAG6TB_a87-v9f)

The syntax of PART 1, 2 and 4 stays the same for every dashboard we make later on.

**PART-1**<br>
In the import section of the code, we import the following libraries and functions:
1. First we import JupyterDash from jupyter-dash (the equivalent of the dash library for making dashboards in a Jupyter notebook).
2. Next we import three things from the dash library:
  * html (two styles of import statement are shown for html, one of them commented out; either can be used)
  * dcc (two styles of import statement are shown for dcc, one of them commented out; either can be used)
  * Input and Output
3. Finally we import the pandas and plotly libraries.

Our dataframe movies_cleaned is already loaded, so we don't have to do anything there.

**PART-2**<br>
The only thing you need to understand about PART-2 at this stage is that a dashboard made with the dash or jupyter-dash library is stored in the variable 'app' defined in PART-2.

**PART-4**<br>
In PART-4, the function run_server starts the dashboard, i.e. it shows the dashboard as output.

**PART-3**<br>
Going deeper into the dashboard in PART-3: each part of the PART-3 code determines how the dashboard looks, as shown below.

![dashboard_layout.png](https://drive.google.com/uc?export=view&id=1TMd1UHhevdi9mb7K-KJAWKtZMpKApzGY)

**The blue shaded part of code**<br>
Here we define 3 elements of the dashboard as three items in the list passed to the 'children' argument of 'html.Div' (html is imported at the top from dash).

The list given to children contains the following 3 elements:<br>
1. html.H1(children='Imdb Ratings Dashboard')
  * This makes the first heading of the dashboard

2. html.H2(children='Year Released')
  * This makes the second heading of the dashboard

3. Range slider code<br>
This creates the 'year' range slider.

When we move the handles on the year range slider, the slider produces two years:
1. year start - the start point of the range slider
2. year end - the end point of the range slider

The components of the range slider are defined below.

In [None]:
# the range slider is made using 'dcc', which is imported at the top of the code
# dcc - dash core components
# dcc.RangeSlider - this creates a range slider UI element in the dashboard
# we pass 5 arguments to this UI element

dcc.RangeSlider(
            ## id used to refer to the range-slider in later parts of the code
            id='year-released-range-slider', 
            ## the minimum value of the range-slider: the min. year value in the dataframe
            min=movies_cleaned.year.min(),   
            ## the maximum value of the range-slider: the max. year value in the dataframe
            max=movies_cleaned.year.max(),   
            ## marks is a dictionary of the marks shown on the range-slider, made using a dictionary comprehension
            ## each key is the position of a mark on the slider (a year) and each value is the label shown there
            marks={str(y): str(y) for y in range(int(movies_cleaned.year.min()), int(movies_cleaned.year.max()), 5)},
            ## value is the initial selection of the range-slider
            ## here it is the minimum and maximum of the year column
            value=[movies_cleaned.year.min(), movies_cleaned.year.max()]
        )
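To see what the marks comprehension produces, here is a small pure-Python sketch. The years 1894 and 1920 are placeholder bounds standing in for movies_cleaned.year.min() and movies_cleaned.year.max(). Note that range() stops before its end value, so the maximum year only gets a mark if it happens to fall on a 5-year step.

```python
# Placeholder bounds standing in for movies_cleaned.year.min() / .max()
year_min, year_max = 1894, 1920

# Same comprehension as in the slider: one mark every 5 years
marks = {str(y): str(y) for y in range(year_min, year_max, 5)}
print(list(marks))

# range() excludes its end value, so 1920 itself has no mark here
print(str(year_max) in marks)
```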


# **Dashboard - Building Scatter Plot with Year Range as Input**

So far, the code above has only created the layout of the dashboard.

What we ultimately want from the dashboard is a graph we can interact with, i.e. when we change something in the dashboard, the graph changes too.

Suppose we want a dashboard that shows a scatter plot of duration and imdb_score for movies from a given year range (in this case the year range is selected using the range slider).

We can extend the code above as follows to create the year-filtered movie dashboard.

In [7]:
# Import libraries
from jupyter_dash import JupyterDash
#import dash_html_components as html
from dash import html
#import dash_core_components as dcc
from dash import dcc
from dash.dependencies import Input, Output
import pandas as pd
import plotly.express as px

# Have the dataset loaded - movies_cleaned is the dataframe

# create a new object for the dash app as shown below
app = JupyterDash(__name__)

# Layout section of dashboard basically how the app would look like
app.layout = html.Div(children=[
    html.H1(children='Imdb Ratings Dashboard'),
    html.H2(children='Year Released'),
    dcc.RangeSlider(
            id='year-released-range-slider',
            min=movies_cleaned.year.min(),
            max=movies_cleaned.year.max(),
            marks={str(y): str(y) for y in range(int(movies_cleaned.year.min()), int(movies_cleaned.year.max()), 5)},
            value=[movies_cleaned.year.min(), movies_cleaned.year.max()]
        ),
    dcc.Graph(id='rating-duration-graph')
]
)

# add interactivity to the code - basically create a scatter plot
# we will filter movies to the year range selected with the range-slider
# then for those movies we will make scatter plot of imdb_score and duration
# Set up the callback function
@app.callback(
    Output(component_id='rating-duration-graph', component_property='figure'),
    [
     Input(component_id='year-released-range-slider', component_property='value')
    ]
)
def update_graph(selected_year):
    year_released_start, year_released_end = selected_year
    
    filtered_df = movies_cleaned.loc[(movies_cleaned['year'] >= year_released_start)&(movies_cleaned['year'] <= year_released_end)]
    
    scatter_fig = px.scatter(filtered_df,
                       x='duration', y='imdb_score',hover_name='original_title',
                       hover_data=['genre','year'],
                       title='Ratings and duration for movies from {} to {}'.format(year_released_start,year_released_end))
    return scatter_fig



# run the app
app.run_server(mode='inline')



The image below shows the role of the new elements added to the code above.

![dashboard_interactivity_code_flow.png](https://drive.google.com/uc?export=view&id=1ZaYOTpBfvF0G5_F5i-XapJI6zJAdIaIm)

We have added 3 new elements to the code above:
1. A graph UI element, added using dcc.Graph
2. @app.callback, a decorator that connects the function update_graph to the app layout
3. The function update_graph, which draws the scatter plot of duration and imdb_score for the given year range

The image below shows how @app.callback works. (We are not going in depth into how a decorator works; the image just shows the input and output of the @app.callback decorator.)

![dashboard_decorator_code_flow.png](https://drive.google.com/uc?export=view&id=1HOdhVNyJEwcaQzC3geemTNWLpaxlw7DK)

Every time we want a graph in the dashboard to respond to user input, we need to define an @app.callback.

@app.callback always has an Input and an Output. <br>
Input - imported at the top from dash.dependencies<br>
Output - imported at the top from dash.dependencies <br>

The Input and Output always refer to components in the layout (by their id) and are tied to the graph-making function defined right below the decorator.
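As a rough mental model only, the sketch below shows how a decorator can register a function against a named output and input. This is a toy illustration, not how dash implements @app.callback; `ToyApp`, `set_input` and the `state` dictionary are hypothetical names invented for this example.

```python
class ToyApp:
    """Toy stand-in for a dashboard app; not dash's implementation."""

    def __init__(self):
        self.callbacks = []   # registered (output_id, input_id, function) triples
        self.state = {}       # latest value held by each component id

    def callback(self, output_id, input_id):
        # Returns a decorator that records the function and hands it back unchanged
        def decorator(func):
            self.callbacks.append((output_id, input_id, func))
            return func
        return decorator

    def set_input(self, input_id, value):
        # Simulates the user moving a control: rerun every function wired to it
        self.state[input_id] = value
        for output_id, wired_input, func in self.callbacks:
            if wired_input == input_id:
                self.state[output_id] = func(value)


toy_app = ToyApp()

@toy_app.callback(output_id='rating-duration-graph', input_id='year-released-range-slider')
def update_title(selected_year):
    start, end = selected_year
    return 'Ratings and duration for movies from {} to {}'.format(start, end)

# Moving the "slider" triggers the registered function and stores its result
toy_app.set_input('year-released-range-slider', [2010, 2015])
print(toy_app.state['rating-duration-graph'])
```

The point to take away is the wiring: the ids given to the decorator decide which control triggers the function and which component receives its return value.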


Coming to the scatter plot function 'update_graph': it takes the 2 year values as a single argument, selected_year.

In [8]:
def update_graph(selected_year):
    year_released_start, year_released_end = selected_year
    
    filtered_df = movies_cleaned.loc[(movies_cleaned['year'] >= year_released_start)&(movies_cleaned['year'] <= year_released_end)]
    
    scatter_fig = px.scatter(filtered_df,
                       x='duration', y='imdb_score',hover_name='original_title',
                       hover_data=['genre','year'],
                       title='Ratings and duration for movies from {} to {}'.format(year_released_start,year_released_end))
    return scatter_fig

update_graph((2010,2015)).show()
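Because update_graph is a plain Python function, its filtering step can also be checked on its own, without plotting. A minimal sketch, using the five rows shown in the head() output earlier; the helper name `filter_by_year` is hypothetical, and `Series.between` (inclusive on both ends by default) is used as an equivalent of the two comparisons.

```python
import pandas as pd

# Five rows taken from the head() output earlier in this tutorial
sample = pd.DataFrame({
    'original_title': ['Miss Jerry', 'The Story of the Kelly Gang',
                       'Den sorte drøm', 'Cleopatra', "L'Inferno"],
    'year': [1894, 1906, 1911, 1912, 1911],
    'duration': [45, 70, 53, 100, 68],
    'imdb_score': [5.9, 6.1, 5.8, 5.2, 7.0],
})

def filter_by_year(df, selected_year):
    # Hypothetical helper: same inclusive year filter as in update_graph
    start, end = selected_year
    return df.loc[df['year'].between(start, end)]

filtered = filter_by_year(sample, (1906, 1911))
print(filtered['original_title'].tolist())
```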

# **Dashboard - Building Scatter Plot with Genre and Year Range as Input**

In the code above we learnt how to take one input from the user (the year range in our case) and change the final graph based on it.

Now we are going to take one more input from the user and change the final graph based on that as well.

We will add a dropdown menu to select a genre.

Based on the selected genre and the selected year range, we will filter the movies.

Then we will make the scatter plot from these filtered movies.

To make a dropdown of all the genres in the dataset, we first need a list of the genres.

The code below uses explode and groupby to create the list of unique genres. We also add one more item to this list, 'All Genre'.

In [9]:
# create genre list
def convert_genre_list(genre):
  split_genre = genre.split(',')
  remove_spaces_genre_list = [x.strip() for x in split_genre]
  return remove_spaces_genre_list

# split each movie's genre string into a list, explode it to one genre per row, then group to get the unique genres
list_genre = movies_cleaned[['genre']].copy()
list_genre['genre_list'] = list_genre.apply(lambda row:convert_genre_list(row['genre']),axis=1)
list_genre.drop(['genre'],axis=1,inplace=True)
list_genre_explode = list_genre.explode('genre_list') 
list_genre_groupby = list_genre_explode.groupby('genre_list').size().reset_index().drop([0],axis=1)

# final genre list
genre = list(list_genre_groupby['genre_list'].unique()) + ['All Genre']
print(genre)

['Action', 'Adult', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy', 'Film-Noir', 'History', 'Horror', 'Music', 'Musical', 'Mystery', 'News', 'Reality-TV', 'Romance', 'Sci-Fi', 'Sport', 'Thriller', 'War', 'Western', 'All Genre']
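The same list can probably be built more compactly with pandas string methods. A minimal sketch, run on the five genre strings from the head() output earlier rather than on the full dataset; `unique_genres` and `genre_options` are names chosen for this example only.

```python
import pandas as pd

# Genre strings of the five rows shown in the head() output
genres = pd.Series(['Romance', 'Biography, Crime, Drama', 'Drama',
                    'Drama, History', 'Adventure, Drama, Fantasy'])

# Split on commas, put one genre per row, trim spaces, keep unique values
unique_genres = sorted(genres.str.split(',').explode().str.strip().unique())

# Add the extra 'All Genre' entry, as in the code above
genre_options = unique_genres + ['All Genre']
print(genre_options)
```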


We will use this genre list when making the dropdown UI element in the dashboard layout.

Let's first look at how the update_graph function has changed. It now takes two arguments:
1. selected_year - the year range
2. genre_name - the genre selected in the dropdown

In [13]:
def update_graph(selected_year,genre_name):
    year_released_start, year_released_end = selected_year
    # first filtering is done over year range of movies
    filtered_df1 = movies_cleaned.loc[(movies_cleaned['year'] >= year_released_start)&(movies_cleaned['year'] <= year_released_end)]
    
    #if genre_name is 'All Genre' then the genre_name_select becomes '' (empty string)
    # which means we are selecting all the genres
    # otherwise genre_name_select is basically the argument genre_name given to function
    if genre_name == 'All Genre':
      genre_name_select = ''
    else:
      genre_name_select = genre_name
    
    #selection done over filtered_df1 - movies with a given year range is selected here
    filtered_final = filtered_df1.loc[filtered_df1['genre'].str.contains(genre_name_select)]
    
    scatter_fig = px.scatter(filtered_final,
                       x='duration', y='imdb_score',hover_name='original_title',
                       hover_data=['genre','year'],
                       title='Ratings and duration for movies from {} to {} for genre {}'.format(year_released_start,year_released_end,genre_name))
    return scatter_fig

update_graph((2010,2015),'Drama').show()
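One caveat with `str.contains`: it matches substrings, so selecting 'Music' also matches movies tagged 'Musical' (both appear in the genre list printed earlier). A possible alternative, sketched on a small made-up Series rather than the real data, is to split each genre string and test for exact membership. The helper `has_genre` is a hypothetical name.

```python
import pandas as pd

# Made-up genre strings for illustration only
genres = pd.Series(['Music', 'Comedy, Musical', 'Drama, Music', 'Drama'])

# Substring match: the 'Musical' row is picked up as well
substring_mask = genres.str.contains('Music')

def has_genre(genre_string, genre_name):
    # Hypothetical helper: exact match against the comma-separated genres
    if genre_name == 'All Genre':
        return True
    return genre_name in [g.strip() for g in genre_string.split(',')]

exact_mask = genres.apply(lambda s: has_genre(s, 'Music'))

print(substring_mask.tolist())
print(exact_mask.tolist())
```

Whether the difference matters depends on the genres involved; for most entries in the list the two approaches select the same movies.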

The code below shows how we take two inputs, year range and genre, from the dashboard and build the scatter plot from them.

In [14]:
# Import libraries
from jupyter_dash import JupyterDash
#import dash_html_components as html
from dash import html
#import dash_core_components as dcc
from dash import dcc
from dash.dependencies import Input, Output
import pandas as pd
import plotly.express as px

# Have the dataset loaded - movies_cleaned is the dataframe

# create a new object for the dash app as shown below
app = JupyterDash(__name__)

# Layout section of dashboard basically how the app would look like
app.layout = html.Div(children=[
    html.H1(children='Imdb Ratings Dashboard'),
    html.H2(children='Year Released'),
    dcc.RangeSlider(
            id='year-released-range-slider',
            min=movies_cleaned.year.min(),
            max=movies_cleaned.year.max(),
            marks={str(y): str(y) for y in range(int(movies_cleaned.year.min()), int(movies_cleaned.year.max()), 5)},
            value=[movies_cleaned.year.min(), movies_cleaned.year.max()]
        ),
    html.H2(children='Genre'),
    dcc.Dropdown(
        id = 'genre-dropdown',
        options=[{'label':i,'value':i} for i in genre],
        value='All Genre'
    ),
    dcc.Graph(id='rating-duration-graph')
]
)

# add interactivity to the code - basically create a scatter plot
# we will filter movies by the year range (range-slider) and the genre (dropdown)
# then for those movies we will make scatter plot of imdb_score and duration
# Set up the callback function
@app.callback(
    Output(component_id='rating-duration-graph', component_property='figure'),
    [
     Input(component_id='year-released-range-slider', component_property='value'),
     Input(component_id='genre-dropdown',component_property='value')
    ]
)
def update_graph(selected_year,genre_name):
    year_released_start, year_released_end = selected_year
   
    filtered_df1 = movies_cleaned.loc[(movies_cleaned['year'] >= year_released_start)&(movies_cleaned['year'] <= year_released_end)]
    
    if genre_name == 'All Genre':
      genre_name_select = ''
    else:
      genre_name_select = genre_name
    
    filtered_final = filtered_df1.loc[filtered_df1['genre'].str.contains(genre_name_select)]
    
    scatter_fig = px.scatter(filtered_final,
                       x='duration', y='imdb_score',hover_name='original_title',
                       hover_data=['genre','year'],
                       title='Ratings and duration for movies from {} to {} for genre {}'.format(year_released_start,year_released_end,genre_name))
    return scatter_fig


# run the app
app.run_server(mode='inline')



Apart from the update_graph function we have already seen, the code above changes two things.

The first is in app.layout, where we have added the following two elements:
1. html.H2 - the heading for the genre dropdown UI element
2. dcc.Dropdown - creates the genre dropdown UI element

In [None]:
html.H2(children='Genre'),

# create genre dropdown UI element
dcc.Dropdown(
    # id of the genre dropdown UI element
    id = 'genre-dropdown',
    # a list of dictionaries, made using a list comprehension, one for each genre value that will appear in the dropdown
    options=[{'label':i,'value':i} for i in genre],
    # initial default value of the genre dropdown element
    value='All Genre'
),
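The options comprehension builds one dictionary per genre, where 'label' is the text shown in the dropdown and 'value' is the value passed on to the callback. A quick sketch with a shortened genre list (three entries taken from the list printed earlier); `genre_names` is a name used for this example only.

```python
# Shortened genre list for illustration
genre_names = ['Action', 'Drama', 'All Genre']

# Same comprehension as in dcc.Dropdown: label and value are identical here
options = [{'label': i, 'value': i} for i in genre_names]
print(options[0])
print(len(options))
```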

The other change is in @app.callback, where we have added one more Input. This Input reads the value of the genre dropdown UI element in app.layout (note that the id here matches the id of the dropdown in app.layout).

In [None]:
@app.callback(
    Output(component_id='rating-duration-graph', component_property='figure'),
    [
     Input(component_id='year-released-range-slider', component_property='value'),
     # takes input from genre dropdown
     Input(component_id='genre-dropdown',component_property='value')
    ]
)

This completes our tutorial on dash.