# Welcome to Introduction to Dashboards with Plotly and Dash

### Workshop facilitators: Laura Gutierrez Funderburk, Hanh Tong

### About this workshop

In this workshop we will explore key characteristics of the housing market in Vancouver, BC. 

This workshop assumes that:

1. Data cleaning and exploration were completed before developing the dashboard
2. You have some comfort with `pandas` and data visualization
3. You are comfortable navigating the Jupyter environment

### Workshop schedule:

#### 1. Part I: Data exploration

In this section, we will first spend time getting familiar with the data. We will use the `pandas` and `plotly` libraries, and we will also explore the `DEX` feature within Noteable to get a quick sense of what the data contains.

We will also practise refactoring code into functions and writing a Python script that we can use to recreate our results easily.

#### 2. Part II: Dashboard components

In this section, we will take what we built together 

In [None]:
!pip install plotly
!pip install pandas_profiling
!pip install scipy==1.5.2

In [None]:
import pandas as pd
import pathlib
import plotly.express as px
import sys

# Make modules stored in the ./scripts/ folder importable
sys.path.append(r'./scripts/')
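The `sys.path.append(r'./scripts/')` line adds the `scripts` folder to Python's module search path, so helper modules saved there can be imported by name. The sketch below shows that mechanism in isolation. It writes a throwaway module called `plot_helpers` (a hypothetical name, not a file from this workshop) into a temporary folder, appends that folder to `sys.path`, and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary folder standing in for ./scripts/
scripts_dir = Path(tempfile.mkdtemp())

# Write a small hypothetical helper module into that folder
(scripts_dir / "plot_helpers.py").write_text(
    "def describe_plot(graph_type, x, y):\n"
    "    return f'{graph_type} plot of {x} and {y}'\n"
)

# Add the folder to the module search path, as the notebook does with './scripts/'
sys.path.append(str(scripts_dir))
importlib.invalidate_caches()  # let the import system notice the newly written file

# The module can now be imported by name
plot_helpers = importlib.import_module("plot_helpers")
print(plot_helpers.describe_plot("box", "Geography", "DelinquencyRate"))
# box plot of Geography and DelinquencyRate
```

In the workshop, the same idea lets code moved into `./scripts/` be reused from the notebook and, later, from the Dash app.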

In [None]:
# Read data
url = 'https://raw.githubusercontent.com/Vancouver-Datajam/dashboard-workshop-dash/main/data/delinquency_mortgage_population_2021_2020.csv'
data_pop_del_mort_df = pd.read_csv(url, index_col=0)
data_pop_del_mort_df.head(10)


## Exercise: Get familiar with the table

1. Run all cells.

2. Execute the cell containing the variable `data_pop_del_mort_df`.

3. In the bottom-right corner of the cell output, switch from `Table` to `DEX`.

4. Select the `Table` feature.


#### Questions

a) What are relevant variables in the data?

b) What are the extent (range), mean and median of the columns `DelinquencyRate`, `AverageMortgageAmount` and `PopulationSize`?

c) What is the time range and frequency of the data?
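If you prefer code to the DEX interface, question b) can also be answered with `pandas`. The sketch below uses `DataFrame.agg` to compute the minimum, maximum, mean and median of each column, then derives the range. The small data frame is made-up placeholder data that only borrows the workshop's column names; in the notebook, use `data_pop_del_mort_df` instead.

```python
import pandas as pd

# Made-up placeholder rows; in the notebook, use data_pop_del_mort_df instead
sample_df = pd.DataFrame({
    "Geography": ["A", "A", "B", "B"],
    "DelinquencyRate": [0.20, 0.30, 0.10, 0.40],
    "AverageMortgageAmount": [200000, 220000, 300000, 340000],
    "PopulationSize": [1000, 1100, 5000, 5200],
})

columns = ["DelinquencyRate", "AverageMortgageAmount", "PopulationSize"]

# One row per statistic, one column per variable
summary = sample_df[columns].agg(["min", "max", "mean", "median"])

# The extent (range) is the difference between the maximum and the minimum
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]
print(summary)
```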

In [None]:
data_pop_del_mort_df

## Exercise: Data exploration

1. Execute all cells from top to bottom to read the data.

2. Explore the contents of the variable `data_pop_del_mort_df`.

3. In the bottom-right corner of the cell, switch from `Table` to `DEX` to generate visualizations.

4. A user menu will pop up.

5. Select comparison charts.

6. Change the X axis to `AverageMortgageAmount` and the Y axis to `PopulationSize`.

7. Click on the `Color` menu and select `Geography`.

8. Change the Y axis to `DelinquencyRate`.

9. Change `Circle Size Metric` from `None` to `PopulationSize`.

In [None]:
data_pop_del_mort_df

## Exercise: Data Interpretation

What interesting insights do you find? Use the following questions to guide you:

1. Do you see a trend when comparing the average mortgage amount with the delinquency rate or the population size?

2. Are there provinces with specific or interesting patterns in their delinquency rate, average mortgage amount or population size?

## Using Python and Plotly to generate interactive plots

In this section we are going to write a few commands to get started with visualizations.

In [None]:
# First attempt
px.line(data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate")


The plot above is hard to read because the values for every geography are joined into a single line. Let's colour the lines by `Geography` and add a title.

In [None]:
# Second attempt
px.line(data_pop_del_mort_df, 
        x = "Time",
        y="DelinquencyRate",
       color="Geography",
       title = "Chart: line plot of Time and DelinquencyRate by Geography")


#### Exercise: Let's take a look at the average mortgage amount and population size

Complete the code below to visualize the average mortgage amount.

Then change the code to visualize changes in population size.

In [None]:
variable = ...  # TODO: replace ... with "AverageMortgageAmount", then with "PopulationSize"

px.line(data_pop_del_mort_df, 
        x = "Time",
        y=variable,
       color="Geography",
       title = f"Chart: line plot of Time and {variable} by Geography")


Let's look at the distribution of each variable by geography using a box plot.

In [None]:
px.box(data_pop_del_mort_df, 
       x='Geography', 
       y='DelinquencyRate',
       color='Geography',
       title='Chart: box plot of Delinquency rate by Geography.')
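A box plot summarizes each group by its median and quartiles, with the box spanning the interquartile range. To read the numbers behind the boxes, you can compute per-geography quartiles with `groupby` and `quantile`. This is a sketch on made-up data; Plotly's default quartile method may not match pandas' linear interpolation exactly, so small differences from the plotted boxes are possible.

```python
import pandas as pd

# Made-up placeholder data; in the notebook, use data_pop_del_mort_df instead
sample_df = pd.DataFrame({
    "Geography": ["A"] * 5 + ["B"] * 5,
    "DelinquencyRate": [0.10, 0.20, 0.30, 0.40, 0.50,
                        0.15, 0.15, 0.20, 0.25, 0.90],
})

# 25th, 50th (median) and 75th percentiles of DelinquencyRate per Geography
quartiles = (
    sample_df.groupby("Geography")["DelinquencyRate"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()  # one row per Geography, one column per quantile
)

# Interquartile range: the height of each box
quartiles["IQR"] = quartiles[0.75] - quartiles[0.25]
print(quartiles)
```

Geography `B` shows why box plots are useful: its quartiles are tightly packed, while the single value `0.90` would typically be drawn as an outlier point beyond the whiskers.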

#### Exercise: Let's take a look at distribution of average mortgage amount and population size

Complete the code below to visualize the average mortgage amount and population size. 

In [None]:
variable = ...  # TODO: replace ... with "AverageMortgageAmount", then with "PopulationSize"
px.box(data_pop_del_mort_df, 
       x='Geography', 
       y=variable,
       color='Geography',
       title=f'Chart: box plot of {variable} by Geography.')

Let's work on a scatter plot to see whether there is a relationship between the average mortgage amount and the delinquency rate.

In [None]:
px.scatter(data_frame=data_pop_del_mort_df,
           y="AverageMortgageAmount",
           x="DelinquencyRate",
           title="Chart: Average mortgage amount vs delinquency rate")
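A scatter plot lets you eyeball a relationship; the Pearson correlation coefficient puts a number on its linear part (values near 1 or -1 suggest a strong linear relationship, values near 0 a weak one). The minimal sketch below uses `Series.corr` on made-up data. Because a correlation pooled across all geographies can hide different patterns within each one, it also repeats the calculation per geography.

```python
import pandas as pd

# Made-up placeholder data; in the notebook, use data_pop_del_mort_df instead
sample_df = pd.DataFrame({
    "Geography": ["A", "A", "A", "B", "B", "B"],
    "AverageMortgageAmount": [200, 210, 220, 300, 310, 320],
    "DelinquencyRate": [0.30, 0.20, 0.10, 0.10, 0.20, 0.30],
})

# Pearson correlation across all rows, ignoring Geography
overall = sample_df["AverageMortgageAmount"].corr(sample_df["DelinquencyRate"])

# Pearson correlation computed separately within each Geography
per_geography = (
    sample_df.groupby("Geography")[["AverageMortgageAmount", "DelinquencyRate"]]
    .apply(lambda g: g["AverageMortgageAmount"].corr(g["DelinquencyRate"]))
)

print(f"Overall correlation: {overall:.2f}")
print(per_geography)
```

In this toy example the pooled correlation is about zero, even though the relationship is perfectly negative in `A` and perfectly positive in `B`; colouring the scatter plot by `Geography` reveals the same thing visually.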

#### Exercise: modify the code above to colour the dots by `Geography` and show `Time` as the hover name

In [None]:
px.scatter(data_frame=data_pop_del_mort_df,
           y="AverageMortgageAmount",
           x="DelinquencyRate",
           title="Chart: Average mortgage amount vs delinquency rate",
           color=...,       # TODO: colour the dots by Geography
           hover_name=...)  # TODO: show Time when hovering over a dot

## Using dictionaries to access different kinds of functions

We need to do quite a bit of work refactoring our code in preparation for our dashboard.

We will use dictionaries to access different plotting functions.

Recall that a dictionary is a data structure that maps `keys` to `values`. The syntax of a dictionary is as follows:

    dictionary =  { key1 : value1,
                    key2 : value2,
                    key3 : value3}
                    
Keys are typically strings (more generally, any hashable object), and values can be any Python object, such as a string, list, set, tuple, or even a function.

In [None]:
sample_dictionary = {"list_numbers" : [1, 2, 3, 4, 5],
                     "set_numbers": set([1, 2, 3, 4, 5]),
                     "tuple_numbers": tuple([1, 2, 3, 4, 5]),
                     "function_sum": sum}

To access a value in a dictionary, we use the following notation:

    dictionary[key]
    
For example:

In [None]:
sample_dictionary['list_numbers']

In [None]:
sample_dictionary['set_numbers']

In [None]:
sample_dictionary['tuple_numbers']

In [None]:
sample_dictionary['function_sum']

To use the built-in function `sum`, pass it a list of the numbers you want to add.

In [None]:
sum([1,2,3])

We can obtain the same result with our dictionary as follows:

In [None]:
sample_dictionary['function_sum']([1,2,3])
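Looking up a key that is not in a dictionary raises a `KeyError`; this is the error that the `graph_region` function further down catches when `graph_type` is not one of its keys. If you would rather fall back to a default, `dict.get` returns `None` (or a value you supply) instead of raising. A short illustration with a small new dictionary, `small_dictionary`, so that `sample_dictionary` is left unchanged:

```python
small_dictionary = {"list_numbers": [1, 2, 3, 4, 5],
                    "function_sum": sum}

# Square brackets raise KeyError for a missing key
try:
    small_dictionary["function_max"]
except KeyError as error:
    print(f"KeyError: {error}")

# .get returns None, or a default you provide, instead of raising
print(small_dictionary.get("function_max"))        # None
fallback = small_dictionary.get("function_max", sum)
print(fallback([1, 2, 3]))                          # 6

# The `in` operator checks whether a key exists
print("function_sum" in small_dictionary)           # True
```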

We can use the following dictionary to generate different kinds of plots.

In [None]:
# Dictionary
plot_dict = {'box': px.box,'violin': px.violin, 'scatter': px.scatter, 'line':px.line}

We can then use the dictionary to try different kinds of plots.

In [None]:
plot_dict['scatter'](data_pop_del_mort_df, 
                     x="Time",
                     y="DelinquencyRate",
                     color="Geography",
                     title="Chart: Time and DelinquencyRate by Geography")

#### Exercise 1: replace the key `scatter` with `line`, `box` and `violin`, and run the cell each time

#### Exercise 2: change the `x` variable to either `Geography` or `Time`

#### Exercise 3: change the `y` variable to one of `PopulationSize`, `DelinquencyRate` or `AverageMortgageAmount`

In [None]:
plot_dict['violin'](data_pop_del_mort_df, 
        x = "Geography",
        y="DelinquencyRate",
       color="Geography",
       title = "Playing with several kinds of charts")

## Refactoring code into functions

In this section, we refactor our plotting code into a function. This makes our results easier to reproduce and keeps our Dash app cleaner.

In [None]:
# A function for box, violin, scatter or line graphs
def graph_region(region_df, graph_type: str, dimension1: str, dimension2: str):
    """
    region_df: data frame with the mortgage, delinquency and population data
    graph_type: one of "box", "violin", "scatter", "line"
    dimension1: column to plot on the x axis
    dimension2: column to plot on the y axis
    """
    
    # Map each graph type to the matching Plotly Express function
    plot_dict = {'box': px.box, 'violin': px.violin, 'scatter': px.scatter, 'line': px.line}
        
    try:
        fig = plot_dict[graph_type](region_df, 
                                    x=dimension1, 
                                    y=dimension2, 
                                    color="Geography",
                                    hover_name="Time")
        title_string = f'Chart: {graph_type} plot of {dimension1} and {dimension2} by Geography'
        fig.update_layout(title=title_string)
        fig.update_xaxes(tickangle=-45)
        fig.show()
    
    except KeyError:
        # Raised by plot_dict[graph_type] when graph_type is not one of its keys
        print("Key not found. Make sure that 'graph_type' is in ['box', 'violin', 'scatter', 'line']")
    except ValueError:
        # Raised by Plotly Express when a dimension is not a column of region_df
        print("Dimension is not valid. Try one of 'Geography', 'Time', 'AverageMortgageAmount', 'PopulationSize', 'DelinquencyRate'")
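Catching exceptions works, but in a dashboard it may be handier to check the inputs first and report every problem at once. The sketch below shows one way to do that with a hypothetical helper, `validate_graph_inputs` (not part of the workshop code). It checks `graph_type` against the supported keys and both dimensions against the data frame's columns, returning a list of problems that is empty when the inputs look valid.

```python
import pandas as pd

# Graph types supported by graph_region's plot_dict
SUPPORTED_GRAPH_TYPES = ("box", "violin", "scatter", "line")


def validate_graph_inputs(region_df, graph_type, dimension1, dimension2):
    """Return a list of human-readable problems; an empty list means the inputs look valid."""
    problems = []
    if graph_type not in SUPPORTED_GRAPH_TYPES:
        problems.append(
            f"Unknown graph_type {graph_type!r}; use one of {list(SUPPORTED_GRAPH_TYPES)}"
        )
    for dimension in (dimension1, dimension2):
        if dimension not in region_df.columns:
            problems.append(
                f"Column {dimension!r} not found; available columns: {list(region_df.columns)}"
            )
    return problems


# Made-up placeholder data with some of the workshop's column names
sample_df = pd.DataFrame({"Geography": ["A"], "Time": ["2020"], "DelinquencyRate": [0.2]})

print(validate_graph_inputs(sample_df, "box", "Geography", "DelinquencyRate"))  # []
print(validate_graph_inputs(sample_df, "pie", "Geography", "Population"))
```

Because the helper returns messages instead of printing them, the same list could be shown in the notebook or passed back to a dashboard component.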

In [None]:
graph_region(data_pop_del_mort_df, 'line', "Time", "AverageMortgageAmount")

In [None]:
graph_region(data_pop_del_mort_df, 'box', "Geography", "DelinquencyRate")

In [None]:
graph_region(data_pop_del_mort_df, 'scatter', "AverageMortgageAmount", "DelinquencyRate")