<font size="+3"><strong>Interactive Dashboard</strong></font>

Previously, I built a model based on the highest-variance features in our dataset and created several visualizations to communicate the results. In this project, I'm going to combine all of these elements into a dynamic web application that lets users choose their own features, build a model, and evaluate its performance through a graphical user interface. In other words, I'll create a tool that allows anyone to build a model without writing code.

In [4]:
!pip install dash


Collecting dash
  Downloading dash-2.13.0-py3-none-any.whl (10.4 MB)
Collecting Werkzeug<2.3.0 (from dash)
  Downloading Werkzeug-2.2.3-py3-none-any.whl (233 kB)
Collecting dash-html-components==2.0.0 (from dash)
  Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Collecting dash-core-components==2.0.0 (from dash)
  Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Collecting dash-table==5.0.0 (from dash)
  Downloading dash_table-5.0.0-py3-none-any.whl (3.9 kB)
Collecting retrying (from dash)
  Downloading retrying-1.3.4-py3-none-any.whl (11 kB)
Collecting ansi2html (from dash)
  Downloading ansi2html-1.8.0-py3-none-any.whl (16 kB)
Installing collected packages: dash-table, dash-html-components, dash-core-components, W

In [6]:
!pip install jupyter_dash

Collecting jupyter_dash
  Downloading jupyter_dash-0.4.2-py3-none-any.whl (23 kB)
Collecting jedi>=0.16 (from ipython->jupyter_dash)
  Downloading jedi-0.19.0-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: jedi, jupyter_dash
Successfully installed jedi-0.19.0 jupyter_dash-0.4.2


In [7]:

import pandas as pd
import plotly.express as px
from dash import Input, Output, dcc, html
from IPython.display import VimeoVideo
from jupyter_dash import JupyterDash
from scipy.stats.mstats import trimmed_var
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


JupyterDash.infer_jupyter_proxy_config()

# Prepare Data

## Import

In [1]:
def wrangle(filepath):

    """Read SCF data file into ``DataFrame``.

    Returns only credit fearful households whose net worth is less than $2 million.

    Parameters
    ----------
    filepath : str
        Location of CSV file.
    """

    df = pd.read_csv(filepath)

    mask = (df['TURNFEAR'] == 1) & (df['NETWORTH'] < 2e6)
    df = df[mask]

    return df

In [9]:
df = wrangle("SCFP2019.csv")

print(df.shape)
df.head()

(385, 351)


Unnamed: 0,YY1,Y1,WGT,HHSEX,AGE,AGECL,EDUC,EDCL,MARRIED,KIDS,...,NWCAT,INCCAT,ASSETCAT,NINCCAT,NINC2CAT,NWPCTLECAT,INCPCTLECAT,NINCPCTLECAT,INCQRTCAT,NINCQRTCAT
5,2,21,3790.476607,1,50,3,8,2,1,3,...,1.0,2.0,1.0,2.0,1.0,1.0,4.0,4.0,2.0,2.0
6,2,22,3798.868505,1,50,3,8,2,1,3,...,1.0,2.0,1.0,2.0,1.0,1.0,4.0,3.0,2.0,2.0
7,2,23,3799.468393,1,50,3,8,2,1,3,...,1.0,2.0,1.0,2.0,1.0,1.0,4.0,4.0,2.0,2.0
8,2,24,3788.076005,1,50,3,8,2,1,3,...,1.0,2.0,1.0,2.0,1.0,1.0,4.0,4.0,2.0,2.0
9,2,25,3793.066589,1,50,3,8,2,1,3,...,1.0,2.0,1.0,2.0,1.0,1.0,4.0,4.0,2.0,2.0
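
The filtering step inside `wrangle` can be illustrated on a toy table. This is a minimal sketch: only the column names `TURNFEAR` and `NETWORTH` come from the SCF file, while the four rows and the names `toy` and `subset` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the two column names come from the SCF file.
toy = pd.DataFrame(
    {
        "TURNFEAR": [1, 0, 1, 1],
        "NETWORTH": [50_000, 80_000, 3_500_000, 1_200_000],
    }
)

# Each comparison yields a boolean Series; `&` combines them element-wise.
# The parentheses are required because `&` binds tighter than `==` and `<`.
mask = (toy["TURNFEAR"] == 1) & (toy["NETWORTH"] < 2e6)
subset = toy[mask]

print(mask.tolist())
print(subset)
```

Only rows that satisfy both conditions survive, which is why the full survey shrinks to the 385 households shown above.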


# Build Dashboard

## Application Layout

In [10]:
app = JupyterDash(__name__)


JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.



## Variance Bar Chart

First, define a `get_high_var_features` function that returns the five highest-variance features in `df`, either as a `Series` of variances or as a list of feature names.

In [11]:
def get_high_var_features(trimmed=True, return_feat_names=False):
    """Returns the five highest-variance features of ``df``.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.

    return_feat_names : bool, default=False
        If ``True``, returns feature names as a ``list``. If ``False``
        returns ``Series``, where index is feature names and values are
        variances.
    """
    if trimmed:
        # trimmed_var drops the bottom and top 10% of each column by default
        top_five_features = df.apply(trimmed_var).sort_values().tail(5)
    else:
        top_five_features = df.var().sort_values().tail(5)

    if return_feat_names:
        top_five_features = top_five_features.index.tolist()

    return top_five_features

In [12]:
get_high_var_features()

NETWORTH    3.356085e+09
DEBT        4.111453e+09
HOUSES      4.804796e+09
NFIN        1.095386e+10
ASSET       1.326330e+10
dtype: float64
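
To see why trimming matters, it may help to compare the two variance calculations on a small sample with one extreme value. This is only a sketch: the array `values` is hypothetical and is not drawn from the SCF data.

```python
import numpy as np
from scipy.stats.mstats import trimmed_var

# Hypothetical sample: nine ordinary values and one extreme outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], dtype=float)

# Population variance of all ten values, outlier included.
plain = np.var(values)

# Default limits=(0.1, 0.1) mask the lowest and highest 10% before computing variance.
trimmed = trimmed_var(values)

print(f"plain variance:   {plain:,.2f}")
print(f"trimmed variance: {trimmed:,.2f}")
```

A single outlier dominates the plain variance, while the trimmed figure reflects the spread of the bulk of the data. Note that `trimmed_var` defaults to `ddof=0` whereas `DataFrame.var` defaults to `ddof=1`, so the two branches of `get_high_var_features` also differ slightly in normalization.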

Next, a `serve_bar_chart` function returns a Plotly Express horizontal bar chart of the five highest-variance features, using `get_high_var_features` as a helper.

In [13]:
@app.callback(
    Output('bar-chart', 'figure'),
    Input('trim-button', 'value')
)
def serve_bar_chart(trimmed=True):

    """Returns a horizontal bar chart of five highest-variance features.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.
    """

    top_five_features = get_high_var_features(trimmed=trimmed, return_feat_names=False)
    fig = px.bar(x=top_five_features, y=top_five_features.index, orientation='h')
    fig.update_layout(xaxis_title='Variance', yaxis_title='Feature')

    return fig

In [14]:
serve_bar_chart()

## K-means Slider and Metrics

A `get_model_metrics` function builds, trains, and evaluates a `KMeans` model.

In [15]:
def get_model_metrics(trimmed=True, k=2, return_metrics=False):

    """Build ``KMeans`` model based on five highest-variance features in ``df``.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.

    k : int, default=2
        Number of clusters.

    return_metrics : bool, default=False
        If ``False`` returns ``KMeans`` model. If ``True`` returns ``dict``
        with inertia and silhouette score.

    """

    features = get_high_var_features(trimmed=trimmed, return_feat_names=True)
    X = df[features]
    model = make_pipeline(StandardScaler(), KMeans(n_clusters=k, random_state=42))
    model.fit(X)

    if return_metrics:
        inertia = model.named_steps['kmeans'].inertia_
        silhouette = silhouette_score(X, model.named_steps['kmeans'].labels_)
        metrics = {
            'inertia': round(inertia),
            'silhouette': round(silhouette, 3)
        }

        return metrics

    return model

In [16]:
get_model_metrics(return_metrics=True)





{'inertia': 1015, 'silhouette': 0.727}
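
To get a feel for how these two metrics respond to `k`, here is a rough sketch that repeats the same pipeline over several cluster counts. The data comes from `make_blobs` and only stands in for the five SCF features. Unlike the function above, this sketch scores the silhouette on the scaled features, and it sets `n_init=10` explicitly; both choices are my assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five SCF features: 300 rows, 5 columns, 3 true groups.
X_demo, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)

results = {}
for k in range(2, 6):
    pipe = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=k, n_init=10, random_state=42),
    )
    pipe.fit(X_demo)
    kmeans = pipe.named_steps["kmeans"]
    # Score on the scaled features, i.e. the space the clusters were found in.
    X_scaled = pipe.named_steps["standardscaler"].transform(X_demo)
    results[k] = {
        "inertia": round(float(kmeans.inertia_), 1),
        "silhouette": round(float(silhouette_score(X_scaled, kmeans.labels_)), 3),
    }

for k, metrics in results.items():
    print(k, metrics)
```

Inertia generally falls as `k` grows, so it is usually read for an "elbow", whereas the silhouette score ranges from -1 to 1 and higher values suggest better-separated clusters.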

In [17]:
@app.callback(
    Output('metrics', 'children'),
    Input('trim-button', 'value'),
    Input('k-slider', 'value')
)
def serve_metrics(trimmed=True, k=2):

    """Returns list of ``H3`` elements containing inertia and silhouette score
    for ``KMeans`` model.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.

    k : int, default=2
        Number of clusters.
    """

    metrics = get_model_metrics(trimmed=trimmed, k=k, return_metrics=True)
    text = [
        html.H3(f"Inertia: {metrics['inertia']}"),
        html.H3(f"Silhouette Score: {metrics['silhouette']}")
    ]

    return text

In [18]:
serve_metrics()





[H3('Inertia: 1015'), H3('Silhouette Score: 0.727')]

## PCA Scatter Plot

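The slider changes the inertia and silhouette score, but not everyone will be able to understand what those changing numbers mean. A scatter plot of the clusters gives them something visual to go on: `get_pca_labels` reduces the five features to two principal components and attaches the `KMeans` labels, and `serve_scatter_plot` draws the result.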

In [19]:
def get_pca_labels(trimmed=True, k=2):

    """Create ``DataFrame`` of two principal components of the five
    highest-variance features, with ``KMeans`` labels.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.

    k : int, default=2
        Number of clusters.
    """

    features = get_high_var_features(trimmed=trimmed, return_feat_names=True)
    X = df[features]
    pca = PCA(n_components=2, random_state=42).fit_transform(X)
    X_pca = pd.DataFrame(pca, columns=['PC1', 'PC2'])

    model = get_model_metrics(trimmed=trimmed, k=k, return_metrics=False)
    X_pca['labels'] = model.named_steps['kmeans'].labels_.astype(str)
    X_pca.sort_values('labels', inplace=True)

    return X_pca
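
The dimensionality-reduction step on its own looks roughly like this. It is a sketch on random data: `X_demo` and its column names are hypothetical stand-ins for `df[features]`.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical data: 100 rows and 5 numeric features standing in for the SCF columns.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(100, 5)), columns=list("ABCDE"))

pca = PCA(n_components=2, random_state=42)
components = pca.fit_transform(X_demo)  # NumPy array with one row per observation
X_demo_pca = pd.DataFrame(components, columns=["PC1", "PC2"])

# Fraction of the total variance that each retained component explains.
print(pca.explained_variance_ratio_)
print(X_demo_pca.head())
```

Checking `explained_variance_ratio_` is a quick way to judge how much information a two-dimensional plot retains.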

In [20]:
@app.callback(
    Output('pca-scatter', 'figure'),
    Input('trim-button', 'value'),
    Input('k-slider', 'value')
)
def serve_scatter_plot(trimmed=True, k=2):

    """Build 2D scatter plot of ``df`` with ``KMeans`` labels.

    Parameters
    ----------
    trimmed : bool, default=True
        If ``True``, calculates trimmed variance, removing bottom and top 10%
        of observations.

    k : int, default=2
        Number of clusters.
    """
    X_pca = get_pca_labels(trimmed=trimmed, k=k)
    fig = px.scatter(
        data_frame=X_pca,
        x='PC1',
        y='PC2',
        color='labels',
        title='PCA Representation of Clusters'
    )
    fig.update_layout(xaxis_title='PC1', yaxis_title='PC2')

    return fig

## Application Deployment

In [21]:
app.layout = html.Div(
    [
        html.H1('Survey of Consumer Finances'),
        html.H2("High Variance Features"),
        dcc.Graph(figure=serve_bar_chart(), id='bar-chart'),
        dcc.RadioItems(
            options=[
                {'label': 'trimmed', 'value': True},
                {'label': 'not trimmed', 'value': False}
            ],
            value=True,
            id='trim-button'
        ),
        html.H3('K-means Clustering'),
        html.H4('Number of Clusters (k)'),
        dcc.Slider(min=2, max=12, step=1, value=2, id='k-slider'),
        html.Div(id='metrics'),
        dcc.Graph(figure=serve_scatter_plot(), id='pca-scatter')
    ]
)





In [22]:
app.run_server(host="0.0.0.0", mode="external")


Dash app running on:

