# [Local Consumer Commerce](https://www.jpmorganchase.com/corporate/institute/lcc-index.htm)

At the JPMorgan Chase Institute, we use billions of transactions to explore and analyze the financial decisions of households, firms, and market actors. Among the analyses we perform is the measurement of local commercial activity across 14 metro areas. We refer to this activity as [Local Consumer Commerce](https://www.jpmorganchase.com/corporate/institute/lcc-faqs.htm). It is "local" because the lens captures activity that occurs within the metro areas we track, and "consumer" because it captures (predominantly) the purchasing decisions of end-user consumers.

These data provide a public, freely available economic lens of unprecedented spatial and temporal granularity. They enable measurement of an important swath of economic activity inside each metro area in each month. Consequently, they will be of use to you in your final capstone exercise, in which you argue for locating Amazon HQ2 in your chosen city. This notebook provides a way to interactively explore the LCC [data file](https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip) we make available on the website.

In [1]:
# Data manipulation
import numpy as np
import pandas as pd
# Data visualization
# import bokeh.plotting as bp
# import bokeh.models as bm
# import bokeh.core.properties as bcp
# import bokeh.io as bio
import plotly as pl
import plotly.graph_objs as pgo
%pylab inline
# Interactive widget functionality
from IPython.display import display, HTML
from ipywidgets import widgets, Layout, Box, VBox

# Set paths
fig_dir = '../figs/'
data_dir = '../data/'

# Refresh data flag
refresh_data = False

# Display result in an iframe
def show_iframe(url, height=400, width=1000):
    display_string = '<iframe src={url} width={w} height={h}></iframe>'.format(url=url, w=width, h=height)
    print(display_string)
    return HTML(display_string)

Populating the interactive namespace from numpy and matplotlib


## Data Acquisition

As indicated above, the [data file](https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip) is on the website. We can acquire and unzip the file with shell commands. The `refresh_data` flag lets us skip the download and keep working with the copy we already have.
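Where `wget` and `unzip` are not available, the same acquisition step could be sketched with the Python standard library. This is only an illustration: the function name `fetch_lcc` and its `opener` argument are assumptions introduced here (the latter so the logic can be exercised without network access), not part of the original workflow.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

LCC_URL = 'https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip'


def list_csvs(data_dir):
    """Return the names of the CSV files currently in data_dir (empty if it is missing)."""
    data_dir = Path(data_dir)
    if not data_dir.exists():
        return []
    return sorted(p.name for p in data_dir.glob('*.csv'))


def fetch_lcc(url=LCC_URL, data_dir='../data/', refresh=False, opener=urlopen):
    """Download and extract the LCC archive only when refresh is True.

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a readable, closable object (urlopen by default).
    """
    data_dir = Path(data_dir)
    if not refresh:
        # Mirror the notebook's flag: leave whatever is on disk untouched
        return list_csvs(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # Read the archive into memory, keep a copy of the zip, then unpack it
    with opener(url) as response:
        payload = response.read()
    (data_dir / 'lcc_fulldata.zip').write_bytes(payload)
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(data_dir)
    return list_csvs(data_dir)
```

Calling `fetch_lcc(refresh=refresh_data)` would then play the role of the `if refresh_data:` cell.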

In [12]:
!pwd

/home/choct155/projects/telling_stories_with_data/examples/lcc/src


In [14]:
man ls

In [3]:
if refresh_data:
    !wget -O ../data/lcc_fulldata.zip https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip
    !unzip -d ../data/ ../data/lcc_fulldata.zip

In [16]:
! head -10 ../data/lcc_fulldata.csv

periodid,area_name,dimname,category,growth_rate,spend_share,growth_contribution
201310,National,age,<25,0.1874,0.1008,0.0170
201310,National,age,25-34,0.0878,0.1492,0.0129
201310,National,age,35-44,0.0605,0.1978,0.0120
201310,National,age,45-54,0.0489,0.2128,0.0106
201310,National,age,55-64,0.0440,0.1676,0.0075
201310,National,age,65+,0.0435,0.1719,0.0076
201311,National,age,<25,0.1716,0.0997,0.0154
201311,National,age,25-34,0.0908,0.1489,0.0131
201311,National,age,35-44,0.0607,0.1985,0.0120


In [15]:
!ls ../data

lcc_fulldata.csv  lcc_fulldata.zip


## Prepare Data

In [23]:
sorted(set(lcc['dimname']))

['Total', 'age', 'bizsize', 'income', 'location', 'product']

In [17]:
# Read in data
lcc = pd.read_csv('../data/lcc_fulldata.csv')

# Convert months to pandas periods
lcc['month'] = lcc['periodid'].apply(lambda x: pd.Period(x, freq='M'))

# Map income categories to quintiles
int_dict = {
    '<21': 'q1',
    '21-40': 'q2',
    '41-60': 'q3',
    '61-80': 'q4',
    '81-100': 'q5'
}
lcc['category'] = lcc['category'].replace(int_dict)

# Capture category labels for each dimension in the right order
dim_labs = {
    'age': ['<25', '25-34', '35-44', '45-54', '55-64', '65+'],
    'income': ['q1', 'q2', 'q3', 'q4', 'q5'],
    'bizsize': ['SMALL', 'MEDIUM', 'LARGE'],
    'location': ['Same Neighborhood', 'Same Region', 'Different Region'],
    'product': ['Durables', 'Fuel', 'Nondurables', 'Other Services', 'Restaurants']
}

# Construct lists of areas, dimensions and measures
areas = sorted(set(lcc['area_name']))
dims = sorted(set(lcc['dimname']))
measures = ['growth_rate', 'spend_share', 'growth_contribution']

# Provide integer representation of months
months = sorted(set(lcc['month']))
months_int = range(len(months))
months_dict = dict(zip(months, months_int))
lcc['month_int'] = lcc['month'].map(months_dict)

# Define LCC palette
lcc_colors = ['#00a0dd', '#a2dadb', '#bbd976', '#ffe18b', '#fbaf5d', '#f57f32']

lcc.head()

| | periodid | area_name | dimname | category | growth_rate | spend_share | growth_contribution | month | month_int |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 201310 | National | age | <25 | 0.1874 | 0.1008 | 0.017 | 2013-10 | 0 |
| 1 | 201310 | National | age | 25-34 | 0.0878 | 0.1492 | 0.0129 | 2013-10 | 0 |
| 2 | 201310 | National | age | 35-44 | 0.0605 | 0.1978 | 0.012 | 2013-10 | 0 |
| 3 | 201310 | National | age | 45-54 | 0.0489 | 0.2128 | 0.0106 | 2013-10 | 0 |
| 4 | 201310 | National | age | 55-64 | 0.044 | 0.1676 | 0.0075 | 2013-10 | 0 |


With our data in hand, we can visualize the components we care about. Before we get to interactivity, let's create static views of the content. The first step is to isolate the subset of the data we want: the rows that correspond to a particular area and dimension, and the column that holds a particular growth measure.
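To make the subsetting idea concrete, here is a minimal sketch on a toy frame. The frame `toy` and its values are made up for illustration; only the column names follow the LCC file. It filters rows with boolean masks, then moves categories from rows to columns.

```python
import pandas as pd

# Toy long-format frame; column names follow lcc_fulldata.csv, values are invented
toy = pd.DataFrame({
    'periodid': [201310, 201310, 201311, 201311, 201310],
    'area_name': ['National', 'National', 'National', 'National', 'Elsewhere'],
    'dimname': ['age', 'age', 'age', 'age', 'age'],
    'category': ['<25', '65+', '<25', '65+', '<25'],
    'growth_rate': [0.10, 0.04, 0.12, 0.05, 0.99],
})

# Boolean masks select the rows for one area and one dimension
mask = (toy['area_name'] == 'National') & (toy['dimname'] == 'age')

# Index by month and category, keep one measure, then unstack category
# so that each category becomes its own column (long to wide)
wide_raw = toy[mask].set_index(['periodid', 'category'])['growth_rate'].unstack('category')

# unstack sorts the labels as strings, which puts '65+' ahead of '<25',
# so the intended order has to be imposed explicitly
wide = wide_raw[['<25', '65+']]
print(wide)
```

The explicit reordering in the last step is why a lookup of category labels per dimension is worth keeping around.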

In [30]:
lcc[lcc['area_name'] == 'National'].set_index(['month', 'category'])['growth_contribution'].unstack('category')

| month | nan | 25-34 | 35-44 | 45-54 | 55-64 | 65+ | <25 | Different Region | Durables | Fuel | ... | Other Services | Restaurants | SMALL | Same Neighborhood | Same Region | q1 | q2 | q3 | q4 | q5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-10 | NaN | 0.0129 | 0.012 | 0.0106 | 0.0075 | 0.0076 | 0.017 | -0.0034 | 0.0105 | -0.0068 | ... | 0.0212 | 0.0188 | 0.0316 | 0.0358 | 0.0353 | 0.0121 | 0.0105 | 0.0105 | 0.0121 | 0.0225 |
| 2013-11 | NaN | 0.0131 | 0.012 | 0.0095 | 0.0056 | 0.0021 | 0.0154 | -0.0051 | 0.0029 | -0.0006 | ... | 0.014 | 0.019 | 0.0174 | 0.0376 | 0.0255 | 0.0104 | 0.0096 | 0.0094 | 0.0099 | 0.0185 |
| 2013-12 | NaN | 0.0046 | 0.0018 | -0.0008 | -0.0012 | 0.0 | 0.0114 | -0.0067 | -0.0067 | 0.0017 | ... | 0.0134 | 0.007 | 0.0007 | 0.0291 | -0.0065 | 0.0052 | 0.0023 | 0.0001 | 0.0004 | 0.0077 |
| 2014-01 | NaN | 0.0097 | 0.0068 | 0.0036 | 0.0006 | -0.0013 | 0.0127 | -0.0082 | -0.0047 | 0.0023 | ... | 0.0119 | 0.0128 | 0.0072 | 0.0349 | 0.0056 | 0.0088 | 0.0067 | 0.0047 | 0.0037 | 0.0083 |
| 2014-02 | NaN | 0.0124 | 0.0078 | 0.0047 | 0.0027 | -0.0015 | 0.0135 | -0.0058 | -0.0015 | -0.004 | ... | 0.0142 | 0.0151 | 0.0077 | 0.0323 | 0.0133 | 0.0133 | 0.0092 | 0.006 | 0.0044 | 0.0069 |
| 2014-03 | NaN | 0.0094 | 0.0065 | 0.0036 | 0.0018 | -0.0012 | 0.0114 | -0.0058 | -0.0027 | 0.0011 | ... | 0.0152 | 0.0137 | 0.0033 | 0.0329 | 0.0046 | 0.009 | 0.006 | 0.0046 | 0.0043 | 0.0076 |
| 2014-04 | NaN | 0.0116 | 0.0103 | 0.0101 | 0.0086 | 0.0052 | 0.012 | -0.0032 | -0.0043 | 0.0083 | ... | 0.0111 | 0.0165 | 0.0104 | 0.0375 | 0.0235 | 0.0129 | 0.011 | 0.0105 | 0.0105 | 0.013 |
| 2014-05 | NaN | 0.0128 | 0.0112 | 0.0109 | 0.0068 | 0.0009 | 0.0121 | -0.0043 | -0.0031 | 0.0083 | ... | 0.0078 | 0.0194 | 0.0091 | 0.0353 | 0.0237 | 0.0115 | 0.0106 | 0.0101 | 0.0097 | 0.0128 |
| 2014-06 | NaN | 0.0067 | 0.0044 | 0.0049 | 0.002 | 0.0004 | 0.0087 | -0.0048 | -0.0078 | 0.0045 | ... | 0.0098 | 0.0117 | 0.0006 | 0.0242 | 0.0078 | 0.0073 | 0.0051 | 0.0038 | 0.0037 | 0.0074 |
| 2014-07 | NaN | 0.0103 | 0.0079 | 0.0081 | 0.005 | 0.0015 | 0.0114 | -0.0022 | -0.0052 | 0.0046 | ... | 0.01 | 0.0172 | 0.0053 | 0.0271 | 0.0193 | 0.0118 | 0.0094 | 0.008 | 0.0071 | 0.0079 |


In [31]:
def lcc_sub(area, dim, measure, df=lcc):
    # Define area and dimension subset conditions
    a = (df['area_name'] == area)
    d = (df['dimname'] == dim)
    # Subset data
    df_sub = df[a & d][['month', 'category', measure]]
    # Set month and category to the index
    df_sub.set_index(['month', 'category'], inplace=True)
    # Unstack category (aka - convert from long to wide format)
    df_sub = df_sub.unstack('category') * 100
    # Drop multiindex columns in favor of category names
    df_sub.columns = [c[1] for c in df_sub.columns]
    # If we are looking at anything but the total growth rate, fix label order
    if dim != 'Total':
        df_sub = df_sub[dim_labs[dim]]
    return df_sub.reset_index()

lcc_sub('Chicago, IL - Metro Area', 'age', 'growth_contribution').head()

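| | month | <25 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
|---|---|---|---|---|---|---|---|
| 0 | 2013-10 | 1.55 | 1.14 | 1.06 | 0.69 | 0.19 | 0.19 |
| 1 | 2013-11 | 1.48 | 1.24 | 1.21 | 0.86 | 0.26 | -0.1 |
| 2 | 2013-12 | 1.19 | 0.48 | 0.29 | 0.1 | -0.22 | -0.12 |
| 3 | 2014-01 | 1.14 | 0.78 | 0.62 | 0.36 | -0.2 | -0.58 |
| 4 | 2014-02 | 1.18 | 0.96 | 0.67 | 0.37 | 0.13 | -0.31 |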


Now we need to plot these data.  If we are looking at growth contributions, we want a stacked bar chart.  If we are looking at growth rates, we want a line chart.  So, we need to define two functions to handle these scenarios.
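The choice between the two chart types can be sketched as a single dispatch on the measure. The helper `figure_spec` below is hypothetical and builds plain dictionaries in the `{'data': ..., 'layout': ...}` shape that Plotly accepts, so it runs without Plotly installed. The sample values are the first two months of the Chicago age output shown above.

```python
def figure_spec(measure, months, series):
    """Return a Plotly-style figure dictionary for one measure.

    `series` maps a category label to its list of values. The function name
    and arguments are illustrative and not part of the notebook.
    """
    # Contributions get stacked bars; growth rates (and anything else here) get lines
    stacked = (measure == 'growth_contribution')
    trace_type = 'bar' if stacked else 'scatter'
    data = [
        {'type': trace_type, 'x': list(months), 'y': list(values), 'name': label}
        for label, values in series.items()
    ]
    layout = {'title': measure}
    if stacked:
        # Stack the bars so each month's segments sit on top of one another
        layout['barmode'] = 'stack'
    return {'data': data, 'layout': layout}


months = ['10-2013', '11-2013']
series = {'<25': [1.55, 1.48], '65+': [0.19, -0.1]}

bar_spec = figure_spec('growth_contribution', months, series)
line_spec = figure_spec('growth_rate', months, series)
print(bar_spec['data'][0]['type'], line_spec['data'][0]['type'])
```

Keeping the two cases in separate functions, as the notebook does, is equally reasonable; the single helper only makes the decision rule explicit.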

In [32]:
# def plot_sub_bar(area, dim, measure, df=lcc, colors=lcc_colors, out_file='../figs/lcc_stacked_bar.html'):
#     # Define location of the output file
#     bp.output_file(out_file)
#     # Capture relevant subset (note how we embed our first function)
#     df_sub = lcc_sub(area, dim, measure, df)
#     # Cast subset as a ColumnDataSource
#     source = bm.ColumnDataSource(df_sub)
#     # Create figure
#     fig = bp.figure(x_range=(list(months_int)[0], list(months_int)[-1]), y_range=(-6, 6), plot_width=900, plot_height=500, 
#                     title='{m} by {d} in {a}'.format(m=measure, d=dim, a=area))
#     # Generate vertical stacked chart
#     fig.vbar_stack(dim_labs[dim], x='month_int', width=0.9, source=source, 
#                    color=colors, legend=[bcp.value(d) for d in dim_labs[dim]])
#     # Save file
#     bp.save(fig)
#     return out_file
    
# stack_bar = plot_sub_bar('Chicago, IL - Metro Area', 'age', 'growth_contribution')

# show_iframe(stack_bar, height=550, width=950)

In [37]:
df_sub = lcc_sub('Chicago, IL - Metro Area', 'age', 'growth_contribution')
df_sub['month_str'] = df_sub['month'].apply(lambda x: x.strftime("%m-%Y"))
df_sub.head()

| | month | <25 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ | month_str |
|---|---|---|---|---|---|---|---|---|
| 0 | 2013-10 | 1.55 | 1.14 | 1.06 | 0.69 | 0.19 | 0.19 | 10-2013 |
| 1 | 2013-11 | 1.48 | 1.24 | 1.21 | 0.86 | 0.26 | -0.1 | 11-2013 |
| 2 | 2013-12 | 1.19 | 0.48 | 0.29 | 0.1 | -0.22 | -0.12 | 12-2013 |
| 3 | 2014-01 | 1.14 | 0.78 | 0.62 | 0.36 | -0.2 | -0.58 | 01-2014 |
| 4 | 2014-02 | 1.18 | 0.96 | 0.67 | 0.37 | 0.13 | -0.31 | 02-2014 |


In [38]:
for i,d in enumerate(dim_labs['age']):
    print(d)
    # Index the palette list; the bare name `colors` is bound to a function in this namespace and cannot be subscripted
    tmp_bar = pgo.Bar(x=df_sub['month_str'], y=df_sub[d], name=d, marker=dict(color=lcc_colors[i]))

<25
25-34
35-44
45-54
55-64
65+

In [40]:
[l.upper() for l in 'abcde']

['A', 'B', 'C', 'D', 'E']

In [7]:
def plot_sub_bar(area, dim, measure, df=lcc, colors=lcc_colors, out_file='../figs/lcc_stacked_bar.html'):
    # Capture relevant subset (note how we embed our first function)
    df_sub = lcc_sub(area, dim, measure, df)
    # Convert months to string
    df_sub['month_str'] = df_sub['month'].apply(lambda x: x.strftime("%m-%Y"))
    # Generate data traces for plotly
    data = [pgo.Bar(x=df_sub['month_str'], y=df_sub[d], name=d, marker=dict(color=colors[i])) for i,d in enumerate(dim_labs[dim])]
    # Define plotly layout
    layout = pgo.Layout(barmode='stack', title='{m} in {a} by {d}'.format(a=area, d=dim, m=measure),  yaxis=dict(title='Year-over-Year Growth Contributions (pp)'))
    # Assemble the figure dictionary (traces plus layout)
    stack_fig = {
        'data': data,
        'layout': layout
    }
    return stack_fig
    
stack_fig = plot_sub_bar('Chicago, IL - Metro Area', 'age', 'growth_contribution')

# Plot data
pl.plotly.iplot(stack_fig, filename='lcc_stacked_bar')

In [8]:
def plot_sub_line(area, dim, measure, df=lcc, colors=lcc_colors, out_file='../figs/lcc_line.html'):
    # Capture relevant subset (note how we embed our first function)
    df_sub = lcc_sub(area, dim, measure, df)
    # Convert months to string
    df_sub['month_str'] = df_sub['month'].apply(lambda x: x.strftime("%m-%Y"))
    # Generate data traces for plotly
    data = [pgo.Scatter(x=df_sub['month_str'], y=df_sub[d], name=d, line=dict(color=colors[i])) for i,d in enumerate(dim_labs[dim])]
    # Define plotly layout
    layout = pgo.Layout(title='{m} in {a} by {d}'.format(a=area, d=dim, m=measure), yaxis=dict(title='Year-over-Year Growth Rate (%)'))
    # Assemble the figure dictionary (traces plus layout)
    line_fig = {
        'data': data,
        'layout': layout
    }
    return line_fig
    
line_fig = plot_sub_line('Chicago, IL - Metro Area', 'age', 'growth_rate')

# Plot data
pl.plotly.iplot(line_fig, filename='lcc_line')

## Prepare Interactive Widgets

To enable interactive exploration of the data, we are going to employ [IPython widgets](https://ipywidgets.readthedocs.io/en/latest/). These allow the user to change the content of the desired chart.
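As a rough sketch of how a widget can drive an update, the snippet below registers a callback with `observe` so that code runs whenever the selected value changes. The widget `area` and the callback `on_area_change` are illustrative names; here the callback only records the new value, where a real one would rebuild the chart.

```python
from ipywidgets import widgets

# A small dropdown with two of the area names used in this notebook
area = widgets.Dropdown(
    options=['National', 'Chicago, IL - Metro Area'],
    value='National',
    description='Area:')

selected = []

def on_area_change(change):
    # `change` describes the event; change['new'] holds the value just selected
    selected.append(change['new'])

# React only to changes of the widget's `value` trait
area.observe(on_area_change, names='value')

# Assigning the value in code fires the same callback as a user selection
area.value = 'Chicago, IL - Metro Area'
print(selected)
```

Without a callback, the chart only changes when the plotting cell is re-run after a new selection.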

In [9]:
area_select = widgets.Dropdown(
                options = areas,
                value = 'National',
                description = 'Area:',
                disabled = False)

compare_select = widgets.SelectMultiple(
                    options = areas,
                    value = ['Atlanta, GA - Metro Area'],
                    description = 'Comparison Areas:',
                    disabled = False,
                    rows=5)

dim_select = widgets.Dropdown(
                options = dims,
                value = 'age',
                description = 'Dimension:',
                disabled = False)

meas_select = widgets.Dropdown(
                options = measures,
                value = 'growth_contribution',
                description = 'Measure:',
                disabled = False)


# Construct composite box to hold all selectors
selections = [area_select, dim_select, meas_select]

primary_layout = Layout(display='flex', flex_flow='columns', justify_content='flex-start', align_items='stretch')

primary_select_box = VBox(children=selections, layout=primary_layout)
 
secondary_layout = Layout(display='flex', flex_flow='columns', justify_content='flex-start',
                          align_items='stretch')

compare_select_box = Box(children=[compare_select],
                         layout=secondary_layout)

select_box = Box(children=[primary_select_box, compare_select_box], layout=primary_layout)

display(select_box)

In [41]:
line_fig = plot_sub_line(area_select.value, dim_select.value, meas_select.value)

# Plot data
pl.plotly.iplot(line_fig, filename='lcc_line')