# [Local Consumer Commerce](https://www.jpmorganchase.com/corporate/institute/lcc-index.htm)

In the [JPMorgan Chase Institute](https://www.jpmorganchase.com/corporate/institute/institute.htm), we use billions of transactions to explore and analyze the financial decisions of households, firms, and market actors. Among the analyses we perform is the measurement of local commercial activity across 14 metro areas. We refer to this activity as Local Consumer Commerce. This Notebook will provide a way to interactively explore the LCC data file we make available on the website.

In [26]:
# Data manipulation
import numpy as np
import pandas as pd
# Data visualization
import plotly as pl
import plotly.graph_objs as pgo
# Credentials
from getpass import getpass

# Set paths
out_dir = '../../out/'
data_dir = '../../data/'

# Refresh data flag
refresh_data = True

Since we are using [plotly](https://plot.ly/) in this demo, we need to establish our [credentials](https://plot.ly/python/getting-started/). Note that your API key must be retrieved from the [API Settings](https://plot.ly/settings/api#/) page of your plotly account. You have to log in to the plotly site to get there and generate a key.

In [50]:
user = 'marvinward'
# Despite the prompt text, the value entered here should be the plotly API key
pw = getpass('Enter Password: ')

pl.tools.set_credentials_file(username=user, api_key=pw)

Enter Password: ········
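
To avoid hardcoding the username or typing the key on every run, the credentials could instead be read from environment variables, with an interactive prompt as a fallback. The sketch below is one way to do that. The helper name `get_plotly_credentials` and the variable names `PLOTLY_USERNAME` and `PLOTLY_API_KEY` are assumptions made for illustration and are not part of the original notebook.

```python
import os
from getpass import getpass


def get_plotly_credentials(env=None):
    """Return (username, api_key), preferring environment variables.

    The environment variable names used here are illustrative assumptions.
    """
    env = os.environ if env is None else env
    user = env.get('PLOTLY_USERNAME')
    key = env.get('PLOTLY_API_KEY')
    # Fall back to interactive prompts for anything that is missing
    if not user:
        user = input('Enter Username: ')
    if not key:
        key = getpass('Enter API Key: ')
    return user, key
```

The returned pair can then be passed to `pl.tools.set_credentials_file(username=..., api_key=...)` exactly as in the cell above.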


## Data Acquisition

The [data file](https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip) is located on the website. We can acquire said file and unzip it with shell commands.

In [18]:
if refresh_data:
    !wget -O {data_dir}lcc_fulldata.zip https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip
    !unzip -d {data_dir} {data_dir}lcc_fulldata.zip

--2018-07-28 11:04:45--  https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip
Resolving www.jpmorganchase.com (www.jpmorganchase.com)... 159.53.116.46
Connecting to www.jpmorganchase.com (www.jpmorganchase.com)|159.53.116.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 204321 (200K) [application/zip]
Saving to: ‘../../data/lcc_fulldata.zip’


2018-07-28 11:04:46 (1.31 MB/s) - ‘../../data/lcc_fulldata.zip’ saved [204321/204321]

Archive:  ../../data/lcc_fulldata.zip
  inflating: ../../data/lcc_fulldata.csv  
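
The shell commands above depend on `wget` and `unzip` being installed. Where they are not available (on Windows, for example), the same two steps could be sketched with the Python standard library. The function name `fetch_and_unzip` and its `download` flag are hypothetical; the URL is the one used above.

```python
import os
import zipfile
from urllib.request import urlretrieve

LCC_URL = 'https://www.jpmorganchase.com/corporate/institute/document/lcc_fulldata.zip'


def fetch_and_unzip(url, dest_dir, zip_name='lcc_fulldata.zip', download=True):
    """Download a zip archive into dest_dir and extract it there.

    Returns the list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, zip_name)
    if download:
        # Rough equivalent of: wget -O {zip_path} {url}
        urlretrieve(url, zip_path)
    # Rough equivalent of: unzip -d {dest_dir} {zip_path}
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

In the notebook this would be called as `fetch_and_unzip(LCC_URL, data_dir)` inside the `if refresh_data:` guard.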


In [20]:
!head -5 {data_dir}lcc_fulldata.csv

periodid,area_name,dimname,category,growth_rate,spend_share,growth_contribution
201310,National,age,<25,0.1882,0.1011,0.0171
201310,National,age,25-34,0.0879,0.1491,0.0129
201310,National,age,35-44,0.0605,0.1978,0.012
201310,National,age,45-54,0.049,0.2128,0.0106


## Data Preparation

To ease manipulation of the data, we can capture some relevant information and convert periodids to pandas periods. The latter makes it easier to deal with time series operations.

In [21]:
# Read in data
lcc = pd.read_csv(data_dir + 'lcc_fulldata.csv')

# Convert months to pandas periods
lcc['month'] = lcc['periodid'].apply(lambda x: pd.Period(x, freq='M'))

# Map income categories to quintiles
int_dict = {
    '<21': 'q1',
    '21-40': 'q2',
    '41-60': 'q3',
    '61-80': 'q4',
    '81-100': 'q5'
}
lcc['category'] = lcc['category'].replace(int_dict)

# Capture category labels for each dimension in the right order
dim_labs = {
    'age': ['<25', '25-34', '35-44', '45-54', '55-64', '65+'],
    'income': ['q1', 'q2', 'q3', 'q4', 'q5'],
    'bizsize': ['SMALL', 'MEDIUM', 'LARGE'],
    'location': ['Same Neighborhood', 'Same Region', 'Different Region'],
    'product': ['Durables', 'Fuel', 'Nondurables', 'Other Services', 'Restaurants']
}

# Construct lists of areas, dimensions and measures
areas = sorted(set(lcc['area_name']))
dims = sorted(set(lcc['dimname']))
measures = ['growth_rate', 'spend_share', 'growth_contribution']

# Define LCC palette
lcc_colors = ['#00a0dd', '#a2dadb', '#bbd976', '#ffe18b', '#fbaf5d', '#f57f32']

lcc.head()

| | periodid | area_name | dimname | category | growth_rate | spend_share | growth_contribution | month |
|---|---|---|---|---|---|---|---|---|
| 0 | 201310 | National | age | <25 | 0.1882 | 0.1011 | 0.0171 | 2013-10 |
| 1 | 201310 | National | age | 25-34 | 0.0879 | 0.1491 | 0.0129 | 2013-10 |
| 2 | 201310 | National | age | 35-44 | 0.0605 | 0.1978 | 0.012 | 2013-10 |
| 3 | 201310 | National | age | 45-54 | 0.049 | 0.2128 | 0.0106 | 2013-10 |
| 4 | 201310 | National | age | 55-64 | 0.044 | 0.1675 | 0.0075 | 2013-10 |
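
The `periodid` conversion above can also be written in vectorized form, which makes the expected `YYYYMM` layout explicit. The sketch below uses a small toy frame rather than the real file, and shows the kind of time series operations that periods make easy. The frame and variable names are illustrative.

```python
import pandas as pd

# Toy frame that mimics the periodid column (integers in YYYYMM form)
toy = pd.DataFrame({'periodid': [201310, 201311, 201312, 201401]})

# Parse YYYYMM explicitly, then convert the timestamps to monthly periods
toy['month'] = pd.to_datetime(toy['periodid'].astype(str), format='%Y%m').dt.to_period('M')

# Periods support calendar-aware arithmetic and comparisons
first = toy['month'].iloc[0]
next_month = first + 1             # one month later
same_month_next_year = first + 12  # twelve months later
in_2013 = toy[toy['month'] <= pd.Period('2013-12', freq='M')]
```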


It is helpful to define a function that pulls a specific subset and returns it in a wide format that is easy to display.

In [25]:
def get_lcc_sub(area, dim, measure, df=lcc):
    # Define area and dimension subset conditions
    a = (df['area_name'] == area)
    d = (df['dimname'] == dim)
    # Subset data (copy so later operations do not act on a view of df)
    df_sub = df.loc[a & d, ['month', 'category', measure]].copy()
    # Set month and category as the index
    df_sub = df_sub.set_index(['month', 'category'])
    # Unstack category (aka - convert from long to wide format)
    df_sub = df_sub.unstack('category') * 100
    # Drop multiindex columns in favor of category names
    df_sub.columns = [c[1] for c in df_sub.columns]
    # If we are looking at anything but the total growth rate, fix label order
    if dim != 'Total':
        df_sub = df_sub[dim_labs[dim]]
    return df_sub.reset_index()

get_lcc_sub('Chicago, IL - Metro Area', 'age', 'growth_contribution').head()

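| | month | <25 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
|---|---|---|---|---|---|---|---|
| 0 | 2013-10 | 1.56 | 1.13 | 1.06 | 0.7 | 0.19 | 0.19 |
| 1 | 2013-11 | 1.49 | 1.23 | 1.21 | 0.87 | 0.27 | -0.1 |
| 2 | 2013-12 | 1.19 | 0.48 | 0.29 | 0.1 | -0.22 | -0.12 |
| 3 | 2014-01 | 1.13 | 0.79 | 0.63 | 0.36 | -0.2 | -0.58 |
| 4 | 2014-02 | 1.17 | 0.96 | 0.67 | 0.38 | 0.13 | -0.3 |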
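
To see why `get_lcc_sub` reorders the columns after unstacking, it helps to run the same steps on a tiny frame. The sketch below uses made-up values, not LCC data. After `unstack`, the category columns come back in lexical order, which places `'25-34'` before `'<25'`, so the intended order has to be restored explicitly.

```python
import pandas as pd

# Toy long-format data with made-up values: one row per (month, category) pair
long_df = pd.DataFrame({
    'month': ['2013-10', '2013-10', '2013-11', '2013-11'],
    'category': ['<25', '25-34', '<25', '25-34'],
    'growth_rate': [0.10, 0.20, 0.30, 0.40],
})

# Long to wide: index by month, one column per category, scaled to percent
wide = long_df.set_index(['month', 'category']).unstack('category') * 100
# Flatten the (measure, category) MultiIndex columns to category names
wide.columns = [c[1] for c in wide.columns]
order_after_unstack = list(wide.columns)

# Restore the intended label order
wide = wide[['<25', '25-34']]
```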


## Plotting LCCI Data

### Helper Functions

Our goal is to leverage the plotly API to generate charts that will land in plotly [Chart Studio](https://plot.ly/online-chart-maker/), ostensibly making them available for non-programmers to modify as needed. Constructing plotly charts involves some generic operations (e.g. getting data traces), so helper functions are useful here. Since they share common inputs and work together to create a chart, we will house them in a small class.

In [88]:
class plotlyChartGenerator:
    
    def __init__(self, a, d, m):
        # Parametric inputs
        self.area = a
        self.dim = d
        self.measure = m
        # Default values (which can be reset)
        self.df = lcc
        self.min_per = lcc.month.min()
        self.max_per = lcc.month.max()
        self.colors = lcc_colors
        
    def to_string(self, n=5):
        '''
        Method prints a string representation of attribute states and a sample of the input data.
        '''
        attr_dict = {
            'Area': self.area,
            'Dimension': self.dim,
            'Measure': self.measure,
            'Minimum Period': self.min_per,
            'Maximum Period': self.max_per,
            'Color Palette': self.colors
        }
        s = 'plotlyChartGenerator(\n'
        for attr in attr_dict:
            s += '\t{k} = {v}\n'.format(k=attr, v=attr_dict[attr])
        s += ')\n\nInput Data Sample:'
        print(s)
        df_sub = get_lcc_sub(self.area, self.dim, self.measure, self.df)
        print(df_sub.head(n))
       

    def get_traces(self, plot_func):
        '''
        Method returns a list of traces that can be paired with a layout for plotly charts.
        
        Note that this method leverages higher-order function capability. One must choose the kind of
        trace they seek (e.g. Scatter or Bar). Right now this is flimsily built to support only a
        couple of chart types (Scatter and Bar).
        '''
        # Grab relevant subset
        df_sub = get_lcc_sub(self.area, self.dim, self.measure, self.df)
        df_sub = df_sub[(df_sub['month'] >= self.min_per) & (df_sub['month'] <= self.max_per)].copy()
        # Generate string version of the month variable because it must be serializable
        # (it will be converted to javascript)
        df_sub['month_str'] = df_sub['month'].apply(lambda x: x.strftime('%m-%Y'))
        # Capture traces for each category from subset (expanded for conceptual clarity)
        traces = []
        for i,d in enumerate(dim_labs[self.dim]):
            if plot_func == pgo.Scatter:
                next_trace = plot_func(x=df_sub['month_str'], y=df_sub[d], name=d, line=dict(color=self.colors[i]))
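            elif plot_func == pgo.Bar:
                next_trace = plot_func(x=df_sub['month_str'], y=df_sub[d], name=d, marker=dict(color=self.colors[i]))
            else:
                # Fail loudly rather than reusing a stale or undefined trace
                raise ValueError('Only pgo.Scatter and pgo.Bar are supported')
            traces.append(next_trace)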
        return traces
    
    def get_layout(self, barmode=None):
        '''
        Method returns a layout with plot title and y-axis label tailored to the given
        combination of area, dimension, and measure.
        '''
        meas_str = ' '.join(self.measure.split('_')).title()
        dim_str = self.dim.title()
        ttl = '{m} in {a} by {d}'.format(a=self.area, d=dim_str, m=meas_str)
        y_ttl='Year-over-Year {} (%)'.format(meas_str)
        yaxis_config = {
            'title': y_ttl
        }
        return pgo.Layout(title=ttl, yaxis=yaxis_config, barmode=barmode)
    
    def get_fig(self, plot_func, barmode=None):
        '''
        Method returns a plotly figure that is ready for plotting, building on the traces
        and layout returned in above methods.
        '''
        traces = self.get_traces(plot_func)
        layout = self.get_layout(barmode)
        return pgo.Figure(data=traces, layout=layout)        
    
pcg_rate = plotlyChartGenerator('Chicago, IL - Metro Area', 'age', 'growth_rate')

pcg_rate.to_string()
# pcg_rate.get_traces(pgo.Scatter)
# pcg_rate.get_layout()
# pcg_rate.get_fig(pgo.Scatter)

plotlyChartGenerator(
	Area = Chicago, IL - Metro Area
	Dimension = age
	Measure = growth_rate
	Minimum Period = 2013-10
	Maximum Period = 2018-03
	Color Palette = ['#00a0dd', '#a2dadb', '#bbd976', '#ffe18b', '#fbaf5d', '#f57f32']
)

Input Data Sample:
    month    <25  25-34  35-44  45-54  55-64   65+
0 2013-10  18.66   7.44   5.04   3.18   1.13  1.19
1 2013-11  17.41   8.28   5.75   3.87   1.55 -0.64
2 2013-12  14.69   3.30   1.40   0.45  -1.20 -0.76
3 2014-01  15.67   5.19   3.00   1.56  -1.11 -3.58
4 2014-02  20.18   5.89   3.05   1.64   0.73 -1.97
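
The pattern in `get_traces`, where the caller passes in the trace constructor, can be illustrated without plotly installed. The sketch below uses two hypothetical stand-in builders, `scatter_trace` and `bar_trace`, that return plain dictionaries shaped like plotly trace specifications. The helper `build_traces` is likewise an assumption for illustration; it shows the higher-order idea and is not a replacement for the class above.

```python
def scatter_trace(x, y, name, color):
    # Stand-in for pgo.Scatter: the color is applied to the line
    return {'type': 'scatter', 'x': list(x), 'y': list(y), 'name': name,
            'line': {'color': color}}


def bar_trace(x, y, name, color):
    # Stand-in for pgo.Bar: the color is applied to the marker
    return {'type': 'bar', 'x': list(x), 'y': list(y), 'name': name,
            'marker': {'color': color}}


def build_traces(trace_func, months, series_by_label, colors):
    """Apply trace_func once per category, pairing each with a palette color."""
    return [
        trace_func(months, values, label, colors[i])
        for i, (label, values) in enumerate(series_by_label.items())
    ]


months = ['10-2013', '11-2013']
# First two rows of the Chicago growth-rate sample printed above
series = {'<25': [18.66, 17.41], '25-34': [7.44, 8.28]}
palette = ['#00a0dd', '#a2dadb']

line_traces = build_traces(scatter_trace, months, series, palette)
bar_traces = build_traces(bar_trace, months, series, palette)
```

Because each builder decides where the color goes, `build_traces` needs no chart-type branching of its own.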


In [90]:
chi_age_rate = pcg_rate.get_fig(pgo.Scatter)
pl.plotly.iplot(chi_age_rate)
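
The figure has to be serialized before it can be sent to plotly, which is why `get_traces` converts each month to a string first. The short sketch below, which uses only `json` and `pandas`, shows the difference between the raw `Period` and its formatted string.

```python
import json
import pandas as pd

month = pd.Period('2013-10', freq='M')

# A Period object cannot be serialized to JSON directly
try:
    json.dumps({'x': [month]})
    period_serializable = True
except TypeError:
    period_serializable = False

# The formatted string can, using the same format as get_traces
month_str = month.strftime('%m-%Y')
payload = json.dumps({'x': [month_str]})
```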

In [91]:
pcg_contr = plotlyChartGenerator('Chicago, IL - Metro Area', 'age', 'growth_contribution')

pcg_contr.to_string()

plotlyChartGenerator(
	Area = Chicago, IL - Metro Area
	Dimension = age
	Measure = growth_contribution
	Minimum Period = 2013-10
	Maximum Period = 2018-03
	Color Palette = ['#00a0dd', '#a2dadb', '#bbd976', '#ffe18b', '#fbaf5d', '#f57f32']
)

Input Data Sample:
    month   <25  25-34  35-44  45-54  55-64   65+
0 2013-10  1.56   1.13   1.06   0.70   0.19  0.19
1 2013-11  1.49   1.23   1.21   0.87   0.27 -0.10
2 2013-12  1.19   0.48   0.29   0.10  -0.22 -0.12
3 2014-01  1.13   0.79   0.63   0.36  -0.20 -0.58
4 2014-02  1.17   0.96   0.67   0.38   0.13 -0.30


In [96]:
chi_age_contr = pcg_contr.get_fig(pgo.Bar, barmode='relative')
pl.plotly.iplot(chi_age_contr)
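
With `barmode='relative'`, positive segments stack upward from zero and negative segments stack downward, so offsetting contributions stay visible on either side of the axis. The sketch below reproduces that stacking rule for a single month. The function `relative_bar_bases` is a hypothetical helper written for illustration, not a plotly call.

```python
def relative_bar_bases(values):
    """Return the base (starting height) of each segment in one stacked column.

    Positive values stack upward from zero; negative values stack downward.
    Also returns the top of the positive stack and the bottom of the negative one.
    """
    pos_top, neg_bottom = 0.0, 0.0
    bases = []
    for v in values:
        if v >= 0:
            bases.append(pos_top)
            pos_top += v
        else:
            bases.append(neg_bottom)
            neg_bottom += v
    return bases, pos_top, neg_bottom


# Chicago age-group growth contributions for 2013-12 (from the sample above)
dec_2013 = [1.19, 0.48, 0.29, 0.10, -0.22, -0.12]
bases, top, bottom = relative_bar_bases(dec_2013)
net = top + bottom  # sum of all six contributions
```

For this month the positive stack reaches about 2.06, the negative stack about -0.34, and the contributions sum to about 1.72.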