# A Critical Analysis of Engel and Rogers (1996) Using Data Science

**Michal Fabinger and Quentin Batista**  
_The University of Tokyo_

Using CPI data for U.S. and Canadian cities, Engel and Rogers (1996) argued that price variation is much higher between two cities located in different countries than between two equidistant cities in the same country. While the paper offers some potential explanations for this border effect, such as nominal price stickiness, it leaves the question open. In a follow-up paper, Gorodnichenko and Tesar (2009) showed that the border effect identified by Engel and Rogers (1996) is in fact entirely driven by the difference in the distribution of prices within the United States and Canada. Below, we complement that paper by carefully examining the data. The patterns we observe suggest that there might be some flaws in how the data was collected. The data was obtained directly from Engel's website (https://www.ssc.wisc.edu/~cengel/Data/Border/BorderData.htm).

## Engel and Rogers (1996) Methodology

- All prices are converted into U.S. dollars using a monthly average exchange rate.
- Hypothesis: The volatility of the prices of similar goods sold in different locations is related to the distance between the locations and to other explanatory variables, including a dummy variable for whether the cities are in different countries.
- Regression: The volatility of the log relative price across cities is regressed on log distance, a country (border) dummy, and city dummies.
- Filtered measure: regress the log of relative prices across cities on 12 seasonal dummies and six monthly lags.
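
The methodology above can be sketched in a few lines. This is a minimal illustration, not Engel and Rogers' code. It assumes that volatility is the standard deviation of the two-month change in the log relative price (consistent with the two-month lag built during preprocessing), it replaces the city dummies with a single intercept for brevity, and it treats the filtered measure as the standard deviation of the residual from the seasonal-dummy and lag regression. All function names are hypothetical.

```python
import numpy as np


def relative_price_volatility(price_a, price_b, lag=2):
    """Std. dev. of the `lag`-month change in the log relative price (assumed measure)."""
    log_rel = np.log(np.asarray(price_a, dtype=float)) - \
        np.log(np.asarray(price_b, dtype=float))
    change = log_rel[lag:] - log_rel[:-lag]
    return np.nanstd(change, ddof=1)


def filtered_volatility(log_rel, months, n_lags=6):
    """Residual std. dev. after regressing on 12 month dummies and `n_lags` lags."""
    log_rel = np.asarray(log_rel, dtype=float)
    months = np.asarray(months)  # calendar month (1..12) of each observation
    y = log_rel[n_lags:]
    seasonal = (months[n_lags:, None] == np.arange(1, 13)).astype(float)
    lags = np.column_stack([log_rel[n_lags - k:-k]
                            for k in range(1, n_lags + 1)])
    X = np.hstack([seasonal, lags])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.std(y - X @ coef, ddof=1)


def border_regression(volatility, distance, border):
    """OLS of volatility on an intercept, log distance, and a border dummy.

    Simplification: the city dummies of the original specification are omitted.
    """
    X = np.column_stack([np.ones(len(volatility)), np.log(distance), border])
    coef, *_ = np.linalg.lstsq(X, np.asarray(volatility, dtype=float),
                               rcond=None)
    return coef  # [intercept, log-distance slope, border coefficient]
```

In this sketch each city pair and good would contribute one volatility value, and a positive border coefficient would correspond to the border effect discussed above.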



# Data Preprocessing

In [1]:
import pandas as pd
import numpy as np

# Import Data
US_data_url = 'http://www.ssc.wisc.edu/~cengel/Data/Border/USA.xls'
US_price_data = pd.read_excel(US_data_url, na_values=np.nan).stack(dropna=False).reset_index()
US_L2M_price_data = pd.read_excel(US_data_url, na_values=np.nan).shift(2).stack(dropna=False).reset_index()

CAN_data_url = 'http://www.ssc.wisc.edu/~cengel/Data/Border/CAN.xls'
CAN_price_data = pd.read_excel(CAN_data_url, na_values=np.nan).stack(dropna=False).reset_index()
CAN_L2M_price_data = pd.read_excel(CAN_data_url, na_values=np.nan).shift(2).stack(dropna=False).reset_index()

# Process US Data
# Create common index and merge
US_price_data['JoinIndex'] = US_price_data['level_0'] + \
 US_price_data['level_1']
US_L2M_price_data['JoinIndex'] = US_L2M_price_data['level_0'] + \
 US_L2M_price_data['level_1']
US_price_data = US_price_data.merge(US_L2M_price_data[['JoinIndex', 0]],
                                    how='left', on='JoinIndex')

# Add country column
US_price_data['Country'] = 'US'

# Split date into two columns
US_price_data['Year'], US_price_data['Month'] = \
 zip(*US_price_data['level_0'].map(lambda x: x.split(':')))

# Split city and good code into two columns
US_price_data['CityCode'], US_price_data['GoodCode'] = \
 zip(*US_price_data['level_1'].map(lambda x: (x[:2], x[2:])))

# Process Canadian Data
# Create common index and merge
CAN_price_data['JoinIndex'] = CAN_price_data['level_0'] + \
 CAN_price_data['level_1']
CAN_L2M_price_data['JoinIndex'] = CAN_L2M_price_data['level_0'] + \
 CAN_L2M_price_data['level_1']
CAN_price_data = CAN_price_data.merge(CAN_L2M_price_data[['JoinIndex', 0]],
                                      how='left', on='JoinIndex')

# Add country column
CAN_price_data['Country'] = 'Canada'

# Split date into two columns
CAN_price_data['Year'], CAN_price_data['Month'] = \
 zip(*CAN_price_data['level_0'].map(lambda x: x.split(':')))

# Split city and good code into two columns
CAN_price_data['CityCode'], CAN_price_data['GoodCode'] = \
 zip(*CAN_price_data['level_1'].map(lambda x: (x[:1], x[1:])))

# Merging and cleaning up the dataframe
price_data = pd.concat([US_price_data, CAN_price_data])
price_data = price_data.drop(['level_1', 'JoinIndex'], axis=1)

# Reformat date column
price_data['level_0'] = pd.to_datetime(price_data['level_0'].str.replace(':',
                                                                         '-'))

# Rename columns
price_data.columns = ['Date', 'Price', 'PriceL2M', 'Country', 'Year', 'Month',
                      'CityCode', 'GoodCode']

# Replace negative values by np.nan
price_data.loc[price_data['Price'] < 0, 'Price'] = np.nan
price_data.loc[price_data['PriceL2M'] < 0, 'PriceL2M'] = np.nan

# Reorganize columns
price_data = price_data[['Date', 'Year', 'Month', 'Country', 'CityCode',
                        'GoodCode', 'Price', 'PriceL2M']]

# Reset index
price_data = price_data.reset_index(drop=True)
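
The lagging step above works by shifting the wide table two rows (two months) before reshaping it, so that each (date, series) row can be matched with the price observed two months earlier. The toy example below illustrates the idea with made-up numbers. It uses `melt` with a two-key merge instead of `stack` with a concatenated join key, purely to keep the sketch short; the helper `to_long` and the column name `Series` are hypothetical.

```python
import pandas as pd

# Toy wide table: one row per month, one column per city/good series.
wide = pd.DataFrame(
    {"CH0": [100.0, 101.0, 102.0, 103.0],
     "T0": [90.0, 91.0, 93.0, 94.0]},
    index=["1980:01", "1980:02", "1980:03", "1980:04"],
)


def to_long(frame, value_name):
    """Reshape a wide table into (Date, Series, value) rows, keeping NaNs."""
    out = frame.rename_axis("Date").reset_index()
    return out.melt(id_vars="Date", var_name="Series", value_name=value_name)


# Shift the wide table by two rows, reshape both versions, then merge them.
long_df = to_long(wide, "Price").merge(
    to_long(wide.shift(2), "PriceL2M"), on=["Date", "Series"], how="left")
```

The first two months of each series have no two-month lag, so their `PriceL2M` is missing.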

In [3]:
price_data.sample(n=15)

| | Date | Year | Month | Country | CityCode | GoodCode | Price | PriceL2M |
|---|---|---|---|---|---|---|---|---|
| 57835 | 1977-02-01 | 1977 | 2 | Canada | V | 0 | NaN | NaN |
| 89155 | 1994-12-01 | 1994 | 12 | Canada | T | 9 | 131.7 | 133.6 |
| 69784 | 1983-11-01 | 1983 | 11 | Canada | R | 14 | 91.96824 | 91.13989 |
| 85144 | 1992-09-01 | 1992 | 9 | Canada | E | 1 | 114.6 | 114.0 |
| 1356 | 1976-07-01 | 1976 | 7 | US | CH | 6 | 66.0 | 65.5 |
| 76250 | 1987-08-01 | 1987 | 8 | Canada | Q | 3 | 105.5 | 104.2 |
| 85549 | 1992-11-01 | 1992 | 11 | Canada | O | 14 | 127.9526 | 128.68064 |
| 19548 | 1983-03-01 | 1983 | 3 | US | WA | 3 | 99.1 | 98.4 |
| 34929 | 1988-12-01 | 1988 | 12 | US | PH | 9 | 123.8 | 127.5 |
| 10256 | 1979-10-01 | 1979 | 10 | US | SF | 11 | 62.7 | 60.4 |


In [4]:
# Create dictionaries containing good descriptions and city names

goods_desriptions = {"0": "City CPI",
                     "1": "Food at home",
                     "2": "Food away from home",
                     "3": "Alcoholic beverages",
                     "4": "Shelter",
                     "5": "Fuel and other utilities",
                     "6": "Household furnishings & operations",
                     "7": "Men's and boy's apparel",
                     "8": "Women's and girl's apparel",
                     "9": "Footwear",
                     "10": "Private transportation",
                     "11": "Public transportation",
                     "12": "Medical care",
                     "13": "Personal care",
                     "14": "Entertainment"}

city_names = {"CH": "Chicago",
              "LA": "Los Angeles",
              "NY": "New York",
              "PH": "Philadelphia",
              "DA": "Dallas",
              "DT": "Detroit",
              "HS": "Houston",
              "PI": "Pittsburgh",
              "SF": "San Francisco",
              "BA": "Baltimore",
              "BO": "Boston",
              "MI": "Miami",
              "ST": "St. Louis",
              "WA": "Washington, DC",
              "Q": "Quebec",
              "M": "Montreal",
              "O": "Ottawa",
              "T": "Toronto",
              "W": "Winnipeg",
              "R": "Regina",
              "E": "Edmonton",
              "C": "Calgary",
              "V": "Vancouver"}

# Inverse mappings
inv_goods_desriptions = {v: k for k, v in goods_desriptions.items()}
inv_city_names = {v: k for k, v in city_names.items()}

price_data['GoodDescription'] = price_data['GoodCode'].map(goods_desriptions)
price_data['CityName'] = price_data['CityCode'].map(city_names)

In [5]:
from bokeh.plotting import figure, show, output_notebook, gridplot
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import all_palettes

output_notebook()

TOOLS = "crosshair,pan,wheel_zoom,reset,tap,save"

colors = all_palettes['Category20'][len(goods_desriptions)]

grid = []
plot_list = []

for city_code in city_names:
    hover = HoverTool(tooltips=[
        ("index", "$index"),
        ("good type", "@good"),
        ("(x,y)", "($x, $y)"),
    ])

    p = figure(x_axis_type="datetime", tools=[TOOLS, hover], plot_width=400,
               plot_height=400)
    p.title.text = city_names[city_code]
    p.title.align = 'center'

    for good_code in goods_desriptions:
        condition = (price_data['CityCode'] == city_code) & \
         (price_data['GoodCode'] == good_code)
        source = ColumnDataSource(data=dict(
            x=price_data['Date'][condition],
            y=price_data['Price'][condition],
            good=price_data['GoodDescription'][condition]))

        p.line(x='x', y='y', color=colors[int(good_code)], source=source)

    if len(plot_list) < 2:
        plot_list.append(p)
    else:
        grid.append(plot_list)
        plot_list = []
        plot_list.append(p)

if plot_list:
    grid.append(plot_list)

p = gridplot(grid)

show(p)

In [6]:
def mean_init_price_index(df, price_type, city_code, good_code):
    """Mean of `price_type` over the 1980-81 base period for one series."""
    base_period = (df['CityCode'] == city_code) & \
        (df['GoodCode'] == good_code) & \
        df['Year'].isin(['1980', '1981'])
    return df.loc[base_period, price_type].mean()


def data_normalization(df, col_to_normalize, city_names, goods_desriptions):
    """Add a column `<col>N`: each city/good series divided by its base mean."""
    for city_code in city_names:
        for good_code in goods_desriptions:
            condition = (df['CityCode'] == city_code) & \
                (df['GoodCode'] == good_code)
            # Use the frame passed in, not the global price_data
            df.loc[condition, col_to_normalize + 'N'] = \
                df.loc[condition, col_to_normalize] / \
                mean_init_price_index(df, col_to_normalize, city_code,
                                      good_code)

data_normalization(price_data, 'Price', city_names, goods_desriptions)
data_normalization(price_data, 'PriceL2M', city_names, goods_desriptions)
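
The normalization above divides every city/good series by its own mean over 1980-81, so that all normalized indices average to 1 in the base period. A vectorized way to express the same idea is sketched below on made-up numbers. The function name `normalize_by_base_period` and the helper column `base_mean` are hypothetical, and the sketch assumes, as in the notebook, that `Year` is stored as a string.

```python
import pandas as pd


def normalize_by_base_period(df, col, base_years=("1980", "1981")):
    """Divide each (CityCode, GoodCode) series by its base-period mean."""
    base = (df.loc[df["Year"].isin(list(base_years))]
              .groupby(["CityCode", "GoodCode"])[col]
              .mean()
              .rename("base_mean"))
    # Attach each row's base-period mean via its (CityCode, GoodCode) key.
    merged = df.join(base, on=["CityCode", "GoodCode"])
    return merged[col] / merged["base_mean"]


# Toy data: two series with three annual observations each.
toy = pd.DataFrame({
    "Year": ["1980", "1981", "1982", "1980", "1981", "1982"],
    "CityCode": ["CH", "CH", "CH", "T", "T", "T"],
    "GoodCode": ["0"] * 6,
    "Price": [80.0, 120.0, 150.0, 50.0, 50.0, 75.0],
})
toy["PriceN"] = normalize_by_base_period(toy, "Price")
```

Here the Chicago series has a base-period mean of 100, so its normalized values are 0.8, 1.2 and 1.5.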

In [10]:
from ipywidgets import interact
from bokeh.models import Legend

TOOLS = "crosshair,hover,pan,wheel_zoom,reset,tap,save"

p_cities = figure(x_axis_type="datetime", tools=TOOLS, plot_width=800,
                  plot_height=600, toolbar_location="above")

colors = all_palettes['Category20'][len(goods_desriptions)]

lines = []
legend_it = []
for good_code in goods_desriptions:
    condition = (price_data['CityCode'] == 'CH') & \
     (price_data['GoodCode'] == good_code)
    temp_line = p_cities.line(x=price_data['Date'][condition],
                              y=price_data['PriceN'][condition],
                              color=colors[int(good_code)])
    lines.append(temp_line)
    legend_it.append((goods_desriptions[good_code], [temp_line]))


def city_plot_update(City):
    for line, good_code in zip(lines, goods_desriptions):
        condition = (price_data['CityCode'] == inv_city_names[City]) & \
         (price_data['GoodCode'] == good_code)
        line.data_source.data['x'] = price_data['Date'][condition]
        line.data_source.data['y'] = price_data['PriceN'][condition]
    show(p_cities)

legend = Legend(items=legend_it, location=(0, 100))
legend.click_policy = "hide"

p_cities.add_layout(legend, 'right')

interact(city_plot_update, City=list(city_names.values()))

In [11]:
from bokeh.palettes import magma

p_good = figure(x_axis_type="datetime", tools=TOOLS, plot_width=800,
                plot_height=600, toolbar_location="above")
lines = []
legend_it = []
colors = magma(len(city_names))

for (i, city) in enumerate(city_names):
    condition = (price_data['CityCode'] == city) & \
     (price_data['GoodCode'] == '0')
    temp_line = p_good.line(x=price_data['Date'][condition],
                            y=price_data['PriceN'][condition],
                            color=colors[i])
    lines.append(temp_line)
    legend_it.append((city_names[city], [temp_line]))


def good_plot_update(Good):
    for line, city in zip(lines, city_names):
        condition = (price_data['CityCode'] == city) & \
         (price_data['GoodCode'] == inv_goods_desriptions[Good])
        line.data_source.data['x'] = price_data['Date'][condition]
        line.data_source.data['y'] = price_data['PriceN'][condition]
    show(p_good)

legend = Legend(items=legend_it, location=(0, 25))
legend.click_policy = "hide"

p_good.add_layout(legend, 'right')

interact(good_plot_update, Good=list(goods_desriptions.values()))

# Observations

- Sudden increase in the price of food away from home in Canadian cities around 1992.
- Two-year cyclicality in fuel and other utilities prices in some U.S. cities, such as Chicago, Los Angeles, and Philadelphia.
- Sudden increase in the price of women's and girl's apparel in some Canadian cities (Montreal, Quebec, Regina) around 1992.
- Temporary peaks in public transportation prices in Canadian cities from 1985 to 1990.
- Sudden increase in medical care prices in Regina around 1997.
- Sudden increase in alcoholic beverage prices in Chicago, Los Angeles, Philadelphia, and San Francisco around 1992.
- Price indices for women's and girl's apparel and for footwear are 30% lower in Philadelphia than in New York from 1990 onward.
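
The sudden increases listed above were identified by visual inspection of the plots. A simple automated screen could complement that inspection. The sketch below flags months whose log price change is far from the typical change in the same series, using a robust z-score based on the median absolute deviation. The function name `flag_jumps` and the threshold of 5 are assumptions chosen for illustration, and the prices are made up.

```python
import numpy as np
import pandas as pd


def flag_jumps(series, threshold=5.0):
    """Flag observations whose log change is an outlier within the series.

    Assumes the series has some month-to-month variation (nonzero MAD).
    """
    log_change = np.log(series).diff()
    median = log_change.median()
    mad = (log_change - median).abs().median()
    # 1.4826 rescales the MAD to be comparable to a standard deviation
    score = (log_change - median).abs() / (1.4826 * mad)
    return score > threshold


# Made-up index with a single jump at position 5.
prices = pd.Series([100.0, 101.0, 102.5, 103.0, 104.5,
                    140.0, 141.0, 142.5, 143.0])
jumps = flag_jumps(prices)
```

Applied per city and good, such a screen would return the dates to inspect more closely; it does not by itself indicate whether a jump reflects a true price change or a data collection issue.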

# References

Engel, Charles, and John H. Rogers. 1996. "How Wide Is the Border?" American Economic Review, 86(5): 1112-25.

Gorodnichenko, Yuriy, and Linda L. Tesar. 2009. "Border Effect or Country Effect? Seattle May Not Be So Far from Vancouver After All." American Economic Journal: Macroeconomics, 1(1): 219-41.