# Trade Area Modeling
According to [Market Business News](https://marketbusinessnews.com/financial-glossary/trade-area-definition-meaning/), a trade (or market) area is the geographical area where all or most of a business's sales volume occurs. It can be used to decide on the optimum location for business growth and expansion.

There are several trade area models; in this use case, we use the widely accepted [Huff model](https://en.wikipedia.org/wiki/Huff_model). The Huff model uses the **distance** between customers and business locations (existing or planned), along with the **attractiveness of those locations**, to estimate the likelihood of customers visiting each location.

## Case Study
A beauty and personal care brand (we'll call them Lotionfy) has thrived as an exclusively online business for the past two years. They serve a predominantly US market, and their market research has found the following:
1. While the e-commerce share of total retail sales continues to rise, in-store shopping is still the preferred method for most US customers and makes up a very large share of total retail sales. [[Reference](https://capitaloneshopping.com/research/online-vs-in-store-shopping-statistics/)]
2. More online shoppers are opting to pick up orders in-store, and shoppers are generally returning to stores; retailers are using small-format stores to encourage impulse purchases and reach new markets, among other benefits. [[Reference](https://www2.deloitte.com/us/en/pages/consulting/articles/q1-2023-consumer-trends-report.html)]
3. Online businesses are developing physical footprints to keep their customers loyal and give them additional touchpoints for the brand; customers themselves now prefer omnichannel experiences [[Reference](https://www.inc.com/rebecca-deczynski/the-future-of-retail-isnt-direct-to-consumer-brands-embracing-brick-mortar-2023.html)]

In light of these findings, Lotionfy has decided to build a physical presence. To start, they will partner with existing [top department stores](https://www.junglescout.com/wp-content/uploads/2023/09/Jungle-Scout-Consumer-Trends-Report-Q3-2023.pdf) in the US, such as Walmart, Target, and Kohl's. They will take the small-format store approach by drilling down to neighborhood locations.
They have decided to use a trade area model to identify which cities and which store locations to focus on.

### Huff Equation
The equation is as follows:
$$
P_{ij} = \frac{A_j^{\alpha} \cdot \left(\frac{1}{D_{ij}}\right)^{\beta}}{\sum_{k=1}^{n} A_k^{\alpha} \cdot \left(\frac{1}{D_{ik}}\right)^{\beta}}
$$

where $P_{ij}$ is the probability that a customer at location $i$ will visit location $j$; $A_j$ is the attractiveness of location $j$; $D_{ij}$ is the distance from location $i$ to location $j$; $n$ is the number of candidate locations; $\beta$ is the distance decay parameter (set to a constant value of 2 here); and $\alpha$ is the attractiveness parameter (set to a constant value of 1 here).
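
As a quick worked illustration (the numbers are made up for this example and do not come from the data), suppose a customer at location $i$ chooses between two stores with $A_1 = 2$, $D_{i1} = 1$ and $A_2 = 4$, $D_{i2} = 2$. With $\alpha = 1$ and $\beta = 2$:

$$
P_{i1} = \frac{2 \cdot \left(\frac{1}{1}\right)^{2}}{2 \cdot \left(\frac{1}{1}\right)^{2} + 4 \cdot \left(\frac{1}{2}\right)^{2}} = \frac{2}{2 + 1} = \frac{2}{3}, \qquad P_{i2} = \frac{1}{2 + 1} = \frac{1}{3}
$$

The closer store gets the higher probability even though it is half as attractive, because distance enters the equation squared.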

### Table of Contents
1. [Modules and Clients](#modules-and-clients)
2. [Data Loading](#data-loading)
3. [Data Inspection and Cleaning](#data-inspection-and-cleaning)
4. [Attractiveness Measure, $A_j$](#attractiveness-measure-$A_j$)
5. [Huff Model Numerator](#huff-model-numerator)
6. [Huff Model Denominator](#huff-model-denominator)
7. [Probability of walk-ins, $P_{ij}$](#probability-of-walk-ins-$P_ij$)
8. [Probable Customer Walk-ins](#probable-customer-walk-ins)

#### Modules and Clients
Here, we import the modules needed to run the Python script.

In [1]:
# import relevant packages
import pandas as pd

#### Data Loading
We load the data saved from the data mining [notebook](https://github.com/ItunuoluwaOlowoye/Portfolio/blob/main/projects/Trade%20Area%20Modelling/Trade%20Area%20Data%20Mining.ipynb).

In [2]:
# load datasets
customer_communities = pd.read_csv('data/customers/customer_communities.csv')
store_distances = pd.read_csv('data/stores/store_distances.csv')
store_preferences = pd.read_csv('data/stores/store_preferences.csv')
store_data = pd.read_csv('data/stores/store_data.csv')

#### Data Inspection and Cleaning
Here, we inspect the data and perform any necessary data cleaning.

In [3]:
#inspect customer communities
customer_communities.head()

Unnamed: 0,location,neighborhoods,geometry,population,table_date
0,"Albuquerque, New Mexico",Los Alamos,MULTIPOLYGON (((-106.63633315970289 35.1406749...,70,2023-12-02
1,"Albuquerque, New Mexico",Antelope Run,MULTIPOLYGON (((-106.49708351447421 35.1590821...,426,2023-12-02
2,"Albuquerque, New Mexico",Paradise Heights,MULTIPOLYGON (((-106.67536884580812 35.2135858...,233,2023-12-02
3,"Albuquerque, New Mexico",Valley Gardens,MULTIPOLYGON (((-106.71619303003331 35.0117186...,789,2023-12-02
4,"Albuquerque, New Mexico",Ladera West,MULTIPOLYGON (((-106.71257358120968 35.1273605...,695,2023-12-02


In [4]:
# inspect store data
store_data.head(2)

Unnamed: 0,element_type,osmid,name,address,geometry,building:levels,store_location,city_location
0,way,33881655,Walmart Neighborhood Market,"115 E Dunlap Ave, Phoenix","POLYGON ((-112.0725772 33.5667405, -112.072514...",1.0,"Walmart Neighborhood Market, 115 E Dunlap Ave,...","Phoenix, Arizona"
1,way,65742643,Walmart Supercenter,"4747 E Cactus Rd, Phoenix","POLYGON ((-111.9779737 33.5957764, -111.977992...",1.0,"Walmart Supercenter, 4747 E Cactus Rd, Phoenix","Phoenix, Arizona"


In [5]:
# keep neighborhood populations only for cities that appear in the store data
community_pop = customer_communities.loc[(customer_communities['location'].isin(store_data['city_location'])),
                                             ['neighborhoods','population']]

In [6]:
# inspect store distances
store_distances.head()

Unnamed: 0,neighborhoods,"Walmart Neighborhood Market, 115 E Dunlap Ave, Phoenix","Walmart Supercenter, 4747 E Cactus Rd, Phoenix","Target, 21001 N Tatum Blvd, Phoenix","Target, 7409 W Virginia Ave, Phoenix","Walmart Supercenter, 1825 W Bell Rd, Phoenix","Walmart Supercenter, 5250 W Indian School Rd, Phoenix","Target, 9830 W Lower Buckeye Rd, Tolleson","Kohl's, 3000-3010 S 99th Ave, Tolleson","Walmart Supercenter, 3721 E Thomas Rd, Phoenix",...,"Walmart Supercenter, 1607 W Bethany Home Rd, Phoenix","Target, 5715 N 19th Ave, Phoenix","Target, 2727 W Agua Fria Fwy, Phoenix","Walmart Supercenter, 9600 N Metro Pkwy W, Phoenix","Walmart Supercenter, 2501 W Happy Valley Rd Suite 34, Phoenix","Marshalls, 4729 E Ray Rd, Phoenix","Target, 2140 E Baseline Rd, Phoenix","Walmart Neighborhood Market, 2435 E Baseline Rd, Phoenix","Marshalls, 21001 N Tatum Blvd, Phoenix","Marshalls, 10130 W McDowell Rd, Avondale"
0,Maryvale,28.9,39.0,49.9,2.8,34.0,4.8,13.1,13.4,25.1,...,21.7,14.5,37.6,25.6,41.6,40.9,28.5,29.1,50.5,8.6
1,North Mountain,5.3,14.3,23.3,27.9,5.4,18.2,35.8,36.1,28.7,...,10.1,8.3,12.3,4.0,15.1,44.3,32.2,32.7,23.9,32.2
2,Ahwatukee Foothills,53.4,56.7,64.4,37.5,60.9,37.9,36.0,35.7,36.7,...,48.6,47.6,64.8,52.5,67.6,11.6,29.5,29.2,65.0,41.8
3,Rio Vista,49.9,56.3,47.5,69.1,39.5,59.5,73.9,74.3,71.4,...,52.6,52.5,34.5,45.2,30.4,85.5,73.4,74.0,48.1,69.8
4,Desert View,37.7,44.1,27.1,56.9,27.3,47.2,61.7,62.0,59.1,...,40.3,40.3,22.3,33.0,18.2,73.3,61.2,61.7,27.0,57.6


In [7]:
# set neighborhoods as index in community population and store distances
community_pop.set_index('neighborhoods', inplace=True)
store_distances.set_index('neighborhoods', inplace=True)

In [8]:
# inspect store preferences
store_preferences.head()

Unnamed: 0,store,avg_travel_time,business_communities,highways,design,accessibility,parking_space,store_size
0,"Walmart Neighborhood Market, 115 E Dunlap Ave,...",29.1,3,6,16,110,1,3719.55
1,"Walmart Supercenter, 4747 E Cactus Rd, Phoenix",32.1,1,2,26,346,2,9359.74
2,"Target, 21001 N Tatum Blvd, Phoenix",30.3,1,4,44,42,1,13911.77
3,"Target, 7409 W Virginia Ave, Phoenix",32.7,1,2,26,65,3,11542.02
4,"Walmart Supercenter, 1825 W Bell Rd, Phoenix",27.7,1,3,8,47,1,21883.87


In [9]:
# set store as index
store_preferences.set_index('store', inplace=True)
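
Later cells look up each store's attractiveness by the column names of `store_distances`, so the two tables need to agree on store names. The sketch below shows one way to check that before modeling. The helper `unmatched_stores` and the toy frames are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

def unmatched_stores(distance_columns, preference_index):
    """Return store names that appear on only one side (hypothetical helper)."""
    distance_stores = set(distance_columns)
    preference_stores = set(preference_index)
    return {
        "missing_preferences": sorted(distance_stores - preference_stores),
        "missing_distances": sorted(preference_stores - distance_stores),
    }

# toy frames standing in for store_distances and store_preferences
toy_store_distances = pd.DataFrame(
    {"Store A": [1.0, 4.0], "Store B": [2.0, 2.0]},
    index=pd.Index(["Neighborhood 1", "Neighborhood 2"], name="neighborhoods"),
)
toy_store_preferences = pd.DataFrame(
    {"avg_travel_time": [29.1, 32.1], "parking_space": [1, 2]},
    index=pd.Index(["Store A", "Store C"], name="store"),
)

mismatches = unmatched_stores(toy_store_distances.columns, toy_store_preferences.index)
print(mismatches)
```

An empty list on both sides would mean every distance column has a matching preference row. Any name listed under `missing_preferences` would raise a `KeyError` in a label-based lookup such as `attractiveness[col]`.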

#### Attractiveness Measure, $A_j$
Here we calculate the attractiveness by summing each store's preference metrics. Before summing, we convert each metric to a percentile rank relative to the other stores, so that every metric falls on a common scale between 0 and 1. The ranking is inverted for travel time, since the longer the time spent on the road, the less attractive the location.

In [10]:
# rank travel time in descending order
travel_time_rank = store_preferences[['avg_travel_time']].rank(pct=True, ascending=False)
# rank other store preferences
store_preferences_rank = store_preferences.drop(['avg_travel_time'], axis=1).rank(pct=True)
# create ranked dataframe
store_preferences_rank = pd.concat([travel_time_rank,store_preferences_rank], axis=1)

In [11]:
# measure attractiveness
attractiveness = store_preferences_rank.sum(axis=1)
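
To make the ranking step concrete, here is a minimal sketch on three made-up stores (the store names and values are assumptions for illustration only). It repeats the same `rank(pct=True)` calls on a toy table so the effect of the inverted travel-time ranking is easy to see.

```python
import pandas as pd

# toy preference table: shorter travel time is better, larger store size is better
toy_preferences = pd.DataFrame(
    {"avg_travel_time": [20.0, 30.0, 40.0], "store_size": [1000.0, 3000.0, 2000.0]},
    index=["Store A", "Store B", "Store C"],
)

# descending rank: the shortest travel time receives the highest percentile
toy_time_rank = toy_preferences[["avg_travel_time"]].rank(pct=True, ascending=False)
# ascending rank: the largest store receives the highest percentile
toy_other_rank = toy_preferences.drop(columns=["avg_travel_time"]).rank(pct=True)

toy_ranked = pd.concat([toy_time_rank, toy_other_rank], axis=1)
toy_attractiveness_score = toy_ranked.sum(axis=1)
print(toy_ranked.round(3))
print(toy_attractiveness_score.round(3))
```

With `pct=True`, pandas divides each rank by the number of rows, so the lowest-ranked store gets 1/n rather than 0. Summing the ranks gives every metric equal weight, which is the choice made in this notebook.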

#### Huff Model Numerator
Here the numerator of the Huff equation is calculated as the attractiveness-weighted inverse distance to each potential location, raised to the distance decay parameter ($\beta = 2$).

In [12]:
# create the dataframe that serves as the numerator in the Huff model
numerator_df = pd.DataFrame([], index=store_distances.index)

In [13]:
# calculate the numerator
for col in store_distances.columns:
    numerator_df[col] = attractiveness[col] / (store_distances[col])**2

#### Huff Model Denominator
The denominator is calculated for each neighborhood as the sum of the numerator values (attractiveness-weighted inverse distances, with the decay parameter applied) across all potential locations.

In [14]:
# calculate the denominator
denominator = numerator_df.sum(axis=1)
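
The numerator loop and the row sum can also be written without an explicit loop and with the two parameters named. The following is a sketch on toy data, not a replacement for the cells above. `ALPHA`, `BETA`, and the toy inputs are illustrative names and values.

```python
import pandas as pd

ALPHA = 1.0  # attractiveness parameter, as given for the Huff equation
BETA = 2.0   # distance decay parameter, as given for the Huff equation

# toy inputs: rows are neighborhoods, columns are stores
toy_distances = pd.DataFrame(
    {"Store A": [1.0, 4.0], "Store B": [2.0, 2.0]},
    index=pd.Index(["Neighborhood 1", "Neighborhood 2"], name="neighborhoods"),
)
toy_attractiveness = pd.Series({"Store A": 2.0, "Store B": 4.0})

# numerator: A_j^alpha * (1 / D_ij)^beta, aligned on store names (the columns)
toy_numerator = (1.0 / toy_distances).pow(BETA).mul(toy_attractiveness.pow(ALPHA), axis=1)
# denominator: sum of the numerator across all stores for each neighborhood
toy_denominator = toy_numerator.sum(axis=1)

print(toy_numerator)
print(toy_denominator)
```

Naming the parameters makes it easier to try other values of $\alpha$ and $\beta$ later. Aligning on labels with `mul(..., axis=1)` avoids relying on column order.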

#### Probability of walk-ins, $P_{ij}$
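This is the result of the Huff model equation: each numerator is divided by its neighborhood's denominator, so the probabilities for a given neighborhood sum to 1 across all stores.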

In [15]:
# create the probability dataframe
prob_df = pd.DataFrame([], index=store_distances.index)

In [16]:
# divide each store's numerator by the per-neighborhood denominator
for col in numerator_df.columns:
    prob_df[col] = numerator_df[col]/denominator
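
Because each neighborhood's probabilities should sum to 1, a quick sanity check can catch problems such as a zero distance, which makes the inverse distance infinite and the resulting probability undefined. The helper below is a hypothetical sketch on toy data and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

def probabilities_are_valid(prob_df, tol=1e-9):
    """Return True when all values are finite and non-negative and each row sums to 1."""
    values = prob_df.to_numpy(dtype=float)
    if not np.isfinite(values).all():
        return False
    if (values < 0).any():
        return False
    return bool(np.allclose(values.sum(axis=1), 1.0, atol=tol))

toy_index = ["Neighborhood 1", "Neighborhood 2"]
# well-formed probabilities: every row sums to 1
toy_good = pd.DataFrame({"Store A": [2 / 3, 1 / 9], "Store B": [1 / 3, 8 / 9]}, index=toy_index)
# malformed probabilities: a NaN such as one produced by a zero distance
toy_bad = pd.DataFrame({"Store A": [np.nan, 1 / 9], "Store B": [0.0, 8 / 9]}, index=toy_index)

print(probabilities_are_valid(toy_good))
print(probabilities_are_valid(toy_bad))
```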

#### Probable Customer Walk-ins
Now that the probability has been estimated, it needs to be weighted by the customer population in each potential location. For example, if location A has 3,000 customers and a walk-in probability of 90%, then 2,700 customers are likely to walk in. If location B has 10,000 customers and a walk-in probability of 40%, then 4,000 customers are likely to walk in. Although location B has the lower probability, it attracts more customers, so location B is preferable to location A.

The probable customer walk-ins are calculated as the probability of walk-ins multiplied by the customer population, for all potential locations.
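
The weighting can be sketched on toy numbers as follows. The probabilities and populations are invented for illustration and reuse the 3,000 and 10,000 customer figures from the example above.

```python
import pandas as pd

# toy walk-in probabilities: rows are neighborhoods, columns are stores
toy_prob = pd.DataFrame(
    {"Store A": [0.9, 0.2], "Store B": [0.1, 0.8]},
    index=["Neighborhood 1", "Neighborhood 2"],
)
toy_population = pd.Series({"Neighborhood 1": 3000, "Neighborhood 2": 10000})

# multiply each row by its neighborhood population (aligned on the index)
toy_walk_ins = toy_prob.mul(toy_population, axis=0)
# total probable walk-ins per store across all neighborhoods
toy_totals = toy_walk_ins.sum(axis=0)
best_store = toy_totals.idxmax()

print(toy_walk_ins)
print(toy_totals)
print(best_store)
```

Store A is the likelier choice for the smaller neighborhood, but Store B collects more probable walk-ins overall because it draws from the larger population.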

In [17]:
# create the population dataframe
pop_df = pd.DataFrame([], index=store_distances.index)

In [18]:
# calculate the probable number of store walk-ins in each neighborhood
for col in prob_df.columns:
    pop_df[col] = prob_df[col] * community_pop['population']

In [19]:
# sum up the total probable customer walk-ins at each store location
population = pop_df.astype(int).sum(axis=0).to_dict()
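
Note that the cell above converts each neighborhood-store value to an integer before summing, which truncates the fractional part of every value. The toy sketch below compares that order of operations with summing first and rounding once. The numbers are made up, and the alternative is shown only as an option that would slightly change the reported totals.

```python
import pandas as pd

# toy probable walk-ins for one store across three neighborhoods
toy_pop_df = pd.DataFrame(
    {"Store A": [10.6, 20.7, 30.9]},
    index=["Neighborhood 1", "Neighborhood 2", "Neighborhood 3"],
)

# truncate each value first, then sum (the order used in the cell above)
truncate_then_sum = int(toy_pop_df.astype(int).sum(axis=0)["Store A"])
# sum the fractional values first, then round once
sum_then_round = int(toy_pop_df.sum(axis=0).round()["Store A"])

print(truncate_then_sum, sum_then_round)
```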

In [20]:
# select the store(s) with the maximum probable walk-ins
max_population = max(population.values())
optimum_stores = [key for key in population.keys() if population[key] == max_population]
# print result
if len(optimum_stores) == 1:
    print(f'''The optimum store location is {optimum_stores[0]} with probable walk-ins of {max_population:,} existing customers''')
else:
    optimum_stores = "\n".join(optimum_stores)
    print(f'''The optimum store locations with probable walk-ins of {max_population:,} existing customers are:\n\n{optimum_stores}''')

The optimum store location is Target, 2727 W Agua Fria Fwy, Phoenix with probable walk-ins of 75,397 existing customers
