# Categorizing Stores Using NTILE Window Function

The data is a three-year record of revenue generated by 144 stores in 37 cities. The task is to help the business owners categorize each store as one of the following: super_center, city_store, or community_store. The categorization depends on the total revenue generated by each store.

This notebook uses the ntile window function to achieve the categorization. Each query returns the store_id of every store and the city_id of the city where it is located.
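
To make the bucketing rule concrete, here is a minimal pure-Python sketch of how ntile distributes ordered rows. The helper name `ntile_buckets` is hypothetical and not part of the notebook; it only mirrors the standard rule that rows are split as evenly as possible, with any leftover rows going to the earliest buckets.

```python
from collections import Counter
from typing import List


def ntile_buckets(n_rows: int, n_buckets: int) -> List[int]:
    """Return the 1-based bucket number for each row position (hypothetical helper)."""
    base, extra = divmod(n_rows, n_buckets)
    buckets: List[int] = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


even = Counter(ntile_buckets(144, 12))    # 144 rows divide evenly
uneven = Counter(ntile_buckets(143, 12))  # one row short of an even split

print(even[1], even[12])      # 12 12
print(uneven[1], uneven[12])  # 12 11
```

With 144 stores and 12 buckets every bucket holds exactly 12 stores. If the row count were not a multiple of 12, the last buckets would be smaller than the first ones.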

In [1]:
# import libraries for the notebook
import os
from sqlalchemy import create_engine
import pandas as pd

In [2]:
# parameters for connecting to the database
host = os.getenv('HOST')
database = os.getenv('SQL_DATABASE')
user = os.getenv('SQL_USER')
password = os.getenv('SQL_PASSWORD')

In [3]:
connection_string = f"postgresql://{user}:{password}@{host}/{database}"

In [4]:
engine = create_engine(connection_string)
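
The connection string above interpolates the credentials directly, which can break if the password contains characters such as `@` or `/`, and `os.getenv` silently returns `None` for a missing variable. The sketch below shows one possible way to guard against both using only the standard library. The helper names `require_env` and `build_connection_string`, and the sample credentials, are hypothetical and used purely for illustration.

```python
import os
from urllib.parse import quote_plus


def require_env(name: str) -> str:
    """Read an environment variable and fail loudly if it is missing (hypothetical helper)."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def build_connection_string(user: str, password: str, host: str, database: str) -> str:
    """Assemble a PostgreSQL URL with percent-encoded credentials (hypothetical helper)."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Made-up values, not the notebook's real credentials.
example = build_connection_string("analyst", "p@ss/word", "localhost", "sales_db")
print(example)  # postgresql://analyst:p%40ss%2Fword@localhost/sales_db

# In the notebook, the inputs could come from the environment instead:
# connection_string = build_connection_string(
#     require_env('SQL_USER'), require_env('SQL_PASSWORD'),
#     require_env('HOST'), require_env('SQL_DATABASE'))
```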

## Queries

### Store Categories
This query splits the 144 stores into twelve groups of twelve, ranked by each store's total revenue over the three years under review. The twelve stores in the top group are categorized as super_centers, while the twelve in the bottom group are categorized as community_stores. All other stores are categorized as city_stores.

In [5]:
store_category = pd.read_sql(
'''
select store_id, city_id,
case
    when categories=1 then 'super_center'
    when categories=12 then 'community_store'
    else 'city_store'
end as store_category
from(
select *,
ntile(12) over(order by store_revenue desc) as categories
from(
    select sales.store_id, city_id,
    cast(sum(revenue) as decimal (10,2)) store_revenue
    from sales join store_cities
    on sales.store_id=store_cities.store_id
    group by sales.store_id, city_id
)
result)
x
''',
engine)
store_category

Unnamed: 0,store_id,city_id,store_category
0,S0085,C014,super_center
1,S0097,C014,super_center
2,S0026,C014,super_center
3,S0062,C014,super_center
4,S0020,C014,super_center
...,...,...,...
139,S0006,C024,community_store
140,S0136,C005,community_store
141,S0007,C014,community_store
142,S0047,C031,community_store
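
The category sizes produced by the query above can be checked without the PostgreSQL database. The sketch below runs the same ntile-plus-case logic on synthetic data in an in-memory SQLite database. The table `store_revenue_demo` and its revenue figures are invented for illustration, and it assumes the SQLite library bundled with Python is version 3.25 or later, which is required for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table store_revenue_demo (store_id text, store_revenue real)")

# Synthetic data: 144 stores with distinct, made-up revenue totals.
rows = [(f"S{i:04d}", 1000.0 * i) for i in range(1, 145)]
conn.executemany("insert into store_revenue_demo values (?, ?)", rows)

query = """
select store_category, count(*) as n_stores
from (
    select store_id,
           case
               when categories = 1 then 'super_center'
               when categories = 12 then 'community_store'
               else 'city_store'
           end as store_category
    from (
        select store_id,
               ntile(12) over (order by store_revenue desc) as categories
        from store_revenue_demo
    ) ranked
) labelled
group by store_category
"""

# Map each category name to the number of stores it contains.
counts = dict(conn.execute(query).fetchall())
print(counts)
```

On this synthetic data the top and bottom tiles hold 12 stores each and the remaining 120 stores fall into city_store, which matches the twelve-per-group split expected for 144 rows.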


### Super_Centers
This query returns only the stores in the super_center category.

In [6]:
super_centers = pd.read_sql(
'''
select store_id, city_id
from(select store_id, city_id,
case
    when categories=1 then 'super_center'
    when categories=12 then 'community_store'
    else 'city_store'
end as store_category
from(
select *,
ntile(12) over(order by store_revenue desc) as categories
from(
    select sales.store_id, city_id, 
    cast(sum(revenue) as decimal (10,2)) store_revenue 
    from sales join store_cities
    on sales.store_id=store_cities.store_id
    group by sales.store_id, city_id
) 
result) 
x) y
where y.store_category='super_center'
''',
engine)
super_centers

Unnamed: 0,store_id,city_id
0,S0085,C014
1,S0097,C014
2,S0026,C014
3,S0062,C014
4,S0020,C014
5,S0038,C004
6,S0095,C014
7,S0115,C020
8,S0001,C031
9,S0112,C031
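
The filtering queries in this notebook repeat the full categorization subquery and differ only in the final `where` clause. One possible alternative, sketched here under stated assumptions, is to express the steps as common table expressions and pass the category as a bound parameter. The example uses an in-memory SQLite database with an invented table `demo_revenue` so that it runs on its own. PostgreSQL also supports the `with` clause, but the placeholder syntax for parameters depends on the driver, so this is not a drop-in replacement for the notebook's cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table demo_revenue (store_id text, city_id text, store_revenue real)"
)
# Synthetic data: 144 stores spread over 37 made-up city ids.
conn.executemany(
    "insert into demo_revenue values (?, ?, ?)",
    [(f"S{i:04d}", f"C{(i % 37) + 1:03d}", 500.0 * i) for i in range(1, 145)],
)

CATEGORY_QUERY = """
with ranked as (
    select store_id, city_id,
           ntile(12) over (order by store_revenue desc) as categories
    from demo_revenue
),
labelled as (
    select store_id, city_id,
           case
               when categories = 1 then 'super_center'
               when categories = 12 then 'community_store'
               else 'city_store'
           end as store_category
    from ranked
)
select store_id, city_id
from labelled
where store_category = ?
"""


def stores_in(category):
    """Return (store_id, city_id) pairs for one category (hypothetical helper)."""
    return conn.execute(CATEGORY_QUERY, (category,)).fetchall()


super_demo = stores_in("super_center")
community_demo = stores_in("community_store")
print(len(super_demo), len(community_demo))  # 12 12
```

Keeping the categorization logic in one query string means a change to the tile count or the category labels only has to be made once.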


### Community_Stores
This query returns only the stores in the community_store category.

In [7]:
community_centers = pd.read_sql(
'''
select store_id, city_id
from(select store_id, city_id,
case
    when categories=1 then 'super_center'
    when categories=12 then 'community_store'
    else 'city_store'
end as store_category
from(
select *,
ntile(12) over(order by store_revenue desc) as categories
from(
    select sales.store_id, city_id, 
    cast(sum(revenue) as decimal (10,2)) store_revenue 
    from sales join store_cities
    on sales.store_id=store_cities.store_id
    group by sales.store_id, city_id
) 
result) 
x) y
where y.store_category='community_store'
''',
engine)
community_centers

Unnamed: 0,store_id,city_id
0,S0030,C006
1,S0141,C005
2,S0086,C003
3,S0134,C020
4,S0130,C037
5,S0041,C013
6,S0127,C029
7,S0006,C024
8,S0136,C005
9,S0007,C014


The stores have been successfully categorized.