**POI Clustering**

*Step 1: Get the category corresponding to each POI*

- 1.1 Split raw POI data
- **1.2 Add CAT_NUMBER to the MICODE sheet (merged on TRADE_GROUP)**
- 1.3 Add CAT_NUMBER to POI data (merged on MICODE)
- 1.4 Retain POI data with CAT_NUMBER between 101 and 115
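
The sketch below shows how steps 1.2 to 1.4 could fit together on toy data. It is a minimal illustration, not the notebook's actual pipeline: the `POI_ID` column, the small `MICODE` values, and the names `micode_df`, `poi_df`, and `poi_kept` are all hypothetical. Only `CAT_NUMBER`, `TRADE_GROUP`, and `MICODE` come from this notebook.

```python
import pandas as pd

# Toy correspondence table (two of the 15 categories used in this notebook)
trade_group_df = pd.DataFrame({
    'CAT_NUMBER': [103, 108],
    'TRADE_GROUP': ['Eating and Drinking Places', 'Health Services'],
})

# Hypothetical MICODE sheet rows; the MICODE values are made up for illustration
micode_df = pd.DataFrame({
    'TRADE_GROUP': ['Eating and Drinking Places', 'Health Services',
                    'Agricultural Production - Crops'],
    'MICODE': [1, 2, 3],
})

# Hypothetical POI records; POI_ID is an assumed column name
poi_df = pd.DataFrame({
    'POI_ID': ['a', 'b', 'c', 'd'],
    'MICODE': [1, 3, 2, 1],
})

# Step 1.2: add CAT_NUMBER to the MICODE sheet (left merge on TRADE_GROUP)
micode_cat = micode_df.merge(trade_group_df, on='TRADE_GROUP', how='left')

# Step 1.3: add CAT_NUMBER to the POI data (left merge on MICODE)
poi_cat = poi_df.merge(micode_cat[['MICODE', 'CAT_NUMBER']], on='MICODE', how='left')

# Step 1.4: keep POIs whose CAT_NUMBER lies in [101, 115]; NaN rows are dropped
poi_kept = poi_cat[poi_cat['CAT_NUMBER'].between(101, 115)]
```

Because `Series.between` is inclusive on both ends by default and evaluates to `False` for missing values, POIs whose trade group has no category number fall out of `poi_kept` without a separate `dropna` step.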

# 1 Import package

In [1]:
import pandas as pd

# 2 Create a table including category numbers and names

In [2]:
# Create a correspondence table
trade_group_data = {
    'CAT_NUMBER': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115],
    'TRADE_GROUP': [
        'Apparel and Accessory Stores',
        'Automotive Dealers and Gasoline Service Stations',
        'Eating and Drinking Places',
        'Home Furniture, Furnishings and Equipment Stores',
        'Amusement and Recreation Services',
        'Automotive Repair, Services and Parking',
        'Educational Services',
        'Health Services',
        'Hotels, Rooming Houses, Camps, and Other Lodging Places',
        'Leisure',
        'Motion Pictures',
        'Museums, Art Galleries and Botanical and Zoological Gardens',
        'Shopping',
        'Tourism',
        'Sports'
    ]
}

# Create a DataFrame from the correspondence table
trade_group_df = pd.DataFrame(trade_group_data)

In [3]:
# Inspect the correspondence table
trade_group_df

Unnamed: 0,CAT_NUMBER,TRADE_GROUP
0,101,Apparel and Accessory Stores
1,102,Automotive Dealers and Gasoline Service Stations
2,103,Eating and Drinking Places
3,104,"Home Furniture, Furnishings and Equipment Stores"
4,105,Amusement and Recreation Services
5,106,"Automotive Repair, Services and Parking"
6,107,Educational Services
7,108,Health Services
8,109,"Hotels, Rooming Houses, Camps, and Other Lodgi..."
9,110,Leisure
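
Because the correspondence table is later used as the right-hand side of a left merge, a duplicated `TRADE_GROUP` entry would duplicate rows in the merged result. The following is a minimal sketch of a uniqueness check. The helper `check_lookup` and the table `toy` are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd

def check_lookup(table: pd.DataFrame, key: str, value: str) -> None:
    """Raise ValueError if the lookup table repeats a key or a value."""
    for col in (key, value):
        dupes = table.loc[table[col].duplicated(), col].tolist()
        if dupes:
            raise ValueError(f"duplicate {col} entries: {dupes}")

# Toy table with two of the notebook's categories
toy = pd.DataFrame({
    'CAT_NUMBER': [101, 102],
    'TRADE_GROUP': [
        'Apparel and Accessory Stores',
        'Automotive Dealers and Gasoline Service Stations',
    ],
})

# Passes silently when every key and value is unique
check_lookup(toy, key='TRADE_GROUP', value='CAT_NUMBER')
```

Calling `check_lookup(trade_group_df, 'TRADE_GROUP', 'CAT_NUMBER')` before merging would apply the same check to the full 15-row table.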


# 3 Merge data

- Add CAT_NUMBER to micode_sheet

In [4]:
# Read Excel file
input_file = '../../../../data/Locomizer/micode_sheet.xlsx'
df = pd.read_excel(input_file)

In [5]:
df.head()

Unnamed: 0,TRADE_DIVISION,TRADE_GROUP,CLASS,SUB_CLASS,SIC,SIC8,MICODE,SIC8_DESCRIPTION
0,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Wheat,111.0,1110000.0,10050111,Wheat
1,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Rice,112.0,1120000.0,10050112,Rice
2,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Corn,115.0,1150000.0,10050115,Corn
3,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Soybeans,116.0,1160000.0,10050116,Soybeans
4,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,"Cash Grains, nec",119.0,1190000.0,10050119,"Cash grains, nec"


In [6]:
# Left-merge on TRADE_GROUP to add a CAT_NUMBER column;
# trade groups absent from the correspondence table get NaN
df = df.merge(trade_group_df, on='TRADE_GROUP', how='left')

In [7]:
df.head()

Unnamed: 0,TRADE_DIVISION,TRADE_GROUP,CLASS,SUB_CLASS,SIC,SIC8,MICODE,SIC8_DESCRIPTION,CAT_NUMBER
0,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Wheat,111.0,1110000.0,10050111,Wheat,
1,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Rice,112.0,1120000.0,10050112,Rice,
2,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Corn,115.0,1150000.0,10050115,Corn,
3,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,Soybeans,116.0,1160000.0,10050116,Soybeans,
4,"Division A. - Agriculture, Forestry, and Fishing",Agricultural Production - Crops,Cash Grains,"Cash Grains, nec",119.0,1190000.0,10050119,"Cash grains, nec",
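
The first rows above have an empty `CAT_NUMBER` because their trade group is not in the correspondence table, so `head()` alone does not show whether the merge matched anything. One way to check coverage is sketched below on toy data, using the `indicator` and `validate` options of `DataFrame.merge`. The names `lookup`, `sheet`, `n_matched`, and `unmatched_groups` and the `MICODE` values are illustrative assumptions.

```python
import pandas as pd

# Toy correspondence table and toy MICODE sheet (MICODE values are made up)
lookup = pd.DataFrame({
    'CAT_NUMBER': [103, 108],
    'TRADE_GROUP': ['Eating and Drinking Places', 'Health Services'],
})
sheet = pd.DataFrame({
    'TRADE_GROUP': ['Agricultural Production - Crops',
                    'Eating and Drinking Places',
                    'Health Services'],
    'MICODE': [1, 2, 3],
})

# validate='many_to_one' raises MergeError if TRADE_GROUP repeats in lookup;
# indicator=True adds a '_merge' column: 'both' for matched, 'left_only' otherwise
merged = sheet.merge(lookup, on='TRADE_GROUP', how='left',
                     validate='many_to_one', indicator=True)

matched = merged['_merge'] == 'both'
n_matched = int(matched.sum())
unmatched_groups = sorted(merged.loc[~matched, 'TRADE_GROUP'].unique())

# NaN forces CAT_NUMBER to float; the nullable Int64 dtype keeps whole numbers
merged['CAT_NUMBER'] = merged['CAT_NUMBER'].astype('Int64')
merged = merged.drop(columns='_merge')
```

Listing `unmatched_groups` helps separate trade groups that are intentionally left without a category from ones that failed to match because of a spelling or punctuation difference in `TRADE_GROUP`.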


# 4 Export DataFrame

In [8]:
# Export the DataFrame with CAT_NUMBER to a new Excel file
output_file = '../../../../data/Locomizer_edited/micode_sheet_edited.xlsx'
df.to_excel(output_file, index=False)