**POI Clustering**

*Step 1: Get the category corresponding to each POI*

- **1.1 Split raw POI data**
- 1.2 Add CAT_NUMBER to the micode sheet (merged on TRADE_GROUP)
- 1.3 Add CAT_NUMBER to the POI data (merged on MICODE)
- 1.4 Retain POI data with CAT_NUMBER between 101 and 115
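
Steps 1.2 to 1.4 can be sketched as two merges followed by a range filter. The column names `TRADE_GROUP`, `MICODE` and `CAT_NUMBER` come from the outline above. The small DataFrames (`categories`, `micodes`, `pois`), their values, and the `POI_NAME` column are hypothetical placeholders rather than the real data, and treating both bounds of the 101 to 115 range as inclusive is an assumption.

```python
import pandas as pd

# Hypothetical lookup: one CAT_NUMBER per TRADE_GROUP
categories = pd.DataFrame({
    'TRADE_GROUP': ['Food', 'Retail', 'Other'],
    'CAT_NUMBER': [101, 115, 200],
})

# Hypothetical micode sheet: each MICODE belongs to a TRADE_GROUP
micodes = pd.DataFrame({
    'MICODE': ['A1', 'B2', 'C3'],
    'TRADE_GROUP': ['Food', 'Retail', 'Other'],
})

# Hypothetical POI data keyed by MICODE ('Z9' has no match on purpose)
pois = pd.DataFrame({
    'POI_NAME': ['Cafe', 'Shop', 'Depot', 'Kiosk'],
    'MICODE': ['A1', 'B2', 'C3', 'Z9'],
})

# 1.2 Add CAT_NUMBER to the micode sheet (merged on TRADE_GROUP)
micodes_cat = micodes.merge(categories, on='TRADE_GROUP', how='left')

# 1.3 Add CAT_NUMBER to the POI data (merged on MICODE)
pois_cat = pois.merge(micodes_cat[['MICODE', 'CAT_NUMBER']], on='MICODE', how='left')

# 1.4 Retain POIs with CAT_NUMBER between 101 and 115 (both bounds inclusive);
# rows without a matching category have NaN and are dropped by the filter
pois_kept = pois_cat[pois_cat['CAT_NUMBER'].between(101, 115)]
print(pois_kept)
```

A left merge keeps every POI through step 1.3, so unmatched MICODE values stay visible for inspection before the filter in step 1.4 removes them.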

# 1 Import packages

In [3]:
import pandas as pd
import os

# 2 Split raw POI CSV files and export DataFrames

In [6]:
# Define input and output file paths
input_folder = '../../../../data/Locomizer/pois'
output_folder = '../../../../data/Locomizer_edited/pois'
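
The monthly subfolders under the input folder are named in `YYYY-MM` form and run from 2023-05 to 2024-04. As an optional alternative to typing the twelve names by hand, the same labels could be generated with pandas. This is only a sketch; the variable name `months` is an assumption and is not used elsewhere in the notebook.

```python
import pandas as pd

# Build 'YYYY-MM' labels for every month from May 2023 through April 2024
months = pd.period_range('2023-05', '2024-04', freq='M').strftime('%Y-%m').tolist()
print(months)
```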

In [9]:
# Iterate through the subfolders under the input folder
for folder in ['2023-05', '2023-06', '2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12', '2024-01', '2024-02', '2024-03', '2024-04']:
    # Build input and output file paths
    input_file = os.path.join(input_folder, folder, 'POIs_for_JCD_Finland_(micodes).csv')
    output_file = os.path.join(output_folder, f'{folder}_POIs_for_JCD_Finland_(micodes).csv')

    # Open the file and read all lines
    with open(input_file, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    # The first line holds the column names, separated by tabs
    columns = lines[0].strip('"').split('\t')  # strip('"') removes the leading quote
    columns[-1] = columns[-1].rstrip('",\n')  # Remove the trailing quote, comma, and newline

    # Subsequent lines are data, processed line by line
    data = [line.strip('"').split('\t') for line in lines[1:]]

    # Create a DataFrame using the cleaned column names
    df = pd.DataFrame(data, columns=columns)

    # Remove the trailing quote, comma, and newline from the last column's values
    df[df.columns[-1]] = df[df.columns[-1]].str.rstrip('",\n')

    # Create output directory if it doesn't exist
    os.makedirs(os.path.dirname(output_file), exist_ok=True)

    # Save the processed DataFrame to a new CSV file
    df.to_csv(output_file, index=False, encoding='utf-8')

    # Report which folder has finished processing
    print(f'Finish processing the folder: {folder}')

Finish processing the folder: 2023-05
Finish processing the folder: 2023-06
Finish processing the folder: 2023-07
Finish processing the folder: 2023-08
Finish processing the folder: 2023-09
Finish processing the folder: 2023-10
Finish processing the folder: 2023-11
Finish processing the folder: 2023-12
Finish processing the folder: 2024-01
Finish processing the folder: 2024-02
Finish processing the folder: 2024-03
Finish processing the folder: 2024-04
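
The line-by-line cleaning above can be checked on a small in-memory sample before running it on the real files. The sketch below assumes each raw line has the form `"field<TAB>field<TAB>field",` (inferred from the `strip` and `rstrip` calls in the loop, not confirmed from the data); the helper name `parse_quoted_tsv_lines` and the sample values are hypothetical. Unlike the loop above, it cleans each whole line before splitting and skips blank lines.

```python
import pandas as pd

def parse_quoted_tsv_lines(lines):
    """Hypothetical helper: parse quote-wrapped, tab-separated lines into a DataFrame."""
    def clean(line):
        # Drop the trailing quote, comma, and line ending, then the leading quote,
        # and finally split the remaining text on tabs
        return line.rstrip('",\r\n').lstrip('"').split('\t')

    # Skip blank lines so a trailing newline at end of file adds no empty row
    rows = [clean(line) for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    return pd.DataFrame(records, columns=header)

# Made-up sample in the assumed raw layout
sample = [
    '"POI_ID\tNAME\tMICODE",\n',
    '"1\tCafe\tA1",\n',
    '"2\tShop\tB2",\n',
    '\n',
]
df = parse_quoted_tsv_lines(sample)
print(df)
```

All values come back as strings, as in the loop above, so numeric columns would still need an explicit conversion before any numeric comparison.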
