# Jeju Island Spending Patterns & Population Dynamics (2017-2018)
### Subtitle: Exploring economic shifts and demographic correlations in Jeju Island

---
**Author:** Jay Park  
**Date:** February 2026  
**Category:** Exploratory Data Analysis (EDA) / Regional Economics

---
## Step 1. Import libraries and load data
In this section, we set up the environment by importing the necessary Python libraries and loading the datasets required for the Jeju card analysis.

* **Libraries Used:** `pandas` for data manipulation, `matplotlib` and `seaborn` for visualization.
* **Datasets:** We load the English-translated versions of the 2017/2018 spending data and the population data to ensure a consistent analysis environment.

In [1]:
# import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load the data
jeju_reg_17_df = pd.read_csv('../data/jeju_card_region_2017_english.csv')
jeju_reg_18_df = pd.read_csv('../data/jeju_card_region_2018_english.csv')
jeju_pop_df = pd.read_csv('../data/jeju_population_english.csv')

## Step 2. Data Exploration (EDA)
Before diving into the analysis, we perform an initial data exploration. This step is crucial for understanding the data's distribution, checking for missing values, and identifying key categorical variables.

**Key Questions to Address:**
1. What are the main business sectors (industries) present in the data?
2. How is spending distributed across cities and districts (`City/County_Life_Name`, `Eup/myeon/dong_name`) in Jeju?
3. Are there any inconsistencies or outliers that need to be addressed before visualization?

### 2.1 Dataset Overview
* **Dimensions (shape)**: Check the number of rows and columns to understand the data volume.

* **Sample Data (head)**: Preview the top rows to verify if the files were loaded correctly.

In [2]:
print(jeju_reg_17_df.shape) 
print(jeju_reg_18_df.shape)
print(jeju_pop_df.shape)   

(26968, 7)
(27183, 7)
(527026, 6)


In [3]:
jeju_reg_17_df.head()

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
0,2017-01-01,Seogwipo City,Namwon-eup,Health supplement retailing,male,11,137500
1,2017-01-01,Seogwipo City,Cheonji-dong,Health supplement retailing,female,61,12334400
2,2017-01-01,Seogwipo City,Daejeong-eup,General retail business focusing on other food...,male,555,17301300
3,2017-01-01,Seogwipo City,Daejeong-eup,Other bar business,male,324,71843080
4,2017-01-01,Seogwipo City,Daejeong-eup,Other foreign restaurant business,male,40,971000


In [4]:
jeju_reg_18_df.head()

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
0,2018-01-01,Jeju City,Aradong,Vehicle gas station operation business,male,3954,205339045
1,2018-01-01,Jeju City,Samdo 1-dong,Vehicle gas station operation business,male,490,29469792
2,2018-01-01,Jeju City,Samdo 2-dong,meat retail,female,89,2386740
3,2018-01-01,Jeju City,Samdo 1-dong,Sports and recreational equipment rental business,male,106,12517300
4,2018-01-01,Jeju City,Samdo 1-dong,seafood retail,male,37,2621000


In [5]:
jeju_pop_df.head()

Unnamed: 0,year_month_day,City/County_Life_Name,Eup/myeon/dong_name,gender,age_range,Visiting_population
0,20170101,Jeju City,Hallim-eup,other,40s,19424
1,20170101,Jeju City,Aewol-eup,female,20s,27747
2,20170101,Jeju City,Gujwa-eup,other,70s,3459
3,20170101,Jeju City,Jocheon-eup,other,40s,36695
4,20170101,Jeju City,Hangyeong-myeon,other,70s,1174


### 2.2 Data Integrity and Types
* **Schema Check (`info()`)**: Review data types (`object`, `int64`) and identify any missing (null) values.

* **Memory Usage**: Monitor the memory footprint, especially for the large population dataset.

In [6]:
jeju_reg_17_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26968 entries, 0 to 26967
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Year_Month             26968 non-null  object
 1   City/County_Life_Name  26968 non-null  object
 2   Eup/myeon/dong_name    26968 non-null  object
 3   Industry_name          26968 non-null  object
 4   gender                 26968 non-null  object
 5   Number_of_users        26968 non-null  int64 
 6   Usage_amount           26968 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 1.4+ MB


In [7]:
jeju_reg_18_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27183 entries, 0 to 27182
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Year_Month             27183 non-null  object
 1   City/County_Life_Name  27183 non-null  object
 2   Eup/myeon/dong_name    27183 non-null  object
 3   Industry_name          27183 non-null  object
 4   gender                 27183 non-null  object
 5   Number_of_users        27183 non-null  int64 
 6   Usage_amount           27183 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 1.5+ MB


In [8]:
jeju_pop_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 527026 entries, 0 to 527025
Data columns (total 6 columns):
 #   Column                 Non-Null Count   Dtype 
---  ------                 --------------   ----- 
 0   year_month_day         527026 non-null  int64 
 1   City/County_Life_Name  527026 non-null  object
 2   Eup/myeon/dong_name    527026 non-null  object
 3   gender                 527026 non-null  object
 4   age_range              527026 non-null  object
 5   Visiting_population    527026 non-null  int64 
dtypes: int64(2), object(4)
memory usage: 24.1+ MB
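
The `+` in `memory usage: 24.1+ MB` marks the figure as a lower bound, because `info()` does not measure the contents of object (string) columns by default. As a minimal sketch on a toy frame that borrows the population column names (the rows are made up for illustration), `memory_usage(deep=True)` reports the full footprint, and casting low-cardinality text columns to `category` usually shrinks it:

```python
import pandas as pd

# Toy stand-in for jeju_pop_df: made-up rows, real column names.
toy_pop = pd.DataFrame({
    "City/County_Life_Name": ["Jeju City", "Seogwipo City"] * 5000,
    "gender": ["female", "other"] * 5000,
    "age_range": ["20s", "40s", "60s", "70s"] * 2500,
    "Visiting_population": range(10000),
})

# deep=True also counts the string data behind the text columns.
bytes_before = toy_pop.memory_usage(deep=True).sum()

# Low-cardinality text columns are good candidates for the category dtype.
text_cols = ["City/County_Life_Name", "gender", "age_range"]
toy_compact = toy_pop.astype({col: "category" for col in text_cols})
bytes_after = toy_compact.memory_usage(deep=True).sum()

print(f"text columns as-is:   {bytes_before / 1024:.0f} KiB")
print(f"text columns as category: {bytes_after / 1024:.0f} KiB")
```

Whether the conversion is worthwhile for the real 527,026-row population frame depends on how the columns are used later; grouping and filtering work on categories, but string methods need extra care.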


### 2.3 Statistical Summary
* **Descriptive Statistics**: Use `describe()` to examine the distribution of numerical values like `Usage_amount` and `Visiting_population`.

* **Categorical Diversity**: Analyze the number of unique entries in columns such as `Industry_name` and `Eup/myeon/dong_name` to understand the variety of categories.

In [9]:
pd.options.display.float_format = '{:.3f}'.format

jeju_reg_17_df.describe(include='all')

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
count,26968,26968,26968,26968,26968,26968.0,26968.0
unique,12,2,43,41,2,,
top,2017-11-01,Jeju City,Nohyeong-dong,Korean restaurant business,male,,
freq,2265,16076,795,1032,13518,,
mean,,,,,,1703.495,63231393.341
std,,,,,,4313.161,198798394.063
min,,,,,,1.0,10.0
25%,,,,,,43.0,2635000.0
50%,,,,,,281.0,12125170.0
75%,,,,,,1333.25,49059281.0


In [10]:
jeju_reg_18_df.describe(include='all')

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
count,27183,27183,27183,27183,27183,27183.0,27183.0
unique,12,2,43,41,2,,
top,2018-09-01,Jeju City,Nohyeong-dong,Korean restaurant business,male,,
freq,2285,16228,812,1032,13645,,
mean,,,,,,1726.604,62044256.57
std,,,,,,4399.634,190042848.439
min,,,,,,1.0,10.0
25%,,,,,,45.0,2706600.0
50%,,,,,,288.0,12311900.0
75%,,,,,,1339.0,49426169.0


In [11]:
jeju_pop_df.describe(include='all')

Unnamed: 0,year_month_day,City/County_Life_Name,Eup/myeon/dong_name,gender,age_range,Visiting_population
count,527026.0,527026,527026,527026,527026,527026.0
unique,,2,43,2,9,
top,,Jeju City,Samdo 1-dong,female,60s,
freq,,318669,12264,263533,58592,
mean,20175307.456,,,,,9931.281
std,4999.286,,,,,9974.524
min,20170101.0,,,,,0.0
25%,20170620.0,,,,,3292.0
50%,20171207.0,,,,,6811.0
75%,20180526.0,,,,,12783.0
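
The summaries show a strongly right-skewed `Usage_amount` (for 2017, the mean of about 63.2 million is roughly five times the median of about 12.1 million). One common way to flag candidate outliers, which is an assumption here and not a step this notebook performs, is Tukey's 1.5 × IQR rule. The sketch below applies it to the ten `Usage_amount` values visible in the `head()` previews above:

```python
import pandas as pd

# The ten Usage_amount values shown in the two spending head() previews.
usage = pd.Series(
    [137500, 12334400, 17301300, 71843080, 971000,
     205339045, 29469792, 2386740, 12517300, 2621000],
    name="Usage_amount",
)

q1, q3 = usage.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # 1.5 is a convention, not a fixed rule

# Values above the fence are candidates for a closer look, not automatic errors.
outliers = usage[usage > upper_fence]
print(f"upper fence: {upper_fence:,.0f}")
print(outliers)
```

For spending data, large values are often legitimate (for example, a busy district's fuel sales), so a flagged row is a prompt to inspect rather than a reason to drop it.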


### 📍 Key Findings

* **Data Format Inconsistency (Date Columns):** The date columns are stored differently across the datasets.
    * The spending datasets (`jeju_reg_17_df`, `jeju_reg_18_df`) store dates in `Year_Month` as **strings (object)** in the `YYYY-MM-DD` format.
    * In contrast, the population dataset (`jeju_pop_df`) stores dates in `year_month_day` as **integers** in the `YYYYMMDD` format.
* **Impact:** This inconsistency prevents direct merging or time-series comparison between spending and population data. These formats must be standardized to a unified `datetime` type in the next step.
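
To make the impact concrete, the following sketch uses two one-row toy frames built from values in the `head()` previews. The shared column name `date` is hypothetical and chosen only for the illustration. Recent pandas versions refuse to merge a string key against an integer key with a `ValueError`; once both keys are parsed to `datetime`, the rows match:

```python
import pandas as pd

# One row per frame; 'date' is a hypothetical shared key name.
spend = pd.DataFrame({"date": ["2017-01-01"], "Usage_amount": [137500]})
pop = pd.DataFrame({"date": [20170101], "Visiting_population": [19424]})

# A string key and an integer key cannot be matched directly.
try:
    pd.merge(spend, pop, on="date")
    merge_failed = False
except ValueError as err:
    merge_failed = True
    print(f"merge refused: {err}")

# After parsing both keys to datetime, the same rows line up.
spend["date"] = pd.to_datetime(spend["date"], format="%Y-%m-%d")
pop["date"] = pd.to_datetime(pop["date"].astype(str), format="%Y%m%d")
merged = pd.merge(spend, pop, on="date")
print(merged)
```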

## Step 3. Data Preprocessing & Refinement
In this stage, we address the inconsistencies identified during the initial data exploration. Our primary goal is to align the datasets into a unified format to facilitate seamless merging and accurate time-series analysis.

**Main Tasks:**
1. **Date Format Standardization:** Unifying the disparate date types across the spending and population data.
2. **Category Alignment:** Identifying and handling values unique to specific years to ensure fair comparison.
3. **Data Type Optimization:** Converting columns to appropriate formats (e.g., numeric, datetime) for efficient computation.

### 3.1. Standardizing Date Formats
To enable cross-dataset analysis, we will:
1. Convert the string-based dates in the spending data into `datetime` objects.
2. Parse the numeric dates in the population dataset (`YYYYMMDD`) into the same `datetime` format.

This standardization is essential for accurate data merging and chronological visualization.

In [12]:
# Convert Spending Data (Year_Month: String 'YYYY-MM-DD' -> Datetime)
jeju_reg_17_df['Year_Month'] = pd.to_datetime(jeju_reg_17_df['Year_Month'])
jeju_reg_18_df['Year_Month'] = pd.to_datetime(jeju_reg_18_df['Year_Month'])

# Convert Population Data (year_month_day: Integer 20170101 -> Datetime)
jeju_pop_df['year_month_day'] = pd.to_datetime(jeju_pop_df['year_month_day'].astype(str), format='%Y%m%d')

# Verify the results
print("--- Data Types After Conversion ---")
print(f"Spending 2017 Date Type: {jeju_reg_17_df['Year_Month'].dtype}")
print(f"Spending 2018 Date Type: {jeju_reg_18_df['Year_Month'].dtype}")
print(f"Population Date Type: {jeju_pop_df['year_month_day'].dtype}")

# Display the first few rows to confirm the visual format
display(jeju_reg_17_df[['Year_Month']].head(1))
display(jeju_reg_18_df[['Year_Month']].head(1))
display(jeju_pop_df[['year_month_day']].head(1))

--- Data Types After Conversion ---
Spending 2017 Date Type: datetime64[ns]
Spending 2018 Date Type: datetime64[ns]
Population Date Type: datetime64[ns]


Unnamed: 0,Year_Month
0,2017-01-01


Unnamed: 0,Year_Month
0,2018-01-01


Unnamed: 0,year_month_day
0,2017-01-01
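
Matching types is only part of the alignment: the spending data is monthly (`Year_Month`), while the population data is daily (`year_month_day`). If the two are to share a time axis, one option is to roll the daily counts up to month starts. This is a sketch on made-up rows, not a step the notebook itself runs, and using `sum` as the aggregate is an assumption; a daily mean may suit better depending on how `Visiting_population` is defined.

```python
import pandas as pd

# Made-up daily rows using the population column names.
daily = pd.DataFrame({
    "year_month_day": pd.to_datetime(
        ["20170101", "20170102", "20170201", "20170101"], format="%Y%m%d"
    ),
    "Eup/myeon/dong_name": ["Aewol-eup", "Aewol-eup", "Aewol-eup", "Hallim-eup"],
    "Visiting_population": [100, 150, 200, 50],
})

# Snap every date to the first day of its month, matching the
# 'YYYY-MM-01' convention used by the spending data.
daily["Year_Month"] = daily["year_month_day"].dt.to_period("M").dt.to_timestamp()

# One row per month and district.
monthly = daily.groupby(
    ["Year_Month", "Eup/myeon/dong_name"], as_index=False
)["Visiting_population"].sum()
print(monthly)
```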


### 3.2. Aligning Categorical Values
To perform a consistent Year-over-Year (YoY) analysis, we must handle values that are **exclusive to one year's dataset** (present in 2017 but not in 2018, or vice versa).

* **Action:** We will identify these unique categories and decide whether to include them in the aggregate analysis or filter them to focus on overlapping data points between 2017 and 2018. This ensures that any observed growth or decline is not skewed by inconsistent categorization.

In [13]:
def print_unique_values(df):
    object_columns = df.columns[df.dtypes == 'object']
    for col in object_columns:
        print(f"Number of unique values in the '{col}' column: {df[col].nunique()}")
        print(sorted(df[col].unique()), '\n')

print_unique_values(jeju_reg_17_df)

Number of unique values in the 'City/County_Life_Name' column: 2
['Jeju City', 'Seogwipo City'] 

Number of unique values in the 'Eup/myeon/dong_name' column: 43
['Aewol-eup', 'Andeok-myeon', 'Aradong', 'Bonggae-dong', 'Cheonji-dong', 'Daecheon-dong', 'Daejeong-eup', 'Daeryun-dong', 'Dodu-dong', 'Donghong-dong', 'Geonip-dong', 'Gujwa-eup', 'Hallim-eup', 'Hangyeong-myeon', 'Hwabuk-dong', 'Hyodon-dong', 'Ido 1-dong', 'Ido 2-dong', 'Ildo 1-dong', 'Ildo 2-dong', 'Jeongbang-dong', 'Jocheon-eup', 'Jungang-dong', 'Jungmun-dong', 'Lee Ho-dong', "Let's do it", 'Namwon-eup', 'Nohyeong-dong', 'Oedo-dong', 'Ora-dong', 'Pyoseon-myeon', 'Samdo 1-dong', 'Samdo 2-dong', 'Samyang-dong', 'Seohong-dong', 'Seongsan-eup', 'Songsan-dong', 'Udo-myeon', 'Yeongcheon-dong', 'Yerae-dong', 'Yongdam 1-dong', 'Yongdam 2-dong', 'peristalsis'] 

Number of unique values in the 'Industry_name' column: 41
['Bath business', 'Bread and confectionery retail', 'Chinese restaurant industry', 'Cosmetics and fragrance retail',

In [50]:
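# Reduce to 'YYYY-MM'; .str fails once the column is datetime (see In [12]), so go through to_datetime + strftime
jeju_reg_17_df['Year_Month'] = pd.to_datetime(jeju_reg_17_df['Year_Month']).dt.strftime('%Y-%m')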
jeju_reg_17_df.head()

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
0,2017-01,Seogwipo City,Namwon-eup,Health supplement retailing,male,11,137500
1,2017-01,Seogwipo City,Cheonji-dong,Health supplement retailing,female,61,12334400
2,2017-01,Seogwipo City,Daejeong-eup,General retail business focusing on other food...,male,555,17301300
3,2017-01,Seogwipo City,Daejeong-eup,Other bar business,male,324,71843080
4,2017-01,Seogwipo City,Daejeong-eup,Other foreign restaurant business,male,40,971000


In [44]:
print_unique_values(jeju_reg_18_df)

Number of unique values in the 'Year_Month' column: 12
['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01', '2018-05-01', '2018-06-01', '2018-07-01', '2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01'] 

Number of unique values in the 'City/County_Life_Name' column: 2
['Jeju City', 'Seogwipo City'] 

Number of unique values in the 'Eup/myeon/dong_name' column: 43
['Aewol-eup', 'Andeok-myeon', 'Aradong', 'Bonggae-dong', 'Cheonji-dong', 'Daecheon-dong', 'Daejeong-eup', 'Daeryun-dong', 'Dodu-dong', 'Donghong-dong', 'Geonip-dong', 'Gujwa-eup', 'Hallim-eup', 'Hangyeong-myeon', 'Hwabuk-dong', 'Hyodon-dong', 'Ido 1-dong', 'Ido 2-dong', 'Ildo 1-dong', 'Ildo 2-dong', 'Jeongbang-dong', 'Jocheon-eup', 'Jungang-dong', 'Jungmun-dong', 'Lee Ho-dong', "Let's do it", 'Namwon-eup', 'Nohyeong-dong', 'Oedo-dong', 'Ora-dong', 'Pyoseon-myeon', 'Samdo 1-dong', 'Samdo 2-dong', 'Samyang-dong', 'Seohong-dong', 'Seongsan-eup', 'Songsan-dong', 'Udo-myeon', 'Yeongcheon-dong', 'Yerae-dong', '

In [51]:
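# Reduce to 'YYYY-MM'; .str fails once the column is datetime (see In [12]), so go through to_datetime + strftime
jeju_reg_18_df['Year_Month'] = pd.to_datetime(jeju_reg_18_df['Year_Month']).dt.strftime('%Y-%m')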
jeju_reg_18_df.head()

Unnamed: 0,Year_Month,City/County_Life_Name,Eup/myeon/dong_name,Industry_name,gender,Number_of_users,Usage_amount
0,2018-01,Jeju City,Aradong,Vehicle gas station operation business,male,3954,205339045
1,2018-01,Jeju City,Samdo 1-dong,Vehicle gas station operation business,male,490,29469792
2,2018-01,Jeju City,Samdo 2-dong,meat retail,female,89,2386740
3,2018-01,Jeju City,Samdo 1-dong,Sports and recreational equipment rental business,male,106,12517300
4,2018-01,Jeju City,Samdo 1-dong,seafood retail,male,37,2621000


In [45]:
# Compare the industry categories of both years using set differences
industries_17 = set(jeju_reg_17_df['Industry_name'])
industries_18 = set(jeju_reg_18_df['Industry_name'])

for item in sorted(industries_17 - industries_18):
    print(f'Unique values in 2017 dataset: {item}')

for item in sorted(industries_18 - industries_17):
    print(f'Unique values in 2018 dataset: {item}')

Unique values in 2017 dataset: Other gambling and betting businesses
Unique values in 2018 dataset: taxi transportation industry


In [46]:
print(jeju_reg_17_df[jeju_reg_17_df['Industry_name'] == 'Other gambling and betting businesses'].shape)
print(jeju_reg_18_df[jeju_reg_18_df['Industry_name'] == 'taxi transportation industry'].shape)

(1, 7)
(4, 7)


In [48]:
jeju_reg_17_df = jeju_reg_17_df[jeju_reg_17_df['Industry_name'] != 'Other gambling and betting businesses']
jeju_reg_18_df = jeju_reg_18_df[jeju_reg_18_df['Industry_name'] != 'taxi transportation industry']
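
Hard-coding the two industry names works here because each year has exactly one exclusive category. A more general variant, shown as a sketch on toy frames with a hypothetical helper named `keep_common`, keeps only the values that appear in both years:

```python
import pandas as pd


def keep_common(df_a, df_b, column):
    """Return copies of both frames restricted to values of `column` found in each."""
    common = set(df_a[column]) & set(df_b[column])
    return (
        df_a[df_a[column].isin(common)].copy(),
        df_b[df_b[column].isin(common)].copy(),
    )


# Toy frames; the industry names come from the outputs above, the amounts are made up.
toy_17 = pd.DataFrame({
    "Industry_name": ["Korean restaurant business",
                      "Other gambling and betting businesses"],
    "Usage_amount": [1000, 10],
})
toy_18 = pd.DataFrame({
    "Industry_name": ["Korean restaurant business",
                      "taxi transportation industry"],
    "Usage_amount": [1200, 20],
})

toy_17_common, toy_18_common = keep_common(toy_17, toy_18, "Industry_name")
print(toy_17_common["Industry_name"].tolist())
print(toy_18_common["Industry_name"].tolist())
```

Returning copies avoids chained-assignment warnings if the filtered frames are modified afterwards.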

In [47]:
# Same comparison for the district names
districts_17 = set(jeju_reg_17_df['Eup/myeon/dong_name'])
districts_18 = set(jeju_reg_18_df['Eup/myeon/dong_name'])

for item in sorted(districts_17 - districts_18):
    print(f'Unique values in 2017 dataset: {item}')

for item in sorted(districts_18 - districts_17):
    print(f'Unique values in 2018 dataset: {item}')

### 3.3. Preparation for Data Integration
Finally, we combine the two yearly spending DataFrames into one. Because the keys (date, district, industry) now follow the same format and categories in both years, the combined frame is ready for a later join with the population dataset.

In [49]:
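# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeated row labels
jeju_reg_df = pd.concat([jeju_reg_17_df, jeju_reg_18_df], ignore_index=True)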

jeju_reg_df.shape

(54146, 7)
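
Once the keys agree, the join itself could look like the following sketch. It uses made-up toy frames that are already aggregated to one row per month and district (the real spending frame has several rows per key, one per industry and gender, so it would need aggregating first). The key names follow the real columns; the `validate` and `indicator` arguments are optional safeguards, not something this notebook uses.

```python
import pandas as pd

keys = ["Year_Month", "City/County_Life_Name", "Eup/myeon/dong_name"]

# Made-up spending totals per month and district.
spend = pd.DataFrame({
    "Year_Month": pd.to_datetime(["2017-01-01", "2017-02-01"]),
    "City/County_Life_Name": ["Jeju City", "Jeju City"],
    "Eup/myeon/dong_name": ["Aewol-eup", "Aewol-eup"],
    "Usage_amount": [5_000_000, 6_000_000],
})

# Made-up monthly visiting population; February is deliberately missing.
pop = pd.DataFrame({
    "Year_Month": pd.to_datetime(["2017-01-01"]),
    "City/County_Life_Name": ["Jeju City"],
    "Eup/myeon/dong_name": ["Aewol-eup"],
    "Visiting_population": [27_747],
})

merged = spend.merge(
    pop,
    on=keys,
    how="left",              # keep every spending row
    validate="one_to_one",   # raise if a key is duplicated on either side
    indicator=True,          # adds a '_merge' column showing where each row matched
)
print(merged[["Year_Month", "Usage_amount", "Visiting_population", "_merge"]])
```

Rows marked `left_only` in `_merge` are spending records with no population counterpart, which makes silent key mismatches easy to spot.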

## Step 4. Data Visualization & Insights
Now, we visualize the processed data to uncover hidden patterns in Jeju's economy.

### 4.1. Spending by Industry
We analyze which business sectors contribute most to the local economy and how spending behavior differs between male and female users.
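
As a starting point, the aggregation and chart could be sketched as follows. The rows are made up and only the column names match the spending data, so the ranking shown says nothing about the real results:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using the spending column names.
toy = pd.DataFrame({
    "Industry_name": ["Korean restaurant business", "Korean restaurant business",
                      "seafood retail", "seafood retail", "meat retail"],
    "gender": ["male", "female", "male", "female", "female"],
    "Usage_amount": [300, 200, 50, 80, 40],
})

# Total spending per industry, with one column per gender.
by_industry = toy.pivot_table(
    index="Industry_name", columns="gender",
    values="Usage_amount", aggfunc="sum", fill_value=0,
)

# Order industries by combined spending, largest first.
order = by_industry.sum(axis=1).sort_values(ascending=False).index
by_industry = by_industry.loc[order]

ax = by_industry.plot(kind="barh", figsize=(8, 4))
ax.invert_yaxis()  # largest industry at the top
ax.set_xlabel("Usage_amount (toy values)")
ax.set_title("Spending by industry and gender")
plt.tight_layout()
```

With 41 industries in the real data, limiting the chart to the top 10 or 15 rows of `by_industry` would likely keep the labels readable.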

### 4.2. Regional Analysis
We compare spending patterns between Jeju-si (`Jeju City`) and Seogwipo-si (`Seogwipo City`) to identify regional economic hubs.
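
One way to frame the comparison is a yearly total per city with a year-over-year change. The sketch below runs on made-up numbers; the derived column name `yoy_pct` is introduced only for this illustration:

```python
import pandas as pd

# Made-up monthly rows using the spending column names.
toy = pd.DataFrame({
    "Year_Month": pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2018-01-01", "2018-01-01"]
    ),
    "City/County_Life_Name": ["Jeju City", "Seogwipo City",
                              "Jeju City", "Seogwipo City"],
    "Usage_amount": [1000, 400, 1100, 380],
})

# Yearly totals per city: rows are cities, columns are years.
yearly = toy.assign(year=toy["Year_Month"].dt.year).pivot_table(
    index="City/County_Life_Name", columns="year",
    values="Usage_amount", aggfunc="sum",
)

# Year-over-year change in percent.
yearly["yoy_pct"] = (yearly[2018] / yearly[2017] - 1) * 100
print(yearly.round(1))
```

Because the real data has different row counts per year and city, comparing totals rather than per-row means keeps the YoY figure from being distorted by how finely each year is broken down.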

## Step 5. Conclusion
This section summarizes the key findings of the analysis.

* **Key Takeaway 1:** [Write your finding here, e.g., 'The food and beverage industry saw the highest growth in 2018.']
* **Key Takeaway 2:** [e.g., 'Spending in Seogwipo-si is highly seasonal compared to Jeju-si.']
* **Final Thoughts:** Based on these insights, we can suggest [Actionable Insight].