# Introduction

In this project, we apply K-Means clustering to answer the following questions:

- How has the price of rice changed in different regions over time? 
- Which regions experience the most volatile rice prices? 
- Is there a seasonal pattern in rice price volatility?

The [dataset](https://www.kaggle.com/datasets/usmanlovescode/philippines-food-prices-dataset/data), sourced from the World Food Programme Price Database, offers an in-depth look at food prices in the Philippines. Covering key staples such as maize, rice, beans, fish, and sugar, it serves as a valuable tool for analyzing the dynamics of the nation’s food economy.


| **Column Name** | **Description** |
|------------------|-----------------|
| `date`           | The date when the food price was recorded. Helps track trends over time. |
| `admin1`         | The first-level administrative division, i.e. the region (e.g., "Region III", "National Capital region"). |
| `admin2`         | The second-level administrative division, typically a province (e.g., "Nueva Ecija", "Iloilo"). |
| `market`         | The name of the specific market where the price was observed. |
| `latitude`       | Geographic latitude of the market. Useful for mapping and regional analysis. |
| `longitude`      | Geographic longitude of the market. |
| `category`       | Broad classification of the food (e.g., "cereals and tubers"). |
| `commodity`      | Specific food item being priced (e.g., "Rice (regular, milled)", "Cabbage", "Fish (tilapia)"). |
| `unit`           | The unit of measurement for the price (e.g., "KG", "Unit", "750 ML"). |
| `priceflag`      | Indicates how the price was obtained: `"actual"` (directly recorded) or `"aggregate"` (averaged or estimated). |
| `pricetype`      | Type of market price: `"Retail"`, `"Wholesale"`, or `"Farm Gate"`. |
| `currency`       | Currency in which the price is reported (`"PHP"` for every row in this dataset). |
| `price`          | The reported price of the commodity in the local currency and unit. |
| `usdprice`       | The same price converted to USD (based on exchange rates at the time). |
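
To make the volatility question concrete, here is a minimal sketch of one way it could be measured with the `date`, `admin1`, and `price` columns described above. It assumes volatility means the standard deviation of month-over-month percentage changes, which is only one possible definition. The region names and prices are made-up toy values, not rows from the dataset.

```python
import pandas as pd

# Toy data (hypothetical): three monthly rice prices in PHP/kg for two regions.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-15", "2018-02-15", "2018-03-15"] * 2),
    "admin1": ["Region A"] * 3 + ["Region B"] * 3,
    "price": [40.0, 40.0, 40.0, 40.0, 44.0, 39.6],
})

# Sort so that each region's prices are in chronological order.
toy = toy.sort_values(["admin1", "date"])

# Month-over-month percentage change, computed separately per region.
toy["pct_change"] = toy.groupby("admin1")["price"].pct_change()

# Assumed volatility measure: standard deviation of those changes per region.
volatility = toy.groupby("admin1")["pct_change"].std()
print(volatility)
```

In this toy case Region A has flat prices, so its volatility is zero, while Region B swings up and down and gets a higher value.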

In [2]:
import pandas as pd

pd.set_option('display.max_columns', 500)
food_prices_data = pd.read_csv("wfp_food_prices_phl.csv", skiprows=[1])

# Data Cleaning and Exploration

In [3]:
# Number of rows and columns
print(food_prices_data.shape)
print(len(str(food_prices_data.shape))*'-')

# Check how many types of data are in the dataset
print(food_prices_data.dtypes.value_counts())
print(len(str(food_prices_data.shape))*'-')

# Preview the first 16 rows
food_prices_data.head(16)

(121512, 14)
------------
object     10
float64     4
Name: count, dtype: int64
------------


Unnamed: 0,date,admin1,admin2,market,latitude,longitude,category,commodity,unit,priceflag,pricetype,currency,price,usdprice
0,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,cereals and tubers,Maize flour (yellow),KG,actual,Retail,PHP,15.0,0.3717
1,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,cereals and tubers,"Rice (milled, superior)",KG,actual,Retail,PHP,20.0,0.4957
2,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,cereals and tubers,"Rice (milled, superior)",KG,actual,Wholesale,PHP,18.35,0.4548
3,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,cereals and tubers,"Rice (regular, milled)",KG,actual,Wholesale,PHP,16.35,0.4052
4,2000-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,cereals and tubers,"Rice (milled, superior)",KG,actual,Retail,PHP,19.0,0.4709
5,2000-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,cereals and tubers,"Rice (milled, superior)",KG,actual,Wholesale,PHP,18.0,0.4461
6,2000-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,cereals and tubers,"Rice (regular, milled)",KG,actual,Retail,PHP,18.1,0.4486
7,2000-01-15,Region IX,Zamboanga del Sur,Zamboanga City,6.910255,122.071715,cereals and tubers,"Rice (regular, milled)",KG,actual,Retail,PHP,16.9,0.4188
8,2000-01-15,Region VI,Iloilo,Iloilo City,10.696944,122.564444,cereals and tubers,"Rice (milled, superior)",KG,actual,Retail,PHP,20.4,0.5056
9,2000-01-15,Region VI,Iloilo,Iloilo City,10.696944,122.564444,cereals and tubers,"Rice (milled, superior)",KG,actual,Wholesale,PHP,15.6,0.3866


### Check for Missing Values

In [4]:
food_prices_data.isnull().sum()

date         0
admin1       0
admin2       0
market       0
latitude     0
longitude    0
category     0
commodity    0
unit         0
priceflag    0
pricetype    0
currency     0
price        0
usdprice     0
dtype: int64

### Check Column Values

In [5]:
commodity_counts = food_prices_data["commodity"].value_counts().reset_index()
commodity_counts.columns = ["commodity", "count"]
print(commodity_counts.to_string(index=False))


                          commodity  count
             Rice (regular, milled)   4969
                            Cabbage   4459
                          Eggplants   3528
                 Rice (well milled)   3518
                    Fish (milkfish)   3389
                       Bitter melon   3375
                   Fish (roundscad)   3272
                     Rice (special)   3235
                            Coconut   3186
                       Beans (mung)   3171
            Meat (pork, with bones)   3170
                     Fish (tilapia)   3074
                  Bananas (lakatan)   2876
                           Squashes   2875
            Rice (milled, superior)   2850
                          Anchovies   2764
                              Choko   2630
                             Ginger   2523
                           Tomatoes   2507
                               Eggs   2473
                 Bananas (latundan)   2468
                  Mangoes (carabao)   2412
           

In [6]:
food_prices_data["currency"].value_counts()

currency
PHP    121512
Name: count, dtype: int64

### Filter Unnecessary Data

In [7]:
food_prices_data = food_prices_data.drop(columns=["category", "currency"])

food_prices_data.head()

Unnamed: 0,date,admin1,admin2,market,latitude,longitude,commodity,unit,priceflag,pricetype,price,usdprice
0,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,Maize flour (yellow),KG,actual,Retail,15.0,0.3717
1,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (milled, superior)",KG,actual,Retail,20.0,0.4957
2,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (milled, superior)",KG,actual,Wholesale,18.35,0.4548
3,2000-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (regular, milled)",KG,actual,Wholesale,16.35,0.4052
4,2000-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,"Rice (milled, superior)",KG,actual,Retail,19.0,0.4709


Since the analysis focuses only on rice prices, the `category` column is unnecessary: every rice row falls under a single category. The `currency` column is also dropped because all values in the `price` column are in PHP.
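
A column that holds a single value carries no information for the analysis. As a small sketch, the check below lists such constant columns before they are dropped. The toy frame and the `constant_cols` name are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame (hypothetical values) with one constant column.
toy = pd.DataFrame({
    "currency": ["PHP", "PHP", "PHP"],
    "pricetype": ["Retail", "Wholesale", "Retail"],
    "price": [39.2, 38.11, 42.13],
})

# A column is constant when it contains exactly one distinct value.
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=False) == 1]
print(constant_cols)

# Drop the constant columns; the remaining ones still vary.
trimmed = toy.drop(columns=constant_cols)
```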

### Change Data Types

Convert `price`, `latitude`, and `longitude` to numeric types, and `date` to datetime. With `errors="coerce"`, invalid entries become `NaN` (numeric columns) or `NaT` (datetime column) instead of raising an error.

In [8]:
food_prices_data["price"] = pd.to_numeric(food_prices_data["price"], errors="coerce")
food_prices_data["latitude"] = pd.to_numeric(food_prices_data["latitude"], errors="coerce")
food_prices_data["longitude"] = pd.to_numeric(food_prices_data["longitude"], errors="coerce")
food_prices_data["date"] = pd.to_datetime(food_prices_data["date"], errors="coerce")

print(food_prices_data.dtypes.value_counts())

object            7
float64           4
datetime64[ns]    1
Name: count, dtype: int64
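
To illustrate what `errors="coerce"` does, the sketch below runs the same two conversion functions on small made-up series that each contain one invalid entry. The sample strings are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical raw values: one price and one date are not parseable.
raw_prices = pd.Series(["39.2", "n/a", "42.13"])
raw_dates = pd.Series(["2018-01-15", "not a date"])

# Unparseable entries become NaN / NaT instead of raising an error.
prices = pd.to_numeric(raw_prices, errors="coerce")
dates = pd.to_datetime(raw_dates, errors="coerce")

# Count how many entries were coerced to missing values.
print(prices.isna().sum(), dates.isna().sum())
```

Counting the missing values after conversion is a quick way to see how many entries were invalid, since coercion itself is silent.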


### Filter Rice From Commodities

In [9]:
all_rice_df = food_prices_data[food_prices_data["commodity"].str.startswith("Rice", na=False)]

all_rice_df["commodity"].value_counts()



commodity
Rice (regular, milled)     4969
Rice (well milled)         3518
Rice (special)             3235
Rice (milled, superior)    2850
Rice (paddy)                664
Rice (premium)              598
Name: count, dtype: int64

Since the analysis only needs rice commodities, we keep the rows whose `commodity` starts with "Rice" and discard everything else.
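
The prefix filter can be sketched on a toy frame as follows. The `na=False` argument makes missing commodity names count as non-matches rather than propagating a missing value into the mask. Adding `.copy()` is an optional choice made here, not something the original cell does; it avoids pandas warnings if the filtered frame is modified later.

```python
import pandas as pd

# Toy frame (hypothetical) with rice, non-rice, and a missing commodity name.
toy = pd.DataFrame({
    "commodity": ["Rice (special)", "Cabbage", None, "Fish (milkfish)", "Rice (paddy)"],
})

# Boolean mask: True only where the name starts with "Rice"; missing names give False.
is_rice = toy["commodity"].str.startswith("Rice", na=False)

# Keep the matching rows as an independent copy.
rice_only = toy[is_rice].copy()
print(rice_only)
```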

In [19]:
all_rice_df = all_rice_df[(all_rice_df['date'] >= '2018-01-01') & (all_rice_df['date'] <= '2023-12-31')]

all_rice_df.shape

(10397, 12)

Only entries dated from 2018 through 2023 are kept.
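
As an equivalent alternative to chaining two comparisons, the same inclusive date window could be written with `Series.between`. This is a sketch on toy data with hypothetical prices, not a change to the cell above.

```python
import pandas as pd

# Toy frame (hypothetical): two dates inside the window and two outside.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2017-12-15", "2018-01-15", "2023-12-15", "2024-01-15"]),
    "price": [38.0, 39.2, 52.0, 53.1],
})

# between() includes both endpoints by default.
in_window = toy[toy["date"].between("2018-01-01", "2023-12-31")]
print(in_window)
```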

In [12]:
food_prices_data["admin1"].value_counts()


admin1
Region III                              10697
Region VI                                9110
Region V                                 9002
Region VIII                              8510
Region XI                                8127
Cordillera Administrative region         7688
Region IV-A                              7253
Region X                                 7226
Region IV-B                              7118
Region XIII                              6959
Region XII                               6955
Region VII                               6731
Region II                                5887
Region I                                 5860
Region IX                                5837
Autonomous region in Muslim Mindanao     5613
National Capital region                  2939
Name: count, dtype: int64

In [13]:
food_prices_data["admin2"].value_counts()


admin2
Davao del Sur        4719
Zamboanga del Sur    3416
Cebu                 3381
Iloilo               3136
South Cotabato       3066
                     ... 
Dinagat Islands       878
Sulu                  843
Tawi-Tawi             698
Lanao del Sur         644
Camarines Sur         158
Name: count, Length: 79, dtype: int64

In [14]:
food_prices_data["market"].value_counts()

market
Davao City             3272
Metro Manila           2939
Cebu City              2087
Iloilo City            1946
Zamboanga City         1880
                       ... 
Butuan City             265
Cagayan de Oro City     253
Davao Occidental        196
Naga City               158
Calapan City            146
Name: count, Length: 108, dtype: int64

In [15]:
food_prices_data["pricetype"].value_counts()

pricetype
Retail       116481
Wholesale      4367
Farm Gate       664
Name: count, dtype: int64

In [16]:
food_prices_data["unit"].value_counts()

unit
KG        118156
Unit        3254
750 ML       102
Name: count, dtype: int64

In [17]:
food_prices_data["priceflag"].value_counts()

priceflag
actual       117396
aggregate      4116
Name: count, dtype: int64

In [21]:
all_rice_df.head(30)

Unnamed: 0,date,admin1,admin2,market,latitude,longitude,commodity,unit,priceflag,pricetype,price,usdprice
16330,2018-01-15,Autonomous region in Muslim Mindanao,Maguindanao,Shariff Aguak,6.864722,124.441667,"Rice (regular, milled)",KG,actual,Retail,39.2,0.7775
16351,2018-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (milled, superior)",KG,actual,Retail,42.13,0.8356
16352,2018-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (milled, superior)",KG,actual,Wholesale,38.11,0.7559
16353,2018-01-15,National Capital region,Metropolitan Manila,Metro Manila,14.604167,120.982222,"Rice (regular, milled)",KG,actual,Wholesale,38.02,0.7541
16362,2018-01-15,Region I,Pangasinan,Lingayen,16.016667,120.233333,"Rice (regular, milled)",KG,actual,Retail,36.76,0.7291
16363,2018-01-15,Region I,Pangasinan,Lingayen,16.016667,120.233333,"Rice (regular, milled)",KG,actual,Wholesale,35.0,0.6942
16374,2018-01-15,Region II,Cagayan,Tuguegarao City,17.6125,121.7281,"Rice (regular, milled)",KG,actual,Wholesale,36.4,0.722
16384,2018-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,"Rice (milled, superior)",KG,actual,Retail,42.9,0.8509
16385,2018-01-15,Region III,Nueva Ecija,Palayan,15.5415,121.0848,"Rice (milled, superior)",KG,actual,Wholesale,39.29,0.7793
16401,2018-01-15,Region IV-B,Palawan,Puerto Princesa,9.739167,118.735278,"Rice (regular, milled)",KG,actual,Retail,32.97,0.6539
