# Data Exploration: Somalia Agrifood Datasets

This notebook explores the raw datasets located in `data/raw/` to understand their structure and content.

In [1]:
import pandas as pd

# Set display options for better visibility
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

## 1. WFP Food Prices for Somalia

File: `wfp_food_prices_som.csv`

In [2]:
# Path to the dataset
fp_path = '../data/raw/wfp_food_prices_som.csv'

# Loading the data (skipping the second row which contains HXL tags)
df_prices = pd.read_csv(fp_path, skiprows=[1])

print(f"Dataset shape: {df_prices.shape}")
df_prices.info()
df_prices.head()

Dataset shape: (37538, 16)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37538 entries, 0 to 37537
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   date          37538 non-null  object 
 1   admin1        37247 non-null  object 
 2   admin2        37247 non-null  object 
 3   market        37538 non-null  object 
 4   market_id     37538 non-null  int64  
 5   latitude      37247 non-null  float64
 6   longitude     37247 non-null  float64
 7   category      37538 non-null  object 
 8   commodity     37538 non-null  object 
 9   commodity_id  37538 non-null  int64  
 10  unit          37538 non-null  object 
 11  priceflag     37538 non-null  object 
 12  pricetype     37538 non-null  object 
 13  currency      37538 non-null  object 
 14  price         37538 non-null  float64
 15  usdprice      32763 non-null  float64
dtypes: float64(4), int64(2), object(10)
memory usage: 4.6+ MB


Unnamed: 0,date,admin1,admin2,market,market_id,latitude,longitude,category,commodity,commodity_id,unit,priceflag,pricetype,currency,price,usdprice
0,1995-01-15,Banadir,Banadir,Bakaara,6634,2.05,45.32,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,700.0,0.27
1,1995-02-15,Banadir,Banadir,Bakaara,6634,2.05,45.32,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,525.0,0.2
2,1995-03-15,Banadir,Banadir,Bakaara,6634,2.05,45.32,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,600.0,0.23
3,1995-04-15,Banadir,Banadir,Bakaara,6634,2.05,45.32,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,900.0,0.34
4,1995-05-15,Banadir,Banadir,Bakaara,6634,2.05,45.32,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,1025.0,0.39
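
The `info()` output above implies that `admin1`, `admin2`, `latitude`, and `longitude` are each missing in 291 rows (37538 − 37247) and `usdprice` in 4775 rows (37538 − 32763). A small helper can make such gaps explicit. The sketch below is illustrative only: `missing_summary` is a hypothetical helper name, and the toy frame stands in for `df_prices` with made-up values.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, largest gap first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for df_prices (illustrative values, not real data)
toy_prices = pd.DataFrame({
    "market": ["A", "B", "C", "D"],
    "admin1": ["X", None, "Y", "Z"],
    "usdprice": [0.27, None, None, 0.34],
})

summary = missing_summary(toy_prices)
print(summary)
```

In the notebook, the same helper could be called as `missing_summary(df_prices)`.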


## 2. Suite of Food Security Indicators

File: `suite-of-food-security-indicators_som.csv`

### Column Explanations:
- **Iso3**: Country ISO code (SOM).
- **StartDate / EndDate**: The date range for the indicator measurement.
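- **Area Code / Area Code (M49)**: Numeric codes for the geographic area under the FAO and UN M49 standards. The M49 code is stored as text with a leading apostrophe (`'706`), which is why pandas reads the column as `object`.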
- **Area**: Name of the country (Somalia).
- **Item Code / Item**: Unique ID and name of the food security indicator (e.g., "Prevalence of undernourishment").
- **Element Code / Element**: Usually "Value", indicating the type of measurement reported.
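- **Year Code / Year**: The reference period. `Year Code` holds either a single year (e.g., `2018`) or the first and last year of a multi-year window joined together (e.g., `20002002`); `Year` is the final year of that period.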
- **Unit**: Unit of measurement (%, kcal/capita, etc.).
- **Value**: The actual data point. (Note: May be string type to handle ranges like '<2.5').
- **Flag**: Data status (e.g., 'E' for Estimate).
- **Note**: Additional context or source notes.
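
Because `Value` may hold strings such as `'<2.5'`, it cannot be used for arithmetic directly. One possible approach, sketched below, is to split it into a numeric column and a qualifier column. The helper name `parse_value` and the output column names `value_num` and `value_qualifier` are assumptions made for illustration, and the toy series is not real data.

```python
import pandas as pd


def parse_value(series: pd.Series) -> pd.DataFrame:
    """Split raw values like '72' or '<2.5' into a number and a '<'/'>' qualifier."""
    # Convert non-missing entries to stripped strings; keep missing entries missing
    text = series.where(series.isna(), series.astype(str)).str.strip()
    # Capture a leading '<' or '>' if present
    qualifier = text.str.extract(r"^([<>])", expand=False)
    # Drop the qualifier and coerce the rest to float (unparseable -> NaN)
    number = pd.to_numeric(text.str.lstrip("<>"), errors="coerce")
    return pd.DataFrame({"value_num": number, "value_qualifier": qualifier})


# Toy input mirroring the kinds of entries described above
raw = pd.Series(["72", "<2.5", None, "1723"])
parsed = parse_value(raw)
print(parsed)
```

Note that `value_num` for `'<2.5'` is the bound itself (2.5), not the true value, so rows with a qualifier may need separate handling in any aggregate.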

In [2]:
fs_path = '../data/raw/suite-of-food-security-indicators_som.csv'

# Loading the data (skipping the second row which contains HXL tags)
df_security = pd.read_csv(fs_path, skiprows=[1])

print(f"Dataset shape: {df_security.shape}")
df_security.info()
df_security.head()

Dataset shape: (973, 16)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 973 entries, 0 to 972
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Iso3             973 non-null    object
 1   StartDate        973 non-null    object
 2   EndDate          973 non-null    object
 3   Area Code        973 non-null    int64 
 4   Area Code (M49)  973 non-null    object
 5   Area             973 non-null    object
 6   Item Code        973 non-null    object
 7   Item             973 non-null    object
 8   Element Code     973 non-null    int64 
 9   Element          973 non-null    object
 10  Year Code        973 non-null    int64 
 11  Year             973 non-null    int64 
 12  Unit             951 non-null    object
 13  Value            685 non-null    object
 14  Flag             973 non-null    object
 15  Note             36 non-null     object
dtypes: int64(4), object(12)
memory usage: 121.8+ KB


Unnamed: 0,Iso3,StartDate,EndDate,Area Code,Area Code (M49),Area,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag,Note
0,SOM,2000-01-01,2002-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20002002,2002,%,72,E,
1,SOM,2001-01-01,2003-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20012003,2003,%,72,E,
2,SOM,2002-01-01,2004-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20022004,2004,%,72,E,
3,SOM,2003-01-01,2005-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20032005,2005,%,72,E,
4,SOM,2004-01-01,2006-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20042006,2006,%,72,E,
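
The sample shows `Area Code (M49)` values with a leading apostrophe (`'706`). A minimal sketch for normalising them follows. It keeps the code as a zero-padded three-character string, since M49 codes are three digits. The helper `clean_m49` and the new column name `m49` are hypothetical.

```python
import pandas as pd


def clean_m49(series: pd.Series) -> pd.Series:
    """Strip a leading apostrophe and zero-pad M49 codes to three characters."""
    return series.astype(str).str.lstrip("'").str.zfill(3)


# Toy frame standing in for df_security
toy_security = pd.DataFrame({"Area Code (M49)": ["'706", "'706", "706"]})
toy_security["m49"] = clean_m49(toy_security["Area Code (M49)"])
print(toy_security)
```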


### Unique Indicators (Item Codes)
Each `Item Code` corresponds to a specific food security or nutrition metric. Run the cell below to see the full list of indicators available in this dataset:

In [5]:
# Display unique Item names without showing the index/ID
# (pandas is already imported as pd in the first cell)
pd.set_option('display.max_colwidth', None)

indicators = df_security['Item'].drop_duplicates().to_list()
for item in indicators:
    print(item)

Average dietary energy supply adequacy (percent) (3-year average)
Dietary energy supply used in the estimation of the prevalence of undernourishment (kcal/cap/day)
Dietary energy supply used in the estimation of the prevalence of undernourishment (kcal/cap/day) (3-year average)
Share of dietary energy supply derived from cereals, roots and tubers (percent) (3-year average)
Average protein supply (g/cap/day) (3-year average)
Average supply of protein of animal origin (g/cap/day) (3-year average)
Gross domestic product per capita, PPP, (constant 2021 international $)
Prevalence of undernourishment (percent) (3-year average)
Number of people undernourished (million) (3-year average)
Prevalence of severe food insecurity in the total population (percent) (3-year average)
Prevalence of severe food insecurity in the male adult population (percent) (3-year average)
Prevalence of severe food insecurity in the female adult population (percent) (3-year average)
Prevalence of moderate or severe fo

## 3. Somalia Admin1 Crop Production

File: `somalia-admin1-crop-production.csv`

In [3]:
cp_path = '../data/raw/somalia-admin1-crop-production.csv'

# Loading the data (skipping the second row which contains HXL tags)
df_crop = pd.read_csv(cp_path, skiprows=[1])

print(f"Dataset shape: {df_crop.shape}")
df_crop.info()
df_crop.head()

Dataset shape: (1256, 7)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1256 entries, 0 to 1255
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   country_name  1256 non-null   object 
 1   admin1_name   1256 non-null   object 
 2   latitude      1256 non-null   float64
 3   longitude     1256 non-null   float64
 4   aggregation   1256 non-null   object 
 5   indicator     1256 non-null   object 
 6   value         1256 non-null   float64
dtypes: float64(3), object(4)
memory usage: 68.8+ KB


Unnamed: 0,country_name,admin1_name,latitude,longitude,aggregation,indicator,value
0,Somalia,Middle Juba,1.75,41.75,none,crop-production.mai.noirr.USD,270258.0
1,Somalia,Middle Juba,1.75,42.25,none,crop-production.mai.noirr.USD,206423.0
2,Somalia,Middle Juba,1.25,42.75,none,crop-production.mai.noirr.USD,88035.0
3,Somalia,Middle Juba,0.75,42.75,none,crop-production.mai.noirr.USD,0.0
4,Somalia,Middle Juba,0.75,43.25,none,crop-production.mai.noirr.USD,53175.0


### Unique Indicators (Crop Production)
Run the cell below to see the unique indicators available in the crop production dataset:

In [8]:
# Get unique indicator values
unique_indicators = df_crop['indicator'].unique()
for indicator in sorted(unique_indicators):
    print(indicator)

crop-production.mai.firr.USD
crop-production.mai.noirr.USD
crop-production.ric.firr.USD
crop-production.ric.noirr.USD
crop-production.soy.firr.USD
crop-production.soy.noirr.USD
crop-production.whe.firr.USD
crop-production.whe.noirr.USD
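
Each indicator string listed above has four dot-separated parts, so it can be split into separate columns for grouping. The sketch below assumes every value has exactly four parts. The column names `measure`, `crop_code`, `irrigation_code`, and `unit` are labels chosen here for illustration, not names defined by the dataset.

```python
import pandas as pd


def split_indicator(series: pd.Series) -> pd.DataFrame:
    """Split 'a.b.c.d' indicator strings into four labelled columns."""
    parts = series.str.split(".", expand=True, regex=False)
    # Assumed labels; the dataset itself does not name these components
    parts.columns = ["measure", "crop_code", "irrigation_code", "unit"]
    return parts


# The eight indicator strings printed above
indicators = pd.Series([
    "crop-production.mai.firr.USD",
    "crop-production.mai.noirr.USD",
    "crop-production.ric.firr.USD",
    "crop-production.ric.noirr.USD",
    "crop-production.soy.firr.USD",
    "crop-production.soy.noirr.USD",
    "crop-production.whe.firr.USD",
    "crop-production.whe.noirr.USD",
])

parts = split_indicator(indicators)
print(parts.drop_duplicates())
```

In the notebook, `df_crop.join(split_indicator(df_crop['indicator']))` would attach these columns to the crop data.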


## 4. Data Filter: Year 2018

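This section filters the Food Prices and Food Security datasets to the period **01 January 2018 to 31 December 2018**. Price rows are kept when their date falls inside that range; food security rows are kept when their reporting period overlaps it, so multi-year averages that include 2018 are retained.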

In [6]:
# 1. Filter Food Prices for 2018
df_prices['date'] = pd.to_datetime(df_prices['date'])
df_prices_2018 = df_prices[df_prices['date'].between('2018-01-01', '2018-12-31')]

print(f"Food Prices (2018) shape: {df_prices_2018.shape}")

# 2. Filter Food Security Indicators for 2018
# Indicators that overlap with 2018 (either StartDate or EndDate falls within 2018, or the period covers 2018)
df_security['StartDate'] = pd.to_datetime(df_security['StartDate'])
df_security['EndDate'] = pd.to_datetime(df_security['EndDate'])

df_security_2018 = df_security[
    (df_security['StartDate'] <= '2018-12-31') & 
    (df_security['EndDate'] >= '2018-01-01')
]

print(f"Food Security Indicators (overlapping with 2018) shape: {df_security_2018.shape}")

Food Prices (2018) shape: (2086, 16)
Food Security Indicators (overlapping with 2018) shape: (161, 16)
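
The food security filter uses the standard interval-overlap test: two periods overlap when each one starts no later than the other ends. A reusable version is sketched below. The function name `overlaps_period` and its parameters are hypothetical, and the toy frame only illustrates the logic.

```python
import pandas as pd


def overlaps_period(df, start, end, start_col="StartDate", end_col="EndDate"):
    """Return rows whose [start_col, end_col] interval overlaps [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return df[(df[start_col] <= end) & (df[end_col] >= start)]


# Toy periods: before 2018, ending in 2018, exactly 2018, after 2018
toy_periods = pd.DataFrame({
    "StartDate": pd.to_datetime(["2015-01-01", "2016-01-01", "2018-01-01", "2019-01-01"]),
    "EndDate": pd.to_datetime(["2017-12-31", "2018-12-31", "2018-12-31", "2021-12-31"]),
})

hits = overlaps_period(toy_periods, "2018-01-01", "2018-12-31")
print(hits)
```

Only the second and third toy rows are returned, because the first period ends before 2018 begins and the fourth starts after 2018 ends.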


In [7]:
print("--- Food Prices 2018 Sample ---")
display(df_prices_2018.head())

print("\n--- Food Security 2018 Sample ---")
display(df_security_2018.head())


--- Food Prices 2018 Sample ---


Unnamed: 0,date,admin1,admin2,market,market_id,latitude,longitude,category,commodity,commodity_id,unit,priceflag,pricetype,currency,price,usdprice
10896,2018-01-15,Awdal,Borama,Borama,1413,9.94,43.18,cereals and tubers,Maize (white),67,KG,actual,Retail,SOS,7200.0,12.52
10897,2018-01-15,Awdal,Borama,Borama,1413,9.94,43.18,cereals and tubers,Pasta,112,KG,actual,Retail,SOS,8000.0,13.91
10898,2018-01-15,Awdal,Borama,Borama,1413,9.94,43.18,cereals and tubers,Rice (imported),64,KG,actual,Retail,SOS,5600.0,9.74
10899,2018-01-15,Awdal,Borama,Borama,1413,9.94,43.18,cereals and tubers,Sorghum (red),282,KG,actual,Retail,SOS,5000.0,8.7
10900,2018-01-15,Awdal,Borama,Borama,1413,9.94,43.18,cereals and tubers,Sorghum (white),135,KG,actual,Retail,SOS,5000.0,8.7



--- Food Security 2018 Sample ---


Unnamed: 0,Iso3,StartDate,EndDate,Area Code,Area Code (M49),Area,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag,Note
16,SOM,2016-01-01,2018-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20162018,2018,%,78,E,
17,SOM,2017-01-01,2019-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20172019,2019,%,80,E,
18,SOM,2018-01-01,2020-12-31,201,'706,Somalia,21010,Average dietary energy supply adequacy (percen...,6121,Value,20182020,2020,%,81,E,
40,SOM,2018-01-01,2018-12-31,201,'706,Somalia,220001,Dietary energy supply used in the estimation o...,6128,Value,2018,2018,kcal/cap/d,1723,E,
62,SOM,2016-01-01,2018-12-31,201,'706,Somalia,22000,Dietary energy supply used in the estimation o...,6128,Value,20162018,2018,kcal/cap/d,1709,E,
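
The food security sample above mixes three-year windows (for example 2016 to 2018) with single-year rows (2018 only). If these need to be treated separately, the period length can be derived from the two date columns. This is a sketch under the assumption that periods always run from 1 January to 31 December, as in the sample; the column name `period_years` is hypothetical.

```python
import pandas as pd

# Toy rows mirroring the StartDate/EndDate pairs in the sample above
toy_2018 = pd.DataFrame({
    "StartDate": pd.to_datetime(["2016-01-01", "2017-01-01", "2018-01-01", "2018-01-01"]),
    "EndDate": pd.to_datetime(["2018-12-31", "2019-12-31", "2020-12-31", "2018-12-31"]),
})

# Number of calendar years covered by each reporting period
toy_2018["period_years"] = (
    toy_2018["EndDate"].dt.year - toy_2018["StartDate"].dt.year + 1
)

single_year = toy_2018[toy_2018["period_years"] == 1]
multi_year = toy_2018[toy_2018["period_years"] > 1]
print(toy_2018)
```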
