## FDW Crop Production Data Profiling - Malawi

In [1]:
import os, sys, glob, json
from itertools import product, compress, chain
from functools import reduce
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import requests
import numpy as np
import pandas as pd
import geopandas as gpd
from tools import save_hdf, PrintAdminUnits, PlotAdminShapes
from tools import FDW_PD_Sweeper, FDW_PD_AvalTable, FDW_PD_Compiling, FDW_PD_GrainTypeAgg, FDW_PD_ValidateFnidName
from tools import FDW_PD_CreateAdminLink, FDW_PD_RatioAdminLink, FDW_PD_ConnectAdminLink
from tools_graphic import PlotBarProduction, PlotLinePAY, PlotHeatCropSystem, PlotHeatSeasonData
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)
pd.options.mode.chained_assignment = None

In [2]:
# CPCV2 grain code ------------------------------ #
grain_code = pd.read_hdf('./data/crop/grain_cpcv2_code.hdf')
product_category = grain_code[['product', 'product_category']].set_index('product').to_dict()['product_category']
# ----------------------------------------------- #

# Load FEWS NET administrative boundaries ------- #
epsg = 'EPSG:32736'
fn_shapes = sorted(glob.glob('./data/shapefile/fewsnet/MW_Admin?_????.shp'))
shape_all = []
for fn in fn_shapes:
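    name = fn[-18:-4]  # e.g. 'MW_Admin2_2003'
    shape = gpd.read_file(fn).to_crs(epsg)
    shape['area'] = shape['geometry'].area / 10**6  # m^2 -> km^2 (projected CRS is in metres)
    globals()[name] = shape  # expose as MW_Admin2_1971, MW_Admin2_2003, ... for later cells
    shape_all.append(shape)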
shape_all = pd.concat(shape_all, axis=0).reset_index(drop=True)
PrintAdminUnits(shape_all)
# ----------------------------------------------- #

# FDW API host address -------------------------- #
host = 'https://fdw.fews.net'
with open('token.json', 'r') as f:  # close the file handle after reading
    auth = tuple(json.load(f))
parameters = {
    'format': 'json',
    'country': 'Malawi',
    'product': 'R011',
    'survey_type': 'crop:best'
}
endpoint = '/api/cropproductionindicatorvalue/'
response = requests.get(host + endpoint, auth=auth, params=parameters, proxies={})
response.raise_for_status()
df = pd.DataFrame.from_records(response.json())
df_origin = df.copy()
# ----------------------------------------------- #

# Manual Pre-processing before Sweeping --------- #
# 1. Default setting 
# a) None-type population group
df.loc[df['population_group'].isna(), 'population_group'] = 'none'
df.loc[df['population_group'] == '', 'population_group'] = 'none'
# ----------------------------------------------- #

# FDW Production Data Inspection ---------------- #
df, df_raw = FDW_PD_Sweeper(df)
table_dict = FDW_PD_AvalTable(df, shape_all)
# ----------------------------------------------- #

# FEWS NET Shapefile comparison ----------------- #
shape_used = pd.concat([MW_Admin2_1971, MW_Admin2_1998, MW_Admin2_2003], axis=0)
PlotAdminShapes(shape_used, label=True)
# ----------------------------------------------- #

- FEWS NET admin shapefiles ------------------- #
        Admin1  # units    Admin2  # units
year                                      
1971  MW1971A1        3  MW1971A2       24
1998  MW1998A1        3  MW1998A2       27
2003  MW2003A1        3  MW2003A2       28
2018  MW2018A1        3  MW2018A2       32
----------------------------------------------- #
- Remove missing records ---------------------- #
Orignial data points: 10,797
Removed 360 "Missing Historic Data" points
3,599/3,599 "Area Planted" points are retained.
3,419/3,599 "Quantity Produced" points are retained.
3,419/3,599 "Yield" points are retained.
Current data points: 10,437

- Minor changes are applied.. ----------------- #

- Basic information --------------------------- #
Data period: 1982 - 2017
7 grain types are found: Maize Grain (White), Millet, Millet (Finger), Millet (Pearl), Rice (Paddy), Sorghum, Wheat Grain
2 seasons are found: Main (04-01), Winter (09-01)
3 crop production system are found: none, Estate (P


- Malawi crop seasonal calendar

![FEWS NET](https://fews.net/sites/default/files/styles/large/public/seasonal-calendar-malawi.png?itok=O9LYAi2g)

- FDW data are reported on the `MW1971A2`, `MW1998A2`, and `MW2003A2` boundaries.

| Year | Admin-1 | # units  | Admin-2  | # units |
| :---: | :----:  | :----:   | :----:   | :---:  |
| 1971 | MW1971A1 | 3 | **`MW1971A2`** | 24 |
| 1998 | MW1998A1 | 3 | **`MW1998A2`** | 27 |
| 2003 | MW2003A1 | 3 | **`MW2003A2`** | 28 |
| 2018 | MW2018A1 | 3 | MW2018A2 | 32 |

- Comparison between admin boundaries

![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_admin_shapes.png?raw=true)

- In 1998, three districts were each split in two, adding three new districts (24 to 27 Admin-2 units).

| 1971-1998 (original) |1998-2003 (changed)|1998-2003 (added)|
| :---:| :---:|:---:|
|Nkhata Bay (MW1971A20107)    | Nkhata Bay (MW1998A20103)    | Likoma (MW1998A20106)|
|Mulanje (MW1971A20315)       | Mulanje (MW1998A20308)       | Phalombe (MW1998A20309)|
|Machinga (MW1971A20316)      | Machinga (MW1998A20302)      | Balaka (MW1998A20312)|

 ** Nkhata Bay (MW1971A20107) was split into a very large district (MW1998A20103, Nkhata Bay) and a very small island district (MW1998A20106, Likoma).

- In 2003, one district was split in two, adding one new district (27 to 28 Admin-2 units).

| 1998-2003 (original) |2003-current (changed)|2003-current (added)|
| :---:| :---:|:---:|
|Mwanza (MW1998A20314)    | Mwanza (MW2003A20306)    | Neno (MW2003A20313)|
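
Values reported for a district before a split have to be distributed over the districts that replaced it. The linking cell later asserts the method `'ABR'`, assumed here to mean an area-based ratio. The following is a minimal sketch of that idea, assuming the share is simply the overlap area divided by the total overlap. The areas and the production figure are invented for illustration, and `allocate_by_area` is a hypothetical helper, not a function from `tools`.

```python
# Hypothetical overlap areas (km^2) between the old Nkhata Bay (MW1971A20107)
# and the two districts that replaced it. Values are illustrative only.
overlap_km2 = {
    "MW1998A20103": 4000.0,  # Nkhata Bay (after the split)
    "MW1998A20106": 20.0,    # Likoma
}

def allocate_by_area(value, overlap):
    """Split `value` across new units in proportion to their overlap area."""
    total = sum(overlap.values())
    if total <= 0:
        raise ValueError("overlap areas must sum to a positive number")
    return {fnid: value * area / total for fnid, area in overlap.items()}

# Hypothetical production (tonnes) reported for the old unit.
shares = allocate_by_area(10_050.0, overlap_km2)
for fnid, tonnes in shares.items():
    print(fnid, round(tonnes, 1))
```

With a split as uneven as this one, nearly all of the old value stays with the larger district, which is why a crop-specific ratio may matter more than the geometric one for small units.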

- Four reporting units exist:

| Admin-2 | Reporting units |
| :---: | :---: |
| MW2003A20105 (Mzimba)|  MW2012R3010501 (Mzimba North RDP), MW2012R3010502 (Mzimba South RDP) |
| MW2003A20206 (Lilongwe)|  MW2014R3020601 (Lilongwe West RDP), MW2014R3020602 (Lilongwe East RDP) |
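
Assuming each pair of RDPs together covers its parent district, their values can be added with a weight of 1.0 to obtain the district total. A minimal sketch of that aggregation follows. The tonnages are invented, and the `PARENT` mapping is a hypothetical layout built from the table above.

```python
from collections import defaultdict

# Reporting unit -> parent Admin-2 district, taken from the table above.
PARENT = {
    "MW2012R3010501": "MW2003A20105",  # Mzimba North RDP
    "MW2012R3010502": "MW2003A20105",  # Mzimba South RDP
    "MW2014R3020601": "MW2003A20206",  # Lilongwe West RDP
    "MW2014R3020602": "MW2003A20206",  # Lilongwe East RDP
}

# Hypothetical production (tonnes) reported per reporting unit for one year.
reported = {
    "MW2012R3010501": 120.0,
    "MW2012R3010502": 180.0,
    "MW2014R3020601": 250.0,
    "MW2014R3020602": 150.0,
}

district_total = defaultdict(float)
for unit, tonnes in reported.items():
    district_total[PARENT[unit]] += tonnes  # each unit contributes in full

print(dict(district_total))
```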

- **`MW2003A2`** is used to represent current admin-level 2 crop data.
- Malawi has two crop seasons: `Main` and `Winter` (from 2006).
- Malawi has three crop production systems: `none`, `Estate (PS)`, and `estate`.

In [3]:
# Define the latest shapefile ------------------- #
latest_level = 2
shape_latest = MW_Admin2_2003.copy().to_crs('epsg:4326')
# ----------------------------------------------- #

# Validation of FNIDs and Names ----------------- #
df = FDW_PD_ValidateFnidName(df, shape_used, shape_latest)
# ----------------------------------------------- #

# FDW Production Data Compiling ----------------- #
area, prod = FDW_PD_Compiling(df, shape_used)
area_all, prod_all = area.copy(), prod.copy()
mdx_pss = area.columns.droplevel([0,1]).unique()
# ----------------------------------------------- #

In [4]:
# Link admin boundaries ------------------------- #
link_1971, over_1971 = FDW_PD_CreateAdminLink(MW_Admin2_1971, MW_Admin2_2003, 'ADMIN2', 'ADMIN2', area, epsg)
assert all(np.unique([v['method'] for k,v in link_1971.items()]) == 'ABR')
link_1998, over_1998 = FDW_PD_CreateAdminLink(MW_Admin2_1998, MW_Admin2_2003, 'ADMIN2', 'ADMIN2', area, epsg)
assert all(np.unique([v['method'] for k,v in link_1998.items()]) == 'ABR')
# Crop specific ratios
link_ratio_1971 = FDW_PD_RatioAdminLink(link_1971, area, over_1971, mdx_pss)
link_ratio_1998 = FDW_PD_RatioAdminLink(link_1998, area, over_1998, mdx_pss)
# Merge link_ratio
assert link_ratio_1971.keys() == link_ratio_1998.keys()
link_merged = [link_ratio_1971, link_ratio_1998]
fnids_new = list(link_merged[0].keys())
link_ratio = dict()
for fnid in fnids_new:
    container = []
    for link in link_merged:
        container.append(link[fnid])
    link_ratio[fnid] = pd.concat(container, axis=1)
# Add current unit to link_ratio
for fnid_new in link_ratio.keys():
    link_ratio[fnid_new][fnid_new] = 1.0
    link_ratio[fnid_new] = link_ratio[fnid_new].sort_index(axis=1, ascending=False)
# Manual Editing
link_ratio['MW2003A20105'][['MW2012R3010501', 'MW2012R3010502']] = 1.0
link_ratio['MW2003A20206'][['MW2014R3020601', 'MW2014R3020602']] = 1.0
# Connect data with AdminLink
area_new, prod_new = FDW_PD_ConnectAdminLink(link_ratio, area, prod, validation=True)
# ----------------------------------------------- #

# Aggregate grain data by grain type ------------ #
[area_new, prod_new, area_all, prod_all] = FDW_PD_GrainTypeAgg([area_new, prod_new, area_all, prod_all], product_category)
# ----------------------------------------------- #

# Manual correction ----------------------------- #
crop_new = prod_new/area_new
# Potential typo: 567997.0 -> 156799
area_new.loc[2016,pd.IndexSlice['MW2003A20206',:,'Maize','Main',:]] = 156799
# Potential typo: 272.0 -> 2720
prod_new.loc[2016,pd.IndexSlice['MW2003A20311',:,'Maize','Main',:]] = 2720
# Potential typo: 96271.0 -> 46271.0
area_new.loc[2004,pd.IndexSlice['MW2003A20307',:,'Maize','Main',:]] = 46271
# ----------------------------------------------- #

# Complete long format DataFrame
df_area = area_new.T.stack().reset_index().rename({0:'value'},axis=1)
df_area['indicator'] = 'area'
df_prod = prod_new.T.stack().reset_index().rename({0:'value'},axis=1)
df_prod['indicator'] = 'production'
df_yield = (prod_new/area_new).T.stack().reset_index().rename({0:'value'},axis=1)
df_yield['indicator'] = 'yield'
stack = pd.concat([df_area, df_prod, df_yield], axis=0)
# Insert a country name
stack['country'] = 'Malawi'
stack = stack[['fnid','country','name','product','year','season_name','season_date','indicator','value']]
stack = stack.reset_index(drop=True)
# Change season_date to harvest_end
stack.rename(columns={'season_date':'harvest_end'},inplace=True)
stack['harvest_end'] = stack['harvest_end'].replace({
    '04-01':'07-01', # Main
    '09-01':'11-01'  # Winter
})

# Save data
save_hdf('./data/crop/adm_crop_production_raw_MW.hdf', df)
save_hdf('./data/crop/adm_crop_production_MW.hdf', stack)

- Aggregation of grain types ------------------ #
7 crops: Maize Grain (White), Millet, Millet (Finger), Millet (Pearl), Rice (Paddy), Sorghum, Wheat Grain
5 crops: Maize, Millet, Rice, Sorghum, Wheat

./data/crop/adm_crop_production_raw_MW.hdf is saved.
./data/crop/adm_crop_production_MW.hdf is saved.
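
The three manual corrections in the cell above target values that look like data-entry errors. One way such candidates could be surfaced is to compare each year's yield with the median of the series. This is a minimal sketch under that assumption, not the procedure actually used to find them. The threshold is arbitrary, and `flag_yield_outliers` is a hypothetical helper. Apart from the 567997.0 area value quoted in the cell above, the series is invented.

```python
from statistics import median

def flag_yield_outliers(area, prod, factor=3.0):
    """Return years whose yield is more than `factor` times above or below the median."""
    yields = {y: prod[y] / area[y] for y in area if y in prod and area[y] > 0}
    med = median(yields.values())
    return sorted(y for y, v in yields.items() if v > med * factor or v < med / factor)

# Hypothetical area (ha) and production (t) series for one district.
area = {2013: 150000.0, 2014: 152000.0, 2015: 155000.0, 2016: 567997.0, 2017: 158000.0}
prod = {2013: 300000.0, 2014: 310000.0, 2015: 305000.0, 2016: 315000.0, 2017: 320000.0}

suspects = flag_yield_outliers(area, prod)
print(suspects)
```

A flagged year is only a candidate for review. Whether the area or the production figure is wrong still has to be judged by hand.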


## Visualization of production data

In [5]:
# Bar chart of national grain production
country_iso, country_name = 'MW', 'Malawi'
df = pd.read_hdf('./data/crop/adm_crop_production_%s.hdf' % country_iso)
product_order = ['Maize','Millet','Sorghum','Wheat','Rice']
for season_name in ['Main','Winter']:
    footnote = 'National grain production in %s - %s' % (country_name, season_name)
    fn_save = './figures/%s_bar_natgrainprod_%s.png' % (country_iso, season_name)
    sub = df[df['season_name'] == season_name]
    fig = PlotBarProduction(sub, product_order, footnote, fn_save)
    # fig.show()

./figures/MW_bar_natgrainprod_Main.png is saved.
./figures/MW_bar_natgrainprod_Winter.png is saved.


![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_bar_natgrainprod_Main.png?raw=true)
![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_bar_natgrainprod_Winter.png?raw=true)

In [6]:
# Lineplot of Production-Area-Yield (PAY) time-series
country_iso, country_name = 'MW', 'Malawi'
df = pd.read_hdf('./data/crop/adm_crop_production_%s.hdf' % country_iso)
product_season = [
    ['Maize','Main']
]
for product_name, season_name in product_season:
    footnote = 'Production-Area-Yield (PAY) time-series of %s - %s - %s' % (country_iso, product_name, season_name)
    fn_save = './figures/%s_line_pay_%s_%s.png' % (country_iso, product_name, season_name)
    sub = df[(df['product'] == product_name) & (df['season_name'] == season_name)]
    fig = PlotLinePAY(sub, footnote, fn_save)
    # fig.show()

./figures/MW_line_pay_Maize_Main.png is saved.


![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_line_pay_Maize_Main.png?raw=true)

In [7]:
# Heatmap of seasonal data availability
country_iso, country_name = 'MW', 'Malawi'
df = pd.read_hdf('./data/crop/adm_crop_production_raw_%s.hdf' % country_iso)
code = {'Main':1,'Winter':10}
comb = {1:1,10:2,11:3,12:3,20:3,21:3,22:3}
comb_name = {1:'Main',2:'Winter',3:'Main + Winter'}
for product_name in ['Maize Grain (White)']:
    data = df[(df['product'] == product_name) & (df['season_name'].isin(code.keys()))]
    footnote = 'Seasonal data availability in %s - %s (uncorrected)' % (country_name, product_name)
    fn_save = './figures/%s_heat_seasondata_%s.png' % (country_iso, product_name)
    fig = PlotHeatSeasonData(data, code, comb, comb_name, footnote, fn_save)
    # fig.show()

./figures/MW_heat_seasondata_Maize Grain (White).png is saved.


![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_heat_seasondata_Maize%20Grain%20(White).png?raw=true)
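
The `code` and `comb` dictionaries in the cell above suggest an additive encoding: each record contributes 1 for `Main` or 10 for `Winter`, and the sum per district-year is mapped to a category. The following minimal sketch shows that idea, assuming this is roughly how `PlotHeatSeasonData` uses them (its internals are not shown here). The records are invented.

```python
code = {'Main': 1, 'Winter': 10}
comb = {1: 1, 10: 2, 11: 3, 12: 3, 20: 3, 21: 3, 22: 3}
comb_name = {1: 'Main', 2: 'Winter', 3: 'Main + Winter'}

# Hypothetical (fnid, year, season_name) records.
records = [
    ('MW2003A20101', 2005, 'Main'),
    ('MW2003A20101', 2006, 'Main'),
    ('MW2003A20101', 2006, 'Winter'),
    ('MW2003A20102', 2006, 'Winter'),
]

# Sum the season codes per district-year.
totals = {}
for fnid, year, season in records:
    totals[(fnid, year)] = totals.get((fnid, year), 0) + code[season]

# Translate each sum into a readable category label.
labels = {key: comb_name[comb[total]] for key, total in totals.items()}
for key, label in sorted(labels.items()):
    print(key, label)
```

The extra keys in `comb` (12, 20, 21, 22) presumably cover cases where a season is counted more than once for the same district-year. They are not exercised in this sketch.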

In [8]:
# Heatmap of crop production system(s) per crop type
country_iso, country_name = 'MW', 'Malawi'
df = pd.read_hdf('./data/crop/adm_crop_production_raw_%s.hdf' % country_iso)
product_season = [
    ['Maize Grain (White)', 'Main'],
    ['Maize Grain (White)', 'Winter'],
]
code = {'none':1,'Estate (PS)':10,'estate':100}
comb = {1:1,10:2,11:3,100:4,101:5,110:6,111:7}
comb_name = {1:'None',2:'Estate (PS)',3:'Estate (PS) + None',4:'estate',5:'estate + None',6:'estate + Estate (PS)',7:'All'}
for product_name, season_name in product_season:
    data = df[(df['product'] == product_name) & (df['season_name'] == season_name)]
    footnote = 'Reported crop production system(s) in %s - %s - %s (uncorrected)' % (country_name, product_name, season_name)
    fn_save = './figures/%s_heat_cropsystem_%s_%s.png' % (country_iso, product_name, season_name)
    fig = PlotHeatCropSystem(data, code, comb, comb_name, footnote, fn_save)
    # fig.show()

./figures/MW_heat_cropsystem_Maize Grain (White)_Main.png is saved.
./figures/MW_heat_cropsystem_Maize Grain (White)_Winter.png is saved.


![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_heat_cropsystem_Maize%20Grain%20(White)_Main.png?raw=true)
![image](https://github.com/chc-ucsb/GlobalCropData/blob/main/figures/MW_heat_cropsystem_Maize%20Grain%20(White)_Winter.png?raw=true)