### Food Prices Preprocessing

- Source: EWURA, FPMA, MITI, MITM, Regional Agricultural Trade Intelligence Network via FAO: GIEWS
- Contributor: WFP - World Food Programme
- Time Period of the Dataset: January 15, 2006 to February 15, 2024
- Expected Update Frequency: Every month
- Location: United Republic of Tanzania
- Visibility: Public
- License: Creative Commons Attribution for Intergovernmental Organisations
- Methodology: Registry
- Caveats / Comments: Although the frequency of updates is set to every month, updates may occur more or less frequently.

[Dataset page](https://data.humdata.org/dataset/wfp-food-prices-for-united-republic-of-tanzania), accessed on 18th April 2024.


[asap](https://data.humdata.org/dataset/asap-hotspots-monthly)



To look at later:

[Crop Production in Tanzania](https://tasis.nbs.go.tz/statHtml/statHtml.do?orgId=255&tblId=DT_SA_077&language=en&conn_path=I3)

## Import Libraries

In [1]:
import pandas as pd
from pandas import read_csv
import seaborn as sns
import matplotlib.pyplot as plt

### Helper Function

In [2]:
def data_exploration(df, columns_to_explore):
    
    '''
    Perform data exploration on specified columns of a DataFrame and print the results.

    Parameters:
    df (pandas.DataFrame): The input DataFrame.
    columns_to_explore (list): A list of column names to explore.
    
    '''
    for column in columns_to_explore:
        # Get unique values
        unique_values = df[column].unique()
        
        #value_counts = df[column].value_counts()
        
        # Number of unique values
        num_unique_values = len(unique_values)

        # Count missing values
        missing_values_count = df[column].isnull().sum()

        # Print exploration results
        print(f"Column Name : {column}")
        print("=================================================================================")
        print(f"Unique values: {unique_values}")
        print(f"Number of unique values: {num_unique_values}")
        print(f"Missing values count: {missing_values_count}")
        print("=================================================================================") 
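
The same checks can also be collected into a small table instead of printed text, which is easier to scan when many columns are explored. This is only a sketch: `summarize_columns` and the toy data are hypothetical names and values introduced for illustration, not part of the notebook.

```python
import pandas as pd

def summarize_columns(df, columns):
    """Return one row per column with its unique-value and missing-value counts."""
    return pd.DataFrame({
        # nunique() ignores NaN by default, whereas len(df[col].unique()) counts NaN as a value
        "n_unique": df[columns].nunique(),
        "n_missing": df[columns].isna().sum(),
    })

# Toy data for illustration only (not taken from the WFP file)
toy = pd.DataFrame({
    "food_item": ["Beans", "Rice", "Maize", "Rice"],
    "unit": ["100 KG", "KG", None, "KG"],
})

summary = summarize_columns(toy, ["food_item", "unit"])
print(summary)
```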

In [3]:
def update_food_price(df):
    '''
    Convert rows priced per 100 KG to prices per KG.

    For rows whose unit is "100 KG", divide both 'price' and 'usd_price' by 100
    and set the unit to "KG". Rows already in KG are left unchanged.

    Parameters:
    df (pandas.DataFrame): The input DataFrame containing columns 'unit', 'price', and 'usd_price'.

    Returns:
    pandas.DataFrame: The same DataFrame, modified in place, with '100 KG' rows converted to KG.
    '''

    # Identify rows where the unit is "100 KG"
    mask = df['unit'] == '100 KG'

    # Dividing both price columns by 100 keeps the implied exchange rate unchanged
    # and avoids dividing by 'usd_price', which would break on zero values
    df.loc[mask, ['price', 'usd_price']] /= 100
    df.loc[mask, 'unit'] = 'KG'

    return df
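
For the "100 KG" rows, the per-KG USD price can be reached either through the implied exchange rate (`price / usd_price`) or by dividing `usd_price` by 100 directly. The sketch below checks on toy values that both routes agree; the DataFrame and variable names are made up for illustration.

```python
import pandas as pd

# Toy rows for illustration only; one priced per 100 KG, one already per KG
toy = pd.DataFrame({
    "unit": ["100 KG", "KG"],
    "price": [80714.29, 900.0],
    "usd_price": [68.7047, 0.75],
})
mask = toy["unit"] == "100 KG"

# Route 1: go through the implied local-currency/USD exchange rate
rate = toy.loc[mask, "price"] / toy.loc[mask, "usd_price"]
usd_via_rate = (toy.loc[mask, "price"] / 100) / rate

# Route 2: divide the USD price by 100 directly
usd_direct = toy.loc[mask, "usd_price"] / 100

# Apply the conversion to a copy so the toy input stays untouched
converted = toy.copy()
converted.loc[mask, ["price", "usd_price"]] = converted.loc[mask, ["price", "usd_price"]] / 100
converted.loc[mask, "unit"] = "KG"
print(converted)
```

Working on a copy, as here, is one way to keep the raw data available for comparison; the helper above returns the DataFrame it was given.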


In [4]:
def plot_food_price_trends(dataframe, year, food_items=None, district=None, save_path=None):
    """
    Plot trends of food prices for specified year, food items, and district.

    Args:
        dataframe (DataFrame): The input dataframe.
        year (int): The year to visualize the trends.
        food_items (list of str, optional): List of food items to visualize. If None, all food items will be considered.
        district (str, optional): The name of the district to visualize. If None, data for all districts will be considered.
        save_path (str, optional): Directory in which to save the graph as "<year>_trend.png". If None, the graph is shown instead of saved.

    Returns:
        None
    """
    # Filter dataframe based on the specified year
    df_year = dataframe[dataframe['year'] == year]

    # Filter dataframe based on food items if specified
    if food_items:
        df_year = df_year[df_year['food_item'].isin(food_items)]

    # Filter dataframe based on district if specified
    if district:
        df_year = df_year[df_year['district'] == district]

    # Group dataframe by food item
    grouped = df_year.groupby('food_item')

    plt.figure(figsize=(8, 6))

    # Plot trends for each food item
    for food_item, group in grouped:
        plt.plot(group["month"], group["price"], marker='o', label=food_item)

    # Add title with district name if provided
    title = f"Price Trends in {year}"
    if district:
        title += f" - District: {district}"
    plt.title(title)

    plt.xlabel("Month")
    plt.ylabel("Price")
    plt.xticks(rotation=45)
    plt.legend()
    plt.grid(True)
    plt.tight_layout()

    if save_path:
        plt.savefig(f"{save_path}/{year}_trend.png")
    else:
        plt.show()
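
When no district is given, each food item has several rows per month (one per market), and `plot_food_price_trends` draws them all on one line. One possible way to get a single point per month is to average across markets before plotting. The sketch below shows that aggregation on toy data; the values and the choice of the mean are assumptions, not something the notebook prescribes.

```python
import pandas as pd

# Toy data for illustration only: two districts reporting in month 1
toy = pd.DataFrame({
    "food_item": ["Maize", "Maize", "Maize", "Rice", "Rice", "Rice"],
    "district": ["Ilala", "Mpwapwa", "Ilala", "Ilala", "Mpwapwa", "Ilala"],
    "month": [1, 1, 2, 1, 1, 2],
    "price": [400.0, 600.0, 450.0, 1000.0, 1200.0, 1050.0],
})

# Average price per food item and month, sorted so lines are drawn in month order
monthly = (
    toy.groupby(["food_item", "month"], as_index=False)["price"]
    .mean()
    .sort_values(["food_item", "month"])
)
print(monthly)
```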

In [5]:
def merge_data(df_prices, df_district):
    '''
    Merges food prices data with district information.

    Parameters:
    - df_prices (DataFrame): A DataFrame containing food prices data.
    - df_district (DataFrame): A DataFrame containing district information.

    Returns:
    pandas.DataFrame: Merged dataframe containing district information and food prices.
    '''
    # Merge dataframes based on region and district columns
    merged_df = pd.merge(df_district, df_prices, on=['region', 'district'], how='left')
       
    return merged_df
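
Because this is a left merge from the district table, districts without any price record stay in the result with empty price columns. A quick way to list them is the `indicator` option of `pd.merge`. The sketch below uses toy tables invented for illustration.

```python
import pandas as pd

# Toy tables for illustration only
df_district = pd.DataFrame({
    "region": ["Arusha", "Dodoma", "Iringa"],
    "district": ["Arusha Urban", "Mpwapwa", "Iringa Urban"],
})
df_prices = pd.DataFrame({
    "region": ["Arusha", "Dodoma"],
    "district": ["Arusha Urban", "Mpwapwa"],
    "food_item": ["Beans", "Maize"],
    "price": [807.14, 361.07],
})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge
checked = pd.merge(df_district, df_prices, on=["region", "district"],
                   how="left", indicator=True)

# Districts that found no matching price rows
unmatched = checked.loc[checked["_merge"] == "left_only", ["region", "district"]]
print(unmatched)
```

Mismatched spellings of region or district names between the two files would show up in the same list.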


In [6]:
def plot_violence_trend(df):
    """
    Plots the trend of violent events and fatalities per year on a single graph.

    Parameters:
    - df (DataFrame): A DataFrame containing columns 'Year', 'Events', and 'Fatalities'.

    Returns:
    None
    """
    # Group the data by year and sum the number of events and fatalities
    trend_data = df.groupby('Year')[['Events', 'Fatalities']].sum().reset_index()

    # Plot the trend using seaborn
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=trend_data, x='Year', y='Events', marker='o', color='orange', label='Events')
    sns.lineplot(data=trend_data, x='Year', y='Fatalities', marker='o', color='red', label='Fatalities')
    plt.title('Tanzania Trend of Civilian Targeting Violent Events and Fatalities per Year')
    plt.xlabel('Year')
    plt.ylabel('Number of Events / Fatalities')
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.legend()
    plt.savefig('violent_civil_events_and_fatalities_trend.png')
    plt.show()

## Tanzania Food Prices Data Processing

In [15]:
tz_dir = 'tanzania_data/food_prices/'

In [16]:
# Load the data file containing the food prices
df = read_csv(tz_dir + 'tz_food_prices.csv', header=0, delimiter=',')
df.head()

Unnamed: 0,date,year,month,region,district,location_market,latitude,longitude,food_item,unit,currency,price,usd_price
0,2006-01-15,2006,1,Arusha,Arusha Urban,Arusha (urban),-3.366667,36.683333,Beans,100 KG,TZS,80714.29,68.7047
1,2006-01-15,2006,1,Dar es Salaam,Ilala,Dar Es Salaam,-6.8,39.283333,Rice,100 KG,TZS,64545.45,54.9416
2,2006-01-15,2006,1,Dodoma,Mpwapwa,Dodoma (Majengo),-7.29067,36.340375,Maize,100 KG,TZS,36107.14,30.7347
3,2006-01-15,2006,1,Dodoma,Mpwapwa,Dodoma (Majengo),-7.29067,36.340375,Rice,100 KG,TZS,60500.0,51.4981
4,2006-01-15,2006,1,Iringa,Iringa Urban,Iringa Urban,-7.766667,35.7,Rice,100 KG,TZS,58708.33,49.973


###  Data Exploration

In [17]:
columns = ['year', 'food_item', 'unit']  # list of columns to explore
data_exploration(df, columns)

Column Name : year
Unique values: [2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
 2020 2021 2022 2023 2024]
Number of unique values: 19
Missing values count: 0
Column Name : food_item
Unique values: ['Beans' 'Rice' 'Maize']
Number of unique values: 3
Missing values count: 0
Column Name : unit
Unique values: ['100 KG' 'KG']
Number of unique values: 2
Missing values count: 0


In [22]:
#plot_food_price_trends(data, year = 2023, food_items = ['Maize','Rice','Beans'], district='Ilala', save_path=None)

#### Convert prices per 100 KG to prices per KG

In [26]:
data = update_food_price(df)

In [27]:
data_exploration(df, ['unit'])  # check the unit after conversion

Column Name : unit
Unique values: ['KG']
Number of unique values: 1
Missing values count: 0


In [28]:
data.head()

Unnamed: 0,date,year,month,region,district,location_market,latitude,longitude,food_item,unit,currency,price,usd_price
0,2006-01-15,2006,1,Arusha,Arusha Urban,Arusha (urban),-3.366667,36.683333,Beans,KG,TZS,807.1429,0.687047
1,2006-01-15,2006,1,Dar es Salaam,Ilala,Dar Es Salaam,-6.8,39.283333,Rice,KG,TZS,645.4545,0.549416
2,2006-01-15,2006,1,Dodoma,Mpwapwa,Dodoma (Majengo),-7.29067,36.340375,Maize,KG,TZS,361.0714,0.307347
3,2006-01-15,2006,1,Dodoma,Mpwapwa,Dodoma (Majengo),-7.29067,36.340375,Rice,KG,TZS,605.0,0.514981
4,2006-01-15,2006,1,Iringa,Iringa Urban,Iringa Urban,-7.766667,35.7,Rice,KG,TZS,587.0833,0.49973


### To do: process missing values
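
As a possible starting point for that step, missing data in a monthly price series can take two forms: rows whose price is NaN, and months with no row at all. The sketch below inspects both on toy data. The values and the single district/food item series are assumptions for illustration, and no particular imputation method is implied.

```python
import pandas as pd

# Toy series for illustration only: February has a NaN price, March has no row
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-02-15", "2023-04-15"]),
    "district": ["Ilala"] * 3,
    "food_item": ["Maize"] * 3,
    "price": [400.0, None, 450.0],
})

# Form 1: explicit missing values per column
missing_per_column = toy.isna().sum()

# Form 2: months absent from the series between its first and last observation
observed = pd.PeriodIndex(toy["date"].dt.to_period("M"))
expected = pd.period_range(observed.min(), observed.max(), freq="M")
absent_months = expected.difference(observed)

print(missing_per_column)
print(absent_months)
```

On the full dataset the second check would need to run per district and food item, for example inside a `groupby`.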