# FAOStat Dataset: Time Series

## Business Understanding 

### Overview

Food production is central to Kenya's food security. The agricultural sector employs a significant portion of the population, contributes to the national economy, and spans crop cultivation, livestock rearing, and fisheries.

In recent years, Kenya has worked to improve food production by promoting modern farming techniques, investing in irrigation infrastructure, and supporting small-scale farmers. These efforts have raised agricultural productivity and crop yields.

Even so, production is not always sufficient. Climate change, unpredictable weather patterns, and recurrent droughts pose significant risks to agricultural productivity, while limited access to affordable inputs, inadequate infrastructure, and post-harvest losses add to the problem.

As a result, Kenya occasionally experiences food shortages and relies on imports to meet demand. Achieving full, long-term food sufficiency for a growing population will require continued investment in sustainable agriculture, resilient farming practices, and improved market access.

### Problem Statement

Food production in Kenya struggles to keep pace with a growing population. Despite efforts to improve agricultural productivity, climate change, unpredictable weather patterns, and limited access to resources make it difficult both to forecast output accurately and to meet the population's food needs.

There is a need for a reliable prediction model that can forecast food production in Kenya to assess whether it will be sufficient to meet the population's requirements. Such a model would help policymakers, agricultural stakeholders, and government agencies make informed decisions regarding food security, resource allocation, and import/export planning.

By leveraging historical data, real-time information, and advanced analytical techniques, the model would provide valuable insights into future food production levels, helping to identify potential shortfalls or surpluses.

The development of a prediction model would support proactive planning and decision-making processes, allowing stakeholders to take appropriate measures in advance to bridge any potential food supply gaps. It would aid in optimizing resource allocation, promoting sustainable farming practices, and implementing targeted interventions to ensure food sufficiency for Kenya's population.

In short, Kenya lacks a reliable model for forecasting food production, which makes it hard to determine whether supply will meet the growing population's needs. Developing such a model would contribute to food security, better resource allocation, and the well-being of the Kenyan population.

### Objectives

The objectives of the prediction model for food production in Kenya are as follows:

1. Forecasting Food Production: The primary objective of the model is to accurately predict food production levels in Kenya. By analyzing historical data, current conditions, and relevant variables, the model aims to provide forecasts that reflect the expected output of crops, livestock, and other food sources.

2. Assessing Food Sufficiency: The model seeks to determine whether the projected food production will be sufficient to meet the needs of the population. It aims to assess the adequacy of food supply in order to identify potential shortfalls or surpluses.

3. Informing Decision-Making: The model aims to provide valuable insights to policymakers, government agencies, and agricultural stakeholders. By offering reliable predictions, the model can inform decision-making processes related to resource allocation, import/export planning, and interventions to ensure food security.

4. Optimizing Resource Allocation: The model aims to optimize the allocation of resources by identifying areas of potential food shortages or surpluses. This can help in directing resources, such as irrigation, fertilizers, and agricultural investments, to areas that require them the most.

5. Promoting Sustainable Farming Practices: By considering various factors that impact food production, such as climate conditions and agricultural practices, the model can promote sustainable farming techniques. It can provide recommendations for resilient and environmentally friendly practices that enhance productivity while minimizing negative impacts.

6. Enhancing Food Security: Ultimately, the objective of the prediction model is to contribute to improving food security in Kenya. By accurately forecasting food production and assessing sufficiency, the model aims to support proactive measures that ensure a consistent and adequate food supply for the growing population.

These objectives collectively aim to provide valuable insights, aid decision-making processes, and contribute to long-term food security in Kenya.

## Data Understanding

The data for this project comes from the FAOSTAT site:
[Food Balances](https://www.fao.org/faostat/en/#data/SCL)

The CSV has the following columns:

1. Area Code (M49): This column represents the standard area codes used by the United Nations for statistical purposes. The codes are developed and maintained by the United Nations Statistics Division.

2. Area: This column contains the country name or area name corresponding to the data.

3. Element Code: The element code represents the entities or categories based on which the data is collected. It is a numerical code used to identify specific elements.

4. Element: This column provides a description or name for the entities or categories represented by the element code.

5. Item Code (CPC): The item code refers to the Central Product Classification code assigned to a specific product or item. It is a standardized code used for classification purposes.

6. Item: This column contains the product or item classification name. It is based on the Central Product Classification (CPC) system promulgated by the United Nations Statistical Commission.

7. Year: This column represents the year when the data was collected or recorded.

8. Unit: The unit column specifies the measurement unit used for the corresponding item. It indicates the quantity or scale in which the data is measured. In this file the units include `1000 No` (thousands of people), `1000 t` (thousands of tonnes), and `kg`.

9. Value: This column provides the numerical value associated with a specific item, measured in the units specified in the "Unit" column. It represents the quantity or magnitude of the item for a given year and area.

10. Flag: The flag column describes how the values in the dataset were acquired by the FAO (Food and Agriculture Organization of the United Nations). The flag values indicate the data's source or quality.

11. Flag Description: This column provides additional information or descriptions related to the flags used in the dataset. The "E" flag represents estimated values, the "X" flag indicates figures from international organizations, and the "I" flag denotes imputed values.

In summary, the dataset contains food balance data from different countries or areas, including information about the area, element, item, year, measurement unit, value, and data quality flags. The dataset helps in understanding food consumption, production, and other related factors for various products and regions over time.
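
Because `Element`, `Unit`, and `Flag` together determine what a `Value` means, a quick check of how they combine is useful before any modelling. The following is a minimal sketch on a hand-built sample: the four rows reuse Burundi values for 2014 that appear in the outputs further down, and the names `sample`, `element_units`, and `flag_counts` are illustrative only.

```python
import pandas as pd

# Small sample in the same long layout as the CSV (a subset of its columns).
sample = pd.DataFrame({
    "Area": ["Burundi"] * 4,
    "Element": [
        "Total Population - Both sexes",
        "Domestic supply quantity",
        "Domestic supply quantity",
        "Food supply quantity (kg/capita/yr)",
    ],
    "Item": ["Population", "Wheat and products", "Maize and products", "Wheat and products"],
    "Year": [2014] * 4,
    "Unit": ["1000 No", "1000 t", "1000 t", "kg"],
    "Value": [9844.3, 54.0, 145.0, 4.74],
    "Flag": ["X", "I", "I", "E"],
})

# Which unit belongs to which element? Values with different units
# should never end up in the same series.
element_units = sample.groupby("Element")["Unit"].unique()

# How many values were acquired by each method (flag)?
flag_counts = sample["Flag"].value_counts()

print(element_units)
print(flag_counts)
```

On the full file, the same two calls on `faoDf` would presumably show whether each element maps to exactly one unit and how much of the data is imputed or estimated.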

## Data Preparation

In [2]:
import pandas as pd

In [3]:
# Load the FAOSTAT export and preview the first 10 rows
faoDf = pd.read_csv('FaoStat_EA.csv')
faoDf.head(10)

,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Year,Unit,Value,Flag,Flag Description
0,108,Burundi,511,Total Population - Both sexes,F2501,Population,2014,1000 No,9844.3,X,Figure from international organizations
1,108,Burundi,511,Total Population - Both sexes,F2501,Population,2015,1000 No,10160.03,X,Figure from international organizations
2,108,Burundi,511,Total Population - Both sexes,F2501,Population,2016,1000 No,10488.0,X,Figure from international organizations
3,108,Burundi,511,Total Population - Both sexes,F2501,Population,2017,1000 No,10827.02,X,Figure from international organizations
4,108,Burundi,511,Total Population - Both sexes,F2501,Population,2018,1000 No,11175.37,X,Figure from international organizations
5,108,Burundi,511,Total Population - Both sexes,F2501,Population,2019,1000 No,11530.58,X,Figure from international organizations
6,108,Burundi,511,Total Population - Both sexes,F2501,Population,2020,1000 No,11890.78,X,Figure from international organizations
7,108,Burundi,5301,Domestic supply quantity,F2501,Population,2014,1000 t,0.0,I,Imputed value
8,108,Burundi,5301,Domestic supply quantity,F2501,Population,2015,1000 t,0.0,I,Imputed value
9,108,Burundi,5301,Domestic supply quantity,F2501,Population,2016,1000 t,0.0,I,Imputed value


In [4]:
faoDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38257 entries, 0 to 38256
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Area Code (M49)   38257 non-null  int64  
 1   Area              38257 non-null  object 
 2   Element Code      38257 non-null  int64  
 3   Element           38257 non-null  object 
 4   Item Code (CPC)   38257 non-null  object 
 5   Item              38257 non-null  object 
 6   Year              38257 non-null  int64  
 7   Unit              38257 non-null  object 
 8   Value             38257 non-null  float64
 9   Flag              38257 non-null  object 
 10  Flag Description  38257 non-null  object 
dtypes: float64(1), int64(3), object(7)
memory usage: 3.2+ MB


In [5]:
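# Work on a copy so the raw data stays untouched
df = faoDf.copy()

# Pivot from long to wide: one row per area/element/item/unit/flag combination,
# one column per year. Because 'Flag' is part of the index, a series whose flag
# changes between years is split across several rows.
pivot_df = df.pivot(index=['Area Code (M49)', 'Area', 'Element Code', 'Element', 'Item Code (CPC)', 'Item', 'Unit', 'Flag',
                          'Flag Description'],
                    columns='Year',
                    values='Value').reset_index()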

In [11]:
pivot_df

Year,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Unit,Flag,Flag Description,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,108,Burundi,511,Total Population - Both sexes,F2501,Population,1000 No,X,Figure from international organizations,,,,,9844.30,10160.03,10488.00,10827.02,11175.37,11530.58,11890.78
1,108,Burundi,645,Food supply quantity (kg/capita/yr),F2511,Wheat and products,kg,E,Estimated value,,,,,4.74,2.48,5.26,6.41,7.02,7.63,5.63
2,108,Burundi,645,Food supply quantity (kg/capita/yr),F2513,Barley and products,kg,E,Estimated value,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00
3,108,Burundi,645,Food supply quantity (kg/capita/yr),F2514,Maize and products,kg,E,Estimated value,,,,,13.57,16.08,23.66,23.67,28.49,23.92,21.54
4,108,Burundi,645,Food supply quantity (kg/capita/yr),F2515,Rye and products,kg,E,Estimated value,,,,,,,,0.00,0.00,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4479,834,United Republic of Tanzania,5911,Export Quantity,F2782,"Fish, Liver Oil",1000 t,E,Estimated value,,,,,,,,,,,0.00
4480,834,United Republic of Tanzania,5911,Export Quantity,F2782,"Fish, Liver Oil",1000 t,I,Imputed value,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,
4481,834,United Republic of Tanzania,5911,Export Quantity,F2807,Rice and products,1000 t,I,Imputed value,75.0,54.0,27.0,79.0,107.00,23.00,19.00,1.00,46.00,171.00,527.00
4482,834,United Republic of Tanzania,5911,Export Quantity,F2848,Milk - Excluding Butter,1000 t,I,Imputed value,0.0,0.0,0.0,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00
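
Rows 4479 and 4480 above describe the same Tanzanian export series for "Fish, Liver Oil", split in two because the flag changes from `I` to `E` in 2020. The sketch below reproduces that effect on a three-year toy frame and shows one possible alternative, leaving `Flag` out of the pivot index. The reduced column set and the names `long_df`, `with_flag`, and `without_flag` are assumptions made for illustration, not the notebook's method.

```python
import pandas as pd

# One series whose flag changes in the final year (mirrors rows 4479/4480).
long_df = pd.DataFrame({
    "Area": ["United Republic of Tanzania"] * 3,
    "Element": ["Export Quantity"] * 3,
    "Item": ["Fish, Liver Oil"] * 3,
    "Year": [2018, 2019, 2020],
    "Value": [0.0, 0.0, 0.0],
    "Flag": ["I", "I", "E"],
})

# Flag in the index: the series is split into an 'I' row and an 'E' row,
# each with NaN in the years that carry the other flag.
with_flag = long_df.pivot(
    index=["Area", "Element", "Item", "Flag"], columns="Year", values="Value"
).reset_index()

# Flag left out: one row per (Area, Element, Item), no artificial gaps.
without_flag = long_df.pivot(
    index=["Area", "Element", "Item"], columns="Year", values="Value"
).reset_index()

print(len(with_flag), len(without_flag))
```

Dropping `Flag` discards the information about how each value was acquired, and `pivot` raises a `ValueError` if two rows then share the same index and year, so whether this trade-off is acceptable depends on the data.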


In [17]:
# Keep only the 'Domestic supply quantity' rows and drop the 'Population' item
filtered_df = pivot_df[(pivot_df['Element'].str.contains('Domestic supply quantity'))
                 & (pivot_df['Item'] != 'Population')]

# Print the filtered dataframe
filtered_df

Year,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Unit,Flag,Flag Description,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
266,108,Burundi,5301,Domestic supply quantity,F2511,Wheat and products,1000 t,I,Imputed value,,,,,54.00,30.0,65.00,81.0,92.0,100.0,79.0
267,108,Burundi,5301,Domestic supply quantity,F2513,Barley and products,1000 t,I,Imputed value,,,,,14.00,14.0,15.00,16.0,17.0,18.0,17.0
268,108,Burundi,5301,Domestic supply quantity,F2514,Maize and products,1000 t,I,Imputed value,,,,,145.00,178.0,270.00,279.0,347.0,304.0,282.0
269,108,Burundi,5301,Domestic supply quantity,F2515,Rye and products,1000 t,I,Imputed value,,,,,,,,0.0,0.0,,
270,108,Burundi,5301,Domestic supply quantity,F2516,Oats,1000 t,I,Imputed value,,,,,0.00,0.0,0.00,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4160,834,United Republic of Tanzania,5301,Domestic supply quantity,F2781,"Fish, Body Oil",1000 t,I,Imputed value,0.01,0.05,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0
4161,834,United Republic of Tanzania,5301,Domestic supply quantity,F2782,"Fish, Liver Oil",1000 t,I,Imputed value,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.0,0.0,0.0,0.0
4162,834,United Republic of Tanzania,5301,Domestic supply quantity,F2807,Rice and products,1000 t,I,Imputed value,1509.00,1576.00,1642.00,1758.00,1761.00,1878.0,2001.00,2109.0,2544.0,2396.0,2780.0
4163,834,United Republic of Tanzania,5301,Domestic supply quantity,F2848,Milk - Excluding Butter,1000 t,I,Imputed value,1821.00,1912.00,2029.00,2100.00,2199.00,2272.0,2348.00,2302.0,2616.0,2894.0,3226.0
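
To judge sufficiency, supply totals eventually have to be set against population. The dataset reports population in `1000 No` and domestic supply in `1000 t`, so a per-person figure needs a unit conversion. The arithmetic below is a sketch using the Burundi population and wheat values shown in the outputs above; the variable names are hypothetical.

```python
import pandas as pd

years = [2014, 2015, 2016]

# Values taken from the outputs above (Burundi).
population_thousands = pd.Series([9844.3, 10160.03, 10488.0], index=years)  # unit: 1000 No
wheat_supply_kt = pd.Series([54.0, 30.0, 65.0], index=years)                # unit: 1000 t

# 1000 t = 1,000,000 kg and 1000 No = 1,000 people, so
# kg per person = (supply * 1_000_000) / (population * 1_000) = supply * 1_000 / population
wheat_kg_per_capita = wheat_supply_kt * 1_000 / population_thousands

print(wheat_kg_per_capita.round(2))
```

For 2014 this gives roughly 5.5 kg per person, somewhat above the 4.74 kg/capita/yr reported under "Food supply quantity" for the same item, presumably because domestic supply also covers uses other than direct food consumption.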


In [21]:
# Replace missing year values with 0 (note: this treats unreported years as zero supply)
df_filled = filtered_df.fillna(0)
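
Filling with 0 makes an unreported year indistinguishable from a year with zero supply, which matters for a time series. The sketch below uses the Burundi wheat row (no values for 2010 to 2013) to show how much the choice changes a simple statistic. Keeping `NaN` or dropping the unreported years are alternatives worth considering, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Burundi wheat, domestic supply (1000 t): nothing reported before 2014.
row = pd.Series(
    [np.nan, np.nan, np.nan, np.nan, 54.0, 30.0, 65.0],
    index=[2010, 2011, 2012, 2013, 2014, 2015, 2016],
)

mean_zero_filled = row.fillna(0).mean()  # four artificial zeros drag the mean down
mean_reported = row.mean()               # pandas skips NaN by default
reported_only = row.dropna()             # keeps only the years that were reported

print(round(mean_zero_filled, 2), round(mean_reported, 2))
```

A forecasting model fitted on the zero-filled series would see an apparent jump from 0 to 54 between 2013 and 2014 that reflects reporting coverage rather than production.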

In [26]:
import pandas as pd

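def melt_data(df):
    """Reshape the wide year columns back into long (Year, Value) rows."""
    # NOTE: the nine id_vars occupy positions 0-8, so df.columns[10:] starts at 2011
    # and leaves 2010 out; df.columns[9:] would include every year.
    melted = pd.melt(df, id_vars=['Area Code (M49)', 'Area', 'Element Code', 'Element', 'Item Code (CPC)', 'Item', 'Unit', 'Flag', 'Flag Description'], 
                     value_vars=df.columns[10:], var_name='Year', value_name='Value')
    return melted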

# Apply the function to your dataframe
melted_df = melt_data(df_filled)
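
The cell above picks the year columns by position with `df.columns[10:]`. With nine identifier columns, the first year column sits at position 9, so the slice starts one column late. A less fragile option is to omit `value_vars`, in which case `pd.melt` unpivots every column not listed in `id_vars`. The sketch below shows both behaviours on a toy frame with two identifier columns; the values are the Tanzanian rice figures from the output above, and the names are illustrative.

```python
import pandas as pd

# Wide toy frame: two identifier columns followed by three year columns.
wide = pd.DataFrame({
    "Area": ["United Republic of Tanzania"],
    "Item": ["Rice and products"],
    2010: [1509.0],
    2011: [1576.0],
    2012: [1642.0],
})
id_cols = ["Area", "Item"]

# No value_vars: every column that is not an id column is unpivoted.
long_all = pd.melt(wide, id_vars=id_cols, var_name="Year", value_name="Value")

# Positional slice that starts one column too late: 2010 is silently dropped.
late_slice = list(wide.columns[len(id_cols) + 1:])
long_off_by_one = pd.melt(
    wide, id_vars=id_cols, value_vars=late_slice, var_name="Year", value_name="Value"
)

print(long_all["Year"].tolist())
print(long_off_by_one["Year"].tolist())
```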


In [28]:
melted_df

,Area Code (M49),Area,Element Code,Element,Item Code (CPC),Item,Unit,Flag,Flag Description,Year,Value
0,108,Burundi,5301,Domestic supply quantity,F2511,Wheat and products,1000 t,I,Imputed value,2011,0.0
1,108,Burundi,5301,Domestic supply quantity,F2513,Barley and products,1000 t,I,Imputed value,2011,0.0
2,108,Burundi,5301,Domestic supply quantity,F2514,Maize and products,1000 t,I,Imputed value,2011,0.0
3,108,Burundi,5301,Domestic supply quantity,F2515,Rye and products,1000 t,I,Imputed value,2011,0.0
4,108,Burundi,5301,Domestic supply quantity,F2516,Oats,1000 t,I,Imputed value,2011,0.0
...,...,...,...,...,...,...,...,...,...,...,...
6655,834,United Republic of Tanzania,5301,Domestic supply quantity,F2781,"Fish, Body Oil",1000 t,I,Imputed value,2020,0.0
6656,834,United Republic of Tanzania,5301,Domestic supply quantity,F2782,"Fish, Liver Oil",1000 t,I,Imputed value,2020,0.0
6657,834,United Republic of Tanzania,5301,Domestic supply quantity,F2807,Rice and products,1000 t,I,Imputed value,2020,2780.0
6658,834,United Republic of Tanzania,5301,Domestic supply quantity,F2848,Milk - Excluding Butter,1000 t,I,Imputed value,2020,3226.0
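
With the data back in long form, a natural next step for time-series work is to pull out one area and item as a date-indexed series. The sketch below does this for the Tanzanian rice values shown earlier. The same filter with `Area == 'Kenya'` would presumably apply to the project's target country, assuming Kenya is present in `FaoStat_EA.csv`. The reduced column set and variable names are assumptions for illustration.

```python
import pandas as pd

years = list(range(2010, 2021))
# Domestic supply of rice in Tanzania (1000 t), copied from the output above.
rice_kt = [1509.0, 1576.0, 1642.0, 1758.0, 1761.0, 1878.0,
           2001.0, 2109.0, 2544.0, 2396.0, 2780.0]

long_df = pd.DataFrame({
    "Area": ["United Republic of Tanzania"] * len(years),
    "Item": ["Rice and products"] * len(years),
    "Year": years,
    "Value": rice_kt,
})

# Select one area/item pair, then index the values by a proper datetime.
mask = (long_df["Area"] == "United Republic of Tanzania") & (long_df["Item"] == "Rice and products")
series = (
    long_df.loc[mask, ["Year", "Value"]]
    .assign(Year=lambda d: pd.to_datetime(d["Year"].astype(str), format="%Y"))
    .set_index("Year")["Value"]
    .sort_index()
)

# Year-over-year change, a simple first look at the trend.
yearly_change = series.diff()

print(series.head())
```

A `DatetimeIndex` is what most Python time-series tooling expects, and sorting by date first avoids subtle errors in differencing or lagging.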
