Python script by [__Hassan Mojeed__](https://hassanmojeed.pages.dev)<br>

Email: mojeed.o.hassan@gmail.com<br>

Website: [https://hassanmojeed.pages.dev](https://hassanmojeed.pages.dev)


## **Inflation and Food Prices in Nigeria - A Data-Driven Approach (2017-2023)**

Food is a fundamental part of life, and its affordability can significantly impact people's well-being. 

That motivated me to explore food commodity prices in Nigeria over the past few years (January 2017 to May 2023). 

I've gathered data from various sources, but the challenge lies in their different formats.

## **Here's how I plan to tackle this project:**

### Data Wrangling Process: 

The first half of the project will be a data wrangling adventure! I'll be using Pandas, my trusty data analysis toolkit, to import and combine these datasets with varying shapes.

### Unifying the Data Force: 

Phase 5 will be all about bringing these diverse datasets together, merging them into a single, unified force.

### Cleaning and Exporting the Output: 

The final phases will focus on cleaning the data to ensure its accuracy and exporting the results for further exploration.

I'm excited to embark on this data analysis journey and gain insights into Nigerian food commodity prices!

*Check out my [website](https://hassanmojeed.pages.dev) to gain insights from the dynamic visualization I developed for this project.*




In [25]:
import pandas as pd
import numpy as np
import os
from datetime import date
from glob import glob
import warnings

# Ignore warnings to maintain clean output
warnings.filterwarnings('ignore')

In [26]:
# Establishing the working directory

pwd = os.getcwd() + "/food_prices_data_in_nigeria"

pwd

'/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria'

## Phase 1 : Data Import Part One

In [27]:
# Creating a function that reads each sheet from the Excel file and combines them into one DataFrame

def combine_excel_sheets(file_path):

    data = []

    # Iterating over each sheet in the Excel file
    for sheet_name in pd.ExcelFile(file_path).sheet_names:

        # Reading data from each sheet and adding a column (State) to identify the sheet_name
        if sheet_name not in ["SELECTED FOOD JAN 2023","NATIONAL"]:

            sheet_data = pd.read_excel(file_path, sheet_name=sheet_name)

            sheet_data['State'] = sheet_name

            data.append(sheet_data)

    # Concatenating all DataFrames into a single DataFrame
    combined_data = pd.concat(data, ignore_index=True)
    
    return combined_data

In [28]:
# Accessing the excel file in focus
xcel_file = pwd + "/Food Prices (Jan 2017 - Jan 2023).xlsx" 

# Now calling the "combine_excel_sheets" function created above
df1 = combine_excel_sheets(xcel_file)

# Displaying the combined data

print(df1.shape)

df1.head()

(1591, 75)


Unnamed: 0,ItemLabels,2017-01-01 00:00:00,2017-02-01 00:00:00,2017-03-01 00:00:00,2017-04-01 00:00:00,2017-05-01 00:00:00,2017-06-01 00:00:00,2017-07-01 00:00:00,2017-08-01 00:00:00,2017-09-01 00:00:00,...,2022-05-01 00:00:00,2022-06-01 00:00:00,2022-07-01 00:00:00,2022-08-01 00:00:00,2022-09-01 00:00:00,2022-10-01 00:00:00,2022-11-01 00:00:00,2022-12-01 00:00:00,2023-01-01 00:00:00,State
0,Agric eggs medium size,459.977222,485.809524,519.565217,520.608696,559.655172,563.928571,520.9375,501.481481,495.916667,...,768.619048,770.003357,770.003357,800.0,816.428571,850.374913,901.307692,957.142857,1018.4,ABIA
1,Agric eggs(medium size price of one),45.107143,44.513575,46.4,48.148148,47.741935,48.4375,48.125,48.333333,46.956522,...,70.0,70.035,73.157895,75.0,75.875,80.333333,83.75,90.714286,95.007692,ABIA
2,"Beans brown,sold loose",471.22,436.72,415.625,394.144661,474.193548,465.898618,458.333333,489.0625,486.956522,...,786.060606,787.771825,790.756614,755.555556,746.666667,735.964912,733.571429,745.555556,746.79803,ABIA
3,Beans:white black eye. sold loose,420.764286,375.090498,400.241546,391.872428,470.3125,466.748768,463.793103,484.375,468.571429,...,764.734641,766.717087,769.541353,716.0,712.847222,700.0,699.691358,721.428571,724.670251,ABIA
4,Beef Bone in,1107.670833,955.094825,981.257777,969.967863,996.492745,990.955057,950.230393,1050.8075,996.219093,...,1566.196511,1568.388426,1584.589789,1593.59369,1609.94548,1644.630483,1708.941335,1830.461538,1833.098441,ABIA
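
The tag-and-stack idea behind `combine_excel_sheets` can be tried without the workbook. Below is a minimal sketch, assuming a dictionary of small made-up DataFrames in place of the real sheets (`pd.read_excel(file_path, sheet_name=None)` returns a dictionary of this shape). The item names and prices are hypothetical.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the workbook's sheets (sheet name -> DataFrame).
sheets = {
    "ABIA": pd.DataFrame({"ItemLabels": ["Tomato"], "2017-01-01": [100.0]}),
    "NATIONAL": pd.DataFrame({"ItemLabels": ["Tomato"], "2017-01-01": [105.0]}),
    "LAGOS": pd.DataFrame({"ItemLabels": ["Tomato"], "2017-01-01": [110.0]}),
}

# Sheets that are not individual states and should be left out.
skip = {"SELECTED FOOD JAN 2023", "NATIONAL"}

# Tag each kept sheet with its name, then stack them into one DataFrame.
combined = pd.concat(
    [df.assign(State=name) for name, df in sheets.items() if name not in skip],
    ignore_index=True,
)

print(combined)
```

Only the state sheets survive, each row carrying the name of the sheet it came from.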


In [29]:
# The above dataframe needs one more step to get to the final data shape intended for this analysis
# An unpivoting will achieve this for us.

df1_unpivoted = pd.melt(df1, id_vars=["ItemLabels", "State"], var_name="Date", value_name="Item_Price")

df1_unpivoted.rename(columns = {"ItemLabels" : "Item_Label"}, inplace=True)

# Convert to datetime type -- Validating 
df1_unpivoted["Date"] = pd.to_datetime(df1_unpivoted["Date"])  

# Convert datetime to date
df1_unpivoted["Date"] = df1_unpivoted["Date"].dt.date

df_part_one = df1_unpivoted[["Item_Label", "Date", "State", "Item_Price"]]

# Printing shape of DataFrame
print("   rows","cols","\n" ,df_part_one.shape)

df_part_one.info()


df_part_one.head()

   rows cols 
 (116143, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 116143 entries, 0 to 116142
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   Item_Label  116143 non-null  object
 1   Date        116143 non-null  object
 2   State       116143 non-null  object
 3   Item_Price  116142 non-null  object
dtypes: object(4)
memory usage: 3.5+ MB


Unnamed: 0,Item_Label,Date,State,Item_Price
0,Agric eggs medium size,2017-01-01,ABIA,459.977222
1,Agric eggs(medium size price of one),2017-01-01,ABIA,45.107143
2,"Beans brown,sold loose",2017-01-01,ABIA,471.22
3,Beans:white black eye. sold loose,2017-01-01,ABIA,420.764286
4,Beef Bone in,2017-01-01,ABIA,1107.670833
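
The row count above follows from the unpivot: 1,591 rows times 73 month columns gives 116,143 rows. Here is a minimal sketch of the same wide-to-long step on a tiny frame with made-up items and prices.

```python
import pandas as pd

# Hypothetical miniature of the combined sheets: one row per item, one column per month.
wide = pd.DataFrame({
    "ItemLabels": ["Tomato", "Yam tuber"],
    "State": ["ABIA", "ABIA"],
    "2017-01-01": [100.0, 200.0],
    "2017-02-01": [110.0, 210.0],
})

# Keep the identifier columns and turn every month column into a (Date, Item_Price) row.
long = pd.melt(wide, id_vars=["ItemLabels", "State"],
               var_name="Date", value_name="Item_Price")

# The former column headers are strings, so parse them into real dates.
long["Date"] = pd.to_datetime(long["Date"])

print(long.shape)  # 2 items x 2 months = 4 rows, 4 columns
print(long)
```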


## Phase 2 : Data Import Part Two

In [30]:
# Accessing all Excel files in the current working folder
file_location_paths = glob(pwd + "/*.xlsx")

file_location_paths

['/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices April 2023.xlsx',
 '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices May 2023.xlsx',
 '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices February 2023.xlsx',
 '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017- Dec 2022).xlsx',
 '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices March 2023.xlsx',
 '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017 - Jan 2023).xlsx']

In [31]:
# exempting some files to avoid duplicate import

exempted_files = ['/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices April 2023.xlsx',
                  '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices May 2023.xlsx',
                  '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017- Dec 2022).xlsx',
                  '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017 - Jan 2023).xlsx']

files = []

for file in file_location_paths:

    if file not in exempted_files:
            
        read_file = pd.read_excel(file, sheet_name="States", header=1)

        files.append(read_file)

df2 = pd.concat(files)

print(df2.shape)

df2.head()


(86, 39)


Unnamed: 0,ITEMS,Date,ABIA,ABUJA,ADAMAWA,AKWA IBOM,ANAMBRA,BAUCHI,BAYELSA,BENUE,...,OGUN,ONDO,OSUN,OYO,PLATEAU,RIVERS,SOKOTO,TARABA,YOBE,ZAMFARA
0,Agric eggs medium size,2023-02-01,1060.4,800.0,771.818182,901.666667,1043.75,698.74,1050.0,660.454545,...,817.2,912.222222,850.0,918.571429,650.0,1015.488154,899.827586,731.428571,798.148148,864.074074
1,Agric eggs(medium size price of one),2023-02-01,97.307692,87.571429,84.666667,95.23,92.947368,79.777778,100.0,84.0,...,89.666667,90.769231,93.333333,91.428571,85.0,97.12,79.137931,88.428571,86.875,80.0
2,"Beans brown,sold loose",2023-02-01,706.780296,747.916667,500.055096,813.209877,733.333333,480.4,720.992055,450.350649,...,615.336199,632.25475,661.111111,817.5,410.0,766.550356,539.770263,463.890457,472.857143,566.631464
3,Beans:white black eye. sold loose,2023-02-01,697.670251,590.149336,568.189379,742.828283,799.375,475.681818,653.676186,390.151966,...,628.528504,660.492888,675.954545,759.794372,489.122272,668.986496,529.235554,452.21724,455.273492,580.629697
4,Beef Bone in,2023-02-01,1851.498441,2010.0,1668.298368,1636.725519,1731.875219,1736.363636,1999.382395,1715.428571,...,2133.164616,1575.31635,1396.715608,1627.83003,1615.0,1736.836333,2272.0,1809.090909,1818.181818,2064.285714


In [32]:
# The above dataframe needs one more step to get to the final data shape intended for this analysis
# An unpivoting will achieve this for us.

df2_unpivoted = pd.melt(df2, id_vars=["ITEMS", "Date"], var_name="State", value_name="Item_Price")

df2_unpivoted.rename(columns = {"ITEMS" : "Item_Label"}, inplace=True)

df_part_two = df2_unpivoted

# Printing shape of DataFrame
print(" rows","cols","\n" ,df_part_two.shape)

df_part_two.info()


df_part_two.head()

 rows cols 
 (3182, 4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Item_Label  3182 non-null   object        
 1   Date        3182 non-null   datetime64[ns]
 2   State       3182 non-null   object        
 3   Item_Price  3182 non-null   float64       
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 99.6+ KB


Unnamed: 0,Item_Label,Date,State,Item_Price
0,Agric eggs medium size,2023-02-01,ABIA,1060.4
1,Agric eggs(medium size price of one),2023-02-01,ABIA,97.307692
2,"Beans brown,sold loose",2023-02-01,ABIA,706.780296
3,Beans:white black eye. sold loose,2023-02-01,ABIA,697.670251
4,Beef Bone in,2023-02-01,ABIA,1851.498441
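
The exclusion list above uses absolute paths, so it only works on the machine where it was written. One possible alternative, sketched here under the assumption that the file names stay the same, is to select files by base name. The `/data/...` paths are hypothetical stand-ins for the `glob` result.

```python
import os

# Hypothetical paths standing in for the glob() result.
paths = [
    "/data/Food Prices April 2023.xlsx",
    "/data/Food Prices February 2023.xlsx",
    "/data/Food Prices (Jan 2017 - Jan 2023).xlsx",
    "/data/Food Prices March 2023.xlsx",
]

# File names wanted for this phase, independent of the folder they live in.
wanted = {"Food Prices February 2023.xlsx", "Food Prices March 2023.xlsx"}

# Compare only the last path component against the wanted names.
selected = sorted(p for p in paths if os.path.basename(p) in wanted)

print(selected)
```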


## Phase 3 : Data Import Part Three

In [33]:
# exempting some files to avoid duplicate import

exempted_files_two = ['/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices February 2023.xlsx',
                      '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017- Dec 2022).xlsx',
                      '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices March 2023.xlsx',
                      '/Users/mj/Projects/Projects/More Projects/food_prices_data_in_nigeria/Food Prices (Jan 2017 - Jan 2023).xlsx']

# Creating an empty list to house all extracted data
xl_files = []

# looping file location to fetch required excel files
for xl_file in file_location_paths:

    # Stating a condition to ignore the exempted files above

    if xl_file not in exempted_files_two:

        # converting read excel files into DataFrames
            
        read_xlfile = pd.read_excel(xl_file, sheet_name="States", header=1)

        # dumping all files into the empty list created above

        xl_files.append(read_xlfile)

# Combining all extracted data into one Dataframe

data_f = pd.concat(xl_files)

# Printing shape of DataFrame
print(" rows","cols","\n" ,data_f.shape)

data_f.info()

# Viewing few rows from the DataFrame
data_f.head()

 rows cols 
 (86, 45)
<class 'pandas.core.frame.DataFrame'>
Index: 86 entries, 0 to 42
Data columns (total 45 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   ITEMS        86 non-null     object        
 1   Date         86 non-null     datetime64[ns]
 2   ABUJA        86 non-null     float64       
 3   BENUE        86 non-null     float64       
 4   KOGI         86 non-null     float64       
 5   KWARA        86 non-null     float64       
 6   NASARAWA     86 non-null     float64       
 7   NIGER        86 non-null     float64       
 8   PLATEAU      86 non-null     float64       
 9   AVERAGE      86 non-null     float64       
 10  ADAMAWA      86 non-null     float64       
 11  BAUCHI       86 non-null     float64       
 12  BORNO        86 non-null     float64       
 13  GOMBE        86 non-null     float64       
 14  TARABA       86 non-null     float64       
 15  YOBE         86 non-null     float64      

Unnamed: 0,ITEMS,Date,ABUJA,BENUE,KOGI,KWARA,NASARAWA,NIGER,PLATEAU,AVERAGE,...,EDO,RIVERS,AVERAGE.4,EKITI,LAGOS,OGUN,ONDO,OSUN,OYO,AVERAGE.5
0,Agric eggs medium size,2023-04-01,860.0,695.0,599.909091,700.666667,695.142857,690.230769,680.333333,703.040388,...,1143.0,1175.0,1098.98951,909.333333,974.705882,844.117647,942.727273,899.333333,953.75,920.661245
1,Agric eggs(medium size price of one),2023-04-01,78.0,70.727273,70.153846,73.666667,77.647059,68.846154,77.083333,73.732047,...,97.777778,99.0,95.796296,98.75,90.0,94.375,98.75,93.0,96.25,95.1875
2,"Beans brown,sold loose",2023-04-01,691.909323,500.589614,500.627751,504.545479,500.171489,428.566453,434.036519,508.635233,...,631.216931,776.845131,699.756052,675.925926,586.940166,635.745372,694.918531,646.428571,841.397059,680.225937
3,Beans:white black eye. sold loose,2023-04-01,591.091954,410.285127,454.750365,449.757758,495.648604,490.44152,531.780358,489.107955,...,545.485009,674.870781,644.966657,587.715082,623.33567,695.510158,708.960864,695.126615,782.03125,682.113273
4,Beef Bone in,2023-04-01,2033.333333,1857.142857,1495.333333,1600.724068,1785.0,1618.350168,1637.333333,1718.173871,...,2274.989825,1753.867843,1894.914054,2083.333333,1818.367347,2363.615206,1578.546069,1493.480213,1736.59022,1845.655398


In [34]:
# Quick glance at all columns in "data_f"
data_f.columns

Index(['ITEMS', 'Date', 'ABUJA', 'BENUE', 'KOGI', 'KWARA', 'NASARAWA', 'NIGER',
       'PLATEAU', 'AVERAGE', 'ADAMAWA', 'BAUCHI', 'BORNO', 'GOMBE', 'TARABA',
       'YOBE', 'AVERAGE.1', 'JIGAWA', 'KADUNA', 'KANO', 'KATSINA', 'KEBBI',
       'SOKOTO', 'ZAMFARA', 'AVERAGE.2', 'ABIA', 'ANAMBRA', 'EBONYI', 'ENUGU',
       'IMO', 'AVERAGE.3', 'AKWA IBOM', 'BAYELSA', 'CROSS RIVER', 'DELTA',
       'EDO', 'RIVERS', 'AVERAGE.4', 'EKITI', 'LAGOS', 'OGUN', 'ONDO', 'OSUN',
       'OYO', 'AVERAGE.5'],
      dtype='object')

In [35]:
drop_columns = ['AVERAGE', 'AVERAGE.1', 'AVERAGE.2', 'AVERAGE.3', 'AVERAGE.4', 'AVERAGE.5']

df3 = data_f.drop(columns = drop_columns)

In [36]:
# The above dataframe needs one more step to get to the final data shape intended for this analysis
# An unpivoting will achieve this for us.

df3_unpivoted = pd.melt(df3, id_vars=["ITEMS", "Date"], var_name="State", value_name="Item_Price")

df3_unpivoted.rename(columns = {"ITEMS" : "Item_Label"}, inplace=True)

# Convert to datetime type -- Validating 
df3_unpivoted["Date"] = pd.to_datetime(df3_unpivoted["Date"])  

# Convert datetime to date
df3_unpivoted["Date"] = df3_unpivoted["Date"].dt.date

df_part_three = df3_unpivoted

print(df_part_three.shape)

df_part_three.head()

(3182, 4)


Unnamed: 0,Item_Label,Date,State,Item_Price
0,Agric eggs medium size,2023-04-01,ABUJA,860.0
1,Agric eggs(medium size price of one),2023-04-01,ABUJA,78.0
2,"Beans brown,sold loose",2023-04-01,ABUJA,691.909323
3,Beans:white black eye. sold loose,2023-04-01,ABUJA,591.091954
4,Beef Bone in,2023-04-01,ABUJA,2033.333333
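
The zone-average columns were dropped with a hand-typed list. A minimal sketch of picking them up by prefix instead, on a made-up frame, in case a future file has a different number of `AVERAGE` columns:

```python
import pandas as pd

# Hypothetical frame with the same kind of zone-average columns as data_f.
frame = pd.DataFrame({
    "ITEMS": ["Tomato"],
    "ABUJA": [1.0],
    "AVERAGE": [1.0],
    "ADAMAWA": [2.0],
    "AVERAGE.1": [2.0],
})

# Collect every column whose name starts with "AVERAGE" (AVERAGE, AVERAGE.1, ...).
average_cols = [c for c in frame.columns if str(c).startswith("AVERAGE")]

states_only = frame.drop(columns=average_cols)

print(list(states_only.columns))
```

This relies on no state name starting with "AVERAGE", which holds for the columns listed above.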


## Phase 4 : Data Import Part Four

In [37]:
# Importing Monthly Inflation Rate Data From January 2017 to May 2023

data_dir = pwd + "/Other Supporting Data/Inflation Rate.xlsx"

infData = pd.read_excel(data_dir)

infData["Date"] = pd.to_datetime(infData["Date"])

infData["Inflation Rate (%)"] = infData["Inflation Rate (%)"].astype(float)

print("rows","cols","\n" ,infData.shape)

infData.info()

infData.head()


rows cols 
 (77, 2)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   Date                77 non-null     datetime64[ns]
 1   Inflation Rate (%)  77 non-null     float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 1.3 KB


Unnamed: 0,Date,Inflation Rate (%)
0,2017-01-01,17.81824
1,2017-02-01,18.528148
2,2017-03-01,18.436097
3,2017-04-01,19.303071
4,2017-05-01,19.266167


## Phase 5: Combining data from phase 1 to phase 4

In [38]:

# Combining DataFrames from phase 1 to 3 vertically (stacking on top of each other)
all_data = pd.concat([df_part_one, df_part_two, df_part_three], ignore_index=True) 

all_data["Date"] = pd.to_datetime(all_data["Date"])

# Merging the resulting DataFrame to the Inflation Rate DataFrame

data_final = pd.merge(left= all_data, right= infData, how="left", on="Date")


# Adding 2 new fields "Year" and "Month"

data_final["Year"] = data_final["Date"].dt.year

data_final["Month"] = data_final["Date"].dt.month

# Taking a quick look at the final data
print(data_final.shape)

data_final.head()


(122507, 7)


Unnamed: 0,Item_Label,Date,State,Item_Price,Inflation Rate (%),Year,Month
0,Agric eggs medium size,2017-01-01,ABIA,459.977222,17.81824,2017,1
1,Agric eggs(medium size price of one),2017-01-01,ABIA,45.107143,17.81824,2017,1
2,"Beans brown,sold loose",2017-01-01,ABIA,471.22,17.81824,2017,1
3,Beans:white black eye. sold loose,2017-01-01,ABIA,420.764286,17.81824,2017,1
4,Beef Bone in,2017-01-01,ABIA,1107.670833,17.81824,2017,1
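
A left merge keeps every price row even when no inflation figure exists for its date, so it can be worth checking the match. The sketch below uses two small made-up frames with the `validate` and `indicator` options of `pd.merge`. It is an optional check, not something the notebook itself runs.

```python
import pandas as pd

# Hypothetical price rows: several rows can share one date.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-02-01", "2017-03-01"]),
    "Item_Price": [10.0, 20.0, 11.0, 12.0],
})

# Hypothetical inflation table: one row per month, March deliberately missing.
rates = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-01", "2017-02-01"]),
    "Inflation Rate (%)": [17.8, 18.5],
})

# validate raises if a date appears more than once in the right-hand table;
# indicator adds a "_merge" column saying where each row found a match.
merged = pd.merge(prices, rates, how="left", on="Date",
                  validate="many_to_one", indicator=True)

unmatched = merged[merged["_merge"] == "left_only"]

print(len(merged), len(unmatched))
```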


## Phase 6: Data Exploration, Cleaning And Transformation

In [39]:
data_final[data_final["Item_Price"].isna()]

Unnamed: 0,Item_Label,Date,State,Item_Price,Inflation Rate (%),Year,Month
1143,Dried Fish Sardine,2017-01-01,BORNO,,17.81824,2017,1


In [40]:
dried_fish_sardine_2017_average_price = data_final[(data_final["State"] == "BORNO") & 
                           (data_final["Year"] == 2017) & 
                           (data_final["Item_Label"] == "Dried Fish Sardine")]["Item_Price"].mean()

dried_fish_sardine_2017_average_price

1613.7379775360982

In [41]:
# Filling empty data

data_final["Item_Price"] = data_final["Item_Price"].fillna(dried_fish_sardine_2017_average_price)

data_final["State"] = data_final["State"].str.title()

data_final.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 122507 entries, 0 to 122506
Data columns (total 7 columns):
 #   Column              Non-Null Count   Dtype         
---  ------              --------------   -----         
 0   Item_Label          122507 non-null  object        
 1   Date                122507 non-null  datetime64[ns]
 2   State               122507 non-null  object        
 3   Item_Price          122507 non-null  object        
 4   Inflation Rate (%)  122507 non-null  float64       
 5   Year                122507 non-null  int32         
 6   Month               122507 non-null  int32         
dtypes: datetime64[ns](1), float64(1), int32(2), object(3)
memory usage: 5.6+ MB
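
The `fillna` call above writes one average into every missing price, which is fine here because only a single value is missing. If more gaps appeared, a group-wise fill would keep each average with its own item, state and year. A minimal sketch on made-up numbers, assuming the price column is already numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing price among three for the same item, state and year.
df = pd.DataFrame({
    "Item_Label": ["Dried Fish Sardine"] * 3 + ["Tomato"] * 2,
    "State": ["BORNO"] * 3 + ["ABIA"] * 2,
    "Year": [2017] * 5,
    "Item_Price": [np.nan, 1600.0, 1700.0, 100.0, 120.0],
})

# Mean price per (item, state, year), aligned back to the original rows.
group_mean = df.groupby(["Item_Label", "State", "Year"])["Item_Price"].transform("mean")

# Each gap is filled with the mean of its own group only.
df["Item_Price"] = df["Item_Price"].fillna(group_mean)

print(df)
```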


In [42]:
# Correcting the inconsistencies in the "State" column

data_final["State"] = data_final["State"].replace({"Akwa_Ibom": "Akwa Ibom",
                                                   "Cross_River": "Cross River",
                                                   "Nassarawa": "Nasarawa"})

In [43]:
# Assigning a "State_Id" to each state

states = list(data_final["State"].unique())

# One ID per state, numbered from 1 in order of first appearance
id_series = np.arange(1, len(states) + 1)

state_id_dic = dict(zip(states, id_series))

# Adding the "State_Id" column to the DataFrame by exact lookup of each state name
data_final["State_Id"] = data_final["State"].map(state_id_dic)
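
Numbering states in order of first appearance is also what `pd.factorize` does, so the same IDs can be produced in one call. A minimal sketch on a short made-up series:

```python
import pandas as pd

# Hypothetical state column with repeats.
states = pd.Series(["Abia", "Abuja", "Abia", "Adamawa", "Abuja"])

# factorize returns zero-based codes in order of first appearance, plus the unique values.
codes, uniques = pd.factorize(states)

# Shift by one so that IDs start at 1.
state_id = codes + 1

print(state_id.tolist())
print(list(uniques))
```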

In [44]:
# Adding "Region" Column to the DataFrame

NC = ["Plateau", "Niger", "Nasarawa", "Kwara", "Kogi", "Benue", "Abuja"]
NE = ["Yobe", "Taraba", "Gombe", "Borno", "Bauchi", "Adamawa"]
NW = ["Sokoto", "Zamfara", "Kebbi", "Katsina", "Kano", "Kaduna", "Jigawa"]
SE = ["Imo", "Enugu", "Ebonyi", "Anambra", "Abia"]
SS = ["Edo", "Rivers", "Delta", "Cross River", "Bayelsa", "Akwa Ibom"]
SW = ["Oyo", "Osun", "Ogun", "Ondo", "Lagos", "Ekiti"]

data_final["Region"] = data_final.apply(lambda row: "North Central"
                                        if row["State"] in NC
                                        else "North East" if row["State"] in NE
                                        else "North West" if row["State"] in NW
                                        else "South East" if row["State"] in SE
                                        else "South South" if row["State"] in SS
                                        else "South West", axis = 1
                                        )


# Adding "Region_Code" Column to the DataFrame

RegionCode = {"North Central" : "NC",
            "North East" : "NE",
            "North West" : "NW",
            "South East" : "SE",
            "South South" : "SS",
            "South West" : "SW"
            }

data_final["Region_Code"] = data_final["Region"].apply(lambda row: RegionCode[row]
                                             if row in RegionCode
                                             else row            
                                             )
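
The region lookup above falls through to "South West" for any state name it does not recognise, which could hide a misspelling. Below is a minimal sketch of a dictionary-based lookup that leaves unknown names empty instead. The shortened zone lists and the misspelt "Lagoss" are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened zone lists.
zones = {
    "North Central": ["Plateau", "Niger"],
    "South West": ["Oyo", "Lagos"],
}

# Invert the zone lists into a state -> region dictionary.
state_to_region = {state: region
                   for region, members in zones.items()
                   for state in members}

states = pd.Series(["Lagos", "Niger", "Lagoss"])

# Unknown names become NaN, so they are easy to spot with isna().
regions = states.map(state_to_region)

print(regions)
```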

In [45]:
# Accessing the unique item labels present in the DataFrame

list(data_final["Item_Label"].unique())


# Assigning Category to each item

item_category = {'Agric eggs medium size': 'Eggs',
                'Agric eggs(medium size price of one)': 'Eggs',
                'Beans brown,sold loose' : 'Beans',
                'Beans:white black eye. sold loose' : 'Beans',
                'Beef Bone in' : 'Beef',
                'Beef,boneless' : 'Beef',
                'Bread sliced 500g' : 'Bread',
                'Bread unsliced 500g' : 'Bread',
                'Broken Rice (Ofada)' : 'Rice',
                'Chicken Feet': 'Chicken' ,
                'Chicken Wings' : 'Chicken',
                'Evaporated tinned milk carnation 170g' : 'Milk',
                'Evaporated tinned milk(peak), 170g' : 'Milk',
                'Frozen chicken' : 'Chicken',
                'Gari white,sold loose' : 'Garri',
                'Gari yellow,sold loose' : 'Garri',
                'Mudfish (aro) fresh' : 'Fish',
                'Mudfish : dried' : 'Fish',
                'Onion bulb' : 'Onion',
                'Rice agric sold loose' : 'Rice',
                'Rice local sold loose' : 'Rice',
                'Rice Medium Grained' : 'Rice',
                'Rice,imported high quality sold loose' : 'Rice',
                'Tomato' : 'Tomato',
                'Yam tuber' : 'Tubers',
                'Dried Fish Sardine' : 'Fish',
                'Iced Sardine' : 'Fish',
                'Irish potato' : 'Tubers',
                'Sweet potato' : 'Tubers',
                'Tilapia fish (epiya) fresh' : 'Fish',
                'Titus:frozen' : 'Fish',
                'Catfish (obokun) fresh' : 'Fish',
                'Catfish :dried' : 'Fish',
                'Catfish Smoked' : 'Fish',
                'Mackerel : frozen' : 'Fish',
                'Groundnut oil: 1 bottle, specify bottle' : 'Cooking Oil',
                'Maize grain white sold loose' : 'Maize',
                'Maize grain yellow sold loose' : 'Maize',
                'Palm oil: 1 bottle,specify bottle' : 'Cooking Oil',
                'Plantain(ripe)' : 'Plantain',
                'Plantain(unripe)' : 'Plantain',
                'Vegetable oil:1 bottle,specify bottle' : 'Cooking Oil',
                'Wheat flour: prepacked (golden penny 2kg)' : 'Wheat Flour'}


# defining a function that adds category to DataFrame

def add_item_cat(item):
    for key, value in item_category.items():
        if item in key:
            return value


# Adding the "item_category" to DataFrame

data_final["Item_Category"] = data_final["Item_Label"].apply(add_item_cat)


# Renaming each item_label

# Assigning the result back, because an in-place replace on a selected column
# does not update the DataFrame when pandas runs with Copy-on-Write
data_final["Item_Label"] = data_final["Item_Label"].replace({'Agric eggs medium size': 'Agric eggs - medium',
                'Agric eggs(medium size price of one)': 'Agric eggs - medium (price of one)',
                'Beans brown,sold loose' : 'Beans brown - sold loose',
                'Beans:white black eye. sold loose' : 'Beans white, black eye - sold loose',
                'Beef Bone in' : 'Beef - one in',
                'Beef,boneless' : 'Beef - boneless',
                'Bread sliced 500g' : 'Bread - sliced 500g',
                'Bread unsliced 500g' : 'Bread - unsliced 500g',
                'Broken Rice (Ofada)' : 'Broken Rice - Ofada',
                'Chicken Feet': 'Chicken - Feet' ,
                'Chicken Wings' : 'Chicken - Wings',
                'Evaporated tinned milk carnation 170g' : 'Evaporated tinned milk - carnation 170g',
                'Evaporated tinned milk(peak), 170g' : 'Evaporated tinned milk - peak 170g',
                'Frozen chicken' : 'Chicken - Frozen',
                'Gari white,sold loose' : 'Garri white - sold loose',
                'Gari yellow,sold loose' : 'Gari yellow - sold loose',
                'Mudfish (aro) fresh' : 'Mud_fish - aro fresh',
                'Mudfish : dried' : 'Mud_fish - dried',
                'Onion bulb' : 'Onion bulb',
                'Rice agric sold loose' : 'Rice - agric sold loose',
                'Rice local sold loose' : 'Rice - local sold loose',
                'Rice Medium Grained' : 'Rice - medium grained',
                'Rice,imported high quality sold loose' : 'Rice - imported high quality sold loose',
                'Tomato' : 'Tomato',
                'Yam tuber' : 'Yam tuber',
                'Dried Fish Sardine' : 'Dried Fish - sardine',
                'Iced Sardine' : 'Iced fish - sardine',
                'Irish potato' : 'Irish potato',
                'Sweet potato' : 'Sweet potato',
                'Tilapia fish (epiya) fresh' : 'Tilapia fish - epiya fresh',
                'Titus:frozen' : 'Titus - frozen',
                'Catfish (obokun) fresh' : 'Catfish - obokun fresh',
                'Catfish :dried' : 'Catfish - dried',
                'Catfish Smoked' : 'Catfish - smoked',
                'Mackerel : frozen' : 'Mackerel - frozen',
                'Groundnut oil: 1 bottle, specify bottle' : 'Groundnut oil - 1 bottle',
                'Maize grain white sold loose' : 'Maize grain white - sold loose',
                'Maize grain yellow sold loose' : 'Maize grain yellow - sold loose',
                'Palm oil: 1 bottle,specify bottle' : 'Palm oil - 1 bottle',
                'Plantain(ripe)' : 'Plantain - ripe',
                'Plantain(unripe)' : 'Plantain - unripe',
                'Vegetable oil:1 bottle,specify bottle' : 'Vegetable oil - 1 bottle',
                'Wheat flour: prepacked (golden penny 2kg)' : 'Wheat flour - prepacked golden penny 2kg'})


# Renaming a location under the "State" column

data_final["State"] = data_final["State"].replace({"Abuja": "Federal Capital Territory"})

In [46]:
# Correcting the inconsistencies in the "Item Price" Column

data_final["Item_Price"] = data_final["Item_Price"].astype(str)

data_final["Item_Price"] = [x.strip("`") for x in data_final["Item_Price"]]

data_final["Item_Price"] = [x.replace(" ",".") for x in data_final["Item_Price"]]

data_final["Item_Price"] = [x.replace(",",".") for x in data_final["Item_Price"]]



In [47]:
# Defining a function that removes the second dot from a string, if there is one

def remove_second_dot(input_string):
    # Find the index of the first dot
    first_dot_index = input_string.find('.')
    if first_dot_index != -1:
        # Find the index of the second dot starting from the position after the first dot
        second_dot_index = input_string.find('.', first_dot_index + 1)
        if second_dot_index != -1:
            # Remove the second dot from the string
            return input_string[:second_dot_index] + input_string[second_dot_index + 1:]
    # Return the input string if the second dot is not found
    return input_string

data_final["Item_Price"] = data_final["Item_Price"].apply(remove_second_dot)


data_final["Item_Price"] = data_final["Item_Price"].astype(float)

data = data_final[["Date", "Year", "Month", "Region", "Region_Code", "State", "State_Id", "Item_Category", "Item_Label", "Item_Price","Inflation Rate (%)"]]

# We will only be analysing unit price of items, hence there will be no need to include grouped price of items

data = data[data["Item_Label"] != "Agric eggs - medium" ]

data.head()

Unnamed: 0,Date,Year,Month,Region,Region_Code,State,State_Id,Item_Category,Item_Label,Item_Price,Inflation Rate (%)
1,2017-01-01,2017,1,South East,SE,Abia,1,Eggs,Agric eggs - medium (price of one),45.107143,17.81824
2,2017-01-01,2017,1,South East,SE,Abia,1,Beans,Beans brown - sold loose,471.22,17.81824
3,2017-01-01,2017,1,South East,SE,Abia,1,Beans,"Beans white, black eye - sold loose",420.764286,17.81824
4,2017-01-01,2017,1,South East,SE,Abia,1,Beef,Beef - one in,1107.670833,17.81824
5,2017-01-01,2017,1,South East,SE,Abia,1,Beef,Beef - boneless,1378.395833,17.81824
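
Before rewriting price strings, it can help to see exactly which values refuse to parse as numbers. Here is a minimal sketch with `pd.to_numeric` on made-up strings. The stray backtick and the comma decimal are assumed examples, not values copied from the dataset.

```python
import pandas as pd

# Hypothetical raw prices, two of them malformed.
raw = pd.Series(["459.977222", "`471.22", "420,764286", "1107.670833"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
parsed = pd.to_numeric(raw, errors="coerce")

# The original strings behind those NaN values are the ones that need cleaning.
problem_values = raw[parsed.isna()]

print(problem_values.tolist())
```

Inspecting this short list first shows which clean-up rules are actually needed.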


## Phase 7: Data Export

In [48]:
# Parquet is an efficient format for saving the output.
# My output was compressed by 83.8%, which is a substantial saving in disk space.

# to_parquet writes the file and returns None when given a path, so nothing is assigned
data.to_parquet("ExporteData.parquet", index=False)

print("Data has been successfully exported as parquet format.")

Data has been successfully exported as parquet format.
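
The notebook does not say what the 83.8% figure is measured against. Assuming it compares the Parquet file with an uncompressed export of the same data (for example a CSV), the saving can be computed as below. The helper name and the byte counts are hypothetical.

```python
def space_saving_pct(original_bytes: int, compressed_bytes: int) -> float:
    """Percentage of disk space saved relative to the original file size."""
    return (1 - compressed_bytes / original_bytes) * 100


# Hypothetical sizes: a 1,000-byte original and a 162-byte compressed file.
saving = space_saving_pct(1000, 162)

print(round(saving, 1))
```

With real files, `os.path.getsize(path)` would supply the two byte counts.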
