# 1.0 INTRODUCTION

Sub-Saharan Africa is one of the six major regions in the world. It comprises forty-nine countries. 
This region experienced high economic development and investment in the early 2000s, a period some experts call Africa Rising. Despite this wealth creation, average salaries and the number of jobs in the region have not improved appreciably, and inequality remains high. The global financial crisis of 2008 and a decline in the prices of commodities such as crude oil, iron ore, copper, and palm oil halted this expansion. Long-standing issues in the region, such as inadequate infrastructure and limited access to capital, also continue to impede economic growth. Colonialism's legacy, violence, instability, and weak governmental leadership further hinder the region's economy.

Sub-Saharan Africa has the lowest total gross domestic product (GDP), a measure of all goods and services produced in a nation or region, among the world's major regions. Its average GDP per capita (GDP divided by population) is just under $4,000, about one-fifth of the global average. This gap may narrow in the future: in 2018, the region was home to eight of the twenty fastest-growing economies worldwide. Despite significant GDP growth in nations such as Ethiopia and Ghana, income in the region remains highly concentrated. Nigeria and South Africa, the two richest nations on the continent, produce nearly half of the region's GDP.

Like other parts of the world, the region has recently experienced a rise in inflation, which is likely to affect the economic growth of its countries.
GDP growth is an important indicator of economic growth, and inflation is a risk associated with a growing economy. Rising inflation erodes purchasing power, while expected inflation may drive consumers to spend more. This study investigates how, and to what extent, inflation influences economic growth in Sub-Saharan Africa.




# 2.0 METHODOLOGY

![image.png](attachment:image.png)

## 2.1 ACQUISITION OF DATA AND ITS DESCRIPTION
The datasets used are structured data on annual inflation rates (%) and GDP growth rates for the 48 Sub-Saharan African (SSA) countries from 1960 to 2021. The datasets originate from the World Bank and were obtained from kaggle.com. They provide the information required to compare inflation rates across SSA countries over time, to compare inflation rates with GDP growth, and to check for patterns and trends over the years considered in this study. The data is well suited to descriptive analysis, such as investigating average inflation rates and identifying the countries with the highest GDP growth, and to checking how inflation and GDP growth relate to each other.

## 2.2  MANIPULATION AND DATA VISUALIZATION

### 2.2.1 TOOLS FOR ANALYSIS
Python was used for data cleaning and analysis, with Jupyter Notebook as the working environment. The datasets were cleaned and analyzed using several Python libraries, including NumPy, pandas, matplotlib, and SciPy. With these libraries, descriptive and comparative analyses of the datasets were carried out and visualized as graphs, plots, or charts.

### 2.2.2  DATA CLEANING
Data cleaning removes incorrect, duplicate, or otherwise invalid data from a dataset. Errors such as incorrectly formatted data, duplicate entries, and mislabeled data can occur when two or more datasets are combined. Cleaning the data improves its quality and, in turn, the quality of the decisions based on it. In this study, the datasets obtained from www.kaggle.com were critically examined to identify the parameters needed for the analysis. Irrelevant values and unwanted rows and columns were removed using pandas functions such as drop(), the columns were renamed using DataFrame.columns, and the country names were set as the index using DataFrame.set_index. The Sub-Saharan African countries were then extracted from the world datasets for both the inflation rate and GDP growth.
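The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names, the abbreviated country list, and the numeric values are all made up for the example, and the real Kaggle files may be laid out differently.

```python
import pandas as pd

# Hypothetical miniature of a World Bank-style table. Column names and
# values are placeholders, not real inflation figures.
raw = pd.DataFrame({
    "Country Name": ["Nigeria", "Ghana", "France", "Kenya"],
    "Country Code": ["NGA", "GHA", "FRA", "KEN"],
    "Indicator Name": ["Inflation (annual %)"] * 4,
    "2019": [1.0, 2.0, 3.0, 4.0],
    "2020": [1.5, 2.5, 3.5, 4.5],
})

# Drop columns that are not needed for the analysis.
cleaned = raw.drop(columns=["Country Code", "Indicator Name"])

# Rename the remaining columns, then use the country name as the index.
cleaned.columns = ["country", "2019", "2020"]
cleaned = cleaned.set_index("country")

# Keep only Sub-Saharan African countries (assumed, abbreviated list).
ssa_countries = ["Nigeria", "Ghana", "Kenya"]
ssa = cleaned.loc[cleaned.index.isin(ssa_countries)]

# Remove duplicated countries and rows that contain no data at all.
ssa = ssa[~ssa.index.duplicated()].dropna(how="all")
print(ssa)
```

Filtering with `isin` keeps the original row order and silently skips any country missing from the source table, which is convenient when the country list and the dataset do not match exactly.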


### 2.2.3 DATA VISUALIZATION
Using Python packages such as matplotlib, the data was plotted to examine its overall visual patterns and to draw inferences using a statistical approach.



## 2.3 PREDICTIVE MODEL USING ARIMA

### 2.3.1 Autoregressive Integrated Moving Average (ARIMA)
ARIMA is a statistical model that uses time series data either to better understand the data set or to predict future trends. It is based on the statistical concept of serial correlation, where past data points influence future data points. It is a form of regression analysis that gauges the strength of a dependent variable relative to other changing variables. A statistical model is autoregressive when it predicts future values from past values. In this study, the ARIMA model is used to predict inflation rates in the five largest economies in Sub-Saharan Africa, and a comparative analysis is conducted on these predictions. Rather than examining actual values, the model examines the differences between values in the series in order to predict future inflation rates.


To understand ARIMA models, we need to outline each of their components as follows:

### 2.3.2 Autoregression (AR): 
This is a model in which a changing variable is regressed on its own lagged (prior) values.

$Y_t$ as a target variable depends on past values, $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$ ...

So, $Y_t = f(Y_{t-1}, Y_{t-2}, Y_{t-3}, \dots)$

$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \beta_3 Y_{t-3} + \dots + \epsilon_t$
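To make the autoregressive idea concrete, the sketch below simulates a series that depends on its own two previous values plus a random noise term, then recovers the coefficients by ordinary least squares. The coefficient values, the series length, and the use of plain least squares are assumptions chosen for illustration; they are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process (illustrative coefficients):
# Y_t = 0.5 + 0.6 * Y_{t-1} - 0.2 * Y_{t-2} + eps_t
n = 5000
beta0, beta1, beta2 = 0.5, 0.6, -0.2
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = beta0 + beta1 * y[t - 1] + beta2 * y[t - 2] + eps[t]

# Regress Y_t on an intercept and its own first and second lags.
X = np.column_stack([
    np.ones(n - 2),  # intercept column
    y[1:-1],         # Y_{t-1}
    y[:-2],          # Y_{t-2}
])
target = y[2:]       # Y_t
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

print("estimated beta_0, beta_1, beta_2:", coef)
```

With a long enough series the estimates should land close to the values used in the simulation, which is the sense in which past values "explain" the current one.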


### 2.3.3  Integrated (I): 
Makes the time series stationary by differencing the raw observations (i.e., each data value is replaced by the difference between it and the previous value).
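As a small worked example of differencing, the sketch below applies first and second differences to a made-up series whose increments grow steadily. The numbers are toy values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

# Toy series whose step size grows by 1 each period (not real data).
y = np.array([2.0, 4.0, 7.0, 11.0, 16.0])

# First difference: y[t] - y[t-1]. One observation is lost.
d1 = np.diff(y, n=1)

# Second difference: the first difference applied twice (d = 2).
d2 = np.diff(y, n=2)

print("first difference: ", d1)   # [2. 3. 4. 5.]
print("second difference:", d2)   # [1. 1. 1.]
```

Here the first difference still trends upward, while the second difference is constant, so this toy series would need differencing twice before its level stops drifting.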

### 2.3.4 Moving average (MA): 
Incorporates the dependency between an observation and the residual errors from a moving average model applied to lagged observations.

$Y_t = f(\epsilon_t, \epsilon_{t-1}, \epsilon_{t-2}, \epsilon_{t-3}, \dots)$

$Y_t = \beta + \epsilon_t + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + \theta_3\epsilon_{t-3} + \dots$
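The moving average equation can be illustrated by simulating a first-order case, where only $\epsilon_t$ and $\epsilon_{t-1}$ enter, and checking its autocorrelation. The values of $\beta$ and $\theta_1$ below are arbitrary, and `sample_acf` is a helper written for this sketch rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an MA(1) process: Y_t = beta + eps_t + theta1 * eps_{t-1}
n = 20000
beta, theta1 = 2.0, 0.7
eps = rng.normal(size=n + 1)
y = beta + eps[1:] + theta1 * eps[:-1]


def sample_acf(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = x - x.mean()
    return float(np.dot(x[lag:], x[:-lag]) / np.dot(x, x))


acf1 = sample_acf(y, 1)
acf2 = sample_acf(y, 2)

# For an MA(1) process the lag-1 autocorrelation is theta1 / (1 + theta1^2)
# and the autocorrelation at all higher lags is zero.
theory1 = theta1 / (1 + theta1 ** 2)

print(f"lag 1: sample={acf1:.3f}, theory={theory1:.3f}")
print(f"lag 2: sample={acf2:.3f}, theory=0.000")
```

The sharp drop to roughly zero after lag 1 is the signature that makes an ACF plot useful for choosing the moving average order.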

 


### 2.3.5  ARIMA Parameters

Each component of ARIMA corresponds to a parameter. The standard notation is ARIMA(p, d, q), where integer values are substituted for the parameters to indicate the type of ARIMA model used. The parameters are defined as:

p: the number of lag observations in the model; also known as the lag order.

d: the number of times that the raw observations are differenced; also known as the degree of differencing.

q: the size of the moving average window; also known as the order of the moving average.

As in a linear regression model, these parameters specify the number and type of terms included. A parameter value of 0 means that the corresponding component is not used in the model. In this way, the ARIMA model can be configured to act as an ARMA model, or even as a simple AR, I, or MA model.


In this part of the study, the necessary Python libraries, such as numpy, pandas, matplotlib, and statsmodels, were imported in order to fit the model. The stationarity of the dataset was examined in Python. Because the ARIMA model works best with stationary data, the dataset was then made stationary using transformation techniques (log, moving average, etc.).
For validation, the Augmented Dickey-Fuller (ADF) test (one of the most popular methods for determining whether a series is stationary), the Autocorrelation Function (ACF), and the Partial Autocorrelation Function (PACF) were used.
After the time series has been made stationary, ACF and PACF plots help to determine systematically which AR and MA terms are required.