# Predicting Transaction Volumes in Kenya's Electronic Payment System: A Regression Analysis Incorporating GDP and Inflation Data

**Problem Statement:**

In recent years, Kenya has experienced significant growth in electronic payment transactions settled through the Kenya Electronic Payment and Settlement System (KEPSS), the country's Real-Time Gross Settlement (RTGS) system, referred to in this project as KEPSSRTGS. Understanding the factors that influence transaction volumes and values in KEPSSRTGS is essential for financial institutions and policymakers to make informed decisions and optimize financial operations.

To this end, the aim of this project is to build a regression model that predicts transaction volumes or values in the KEPSSRTGS system based on economic indicators, specifically GDP and inflation rates in Kenya. By analyzing the relationship between transaction activity, GDP growth, and inflation, we can identify the key drivers of electronic payment transactions and anticipate future transaction trends.

**Dataset:**
- We have access to three datasets: one containing transaction volumes and values for KEPSSRTGS, another containing GDP data for Kenya, and a third containing inflation rates in Kenya. The inflation and KEPSSRTGS datasets are monthly and span January 2013 to February 2024; the GDP dataset is annual (one row per year).

**Objective:**
- Develop a regression model that predicts transaction volumes or values in the KEPSSRTGS system based on GDP and inflation data.
- Understand the impact of GDP growth and inflation on transaction activity in the KEPSSRTGS system.
- Provide insights and recommendations for financial institutions and policymakers to optimize electronic payment operations and promote financial stability.

**Approach:**
1. **Data Preprocessing**: Clean and preprocess the datasets, handle missing values and outliers, and ensure the datasets are compatible for merging.
2. **Merge Datasets**: Combine the monthly datasets on the "Year" and "Month" columns and attach the annual GDP figures on "Year" to create a unified dataset.
3. **Feature Engineering**: Create new features or derive additional information from the merged dataset, such as GDP growth rate and inflation trends.
4. **Model Selection**: Choose a regression algorithm suitable for the dataset, such as linear regression or ridge regression.
5. **Model Training**: Train the regression model using the merged dataset, with transaction volumes/values as the dependent variable and GDP, inflation, and other relevant features as independent variables.
6. **Model Evaluation**: Evaluate the performance of the trained regression model using metrics such as mean squared error (MSE) or R-squared.
7. **Fine-Tuning and Validation**: Fine-tune the model parameters if necessary and validate the model's performance using cross-validation techniques.
8. **Prediction and Interpretation**: Use the trained model to make predictions on new or future data, and interpret the results to understand the impact of GDP and inflation on transaction activity.
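Steps 4 to 7 can be sketched as below. This is a minimal illustration under explicit assumptions, not the project's final model: the data are synthetic (the column names `gdp`, `inflation` and `value` and their coefficients are hypothetical), ridge regression is solved in closed form with NumPy rather than with a library such as scikit-learn, and both the hold-out split and the cross-validation folds are chronological, so no fold is trained on months that come after the ones it is tested on.

```python
import numpy as np
import pandas as pd

# Synthetic, illustrative data only (NOT the KEPSSRTGS, GDP or inflation figures):
# 60 hypothetical months where "value" depends linearly on "gdp" and "inflation".
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "gdp": np.linspace(100.0, 160.0, n) + rng.normal(0, 2, n),
    "inflation": 6 + rng.normal(0, 1, n),
})
df["value"] = 3.0 * df["gdp"] - 4.0 * df["inflation"] + rng.normal(0, 5, n)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centred data; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def evaluate(y_true, y_pred):
    """Return (MSE, R-squared) for a set of predictions."""
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    r2 = float(1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return mse, r2


X = df[["gdp", "inflation"]].to_numpy()
y = df["value"].to_numpy()

# Chronological hold-out: train on the first 80% of months, test on the rest,
# so the model is never trained on months later than the ones it is tested on.
split = int(0.8 * n)
coef, intercept = fit_ridge(X[:split], y[:split], alpha=1.0)
mse, r2 = evaluate(y[split:], X[split:] @ coef + intercept)
print("coefficients:", coef.round(2), "test MSE:", round(mse, 2), "test R^2:", round(r2, 3))

# Expanding-window cross-validation: each fold trains on all earlier months
# and tests on the following 12, mimicking a forecast of the next year.
fold_mse = []
for end in range(24, n, 12):
    c, b = fit_ridge(X[:end], y[:end], alpha=1.0)
    test = slice(end, min(end + 12, n))
    fold_mse.append(evaluate(y[test], X[test] @ c + b)[0])
print("fold MSEs:", np.round(fold_mse, 2))
```

Because the penalty here is applied to unscaled features, on the real data one would probably standardize GDP (Ksh millions) and inflation (percent) before fitting, since they differ by orders of magnitude; `alpha` could then be chosen by comparing the fold MSEs.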

**Expected Outcome:**
- A regression model that accurately predicts transaction volumes or values in the KEPSSRTGS system based on GDP and inflation data.
- Insights into the relationship between transaction activity, GDP growth, and inflation in Kenya.
- Recommendations for financial institutions and policymakers to optimize electronic payment operations and support economic growth and development.

## Data Cleaning

```python
import pandas as pd

# Load the datasets
inflation_data = pd.read_csv('inflation_rates.csv')
gdp_data = pd.read_csv('annual_gdp.csv')
kepssrtgs_data = pd.read_csv('kepssrtgs.csv')

# Function to handle missing values (modifies the DataFrame in place)
def handle_missing_values(df):
    # Count missing values per column
    missing_values = df.isnull().sum()
    if missing_values.sum() > 0:
        print("Handling missing values...")
        # Impute missing values by forward filling. This assumes the rows are in
        # chronological order; sort by date first if they are not.
        # (fillna(method='ffill') is deprecated in recent pandas; ffill() replaces it.)
        df.ffill(inplace=True)
        print("Missing values handled.")
    else:
        print("No missing values found.")

# Function to handle outliers
def handle_outliers(df):
    # Outlier detection and handling code can be added here based on specific requirements.
    # For simplicity, outliers are not treated in this example.
    print("No outlier handling implemented.")

# Function to address inconsistencies
def address_inconsistencies(df):
    # Check for inconsistencies in data formats or values.
    # None are handled here; column-name whitespace and numbers stored as text
    # are dealt with later, during merging and type conversion.
    print("No inconsistencies addressed.")

# Clean Inflation Rates dataset
print("Cleaning Inflation Rates dataset...")
handle_missing_values(inflation_data)
handle_outliers(inflation_data)
address_inconsistencies(inflation_data)
print("Inflation Rates dataset cleaned.\n")

# Clean Annual GDP dataset
print("Cleaning Annual GDP dataset...")
handle_missing_values(gdp_data)
handle_outliers(gdp_data)
address_inconsistencies(gdp_data)
print("Annual GDP dataset cleaned.\n")

# Clean KEPSSRTGS dataset
print("Cleaning KEPSSRTGS dataset...")
handle_missing_values(kepssrtgs_data)
handle_outliers(kepssrtgs_data)
address_inconsistencies(kepssrtgs_data)
print("KEPSSRTGS dataset cleaned.")
```

```text
Cleaning Inflation Rates dataset...
No missing values found.
No outlier handling implemented.
No inconsistencies addressed.
Inflation Rates dataset cleaned.

Cleaning Annual GDP dataset...
No missing values found.
No outlier handling implemented.
No inconsistencies addressed.
Annual GDP dataset cleaned.

Cleaning KEPSSRTGS dataset...
No missing values found.
No outlier handling implemented.
No inconsistencies addressed.
KEPSSRTGS dataset cleaned.
```
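`handle_outliers` above is a stub. One common option is to flag points outside the interquartile-range (IQR) fences; the sketch below assumes that rule with the conventional multiplier `k = 1.5`, uses a hypothetical series rather than the project data, and flags values instead of dropping them so that genuine spikes can be reviewed by hand.

```python
import pandas as pd


def flag_iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical monthly values, for illustration only.
values = pd.Series([102.0, 98.5, 101.2, 99.8, 100.4, 250.0, 97.9, 100.9])
mask = flag_iqr_outliers(values)
print(values[mask])  # only the 250.0 spike is flagged
```

Since GDP and transaction values trend upward over time, the rule is probably more meaningful on growth rates (for example month-on-month changes) than on raw levels, where early and late observations would be flagged simply for being far from the middle of the trend.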


## Merging Datasets

```python
# Check column names in each dataset
print("Column names in inflation_data:", inflation_data.columns)
print("Column names in gdp_data:", gdp_data.columns)
print("Column names in kepssrtgs_data:", kepssrtgs_data.columns)
```

```text
Column names in inflation_data: Index(['Year', 'Month', 'Annual Average Inflation', '12-Month Inflation'], dtype='object')
Column names in gdp_data: Index(['Year', 'Nominal GDP prices (Ksh Million)', 'Annual GDP growth (%)',
       'Real GDP prices (Ksh Million)'],
      dtype='object')
Column names in kepssrtgs_data: Index(['Year', 'Month    ', 'Volume', 'Value (Kshs Millions)'], dtype='object')
```


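```python
# Remove leading and trailing spaces from column names in the KEPSSRTGS dataset
# (its month column was read as 'Month    ')
kepssrtgs_data.columns = kepssrtgs_data.columns.str.strip()

# GDP is annual, so attach it to the monthly inflation data on 'Year' only;
# every month of a year receives that year's GDP figures.
merged_data = inflation_data.merge(gdp_data, on=['Year'], suffixes=('_inflation', '_gdp'))
# Then join the monthly KEPSSRTGS data on both 'Year' and 'Month'.
# Both merges are inner joins (the pandas default), so unmatched rows are dropped.
merged_data = merged_data.merge(kepssrtgs_data, on=['Year', 'Month'])

# Display merged dataset information
print("Merged Dataset Information:")
print(merged_data.info())

# Display first few rows of the merged dataset
print("\nFirst few rows of the merged dataset:")
print(merged_data.head())
```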

```text
Merged Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 211 entries, 0 to 210
Data columns (total 9 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Year                              211 non-null    int64  
 1   Month                             211 non-null    object 
 2   Annual Average Inflation          211 non-null    float64
 3   12-Month Inflation                211 non-null    float64
 4   Nominal GDP prices (Ksh Million)  211 non-null    object 
 5   Annual GDP growth (%)             211 non-null    float64
 6   Real GDP prices (Ksh Million)     211 non-null    object 
 7   Volume                            211 non-null    object 
 8   Value (Kshs Millions)             211 non-null    float64
dtypes: float64(4), int64(1), object(4)
memory usage: 15.0+ KB
None

First few rows of the merged dataset:
   Year      Month  Annual Average Inflation  12-Month Inflation  \
```
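Both merges above use pandas' default inner join, so any month whose year has no row in the GDP file is dropped without warning; this may be why the merged data appear to stop in 2022 rather than February 2024. The sketch below shows one way to check, on small made-up frames that only mimic the notebook's column names: `validate="many_to_one"` makes pandas raise a `MergeError` if a year appears more than once in the GDP table, and `indicator=True` records which rows found a match.

```python
import pandas as pd

# Made-up frames that only mimic the notebook's shapes:
# monthly inflation rows and one GDP row per year (deliberately no 2023 row).
inflation = pd.DataFrame({
    "Year": [2022, 2022, 2023],
    "Month": ["November", "December", "January"],
    "12-Month Inflation": [1.0, 2.0, 3.0],
})
gdp = pd.DataFrame({"Year": [2021, 2022], "Annual GDP growth (%)": [10.0, 20.0]})

# Default inner join: the 2023 month silently disappears.
inner = inflation.merge(gdp, on="Year")

# Left join with checks: validate="many_to_one" raises pandas.errors.MergeError
# if a Year is duplicated in gdp; indicator=True adds a "_merge" column.
checked = inflation.merge(gdp, on="Year", how="left",
                          validate="many_to_one", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(len(inner), "rows after inner join")
print(unmatched[["Year", "Month"]])
```

The same check run on `inflation_data` and `gdp_data` would list exactly which months lack GDP figures, which helps decide whether to drop them or to extend the GDP series.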


The datasets have been successfully merged, and the merged dataset now contains 211 entries with 9 columns. Here's a brief overview of the merged dataset:

- The "Year" and "Month" columns represent the year and month of the recorded data, respectively.
- The "Annual Average Inflation" and "12-Month Inflation" columns contain inflation rate data.
- The "Nominal GDP prices (Ksh Million)" and "Real GDP prices (Ksh Million)" columns represent GDP values at current market prices and adjusted for inflation, respectively.
- The "Annual GDP growth (%)" column represents the annual GDP growth rate.
- The "Volume" column represents the number of transactions made during a particular month.
- The "Value (Kshs Millions)" column represents the total value of transactions made during a specific month, measured in Kenyan Shillings (Kshs) millions.
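Keeping the date split across `Year` and a text `Month` column makes chronological sorting and lag-based features (step 3 of the approach) awkward, and the rows appear to be stored newest-first. A minimal sketch, assuming English full month names that may carry stray spaces, builds a proper date, sorts by it, and derives one example feature; the values and the feature name `inflation_change` are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the notebook's layout: newest first, month names as text.
df = pd.DataFrame({
    "Year": [2022, 2022, 2022],
    "Month": ["December", " November", "October"],
    "12-Month Inflation": [3.0, 2.0, 1.0],
})

# Combine Year and the stripped month name into a datetime, e.g. "2022 December".
df["Date"] = pd.to_datetime(df["Year"].astype(str) + " " + df["Month"].str.strip(),
                            format="%Y %B")
df = df.sort_values("Date").set_index("Date")

# Example engineered feature: month-over-month change in 12-month inflation.
df["inflation_change"] = df["12-Month Inflation"].diff()
print(df)
```

Once `Volume` is numeric, the same sorted index would allow year-on-year transaction growth via `pct_change(12)` on the monthly series.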

Most data types are appropriate ("Month" is text by design), but "Nominal GDP prices (Ksh Million)", "Real GDP prices (Ksh Million)" and "Volume" are stored as type object. These columns must be converted to a numeric type before they can be used in a regression.
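`Volume` also appears as `object` in the merge summary. Assuming it holds thousands-separated strings like the GDP columns (the output shown does not confirm this), a more defensive conversion than `.astype(float)` is `pd.to_numeric(..., errors="coerce")`, which turns entries that still fail to parse into `NaN` so they can be counted rather than raising an error. The helper name `to_number` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw strings: thousands separators, padding and one bad entry.
volume = pd.Series(["1,234", "56,789", "n/a", " 4,321 "])


def to_number(s: pd.Series) -> pd.Series:
    """Strip thousands separators and whitespace, then coerce to numbers (bad entries -> NaN)."""
    cleaned = s.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


volume_num = to_number(volume)
print(volume_num)
print("unparseable entries:", volume_num.isna().sum())
```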

Let's start by converting the two GDP columns, which use commas as thousands separators, to numeric type and then continue with further preprocessing.

```python
# Convert 'Nominal GDP prices (Ksh Million)' and 'Real GDP prices (Ksh Million)' columns to numeric
merged_data['Nominal GDP prices (Ksh Million)'] = merged_data['Nominal GDP prices (Ksh Million)'].str.replace(',', '').astype(float)
merged_data['Real GDP prices (Ksh Million)'] = merged_data['Real GDP prices (Ksh Million)'].str.replace(',', '').astype(float)

# Display data types of columns after conversion
print("\nData types of columns after conversion:")
print(merged_data.dtypes)

# Display first few rows of the merged dataset after conversion
print("\nFirst few rows of the merged dataset after conversion:")
print(merged_data.head())
```


Data types of columns after conversion:
Year                                  int64
Month                                object
Annual Average Inflation            float64
12-Month Inflation                  float64
Nominal GDP prices (Ksh Million)    float64
Annual GDP growth (%)               float64
Real GDP prices (Ksh Million)       float64
Volume                               object
Value (Kshs Millions)               float64
dtype: object

First few rows of the merged dataset after conversion:
   Year      Month  Annual Average Inflation  12-Month Inflation  \
0  2022   December                      7.66                9.06   
1  2022   November                      7.38                9.48   
2  2022    October                      7.48                9.59   
3  2022  September                      6.81                9.18   
4  2022     August                      6.61                8.53   

   Nominal GDP prices (Ksh Million)  Annual GDP growth (%)  \
0                     