# SMALL BUSINESS PERFORMANCE ANALYSIS IN GHANA

### Business Understanding

#### Overview

Small businesses in Ghana play a crucial role in driving the country’s economic growth, contributing significantly to GDP and employment. However, many of these businesses operate under financial constraints, lack access to data-driven decision-making, and face challenges in profitability and sustainability. This project analyzes a dataset simulating operational and financial records of small businesses across various regions in Ghana to uncover performance patterns and identify key factors influencing success.



#### Objectives

1. Understand the operational and financial structure of small businesses in Ghana.
2. Identify and address data quality issues such as missing values and inconsistent formats.
3. Engineer meaningful features to enhance model performance and insight generation.
4. Apply data preprocessing techniques like scaling, encoding, and normalization.
5. Generate actionable insights through visual analytics.
6. Answer key business questions using machine learning and AI techniques.
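
Objectives 2 and 4 can be sketched with scikit-learn, which this notebook imports. The example below is a minimal illustration rather than the project's final pipeline. It assumes median imputation for numeric gaps, standard scaling, and one-hot encoding for `region`, and it runs on a small made-up frame (`sample`) instead of the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame; the values are made up for demonstration only.
sample = pd.DataFrame({
    "revenue": [42000.0, np.nan, 35000.0, 28000.0],
    "expenses": [21000.0, 4000.0, 24000.0, 20000.0],
    "region": ["Greater Accra", "Volta", "Western", "Volta"],
})

numeric_cols = ["revenue", "expenses"]
categorical_cols = ["region"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocessor.fit_transform(sample)
print(X.shape)  # 2 scaled numeric columns + 3 one-hot region columns
```

The choice of imputation strategy and encoder is an assumption here and should be revisited once the actual missing-value patterns are known.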



#### Problem Statement

Small businesses in Ghana often lack the analytical tools to understand what drives or hinders their performance. By exploring and modeling this data, we aim to identify which factors (e.g., region, business type, education of owners, advertising spend) most significantly influence profitability, customer satisfaction, and operational efficiency.

---

#### Stakeholders

* **Small Business Owners:** Want to understand what contributes to profitability and growth.
* **Policy Makers and Government Agencies:** Need insights for creating policies and support systems for SMEs.
* **Financial Institutions and NGOs:** Use data to assess risk and fund businesses effectively.
* **Data Analysts/Data Scientists:** Responsible for analyzing, cleaning, modeling, and interpreting the data.

---

#### Features (Key Parts of the Data)

* **Numerical Features:** `revenue`, `expenses`, `profit_margin`, `years_in_operation`, `employee_count`, `advertising`, `customer_satisfaction`, `sector_growth`.
* **Categorical Features:** `region`, `business_type`, `owner_education`, `credit_access`.
* **Derived Features (to be created):** `profit` (`revenue - expenses`) and `profit_per_employee` (`profit / employee_count`). The dataset already includes a `profit_margin` column.
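
The derived features could be computed roughly as follows. This is a sketch under stated assumptions: `add_profit_features` is a hypothetical helper, and the margin is assumed to be profit as a percentage of revenue. Because the dataset preview already contains a `profit_margin` column whose definition is not documented, the recomputed value is stored under a separate name, `profit_margin_calc`. The sample rows reuse revenue, expenses, and employee counts from the data preview.

```python
import numpy as np
import pandas as pd


def add_profit_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with profit-related columns added."""
    out = df.copy()
    out["profit"] = out["revenue"] - out["expenses"]

    # Treat zero revenue or zero employees as undefined (NaN)
    # instead of dividing by zero.
    revenue = out["revenue"].replace(0, np.nan)
    employees = out["employee_count"].replace(0, np.nan)

    # Assumed definition: profit as a percentage of revenue.
    out["profit_margin_calc"] = out["profit"] / revenue * 100
    out["profit_per_employee"] = out["profit"] / employees
    return out


# Three rows taken from the preview of the dataset.
sample = pd.DataFrame({
    "revenue": [42584.82, 35736.64, 15529.00],
    "expenses": [21758.31, 24012.47, 36193.86],
    "employee_count": [74, 57, 41],
})

features = add_profit_features(sample)
print(features.round(2))
```

Rows with missing `revenue` or `expenses` propagate `NaN` into the derived columns, so imputation choices should be made before or alongside this step.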

---

#### Hypotheses

1. Businesses with higher advertising expenditure tend to have higher profit margins.
2. Owner education level positively correlates with business performance.
3. Customer satisfaction is higher in certain business types or regions.
4. Businesses in urban regions perform better financially than those in rural areas.
5. Higher employee count does not always translate to higher profit per employee.
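
As an illustration of how the first hypothesis might be checked, the sketch below computes a Spearman rank correlation between advertising spend and profit margin. Spearman is one reasonable choice because it does not assume a linear relationship. The data here is synthetic, generated with a built-in positive relationship purely to show the mechanics, so the printed numbers say nothing about Ghanaian businesses.

```python
import numpy as np
from scipy import stats

# Synthetic data for demonstration only (not the project dataset).
rng = np.random.default_rng(42)
advertising = rng.uniform(500, 5000, size=200)
profit_margin = 10 + 0.004 * advertising + rng.normal(0, 5, size=200)

# Spearman rank correlation and its two-sided p-value.
rho, p_value = stats.spearmanr(advertising, profit_margin)
print(f"Spearman rho = {rho:.2f}, p-value = {p_value:.3g}")
```

On the real data, the same call could be applied to the `advertising` and `profit_margin` columns after dropping rows where either value is missing.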

---

#### 7 Business Questions (ML/AI-Driven) 

1. **Regression:** Can we predict the **profit** of a small business based on its revenue, expenses, region, and other features?
2. **Classification:** Can we classify businesses as **profitable or not** based on operational metrics?
3. **Clustering:** Can we segment businesses into meaningful **groups (clusters)** based on their financial and operational profiles?
4. **Recommendation:** Which type of **advertising strategy** yields better profit outcomes across business types?
5. **Customer Insight:** What features contribute to **higher customer satisfaction** scores?
6. **Feature Importance:** Which features are most influential in predicting **business success**?
7. **Anomaly Detection:** Can we detect **underperforming or risky businesses** based on outliers in profit or revenue?
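
For question 7, one simple baseline is Tukey's interquartile-range (IQR) rule, which flags values far outside the middle 50% of the distribution. The sketch below is only one possible approach, not necessarily the method this project will adopt. `flag_iqr_outliers` is a hypothetical helper and the profit values are made up.

```python
import pandas as pd


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up profit figures with one obvious loss-making outlier.
profit = pd.Series([1200.0, 1500.0, 1350.0, 1420.0, 1280.0, -9000.0, 1310.0])
flags = flag_iqr_outliers(profit)
print(profit[flags])
```

The multiplier `k = 1.5` is the conventional default. A larger value flags only more extreme cases.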




### Data Understanding & Preparation
Importing all the relevant libraries

In [1]:
# Data manipulation and analysis
import pandas as pd
import numpy as np

# To load multiple files
import glob 

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical analysis
from scipy import stats

# Date and time handling
from datetime import datetime

# Geospatial analysis (if needed for mapping regions)
# import geopandas as gpd

# Machine learning (if needed for predictive modeling)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# For handling large datasets (if needed)
# import dask.dataframe as dd

# For interactive visualizations (optional)
import plotly.express as px
import plotly.graph_objects as go

# For data profiling with ydata-profiling (optional)
# import ydata_profiling
# from ydata_profiling import ProfileReport


# For handling missing data
from sklearn.impute import SimpleImputer

# For encoding categorical variables
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# For advanced visualizations (optional)
#import altair as alt

# For working with Excel files (if your data is in Excel format)
#import openpyxl

# For reading data from different file formats
# import pyarrow

# For working with large CSV files
import csv

# For system operations
import os
import sys

# For progress bars in data processing
from tqdm import tqdm

# Set plotting style
# plt.style.use('seaborn-v0_8')  # the 'seaborn' style name was renamed in Matplotlib 3.6

### Load all datasets from their sources

In [2]:
# Path to the CSV file
file_path = '../SBPG_Data/small_business_ghana.csv'

# Fail early with a clear message if the file is missing, so that
# df_Small_Business is never left undefined for later cells
if not os.path.exists(file_path):
    raise FileNotFoundError(f"File does not exist at the specified path: {file_path}")

print("File exists at the specified path.")

# Read the CSV file into a pandas DataFrame
df_Small_Business = pd.read_csv(file_path)

# Display the first five rows
df_Small_Business.head()

File exists at the specified path.


| Unnamed: 0 | business_type | region | revenue | expenses | profit_margin | years_in_operation | owner_education | employee_count | advertising | customer_satisfaction | credit_access | sector_growth |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Services | Greater Accra | 42584.82 | 21758.31 | 21.441057 | 19 | Secondary | 74 | 503.67 | 5 | Yes | 16.34 |
| 1 | Manufacturing | Volta | NaN | 4188.66 | 26.951441 | 4 | No Formal Education | 98 | 3434.45 | 1 | Yes | 16.04 |
| 2 | Retail | Western | 35736.64 | 24012.47 | 43.286788 | 1 | Tertiary | 57 | 3307.1 | 3 | Yes | 14.18 |
| 3 | Services | Volta | 28088.18 | 19916.94 | 8.954943 | 8 | Secondary | 90 | 1438.97 | 5 | No | 13.7 |
| 4 | Services | Western | 15529.0 | 36193.86 | 41.26392 | 23 | Secondary | 41 | 4759.23 | 5 | No | -0.86 |


### Exploratory Data Analysis (EDA)

In [None]:
df_Small_Business.info()
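
`info()` reports non-null counts per column. A compact missing-value summary can make gaps, such as the empty `revenue` value in the second preview row, easier to quantify. The sketch below uses a hypothetical helper, `missing_summary`, on a three-row frame built from values in the preview. On the real data it would be called as `missing_summary(df_Small_Business)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with gaps, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# First three preview rows; the second row has no revenue recorded.
sample = pd.DataFrame({
    "revenue": [42584.82, np.nan, 35736.64],
    "expenses": [21758.31, 4188.66, 24012.47],
})

result = missing_summary(sample)
print(result)
```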