### 📊 *Business Understanding*

#### A) *Overview*

Small businesses in Ghana play a crucial role in driving the country’s economic growth, contributing significantly to GDP and employment. However, many of them operate under financial constraints, lack the tools for data-driven decision-making, and struggle with profitability and sustainability. This project analyzes a simulated dataset of operational and financial records for 200 small businesses across five regions of Ghana to uncover performance patterns and identify the key factors that influence success.

---

#### B) *Objectives*

1. Understand the operational and financial structure of small businesses in Ghana.
2. Identify and address data quality issues such as missing values and inconsistent formats.
3. Engineer meaningful features to enhance model performance and insight generation.
4. Apply data preprocessing techniques like scaling, encoding, and normalization.
5. Generate actionable insights through visual analytics.
6. Answer key business questions using machine learning and AI techniques.

---

#### C) *Problem Statement*

Small businesses in Ghana often lack the analytical tools to understand what drives or hinders their performance. By exploring and modeling this data, we aim to identify which factors (e.g., region, business type, owner education, advertising spend) most strongly influence profitability, customer satisfaction, and operational efficiency.

---

#### D) *Stakeholders*

* *Small Business Owners:* Want to understand what contributes to profitability and growth.
* *Policy Makers and Government Agencies:* Need insights for creating policies and support systems for SMEs.
* *Financial Institutions and NGOs:* Use data to assess risk and fund businesses effectively.
* *Data Analysts/Data Scientists:* Responsible for analyzing, cleaning, modeling, and interpreting the data.

---

#### E) *Features (Key Parts of the Data)*

* *Numerical Features:* revenue, expenses, profit_margin, years_in_operation, employee_count, advertising, customer_satisfaction, sector_growth.
* *Categorical Features:* business_type, region, owner_education, credit_access.
* *Derived Features (to be created):* profit (revenue - expenses), profit_per_employee (profit / employee_count). Note that profit_margin is already a column in the dataset.
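
The derived features listed above could be computed along the following lines. This is only a minimal sketch: the helper name `add_derived_features` is hypothetical, the three sample rows copy values from the dataset preview, and the margin formula (profit as a percentage of revenue) is an assumed definition. The recomputed margin is stored as `profit_margin_calc` so that the `profit_margin` column already present in the data is not overwritten.

```python
import numpy as np
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with profit-based columns added (hypothetical helper)."""
    out = df.copy()
    # Profit is revenue minus expenses; rows with a missing input stay NaN.
    out["profit"] = out["revenue"] - out["expenses"]
    # Assumed definition: profit as a percentage of revenue.
    out["profit_margin_calc"] = out["profit"] / out["revenue"] * 100
    # Treat a head count of 0 as missing to avoid dividing by zero.
    employees = out["employee_count"].replace(0, np.nan)
    out["profit_per_employee"] = out["profit"] / employees
    return out


# Three rows taken from the dataset preview (row 1 has no revenue value).
sample = pd.DataFrame({
    "revenue": [42584.82, np.nan, 35736.64],
    "expenses": [21758.31, 4188.66, 24012.47],
    "employee_count": [74, 98, 57],
})

derived = add_derived_features(sample)
print(derived.round(2))
```

Because arithmetic on `NaN` propagates, rows with a missing `revenue` or `expenses` value end up with missing derived features as well, so any imputation would need to happen before this step.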


In [1]:
# This cell holds all imports used in the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [15]:
# Load data
df_Small_Business = pd.read_csv('./../Data/small_business_ghana.csv')  

df_Small_Business.head(5)



| | business_type | region | revenue | expenses | profit_margin | years_in_operation | owner_education | employee_count | advertising | customer_satisfaction | credit_access | sector_growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Services | Greater Accra | 42584.82 | 21758.31 | 21.441057 | 19 | Secondary | 74 | 503.67 | 5 | Yes | 16.34 |
| 1 | Manufacturing | Volta | NaN | 4188.66 | 26.951441 | 4 | No Formal Education | 98 | 3434.45 | 1 | Yes | 16.04 |
| 2 | Retail | Western | 35736.64 | 24012.47 | 43.286788 | 1 | Tertiary | 57 | 3307.1 | 3 | Yes | 14.18 |
| 3 | Services | Volta | 28088.18 | 19916.94 | 8.954943 | 8 | Secondary | 90 | 1438.97 | 5 | No | 13.7 |
| 4 | Services | Western | 15529.0 | 36193.86 | 41.26392 | 23 | Secondary | 41 | 4759.23 | 5 | No | -0.86 |


In [16]:
# Check column dtypes and non-null counts (a count below 200 means missing values)
df_Small_Business.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   business_type          200 non-null    object 
 1   region                 200 non-null    object 
 2   revenue                180 non-null    float64
 3   expenses               180 non-null    float64
 4   profit_margin          200 non-null    float64
 5   years_in_operation     200 non-null    int64  
 6   owner_education        200 non-null    object 
 7   employee_count         200 non-null    int64  
 8   advertising            200 non-null    float64
 9   customer_satisfaction  200 non-null    int64  
 10  credit_access          200 non-null    object 
 11  sector_growth          200 non-null    float64
dtypes: float64(5), int64(3), object(4)
memory usage: 18.9+ KB
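
The non-null counts above show 180 of 200 values for `revenue` and `expenses`, which means 20 entries are missing in each of those columns. One way to quantify and fill such gaps is sketched below on a small hand-made frame (not the real dataset). Median imputation is an assumption chosen for illustration, not necessarily the strategy this project adopts.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap per numeric column (values are illustrative).
toy = pd.DataFrame({
    "region": ["Volta", "Volta", "Western", "Western"],
    "revenue": [28088.18, np.nan, 35736.64, 15529.0],
    "expenses": [19916.94, 4188.66, np.nan, 36193.86],
})

# Number of missing entries per column.
missing_counts = toy.isna().sum()
print(missing_counts)

# Fill each numeric column with its own median (assumed strategy).
numeric_cols = ["revenue", "expenses"]
imputed = toy.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].median())
print(imputed)
```

The median is less sensitive to extreme values than the mean, which is why it is a common default for skewed financial figures. Dropping the affected rows instead would be another option, at the cost of losing observations.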


In [17]:
# Summary statistics for the numerical columns
df_Small_Business.describe()

| | revenue | expenses | profit_margin | years_in_operation | employee_count | advertising | customer_satisfaction | sector_growth |
|---|---|---|---|---|---|---|---|---|
| count | 180.0 | 180.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 | 200.0 |
| mean | 25464.125889 | 21837.6885 | 28.074339 | 12.235 | 52.13 | 2508.6518 | 2.975 | 4.9318 |
| std | 13845.475401 | 12603.525576 | 12.440702 | 7.373031 | 29.722386 | 1437.087128 | 1.447359 | 8.990995 |
| min | 1254.06 | 615.48 | 5.12199 | 1.0 | 2.0 | 147.88 | 1.0 | -9.67 |
| 25% | 13395.865 | 11740.8425 | 18.163216 | 5.75 | 28.75 | 1208.11 | 2.0 | -3.525 |
| 50% | 25318.67 | 21799.855 | 28.992051 | 11.5 | 51.5 | 2496.745 | 3.0 | 4.865 |
| 75% | 36916.74 | 31800.3975 | 38.510639 | 19.0 | 77.25 | 3686.655 | 4.0 | 12.975 |
| max | 49110.2 | 44814.5 | 49.896168 | 24.0 | 99.0 | 4982.04 | 5.0 | 19.9 |


In [19]:
# List every column with its distinct values (NaN counts as one value)
for column in df_Small_Business.columns:
    unique_values = df_Small_Business[column].unique()
    print(column)
    print(f'There are {unique_values.size} unique values')
    print(unique_values)
    print('_' * 80)

business_type
There are 4 unique values
['Services' 'Manufacturing' 'Retail' 'Agriculture']
________________________________________________________________________________
region
There are 5 unique values
['Greater Accra' 'Volta' 'Western' 'Northern' 'Ashanti']
________________________________________________________________________________
revenue
There are 181 unique values
[42584.82      nan 35736.64 28088.18 15529.   21569.26 13554.14  4998.11
  1254.06 10519.42  4476.1  20442.41  3487.66 44444.24  2353.22 29364.38
 22485.23 33929.28 17079.48  8597.04 49110.2  42107.74 43159.83 13262.32
 15860.01 27317.04 17005.91 41565.58 14305.6  48297.34 23405.99 42259.13
 10524.62 21156.34 35276.1   7779.3  48507.31 36015.16 20542.22 22242.52
 37458.09 13292.17 10032.35  4962.78 21987.41 34736.5   3851.49 45845.47
 22675.26 12749.58  5599.79  9960.43 46796.09 26318.12 33198.46 22347.97
 36771.93  3338.09 28735.82  8773.68 17752.1   5498.15  5613.69 16259.25
 48996.02  9591.18  1840.89 38404.86