## PERSONAL FINANCE AGENCY

We have developed a project to assist people in remote areas. In essence, we provide a method for analysing personal mobile finances using deep machine learning, and we use classification to determine which customers or members belong to which financial level.

## Problem Statement

Small businesses and individuals typically experience a fragmented and error-prone financial management process. Relying on a mix of spreadsheets, paper documents, and various non-integrated software programs generates a series of mission-critical issues:

Lack of Centralization: Financial data like income, expenses, invoices, and budgets are scattered among disparate sources. It is difficult to get a real-time, unified view of one's financial standing.

Manual Data Entry and Human Error: Entering data by hand is not only time-consuming but also prone to human error. This leads to inaccurate financial reporting, faulty budget forecasting, and compliance issues.

Poor Financial Visibility: Without a centralized system, it is hard to track spending patterns, identify areas for cost reduction, or forecast future cash flow accurately. This lack of visibility hinders effective decision-making and strategic planning.

Inefficient Reporting: Preparing reports for tax, stakeholders, or personal analysis is a time-consuming and tedious process. The data must be gathered, sorted, and formatted by hand, delaying the delivery of vital information.

Security Threats: Storing sensitive financial data in unencrypted spreadsheets or paper files exposes it to significant security risks, since it can be viewed without permission or lost.

The new financial management system aims to overcome these difficulties by offering a centralized, automated, and secure system for all financial operations. It will consolidate data management, automate critical processes, and provide robust analysis and reporting tools, allowing users to make better-informed and more strategic financial decisions.

## DATA UNDERSTANDING 

Using data that we gathered, tested, and approved with our program, we assembled a dataset for analysis and for testing our finance-tracking program.

To speed up the process of understanding what our data contains, its columns are listed below:
1. user_id: Unique user identifier
2. age: Age of individual (18–70)
3. gender: Gender (Male/Female/Other)
4. education_level: Highest education level
5. employment_status: Employment type (e.g. Employed, Student)
6. job_title: Job title or role
7. monthly_income_usd: Approx. monthly income in USD
8. monthly_expenses_usd: Approx. monthly expenses in USD
9. savings_usd: Total savings
10. has_loan: Whether individual has a loan (Yes/No)
11. loan_type: Type of loan (if any)
12. loan_amount_usd: Loan principal amount
13. loan_term_months: Duration of loan
14. monthly_emi_usd: Monthly installment (EMI)
15. loan_interest_rate_pct: Interest rate on loan (%)
16. debt_to_income_ratio: Ratio of debt payments to income
17. credit_score: Synthetic credit score (300–850)
18. savings_to_income_ratio: Ratio of savings to annual income
19. region: Geographic region
20. record_date: Record creation date
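
The column list above can double as a lightweight schema check. The sketch below is one possible way to confirm that a loaded frame has the expected columns and that `age` and `credit_score` stay within the documented ranges. The helper name `check_schema` and the two-row sample frame are illustrative assumptions, not part of the project code.

```python
import pandas as pd

# Column names taken from the list above.
EXPECTED_COLUMNS = [
    "user_id", "age", "gender", "education_level", "employment_status",
    "job_title", "monthly_income_usd", "monthly_expenses_usd", "savings_usd",
    "has_loan", "loan_type", "loan_amount_usd", "loan_term_months",
    "monthly_emi_usd", "loan_interest_rate_pct", "debt_to_income_ratio",
    "credit_score", "savings_to_income_ratio", "region", "record_date",
]

# Documented value ranges (inclusive on both ends).
RANGES = {"age": (18, 70), "credit_score": (300, 850)}


def check_schema(df: pd.DataFrame) -> dict:
    """Hypothetical helper: report missing columns and out-of-range row counts."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    out_of_range = {}
    for col, (low, high) in RANGES.items():
        if col in df.columns:
            # between() is inclusive by default; negate it to count violations.
            out_of_range[col] = int((~df[col].between(low, high)).sum())
    return {"missing_columns": missing, "out_of_range": out_of_range}


# Illustrative frame with two columns; the second row is out of range on purpose.
sample = pd.DataFrame({"age": [56, 17], "credit_score": [430, 900]})
report = check_schema(sample)
print(report["out_of_range"])
```

On the real dataset, an empty `missing_columns` list and zero counts in `out_of_range` would indicate that the file matches the description.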

In essence, most of these columns will not be required for the analysis, so we will drop them to keep the analysis, and the program at large, focused on the columns that matter.
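
Dropping columns might look like the sketch below. The choice of `user_id` and `record_date` as the columns to drop is an assumption made purely for illustration, since the text does not say which columns are removed.

```python
import pandas as pd

# Small illustrative frame; the values mirror two rows of the dataset.
df = pd.DataFrame({
    "user_id": ["U00001", "U00002"],
    "age": [56, 19],
    "monthly_income_usd": [3531.69, 3531.73],
    "record_date": ["2024-01-09", "2022-02-13"],
})

# Hypothetical choice of columns to drop (an identifier and a timestamp).
columns_to_drop = ["user_id", "record_date"]

# errors="ignore" skips names that are absent instead of raising KeyError.
reduced = df.drop(columns=columns_to_drop, errors="ignore")
print(list(reduced.columns))
```

Because `drop` returns a new frame by default, the original `df` keeps all of its columns.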

## DATA EXPLORATION

In [1]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score, r2_score
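
The imports above point towards a classification workflow. As a rough sketch of how some of these pieces could fit together, the example below trains both classifiers on randomly generated stand-in data. The target column `financial_level`, its definition (monthly surplus above the median), and the chosen features are all assumptions for illustration; the project's actual target and features are not defined in this section.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Randomly generated stand-in data, NOT the project dataset.
toy = pd.DataFrame({
    "monthly_income_usd": rng.uniform(1000, 8000, n),
    "monthly_expenses_usd": rng.uniform(500, 6000, n),
})

# Hypothetical target: 1 when the monthly surplus is above the median, else 0.
surplus = toy["monthly_income_usd"] - toy["monthly_expenses_usd"]
toy["financial_level"] = (surplus > surplus.median()).astype(int)

X = toy[["monthly_income_usd", "monthly_expenses_usd"]]
y = toy["financial_level"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the scaler on the training split only, so test statistics do not leak.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    # Probability of the positive class, as required by roc_auc_score.
    proba = model.predict_proba(X_test_s)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

The toy target is a simple function of the two features, so both models score highly here; results on the real dataset would depend on how the financial levels are actually defined.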


In [2]:
# Load the dataset and preview the first rows

data = pd.read_csv(r"C:/Users/Ryan/Documents/Finance_project/synthetic_personal_finance_dataset.csv")

data.head()

Unnamed: 0,user_id,age,gender,education_level,employment_status,job_title,monthly_income_usd,monthly_expenses_usd,savings_usd,has_loan,loan_type,loan_amount_usd,loan_term_months,monthly_emi_usd,loan_interest_rate_pct,debt_to_income_ratio,credit_score,savings_to_income_ratio,region,record_date
0,U00001,56,Female,High School,Self-employed,Salesperson,3531.69,1182.59,367655.03,No,,0.0,0,0.0,0.0,0.0,430,8.68,Other,2024-01-09
1,U00002,19,Female,PhD,Employed,Salesperson,3531.73,2367.99,260869.1,Yes,Education,146323.34,36,4953.5,13.33,1.4,543,6.16,North America,2022-02-13
2,U00003,20,Female,Master,Employed,Teacher,2799.49,1003.91,230921.21,No,,0.0,0,0.0,0.0,0.0,754,6.87,Africa,2022-05-12
3,U00004,25,Male,PhD,Employed,Manager,5894.88,4440.12,304815.51,Yes,Business,93242.37,24,4926.57,23.93,0.84,461,4.31,Europe,2023-10-02
4,U00005,53,Female,PhD,Employed,Student,5128.93,4137.61,461509.48,No,,0.0,0,0.0,0.0,0.0,516,7.5,Africa,2021-08-07
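
The two ratio columns in this preview appear to follow directly from the raw columns: `debt_to_income_ratio` matches `monthly_emi_usd / monthly_income_usd`, and `savings_to_income_ratio` matches `savings_usd / (12 * monthly_income_usd)`, both rounded to two decimals. These formulas are inferred from the preview rows rather than from any documentation, so the sketch below treats them as assumptions and checks them against the five rows shown.

```python
import numpy as np
import pandas as pd

# The five rows from the data.head() preview (only the columns needed here).
preview = pd.DataFrame({
    "monthly_income_usd": [3531.69, 3531.73, 2799.49, 5894.88, 5128.93],
    "savings_usd": [367655.03, 260869.10, 230921.21, 304815.51, 461509.48],
    "monthly_emi_usd": [0.0, 4953.50, 0.0, 4926.57, 0.0],
    "debt_to_income_ratio": [0.0, 1.40, 0.0, 0.84, 0.0],
    "savings_to_income_ratio": [8.68, 6.16, 6.87, 4.31, 7.50],
})

# Assumed formulas, inferred from the preview rather than documented.
dti = (preview["monthly_emi_usd"] / preview["monthly_income_usd"]).round(2)
sti = (preview["savings_usd"] / (12 * preview["monthly_income_usd"])).round(2)

dti_matches = np.allclose(dti, preview["debt_to_income_ratio"])
sti_matches = np.allclose(sti, preview["savings_to_income_ratio"])
print(dti_matches, sti_matches)
```

If the same check holds across the full dataset, the ratio columns carry no information beyond the raw columns, which is worth keeping in mind when selecting features.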


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32424 entries, 0 to 32423
Data columns (total 20 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   user_id                  32424 non-null  object 
 1   age                      32424 non-null  int64  
 2   gender                   32424 non-null  object 
 3   education_level          32424 non-null  object 
 4   employment_status        32424 non-null  object 
 5   job_title                32424 non-null  object 
 6   monthly_income_usd       32424 non-null  float64
 7   monthly_expenses_usd     32424 non-null  float64
 8   savings_usd              32424 non-null  float64
 9   has_loan                 32424 non-null  object 
 10  loan_type                12995 non-null  object 
 11  loan_amount_usd          32424 non-null  float64
 12  loan_term_months         32424 non-null  int64  
 13  monthly_emi_usd          32424 non-null  float64
 14  loan_interest_rate_pct

## DATA ANALYSIS

Using the information above, we can now look at how to manipulate and analyse the dataset in various ways, for example by removing duplicates, imputing null and missing values, dropping columns that are not required, and applying other cleaning procedures.
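
A minimal sketch of these cleaning steps on a toy frame follows. The placeholder value `"No Loan"`, and the decision to treat a missing `loan_type` as "no loan", are assumptions for illustration; the `info()` output only shows that `loan_type` has fewer non-null entries (12995 of 32424) than the other listed columns.

```python
import pandas as pd

# Toy frame with one duplicated row and missing loan_type values.
raw = pd.DataFrame({
    "user_id": ["U1", "U2", "U2", "U3"],
    "has_loan": ["No", "Yes", "Yes", "No"],
    "loan_type": [None, "Education", "Education", None],
    "loan_amount_usd": [0.0, 146323.34, 146323.34, 0.0],
})

# 1. Remove exact duplicate rows and rebuild the index.
deduped = raw.drop_duplicates().reset_index(drop=True)

# 2. Count missing values per column before imputing.
missing_before = deduped.isna().sum()

# 3. Impute: label a missing loan_type with an assumed placeholder.
cleaned = deduped.assign(loan_type=deduped["loan_type"].fillna("No Loan"))

print(len(deduped), int(missing_before["loan_type"]), cleaned["loan_type"].tolist())
```

Each step returns a new frame, so `raw` stays unchanged and the intermediate results can be inspected separately.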


In order not to affect the main dataset, we will create a copy of the original dataset to use for analysis.

In [4]:
df1 = data.copy()

df1.tail()

Unnamed: 0,user_id,age,gender,education_level,employment_status,job_title,monthly_income_usd,monthly_expenses_usd,savings_usd,has_loan,loan_type,loan_amount_usd,loan_term_months,monthly_emi_usd,loan_interest_rate_pct,debt_to_income_ratio,credit_score,savings_to_income_ratio,region,record_date
32419,U32420,30,Female,High School,Employed,Salesperson,4266.87,1510.82,273669.7,Yes,Car,498400.74,120,6227.54,8.68,1.46,434,5.34,Europe,2024-02-25
32420,U32421,51,Female,Master,Employed,Student,5725.78,4965.02,17247.57,Yes,Home,83602.57,12,7605.13,16.5,1.33,453,0.25,North America,2025-06-06
32421,U32422,18,Female,Bachelor,Self-employed,Doctor,3282.38,2243.77,22081.21,No,,0.0,0,0.0,0.0,0.0,391,0.56,Other,2025-07-20
32422,U32423,36,Other,High School,Self-employed,Accountant,5035.99,4054.32,524039.88,No,,0.0,0,0.0,0.0,0.0,596,8.67,Asia,2022-06-07
32423,U32424,39,Female,Master,Employed,Engineer,4410.19,2866.47,176985.54,No,,0.0,0,0.0,0.0,0.0,689,3.34,North America,2023-08-01
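
The reason for working on a copy can be seen in a small sketch: changes made to a frame returned by `DataFrame.copy()` (a deep copy by default) do not show up in the original. The frame and the overwritten value below are illustrative only.

```python
import pandas as pd

original = pd.DataFrame({"credit_score": [430, 543, 754]})

# copy() defaults to deep=True, so the copy owns its own data.
working = original.copy()
working.loc[0, "credit_score"] = 999  # modify the copy only

print(original.loc[0, "credit_score"], working.loc[0, "credit_score"])
```

By contrast, a plain assignment such as `working = original` would only create a second name for the same frame, so edits would affect both.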
