# Accounting Fun

<p align="center">
<img src="images/accounting.png" width="400" height="400" />
</p>

## Overview
This project is currently under construction. When complete, it will remove the hassle of managing my monthly expenses in a spreadsheet. The idea is to maintain my own database of expenses and have a machine learning model categorize them. Once the expenses are classified, we can develop dashboards to present them.
<br />

## Table of Contents TOC<a id='table-of-contents-TOC'></a> 
[Google Colab Instructions](#google-colab-instructions)<br />
[Business Case](#business-case)<br />
[Data Understanding](#data-understanding)<br />
[Data Preparation](#data-preparation)<br />
[Modeling](#modeling)<br />
[Evaluation](#evaluation)<br />
[Key Findings](#key-findings)<br />
[Summary](#summary)<br />

## Google Colab Instructions <a id='google-colab-instructions'></a> 
To run this notebook, you'll need a Kaggle login and web access to [Google Colab and link to this notebook](). Google Colab is a free, user-friendly platform for running code, particularly data models. Kaggle is a [website](https://www.kaggle.com/), popular in the data industry, that hosts datasets and runs data analytics competitions. To access the [database]() for this model, you will need to create a Kaggle account and follow the instructions to download your 'token' and 'key'. The notebook will prompt you for that information.
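
As a rough sketch of how the downloaded credentials might be handed to the notebook, the snippet below assumes the token file is the `kaggle.json` that Kaggle issues (with `username` and `key` fields) and that the Kaggle client reads the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. The helper name `kaggle_env_from_json` is hypothetical and not part of this project.

```python
import json
import os


def kaggle_env_from_json(token_text):
    """Turn the contents of a kaggle.json token file into environment variables."""
    token = json.loads(token_text)
    return {
        "KAGGLE_USERNAME": token["username"],
        "KAGGLE_KEY": token["key"],
    }


# Placeholder credentials for illustration only; paste your own token contents.
example_token = '{"username": "your_username", "key": "your_key"}'
env = kaggle_env_from_json(example_token)

# Export the values so a Kaggle client started from this session can find them.
os.environ.update(env)
```

Keeping the credentials in environment variables rather than in the notebook itself avoids committing them to the repository by accident.
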
<br />[return to TOC](#table-of-contents-TOC)

## Business Case <a id='business-case'></a> 

<br />[return to TOC](#table-of-contents-TOC)

## Data Understanding <a id='data-understanding'></a>
<br />[return to TOC](#table-of-contents-TOC)

The data can be found in the following locations:

* [Kaggle]()
* [This Repository]()

In [1]:
#pip install openpyxl==3.0.7

In [2]:
import os
import pandas as pd
import re

In [3]:
'''
Stored locally in a csv file

Automatically read and put into folder
'''
# Get the list of all files and directories
path = "C:/Users/benne/OneDrive/Desktop/Expenses"

dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")

# prints all files
print(dir_list)

Files and directories in ' C:/Users/benne/OneDrive/Desktop/Expenses ' :
['July', 'Monthly Expenses - 2024.xlsx', 'montly_expenses_2024.xls']


In [4]:
#return month of expenses
def getmonth():
    month = 'July'
    return month

In [5]:
#return one of the four major accounts
def getaccount():
    account = 'capital_one'
    return account

In [6]:
#manual input of path name
folder = 'C:/Users/benne/OneDrive/Desktop/Expenses/'
month = getmonth()
acc = getaccount()
ext = '.csv'

#concatenated path
path = folder+month+'/'+acc+ext
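
The string concatenation above depends on `folder` ending with a slash. A possible alternative, sketched here with the standard-library `pathlib` module, joins the pieces without that requirement. The function name `build_statement_path` is an assumption made for illustration.

```python
from pathlib import PurePosixPath


def build_statement_path(folder, month, account, ext=".csv"):
    """Join folder, month and account into the path of one exported statement."""
    # PurePosixPath always joins with forward slashes, matching the paths used here.
    return str(PurePosixPath(folder) / month / (account + ext))


print(build_statement_path("C:/Users/benne/OneDrive/Desktop/Expenses/", "July", "capital_one"))
```

The result is the same whether or not the folder string carries a trailing slash.
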

In [14]:
#path to the master spreadsheet ('Monthly Expenses - 2024.xlsx', the second entry in dir_list)
master_path = folder+dir_list[1]
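
`os.listdir` returns entries in arbitrary order, so picking the workbook by position may break when files are added to the folder. One possible safeguard, sketched below, selects the workbook by its extension instead. The helper `find_master_workbook` is hypothetical, and it assumes the folder holds a single `.xlsx` master file.

```python
def find_master_workbook(names, suffix=".xlsx"):
    """Return the first entry (in sorted order) ending with suffix, or None."""
    for name in sorted(names):
        if name.lower().endswith(suffix):
            return name
    return None


# The directory listing printed earlier in this notebook.
listing = ['July', 'Monthly Expenses - 2024.xlsx', 'montly_expenses_2024.xls']
print(find_master_workbook(listing))
```
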

In [35]:
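#read every sheet of the master spreadsheet (sheet_name=None returns a dict of DataFrames keyed by sheet name)
#and skip the first 25 rows of each sheet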
index = 25
df_sheet_map = pd.read_excel(master_path, sheet_name=None, skiprows = list(range(0,index)))

| Unnamed: 0.1 | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Transaction Date | Post Date | Transaction Description | Description | Category | Amount | Type | Account | Frequency |
| 1 | | 2024-05-01 00:00:00 | Fort Tryon Apart WEB PMTS RCPWH WE... | 4489 Broadway | Housing | -1600.95 | Expense | Chase Checking | Recurring |
| 2 | | 2024-05-01 00:00:00 | JPMORGAN CHASE CHASE ACH PP... | 4489 Broadway | Housing | -1218.38 | Expense | Chase Checking | Recurring |
| 3 | 2024-05-08 00:00:00 | 2024-05-09 00:00:00 | HOMEOWNERS INSURANCE | 4489 Broadway | Housing | -17.56 | Expense | Chase Sapphire | Recurring |
| 4 | 2024-05-03 00:00:00 | | Zelle money received from CHRIS C QUINONES | 4489 Broadway | Housing | 1000 | Income | Capital One | Recurring |
| 5 | | 2024-05-03 00:00:00 | Zelle payment from LUISANNA DELROSARIO 2066447... | 4489 Broadway | Housing | 1495 | Income | Chase Checking | Recurring |
| 6 | | 2024-05-01 00:00:00 | JPMORGAN CHASE CHASE ACH PP... | 4489 Broadway | Housing | -940.74 | Savings | Chase Checking | Recurring |
| 7 | | 2024-05-13 00:00:00 | Subscription Acorns 1HMGK6 WE... | Acorn | Adjustments & Fees | -5 | Expense | Chase Checking | Recurring |
| 8 | | 2024-05-21 00:00:00 | Acorns Invest Transfer 6NH5FG1 WE... | Acorn | Investments | -25 | Savings | Chase Checking | Recurring |
| 9 | | 2024-05-29 00:00:00 | Acorns Invest Transfer H1CBKG1 WE... | Acorn | Investments | -25 | Savings | Chase Checking | Recurring |
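
In the preview above, the real column names sit in the first data row while the columns themselves carry `Unnamed: N` placeholders. A minimal sketch of one way to promote that row to the header is shown below. The helper `promote_first_row_to_header` and the small stand-in frame are assumptions for illustration, not code from this project.

```python
import pandas as pd


def promote_first_row_to_header(df):
    """Use the first data row as column names and drop it from the data."""
    header = [str(name) for name in df.iloc[0]]
    body = df.iloc[1:].reset_index(drop=True)
    body.columns = header
    return body


# Tiny stand-in for one sheet, shaped like the preview above.
raw = pd.DataFrame(
    [["Post Date", "Category", "Amount"], ["2024-05-01", "Housing", -1600.95]],
    columns=["Unnamed: 0", "Unnamed: 1", "Unnamed: 2"],
)
clean = promote_first_row_to_header(raw)
print(list(clean.columns))
```

Adjusting the number of skipped rows so that the header row is read directly might achieve the same result without a separate step.
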


In [11]:
#Create dataframe formatted to the account
expenses_test = pd.read_csv(path, usecols=['Transaction Description', 'Transaction Date', 'Transaction Amount', 'Transaction Type'])[['Transaction Date', 'Transaction Description', 'Transaction Amount', 'Transaction Type']]
expenses_test.insert(1, column = 'Post Date', value = '')
expenses_test.insert(3, column = 'Description', value = '')
expenses_test.insert(4, column = 'Category', value = '')
expenses_test.insert(6, column = 'Type', value = 'Expense')
expenses_test.insert(7, column = 'Account', value = acc)
expenses_test.insert(8, column = 'Frequency', value = 'One Time')

## Data Preparation <a id='data-preparation'></a>
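Some adjustments depend on the account. For the Capital One account, debit amounts are multiplied by -1 so that they are stored as negative values.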

In [12]:
#for the Capital One account: make debit amounts negative
expenses_test.loc[expenses_test['Transaction Type'] == 'Debit', 'Transaction Amount'] *= -1

In [13]:
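#create a sorted list of the unique words found in the transaction descriptions
expense_list = []
for expense in expenses_test['Transaction Description']:
    #replace non-word characters with spaces, then split into words
    description_list = re.sub(r"[^\w]", " ", expense).split()
    for description in description_list:
        expense_list.append(description)
#deduplicate once, after the loop, and sort so the list can be binary searched
final_list = sorted(set(expense_list))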

<br />[return to TOC](#table-of-contents-TOC)

## Modeling <a id='modeling'></a> 

<br />[return to TOC](#table-of-contents-TOC)

In [14]:
'''
hierarchy of decision making
1. identify if it's a previous purchase
2. is it a vacation purchase
3. it's a one-time, unidentifiable expense

'''

"\nhierarchy of decision making\n1. identify if it's a previous purchase\n2. is it a vacation purchase\n3. it's a one-time, unidentifiable expense\n\n"

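The three-step order described above could be sketched as a single function that returns the first label that fits. This is only an illustration: the function name `classify_expense`, the label strings, and the vacation dates are assumptions, and the vacation check is reduced to a date window.

```python
from datetime import date


def classify_expense(words, purchase_date, known_words, vacation_start, vacation_end):
    """Apply the three checks in order and return the first label that fits."""
    # 1. Previous purchase: any word of the description has been seen before.
    if any(word in known_words for word in words):
        return "previous purchase"
    # 2. Vacation purchase: the date falls inside the vacation window.
    if vacation_start <= purchase_date <= vacation_end:
        return "vacation purchase"
    # 3. Fallback: a one-time expense that could not be identified.
    return "one-time expense"


known = {"STEAMSHIP", "ACORNS"}
trip = (date(2024, 7, 10), date(2024, 7, 17))  # hypothetical vacation dates
print(classify_expense(["STEAMSHIP"], date(2024, 7, 2), known, *trip))
print(classify_expense(["SURF", "SHOP"], date(2024, 7, 12), known, *trip))
print(classify_expense(["HARDWARE"], date(2024, 7, 25), known, *trip))
```

Because the checks run in order, a known merchant is labeled as a previous purchase even when the date falls inside the vacation window.
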
In [15]:
def binarySearch(arr, x):
    low = 0
    high = len(arr)-1

    while low <= high:
        mid = low + (high - low) // 2
        # Check if x is present at mid
        if arr[mid] == x:
            return mid
        # If x is greater, ignore left half
        elif arr[mid] < x:
            low = mid + 1
        # If x is smaller, ignore right half
        else:
            high = mid - 1

    # If we reach here, then the element
    # was not present
    return -1
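
Python's standard library already provides binary search through the `bisect` module. The sketch below shows an equivalent lookup built on `bisect_left`. Like the hand-written version, it assumes the input list is sorted. The name `binary_search` and the sample words are illustrative only.

```python
from bisect import bisect_left


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    # bisect_left returns the leftmost position where target could be inserted.
    position = bisect_left(sorted_items, target)
    if position < len(sorted_items) and sorted_items[position] == target:
        return position
    return -1


words = sorted(["STEAMSHIP", "ACORNS", "ZELLE"])
print(binary_search(words, "STEAMSHIP"))  # index of the match in the sorted list
print(binary_search(words, "HOTEL"))      # -1, the word is not in the list
```

If only membership matters and the index is not needed, a `set` of known words would give the same answer without requiring a sorted list.
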

In [16]:
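def prev_purchase(expense):
    #split the description into words
    wordlist = re.sub(r"[^\w]", " ", expense).split()
    #drop the generic card-purchase words; unlike list.remove, this does not fail when a word is absent
    wordlist = [word for word in wordlist if word not in ('Debit', 'Purchase', 'Card')]
    #return the first word that appears in the sorted list of known words
    for word in wordlist:
        search = binarySearch(final_list, word)
        if search != -1:
            return final_list[search]
    return False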


In [17]:
expense = expenses_test.loc[54,'Transaction Description']

purchase = prev_purchase(expense)
purchase

'STEAMSHIP'

In [18]:
'''
vacation purchase identification
1. one time purchases within the dates of the vacation
2. air travel or hotel spend
'''

'\nvacation purchase identification\n1. one time purchases within the dates of the vacation\n2. air travel or hotel spend\n'
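
The two vacation rules above might be expressed as a small predicate. This is a sketch under stated assumptions: the keyword set `TRAVEL_KEYWORDS`, the function name `is_vacation_purchase`, and the sample dates and descriptions are all hypothetical.

```python
from datetime import date

# Hypothetical keywords that hint at air travel or hotel spend.
TRAVEL_KEYWORDS = {"AIRLINES", "AIRWAYS", "HOTEL", "INN", "RESORT"}


def is_vacation_purchase(description, purchase_date, frequency, start, end):
    """Return True if either vacation rule applies to the transaction."""
    # Rule 1: a one-time purchase dated inside the vacation window.
    in_window = start <= purchase_date <= end
    one_time_during_trip = frequency == "One Time" and in_window
    # Rule 2: the description mentions air travel or a hotel.
    words = set(description.upper().split())
    travel_spend = bool(words & TRAVEL_KEYWORDS)
    return one_time_during_trip or travel_spend


trip_start, trip_end = date(2024, 7, 10), date(2024, 7, 17)  # made-up dates
print(is_vacation_purchase("SURF SHOP", date(2024, 7, 12), "One Time", trip_start, trip_end))
print(is_vacation_purchase("HARBOR HOTEL", date(2024, 6, 1), "One Time", trip_start, trip_end))
print(is_vacation_purchase("HOMEOWNERS INSURANCE", date(2024, 7, 12), "Recurring", trip_start, trip_end))
```

Travel spend is checked independently of the dates because flights and hotels are often booked before the trip starts.
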

In [19]:
'''
shopping, expense, with the description being the first few words after debit card purchase
'''

'\nshopping, expense, with the description being the first few words after debit card purchase\n'
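
The fallback rule above could be sketched as follows: label the row as a Shopping expense and take the first few words after the card-purchase prefix as its description. The prefix pattern, the three-word limit, the helper names, and the sample description are assumptions, since the exact format of the bank export is not shown here.

```python
import re

# Assumed prefix of a debit card transaction description.
PREFIX = re.compile(r"^\s*debit\s+card\s+purchase\W*", re.IGNORECASE)


def short_description(transaction_description, max_words=3):
    """Drop the 'Debit Card Purchase' prefix and keep the next few words."""
    remainder = PREFIX.sub("", transaction_description)
    words = re.sub(r"[^\w]", " ", remainder).split()
    return " ".join(words[:max_words])


def default_labels(transaction_description):
    """Labels for a one-time expense that matched no earlier rule."""
    return {
        "Category": "Shopping",
        "Type": "Expense",
        "Description": short_description(transaction_description),
    }


# Made-up description in the assumed format.
print(default_labels("Debit Card Purchase - STEAMSHIP AUTHORITY FERRY 0712"))
```
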

## Evaluation <a id='evaluation'></a>

<br />[return to TOC](#table-of-contents-TOC)

## Key Findings <a id='key-findings'></a>

<br />[return to TOC](#table-of-contents-TOC)

## Summary <a id='summary'></a>

### Next Steps:
#### Additional Data

#### Test UI Prompts

#### Try Calorie Counting

<br />[return to TOC](#table-of-contents-TOC)