# Accounting Fun

<p align="center">
<img src="images/accounting.png" width="400" height="400" />
</p>

## Overview
This project is currently under construction. When complete, it will remove the hassle of managing my expenses in a spreadsheet every month. The idea is to maintain my own database of expenses and have a machine learning model categorize them. Once the expenses are classified, we can build dashboards that summarize the spending.
<br />

## Table of Contents TOC<a id='table-of-contents-TOC'></a> 
[Google Colab Instructions](#google-colab-instructions)<br />
[Business Case](#business-case)<br />
[Data Understanding](#data-understanding)<br />
[Data Preparation](#data-preparation)<br />
[Modeling](#modeling)<br />
[Evaluation](#evaluation)<br />
[Key Findings](#key-findings)<br />
[Summary](#summary)<br />

## Google Colab Instructions <a id='google-colab-instructions'></a> 
To run this notebook, you'll need a Kaggle login and web access to [Google Colab and link to this notebook](). Google Colab is a free, user-friendly platform for running code, particularly data models. Kaggle is a [website](https://www.kaggle.com/) popular in the data industry that hosts datasets and runs data analytics competitions. To access the [database]() for this model, you will need to create a Kaggle account and follow the instructions to download your 'token' and 'key'. The notebook will prompt you for that information.
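
As a rough sketch of the credential step, the snippet below parses the text of a downloaded `kaggle.json` token and exposes its values through the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, which the Kaggle API client reads. The helper name `load_kaggle_token` and the sample credentials are placeholders for illustration; the notebook itself may collect this information differently.

```python
import json
import os


def load_kaggle_token(token_text):
    """Parse the text of a kaggle.json file and export its credentials.

    Hypothetical helper: the notebook may prompt for these values another way.
    """
    token = json.loads(token_text)
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]


# Placeholder credentials for illustration only; use the contents of your own kaggle.json.
sample_token = '{"username": "example_user", "key": "0123456789abcdef"}'
user = load_kaggle_token(sample_token)
print("Kaggle credentials set for", user)
```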
<br />[return to TOC](#table-of-contents-TOC)

## Business Case <a id='business-case'></a> 

<br />[return to TOC](#table-of-contents-TOC)

## Data Understanding <a id='data-understanding'></a>
<br />[return to TOC](#table-of-contents-TOC)

The data can be found in the following locations:

* [Kaggle]()
* [This Repository]()

In [42]:
#pip install openpyxl==3.0.7

In [43]:
# relevant files for import
import os
import pandas as pd
import re

In [44]:
## Data Preparation <a id='data-preparation'></a>
'''
Stored locally in a csv file

Automatically read and put into folder
'''
# Get the list of all files and directories
path = "C:/Users/benne/OneDrive/Desktop/Expenses"

dir_list = os.listdir(path)
print("Files and directories in '", path, "' :")

# prints all files
print(dir_list)

Files and directories in ' C:/Users/benne/OneDrive/Desktop/Expenses ' :
['July', 'Monthly Expenses - 2024.xlsx', 'montly_expenses_2024.xls']


In [45]:
#return month of expenses
def getmonth():
    month = 'July'
    return month

In [46]:
#return account of the four major accounts
def getaccount():
    account = 'capital_one'
    return account

In [47]:
#manual input of path name
folder = 'C:/Users/benne/OneDrive/Desktop/Expenses/'
month = getmonth()
acc = getaccount()
ext = '.csv'

#concatenated path
path = folder+month+'/'+acc+ext
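
String concatenation works here because `folder` ends with a slash. A slightly more forgiving sketch, assuming the same folder layout, builds the path with `pathlib` so a missing or doubled slash does not matter. The helper name `statement_path` is hypothetical.

```python
from pathlib import PurePosixPath


def statement_path(folder, month, account, ext=".csv"):
    """Build <folder>/<month>/<account><ext> without manual '/' handling.

    Hypothetical helper; PurePosixPath keeps the forward slashes the notebook uses.
    """
    return str(PurePosixPath(folder) / month / f"{account}{ext}")


path = statement_path("C:/Users/benne/OneDrive/Desktop/Expenses/", "July", "capital_one")
print(path)
```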

In [48]:
#create master path
master_path = folder+dir_list[1]
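
Indexing `dir_list[1]` depends on the order in which `os.listdir` returns entries, and that order is arbitrary. One possible alternative, sketched below with the file names from the listing above, picks the workbook by its extension. The helper name `find_workbook` is hypothetical.

```python
def find_workbook(files, extension=".xlsx"):
    """Return the single file name ending in `extension`, or raise if it is ambiguous."""
    matches = [name for name in files if name.lower().endswith(extension)]
    if len(matches) != 1:
        raise ValueError(f"expected one {extension} file, found {matches}")
    return matches[0]


# File names copied from the directory listing shown earlier.
dir_list = ['July', 'Monthly Expenses - 2024.xlsx', 'montly_expenses_2024.xls']
workbook = find_workbook(dir_list)
print(workbook)
```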

In [49]:
#input the spreadsheet and skip the first few lines
index = 26
df_sheet_map = pd.read_excel(master_path, sheet_name=None, skiprows = list(range(0,index)))

In [50]:
month_list = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

In [51]:
cols = ['Transaction Date', 'Post Date', 'Transaction Description', 'Description', 'Category', 'Amount', 'Type', 'Account', 'Frequency']
expense_data = pd.DataFrame(columns=cols)

In [61]:
#create a database for expenses
exp_list = [expense_data]

for month in month_list:
    exp_list.append(df_sheet_map[month])
expense_list = pd.concat(exp_list)

In [60]:
expense_list.index

Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
       ...
       202, 203, 204, 205, 206, 207, 208, 209, 210, 211],
      dtype='int64', length=1711)
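
The index above ends at 211 even though it has 1,711 entries, because `pd.concat` keeps each sheet's own row labels by default, so labels repeat across months. The sketch below uses two toy frames in place of the real sheets to show the difference that `ignore_index=True` makes. The `Month` column is an assumption added for illustration, not something the notebook does.

```python
import pandas as pd

# Toy stand-ins for two monthly sheets; the real frames come from df_sheet_map.
sheets = {
    "Jan": pd.DataFrame({"Description": ["Rent", "Coffee"], "Amount": [-1000.0, -4.5]}),
    "Feb": pd.DataFrame({"Description": ["Rent"], "Amount": [-1000.0]}),
}

# Default concat keeps each sheet's row labels, so labels repeat across months.
stacked = pd.concat(list(sheets.values()))
print(list(stacked.index))

# ignore_index=True renumbers the rows from 0 to n-1.
# The "Month" column is an assumption added here to remember each row's sheet.
tagged = [frame.assign(Month=name) for name, frame in sheets.items()]
combined = pd.concat(tagged, ignore_index=True)
print(list(combined.index))
```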

In [63]:
#unique descriptions across all months (expense_data is only the empty template)
description_list = list(set(expense_list['Description']))

New Month

In [11]:
#Create dataframe formatted to the account
expenses_test = pd.read_csv(path, usecols=['Transaction Description', 'Transaction Date', 'Transaction Amount', 'Transaction Type'])[['Transaction Date', 'Transaction Description', 'Transaction Amount', 'Transaction Type']]
expenses_test.insert(1, column = 'Post Date', value = '')
expenses_test.insert(3, column = 'Description', value = '')
expenses_test.insert(4, column = 'Category', value = '')
expenses_test.insert(6, column = 'Type', value = 'Expense')
expenses_test.insert(7, column = 'Account', value = acc)
expenses_test.insert(8, column = 'Frequency', value = 'One Time')

## Data Preparation <a id='data-preparation'></a>
Each account exports its transactions in a slightly different format, so some accounts need their own adjustments.

In [12]:
#for capital one account
expenses_test.loc[expenses_test['Transaction Type'] == 'Debit', 'Transaction Amount'] *= -1
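
The account frame uses `Transaction Amount` and keeps `Transaction Type`, while the master `cols` list uses `Amount`. The sketch below, on made-up rows, applies the same sign rule and then lines the result up with the master columns. Mapping `Transaction Amount` to `Amount` is an assumption about the intended schema.

```python
import pandas as pd

# Master column order copied from the notebook's `cols` list.
cols = ['Transaction Date', 'Post Date', 'Transaction Description', 'Description',
        'Category', 'Amount', 'Type', 'Account', 'Frequency']

# Toy rows standing in for a Capital One export; the values are made up.
raw = pd.DataFrame({
    'Transaction Date': ['07/01/24', '07/02/24'],
    'Transaction Description': ['Debit Card Purchase - COFFEE SHOP', 'Payroll Deposit'],
    'Transaction Amount': [4.50, 2000.00],
    'Transaction Type': ['Debit', 'Credit'],
})

# Same sign rule as the cell above: debits become negative amounts.
raw.loc[raw['Transaction Type'] == 'Debit', 'Transaction Amount'] *= -1

# Assumption: 'Transaction Amount' maps to the master 'Amount' column.
# reindex adds the missing master columns as NaN and drops 'Transaction Type'.
aligned = raw.rename(columns={'Transaction Amount': 'Amount'}).reindex(columns=cols)
print(aligned[['Transaction Description', 'Amount']])
```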

In [31]:
#build a sorted list of the unique words found in past expense descriptions
words = set()
for expense in expense_list['Description']:
    #skip blank cells: they load as NaN (a float) and re.sub only accepts strings
    if not isinstance(expense, str):
        continue
    words.update(re.sub(r"[^\w]", " ", expense).split())
final_list = sorted(words)

In [25]:
final_list

['00',
 '000024547396075',
 '00005447',
 '001',
 '0010247326146',
 '00121376',
 '0012137612',
 '003699',
 '006648',
 '01',
 '01220003',
 '02',
 '0204',
 '03',
 '03596',
 '04',
 '0465',
 '04685',
 '05',
 '06',
 '0619',
 '0642',
 '07',
 '0877',
 '09',
 '0906',
 '0GDR8',
 '1',
 '10',
 '1000008113',
 '10077519134',
 '1016',
 '10173353084',
 '102',
 '10277525924',
 '103',
 '1033936540279',
 '1034826001',
 '1034938203752',
 '1035272246909',
 '1035272263617',
 '1035698362189',
 '1035777397001',
 '1041',
 '10639',
 '11',
 '111',
 '1146191650',
 '1148',
 '123',
 '12N9MYNNCCYL',
 '1510394779',
 '1545',
 '16',
 '1658133182',
 '1800948598',
 '189',
 '19350844888',
 '19438002519',
 '19453056412',
 '19471788284',
 '19539228544',
 '19539298071',
 '19628',
 '19668085723',
 '19748517375',
 '19754789081',
 '19929374483',
 '19946385302',
 '19950106870',
 '19953',
 '1HMGK6',
 '1N5JPG1',
 '1NB76D1',
 '2',
 '20',
 '20017657448',
 '20041592532',
 '20347098566',
 '20552156835',
 '20560487118',
 '20593453023',
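
Most of the tokens in the output above are numeric reference codes rather than merchant names. One possible refinement, sketched below with made-up descriptions, drops any token that contains a digit before building the sorted vocabulary. Whether those codes help with matching is a judgment call, and the helper name `build_vocabulary` is hypothetical.

```python
import re


def build_vocabulary(descriptions):
    """Return the sorted unique words in `descriptions`, skipping tokens with digits."""
    words = set()
    for text in descriptions:
        if not isinstance(text, str):  # blank spreadsheet cells arrive as NaN floats
            continue
        for token in re.sub(r"[^\w]", " ", text).split():
            if not any(ch.isdigit() for ch in token):
                words.add(token)
    return sorted(words)


# Hypothetical descriptions; float("nan") mimics an empty cell.
sample = ["Debit Card Purchase - COFFEE SHOP 0010247326146", float("nan"), "GROCERY MART #0642"]
vocabulary = build_vocabulary(sample)
print(vocabulary)
```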

<br />[return to TOC](#table-of-contents-TOC)

## Modeling <a id='modeling'></a> 

<br />[return to TOC](#table-of-contents-TOC)

In [14]:
'''
hierarchy of decision making
1. identify if it's a previous purchase
2. is it a vacation purchase
3. it's a one-time, unidentifiable expense

'''
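
The three checks can be read as a chain that returns at the first match. The sketch below is one way to arrange them, not the project's final logic. `known_words` stands in for `final_list`, and the `on_vacation` flag stands in for whatever vacation test is eventually used; both are assumptions.

```python
def categorize(description, known_words, on_vacation=False):
    """Apply the three checks in order and return the first label that fits.

    Hypothetical sketch: `known_words` stands in for final_list, and `on_vacation`
    stands in for whatever vacation test the project ends up using.
    """
    words = description.replace("-", " ").split()
    # 1. Previous purchase: any word already seen in past descriptions.
    for word in words:
        if word in known_words:
            return "previous purchase", word
    # 2. Vacation purchase.
    if on_vacation:
        return "vacation", None
    # 3. Fallback: one-time, unidentifiable expense.
    return "one-time", None


known = {"COFFEE", "GROCERY"}
first = categorize("Debit Card Purchase - COFFEE SHOP", known)
second = categorize("Debit Card Purchase - SURF RENTAL", known, on_vacation=True)
third = categorize("Debit Card Purchase - LOCKSMITH", known)
print(first, second, third)
```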

In [28]:
def binarySearch(arr, x):
    low = 0
    high = len(arr)-1

    while low <= high:
        mid = low + (high - low) // 2
        # Check if x is present at mid
        if arr[mid] == x:
            return mid
        # If x is greater, ignore left half
        elif arr[mid] < x:
            low = mid + 1
        # If x is smaller, ignore right half
        else:
            high = mid - 1

    # If we reach here, then the element
    # was not present
    return -1
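
Binary search only gives correct answers on a sorted list, which is why `final_list` is sorted before it is searched. For comparison, the sketch below gets the same result from the standard library's `bisect` module. The function name `binary_search` and the sample tokens are for illustration only.

```python
from bisect import bisect_left


def binary_search(sorted_items, target):
    """Return the index of `target` in `sorted_items`, or -1 if it is absent."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


# Sample tokens taken from the final_list output; the list must already be sorted.
tokens = ['00', '001', '0204', '1', '10', '2', '20']
found = binary_search(tokens, '0204')
missing = binary_search(tokens, 'Coffee')
print(found, missing)
```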

In [29]:
#returns the first word of the expense found in final_list, or False if none is found
def prev_purchase(expense):
    wordlist = re.sub(r"[^\w]", " ", expense).split()
    #drop boilerplate words; list.remove would raise ValueError when a word is missing
    wordlist = [word for word in wordlist if word not in ('Debit', 'Purchase', 'Card')]
    for word in wordlist:
        search = binarySearch(final_list, word)
        if search != -1:
            return final_list[search]
    return False


In [30]:
#requires expenses_test from the "New Month" cell above; row 54 is an arbitrary example
expense = expenses_test.loc[54,'Transaction Description']

purchase = prev_purchase(expense)
purchase

In [18]:
'''
vacation purchase identification
1. one time purchases within the dates of the vacation
2. air travel or hotel spend
'''
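
A minimal sketch of these two rules follows. The trip dates, the keyword list, and the function name `is_vacation_purchase` are all assumptions for illustration. The `'One Time'` value matches the default `Frequency` set earlier in the notebook.

```python
from datetime import date

# Assumed keywords for rule 2; the real list would come from the expense data.
TRAVEL_KEYWORDS = ("AIRLINES", "HOTEL")


def is_vacation_purchase(purchase_date, description, frequency, trip_start, trip_end):
    """Return True if either vacation rule applies to the purchase."""
    # Rule 1: a one-time purchase made within the trip dates (inclusive).
    in_trip = trip_start <= purchase_date <= trip_end
    if frequency == "One Time" and in_trip:
        return True
    # Rule 2: air travel or hotel spend, whatever the date.
    return any(keyword in description.upper() for keyword in TRAVEL_KEYWORDS)


trip = (date(2024, 7, 4), date(2024, 7, 10))  # hypothetical trip dates
during_trip = is_vacation_purchase(date(2024, 7, 5), "SURF RENTAL", "One Time", *trip)
flight = is_vacation_purchase(date(2024, 6, 1), "Example Airlines", "One Time", *trip)
groceries = is_vacation_purchase(date(2024, 6, 1), "GROCERY MART", "One Time", *trip)
print(during_trip, flight, groceries)
```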


In [19]:
'''
shopping, expense, with the description being the first few words after debit card purchase
'''
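
One way to pull "the first few words after debit card purchase" out of a raw description is sketched below. It assumes descriptions begin with that phrase, and the cut-off of three words and the helper name `short_description` are arbitrary choices for illustration.

```python
import re


def short_description(transaction_description, max_words=3):
    """Return up to `max_words` words that follow 'Debit Card Purchase'.

    Assumption: descriptions begin with that phrase; text without the phrase
    is used from its first word.
    """
    remainder = re.sub(r"^\s*debit card purchase\b", "", transaction_description,
                       flags=re.IGNORECASE)
    words = re.sub(r"[^\w]", " ", remainder).split()
    return " ".join(words[:max_words])


# Made-up descriptions for illustration.
with_prefix = short_description("Debit Card Purchase - COFFEE SHOP 0642 SEATTLE WA")
without_prefix = short_description("GROCERY MART 0204")
print(with_prefix)
print(without_prefix)
```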


## Evaluation<a id='evaluation'></a>

<br />[return to TOC](#table-of-contents-TOC)

## Key Findings<a id='key-findings'></a>

<br />[return to TOC](#table-of-contents-TOC)

## Summary<a id='summary'></a>

### Next Steps:
#### Additional Data

#### Test UI Prompts

#### Try Calorie Counting

<br />[return to TOC](#table-of-contents-TOC)