Skip to content

A collection of Python libraries for mining personal data. Intended to be used from an Emacs org-babel notebook, or, in an IPython REPL.

License

Notifications You must be signed in to change notification settings

bgutter/personal-data-mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Personal Data Mining

This repo consists of a collection of Python packages that I used to analyze my personal data. Generally speaking, they’re intended to be used either in an Emacs org-file using org-babel, or, in an IPython REPL.

Cash Account Exports

cash_reader is a small convenience library for analyzing CSV exports from Mint.com and Tiller Money. Under the hood, it’s mostly just Pandas.

Quick Start

from cash_ledger import *

# Read in the transactions
# The numbers in all examples below have been replaced with 1's and 2's at random
transactions = CashLedger.from_mint( "./transactions-from-mint.csv" )

# Remove any zero-sum transfer transaction pairs (moving cash between accounts, credit card payments, etc)
# I always do this immediately after loading data from file, because I never care about these transactions, and
# it's a little too slow to run on each query.
transactions = transactions.transfers( invert=True )

# Where did most of my discretionary spending occur in 2020?
transactions.in_accounts(["Mulligan Bank", "Discover"]).expenses().in_year( 2020 ).in_categories( "Shopping" ).by_description().totals().sort_values()
# Out[305]:
# description
# Amazon                                  -1212.22
# Target                                  -1212.11
# Ab Abebooks                              -121.12
# Name: amount, dtype: float64

# Aside from the augmented APIs we'll show below, the transactions object can be treated as a pandas DataFrame
# We can use the pandas DataFrame.sample() method to get 5 random transactions
transactions.sample( 5 )
# Out[232]:
#                                          description                               original_description             category             account  Labels  Notes   amount       date
# 8078                                    Examplething  EXAMPLE THING numbers 34234 HERE 32323023=23=2...             Shopping       Mulligan Bank     NaN    NaN  -221.00 2015-11-01
# 2209                                          Amazon               AMAZON GO AMZN.COM/BILLWA12121212212             Shopping            Discover     NaN    NaN   -12.12 2019-09-24
# 167   Internet transfer to Interest Checking account  Internet transfer to Interest Checking account...             Transfer  Some Checking Acct     NaN    NaN -1212.00 2020-12-06
# 1847                                           Geico     GEICO *AUTO 1212121212121212121212121212121212       Auto Insurance            Discover     NaN    NaN   -12.12 2019-12-05
# 1583                                     - THANK YOU                         ONLINE PAYMENT - THANK YOU  Credit Card Payment       Mulligan Bank     NaN    NaN   121.12 2020-01-31

# Get yearly total income
transactions.income().yearly().totals()
# Out[215]:
# date
# 2013     12121.12
# 2014     21211.12
# 2015    212121.12
# 2016    222111.11
# 2017    212121.12
# 2018    121221.21
# 2019    212221.22
# 2020    212212.22
# Name: amount, dtype: float64

# Get all money spent at Walmart (and wal-mart, WAL-MART, etc)
transactions.search( "wal.*mart" ).expenses().total()
# Out[212]: -1212.12

# Get all spending per category in 2018
transactions.when( after="1/1/2018", before="1/1/2019" ).expenses().by_category().totals()
# Out[213]:
# category
# Advertising           -12.12
# Air Travel          -1212.12
# Alcohol & Bars      -1212.12
# Amusement             -12.12
# Auto & Transport     -121.12
#                       ...
# Travel                -12.12
# Tuition              -121.12
# Utilities           -1212.12
# Vacation              -12.12
# Name: amount, Length: 21, dtype: float64

# Get all accounts which were expensed in summer 2013
transactions.when( after="5/1/2013", before="8/1/2013" ).expenses().accounts()
# Out[221]:
# array(['FREE CHECKING x121212', 'Discover', 'STATEMENT SAVINGS x121212'],
#       dtype=object)

# Get all veterinary expenses in two accounts by year
transactions.expenses().in_accounts( [ "Mulligan Bank", "Discover" ] ).in_categories( "Veterinary" ).yearly().totals()
# Out[234]:
# date
# 2017   -121.12
# 2019   -121.22
# 2020   -222.12
# Name: amount, dtype: float64

# Set the category for all income from the "Discover" account to "Cash Back"
# Then recategorize transactions in any account named like "EXX..." as a Student Loan payment.
# Then set the category for all transactions matching "ford" before 2016 as "Auto Maintenance"
transactions.recategorize( transactions.income().in_accounts( "Discover" ), "Cash Back", inplace=True )
transactions.recategorize( transactions.account_like( "EXXX.*" ), "Student Loan", inplace=True )
transactions.recategorize( transactions.expenses().when( before="1/1/2017" ).search( "ford" ), "Auto Maintenance", inplace=True )

# Get total spending per vendor (roughly) since the start of 2020
transactions.when( after="1/1/2020" ).expenses().by_description().totals().sort_values()

# Remembering that dataframe APIs are available, note that you can use .to_csv() to inspect any transaction subsets
transactions.search( "conoco" ).when( before="4/23/2015" ).to_csv( "./old-car-gas-purcahses.csv" )

API Overview

See source comments for full documentation.

Aside from what is documented here, any valid pandas.DataFrame operation can be applied to a TransactionsExport object.

Similarly, any valid pandas.GroupBy operation may be applied to a TransactionsExportCollection object (which is returned by all of the grouping APIs).

Filtering APIs

APIDescription
search( regex, invert=False )Keep only transactions whose description (original or final) contains regex. Invertible.
account_like( regex, invert=False )Keep only transactions whose account contains regex. Invertible.
income()Keep only transactions whose amount is more than zero.
expenses()Keep only transactions whose amount is less than or equal to zero.
transfers( invert=False, time_window=None )Keep only transactions which are part of a transfer pair. Invertible.
when( after=None, before=None, invert=False )Keep only transactions which occur in a time range. Invertible.
in_year( year )Keep only transactions which occur in a particular year.
with_amount( above=None, below=None, invert=False )Keep only transactions with occur in an amount range. Invertible.
in_accounts( account_or_accounts, invert=False )Keep only transactions occurring in a set of accounts. Invertible.
in_categories( category_or_categories, invert=False )Keep only transactions occurring in a set of categories. Invertible.

Editing APIs

APIDescription
recategorize( transaction_subset, new_category )Change the category for a selection of transactions.

Descriptive APIs

APIDescription
accounts()Get all unique accounts referenced in the current transaction set.
categories()Get all unique categories referenced in the current transaction set.
descriptions()Get all unique descriptions referenced in the current transaction set.
original_descriptions()Get all unique original descriptions referenced in the current transaction set.
total()Get the sum of all amounts of all transactions in the current set.

Grouping APIs

All grouping APIs return a TransactionsExportCollection.

APIDescription
by_category()Group transactions by category.
by_account()Group transactions by account.
by_description()Group transactions by description.
by_original_description()Group transactions by original description.
yearly()Group transactions by calendar year.
monthly()Group transactions by month.
weekly()Group transactions by week.
daily()Group transactions by day.

Grouped (TransactionsExportCollection) APIs

APIDescription
totals()Applies “.total()” to each group.
transaction_counts()Get the number of transactions in each group.

Stock Account Tracking

Similar to cash account API, documentation is TODO

About

A collection of Python libraries for mining personal data. Intended to be used from an Emacs org-babel notebook, or, in an IPython REPL.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages