This notebook cleans the Excel spreadsheets that I typed up by hand from my notebooks.

In [13]:
import pandas as pd
import numpy as np

# Mileage

In [2]:
df = pd.read_excel('mileage.xlsx')
df = df.iloc[:, :6]
df.head()

Unnamed: 0,day,city,state,mileage,time,part
0,2019-02-17,san diego,ca,41491.0,midday,1
1,2019-02-17,sonoran desert national monument,az,41889.0,night,1
2,2019-02-18,mesa,az,41948.0,midday,1
3,2019-02-18,tucson,az,42067.0,evening,1
4,2019-02-18,indian bread rocks blm,az,42179.0,night,1


## How many days was I on the road for?

Some days have NaN mileage, either because I didn't drive that day or because I didn't write down the mileage.

In [5]:
# A day counts as a driving day if at least one of its rows has no missing values
driving_days = []
for d, subdf in df.groupby('day'):
    if subdf.dropna().shape[0] > 0:
        driving_days.append(d)

In [6]:
len(driving_days), len(df['day'].unique())

(46, 61)
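The loop above can also be written without an explicit `for`. This is a minimal sketch on toy data (the rows are illustrative, not taken from the spreadsheet), and it assumes that only the `mileage` column matters: unlike `dropna()`, which drops a row if any column is missing, `count()` here looks at `mileage` alone.

```python
import numpy as np
import pandas as pd

# Toy rows in the same shape as the mileage sheet (values are illustrative).
toy = pd.DataFrame({
    "day": pd.to_datetime(["2019-02-17", "2019-02-17", "2019-02-18", "2019-02-19"]),
    "mileage": [41491.0, np.nan, 41948.0, np.nan],
})

# count() ignores NaN, so this is the number of odometer readings per day.
readings_per_day = toy.groupby("day")["mileage"].count()

# Days with at least one reading are treated as driving days.
driving_days_toy = readings_per_day[readings_per_day > 0].index.tolist()

print(len(driving_days_toy), toy["day"].nunique())
```

If other columns in the real sheet also contain NaN, the two approaches can give different counts, so it is worth comparing them once before switching.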

In [18]:
no_mileage_df = df.loc[df['mileage'].isna(), :]
# One entry per row, so a day with several mileage-less rows appears more than once
not_driving_days = [i for i in no_mileage_df['day'].tolist() if i not in driving_days]
len(not_driving_days)

16

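So there are 16 rows without mileage that fall on days without any recorded driving. Because the list is built per row, that works out to 15 distinct days (61 - 46), with 2019-05-03 counted twice. I'm skeptical that the number is this high, but let's see...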
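Since `not_driving_days` is built from rows rather than dates, a date with several mileage-less rows is counted once per row. A minimal sketch of the difference on toy data (the dates and the single mileage value are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "day": pd.to_datetime(["2019-05-03", "2019-05-03", "2019-05-04", "2019-05-05"]),
    "mileage": [np.nan, np.nan, np.nan, 50000.0],
})

missing = toy.loc[toy["mileage"].isna(), "day"]

# Row-level list: one entry per row with missing mileage.
rows_without_mileage = missing.tolist()

# Day-level list: each date once, in order of first appearance.
days_without_mileage = missing.drop_duplicates().tolist()

print(len(rows_without_mileage), len(days_without_mileage))
```

Either list works for filtering rows, but only the deduplicated one answers "how many days".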

In [19]:
df.query('day == @not_driving_days')

Unnamed: 0,day,city,state,mileage,time,part
27,2019-04-18,hot springs,ar,,afternoon,2
36,2019-04-24,flagstaff,az,,evening,2
39,2019-04-27,south rim grand canyon national park,az,,allday,2
47,2019-05-03,bryce canyon,ut,,afternoon,2
48,2019-05-03,dixie national forest,ut,,night,2
49,2019-05-04,bryce canyon,ut,,morning,2
61,2019-05-11,grand lake,co,,allday,2
62,2019-05-12,grand lake,co,,allday,2
63,2019-05-13,grand lake,co,,allday,2
64,2019-05-14,denver,co,,evening,2


Cross-checking with the blog, here's what I get:

Correct no-driving days:

2019-04-24, 2019-04-27, 2019-05-03, 2019-05-11, 2019-05-12, 2019-05-13, 2019-05-18, 2019-05-25, 2019-05-26, 2019-05-28, 2019-05-30


- 2019-04-27: hiking the grand canyon (technically did drive quite a bit in the park, but we'll count it as a no-driving day)
- 2019-05-03: correct, hiking bryce
- 2019-05-18: drove through badlands, but we'll leave it as a "no driving" day
- 2019-05-25, 2019-05-26: memorial day weekend at sheldon wildlife refuge
- 2019-05-28: day in susanville
- 2019-05-30: day in oakland

Incorrect no-driving days:

2019-04-18, 2019-05-04, 2019-05-14, 2019-05-17

- 2019-04-18: Ozarks --> Hot Springs, AR --> Ouachita Forest
- 2019-05-04: Bryce Canyon --> Grand Staircase Escalante National Monument
- 2019-05-14: Grand Lake --> Denver
- 2019-05-17: Fort Robinson State Park, NE --> Badlands National Park, SD

Now that we have that sorted, let's create a new boolean column, `driving_day`, that flags whether each row falls on a driving day.

In [20]:
to_remove = [pd.Timestamp('2019-04-18'),
             pd.Timestamp('2019-05-04'),
             pd.Timestamp('2019-05-14'),
             pd.Timestamp('2019-05-17')]
print(len(not_driving_days))
not_driving_days = [i for i in not_driving_days if i not in to_remove]
print(len(not_driving_days))

16
12


In [25]:
# Default every row to a driving day, then flip the rows on confirmed no-driving days
df['driving_day'] = True
df.loc[df['day'].isin(not_driving_days), 'driving_day'] = False
df.tail()

Unnamed: 0,day,city,state,mileage,time,part,driving_day
93,2019-05-30,oakland,ca,,evening,2,False
94,2019-05-31,oakland,ca,54874.0,midday,2,True
95,2019-06-01,san carlos,ca,54954.0,afternoon,2,True
96,2019-06-01,los padres national forest,ca,55115.0,night,2,True
97,2019-06-02,san diego,ca,55521.0,evening,2,True


In [27]:
df['driving_day'].sum()

86
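Note that `df['driving_day'].sum()` counts rows, and a single day can have several rows (midday, evening, night). A minimal sketch, on toy data, of counting distinct driving days instead:

```python
import pandas as pd

toy = pd.DataFrame({
    "day": pd.to_datetime(["2019-06-01", "2019-06-01", "2019-06-02", "2019-05-30"]),
    "driving_day": [True, True, True, False],
})

# Number of rows flagged as driving (what .sum() on the column returns).
rows_flagged = toy["driving_day"].sum()

# Number of distinct dates among the flagged rows.
days_flagged = toy.loc[toy["driving_day"], "day"].nunique()

print(rows_flagged, days_flagged)
```

On the real frame the same expression would be `df.loc[df['driving_day'], 'day'].nunique()`.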

In [37]:
df.loc[df.index[-1], 'day'] - df.loc[0, 'day']

Timedelta('105 days 00:00:00')
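The 105 days above is the span from the first to the last row, so it includes any break between the two parts of the trip. A sketch of measuring each part separately, assuming the `part` column labels separate legs (the dates here are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "day": pd.to_datetime(["2019-02-17", "2019-02-18", "2019-04-18", "2019-04-20"]),
    "part": [1, 1, 2, 2],
})

# First and last date of each part.
span = toy.groupby("part")["day"].agg(["min", "max"])

# Inclusive day count: the +1 counts both the first and the last day.
span["days"] = (span["max"] - span["min"]).dt.days + 1

print(span)
```

Summing `span["days"]` gives the days actually spent on the road, leaving out the gap between parts.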

# Money

In [51]:
money = pd.read_excel('money.xlsx')
money = money.iloc[:, :4]
money.head()

Unnamed: 0,date,item,price,category
0,2019-02-17,gas,37.14,car
1,2019-02-17,sonic,4.9,food
2,2019-02-18,coffee,2.5,food
3,2019-02-18,lunch,12.0,food
4,2019-02-18,dinner tacos,8.15,food


In [53]:
money['price'].sum()

4311.379999999999
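The trailing `...379999999999` is ordinary binary floating-point noise, not a data problem. A small sketch of two ways to deal with it, using the first five prices from the table above: rounding for display, or summing with `Decimal` when exact cents matter.

```python
from decimal import Decimal

prices = [37.14, 4.9, 2.5, 12.0, 8.15]

# Float sum: may carry a tiny representation error in the last digits.
float_total = sum(prices)

# Rounding to cents is enough for reporting.
display_total = round(float_total, 2)

# Decimal built from the string form keeps the cents exact.
exact_total = sum(Decimal(str(p)) for p in prices)

print(display_total, exact_total)
```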

In [54]:
# Sum only the price column; the item and date columns are not meaningful to add up
money.groupby('category')[['price']].sum()

category,price
car,1721.53
dumb,69.01
food,1235.75
fun,382.89
gear,161.42
lodging,569.02
misc,59.7
postcards,83.06
souvenirs,29.0


## Gas money

In [40]:
gas = money[money['item'].str.contains('gas')]
gas

Unnamed: 0,date,item,price,category
0,2019-02-17,gas,37.14,car
5,2019-02-18,gas,32.89,car
7,2019-02-19,gas,23.41,car
9,2019-02-20,gas,8.66,car
10,2019-02-20,gas,24.43,car
14,2019-02-23,gas,35.02,car
16,2019-02-23,gas,26.85,car
25,2019-02-26,gas,21.51,car
28,2019-02-27,gas,19.21,car
43,2019-03-05,gas,34.46,car


In [41]:
gas['price'].sum()

1393.4800000000002

In [49]:
gas.shape

(46, 4)

In [50]:
gas['price'].describe()

count    46.000000
mean     30.293043
std       8.397190
min       8.660000
25%      25.340000
50%      30.000000
75%      35.335000
max      51.390000
Name: price, dtype: float64

## Groceries

In [43]:
groceries = money[money['item'].str.contains('groceries')]
groceries

Unnamed: 0,date,item,price,category
6,2019-02-18,groceries,35.21,food
12,2019-02-20,groceries,15.0,food
17,2019-02-25,heb groceries,58.0,food
63,2019-04-10,groceries,35.0,food
79,2019-04-17,groceries,93.4,food
120,2019-04-26,groceries,93.53,food
127,2019-04-29,groceries,17.72,food
137,2019-05-04,groceries,28.64,food
143,2019-05-12,groceries,6.74,food
149,2019-05-15,groceries,55.71,food


In [44]:
groceries['price'].sum()

489.76

In [48]:
groceries['price'].describe()

count    12.000000
mean     40.813333
std      28.827369
min       6.740000
25%      20.547500
50%      32.160000
75%      56.282500
max      93.530000
Name: price, dtype: float64

In [58]:
oil_changes = money[money['item'].str.contains('oil change')]
# Sum only the price column so the text and date columns are left out
print(oil_changes[['price']].sum())
oil_changes

price    186.79
dtype: float64


Unnamed: 0,date,item,price,category
35,2019-02-26,oil change,64.3,car
113,2019-04-20,oil change,50.01,car
160,2019-06-01,oil change,72.48,car
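The gas, groceries, and oil change cells all repeat the same filter-then-summarize pattern, which could be wrapped in a small helper. This is a sketch only; `summarize_item` is a hypothetical name, and the toy frame reuses the three oil change prices shown above plus one gas row.

```python
import pandas as pd

def summarize_item(frame, keyword):
    """Return count, total and mean price for rows whose item contains keyword."""
    # regex=False makes this a plain substring test; blank items never match.
    mask = frame["item"].str.contains(keyword, regex=False, na=False)
    prices = frame.loc[mask, "price"]
    return {
        "count": int(mask.sum()),
        "total": round(prices.sum(), 2),
        "mean": round(prices.mean(), 2),
    }

toy = pd.DataFrame({
    "item": ["oil change", "gas", "oil change", "oil change"],
    "price": [64.30, 37.14, 50.01, 72.48],
})

summary = summarize_item(toy, "oil change")
print(summary)
```

If no rows match, the mean comes back as NaN, which is a reasonable signal that the keyword was mistyped.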


In [57]:
money.query('category == "car"')[['price']].sum()

price    1721.53
dtype: float64

In [59]:
money.query('category == "car"')

Unnamed: 0,date,item,price,category
0,2019-02-17,gas,37.14,car
5,2019-02-18,gas,32.89,car
7,2019-02-19,gas,23.41,car
9,2019-02-20,gas,8.66,car
10,2019-02-20,gas,24.43,car
14,2019-02-23,gas,35.02,car
16,2019-02-23,gas,26.85,car
25,2019-02-26,gas,21.51,car
28,2019-02-27,gas,19.21,car
35,2019-02-26,oil change,64.30,car


Note: I think this spreadsheet is missing the lockout fee, which GEICO only partially reimbursed. I think I ended up paying $65, but I need to check!
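Once the amount is confirmed, the missing row could be appended like this. This is only a sketch: the 65.0 is the unverified figure from the note, the date is left empty because I don't have it here, and filing it under `car` is an assumption.

```python
import pandas as pd

# Stand-in for the money table, using two rows shown earlier.
money_toy = pd.DataFrame({
    "date": pd.to_datetime(["2019-02-17", "2019-02-26"]),
    "item": ["gas", "oil change"],
    "price": [37.14, 64.30],
    "category": ["car", "car"],
})

# Placeholder row: unverified amount, unknown date (NaT), assumed category.
lockout = pd.DataFrame({
    "date": [pd.NaT],
    "item": ["lockout fee"],
    "price": [65.0],
    "category": ["car"],
})

# ignore_index=True renumbers the rows so the new one gets the next index.
money_fixed = pd.concat([money_toy, lockout], ignore_index=True)

print(money_fixed)
```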