# Bike Trip Miles Per Day

First, we will randomly generate a series to represent the trip. The Dirichlet distribution can be used to randomly draw a given number of positive values whose sum is one.
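The snippet below is a minimal sketch of what a single Dirichlet draw looks like. The notebook cell below uses the legacy `np.random.dirichlet`. This sketch instead assumes the newer `numpy.random.Generator.dirichlet`, with an arbitrary seed so that it is reproducible. With 20 equal concentration parameters, each component has a mean of 1/20 = 0.05. Scaled by 1500, that is about 75 miles per day.

```python
import numpy as np

# A seeded Generator makes the sketch reproducible; the seed 0 is arbitrary.
rng = np.random.default_rng(0)

# 20 equal concentration parameters of 100, matching the notebook cell.
alpha = np.ones(20) * 100
weights = rng.dirichlet(alpha)

# One draw is a length-20 vector of positive values that sums to one
# (up to floating-point rounding).
print(weights.shape, weights.sum())

# Equal concentrations give each component a mean of 1/20 = 0.05, i.e. about
# 75 miles per day once scaled by 1500; large concentrations keep the draws
# close to that mean.
print(weights.mean(), weights.min() * 1500, weights.max() * 1500)
```

Multiplying `np.ones(20)` by 100 does not change the mean of each component. It only makes the draws cluster more tightly around it, so no single day ends up wildly longer than the others.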

We'll multiply these values by 1500, a rough approximation of the trip's total mileage, and round each day to the nearest mile to make the numbers more readable. The solution that follows works equally well with decimal values.
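Because each day is rounded independently, the rounded total can drift slightly from 1500. This is why the cumulative series in the output below ends at 1499. The sketch below (again assuming a seeded `Generator`, with an arbitrary seed) checks that the drift is bounded. Each day's rounding error is at most half a mile, so 20 days can drift by at most 10 miles in total.

```python
import numpy as np

rng = np.random.default_rng(1)

# Scale the Dirichlet weights to an approximate trip length of 1500 miles.
miles = rng.dirichlet(np.ones(20) * 100) * 1500
rounded = miles.round()

# Each day's rounding error is at most 0.5 miles, so over 20 days the
# rounded total can differ from 1500 by at most 20 * 0.5 = 10 miles.
drift = rounded.sum() - 1500
print(rounded.sum(), drift)
assert abs(drift) <= 10
```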

Lastly, we'll call the `cumsum` method to convert the per-day values into the cumulative mileage at the end of each day.
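As a small illustration of what `cumsum` does, the sketch below takes the first three per-day values from the final output of this notebook (86, 79, and 68 miles). It turns them into running totals, which match the first three cumulative values printed below.

```python
import pandas as pd

# Per-day mileage for the first three days of the run shown in this notebook.
per_day = pd.Series([86, 79, 68])

# cumsum returns the running total: 86, 86 + 79, 86 + 79 + 68.
cumulative = per_day.cumsum()
print(cumulative.tolist())  # [86, 165, 233]
```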

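```python
import pandas as pd
import numpy as np

# Draw 20 weights that sum to one (size=1 gives a 1x20 array), scale them to
# roughly 1500 miles, round each day, then take the running total.
# cumsum() with no axis flattens the 1x20 array into 20 cumulative values.
s = pd.Series((np.random.dirichlet(np.ones(20) * 100, size=1) * 1500).round().cumsum(), dtype='int32')
print(s)
```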

```
0       86
1      165
2      233
3      308
4      383
5      448
6      528
7      604
8      678
9      766
10     845
11     915
12     983
13    1046
14    1117
15    1196
16    1275
17    1354
18    1425
19    1499
dtype: int32
```


To convert a cumulative series back into per-day mileage, we need to compute the difference between each value and the one before it. For example, in the output above `s[2]` is `233` and `s[3]` is `308`, so their difference, `75`, is how far we rode on the fourth day (keeping in mind that the index of `s` is zero-based). Luckily, pandas makes this easy by providing the `diff` method for series.
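To make the behaviour of `diff` concrete, the sketch below applies it to the first four cumulative totals from the output above. It also checks that `diff()` gives the same result as subtracting a copy of the series shifted down by one position, which is what the default `periods=1` means. The variable name `totals` is just for illustration and avoids overwriting the notebook's `s`.

```python
import pandas as pd

# The first four cumulative totals from the output above.
totals = pd.Series([86, 165, 233, 308])

# diff() subtracts the previous element from each element; the first
# element has no predecessor, so it becomes NaN.
d = totals.diff()
print(d.tolist())  # [nan, 79.0, 68.0, 75.0]

# Equivalent formulation: subtract the series shifted down by one.
assert d.equals(totals - totals.shift(1))
```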

```python
# Subtract the previous cumulative total from each one.
d = s.diff()
print(d)
```

```
0      NaN
1     79.0
2     68.0
3     75.0
4     75.0
5     65.0
6     80.0
7     76.0
8     74.0
9     88.0
10    79.0
11    70.0
12    68.0
13    63.0
14    71.0
15    79.0
16    79.0
17    79.0
18    71.0
19    74.0
dtype: float64
```


As you can see, this one operation takes care of almost all of the work. Unfortunately, it leaves a `NaN` for the first day instead of the correct mileage, since there is no previous day to subtract. The `NaN` is also why the result has the `float64` dtype.

There are a couple of ways to solve this. We could have prepended a `0` to our input series, so that the `diff` result would hold the first day's mileage at index `1`. However, there would still be a `NaN` at the beginning of the result, and every day would be shifted one position later in the index.
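The prepend-a-zero alternative can be sketched as follows. It uses the first four cumulative totals from the output above and `pd.concat` to add the leading `0`. The result still starts with `NaN`. Recovering the per-day values then takes an extra slice and index reset, which is the extra bookkeeping the approach below avoids.

```python
import pandas as pd

totals = pd.Series([86, 165, 233, 308])

# Prepend a 0 so the first day's mileage also becomes a difference.
padded = pd.concat([pd.Series([0]), totals], ignore_index=True)
padded_diff = padded.diff()
print(padded_diff.tolist())  # [nan, 86.0, 79.0, 68.0, 75.0]

# The leading NaN is still there, so it must be sliced off and the index
# reset before the values line up with day numbers again.
per_day = padded_diff.iloc[1:].reset_index(drop=True).astype("int32")
print(per_day.tolist())  # [86, 79, 68, 75]
```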

Instead, we'll simply replace the `NaN` in the result above with the first value of the input series, since the per-day mileage and the cumulative mileage are guaranteed to be the same on the first day.

Now that there are no `NaN`s in the series, we can also convert it to integers.
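The same fix can also be written as one chained expression. The sketch below is an alternative to the assignment used in this notebook. It relies on `Series.fillna` accepting a Series as its `value`, which fills missing entries by matching index labels. Only the `NaN` at label `0` is replaced with the first cumulative total. It is shown on the first four totals from the earlier output.

```python
import pandas as pd

totals = pd.Series([86, 165, 233, 308], dtype="int32")

# fillna(totals) fills each NaN with the value at the same index label in
# totals; only label 0 is NaN after diff(), so only it is replaced.
per_day = totals.diff().fillna(totals).astype("int32")
print(per_day.tolist())  # [86, 79, 68, 75]
```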

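```python
# On the first day, the per-day mileage equals the cumulative total.
d[0] = s[0]
# With the NaN gone, the values can be stored as integers again.
d = d.astype('int32')
print(d)
```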

```
0     86
1     79
2     68
3     75
4     75
5     65
6     80
7     76
8     74
9     88
10    79
11    70
12    68
13    63
14    71
15    79
16    79
17    79
18    71
19    74
dtype: int32
```


That single assignment fixes the first-day value, and the per-day mileage series is complete.
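A quick sanity check is that summing the per-day values back up reproduces the cumulative totals, since `cumsum` undoes `diff` once the first value is restored. The sketch below runs that round trip on the first four cumulative totals from the earlier output. It compares plain lists so that differences in integer width between the two results don't matter.

```python
import pandas as pd

totals = pd.Series([86, 165, 233, 308], dtype="int32")

# Same steps as in the notebook: diff, restore day one, cast to integers.
per_day = totals.diff()
per_day[0] = totals[0]
per_day = per_day.astype("int32")

# Summing the per-day values back up should reproduce the cumulative series.
assert per_day.cumsum().tolist() == totals.tolist()
print(per_day.sum())  # 308, the final cumulative total
```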