## Here I try to analyse how bike traffic behaves depending on the day of the week, especially on Friday.

In [1]:
from download import download
import pandas as pd
import datetime
from datetime import date
import os

I use functions defined in `preprocess.py`.

In [2]:
import load_data

from preprocess import totem_first_cleaning
from preprocess import from_select_date
from preprocess import drop_hour_gap
from preprocess import resamp_interp
from preprocess import only_at
from preprocess import pick_week_days

Downloading data from https://doc-0k-7k-sheets.googleusercontent.com/pub/l5l039s6ni5uumqbsj9o11lmdc/g4legju6hplrat89hrg22lpkes/1617289555000/108937725768799295374/*/e@2PACX-1vQVtdpXMHB4g9h75a0jw8CsrqSuQmP5eMIB2adpKR5hkRggwMwzFy5kB-AIThodhVHNLxlZYm8fuoWj?gid=2105854808&single=true&output=csv (1 byte)

file_sizes: 50.4kB [00:00, 1.57MB/s]                                            
Successfully downloaded file to ./Data/SaisiesTotem.csv


In [3]:
df_totem = load_data.Load_totemdata().save_as_df()

Downloading data from https://doc-0k-7k-sheets.googleusercontent.com/pub/l5l039s6ni5uumqbsj9o11lmdc/g5sil8ioukjknpratgtv0fdnt8/1617289605000/108937725768799295374/*/e@2PACX-1vQVtdpXMHB4g9h75a0jw8CsrqSuQmP5eMIB2adpKR5hkRggwMwzFy5kB-AIThodhVHNLxlZYm8fuoWj?gid=2105854808&single=true&output=csv (1 byte)

file_sizes: 50.4kB [00:00, 1.26MB/s]                                            
Successfully downloaded file to ./Data/SaisiesTotem.csv


In [4]:
df_totem = totem_first_cleaning(df_totem)
# select the March data, and the February and March data for test purposes
march = from_select_date(df_totem, 2021, 3, 1)
feb_march = from_select_date(df_totem, 2021, 2, 1)
# apply drop_hour_gap to march and february datasets
march = drop_hour_gap(march, 12)
feb_march = drop_hour_gap(feb_march, 12)
# apply resamp_interp
march_minutes = resamp_interp(march)
feb_march_minutes = resamp_interp(feb_march)
# select only the rows corresponding to the time 09:00
# for march
march_at_nine = only_at(march_minutes, 9)
# for february and march
feb_march_at_nine = only_at(feb_march_minutes, 9)
# set index for pick_week_days
march_at_nine.set_index('Date', inplace=True)
feb_march_at_nine.set_index('Date', inplace=True)
# apply pick_week_days to march_at_nine and to feb_march_at_nine
mar_week_at_nine = pick_week_days(march_at_nine)
febmar_week_at_nine = pick_week_days(feb_march_at_nine)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dftemp["Today's total"][i] = 0.0
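
The warning above is what pandas emits for a chained assignment such as `dftemp["Today's total"][i] = 0.0`. Below is a minimal sketch of the usual way around it, assuming `i` is a row label of `dftemp`. The frame here is a made-up stand-in, not the one built in `preprocess.py`.

```python
import pandas as pd

# Hypothetical stand-in for the frame handled in preprocess.py.
dftemp = pd.DataFrame({"Today's total": [12.0, 30.0, 7.0]})

# A single .loc call selects the row and the column together, so the
# assignment is applied to dftemp itself and not to a temporary object.
i = 1  # hypothetical row label
dftemp.loc[i, "Today's total"] = 0.0

print(dftemp["Today's total"].tolist())
```

If `i` were a position and not a label, `dftemp.iloc[i, dftemp.columns.get_loc("Today's total")]` would be the matching form.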


Saving `mar_week_at_nine` for weather-related analysis. (See `bad_pred_with_weather.ipynb`)

In [None]:
path = "./Data/"
mar_week_at_nine.to_csv(os.path.join(path, r'marchweekatnine.csv'))

## Plotting part

In [5]:
import plotly.express as px

In [6]:
fig = px.bar(mar_week_at_nine, y="Today's total")
fig.show()

The mean seems to remain stable, or maybe to increase slowly over the days... Let's investigate that.

Let's remove the zero values and store the result in a new dataframe called `March`.

In [61]:
March = mar_week_at_nine.loc[(mar_week_at_nine!=0).any(axis=1)]
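
To make the filter concrete, here is a small sketch on made-up numbers (the frame `toy` is hypothetical). The mask `(df != 0).any(axis=1)` keeps a row as soon as at least one of its columns is non-zero. With a single column, that amounts to dropping the rows whose total is zero.

```python
import pandas as pd

# Hypothetical data: two real counts and two zero placeholders.
toy = pd.DataFrame(
    {"Today's total": [249.3, 0.0, 411.5, 0.0]},
    index=pd.to_datetime(
        ["2021-03-04 09:00", "2021-03-11 09:00",
         "2021-03-05 09:00", "2021-03-15 09:00"]
    ),
)

# Keep rows where any column differs from zero.
non_zero = toy.loc[(toy != 0).any(axis=1)]

print(non_zero["Today's total"].tolist())
```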

Let's look at the mean value of `March`, then the mean value of `March[0:-1]`, then the mean value of `March[0:-2]`, ... and so on.

In [53]:
marchmean = [March["Today's total"].mean()]
for i in range(1, len(March), 1):
    marchmean.append(March["Today's total"][0:-i].mean())

marchmean

[333.99121966950656,
 329.73620661313646,
 328.2331743779559,
 328.00465594549325,
 327.46535531035164,
 331.02336568877837,
 330.70967271145764,
 333.3509102528737,
 333.286394437899,
 348.8096134428505,
 336.74083580460496,
 322.35991421397256,
 339.9060283308983,
 344.9993386923114,
 338.79475317705015,
 330.393915479861,
 249.3201438848921]

The first entry of `marchmean` is the mean over every value in `March`. The second entry is the same mean **except** for the most recent day ("yesterday"), the third is the mean **except** for yesterday **and** the day before yesterday, the fourth... well, you get the idea.
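
The same list can be obtained from a single running sum, without recomputing a mean for every truncation. This is only a sketch on made-up numbers: `totals` stands in for the `"Today's total"` column and is not the real data.

```python
from itertools import accumulate

# Hypothetical daily totals, oldest first.
totals = [10.0, 20.0, 60.0, 50.0]

# Running sums: sum of the first 1, 2, 3, ... values.
running_sums = list(accumulate(totals))

# Mean of the first n values, for n = 1 .. len(totals).
prefix_means = [s / n for n, s in enumerate(running_sums, start=1)]

# Reverse so that the first entry uses all days, the second drops the
# most recent day, the third drops the two most recent days, and so on.
truncated_means = prefix_means[::-1]

print(truncated_means)  # [35.0, 30.0, 15.0, 10.0]
```

The last entries rest on very few values, which is why they swing so much more than the first ones.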

So looking at `marchmean`, we can see that these means do indeed seem to have been slowly increasing over the past few days. (We shouldn't look further back than about 10 days, because too few values are left to compute the means.) Over these past days, we have $327.5$, $328.0$, $328.2$, $329.7$, $334$.

Does it seem reasonable to assume that tomorrow's mean shouldn't be too far from today's? Say, "today's mean **plus one**"?

For this to be true, we simply need to solve an equation in a single unknown. Namely, if $m$ is "today's mean plus one", $x$ is tomorrow's value, $l =$ `len(March)` $+ 1$, and $S$ is the sum of all the values in `March`, then
$$
\frac{S + x}{l} = m\ \Longleftrightarrow\ x = (m \times l) - S
$$

We know $m$ and we can easily compute $S$ and $l$.
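
As a sanity check of the formula, here is a small sketch on made-up numbers. The helper `next_value_for_target_mean` and the list `past` are hypothetical names used only for this illustration.

```python
def next_value_for_target_mean(values, target_mean):
    """Return x such that mean(values + [x]) equals target_mean."""
    total = sum(values)           # S in the formula
    new_length = len(values) + 1  # l in the formula
    return target_mean * new_length - total

# Hypothetical past values whose mean is 320.
past = [300.0, 320.0, 340.0]

# Value needed tomorrow for the mean to move from 320 to 321.
x = next_value_for_target_mean(past, 321.0)
print(x)  # 324.0

# Check: appending x gives the target mean.
print((sum(past) + x) / (len(past) + 1))  # 321.0
```

Since $x = (m \times l) - S$, changing $m$ by one shifts $x$ by exactly $l$. One extra unit of mean therefore demands a much larger jump in the next value.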

In [63]:
m = 334 + 1
S = March["Today's total"].sum()
l = len(March) + 1

In [66]:
x = (m*l) - S
x

352.1492656183882

Let's try again with $m = 334$:

In [72]:
x = ((m-1) * l) - S
x

334.1492656183882

And with $m = 333$:

In [73]:
x = ((m-2) * l) - S
x

316.1492656183882

In [30]:
March

Date,Today's total
2021-03-04 09:00:00,249.320144
2021-03-05 09:00:00,411.467687
2021-03-08 09:00:00,355.596429
2021-03-09 09:00:00,363.613095
2021-03-10 09:00:00,319.532787
2021-03-12 09:00:00,234.629344
2021-03-17 09:00:00,423.026365
2021-03-18 09:00:00,433.291057
2021-03-19 09:00:00,209.100642
2021-03-23 09:00:00,333.931553


## Below are fruitless attempts that I chose to leave in anyway...

### Select data from each week day separately

In [7]:
# monday = 0, friday = 4
mon_at_nine = mar_week_at_nine.loc[mar_week_at_nine.index.dayofweek == 0]
tue_at_nine = mar_week_at_nine.loc[mar_week_at_nine.index.dayofweek == 1]
wed_at_nine = mar_week_at_nine.loc[mar_week_at_nine.index.dayofweek == 2]
thu_at_nine = mar_week_at_nine.loc[mar_week_at_nine.index.dayofweek == 3]
fri_at_nine = mar_week_at_nine.loc[mar_week_at_nine.index.dayofweek == 4]
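
The five selections above can also be written as one `groupby` on the weekday of the index. This is a sketch on a made-up frame (`toy` is hypothetical), assuming a `DatetimeIndex` like the one set on `mar_week_at_nine`.

```python
import pandas as pd

# Hypothetical 09:00 totals; 2021-03-08 is a Monday.
idx = pd.to_datetime(
    ["2021-03-08 09:00", "2021-03-09 09:00", "2021-03-12 09:00",
     "2021-03-15 09:00", "2021-03-19 09:00"]
)
toy = pd.DataFrame(
    {"Today's total": [355.0, 363.0, 234.0, 0.0, 209.0]}, index=idx
)

# One sub-frame per weekday number (Monday = 0, ..., Friday = 4).
by_day = {day: group for day, group in toy.groupby(toy.index.dayofweek)}

print(sorted(by_day))                        # weekdays present in toy
print(by_day[4]["Today's total"].tolist())   # the Friday rows
```

Weekdays with no rows simply do not appear as keys, so a lookup such as `by_day.get(2)` returns `None` for this toy frame.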

In [8]:
mon_at_nine

Date,Today's total
2021-03-08 09:00:00,355.596429
2021-03-15 09:00:00,0.0
2021-03-22 09:00:00,0.0
2021-03-29 09:00:00,335.015564


In [9]:
tue_at_nine

Date,Today's total
2021-03-09 09:00:00,363.613095
2021-03-16 09:00:00,0.0
2021-03-23 09:00:00,333.931553
2021-03-30 09:00:00,331.432432


In [10]:
wed_at_nine

Date,Today's total
2021-03-10 09:00:00,319.532787
2021-03-17 09:00:00,423.026365
2021-03-24 09:00:00,304.297297
2021-03-31 09:00:00,352.28169


In [11]:
thu_at_nine

Date,Today's total
2021-03-04 09:00:00,249.320144
2021-03-11 09:00:00,0.0
2021-03-18 09:00:00,433.291057
2021-03-25 09:00:00,334.473988
2021-04-01 09:00:00,402.071429


In [13]:
402 - 334

68

In [14]:
fri_at_nine

Date,Today's total
2021-03-05 09:00:00,411.467687
2021-03-12 09:00:00,234.629344
2021-03-19 09:00:00,209.100642
2021-03-26 09:00:00,284.769231


In [15]:
fig_wed = px.bar(wed_at_nine, y="Today's total")
fig_wed.show()

In [16]:
fig_thu = px.bar(thu_at_nine, y="Today's total")
fig_thu.show()

In [17]:
fig_fri = px.bar(fri_at_nine, y="Today's total")
fig_fri.show()