### Data Source: https://www.kaggle.com/datasets/city-of-seattle/seattle-fremont-bridge-hourly-bicycle-counts


### Visualizing Seattle Bicycle Counts

**Objectives:**\
1. This notebook is an example of working with time series data.\
2. We'll look at the hourly bicycle counts on Seattle's Fremont Bridge. The data can also be downloaded from http://data.seattle.gov/.\
This data comes from an automated bicycle counter, installed in late 2012, which has inductive sensors on the east and west sidewalks of the bridge.


1. Downloading the dataset


In [None]:
# uncomment and run the cell
# !curl -o FremontBridge.csv https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD

In [None]:
# standard imports
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_theme()

In [None]:
# Path to file
path = '/home/nyangweso/Desktop/Ds_1/Machine-Learning-Projects/Seattle_Bicycle_Counts/FremontBridge.csv'

In [None]:
data = pd.read_csv(path, index_col="Date", parse_dates=True)
# read the CSV file and display the first 5 rows
data.head()
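
The effect of `index_col="Date"` and `parse_dates=True` in the `read_csv` call above can be checked on a tiny in-memory file. This is only a sketch: the rows and the short column names below are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical three-row CSV (invented values, not real bridge counts).
csv_text = """Date,Total,East,West
2012-10-03 00:00:00,3,1,2
2012-10-03 01:00:00,5,2,3
2012-10-03 02:00:00,0,0,0
"""

# index_col picks the column used as the index;
# parse_dates=True asks pandas to parse that index as dates.
toy = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(toy.index).__name__)
print(toy.loc["2012-10-03 01:00:00", "Total"])
```

With a datetime index in place, the time-based operations used later in the notebook (`resample`, `index.time`, `index.dayofweek`) become available.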

In [None]:
# renaming columns for convenience
data.columns = ["Total", "East", "West"]

2. Rearranging columns


In [None]:
cols = list(data.columns)
data = data[cols[1:3] + [cols[0]]]
data.head()

3. Get summary statistics


In [None]:
data.dropna().describe()

#### Visualizing the data

Let's visualize the dataset to gain some insight.


4. Plotting raw data


In [None]:
data.plot()
plt.ylabel("Hourly bicycle count")

5. Resampling the data\
   <br>
   The hourly data is far too dense to make sense of at a glance.\
   We can get more insight by resampling the data to a coarser grid, for example by week.
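
To see what resampling to a coarser grid does, here is a minimal sketch on synthetic data. The frame `toy` and its constant count of 2 riders per hour are assumptions made for illustration; only the `resample("W").sum()` call mirrors the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the bridge data: four full weeks of hourly counts,
# starting on Monday 2024-01-01. The value 2 is made up.
idx = pd.date_range("2024-01-01", periods=4 * 7 * 24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"Total": np.full(len(idx), 2)}, index=idx)

# "W" groups the rows into weekly bins (ending on Sunday) and sums each bin.
weekly = toy.resample("W").sum()

print(len(toy), "hourly rows ->", len(weekly), "weekly rows")
print(weekly)
```

Each weekly row is the sum of the 168 hourly rows that fall in that week, so 672 points collapse into 4.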


In [None]:
weekly = data.resample("W").sum()
weekly.plot(style=[":", "--", "-"])
plt.ylabel("Weekly bicycle count")

Another handy way to aggregate the data is a rolling window, using `rolling()`. Here we compute a 30-day rolling sum of the daily counts, centered on each day.
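
The behaviour of a centered rolling window is easier to see on a short series. The sketch below uses made-up values 1 through 7 and a window of 3 instead of 30, so the numbers can be checked by hand.

```python
import numpy as np
import pandas as pd

# Seven days of made-up daily counts: 1, 2, ..., 7.
idx = pd.date_range("2024-01-01", periods=7, freq=pd.Timedelta(days=1))
daily_toy = pd.Series(np.arange(1, 8), index=idx)

# center=True labels each window by its middle day, so the first and last
# entries have no complete window and come out as NaN.
rolled_sum = daily_toy.rolling(3, center=True).sum()
rolled_mean = daily_toy.rolling(3, center=True).mean()

print(pd.DataFrame({"count": daily_toy, "sum3": rolled_sum, "mean3": rolled_mean}))
```

Note that `sum()` and `mean()` differ only by a constant factor (the window length), so they produce the same curve shape on different scales.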


In [None]:
daily = data.resample("D").sum()
daily.rolling(30, center=True).sum().plot(style=[":", "--", "-"])
plt.ylabel("30-day rolling bicycle count")

We can get a smoother version of the result by using a window function, e.g. a Gaussian window.\
We specify both the width of the window (50 days here) and the width of the Gaussian within the window (a standard deviation of 10 days here).
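
The two widths can be made concrete by building the weights by hand. This is a sketch under the assumption that the Gaussian window follows the usual definition, `exp(-0.5 * (n / std) ** 2)` centered on the middle of the window; the helper `gaussian_weights` is hypothetical and written only for this illustration.

```python
import numpy as np
import pandas as pd


def gaussian_weights(width, std):
    """Hypothetical helper: Gaussian weights for a window of `width` points."""
    n = np.arange(width) - (width - 1) / 2.0  # offsets from the window center
    return np.exp(-0.5 * (n / std) ** 2)


weights = gaussian_weights(50, 10)

# Apply the weights as a centered rolling weighted sum on a constant series,
# so every complete window simply returns the sum of the weights.
ones = pd.Series(np.ones(200))
smoothed = ones.rolling(50, center=True).apply(
    lambda window: np.dot(window, weights), raw=True
)

print(weights.max(), weights.sum())
print(smoothed.dropna().iloc[0])
```

Because every weight is at most 1, the Gaussian-weighted sum is smaller than a plain 50-day sum; points near the window center count almost fully, while points near the edges count very little.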


In [None]:
daily.rolling(50, center=True, win_type="gaussian").sum(std=10).plot(
    style=[":", "--", "-"]
)

#### Digging into the data

6. Average traffic as a function of the time of day
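
Grouping by time of day can be sketched on toy data. Here the made-up count simply equals the hour of the day, so the expected averages are obvious. The sketch also shows that tick positions of the form `4 * 60 * 60 * np.arange(6)` are seconds since midnight, i.e. one tick every four hours.

```python
import datetime

import numpy as np
import pandas as pd

# Two days of hourly data; the made-up count equals the hour of the day.
idx = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"Total": idx.hour.to_numpy()}, index=idx)

# index.time drops the date, so rows from different days that share
# the same clock time land in the same group.
by_time_toy = toy.groupby(toy.index.time).mean()

# Tick positions in seconds since midnight, converted back to hours.
tick_seconds = 4 * 60 * 60 * np.arange(6)
tick_hours = (tick_seconds // 3600).tolist()

print(by_time_toy.loc[datetime.time(8), "Total"])
print(tick_hours)
```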


In [None]:
by_time = data.groupby(data.index.time).mean()
hourly_ticks = 4 * 60 * 60 * np.arange(6)
by_time.plot(xticks=hourly_ticks, style=[":", "--", "-"])
#   This shows the mean traffic for each hour of the day, from 12 a.m. to 11 p.m.

We can see that traffic peaks around 08:00 and 17:00.\
Next, let's see how traffic varies by day of the week.


In [None]:
by_weekday = data.groupby(data.index.dayofweek).mean()
by_weekday.index = ["Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun"]
by_weekday.plot(style=[":", "--", "-"])

This shows a strong distinction between weekday and weekend totals, with around\
twice as many average riders crossing the bridge on Monday through Friday as on\
Saturday and Sunday.


With this in mind, let's get our hands dirty.\
First let us look at the hourly trend on weekdays vs weekends.
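
The two-level grouping behind this comparison can be sketched on one synthetic week. The counts (10 per hour on weekdays, 4 per hour on weekends) are invented purely so the result is easy to verify.

```python
import numpy as np
import pandas as pd

# One full week of hourly data, Monday 2024-01-01 through Sunday 2024-01-07.
idx = pd.date_range("2024-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1))
is_weekday = idx.weekday < 5  # Monday=0 ... Sunday=6

# Made-up counts: 10 riders per hour on weekdays, 4 on weekends.
toy = pd.DataFrame({"Total": np.where(is_weekday, 10, 4)}, index=idx)

# Label every row, then group by (label, time of day).
day_type = np.where(is_weekday, "Weekday", "Weekend")
by_type_time = toy.groupby([day_type, toy.index.time]).mean()

# The first index level selects one of the two 24-row hourly profiles.
weekday_profile = by_type_time.loc["Weekday"]
weekend_profile = by_type_time.loc["Weekend"]

print(weekday_profile.head(3))
print(weekend_profile.head(3))
```

The result has a two-level index, which is why `.loc["Weekday"]` and `.loc["Weekend"]` each return a separate hourly profile that can be plotted on its own axes.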


In [None]:
weekend = np.where(data.index.weekday < 5, "Weekday", "Weekend")
by_time = data.groupby([weekend, data.index.time]).mean()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
by_time.loc["Weekday"].plot(
    ax=ax[0], title="Weekdays", xticks=hourly_ticks, style=[":", "--", "-"]
)
by_time.loc["Weekend"].plot(
    ax=ax[1], title="Weekends", xticks=hourly_ticks, style=[":", "--", "-"]
)

### Conclusion

The weekday and weekend profiles differ clearly:

- During the work week, traffic follows a bimodal commute pattern.
- On weekends, traffic follows a unimodal recreational pattern.


### References

- https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD
- https://www.kaggle.com/datasets/city-of-seattle/seattle-fremont-bridge-hourly-bicycle-counts
