In [1]:
import pandas as pd
import Builder

from sklearn.dummy import DummyRegressor

### Read in Data

This is a fairly common dataset, here sourced from [data.world](https://data.world/data-society/capital-bikeshare-2011-2012), but also available from the UCI Machine Learning Repository. It details hundreds of thousands of Capital Bikeshare rides from 2011 and 2012, aggregated by hour, with features for temporal factors, weather conditions, and registered versus casual users.

In [2]:
df = pd.read_csv('Data/bike_data.csv')

## Data Cleaning

The Date column is read in as an object dtype, so we convert it to a datetime with pandas and derive a new Month column from it.

In [3]:
df['Date'] = pd.to_datetime(df['Date'])  # dtype('<M8[ns]')
df['Month'] = df['Date'].dt.month

### Remove Correlated Features

We'll be predicting Total Users, so let's drop the other user columns: they'll be heavily correlated with Total Users and would leak the target into the features. We'll also remove Temperature F in favor of the "feels like" temperature.

In [4]:
df.drop(['Casual Users', 'Registered Users', 'Temperature F'], axis=1, inplace=True)
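
One way to check how strongly a column tracks the target before dropping it is a correlation matrix. The sketch below is illustrative only: it uses a small made-up frame (the values are hypothetical, not taken from the dataset) that reuses the notebook's column names and assumes Total Users is the sum of the two user columns.

```python
import pandas as pd

# Toy data with hypothetical values; only the column names mirror the dataset.
toy = pd.DataFrame({
    'Casual Users': [3, 8, 5, 12, 20, 7],
    'Registered Users': [13, 32, 27, 40, 61, 30],
})
# Assumption for this sketch: the target is the sum of the two user columns.
toy['Total Users'] = toy['Casual Users'] + toy['Registered Users']

# Pearson correlation of each remaining column with the target.
corr_with_target = toy.corr()['Total Users'].drop('Total Users')
print(corr_with_target)
```

On the real data, the same `corr()` call on `df` (run before the drop) would show whether the user columns are as tightly tied to the target as expected.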

## Modeling

In [5]:
X = df.drop('Total Users', axis=1)
y = df['Total Users']

### Dummy Regressor

In [6]:
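dummy_regressor = DummyRegressor(strategy='mean')
dummy_regressor.fit(X, y)
# Every prediction is the training mean of y (the result is not used further here).
dummy_regressor.predict(X)
# For regressors, score() returns R^2, not accuracy.
print("Dummy Model R^2: ", dummy_regressor.score(X, y))

Dummy Model R^2:  0.0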
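
A score of 0.0 is what a mean-strategy baseline should produce on its own training data: for regressors, scikit-learn's `score` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, and predicting the mean makes SS_res equal to SS_tot. The following is a minimal sketch of that arithmetic in plain Python; the `r_squared` helper and the ride counts are hypothetical and used for illustration only.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical hourly ride counts, not taken from the dataset.
y_toy = [16, 40, 32, 13, 1, 110, 88]

# A mean-strategy baseline predicts the training mean for every row.
baseline = [sum(y_toy) / len(y_toy)] * len(y_toy)

print(r_squared(y_toy, baseline))  # 0.0
print(r_squared(y_toy, y_toy))     # 1.0 for perfect predictions
```

Any model worth keeping should score above this baseline, ideally on held-out data rather than on the rows it was fit to.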
