# Bike share demand

## Predicting demand
<b>Time series data</b> is collected over an extended time period and is affected by changes over time. Ex: Weather models use recent temperatures (time series data) to predict the daily high and low temperature. Making predictions over time comes with certain challenges.

<b>Seasonal effects</b> describe patterns that repeat over time. Ex: Consumer demand for toys increases near Christmas. Seasonal patterns are usually nonlinear, so simple linear regression models often fit time series data poorly.
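
A minimal sketch of why a repeating pattern is hard for a straight line to capture. The data here is synthetic (a two-year sine wave plus noise standing in for daily demand) and is not drawn from the bike share dataset; the names `day` and `demand` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic daily demand over two years: a yearly cycle plus random noise
rng = np.random.default_rng(0)
day = np.arange(730)
demand = 100 + 50 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 5, size=day.size)

# Straight-line fit on the raw day index
X_linear = day.reshape(-1, 1)
r2_linear = LinearRegression().fit(X_linear, demand).score(X_linear, demand)

# Same linear model, but with sine/cosine features that encode the yearly cycle
X_seasonal = np.column_stack([
    np.sin(2 * np.pi * day / 365),
    np.cos(2 * np.pi * day / 365),
])
r2_seasonal = LinearRegression().fit(X_seasonal, demand).score(X_seasonal, demand)

print(f"R^2 with day index only: {r2_linear:.2f}")
print(f"R^2 with seasonal features: {r2_seasonal:.2f}")
```

The straight-line fit explains only a small share of the variance, while the same model with cyclic features explains most of it. A multilayer perceptron is one way to learn such nonlinear structure without hand-crafting the features.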

Bike share companies allow customers to rent a bike for a short time period and return the bike to another location. Bike shares are often used by college students, commuters, and tourists. Predicting daily demand helps a bike share company ensure that bikes are available to all customers.

## Predicting total demand based on weather conditions
The bike share dataset contains two types of customers: casual and registered. Registered customers pay a monthly subscription fee to use the bike share services, while casual customers are one-time users. The total number of daily customers is the number of registered customers who use a bike that day plus the number of casual customers. A model that successfully predicts total demand helps the bike share company determine how many bikes must be available for customers on a given day.
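
The relationship between the three customer counts can be checked directly. The sketch below uses three rows copied from the notebook's sample output rather than the full file; on the full dataset, the same comparison could be run on the `bikes` data frame.

```python
import pandas as pd

# Three rows taken from the sample output of bike_share_day.csv
rows = pd.DataFrame({
    'casual': [246, 663, 1353],
    'registered': [3241, 4746, 2921],
    'total': [3487, 5409, 4274],
})

# total should equal casual + registered on every row
matches = (rows['casual'] + rows['registered']) == rows['total']
print(matches.all())
```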

A multilayer perceptron for predicting total demand is fit using five input features: temperature, humidity, windspeed, working day, and season. The data is split so that 70% is used for training and the remaining 30% is held out for testing.

In [1]:
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neural_network import MLPRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

In [2]:
# Load the bike_share_day.csv dataset and display a random sample of 10 rows
bikes = pd.read_csv('bike_share_day.csv')
bikes.sample(10)

Unnamed: 0,season,yr,month,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,total
419,1,2,2,0,5,1,2,52,52,0.7375,16,246,3241,3487
467,2,2,4,0,4,1,1,51,49,0.46625,19,663,4746,5409
260,3,1,9,0,0,0,1,61,61,0.695,12,1353,2921,4274
80,2,1,3,0,2,1,1,55,56,0.624583,15,460,2243,2703
240,3,1,8,0,1,1,1,71,75,0.554583,11,729,3905,4634
643,4,2,10,0,5,1,1,70,72,0.6275,7,1516,6640,8156
175,3,1,6,0,6,0,1,76,80,0.483333,14,1782,3420,5202
330,4,1,11,0,0,0,1,56,57,0.698333,14,810,2261,3071
547,3,2,7,0,0,0,1,87,92,0.51875,11,1421,4110,5531
109,2,1,4,0,3,1,1,68,70,0.614167,16,613,3331,3944


In [3]:
seed=123

In [4]:
# Convert categorical features (stored as integer codes) to strings
bikes['season'] = bikes['season'].astype(str)
bikes['month'] = bikes['month'].astype(str)
bikes['weekday'] = bikes['weekday'].astype(str)
bikes['weathersit'] = bikes['weathersit'].astype(str)

In [5]:
# Define input and output features
# Input features should be temp, hum, windspeed, workingday, season
# output feature should be total
X = bikes[['temp', 'hum', 'windspeed', 'workingday', 'season']]
y = bikes['total']

In [6]:
# Splits the data into training and test sets, with test size 30%
XTrain, XTest, yTrain, yTest = train_test_split(X, y, test_size=0.3, random_state=seed)

In [7]:
# Define and fit model with training data, with the max_iter=5000, tol=0.0001
mlpModelTrain = MLPRegressor(max_iter=5000, tol=0.0001)
mlpModelTrain = mlpModelTrain.fit(XTrain, yTrain)

In [8]:
# Get the final training loss for the model fit on the training data
mlpModelTrainLoss = mlpModelTrain.loss_
print(mlpModelTrainLoss)

952304.6455516237


In [9]:
# Define and fit a second model on the test data, with max_iter=5000, tol=0.0001
mlpModelTest = MLPRegressor(max_iter=5000, tol=0.0001)
mlpModelTest = mlpModelTest.fit(XTest, yTest)

In [10]:
# Get the final training loss for the model fit on the test data
mlpModelTestLoss = mlpModelTest.loss_
print(mlpModelTestLoss)
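
Note that `loss_` reports the final training loss on whatever data was passed to `fit`, so the two values above describe two separately trained models. A common alternative is to fit once on the training split and score that same model on the held-out split. The sketch below illustrates this on hypothetical synthetic data (`X_demo`, `y_demo` are stand-ins, not the bike share features), using mean squared error as an assumed metric.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical synthetic data standing in for the bike share features
rng = np.random.default_rng(123)
X_demo = rng.uniform(0, 1, size=(300, 3))
y_demo = 10 * X_demo[:, 0] - 5 * X_demo[:, 1] + rng.normal(0, 0.5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=123
)

# Fit a single model on the training split only
model = MLPRegressor(max_iter=5000, tol=0.0001, random_state=123)
model.fit(X_tr, y_tr)

# Score the same fitted model on both splits
train_mse = mean_squared_error(y_tr, model.predict(X_tr))
test_mse = mean_squared_error(y_te, model.predict(X_te))
print(f"Training MSE: {train_mse:.3f}")
print(f"Held-out MSE: {test_mse:.3f}")
```

A held-out error much larger than the training error would suggest overfitting; similar values suggest the model generalizes to unseen days.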

## Predicting high demand from casual customers
Demand from registered customers is easier for a company to plan for: registered customers pay a subscription fee, and the company collects revenue even if the customer does not use the service. Demand from casual customers is harder to predict, and if a casual customer wants to ride a bike and none are available, the company does not collect any revenue. Understanding and predicting which days will have higher demand from casual customers helps the bike share company make sure bikes are available when needed.

Days with more than 1,500 casual customers are classified as high demand. Three multilayer perceptron classification models are fit to predict high-demand days using different activation functions: linear, sigmoid, and ReLU.
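
As a quick reference, the three activation functions can be written out in NumPy. This is an illustrative sketch, not scikit-learn's internal code; in `MLPClassifier`, the corresponding `activation` values are `'identity'`, `'logistic'`, and `'relu'`, and the setting applies to the hidden layers.

```python
import numpy as np

def identity(z):
    """Linear activation: returns the input unchanged."""
    return z

def logistic(z):
    """Sigmoid activation: squashes values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU activation: zero for negative inputs, unchanged otherwise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(identity(z))
print(logistic(z))
print(relu(z))
```

Because a composition of linear functions is itself linear, a network using only the identity activation can represent only a linear decision boundary, while the sigmoid and ReLU variants can model nonlinear ones.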

In [11]:
# Set days with more than 1500 casual users as "high demand"
demand = bikes['casual'] > 1500
bikes.insert(14, 'high_demand', demand.astype(int))
bikes.sample(10)

Unnamed: 0,season,yr,month,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,total,high_demand
45,1,1,2,0,2,1,1,40,33,0.314348,20,140,1675,1815,0
547,3,2,7,0,0,0,1,87,92,0.51875,11,1421,4110,5531,0
133,2,1,5,0,6,0,2,62,63,0.9225,9,902,2507,3409,0
296,4,1,10,0,1,1,1,57,57,0.772083,8,699,3488,4187,0
563,3,2,7,0,2,1,1,87,93,0.505833,8,921,5865,6786,0
598,3,2,8,0,2,1,1,73,75,0.67375,5,1081,5925,7006,0
194,3,1,7,0,4,1,1,75,79,0.47625,16,888,4196,5084,0
554,3,2,7,0,0,0,1,87,97,0.57375,8,1203,3469,4672,0
201,3,1,7,0,4,1,2,87,101,0.69125,15,632,3152,3784,0
252,3,1,9,0,6,0,1,73,75,0.75375,10,1750,3595,5345,1


In [12]:
# Define input and output features
# Input features should be temp, holiday, hum, windspeed, workingday
# output feature should be high_demand
X = bikes[['temp', 'holiday', 'hum', 'windspeed', 'workingday']]
y = bikes['high_demand']

In [13]:
# Splits the data into training and test sets, with test size 30%
XTrain, XTest, yTrain, yTest = train_test_split(X, y, test_size=0.3, random_state=seed)

In [14]:
# Define and fit the linear activation model to the training data, with max_iter=1000
clfLinear = MLPClassifier(activation='identity', max_iter=1000)
clfLinear = clfLinear.fit(XTrain, yTrain)

In [15]:
# Print the final weights, biases, and loss
print(clfLinear.coefs_)
print(clfLinear.intercepts_)
print(clfLinear.loss_)

[array([[-1.65464897e-01, -1.90221417e-01, -1.94385392e-01,
        -7.43859888e-02, -6.83157678e-02, -2.04276722e-01,
         8.21955329e-02,  1.21185145e-01, -1.43410775e-01,
         4.68454468e-02,  3.16019859e-02, -1.29065875e-01,
        -9.15546586e-02, -2.06692059e-02,  1.67553052e-01,
         6.30953768e-02,  2.21775149e-01, -1.82244103e-01,
         1.22730681e-01,  1.03024237e-01, -1.84951380e-02,
         5.31640709e-02, -1.89864329e-01, -1.98741399e-01,
         4.74899423e-02, -7.87639514e-02,  5.24537449e-02,
         2.06795254e-03,  1.38995634e-01, -1.54295234e-01,
        -1.97254121e-01,  2.08305565e-01, -1.11203266e-02,
         1.60826367e-01, -1.69170550e-01, -9.17245080e-02,
         7.24687401e-02, -1.99784559e-01, -6.22708698e-03,
        -1.52146567e-02, -1.88388065e-01, -1.45915443e-01,
        -1.65100467e-01, -1.35325528e-01,  9.59467110e-02,
        -6.69473735e-03,  1.83750374e-01, -1.58583862e-01,
        -9.99631130e-02,  1.95263455e-01,  1.36959881e-