Creators: Aman Kumar Garg, Victor Cuspinera-Contreras, Yingping Qian
Here we build a regression machine learning model, using the Random Forest Regressor algorithm, that predicts the count of bike rentals from time and weather-related information. The main predictive research question is "Given the information shared by the bike share company, can we predict the count of bike rentals on an hourly basis in order to forecast the future demand for bike rentals?"
Our final model performed fairly well on an unseen test data set, with a mean squared error of 70.39 and a visually linear relationship between actual and predicted values. However, the spread of the predicted values grows as the actual count of bike rentals increases, which indicates that the model is less accurate when rental counts are high. We therefore recommend further study to improve the model.
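The two checks described above, a mean squared error and a comparison of prediction spread at low and high rental counts, can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code: the function names `mean_squared_error` and `residual_variance_by_bin`, the bin edges, and all numbers are assumptions made up for the example.

```python
from statistics import mean, pvariance


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def residual_variance_by_bin(actual, predicted, bin_edges):
    """Variance of residuals (actual - predicted) within ranges of the actual value.

    bin_edges is an ascending list; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Returns a list of (low, high, variance), with variance None for empty bins.
    """
    results = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        residuals = [a - p for a, p in zip(actual, predicted) if low <= a < high]
        variance = pvariance(residuals) if residuals else None
        results.append((low, high, variance))
    return results


# Invented numbers for illustration only; these are NOT results from this project.
actual = [10, 20, 30, 200, 300, 400]
predicted = [12, 18, 31, 180, 330, 370]

mse = mean_squared_error(actual, predicted)
spread = residual_variance_by_bin(actual, predicted, [0, 100, 500])

print(f"MSE: {mse:.2f}")
for low, high, variance in spread:
    print(f"actual in [{low}, {high}): residual variance = {variance:.2f}")
```

A residual variance that is clearly larger in the upper bin than in the lower one is the pattern the paragraph above describes: errors widen as the actual rental count grows.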
The dataset we use to build the machine learning model is the bike-sharing dataset from the UCI Machine Learning Repository. It contains both hourly and daily data on the number of bike rentals in Washington, DC between 2011 and 2012. We use the hourly dataset, which is more complete and has more observations than the daily dataset. The dataset has 1 target and 16 features, including both time and weather-related information for each hour of a specific day.
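As a rough sketch of how the hourly table divides into 1 target and 16 features, the snippet below separates a target column from the rest using only the standard library. The column names (including `cnt` as the target) are an assumption based on the public UCI hourly file and are not stated in this README, so they should be checked against the downloaded data. The single data row holds placeholder values for illustration, and `split_target` is a hypothetical helper.

```python
import csv
import io

# Assumed header of the hourly file; verify against the downloaded data.
# The data row below contains placeholder values for illustration only.
SAMPLE = (
    "instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,"
    "weathersit,temp,atemp,hum,windspeed,casual,registered,cnt\n"
    "1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.29,0.81,0.0,3,13,16\n"
)

TARGET = "cnt"  # assumed name of the hourly rental count column


def split_target(rows, target):
    """Split a list of row dicts into (feature dicts, list of integer targets)."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [int(row[target]) for row in rows]
    return features, targets


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
features, targets = split_target(rows, TARGET)
feature_names = list(features[0].keys())

print(f"{len(feature_names)} feature columns, target column: {TARGET}")
```

Which of the non-target columns are actually passed to the model is a separate modelling decision that this sketch does not address.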
The dataset was created by Dr. Hadi Fanaee-T (Fanaee-T 2013) of the Laboratory of Artificial Intelligence and Decision Support (LIAAD). It is hosted at the UCI Machine Learning Repository (UCI Machine Learning Repository 2017) and can be found here.
The final report can be found here.
There are two suggested ways to run this analysis:
1. Using Docker
Note: the instructions in this section also depend on being run in a Unix shell (e.g., terminal or Git Bash).
To replicate the analysis, please follow the instructions below:
- Install Docker
- Clone this GitHub repository
- Download the Docker image:
docker pull doraqmon/group_409_bike_sharing:v1.0
- Run the following command at the command line/terminal from the root directory of this project:
docker run --rm -v /$(pwd):/home/analysis doraqmon/group_409_bike_sharing:v1.0 make -C '/home/analysis' all
- To reset the repo to a clean state, with no intermediate or results files, run the following command at the command line/terminal from the root directory of this project:
docker run --rm -v /$(pwd):/home/analysis doraqmon/group_409_bike_sharing:v1.0 make -C '/home/analysis' clean
2. Without using Docker
To replicate the analysis, clone this GitHub repository, install the dependencies listed below, and run the following command at the command line/terminal from the root directory of this project. Please note that this process will take a few minutes:
make all
To reset the repo to a clean state, with no intermediate or results files, run the following command at the command line/terminal from the root directory of this project:
make clean
Python 3.7.3 and Python packages:
- pandas==0.24.2
- numpy==1.16.4
- scikit-learn==0.22
- altair==3.2.0
- docopt==0.6.2
- selenium==3.141.0
- seaborn==0.9.0
R version 3.6.1 and R packages:
- knitr==1.26
- tidyverse==1.2.1
- caret==6.0-84
- kableExtra==1.1.0
The Bike Sharing Dataset materials here are licensed as CC0: Public Domain. If re-using or re-mixing, please provide attribution and a link to this webpage.