Bike Demand Prediction for BoomBikes
This project focuses on predicting the demand for shared bikes for BoomBikes, a US bike-sharing provider, particularly in the post-Covid-19 scenario. The goal is to identify significant factors affecting bike demand in the American market and to quantify their impact.
In the wake of Covid-19, BoomBikes aspires to understand and prepare for the expected demand for shared bikes. The company aims to identify key variables influencing this demand to outperform competitors and maximize profits.
Import and Inspection: Loaded the dataset and examined its structure and variables.
Dropping Irrelevant Variables: Removed 'dteday', 'casual', 'registered', and 'instant' as redundant or irrelevant.
Mapping Values: Mapped the coded categorical variables 'season', 'weathersit', 'weekday', 'mnth', and 'yr' to readable labels.
Dummy Variable Creation: Generated dummy variables for the categorical variables for use in linear modeling.
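The preparation steps above can be sketched with pandas roughly as follows. This is a minimal illustration on a made-up four-row frame, not the project's actual code: the code-to-label mappings, the label spellings, and the prefixed dummy column names are assumptions.

```python
import pandas as pd

# Tiny stand-in for the raw dataset; all values are made up for illustration.
raw = pd.DataFrame({
    "instant": [1, 2, 3, 4],
    "dteday": ["2018-01-01", "2018-04-02", "2019-07-03", "2019-10-04"],
    "season": [1, 2, 3, 4],
    "yr": [0, 0, 1, 1],
    "weathersit": [1, 2, 3, 1],
    "casual": [10, 20, 30, 40],
    "registered": [90, 180, 270, 360],
    "cnt": [100, 200, 300, 400],
})

# Drop the columns the write-up lists as redundant or irrelevant.
df = raw.drop(columns=["instant", "dteday", "casual", "registered"])

# Assumed code-to-label mappings (not confirmed by the write-up).
df["season"] = df["season"].map({1: "spring", 2: "summer", 3: "fall", 4: "winter"})
df["yr"] = df["yr"].map({0: "2018", 1: "2019"})
df["weathersit"] = df["weathersit"].map(
    {1: "Clear_Weather", 2: "Misty_Weather", 3: "Light Snow_Weather"}
)

# One-hot encode; drop_first removes one level per variable so the dummies
# are not perfectly collinear with the intercept.
df = pd.get_dummies(
    df, columns=["season", "yr", "weathersit"], drop_first=True, dtype=int
)
print(df.columns.tolist())
```

Dropping the first level makes that level the baseline against which the remaining dummy coefficients are read.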
Observed distribution of variables like 'cnt', 'humidity', and 'temp'. Noted the maximum frequency of bike counts and the range of weather conditions.
Identified correlations between temperature, humidity, windspeed, and bike counts. Explored relationships between bike counts and variables like holidays, working days, seasons, and weather conditions.
Analyzed the correlation matrix to understand inter-variable relationships.
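A correlation matrix of this kind can be computed with `DataFrame.corr()`. The sketch below uses synthetic data with hypothetical column values, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: 'cnt' is built to depend on 'temp' only (illustrative assumption).
rng = np.random.default_rng(0)
temp = rng.uniform(0, 1, 200)
df = pd.DataFrame({
    "temp": temp,
    "windspeed": rng.uniform(0, 1, 200),
    "cnt": 3.0 * temp + rng.normal(0, 0.1, 200),
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank predictors by the absolute value of their correlation with the target.
ranked = corr["cnt"].drop("cnt").abs().sort_values(ascending=False)
print(ranked)
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` is one common way to visualize the result.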
Data Splitting & Scaling: Divided the data into training (70%) and testing (30%) sets and applied Min-Max scaling.
Initial Model: Achieved an R-squared of 0.853, but several variables had high p-values.
Feature Reduction with RFE: Reduced the variables to 17 using Recursive Feature Elimination (RFE), which changed the R-squared slightly.
VIF & P-value Adjustments: Dropped variables such as 'humidity', 'holiday', 'temp', and 'winter' based on their VIF and p-values.
Final Model: Established a final model with an R-squared of 0.798.
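The modeling steps above can be sketched as follows, under stated assumptions. The data is synthetic, the `random_state` value and the number of features kept are arbitrary, and the `vif` helper is hypothetical: it computes the variance inflation factor from its definition, 1 / (1 - R²), rather than reproducing the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic predictors; 'noise' has no effect on the target by construction.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "temp": rng.uniform(0, 35, n),
    "windspeed": rng.uniform(0, 30, n),
    "workingday": rng.integers(0, 2, n),
    "noise": rng.normal(0, 1, n),
})
y = (100 * X["temp"] - 40 * X["windspeed"]
     + 300 * X["workingday"] + rng.normal(0, 50, n))

# 70/30 split; random_state is an arbitrary choice for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=100
)

# Fit the scaler on the training set only, then reuse it on the test set.
scaler = MinMaxScaler()
X_train_s = pd.DataFrame(
    scaler.fit_transform(X_train), columns=X.columns, index=X_train.index
)
X_test_s = pd.DataFrame(
    scaler.transform(X_test), columns=X.columns, index=X_test.index
)

# Recursive Feature Elimination down to 3 features (17 in the write-up).
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X_train_s, y_train)
selected = list(X.columns[rfe.support_])


def vif(frame):
    """Hypothetical helper: VIF_j = 1 / (1 - R^2 of column j on the others)."""
    out = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)


vifs = vif(X_train_s[selected])
print(selected)
print(vifs)
```

Fitting the scaler on the training rows alone keeps test-set information out of the model. For p-values, fitting `statsmodels.api.OLS` on the selected columns and reading its summary is one option.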
Checked that the residuals are approximately normally distributed to validate the linear regression assumptions.
Predictions and Model Evaluation: Made predictions on the test set and evaluated them using a scatter plot.
Final test R-squared: 0.783.
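A residual check and a test-set R-squared can be computed along these lines. The data and coefficients here are synthetic, and `r2_score` from `sklearn.metrics` is an assumed choice, since the write-up does not say how the score was obtained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic train/test data from a known linear relationship plus Gaussian noise.
rng = np.random.default_rng(7)
true_coef = np.array([0.5, -0.2, 0.1])
X_train = rng.uniform(0, 1, (210, 3))
X_test = rng.uniform(0, 1, (90, 3))
y_train = X_train @ true_coef + 0.3 + rng.normal(0, 0.05, 210)
y_test = X_test @ true_coef + 0.3 + rng.normal(0, 0.05, 90)

model = LinearRegression().fit(X_train, y_train)

# Training residuals: with an intercept, their mean is zero up to rounding error.
residuals = y_train - model.predict(X_train)
print("residual mean:", residuals.mean())
print("residual std:", residuals.std())

# R-squared on data the model has not seen.
y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
print("test R-squared:", test_r2)
```

A histogram of `residuals` and a scatter plot of `y_test` against `y_pred` would be the visual counterparts of these numbers.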
Identified significant variables influencing bike demand. Provided insights for BoomBikes to strategize for the post-Covid-19 market.
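cnt = 0.0570 * workingday - 0.1926 * windspeed - 0.2381 * spring - 0.0403 * summer + 0.2457 * 2019
      - 0.1186 * Dec - 0.1231 * Jan - 0.1127 * Nov + 0.0558 * Sep + 0.0665 * Mon
      - 0.3207 * Light Snow_Weather - 0.0901 * Misty_Weather + 0.5360
Here 2019, the season, month, and weekday names, and the weather labels are 0/1 dummy variables.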
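To show how the equation is applied, the sketch below evaluates it for one hypothetical day. The coefficients are taken from the equation. The input values are invented, and it is an assumption that windspeed and the resulting cnt are both on the Min-Max scaled (0 to 1) axis.

```python
# Coefficients copied from the final model equation.
INTERCEPT = 0.5360
COEFFICIENTS = {
    "workingday": 0.0570,
    "windspeed": -0.1926,
    "spring": -0.2381,
    "summer": -0.0403,
    "2019": 0.2457,
    "Dec": -0.1186,
    "Jan": -0.1231,
    "Nov": -0.1127,
    "Sep": 0.0558,
    "Mon": 0.0665,
    "Light Snow_Weather": -0.3207,
    "Misty_Weather": -0.0901,
}


def predict_cnt(features):
    """Evaluate the linear equation; features missing from the dict count as 0."""
    return INTERCEPT + sum(
        coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )


# Hypothetical input: a working day in September 2019 with scaled windspeed 0.2.
day = {"workingday": 1, "windspeed": 0.2, "2019": 1, "Sep": 1}
pred = predict_cnt(day)
print(f"{pred:.4f}")  # prints 0.8560
```

The signs show each factor's direction: light snow and spring lower the predicted demand most, while the 2019 indicator raises it most.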
This analysis was performed using the following libraries:
NumPy (numpy)
Pandas (pandas)
Matplotlib (matplotlib.pyplot)
Seaborn (seaborn)
Scikit-learn (sklearn.model_selection, sklearn.preprocessing, sklearn.feature_selection, sklearn.linear_model)
Statsmodels (statsmodels.api)