##### Build a regression model.

In [2]:
import pandas as pd
import statsmodels.api as sm
from patsy.builtins import *

In [4]:
# Load the merged dataframe
df_model = pd.read_csv('df_merged.csv')

# Keep only the dependent and independent variables, then drop rows with missing values
df_model = df_model[['free_bikes', 'number_places', 'place_popularity_avg', 'place_rating_avr']]
df_model = df_model.dropna()

# Independent variables
x = df_model[['number_places', 'place_popularity_avg', 'place_rating_avr']]

# Dependent variable
y = df_model['free_bikes']

# Add a constant (intercept) term and fit ordinary least squares
x = sm.add_constant(x)
lin_reg = sm.OLS(y, x)
model = lin_reg.fit()

# Print the regression summary
print_model = model.summary()
print(print_model)

                            OLS Regression Results                            
Dep. Variable:             free_bikes   R-squared:                       0.072
Model:                            OLS   Adj. R-squared:                  0.067
Method:                 Least Squares   F-statistic:                     14.29
Date:                Mon, 04 Sep 2023   Prob (F-statistic):           5.60e-09
Time:                        05:39:49   Log-Likelihood:                -970.82
No. Observations:                 558   AIC:                             1950.
Df Residuals:                     554   BIC:                             1967.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    5.9433 

##### Provide model output and an interpretation of the results. 

R-squared (R^2): The R-squared value is 0.072, which means that approximately 7.2% of the variance in the dependent variable (free_bikes) is explained by the independent variables (number_places, place_popularity_avg, and place_rating_avr). The model therefore has low explanatory power: roughly 93% of the variation in free_bikes remains unexplained.
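As a quick arithmetic check, the adjusted R-squared in the summary can be reproduced from the reported R-squared, the number of observations (558), and the number of predictors (3). The sketch below uses the standard adjustment formula with the rounded values printed in the output, so it can only be expected to agree to about three decimals.

```python
# Values copied from the OLS summary above (R-squared is already rounded there).
r_squared = 0.072
n_obs = 558
n_predictors = 3  # number_places, place_popularity_avg, place_rating_avr

# Adjusted R-squared penalises R-squared for the number of predictors used.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(round(adj_r_squared, 3))  # 0.067, matching "Adj. R-squared" in the summary
```

The adjustment is small here because the sample (558 rows) is large relative to the three predictors.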

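F-statistic: The F-statistic is 14.29 with a very low p-value (5.60e-09). The F-test compares the fitted model with an intercept-only model, so this result means the independent variables are jointly significant: at least one of their coefficients differs from zero. It does not say which variable is responsible, and it does not imply that the model fits well.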
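The F-statistic is tied directly to R-squared and the degrees of freedom shown in the summary (Df Model = 3, Df Residuals = 554). A minimal sketch of that relationship follows. Because it starts from the rounded R-squared of 0.072, it lands close to the reported 14.29 rather than exactly on it.

```python
# Values copied from the OLS summary above.
r_squared = 0.072
df_regression = 3    # "Df Model"
df_residual = 554    # "Df Residuals"

# Explained variance per model degree of freedom, divided by
# unexplained variance per residual degree of freedom.
f_stat = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

print(f"F is approximately {f_stat:.1f}")
```

This shows why a model with low R-squared can still have a highly significant F-statistic: with several hundred residual degrees of freedom, even a small explained share of variance is unlikely to arise by chance.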

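Regression coefficients:

number_places: The coefficient is -0.0202. Holding the other variables constant, each additional place is associated with about 0.02 fewer free bikes. The per-unit effect is small, but its practical size depends on how widely number_places varies across observations, so the variable should not be dismissed on the size of the coefficient alone.

place_popularity_avg: The coefficient is -12.7161. Regression coefficients are expressed in units of the dependent variable per unit of the predictor, so unlike correlation coefficients they are not restricted to the range -1 to 1, and this value is not an error. Holding the other variables constant, a one-unit increase in place_popularity_avg is associated with about 12.7 fewer free bikes. The large magnitude may simply reflect that popularity is measured on a narrow scale (for example 0 to 1), which should be checked against the data.

place_rating_avr: The coefficient is 0.8542. Holding the other variables constant, a one-unit increase in place_rating_avr is associated with an increase of approximately 0.85 free bikes. This direction is counterintuitive: one would expect highly rated places to attract more visitors and therefore leave fewer free bikes. The model describes an association rather than a causal effect, so the sign may be driven by factors not included in the model.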
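How strongly the size of a slope depends on the units of its predictor can be illustrated with a small sketch. The data below are synthetic and the variable names are hypothetical. The example only assumes a predictor measured on a 0 to 1 scale, which is not confirmed for place_popularity_avg. The same predictor is then expressed on a 0 to 100 scale and the model is refitted.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the bike dataset).
rng = np.random.default_rng(0)
n = 500
share = rng.uniform(0.0, 1.0, n)                  # predictor on a 0..1 scale
y = 6.0 - 12.0 * share + rng.normal(0.0, 1.0, n)  # outcome with unit noise


def ols_slope(x, y):
    """Return the slope of a one-predictor least-squares fit with intercept."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


slope_share = ols_slope(share, y)            # slope per unit on the 0..1 scale
slope_percent = ols_slope(share * 100.0, y)  # slope per unit on the 0..100 scale

print(f"slope on 0..1 scale:   {slope_share:.3f}")
print(f"slope on 0..100 scale: {slope_percent:.5f}")
```

The two fits describe exactly the same relationship, yet the first slope is 100 times the second. A coefficient's magnitude therefore says little on its own, and a value outside -1 to 1 is entirely ordinary.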

Statistical Significance (P-values): All three independent variables have p-values less than 0.05, indicating that they are statistically significant predictors of the dependent variable, free_bikes.

In summary, the regression results suggest that the model's explanatory power is relatively low, and there may be other unaccounted factors affecting the number of free bikes. Further investigation and potentially including additional variables may be necessary to improve the model's predictive accuracy. 

# Stretch

##### How can you turn the regression model into a classification model?

To turn the provided regression model into a classification model, we need to define specific categories or classes for the dependent variable (free_bikes) and set up a threshold or rule for classifying observations into these categories. Steps of the process could be:

* Define the classes we want to predict from the values of free_bikes (for example low, medium, and high availability).
* Set thresholds that divide the continuous free_bikes values into those classes.
* Assign a class label (0, 1, or 2) to each class.
* Produce class labels instead of continuous predictions, either by applying the same thresholds to the regression model's predicted values or by fitting a classifier such as logistic regression directly on the labels.
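The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic stand-ins for df_model, the thresholds of 2 and 6 bikes are hypothetical, and the linear fit uses NumPy least squares so that the block runs on its own. In the notebook, the predictions would come from the fitted statsmodels model instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df_model (the real data come from df_merged.csv).
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "number_places": rng.integers(0, 50, n),
    "place_rating_avr": rng.uniform(3.0, 5.0, n),
})
df["free_bikes"] = (
    3.0 + 0.05 * df["number_places"] + 0.5 * df["place_rating_avr"]
    + rng.normal(0.0, 1.5, n)
).clip(lower=0).round()

# Steps 1-3: define classes, thresholds, and labels (thresholds are hypothetical).
# 0 = low (at most 2 bikes), 1 = medium (3 to 6 bikes), 2 = high (more than 6).
BINS = [-np.inf, 2, 6, np.inf]
LABELS = [0, 1, 2]


def to_class(values):
    """Map continuous bike counts to the class labels 0, 1, or 2."""
    return pd.cut(pd.Series(values), bins=BINS, labels=LABELS).astype(int)


# Step 4: fit the linear model, then convert its predictions to class labels.
X = np.column_stack([
    np.ones(n),
    df[["number_places", "place_rating_avr"]].to_numpy(dtype=float),
])
beta, *_ = np.linalg.lstsq(X, df["free_bikes"].to_numpy(dtype=float), rcond=None)
predicted = X @ beta

actual_class = to_class(df["free_bikes"])
pred_class = to_class(predicted)
accuracy = (actual_class == pred_class).mean()

print(f"share of observations classified correctly: {accuracy:.2f}")
```

Thresholding regression output is the simplest route, but predictions from a weak model cluster around the mean and rarely reach the outer classes. Fitting a classifier on the labels is usually the better choice when the classes themselves are the target.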