# Build a regression model.

My hypothesis is that a bike station's location and the ratings of nearby restaurants influence how many bikes are free. Specifically, I expect stations near well-rated restaurants to have fewer free bikes, because bikes are in higher demand in popular areas.

`free_bikes = b_0 + b_1(latitude_y) + b_2(longitude_y) + b_3(rating)`

In the model, `free_bikes` is the number of bikes available at a station. `latitude_y` and `longitude_y` give the station's location (these columns are renamed to `latitude_station` and `longitude_station` in the code below), and `rating` is the rating of a nearby restaurant. `b_0` is the intercept, while `b_1`, `b_2`, and `b_3` are the coefficients that measure how much the number of free bikes changes for a one-unit change in each predictor.
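To make the equation concrete, here is a minimal sketch of how a prediction is computed once the coefficients are known. The coefficient values, the toy inputs, and the helper name `predict_free_bikes` are all hypothetical and chosen only to keep the arithmetic easy to follow; they are not the fitted values from this notebook.

```python
import numpy as np

# Hypothetical coefficients (b_0, b_1, b_2, b_3), for illustration only.
# They are NOT the values fitted later in this notebook.
coefs = np.array([10.0, 0.5, -0.25, -1.0])


def predict_free_bikes(latitude, longitude, rating, coefs=coefs):
    """Evaluate b_0 + b_1*latitude + b_2*longitude + b_3*rating."""
    # The leading 1.0 multiplies the intercept b_0.
    features = np.array([1.0, latitude, longitude, rating])
    return float(features @ coefs)


# Toy inputs (not real coordinates): 10 + 0.5*2 - 0.25*4 - 1*4
example = predict_free_bikes(latitude=2.0, longitude=4.0, rating=4.0)
print(example)
```

With a negative `b_3`, as in this toy example, a higher rating lowers the predicted number of free bikes, which is the direction the hypothesis expects.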

In [75]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

In [76]:
df = pd.read_csv('../data/combined_data.csv')

In [77]:
df = df.rename(columns={'name_x': 'POI', 'latitude_x': 'latitude_POI', 'longitude_x': 'longitude_POI', 'name_y': 'bike_station', 'latitude_y': 'latitude_station', 'longitude_y':'longitude_station'})

In [78]:
df

Unnamed: 0,POI,address,category,latitude_POI,longitude_POI,rating,rounded_latitude,rounded_longitude,bike_station,latitude_station,longitude_station,free_bikes
0,Bagatelle,"4323 Rue Ontario E, Montreal, QC H1V 1K5, Canada","French, Breakfast & Brunch",45.552822,-73.539624,4.0,45.553,-73.54,Marché Maisonneuve,45.553219,-73.539782,11
1,Les Gourmandises de Marie-Antoinette,"4317 Rue Ontario E, Montreal, QC H1V 1K5, Canada","Bakeries, Tea Rooms",45.55291,-73.53978,4.5,45.553,-73.54,Marché Maisonneuve,45.553219,-73.539782,11
2,Pita Bar,"4315 Rue Ontario E, Montreal, QC H1V 1K5, Canada",Greek,45.552803,-73.539842,3.5,45.553,-73.54,Marché Maisonneuve,45.553219,-73.539782,11
3,India Rosa,"1241 Avenue du Mont-Royal E, Montreal, QC H2J ...","Indian, Cocktail Bars, Tapas Bars",45.529291,-73.578237,4.0,45.529,-73.578,du Mont-Royal / de Brébeuf,45.529337,-73.577953,6
4,Le Rouge Gorge,"1234 Mont-royal E, Montreal, QC H2J 1Y2, Canada","Wine Bars, Tapas/Small Plates",45.529112,-73.577978,4.5,45.529,-73.578,du Mont-Royal / de Brébeuf,45.529337,-73.577953,6
5,Pizzédélic,"1250 avenue du Mont-Royal Est, Montreal, QC H2...","Pizza, Food Delivery Services",45.529399,-73.577818,3.5,45.529,-73.578,du Mont-Royal / de Brébeuf,45.529337,-73.577953,6
6,Au Pain Perdu,"4489 Rue de la Roche, Montreal, QC H2J 3J1, Ca...",Breakfast & Brunch,45.528691,-73.578071,3.5,45.529,-73.578,du Mont-Royal / de Brébeuf,45.529337,-73.577953,6
7,Boîte Geisha Fusion Sushi,"1209 Mount Royal Avenue E, Montreal, QC H2J 1Y...","Japanese, Sushi Bars",45.528972,-73.578486,4.5,45.529,-73.578,du Mont-Royal / de Brébeuf,45.529337,-73.577953,6
8,Ibéricos,"4475 Rue Saint-Denis, Montreal, QC H2J 2L2, Ca...","Spanish, Tapas Bars",45.523902,-73.58243,4.5,45.524,-73.582,Métro Mont-Royal (Utilités publiques / Rivard),45.524236,-73.581552,2
9,Café Gentile,"9299 Avenue du Parc, Montreal, QC H2N 2A2, Canada","Sandwiches, Breakfast & Brunch",45.538197,-73.65451,4.0,45.538,-73.655,Chabanel / du Parc,45.538308,-73.654884,12


In [81]:
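# Splitting data into features (X) and target (y)
# Note: the location features are the POI coordinates (latitude_POI, longitude_POI),
# which differ slightly from the station coordinates named in the model equation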

X = df[['rating', 'latitude_POI', 'longitude_POI']]
y = df['free_bikes']

In [82]:
linear_model = LinearRegression()
linear_model.fit(X, y)

# Predictions
y_pred_linear = linear_model.predict(X)

# Evaluating model (in-sample: the error is computed on the same data used for fitting)
mse = mean_squared_error(y, y_pred_linear)
print("Mean Squared Error:", mse)

Mean Squared Error: 25.755866305529384
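The mean squared error is in squared units, so it can be easier to read as a root mean squared error (RMSE), which is in bikes. The sketch below only takes the square root of the value printed above; it does not refit the model.

```python
import math

# MSE value copied from the output of the cell above
mse = 25.755866305529384

# RMSE is the square root of MSE and is expressed in the target's units (bikes)
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:.2f} bikes")
```

In other words, the in-sample predictions miss the observed number of free bikes by roughly five bikes on average. The `free_bikes` values visible in the table preview range from 2 to 12.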


# Provide model output and an interpretation of the results. 

In [83]:
import statsmodels.api as sm

X = sm.add_constant(X)  # Adding a constant column for the intercept
model = sm.OLS(y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:             free_bikes   R-squared:                       0.023
Model:                            OLS   Adj. R-squared:                 -0.048
Method:                 Least Squares   F-statistic:                    0.3276
Date:                Fri, 08 Dec 2023   Prob (F-statistic):              0.805
Time:                        12:00:49   Log-Likelihood:                -136.95
No. Observations:                  45   AIC:                             281.9
Df Residuals:                      41   BIC:                             289.1
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const           893.5832   3030.422      0.295

**Notes**:
- The model has a very low `R-squared` (0.023) and a negative `Adj. R-squared` (-0.048), and the overall F-test is not significant (`Prob (F-statistic)` = 0.805). This is not a good model for predicting the number of `free_bikes` at a station from location and the `rating` of nearby restaurants.
- The predictor variables have `P-values` much greater than 0.05, so none of them is statistically significant. Including them does not strengthen the model, and the data show no evidence that restaurant rating affects the number of free bikes.
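As a quick check on the summary table, the adjusted R-squared can be recomputed from the reported `R-squared`, `No. Observations`, and `Df Model` using the standard formula `1 - (1 - R²)(n - 1)/(n - p - 1)`. This is a sketch that uses the rounded values printed above, so it matches the table only to rounding precision.

```python
# Values copied from the OLS summary above
r_squared = 0.023   # R-squared (rounded in the summary)
n_obs = 45          # No. Observations
n_predictors = 3    # Df Model

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)
print(f"Adjusted R-squared: {adj_r_squared:.3f}")
```

A negative adjusted R-squared means the three predictors explain less variance than would be expected by chance for this sample size, which is consistent with the non-significant F-test.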