### Linear Regression Model for predicting Bike Availability: 


- Here we implement a linear regression model to predict the number of bikes available and the number of free bike stands at a given bike station.
- Linear regression is a statistical method for modeling the relationship between a dependent variable and a set of independent variables.
- In our model the dependent variable is the number of available bikes (or available bike stands), and the independent variables are time of day, day of the week, area, and weather.
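To make the idea concrete, here is a minimal sketch of fitting `LinearRegression` on a small synthetic dataset. The data, the column names `hour` and `temperature`, and the coefficients used to generate the target are all made up for illustration; they are not taken from the `dbikes` database or from the final model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data purely for illustration (not from the database).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "hour": rng.integers(0, 24, size=200),            # time of day
    "temperature": rng.uniform(270, 295, size=200),   # assumed to be in Kelvin
})

# Hypothetical exact linear relationship, so the fitted coefficients can be checked.
toy["available_bikes"] = 30 - 0.5 * toy["hour"] + 0.1 * (toy["temperature"] - 273.15)

X = toy[["hour", "temperature"]]   # independent variables
y = toy["available_bikes"]         # dependent variable

model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)), model.intercept_)
```

Because the toy target is an exact linear function of the features, the fitted coefficients should match the ones used to generate it. Real availability data will be noisier, and categorical inputs such as day of the week or area would need to be encoded before fitting.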

In [3]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sqlalchemy import create_engine

import pickle

from sklearn.linear_model import LinearRegression


### Connect to database:

In [4]:
URL = "dublin-bikesdb.cmd8vuwgew1e.us-east-1.rds.amazonaws.com"
PORT = "3306"
DB = "dbikes"
USER = "admin"
PASSWORD = "Dbikes123"
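Hard-coding credentials in a notebook makes them easy to leak. One possible alternative, sketched below, reads them from environment variables and builds the connection URL with SQLAlchemy's `URL.create` (available in SQLAlchemy 1.4 and later). The helper name `build_db_url` and the `DBIKES_*` variable names are hypothetical and not part of this project.

```python
import os
from sqlalchemy.engine import URL


def build_db_url(env=os.environ):
    """Build a MySQL connection URL from environment variables.

    The DBIKES_* names are hypothetical; choose whatever your deployment uses.
    """
    return URL.create(
        drivername="mysql+mysqldb",
        username=env.get("DBIKES_USER", "admin"),
        password=env.get("DBIKES_PASSWORD"),
        host=env.get("DBIKES_HOST", "localhost"),
        port=int(env.get("DBIKES_PORT", "3306")),
        database=env.get("DBIKES_DB", "dbikes"),
    )


# Example with an explicit mapping instead of the real environment.
example_url = build_db_url({"DBIKES_PASSWORD": "secret", "DBIKES_HOST": "example.invalid"})
# hide_password=True masks the password when the URL is displayed or logged.
print(example_url.render_as_string(hide_password=True))
```

The resulting object can be passed to `create_engine` in place of the formatted string.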


### Weather Data 

In [19]:
def weather():
    """Load the weather readings into a DataFrame."""
    # echo=True logs every SQL statement SQLAlchemy issues.
    engine = create_engine("mysql+mysqldb://{}:{}@{}:{}/{}".format(USER, PASSWORD, URL, PORT, DB), echo=True)
    sql_query_weather = """
    SELECT weather.id, weather.description1, weather.temperature, weather.humidity, weather.windspeed FROM weather;
    """
    df_weather = pd.read_sql_query(sql_query_weather, engine)

    return df_weather


df_weather = weather()

2022-03-18 15:30:35,263 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2022-03-18 15:30:35,264 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 15:30:35,375 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2022-03-18 15:30:35,375 INFO sqlalchemy.engine.Engine [generated in 0.00248s] ()
2022-03-18 15:30:35,618 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2022-03-18 15:30:35,619 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 15:30:35,953 INFO sqlalchemy.engine.Engine 
    SELECT weather.id, weather.description1, weather.temperature, weather.humidity, weather.windspeed FROM weather;
    
2022-03-18 15:30:35,969 INFO sqlalchemy.engine.Engine [raw sql] ()


In [20]:
df_weather

| | id | description1 | temperature | humidity | windspeed |
|---|---|---|---|---|---|
| 0 | 803 | broken clouds | 281.82 | 77 | 6.69 |
| 1 | 803 | broken clouds | 281.88 | 77 | 6.69 |
| 2 | 803 | broken clouds | 281.82 | 77 | 6.69 |
| 3 | 803 | broken clouds | 281.72 | 78 | 6.69 |
| 4 | 803 | broken clouds | 281.72 | 78 | 6.69 |
| ... | ... | ... | ... | ... | ... |
| 1728 | 801 | few clouds | 286.35 | 72 | 7.20 |
| 1729 | 801 | few clouds | 286.29 | 72 | 7.20 |
| 1730 | 801 | few clouds | 286.30 | 71 | 6.71 |
| 1731 | 801 | few clouds | 286.30 | 71 | 6.71 |


In [21]:
df_weather.dtypes


id                int64
description1     object
temperature     float64
humidity          int64
windspeed       float64
dtype: object
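Before these columns can feed a regression, the text column `description1` has to become numeric. The sketch below shows one way to do that with `pd.get_dummies`, using a few rows copied from the output above. It also converts the temperature to Celsius on the assumption that the stored values are in Kelvin (they look like it, but the source does not say so). The helper name `prepare_weather` is hypothetical.

```python
import pandas as pd

# A few rows copied from the df_weather output above.
sample_weather = pd.DataFrame({
    "id": [803, 803, 801],
    "description1": ["broken clouds", "broken clouds", "few clouds"],
    "temperature": [281.82, 281.88, 286.35],
    "humidity": [77, 77, 72],
    "windspeed": [6.69, 6.69, 7.20],
})


def prepare_weather(df):
    """Return a copy with a Celsius column and one-hot encoded descriptions."""
    out = df.copy()
    # Assumption: temperature is stored in Kelvin.
    out["temperature_c"] = out["temperature"] - 273.15
    # One indicator column per distinct description, e.g. "desc_few clouds".
    dummies = pd.get_dummies(out["description1"], prefix="desc")
    return pd.concat([out.drop(columns=["description1"]), dummies], axis=1)


prepared_weather = prepare_weather(sample_weather)
print(prepared_weather.columns.tolist())
```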

### Availability Data

In [7]:
def availability():
    """Load the full `availability` table into a DataFrame."""
    engine = create_engine("mysql+mysqldb://{}:{}@{}:{}/{}".format(USER, PASSWORD, URL, PORT, DB), echo=True)
    df_avail = pd.read_sql_table("availability", engine)
    return df_avail

In [9]:
df_avail = availability()

2022-03-18 14:27:23,170 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2022-03-18 14:27:23,194 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 14:27:23,325 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2022-03-18 14:27:23,327 INFO sqlalchemy.engine.Engine [generated in 0.00197s] ()
2022-03-18 14:27:23,568 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2022-03-18 14:27:23,568 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 14:27:24,042 INFO sqlalchemy.engine.Engine SHOW FULL TABLES FROM `dbikes`
2022-03-18 14:27:24,042 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 14:27:24,166 INFO sqlalchemy.engine.Engine SHOW FULL TABLES FROM `dbikes`
2022-03-18 14:27:24,166 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 14:27:24,314 INFO sqlalchemy.engine.Engine SHOW CREATE TABLE `availability`
2022-03-18 14:27:24,330 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 14:27:24,574 INFO sqlalchemy.engine.Engine SELECT availabili

In [10]:
df_avail

| | number | available_bike_stands | available_bikes | last_update |
|---|---|---|---|---|
| 0 | 42 | 16 | 14 | 2022-02-23 19:50:20 |
| 1 | 30 | 0 | 20 | 2022-02-23 19:41:25 |
| 2 | 54 | 11 | 22 | 2022-02-23 19:48:38 |
| 3 | 108 | 16 | 19 | 2022-02-23 19:51:13 |
| 4 | 56 | 2 | 38 | 2022-02-23 19:45:20 |
| ... | ... | ... | ... | ... |
| 709092 | 39 | 2 | 18 | 2022-03-18 14:24:00 |
| 709093 | 83 | 19 | 21 | 2022-03-18 14:19:09 |
| 709094 | 92 | 39 | 1 | 2022-03-18 14:23:07 |
| 709095 | 21 | 20 | 10 | 2022-03-18 14:22:40 |


In [11]:
df_avail.dtypes


number                            int64
available_bike_stands             int64
available_bikes                   int64
last_update              datetime64[ns]
dtype: object
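Since `last_update` is already a `datetime64[ns]` column, the time-of-day and day-of-week features mentioned in the introduction can be derived with the pandas `.dt` accessor. The sketch below uses three rows copied from the output above. The helper name `add_time_features` and the new column names `hour` and `day_of_week` are assumptions made for illustration.

```python
import pandas as pd

# Three rows copied from the df_avail output above.
sample_avail = pd.DataFrame({
    "number": [42, 30, 39],
    "available_bike_stands": [16, 0, 2],
    "available_bikes": [14, 20, 18],
    "last_update": pd.to_datetime([
        "2022-02-23 19:50:20",
        "2022-02-23 19:41:25",
        "2022-03-18 14:24:00",
    ]),
})


def add_time_features(df):
    """Return a copy with hour-of-day and day-of-week columns added."""
    out = df.copy()
    out["hour"] = out["last_update"].dt.hour              # 0 to 23
    out["day_of_week"] = out["last_update"].dt.dayofweek  # Monday=0 ... Sunday=6
    return out


features = add_time_features(sample_avail)
print(features[["last_update", "hour", "day_of_week"]])
```

Treating `day_of_week` as a plain integer imposes an ordering on the days, so it may be preferable to one-hot encode it before fitting a linear model.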

In [12]:
df_avail.shape

(709097, 4)

In [14]:
df_avail.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| number | 709097.0 | 60.327091 | 33.710733 | 2.0 | 31.0 | 60.0 | 90.0 | 117.0 |
| available_bike_stands | 709097.0 | 12.68999 | 8.877794 | 0.0 | 5.0 | 12.0 | 19.0 | 40.0 |
| available_bikes | 709097.0 | 19.066116 | 10.421366 | 0.0 | 11.0 | 19.0 | 27.0 | 40.0 |
