### Linear Regression Model for predicting Bike Availability: 


- Here we implement a linear regression model to predict the number of bikes available and the number of free bike stands at a given station.
- Linear regression is a statistical method for modeling the relationship between a dependent variable and a set of independent variables.
- In our model, the dependent variable is the number of available bikes (or available bike stands), and the independent variables are time of day, day of the week, area, and weather.
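
As a minimal sketch of the idea (not the model built later in this notebook), the snippet below fits scikit-learn's `LinearRegression` to a tiny synthetic data set. The feature names `hour` and `temperature`, the linear rule, and all values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, hypothetical data: each row is [hour, temperature].
X = np.array([
    [8, 5.0],
    [12, 9.0],
    [17, 11.0],
    [22, 7.0],
    [6, 3.0],
])

# Target generated from a known linear rule so the fit can be checked:
# available_bikes = 30 - 0.5 * hour + 1.0 * temperature
y = 30 - 0.5 * X[:, 0] + 1.0 * X[:, 1]

model = LinearRegression()
model.fit(X, y)

# The fitted coefficients recover the rule used to generate y.
print(model.coef_)       # approximately [-0.5, 1.0]
print(model.intercept_)  # approximately 30.0

# Predict for a new (hypothetical) observation: hour 9, temperature 6.
prediction = model.predict(np.array([[9, 6.0]]))
print(prediction)
```

Real availability data will not follow an exact linear rule, so the fitted coefficients there are estimates rather than a recovered ground truth.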

In [3]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sqlalchemy import create_engine

import pickle

from sklearn.linear_model import LinearRegression


### Connect to database:

In [4]:
URL = "dublin-bikesdb.cmd8vuwgew1e.us-east-1.rds.amazonaws.com"
PORT = "3306"
DB = "dbikes"
USER = "admin"
PASSWORD = "Dbikes123"
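
Hard-coding credentials in a notebook makes them easy to leak. One possible alternative, sketched below, reads them from environment variables and builds the SQLAlchemy URL from those. The `DBIKES_*` variable names and the `build_db_url` helper are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Build a SQLAlchemy MySQL URL from environment variables.

    The variable names (DBIKES_*) are hypothetical; pick your own.
    """
    user = env["DBIKES_USER"]
    # quote_plus escapes characters such as '@' or ':' that would
    # otherwise break the URL.
    password = quote_plus(env["DBIKES_PASSWORD"])
    host = env["DBIKES_HOST"]
    port = env.get("DBIKES_PORT", "3306")
    db = env.get("DBIKES_DB", "dbikes")
    return "mysql+mysqldb://{}:{}@{}:{}/{}".format(user, password, host, port, db)


# Example with a plain dict standing in for the real environment.
example_env = {
    "DBIKES_USER": "admin",
    "DBIKES_PASSWORD": "p@ss:word",
    "DBIKES_HOST": "localhost",
}
example_url = build_db_url(example_env)
print(example_url)
```

The resulting string can be passed to `create_engine` in place of the formatted URL used in the cells below.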


### Weather Data 

In [5]:
def weather():
    """Query the weather table and return the selected columns as a DataFrame."""
    # echo=True makes SQLAlchemy log every SQL statement it issues.
    engine = create_engine("mysql+mysqldb://{}:{}@{}:{}/{}".format(USER, PASSWORD, URL, PORT, DB), echo=True)
    sql_query_weather = """
    SELECT weather.id, weather.description1, weather.temperature, weather.humidity, weather.windspeed FROM weather;
    """
    df_weather = pd.read_sql_query(sql_query_weather, engine)

    return df_weather


df_weather = weather()

2022-03-18 19:59:28,365 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2022-03-18 19:59:28,365 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:28,513 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2022-03-18 19:59:28,514 INFO sqlalchemy.engine.Engine [generated in 0.00134s] ()
2022-03-18 19:59:28,751 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2022-03-18 19:59:28,752 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:29,105 INFO sqlalchemy.engine.Engine 
    SELECT weather.id, weather.description1, weather.temperature, weather.humidity, weather.windspeed FROM weather;
    
2022-03-18 19:59:29,105 INFO sqlalchemy.engine.Engine [raw sql] ()


In [6]:
df_weather

|  | id | description1 | temperature | humidity | windspeed |
|---|---|---|---|---|---|
| 0 | 803 | broken clouds | 281.82 | 77 | 6.69 |
| 1 | 803 | broken clouds | 281.88 | 77 | 6.69 |
| 2 | 803 | broken clouds | 281.82 | 77 | 6.69 |
| 3 | 803 | broken clouds | 281.72 | 78 | 6.69 |
| 4 | 803 | broken clouds | 281.72 | 78 | 6.69 |
| ... | ... | ... | ... | ... | ... |
| 1782 | 801 | few clouds | 283.28 | 74 | 5.14 |
| 1783 | 801 | few clouds | 283.28 | 74 | 5.14 |
| 1784 | 801 | few clouds | 283.28 | 74 | 5.14 |
| 1785 | 801 | few clouds | 282.99 | 76 | 5.14 |


In [7]:
df_weather.dtypes


id                int64
description1     object
temperature     float64
humidity          int64
windspeed       float64
dtype: object
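
Two things in this output matter for a regression model: `description1` is a text column, and the temperature values (around 282) look like Kelvin. The sketch below shows one way both could be handled, on a few rows copied from the output above. That the temperatures are in Kelvin is an assumption, and the `temperature_c` column name is made up for illustration.

```python
import pandas as pd

# A few rows copied from the output above, standing in for df_weather.
sample = pd.DataFrame({
    "id": [803, 803, 801],
    "description1": ["broken clouds", "broken clouds", "few clouds"],
    "temperature": [281.82, 281.72, 283.28],
    "humidity": [77, 78, 74],
    "windspeed": [6.69, 6.69, 5.14],
})

# Assumption: temperature is stored in Kelvin, so convert to Celsius.
sample["temperature_c"] = sample["temperature"] - 273.15

# One-hot encode the text description so it becomes numeric columns
# (one column per distinct description).
encoded = pd.get_dummies(sample, columns=["description1"])
print(encoded.columns.tolist())
```

One-hot encoding is only one option; with many distinct descriptions it can produce a wide, sparse feature matrix.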

### Availability Data

In [8]:
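def availability():
    """Read the full availability table into a DataFrame."""
    # echo=True makes SQLAlchemy log every SQL statement it issues.
    engine = create_engine("mysql+mysqldb://{}:{}@{}:{}/{}".format(USER, PASSWORD, URL, PORT, DB), echo=True)
    df_avail = pd.read_sql_table("availability", engine)
    return df_avail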

In [9]:
df_avail = availability()

2022-03-18 19:59:46,114 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'sql_mode'
2022-03-18 19:59:46,114 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:46,236 INFO sqlalchemy.engine.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2022-03-18 19:59:46,236 INFO sqlalchemy.engine.Engine [generated in 0.00321s] ()
2022-03-18 19:59:46,484 INFO sqlalchemy.engine.Engine SELECT DATABASE()
2022-03-18 19:59:46,484 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:46,953 INFO sqlalchemy.engine.Engine SHOW FULL TABLES FROM `dbikes`
2022-03-18 19:59:46,953 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:47,085 INFO sqlalchemy.engine.Engine SHOW FULL TABLES FROM `dbikes`
2022-03-18 19:59:47,085 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:47,234 INFO sqlalchemy.engine.Engine SHOW CREATE TABLE `availability`
2022-03-18 19:59:47,234 INFO sqlalchemy.engine.Engine [raw sql] ()
2022-03-18 19:59:47,508 INFO sqlalchemy.engine.Engine SELECT availabili

In [10]:
df_avail

|  | number | available_bike_stands | available_bikes | last_update |
|---|---|---|---|---|
| 0 | 42 | 16 | 14 | 2022-02-23 19:50:20 |
| 1 | 30 | 0 | 20 | 2022-02-23 19:41:25 |
| 2 | 54 | 11 | 22 | 2022-02-23 19:48:38 |
| 3 | 108 | 16 | 19 | 2022-02-23 19:51:13 |
| 4 | 56 | 2 | 38 | 2022-02-23 19:45:20 |
| ... | ... | ... | ... | ... |
| 716352 | 39 | 1 | 19 | 2022-03-18 19:53:45 |
| 716353 | 83 | 18 | 22 | 2022-03-18 19:48:06 |
| 716354 | 92 | 33 | 7 | 2022-03-18 19:51:00 |
| 716355 | 21 | 27 | 3 | 2022-03-18 19:53:21 |


In [11]:
df_avail.dtypes


number                            int64
available_bike_stands             int64
available_bikes                   int64
last_update              datetime64[ns]
dtype: object
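
Because `last_update` is already a `datetime64[ns]` column, the time-of-day and day-of-week features mentioned in the introduction can be derived with the pandas `.dt` accessor. The following is a sketch on a few rows copied from the output above. The column names `hour` and `day_of_week` are illustrative choices, not names taken from the notebook.

```python
import pandas as pd

# A few rows copied from the output above, standing in for df_avail.
sample = pd.DataFrame({
    "number": [42, 30, 39],
    "available_bikes": [14, 20, 19],
    "last_update": pd.to_datetime([
        "2022-02-23 19:50:20",
        "2022-02-23 19:41:25",
        "2022-03-18 19:53:45",
    ]),
})

# Hour of day (0-23) extracted from the timestamp.
sample["hour"] = sample["last_update"].dt.hour

# Day of week as an integer, where Monday is 0 and Sunday is 6.
sample["day_of_week"] = sample["last_update"].dt.dayofweek

print(sample[["last_update", "hour", "day_of_week"]])
```

Both features are cyclical, so treating them as plain numbers in a linear model is a simplification; encoding them as categories is a common alternative.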

In [15]:
df_avail["number"] = df_avail["number"].astype('category')  


In [12]:
df_avail.shape

(716357, 4)

In [16]:
df_avail.describe().T

|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| available_bike_stands | 716357.0 | 12.682707 | 8.888243 | 0.0 | 5.0 | 12.0 | 19.0 | 40.0 |
| available_bikes | 716357.0 | 19.069475 | 10.437634 | 0.0 | 11.0 | 19.0 | 27.0 | 40.0 |


In [17]:
df_avail["number"].describe().T

count     716357
unique       110
top           61
freq        6514
Name: number, dtype: int64

#### Combining the two data frames 