# Aluguel de bicicletas com Regressão Linear

![](https://vahcompare.com/blog/wp-content/uploads/2018/11/Aluguel-de-bicicletas-ou-Bike-Sharing-em-Paris-na-França-1130x753.jpg)

DATASET LINK: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

- instantâneo: índice de registro
- dteday: data
- estação : estação (1:primavera, 2:verão, 3:outono, 4:inverno)
- ano: ano (0: 2011, 1:2012)
- mnth: mês (1 a 12)
- h: hora (0 a 23)
- feriado: o dia do tempo é feriado ou não (extraído de http://dchr.dc.gov/page/holiday-schedule)
- dia da semana: dia da semana
- workingday : se o dia não for fim de semana nem feriado for 1, caso contrário será 0.
+ weathersit:
    - 1: Claro, Poucas nuvens, Parcialmente nublado, Parcialmente nublado
    - 2: Névoa + Nublado, Névoa + Nuvens quebradas, Névoa + Poucas nuvens, Névoa
    - 3: Neve fraca, Chuva fraca + Trovoada + Nuvens dispersas, Chuva fraca + Nuvens dispersas
    - 4: Chuva Pesada + Paletes de Gelo + Trovoada + Névoa, Neve + Neblina
- temp : Temperatura normalizada em Celsius. Os valores são divididos em 41 (máx.)
- atemp: Temperatura de sensação normalizada em Celsius. Os valores são divididos em 50 (máx.)
- hum: umidade normalizada. Os valores são divididos em 100 (máx.)
- windspeed: velocidade do vento normalizada. Os valores são divididos em 67 (máx.)
- casual: contagem de usuários casuais
- registrados: contagem de usuários registrados
- cnt: contagem do total de bicicletas alugadas, incluindo casuais e registradas 

### Importando ferramentas

In [80]:
from __future__ import print_function
import warnings
warnings.filterwarnings("ignore")
from glob import glob
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
%matplotlib inline

In [63]:
for csv in glob(r'C:\Users\Polho\Documents\Python\datasets\Bike Shared\*.csv'):
    print('File:', csv)

File: C:\Users\Polho\Documents\Python\datasets\Bike Shared\day.csv
File: C:\Users\Polho\Documents\Python\datasets\Bike Shared\hour.csv


In [115]:
df_day = pd.read_csv(r'C:\Users\Polho\Documents\Python\datasets\Bike Shared\day.csv')
backup = df_day

In [65]:
df_day.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


### Limpeza de Dados

In [66]:
df_day = df_day.drop(['instant', 'dteday', 'registered', 'casual'], axis=1)

In [67]:
df_day.head()

Unnamed: 0,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,cnt
0,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,985
1,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,801
2,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,1349
3,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,1562
4,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,1600


In [68]:
df_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      731 non-null    int64  
 1   yr          731 non-null    int64  
 2   mnth        731 non-null    int64  
 3   holiday     731 non-null    int64  
 4   weekday     731 non-null    int64  
 5   workingday  731 non-null    int64  
 6   weathersit  731 non-null    int64  
 7   temp        731 non-null    float64
 8   atemp       731 non-null    float64
 9   hum         731 non-null    float64
 10  windspeed   731 non-null    float64
 11  cnt         731 non-null    int64  
dtypes: float64(4), int64(8)
memory usage: 68.7 KB


In [69]:
df_day.isnull().sum()

season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
cnt           0
dtype: int64

### Convertendo todas as colunas para númerico

In [70]:
list_coluns_num = []
for c in df_day.columns:
    list_coluns_num.append(c)

for col in list_coluns_num:
    df_day[col] = pd.to_numeric(df_day[col], errors='coerce')


In [71]:
df_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      731 non-null    int64  
 1   yr          731 non-null    int64  
 2   mnth        731 non-null    int64  
 3   holiday     731 non-null    int64  
 4   weekday     731 non-null    int64  
 5   workingday  731 non-null    int64  
 6   weathersit  731 non-null    int64  
 7   temp        731 non-null    float64
 8   atemp       731 non-null    float64
 9   hum         731 non-null    float64
 10  windspeed   731 non-null    float64
 11  cnt         731 non-null    int64  
dtypes: float64(4), int64(8)
memory usage: 68.7 KB


### Lidando com variáveis categóricas

In [72]:
df_day = pd.get_dummies(df_day, columns=['season','weathersit'], drop_first=True)

In [73]:
df_day.head()

Unnamed: 0,yr,mnth,holiday,weekday,workingday,temp,atemp,hum,windspeed,cnt,season_2,season_3,season_4,weathersit_2,weathersit_3
0,0,1,0,6,0,0.344167,0.363625,0.805833,0.160446,985,0,0,0,1,0
1,0,1,0,0,0,0.363478,0.353739,0.696087,0.248539,801,0,0,0,1,0
2,0,1,0,1,1,0.196364,0.189405,0.437273,0.248309,1349,0,0,0,0,0
3,0,1,0,2,1,0.2,0.212122,0.590435,0.160296,1562,0,0,0,0,0
4,0,1,0,3,1,0.226957,0.22927,0.436957,0.1869,1600,0,0,0,0,0


In [74]:
X = df_day.drop(['cnt'], axis=1)
y = df_day['cnt']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

### Noramalização dos Dados

In [75]:
scale = StandardScaler()
X_train = scale.fit_transform(X_train)
X_test = scale.fit_transform(X_test)

### Testando modelos de Machine Learning

In [101]:
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

#### Linear Regresssor

In [89]:
modelLinearRegression = LinearRegression()
modelLinearRegression.fit(X_train, y_train)

LinearRegression()

In [90]:
print('Train Score', modelLinearRegression.score(X_train, y_train))

Train Score 0.8181424437726028


In [91]:
print('Test Score',modelLinearRegression.score(X_test, y_test))

Test Score 0.8343304603639339


In [92]:
y_pred = modelLinearRegression.predict(X_test)
print('RMSE', np.sqrt(mean_squared_error(y_test, y_pred)))

RMSE 802.7616954449903


#### KNN Regresssor

In [93]:
modelKNeighborsRegressor = KNeighborsRegressor()
modelKNeighborsRegressor.fit(X_train, y_train)

KNeighborsRegressor()

In [96]:
print('Train Score', modelKNeighborsRegressor.score(X_train, y_train))

Train Score 0.854981055836138


In [97]:
print('Test Score', modelKNeighborsRegressor.score(X_test, y_test))

Test Score 0.8330701821157442


In [98]:
y_pred = modelKNeighborsRegressor.predict(X_test)
print('RMSE', np.sqrt(mean_squared_error(y_test, y_pred)))

RMSE 805.8092874673231
