# Proyek Analisis Data: Bike Rental Analysis
- **Nama:** Bernadus Raka Sulistyo
- **Email:** bernadusraka@gmail.com
- **ID Dicoding:** B244021F

## Menentukan Pertanyaan Bisnis

- Musim macam apa yang disukai oleh para penyewa sepeda
- Cuaca macam apa yang disukai oleh para penyewa sepeda?
- Apakah pada hari libur atau hari kerja penyewa biasa menggunakan sepeda?
- Apakah parameter lingkungan seperti kelembapan udara, temperatur, dan kecepatan angin dapat mempengaruhi perilaku penyewa? Jika iya, Bagaimana kondisi lingkungan yang disukai oleh para penyewa?
- Apakah terdapat perubahan perilaku konsumen dari tahun 2011 dan tahun 2012?


## Import Semua Packages/Library yang Digunakan

In [6]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

## Data Wrangling

### Gathering Data

In [7]:
bike_day_df = pd.read_csv("../Dashboard/Data/bike_rental.csv")

print("Bike sharing daily data ")
bike_day_df

Bike sharing daily data 


Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.200000,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.229270,0.436957,0.186900,82,1518,1600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.590000,0.155471,644,2451,3095
728,729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.242400,0.752917,0.124383,159,1182,1341
729,730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.231700,0.483333,0.350754,364,1432,1796


**Insight:**
- Terdapat dua jenis tahun yaitu tahun 2011 dan 2012
- Terdapat 16 tabel yang dapat digunakan untuk analisis. Keterangan setiap tabel data dapat dilihat di berkas Readme.txt

### Assessing Data

In [8]:
bike_day_df_2012 = bike_day_df[bike_day_df['yr'] == 1]
bike_day_df_2012.head()


Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
365,366,2012-01-01,1,1,1,0,0,0,1,0.37,0.375621,0.6925,0.192167,686,1608,2294
366,367,2012-01-02,1,1,1,1,1,0,1,0.273043,0.252304,0.381304,0.329665,244,1707,1951
367,368,2012-01-03,1,1,1,0,2,1,1,0.15,0.126275,0.44125,0.365671,89,2147,2236
368,369,2012-01-04,1,1,1,0,3,1,2,0.1075,0.119337,0.414583,0.1847,95,2273,2368
369,370,2012-01-05,1,1,1,0,4,1,1,0.265833,0.278412,0.524167,0.129987,140,3132,3272


In [9]:
bike_day_df_2011 = bike_day_df[bike_day_df['yr'] == 0]
bike_day_df_2011.head()


Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


**Insight:**
- Terdapat data terkait dengan perilaku konsumen setiap harinya selama dua tahun
- Data parameter lingkungan masih dalam bentuk yang dinormalisasi bukan pada data yang sebenarnya 
- Data terkait hari dan musim masih dalam bentuk numerik 

### Cleaning Data

### Data Cleaning 2011 Dataset

In [10]:
my_season_map = {1:"springer", 2:"summer", 3:"fall", 4:"winter"}
bike_day_df_2011['season'] = bike_day_df_2011['season'].map(my_season_map)

my_weather_map = {1:"Clear", 2:"Foggy", 3:"Light Rain/Snow", 4:"Blizzard/Storm"}
bike_day_df_2011['weathersit'] = bike_day_df_2011['weathersit'].map(my_weather_map)

bike_day_df_2011['temp'] = bike_day_df_2011['temp']*41

bike_day_df_2011['atemp'] = bike_day_df_2011['atemp']*50

bike_day_df_2011['hum'] = bike_day_df_2011['hum']*100

bike_day_df_2011['windspeed'] = bike_day_df_2011['windspeed']*67
bike_day_df_2011.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bike_day_df_2011['season'] = bike_day_df_2011['season'].map(my_season_map)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bike_day_df_2011['weathersit'] = bike_day_df_2011['weathersit'].map(my_weather_map)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bike_day_df_2011['temp'] = bike_day_df_2011['t

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,springer,0,1,0,6,0,Foggy,14.110847,18.18125,80.5833,10.749882,331,654,985
1,2,2011-01-02,springer,0,1,0,0,0,Foggy,14.902598,17.68695,69.6087,16.652113,131,670,801
2,3,2011-01-03,springer,0,1,0,1,1,Clear,8.050924,9.47025,43.7273,16.636703,120,1229,1349
3,4,2011-01-04,springer,0,1,0,2,1,Clear,8.2,10.6061,59.0435,10.739832,108,1454,1562
4,5,2011-01-05,springer,0,1,0,3,1,Clear,9.305237,11.4635,43.6957,12.5223,82,1518,1600


In [11]:
import plotly.express as px
for column in bike_day_df_2011.select_dtypes(include=['float64', 'int64']).columns:  
    fig= px.box(bike_day_df_2011, y=bike_day_df_2011[column],title=f"boxplot for'{column}'",width=800, height=400)
    fig.show()

In [12]:
bike_day_df_2011[bike_day_df_2011['hum'] == 0]

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
68,69,2011-03-10,springer,0,3,0,4,1,Light Rain/Snow,15.952731,19.2834,0.0,17.545759,46,577,623


In [13]:
bike_day_df_2011 = bike_day_df_2011.drop(69, axis=0)

**Insight 2011:**
- Data yang dinormalisasi dikembalikan ke nilai semula terlebih dahulu untuk memudahkan pemahaman data 
- Terdapat satu data humidity yang bernilai 0 sehingga data tersebut dihapus

### Data Cleaning 2012 Dataset

In [14]:
my_season_map = {1:"springer", 2:"summer", 3:"fall", 4:"winter"}
bike_day_df_2012['season'] = bike_day_df_2012['season'].map(my_season_map)

my_weather_map = {1:"Clear", 2:"Foggy", 3:"Light Rain/Snow", 4:"Blizzard/Storm"}
bike_day_df_2012['weathersit'] = bike_day_df_2012['weathersit'].map(my_weather_map)

bike_day_df_2012['temp'] = bike_day_df_2012['temp']*41
bike_day_df_2012['atemp'] = bike_day_df_2012['atemp']*50
bike_day_df_2012['hum'] = bike_day_df_2012['hum']*100
bike_day_df_2012['windspeed'] = bike_day_df_2012['windspeed']*67
bike_day_df_2012.head()



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
365,366,2012-01-01,springer,1,1,0,0,0,Clear,15.17,18.78105,69.25,12.875189,686,1608,2294
366,367,2012-01-02,springer,1,1,1,1,0,Clear,11.194763,12.6152,38.1304,22.087555,244,1707,1951
367,368,2012-01-03,springer,1,1,0,2,1,Clear,6.15,6.31375,44.125,24.499957,89,2147,2236
368,369,2012-01-04,springer,1,1,0,3,1,Foggy,4.4075,5.96685,41.4583,12.3749,95,2273,2368
369,370,2012-01-05,springer,1,1,0,4,1,Clear,10.899153,13.9206,52.4167,8.709129,140,3132,3272


In [15]:
import plotly.express as px
for column in bike_day_df_2012.select_dtypes(include=['float64', 'int64']).columns:  
    fig= px.box(bike_day_df_2012, y=bike_day_df_2012[column],title=f"boxplot for'{column}'",width=800, height=400)
    fig.show()

**Insight:**
- Data yang sudah dinormalisasi dikembalikan ke nilai semula untuk memudahkan interpretasi data yang digunakan 
- Tidak ada data yang tidak normal terkait dengan parameter yang ada di dataset pada tahun 2012

## Exploratory Data Analysis (EDA)

### Explore statistic parameter from 2011 Dataset

In [16]:
bike_day_df_2011.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
instant,364.0,183.31044,105.488839,1.0,92.75,183.5,274.25,365.0
yr,364.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
mnth,364.0,6.535714,3.452366,1.0,4.0,7.0,10.0,12.0
holiday,364.0,0.027473,0.163681,0.0,0.0,0.0,0.0,1.0
weekday,364.0,3.002747,2.006187,0.0,1.0,3.0,5.0,6.0
workingday,364.0,0.684066,0.465527,0.0,0.0,1.0,1.0,1.0
temp,364.0,19.972427,7.775478,2.424346,13.325,19.748347,26.95751,34.815847
atemp,364.0,23.363997,8.442693,3.95348,16.145212,23.6426,30.650725,42.0448
hum,364.0,64.364856,14.89484,0.0,53.8229,64.70835,74.2323,97.25
windspeed,364.0,12.816325,5.156651,1.500244,9.08386,12.511278,15.77103,34.000021


In [17]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=2, subplot_titles=['temp vs cnt', 'atemp vs cnt', 
    'hum vs cnt', 'windspeed vs cnt'
])

features = ['temp', 'atemp', 'hum', 'windspeed']

for i, feature in enumerate(features):
    row = (i // 2) + 1  
    col = (i % 2) + 1   
    
    fig.add_trace(
        go.Scatter(x=bike_day_df_2011[feature], y=bike_day_df_2011['cnt'], mode='markers', name=feature),
        row=row, col=col
    )

fig.update_layout(
    title_text="Possible Enviromental Parameter Correlation to Count of Rented Bike in 2011",
    height=1000, width=800, showlegend=False
)

fig.show()


In [18]:
workingday_0 = bike_day_df_2011[bike_day_df_2011['workingday'] == 0] 
workingday0 = workingday_0['cnt'].sum()

workingday_1 = bike_day_df_2011[bike_day_df_2011['workingday'] > 0]
workingday1 = workingday_1['cnt'].sum()

vis_workingday = pd.DataFrame(data=(workingday1,workingday0), index=('working day','holiday'))

fig = px.bar(data_frame=vis_workingday,text_auto=".2s",title='Count of rented bike between workingday and holiday in 2011')
fig.show()

In [19]:

import plotly.express as px

fig = px.histogram(bike_day_df_2011,color='weathersit', x="season",y='cnt',text_auto=".2s", title="Rented bike behaviour based on season and weather in 2011")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.update_layout(
    barmode='group',  
    xaxis_title="Season",  
    yaxis_title="Count of Rides"
)
fig.show()


In [20]:
import plotly.express as px
fig = px.scatter(bike_day_df_2011, x="dteday", y="cnt", color="season",title='Correlation between count of rented bike every day and the season in 2011')
fig.show()

In [21]:
bike_day_df_2011[(bike_day_df_2011['cnt'] == 431) | (bike_day_df_2011['cnt'] == 1115) | (bike_day_df_2011['cnt'] == 627) | (bike_day_df_2011['cnt'] == 795)] 

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
26,27,2011-01-27,springer,0,1,0,4,1,Clear,7.995,10.985,68.75,7.627079,15,416,431
105,106,2011-04-16,summer,0,4,0,6,0,Light Rain/Snow,17.664153,21.2746,88.8333,22.834136,121,674,795
238,239,2011-08-27,fall,0,8,0,6,0,Foggy,27.88,31.7778,85.0,25.166339,226,889,1115
301,302,2011-10-29,winter,0,10,0,6,0,Light Rain/Snow,10.420847,11.39565,88.25,23.541857,57,570,627


In [22]:
def percent_hum(hum):
    if hum <= 60:
        hum = bike_day_df_2011['hum']
        return 'low humidity'
    elif hum > 60 and hum <= 80:
        return 'mid humidity'
    else:
        return 'high humidity'

bike_day_df_2011['humidity level'] = bike_day_df_2011['hum'].apply(percent_hum)
bike_day_df_2011.head(5)

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level
0,1,2011-01-01,springer,0,1,0,6,0,Foggy,14.110847,18.18125,80.5833,10.749882,331,654,985,high humidity
1,2,2011-01-02,springer,0,1,0,0,0,Foggy,14.902598,17.68695,69.6087,16.652113,131,670,801,mid humidity
2,3,2011-01-03,springer,0,1,0,1,1,Clear,8.050924,9.47025,43.7273,16.636703,120,1229,1349,low humidity
3,4,2011-01-04,springer,0,1,0,2,1,Clear,8.2,10.6061,59.0435,10.739832,108,1454,1562,low humidity
4,5,2011-01-05,springer,0,1,0,3,1,Clear,9.305237,11.4635,43.6957,12.5223,82,1518,1600,low humidity


In [23]:
import plotly.express as px

fig = px.histogram(bike_day_df_2011,color='humidity level', x="humidity level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on humidity level in 2011")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [24]:
import plotly.express as px
fig = px.scatter(bike_day_df_2011, x="hum", y="cnt", color='season',title='Correlation plot between humidity and biking behaviour in 2011')
fig.show()

In [25]:
def percent_temp(temp):
    if temp <= 15:
        temp = bike_day_df_2011['temp']
        return 'low temperature'
    elif temp > 15 and temp <= 25:
        return 'mid temperature'
    else:
        return 'high temperature'

bike_day_df_2011['temperature level'] = bike_day_df_2011['temp'].apply(percent_temp)
bike_day_df_2011.head(5)

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level,temperature level
0,1,2011-01-01,springer,0,1,0,6,0,Foggy,14.110847,18.18125,80.5833,10.749882,331,654,985,high humidity,low temperature
1,2,2011-01-02,springer,0,1,0,0,0,Foggy,14.902598,17.68695,69.6087,16.652113,131,670,801,mid humidity,low temperature
2,3,2011-01-03,springer,0,1,0,1,1,Clear,8.050924,9.47025,43.7273,16.636703,120,1229,1349,low humidity,low temperature
3,4,2011-01-04,springer,0,1,0,2,1,Clear,8.2,10.6061,59.0435,10.739832,108,1454,1562,low humidity,low temperature
4,5,2011-01-05,springer,0,1,0,3,1,Clear,9.305237,11.4635,43.6957,12.5223,82,1518,1600,low humidity,low temperature


In [26]:
import plotly.express as px

fig = px.histogram(bike_day_df_2011,color='temperature level', x="temperature level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on temperature level in 2011")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [27]:
import plotly.express as px
fig = px.scatter(bike_day_df_2011, x="temp", y="cnt", color='season',title='Correlation plot between temperature and biking behaviour in 2011')
fig.show()

In [28]:
def percent_wind(wind):
    if wind <= 10:
        wind = bike_day_df_2011['windspeed']
        return 'low windspeed'
    elif wind > 10 and wind <= 20:
        return 'mid windspeed'
    else:
        return 'high windspeed'

bike_day_df_2011['windspeed level'] = bike_day_df_2011['windspeed'].apply(percent_wind)
bike_day_df_2011.head(5)

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level,temperature level,windspeed level
0,1,2011-01-01,springer,0,1,0,6,0,Foggy,14.110847,18.18125,80.5833,10.749882,331,654,985,high humidity,low temperature,mid windspeed
1,2,2011-01-02,springer,0,1,0,0,0,Foggy,14.902598,17.68695,69.6087,16.652113,131,670,801,mid humidity,low temperature,mid windspeed
2,3,2011-01-03,springer,0,1,0,1,1,Clear,8.050924,9.47025,43.7273,16.636703,120,1229,1349,low humidity,low temperature,mid windspeed
3,4,2011-01-04,springer,0,1,0,2,1,Clear,8.2,10.6061,59.0435,10.739832,108,1454,1562,low humidity,low temperature,mid windspeed
4,5,2011-01-05,springer,0,1,0,3,1,Clear,9.305237,11.4635,43.6957,12.5223,82,1518,1600,low humidity,low temperature,mid windspeed


In [29]:
import plotly.express as px

fig = px.histogram(bike_day_df_2011,color='windspeed level', x="windspeed level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on windspeed in 2011")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [30]:
fig = px.scatter(bike_day_df_2011, x="windspeed", y="cnt", color='season',title='Correlation plot between windspeed and biking behaviour in 2011')
fig.show()

### Kesimpulan tahun 2011

- Peminjaman sepeda lebih banyak pada hari kerja daripada hari libur 
- Cuaca yang paling disukai oleh user adalah cuaca jenis 1 (clear weather)
- Musim yang paling diminati oleh peminjam adalah musim gugur (fall season)
- Kecepatan angin dan kelembapan yang tinggi serta suhu yang rendah kurang diminati oleh para pesepeda 

### Explore Statistic Parameter in 2012 Dataset

In [31]:
bike_day_df_2012.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
instant,366.0,548.5,105.799338,366.0,457.25,548.5,639.75,731.0
yr,366.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0
mnth,366.0,6.513661,3.455958,1.0,4.0,7.0,9.75,12.0
holiday,366.0,0.030055,0.170971,0.0,0.0,0.0,0.0,1.0
weekday,366.0,2.986339,2.006108,0.0,1.0,3.0,5.0,6.0
workingday,366.0,0.68306,0.465921,0.0,0.0,1.0,1.0,1.0
temp,366.0,20.667313,7.220597,4.4075,14.256038,21.080847,26.812299,35.328347
atemp,366.0,24.092605,7.837799,5.0829,17.534275,24.888975,30.382325,40.24565
hum,366.0,61.216645,13.420576,25.4167,50.812525,61.1875,71.1146,92.5
windspeed,366.0,12.701344,5.238985,3.12555,8.959307,11.70825,15.490115,29.584721


In [32]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=2, cols=2, subplot_titles=['temp vs cnt', 'atemp vs cnt', 
    'hum vs cnt', 'windspeed vs cnt'
])

features = ['temp', 'atemp', 'hum', 'windspeed']

for i, feature in enumerate(features):
    row = (i // 2) + 1  
    col = (i % 2) + 1   
    
    fig.add_trace(
        go.Scatter(x=bike_day_df_2012[feature], y=bike_day_df_2012['cnt'], mode='markers', name=feature),
        row=row, col=col
    )

fig.update_layout(
    title_text="Possible Features Correlation to Label",
    height=1000, width=800, showlegend=False
)

fig.show()


In [33]:
workingday_0 = bike_day_df_2012[bike_day_df_2012['workingday'] == 0] 
workingday0 = workingday_0['cnt'].sum()

workingday_1 = bike_day_df_2012[bike_day_df_2012['workingday'] > 0]
workingday1 = workingday_1['cnt'].sum()

vis_workingday = pd.DataFrame(data=(workingday1,workingday0), index=('working day','holiday'))

fig = px.bar(data_frame=vis_workingday,text_auto=".2s",title='Sum of rented bike between workingday and holiday')
fig.show()

In [34]:

fig = px.histogram(bike_day_df_2012,color='weathersit', x="season",y='cnt',text_auto=".2s", title="Rented bike behaviour based on season and weather in 2012")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.update_layout(
    barmode='group',  
    xaxis_title="Season",  
    yaxis_title="Count of Rides"
)
fig.show()


In [35]:
fig = px.scatter(bike_day_df_2012, x="dteday", y="cnt", color="season",title='Rented bike every day according to the season in 2012')
fig.show()

In [36]:
bike_day_df_2012[(bike_day_df_2012['cnt'] == 22) | (bike_day_df_2012['cnt'] == 1027) | (bike_day_df_2012['cnt'] == 4073) | (bike_day_df_2012['cnt'] == 441)] 

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
477,478,2012-04-22,summer,1,4,0,0,0,Light Rain/Snow,16.263347,19.4752,83.5417,23.084582,120,907,1027
626,627,2012-09-18,fall,1,9,0,2,1,Foggy,25.556653,28.25335,87.25,23.958329,371,3702,4073
667,668,2012-10-29,winter,1,10,0,1,1,Light Rain/Snow,18.04,21.97,88.0,23.9994,2,20,22
725,726,2012-12-26,springer,1,12,0,3,1,Light Rain/Snow,9.976653,11.01665,82.3333,21.208582,9,432,441


In [37]:
def percent_hum(hum):
    if hum <= 50:
        hum = bike_day_df_2012['hum']
        return 'low humidity'
    elif hum > 50 and hum <= 80:
        return 'mid humidity'
    else:
        return 'high humidity'

bike_day_df_2012['humidity level'] = bike_day_df_2012['hum'].apply(percent_hum)
bike_day_df_2012.head(5)




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level
365,366,2012-01-01,springer,1,1,0,0,0,Clear,15.17,18.78105,69.25,12.875189,686,1608,2294,mid humidity
366,367,2012-01-02,springer,1,1,1,1,0,Clear,11.194763,12.6152,38.1304,22.087555,244,1707,1951,low humidity
367,368,2012-01-03,springer,1,1,0,2,1,Clear,6.15,6.31375,44.125,24.499957,89,2147,2236,low humidity
368,369,2012-01-04,springer,1,1,0,3,1,Foggy,4.4075,5.96685,41.4583,12.3749,95,2273,2368,low humidity
369,370,2012-01-05,springer,1,1,0,4,1,Clear,10.899153,13.9206,52.4167,8.709129,140,3132,3272,mid humidity


In [38]:
import plotly.express as px

fig = px.histogram(bike_day_df_2012,color='humidity level', x="humidity level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on humidity in 2012")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [39]:
fig = px.scatter(bike_day_df_2012, x="hum", y="cnt", color='season',title='Correlation plot between humidity and biking behaviour in 2012')
fig.show()

In [40]:
def percent_wind(wind):
    if wind <= 10:
        wind = bike_day_df_2012['windspeed']
        return 'low windspeed'
    elif wind > 10 and wind <= 20:
        return 'mid windspeed'
    else:
        return 'high windspeed'

bike_day_df_2012['windspeed level'] = bike_day_df_2012['windspeed'].apply(percent_wind)
bike_day_df_2012.head(5)




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level,windspeed level
365,366,2012-01-01,springer,1,1,0,0,0,Clear,15.17,18.78105,69.25,12.875189,686,1608,2294,mid humidity,mid windspeed
366,367,2012-01-02,springer,1,1,1,1,0,Clear,11.194763,12.6152,38.1304,22.087555,244,1707,1951,low humidity,high windspeed
367,368,2012-01-03,springer,1,1,0,2,1,Clear,6.15,6.31375,44.125,24.499957,89,2147,2236,low humidity,high windspeed
368,369,2012-01-04,springer,1,1,0,3,1,Foggy,4.4075,5.96685,41.4583,12.3749,95,2273,2368,low humidity,mid windspeed
369,370,2012-01-05,springer,1,1,0,4,1,Clear,10.899153,13.9206,52.4167,8.709129,140,3132,3272,mid humidity,low windspeed


In [41]:
fig = px.histogram(bike_day_df_2012,color='windspeed level', x="windspeed level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on Windspeed in 2012")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [42]:
fig = px.scatter(bike_day_df_2012, x="windspeed", y="cnt", color='season',title='Correlation plot between windspeed and biking behaviour in 2012')
fig.show()

In [43]:
def percent_temp(temp):
    if temp <= 15:
        temp = bike_day_df_2012['temp']
        return 'low temperature'
    elif temp > 15 and temp <= 25:
        return 'mid temperature'
    else:
        return 'high temperature'

bike_day_df_2012['temperature level'] = bike_day_df_2012['temp'].apply(percent_temp)
bike_day_df_2012.head(5)



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,humidity level,windspeed level,temperature level
365,366,2012-01-01,springer,1,1,0,0,0,Clear,15.17,18.78105,69.25,12.875189,686,1608,2294,mid humidity,mid windspeed,mid temperature
366,367,2012-01-02,springer,1,1,1,1,0,Clear,11.194763,12.6152,38.1304,22.087555,244,1707,1951,low humidity,high windspeed,low temperature
367,368,2012-01-03,springer,1,1,0,2,1,Clear,6.15,6.31375,44.125,24.499957,89,2147,2236,low humidity,high windspeed,low temperature
368,369,2012-01-04,springer,1,1,0,3,1,Foggy,4.4075,5.96685,41.4583,12.3749,95,2273,2368,low humidity,mid windspeed,low temperature
369,370,2012-01-05,springer,1,1,0,4,1,Clear,10.899153,13.9206,52.4167,8.709129,140,3132,3272,mid humidity,low windspeed,low temperature


In [44]:
fig = px.histogram(bike_day_df_2012,color='temperature level', x="temperature level",y='cnt',text_auto=".2s", title="Rented bike behaviour based on Temperature")
fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
fig.show()

In [45]:
fig = px.scatter(bike_day_df_2012, x="temp", y="cnt", color='season', title="Correlation plot between temperature and biking behaviour in 2012")
fig.show()

### Kesimpulan Tahun 2012
- Cuaca yang disukai oleh penyewa sepeda adalah cuaca 1 
- Season yang disukai oleh penyewa sepeda adalah fall (Season 3)
- Terdapat peningkatan perilaku peminjam sepeda pada hari kerja 
- Kecepatan angin dan kelembapan yang tinggi serta temperatur yang rendah kurang disukai oleh penyewa sepeda

### Behaviour Biking Analysis Between 2011 and 2012 

In [46]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['workingday'], y=bike_day_df_2011['cnt'],histfunc="sum",name='2011',texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['workingday'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="Working Day",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by Working Day in 2011 and 2012"
)

fig.show()

In [47]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['weathersit'], y=bike_day_df_2011['cnt'],histfunc="sum", name='2011', texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['weathersit'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="Weather Condition",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by Weather Condition in 2011 and 2012"
)

fig.show()

In [48]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['season'], y=bike_day_df_2011['cnt'],histfunc="sum", name='2011', texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['season'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="Season",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by Season in 2011 and 2012"
)

fig.show()

In [49]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['humidity level'], y=bike_day_df_2011['cnt'],histfunc="sum", name='2011', texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['humidity level'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="humidity",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by humidity level in 2011 and 2012"
)

fig.show()

In [50]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['temperature level'], y=bike_day_df_2011['cnt'],histfunc="sum", name='2011', texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['temperature level'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="temperature",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by temperature level in 2011 and 2012"
)

fig.show()

In [51]:

fig = go.Figure()
fig.add_trace(go.Histogram(x=bike_day_df_2011['windspeed level'], y=bike_day_df_2011['cnt'],histfunc="sum", name='2011', texttemplate='%{y:.2s}',textposition='outside'))
fig.add_trace(go.Histogram(x=bike_day_df_2012['windspeed level'], y=bike_day_df_2012['cnt'],histfunc="sum", name='2012', texttemplate='%{y:.2s}',textposition='inside'))
fig.update_layout(
    barmode='group',  
    xaxis_title="windspeed",  
    yaxis_title="Count of Rides",  
    title="Bike Counts by windspeed level in 2011 and 2012"
)

fig.show()

### Kesimpulan Analisis data dari tahun ke tahun 
- Secara keseluruhan perilaku konsumen tidak berubah 
- Terjadi peningkatan konsumen dari tahun 2011 ke tahun 2012

## Conclusion

- Musim yang paling disukai oleh penyewa sepeda adalah musim gugur (fall)
- Penyewa paling menyukai cuaca yang clear 
- Jumlah penyewa sepeda pada hari kerja lebih banyak daripada hari libur 
- Jumlah penyewa sepeda berkurang drastis ketika suhu rendah, kelembapan dan kecepatan angin sedang tinggi 
- Tidak terdapat perubahan perilaku dari konsumen dari tahun 2011 dengan tahun 2012, jumlah pengguna meningkat secara signifikan dari tahun 2011 hingga 2012