# Data Analysis Project: Bike Sharing Dataset
- Name: Alex Lianardo
- Email: alexlianardo9@gmail.com
- Dicoding ID: alexlianardo9

### Dataset "Bike Sharing Dataset"

## Defining the Business Questions

1. Which season has the most bike rentals, and why?
2. What are the average bike rentals by day of the week over the two years?
3. What are the average bike rentals by month over the two years?

## Preparing the Required Libraries

In [16]:
import pandas as pd

## Data Wrangling

### Gathering Data

- instant: record index
- dteday: date
- season: season (1 = spring, 2 = summer, 3 = fall, 4 = winter)
- yr: year (0 = 2011, 1 = 2012)
- mnth: month (1 to 12)
- hr: hour (0 to 23); not present in `day.csv`
- holiday: whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
- weekday: day of the week
- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit:
	- 1: Clear, Few clouds, Partly cloudy
	- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
	- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
	- 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius; values are divided by 41 (max)
- atemp: normalized feels-like temperature in Celsius; values are divided by 50 (max)
- hum: normalized humidity; values are divided by 100 (max)
- windspeed: normalized wind speed; values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes, including both casual and registered
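The normalization described in the list above can be reversed by multiplying each column by its divisor. The following is a minimal sketch, assuming the divisors listed in the data dictionary are the ones that apply to the file; the `SCALE` mapping, the `denormalize` helper, and the sample values are hypothetical and only illustrate the arithmetic.

```python
import pandas as pd

# Divisors copied from the data dictionary (assumption: they apply as listed).
SCALE = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}


def denormalize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the normalized weather columns rescaled to their original units."""
    out = frame.copy()
    for column, factor in SCALE.items():
        if column in out.columns:
            out[column] = out[column] * factor
    return out


# Hypothetical two-row sample, not taken from the dataset.
sample = pd.DataFrame({"temp": [0.5, 1.0], "hum": [0.25, 0.5]})
restored = denormalize(sample)
```

Working on a copy keeps the normalized values available in case a later step expects them.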

In [136]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'D:\Dicoding\Submission\Bike-sharing-dataset\day.csv')

### Assessing Data

In [137]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


### Cleaning Data

In [138]:
df.duplicated().sum()

0

There are no duplicate rows; the only problem is the wrong data type of the dteday column.
I will also add day-name and month-name columns so that average rentals can be tracked per day and per month.

In [139]:
df['dteday'] = pd.to_datetime(df['dteday'])
df['hari'] = df['dteday'].dt.day_name()
df['bulan'] = df['dteday'].dt.month_name()

In [140]:
df.head()

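| | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt | hari | bulan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 331 | 654 | 985 | Saturday | January |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 131 | 670 | 801 | Sunday | January |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 120 | 1229 | 1349 | Monday | January |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 108 | 1454 | 1562 | Tuesday | January |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 82 | 1518 | 1600 | Wednesday | January |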


In [141]:
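# Map season codes to season names
season_mapping = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
df['season'] = df['season'].map(season_mapping)

# Rescale the normalized temperature back to Celsius.
# Note: the data dictionary above lists 41 as the divisor; this cell multiplies by 40,
# and the summary statistics below reflect that factor.
df['temp'] = df['temp']*40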

In [142]:
result_avg_day = df.groupby('hari')[['casual','registered','cnt','temp']].mean().reset_index()
result_avg_month = df.groupby('bulan')[['casual','registered','cnt','temp']].mean().reset_index()
result_avg_season = df.groupby('season')[['casual','registered','cnt','temp']].mean().reset_index()
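Grouping by a plain string column returns the groups in alphabetical order, so day names come back as Friday, Monday, Saturday, and so on. If calendar order matters for a table or chart, one option is an ordered categorical. The sketch below assumes the notebook's `hari` and `cnt` column names; `DAY_ORDER` and the sample values are hypothetical.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sample; the cnt values are illustrative, not taken from the dataset.
sample = pd.DataFrame({
    "dteday": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-08"]),
    "cnt": [100, 200, 300, 300],
})

# An ordered categorical makes groupby sort by calendar order instead of alphabetically.
sample["hari"] = pd.Categorical(
    sample["dteday"].dt.day_name(), categories=DAY_ORDER, ordered=True
)

# observed=True keeps only the days that actually occur in the sample.
avg_by_day = sample.groupby("hari", observed=True)["cnt"].mean().reset_index()
```

The same idea applies to month names, using a list of the twelve month names in calendar order.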

## Exploratory Data Analysis (EDA)

In [146]:
df.describe()

| | instant | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 |
| mean | 366.0 | 0.500684 | 6.519836 | 0.028728 | 2.997264 | 0.683995 | 1.395349 | 19.815392 | 0.474354 | 0.627894 | 0.190486 | 848.176471 | 3656.172367 | 4504.348837 |
| std | 211.165812 | 0.500342 | 3.451913 | 0.167155 | 2.004787 | 0.465233 | 0.544894 | 7.32204 | 0.162961 | 0.142429 | 0.077498 | 686.622488 | 1560.256377 | 1937.211452 |
| min | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.365216 | 0.07907 | 0.0 | 0.022392 | 2.0 | 20.0 | 22.0 |
| 25% | 183.5 | 0.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 13.48334 | 0.337842 | 0.52 | 0.13495 | 315.5 | 2497.0 | 3152.0 |
| 50% | 366.0 | 1.0 | 7.0 | 0.0 | 3.0 | 1.0 | 1.0 | 19.93332 | 0.486733 | 0.626667 | 0.180975 | 713.0 | 3662.0 | 4548.0 |
| 75% | 548.5 | 1.0 | 10.0 | 0.0 | 5.0 | 1.0 | 2.0 | 26.21666 | 0.608602 | 0.730209 | 0.233214 | 1096.0 | 4776.5 | 5956.0 |
| max | 731.0 | 1.0 | 12.0 | 1.0 | 6.0 | 1.0 | 3.0 | 34.46668 | 0.840896 | 0.9725 | 0.507463 | 3410.0 | 6946.0 | 8714.0 |


At a glance, nothing in the summary statistics looks unusual.
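A visual scan of `describe()` can be backed by a few explicit checks. The sketch below is one possible set, assuming the data dictionary's statement that `cnt` is the sum of `casual` and `registered`; the `sanity_report` helper is hypothetical, and the third sample row is deliberately broken to show a failing case.

```python
import pandas as pd


def sanity_report(frame: pd.DataFrame) -> dict:
    """Count rows that break simple consistency rules implied by the data dictionary."""
    counts = frame[["casual", "registered", "cnt"]]
    return {
        # cnt is documented as casual + registered.
        "cnt_mismatch": int((frame["casual"] + frame["registered"] != frame["cnt"]).sum()),
        # Rental counts should never be negative.
        "negative_counts": int((counts < 0).any(axis=1).sum()),
        # Total number of missing cells across all columns.
        "missing_values": int(frame.isna().sum().sum()),
    }


# First two rows mirror the df.head() output; the third row is an invented bad record.
sample = pd.DataFrame({
    "casual": [331, 131, 10],
    "registered": [654, 670, 20],
    "cnt": [985, 801, 35],
})
report = sanity_report(sample)
```

On the full DataFrame, all three counts being zero would support the observation that the data looks clean.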

## Visualization & Explanatory Analysis

### Question 1: Finding the Season with the Most Bike Rentals

In [143]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# One donut per metric; pie traces need "domain"-type subplots.
fig = make_subplots(rows=1, cols=4, specs=[[{"type": "domain"},{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

# (column, trace name, textinfo); temperature shows the raw value instead of a percentage.
metrics = [
    ('casual', 'casual', 'label+percent'),
    ('registered', 'registered', 'label+percent'),
    ('cnt', 'count', 'label+percent'),
    ('temp', 'temperature', 'label+value'),
]

# Placing a trace with row/col sets its domain, so no explicit domain argument is passed.
for col, (metric, name, textinfo) in enumerate(metrics, start=1):
    fig.add_trace(go.Pie(
         values=result_avg_season[metric],
         labels=result_avg_season['season'],
         name=name,
         hole=.3,
         textinfo=textinfo),
        row=1, col=col)

fig.update_layout(
    title_text="Percentage Share of Each Metric by Season",
    # Label each donut with its metric name.
    annotations=[dict(text='Casual', x=-0.03, y=0, font_size=20, showarrow=False),
                 dict(text='Registered', x=0.21, y=0, font_size=20, showarrow=False),
                 dict(text='Count', x=0.55, y=0, font_size=20, showarrow=False),
                 dict(text='Temperature', x=0.84, y=0, font_size=20, showarrow=False)])

fig.show()

Fall is the season with the most rentals for every user category.
The average temperature in fall is also higher than in the other seasons, which may be why riders prefer fall for cycling.
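The percentages shown in the donut charts are each group's average divided by the sum of all group averages. A minimal sketch of that arithmetic follows; the `season_avg` values are hypothetical stand-ins for `result_avg_season`, chosen only to keep the numbers easy to follow.

```python
import pandas as pd

# Hypothetical seasonal averages; in the notebook this would be result_avg_season.
season_avg = pd.DataFrame({
    "season": ["Fall", "Spring", "Summer", "Winter"],
    "cnt": [400.0, 100.0, 300.0, 200.0],
})

# Share of each season in the sum of seasonal averages, as a percentage.
season_avg["share_pct"] = season_avg["cnt"] / season_avg["cnt"].sum() * 100

# Season with the highest average rentals.
top_season = season_avg.loc[season_avg["cnt"].idxmax(), "season"]
```

Reading the share and the top season from a table like this gives the same answer as the chart without relying on visual comparison of slices.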

### Question 2: Finding the Average Bike Rentals per Day

In [144]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# One donut per metric; pie traces need "domain"-type subplots.
fig = make_subplots(rows=1, cols=4, specs=[[{"type": "domain"},{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

# (column, trace name, textinfo); temperature shows the raw value instead of a percentage.
metrics = [
    ('casual', 'casual', 'label+percent'),
    ('registered', 'registered', 'label+percent'),
    ('cnt', 'count', 'label+percent'),
    ('temp', 'temperature', 'label+value'),
]

# Placing a trace with row/col sets its domain, so no explicit domain argument is passed.
for col, (metric, name, textinfo) in enumerate(metrics, start=1):
    fig.add_trace(go.Pie(
         values=result_avg_day[metric],
         labels=result_avg_day['hari'],
         name=name,
         hole=.3,
         textinfo=textinfo),
        row=1, col=col)

fig.update_layout(
    title_text="Percentage Share of Each Metric by Day",
    # Label each donut with its metric name.
    annotations=[dict(text='Casual', x=-0.03, y=0, font_size=20, showarrow=False),
                 dict(text='Registered', x=0.21, y=0, font_size=20, showarrow=False),
                 dict(text='Count', x=0.55, y=0, font_size=20, showarrow=False),
                 dict(text='Temperature', x=0.84, y=0, font_size=20, showarrow=False)])

fig.show()

Nothing specific stands out here apart from the weekend (Saturday and Sunday), when the share of rentals by the casual segment is higher than on other days.
The registered segment, by contrast, tends to rent less on Saturdays and Sundays.
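The weekend pattern can also be checked numerically by grouping on the `workingday` flag from the data dictionary. This is a sketch under that assumption; the sample rows and the group labels are hypothetical, and note that `workingday = 0` covers holidays as well as weekends.

```python
import pandas as pd

# Hypothetical rows using the dataset's column names (workingday, casual, registered).
sample = pd.DataFrame({
    "workingday": [0, 0, 1, 1, 1],
    "casual": [300, 500, 100, 120, 80],
    "registered": [600, 800, 1200, 1500, 1500],
})

# Average rentals per segment, split by working day vs. weekend/holiday.
by_daytype = sample.groupby("workingday")[["casual", "registered"]].mean()
by_daytype.index = by_daytype.index.map({0: "weekend/holiday", 1: "working day"})
```

If the casual average is higher in the `weekend/holiday` row and the registered average is higher in the `working day` row, the table agrees with the reading of the charts.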

### Question 3: Finding the Average Bike Rentals per Month

In [145]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# One donut per metric; pie traces need "domain"-type subplots.
fig = make_subplots(rows=1, cols=4, specs=[[{"type": "domain"},{"type": "domain"}, {"type": "domain"}, {"type": "domain"}]])

# (column, trace name, textinfo); temperature shows the raw value instead of a percentage.
metrics = [
    ('casual', 'casual', 'label+percent'),
    ('registered', 'registered', 'label+percent'),
    ('cnt', 'count', 'label+percent'),
    ('temp', 'temperature', 'label+value'),
]

# Placing a trace with row/col sets its domain, so no explicit domain argument is passed.
for col, (metric, name, textinfo) in enumerate(metrics, start=1):
    fig.add_trace(go.Pie(
         values=result_avg_month[metric],
         labels=result_avg_month['bulan'],
         name=name,
         hole=.3,
         textinfo=textinfo),
        row=1, col=col)

fig.update_layout(
    title_text="Percentage Share of Each Metric by Month",
    # Label each donut with its metric name.
    annotations=[dict(text='Casual', x=-0.03, y=0, font_size=20, showarrow=False),
                 dict(text='Registered', x=0.21, y=0, font_size=20, showarrow=False),
                 dict(text='Count', x=0.55, y=0, font_size=20, showarrow=False),
                 dict(text='Temperature', x=0.84, y=0, font_size=20, showarrow=False)])

fig.show()

Judging by temperature, June, July, August, and September are hotter than the other months. The shares for the casual, registered, and count metrics are, on average, also highest in those four months, plausibly because the temperature is favorable for cycling.
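One way to put a number on the link between temperature and rentals is the correlation between monthly mean temperature and monthly mean rentals. The sketch below assumes the dataset's `mnth`, `temp`, and `cnt` column names; the sample rows are hypothetical. A high correlation would be consistent with the explanation above but would not by itself establish the cause.

```python
import pandas as pd

# Hypothetical rows with the dataset's mnth, temp and cnt columns.
sample = pd.DataFrame({
    "mnth": [1, 1, 7, 7, 12],
    "temp": [5.0, 7.0, 30.0, 28.0, 8.0],
    "cnt": [1000, 1400, 5000, 5400, 2000],
})

# Grouping on the numeric month keeps the rows in calendar order.
monthly = sample.groupby("mnth")[["temp", "cnt"]].mean().sort_index()

# Pearson correlation between monthly mean temperature and monthly mean rentals.
temp_cnt_corr = monthly["temp"].corr(monthly["cnt"])
```

Grouping on `mnth` rather than the month name also avoids the alphabetical ordering that string labels produce.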

## Conclusion


- Conclusion for question 1
    Season affects the number of bike rentals: people prefer to cycle on warm days rather than cold ones.
- Conclusion for question 2
    The casual segment prefers to rent bikes on weekends, whereas the registered segment prefers weekdays.
- Conclusion for question 3
    All segments prefer to cycle from June through September, when temperatures are warmer.