# Data Analysis Project: Bike Sharing Dataset
- **Name:** Andrew Jonatan Damanik
- **Email:** andrewdamanik23@gmail.com
- **Dicoding ID:** drewjd27

## About the Dataset

The Bike Sharing Dataset records bike rentals over two years, 2011 and 2012. It contains two CSV files: day.csv, which records rentals per day, and hour.csv, which records rentals per hour.

#### Dataset Characteristics

hour.csv and day.csv share the same fields, except that day.csv has no "hr" field. The fields are described below.
- instant: record index
- dteday: date
- season: season (1: winter, 2: spring, 3: summer, 4: fall)
- yr: year (0: 2011, 1: 2012)
- mnth: month (1 - 12)
- hr: hour (0 - 23)
- holiday: whether the day is a holiday (0: not a holiday, 1: holiday)
- weekday: day of the week (0 to 6, where 0 is Sunday and 6 is Saturday)
- workingday: 1 if the day is neither a holiday nor a weekend, otherwise 0
- weathersit:
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp: normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: normalized apparent (feels-like) temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: normalized humidity. The values are divided by 100 (max)
- windspeed: normalized wind speed. The values are divided by 67 (max)
- casual: number of casual users
- registered: number of registered users
- cnt: total number of bikes rented, including both casual and registered users
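
The normalization formulas in the field list can be inverted to recover approximate physical values. The following is a minimal sketch under stated assumptions: the helper `denormalize` and the `*_raw` column names are hypothetical, and the constants are the ones quoted in the field descriptions (which note that the temperature formulas apply to the hourly scale).

```python
import pandas as pd

# Constants quoted in the field descriptions.
TEMP_MIN, TEMP_MAX = -8, 39      # temp, degrees Celsius
ATEMP_MIN, ATEMP_MAX = -16, 50   # atemp, degrees Celsius
HUM_MAX = 100
WINDSPEED_MAX = 67


def denormalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical *_raw columns in the original units."""
    out = df.copy()
    # Invert (t - t_min) / (t_max - t_min).
    out["temp_raw"] = out["temp"] * (TEMP_MAX - TEMP_MIN) + TEMP_MIN
    out["atemp_raw"] = out["atemp"] * (ATEMP_MAX - ATEMP_MIN) + ATEMP_MIN
    # Invert the division by the maximum value.
    out["hum_raw"] = out["hum"] * HUM_MAX
    out["windspeed_raw"] = out["windspeed"] * WINDSPEED_MAX
    return out


# Values taken from the first row of hour.csv.
sample = pd.DataFrame(
    {"temp": [0.24], "atemp": [0.2879], "hum": [0.81], "windspeed": [0.0]}
)
restored = denormalize(sample)
print(restored[["temp_raw", "atemp_raw", "hum_raw", "windspeed_raw"]])
```

Keeping the normalized columns untouched and adding new ones means later cells that rely on `temp` or `hum` are unaffected.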

## Defining the Business Questions

- In which season are the most bikes rented?
- Does the weather affect the number of bikes rented? If so, how?

## Importing the Packages/Libraries Used

In [499]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

## Data Wrangling

### Gathering Data

Display the first 5 rows of hour.csv

In [500]:
hour_df = pd.read_csv('data/hour.csv')
hour_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


Display the first 5 rows of day.csv

In [501]:
day_df = pd.read_csv('data/day.csv')
day_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


### Assessing Data

#### Checking day_df

In [502]:
day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


The non-null counts show that day_df has no missing values. However, the dteday field is stored as object and should be datetime.

In [503]:
print("Total duplicates: ", day_df.duplicated().sum())

Total duplicates:  0


There are no duplicate rows in day_df.

In [529]:
print(hour_df.isna().sum())

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64


In [505]:
day_df.describe()

Unnamed: 0,instant,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
mean,366.0,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,211.165812,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,731.0,4.0,1.0,12.0,1.0,6.0,1.0,3.0,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


#### Checking hour_df

In [506]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


As with day_df, hour_df has no missing values. Its dteday field is also stored as object and should be datetime.

In [507]:
print("Total duplicates: ", hour_df.duplicated().sum())

Total duplicates:  0


There are no duplicate rows in hour_df.

In [508]:
hour_df.describe()

Unnamed: 0,instant,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
mean,8690.0,2.50164,0.502561,6.537775,11.546752,0.02877,3.003683,0.682721,1.425283,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
std,5017.0295,1.106918,0.500008,3.438776,6.914405,0.167165,2.005771,0.465431,0.639357,0.192556,0.17185,0.19293,0.12234,49.30503,151.357286,181.387599
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,4345.5,2.0,0.0,4.0,6.0,0.0,1.0,0.0,1.0,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,8690.0,3.0,1.0,7.0,12.0,0.0,3.0,1.0,1.0,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,13034.5,3.0,1.0,10.0,18.0,0.0,5.0,1.0,2.0,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0
max,17379.0,4.0,1.0,12.0,23.0,1.0,6.0,1.0,4.0,1.0,1.0,1.0,0.8507,367.0,886.0,977.0


#### Insights from the Assessment

- day_df and hour_df have the same fields, except for "hr" in hour_df, which records the hour of the rental.
- The fields season, yr, mnth, hr, holiday, weekday, and weathersit should be categorical to make them easier to understand and analyze.
- dteday is stored as object in both hour_df and day_df. The appropriate type is datetime.
- Neither day_df nor hour_df contains duplicate rows or missing values.
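
The individual checks in this section (missing values, duplicates, column types) could be gathered into one helper so that both frames are assessed the same way. This is only a sketch: the function `assess` and the small stand-in frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summarize the checks used in this section."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        # Non-numeric columns are candidates for datetime or category conversion.
        "non_numeric_columns": df.select_dtypes(exclude="number").columns.tolist(),
    }


# Tiny stand-in frame; in the notebook this would be day_df or hour_df.
demo_df = pd.DataFrame(
    {
        "dteday": ["2011-01-01", "2011-01-02", "2011-01-02"],
        "cnt": [985, 801, 801],
    }
)
summary = assess(demo_df)
print(summary)
```

The stand-in frame deliberately repeats one row so that the duplicate check has something to report.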

### Cleaning Data

#### hour_df

Convert the dteday field from object to datetime

In [509]:
datetime_columns_hour_df = ["dteday"]

for column in datetime_columns_hour_df:
  hour_df[column] = pd.to_datetime(hour_df[column])

Convert the mnth and weathersit fields to category

In [510]:
category_columns_hour_df = ['mnth', 'weathersit']

hour_df[category_columns_hour_df] = hour_df[category_columns_hour_df].astype('category')

Relabel the values of the season, yr, holiday, weekday, and workingday fields and convert them to category

In [511]:
# Mapping dictionary for each column
mappings_hour_df = {
    'season': {1:'Winter', 2:'Spring', 3:'Summer', 4:'Fall'},
    'yr': {0:2011, 1:2012},
    'holiday': {0:'Not Holiday', 1:'Holiday'},
    'weekday': {0: 'Sunday', 1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday', 5: 'Friday', 6: 'Saturday'},
    'workingday': {0:'Not Workingday', 1:'Workingday'}
}

# Relabel the values and convert the type of each column
for col, mapping in mappings_hour_df.items():
    hour_df[col] = hour_df[col].map(mapping).astype('category')

# Display the result
hour_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Winter,2011,1,0,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,Winter,2011,1,1,Not Holiday,Saturday,Not Workingday,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,Winter,2011,1,2,Not Holiday,Saturday,Not Workingday,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,Winter,2011,1,3,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,Winter,2011,1,4,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.75,0.0,0,1,1


In [512]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     17379 non-null  int64         
 1   dteday      17379 non-null  datetime64[ns]
 2   season      17379 non-null  category      
 3   yr          17379 non-null  category      
 4   mnth        17379 non-null  category      
 5   hr          17379 non-null  int64         
 6   holiday     17379 non-null  category      
 7   weekday     17379 non-null  category      
 8   workingday  17379 non-null  category      
 9   weathersit  17379 non-null  category      
 10  temp        17379 non-null  float64       
 11  atemp       17379 non-null  float64       
 12  hum         17379 non-null  float64       
 13  windspeed   17379 non-null  float64       
 14  casual      17379 non-null  int64         
 15  registered  17379 non-null  int64         
 16  cnt         17379 non-

#### day_df

Convert the dteday field from object to datetime

In [513]:
datetime_columns_day_csv = ["dteday"]

for column in datetime_columns_day_csv:
  day_df[column] = pd.to_datetime(day_df[column])

Convert the mnth and weathersit fields to category

In [514]:
category_columns_day_df = ['mnth', 'weathersit']

day_df[category_columns_day_df] = day_df[category_columns_day_df].astype('category')

Relabel the values of the season, yr, holiday, weekday, and workingday fields and convert them to category

In [515]:
# Mapping dictionary for each column
mappings_day_df = {
    'season': {1:'Winter', 2:'Spring', 3:'Summer', 4:'Fall'},
    'yr': {0:2011, 1:2012},
    'holiday': {0:'Not Holiday', 1:'Holiday'},
    'weekday': {0: 'Sunday', 1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday', 5: 'Friday', 6: 'Saturday'},
    'workingday': {0:'Not Workingday', 1:'Workingday'}
}

# Relabel the values and convert the type of each column
for col, mapping in mappings_day_df.items():
    day_df[col] = day_df[col].map(mapping).astype('category')

# Display the result
day_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Winter,2011,1,Not Holiday,Saturday,Not Workingday,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,Winter,2011,1,Not Holiday,Sunday,Not Workingday,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,Winter,2011,1,Not Holiday,Monday,Workingday,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,Winter,2011,1,Not Holiday,Tuesday,Workingday,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,Winter,2011,1,Not Holiday,Wednesday,Workingday,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [516]:
day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     731 non-null    int64         
 1   dteday      731 non-null    datetime64[ns]
 2   season      731 non-null    category      
 3   yr          731 non-null    category      
 4   mnth        731 non-null    category      
 5   holiday     731 non-null    category      
 6   weekday     731 non-null    category      
 7   workingday  731 non-null    category      
 8   weathersit  731 non-null    category      
 9   temp        731 non-null    float64       
 10  atemp       731 non-null    float64       
 11  hum         731 non-null    float64       
 12  windspeed   731 non-null    float64       
 13  casual      731 non-null    int64         
 14  registered  731 non-null    int64         
 15  cnt         731 non-null    int64         
dtypes: category(7), datetime64

In [517]:
day_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Winter,2011,1,Not Holiday,Saturday,Not Workingday,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,Winter,2011,1,Not Holiday,Sunday,Not Workingday,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,Winter,2011,1,Not Holiday,Monday,Workingday,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,Winter,2011,1,Not Holiday,Tuesday,Workingday,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,Winter,2011,1,Not Holiday,Wednesday,Workingday,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


#### Insights from Cleaning the Data

- The dteday field in both day_df and hour_df is now datetime.
- The season, yr, holiday, weekday, workingday, weathersit, and mnth fields are now category.
- The values of season, yr, holiday, weekday, and workingday have been relabeled so they are easier to read.
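
Because hour_df and day_df go through identical cleaning steps, the steps could be wrapped in one function and applied to both. The sketch below reuses the mappings from the cleaning cells; the function name `clean_bike_df` and the two-row stand-in frame are hypothetical.

```python
import pandas as pd

# Label mappings copied from the cleaning cells.
LABEL_MAPPINGS = {
    "season": {1: "Winter", 2: "Spring", 3: "Summer", 4: "Fall"},
    "yr": {0: 2011, 1: 2012},
    "holiday": {0: "Not Holiday", 1: "Holiday"},
    "weekday": {0: "Sunday", 1: "Monday", 2: "Tuesday", 3: "Wednesday",
                4: "Thursday", 5: "Friday", 6: "Saturday"},
    "workingday": {0: "Not Workingday", 1: "Workingday"},
}


def clean_bike_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the same cleaning steps to either frame."""
    out = df.copy()
    out["dteday"] = pd.to_datetime(out["dteday"])
    for col in ["mnth", "weathersit"]:
        out[col] = out[col].astype("category")
    for col, mapping in LABEL_MAPPINGS.items():
        out[col] = out[col].map(mapping).astype("category")
    return out


# Stand-in frame using the first two rows of day.csv (selected columns only).
raw = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02"],
    "season": [1, 1], "yr": [0, 0], "mnth": [1, 1], "holiday": [0, 0],
    "weekday": [6, 0], "workingday": [0, 0], "weathersit": [2, 2],
    "cnt": [985, 801],
})
cleaned = clean_bike_df(raw)
print(cleaned.dtypes)
```

Working on a copy leaves the raw frame available for comparison, at the cost of extra memory.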

## Exploratory Data Analysis (EDA)

### Exploring hour_df

In [518]:
hour_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Winter,2011,1,0,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,Winter,2011,1,1,Not Holiday,Saturday,Not Workingday,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,Winter,2011,1,2,Not Holiday,Saturday,Not Workingday,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,Winter,2011,1,3,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,Winter,2011,1,4,Not Holiday,Saturday,Not Workingday,1,0.24,0.2879,0.75,0.0,0,1,1


In [519]:
hour_df.describe(include = 'all')

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,17379.0,17379,17379,17379.0,17379.0,17379.0,17379,17379,17379,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
unique,,,4,2.0,12.0,,2,7,2,4.0,,,,,,,
top,,,Summer,2012.0,7.0,,Not Holiday,Saturday,Workingday,1.0,,,,,,,
freq,,,4496,8734.0,1488.0,,16879,2512,11865,11413.0,,,,,,,
mean,8690.0,2012-01-02 04:08:34.552045568,,,,11.546752,,,,,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
min,1.0,2011-01-01 00:00:00,,,,0.0,,,,,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,4345.5,2011-07-04 00:00:00,,,,6.0,,,,,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,8690.0,2012-01-02 00:00:00,,,,12.0,,,,,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,13034.5,2012-07-02 00:00:00,,,,18.0,,,,,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0
max,17379.0,2012-12-31 00:00:00,,,,23.0,,,,,1.0,1.0,1.0,0.8507,367.0,886.0,977.0


- The highest number of bikes rented in a single hour is 977 and the lowest is 1, so every recorded hour in 2011 and 2012 had at least one rental. Note that hour_df has 17,379 rows, fewer than the 17,544 hours in 731 days (731 × 24), so 165 hours are absent from the data rather than recorded with a count of 0.
- The highest number of bikes rented by casual users in a single hour is 367, and the lowest is 0.
- The highest number of bikes rented by registered users in a single hour is 886, and the lowest is 0.
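
Since hour_df has fewer rows than there are hours in the two-year period, it can be useful to list which hourly timestamps are absent. The following is a rough sketch, assuming the `dteday` and `hr` columns described earlier; the helper `find_missing_hours` and the demo frame are hypothetical.

```python
import pandas as pd


def find_missing_hours(df: pd.DataFrame) -> pd.DatetimeIndex:
    """Hypothetical helper: hourly timestamps that have no row in df."""
    recorded = pd.to_datetime(df["dteday"]) + pd.to_timedelta(df["hr"], unit="h")
    # Expected range: every hour from the first day through the last day.
    start = recorded.min().normalize()
    end = recorded.max().normalize() + pd.Timedelta(hours=23)
    expected = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    return expected.difference(pd.DatetimeIndex(recorded))


# Made-up demo: one day with hour 5 left out on purpose.
demo_df = pd.DataFrame(
    {"dteday": ["2011-01-01"] * 23, "hr": [h for h in range(24) if h != 5]}
)
missing = find_missing_hours(demo_df)
print(missing)
```

In the notebook, `find_missing_hours(hour_df)` would be the corresponding call.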

In [520]:
hour_df.instant.is_unique

True

In [521]:
hour_df.groupby(by='hr').agg({
    "instant" : "nunique",
    "cnt" : ["max","min","mean","std"],
})

Unnamed: 0_level_0,instant,cnt,cnt,cnt,cnt
Unnamed: 0_level_1,nunique,max,min,mean,std
hr,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
0,726,283,2,53.898072,42.30791
1,724,168,1,33.375691,33.538727
2,715,132,1,22.86993,26.578642
3,697,79,1,11.727403,13.23919
4,697,28,1,6.352941,4.143818
5,717,66,1,19.889819,13.200765
6,725,213,1,76.044138,55.084348
7,727,596,1,212.064649,161.441936
8,727,839,5,359.011004,235.189285
9,727,426,14,219.309491,93.703458


- Judging by the mean, the average number of rentals stays below 100 from 23:00 through 06:00, which is low. From 07:00 onward rentals rise, with averages above 100.
- Also judging by the mean, 17:00 is the busiest hour for rentals, followed closely by 18:00.
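
Rather than reading the busiest hours off the grouped table, the hourly means can be ranked directly. This is a minimal sketch; the helper `busiest_hours` is hypothetical, and the demo rows are made up for illustration and are not taken from hour.csv.

```python
import pandas as pd


def busiest_hours(df: pd.DataFrame, top_n: int = 3) -> pd.Series:
    """Hypothetical helper: mean cnt per hour, highest first."""
    return (
        df.groupby("hr")["cnt"]
        .mean()
        .sort_values(ascending=False)
        .head(top_n)
    )


# Made-up rows for illustration only.
demo_df = pd.DataFrame(
    {
        "hr": [8, 8, 17, 17, 18, 18, 3, 3],
        "cnt": [300, 400, 450, 470, 420, 430, 10, 12],
    }
)
ranking = busiest_hours(demo_df)
print(ranking)
```

In the notebook, `busiest_hours(hour_df)` would be the corresponding call.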

In [522]:
hour_df.groupby(by='workingday', observed=True).agg({
    "instant" : "nunique",
    "cnt" : ["max","min","mean","std"],
})


Unnamed: 0_level_0,instant,cnt,cnt,cnt,cnt
Unnamed: 0_level_1,nunique,max,min,mean,std
workingday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
Not Workingday,5514,783,1,181.405332,172.853832
Workingday,11865,977,1,193.207754,185.107477


### Exploring day_df

In [523]:
day_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Winter,2011,1,Not Holiday,Saturday,Not Workingday,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,Winter,2011,1,Not Holiday,Sunday,Not Workingday,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,Winter,2011,1,Not Holiday,Monday,Workingday,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,Winter,2011,1,Not Holiday,Tuesday,Workingday,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,Winter,2011,1,Not Holiday,Wednesday,Workingday,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [524]:
day_df.describe(include = 'all')

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731,731,731.0,731.0,731,731,731,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
unique,,,4,2.0,12.0,2,7,2,3.0,,,,,,,
top,,,Summer,2012.0,1.0,Not Holiday,Monday,Workingday,1.0,,,,,,,
freq,,,188,366.0,62.0,710,105,500,463.0,,,,,,,
mean,366.0,2012-01-01 00:00:00,,,,,,,,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
min,1.0,2011-01-01 00:00:00,,,,,,,,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,2011-07-02 12:00:00,,,,,,,,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,2012-01-01 00:00:00,,,,,,,,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,2012-07-01 12:00:00,,,,,,,,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,731.0,2012-12-31 00:00:00,,,,,,,,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


In [525]:
day_df.instant.is_unique

True

In [526]:
day_df.groupby(by='weekday', observed=True).agg({
    "instant" : "nunique",
    "cnt" : ["max","min","mean","std"]
})


Unnamed: 0_level_0,instant,cnt,cnt,cnt,cnt
Unnamed: 0_level_1,nunique,max,min,mean,std
weekday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
Friday,104,8362,1167,4690.288462,1874.62487
Monday,105,7525,22,4338.12381,1793.074013
Saturday,105,8714,627,4550.542857,2196.693009
Sunday,105,8227,605,4228.828571,1872.496629
Thursday,104,7804,431,4667.259615,1939.433317
Tuesday,104,7767,683,4510.663462,1826.911642
Wednesday,104,8173,441,4548.538462,2038.095884
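
The weekday rows in this output appear in alphabetical order, which is how pandas sorts category labels inferred from strings. One way to get calendar order instead is an ordered `CategoricalDtype`. This is a sketch only: the demo rows are made up, and the original notebook does not do this.

```python
import pandas as pd

WEEKDAY_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"]
weekday_dtype = pd.CategoricalDtype(categories=WEEKDAY_ORDER, ordered=True)

# Made-up rows for illustration only.
demo_df = pd.DataFrame(
    {
        "weekday": ["Saturday", "Sunday", "Monday", "Saturday"],
        "cnt": [985, 801, 1349, 1000],
    }
)
demo_df["weekday"] = demo_df["weekday"].astype(weekday_dtype)

# observed=True keeps only the weekdays that actually occur in the data.
mean_by_weekday = demo_df.groupby("weekday", observed=True)["cnt"].mean()
print(mean_by_weekday)
```

The same approach would apply to season, whose grouped output is also sorted alphabetically.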


In [527]:
day_df.groupby(by='holiday', observed=True).agg({
    "instant" : "nunique",
    "cnt" : ["max","min","mean","std"]
})


Unnamed: 0_level_0,instant,cnt,cnt,cnt,cnt
Unnamed: 0_level_1,nunique,max,min,mean,std
holiday,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
Holiday,21,7403,1000,3735.0,2103.35066
Not Holiday,710,8714,22,4527.104225,1929.013947


- On average, fewer bikes are rented on holidays (about 3,735 per day) than on non-holidays (about 4,527 per day), although the data contains only 21 holidays.
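
To put the holiday gap in relative terms, the two means from the grouped output can be compared directly. This is a small worked calculation with the values copied from that table; the variable names are illustrative.

```python
# Mean daily rentals copied from the holiday groupby output.
mean_holiday = 3735.0
mean_not_holiday = 4527.104225

# Relative shortfall of holidays compared with non-holidays.
relative_drop = (mean_not_holiday - mean_holiday) / mean_not_holiday
print(f"Holiday mean is {relative_drop:.1%} lower than the non-holiday mean")
```

With only 21 holidays in the sample, this figure is best read as a rough indication rather than a precise estimate.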

### Exploring day_df by Season

In [528]:
season_stats_df = day_df.groupby('season', observed=True)['cnt'].agg(['mean', 'sum']).reset_index()
season_stats_df.columns = ['season', 'average_demand', 'total_demand']


season_stats_df


Unnamed: 0,season,average_demand,total_demand
0,Fall,4728.162921,841613
1,Spring,4992.331522,918589
2,Summer,5644.303191,1061129
3,Winter,2604.132597,471348
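
The season table bears on the first business question. As a sketch, the top season and its share of all rentals can be extracted programmatically. The frame below is rebuilt from the values printed in the output so that the snippet runs on its own; `top_row`, `top_season`, and `share` are illustrative names.

```python
import pandas as pd

# Values copied from the season_stats_df output.
season_stats_df = pd.DataFrame(
    {
        "season": ["Fall", "Spring", "Summer", "Winter"],
        "average_demand": [4728.162921, 4992.331522, 5644.303191, 2604.132597],
        "total_demand": [841613, 918589, 1061129, 471348],
    }
)

# Row with the largest total demand.
top_row = season_stats_df.loc[season_stats_df["total_demand"].idxmax()]
top_season = top_row["season"]
share = top_row["total_demand"] / season_stats_df["total_demand"].sum()
print(f"{top_season}: {share:.1%} of all rentals")
```

Under the label mapping used in this notebook, Summer leads on both total and average demand, and Winter is lowest on both.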
