# Data Analysis Project: [Bike Sharing Dataset]
- **Name:** [Diva Putra Almeyda]
- **Email:** [divaalmeida99@gmail.com]
- **ID Dicoding:** [mintopico]

=========================================
Dataset characteristics
=========================================	
Both hour.csv and day.csv have the following fields, except hr which is not available in day.csv
	
	- instant: record index
	- dteday : date
	- season : season (1:spring, 2:summer, 3:fall, 4:winter)
	- yr : year (0: 2011, 1:2012)
	- mnth : month ( 1 to 12)
	- hr : hour (0 to 23)
	- holiday : whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)
	- weekday : day of the week
	- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0.
	- weathersit : weather situation
		- 1: Clear, Few clouds, Partly cloudy
		- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
		- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
		- 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
	- temp : Normalized temperature in Celsius. The values are divided by 41 (max)
	- atemp: Normalized feeling temperature in Celsius. The values are divided by 50 (max)
	- hum: Normalized humidity. The values are divided by 100 (max)
	- windspeed: Normalized wind speed. The values are divided by 67 (max)
	- casual: count of casual users
	- registered: count of registered users
	- cnt: count of total rental bikes including both casual and registered
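
Since temp, atemp, hum and windspeed are stored as fractions of a maximum value, they can be scaled back by multiplying with the divisor listed for each field. The following is a minimal sketch, assuming the plain division described above; the `SCALE` dictionary and the `denormalize` helper are illustrative names that are not part of the dataset, and the sample values are the normalized fields of the first row of day.csv shown later in this notebook.

```python
# Divisors taken from the field descriptions above.
SCALE = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}

def denormalize(row: dict) -> dict:
    """Return a copy of `row` with the normalized fields scaled back up."""
    return {key: value * SCALE.get(key, 1) for key, value in row.items()}

# Normalized fields of the first row of day.csv.
first_day = {"temp": 0.344167, "atemp": 0.363625, "hum": 0.805833, "windspeed": 0.160446}
restored = denormalize(first_day)
print({key: round(value, 2) for key, value in restored.items()})
```

Fields without an entry in `SCALE` pass through unchanged, so the helper can be applied to a full record as well.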


## Defining the Business Questions

- In which season are bike rentals highest, and in which are they lowest?
- Does the number of bike renters differ between holidays and non-holidays?
- What is the month-by-month trend in bike rentals from 2011 to 2012?
- How does the weather situation (weathersit) affect the number of bike renters, both registered and casual?
- How many renters are already members (registered) and how many are not (casual)?

## Import All Packages/Libraries Used

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

* Load datasets

In [2]:
# load day dataset
day_df = pd.read_csv('data/day.csv')
# load hour dataset
hour_df = pd.read_csv('data/hour.csv')

day_df.head(3)

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349


In [3]:
hour_df.head(3)

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32


In [4]:
del hour_df

**Notes:**
- The business questions I defined only require the day.csv dataset, because its data is already sufficient to answer them; the hourly rental breakdown is not needed. I will therefore use only day.csv.
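
The reasoning above relies on the daily file being an aggregation of the hourly one. A minimal sketch of that relationship follows, under that assumption; the hourly frame is made up for illustration and does not contain real rows from hour.csv.

```python
import pandas as pd

# Hypothetical hourly records for two days (not real rows from hour.csv).
hourly = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"],
    "hr": [0, 1, 2, 0, 1],
    "casual": [3, 8, 5, 4, 1],
    "registered": [13, 32, 27, 13, 16],
})
hourly["cnt"] = hourly["casual"] + hourly["registered"]

# Summing the hourly counts per date yields one row per day.
daily = hourly.groupby("dteday", as_index=False)[["casual", "registered", "cnt"]].sum()
print(daily)
```

If a question about time of day were added later, the hourly file would be needed again, since this aggregation discards the `hr` column.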

### Assessing Data

In [5]:
# Check data types and missing values in day_df
day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


In [6]:
# Check for duplicate rows
print('Number of duplicate rows:', day_df.duplicated().sum())

Number of duplicate rows: 0


In [7]:
# Check summary statistics
day_df.describe()

Unnamed: 0,instant,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
mean,366.0,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,211.165812,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,731.0,4.0,1.0,12.0,1.0,6.0,1.0,3.0,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


**Notes:**
- The **dteday** column needs to be converted to **datetime**
- The dataset has no **missing values** and no **duplicate rows**
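
The individual checks above can be gathered into one small report. This is only a sketch: `assess` is a hypothetical helper, not a pandas function, and the demo frame is invented so that it contains one missing value and one duplicate row.

```python
import pandas as pd

def assess(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality indicators for a DataFrame."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Invented demo data: the third row repeats the second, the last row lacks a count.
demo = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-02", "2011-01-03"],
    "cnt": [985.0, 801.0, 801.0, None],
})
report = assess(demo)
print(report)
```

Calling `assess(day_df)` on the real data would be expected to report zero for both indicators, in line with the notes above.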

In [8]:
# Transform the data: convert dteday to datetime
day_df['dteday'] = pd.to_datetime(day_df['dteday'])

day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     731 non-null    int64         
 1   dteday      731 non-null    datetime64[ns]
 2   season      731 non-null    int64         
 3   yr          731 non-null    int64         
 4   mnth        731 non-null    int64         
 5   holiday     731 non-null    int64         
 6   weekday     731 non-null    int64         
 7   workingday  731 non-null    int64         
 8   weathersit  731 non-null    int64         
 9   temp        731 non-null    float64       
 10  atemp       731 non-null    float64       
 11  hum         731 non-null    float64       
 12  windspeed   731 non-null    float64       
 13  casual      731 non-null    int64         
 14  registered  731 non-null    int64         
 15  cnt         731 non-null    int64         
dtypes: datetime64[ns](1), floa

### Cleaning Data

Rename the columns so they are easier to understand

In [9]:
# 'instant', 'season', 'casual' and 'registered' are already clear and keep their names
day_df.rename(columns={
    'dteday': 'date',
    'yr': 'year',
    'mnth': 'month',
    'holiday': 'is_holiday',
    'weekday': 'day_of_week',
    'workingday': 'is_workingday',
    'weathersit': 'weather_situation',
    'temp': 'temperature',
    'atemp': 'feels_temperature',
    'hum': 'humidity',
    'windspeed': 'wind_speed',
    'cnt': 'total_rented'
}, inplace=True)

day_df.head(3)

Unnamed: 0,instant,date,season,year,month,is_holiday,day_of_week,is_workingday,weather_situation,temperature,feels_temperature,humidity,wind_speed,casual,registered,total_rented
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349


Map the numeric codes to descriptive labels, following the day.csv dataset characteristics, to make the dataset easier to read

In [10]:
# Map 'season' codes to labels
day_df['season'] = day_df['season'].map({
    1: 'Spring',
    2: 'Summer',
    3: 'Fall',
    4: 'Winter'
})

# Map 'year' codes to calendar years
day_df['year'] = day_df['year'].map({
    0: 2011,
    1: 2012
})

# Map 'month' numbers to month names (Indonesian labels)
day_df['month'] = day_df['month'].map({
    1: 'Januari',
    2: 'Februari',
    3: 'Maret',
    4: 'April',
    5: 'Mei',
    6: 'Juni',
    7: 'Juli',
    8: 'Agustus',
    9: 'September',
    10: 'Oktober',
    11: 'November',
    12: 'Desember'
})

# Map 'day_of_week' codes to day names
day_df['day_of_week'] = day_df['day_of_week'].map({
    0: 'Sunday',
    1: 'Monday',
    2: 'Tuesday',
    3: 'Wednesday',
    4: 'Thursday',
    5: 'Friday',
    6: 'Saturday'
})

# Map 'weather_situation' codes to short descriptions
day_df['weather_situation'] = day_df['weather_situation'].map({
    1: 'Clear / Few clouds',
    2: 'Mist / Cloudy',
    3: 'Light Rain / Snow',
    4: 'Heavy Rain / Snow'
})

In [11]:
day_df.head(3)

Unnamed: 0,instant,date,season,year,month,is_holiday,day_of_week,is_workingday,weather_situation,temperature,feels_temperature,humidity,wind_speed,casual,registered,total_rented
0,1,2011-01-01,Spring,2011,Januari,0,Saturday,0,Mist / Cloudy,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,Spring,2011,Januari,0,Sunday,0,Mist / Cloudy,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,Spring,2011,Januari,0,Monday,1,Clear / Few clouds,0.196364,0.189405,0.437273,0.248309,120,1229,1349


In [12]:
# Change data types
day_df['season'] = day_df.season.astype('category')
day_df['month'] = day_df.month.astype('category')
day_df['day_of_week'] = day_df.day_of_week.astype('category')
day_df['weather_situation'] = day_df.weather_situation.astype('category')

day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   instant            731 non-null    int64         
 1   date               731 non-null    datetime64[ns]
 2   season             731 non-null    category      
 3   year               731 non-null    int64         
 4   month              731 non-null    category      
 5   is_holiday         731 non-null    int64         
 6   day_of_week        731 non-null    category      
 7   is_workingday      731 non-null    int64         
 8   weather_situation  731 non-null    category      
 9   temperature        731 non-null    float64       
 10  feels_temperature  731 non-null    float64       
 11  humidity           731 non-null    float64       
 12  wind_speed         731 non-null    float64       
 13  casual             731 non-null    int64         
 14  registered
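
One caveat with `astype('category')` on text labels: the categories are sorted alphabetically, so grouped output and plots would list the months as 'Agustus', 'April', 'Desember' and so on rather than in calendar order. A possible alternative, sketched here on a made-up Series, is to declare the order explicitly with `pd.CategoricalDtype`; the `month_order` list is an assumption that mirrors the month mapping used above.

```python
import pandas as pd

# Calendar order of the month labels used in the mapping above.
month_order = ["Januari", "Februari", "Maret", "April", "Mei", "Juni",
               "Juli", "Agustus", "September", "Oktober", "November", "Desember"]

# Made-up sample values, not rows from day.csv.
months = pd.Series(["Maret", "Januari", "Desember", "April"])

# Default conversion: categories are inferred and sorted alphabetically.
alphabetical = months.astype("category")
# Explicit conversion: categories follow the calendar and are ordered.
calendar = months.astype(pd.CategoricalDtype(categories=month_order, ordered=True))

print(list(alphabetical.cat.categories))
print(calendar.sort_values().tolist())
```

The same idea could be applied to `season` and `day_of_week` if their natural order matters for a chart.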

#### Select the columns used to answer the **Business Questions**

**Unused columns:**
- **day_of_week**: the day name does not help answer the business questions
- **is_workingday**: not used because it is already represented by the is_holiday column
- **temperature**, **feels_temperature**, **humidity**, **wind_speed**: not relevant to the business questions

In [13]:
# Copy the selection so that later changes do not touch day_df
bike_rent_df = day_df[['instant', 'date', 'season', 'year', 'month', 'is_holiday', 'weather_situation', 'casual', 'registered', 'total_rented']].copy()

bike_rent_df.head(3)

Unnamed: 0,instant,date,season,year,month,is_holiday,weather_situation,casual,registered,total_rented
0,1,2011-01-01,Spring,2011,Januari,0,Mist / Cloudy,331,654,985
1,2,2011-01-02,Spring,2011,Januari,0,Mist / Cloudy,131,670,801
2,3,2011-01-03,Spring,2011,Januari,0,Clear / Few clouds,120,1229,1349


## Exploratory Data Analysis (EDA)

### Explore "bike_rent_df"

**Insight:**
- xxx
- xxx
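
As one possible starting point for this exploration, the monthly trend question can be approached by grouping on a year-month period. This is a sketch only: `sample_df` is a tiny hypothetical stand-in that reuses the column names of bike_rent_df, and all of its numbers are invented.

```python
import pandas as pd

# Hypothetical stand-in for bike_rent_df (invented values).
sample_df = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-01", "2011-01-15", "2011-02-01", "2012-01-10"]),
    "casual": [100, 150, 200, 300],
    "registered": [500, 600, 700, 900],
})
sample_df["total_rented"] = sample_df["casual"] + sample_df["registered"]

# Group by year-month so that January 2011 and January 2012 stay separate.
monthly = (
    sample_df
    .groupby(sample_df["date"].dt.to_period("M"))[["casual", "registered", "total_rented"]]
    .sum()
)
print(monthly)
```

Grouping on the period derived from `date`, rather than on the `month` label alone, avoids merging the same month of both years into one bucket.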

## Visualization & Explanatory Analysis

### Question 1:
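
The season question could be answered by summing rentals per season and reading off the extremes. The sketch below uses invented values in a hypothetical `sample_df`, so the resulting ranking says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical daily totals (invented values, not from day.csv).
sample_df = pd.DataFrame({
    "season": ["Spring", "Spring", "Summer", "Fall", "Fall", "Winter"],
    "total_rented": [900, 1100, 4000, 5200, 5000, 3500],
})

# Total rentals per season, largest first.
season_totals = (
    sample_df.groupby("season")["total_rented"].sum().sort_values(ascending=False)
)
busiest, quietest = season_totals.idxmax(), season_totals.idxmin()
print(season_totals)
print("Most rentals:", busiest, "| Fewest rentals:", quietest)
```

The sorted Series can be passed directly to a bar chart, for example with `sns.barplot`, once the real bike_rent_df is used in place of the sample.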

### Question 2:
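
Holidays are rare in this dataset: the `describe()` output above shows a mean of 0.028728 for the holiday flag, which corresponds to about 21 of the 731 days. Comparing raw totals would therefore mostly reflect the number of days, so a per-day average is likely the fairer comparison. A minimal sketch with invented values in a hypothetical `sample_df`:

```python
import pandas as pd

# Hypothetical daily totals (invented values, not from day.csv).
sample_df = pd.DataFrame({
    "is_holiday": [0, 0, 0, 0, 1, 1],
    "total_rented": [4000, 5000, 4500, 4500, 3000, 3400],
})

# Number of days, total rentals and average rentals per day for each group.
holiday_stats = sample_df.groupby("is_holiday")["total_rented"].agg(["count", "sum", "mean"])
holiday_stats.index = holiday_stats.index.map({0: "Non-holiday", 1: "Holiday"})
print(holiday_stats)
```

Keeping `count` next to `mean` makes it visible how few days the holiday average is based on.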

**Insight:**
- xxx
- xxx

## Advanced Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2