# Data Analysis Project: Bike Sharing
- **Name:** Alifia Mustika Sari
- **Email:** alifiamustika02@gmail.com
- **Dicoding ID:** a463xbf048

## Defining the Business Questions

First, we define the business questions that will guide the data exploration.
- Question 1: Which factors most influence the number of bike rentals?
- Question 2: How do weather conditions affect the number of bike rentals?
- Question 3: How do usage patterns differ between casual and registered users?
- Question 4: How does bike usage change over the year, and are there seasons or months with higher demand?

## Import All Packages/Libraries Used

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

In [3]:
# Load the daily data (forward slashes work on Windows, macOS, and Linux)
df_day = pd.read_csv('data/day.csv')
df_day.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [4]:
# Load the hourly data (forward slashes work on Windows, macOS, and Linux)
df_hour = pd.read_csv('data/hour.csv')
df_hour.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


**Insight:**
- df_day is better suited to analysing longer-term trends (weekly, monthly, seasonal, or yearly patterns) because the counts are already aggregated per day.
- df_hour is better suited to examining usage at specific hours of the day, which makes it possible to identify the hours with the highest or lowest rentals.
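
The relationship between the two tables can be illustrated with a minimal sketch: summing hourly counts per date should give day-level totals. The records below are hypothetical and only mimic the column names of hour.csv; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical hourly records for two days (illustrative values only).
hourly = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"],
    "hr": [0, 1, 0, 1],
    "casual": [3, 8, 4, 1],
    "registered": [13, 32, 13, 16],
})
hourly["cnt"] = hourly["casual"] + hourly["registered"]

# Sum the hourly counts per date to obtain day-level totals.
daily = hourly.groupby("dteday", as_index=False)[["casual", "registered", "cnt"]].sum()
print(daily)
```

On the real data, the same aggregation applied to df_hour could be compared with df_day as a consistency check.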

### Assessing Data

#### Assessing df_day

In [None]:
# check all summary statistics in df_day, including the non-numeric columns
df_day.describe(include='all')

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
unique,,731,,,,,,,,,,,,,,
top,,2011-01-01,,,,,,,,,,,,,,
freq,,1,,,,,,,,,,,,,,
mean,366.0,,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,211.165812,,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0


In [None]:
# check the data type of each column in df_day
df_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


Based on the output above, one column has an unsuitable data type: dteday is stored as object rather than datetime.

In [None]:
# check for missing values in df_day
df_day.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

The df_day table has no missing values, so **no missing-data handling is required**.

In [None]:
# Check for duplicate rows
print("Number of duplicates:", df_day.duplicated().sum())

Number of duplicates: 0


In conclusion, df_day contains 0 duplicate rows, which means **every row is unique**.
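
The missing-value and duplicate checks above can be bundled into one small helper so the same assessment can be repeated for each table. This is only a sketch: the function name `assess` and the sample rows are hypothetical, not part of the original notebook.

```python
import pandas as pd

def assess(df: pd.DataFrame) -> dict:
    """Collect basic data-quality counts for a DataFrame."""
    return {
        "rows": len(df),
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Hypothetical sample with one repeated row and one missing count.
sample = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-02", "2011-01-03"],
    "cnt": [985.0, 801.0, 801.0, None],
})
report = assess(sample)
print(report)
```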

#### Assessing df_hour

In [None]:
# check all summary statistics in df_hour, including the non-numeric columns
df_hour.describe(include='all')

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,17379.0,17379,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
unique,,731,,,,,,,,,,,,,,,
top,,2011-01-01,,,,,,,,,,,,,,,
freq,,24,,,,,,,,,,,,,,,
mean,8690.0,,2.50164,0.502561,6.537775,11.546752,0.02877,3.003683,0.682721,1.425283,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
std,5017.0295,,1.106918,0.500008,3.438776,6.914405,0.167165,2.005771,0.465431,0.639357,0.192556,0.17185,0.19293,0.12234,49.30503,151.357286,181.387599
min,1.0,,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,4345.5,,2.0,0.0,4.0,6.0,0.0,1.0,0.0,1.0,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,8690.0,,3.0,1.0,7.0,12.0,0.0,3.0,1.0,1.0,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,13034.5,,3.0,1.0,10.0,18.0,0.0,5.0,1.0,2.0,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0


In [None]:
# check the data type of each column in df_hour
df_hour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


Based on the output above, one column has an unsuitable data type: dteday is stored as object rather than datetime.

In [None]:
# check for missing values in df_hour
df_hour.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

The df_hour table has no missing values, so **no missing-data handling is required**.

In [None]:
# check for duplicate rows
print("Number of duplicates:", df_hour.duplicated().sum())

Number of duplicates: 0


In conclusion, df_hour contains 0 duplicate rows, which means **every row is unique**.
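
One further check may be worth making for the hourly table. The describe output reports 731 unique dates and 17,379 rows, which is fewer than 731 × 24 = 17,544, so some hours appear to have no record even though no cell is null. A minimal sketch of how such gaps could be located, using hypothetical records and a reduced number of expected hours to keep the example small:

```python
import pandas as pd

# Hypothetical hourly records: the second date lacks hour 1.
hourly = pd.DataFrame({
    "dteday": ["2011-01-01"] * 3 + ["2011-01-02"] * 2,
    "hr": [0, 1, 2, 0, 2],
})

expected_hours = 3  # would be 24 for real data; 3 keeps the toy example small

# Count the distinct hours recorded for each date and keep incomplete dates.
hours_per_day = hourly.groupby("dteday")["hr"].nunique()
incomplete = hours_per_day[hours_per_day < expected_hours]
print(incomplete)
```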

**Insight:**
- The dteday column has the **wrong data type**: it is stored as **object** and should be converted to **datetime** to simplify time-based analysis.
- The columns season, yr, mnth, holiday, weekday, workingday, and weathersit (plus hr in df_hour) are stored as **integer** although they hold categorical codes; converting them to **category** makes that explicit and can reduce memory use.
<!-- - The instant column is not needed, so it can be ignored or dropped. -->

### Cleaning Data

#### Cleaning df_day

##### Data Type Errors

object to datetime

In [None]:
# convert dteday from 'object' to 'datetime'
datetime_columns = ["dteday"]

for column in datetime_columns:
    df_day[column] = pd.to_datetime(df_day[column])
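
As a small illustration of why the conversion helps, the sketch below parses a few date strings and uses the `.dt` accessor, which is only available on datetime columns. The `format` argument is an assumption based on the `YYYY-MM-DD` strings shown in the head() output.

```python
import pandas as pd

dates = pd.Series(["2011-01-01", "2011-01-02", "2012-12-31"])

# An explicit format makes parsing strict instead of relying on inference.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# Date components become directly accessible after conversion.
years = parsed.dt.year.tolist()
day_names = parsed.dt.day_name().tolist()
print(years, day_names)
```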

int to category

In [None]:
# convert the categorical columns from 'int' to 'category'
categorical_columns = ['season', 'yr', 'mnth', 'holiday', 'weekday', 'workingday', 'weathersit']

for column in categorical_columns:
    df_day[column] = df_day[column].astype('category')
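
The storage benefit of `category` can be checked directly. The following is a rough sketch on a hypothetical column of 10,000 season codes (not the real data); for a table as small as df_day the saving is modest, so the main gain there is arguably clarity of intent.

```python
import numpy as np
import pandas as pd

# Hypothetical column of 10,000 season codes in the range 1-4.
rng = np.random.default_rng(0)
codes = pd.Series(rng.integers(1, 5, size=10_000), dtype="int64")
as_category = codes.astype("category")

# Compare the bytes used by the values themselves (index excluded).
int_bytes = codes.memory_usage(index=False, deep=True)
cat_bytes = as_category.memory_usage(index=False, deep=True)
print(int_bytes, cat_bytes)
```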

In [None]:
# inspect the updated data types
df_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     731 non-null    int64         
 1   dteday      731 non-null    datetime64[ns]
 2   season      731 non-null    category      
 3   yr          731 non-null    category      
 4   mnth        731 non-null    category      
 5   holiday     731 non-null    category      
 6   weekday     731 non-null    category      
 7   workingday  731 non-null    category      
 8   weathersit  731 non-null    category      
 9   temp        731 non-null    float64       
 10  atemp       731 non-null    float64       
 11  hum         731 non-null    float64       
 12  windspeed   731 non-null    float64       
 13  casual      731 non-null    int64         
 14  registered  731 non-null    int64         
 15  cnt         731 non-null    int64         
dtypes: category(7), datetime64

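##### Avoiding Misinterpretation

Define more readable column names. The mapping is kept separate from df_day because the following cells, and the outputs shown in this notebook, still use the original column names.

In [None]:
# Not applied in place: later cells and outputs still use the original names (e.g. 'yr').
readable_columns = {'yr': 'year', 'mnth': 'month', 'weekday': 'one_of_week',
                    'weathersit': 'weather_situation', 'windspeed': 'wind_speed',
                    'cnt': 'count_cr', 'hum': 'humidity'}
df_day_readable = df_day.rename(columns=readable_columns)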

In [16]:
# Season
season_labels = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
df_day['season'] = df_day['season'].map(season_labels)

# Year
year_labels = {0: '2011', 1: '2012'}
df_day['yr'] = df_day['yr'].map(year_labels)

# Month (labels kept in Indonesian to match the outputs below)
month_labels = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'Mei', 6: 'Jun',
                7: 'Jul', 8: 'Agu', 9: 'Sep', 10: 'Okt', 11: 'Nov', 12: 'Des'}
df_day['mnth'] = df_day['mnth'].map(month_labels)

# Day of week (0 = Sunday)
weekday_labels = {0: 'Minggu', 1: 'Senin', 2: 'Selasa', 3: 'Rabu',
                  4: 'Kamis', 5: 'Jumat', 6: 'Sabtu'}
df_day['weekday'] = df_day['weekday'].map(weekday_labels)

# Working day (0 = holiday/weekend, 1 = working day)
workingday_labels = {0: 'Libur/Akhir Pekan', 1: 'Hari Kerja'}
df_day['workingday'] = df_day['workingday'].map(workingday_labels)

# Holiday (0 = no, 1 = yes)
holiday_labels = {0: 'Tidak', 1: 'Ya'}
df_day['holiday'] = df_day['holiday'].map(holiday_labels)

# Weather condition (clear, cloudy, light rain, heavy rain)
weathersit_labels = {1: 'Cerah', 2: 'Mendung', 3: 'Hujan Ringan', 4: 'Hujan Lebat'}
df_day['weathersit'] = df_day['weathersit'].map(weathersit_labels)
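
One caveat with `.map` and a dictionary: any code that is missing from the dictionary silently becomes NaN. A minimal sketch of a check that could follow each mapping, using hypothetical codes in which 9 is deliberately absent from the dictionary:

```python
import pandas as pd

weathersit_labels = {1: 'Cerah', 2: 'Mendung', 3: 'Hujan Ringan', 4: 'Hujan Lebat'}

# Hypothetical codes; 9 is deliberately not a key of the dictionary.
codes = pd.Series([1, 2, 3, 9])
labels = codes.map(weathersit_labels)

# Codes whose mapped label is missing were not covered by the dictionary.
unmapped = codes[labels.isna()].unique().tolist()
print(unmapped)
```

An empty list here would indicate that every code received a label.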

In [17]:
df_day['temp'] = df_day['temp'] * 41               # Temperature in degrees C
df_day['atemp'] = df_day['atemp'] * 50             # Feels-like temperature in degrees C
df_day['hum'] = df_day['hum'] * 100                # Humidity in %
df_day['windspeed'] = df_day['windspeed'] * 67     # Wind speed in km/h
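
As a worked check of the rescaling, the sketch below applies the same multipliers to the first row of day.csv as shown in the earlier head() output. The dictionary name `scale` is introduced here only for illustration.

```python
import pandas as pd

# First row of day.csv, copied from the head() output earlier in the notebook.
row = pd.DataFrame({"temp": [0.344167], "atemp": [0.363625],
                    "hum": [0.805833], "windspeed": [0.160446]})

# Multipliers used in this notebook for each normalized column.
scale = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}

# Multiplying by a Series aligns its index with the DataFrame's columns.
rescaled = row * pd.Series(scale)
print(rescaled.round(6))
```

Because the cell overwrites the columns in place, running it a second time would multiply the values again; re-loading the CSV before re-running avoids that.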

#### Cleaning df_hour

##### Data Type Errors

In [18]:
datetime_columns = ["dteday"]

for column in datetime_columns:
    df_hour[column] = pd.to_datetime(df_hour[column])

In [19]:
categorical_columns = ['season', 'yr', 'mnth', 'hr', 'holiday', 'weekday', 'workingday', 'weathersit']

for column in categorical_columns:
    df_hour[column] = df_hour[column].astype('category')

In [20]:
df_hour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     17379 non-null  int64         
 1   dteday      17379 non-null  datetime64[ns]
 2   season      17379 non-null  category      
 3   yr          17379 non-null  category      
 4   mnth        17379 non-null  category      
 5   hr          17379 non-null  category      
 6   holiday     17379 non-null  category      
 7   weekday     17379 non-null  category      
 8   workingday  17379 non-null  category      
 9   weathersit  17379 non-null  category      
 10  temp        17379 non-null  float64       
 11  atemp       17379 non-null  float64       
 12  hum         17379 non-null  float64       
 13  windspeed   17379 non-null  float64       
 14  casual      17379 non-null  int64         
 15  registered  17379 non-null  int64         
 16  cnt         17379 non-

##### Avoiding Misinterpretation

In [21]:
# Season
season_labels = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
df_hour['season'] = df_hour['season'].map(season_labels)

# Year
year_labels = {0: '2011', 1: '2012'}
df_hour['yr'] = df_hour['yr'].map(year_labels)

# Month (labels kept in Indonesian to match the outputs below)
month_labels = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'Mei', 6: 'Jun',
                7: 'Jul', 8: 'Agu', 9: 'Sep', 10: 'Okt', 11: 'Nov', 12: 'Des'}
df_hour['mnth'] = df_hour['mnth'].map(month_labels)

# Day of week (0 = Sunday)
weekday_labels = {0: 'Minggu', 1: 'Senin', 2: 'Selasa', 3: 'Rabu',
                  4: 'Kamis', 5: 'Jumat', 6: 'Sabtu'}
df_hour['weekday'] = df_hour['weekday'].map(weekday_labels)

# Working day (0 = holiday/weekend, 1 = working day)
workingday_labels = {0: 'Libur/Akhir Pekan', 1: 'Hari Kerja'}
df_hour['workingday'] = df_hour['workingday'].map(workingday_labels)

# Holiday (0 = no, 1 = yes)
holiday_labels = {0: 'Tidak', 1: 'Ya'}
df_hour['holiday'] = df_hour['holiday'].map(holiday_labels)

# Weather condition (clear, cloudy, light rain, heavy rain)
weathersit_labels = {1: 'Cerah', 2: 'Mendung', 3: 'Hujan Ringan', 4: 'Hujan Lebat'}
df_hour['weathersit'] = df_hour['weathersit'].map(weathersit_labels)

In [22]:
df_hour['temp'] = df_hour['temp'] * 41               # Temperature in degrees C
df_hour['atemp'] = df_hour['atemp'] * 50             # Feels-like temperature in degrees C
df_hour['hum'] = df_hour['hum'] * 100                # Humidity in %
df_hour['windspeed'] = df_hour['windspeed'] * 67     # Wind speed in km/h

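**Insight:**
- In both tables, dteday is now stored as datetime64[ns] and the coded columns (season, yr, mnth, holiday, weekday, workingday, weathersit, plus hr in df_hour) are stored as category.
- Numeric codes have been replaced with readable labels, and temp, atemp, hum, and windspeed have been rescaled from their normalized values (multiplied by 41, 50, 100, and 67 respectively).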

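##### Saving the Data

In [23]:
# index=False keeps the row index from being written as an extra unnamed column
df_day.to_csv('dashboard/day_cleaned.csv', index=False)
df_hour.to_csv('dashboard/hour_cleaned.csv', index=False)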
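
A note on round-tripping through CSV: `to_csv` writes the row index unless `index=False` is passed, and dtypes such as datetime and category are not stored in the file. The sketch below illustrates both points with an in-memory buffer and a hypothetical two-row frame, so no file is touched.

```python
import io
import pandas as pd

df = pd.DataFrame({"dteday": pd.to_datetime(["2011-01-01", "2011-01-02"]),
                   "cnt": [985, 801]})

# Default behaviour: the index is written and comes back as 'Unnamed: 0'.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False omits it; parse_dates restores the datetime dtype on reload.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)),
                            parse_dates=["dteday"])

print(with_index.columns.tolist())
print(without_index.dtypes)
```

A dashboard that reloads the cleaned files would likewise need to re-apply `astype('category')` if it relies on categorical columns.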

## Exploratory Data Analysis (EDA)

### Exploring df_day

In [24]:
df_day.head()

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Spring,2011,Jan,Tidak,Sabtu,Libur/Akhir Pekan,Mendung,14.110847,18.18125,80.5833,10.749882,331,654,985
1,2,2011-01-02,Spring,2011,Jan,Tidak,Minggu,Libur/Akhir Pekan,Mendung,14.902598,17.68695,69.6087,16.652113,131,670,801
2,3,2011-01-03,Spring,2011,Jan,Tidak,Senin,Hari Kerja,Cerah,8.050924,9.47025,43.7273,16.636703,120,1229,1349
3,4,2011-01-04,Spring,2011,Jan,Tidak,Selasa,Hari Kerja,Cerah,8.2,10.6061,59.0435,10.739832,108,1454,1562
4,5,2011-01-05,Spring,2011,Jan,Tidak,Rabu,Hari Kerja,Cerah,9.305237,11.4635,43.6957,12.5223,82,1518,1600


In [25]:
df_hour.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,Spring,2011,Jan,0,Tidak,Sabtu,Libur/Akhir Pekan,Cerah,9.84,14.395,81.0,0.0,3,13,16
1,2,2011-01-01,Spring,2011,Jan,1,Tidak,Sabtu,Libur/Akhir Pekan,Cerah,9.02,13.635,80.0,0.0,8,32,40
2,3,2011-01-01,Spring,2011,Jan,2,Tidak,Sabtu,Libur/Akhir Pekan,Cerah,9.02,13.635,80.0,0.0,5,27,32
3,4,2011-01-01,Spring,2011,Jan,3,Tidak,Sabtu,Libur/Akhir Pekan,Cerah,9.84,14.395,75.0,0.0,3,10,13
4,5,2011-01-01,Spring,2011,Jan,4,Tidak,Sabtu,Libur/Akhir Pekan,Cerah,9.84,14.395,75.0,0.0,0,1,1


In [26]:
df_day.groupby('season')['cnt'].mean().reset_index().sort_values(by='cnt', ascending=False)


Unnamed: 0,season,cnt
2,Fall,5644.303191
1,Summer,4992.331522
3,Winter,4728.162921
0,Spring,2604.132597


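**Insight:**
- Fall has the highest average daily rentals (about 5,644), followed by Summer (about 4,992) and Winter (about 4,728).
- Spring has the lowest average at about 2,604 rentals per day, less than half of the Fall average.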
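
The same grouping can be extended to compare casual and registered users per season, which relates to Question 3. This is a minimal sketch on hypothetical values, not results from the dataset:

```python
import pandas as pd

# Hypothetical daily records (illustrative numbers only).
daily = pd.DataFrame({
    "season": ["Spring", "Spring", "Fall", "Fall"],
    "casual": [100, 200, 900, 1100],
    "registered": [1000, 1400, 4000, 4400],
})
daily["cnt"] = daily["casual"] + daily["registered"]

# Mean rentals per season for each user type, highest total first.
season_mean = (daily.groupby("season")[["casual", "registered", "cnt"]]
               .mean()
               .sort_values(by="cnt", ascending=False))
print(season_mean)
```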

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

**Insight:**
- xxx
- xxx

## Further Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2