<a href="https://colab.research.google.com/github/dwitaciaa/sharebike/blob/main/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Analysis Project: Bike Sharing Dataset
- Name: Anisa Dwita S
- Email: anisadwitas18@gmail.com
- Dicoding ID: anisadw

## Defining the Business Questions

- How do seasonal changes affect the number of bike users?
- Which hours of the day are the busiest and the quietest for bike users?

## Preparing All Required Libraries

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

In [14]:
hour_df = pd.read_csv("/content/sharebike/data/hour.csv")
hour_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


### Assessing Data

#### Assessing the `hour_df` table

In [15]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


In [16]:
hour_df.isna().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

In [None]:
print("Number of duplicates:", hour_df.duplicated().sum())

In [None]:
hour_df.describe()

### Cleaning Data

#### Fixing the data types of `hour_df`

In [None]:
# dteday is read as a plain string (object); convert it to datetime
datetime_columns = ["dteday"]
for column in datetime_columns:
    hour_df[column] = pd.to_datetime(hour_df[column])

In [None]:
hour_df.info()

In [None]:
hour_df.head()

## Exploratory Data Analysis (EDA)

### Explore `hour_df`

In [None]:
hour_df.sample(5)

In [None]:
hour_df.describe(include="all")

In [None]:
hour_df.groupby(by="season").agg({
    "casual": "sum",
    "registered": "sum",
    "cnt": "sum"
})

In [None]:
hour_df.groupby(by="hr").agg({
    "casual": "sum",
    "registered": "sum",
    "cnt": "sum"
})

In [None]:
hour_df["season"] = hour_df["season"].map({
    1: "Spring",
    2: "Summer",
    3: "Fall",
    4: "Winter"
})

In [None]:
hour_df.head()

In [None]:
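plt.figure(figsize=(16,6))
# numeric_only=True restricts the correlation to numeric columns;
# the string season labels would otherwise raise an error in pandas 2.0+
heatmap = sns.heatmap(hour_df.corr(numeric_only=True), vmin=-1, vmax=1, annot=True)
heatmap.set_title("Correlation Heatmap", fontdict={'fontsize':12}, pad=12);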

## Visualization & Explanatory Analysis

### How do seasonal changes affect the number of bike users?

In [None]:
byseason_df = hour_df.groupby('season')[['registered', 'casual']].sum().reset_index()

In [None]:
plt.figure(figsize=(10,5))

plt.bar(byseason_df['season'],byseason_df['registered'],label="Registered",color='#FFFF00')
plt.bar(byseason_df['season'],byseason_df['casual'],label="Casual",color='#FF0000')

plt.title('Count of Rental Bikes by Season')
plt.ylabel("Count of Rental Bikes")
plt.xlabel("Season")
plt.legend()
plt.show()


### Which hours of the day are the busiest and the quietest for bike users?

In [None]:
byhour_df = hour_df.groupby('hr')[['registered', 'casual']].sum().reset_index()

In [None]:
plt.figure(figsize=(10,5))

# Total registered rentals per hour
sns.lineplot(
    x="hr",
    y="registered",
    data=byhour_df,
    label="Registered",
    marker='o', linewidth=2,
    color='#0000FF')

# Total casual rentals per hour
sns.lineplot(
    x="hr",
    y="casual",
    data=byhour_df,
    label="Casual",
    marker='o', linewidth=2,
    color='#FF0000')

plt.title('Count of Rental Bikes by Hour')
plt.ylabel("Count of Rental Bikes")
plt.xlabel("Hour")
plt.legend()
plt.show()

In [None]:
hour_df.to_csv("bike_hour.csv", index=False)

## Conclusion

### How do seasonal changes affect the number of bike users?

Based on the visualization above, the season affects the number of bike users. Rentals are highest in Fall, while Spring is noticeably lower than the other seasons.
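
The seasonal ranking can also be checked numerically rather than read off the chart. The following is a minimal sketch, assuming a DataFrame with the `season` and `cnt` columns used in this notebook. The helper name `season_summary` and the small `sample` frame are hypothetical and only illustrate the calculation; they are not values from `hour.csv`.

```python
import pandas as pd

def season_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total rentals per season and each season's share of all rentals."""
    # Sum cnt per season and sort from busiest to quietest
    totals = df.groupby("season")["cnt"].sum().sort_values(ascending=False)
    # Express each season as a percentage of the overall total
    share = (totals / totals.sum() * 100).round(1)
    return pd.DataFrame({"total": totals, "share_pct": share})

# Hypothetical toy data, NOT taken from hour.csv
sample = pd.DataFrame({
    "season": ["Spring", "Spring", "Summer", "Fall", "Fall", "Winter"],
    "cnt": [10, 20, 50, 70, 30, 40],
})

summary = season_summary(sample)
print(summary)
```

Calling `season_summary(hour_df)` after the season labels have been mapped should list the seasons from busiest to quietest, which makes the gap between them explicit.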

### Which hours of the day are the busiest and the quietest for bike users?

The visualization above shows that the busiest hour for bike users is 17:00 (5 PM). In contrast, the quietest hour is 04:00 (4 AM).
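
The peak and trough hours can be confirmed directly from the hourly totals. This is a minimal sketch, assuming a DataFrame with the `hr` and `cnt` columns used in this notebook. The helper name `hour_extremes` and the `sample_hours` frame are hypothetical illustrations, not values from `hour.csv`.

```python
import pandas as pd

def hour_extremes(df: pd.DataFrame) -> tuple:
    """Return (busiest_hour, quietest_hour) by total rentals."""
    # Sum cnt over all days for each hour of the day
    totals = df.groupby("hr")["cnt"].sum()
    # idxmax / idxmin return the hour label with the largest / smallest total
    return int(totals.idxmax()), int(totals.idxmin())

# Hypothetical toy data, NOT taken from hour.csv
sample_hours = pd.DataFrame({
    "hr": [4, 4, 8, 12, 17, 17],
    "cnt": [1, 2, 40, 30, 60, 55],
})

busiest, quietest = hour_extremes(sample_hours)
print("Busiest hour:", busiest, "| Quietest hour:", quietest)
```

Running `hour_extremes(hour_df)` on the full dataset gives the two hours for all users combined. Passing `"registered"` or `"casual"` in place of `"cnt"` would require a small change to the helper.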