# Data Analysis Project: Bike Sharing Dataset
- **Name:** Safira Dyah Khairunisa
- **Email:** safira.dyh@gmail.com
- **Dicoding ID:** safiradyh

## Defining the Business Questions

- Question 1 \\
How does the weather affect bike rental?
- Question 2 \\
How has bike rental performed over the last few months?

## Import All Packages/Libraries Used

In [None]:
# import all required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# install streamlit
%pip install streamlit

In [None]:
%%writefile app.py

In [None]:
! wget -q -O - ipv4.icanhazip.com

In [None]:
# get the Streamlit URL to use (runs the app behind localtunnel)
! streamlit run app.py & npx localtunnel --port 8501

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.12:8501
  External URL: http://34.106.202.6:8501

npx: installed 22 in 2.785s
your url is: https://khaki-masks-dance.loca.lt
2024-03-04 23:56:42.860 Uncaught app exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/content/app.py", line 41, in <module>
    bike_df = pd.read_csv("bike_df.csv")
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 331, i

## Data Wrangling

### Gathering Data

In [None]:
# load and preview the daily data
day_df = pd.read_csv("day.csv", delimiter=",")
day_df.head()

In [None]:
# load and preview the hourly data
hour_df = pd.read_csv("hour.csv", delimiter=",")
hour_df.head()

In [None]:
#merging data
bike_df = pd.merge(
    left=day_df,
    right=hour_df,
    how="left",
    suffixes=('_daily','_hourly'),
    left_on="dteday",
    right_on="dteday"
)

bike_df.head()

In [None]:
bike_df.cnt_daily

Because the merge repeats each day's **cnt_daily** value on every hourly row for that date, summing it would count the same day more than once. Rental totals are therefore computed from **cnt_hourly** to avoid miscounting.
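The duplication described above can be reproduced on a small scale. The following is a minimal sketch on made-up data; `day_toy`, `hour_toy`, and `merged_toy` are hypothetical names, and the values do not come from the dataset.

```python
import pandas as pd

# Toy stand-ins for day.csv and hour.csv (values are made up for illustration)
day_toy = pd.DataFrame({"dteday": ["2011-01-01"], "cnt": [30]})
hour_toy = pd.DataFrame({
    "dteday": ["2011-01-01"] * 3,
    "hr": [0, 1, 2],
    "cnt": [5, 10, 15],
})

# Same kind of left merge as in the notebook, keyed on the date
merged_toy = pd.merge(
    left=day_toy,
    right=hour_toy,
    how="left",
    on="dteday",
    suffixes=("_daily", "_hourly"),
)

# The daily total is repeated on each of the three hourly rows
daily_sum = merged_toy["cnt_daily"].sum()    # the value 30 is added three times
hourly_sum = merged_toy["cnt_hourly"].sum()  # 5 + 10 + 15
print(merged_toy)
print(daily_sum, hourly_sum)
```

In this toy case, summing `cnt_daily` gives three times the day's total, while summing `cnt_hourly` gives the total itself.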

### Assessing Data

In [None]:
bike_df.info()

The dteday column has the wrong data type; it should be datetime.

In [None]:
bike_df.isnull().sum()

In [None]:
bike_df.describe()

### Cleaning Data

In [None]:
# convert the data type of dteday

datetime_columns = ["dteday"]

for column in datetime_columns:
    bike_df[column] = pd.to_datetime(bike_df[column])  # parse the date strings into datetime

In [None]:
bike_df.info()

## Exploratory Data Analysis (EDA)

### Explore ...

In [None]:
bike_df.groupby(by="dteday").agg({
    "casual_hourly" : "sum" ,
    "registered_hourly" : "sum",
    "cnt_hourly" : "sum"}).head()
# summing the hourly values gives the total for each day (daily)

In [None]:
bike_df.groupby(by="dteday").agg({
    "cnt_hourly" : "sum"}).head()

In [None]:
bike_df.groupby(by="weathersit_hourly").agg({
    "cnt_hourly" : "sum"
}).head()
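The totals per weather code depend partly on how many hourly records fall under each code. As a possible complement, here is a minimal sketch on made-up data that reports the number of hours and the mean per hour alongside the sum. `toy_df` and `weather_summary` are hypothetical names, and only the column names mirror the notebook.

```python
import pandas as pd

# Made-up hourly records; the column names mirror those in bike_df
toy_df = pd.DataFrame({
    "weathersit_hourly": [1, 1, 1, 1, 2, 2, 3],
    "cnt_hourly": [100, 120, 80, 100, 90, 70, 40],
})

weather_summary = toy_df.groupby("weathersit_hourly")["cnt_hourly"].agg(
    total="sum",    # total rentals under each weather code
    hours="count",  # number of hourly records with that code
    mean="mean",    # average rentals per hourly record
)
print(weather_summary)
```

The same named aggregation could be applied to `bike_df` if a per-hour comparison is wanted.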

## Visualization & Explanatory Analysis

### Question 1:

How does the weather affect bike rental?

In [None]:
byweather_df = bike_df.groupby(by="weathersit_hourly").cnt_hourly.sum().reset_index()
# map the numeric weather codes to readable labels
# (assigning the result back avoids relying on chained inplace replacement)
byweather_df["weathersit_hourly"] = byweather_df["weathersit_hourly"].replace({
    1: "Clear",
    2: "Cloudy",
    3: "Light rain",
    4: "Heavy rain"
})

byweather_df.rename(columns={
    "weathersit_hourly": "Weather",
    "cnt_hourly": "Total_user"
}, inplace=True)

byweather_df

In [None]:
plt.figure(figsize=(8, 4))

sns.barplot(
    y="Weather",
    x="Total_user",
    data=byweather_df.sort_values(by="Total_user", ascending=False)
)
plt.title("Total Bike Rental by Weather", loc="center", fontsize=15)
plt.xlabel("Total rentals")  # the bars show sums, not averages
plt.ylabel("Weather")
plt.show()

### Question 2:

How has bike rental performed over the last few months?

In [None]:
monthly_sharing_df = bike_df.resample(rule='M', on='dteday').agg({
    "casual_hourly": "sum",
    "registered_hourly": "sum",
    "cnt_hourly": "sum"
})

monthly_sharing_df.index = monthly_sharing_df.index.strftime('%Y-%m')
monthly_sharing_df = monthly_sharing_df.reset_index()

monthly_sharing_df.rename(columns={
    "dteday": "date_sharing",
    "casual_hourly": "casual_user",
    "registered_hourly": "registered_user",
    "cnt_hourly": "total_user"
}, inplace=True)

monthly_sharing_df
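Recent pandas releases deprecate the `'M'` resampling alias in favour of `'ME'`, so the cell above may warn depending on the installed version. A minimal sketch of an alternative that groups by monthly period instead is shown here on made-up data; `toy_df` and `monthly_toy` are hypothetical names. Unlike `resample`, this grouping leaves out months that have no rows.

```python
import pandas as pd

# Made-up records spanning two months; the column names mirror bike_df
toy_df = pd.DataFrame({
    "dteday": pd.to_datetime(
        ["2011-01-01", "2011-01-15", "2011-02-01", "2011-02-20"]
    ),
    "cnt_hourly": [10, 20, 30, 40],
})

# Group by the calendar month of each date and sum the rentals
monthly_toy = (
    toy_df.groupby(toy_df["dteday"].dt.to_period("M"))["cnt_hourly"]
    .sum()
    .reset_index()
)

# Format the monthly periods as 'YYYY-MM' strings for plotting
monthly_toy["dteday"] = monthly_toy["dteday"].dt.strftime("%Y-%m")
print(monthly_toy)
```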

In [None]:
plt.figure(figsize=(20, 6))
plt.plot(monthly_sharing_df["date_sharing"], monthly_sharing_df["casual_user"], marker='o', linewidth=2, color="#77BBAA", label="casual user")
plt.plot(monthly_sharing_df["date_sharing"], monthly_sharing_df["registered_user"], marker='o', linewidth=2, color="#3366BB", label="registered user")
plt.plot(monthly_sharing_df["date_sharing"], monthly_sharing_df["total_user"], marker='o', linewidth=2, color="#FF6633", label="total user")
plt.title("Number of Bike Sharing per Month (2011-2012)", loc="center", fontsize=20)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.legend()
plt.show()

In [None]:
bike_df.to_csv("bike_df.csv", index=False)

## Conclusion

- Conclusion for question 1 \\
How does the weather affect bike rental? \\
Bike rental demand is highest when the weather is clear and lowest during heavy rain.
- Conclusion for question 2 \\
How has bike rental performed over the last few months? \\
As the line chart above shows, total bike rentals tend to increase over the period. Rentals by casual users stay relatively flat, while rentals by registered users rise.

