# Data Analysis Project: Bike Sharing Dataset
- **Name:** Muhammad Faqih Ajiputra
- **Email:** mfaqihajiputra99@gmail.com
- **Dicoding ID:** faqihaji

## Defining the Business Questions

- How did monthly rentals perform in 2011 and 2012?
- How does the status of the day (working day or holiday) affect the number of rentals?

## Import All Packages/Libraries Used

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

## Data Wrangling

### Gathering Data

In [None]:
day_df = pd.read_csv('data/day.csv')
hour_df = pd.read_csv('data/hour.csv')

In [None]:
day_df.head()

In [None]:
hour_df.head()

### Assessing Data

In [None]:
day_df.info()

In [None]:
day_df.isna().sum()

In [None]:
print("Duplicated Data: ",  day_df.duplicated().sum())

In [None]:
day_df.describe()

In [None]:
hour_df.info()

In [None]:
hour_df.isna().sum()

In [None]:
print("Duplicated Data: ",  hour_df.duplicated().sum())
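The checks above are repeated for each DataFrame. As a minimal sketch, they could be gathered into one small report. The helper name `quality_report` and the toy frames are hypothetical and only stand in for `day_df` and `hour_df`; the values are made up.

```python
import pandas as pd


def quality_report(frames):
    """Return one row per DataFrame: row count, missing values, duplicated rows."""
    rows = []
    for name, df in frames.items():
        rows.append({
            "frame": name,
            "rows": len(df),
            "missing_values": int(df.isna().sum().sum()),
            "duplicated_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame(rows).set_index("frame")


# Toy data (made up), standing in for day_df / hour_df
toy_day = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-02"],
    "cnt": [10, 20, 20],          # the last two rows are exact duplicates
})
toy_hour = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01"],
    "hr": [0, 1],
    "cnt": [5, None],             # one missing value
})

report = quality_report({"toy_day": toy_day, "toy_hour": toy_hour})
print(report)
```

With the real data, the call would presumably be `quality_report({"day": day_df, "hour": hour_df})`.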

### Cleaning Data

Convert the 'dteday' column of `day_df` to 'datetime64'.

In [None]:
datetime_columns = ['dteday']

for column in datetime_columns:
  day_df[column] = pd.to_datetime(day_df[column])

In [None]:
day_df.info()

Convert the 'dteday' column of `hour_df` to 'datetime64'.

In [None]:
datetime_columns = ['dteday']

for column in datetime_columns:
  hour_df[column] = pd.to_datetime(hour_df[column])

In [None]:
hour_df.info()
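As an alternative to converting after loading, `pd.read_csv` can parse the date column while reading the file through its `parse_dates` parameter. The sketch below compares both routes on a made-up CSV string (not the real `data/day.csv`); both should yield a datetime column.

```python
import io

import pandas as pd

# Made-up CSV content standing in for data/day.csv
csv_text = "instant,dteday,cnt\n1,2011-01-01,10\n2,2011-01-02,20\n"

# Option 1: convert after loading, as in the cells above
df_after = pd.read_csv(io.StringIO(csv_text))
df_after["dteday"] = pd.to_datetime(df_after["dteday"])

# Option 2: let read_csv parse the column while loading
df_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["dteday"])

print(df_after["dteday"].dtype)
print(df_parsed["dteday"].dtype)
```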

Decode the 'yr', 'workingday', and 'mnth' columns according to the dataset characteristics provided, then cast 'workingday' and 'mnth' to 'str'.

- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- workingday : if day is neither weekend nor holiday is 1, otherwise is 0

In [None]:
day_df['yr'] = day_df['yr'].replace({0:2011, 1:2012})
# workingday == 1 means neither weekend nor holiday, so 1 maps to 'Workingday'
day_df['workingday'] = day_df['workingday'].replace({1: 'Workingday', 0: 'Holiday'})
day_df['mnth'] = day_df['mnth'].replace({
    1: 'January', 2: 'February', 3: 'March', 4: 'April',
    5: 'May', 6: 'June', 7: 'July', 8: 'August',
    9: 'September', 10: 'October', 11: 'November', 12: 'December'
})

day_df['workingday'] = day_df.workingday.astype('str')
day_df['mnth'] = day_df.mnth.astype('str')
day_df.head()

In [None]:
hour_df['yr'] = hour_df['yr'].replace({0:2011, 1:2012})
# workingday == 1 means neither weekend nor holiday, so 1 maps to 'Workingday'
hour_df['workingday'] = hour_df['workingday'].replace({1: 'Workingday', 0: 'Holiday'})
hour_df['mnth'] = hour_df['mnth'].replace({
    1: 'January', 2: 'February', 3: 'March', 4: 'April',
    5: 'May', 6: 'June', 7: 'July', 8: 'August',
    9: 'September', 10: 'October', 11: 'November', 12: 'December'
})
hour_df['workingday'] = hour_df.workingday.astype('str')
hour_df['mnth'] = hour_df.mnth.astype('str')
hour_df.head()

In [None]:
day_df.info()

In [None]:
hour_df.info()
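Because the same decoding is applied to both DataFrames, a shared helper may reduce copy-paste slips. This is only a sketch: the function name `decode_columns` and the toy frame are hypothetical. The label direction follows the dataset note that `workingday` is 1 when the day is neither a weekend nor a holiday.

```python
import pandas as pd

YEAR_LABELS = {0: 2011, 1: 2012}
# Per the dataset notes: 1 = neither weekend nor holiday, 0 = weekend or holiday
WORKINGDAY_LABELS = {1: "Workingday", 0: "Holiday"}
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]
MONTH_LABELS = {number: name for number, name in enumerate(MONTH_NAMES, start=1)}


def decode_columns(df):
    """Return a copy of df with 'yr', 'workingday' and 'mnth' decoded to labels."""
    out = df.copy()
    out["yr"] = out["yr"].map(YEAR_LABELS)
    out["workingday"] = out["workingday"].map(WORKINGDAY_LABELS)
    out["mnth"] = out["mnth"].map(MONTH_LABELS)
    return out


# Toy frame (made up) with the same encoded columns as day_df / hour_df
toy = pd.DataFrame({"yr": [0, 1], "workingday": [0, 1], "mnth": [1, 12]})
decoded = decode_columns(toy)
print(decoded)
```

Returning a copy keeps the raw encoded frame available, which makes it safe to rerun the cell.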

## Exploratory Data Analysis (EDA)

### Explore ...

In [None]:
day_df.sample(10)

In [None]:
day_df.describe(include='all')

In [None]:
year_group = day_df.groupby(by='yr').agg({
    'cnt':'sum'
}).reset_index()

year_group.rename(columns={
    'yr': "Year",
    'cnt': "Total"
}, inplace=True)
year_group


In [None]:
monthly_group_df = day_df.groupby(by=['yr', 'mnth']).agg({
    'cnt': 'sum'
}).reset_index()

monthly_group_df.rename(columns={
    'yr': "Year",
    'mnth': "Month",
    'cnt': "Total"
}, inplace=True)

# Convert 'Month' to an ordered Categorical type, then sort the values
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
monthly_group_df['Month'] = pd.Categorical(monthly_group_df['Month'], categories=month_order, ordered=True)

monthly_group_df = monthly_group_df.sort_values(by='Month')

monthly_group_df


In [None]:
status_group_df = day_df.groupby(by=['workingday','mnth']).agg({
    'cnt': 'sum'
}).reset_index()

status_group_df.rename(columns={
    'mnth': "Month",
    'cnt': "Total"
}, inplace=True)

# Convert 'Month' to an ordered Categorical type, then sort the values
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
status_group_df['Month'] = pd.Categorical(status_group_df['Month'], categories=month_order, ordered=True)

status_group_df = status_group_df.sort_values(by='Month')

status_group_df

**Insight:**
- Based on the results above, total rentals increased in 2012 compared with 2011.
- Based on the results above, total rentals are higher on working days than on holidays (non-working days).
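The grouping cells rely on an ordered Categorical so that months sort in calendar order rather than alphabetically. A minimal sketch of that step on made-up numbers (the `toy` frame is hypothetical, not taken from the dataset):

```python
import pandas as pd

month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Toy data (made up): two January rows so the sum is visible
toy = pd.DataFrame({
    "yr": [2011, 2011, 2011, 2012],
    "mnth": ["April", "January", "January", "February"],
    "cnt": [30, 10, 5, 20],
})

monthly = toy.groupby(["yr", "mnth"], as_index=False)["cnt"].sum()

# Without this step, 'April' would sort before 'January' (alphabetical order)
monthly["mnth"] = pd.Categorical(monthly["mnth"], categories=month_order, ordered=True)
monthly = monthly.sort_values(["yr", "mnth"]).reset_index(drop=True)
print(monthly)
```

Sorting by both year and month keeps each year's rows together, which can be easier to read than sorting by month alone.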

## Visualization & Explanatory Analysis

### Question 1:

In [None]:
# Filter for the years 2011 and 2012
grouped_2011 = monthly_group_df[monthly_group_df['Year'] == 2011]
grouped_2012 = monthly_group_df[monthly_group_df['Year'] == 2012]

plt.figure(figsize=(12, 10))

# Plot for 2011
plt.plot(grouped_2011['Month'], grouped_2011['Total'], marker='o', color='blue', label='2011')
# Plot for 2012
plt.plot(grouped_2012['Month'], grouped_2012['Total'], marker='o', color='orange', label='2012')

plt.title("Total Count by Month for 2011 and 2012")
plt.xlabel('Month')
plt.ylabel('Total Count')
plt.xticks(ticks=month_order, labels=month_order)
plt.legend(title='Year')
plt.grid()
plt.show()

### Question 2:

In [None]:
pivot_df = status_group_df.pivot(index='Month', columns='workingday', values='Total').fillna(0)

# Let pandas create the figure; a separate plt.figure() call would leave an empty extra figure
pivot_df.plot(kind='bar', width=0.8, figsize=(16, 10))

plt.title('Total Count by Month for Working Days and Holidays')
plt.xlabel('Month')
plt.ylabel('Total Count')
plt.xticks(ticks=range(len(pivot_df.index)), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation=0)
plt.legend(title='Working Day', labels=['Holiday', 'Working Day'])
plt.grid(axis='y')


plt.tight_layout()  
plt.show()
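The bar chart depends on reshaping the long table into one column per day status. The sketch below shows that reshaping on made-up totals (the `toy_status` frame is hypothetical). After `pivot`, the columns come out in alphabetical order, so any manually supplied legend labels need to follow that same order.

```python
import pandas as pd

# Toy long-format table (made up), shaped like status_group_df
toy_status = pd.DataFrame({
    "Month": ["January", "January", "February", "February"],
    "workingday": ["Holiday", "Workingday", "Holiday", "Workingday"],
    "Total": [40, 100, 50, 120],
})

# One row per month, one column per day status
wide = toy_status.pivot(index="Month", columns="workingday", values="Total")

# Plain string months sort alphabetically, so restore calendar order explicitly
wide = wide.reindex(["January", "February"])

print(wide)
print(list(wide.columns))
```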

**Insight:**
- Rentals peaked in June 2011 and in September 2012, the highest month of each year.
- Total rentals drop significantly on holidays.
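The peak month of each year can also be read off programmatically instead of from the chart. A minimal sketch using `groupby(...).idxmax()` follows; the totals are made-up toy values, not the real monthly totals, and `toy_monthly` is a hypothetical stand-in for `monthly_group_df`.

```python
import pandas as pd

# Toy monthly totals (made up), shaped like monthly_group_df
toy_monthly = pd.DataFrame({
    "Year": [2011, 2011, 2011, 2012, 2012, 2012],
    "Month": ["May", "June", "July", "August", "September", "October"],
    "Total": [100, 140, 130, 200, 220, 190],
})

# idxmax returns, per year, the row label of the largest Total
peak_rows = toy_monthly.loc[toy_monthly.groupby("Year")["Total"].idxmax()]
print(peak_rows)
```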

## Conclusion

- Conclusion for question 1:
In both years, the number of rentals rose during Q2 and Q3, whereas Q4 saw a significant decline in rentals each year.
- Conclusion for question 2:
Total rentals are significantly lower on holidays than on working days.