# Data Analysis Project: Bike Sharing Dataset
- **Name:** Adinda Salsabila
- **Email:** chacastastaria@gmail.com
- **Dicoding ID:** adinda_salsabila

## Defining the Business Questions

- How does weather impact bike rental demand?
- What is the best time of day and season to promote bike rentals?
- How do holidays and weekends affect bike rentals?

## Import All Packages/Libraries Used

In [50]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Wrangling

### Gathering Data

# Load the daily data and display the first rows

In [51]:
data_day = pd.read_csv('data/day.csv', sep=',', header=0, index_col=0)
print(data_day.head())


             dteday  season  yr  mnth  holiday  weekday  workingday  \
instant                                                               
1        2011-01-01       1   0     1        0        6           0   
2        2011-01-02       1   0     1        0        0           0   
3        2011-01-03       1   0     1        0        1           1   
4        2011-01-04       1   0     1        0        2           1   
5        2011-01-05       1   0     1        0        3           1   

         weathersit      temp     atemp       hum  windspeed  casual  \
instant                                                                
1                 2  0.344167  0.363625  0.805833   0.160446     331   
2                 2  0.363478  0.353739  0.696087   0.248539     131   
3                 1  0.196364  0.189405  0.437273   0.248309     120   
4                 1  0.200000  0.212122  0.590435   0.160296     108   
5                 1  0.226957  0.229270  0.436957   0.186900      82  

**Insight:**
- `day.csv` is read into a DataFrame, with the `instant` column used as the index.

# Load the hourly data

In [52]:
data_hourly = pd.read_csv('data/hour.csv')
print(data_hourly.head())

   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0   
1        2  2011-01-01       1   0     1   1        0        6           0   
2        3  2011-01-01       1   0     1   2        0        6           0   
3        4  2011-01-01       1   0     1   3        0        6           0   
4        5  2011-01-01       1   0     1   4        0        6           0   

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt  
0           1  0.24  0.2879  0.81        0.0       3          13   16  
1           1  0.22  0.2727  0.80        0.0       8          32   40  
2           1  0.22  0.2727  0.80        0.0       5          27   32  
3           1  0.24  0.2879  0.75        0.0       3          10   13  
4           1  0.24  0.2879  0.75        0.0       0           1    1  
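
Since the two files describe the same rentals at different granularity, one quick sanity check is whether the hourly counts summed per date agree with the daily totals. The sketch below uses small synthetic frames (the values are made up, and the second day is deliberately inconsistent) rather than the real CSV files; `hourly`, `daily`, and `check` are hypothetical names.

```python
import pandas as pd

# Synthetic stand-ins for hour.csv and day.csv (values are made up).
hourly = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02"],
    "hr": [0, 1, 0, 1],
    "cnt": [16, 40, 17, 17],
})
daily = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02"],
    "cnt": [56, 30],  # the second total is deliberately wrong
})

# Sum hourly counts per date and line them up with the daily totals.
hourly_totals = hourly.groupby("dteday", as_index=False)["cnt"].sum()
check = daily.merge(hourly_totals, on="dteday", suffixes=("_daily", "_hourly"))
check["matches"] = check["cnt_daily"] == check["cnt_hourly"]
print(check)
```

With the real data, `data_day.reset_index()` and `data_hourly` could take the place of the synthetic frames; any row where `matches` is `False` would deserve a closer look.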


### Assessing Data

In [53]:
print("Daily data:")
data_day.isnull().sum()


Daily data:


dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

In [54]:
data_types = data_day.dtypes
print("\nData Types:\n", data_types)




Data Types:
 dteday         object
season          int64
yr              int64
mnth            int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object


In [55]:
data_types = data_hourly.dtypes
print("\nData Types:\n", data_types)


Data Types:
 instant         int64
dteday         object
season          int64
yr              int64
mnth            int64
hr              int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object


In [56]:
# Check the shape of the data
print(f"Data Shape: {data_day.shape}")

# Check for missing values
missing_values = data_day.isnull().sum()
print("Missing Values:\n", missing_values)

# Check data types
print("Data Types:\n", data_day.dtypes)

Data Shape: (731, 15)
Missing Values:
 dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64
Data Types:
 dteday         object
season          int64
yr              int64
mnth            int64
holiday         int64
weekday         int64
workingday      int64
weathersit      int64
temp          float64
atemp         float64
hum           float64
windspeed     float64
casual          int64
registered      int64
cnt             int64
dtype: object


**Insight:**
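- The daily data has 731 rows and 15 columns, and no column contains missing values.
- `dteday` is stored as `object` in both the daily and hourly data, so it needs converting to a datetime type; `season`, `holiday`, `weekday`, `workingday`, and `weathersit` are integer codes that represent categories.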

### Cleaning Data

In [57]:
# Step 4: Convert 'dteday' to Datetime
data_day['dteday'] = pd.to_datetime(data_day['dteday'])
print("Data Types after conversion:\n", data_day.dtypes)

Data Types after conversion:
 dteday        datetime64[ns]
season                 int64
yr                     int64
mnth                   int64
holiday                int64
weekday                int64
workingday             int64
weathersit             int64
temp                 float64
atemp                float64
hum                  float64
windspeed            float64
casual                 int64
registered             int64
cnt                    int64
dtype: object
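
The conversion above relies on pandas inferring the date format. A slightly more defensive variant, sketched here on a tiny synthetic frame, passes the format explicitly and counts values that fail to parse; the `day_name` column and the `n_bad` variable are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Synthetic sample; the last value is intentionally malformed.
df = pd.DataFrame({"dteday": ["2011-01-01", "2011-01-02", "not a date"]})

# An explicit format avoids ambiguous parsing; errors="coerce" turns bad values into NaT.
df["dteday"] = pd.to_datetime(df["dteday"], format="%Y-%m-%d", errors="coerce")

# Count unparseable dates and derive a readable weekday name.
n_bad = df["dteday"].isna().sum()
df["day_name"] = df["dteday"].dt.day_name()
print(df)
print("Unparseable dates:", n_bad)
```

The same call could be applied to `data_hourly['dteday']`, which is still stored as `object` at this point.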


In [58]:
# Step 5: Convert Categorical Variables
categorical_cols = ['season', 'holiday', 'weekday', 'workingday', 'weathersit']
data_day[categorical_cols] = data_day[categorical_cols].astype('category')
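
Once the columns are categorical, the integer codes can be swapped for readable labels, which makes later tables and plots easier to interpret. The sketch below assumes the code-to-label mappings from the dataset's Readme (1 = spring through 4 = winter for `season`, 1 = clear through 4 = heavy rain for `weathersit`); these mappings are an assumption here and should be checked against the Readme shipped with the data.

```python
import pandas as pd

# Tiny synthetic frame; the notebook itself works on data_day.
df = pd.DataFrame({"season": [1, 2, 3, 4, 1], "weathersit": [1, 2, 3, 1, 2]})

# Assumed code-to-label mappings (verify against the dataset's Readme).
season_labels = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
weather_labels = {1: "clear", 2: "mist", 3: "light rain/snow", 4: "heavy rain"}

# rename_categories accepts a dict; keys missing from the data are ignored.
df["season"] = df["season"].astype("category").cat.rename_categories(season_labels)
df["weathersit"] = df["weathersit"].astype("category").cat.rename_categories(weather_labels)
print(df)
```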


In [59]:
# Step 6: Check for Outliers
def detect_outliers_iqr(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    return df[(df[column] < lower_bound) | (df[column] > upper_bound)]

In [60]:
# Example of detecting outliers in 'cnt'
outliers_cnt = detect_outliers_iqr(data_day, 'cnt')
print("Outliers in 'cnt':\n", outliers_cnt)

Outliers in 'cnt':
 Empty DataFrame
Columns: [dteday, season, yr, mnth, holiday, weekday, workingday, weathersit, temp, atemp, hum, windspeed, casual, registered, cnt]
Index: []
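
The same IQR rule can be applied to several numeric columns in one pass. The sketch below repeats the function so that it runs on its own and uses a small synthetic frame in which `hum` and `windspeed` each contain one extreme value; `sample` and `outlier_counts` are hypothetical names.

```python
import pandas as pd

def detect_outliers_iqr(df, column):
    """Return the rows whose value lies outside 1.5 * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return df[(df[column] < lower) | (df[column] > upper)]

# Synthetic data: one extreme humidity reading and one extreme wind speed.
sample = pd.DataFrame({
    "hum": [0.50, 0.55, 0.60, 0.62, 0.65, 0.70, 0.0],
    "windspeed": [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.90],
    "cnt": [3000, 3500, 4000, 4500, 5000, 5500, 6000],
})

# Number of flagged rows per column.
outlier_counts = {col: len(detect_outliers_iqr(sample, col)) for col in sample.columns}
print(outlier_counts)
```

On the real data, columns such as `hum`, `windspeed`, and `casual` would be natural candidates, since `describe()` shows a minimum humidity of 0.0 and a long upper tail for `casual`.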


In [61]:
# Step 7: Drop Irrelevant Columns (if needed)
# For example, drop the 'yr' column
# data_day.drop(columns=['yr'], inplace=True)

In [62]:
# Step 8: Final Check of Cleaned Data
print("Cleaned Data Sample:\n", data_day.head())
print("Data Types after cleaning:\n", data_day.dtypes)

Cleaned Data Sample:
             dteday season  yr  mnth holiday weekday workingday weathersit  \
instant                                                                     
1       2011-01-01      1   0     1       0       6          0          2   
2       2011-01-02      1   0     1       0       0          0          2   
3       2011-01-03      1   0     1       0       1          1          1   
4       2011-01-04      1   0     1       0       2          1          1   
5       2011-01-05      1   0     1       0       3          1          1   

             temp     atemp       hum  windspeed  casual  registered   cnt  
instant                                                                     
1        0.344167  0.363625  0.805833   0.160446     331         654   985  
2        0.363478  0.353739  0.696087   0.248539     131         670   801  
3        0.196364  0.189405  0.437273   0.248309     120        1229  1349  
4        0.200000  0.212122  0.590435   0.160296     

**Insight:**
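- `dteday` is now `datetime64[ns]`, and `season`, `holiday`, `weekday`, `workingday`, and `weathersit` are cast to `category`.
- The IQR rule flags no outliers in `cnt`, so no rows are removed.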

## Exploratory Data Analysis (EDA)

### Explore ...

* Summary Statistics

In [63]:
# Display summary statistics
data_day.describe()

| | yr | mnth | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|
| count | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 |
| mean | 0.500684 | 6.519836 | 0.495385 | 0.474354 | 0.627894 | 0.190486 | 848.176471 | 3656.172367 | 4504.348837 |
| std | 0.500342 | 3.451913 | 0.183051 | 0.162961 | 0.142429 | 0.077498 | 686.622488 | 1560.256377 | 1937.211452 |
| min | 0.0 | 1.0 | 0.05913 | 0.07907 | 0.0 | 0.022392 | 2.0 | 20.0 | 22.0 |
| 25% | 0.0 | 4.0 | 0.337083 | 0.337842 | 0.52 | 0.13495 | 315.5 | 2497.0 | 3152.0 |
| 50% | 1.0 | 7.0 | 0.498333 | 0.486733 | 0.626667 | 0.180975 | 713.0 | 3662.0 | 4548.0 |
| 75% | 1.0 | 10.0 | 0.655417 | 0.608602 | 0.730209 | 0.233214 | 1096.0 | 4776.5 | 5956.0 |
| max | 1.0 | 12.0 | 0.861667 | 0.840896 | 0.9725 | 0.507463 | 3410.0 | 6946.0 | 8714.0 |


In [64]:
data_hourly.describe()

| | instant | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 |
| mean | 8690.0 | 2.50164 | 0.502561 | 6.537775 | 11.546752 | 0.02877 | 3.003683 | 0.682721 | 1.425283 | 0.496987 | 0.475775 | 0.627229 | 0.190098 | 35.676218 | 153.786869 | 189.463088 |
| std | 5017.0295 | 1.106918 | 0.500008 | 3.438776 | 6.914405 | 0.167165 | 2.005771 | 0.465431 | 0.639357 | 0.192556 | 0.17185 | 0.19293 | 0.12234 | 49.30503 | 151.357286 | 181.387599 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 4345.5 | 2.0 | 0.0 | 4.0 | 6.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.34 | 0.3333 | 0.48 | 0.1045 | 4.0 | 34.0 | 40.0 |
| 50% | 8690.0 | 3.0 | 1.0 | 7.0 | 12.0 | 0.0 | 3.0 | 1.0 | 1.0 | 0.5 | 0.4848 | 0.63 | 0.194 | 17.0 | 115.0 | 142.0 |
| 75% | 13034.5 | 3.0 | 1.0 | 10.0 | 18.0 | 0.0 | 5.0 | 1.0 | 2.0 | 0.66 | 0.6212 | 0.78 | 0.2537 | 48.0 | 220.0 | 281.0 |
| max | 17379.0 | 4.0 | 1.0 | 12.0 | 23.0 | 1.0 | 6.0 | 1.0 | 4.0 | 1.0 | 1.0 | 1.0 | 0.8507 | 367.0 | 886.0 | 977.0 |


**Insight:**
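- Daily rentals (`cnt`) range from 22 to 8,714, with a mean of about 4,504 and a median of 4,548.
- Registered users account for most rentals: the daily mean is about 3,656 registered rentals versus about 848 casual ones.
- Hourly rentals are right-skewed: the median is 142 per hour, against a mean of about 189 and a maximum of 977.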

* Correlation Analysis
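
A minimal sketch of what a correlation step could look like, assuming the goal is to rank the continuous weather variables by their linear association with `cnt`. The frame below is synthetic (its values are invented so that `temp` tracks `cnt` exactly), and `corr_with_cnt` is a hypothetical name.

```python
import pandas as pd

# Synthetic data: cnt rises with temp and falls with hum.
df = pd.DataFrame({
    "temp": [0.2, 0.3, 0.5, 0.6, 0.8],
    "hum": [0.8, 0.7, 0.6, 0.5, 0.4],
    "windspeed": [0.30, 0.10, 0.25, 0.15, 0.20],
    "cnt": [1000, 2000, 4000, 5000, 7000],
})

# Pearson correlation of every column with cnt, strongest positive first.
corr_with_cnt = df.corr()["cnt"].drop("cnt").sort_values(ascending=False)
print(corr_with_cnt)
```

On the notebook's `data_day`, the categorical columns would need to be left out first, for example by selecting `['temp', 'atemp', 'hum', 'windspeed', 'cnt']` before calling `corr()`.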

# ***********************************************************
# WEATHER IMPACT ON RENTAL DEMAND
# ***********************************************************

In [65]:
# Reload the raw CSV files as untouched baseline copies
rental_demand_daily = pd.read_csv('data/day.csv')
rental_demand_hourly = pd.read_csv('data/hour.csv')


In [66]:
# Select the weather-related columns and the rental count
weather_data = data_day[['weathersit','temp','hum','windspeed', 'cnt']]

In [71]:
# Aggregate demand by weather situation only. Grouping on the continuous columns
# (temp, hum, windspeed) together with the categorical weathersit asked pandas for
# every combination of key values, which raised:
# MemoryError: Unable to allocate 4.31 GiB for an array with shape (578964750,) and data type int64
weather_agg = weather_data.groupby('weathersit', observed=True).agg({'cnt': ['sum', 'mean']})
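
Continuous variables such as `temp` have almost as many distinct values as there are rows, so they make poor group keys. One common alternative, sketched below on synthetic data, is to group by the categorical `weathersit` with `observed=True` and to bin `temp` with `pd.cut` before grouping. The bin edges and the labels `cool`, `mild`, and `warm` are arbitrary choices for illustration.

```python
import pandas as pd

# Synthetic data; weathersit declares four categories but only three occur.
df = pd.DataFrame({
    "weathersit": pd.Categorical([1, 1, 2, 2, 3, 1], categories=[1, 2, 3, 4]),
    "temp": [0.20, 0.45, 0.50, 0.70, 0.30, 0.80],
    "cnt": [1500, 4000, 3500, 5000, 1000, 6000],
})

# observed=True keeps only the categories that actually appear in the data.
by_weather = df.groupby("weathersit", observed=True)["cnt"].mean()

# Bin the normalized temperature into three ranges, then average demand per bin.
df["temp_bin"] = pd.cut(df["temp"], bins=[0, 0.33, 0.66, 1.0],
                        labels=["cool", "mild", "warm"])
by_temp = df.groupby("temp_bin", observed=True)["cnt"].mean()

print(by_weather)
print(by_temp)
```

Using the mean rather than the sum keeps groups comparable when some weather situations occur on far fewer days than others.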

# ***********************************************************
# BEST TIME OF DAY AND SEASON TO PROMOTE BIKE RENTALS
# ***********************************************************

In [67]:
time_season_data = data_hourly[['hr', 'weekday', 'season', 'cnt']]
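
One possible next step with these columns is an average hourly profile per season, from which the peak hour and the busiest season can be read off. The sketch below uses a small synthetic frame with only two hours and two seasons; `hourly_profile`, `peak_hour`, and `best_season` are hypothetical names.

```python
import pandas as pd

# Synthetic hourly records for two seasons.
df = pd.DataFrame({
    "hr": [8, 8, 17, 17, 8, 8, 17, 17],
    "season": [1, 1, 1, 1, 3, 3, 3, 3],
    "cnt": [100, 120, 150, 170, 300, 320, 400, 420],
})

# Mean rentals per (season, hour), reshaped to one row per season.
hourly_profile = df.groupby(["season", "hr"])["cnt"].mean().unstack("hr")

# Hour with the highest mean demand in each season.
peak_hour = hourly_profile.idxmax(axis=1)

# Season with the highest mean demand across hours.
best_season = hourly_profile.mean(axis=1).idxmax()

print(hourly_profile)
print(peak_hour)
print("Busiest season:", best_season)
```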

# ***********************************************************
# HOW HOLIDAYS AND WEEKENDS AFFECT BIKE RENTALS
# ***********************************************************

In [68]:
holiday_weekend_data = data_day[['holiday','weekday','workingday', 'cnt']]
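
To compare day types, the two flags can be collapsed into a single label. The sketch below assumes that `workingday` is 1 only for days that are neither weekends nor holidays, so a day with `workingday == 0` and `holiday == 0` is treated as a weekend; that reading, the `day_type` column, and the synthetic values are all assumptions for illustration.

```python
import pandas as pd

# Synthetic daily records.
df = pd.DataFrame({
    "holiday":    [0, 0, 0, 1, 0, 0],
    "workingday": [1, 1, 0, 0, 0, 1],
    "casual":     [300, 350, 1200, 900, 1100, 250],
    "registered": [4000, 4200, 2500, 2000, 2600, 4100],
})
df["cnt"] = df["casual"] + df["registered"]

# Label each day: start from "weekend", then overwrite working days and holidays.
df["day_type"] = "weekend"
df.loc[df["workingday"] == 1, "day_type"] = "working day"
df.loc[df["holiday"] == 1, "day_type"] = "holiday"

# Mean rentals per day type, split by user group.
summary = df.groupby("day_type")[["casual", "registered", "cnt"]].mean()
print(summary)
```

Splitting by `casual` and `registered` matters here because the two groups may respond to holidays and weekends in opposite directions, which a comparison of `cnt` alone would hide.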

## Visualization & Explanatory Analysis

### Question 1:
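
A minimal sketch of one way to visualize the weather question: a bar chart of mean rentals per weather situation. The data is synthetic and the chart design is only a suggestion. The `Agg` backend line is there so the snippet runs without a display and can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic daily records.
df = pd.DataFrame({
    "weathersit": [1, 1, 2, 2, 3],
    "cnt": [5000, 4600, 4000, 3800, 1800],
})
mean_by_weather = df.groupby("weathersit")["cnt"].mean()

# One bar per weather situation.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(mean_by_weather.index.astype(str), mean_by_weather.values)
ax.set_xlabel("weathersit")
ax.set_ylabel("Mean daily rentals (cnt)")
ax.set_title("Mean rentals by weather situation (synthetic data)")
fig.tight_layout()
```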

### Question 2:
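
For the time-of-day and season question, one option is a line chart of mean hourly rentals with one line per season. The sketch below uses synthetic data with only three hours per season; the `profile` name and the styling are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic hourly records for two seasons.
df = pd.DataFrame({
    "hr": [0, 8, 17, 0, 8, 17],
    "season": [1, 1, 1, 3, 3, 3],
    "cnt": [10, 150, 200, 40, 350, 450],
})

# Rows are hours, columns are seasons, values are mean rentals.
profile = df.pivot_table(index="hr", columns="season", values="cnt", aggfunc="mean")

fig, ax = plt.subplots(figsize=(7, 4))
for season in profile.columns:
    ax.plot(profile.index, profile[season], marker="o", label=f"season {season}")
ax.set_xlabel("Hour of day (hr)")
ax.set_ylabel("Mean hourly rentals (cnt)")
ax.set_title("Hourly rental profile by season (synthetic data)")
ax.legend()
fig.tight_layout()
```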

**Insight:**
- xxx
- xxx

## Further Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2