## Bike-Sharing Statistical Analysis


**Project Overview**

**Purpose:**
This statistical analysis supports a proposal for a new bike-sharing program. NYC CitiBike trip data for August 2019 was analyzed to establish a baseline of operations and rider behavior in a successful bike-sharing program, which can then inform the design of the new one.

**Resources:**
- **Data Source:** Citi Bike Data, 201908-citibike-tripdata.csv.zip
- **Software:**
  - Python 3.11.1
  - Anaconda Navigator 2022.10
  - Jupyter Notebook 6.5.2
  - Tableau Public 2022.1

In [2]:
import pandas as pd
import os
import datetime

In [3]:
# 1. Create a DataFrame for the 201908-citibike-tripdata data.
# The unzipped CSV is assumed to sit in the current working directory.
citibike_data = os.path.join('201908-citibike-tripdata.csv')
citibike_df = pd.read_csv(citibike_data)

In [4]:
# 2. Check the datatypes of your columns. 
citibike_df.dtypes

tripduration                 int64
starttime                   object
stoptime                    object
start station id           float64
start station name          object
start station latitude     float64
start station longitude    float64
end station id             float64
end station name            object
end station latitude       float64
end station longitude      float64
bikeid                       int64
usertype                    object
birth year                   int64
gender                       int64
dtype: object
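
The `starttime` and `stoptime` columns load as `object`, meaning plain strings. As a minimal sketch, assuming the timestamps follow the format seen in the file, they could be parsed at load time with the `parse_dates` argument of `pd.read_csv`. The two-row inline sample and the names `sample_csv` and `sample_df` are illustrative stand-ins for the real CSV.

```python
import io
import pandas as pd

# Two rows taken from the dataset; a stand-in for the real CSV file.
sample_csv = io.StringIO(
    "tripduration,starttime,stoptime,usertype\n"
    "393,2019-08-01 00:00:01.4680,2019-08-01 00:06:35.3780,Subscriber\n"
    "627,2019-08-01 00:00:01.9290,2019-08-01 00:10:29.7840,Subscriber\n"
)

# parse_dates asks read_csv to convert the named columns to datetime64.
sample_df = pd.read_csv(sample_csv, parse_dates=["starttime", "stoptime"])
print(sample_df.dtypes)
```

With datetime columns in place, accessors such as `.dt.hour` become available for the hourly and weekday breakdowns reported later.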

In [5]:
tripduration = citibike_df["tripduration"]
print(tripduration)

0           393
1           627
2          1132
3          1780
4          1517
           ... 
2344219     216
2344220     117
2344221    1614
2344222    1301
2344223     419
Name: tripduration, Length: 2344224, dtype: int64


In [6]:
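# 3. Convert the 'tripduration' column (seconds) to datetime datatype.
# Note: pd.to_datetime with unit='s' anchors each value at the Unix epoch,
# so a 393-second trip is stored as 1970-01-01 00:06:33.
citibike_df["tripduration"] = pd.to_datetime(citibike_df["tripduration"], unit='s')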

In [7]:
# 4. Check the datatypes of your columns. 
citibike_df.dtypes

tripduration               datetime64[ns]
starttime                          object
stoptime                           object
start station id                  float64
start station name                 object
start station latitude            float64
start station longitude           float64
end station id                    float64
end station name                   object
end station latitude              float64
end station longitude             float64
bikeid                              int64
usertype                           object
birth year                          int64
gender                              int64
dtype: object

In [8]:
citibike_df.head(5)

| | tripduration | starttime | stoptime | start station id | start station name | start station latitude | start station longitude | end station id | end station name | end station latitude | end station longitude | bikeid | usertype | birth year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970-01-01 00:06:33 | 2019-08-01 00:00:01.4680 | 2019-08-01 00:06:35.3780 | 531.0 | Forsyth St & Broome St | 40.718939 | -73.992663 | 408.0 | Market St & Cherry St | 40.710762 | -73.994004 | 35305 | Subscriber | 1996 | 2 |
| 1 | 1970-01-01 00:10:27 | 2019-08-01 00:00:01.9290 | 2019-08-01 00:10:29.7840 | 274.0 | Lafayette Ave & Fort Greene Pl | 40.686919 | -73.976682 | 3409.0 | Bergen St & Smith St | 40.686744 | -73.990632 | 38822 | Subscriber | 1998 | 2 |
| 2 | 1970-01-01 00:18:52 | 2019-08-01 00:00:04.0480 | 2019-08-01 00:18:56.1650 | 2000.0 | Front St & Washington St | 40.702551 | -73.989402 | 3388.0 | President St & Henry St | 40.6828 | -73.999904 | 18373 | Subscriber | 1988 | 1 |
| 3 | 1970-01-01 00:29:40 | 2019-08-01 00:00:04.1630 | 2019-08-01 00:29:44.7940 | 479.0 | 9 Ave & W 45 St | 40.760193 | -73.991255 | 473.0 | Rivington St & Chrystie St | 40.721101 | -73.991925 | 25002 | Subscriber | 1988 | 1 |
| 4 | 1970-01-01 00:25:17 | 2019-08-01 00:00:05.4580 | 2019-08-01 00:25:23.4550 | 3312.0 | 1 Ave & E 94 St | 40.781721 | -73.94594 | 3312.0 | 1 Ave & E 94 St | 40.781721 | -73.94594 | 31198 | Subscriber | 1965 | 2 |
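
Because `pd.to_datetime(..., unit='s')` anchors every duration at the Unix epoch (hence the `1970-01-01` dates above), one alternative worth considering is `pd.to_timedelta`, which stores each value as elapsed time rather than as a calendar date. This is a sketch of that alternative, not the conversion used in this notebook. It uses the first three durations from the dataset, and the variable names are illustrative.

```python
import pandas as pd

# First three trip durations (in seconds) from the dataset.
durations_s = pd.Series([393, 627, 1132], name="tripduration")

# Interpret the integers as elapsed seconds rather than timestamps.
durations_td = pd.to_timedelta(durations_s, unit="s")
print(durations_td)

# Timedeltas convert back to plain numbers easily, e.g. minutes for averaging.
minutes = durations_td.dt.total_seconds() / 60
print(minutes.round(2).tolist())
```

Whether a datetime or a timedelta is preferable depends on the downstream tool; the datetime form may have been chosen here for compatibility with the visualization software.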


In [None]:
# 5. Export the DataFrame as a new CSV file without the index.
citibike_df.to_csv('201908-citibike-tripdataNEW.csv', index=False, header=True)

print(citibike_df)
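
The `index=False` argument matters because, by default, pandas writes the row labels as an extra unnamed column, which reappears as `Unnamed: 0` when the file is read back. The following sketch checks the round trip on a small stand-in frame, using an in-memory buffer instead of a file on disk; `small_df`, `buffer`, and `round_trip` are illustrative names.

```python
import io
import pandas as pd

# Small stand-in frame; the real call writes citibike_df to disk.
small_df = pd.DataFrame({"tripduration": [393, 627], "bikeid": [35305, 38822]})

buffer = io.StringIO()
small_df.to_csv(buffer, index=False)  # index=False omits the row labels
buffer.seek(0)

# Reading the data back should yield only the original columns.
round_trip = pd.read_csv(buffer)
print(round_trip.columns.tolist())
```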

**Results Summary:**

1. **Total Number of Trips:**
   - In August, the total number of trips was 2,344,224.
   - **Customer Breakdown:** 
     - Short-term riders: 157,671
     - Annual subscribers: 1,900,359
   - **Insight:** About 81% of trips (1,900,359 of 2,344,224) were taken by annual subscribers, which points to a stable revenue base built on long-term customers.

2. **Gender Breakdown:**
   - **Male:** 65.2%
   - **Female:** 25%
   - **Chart:** Gender Breakdown

3. **Trip Duration by Birth Year:**
   - **Insight:** Younger riders tend to use the bikes for longer periods compared to other age groups.
   - **Graph:** Average Trip Duration by Birth Year

4. **Peak Usage Hours:**
   - **Peak Hours:** 5:00 PM to 7:00 PM
   - **Least Active Hours:** 2:00 AM to 5:00 AM
   - **Insight:** Peak hours require the most bikes, while least active hours are ideal for bike maintenance.
   - **Chart:** Peak Usage Hours

5. **High-Traffic Locations:**
   - **Insight:** NYC CitiBike customers prefer starting and ending their bike rental journeys in commercial and high-tourist areas of Manhattan.
   - **Maps:** High-Traffic Locations

6. **Weekly Usage Patterns:**
   - **Weekday Commute Times:** Heavy usage around 7:00 AM to 9:00 AM and 5:00 PM to 7:00 PM
   - **Weekend Usage:** Highest from 10:00 AM to 7:00 PM
   - **Insight:** Most rides are taken by male users.
   - **Heatmap:** Weekly Usage Patterns

7. **Trip Duration Distribution:**
   - **Insight:** The majority of trips are under an hour in length, with male customers taking significantly more rides.
   - **Graph:** Trip Duration Distribution
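
The figures in results 1, 2, 3, and 7 are simple aggregations. The sketch below shows one way they might be reproduced with pandas; it is not necessarily how the original charts were built. The six-row `trips` frame is invented for illustration, `tripduration` is assumed to still be in raw seconds, and the gender labels assume Citi Bike's published coding (0 = unknown, 1 = male, 2 = female).

```python
import pandas as pd

# Invented sample rows; column names match the CitiBike trip file.
trips = pd.DataFrame({
    "tripduration": [393, 627, 1132, 1780, 4200, 216],  # seconds
    "usertype": ["Subscriber", "Subscriber", "Customer",
                 "Subscriber", "Customer", "Subscriber"],
    "birth year": [1996, 1998, 1988, 1988, 1965, 1996],
    "gender": [2, 2, 1, 1, 0, 1],
})

# Result 1: share of trips by user type.
usertype_share = trips["usertype"].value_counts(normalize=True)

# Result 2: share of trips by gender (assumed coding: 0/1/2).
gender_labels = {0: "Unknown", 1: "Male", 2: "Female"}
gender_share = trips["gender"].map(gender_labels).value_counts(normalize=True)

# Result 3: average trip duration in minutes per birth year.
avg_minutes_by_birth_year = (
    trips.groupby("birth year")["tripduration"].mean() / 60
)

# Result 7: fraction of trips shorter than one hour.
under_one_hour = (trips["tripduration"] < 3600).mean()

print(usertype_share, gender_share, avg_minutes_by_birth_year,
      under_one_hour, sep="\n\n")
```

On the full dataset, the same expressions would be applied to `citibike_df` before the `tripduration` conversion, or to a copy that keeps the duration in seconds.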

**Summary:**

The analysis shows that bike-share usage in New York is concentrated in high-traffic areas, with most rides starting and ending in the tourist and commercial districts of Manhattan. August 2019 ridership likely benefited from favorable weather and a larger number of tourists, although a single month of data cannot confirm this. Most riders are male, and usage peaks during the morning and evening rush hours, which suggests that commuters use CitiBike as an alternative to public transportation.
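
The rush-hour pattern can be checked directly from the `starttime` column. As a minimal sketch, the code below counts trips per hour and builds the weekday-by-hour table that a usage heatmap would be drawn from. The five timestamps are invented for illustration, and `usage`, `trips_per_hour`, and `heatmap_counts` are hypothetical names.

```python
import pandas as pd

# Invented start times; the real column is citibike_df["starttime"].
starts = pd.to_datetime(pd.Series([
    "2019-08-01 08:05:00",  # Thursday morning
    "2019-08-01 17:30:00",  # Thursday evening
    "2019-08-02 17:45:00",  # Friday evening
    "2019-08-03 11:00:00",  # Saturday late morning
    "2019-08-05 08:15:00",  # Monday morning
]))

usage = pd.DataFrame({
    "weekday": starts.dt.day_name(),
    "hour": starts.dt.hour,
})

# Trips per hour of day, busiest hours first.
trips_per_hour = usage["hour"].value_counts()

# Weekday-by-hour trip counts, the table behind a usage heatmap.
heatmap_counts = pd.crosstab(usage["weekday"], usage["hour"])

print(trips_per_hour)
print(heatmap_counts)
```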

**Recommendations:**

Because bike-share demand is seasonal, additional statistical analysis and visualization comparing data across different months is recommended to identify yearly trends. A weather impact analysis should also be performed to measure the correlation between weather conditions and bike usage.
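
A weather impact analysis could start with a simple correlation between daily trip counts and a weather variable. The sketch below assumes a table with one row per day. All numbers are invented for illustration, and the `precip_mm` column stands in for whatever weather source is eventually chosen; no such data is part of this project yet.

```python
import pandas as pd

# Hypothetical daily figures, invented purely for illustration.
daily = pd.DataFrame({
    "date": pd.date_range("2019-08-01", periods=5, freq="D"),
    "trips": [80000, 76000, 70000, 52000, 79000],
    "precip_mm": [0.0, 1.0, 4.0, 20.0, 0.0],
})

# Pearson correlation between rainfall and ridership; values near -1
# would suggest fewer trips on wetter days.
rain_vs_trips = daily["trips"].corr(daily["precip_mm"])
print(round(rain_vs_trips, 3))
```

With real data, the daily trip counts could be derived by grouping the parsed `starttime` column by date, and the same approach extends to temperature or other conditions.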