<a href="https://www.kaggle.com/code/williamross94/bike-share-case-study?scriptVersionId=110332570" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **Bike-Share Case Study**

**Table of contents**
* [1. Ask ](#section-one)
* [2. Prepare ](#section-two)
* [3. Process ](#section-three)
* [4. Analyze](#section-four)
* [5. Share ](#section-five)
* [6. Act](#section-six)

<a id="introduction"></a> <br>
## **Scenario**

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director
of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,
your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights,
your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives
must approve your recommendations, so they must be backed up with compelling data insights and professional data
visualizations.

### **Stakeholders**

**Lily Moreno:** The director of marketing and your manager. Moreno is responsible for the development of campaigns
and initiatives to promote the bike-share program. These may include email, social media, and other channels.


**Cyclistic executive team:** The notoriously detail-oriented executive team will decide whether to approve the
recommended marketing program.

**Cyclistic marketing analytics team:** A team of data analysts who are responsible for collecting, analyzing, and
reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy
learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic
achieve them.

<a id="section-one"></a>
# 1. Ask


**Primary Guiding Question:**  How do annual members and casual riders use Cyclistic bikes
differently?

**Other Guiding Questions:**

* Why would casual riders buy Cyclistic annual memberships?

* How can Cyclistic use digital media to influence casual riders to become members?
 

 **Business Task:**
 
Design marketing strategies aimed at converting casual riders into annual members. To do so, the marketing analyst team needs to understand how annual members and casual riders differ and why casual riders would buy a membership.

<a id="section-two"></a>
# 2. Prepare


* The data comes from Divvy, a Chicago-based bike-share company. For this case study the company is referred to by the fictional name Cyclistic.
* The data is provided by [Motivate International Inc](https://divvy-tripdata.s3.amazonaws.com/index.html) under this [license](https://ride.divvybikes.com/data-license-agreement).
* The case study uses 12 months of data, from August 2021 to July 2022.

<a id="section-three"></a>
# 3. Process


In [None]:
from datetime import datetime
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

#searching for relevant csv files

import glob
glob.glob('/kaggle/input/ride-share-data/Trip_data2/*.csv')

In [None]:
# Go through the 12 csv files with for loop and append to list combined_df

combined_df = []

for file in glob.glob('/kaggle/input/ride-share-data/Trip_data2/*.csv'):
    print(f' file: {file}')
    df = pd.read_csv(file)
    combined_df.append(df)

In [None]:
# combined into one data frame

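# ignore_index=True gives every row a unique index label, so later
# index-based drops remove only the intended rows
df = pd.concat(combined_df, ignore_index=True)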

In [None]:
# drop any duplicate rows (drop_duplicates returns a new frame, so reassign it)

df = df.drop_duplicates()

In [None]:
df.shape

In [None]:
# Find total null values for each column

df.isnull().sum()

<a id="section-four"></a>
# 4. Analyze

**Main objective for analysis:**
* Create a column for the total length of time riders used the bikes

* Create a column for the month

* Create a column for the name of the weekday

* Create a column for the hour(start time)

* Create a column for the hour(end time)

* Find the percentage of member and casual riders 

* Find the percentage of each bike type used 

* Find the percentage of member_type and rideable_type pairs
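
The derived columns listed above can also be built without explicit loops. The following is a minimal sketch, assuming the timestamps follow the `%Y-%m-%d %H:%M:%S` format; the `trips` frame and its two rows are made up for illustration and are not part of the dataset. Unlike the loop-based cells in this notebook, it stores the hours as integers rather than strings.

```python
import pandas as pd

# Toy rows standing in for the trip data (values are made up for illustration)
trips = pd.DataFrame({
    "started_at": ["2021-08-01 09:15:00", "2022-01-15 23:50:00"],
    "ended_at":   ["2021-08-01 09:45:30", "2022-01-16 00:05:00"],
})

date_format = "%Y-%m-%d %H:%M:%S"

# Parse both timestamp columns once, leaving the original string columns untouched
started = pd.to_datetime(trips["started_at"], format=date_format)
ended = pd.to_datetime(trips["ended_at"], format=date_format)

# Ride length in seconds (ended_at - started_at)
trips["ride_time(seconds)"] = (ended - started).dt.total_seconds()

# Month name taken from ended_at, weekday name taken from started_at
trips["Month"] = ended.dt.month_name()
trips["day_of_week"] = started.dt.day_name()

# Start and end hour as integers (0-23)
trips["Hour(start_time)"] = started.dt.hour
trips["Hour(end_time)"] = ended.dt.hour

print(trips)
```

Because every column is computed from the parsed series, no row count needs to be hard-coded.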

In [None]:
#Calculate the total ride time (ended_at - started_at)


date_format = "%Y-%m-%d %H:%M:%S"

start_time = []
end_time = []
total = []

#iterate over the 'started_at' column, convert using date_format, and appending to list start_time

for col in df['started_at']:
    time1 = datetime.strptime(col, date_format)
    start_time.append(time1)

#iterate over the 'ended_at' column, convert using date_format, and appending to list end_time    
    
for col in df['ended_at']:
    time2 = datetime.strptime(col, date_format)
    end_time.append(time2)  

#subtracting start_time list values from end_time list values
#(loop over the actual number of rows instead of a hard-coded count)

for i in range(len(start_time)):
    time_diff = end_time[i] - start_time[i]
    total.append(time_diff)

#convert to numpy array 

total_time = np.asarray(total)

# Add column 'total_ride_time' to dataframe

df['total_ride_time'] = total_time.tolist()

In [None]:
# total_ride_time column created

df

In [None]:
# Data types of each column

df.dtypes

In [None]:
#convert the total_ride_time column (timedelta) into seconds

df["ride_time(seconds)"] = df["total_ride_time"]/np.timedelta64(1, 's')

In [None]:
#convert the date into "day of the week (ex.Wednesday)"

date_format = "%Y-%m-%d %H:%M:%S"

week_day = []

for day in df["started_at"]:
    d = datetime.strptime(day, date_format).strftime('%A')
    week_day.append(d)

day = np.asarray(week_day)

df["day_of_week"] = day.tolist()

In [None]:
# day_of_week column created 

df

In [None]:
df.dtypes

In [None]:
# min shows that there are negative values

df["ride_time(seconds)"].describe()

In [None]:
# Removing negative values

negative_values = df[df["ride_time(seconds)"] < 0].index
df.drop(negative_values, inplace=True)

In [None]:
df["ride_time(seconds)"].describe()

In [None]:
# Creating a month column

months = {
    "01": "January",
    "02": "February",
    "03": "March",
    "04": "April",
    "05": "May",
    "06": "June",
    "07": "July",
    "08": "August",
    "09": "September",
    "10": "October",
    "11": "November",
    "12" : "December"
    
}


Months = []

for date in df["ended_at"]:
    a, b, c = date.split("-")
    month_word = months.get(b)
    Months.append(month_word)
    
m_w = np.asarray(Months)

df["Month"] = m_w.tolist()

In [None]:
# Creating a start hour column

hour = []

for date in df["started_at"]:
    if date[11] == "0":
        hour.append(date[12])
    else:
        hour.append(date[11:13])

h = np.asarray(hour)

df["Hour(start_time)"] = h.tolist()   

In [None]:
# Creating an end hour column

hour = []

for date in df["ended_at"]:
    if date[11] == "0":
        hour.append(date[12])
    else:
        hour.append(date[11:13])

h = np.asarray(hour)

df["Hour(end_time)"] = h.tolist() 

In [None]:
df.head()

In [None]:
# Finding the average value of member_casual ride_time(seconds)

df.groupby("member_casual")["ride_time(seconds)"].describe().round(2)

**The average casual ride lasted 1752.79 seconds (29.21 minutes)**


**The average member ride lasted 775.97 seconds (12.93 minutes)**
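
The minute figures follow from dividing the mean seconds by 60. A quick check, using only the two means reported above (the variable names are chosen here for illustration):

```python
# Mean ride times in seconds, as reported by the describe() output above
casual_mean_seconds = 1752.79
member_mean_seconds = 775.97

# Convert seconds to minutes
casual_minutes = round(casual_mean_seconds / 60, 2)
member_minutes = round(member_mean_seconds / 60, 2)

# How much longer the average casual ride is than the average member ride
ratio = casual_mean_seconds / member_mean_seconds

print(f"casual: {casual_minutes} min, member: {member_minutes} min")
print(f"casual / member ratio: {ratio:.2f}")
```

On these means, the average casual ride is a little more than twice as long as the average member ride.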


In [None]:
df["rideable_type"].value_counts()

In [None]:
# Calculating the percentage of each bike type

classic_bike = 3054969
electric_bike = 2618541
docked_bike = 226676

total = classic_bike + electric_bike + docked_bike
classic = round((classic_bike / total) * 100,2)
electric = round((electric_bike / total) * 100,2)
docked = round((docked_bike / total) * 100,2)

print(f"classic_bike : {classic}%")
print(f"electric_bike : {electric}%")
print(f"docked_bike : {docked}%")

**51.78% of rides were on classic bikes**

**44.38% of rides were on electric bikes**

**3.84% of rides were on docked bikes**
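
Shares like these can also be computed straight from the column, without copying counts by hand. A minimal sketch, assuming a `rideable_type` column as in this dataset; the ten-row `toy` frame is made up for illustration:

```python
import pandas as pd

# Toy data: 5 classic, 4 electric, 1 docked (made up for illustration)
toy = pd.DataFrame({
    "rideable_type": ["classic_bike"] * 5 + ["electric_bike"] * 4 + ["docked_bike"]
})

# normalize=True returns fractions of the total instead of raw counts
shares = toy["rideable_type"].value_counts(normalize=True).mul(100).round(2)

print(shares)
```

This keeps the percentages in sync with the data if rows are added or removed earlier in the notebook.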


In [None]:
df["member_casual"].value_counts()

In [None]:
# Calculating the percentage of each member type

member = 3378502
casual = 2521684
total = member + casual
members = round((member / total) * 100,2)
casuals = round((casual / total) * 100,2)

print(f"member : {members}%")
print(f"casual : {casuals}%")

**57.26% of rides were taken by members**

**42.74% of rides were taken by casual riders**
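
The split by rider type and bike type together can be expressed as shares of all rides in one step. A minimal sketch using `pd.crosstab` with `normalize="all"`, assuming `member_casual` and `rideable_type` columns as in this dataset; the ten-row `toy` frame is made up for illustration:

```python
import pandas as pd

# Toy data (made up for illustration): 4 casual rides and 6 member rides
toy = pd.DataFrame({
    "member_casual": ["casual"] * 4 + ["member"] * 6,
    "rideable_type": [
        "classic_bike", "classic_bike", "electric_bike", "docked_bike",
        "classic_bike", "classic_bike", "classic_bike", "classic_bike",
        "electric_bike", "electric_bike",
    ],
})

# normalize="all" divides every cell by the grand total,
# so the whole table sums to 100 after scaling
pairs = pd.crosstab(
    toy["rideable_type"], toy["member_casual"], normalize="all"
).mul(100).round(2)

print(pairs)
```

Pairs that never occur, such as docked bikes ridden by members in this toy data, appear as 0.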

In [None]:
# member_type and rideable_type pairs

df_ridetype_member = df.groupby(['rideable_type', 'member_casual']).size().unstack(fill_value=0)
df_ridetype_member

In [None]:
classic_bike_member = 1922330
electric_bike_member = 1456172
docked_bike_member = 0
classic_bike_casual = 1132639
electric_bike_casual = 1162369
docked_bike_casual = 226676

In [None]:
total = classic_bike_casual + classic_bike_member + electric_bike_casual + electric_bike_member + docked_bike_casual + docked_bike_member
cb_casual = round((classic_bike_casual / total) * 100,2)
eb_casual = round((electric_bike_casual / total) * 100,2)
db_casual = round((docked_bike_casual / total) * 100,2)
cb_member = round((classic_bike_member / total) * 100,2)
eb_member = round((electric_bike_member / total) * 100,2)
db_member = round((docked_bike_member / total) * 100,2)

print(f"classic_bike_casual : {cb_casual}%")
print(f"electric_bike_casual : {eb_casual}%")
print(f"docked_bike_casual : {db_casual}%")
print(f"classic_bike_member : {cb_member}%")
print(f"electric_bike_member : {eb_member}%")
print(f"docked_bike_member : {db_member}%")

In [None]:
# Create pie chart showing (member_type)-(rideable_type) pairs

# Reuse the percentages computed in the previous cell instead of retyping them
data = [cb_casual, eb_casual, db_casual, cb_member, eb_member, db_member]
labels = ['classic_bike_casual', 'electric_bike_casual', 'docked_bike_casual', 'classic_bike_member', 'electric_bike_member', 'docked_bike_member']

# Seaborn color palette
colors = sns.color_palette('pastel')

#create the pie chart
plt.pie(data, labels = labels, colors = colors, autopct='%.0f%%')

<a id="section-five"></a>
# 5. Share


This project's Tableau visualizations are available on [Tableau Public](https://public.tableau.com/views/Bike-ShareCaseStudy_16631858139840/Story1?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link).

<a id="section-six"></a>
# 6. Act

**Key findings:**
* The average casual ride lasted 29.21 minutes, compared with 12.93 minutes for members.


* Casual users' start and end locations were concentrated near the shoreline, whereas members' locations were more evenly distributed and extended further inland.


* Streeter Dr and Grand Ave was the most popular station for casual users, with 3.6 times more casual users than members at this station.


* Casual users were more likely to ride on weekends than on weekdays.


* Members rode more consistently throughout the week.


* The summer months were the most popular among casual riders.


* The classic bike was the most popular rideable type for members, while casual riders showed a roughly even preference for electric and classic bikes.

**Recommendations**

1. Provide a limited-time offer during the summer months, which gives those who sign up for annual memberships the first year at a discounted price. Focus on the stations with a high population of casual riders such as Streeter Dr and Grand Ave.

2. Since casual users are more likely to ride on the weekends, provide memberships that give perks such as unlimited ride times on the weekends or some type of point reward system that gives bonuses to weekend users.

3. Prioritize adding more bike docking stations along the shoreline, where casual riders ride most often, to improve accessibility.