# Case Study: How does a bike-share navigate speedy success?

## Introduction
I have been working on the Google Data Analytics Professional Certificate, which has equipped me with essential skills to analyze and interpret data. As a junior data analyst on the marketing team at Cyclistic, a bike-share company in Chicago, I’ve been tasked with understanding the differences in how casual riders and annual members use our bikes.
Our director of marketing believes that the future success of Cyclistic hinges on maximizing annual memberships. To achieve this, we need to gather insights on user behavior that will help us convert casual riders into loyal annual members. This case study will examine how data analytics can reveal patterns in bike usage, such as ride frequency, duration, and peak times.
By presenting compelling data insights and professional visualizations, we aim to develop a strategic marketing plan that appeals to casual riders. Our ultimate goal is to secure executive approval for our recommendations, ensuring that Cyclistic continues to thrive in a competitive market.

## Background

### About the Company
Founded in 2016, Cyclistic has established a successful bike-share program in Chicago, featuring a fleet of 5,824 geotracked bicycles and 692 docking stations. Users can easily unlock bikes at one station and return them to any other within the network.

### Marketing Strategy
Cyclistic’s marketing strategy has traditionally focused on general awareness and broad consumer appeal, utilizing flexible pricing plans such as single-ride passes, full-day passes, and annual memberships. Casual riders are defined as those who purchase single-ride or full-day passes, while annual members represent a more profitable customer segment.

### Objective
Recognizing the potential for growth, Director of Marketing Lily Moreno aims to convert casual riders into annual members, emphasizing that these riders are already familiar with Cyclistic’s offerings. To achieve this, her team seeks to understand the differences between casual riders and members, explore motivations for membership, and assess how digital media can enhance marketing efforts. Analyzing historical bike trip data will be crucial in identifying key trends to inform these strategies.

### Characters and Teams
* Cyclistic: A bike-share program in Chicago featuring over 5,800 bicycles and 600 docking stations. Cyclistic offers a variety of bikes, including reclining bikes, hand tricycles, and cargo bikes, making it inclusive for riders with disabilities. While most users prefer traditional bikes, about 8% opt for assistive options. The majority of rides are for leisure, with approximately 30% used for commuting.
* Lily Moreno: The director of marketing and my manager, responsible for developing campaigns across email, social media, and other channels to promote the bike-share program.
* Cyclistic Marketing Analytics Team: A team of data analysts focused on collecting, analyzing, and reporting data to guide marketing strategies. As a junior data analyst who has been on this team for six months, I am learning how to support Cyclistic’s mission and goals.
* Cyclistic Executive Team: A detail-oriented group that will review and decide whether to approve the recommended marketing program based on data-backed insights and proposals.

## Google Data Analytics Stages
Google Data Analytics follows a six-stage process: Ask, Prepare, Process, Analyze, Share, and Act. In the Ask stage, analysts define key questions to guide their work. The Prepare stage involves gathering and organizing relevant data. Next, in Process, data is cleaned and structured for accuracy. During Analyze, various techniques are applied to extract insights. The Share stage focuses on communicating findings through reports and visualizations. Finally, in Act, insights inform decision-making and drive actions aligned with organizational goals. This structured approach ensures effective and impactful data analysis.

### Ask
#### Main Objective
To understand how annual members and casual riders use Cyclistic bikes differently.

#### Business Task

* Identify Business Task: What motivates casual riders to transition into annual members?
* Consider key stakeholders: 
    * Director of Marketing: Lily Moreno, responsible for the development of campaigns and initiatives to promote the bike-share program.
    * Executive Team: Determines whether to approve the recommended marketing program.
    * Analytics Team: Collects, analyzes, and reports data to inform Cyclistic's marketing strategy.
    
#### Deliverable
* A clear statement of the business task: Determine the main factors that encourage riders to transition to annual membership.
* Problem statement: How do annual members and casual riders use Cyclistic bikes differently? Identifying these differences is crucial for developing targeted marketing strategies that address the unique behaviors and preferences of each group, ultimately enhancing rider engagement and satisfaction.
* Insights for Business Decisions: Identifying the differences in usage patterns between annual members and casual riders is essential for defining and designing an effective marketing campaign. By understanding these distinctions, Cyclistic can create targeted strategies that attract more members and ultimately increase profits.

### Prepare
#### Guiding questions
* Where is your data located? 
     Click [here](https://divvy-tripdata.s3.amazonaws.com/index.html) for the Cyclistic historical data from 2013 to 2024.
* How is the data organized? 
    The data is stored as CSV files, organized by quarter from 2013 to 2019 and by month from 2020 to 2024. This analysis focuses on 2023, which is covered by 12 files named YYYYMM-divvy-tripdata.csv.
    Columns:
        * ride_id: Ride identifier
        * rideable_type: Type of bike
        * started_at: Trip start time in YYYY-MM-DD HH:MM:SS
        * ended_at: Trip end time in YYYY-MM-DD HH:MM:SS
        * start_station_name, start_station_id, start_lat, start_lng: Start station details
        * end_station_name, end_station_id, end_lat, end_lng: End station details
        * member_casual: Rider type, either casual or member
        
* Are there issues with bias or credibility in this data? Does your data ROCCC? 
    The data is reliable, original, comprehensive, current and cited, provided by Lyft Bikes, LLC.
* How are you addressing licensing, privacy, security, and accessibility? 
    The data is open and maintained by Motivate International Inc. It is used here under the Data License Agreement published by Divvy Bikes.
* How did you verify the data’s integrity?
    I conducted data cleaning to address missing values and duplicates, cross-verified it against trusted external sources, performed consistency checks for logical accuracy, conducted random sampling for manual review, examined the data collection methods and documentation, and performed basic statistical analyses to identify any anomalies.
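
The integrity checks described above can be sketched on a small, made-up data frame. The sample rows and the helper name `check_integrity` are illustrative assumptions rather than the notebook's actual code, and the sketch uses base R only so that it runs without extra packages.

```r
# Hypothetical sample in the shape of the Divvy trip files (all values are made up)
sample_trips <- data.frame(
  ride_id       = c("A1", "A2", "A2", "A4"),
  started_at    = c("2023-01-01 08:00:00", "2023-01-01 09:00:00",
                    "2023-01-01 09:00:00", "2023-01-02 10:00:00"),
  ended_at      = c("2023-01-01 08:15:00", "2023-01-01 09:30:00",
                    "2023-01-01 09:30:00", "2023-01-02 09:50:00"),
  end_lat       = c(41.89, 41.90, 41.90, NA),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

# check_integrity is an illustrative helper, not part of any package
check_integrity <- function(df) {
  start <- as.POSIXct(df$started_at, tz = "UTC")
  end   <- as.POSIXct(df$ended_at, tz = "UTC")
  list(
    missing_per_column     = colSums(is.na(df)),          # missing values per column
    duplicate_ride_ids     = sum(duplicated(df$ride_id)), # repeated identifiers
    non_positive_rides     = sum(end <= start),           # rides that do not end after they start
    unexpected_rider_types = setdiff(unique(df$member_casual), c("member", "casual"))
  )
}

report <- check_integrity(sample_trips)
print(report)
```

On the real data, the same report could be produced for the combined 2023 data frame before any rows are removed, which documents how many rows each cleaning step is expected to affect.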

#### Key Tasks
* Download data and store it appropriately.
* Identify how it’s organized.
* Sort and filter the data.
* Determine the credibility of the data.

### Process
#### Key tasks
* Check the data for errors.
* Choose your tools.
* Transform the data so you can work with it effectively.
* Document the cleaning process.


In [None]:
library(tidyverse)  # attaches dplyr, tidyr, readr and ggplot2
library(lubridate)  # date-time parsing: ymd_hms(), wday()
library(skimr)      # quick summary statistics
library(janitor)    # clean_names()

In [None]:
m01 <- read.csv("/kaggle/input/cyclist-dataset-2023/202301-divvy-tripdata.csv")
m02 <- read.csv("/kaggle/input/cyclist-dataset-2023/202302-divvy-tripdata.csv")
m03 <- read.csv("/kaggle/input/cyclist-dataset-2023/202303-divvy-tripdata.csv")
m04 <- read.csv("/kaggle/input/cyclist-dataset-2023/202304-divvy-tripdata.csv")
m05 <- read.csv("/kaggle/input/cyclist-dataset-2023/202305-divvy-tripdata.csv")
m06 <- read.csv("/kaggle/input/cyclist-dataset-2023/202306-divvy-tripdata.csv")
m07 <- read.csv("/kaggle/input/cyclist-dataset-2023/202307-divvy-tripdata.csv")
m08 <- read.csv("/kaggle/input/cyclist-dataset-2023/202308-divvy-tripdata.csv")
m09 <- read.csv("/kaggle/input/cyclist-dataset-2023/202309-divvy-tripdata.csv")
m10 <- read.csv("/kaggle/input/cyclist-dataset-2023/202310-divvy-tripdata.csv")
m11 <- read.csv("/kaggle/input/cyclist-dataset-2023/202311-divvy-tripdata.csv")
m12 <- read.csv("/kaggle/input/cyclist-dataset-2023/202312-divvy-tripdata.csv")
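
The twelve `read.csv()` calls follow a single naming pattern, so the file paths could also be generated rather than typed out. The following is a minimal sketch of that alternative, assuming the same Kaggle folder as above; the names `data_dir`, `file_names`, and `monthly` are illustrative. Files are read only if they exist, so the sketch also runs outside Kaggle.

```r
# Folder taken from the read.csv() calls above; adjust if the data lives elsewhere
data_dir <- "/kaggle/input/cyclist-dataset-2023"

# Build "202301-divvy-tripdata.csv" ... "202312-divvy-tripdata.csv"
file_names <- sprintf("2023%02d-divvy-tripdata.csv", 1:12)
file_paths <- file.path(data_dir, file_names)

# Read only the files that are actually present
# (outside Kaggle this simply yields an empty list)
present <- file_paths[file.exists(file_paths)]
monthly <- lapply(present, read.csv)
names(monthly) <- sub("-divvy-tripdata\\.csv$", "", basename(present))

print(file_names)
```

Keeping the months in one named list makes it possible to inspect or combine them in a loop, at the cost of no longer having separate `m01` to `m12` objects.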

In [None]:
View(m01)

In [None]:
str(m01)
str(m02)
str(m03)
str(m04)
str(m05)
str(m06)
str(m07)
str(m08)
str(m09)
str(m10)
str(m11)
str(m12)

In [None]:
head(m01)
head(m02)
head(m03)
head(m04)
head(m05)
head(m06)
head(m07)
head(m08)
head(m09)
head(m10)
head(m11)
head(m12)

All 12 datasets have the same 13 columns, so they can be combined into a single dataset.
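
Because `rbind()` requires matching columns, the observation about the column layout can also be checked in code instead of by reading the `str()` output. This is a minimal sketch on two made-up data frames standing in for the monthly files; `jan`, `feb`, and `monthly` are hypothetical names.

```r
# Two made-up monthly data frames standing in for m01 ... m12
jan <- data.frame(ride_id = "A1", member_casual = "member")
feb <- data.frame(ride_id = "B1", member_casual = "casual")
monthly <- list(jan = jan, feb = feb)

# TRUE when every data frame has the same column names in the same order
same_columns <- all(vapply(
  monthly,
  function(df) identical(names(df), names(monthly[[1]])),
  logical(1)
))
stopifnot(same_columns)

# Combine only after the check has passed
combined <- do.call(rbind, monthly)
print(dim(combined))
```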

In [None]:
# Combine all the datasets using rbind

cyclistic_data <- rbind(m01, m02, m03, m04, m05, m06, m07, m08, m09, m10, m11, m12)
View(cyclistic_data)

#### Clean and transform the cyclistic dataset

In [None]:
#Clean the column names
cyclistic_data <- clean_names(cyclistic_data)

#Convert chr to date time format
cyclistic_data <- cyclistic_data %>% mutate(started_at = ymd_hms(started_at), ended_at = ymd_hms(ended_at))

#Calculate ride length in minutes
cyclistic_data <- cyclistic_data %>% mutate(ride_length = as.numeric(difftime(ended_at, started_at, units='mins' )))

#Calculate the day of week as a label
cyclistic_data <- cyclistic_data %>% mutate( day_of_week = wday(started_at, label=TRUE))

#Remove rides with a missing, zero, or negative ride length
cyclistic_data <- cyclistic_data %>%  filter(!is.na(ride_length) & ride_length>0)

#Remove rows with NA in any column (such as missing end coordinates)
cyclistic_data <- drop_na(cyclistic_data)

#Remove duplicates based on column ride_id
cyclistic_data <- cyclistic_data[!duplicated(cyclistic_data$ride_id), ]

#Preview the cleaned dataset
glimpse(cyclistic_data)
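
To make the effect of these steps concrete, the sketch below applies the same logic to three made-up rides. It uses base R equivalents (`as.POSIXct()` and `POSIXlt$wday` instead of `ymd_hms()` and `wday()`) so that it runs without extra packages; the sample rows are assumptions for illustration only.

```r
# Made-up rides: one valid, one ending before it starts, one duplicate ride_id
rides <- data.frame(
  ride_id    = c("A1", "A2", "A1"),
  started_at = c("2023-07-01 10:00:00", "2023-07-02 12:00:00", "2023-07-01 10:00:00"),
  ended_at   = c("2023-07-01 10:25:00", "2023-07-02 11:00:00", "2023-07-01 10:25:00"),
  stringsAsFactors = FALSE
)

start <- as.POSIXct(rides$started_at, tz = "UTC")
end   <- as.POSIXct(rides$ended_at, tz = "UTC")

# Ride length in minutes, as in the notebook's difftime() call
rides$ride_length <- as.numeric(difftime(end, start, units = "mins"))

# Day of week; POSIXlt$wday is 0 for Sunday, so add 1 to index the labels
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
rides$day_of_week <- day_labels[as.POSIXlt(start)$wday + 1]

# Keep positive ride lengths, then drop repeated ride_id values
rides <- rides[!is.na(rides$ride_length) & rides$ride_length > 0, ]
rides <- rides[!duplicated(rides$ride_id), ]
print(rides)
```

Only the first ride survives: the second has a negative length and the third repeats a `ride_id`.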

### Analyze 

#### Key tasks
* Aggregate your data so it’s useful and accessible.
* Organize and format your data.
* Perform calculations.
* Identify trends and relationships.



In [None]:
#Calculate Average, maximum and minimum ride lengths

calc_ride_length <- cyclistic_data %>% group_by(member_casual) %>%  summarise(avg_ride_length = mean(ride_length), max_ride_length = max(ride_length), min_ride_length = min(ride_length))
print(calc_ride_length)

In [None]:
#Calculate the mode of day of week
mode_day_of_week <- cyclistic_data %>% count(day_of_week) %>% slice_max(n, n = 1) %>% pull(day_of_week)
print(mode_day_of_week)
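
The cell above returns one mode for all rides together. Since the goal is to compare rider types, it may be more informative to compute the mode separately for casual riders and members. The following base R sketch shows one way to do that on made-up data; it is an illustrative variant, not the notebook's method.

```r
# Made-up rides with a rider type and a day-of-week label
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  day_of_week   = c("Sat", "Sat", "Sun", "Tue", "Tue", "Wed")
)

# Count rides for each combination of rider type (rows) and day (columns)
counts <- table(rides$member_casual, rides$day_of_week)

# For each rider type, pick the day with the highest count
# (which.max returns the first maximum, so ties go to the earlier column)
mode_by_type <- apply(counts, 1, function(row) names(row)[which.max(row)])
print(mode_by_type)
```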

In [None]:
#Calculate number of rides by days of week
ride_count_by_day <- cyclistic_data %>%  group_by(member_casual, day_of_week) %>%  summarise(number_of_rides = n(), avg_ride_length = mean(ride_length), .groups = "drop") 
ride_count_by_day

In [None]:
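# Start and end station usage
# Each row counts the rides for one start/end station pair, per rider type
# read.csv() loads missing station names as empty strings, so drop those rides first

station_usage <- cyclistic_data %>% filter(start_station_name != "", end_station_name != "") %>% group_by(member_casual, start_station_name, end_station_name) %>% summarise(number_of_rides = n(), .groups = 'drop') %>% arrange(desc(number_of_rides))
# Keep the 10 most frequent station pairs for each rider type
top_station <- station_usage %>% filter(member_casual %in% c("member", "casual")) %>% group_by(member_casual) %>% slice_max(number_of_rides, n = 10)
print(top_station)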
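
The cell above ranks start/end station pairs. To rank start stations on their own, the rides can be counted per rider type and start station instead. This is a minimal base R sketch with hypothetical station names ("Station A" and so on); blank names are dropped because `read.csv()` loads missing station names as empty strings.

```r
# Made-up rides; the empty string stands for a ride without a recorded start station
rides <- data.frame(
  member_casual      = c("casual", "casual", "casual", "member", "member", "casual"),
  start_station_name = c("Station A", "Station A", "Station B",
                         "Station C", "Station C", ""),
  stringsAsFactors = FALSE
)

# Drop rides without a recorded start station
named <- rides[rides$start_station_name != "", ]

# Count rides per rider type and start station
counts <- as.data.frame(
  table(member_casual = named$member_casual,
        start_station_name = named$start_station_name),
  responseName = "number_of_rides",
  stringsAsFactors = FALSE
)

# Remove combinations that never occur, then sort within each rider type
counts <- counts[counts$number_of_rides > 0, ]
counts <- counts[order(counts$member_casual, -counts$number_of_rides), ]
print(counts)
```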

### Share 

#### Key tasks
* Determine the best way to share your findings.
* Create effective data visualizations.
* Present your findings.
* Ensure your work is accessible.

#### Visualization
#### Average Ride length by member type


In [None]:
ggplot(calc_ride_length, aes(x= member_casual, y= avg_ride_length, fill= member_casual)) + geom_bar(stat="identity") + labs(title= "Average Ride Length by Member type", x="Member Type", y="Average Ride Length")

#### Number of rides by days of week

In [None]:
ggplot(ride_count_by_day, aes(x=day_of_week, y=number_of_rides, fill=member_casual)) + geom_bar(stat="identity", position='dodge') + labs(title = "Number of Rides by Days Of Week", x = "Days Of Week", y= "Number of Rides")

#### Top start stations by member type

In [None]:
ggplot(top_station, aes(x= start_station_name, y = number_of_rides, fill=member_casual)) + geom_bar(stat='identity', position='dodge') + coord_flip() + labs(title = "Top Start Stations by Member Type", x= "Start Station", y="Number of rides")

#### Findings

#### Average Ride length by member type
Casual riders take longer rides on average (20.7 minutes) than annual members (12.1 minutes). This suggests that casual riders use the bike-share more for leisure trips than annual members do.
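
The size of this gap can be expressed as a ratio. The short calculation below uses only the two averages reported above; the variable names are illustrative.

```r
# Average ride lengths in minutes, as reported in the findings
avg_casual <- 20.7
avg_member <- 12.1

# How many times longer the average casual ride is, and the percentage difference
ratio <- avg_casual / avg_member
pct_longer <- (ratio - 1) * 100

cat(sprintf("Casual rides are %.2f times as long (%.0f%% longer)\n", ratio, pct_longer))
```

In other words, the average casual ride is roughly 1.7 times as long as the average member ride.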

#### Number of Rides by Days Of Week
Casual riders take the most rides on weekends, on Saturdays and Sundays. Annual members ride at a consistent rate throughout the week, with a slight increase on weekdays.
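
One way to put a number on this pattern is the share of each group's rides that fall on a weekend. The sketch below computes that share from a table shaped like `ride_count_by_day`; the ride counts are made up for illustration and are not the notebook's results.

```r
# Made-up ride counts per rider type and day (not the real results)
ride_counts <- data.frame(
  member_casual   = rep(c("casual", "member"), each = 7),
  day_of_week     = rep(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"), times = 2),
  number_of_rides = c(30, 10, 10, 10, 10, 10, 40,   # casual
                      10, 20, 20, 20, 20, 20, 10)   # member
)

is_weekend <- ride_counts$day_of_week %in% c("Sat", "Sun")

# Weekend rides and total rides per rider type
weekend <- tapply(ride_counts$number_of_rides[is_weekend],
                  ride_counts$member_casual[is_weekend], sum)
total   <- tapply(ride_counts$number_of_rides, ride_counts$member_casual, sum)

# Share of rides taken on Saturday or Sunday
weekend_share <- weekend / total
print(round(weekend_share, 3))
```

If rides were spread evenly over the week, the weekend share would be 2/7, or about 0.29. A share well above that points to weekend-heavy use, and a share below it points to weekday-heavy use.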

#### Top Start and end stations by Member type
Popular start and end stations differ between casual riders and annual members. Stations such as DuSable Harbor and DuSable Lake Shore Dr & Monroe St are heavily used by casual riders, while annual members' rides are spread across a wider variety of start stations.



### Act
* What is your final conclusion based on analysis?
Casual riders take longer rides than annual members and ride most on weekends, which suggests they use the bikes mainly for leisure. A campaign aimed at converting casual riders should therefore focus on weekend leisure riding.

* Recommendations and Actions
Create a campaign for casual riders that focuses on leisure activities, especially on weekends. Highlight benefits such as weekend ride packages and leisure ride events.

For annual members, create a campaign that emphasizes the convenience and cost savings of annual membership. Also promote benefits such as easy access to docking stations, faster commute times, and dedicated bike lanes.

Optimize the stations that are popular among casual riders by ensuring they are well maintained and have enough bikes, especially on weekends.

Offer discounts as an incentive for casual riders to become annual members.