Case Study 1 
 
Case Study: How Does a Bike-Share Navigate Speedy Success?
 
Introduction to the case study
I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve my recommendations, so they must be backed up with compelling data insights and professional data visualizations. In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Phase 1 - Ask:
Business task: Design marketing strategies aimed at converting casual riders into annual members.
Stakeholders:
●Lily Moreno: The director of marketing and my manager. Moreno is responsible for developing campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
●Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
●Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Phase 2 - Prepare:
I will use Cyclistic’s historical trip data to analyze and identify trends. For the purposes of this case study, the datasets are appropriate and will enable me to answer the business questions. The data can be downloaded from https://divvy-tripdata.s3.amazonaws.com/index.html and has been made available by Motivate International Inc. under this license: https://www.divvybikes.com/data-license-agreement

Phase 3 - Process:
I will use the R programming language for my analysis.


In [15]:
install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")

library(tidyverse)
library(lubridate)
library(ggplot2)

In [16]:
# Load the four quarterly trip files
q2_2019 <- read_csv("../input/divvy-trips/Divvy_Trips_2019_Q2.csv")
q3_2019 <- read_csv("../input/divvy-trips/Divvy_Trips_2019_Q3.csv")
q4_2019 <- read_csv("../input/divvy-trips/Divvy_Trips_2019_Q4.csv")
q1_2020 <- read_csv("../input/divvy-trips/Divvy_Trips_2020_Q1.csv")

In [17]:
colnames(q3_2019)
colnames(q4_2019)
colnames(q2_2019)
colnames(q1_2020)

In [18]:
(q4_2019 <- rename(q4_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q3_2019 <- rename(q3_2019
                   ,ride_id = trip_id
                   ,rideable_type = bikeid 
                   ,started_at = start_time  
                   ,ended_at = end_time  
                   ,start_station_name = from_station_name 
                   ,start_station_id = from_station_id 
                   ,end_station_name = to_station_name 
                   ,end_station_id = to_station_id 
                   ,member_casual = usertype))

(q2_2019 <- rename(q2_2019
                   ,ride_id = "01 - Rental Details Rental ID"
                   ,rideable_type = "01 - Rental Details Bike ID" 
                   ,started_at = "01 - Rental Details Local Start Time"  
                   ,ended_at = "01 - Rental Details Local End Time"  
                   ,start_station_name = "03 - Rental Start Station Name" 
                   ,start_station_id = "03 - Rental Start Station ID"
                   ,end_station_name = "02 - Rental End Station Name" 
                   ,end_station_id = "02 - Rental End Station ID"
                   ,member_casual = "User Type"))

In [19]:
str(q1_2020)
str(q4_2019)
str(q3_2019)
str(q2_2019)

In [20]:
# Convert ride_id and rideable_type to character so they stack with the 2020 data
q4_2019 <- mutate(q4_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
q3_2019 <- mutate(q3_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
q2_2019 <- mutate(q2_2019, ride_id = as.character(ride_id)
                  ,rideable_type = as.character(rideable_type))
# Store timestamps as character in every quarter (q4_2019 included) so that
# bind_rows() sees a single type for started_at and ended_at
q2_2019 <- mutate(q2_2019, started_at = as.character(started_at), ended_at = as.character(ended_at))
q3_2019 <- mutate(q3_2019, started_at = as.character(started_at), ended_at = as.character(ended_at))
q4_2019 <- mutate(q4_2019, started_at = as.character(started_at), ended_at = as.character(ended_at))
q1_2020 <- mutate(q1_2020, started_at = as.character(started_at), ended_at = as.character(ended_at))
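
Before stacking the quarterly files, it can help to confirm that each shared column has the same class in every data frame, since bind_rows() will not combine, for example, a character column with a date-time column. The sketch below uses only base R on two tiny made-up data frames; compare_col_classes is a hypothetical helper written for this illustration, not a function from any package.

```r
# Hypothetical helper: list the class of every column in every data frame
# and flag the columns whose class differs between data frames.
compare_col_classes <- function(dfs) {
  cols <- Reduce(union, lapply(dfs, names))
  classes <- sapply(dfs, function(df) {
    sapply(cols, function(col) {
      if (col %in% names(df)) class(df[[col]])[1] else NA_character_
    })
  })
  classes <- matrix(classes, nrow = length(cols),
                    dimnames = list(cols, names(dfs)))
  mismatch <- apply(classes, 1, function(x) length(unique(na.omit(x))) > 1)
  data.frame(classes, mismatch = mismatch, stringsAsFactors = FALSE)
}

# Made-up stand-ins for an older and a newer quarterly file
q_old <- data.frame(
  ride_id = 1:2,
  started_at = as.POSIXct(c("2019-04-01 00:02:22", "2019-04-01 00:03:02"), tz = "UTC")
)
q_new <- data.frame(
  ride_id = c("A1", "B2"),
  started_at = c("2020-01-01 00:04:44", "2020-01-01 00:10:00"),
  stringsAsFactors = FALSE
)

class_report <- compare_col_classes(list(q_old = q_old, q_new = q_new))
print(class_report)
```

Any row with mismatch equal to TRUE points at a column that would need converting, as done for ride_id and the timestamps above.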

In [21]:
all_trips <- bind_rows(q2_2019, q3_2019, q4_2019, q1_2020)
summary(all_trips)

In [22]:
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender,
            "01 - Rental Details Duration In Seconds Uncapped",
            "05 - Member Details Member Birthday Year",
            "Member Gender", "tripduration"))


In [23]:
colnames(all_trips)  #List of column names
nrow(all_trips)  #How many rows are in data frame?
dim(all_trips)  #Dimensions of the data frame?
head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)
str(all_trips)  #See list of columns and data types (numeric, character, etc)
summary(all_trips)  #Statistical summary of data. Mainly for numerics


In [24]:
table(all_trips$member_casual)

In [25]:
all_trips <-  all_trips %>% 
  mutate(member_casual = recode(member_casual
                           ,"Subscriber" = "member"
                           ,"Customer" = "casual"))

In [26]:
table(all_trips$member_casual)

In [27]:
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

In [28]:
str(all_trips)

In [29]:
# Parse the character timestamps back into date-times
all_trips <- mutate(all_trips,
                    started_at = as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S"),
                    ended_at = as.POSIXct(ended_at, format = "%Y-%m-%d %H:%M:%S"))

In [30]:
# Ride length in seconds; units are set explicitly rather than left to "auto"
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")

In [31]:
str(all_trips)

In [32]:
is.factor(all_trips$ride_length)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
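
How difftime() chooses its units can be seen on a toy example. With the default units = "auto", R picks the unit from the data, so the same code may return seconds for one dataset and minutes for another; asking for a unit explicitly removes that ambiguity before the values are converted to plain numbers. The timestamps below are made up for illustration.

```r
# Two made-up rides: 5 minutes and 30 minutes long
start <- as.POSIXct(c("2020-01-01 08:00:00", "2020-01-01 09:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-01-01 08:05:00", "2020-01-01 09:30:00"), tz = "UTC")

auto_len <- difftime(end, start)                  # unit chosen by R from the data
secs_len <- difftime(end, start, units = "secs")  # unit fixed to seconds

units(auto_len)   # "mins" here, because the shortest ride is at least one minute
units(secs_len)   # "secs"

# Converting to numeric drops the unit, so it should be known beforehand
ride_length <- as.numeric(secs_len)
ride_length       # 300 1800
```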

In [33]:
# Drop rides that start at "HQ QR" and rides with a negative duration
all_trips_v2 <- filter(all_trips, !(start_station_name == "HQ QR" | ride_length < 0))
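
One detail worth keeping in mind when removing rows by a logical condition: if the condition evaluates to NA for a row (for example, because ride_length is missing), base bracket subsetting returns a row filled with NA instead of dropping it, whereas wrapping the condition in which() keeps only the rows known to pass. A small sketch on made-up data, using only base R:

```r
# Made-up trips: one "HQ QR" ride, one negative duration, one missing duration
trips <- data.frame(
  start_station_name = c("HQ QR", "Clark St", "State St", "Lake St"),
  ride_length = c(100, -5, NA, 420),
  stringsAsFactors = FALSE
)

# TRUE, TRUE, NA, FALSE: the third row cannot be judged
bad <- trips$start_station_name == "HQ QR" | trips$ride_length < 0

with_na_row <- trips[!bad, ]     # the NA index produces an all-NA row
clean <- trips[which(!bad), ]    # which() skips NA, so only "Lake St" remains

nrow(with_na_row)  # 2
nrow(clean)        # 1
```

Rows of NA introduced this way make mean() return NA unless they are removed later, for instance with na.omit().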

In [34]:
head(all_trips_v2)

In [35]:
summary(all_trips_v2$ride_length)

In [36]:
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)

In [37]:
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)

In [38]:
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
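
The effect of turning day_of_week into an ordered factor can be sketched on made-up data: as plain text, the days come out of aggregate() in alphabetical order, while an ordered factor makes them come out in calendar order.

```r
# Made-up rides on three different days
rides <- data.frame(
  day_of_week = c("Monday", "Sunday", "Friday", "Monday"),
  ride_length = c(600, 1200, 300, 900),
  stringsAsFactors = FALSE
)

# Grouping on a character column: groups are sorted alphabetically
by_text <- aggregate(ride_length ~ day_of_week, data = rides, FUN = mean)

# Grouping on an ordered factor: groups follow the order of the levels
rides$day_of_week <- ordered(rides$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"))
by_factor <- aggregate(ride_length ~ day_of_week, data = rides, FUN = mean)

as.character(by_text$day_of_week)    # "Friday" "Monday" "Sunday"
as.character(by_factor$day_of_week)  # "Sunday" "Monday" "Friday"
```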

In [39]:
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)

In [40]:
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  # create weekday field using wday()
  group_by(member_casual, weekday) %>%                  # group by user type and weekday
  summarise(number_of_rides = n(),                      # count the rides
            average_duration = mean(ride_length)) %>%   # calculate the average duration
  arrange(member_casual, weekday)                       # sort

In [42]:
trip1 <- na.omit(all_trips_v2)

In [43]:
trip1 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  # create weekday field using wday()
  group_by(member_casual, weekday) %>%                  # group by user type and weekday
  summarise(number_of_rides = n(),                      # count the rides
            average_duration = mean(ride_length)) %>%   # calculate the average duration
  arrange(member_casual, weekday)                       # sort

In [44]:
trip1 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")+
  labs(title = "Number of rides during the week")


In [46]:
trip1 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")+
  labs(title = "Average duration during the week")



In [47]:
# Each row of trip1 is one ride, so the bars show ride counts, not member counts
ggplot(trip1, aes(x = member_casual, fill = member_casual)) +
  theme_bw() +
  geom_bar() +
  labs(y = "Ride count",
       title = "Rides by annual members vs. casual riders")


In [48]:
# Use the cleaned data (trip1) so this plot is consistent with the ones above
ggplot(trip1, aes(x = month, fill = member_casual)) +
  theme_bw() +
  geom_bar() +
  #facet_wrap(~member_casual)+
  labs(y = "Ride count",
       title = "Cyclistic rides by month")