## **Introduction**
In this case study, I'll be working as a Junior Data Analyst at a fictional company called Cyclistic. Cyclistic is a bike-sharing company in Chicago. It offers a bike-share program that features more than 5,800 bikes and 600 docking stations.

Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.

The business task is to understand how casual riders and annual members use Cyclistic bikes differently. As a Junior Data Analyst, I have been assigned this analysis, and its findings will guide the marketing strategy team in designing a campaign to convert casual riders into annual members.

### **PHASE 1 - Ask**
*Identify the Business Task*

In this phase, I need to understand the business question the company is trying to answer and how my analysis can help drive the business decision. In this instance, the business task is to find out how casual riders and annual members use Cyclistic differently. The insights gained can inform a marketing strategy to convert casual riders into annual members.

*Consider Key Stakeholders*

In this case study, I'll be working with a few stakeholders:

- Lily Moreno: She is my manager and the director of marketing. She is responsible for developing campaigns and initiatives to promote the bike-share program.
- Cyclistic Executive team: They are responsible for deciding whether to approve the recommended marketing program.
- Cyclistic marketing analytics team: They are a team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.

### **PHASE 2 - Prepare**
In this phase, I'll load the datasets that will be used for the analysis. The datasets were obtained from this source link.

Before working with the datasets, I had to load the libraries used in the analysis: tidyverse for data analysis, readr for loading the datasets, ggplot2 for visualization, and lubridate for working with dates.

In [1]:
# Installing the Packages
install.packages("tidyverse")
install.packages("lubridate")

In [2]:
# Loading the Packages
library(tidyverse)
library(readr)
library(ggplot2)
library(lubridate)

In [3]:
# Importing and preparing the data
df_apr_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202104-divvy-tripdata.csv')
df_may_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202105-divvy-tripdata.csv')
df_jun_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202106-divvy-tripdata.csv')
df_jul_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202107-divvy-tripdata.csv')
df_aug_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202108-divvy-tripdata.csv')
df_sep_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202109-divvy-tripdata.csv')
df_oct_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202110-divvy-tripdata.csv')
df_nov_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202111-divvy-tripdata.csv')
df_dec_2021 <- read_csv('../input/cyclistic-bike-share-last-12-months/202112-divvy-tripdata.csv')
df_jan_2022 <- read_csv('../input/cyclistic-bike-share-last-12-months/202201-divvy-tripdata.csv')
df_feb_2022 <- read_csv('../input/cyclistic-bike-share-last-12-months/202202-divvy-tripdata.csv')
df_mar_2022 <- read_csv('../input/cyclistic-bike-share-last-12-months/202203-divvy-tripdata.csv')
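
The twelve `read_csv()` calls above could also be collapsed into a single pass over the input folder. The following is a minimal sketch rather than the approach used in this notebook. It assumes every file follows the `*-divvy-tripdata.csv` naming pattern, the helper `read_monthly_files()` is hypothetical, and the two tiny CSV files it writes to a temporary folder are made-up stand-ins so the example can run on its own.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical helper: read every "*-divvy-tripdata.csv" file in a folder
# and stack the results into one data frame.
read_monthly_files <- function(dir) {
  files <- list.files(dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
  files %>%
    map(~ read_csv(.x, show_col_types = FALSE)) %>%
    bind_rows()
}

# Made-up stand-in data so the sketch runs without the Kaggle input folder
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(tibble(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202104-divvy-tripdata.csv"))
write_csv(tibble(ride_id = "B1", member_casual = "casual"),
          file.path(demo_dir, "202105-divvy-tripdata.csv"))

demo_rides <- read_monthly_files(demo_dir)
nrow(demo_rides)
```

With the real data, `read_monthly_files('../input/cyclistic-bike-share-last-12-months')` would play the role of the import and merge cells combined, at the cost of no longer having one named data frame per month.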

In [4]:
# Checking each dataframe to look for inconsistencies 
glimpse(df_apr_2021)
glimpse(df_may_2021)
glimpse(df_jun_2021)
glimpse(df_jul_2021)
glimpse(df_aug_2021)
glimpse(df_sep_2021)
glimpse(df_oct_2021)
glimpse(df_nov_2021)
glimpse(df_dec_2021)
glimpse(df_jan_2022)
glimpse(df_feb_2022)
glimpse(df_mar_2022)
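
Reading twelve `glimpse()` outputs by eye is error-prone. As a rough sketch, the column types can also be compared programmatically. The two small data frames below are made-up stand-ins for the monthly files, and `col_classes()` is a hypothetical helper; the sketch assumes all data frames share the same column names.

```r
# Made-up stand-ins for two monthly data frames; the second one stores
# start_station_id as a number instead of text.
df_a <- data.frame(ride_id = c("A1", "A2"),
                   start_station_id = c("13001", "TA1305"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(ride_id = "B1",
                   start_station_id = 15544,
                   stringsAsFactors = FALSE)

# Hypothetical helper: first class of every column, as a named character vector
col_classes <- function(df) vapply(df, function(col) class(col)[1], character(1))

frames <- list(df_a = df_a, df_b = df_b)

# Matrix with one row per column name and one column per data frame
class_table <- sapply(frames, col_classes)
class_table

# Columns whose type differs between data frames
mismatched <- rownames(class_table)[
  apply(class_table, 1, function(x) length(unique(x)) > 1)
]
mismatched
```

Any column listed in `mismatched` would need a consistent type before `bind_rows()` can combine the data frames.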

In [5]:
# Merging the different dataframes into one for easier analysis
bike_share <- bind_rows(df_apr_2021, df_may_2021, df_jun_2021,
                        df_jul_2021, df_aug_2021, df_sep_2021,
                        df_oct_2021, df_nov_2021, df_dec_2021,
                        df_jan_2022, df_feb_2022, df_mar_2022)

In [6]:
# Exploring the data to understand the structure and variables
dim(bike_share) # to know the number of observations and variables
head(bike_share, 10) # getting the first 10 rows
colnames(bike_share) # checking the variable names
glimpse(bike_share) # shows an overview of the dataframe

### **PHASE 3 - Process**

In [7]:
# To get the distinct values in ride types and membership types
unique(bike_share$rideable_type)
unique(bike_share$member_casual)

In [8]:
# Checking for duplicated rides
bike_share[duplicated(bike_share$ride_id), ]

There are no duplicate rides.

In [9]:
# Checking for negative time
bike_share %>% 
  filter(ended_at < started_at) %>% 
  count()

There are 145 incorrect time values, where a ride ends before it starts. This might be due to a data collection error.

I should remove them so that the rest of the analysis is not distorted.

In [10]:
# Removing negative time
# Note: the strict ">" also drops rides where ended_at equals started_at
bike_share <- bike_share %>% 
  filter(ended_at > started_at)

glimpse(bike_share)
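
The check above counts only rides that end before they start, while a strict `ended_at > started_at` filter also removes rides whose start and end timestamps are identical. A small sketch with made-up timestamps shows how the three cases can be counted separately before filtering:

```r
library(dplyr)
library(lubridate)

# Made-up rides: one normal, one zero-length, one ending before it starts
rides <- tibble(
  ride_id    = c("r1", "r2", "r3"),
  started_at = ymd_hms(c("2021-06-01 08:00:00",
                         "2021-06-01 09:00:00",
                         "2021-06-01 10:00:00")),
  ended_at   = ymd_hms(c("2021-06-01 08:15:00",
                         "2021-06-01 09:00:00",
                         "2021-06-01 09:50:00"))
)

# Count rides by the sign of their duration
ride_checks <- rides %>%
  summarise(
    negative = sum(ended_at < started_at),
    zero     = sum(ended_at == started_at),
    kept     = sum(ended_at > started_at)
  )
ride_checks
```

Running the same summary on `bike_share` before the filter would show how many zero-length rides are removed in addition to the negative ones.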

In [11]:
# Removing rows with NA values
bike_share <- bike_share %>% 
  na.omit()
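
Before dropping rows with `na.omit()`, it can help to see which columns hold the missing values and how many rows would be lost. A minimal sketch on made-up data (the values are illustrative and not taken from the dataset):

```r
# Made-up sample with missing station names and end coordinates
rides <- data.frame(
  ride_id            = c("r1", "r2", "r3", "r4"),
  start_station_name = c("Station A", NA, "Station B", NA),
  end_lat            = c(41.9, 41.8, NA, 41.7),
  stringsAsFactors   = FALSE
)

# Missing values per column
na_per_column <- colSums(is.na(rides))
na_per_column

# Rows that na.omit() would keep versus drop
rows_kept    <- nrow(na.omit(rides))
rows_dropped <- nrow(rides) - rows_kept
c(kept = rows_kept, dropped = rows_dropped)
```

If most of the missing values sit in columns the analysis never uses, dropping only on the relevant columns may preserve more rides than a blanket `na.omit()`.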

In [12]:
# Renaming variables and combining the coordinates into single columns
bike_share <- bike_share %>% 
  rename(ride_type = rideable_type, 
         user_type = member_casual) %>%  
  unite(start_lat_lng, c('start_lat', 'start_lng'), sep = " ") %>% 
  unite(end_lat_lng, c('end_lat', 'end_lng'), sep = " ")

glimpse(bike_share)

In [13]:
# Extracting the days from the started_at variable
bike_share <- bike_share %>% 
  mutate(weekday = wday(started_at, label=TRUE, abbr=TRUE))

In [14]:
# Extracting the months from the started_at variable
bike_share <- bike_share %>% 
  mutate(month = month(started_at, label=TRUE, abbr=TRUE))

In [15]:
# Extracting the years from the started_at variable
bike_share <- bike_share %>% 
  mutate(year = year(started_at))

In [16]:
# Extracting the time from the started_at and ended_at as a variable
bike_share <- bike_share %>% 
  mutate(start_time = format(started_at, "%H:%M:%S")) %>% 
  mutate(end_time = format(ended_at, "%H:%M:%S")) %>% 
  mutate(start_time = hms(start_time)) %>% 
  mutate(end_time = hms(end_time))

In [17]:
# Extracting the hour from the timestamp as a variable
bike_share <- bike_share %>% 
  mutate(hour = hour(start_time))
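
As an aside, the hour can also be read straight from the original timestamp, without first converting the time of day to a period. The sketch below uses made-up timestamps and compares both routes:

```r
library(lubridate)

# Made-up start timestamps
started_at <- ymd_hms(c("2021-06-01 08:05:30", "2021-06-01 17:45:00"))

# Route used in this notebook: format as text, parse as a period, take the hour
hour_via_period <- hour(hms(format(started_at, "%H:%M:%S")))

# Direct route: take the hour from the date-time itself
hour_direct <- hour(started_at)

hour_direct
all(hour_direct == hour_via_period)
```

Both routes give the same hours here; the direct one simply skips the intermediate text conversion.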

In [18]:
# Creating duration variable in mins
bike_share$duration <- difftime(bike_share$ended_at, 
                               bike_share$started_at, units = "mins")

head(bike_share)
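
Note that `difftime()` returns a `difftime` object rather than a plain number. That works with `mean()`, but some functions and plots expect a numeric column. A minimal sketch, using made-up timestamps, of converting the result to numeric minutes:

```r
library(lubridate)

# Made-up start and end timestamps
started_at <- ymd_hms(c("2021-06-01 08:00:00", "2021-06-01 09:00:00"))
ended_at   <- ymd_hms(c("2021-06-01 08:15:00", "2021-06-01 10:30:00"))

# Same call as in the notebook: a difftime measured in minutes
duration <- difftime(ended_at, started_at, units = "mins")
class(duration)

# Plain numeric minutes
duration_mins <- as.numeric(duration, units = "mins")
duration_mins
```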

### **PHASE 4 - Analyze**

In [19]:
# Checking the number of rides by user type and by ride type
table(bike_share$user_type)
table(bike_share$ride_type)

In [20]:
# Visualizing the user count
ggplot(bike_share, aes(user_type, fill = user_type)) +
    geom_bar() +
    labs(title = 'User type Count', x = 'User Type', y = 'Count')

In [21]:
# Plotting a graph to better visualize the result
bike_share %>%
    ggplot(aes(x = ride_type, fill = ride_type)) +
        geom_bar() +
        facet_wrap(~user_type) +
        labs(title = 'Ride Type Count for Casual and Member Riders', x = 'Ride Type', y = 'Count')

In [22]:
# Getting the total number of rides for each user type for each ride type and the percentage
df <- bike_share %>% 
  select(ride_id, user_type, ride_type) %>% 
  group_by(ride_type, user_type) %>% 
  count() %>% 
  mutate(percentage = n / nrow(bike_share) * 100) %>% # share of all rides, without hard-coding the row count
  arrange(desc(percentage))

head(df)
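
The percentages above are shares of all rides. For comparing the two groups, it may be more informative to compute each ride type's share within its own user type, since the groups differ in size. A minimal sketch on made-up rides, using the renamed columns from this notebook:

```r
library(dplyr)

# Made-up rides; the counts are illustrative only
rides <- tibble(
  user_type = c("casual", "casual", "casual", "casual", "member", "member"),
  ride_type = c("classic_bike", "classic_bike", "docked_bike", "electric_bike",
                "classic_bike", "electric_bike")
)

share_within_user <- rides %>%
  count(user_type, ride_type) %>%
  group_by(user_type) %>%
  mutate(percentage = n / sum(n) * 100) %>%  # shares add up to 100 per user type
  ungroup() %>%
  arrange(user_type, desc(percentage))
share_within_user
```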

In [23]:
# Getting the number of rides for each month
df_month <- bike_share %>% 
    select(user_type, month) %>% 
    group_by(month, user_type) %>% 
    count()  

head(df_month)

In [24]:
# Visualizing the number of rides per month
ggplot(df_month, aes(x=month, y=n, fill = month)) +
  geom_col() +
  facet_wrap(~user_type) +
  labs(title = 'Number of Rides per Month', x = 'Month', y = 'Count')

In [25]:
# Getting the number of rides for each weekday
df_weekdays <- bike_share %>% 
  select(user_type, weekday) %>% 
  group_by(weekday, user_type) %>% 
  count() 

head(df_weekdays)

In [26]:
# Visualizing the number of rides per day
ggplot(df_weekdays, aes(weekday, y=n, fill = weekday)) +
  geom_col(position = "dodge") +
  facet_wrap(~user_type) +
  labs(title = "Number of Rides Per Weekday", x = 'Weekdays', y = 'Count')

In [27]:
# Getting the number of rides for each hour of the day
df_hour <- bike_share %>% 
  select(hour, user_type) %>% 
  group_by(hour, user_type) %>% 
  count() 

head(df_hour)

In [28]:
# Visualizing the number of rides per hour
ggplot(df_hour, aes(hour, n, fill = hour)) +
  geom_col() +
  facet_wrap(~user_type) +
  labs(title = 'Frequency of rides in each hour of the Day', x = 'Hour of the Day', y = 'Count of Rides')

In [29]:
# Finding the mean duration
bike_share %>% 
  select(user_type, duration) %>% 
  summarise(mean_duration = mean(duration))

In [30]:
# Creating a dataframe for only casual riders
duration_casual <- bike_share %>% 
  filter(user_type == 'casual')

# Getting the average duration for casual riders 
duration_casual %>% 
  select(user_type, duration) %>% 
  summarise(mean_duration = mean(duration))

In [31]:
# Creating a dataframe for only member riders
duration_member <- bike_share %>% 
  filter(user_type == 'member')

# Getting the average duration for member riders
duration_member %>% 
  select(user_type, duration) %>% 
  summarise(mean_duration = mean(duration))

- Mean duration of all rides = 21.35 mins
- Mean duration for casual riders = 32.04 mins
- Mean duration for member riders = 12.94 mins
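
Because a few very long rides can pull a mean upward, it may be worth reporting the median next to the mean for each user type. The sketch below does this in one grouped summary on made-up durations; the numbers are illustrative and are not results from the dataset.

```r
library(dplyr)

# Made-up ride durations in minutes
rides <- tibble(
  user_type = c("casual", "casual", "casual", "member", "member", "member"),
  duration  = c(10, 20, 120, 8, 12, 16)
)

duration_summary <- rides %>%
  group_by(user_type) %>%
  summarise(
    mean_duration   = mean(duration),
    median_duration = median(duration),
    rides           = n()
  )
duration_summary
```

In this toy data the single 120-minute ride lifts the casual mean well above the casual median, which is the kind of gap worth checking for in the real data.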

In [32]:
# Computing the mean duration for each user type in one step
mean_duration <- bike_share %>% 
  group_by(user_type) %>% 
  summarise(mean_time = mean(duration))

In [33]:
# Visualizing the mean duration for each user type
ggplot(mean_duration, aes(user_type, mean_time, fill = user_type)) +
  geom_col() +
  labs(title = 'Mean Time Travelled by User', x = 'User', y = 'Mean time travelled in mins')

### **PHASE 5 - Share**

#### *Conclusions*

- Riders are more active in summer, which suggests that ridership is influenced by the weather.
- Casual riders are more active on weekends, which suggests that they use the bikes for leisure, whereas members ride throughout the week, which suggests that they use the bikes for transportation.
- Casual riders are mostly active in the evening, whereas members are active in the morning as well as in the evening.
- Casual riders' average ride time (32.04 mins) is about two and a half times that of members (12.94 mins).
- Casual riders make more use of docked bikes than members, who make more use of classic bikes.
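
As a quick check on the duration comparison, the ratio of the two mean durations reported in the Analyze phase can be computed directly. It works out to roughly 2.5:

```r
# Mean ride durations in minutes, as reported in the Analyze phase
mean_casual <- 32.04
mean_member <- 12.94

# How many times longer the average casual ride is
ratio <- mean_casual / mean_member
round(ratio, 2)
```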

##### *Purpose of Use of Bikes*

- Member Riders = Work or Exercise
- Casual Riders = Leisure

#### *Recommendations*

- Discounts should be offered to casual riders on weekends, when they use the bike-share service most.
- Ads could highlight the benefits of commuting to work by bike every day: it is a form of exercise, helps maintain good health, and contributes to a greener environment.
- Since casual riders make more use of docked bikes, adding more docked bikes should be considered.
- We could introduce family plans for casual riders, as leisure time is often spent with family.
- Increase the benefits of riding during cold months, for example by handing out coupons and discounts.

#### *Next Steps*

Further analysis could refine these findings. In the meantime, the marketing team can use the main insights to build a marketing campaign.

### **PHASE 6 - Act**

The Act phase will be carried out by the company's marketing team. The main takeaway will be the top three recommendations for the marketing campaign.