# About the Company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Lily Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a strong opportunity to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

# Scenario
As a junior data analyst at Cyclistic, a bike-share company in Chicago, your team's objective is to analyze the usage patterns of casual riders and annual members to devise a marketing strategy that converts casual riders into annual members. To gain approval from Cyclistic executives, your recommendations need to be supported by compelling data insights and professional data visualization. The company's future success hinges on maximizing annual memberships, making this analysis crucial for Cyclistic's marketing team.

**This project will be completed by using the 6 Data Analytics stages:**

1. Ask: Identify the business task and determine the key stakeholders.
2. Prepare: Collect the data, identify how it’s organized, and determine the credibility of the data.
3. Process: Select the tool for data cleaning, check for errors, and document the cleaning process.
4. Analyze: Organize and format the data, aggregate it so that it’s useful, perform calculations, and identify trends and relationships.
5. Share: Use design thinking principles and a data-driven storytelling approach to present the findings with effective visualizations. Ensure the analysis has answered the business task.
6. Act: Share the final conclusion and the recommendations.

## Phase 1: Ask
**Business Task**

As an analyst on the team, I will analyze the historical trip data to identify how annual members and casual riders differ. The primary objective is to understand what could motivate casual riders to purchase a membership, and to use that understanding to support data-driven marketing strategies that convert casual riders into annual members, thereby driving growth for Cyclistic.

**Three questions will guide the future marketing program:**

1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

**Key Stakeholders**

- Lily Moreno - Director of Marketing
- Cyclistic Marketing Analytics Team
- Cyclistic Executive Team

## Phase 2: Prepare
The data required for the analysis was provided by Google. It had already been uploaded to Kaggle as the datasets "Cyclistic_Bike_Share_Apr_22-Mar_23", "Cyclistic Bike-Share / May 2022 though April 2023", and "Cyclistic Ride Share May 2020- May 2023". The data is updated regularly and adheres to the ROCCC principle (Reliable, Original, Current, Comprehensive, and Cited). The analysis uses the latest 12-month period, June 2022 to May 2023.

**Data Preparation**

First, we'll import the datasets so they can be merged into one dataframe, which makes data cleaning and analysis easier.

In [1]:
# Loading the libraries

library(tidyverse)
library(dplyr)
library(ggplot2)
library(lubridate)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
# Changing Working Directory
setwd("/kaggle/input/cyclistic-bike-share-apr-22-mar-23/New folder")

In [3]:
# Loading the dataframes
June_2022 <- read.csv("Chic_bike_Jun_22.csv")
July_2022 <- read.csv("Chic_bike_Jul_22.csv")
August_2022 <- read.csv("Chic_bike_Aug_22.csv")
September_2022 <- read.csv("Chic_bike_Sep_22.csv")
October_2022 <- read.csv("Chic_bike_Oct_22.csv")
November_2022 <- read.csv("Chic_bike_Nov_22.csv")
December_2022 <- read.csv("Chic_bike_Dec_22.csv")
January_2023 <- read.csv("Chic_bike_Jan_23.csv")
February_2023 <- read.csv("Chic_bike_Feb_23.csv")
March_2023 <- read.csv("Chic_bike_Mar_23.csv")
April_2023 <- read.csv("/kaggle/input/cyclistic-bike-share-may-2022-though-april-2023/202304-divvy-tripdata.csv")
May_2023 <- read.csv("/kaggle/input/cyclistic-ride-share-may-2020-may-2023/202305-divvy-tripdata/202305-divvy-tripdata.csv")
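
When every monthly file sits in one folder, the twelve `read.csv()` calls can also be written as a loop over the file list. The following is only a sketch under that assumption: `demo_dir`, `monthly_trips`, and `demo_trips` are hypothetical names, and the two tiny CSVs are written to a temporary folder purely so the example runs anywhere.

```r
library(dplyr)

# Write two tiny CSVs to a temporary folder so the sketch is self-contained
# (a hypothetical stand-in for the Kaggle input folder)
demo_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = "A1", member_casual = "casual"),
          file.path(demo_dir, "Chic_bike_Jun_22.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B2", member_casual = "member"),
          file.path(demo_dir, "Chic_bike_Jul_22.csv"), row.names = FALSE)

# List every CSV in the folder and read each one into a named list
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
monthly_trips <- lapply(csv_files, read.csv)
names(monthly_trips) <- basename(csv_files)

# Stack the monthly data frames; .id records which file each row came from
demo_trips <- bind_rows(monthly_trips, .id = "source_file")
print(demo_trips)
```

Keeping the source file name as a column makes it easier to trace a suspicious row back to the month it came from.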

In [4]:
# To check consistency of the datasets

colnames(June_2022)
colnames(July_2022)
colnames(August_2022)
colnames(September_2022)
colnames(October_2022)
colnames(November_2022)
colnames(December_2022)
colnames(January_2023)
colnames(February_2023)
colnames(March_2023)
colnames(April_2023)
colnames(May_2023)


str(June_2022)
str(July_2022)
str(August_2022)
str(September_2022)
str(October_2022)
str(November_2022)
str(December_2022)
str(January_2023)
str(February_2023)
str(March_2023)
str(April_2023)
str(May_2023)

'data.frame':	769204 obs. of  13 variables:
 $ ride_id           : chr  "600CFD130D0FD2A4" "F5E6B5C1682C6464" "B6EB6D27BAD771D2" "C9C320375DE1D5C6" ...
 $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : chr  "2022-06-30 17:27:53" "2022-06-30 18:39:52" "2022-06-30 11:49:25" "2022-06-30 11:15:25" ...
 $ ended_at          : chr  "2022-06-30 17:35:15" "2022-06-30 18:47:28" "2022-06-30 12:02:54" "2022-06-30 11:19:43" ...
 $ start_station_name: chr  "" "" "" "" ...
 $ start_station_id  : chr  "" "" "" "" ...
 $ end_station_name  : chr  "" "" "" "" ...
 $ end_station_id    : chr  "" "" "" "" ...
 $ start_lat         : num  41.9 41.9 41.9 41.8 41.9 ...
 $ start_lng         : num  -87.6 -87.6 -87.7 -87.7 -87.6 ...
 $ end_lat           : num  41.9 41.9 41.9 41.8 41.9 ...
 $ end_lng           : num  -87.6 -87.6 -87.6 -87.7 -87.6 ...
 $ member_casual     : chr  "casual" "casual" "casual" "casual" ...
'data.frame':	823488 obs. of  

## Phase 3: Process

**Data Integration and Merging:**

We have loaded the required R libraries and datasets for our analysis. We have also inspected the column names and structure of each dataset to confirm that they are compatible for merging and to spot any structural inconsistencies. Next, we will merge the datasets and transform them to streamline the analysis stage.
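
The compatibility check described above was done by reading the `colnames()` and `str()` output by eye. It could also be automated by comparing every data frame against the first one. This is a minimal sketch on toy data frames; `monthly`, `same_names`, and `same_types` are illustrative names that do not appear in the notebook.

```r
# Toy data frames standing in for the monthly files (illustrative values only)
monthly <- list(
  June_2022 = data.frame(ride_id = "A1", started_at = "2022-06-30 17:27:53", start_lat = 41.89),
  July_2022 = data.frame(ride_id = "B2", started_at = "2022-07-01 08:00:00", start_lat = 41.91)
)

# Use the first data frame as the reference for names and column classes
reference_names <- colnames(monthly[[1]])
reference_types <- sapply(monthly[[1]], class)

# TRUE for each data frame whose columns match the reference exactly
same_names <- sapply(monthly, function(df) identical(colnames(df), reference_names))
same_types <- sapply(monthly, function(df) identical(sapply(df, class), reference_types))

print(same_names)
print(same_types)
```

Any `FALSE` in either vector points to a file that would need renaming or type conversion before `bind_rows()`.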

In [5]:
# Combining all dataframes

all_trips <- bind_rows(June_2022,July_2022,August_2022,September_2022,October_2022,November_2022,December_2022,January_2023,February_2023,March_2023,April_2023,May_2023)

In [6]:
# Inspecting combined dataframe

colnames(all_trips)
nrow(all_trips)
head(all_trips)
summary(all_trips)

,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,600CFD130D0FD2A4,electric_bike,2022-06-30 17:27:53,2022-06-30 17:35:15,,,,,41.89,-87.62,41.91,-87.62,casual
2,F5E6B5C1682C6464,electric_bike,2022-06-30 18:39:52,2022-06-30 18:47:28,,,,,41.91,-87.62,41.93,-87.63,casual
3,B6EB6D27BAD771D2,electric_bike,2022-06-30 11:49:25,2022-06-30 12:02:54,,,,,41.91,-87.65,41.89,-87.61,casual
4,C9C320375DE1D5C6,electric_bike,2022-06-30 11:15:25,2022-06-30 11:19:43,,,,,41.8,-87.66,41.8,-87.65,casual
5,56C055851023BE98,electric_bike,2022-06-29 23:36:50,2022-06-29 23:45:17,,,,,41.91,-87.63,41.93,-87.64,casual
6,B664188E8163D045,electric_bike,2022-06-30 16:42:10,2022-06-30 16:58:22,,,,,42.03,-87.71,42.06,-87.73,casual


   ride_id          rideable_type       started_at          ended_at        
 Length:5829030     Length:5829030     Length:5829030     Length:5829030    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 start_station_name start_station_id   end_station_name   end_station_id    
 Length:5829030     Length:5829030     Length:5829030     Length:5829030    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            

In [7]:
# Check the unique values in the member_casual column and how often each occurs

table(all_trips$member_casual)


 casual  member 
2312073 3516957 

**Data Transformation and Column Addition:**

In this section, we will perform various operations to modify the dataset and introduce new columns.

In [8]:
# Splitting for easy aggregation

all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

In [9]:
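# New Ride Duration Column (in seconds)
# units = "secs" is set explicitly; otherwise difftime() chooses the unit automatically

all_trips$ride_duration <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")

# Convert the difftime to plain numeric seconds so it can be aggregated
all_trips$ride_duration <- as.numeric(all_trips$ride_duration)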
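
As a quick check of the two cells above, the same steps can be applied to a single ride. The timestamps below are the first pair shown in the `str()` output; the `America/Chicago` time zone is an assumption, since the CSV files do not state one.

```r
# Timestamps of the first ride shown in the str() output
started_at <- "2022-06-30 17:27:53"
ended_at   <- "2022-06-30 17:35:15"

# Parse explicitly; the time zone is an assumption (the files do not state one)
start_time <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
end_time   <- as.POSIXct(ended_at, format = "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")

# Ask for seconds explicitly so the unit never depends on the data
ride_duration <- as.numeric(difftime(end_time, start_time, units = "secs"))
print(ride_duration)  # 442 seconds, i.e. 7 minutes 22 seconds

# Date parts used for aggregation; %u is the ISO weekday number (1 = Monday)
ride_date <- as.Date(started_at)
ride_month <- format(ride_date, "%m")  # "06"
ride_weekday <- format(ride_date, "%u")  # "4", a Thursday
```

The sketch uses `%u` rather than `%A` because weekday names depend on the session locale, while the weekday number does not.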

**Data Cleaning and Outlier Removal:**

This section focuses on improving the data quality and ensuring the accuracy of our analysis.

The following steps will be taken:

1. Filtering out observations with negative length of time.
2. Removing data related to test sites and empty stations.
3. Handling observations with missing values (NA data) appropriately.

In [10]:
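# The dataframe includes a few hundred entries where ride_duration is negative
# We will create a new version of the dataframe (v2) since data is being removed
# which() keeps only rows where the condition is TRUE, so a missing ride_duration
# is dropped instead of producing an all-NA row

all_trips_v2 <- all_trips[which(all_trips$ride_duration >= 0), ]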
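
The cell above covers the first of the three listed steps. A minimal sketch of the other two, on toy data, might look like the following. The `TEST SITE` station name, the `test_sites` vector, and the toy rows are all hypothetical; the actual test-station names would have to be looked up in the data.

```r
library(dplyr)

# Toy stand-in for all_trips (values are illustrative, not from the dataset)
toy_trips <- data.frame(
  ride_id = c("A", "B", "C", "D", "E"),
  start_station_name = c("Station 1", "", "TEST SITE", "Station 2", "Station 3"),
  ride_duration = c(442, 300, 120, -15, NA)
)

# Hypothetical list of test-site names; replace with names found in the real data
test_sites <- c("TEST SITE")

toy_clean <- toy_trips %>%
  # Drop missing and negative durations
  filter(!is.na(ride_duration), ride_duration >= 0) %>%
  # Drop rides with an empty start station or a test-site start station
  filter(start_station_name != "", !start_station_name %in% test_sites)

print(toy_clean)
```

The `str()` output shows rides with empty station names, so it is worth comparing row counts before and after the station filter to see how much data it removes.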

## Phase 4: Analyze
In this phase, we will delve into our dataset and leverage various analytical tools to examine user behaviors and identify trends. The objective is to uncover valuable insights that can be utilized for marketing initiatives and identifying monetization opportunities. By analyzing the data, we aim to gain a deeper understanding of user preferences, patterns, and trends that will guide strategic decision-making. This analysis will enable us to target specific user segments, optimize marketing campaigns, and explore avenues for generating revenue within our business model.

In [11]:
# Comparing usage during each month
all_trips_v2 %>%
    group_by(month) %>%
    summarise(Members = sum(member_casual == "member"), 
              Casuals = sum(member_casual == "casual")
             )

month,Members,Casuals
<chr>,<int>,<int>
1,150293,40008
2,147428,43016
3,196477,62201
4,279302,147284
5,370639,234178
6,400148,369044
7,417426,406046
8,427000,358917
9,404636,296694
10,349693,208988


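**Insight:** Both members and casual riders use the service most during the summer months. In the rows shown, member rides peak in August (427,000) and casual rides in July (406,046), compared with about 150,000 member rides and 40,000 casual rides in January.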

In [12]:
# Overview of Mean, Median, Max, Min of "ride_duration"
summary(all_trips_v2$ride_duration)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0     335     591    1123    1057 2483235 

In [13]:
# Compare members and casual users
all_trips_v2 %>%
  group_by(member_casual) %>%
  summarise(mean(ride_duration),max(ride_duration),min(ride_duration))

member_casual,mean(ride_duration),max(ride_duration),min(ride_duration)
<chr>,<dbl>,<dbl>,<dbl>
casual,1692.5512,2483235,0
member,748.1938,93580,0


**Insight:** Ride durations differ clearly between the two groups. Casual riders average about 28 minutes per ride (1,693 seconds) against about 12 minutes for members (748 seconds), and the longest casual ride (2,483,235 seconds, roughly 29 days) far exceeds the longest member ride (93,580 seconds, about 26 hours). Longer rides with more extreme values are consistent with the hypothesis that casual riders more often ride for leisure, while the shorter member rides suggest that members use the bikes as regular transportation.
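
Because a single very long ride can pull the mean upward, the comparison could be extended with the median and the interquartile range, which are less sensitive to outliers. The sketch below uses invented toy durations, not the Cyclistic data, and `duration_summary` is an illustrative name.

```r
library(dplyr)

# Toy durations in seconds; the casual group contains one day-long outlier
toy_trips <- data.frame(
  member_casual = rep(c("casual", "member"), each = 4),
  ride_duration = c(300, 600, 900, 86400, 300, 420, 480, 600)
)

duration_summary <- toy_trips %>%
  group_by(member_casual) %>%
  summarise(
    mean_secs = mean(ride_duration),      # pulled up by the outlier
    median_secs = median(ride_duration),  # robust to the outlier
    iqr_secs = IQR(ride_duration),        # spread of the middle half of rides
    .groups = "drop"
  )

print(duration_summary)
```

In the toy data the casual mean is 22,050 seconds while the casual median is only 750, which shows how far one outlier can move the mean.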

In [14]:
# Fixing the unordered days of the week
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,levels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))

all_trips_v2 %>%
  group_by(member_casual,day_of_week) %>%
  summarise(number_of_rides=n())

[1m[22m`summarise()` has grouped output by 'member_casual'. You can override using the
`.groups` argument.


member_casual,day_of_week,number_of_rides
<chr>,<ord>,<int>
casual,Sunday,375613
casual,Monday,257820
casual,Tuesday,270962
casual,Wednesday,291902
casual,Thursday,313339
casual,Friday,346555
casual,Saturday,455826
member,Sunday,394535
member,Monday,471393
member,Tuesday,552529


**Insight:** Casual riders favor weekends: Saturday (455,826 rides) and Sunday (375,613) are their busiest days. Member rides, in the rows shown, rise from Sunday (394,535) to Tuesday (552,529), which points to members favoring weekdays.
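
The weekend preference of casual riders can be put into one number by computing the share of their rides that fall on Saturday or Sunday. The counts below are copied from the casual rows of the output above; `casual_rides`, `weekend_share`, and `uniform_share` are illustrative names.

```r
# Casual ride counts by day of week, copied from the output above
casual_rides <- c(Sunday = 375613, Monday = 257820, Tuesday = 270962,
                  Wednesday = 291902, Thursday = 313339, Friday = 346555,
                  Saturday = 455826)

# Share of casual rides that happen on the two weekend days
weekend_share <- sum(casual_rides[c("Saturday", "Sunday")]) / sum(casual_rides)

# Share the weekend would have if rides were spread evenly over the week
uniform_share <- 2 / 7

print(round(100 * weekend_share, 1))  # 36
print(round(100 * uniform_share, 1))  # 28.6
```

About 36% of casual rides fall on the weekend, against the 28.6% expected if every day were equally busy. The member rows are truncated in the output, so the same calculation is not repeated for members here.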

In [15]:
# Comparing bike types
# Note: the percentages are shares of all rides in all_trips_v2, not shares within each rider type
all_trips_v2 %>%
    group_by(member_casual) %>%
    summarise(Electric = sum(rideable_type == "electric_bike"),
              Classic = sum(rideable_type == "classic_bike"), 
              Docked = sum(rideable_type == "docked_bike"),  
              Classic_Percentage = paste0(round(Classic / nrow(all_trips_v2) * 100, 2), "%"),
              Ebike_Percentage = paste0(round(Electric / nrow(all_trips_v2) * 100, 2), "%"),
              Docked_Percentage = paste0(round(Docked / nrow(all_trips_v2) * 100, 2), "%")
             )

member_casual,Electric,Classic,Docked,Classic_Percentage,Ebike_Percentage,Docked_Percentage
<chr>,<int>,<int>,<int>,<chr>,<chr>,<chr>
casual,1297064,857752,157201,14.72%,22.25%,2.7%
member,1786252,1730649,0,29.69%,30.64%,0%


**Insight:** Upon analyzing the data, it can be observed that members do not use docked bikes at all. Also, the proportions of classic bikes and electric bikes are relatively balanced for members, with casuals having a clear preference for electric bikes.
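
The percentages in the table above divide by the total number of rides, so they do not add up to 100% within a rider type. To compare preferences inside each group, the counts can be divided by that group's own total. This sketch reuses the counts from the output above; `bike_counts` and `bike_shares` are illustrative names.

```r
library(dplyr)

# Ride counts per bike type, copied from the output above
bike_counts <- data.frame(
  member_casual = c("casual", "member"),
  Electric = c(1297064, 1786252),
  Classic = c(857752, 1730649),
  Docked = c(157201, 0)
)

# Share of each bike type within the rider type (each row sums to 100%)
bike_shares <- bike_counts %>%
  mutate(
    Total = Electric + Classic + Docked,
    Electric_Share = round(100 * Electric / Total, 1),
    Classic_Share = round(100 * Classic / Total, 1),
    Docked_Share = round(100 * Docked / Total, 1)
  )

print(bike_shares)
```

Within their own rides, casual riders use electric bikes for about 56.1% of trips, classic bikes for 37.1%, and docked bikes for 6.8%, while members split about 50.8% electric to 49.2% classic.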

## Insights summary:

1. Both casual riders and members use the service most during summer, and ride counts drop sharply in winter.

2. Casual riders take longer and more variable rides, consistent with leisure use, while members take shorter, more consistent rides, consistent with regular transportation.

3. Casual riders prefer weekends, while members prefer weekdays.

4. Targeting casual riders whose ride patterns align with members' can drive annual membership growth through tailored promotions and benefits.

5. Casual riders prefer electric bikes, whereas members use electric and classic bikes in fairly balanced proportions.

6. Members do not use docked bikes at all.

## Phase 5: Share
This phase presents the data visualizations that support our findings. The visualizations were created in Power BI.

1. Month vs Rides

![Screenshot (44).png](attachment:c5b4c258-3a53-4ff8-8e9f-1cddf25c682d.png)

**Key Findings:**
The peak months for the Cyclistic service are June, July, and August (the summer months) for both types of customers.

2. Casual Riders vs Day of The Week

![Screenshot (45).png](attachment:d42bbcb8-e643-4ced-97c8-e2918374d88f.png)

**Key Findings:**
Casual riders cycle mostly on Saturdays and Sundays, with a preference for electric bikes.

3. Members vs Day of The Week

![Screenshot (47).png](attachment:d638f536-032a-46d9-938f-e503818ad3a4.png)

**Key Findings:** Members prefer to use the bikes on weekdays. They do not use docked bikes at all.
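
The charts above were built in Power BI. For readers who want to stay in R, the first chart (month vs rides) could be approximated with `ggplot2`, which the notebook already loads. This is only a sketch: it uses the ten monthly rows shown in the Phase 4 output, and `monthly_rides` and `month_plot` are illustrative names.

```r
library(ggplot2)

# Monthly ride counts copied from the Phase 4 output (months 1 to 10 as shown)
monthly_rides <- data.frame(
  month = factor(rep(1:10, times = 2)),
  rider_type = rep(c("member", "casual"), each = 10),
  rides = c(150293, 147428, 196477, 279302, 370639,
            400148, 417426, 427000, 404636, 349693,
            40008, 43016, 62201, 147284, 234178,
            369044, 406046, 358917, 296694, 208988)
)

# Side-by-side bars per month, one bar for each rider type
month_plot <- ggplot(monthly_rides, aes(x = month, y = rides, fill = rider_type)) +
  geom_col(position = "dodge") +
  labs(title = "Rides per month by rider type",
       x = "Month", y = "Number of rides", fill = "Rider type")

print(month_plot)
```

On the full dataset the same chart could be drawn from the grouped summary directly, without typing the counts by hand.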

## Phase 6: Act

**Recommendations:** Six plans to convert casual riders into annual members and increase annual membership

**Commuter Targeted Plan:** Develop a comprehensive plan that specifically targets commuters and emphasizes the cost-saving benefits and incentives of using Cyclistic bikes for daily transportation. Highlight the environmentally conscious aspect of biking as a sustainable, "green" lifestyle choice.

**Similar Riding Patterns Plan:** Create a targeted plan that focuses on casual riders who exhibit ride patterns similar to annual members. Highlight the benefits of annual membership that align with their riding preferences, such as short, consistent, and efficient bike rides. Customized promotions and tailored messaging can effectively attract these users to become annual members.

**Business Sponsorship Plan:** Collaborate with local businesses to form partnerships that sponsor annual memberships for their employees as a perk of their benefits package. Emphasize the positive impact of cycling on physical and mental health, and position Cyclistic bikes as a convenient and economical option for daily commuting. Promoting these benefits to employees can foster a happier and healthier workforce while increasing membership conversions among the employed population, who gain access to an enjoyable, eco-friendly, and reliable means of transportation.

**Seasonal Promotion Plan:** Develop a spring-summer promotion with an early bird special that coincides with the peak bike usage period observed in the data. By offering discounted rates or incentives for early sign-ups, it can encourage casual riders to take advantage of the peak season and consider the benefits of annual membership.

**Vibrant City Campaign:** Launch a social media campaign that highlights the fun, leisure, and cultural aspects of the city. Market the Cyclistic bikes as a means to explore and be part of the vibrant city experience. Utilize visually appealing content and user-generated content to engage with potential customers and create a sense of community and excitement around using Cyclistic bikes.

**Targeting Popular Routes Plan:** Focus marketing efforts on promoting annual memberships along the most popular routes and in the areas of the city where bike usage is highest. Riders in these areas are likely to be aware of the service already, which presents an opportunity for easier conversion of casual riders to annual members. Targeted promotions and incentives specific to these areas can drive membership conversions and increase annual membership.

By implementing these recommendations, Cyclistic can strategically target casual riders, leverage their riding patterns, and create enticing marketing campaigns that appeal to their needs and motivations. This focused approach will lead to increased annual membership conversions, enhanced customer loyalty, and overall growth of the bike-sharing service.