![image](https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/image1_hH9B4gs.jpg)

## About the Company 
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.

Customers who purchase single-ride or full-day passes are referred to as **casual riders**. Customers who purchase annual memberships are called **members**.

## Business Task
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

## Overview of the Data Source 
The data was made available by Motivate International Inc. under the Data License Agreement. Motivate International Inc. (“Motivate”) operates the City of Chicago's Divvy bicycle sharing service, on which the fictional company “Cyclistic” is based for this case study. More information can be found here: https://www.divvybikes.com/data-license-agreement

The previous 12 months of bicycle trip data are organized into individual CSV files named by year and month (for example, “202105” = May 2021). Each CSV file contains 13 columns, including the start and end stations of each trip, the membership type, and further location details. A link to the data: https://divvy-tripdata.s3.amazonaws.com/index.html


In [None]:
library(tidyverse)  #helps wrangle data
library(lubridate)  #helps wrangle date attributes
library(ggplot2)

trips_05_21 <- read_csv ("../input/capstone2022/202105-divvy-tripdata.csv")
trips_06_21 <- read_csv ("../input/capstone2022/202106-divvy-tripdata.csv")
trips_07_21 <- read_csv ("../input/capstone2022/202107-divvy-tripdata.csv")
trips_08_21 <- read_csv ("../input/capstone2022/202108-divvy-tripdata.csv")
trips_09_21 <- read_csv ("../input/capstone2022/202109-divvy-tripdata.csv")
trips_10_21 <- read_csv ("../input/capstone2022/202110-divvy-tripdata.csv")
trips_11_21 <- read_csv ("../input/capstone2022/202111-divvy-tripdata.csv")
trips_12_21 <- read_csv ("../input/capstone2022/202112-divvy-tripdata.csv")
trips_01_22 <- read_csv ("../input/capstone2022/202201-divvy-tripdata.csv")
trips_02_22 <- read_csv ("../input/capstone2022/202202-divvy-tripdata.csv")
trips_03_22 <- read_csv ("../input/capstone2022/202203-divvy-tripdata.csv")
trips_04_22 <- read_csv ("../input/capstone2022/202204-divvy-tripdata.csv")
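
The twelve `read_csv()` calls above follow one file-naming pattern, so they could also be written as a loop. The sketch below is one possible way to do that, not the code used in this notebook. The helper name `read_monthly_trips`, the temporary folder, and the two tiny stand-in files are assumptions made for illustration, and the sketch uses base R only so that it runs without extra packages.

```r
# Hypothetical helper: read every monthly trip file in a folder into a named list.
read_monthly_trips <- function(dir) {
  paths <- list.files(dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
  paths <- sort(paths)  # yyyymm prefixes sort chronologically
  trips <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  # Name each element by its yyyymm prefix, e.g. "202105"
  names(trips) <- substr(basename(paths), 1, 6)
  trips
}

# Stand-in files so the sketch runs without the Kaggle input folder (made-up rows)
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202105-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "casual"),
          file.path(demo_dir, "202106-divvy-tripdata.csv"), row.names = FALSE)

monthly <- read_monthly_trips(demo_dir)

# Stack the monthly tables into one data frame
demo_all_trips <- do.call(rbind, monthly)
```

In this notebook, `dir` would be `../input/capstone2022`.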

# Prepare and Clean the Data 

The 12 months of bike-sharing data stretch from **May 2021** to **April 2022**.

In [None]:
#Preview each month 
glimpse(trips_05_21)
glimpse(trips_06_21)
glimpse(trips_07_21)
glimpse(trips_08_21)
glimpse(trips_09_21)
glimpse(trips_10_21)
glimpse(trips_11_21)
glimpse(trips_12_21)
glimpse(trips_01_22)
glimpse(trips_02_22)
glimpse(trips_03_22)
glimpse(trips_04_22)

In [None]:
all_trips <- rbind (trips_05_21, trips_06_21, trips_07_21, trips_08_21, trips_09_21, trips_10_21, trips_11_21, trips_12_21, trips_01_22, trips_02_22, trips_03_22, trips_04_22)

#preview the data
glimpse(all_trips)
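
`rbind()` stops with an error when the column names of the data frames differ, so it can help to check the monthly tables before stacking them. The following is a minimal sketch of such a check on toy data. The helper name `same_columns` and the three small data frames are hypothetical and stand in for the monthly tables.

```r
# Hypothetical helper: TRUE when every data frame has the same column names in the same order.
same_columns <- function(frames) {
  reference <- names(frames[[1]])
  all(vapply(frames, function(df) identical(names(df), reference), logical(1)))
}

# Toy frames standing in for monthly tables (made-up rows)
may  <- data.frame(ride_id = "a1", member_casual = "member")
june <- data.frame(ride_id = "b1", member_casual = "casual")
odd  <- data.frame(trip_id = "c1", usertype = "Subscriber")

matching   <- same_columns(list(may, june))        # columns agree, safe to rbind
mismatched <- same_columns(list(may, june, odd))   # a mismatch that rbind() would reject
```

A check like this only compares names. Column types can still differ between files, which `glimpse()` on each month helps to spot.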

In [None]:
all_trips <- drop_na(all_trips)
# Add columns that list the date, month, day, and year of each ride
# This will allow us to aggregate ride data for each month, day, or year ... before completing these operations we could only aggregate at the ride level
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

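# Ride length in seconds; stating the unit avoids difftime() choosing one automatically
all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "secs"))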

# The dataframe contains a few hundred rows of "bad" data: bikes that were taken out of docks
# and checked for quality by Divvy, or rides whose length was negative (< 0)
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
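
To see how many rows each cleaning rule removes, the two conditions in the filter above can be evaluated separately before they are combined. The sketch below does this on a made-up three-row data frame. The rows and the names `rides`, `is_quality_check`, `is_negative`, and `removed` are assumptions for illustration only.

```r
# Toy rides standing in for all_trips; the values are made up
rides <- data.frame(
  start_station_name = c("Clark St & Elm St", "HQ QR", "Wells St & Concord Ln"),
  started_at = as.POSIXct(c("2021-05-01 10:00:00", "2021-05-01 11:00:00",
                            "2021-05-02 09:30:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2021-05-01 10:15:00", "2021-05-01 11:05:00",
                            "2021-05-02 09:20:00"), tz = "UTC")
)

# Ride length in seconds, with the unit stated explicitly
rides$ride_length <- as.numeric(difftime(rides$ended_at, rides$started_at, units = "secs"))

# Flag each reason separately so the number of dropped rows can be reported
is_quality_check <- rides$start_station_name == "HQ QR"
is_negative      <- rides$ride_length < 0

# Keep only rows that trigger neither rule
rides_clean <- rides[!(is_quality_check | is_negative), ]
removed <- c(quality_check = sum(is_quality_check), negative_length = sum(is_negative))
```

Printing `removed` next to `nrow(rides_clean)` gives a quick record of what the cleaning step did.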

# Analyze 

Time to identify trends and insights.

In [None]:
# Compare ride length (mean, median, max, min) for members and casual riders
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
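
The four `aggregate()` calls above each return one statistic. As a possible alternative, the same numbers can be collected into a single table with one row per rider type. The sketch below shows the idea on made-up ride lengths. The data frame `rides` and the name `ride_stats` are illustrative assumptions.

```r
# Toy data standing in for all_trips_v2; ride lengths are made up, in seconds
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member"),
  ride_length   = c(600, 1800, 3000, 300, 900)
)

# Split ride lengths by rider type, summarise each group, then stack the results
ride_stats <- do.call(rbind, lapply(
  split(rides$ride_length, rides$member_casual),
  function(x) data.frame(mean = mean(x), median = median(x), max = max(x), min = min(x))
))
```

The groups come back in alphabetical order, so the first row describes casual riders and the second describes members.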



# Visualize the data.

In [None]:
# analyze ridership data by type and weekday
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>%  
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)	

# Visualize the number of rides by rider type
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")

# Create a visualization for average duration
all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
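
The weekday charts above can also be condensed into one number per rider type: the share of rides that start on a Saturday or Sunday. The sketch below computes that share on made-up dates. The data frame `rides` and the column `is_weekend` are assumptions for illustration, and the weekday is read from `as.POSIXlt()` so that the result does not depend on the language of the weekday labels.

```r
# Toy rides; the dates are made up (2021-05-01 and 2021-05-08 are Saturdays)
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "casual",
                    "member", "member", "member", "member"),
  started_at = as.Date(c("2021-05-01", "2021-05-02", "2021-05-08", "2021-05-05",
                         "2021-05-03", "2021-05-04", "2021-05-05", "2021-05-08"))
)

# as.POSIXlt()$wday counts days from Sunday = 0 to Saturday = 6
rides$is_weekend <- as.POSIXlt(rides$started_at)$wday %in% c(0, 6)

# The mean of a logical vector is the share of TRUE values
weekend_share <- tapply(rides$is_weekend, rides$member_casual, mean)
```

A clearly higher share for casual riders than for members would support the weekend pattern seen in the bar charts.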

In [None]:
# Number of rides and average duration per rider type and bike type
all_trips_v2 %>%
  group_by(member_casual, rideable_type) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, rideable_type)
  
  
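# Top 5 starting stations for both casual riders and members
# drop_na() already removed rows without a station name and table() ignores NA,
# so the busiest stations sit in positions 1 to 5
top_5_start_station_names <- sort(table(all_trips_v2$start_station_name), decreasing=TRUE)[1:5]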
knitr::kable(top_5_start_station_names,
             col.names = c("Starting Station Name", "Number of Rides"), 
             caption = "Top 5 Starting Stations for Both Casual Riders and Members (May 2021 - April 2022)")

#Top 5 ending stations for both casual riders and members 
top_5_end_station_names <- sort(table(all_trips_v2$end_station_name), decreasing=TRUE)[1:5]
knitr::kable(top_5_end_station_names,
             col.names = c("End Station Name", "Number of Rides"), 
             caption = "Top 5 Ending Stations for Both Casual Riders and Members (May 2021 - April 2022)")

In [None]:
#Create separate Top 5 starting and ending stations for both casual riders and members 

only_members <- all_trips_v2[!(all_trips_v2$member_casual == "casual"), ]
only_casuals <- all_trips_v2[!(all_trips_v2$member_casual == "member"), ]

# Rows with missing station names were dropped earlier, so positions 1 to 5 are the top 5
top_5_member_starts <- sort(table(only_members$start_station_name), decreasing = TRUE)[1:5]
top_5_member_ends <- sort(table(only_members$end_station_name), decreasing = TRUE)[1:5]

top_5_casual_starts <- sort(table(only_casuals$start_station_name), decreasing = TRUE)[1:5]
top_5_casual_ends <- sort(table(only_casuals$end_station_name), decreasing = TRUE)[1:5]

#Visualize each table 
knitr::kable(top_5_member_starts,
             col.names = c("Starting Stations", "Number of Rides"), 
             caption = "Top 5 Starting Stations for Members (May 2021 - April 2022)")

knitr::kable(top_5_member_ends,
             col.names = c("Ending Stations", "Number of Rides"), 
             caption = "Top 5 Ending Stations for Members (May 2021 - April 2022)")

knitr::kable(top_5_casual_starts,
             col.names = c("Starting Stations", "Number of Rides"), 
             caption = "Top 5 Starting Stations for Casual Riders (May 2021 - April 2022)")

knitr::kable(top_5_casual_ends,
             col.names = c("Ending Stations", "Number of Rides"), 
             caption = "Top 5 Ending Stations for Casual Riders (May 2021 - April 2022)")
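
The per-rider-type rankings above repeat the same sort-and-subset step four times. One way to avoid the repetition is a small helper that is applied to each group. The sketch below is an illustration on made-up rides. The helper name `top_n_values` and the toy data frame are hypothetical.

```r
# Hypothetical helper: the n most frequent values of a vector.
# table() ignores missing values by default, so NA never appears in the ranking.
top_n_values <- function(x, n = 5) {
  counts <- sort(table(x), decreasing = TRUE)
  counts[seq_len(min(n, length(counts)))]
}

# Toy rides standing in for all_trips_v2 (made-up rows)
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  start_station_name = c("Streeter Dr & Grand Ave", "Streeter Dr & Grand Ave",
                         "Theater on the Lake", "Clark St & Elm St",
                         "Clark St & Elm St", "Kingsbury St & Kinzie St")
)

# One ranked table per rider type
top_starts <- lapply(split(rides$start_station_name, rides$member_casual),
                     top_n_values, n = 2)
```

Each element of `top_starts` is a table that could be passed to `knitr::kable()` in the same way as the tables above.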
 

 *  The top 5 starting and ending stations for casual riders are exactly the same, but ordered differently. The stations are **Kingsbury St & Kinzie St**, **Clark St & Elm St**, **Wells St & Concord Ln**, **Theater on the Lake**, and **Streeter Dr & Grand Ave**.
  

Key Insights and Recommendations:
  1) Casual riders tend to ride for longer and ride most on weekends, so they may be more inclined to purchase memberships then.
  
  2) **Kingsbury St & Kinzie St**, **Clark St & Elm St**, **Wells St & Concord Ln**, **Theater on the Lake**, and **Streeter Dr & Grand Ave** are the most popular stations. We can push for memberships most at these locations.
  
  3) Cyclistic can run a campaign featuring classic bikes, as they were the most popular bike type. Additionally, we can boost the availability of classic bikes by running campaigns that promote docked or electric bikes.
  
  4) A casual rider may be more likely to convert to a membership if shown what their rides cost over the month compared with what they would have paid as a member.
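
The cost comparison in the last recommendation can be sketched as simple arithmetic. The example below is only an illustration: both prices are invented and are not Cyclistic's actual rates, and the monthly minutes per rider are made-up totals that are assumed to be available from some other source.

```r
# Assumed prices for illustration only; they are NOT Cyclistic's actual rates
price_per_minute   <- 0.2   # hypothetical pay-as-you-go rate, in dollars
monthly_membership <- 10    # hypothetical membership cost per month, in dollars

# Made-up monthly riding totals for three casual riders
riders <- data.frame(
  rider = c("A", "B", "C"),
  minutes_ridden = c(30, 100, 250)
)

# What each rider paid as a casual rider, and what a membership would have saved
riders$casual_cost <- riders$minutes_ridden * price_per_minute
riders$savings_with_membership <- riders$casual_cost - monthly_membership
riders$would_save <- riders$savings_with_membership > 0
```

Riders with `would_save` equal to `TRUE` are the ones for whom a message about potential savings would be accurate under these assumed prices.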