# Cyclistic: How Does a Bike-Share Navigate Speedy Success?
## Author: Shilpi Dubey
### Data source:
Motivate International Inc.

In [None]:
## loading necessary packages
library(dplyr)
library(tidyr)

### Ask

##### Business task
How do annual members and casual riders use Cyclistic bikes differently?
##### Stakeholders
Cyclistic executive team, Director of Marketing (Lily Moreno), Cyclistic marketing analytics team

### Prepare

##### Download the data and store it appropriately
The data has been downloaded and is stored locally on the desktop as well as on Google Drive.
##### Identify how data is organized
Separate comma-separated (CSV) files are available for each of the 12 months from April 2020 to March 2021. Each file has 13 columns: ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng and member_casual.
##### Determine the credibility of the data
Since the data is made publicly available by Motivate International Inc., we can assume that it is credible.

### Process

##### Tools used
For this analysis, I will use R for statistical analysis and Tableau for data visualization.
##### Transform the data
Because the data frames y2020_12, y2021_01, y2021_02 and y2021_03 store start_station_id and end_station_id as character, while the earlier months store them as integer, these columns need to be converted to integer before the data can be combined into a single data frame.
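Before converting, it can help to confirm which monthly data frames actually disagree on column types. The sketch below tabulates the class of every column in every frame. `column_classes()` is a hypothetical helper, and the two toy frames (including the alphanumeric ID value) are made up for illustration rather than taken from the Divvy files.

```r
# Hypothetical helper: for each data frame in a named list, report the class
# of every column so that type mismatches stand out before bind_rows().
column_classes <- function(frames) {
  all_cols <- unique(unlist(lapply(frames, names)))
  sapply(frames, function(df) {
    vapply(all_cols, function(col) {
      if (col %in% names(df)) class(df[[col]])[1] else NA_character_
    }, character(1))
  })
}

# Toy stand-ins for two monthly files (made-up values, not real Divvy rows)
toy_frames <- list(
  y2020_11 = data.frame(ride_id = "a1", start_station_id = 15L,
                        stringsAsFactors = FALSE),
  y2020_12 = data.frame(ride_id = "b2", start_station_id = "TA1307000039",
                        stringsAsFactors = FALSE)
)

classes <- column_classes(toy_frames)
print(classes)

# Columns whose class is not the same in every frame
mismatched <- rownames(classes)[apply(classes, 1, function(x) length(unique(x)) > 1)]
print(mismatched)
```

Once the real files are loaded, a call such as `column_classes(list(y2020_11 = y2020_11, y2020_12 = y2020_12))` would show the same kind of comparison.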

In [2]:
## Load all datasets
y2020_04 <- read.csv("C:/Users/lenovo/Desktop/case study/202004-divvy-tripdata.csv")
y2020_05 <- read.csv("C:/Users/lenovo/Desktop/case study/202005-divvy-tripdata.csv")
y2020_06 <- read.csv("C:/Users/lenovo/Desktop/case study/202006-divvy-tripdata.csv")
y2020_07 <- read.csv("C:/Users/lenovo/Desktop/case study/202007-divvy-tripdata.csv")
y2020_08 <- read.csv("C:/Users/lenovo/Desktop/case study/202008-divvy-tripdata.csv")
y2020_09 <- read.csv("C:/Users/lenovo/Desktop/case study/202009-divvy-tripdata.csv")
y2020_10 <- read.csv("C:/Users/lenovo/Desktop/case study/202010-divvy-tripdata.csv")
y2020_11 <- read.csv("C:/Users/lenovo/Desktop/case study/202011-divvy-tripdata.csv")
y2020_12 <- read.csv("C:/Users/lenovo/Desktop/case study/202012-divvy-tripdata.csv")
y2021_01 <- read.csv("C:/Users/lenovo/Desktop/case study/202101-divvy-tripdata.csv")
y2021_02 <- read.csv("C:/Users/lenovo/Desktop/case study/202102-divvy-tripdata.csv")
y2021_03 <- read.csv("C:/Users/lenovo/Desktop/case study/202103-divvy-tripdata.csv")
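The twelve `read.csv()` calls differ only in the YYYYMM stamp, so the same loading step could also be written as a loop. This is a minimal sketch, assuming the files keep the `YYYYMM-divvy-tripdata.csv` naming used above; `read_divvy_months()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: read one CSV per YYYYMM stamp from data_dir and return
# a named list (y2020_04, y2020_05, ...), assuming the file naming used above.
read_divvy_months <- function(data_dir, months) {
  paths <- file.path(data_dir, paste0(months, "-divvy-tripdata.csv"))
  frames <- lapply(paths, read.csv)
  names(frames) <- paste0("y", substr(months, 1, 4), "_", substr(months, 5, 6))
  frames
}

# April 2020 through March 2021: the same twelve months loaded above
months <- format(seq(as.Date("2020-04-01"), by = "month", length.out = 12), "%Y%m")

# trips <- read_divvy_months("C:/Users/lenovo/Desktop/case study", months)
```

`dplyr::bind_rows()` also accepts a list of data frames, so such a list could be combined with `bind_rows(trips)` once the station ID columns agree in type.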

In [6]:
## Convert station IDs from character to integer (IDs that are not numeric become NA)
y2020_12 <- mutate(y2020_12, start_station_id =  as.integer(start_station_id), end_station_id = as.integer(end_station_id))
y2021_01 <- mutate(y2021_01, start_station_id =  as.integer(start_station_id), end_station_id = as.integer(end_station_id))
y2021_02 <- mutate(y2021_02, start_station_id =  as.integer(start_station_id), end_station_id = as.integer(end_station_id))
y2021_03 <- mutate(y2021_03, start_station_id =  as.integer(start_station_id), end_station_id = as.integer(end_station_id))
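`as.integer()` returns NA (with a warning) for any string it cannot convert to an integer, so this step turns non-numeric station IDs into NAs, which the later `drop_na()` step then removes. A rough way to gauge that loss is to count such IDs in the raw columns before running the conversion cell. `count_lost_ids()` is a hypothetical helper and the IDs below are toy values.

```r
# Hypothetical helper: count IDs that are present (not NA, not empty) but
# become NA after as.integer(), i.e. IDs that cannot be converted to integers.
count_lost_ids <- function(ids) {
  converted <- suppressWarnings(as.integer(ids))
  sum(!is.na(ids) & ids != "" & is.na(converted))
}

# Toy IDs (made up): numeric-looking, alphanumeric, empty and missing values
toy_ids <- c("15", "TA1307000039", "", NA, "620")
count_lost_ids(toy_ids)
```

On the freshly loaded data this would be called as, for example, `count_lost_ids(y2020_12$start_station_id)`.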

In [7]:
all_trips <- bind_rows(
                        y2020_04,
                        y2020_05,
                        y2020_06,
                        y2020_07,
                        y2020_08,
                        y2020_09,
                        y2020_10,
                        y2020_11,
                        y2020_12,
                        y2021_01,
                        y2021_02,
                        y2021_03
)

In [8]:
## Parse the start and end times as date-times (POSIXct)
all_trips <- mutate(all_trips, started_at = as.POSIXct(started_at), ended_at = as.POSIXct(ended_at))

In [9]:
## Add the ride date and the ride length in minutes
all_trips <- mutate(all_trips, date = as.Date(started_at),
                    ride_length_mins = as.numeric(difftime(ended_at, started_at, units = "mins")))

In [11]:
## Extract year, month, day and day of week from the ride date (requires the date column above)
all_trips$year <- format(all_trips$date, "%Y")
all_trips$month <- format(all_trips$date, "%m")
all_trips$day <- format(all_trips$date, "%d")
all_trips$day_of_week <- format(all_trips$date, "%A")
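To see what the derived columns contain, here is the same date and ride-length logic applied to a single made-up trip. The explicit `tz = "UTC"` is an assumption added for reproducibility: passing the same time zone to `as.POSIXct()` and `as.Date()` keeps late-evening rides from being assigned to a different calendar date.

```r
# One made-up trip; tz = "UTC" is an assumption chosen for reproducibility
trip <- data.frame(
  started_at = as.POSIXct("2020-04-26 17:45:00", tz = "UTC"),
  ended_at   = as.POSIXct("2020-04-26 18:15:30", tz = "UTC")
)

# Same derivations as the cells above, with the time zone passed explicitly
trip$date <- as.Date(trip$started_at, tz = "UTC")
trip$ride_length_mins <- as.numeric(difftime(trip$ended_at, trip$started_at, units = "mins"))
trip$year  <- format(trip$date, "%Y")
trip$month <- format(trip$date, "%m")
trip$day   <- format(trip$date, "%d")

print(trip)
```

The 30 minutes 30 seconds between the two timestamps become a `ride_length_mins` value of 30.5.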

In [4]:
## Drop columns not needed for the analysis, then drop rows containing NAs
all_trips <- all_trips %>% select(-c(started_at, ended_at, start_lat, start_lng, end_lat, end_lng))
all_trips <- drop_na(all_trips)
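The cleaning notes report that rows with NAs made up about 0.37% of the data. That share can be computed before calling `drop_na()`: with no column arguments, `drop_na()` removes every row containing at least one NA, which is the same set of rows that base R's `complete.cases()` flags as incomplete. `na_share_pct()` is a hypothetical helper and the toy data frame is made up.

```r
# Hypothetical helper: percentage of rows that contain at least one NA,
# i.e. the rows that tidyr::drop_na() would remove when given no columns
na_share_pct <- function(df) {
  100 * mean(!complete.cases(df))
}

# Made-up data: rows 2 and 3 each contain one NA
toy <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))
na_share_pct(toy)
```

Run on the combined data just before `drop_na()`, `na_share_pct(all_trips)` would give the percentage quoted in the notes.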

In [16]:
## Keep only rides with a non-negative ride length
all_trips <- all_trips[all_trips$ride_length_mins >= 0, ]

##### Documentation for cleaning or manipulation
After combining all datasets into a single data frame:
* the started_at and ended_at columns were converted to date-time (timestamp) format
* a ride_length_mins column (ride length in minutes) was added
* separate year, month, day and day_of_week columns were extracted from the started_at date
* unnecessary columns (started_at, ended_at, start_lat, start_lng, end_lat, end_lng) were removed
* rows with NAs were removed (they accounted for only 0.37% of the total)
* rows with a negative ride length were removed

### Analyze

##### Perform Calculations
Let's first look at a statistical summary of the aggregated and transformed data frame. Let's also look at the structure of the columns.

In [None]:
summary(all_trips)
str(all_trips)

In [None]:
## Mean and median ride length (minutes) by rider type
aggregate(ride_length_mins ~ member_casual, data = all_trips, FUN = mean)
aggregate(ride_length_mins ~ member_casual, data = all_trips, FUN = median)
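The two `aggregate()` calls can be collapsed into one by returning several statistics from `FUN`. This is a minimal sketch on made-up ride lengths: `aggregate()` stores the vector returned by `FUN` as a matrix column, which `do.call(data.frame, ...)` flattens into ordinary columns.

```r
# Made-up ride lengths (minutes) for two rider types
toy_trips <- data.frame(
  member_casual    = c("member", "member", "casual", "casual", "casual"),
  ride_length_mins = c(10, 14, 20, 40, 90),
  stringsAsFactors = FALSE
)

# Return several statistics at once; aggregate() stores them as a matrix column
ride_stats <- aggregate(
  ride_length_mins ~ member_casual,
  data = toy_trips,
  FUN = function(x) c(mean = mean(x), median = median(x), n = length(x))
)

# Flatten the matrix column into ride_length_mins.mean, .median and .n
ride_stats <- do.call(data.frame, ride_stats)
print(ride_stats)
```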

In [None]:
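## Most frequent value (statistical mode) of a vector
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
## Most common day of the week for each rider type
aggregate(day_of_week ~ member_casual, data = all_trips, FUN = Mode)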
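`Mode()` returns a single value per group even when several days tie for the highest count, because `which.max()` returns only the first maximum. A base-R sketch that keeps every tied day, shown on made-up data:

```r
# Made-up weekday data for two rider types
toy_days <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  day_of_week   = c("Saturday", "Saturday", "Sunday", "Tuesday", "Wednesday", "Tuesday"),
  stringsAsFactors = FALSE
)

# Rides per rider type (rows) and weekday (columns)
counts <- table(toy_days$member_casual, toy_days$day_of_week)

# For each rider type, keep every weekday tied for the highest count
groups <- rownames(counts)
top_days <- lapply(setNames(groups, groups), function(g) {
  row <- counts[g, ]
  names(row)[row == max(row)]
})
print(top_days)
```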

In [None]:
## Order weekdays from Sunday to Saturday so results sort chronologically
all_trips$day_of_week <- ordered(all_trips$day_of_week, levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
## Average ride length by rider type and day of the week
aggregate(ride_length_mins ~ member_casual + day_of_week, data = all_trips, FUN = mean)
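`format(date, "%A")` returns weekday names in the language of the session's LC_TIME locale, so in a non-English locale the `ordered()` call above would turn every value into NA. A locale-independent alternative, sketched below, maps `as.POSIXlt()$wday` (where 0 is Sunday) onto fixed English names; `day_of_week_en()` is a hypothetical helper.

```r
# Fixed English weekday names, indexed by POSIXlt$wday + 1 (wday 0 = Sunday)
week_days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Hypothetical helper: locale-independent, ordered weekday factor for Dates
day_of_week_en <- function(dates) {
  factor(week_days[as.POSIXlt(dates)$wday + 1], levels = week_days, ordered = TRUE)
}

day_of_week_en(as.Date(c("2020-04-26", "2020-04-27")))
```

Applied to the combined data, this would read `all_trips$day_of_week <- day_of_week_en(all_trips$date)`, replacing both the `format(..., "%A")` step and the `ordered()` call.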

In [None]:
## Number of rides and average ride length by rider type and day of the week
all_trips %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length_mins),
            .groups = "drop") %>%
  arrange(member_casual, day_of_week)
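Because members and casual riders may differ in their total number of rides, raw weekday counts are hard to compare across the two groups. Dividing each count by its group's total gives each weekday's share of that group's rides. A sketch on a made-up summary table, using base R's `ave()` to compute the group totals:

```r
# Made-up summary in the shape of the number_of_rides table above
rides <- data.frame(
  member_casual   = c("casual", "casual", "member", "member"),
  day_of_week     = c("Saturday", "Sunday", "Monday", "Saturday"),
  number_of_rides = c(300, 100, 250, 250),
  stringsAsFactors = FALSE
)

# ave() repeats each group's total on every row of that group
group_total <- ave(rides$number_of_rides, rides$member_casual, FUN = sum)
rides$share_of_rides <- rides$number_of_rides / group_total
print(rides)
```

In the dplyr pipeline, the same step could be expressed as `group_by(member_casual) %>% mutate(share_of_rides = number_of_rides / sum(number_of_rides))` applied to the summarised table.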

##### Identify trends and relationship
Here are the key observations from this simple analysis:
* The average ride time of casual riders is longer than that of members.
* Riders most commonly ride on Saturdays.
* Members ride for a fairly consistent duration on every day of the week, with a slight increase on Saturdays, while casual riders' ride duration is lowest on Mondays and increases at the weekend.

### Share

##### Determine best way to share your findings
I will use Tableau to create data visualizations and share the findings.
##### Create effective visualization
[Link to tableau dashboard](https://public.tableau.com/profile/shilpi.dubey#!/vizhome/CyclisticHowcasualridersusebikedifferentlythanmembers/CyclisticHowcasualridersusebikesdifferentlythanmembers)

### Act

##### Recommendations based on Analysis
First, let's revisit the business task: How do annual members and casual riders use Cyclistic bikes differently?

The analysis shows that most riders used docked bikes (docked_bike), and that the number of riders rose in August. Members ride regularly, but their average ride time is shorter than that of casual riders, which suggests that they may be using bikes to commute to work. The analysis also shows that casual riders ride mostly on Saturdays and Sundays, while members ride consistently on every day of the week.
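The docked-bike and August observations rely on counts that are not computed in the R cells above. A sketch of how they could be checked, assuming all_trips still has the rideable_type column and the month column created during processing; the toy rows below are made up:

```r
# Made-up rows with the two columns the observations rely on
toy_trips <- data.frame(
  rideable_type = c("docked_bike", "docked_bike", "electric_bike", "docked_bike"),
  month         = c("07", "08", "08", "08"),
  stringsAsFactors = FALSE
)

# Rides per bike type, most common first
type_counts <- sort(table(toy_trips$rideable_type), decreasing = TRUE)
# Rides per month ("%m" strings, as created in the Process step)
month_counts <- table(toy_trips$month)

names(type_counts)[1]
names(month_counts)[which.max(month_counts)]
```

With the real data, replacing `toy_trips` with `all_trips` would report the most common bike type and the busiest month.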

My three recommendations for key marketing strategy are:
* A weekend-only membership could be launched to attract casual riders.
* A campaign could offer casual riders who take an annual membership a chance to explore the city for free for a weekend.
* Campaigns of all kinds should be launched around the autumn months, when ridership is at its annual peak.