# 1- Introduction:
This is my capstone case study for the Google Data Analytics Professional Certificate. I’ll perform real-world tasks as a junior data analyst at a fictional company, Cyclistic bike-share. To answer the key business questions, I’ll follow the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.

### About Cyclistic bike-share company:
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

### Scenario:
I’m a junior data analyst working on the marketing analytics team at Cyclistic. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, our team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, our team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve our recommendations, so they must be backed up with compelling data insights and professional data visualizations.

# 2- ASK:
### Objective:
Design marketing strategies aimed at converting casual riders into annual members. To achieve that, three questions need to be answered:
* How do annual members and casual riders use Cyclistic bikes differently?
* Why would casual riders buy Cyclistic annual memberships?
* How can Cyclistic use digital media to influence casual riders to become members?

### Business task:
How do annual members and casual riders use Cyclistic bikes differently?

### Key stakeholders:
* Lily Moreno: The director of marketing and my manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
* Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
* Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

# 3- Prepare:
* The data is located in Divvy’s divvy-tripdata dataset.
* The data is organized by month and year and saved as CSV (comma-separated values) files. In this case study I use 12 files, from July 2022 through June 2023, saved to my computer. Each file contains the following columns: ride_id, rideable_type, started_at, ended_at, start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, end_lng, member_casual.
* For the purposes of this case study, the datasets are appropriate: they meet the ROCCC criteria (reliable, original, comprehensive, current, and cited).
* The data has been made available by Motivate International Inc. under this license. Data-privacy issues prohibit me from using riders’ personally identifiable information. This means I won’t be able to connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.

### Install required packages
* install.packages("tidyverse")
* install.packages("lubridate")
* install.packages("readr")
* install.packages("ggplot2")

In [None]:
### Load required packages
library(tidyverse)
library(lubridate)
library(ggplot2)
library(readr)
library(dplyr)
getwd()
setwd("/kaggle/input/bike-share-case-study")

In [None]:
### Import data
jul22 <- read_csv("202207-divvy-tripdata.csv")
aug22 <- read_csv("202208-divvy-tripdata.csv")
sep22 <- read_csv("202209-divvy-tripdata.csv")
oct22 <- read_csv("202210-divvy-tripdata.csv")
nov22 <- read_csv("202211-divvy-tripdata.csv")
dec22 <- read_csv("202212-divvy-tripdata.csv")
jan23 <- read_csv("202301-divvy-tripdata.csv")
feb23 <- read_csv("202302-divvy-tripdata.csv")
mar23 <- read_csv("202303-divvy-tripdata.csv")
apr23 <- read_csv("202304-divvy-tripdata.csv")
may23 <- read_csv("202305-divvy-tripdata.csv")
jun23 <- read_csv("202306-divvy-tripdata.csv")
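
The twelve near-identical read_csv() calls can also be written as a loop over the files. The sketch below is an illustrative base-R version, assuming every monthly file in the data folder ends in -divvy-tripdata.csv. To stay self-contained it first writes two tiny hypothetical CSV files to a temporary directory instead of reading the real data.

```r
# Hypothetical demo: write two tiny monthly files to a temporary directory.
# With the real data, data_dir would point at the folder holding the 12 CSV files.
data_dir <- file.path(tempdir(), "divvy-demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(data_dir, "202207-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(data_dir, "202208-divvy-tripdata.csv"), row.names = FALSE)

# List every monthly file; list.files() returns names sorted alphabetically,
# which for the YYYYMM prefix is also chronological order.
files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)

# Read each file into a list of data frames, then stack them row-wise.
monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
demo_trips <- do.call(rbind, monthly)
print(nrow(demo_trips))
```

With the readr and dplyr functions loaded above, the same pattern is lapply(files, read_csv) followed by bind_rows(); readr 2.0 and later also accepts the whole vector of paths in a single read_csv(files) call. Either way, a file whose columns have different types, such as the August file, still needs its types fixed before the months can be combined.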

### Making columns consistent and merging them into a single dataframe

In [None]:
### Compare column names
colnames(jul22)
colnames(aug22)
colnames(sep22)
colnames(oct22)
colnames(nov22)
colnames(dec22)
colnames(jan23)
colnames(feb23)
colnames(mar23)
colnames(apr23)
colnames(may23)
colnames(jun23)
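
Comparing twelve printed colnames() outputs by eye is error-prone. A minimal base-R sketch of the same check returns the mismatching months directly. The three small data frames are hypothetical stand-ins (the start_time column is invented to show a mismatch); with the real data the list would hold jul22 through jun23.

```r
# Hypothetical stand-ins for the monthly data frames (only a few columns shown).
jul <- data.frame(ride_id = "A1", started_at = "2022-07-01 08:00:00", member_casual = "member")
aug <- data.frame(ride_id = "B1", started_at = "8/1/2022 08:00", member_casual = "casual")
sep <- data.frame(ride_id = "C1", start_time = "2022-09-01 08:00:00", member_casual = "member")

months <- list(jul = jul, aug = aug, sep = sep)

# Use the first month's column names as the reference.
reference <- names(months[[1]])

# TRUE for each month whose column names (and their order) match the reference.
same_names <- vapply(months, function(df) identical(names(df), reference), logical(1))
print(same_names)

# Months that would need renaming before they can be combined.
mismatched <- names(same_names)[!same_names]
print(mismatched)
```

This only checks names; as the str() output for aug22 shows, matching names do not guarantee matching column types.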

In [None]:
### Check data structure
str(jul22)
str(aug22)
str(sep22)
str(oct22)
str(nov22)
str(dec22)
str(jan23)
str(feb23)
str(mar23)
str(apr23)
str(may23)
str(jun23)

In [None]:
### Re-import aug22, parsing started_at and ended_at from character to datetime format
aug22<- read_csv("202208-divvy-tripdata.csv",
    col_types = cols(started_at = col_datetime(format = "%m/%d/%Y %H:%M"),ended_at = col_datetime(format = "%m/%d/%Y %H:%M")))
aug22
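
The August file stores its timestamps as month/day/year text, so bind_rows() cannot combine its character started_at and ended_at columns with the datetime columns of the other months. The sketch below shows the same parsing in base R, using the format string from the col_datetime() call above on two hypothetical timestamps.

```r
# Hypothetical timestamps written in the August file's month/day/year style.
raw <- c("8/15/2022 14:05", "8/31/2022 23:59")

# Parse with the same format string used in col_datetime() above.
# tz = "UTC" matches readr's default time zone for parsed datetimes.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")
print(parsed)

# The parsed value compares equal to the same instant written in ISO style.
iso <- as.POSIXct("2022-08-15 14:05:00", tz = "UTC")
print(parsed[1] == iso)

# A string that does not fit the format becomes NA rather than raising an error,
# so it is worth counting NAs after parsing.
bad <- as.POSIXct("unknown", format = "%m/%d/%Y %H:%M", tz = "UTC")
print(is.na(bad))
```

Because failed parses turn into NA silently, a quick sum(is.na(aug22$started_at)) after re-importing is a cheap sanity check.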

In [None]:
### Combine into a single data frame
tripdata <- bind_rows(jul22,aug22,sep22,oct22,nov22,dec22,jan23,feb23,mar23,apr23,may23,jun23)

# 4- Process:
Cleaning and preparing data for analysis.
* I first tried sorting and filtering one of the twelve files (202207-divvy-tripdata.csv) in Microsoft Office Excel, but the data was too large for Excel to handle, so I decided to use RStudio to complete the analysis process.
### Inspect the new table that has been created

In [None]:
### List of column names
colnames(tripdata)

In [None]:
### How many rows are in data frame?
nrow(tripdata)

In [None]:
### Dimensions of the data frame?
dim(tripdata)

In [None]:
### See the first 6 rows of data frame
head(tripdata)

In [None]:
### See list of columns and data types
str(tripdata)

In [None]:
### Statistical summary of the data (mainly useful for numeric columns)
summary(tripdata)

In [None]:
### check ride_id column for duplicates (each observation must contain unique ride id)
sum(duplicated(tripdata$ride_id))

In [None]:
### check rideable_type column for errors (each observation must contain one of: classic_bike, docked_bike or electric_bike)
table(tripdata$rideable_type)

In [None]:
### check member_casual column for errors (each observation must contain one of: casual or member)
table(tripdata$member_casual)
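
The table() checks above rely on reading the printed counts. The same validation can return the offending values directly. The data frame below is hypothetical, with one invalid value (e-bike) planted to show the check working.

```r
# Hypothetical sample of the two categorical columns, including one bad value.
trips <- data.frame(
  rideable_type = c("classic_bike", "electric_bike", "docked_bike", "e-bike"),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

allowed_types   <- c("classic_bike", "docked_bike", "electric_bike")
allowed_members <- c("casual", "member")

# Values present in the data that are not in the allowed set (NA would show up here too).
bad_types   <- setdiff(unique(trips$rideable_type), allowed_types)
bad_members <- setdiff(unique(trips$member_casual), allowed_members)
print(bad_types)
print(bad_members)

# Number of rows with at least one invalid category.
n_invalid <- sum(!(trips$rideable_type %in% allowed_types) |
                 !(trips$member_casual %in% allowed_members))
print(n_invalid)
```

Note that table() leaves out NA values by default; table(tripdata$member_casual, useNA = "ifany") also reports missing entries.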

### Clean the data and add new columns to the table

In [None]:
### Add columns that list the date, month, day, and year of each ride
tripdata <- tripdata %>%
  mutate(date = as.Date(started_at),            # calendar date of the ride
         year = format(date, "%Y"),
         month = format(date, "%B"),            # full month name, e.g. "July"
         day = format(date, "%d"),
         day_of_week = format(date, "%A"))      # full weekday name, e.g. "Monday"

In [None]:
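### Add ride_length calculation column (in seconds)
# units = "secs" fixes the unit; the default "auto" picks it from the data
tripdata$ride_length <- difftime(tripdata$ended_at, tripdata$started_at, units = "secs")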

In [None]:
### Convert ride length to numeric
tripdata$ride_length <- as.numeric(tripdata$ride_length)
is.numeric(tripdata$ride_length) # to check
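
difftime() defaults to units = "auto", which picks one unit for the whole column based on its smallest absolute difference, and as.numeric() then silently returns numbers in that unit. Since the later summaries are read as seconds, the unit is safest set explicitly with units = "secs". Two hypothetical rides show the effect:

```r
# Two hypothetical rides of 10 and 120 minutes.
started_at <- as.POSIXct(c("2022-07-01 08:00:00", "2022-07-01 09:00:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2022-07-01 08:10:00", "2022-07-01 11:00:00"), tz = "UTC")

# Default units = "auto": the smallest gap is 600 seconds, so R chooses minutes,
# and as.numeric() returns minutes rather than seconds.
auto_len <- difftime(ended_at, started_at)
print(units(auto_len))        # "mins"
print(as.numeric(auto_len))   # 10 120

# Fixing the unit makes the numeric values unambiguous.
secs_len <- as.numeric(difftime(ended_at, started_at, units = "secs"))
print(secs_len)               # 600 7200
```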

#### Remove bad data
The data frame includes a few hundred entries where bikes were taken out of docks and checked for quality by Divvy, or where ride_length was negative. Since data is being removed, we create a new version of the data frame (v2).

In [None]:
### Keep only rides with a non-missing, positive ride_length
tripdata_V2 <- tripdata[!is.na(tripdata$ride_length) & tripdata$ride_length > 0, ]
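
A filter written as !(ride_length <= 0) evaluates to NA for any ride whose length is missing (for example, if a timestamp failed to parse), and base-R row indexing then returns a row filled with NA instead of dropping it. The toy data frame below is hypothetical and shows the difference.

```r
# Hypothetical ride lengths in seconds: one valid, one negative, one missing, one zero.
rides <- data.frame(ride_id = c("r1", "r2", "r3", "r4"),
                    ride_length = c(120, -5, NA, 0),
                    stringsAsFactors = FALSE)

# Logical indexing with an NA condition keeps an all-NA row.
naive <- rides[!(rides$ride_length <= 0), ]
print(nrow(naive))    # 2: r1 plus a row of NAs

# Requiring a non-missing, strictly positive length keeps only valid rides.
clean <- rides[!is.na(rides$ride_length) & rides$ride_length > 0, ]
print(clean$ride_id)  # "r1"

# subset() treats NA conditions as FALSE, giving the same result.
clean2 <- subset(rides, ride_length > 0)
print(clean2$ride_id) # "r1"
```

dplyr::filter(ride_length > 0) also drops rows where the condition is NA, so it is another safe option here.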

# 5- Analyze

In [None]:
### Descriptive analysis on ride_length (all figures in seconds)
summary(tripdata_V2$ride_length)

In [None]:
### Compare members and casual users
aggregate(tripdata_V2$ride_length ~ tripdata_V2$member_casual, FUN = mean)
aggregate(tripdata_V2$ride_length ~ tripdata_V2$member_casual, FUN = median)
aggregate(tripdata_V2$ride_length ~ tripdata_V2$member_casual, FUN = max)
aggregate(tripdata_V2$ride_length ~ tripdata_V2$member_casual, FUN = min)
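
The four aggregate() calls above can be collapsed into one by passing a function that returns several named statistics. This is an alternative formulation shown on hypothetical ride lengths in seconds; the value in the comment follows from that toy data only.

```r
# Hypothetical ride lengths in seconds for two rider types.
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length   = c(600, 1200, 3000, 300, 600, 900),
  stringsAsFactors = FALSE
)

# One aggregate() call: FUN returns a named vector, so the result holds
# a matrix column with one column per statistic.
stats <- aggregate(ride_length ~ member_casual, data = rides,
                   FUN = function(x) c(mean = mean(x), median = median(x),
                                       max = max(x), min = min(x)))
print(stats)

# Pull out a single statistic for one rider type.
casual_mean <- stats$ride_length[stats$member_casual == "casual", "mean"]
print(casual_mean)  # 1600
```

Using the data argument also gives cleaner column names than the tripdata_V2$ prefixes in the formula.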

In [None]:
### See the average ride time by day for members vs casual users, ordered by day of the week
tripdata_V2$day_of_week <- ordered(tripdata_V2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(tripdata_V2$ride_length ~ tripdata_V2$member_casual + tripdata_V2$day_of_week, FUN = mean)
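
The ordered() step matters because weekday names are plain strings, which R sorts alphabetically. The toy vector below (hypothetical values) shows the difference.

```r
# Hypothetical weekday labels in the order they happen to appear in the data.
day_of_week <- c("Monday", "Saturday", "Sunday", "Wednesday", "Sunday")

week_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")

# Plain character sorting is alphabetical, which scrambles the week.
print(sort(unique(day_of_week)))

# An ordered factor sorts (and groups, and plots) by position in the week instead.
day_factor <- ordered(day_of_week, levels = week_levels)
print(sort(unique(day_factor)))

# Days with no rides still appear as levels, so counts show zeros for them.
print(table(day_factor))
```

One caveat: format(..., "%A") returns weekday names in the R session's locale, so in a non-English locale the names will not match these English levels and ordered() will turn them into NA.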

In [None]:
### Analyze ridership data by rider type and weekday
tripdata_V2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  # create a weekday field using wday()
  group_by(member_casual, weekday) %>%                  # group by rider type and weekday
  summarise(number_of_rides = n(),                      # count the rides in each group
            average_duration = mean(ride_length),       # average ride duration
            .groups = "drop") %>%                       # return an ungrouped result
  arrange(member_casual, weekday)                       # sort by rider type, then weekday

# 6- Share

In [None]:
### Let's visualize the number of rides by rider type
tripdata_V2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")

![Number of rides by rider type and weekday](https://drive.google.com/file/d/1nSwZotancXlQFMrw626tZ5Qu3P6kcj3Y/view?usp=sharing)

In [None]:
### Let's create a visualization for average duration
tripdata_V2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")

![Average ride duration by rider type and weekday](https://drive.google.com/file/d/1irQSnwKGfCYJFI1FqpaNr0hfM6RG33wv/view?usp=drive_link)

### Findings:
* Members take more rides in total than casual riders.
* Members ride more on weekdays, while casual riders ride more on weekends.
* Casual riders take longer rides, on average, than members.
* The average ride duration of casual riders increases on weekends, while that of members stays roughly constant.
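
The weekday/weekend finding can be summarized as a single number per rider type: the share of rides taken on Saturday or Sunday. The sketch below is illustrative and uses hypothetical rows, not the Cyclistic results; with the real data the same two lines would run on tripdata_V2.

```r
# Hypothetical rides: rider type plus the weekday label created earlier.
rides <- data.frame(
  member_casual = c("member", "member", "member", "member", "casual", "casual", "casual"),
  day_of_week   = c("Monday", "Tuesday", "Wednesday", "Saturday", "Saturday", "Sunday", "Friday"),
  stringsAsFactors = FALSE
)

rides$is_weekend <- rides$day_of_week %in% c("Saturday", "Sunday")

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of each group's rides that fall on a weekend.
weekend_share <- tapply(rides$is_weekend, rides$member_casual, mean)
print(round(weekend_share, 2))
```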

# 7- Act
### Recommendations:
Based on these findings, we recommend the following:
* Offer discounts, bonuses, or other incentives on weekends to encourage casual riders to buy memberships.
* The higher number of member rides on weekdays suggests that many members ride to work. A marketing program targeting companies and their employees could therefore increase the number of members.
* For further analysis, we need ride prices to study the relationship between the pricing model for each rider type and the number of rides.