# Table of Contents

* [Introduction](#Introduction)
* [Ask](#ask)
    * [Guiding questions](#ask-guiding)
    * [Key tasks](#ask-key)
    * [Deliverable](#ask-deliverable)
* [Prepare](#prepare)
    * [Guiding questions](#prepare-guiding)
    * [Key tasks](#prepare-key)
    * [Deliverable](#prepare-deliverable)
* [Process](#process)
    * [Code](#process-code)
        * [Dependencies](#process-code-dependences)
        * [Concatenating](#process-code-concatenating)
        * [Data cleaning](#process-code-data)
        * [Manipulating the data](#process-code-manipulating)
        * [Saving the result as a CSV](#process-code-saving)
    * [Guiding questions](#process-guiding)
    * [Key tasks](#process-key)
    * [Deliverable](#process-deliverable)
* [Analyze](#analyze)
    * [Code](#analyze-code)
        * [Data distribution](#analyze-code-data)
        * [Other variables](#analyze-code-other)
    * [Guiding questions](#analyze-guiding)
    * [Key tasks](#analyze-key)
    * [Deliverable](#analyze-deliverable)
* [Share](#share)
    * [Guiding questions](#share-guiding)
    * [Key tasks](#share-key)
    * [Deliverable](#share-deliverable)
* [Act](#act)
    * [Guiding questions](#act-guiding)
    * [Key tasks](#act-key)
    * [Deliverable](#act-deliverable)
* [Conclusion](#conclusion)

<a id="Introduction"></a>
# Introduction

This is my version of the Google Data Analytics Capstone - Case Study 1. The full document for the case study can be found in the [Google Data Analytics Capstone: Complete a Case Study](https://www.coursera.org/learn/google-data-analytics-capstone) course.

For this project, these steps will be followed to ensure its completion:
* It will follow the steps of the data analysis process: Ask, prepare, process, analyze, share, and act.
* Each step will follow its own roadmap with:
    * Code, if needed on the step.
    * Guiding questions, with answers.
    * Key tasks, as a checklist.
    * Deliverable, as a checklist.


<a id="ask"></a>
# Ask

For the ask step, first let's get some context from the Cyclistic document:

    Scenario
    
    You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
    
    Characters and teams
    Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
    Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
    Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
    Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
    


<a id="ask-guiding"></a>
## Guiding questions

* **What is the problem you are trying to solve?**

The main objective is to determine a way to build a profile for annual members and the best marketing strategies to turn casual bike riders into annual members.
    
* **How can your insights drive business decisions?**

The insights will help the marketing team to increase annual members.
    

<a id="ask-key"></a>
## Key tasks

- [x] Identify the business task
- [x] Consider key stakeholders

<a id="ask-deliverable"></a>
## Deliverable

- [x] A clear statement of the business task
    
    Find the key differences between casual riders and members, and how digital media could influence them.

<a id="prepare"></a>
# Prepare

The project will use the data provided by [this Kaggle dataset](https://www.kaggle.com/timgid/cyclistic-dataset-google-certificate-capstone). Google also provides its [own link](https://divvy-tripdata.s3.amazonaws.com/index.html) to the same dataset, expanded with more years and station descriptions.


<a id="prepare-guiding"></a>
## Guiding questions

* **Where is your data located?**

The data is located in a Kaggle dataset.

* **How is the data organized?**

The data is separated by month, each in its own CSV file.

* **Are there issues with bias or credibility in this data? Does your data ROCCC?**

Bias isn't a problem, because the population of the dataset is the company's own clients as bike riders. It has full credibility for the same reason. Finally, it ROCCCs because it's reliable, original, comprehensive, current and cited.

* **How are you addressing licensing, privacy, security, and accessibility?**

The company has its own licence over the dataset. Besides that, the dataset doesn't have any personal information about the riders.

* **How did you verify the data’s integrity?**

All the files have consistent columns and each column has the correct type of data.

* **How does it help you answer your question?**

It may provide some key insights about the riders and their riding style.

* **Are there any problems with the data?**

It would be good to have some updated information about the bike stations. Also more information about the riders could be useful.

<a id="prepare-key"></a>
## Key tasks

- [x] Download data and store it appropriately.
- [x] Identify how it’s organized.
- [x] Sort and filter the data.
- [x] Determine the credibility of the data.

<a id="prepare-deliverable"></a>
## Deliverable

- [x] A description of all data sources used

The main data source is 12 months (between April 2020 and March 2021) of riding data provided by the Cyclistic company.

<a id="process"></a>
# Process

This step will prepare the data for analysis. All the CSV files will be merged into one file to improve the workflow.

<a id="process-code"></a>
## Code

<a id="process-code-dependences"></a>
### Dependencies

The main dependency for the project will be tidyverse.

In [None]:
library(tidyverse)

<a id="process-code-concatenating"></a>
### Concatenating
All the CSV files will be concatenated into one data frame.

In [None]:
# Only pick up CSV files from the input folder
csv_files <- list.files(path = "../input", pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

cyclistic_merged <- do.call(rbind, lapply(csv_files, read.csv))

In [None]:
head(cyclistic_merged)
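
Merging with `rbind` assumes that every monthly file has the same columns, so it can help to check that before concatenating. The following is a minimal sketch and not part of the original workflow; the helper name `have_same_columns` and the toy data frames are hypothetical, and the check is deliberately strict (same names in the same order):

```r
# Hypothetical helper: TRUE when every data frame in the list has the
# same column names, in the same order, as the first one.
have_same_columns <- function(data_frames) {
  reference <- names(data_frames[[1]])
  all(vapply(data_frames, function(df) identical(names(df), reference), logical(1)))
}

# Toy data standing in for the monthly files (not real Cyclistic data).
april <- data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual"))
may <- data.frame(ride_id = "B1", member_casual = "casual")
june <- data.frame(ride_id = "C1", rider_type = "member")

have_same_columns(list(april, may))        # TRUE
have_same_columns(list(april, may, june))  # FALSE: june uses a different column name
```

With the real files, the same helper could be applied to `lapply(csv_files, read.csv)` before the `do.call(rbind, ...)` step.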

<a id="process-code-data"></a>
### Data cleaning

<a id="process-code-data-removing"></a>
#### Removing Duplicates

In [None]:
cyclistic_no_dups <- cyclistic_merged[!duplicated(cyclistic_merged$ride_id), ]
print(paste("Removed", nrow(cyclistic_merged) - nrow(cyclistic_no_dups), "duplicated rows"))

<a id="process-code-data-parsing"></a>
#### Parsing datetime columns

In [None]:
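# The format is passed by name: the second positional argument of as.POSIXct is `tz`
cyclistic_no_dups$started_at <- as.POSIXct(cyclistic_no_dups$started_at, format = "%Y-%m-%d %H:%M:%S")
cyclistic_no_dups$ended_at <- as.POSIXct(cyclistic_no_dups$ended_at, format = "%Y-%m-%d %H:%M:%S")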
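
A quick way to confirm that the parsing worked is to count the `NA` values it produces, since values that do not match the format come back as `NA`. This is a small sketch with toy timestamps; the values and the explicit `tz = "UTC"` are assumptions made for illustration:

```r
# Toy timestamps (not real data); the last one is deliberately malformed.
raw_times <- c("2020-04-26 17:45:14", "2020-04-17 17:08:54", "26/04/2020 17:45")

# `format` is passed by name; `tz = "UTC"` is an assumption for this example.
parsed <- as.POSIXct(raw_times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Count and list the values that failed to parse.
sum(is.na(parsed))        # 1
raw_times[is.na(parsed)]  # "26/04/2020 17:45"
```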

<a id="process-code-manipulating"></a>
### Manipulating the data
New columns will help improve calculation time in the future.

<a id="process-code-manipulating-riding"></a>
#### ride_time_m
Represents the total time of a bike ride, in minutes.

In [None]:
# Ask difftime for minutes explicitly instead of relying on its automatic unit
cyclistic_no_dups <- cyclistic_no_dups %>%
    mutate(ride_time_m = as.numeric(difftime(ended_at, started_at, units = "mins")))
summary(cyclistic_no_dups$ride_time_m)
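
Subtracting two `POSIXct` values returns a `difftime` whose unit (seconds, minutes, hours or days) is chosen automatically, so dividing the result by 60 only gives minutes when that unit happens to be seconds. A short sketch with toy timestamps (the values are assumptions for illustration) shows the difference and how `difftime(..., units = "mins")` pins the unit:

```r
# Toy ride of one hour and a half (not real data).
start <- as.POSIXct("2020-04-26 10:00:00", tz = "UTC")
end <- as.POSIXct("2020-04-26 11:30:00", tz = "UTC")

# Automatic unit: for this single ride R picks hours.
auto_diff <- end - start
units(auto_diff)            # "hours"
as.numeric(auto_diff) / 60  # 0.025, which is not the ride time in minutes

# Explicit unit: always minutes, whatever the size of the difference.
ride_minutes <- as.numeric(difftime(end, start, units = "mins"))
ride_minutes                # 90
```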

<a id="process-code-manipulating-year"></a>
#### year_month
Combine the year and the month into a single column.

In [None]:
# Produces labels such as "2020 - 04 (Apr)"
cyclistic_no_dups <- cyclistic_no_dups %>%
    mutate(year_month = strftime(started_at, "%Y - %m (%b)"))
unique(cyclistic_no_dups$year_month)

<a id="process-code-manipulating-weekday"></a>
#### weekday
The weekday will be useful for finding travel patterns across the week.

In [None]:
# Produces labels such as "6 - Sat"; the start time is used, as for year_month
cyclistic_no_dups <- cyclistic_no_dups %>%
    mutate(weekday = strftime(started_at, "%u - %a"))
unique(cyclistic_no_dups$weekday)

<a id="process-code-manipulating-start"></a>
#### start_hour
Getting the hour of the day may also be useful for intraday analysis.

In [None]:
# start_hour is taken from the start time of the ride
cyclistic_no_dups <- cyclistic_no_dups %>%
    mutate(start_hour = strftime(started_at, "%H"))
unique(cyclistic_no_dups$start_hour)

<a id="process-code-saving"></a>
### Saving the result as a CSV

In [None]:
# row.names = FALSE avoids writing an extra index column to the file
cyclistic_no_dups %>%
  write.csv("cyclistic_clean.csv", row.names = FALSE)

<a id="process-guiding"></a>
## Guiding questions

* **What tools are you choosing and why?**

I'm using R for this project for two main reasons: the large size of the dataset and to gain experience with the language.

* **Have you ensured your data’s integrity?**

Yes, the data is consistent throughout the columns.

* **What steps have you taken to ensure that your data is clean?**

First, the duplicated values were removed. Then the columns were converted to their correct formats.

* **How can you verify that your data is clean and ready to analyze?**

It can be verified by this notebook.

* **Have you documented your cleaning process so you can review and share those results?**

Yes, it's all documented in this R notebook.

<a id="process-key"></a>
## Key tasks

- [x] Check the data for errors.
- [x] Choose your tools.
- [x] Transform the data so you can work with it effectively.
- [x] Document the cleaning process.

<a id="process-deliverable"></a>
## Deliverable

- [x] Documentation of any cleaning or manipulation of data

<a id="analyze"></a>
# Analyze
The data exploration will consist of building a profile for annual members and how they differ from casual riders.

Assigning the data to a new variable with a simpler name will help reduce some typing in the future.

<a id="analyze-code"></a>
## Code

In [None]:
# This function helps to resize the plots
fig <- function(width, height){options(repr.plot.width = width, repr.plot.height = height)}

In [None]:
cyclistic <- cyclistic_no_dups
head(cyclistic)

To get started, let's generate a summary of the dataset.

In [None]:
summary(cyclistic)

One thing that immediately catches the attention is ride_time_m. This field has negative values, and the biggest value is 58720.03 minutes, which is about 40 days and 19 hours. This field will be explored further in the document.
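
The conversion of the maximum value from minutes to days and hours can be checked with integer division and the remainder, given that a day has 1440 minutes. This is only a worked check of the arithmetic; the variable names are made up for the example:

```r
# Largest ride_time_m reported by summary(), in minutes.
max_ride_minutes <- 58720.03

days <- max_ride_minutes %/% 1440            # whole days
hours <- (max_ride_minutes %% 1440) %/% 60   # whole hours left over
minutes <- round(max_ride_minutes %% 60)     # remaining minutes, rounded

c(days = days, hours = hours, minutes = minutes)  # 40 days, 18 hours, 40 minutes
```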

<a id="analyze-code-data"></a>
### Data distribution
Here we want to try to answer the most basic questions about how the data is distributed.

<a id="analyze-code-data-casuals"></a>
#### Casuals vs members
How much of the data is about members and how much is about casuals?

In [None]:
cyclistic %>% 
    group_by(member_casual) %>% 
    summarise(count = length(ride_id),
              '%' = (length(ride_id) / nrow(cyclistic)) * 100)

In [None]:
fig(16,8)
ggplot(cyclistic, aes(member_casual, fill=member_casual)) +
    geom_bar() +
    labs(x="Casuals x Members", title="Chart 01 - Casuals x Members distribution")

As the member x casual table shows, members make up the larger proportion of the dataset, ~59%, which is ~19 percentage points more than casual riders.

<a id="analyze-code-data-month"></a>
#### Month
How is the data distributed by month?

In [None]:
cyclistic %>%
    group_by(year_month) %>%
    summarise(count = length(ride_id),
              '%' = (length(ride_id) / nrow(cyclistic)) * 100,
              'members_p' = (sum(member_casual == "member") / length(ride_id)) * 100,
              'casual_p' = (sum(member_casual == "casual") / length(ride_id)) * 100,
              'Member x Casual Perc Difer' = members_p - casual_p)

In [None]:
cyclistic %>%
  ggplot(aes(year_month, fill=member_casual)) +
    geom_bar() +
    labs(x="Month", title="Chart 02 - Distribution by month") +
    coord_flip()

Some observations from this chart:
* There are more data points in the second half of 2020.
* The month with the most data points was August, with ~18% of the dataset.
* In every month there are more member rides than casual rides (maybe because of returning members).
* The difference in proportion between members and casuals is smaller in the second half of 2020.

The distribution looks cyclical. Let's compare it with climate data for Chicago.
The data is taken from [Climate of Chicago](https://en.wikipedia.org/wiki/Climate_of_Chicago) (Daily mean °C, 1991–2020).

In [None]:
chicago_mean_temp <- c(-3.2, -1.2, 4.4, 10.5, 16.6, 22.2, 24.8, 23.9, 19.9, 12.9, 5.8, -0.3)
month <- c("001 - Jan","002 - Feb","003 - Mar","004 - Apr","005 - May","006 - Jun","007 - Jul","008 - Aug","009 - Sep","010 - Oct","011 - Nov","012 - Dec")

data.frame(month, chicago_mean_temp) %>%
    ggplot(aes(x=month, y=chicago_mean_temp)) +
    labs(x="Month", y="Mean temperature", title="Chart 02.5 - Mean temperature for Chicago (1991-2020)") +
    geom_col()


The main takeaway is:
* Temperature heavily influences the volume of rides in each month.
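
The visual comparison between Chart 02 and the temperature chart can be complemented with a number. The sketch below is an illustration under stated assumptions and not part of the original analysis: the helper `rides_temp_correlation` is hypothetical, it computes a Spearman rank correlation between rides per calendar month and the mean temperatures, and the start times are toy values built so that warmer months have more rides.

```r
# Mean monthly temperatures for Chicago (January to December), as used in Chart 02.5.
chicago_mean_temp <- c(-3.2, -1.2, 4.4, 10.5, 16.6, 22.2, 24.8, 23.9, 19.9, 12.9, 5.8, -0.3)

# Hypothetical helper: rank correlation between rides per calendar month
# and the mean temperature of that month.
rides_temp_correlation <- function(started_at, mean_temp) {
  month_index <- as.integer(format(started_at, "%m"))
  rides_per_month <- tabulate(month_index, nbins = 12)
  cor(rides_per_month, mean_temp, method = "spearman")
}

# Toy start times (not real data): the number of rides in each month
# follows the temperature rank of that month.
toy_counts <- rank(chicago_mean_temp)
toy_starts <- as.POSIXct(rep(sprintf("2020-%02d-15 12:00:00", 1:12), times = toy_counts),
                         tz = "UTC")

toy_correlation <- rides_temp_correlation(toy_starts, chicago_mean_temp)
toy_correlation  # 1 for this toy data, by construction
```

On the real data the call would be `rides_temp_correlation(cyclistic$started_at, chicago_mean_temp)`. A value close to 1 would support the takeaway, although 12 monthly points are too few to treat it as more than a rough indication.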

<a id="analyze-code-data-weekday"></a>
#### Weekday
How is the data distributed by weekday?

In [None]:
cyclistic %>%
    group_by(weekday) %>% 
    summarise(count = length(ride_id),
              '%' = (length(ride_id) / nrow(cyclistic)) * 100,
              'members_p' = (sum(member_casual == "member") / length(ride_id)) * 100,
              'casual_p' = (sum(member_casual == "casual") / length(ride_id)) * 100,
              'Member x Casual Perc Difer' = members_p - casual_p)

In [None]:
ggplot(cyclistic, aes(weekday, fill=member_casual)) +
    geom_bar() +
    labs(x="Weekday", title="Chart 03 - Distribution by weekday") +
    coord_flip()

It's interesting to see:
* The biggest volume of data is on the weekend.
* Saturday has the most data points.
* Members have the biggest volume of data on every day except Saturday, when casuals account for the most data points.
* Weekends have the biggest volume of casuals. The increase starts on Friday, at ~20%.

<a id="analyze-code-data-hour"></a>
#### Hour of the day

In [None]:
cyclistic %>%
    group_by(start_hour) %>% 
    summarise(count = length(ride_id),
          '%' = (length(ride_id) / nrow(cyclistic)) * 100,
          'members_p' = (sum(member_casual == "member") / length(ride_id)) * 100,
          'casual_p' = (sum(member_casual == "casual") / length(ride_id)) * 100,
          'member_casual_perc_difer' = members_p - casual_p)

In [None]:
cyclistic %>%
    ggplot(aes(start_hour, fill=member_casual)) +
    labs(x="Hour of the day", title="Chart 04 - Distribution by hour of the day") +
    geom_bar()

From this chart, we can see:
* There's a bigger volume of bikers in the afternoon.
* There are more members during the morning, mainly between 5am and 11am.
* There are more casuals between 11pm and 4am.

This chart can be expanded by splitting it by day of the week.

In [None]:
cyclistic %>%
    ggplot(aes(start_hour, fill=member_casual)) +
    geom_bar() +
    labs(x="Hour of the day", title="Chart 05 - Distribution by hour of the day divided by weekday") +
    facet_wrap(~ weekday)

There's a clear difference between the midweek and weekends. Let's generate charts for these two configurations.

In [None]:
cyclistic %>%
    mutate(type_of_weekday = ifelse(weekday == '6 - Sat' | weekday == '7 - Sun',
                                   'weekend',
                                   'midweek')) %>%
    ggplot(aes(start_hour, fill=member_casual)) +
    labs(x="Hour of the day", title="Chart 06 - Distribution by hour of the day, midweek vs weekend") +
    geom_bar() +
    facet_wrap(~ type_of_weekday)
    

The two plots differ in some key ways:
* While the weekends have a smooth flow of data points, the midweek has a steeper flow of data.
* The raw counts are not directly comparable, because each plot covers a different number of days.
* There's a big increase in data points in the midweek between 6am and 8am. Then it falls a bit.
* Another big increase is from 5pm to 6pm.
* During the weekend there is a bigger flow of casuals between 11am and 6pm.

It's fundamental to ask who the riders are that use the bikes at these times of day.
We can assume some factors. One is that members may be people who use the bikes during their daily routine activities, like going to work (data points between 5am and 8am in the midweek) and coming back from work (data points between 5pm and 6pm).
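
Since the midweek covers five days and the weekend only two, one way to make the two panels comparable is to divide the counts by the number of distinct days in each group. The following is a hedged sketch, not the method used in the charts above; the helper `rides_per_day_by_hour` is hypothetical and the start times are toy values:

```r
# Hypothetical helper: average number of rides per calendar day, by hour
# and by type of day, so midweek and weekend counts are comparable.
rides_per_day_by_hour <- function(started_at) {
  day_type <- ifelse(format(started_at, "%u") %in% c("6", "7"), "weekend", "midweek")
  ride_date <- format(started_at, "%Y-%m-%d")

  counts <- table(hour = format(started_at, "%H"), day_type = day_type)
  days <- tapply(ride_date, day_type, function(d) length(unique(d)))

  # Divide each column (type of day) by the number of distinct days it covers.
  sweep(counts, 2, as.vector(days[colnames(counts)]), "/")
}

# Toy start times (not real data): 2021-03-01 and 2021-03-02 are a Monday
# and a Tuesday, 2021-03-06 is a Saturday.
toy_starts <- as.POSIXct(c("2021-03-01 08:00:00", "2021-03-01 08:30:00",
                           "2021-03-02 08:10:00", "2021-03-06 14:00:00"), tz = "UTC")

per_day <- rides_per_day_by_hour(toy_starts)
per_day["08", "midweek"]  # 1.5 rides per midweek day at 8am
per_day["14", "weekend"]  # 1 ride per weekend day at 2pm
```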


<a id="analyze-code-data-rideable"></a>
#### Rideable type

In [None]:
cyclistic %>%
    group_by(rideable_type) %>% 
    summarise(count = length(ride_id),
          '%' = (length(ride_id) / nrow(cyclistic)) * 100,
          'members_p' = (sum(member_casual == "member") / length(ride_id)) * 100,
          'casual_p' = (sum(member_casual == "casual") / length(ride_id)) * 100,
          'member_casual_perc_difer' = members_p - casual_p)

In [None]:
ggplot(cyclistic, aes(rideable_type, fill=member_casual)) +
    labs(x="Rideable type", title="Chart 07 - Distribution of types of bikes") +
    geom_bar() +
    coord_flip()

It's important to note that:
* Docked bikes have the biggest volume of rides, but this may simply be because the company has more docked bikes.
* Members have a bigger preference for classic bikes, 56% more.
* The same holds for electric bikes.
<a id="analyze-code-other"></a>
### Other variables

Now let's take a look at some other variables of the dataset.

<a id="analyze-code-other-ride"></a>
#### ride_time_m

First, let's get some summary statistics for this variable.

In [None]:
summary(cyclistic$ride_time_m)

The min and the max may be a problem when plotting some charts. How can the ride time of some bikes be a negative value? Maybe some malfunctioning stations are returning bad dates.
A check of the start and end stations doesn't show an obvious problem.
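
A negative ride time means that the end timestamp is earlier than the start timestamp, so those rows can be isolated and inspected directly. This is a minimal sketch on toy rides (the data frame and its values are assumptions for illustration, not real Cyclistic data):

```r
# Toy rides (not real data); the second one ends before it starts.
toy_rides <- data.frame(
  ride_id = c("R1", "R2", "R3"),
  started_at = as.POSIXct(c("2020-11-01 10:00:00", "2020-11-01 12:00:00",
                            "2020-11-02 09:00:00"), tz = "UTC"),
  ended_at = as.POSIXct(c("2020-11-01 10:20:00", "2020-11-01 11:45:00",
                          "2020-11-02 09:05:00"), tz = "UTC")
)

# Rows where the end timestamp precedes the start timestamp.
negative_rides <- toy_rides[toy_rides$ended_at < toy_rides$started_at, ]
nrow(negative_rides)                  # 1
as.character(negative_rides$ride_id)  # "R2"
```

On the real data, grouping such rows by station or by date would be one way to look for a common cause.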

In [None]:
ventiles <- quantile(cyclistic$ride_time_m, seq(0, 1, by=0.05))
ventiles

We can see that:
* The difference between 0% and 100% is 87770.0 minutes.
* The difference between 5% and 95% is 69.95 minutes.

Because of that, the analysis of this variable will use a subset of the dataset without outliers. The subset keeps the values between the 5% and 95% quantiles, which is about 90% of the dataset.

In [None]:
cyclistic_without_outliners <- cyclistic %>% 
    filter(ride_time_m > as.numeric(ventiles['5%'])) %>%
    filter(ride_time_m < as.numeric(ventiles['95%']))

print(paste("Removed", nrow(cyclistic) - nrow(cyclistic_without_outliners), "rows as outliers" ))
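
Keeping only the values strictly between the 5% and 95% quantiles drops about 5% of the rows at each end, so roughly 90% of the data remains. The sketch below shows this on a toy vector; the helper `trim_by_quantiles` is hypothetical and only mirrors the filter used above:

```r
# Hypothetical helper: keep the values strictly between two quantiles.
trim_by_quantiles <- function(x, lower = 0.05, upper = 0.95) {
  bounds <- quantile(x, c(lower, upper), names = FALSE)
  x[x > bounds[1] & x < bounds[2]]
}

# Toy vector (not real data): the integers from 1 to 1000.
toy_values <- 1:1000
kept <- trim_by_quantiles(toy_values)

length(kept) / length(toy_values)  # 0.9
range(kept)                        # 51 950
```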

<a id="analyze-code-other-ride-multivariable"></a>
##### ride_time_m multivariable exploration

One of the first ways to look at how the other columns interact with ride_time_m is a box plot, grouped by the member_casual column, together with the summarized data.

In [None]:
cyclistic_without_outliners %>% 
    group_by(member_casual) %>% 
    summarise(mean = mean(ride_time_m),
              'first_quarter' = as.numeric(quantile(ride_time_m, .25)),
              'median' = median(ride_time_m),
              'third_quarter' = as.numeric(quantile(ride_time_m, .75)),
              'IR' = third_quarter - first_quarter)

In [None]:
ggplot(cyclistic_without_outliners, aes(x=member_casual, y=ride_time_m, fill=member_casual)) +
    labs(x="Member x Casual", y="Riding time", title="Chart 08 - Distribution of Riding time for Casual x Member") +
    geom_boxplot()

It's important to note that:
* Casuals have more riding time than members.
* The mean and the IQR are also bigger for casuals.

Let's see if we can extract more information when plotting by weekday.

In [None]:
ggplot(cyclistic_without_outliners, aes(x=weekday, y=ride_time_m, fill=member_casual)) +
    geom_boxplot() +
    facet_wrap(~ member_casual) +
    labs(x="Weekday", y="Riding time", title="Chart 09 - Distribution of Riding time for day of the week") +
    coord_flip()

* Riding time for members stays unchanged during the midweek and increases on weekends.
* Casuals follow a more curved distribution, peaking on Sundays and bottoming out on Wednesday/Thursday.

Lastly, let's do rideable_type.

In [None]:
ggplot(cyclistic_without_outliners, aes(x=rideable_type, y=ride_time_m, fill=member_casual)) +
    geom_boxplot() +
    facet_wrap(~ member_casual) +
    labs(x="Rideable type", y="Riding time", title="Chart 10 - Distribution of Riding time for rideable type") +
    coord_flip()

* Electric bikes have less riding time than other bikes, for both members and casuals.
* Docked bikes have more riding time. And for docked bikes, members have more riding time than casuals. 

<a id="analyze-guiding"></a>
## Guiding questions

* **How should you organize your data to perform analysis on it?**

The data has been organized into a single CSV concatenating all the files from the dataset.

* **Has your data been properly formatted?**

Yes, all the columns have their correct data type.

* **What surprises did you discover in the data?**

One of the main surprises is how members differ from casuals when analysed by weekday. Another is that members have less riding time than casuals.

* **What trends or relationships did you find in the data?**
    * There are more members than casuals in the dataset.
    * There are more data points in the second half of 2020.
    * There is a clear difference in the flow of members and casuals between midweek and weekends.
    * Members use bikes on schedules that differ from those of casuals.
    * Members have less riding time.
    * Members tend to prefer docked bikes.


* **How will these insights help answer your business questions?**

These insights help to build a profile for members.

<a id="analyze-key"></a>
## Key tasks

- [x] Aggregate your data so it’s useful and accessible.
- [x] Organize and format your data.
- [x] Perform calculations.
- [x] Identify trends and relationships.

<a id="analyze-deliverable"></a>
## Deliverable

- [x] A summary of your analysis

<a id="share"></a>
# Share

The share phase is usually done by building a presentation. But for Kaggle, the best representation of the analysis and conclusions is the notebook itself.

Let's go through the main findings and try to arrive at a conclusion.


What we know about the dataset:
* Members make up the biggest proportion of the dataset, ~19 percentage points more than casuals.
* There are more data points in the second half of 2020.
* The month with the most data points was August, with ~18% of the dataset.
* In every month there are more member rides than casual rides.
* The difference in proportion between members and casuals is smaller in the second half of 2020.
* Temperature heavily influences the volume of rides in each month.
* The biggest volume of data is on the weekend.
* There's a bigger volume of bikers in the afternoon.

It's possible to notice that the distribution of rides by month looks cyclical, as seen on chart 02, and that it's influenced by the temperature. The remaining question is: why are there more members than casuals? One plausible answer is that members have a bigger need for the bikes than casuals, as suggested by how there are more members than casuals in the cold months.

Besides that, there are more bike rides on the weekends, maybe because on those days the bikes are used more for recreation. This is even more plausible knowing that *there's a bigger volume of bikers in the afternoon*.

Now for how members differ from casuals:
* Members have the biggest volume of data on every day except Saturday, when casuals account for the most data points.
* Weekends have the biggest volume of casuals. The increase starts on Friday, at ~20%.
* There are more members during the morning, mainly between 5am and 11am, and more casuals between 11pm and 4am.
* There's a big increase in data points in the midweek between 6am and 8am for members. Then it falls a bit. Another big increase is from 5pm to 6pm.
* During the weekend there is a bigger flow of casuals between 11am and 6pm.
* Members have a bigger preference for classic bikes, 56% more.
* Casuals have more riding time than members.
* Riding time for members stays unchanged during the midweek and increases on weekends.
* Casuals follow a more curved distribution, peaking on Sundays and bottoming out on Wednesday/Thursday.

What we can take from this information is that members have a more fixed use for the bikes than casuals. They use them for more routine activities, like:
* Going to work.
* Exercising.

This is supported by the finding that *we have more members between 6am and 8am and from 5pm to 6pm*. Also, members may have set routes when using the bikes, as suggested by *riding time for members stays unchanged during the midweek and increases on weekends*. The bikes are also heavily used for recreation on the weekends, when riding time increases and casuals take over.

Members also have a bigger preference for classic bikes, possibly so they can exercise when going to work.

Concluding:
* Members use the bikes for fixed activities, one of those is going to work.
* Bikes are used for recreation on the weekends.
* Rides are influenced by temperature.


<a id="share-guiding"></a>
## Guiding questions

* **Were you able to answer the question of how annual members and casual riders use Cyclistic bikes differently?**

Yes. The data points to several differences between casuals and members.

* **What story does your data tell?**

The main story the data tells is that members have set schedules, as seen on chart 06 on key timestamps. Those timestamps point out that members use the bikes for routine activities, like going to work. Charts like 08 also point out that they have less riding time, because they have a set route to take.

* **How do your findings relate to your original question?**

The findings build a profile for members, relating to "*Find the key differences between casual riders and members*". Knowing why they use the bikes also helps to find "*how digital media could influence them*".

* **Who is your audience? What is the best way to communicate with them?**

The main target audience is the Cyclistic marketing analytics team and Lily Moreno. The best way to communicate is through a slide presentation of the findings.

* **Can data visualization help you share your findings?**

Yes, the core of the findings is presented through data visualization.

* **Is your presentation accessible to your audience?**

Yes, the plots were made using vibrant colors, and corresponding labels.

<a id="share-key"></a>
## Key tasks

- [x] Determine the best way to share your findings.
- [x] Create effective data visualizations.
- [x] Present your findings.
- [x] Ensure your work is accessible.

<a id="share-deliverable"></a>
## Deliverable

- [x] Supporting visualizations and key findings

<a id="act"></a>
# Act

The act phase would be done by the marketing team of the company. The main takeaway will be the top three recommendations for the marketing team.

<a id="act-guiding"></a>
## Guiding questions

* **What is your final conclusion based on your analysis?**

Members and casuals have different habits when using the bikes. The conclusion is detailed in the share phase.

* **How could your team and business apply your insights?**

The insights could be applied when preparing a marketing campaign for turning casuals into members. The campaign can focus on workers, presenting the bikes as a green way to get to work.

* **What next steps would you or your stakeholders take based on your findings?**

Further analysis could be done to improve the findings. Besides that, the marketing team can use the main findings to build a marketing campaign.

* **Is there additional data you could use to expand on your findings?**
    * Mobility data.
    * Improved climate data.
    * More information about members.

<a id="act-key"></a>
## Key tasks

- [x] Create your portfolio.
- [x] Add your case study.
- [x] Practice presenting your case study to a friend or family member.

<a id="act-deliverable"></a>
## Deliverable

* Your top three recommendations based on your analysis
1. Build a marketing campaign that shows how bikes help people get to work while keeping the planet green and avoiding traffic. The ads could be shown on professional social networks.
2. Increase benefits for riding during cold months. Coupons and discounts could be handed out.
3. As the bikes are also used for recreation on the weekends, ad campaigns could also show people using the bikes for exercise during the week. The ads could focus on how practical and **consistent** the bikes can be.

<a id="conclusion"></a>
# Conclusion

The [Google Data Analytics Professional Certificate](https://www.coursera.org/professional-certificates/google-data-analytics) taught me a lot, and the R language is really useful for analysing data (although I prefer pandas). This took me more time than I expected, but it was fun.

Sorry for any mistakes in my English throughout the notebook, and thanks to anyone who is reading this.