In [None]:
library(gapminder)
library(dplyr)
library(ggplot2)

## Arrange and filter
Use `arrange()` to sort observations in ascending or descending order of a particular variable.
<br>Use `filter()` to keep only the observations that satisfy a logical condition.

In [None]:
# Sort in ascending order of lifeExp
gapminder %>%
  arrange(lifeExp)

  
# Sort in descending order of lifeExp
gapminder %>%
  arrange(desc(lifeExp))


In [None]:
# Filter for the year 1957, then arrange in descending order of population
gapminder %>%
  filter(year == 1957) %>%
  arrange(desc(pop))
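
`filter()` also accepts several conditions at once. The sketch below is an illustration rather than part of the original exercises: it assumes you want only the Asian countries in 1957, and the name `asia_1957` is made up for this example. Conditions separated by commas are combined with a logical AND.

```r
library(gapminder)
library(dplyr)

# Keep rows where BOTH conditions hold (comma = logical AND),
# then sort by population, largest first.
# `asia_1957` is a hypothetical name used only for this example.
asia_1957 <- gapminder %>%
  filter(year == 1957, continent == "Asia") %>%
  arrange(desc(pop))

asia_1957
```

Writing `filter(year == 1957 & continent == "Asia")` is equivalent; use `|` when either condition should be enough.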

## mutate
Use `mutate()` to change an existing column or create a new one.

In [None]:
# Find the countries with the highest life expectancy, in months, in the year 2007
gapminder %>%
  filter(year == 2007) %>%
  mutate(lifeExpMonths = 12 * lifeExp) %>%
  arrange(desc(lifeExpMonths))
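
The cell above creates a new column. As a minimal sketch of the other use of `mutate()`, changing a column in place, the example below overwrites `pop` with the population in millions. The `gdp` column and the name `gapminder_2007_scaled` are assumptions introduced for illustration.

```r
library(gapminder)
library(dplyr)

# Expressions inside mutate() are evaluated in order, so total GDP is
# computed first, while pop still holds the raw head count.
# `gdp` and `gapminder_2007_scaled` are hypothetical names.
gapminder_2007_scaled <- gapminder %>%
  filter(year == 2007) %>%
  mutate(gdp = gdpPercap * pop,   # new column: total GDP
         pop = pop / 1000000) %>% # existing column, now in millions
  arrange(desc(gdp))

gapminder_2007_scaled
```

Reusing an existing column name on the left-hand side replaces that column; a new name appends a column to the right of the table.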

## Visualization in ggplot2

In [None]:
# Comparing population and GDP per capita

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Scatter plot with pop on the x-axis and gdpPercap on the y-axis
ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point()

In [None]:
# Comparing population and life expectancy
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point()
  
# You might notice the points are crowded towards the left side of the plot, making them hard to distinguish. 

### Putting the x-axis on a log scale

You previously created a scatter plot with population on the x-axis and life expectancy on the y-axis. Since population is spread over several orders of magnitude, with some countries having a much higher population than others, it's a good idea to put the x-axis on a log scale.

In [None]:
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  scale_x_log10()
  
# On a log scale the points spread out, and no clear relationship between population and life expectancy is visible.
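
To check how widely population is spread, the span can be computed directly. This is a rough sketch, not part of the original exercises; the names `pop_range_1952`, `minPop`, `maxPop` and `ordersOfMagnitude` are assumptions chosen for illustration.

```r
library(gapminder)
library(dplyr)

# The base-10 logarithm of the max/min ratio is the number of orders
# of magnitude spanned by the 1952 populations.
# All names created here are hypothetical.
pop_range_1952 <- gapminder %>%
  filter(year == 1952) %>%
  summarize(minPop = min(pop),
            maxPop = max(pop),
            ordersOfMagnitude = log10(max(pop) / min(pop)))

pop_range_1952
```

When this value is well above 1, a linear axis squeezes most points into a narrow band, which is the crowding seen in the earlier plot.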

### Putting the x- and y-axes on a log scale
Suppose you want to create a scatter plot with population on the x-axis and GDP per capita on the y-axis. Both population and GDP per capita are better represented with log scales, since they vary over many orders of magnitude.

In [None]:
ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

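### Additional aesthetics
Besides x and y, a variable can be mapped to color and to size. Here `continent` sets the color and `gdpPercap` sets the size of each point.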

In [None]:
ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent, size = gdpPercap)) +
  geom_point() +
  scale_x_log10()

### Faceting
Creating a subplot for each continent

Faceting divides a graph into subplots based on one of its variables, such as the continent.

In [None]:
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) + 
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ continent)

In [None]:
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ year)

## Summarize
Summarizing the median life expectancy

In [None]:
gapminder %>%
  summarize(medianLifeExp = median(lifeExp))

In [None]:
gapminder %>%
  filter(year == 1957) %>%
  summarize(medianLifeExp = median(lifeExp), maxGdpPercap = max(gdpPercap))

## Group by


In [None]:
# Find median life expectancy and maximum GDP per capita in each year

gapminder %>%
  group_by(year) %>%
  summarize(medianLifeExp = median(lifeExp), maxGdpPercap = max(gdpPercap))
  
# Interesting: notice that median life expectancy across countries is generally going up over time, but maximum GDP per capita is not.

In [None]:
# Find median life expectancy and maximum GDP per capita in each continent in 1957
gapminder %>%
  group_by(continent) %>%
  filter(year == 1957) %>%
  summarize(medianLifeExp = median(lifeExp), maxGdpPercap = max(gdpPercap))
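
A grouped summary can also report how many rows went into each group. The sketch below is an illustration under assumptions: the column `numCountries` and the name `by_continent_1957` are not from the original. It filters before grouping, which gives the same result as grouping first.

```r
library(gapminder)
library(dplyr)

# n() counts the rows in each group; after filtering to a single year,
# each row is one country, so this is the number of countries.
# `by_continent_1957` and `numCountries` are hypothetical names.
by_continent_1957 <- gapminder %>%
  filter(year == 1957) %>%
  group_by(continent) %>%
  summarize(numCountries = n(),
            medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

by_continent_1957
```

Keeping the group size next to the median helps when reading the results, since a median over a handful of countries is less stable than one over dozens.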

## Visualizing summarized data


In [None]:
by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap))

# Create a scatter plot showing the change in medianLifeExp over time
ggplot(by_year, aes(x = year, y = medianLifeExp)) +
  geom_point() +
  expand_limits(y = 0)

In [None]:
# Summarize medianGdpPercap within each continent within each year: by_year_continent
by_year_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarize(medianGdpPercap = median(gdpPercap))

# Plot the change in medianGdpPercap in each continent over time
ggplot(by_year_continent, aes(x = year, y = medianGdpPercap, color = continent)) +
  geom_point() +
  expand_limits(y = 0)

In [None]:
# Summarize the median GDP per capita and median life expectancy per continent in 2007
by_continent_2007 <- gapminder %>% 
  group_by(continent) %>%
  filter(year == 2007) %>%
  summarize(medianLifeExp = median(lifeExp), medianGdpPercap = median(gdpPercap))

# Use a scatter plot to compare the median GDP per capita and median life expectancy
ggplot(by_continent_2007, aes(x = medianGdpPercap, y = medianLifeExp, color = continent)) +
  geom_point() +
  expand_limits(y = 0)

## Line Plots
Visualizing median GDP per capita by continent over time

In [None]:
# Summarize the median gdpPercap by year & continent, save as by_year_continent
by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap)) 

# Create a line plot showing the change in medianGdpPercap by continent over time
ggplot(by_year_continent, aes(x = year, y = medianGdpPercap, color = continent)) +
  geom_line() +
  expand_limits(y = 0)
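
When `summarize()` follows a `group_by()` on two variables, recent versions of dplyr (1.0.0 and later) keep the result grouped by the first variable and print a message saying so. As a hedged sketch, assuming such a version is installed, the `.groups = "drop"` argument returns an ungrouped table instead; the name `by_year_continent_flat` is made up for this example.

```r
library(gapminder)
library(dplyr)

# Same summary as above, but with the grouping dropped explicitly.
# Requires dplyr >= 1.0.0 for the .groups argument.
# `by_year_continent_flat` is a hypothetical name.
by_year_continent_flat <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap), .groups = "drop")

by_year_continent_flat
```

The plots are identical either way; dropping the groups mainly avoids surprises if the table is later passed to `mutate()` or `summarize()` again.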

## Bar plots
Visualizing median GDP per capita by continent

A bar plot is useful for visualizing summary statistics, such as the median GDP per capita in each continent.

In [None]:
# Summarize the median gdpPercap by continent in 1952
by_continent <- gapminder %>%
  group_by(continent) %>%
  filter(year == 1952) %>%
  summarize(medianGdpPercap = median(gdpPercap))

# Create a bar plot showing medianGdpPercap by continent
ggplot(by_continent, aes(x = continent, y = medianGdpPercap)) +
  geom_col()

## Histogram
A histogram is useful for examining the distribution of a numeric variable. In this exercise, you'll create a histogram showing the distribution of country populations in the year 1952.

In [None]:
gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Create a histogram of population (pop)
ggplot(gapminder_1952, aes(x = pop)) +
  geom_histogram()
  
#  Notice that most of the distribution is in the smallest (leftmost) bins.

Visualizing population with the x-axis on a log scale

In the last exercise you created a histogram of populations across countries. You might have noticed that several countries have a much higher population than the others, which makes the distribution very skewed, with most of it crammed into a small part of the graph. (Consider that it's hard to tell the median or the minimum population from that histogram.) Putting the x-axis on a log scale spreads the distribution out and makes the histogram more informative.

In [None]:
gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Create a histogram of population (pop), with x on a log scale
ggplot(gapminder_1952, aes(x = pop)) +
  geom_histogram() +
  scale_x_log10()

## Boxplots
Comparing GDP per capita across continents

A boxplot is useful for comparing a distribution of values across several groups. In this exercise, you'll examine the distribution of GDP per capita by continent. Since GDP per capita varies across several orders of magnitude, you'll need to put the y-axis on a log scale.

In [None]:
gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Create a boxplot comparing gdpPercap among continents
ggplot(gapminder_1952, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10() +
  ggtitle("Comparing GDP per capita across continents")