# Set-up

🚀 Get the R magic going:

In [0]:
%load_ext rpy2.ipython

In [0]:
%%R
install.packages("ggplot2")   # install the package once
library("ggplot2")            # load the installed package in each new session

ggplot2 comes with a selection of built-in datasets. Let's use the ggplot2 [mpg](https://ggplot2.tidyverse.org/reference/mpg.html) dataset.

In [0]:
%%R
head(mpg)

Or you can use `read.csv` to load data from a CSV file:

In [0]:
%%R
df <- read.csv("/content/sample_data/california_housing_test.csv")
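
After loading a CSV it usually helps to check what arrived. The sketch below is only an illustration: it writes a tiny, made-up CSV to a temporary file (the column names and values are hypothetical, not taken from the housing data) so that it runs anywhere, then reads it back and inspects it.

```R
# Hypothetical two-row CSV written to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("displ,cty", "1.8,18", "2.0,21"), path)

df_demo <- read.csv(path)   # header = TRUE is the default for read.csv
str(df_demo)                # column names and types
head(df_demo)               # first rows of the data frame
```

The same `str()` and `head()` calls can be applied to `df` once the real file has been read.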

# Scatterplot

This is a minimal scatterplot with the mpg dataset:

In [0]:
%%R
ggplot(mpg, aes(x = displ, y = cty)) + #data and mappings
 geom_point() #geometry

# Bar chart

This is a minimal bar chart with the mpg dataset:

In [0]:
%%R
ggplot(mpg, aes(x = manufacturer)) +
 geom_bar()

🧙 The magic is in the `stat` of `geom_bar()`! Its default statistical transformation is `stat = "count"`, which counts the rows in each group, so only an `x` mapping is needed.

In [0]:
%%R
?geom_bar
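
To make the role of the `count` stat concrete, here is a minimal sketch that builds the same chart twice: once letting `geom_bar()` do the counting, and once from counts computed by hand and drawn with `geom_col()`, which plots the `y` values as given. The object names (`p_bar`, `p_col`, `counts`) are just illustrative.

```R
library(ggplot2)

# geom_bar() counts the rows per manufacturer for us (stat = "count")
p_bar <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar()

# The same bars from counts computed by hand; table() yields a Freq column
counts <- as.data.frame(table(manufacturer = mpg$manufacturer))
p_col <- ggplot(counts, aes(x = manufacturer, y = Freq)) +
  geom_col()

p_bar
p_col
```

Both plots should show identical bar heights; the second form is handy when the data already arrives aggregated.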

# Boxplot

In [0]:
%%R
ggplot(mpg, aes(x = manufacturer, y = hwy)) +
 geom_boxplot()

# Histogram

In [0]:
%%R
ggplot(mpg, aes(x = hwy)) +
 geom_histogram()

# Customizations
- Axis labels
- Title
- Graph size


In [0]:
%%R -h 400 -w 700
# -h and -w set the height and width of the figure

ggplot(mpg, aes(x = displ, y = cty, color=manufacturer)) +
 geom_point(alpha=0.5) +  #transparency
 xlab('displacement') +
 ylab('city (mpg)') +
 ggtitle('mpg dataset')
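
As an alternative sketch, the axis labels and title can also be set in a single `labs()` call, which additionally lets you rename the legend. This is plain R, so inside the notebook it would go in a `%%R` cell; the name `p` is only illustrative.

```R
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = cty, color = manufacturer)) +
  geom_point(alpha = 0.5) +        # transparency
  labs(
    x = "displacement",
    y = "city (mpg)",
    title = "mpg dataset",
    color = "manufacturer"         # legend title
  )
p
```

Whether to use `labs()` or the separate `xlab()`, `ylab()` and `ggtitle()` helpers is mostly a matter of taste.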


# Exercises

## 🤔 Exercise 1

Use `ggplot2` to create a scatterplot of the `iris` dataset of `Sepal.Length` vs. `Sepal.Width` where the dots are colored by `Species`. Set the axis labels and title. Use alpha, jitter, and a resized figure to deal with overplotting.

Hints:
- `iris` is one of R's built-in datasets (it ships with base R rather than with ggplot2); run `head(iris)` to see the data frame
- Add `geom_jitter()` to the plot to add jitter
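
To see what jitter does without giving the exercise away, here is a small sketch on the `mpg` data instead of `iris`. The `width`, `height` and `alpha` values are arbitrary choices for illustration.

```R
library(ggplot2)

# Overplotted version: many cars share the same (cyl, hwy) values
p_points <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point()

# Jittered version: a small random horizontal offset plus transparency
# separates points that would otherwise sit on top of each other
p_jitter <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

p_points
p_jitter
```

Setting `height = 0` keeps the `hwy` values exact and only spreads the points sideways.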

## 😜 Exercise 2

Use `ggplot2` to create a scatterplot of the `iris` dataset of `Sepal.Length` vs. `Sepal.Width` with the circle size mapped to `Petal.Length`. Set the axis labels and title. Use transparency and resize the figure to deal with overplotting.

Hints:
- Map `size` inside `aes()` on `geom_point` to set the size of the circles from a column
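
The difference between mapping an aesthetic to a column and setting it to a constant is easy to miss, so here is a minimal sketch on the `mpg` data (again, not the exercise dataset). The choice of `cty` for the size and `0.3` for alpha is arbitrary.

```R
library(ggplot2)

p_size <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # size is mapped to a column inside aes(); alpha is a fixed setting outside it
  geom_point(aes(size = cty), alpha = 0.3)
p_size
```

Anything inside `aes()` varies with the data and gets a legend; anything outside it applies the same value to every point.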


## 🤔 Exercise 3

Use `ggplot2` to create a scatterplot of `alt` vs. `ptime` for the `SMO-VOR-2015` dataset. Set the axis labels and title. Use transparency and resize the figure to deal with overplotting. Add a smooth regression line to the plot and map color to `month`.

Hints:
- Make `ptime` a datetime with:
```R
df$ptime <- as.POSIXct(df$ptime, format = "%Y-%m-%d %H:%M:%S", tz = "America/Los_Angeles")
```
- Add `geom_smooth()` to the plot to add a smooth regression line
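
Both hints can be tried out in isolation first. The sketch below parses two made-up timestamps (hypothetical values, written in the same format as the hint) and then adds `geom_smooth()` to a scatterplot of the `mpg` data rather than the exercise dataset.

```R
library(ggplot2)

# Hypothetical timestamps in the "%Y-%m-%d %H:%M:%S" format
stamps <- c("2015-01-15 08:30:00", "2015-07-04 17:45:00")
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S",
                     tz = "America/Los_Angeles")
class(parsed)             # "POSIXct" "POSIXt"
format(parsed, "%H:%M")   # extract parts of the datetime as text

# geom_smooth() picks a smoothing method automatically and reports it in a message
p_smooth <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.4) +
  geom_smooth()
p_smooth
```

Once a column is a real datetime, ggplot2 can place it on a continuous time axis instead of treating each timestamp as a separate category.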

## 🤔 Exercise 4

Use `ggplot2` to create a boxplot of `alt` by `month` for the `SMO-VOR-2015` dataset. Set the axis labels and title. Resize the figure as needed.

Hints:
- Order `month` with:
```R
df$month <- factor(df$month, levels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))
```
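
Why the explicit `levels` matter: without them, `factor()` orders the levels alphabetically, and ggplot2 follows the level order on the axis. The sketch below uses a short made-up vector (`months_raw` is hypothetical) and R's built-in `month.abb` constant, which holds the same twelve abbreviations as the hint.

```R
# Hypothetical month labels in arbitrary order
months_raw <- c("Mar", "Jan", "Dec", "Feb", "Jan")

# Without explicit levels, factor() sorts the levels alphabetically
levels(factor(months_raw))
# "Dec" "Feb" "Jan" "Mar"

# With explicit levels, calendar order is kept
months_ordered <- factor(months_raw, levels = month.abb)
levels(months_ordered)[1:3]
# "Jan" "Feb" "Mar"
```

Any value that is not listed in `levels` becomes `NA`, so it is worth checking that the spelling in the data matches the abbreviations.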

## 😜 Exercise 5

Use `ggplot2` to create a histogram of `alt` for the `SMO-VOR-2015` dataset. Set the axis labels and title. Resize the figure as needed.

## 😜 Exercise 6

Use `ggplot2` to create a histogram of `alt` for the `SMO-VOR-2015` dataset, faceted by `month`. Set the axis labels and title. Resize the figure as needed.

Hints:
- Order `month` with:
```R
df$month <- factor(df$month, levels=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))
```
- Add `facet_wrap(~month)` to the plot for faceting.
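
For a feel of faceting before tackling the exercise, here is a minimal sketch on the `mpg` data: one histogram of `hwy` per vehicle `class`. The `binwidth = 2` value is an arbitrary choice; without it, `geom_histogram()` falls back to 30 bins and prints a message suggesting you pick a better value.

```R
library(ggplot2)

p_facet <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~class)   # one panel per vehicle class
p_facet
```

The panels follow the order of the faceting variable's levels, which is why ordering `month` as a factor matters in the exercise.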