## Set Up

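When R runs inside a conda environment (for example, under Jupyter Notebook), it is better to install R packages with `conda`, e.g. `conda install -c r r-tidyverse`, than with `install.packages()`, which can fail to build packages in that environment.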

* [Using R on Jupyter Notebook](https://dzone.com/articles/using-r-on-jupyternbspnotebook)
* [Using `conda` to install R packages](https://community.rstudio.com/t/failing-to-install-r-packages/104963/5)

## R base functions

| Function | Example | Explanation
| :------- |:------- | :-----------
| rm(list=ls()) | rm(list=ls()) | Removes all the current variables in the workspace.
| install.packages() | install.packages('gapminder') | Installs the given package.
| library() | library('gapminder') | Loads the given package and attaches it to the search path.
| detach() | detach('package:gapminder', unload=TRUE) | Detaches the given package from the search path (and unloads it).
| c() | c('a', 'b', 'c') | (Stands for concatenate) Creates a vector.
| seq() | seq(from=1, to=10, by=2) | (Stands for sequence) Creates a sequence.
| length() | length(v) | Returns the length of vector `v`.
| class() | class(v) | Returns the class of vector `v`.
| as.character() | as.character(v) | Turns the elements of vector `v` into characters.
| as.numeric() | as.numeric(v) | Turns the elements of vector `v` into numeric.
| as.factor() | as.factor(v) | Turns the elements of vector `v` into factors.
| sort() | sort(v, decreasing=FALSE) | Returns the sorted vector in increasing order.
| order() | order(v) | Returns the indices that would sort `v`, so `v[order(v)]` equals `sort(v)`.
| rank() | rank(v) | Returns the rank of each element of `v`, i.e. its position in the sorted vector.
| max() | max(v) | Returns the maximum value in vector `v`.
| min() | min(v) | Returns the minimum value in vector `v`.
| which.max() | which.max(v) | Returns the index of the maximum value in vector `v`.
| which.min() | which.min(v) | Returns the index of the minimum value in vector `v`.
| which() | which(df\$x==10) | Returns the index/indices of the entries that satisfy the condition.
| match() | match(10, df\$x) | Returns the index of the first entry equal to the given value (`NA` if there is none).
| %in% | 10 %in% df\$x | Returns `TRUE` if the value is in the object.
| getwd() | getwd() | Returns the working directory.
| setwd() | setwd(choose.dir()) | Sets the working directory. `choose.dir()` opens a folder browser (Windows only); otherwise pass a path string.
| ls() | ls() | Lists all the objects in the workspace.
| set.seed() | set.seed(5) | Sets the seed of R's random number generator. This is useful to ensure reproducible results.
| make.names() | make.names(names(v)) | Makes syntactically valid names out of character vectors.
| str() | str(o) | Displays the internal structure of object `o`.
| names() | names(o) | Displays the column names of object `o`.
| dim() | dim(o) | Displays the dimensions of object `o`.
| table() | table(o) | (Think cross-tabs in SPSS) It is a count of the categorical values.
| prop.table() | prop.table(table(o)) | Proportions table.
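
The difference between `sort()`, `order()` and `rank()`, and between `which()`, `match()` and `%in%`, is easiest to see on a small vector. The following is a minimal sketch using a made-up vector `v`:

```r
v <- c(30, 10, 20)

sort(v)        # 10 20 30 : the values in increasing order
order(v)       # 2 3 1    : positions in v of the smallest, next smallest, ...
rank(v)        # 3 1 2    : where each element of v lands after sorting
v[order(v)]    # 10 20 30 : same result as sort(v)

which.max(v)   # 1        : index of the largest value
which(v > 15)  # 1 3      : every index where the condition holds
match(20, v)   # 3        : index of the first entry equal to 20
20 %in% v      # TRUE     : membership test
```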

## `readr` functions

| Function | Example | Explanation
| :------- |:------- | :-----------
| read_csv() | read_csv(file, col_types=cols()) | Reads in a `.csv` file. Passing `col_types=cols()` suppresses the column specification message while still guessing the column types.
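
As a rough illustration, the sketch below writes a tiny, made-up CSV to a temporary file and reads it back. The file contents and the column names `name` and `score` are assumptions for the example only.

```r
library(readr)

# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1.5", "b,2"), path)

# col_types = cols() lets readr guess the column types without printing the spec.
df <- read_csv(path, col_types = cols())

# The types can also be stated explicitly instead of guessed.
df_typed <- read_csv(
  path,
  col_types = cols(name = col_character(), score = col_double())
)
```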

## `dplyr` functions

| Function | Example | Explanation
| :------- |:------- | :-----------
| filter() | filter(df, col1<5) | Subsets the specified rows from `df`.
| select() | select(df, col1, col2, col3, ...) | Subsets the specified columns from `df`.
| mutate() | mutate(df, col_name=values) | Adds a column to `df` (or overwrites an existing one).
| arrange() | arrange(df, col) | Sorts the rows of `df` by the column(s) of interest.
| summarise() | summarise(df, mean=mean(col), sd=sd(col)) | Creates a new data frame containing the requested summaries (one row per group if `df` is grouped).
| pull() | pull(df, col) | Extracts a single column from `df` and returns it as a vector.
| group_by() | group_by(df, col) | Takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". `ungroup()` removes grouping.
| distinct() | distinct(df) | Retains only unique/distinct rows from an input tbl.
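
These verbs are usually chained together. The sketch below runs several of them on a small, made-up data frame; the column names `group` and `x` and the use of the `%>%` pipe (re-exported by `dplyr`) are choices made for this example.

```r
library(dplyr)

df <- data.frame(
  group = c("a", "a", "b", "b"),
  x     = c(1, 2, 3, 4)
)

result <- df %>%
  filter(x > 1) %>%                  # keep rows where x > 1
  mutate(x2 = x^2) %>%               # add a squared column
  group_by(group) %>%                # later steps run per group
  summarise(mean_x2 = mean(x2)) %>%  # one row per group
  arrange(desc(mean_x2))             # largest mean first

# pull() returns the column as a plain vector: 12.5 4
means <- pull(result, mean_x2)
```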

## `tidyr` functions

| Function | Example | Explanation
| :------- |:------- | :-----------
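| gather() | gather(data, key, value, selected_cols) | Takes the selected columns and collapses them into key-value pairs, duplicating all other columns (superseded by `pivot_longer()` in newer versions of `tidyr`).
| spread() | spread(data, key, value) | Spreads a key-value pair across multiple columns (superseded by `pivot_wider()` in newer versions of `tidyr`).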
| separate() | separate(data, col, into, sep) | Turns a single column into multiple columns.
| unite() | unite(data, col, selected_cols, sep, remove=TRUE) | Pastes together the selected columns into one (and by default removes the separate columns).
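
A short round trip shows how the four functions relate. This is a sketch on made-up data; the column names (`id`, `y2019`, `y2020`, `date`) are assumptions for illustration.

```r
library(tidyr)

wide <- data.frame(id = c(1, 2), y2019 = c(10, 20), y2020 = c(11, 21))

# Wide -> long: the two year columns collapse into key-value pairs.
long <- gather(wide, key = "year", value = "value", y2019, y2020)

# Long -> wide: spread() undoes the gather.
back <- spread(long, key = year, value = value)

# separate() splits one column into several; unite() pastes them back together.
d      <- data.frame(date = c("2020-01", "2021-02"))
parts  <- separate(d, date, into = c("year", "month"), sep = "-")
joined <- unite(parts, "date", year, month, sep = "-")
```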

## `ggplot2` functions

**One variable: Discrete**

| Function | Explanation
| :------- | :-----------
| geom_bar() | Creates a bar plot.

**One variable: Continuous**

| Function | Explanation
| :------- | :----------
| geom_histogram() | Creates a histogram plot.
| geom_freqpoly() | Creates a frequency polygon.
| geom_density() | Creates a smooth density plot.
| stat_ecdf() | Creates an empirical cumulative distribution function plot.
| geom_boxplot() | Creates a box and whisker plot.

**Two variables: Discrete/Categorical and Continuous**

| Function | Explanation
| :------- | :-----------
| geom_boxplot() | Creates a boxplot.
| geom_violin() | Creates a violin plot.
| geom_jitter() | Creates a jittered strip plot. With a discrete $x$ variable, `geom_point()` places the points of each category along a straight line, which can lead to overplotting. `geom_jitter()` adds a small amount of random noise to the points (set `height=0` to jitter horizontally only) so we can see where each individual point lies.

**Two variables: Continuous and Continuous**

| Function | Explanation
| :------- | :-----------
| geom_point() | Creates a scatter plot.
| geom_smooth() | Adds smoothed lines such as a regression line.
| geom_line() | Creates a line plot (e.g. a time series plot).
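
Layers are combined with `+`. The sketch below uses the built-in `mtcars` data set (chosen only for illustration) to draw a scatter plot with a fitted regression line on top.

```r
library(ggplot2)

# Scatter plot of weight against fuel economy, with a linear fit overlaid.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Typing p (or print(p)) at the console or in a notebook cell draws the plot.
```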

**Scatter plot matrices**

| Function | Explanation
| :------- | :-----------
| pairs() | Creates a matrix of pairwise scatter plots, which shows the relationships between the numeric features (base `R`).
| corrplot(cor()) | Plots the correlation matrix itself, which is especially handy when datasets are large (`corrplot` package).

**Grouping within aesthetics**

| Function | Explanation
| :------- | :-----------
| ggplot(df, aes(x, y, group=g, colour=g, fill=g)) | Groups values by the variable `g` and distinguishes the groups by colour and/or fill.

**Scaling the axis**

| Function | Explanation
| :------- | :-----------
| scale_x_continuous(trans="") | Transforms the $x$ axis.
| scale_y_continuous(trans="") | Transforms the $y$ axis.
| scale_x_log10() | Applies a base-10 log transformation to the $x$ axis.
| scale_y_log10() | Applies a base-10 log transformation to the $y$ axis.
| scale_x_reverse() | Reverses the $x$ axis.
| scale_y_reverse() | Reverses the $y$ axis.
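
Scale functions are added to a plot like any other layer. A minimal sketch, again assuming the built-in `mtcars` data purely as an example:

```r
library(ggplot2)

# Engine displacement on a log10 x axis, fuel economy on a reversed y axis.
p_scaled <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  scale_x_log10() +
  scale_y_reverse()
```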

**Arranging plots side-by-side**

| Function | Explanation
| :------- | :-----------
| facet_grid() | Facets by up to two variables, using columns to represent one variable and rows to represent the other.
| facet_wrap() | Facets by one variable and wraps the panels onto additional rows for neatness.
| grid.arrange() | Arranges separate plots side-by-side (`gridExtra` package).

**Note:** The nice thing about faceting compared with `grid.arrange()` is that it keeps the axes fixed across all panels by default. Think `sharex=True` and `sharey=True` in Python.
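
The two faceting functions differ mainly in how the panels are laid out. The sketch below assumes the built-in `mtcars` data, with `cyl` and `am` picked as example faceting variables.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# One panel per value of cyl, wrapped onto as many rows as needed.
p_wrap <- base + facet_wrap(~ cyl)

# Rows for am, columns for cyl; all panels share the same axes by default.
p_grid <- base + facet_grid(am ~ cyl)
```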

**Themes and labels**

| Function | Explanation
| :------- | :-----------
| geom_text() | Adds text labels to points; the `label` aesthetic is mapped to a variable within `aes()`.
| xlab() | Adds an $x$ label.
| ylab() | Adds a $y$ label.
| ggtitle() | Adds a title to the plot.
| theme_economist() | Produces plots that are similar in style to that of the Economist (`ggthemes` package).
| theme_fivethirtyeight() | Produces plots that are similar in style to that of the FiveThirtyEight website (`ggthemes` package).
| theme_bw() | Produces plots with a black and white theme.