# Managing graphics with R

R offers two main graphical systems: `base` and `grid`. The latter underlies two widely used packages, `lattice` and `ggplot2`. We will use `ggplot2`, which relies on the idea of a "Grammar of Graphics".

In [None]:
library(ggplot2)
theme_set(theme_bw())

The instructions above load the required package and set a default theme. They are meant to be run only once, when the R session starts. It is still possible, however, to change the theme at any time, or to override it inline when building a custom graphical display.
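
As a minimal sketch of the two ways of changing the theme mentioned above, the following uses the built-in `mtcars` dataset purely for illustration (it is not part of this handout's data). The object names `p` and `old` are arbitrary.

```r
library(ggplot2)

# Set a session-wide default; theme_set() invisibly returns the previous theme
old <- theme_set(theme_bw())

# A basic scatterplot on the built-in mtcars data (illustration only)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

p                    # drawn with the session default, theme_bw()
p + theme_minimal()  # inline override, for this display only

# Restore whatever theme was active before this sketch
theme_set(old)
```

Adding a theme with `+` affects only the plot it is attached to, whereas `theme_set()` changes the default for every plot drawn afterwards.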

Let's look at a random sample of the GSOEP dataset described in the handout "lang-r-base". Recall that this is a Stata file built upon the [German Socio Economic Survey](https://www.eui.eu/Research/Library/ResearchGuides/Economics/Statistics/DataPortal/GSOEP) from 2009.

In [None]:
library(haven)
d <- read_dta("data/gsoep09.dta")
head(d)

## Data preprocessing

We first keep only a subset of 21 variables and draw a random sample of 10% of the rows of the original dataset:

In [None]:
vars <- c("persnr", "hhnr2009", "ybirth", "sex", "mar", "egp", "yedu", "income", "rel2head", 
          "wor01", "wor02", "wor03", "wor04", "wor05", "wor06", "wor07", "wor08", "wor09", "wor10", "wor11", "wor12")
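set.seed(101)  # fix the random seed so that the sample is reproducible
idx <- sample(seq_len(nrow(d)), floor(nrow(d) * 0.1))  # row indices for a 10% sample
d <- subset(d[idx, ], select = vars)  # keep the sampled rows and the selected columns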
dim(d)
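
To see why the seed matters, here is a small sketch of the same sampling pattern on a toy data frame. The data frame `toy` and its columns are hypothetical and only stand in for the GSOEP data.

```r
# Toy data frame standing in for the real dataset (hypothetical values)
toy <- data.frame(id = 1:50, x = (1:50)^2, z = rep(letters[1:5], 10))

# First draw: 10% of the rows, without replacement
set.seed(101)
idx1 <- sample(seq_len(nrow(toy)), floor(nrow(toy) * 0.1))

# Second draw with the same seed: the same row indices are selected
set.seed(101)
idx2 <- sample(seq_len(nrow(toy)), floor(nrow(toy) * 0.1))
identical(idx1, idx2)

# Keep the sampled rows and two of the three columns
s <- subset(toy[idx1, ], select = c("id", "x"))
dim(s)
```

Resetting the seed before each call makes the draw repeatable, so anyone rerunning the notebook works with the same 10% sample.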

The next step consists of recoding categorical variables and computing auxiliary variables:

In [None]:
d$persnr <- factor(d$persnr)
d$hhnr2009 <- factor(d$hhnr2009)
d$sex <- droplevels(as_factor(d$sex))
d$mar <- droplevels(as_factor(d$mar))
d$egp <- droplevels(as_factor(d$egp))
d$rel2head <- droplevels(as_factor(d$rel2head))
d$age <- 2009 - d$ybirth  # age in the survey year (2009)
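
The recoding pattern used above can be illustrated on a small labelled vector. This is a sketch under assumptions: the values and labels below are made up, and the actual GSOEP labels may differ.

```r
library(haven)

# Hypothetical labelled vector mimicking a Stata variable with value labels;
# the "Refusal" label is defined but never occurs in the data
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2, Refusal = -1))

f <- as_factor(sex)  # factor whose levels are taken from the value labels
levels(f)

f2 <- droplevels(f)  # drop the levels that do not occur in the data
levels(f2)

# Age in the survey year, computed from hypothetical birth years
ybirth <- c(1950, 1984, 1971, 1992)
age <- 2009 - ybirth
age
```

Calling `droplevels()` after `as_factor()` avoids carrying empty categories, such as missing-value codes, into tables and plot legends.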