## 1 - What is R?


R is a computing environment that combines:

* a programming language called S, developed by John Chambers at Bell Labs, that implements the idea of programming with data (Chambers 1998),

* an extensive set of functions for classical and modern statistical data analysis and modeling,

* powerful numerical analysis tools for linear algebra, differential equations, and stochastics,

* graphics functions for visualizing data and model output,

* a modular and extensible structure that supports a vast array of optional **add-on packages**, and

* extensive help and documentation facilities.

* **free** and **open source**

* widely used both in academia and industry

* teaser:  http://shiny.rstudio.com/gallery


R is an open source software project, available for free download (R Core Team 2014a). Originally a research project in statistical computing (Ihaka and Gentleman 1996), it is now managed by a development team that includes a number of well-regarded statisticians, and is widely used by statistical researchers and working scientists as a platform for making new methods available to users.

R has been developed by statisticians and is hence very **convenient for actuaries**.


## 2 - What is RStudio?

RStudio (https://www.rstudio.com/) is an Integrated Development Environment (IDE) for R:

* like Microsoft Word, Excel, etc.
* built to help you write R code, run R code, and analyze data with R
* text editor, LaTeX integration, debugging tool, version control
* easy reporting via R Shiny

RStudio is one way to work with R. Another option is to use Jupyter Notebooks (https://jupyter.org/).

RStudio consists of four panes, each of which keeps track of separate information:

* R Console
* R Script
* Plot
* Help files

See a short video at https://www.rstudio.com/products/RStudio/#Desktop

## 3 - Calculations

### R as a simple calculator

In [44]:
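# Calculate 3 + 4
3 + 4
# Square root and other built-in functions
sqrt(2)
# Store values in variables and reuse them
x <- 3
y <- x^2
x + y
# pi is a built-in constant; the result is numerically close to 0
sin(2*pi)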

### Creating vectors

In [47]:
c(1, 5, 80)
2:11
a <- c(1, 6, 10, 22, 7, 13)
mean(a)
sum(a)

### Creating matrices and data frames

In [56]:
matrix()
m <- matrix(1:6, nrow=3, ncol=2, byrow = TRUE)
m

     [,1]
[1,]   NA

     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
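The byrow argument used above decides in which order the values fill the matrix. A minimal sketch of the difference (the names m_col and m_row are chosen here for illustration only):

```r
# By default, matrix() fills column by column
m_col <- matrix(1:6, nrow = 3, ncol = 2)
# With byrow = TRUE the same values fill row by row
m_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)

m_col[1, ]   # first row of the column-wise matrix: 1 4
m_row[1, ]   # first row of the row-wise matrix:    1 2
dim(m_row)   # number of rows and columns:          3 2
```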


In [60]:
data.frame()  # an empty data frame
df <- data.frame(Name = c("I", "You", "He"), Gender = as.factor(c(0, 1, 0)), Age = c(21, 47, 33))
df

Name,Gender,Age
I,0,21
You,1,47
He,0,33
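Each column of a data frame is a vector of its own type. The sketch below rebuilds the same data frame under the hypothetical name people, so that it runs on its own, and looks at single columns with `$`:

```r
# Same data as above, rebuilt so the example is self-contained
people <- data.frame(Name = c("I", "You", "He"),
                     Gender = as.factor(c(0, 1, 0)),
                     Age = c(21, 47, 33))

people$Age             # one column, returned as a numeric vector
levels(people$Gender)  # the distinct categories of the factor: "0" "1"
nrow(people)           # number of rows: 3
```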


### An **R statement** may consist of...

* an assignment: stores the result of a calculation under a name, here temp_a and temp_b

In [74]:
temp_a <- 3 * (4 + 2)
temp_b <- temp_a + 2.5

* the name of an object: displays the object

In [83]:
temp_a

* a call to a function: returns a numerical or graphical result

In [97]:
mean(c(temp_a,temp_b))
mn <- mean(c(temp_a,temp_b))

A function is called by its name followed by parentheses (), which contain the arguments.

### Assignment and Workspace

* Everything in R is an object and has a name, such as temp_a, mean, or mn.
* R stores objects in your workspace.

In [123]:
temp_a <- 3 * (4 + 2)

* ATTENTION: Overwriting an object in R throws no warning

In [135]:
temp_a <- temp_b^2
temp_a
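To see what is currently stored in the workspace, base R offers ls(), exists() and rm(). These functions are not used elsewhere in this tutorial, so take the following as a small sketch rather than a complete overview:

```r
temp_a <- 3 * (4 + 2)
temp_b <- temp_a + 2.5

ls()                 # names of all objects in the current workspace
exists("temp_a")     # TRUE: an object with this name exists

rm(temp_b)           # remove a single object from the workspace
exists("temp_b")     # FALSE after removal
```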

## 4 - Where to find help?

* Help about any function

In [154]:
# ?foo
?lm

lm {stats},R Documentation

Argument,Description
formula,"an object of class ""formula"" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’."
data,"an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called."
subset,an optional vector specifying a subset of observations to be used in the fitting process.
weights,"an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’,"
na.action,"a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful."
method,"the method to be used; for fitting, currently only method = ""qr"" is supported; method = ""model.frame"" returns the model frame (the same as with model = TRUE, see below)."
"model, x, y, qr","logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned."
singular.ok,logical. If FALSE (the default in S but not in R) a singular fit is an error.
contrasts,an optional list. See the contrasts.arg of model.matrix.default.
offset,"this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset."

Component,Description
coefficients,a named vector of coefficients
residuals,"the residuals, that is response minus fitted values."
fitted.values,the fitted mean values.
rank,the numeric rank of the fitted linear model.
weights,(only for weighted fits) the specified weights.
df.residual,the residual degrees of freedom.
call,the matched call.
terms,the terms object used.
contrasts,(only where relevant) the contrasts used.
xlevels,(only where relevant) a record of the levels of the factors used in fitting.


* If you have a question, google 'how do I ... with R'.
  + huge community
  + your question has most likely already been asked by somebody else
* Very useful and helpful Q&A website: http://stackexchange.com/
* Cheat Sheet for Base R: https://www.rstudio.com/resources/cheatsheets/
* R Reference Card: https://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf
* **Learning by doing** applies particularly to programming
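Besides ?foo, base R has a few more ways to look things up from the console. A short sketch (the variable name mean_like is made up for this example; the commented lines open help pages or run examples and are best tried interactively):

```r
args(quantile)                   # quick look at a function's arguments
mean_like <- apropos("^mean")    # names of visible objects starting with "mean"
mean_like

# help("mean")                   # same as ?mean
# help.search("linear model")    # same as ??"linear model"; searches installed help pages
# example(mean)                  # runs the examples from the help page of mean
```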

## 5 - Data Import and Export

How do we get our data into R?

### Loading data from R (for training)

There are several packages containing claims data for Non-Life insurance.

In [195]:
library(MASS)
#library(CASdatasets)
#library(insuranceData)

For example, load the data "Insurance" from the MASS package:

In [207]:
data("Insurance")
?Insurance
head(Insurance)

District,Group,Age,Holders,Claims
1,<1l,<25,197,38
1,<1l,25-29,264,35
1,<1l,30-35,246,20
1,<1l,>35,1680,156
1,1-1.5l,<25,284,63
1,1-1.5l,25-29,536,84


Insurance {MASS},R Documentation


### Loading data from file

Often, your data is available as a .csv file. Check *?read.table* for more information about this function.

In [236]:
data_path <- "/home/s3m3wx/SAA_Training_2019/"
# paste0() joins the folder and the file name without a separator
df <- read.table(paste0(data_path, "dataCar.csv"), header = TRUE, sep = ",")
# see also ?read.csv; many other specialized import packages are available

Some useful functions to correctly load/save the data.

Let's set the working directory, i.e. the folder from which to operate (e.g., to save to and load from). Use:

In [254]:
getwd()  ## print the current working directory
#setwd("yourpath")

Or alternatively in RStudio use 'Session -> Set Working Directory -> Choose Directory...'
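A minimal sketch of changing the working directory and switching back. It uses tempdir() only as a folder that is guaranteed to exist; old_wd and csv_file are illustrative names, and file.path() is a base R helper not mentioned above:

```r
old_wd <- getwd()          # remember the current working directory

setwd(tempdir())           # tempdir() is used here only as a folder that surely exists
getwd()                    # now points to the temporary folder

# file.path() builds a path from its parts with the right separator
csv_file <- file.path(getwd(), "dataCar.csv")
csv_file

setwd(old_wd)              # switch back to where we started
```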

Importing data in R is easy. There are different ways, depending on the format (csv, txt, xlsx, etc.).
Alternative: use the 'Import Dataset' tool in RStudio (upper-right panel).

In [275]:
data_path <- "/home/s3m3wx/SAA_Training_2019/"
df <- read.table(paste0(data_path, "dataCar.csv"), header = TRUE, sep = ",")
str(df)

'data.frame':	67856 obs. of  12 variables:
 $ X        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ veh_value: num  1.06 1.03 3.26 4.14 0.72 2.01 1.6 1.47 0.52 0.38 ...
 $ exposure : num  0.304 0.649 0.569 0.318 0.649 ...
 $ clm      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ numclaims: int  0 0 0 0 0 0 0 0 0 0 ...
 $ claimcst0: num  0 0 0 0 0 0 0 0 0 0 ...
 $ veh_body : Factor w/ 13 levels "BUS","CONVT",..: 4 4 13 11 4 5 8 4 4 4 ...
 $ veh_age  : int  3 2 2 2 4 3 3 2 4 4 ...
 $ gender   : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 2 1 1 ...
 $ area     : Factor w/ 6 levels "A","B","C","D",..: 3 1 5 4 3 3 1 2 1 2 ...
 $ agecat   : int  2 4 2 2 2 4 4 6 3 4 ...
 $ X_OBSTAT_: Factor w/ 1 level "01101    0    0    0": 1 1 1 1 1 1 1 1 1 1 ...


To save or write data to a file:

In [287]:
#write.table(df, file = "xy.txt", sep = " ")

where df is the data object to be stored in xy.txt.

Excel files: use CSV

* write.csv(...): comma as separator, point as decimal mark
* write.csv2(...): semicolon as separator, comma as decimal mark
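Writing a data frame to CSV and reading it back could look as follows. This is only a sketch: the data frame claims is made up for illustration, and tempfile() stands in for a real project path.

```r
# A tiny made-up data frame, for illustration only
claims <- data.frame(id = 1:3, amount = c(120.5, 0, 310.25))

out_file <- tempfile(fileext = ".csv")                   # temporary file, not a real project path
write.csv(claims, file = out_file, row.names = FALSE)    # comma-separated, decimal point

claims_back <- read.csv(out_file)                        # read the file back in
all.equal(claims, claims_back)                           # TRUE if nothing was lost

# write.csv2() writes semicolons and decimal commas; read.csv2() reads that format
out_file2 <- tempfile(fileext = ".csv")
write.csv2(claims, file = out_file2, row.names = FALSE)
claims_back2 <- read.csv2(out_file2)
```

Setting row.names = FALSE avoids an extra unnamed column of row numbers in the file.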

## 6 - R Objects

Everything in R is an object, for example:

* data frame: the most essential data structure in R

In [327]:
str(Insurance)

'data.frame':	64 obs. of  5 variables:
 $ District: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Group   : Ord.factor w/ 4 levels "<1l"<"1-1.5l"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ Age     : Ord.factor w/ 4 levels "<25"<"25-29"<..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Holders : int  197 264 246 1680 284 536 696 3582 133 286 ...
 $ Claims  : int  38 35 20 156 63 84 89 400 19 52 ...


* vector, e.g. a column of the data set Insurance

In [339]:
Age <- Insurance$Age
str(Age)

 Ord.factor w/ 4 levels "<25"<"25-29"<..: 1 2 3 4 1 2 3 4 1 2 ...


The goal is often to look at or use only part of an object. To access part of an object, use [ ]:

* for vectors: myvector[x]
* for two-dimensional objects, e.g. data frames or matrices: mydata.frame[x, y]

Specify the indices by a vector (e.g. c(1, 2, 6)) and separate the indices of different dimensions by commas.
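The two indexing forms described here can be tried on the small vector and matrix from the Calculations section, recreated below so that the sketch is self-contained:

```r
a <- c(1, 6, 10, 22, 7, 13)
a[2]             # second element: 6
a[c(1, 2, 6)]    # elements 1, 2 and 6: 1 6 13
a[-1]            # everything except the first element

m <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
m[2, 1]          # row 2, column 1: 3
m[ , 2]          # all rows of column 2: 2 4 6
```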

Let us play around with the indexing of a data frame, a two-dimensional object!

In [351]:
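Insurance[ , ]              # empty indices select all rows and all columns
c(1, 3, 7)                  # an index vector
1:10                        # the integers from 1 to 10
Insurance[1:10, ]           # first ten rows, all columns
Insurance[-c(1, 3, 7), ]    # negative indices are excluded
Insurance[c(1, 3, 6), 2:3]  # rows 1, 3 and 6, columns 2 and 3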

District,Group,Age,Holders,Claims
1,<1l,<25,197,38
1,<1l,25-29,264,35
1,<1l,30-35,246,20
1,<1l,>35,1680,156
1,1-1.5l,<25,284,63
1,1-1.5l,25-29,536,84
1,1-1.5l,30-35,696,89
1,1-1.5l,>35,3582,400
1,1.5-2l,<25,133,19
1,1.5-2l,25-29,286,52


District,Group,Age,Holders,Claims
1,<1l,<25,197,38
1,<1l,25-29,264,35
1,<1l,30-35,246,20
1,<1l,>35,1680,156
1,1-1.5l,<25,284,63
1,1-1.5l,25-29,536,84
1,1-1.5l,30-35,696,89
1,1-1.5l,>35,3582,400
1,1.5-2l,<25,133,19
1,1.5-2l,25-29,286,52


Row,District,Group,Age,Holders,Claims
2,1,<1l,25-29,264,35
4,1,<1l,>35,1680,156
5,1,1-1.5l,<25,284,63
6,1,1-1.5l,25-29,536,84
8,1,1-1.5l,>35,3582,400
9,1,1.5-2l,<25,133,19
10,1,1.5-2l,25-29,286,52
11,1,1.5-2l,30-35,355,74
12,1,1.5-2l,>35,1640,233
13,1,>2l,<25,24,4


Row,Group,Age
1,<1l,<25
3,<1l,30-35
6,1-1.5l,25-29


## 7 - R Functions

Example function calls

In [370]:
mean(Insurance$Claims)
quantile(Insurance$Claims)
quantile(Insurance$Claims, probs = c(0.75, 0.9))

Always check a function's help page, e.g. with ?mean and ?quantile.

Functions have mandatory and optional arguments, for example mean(x, trim = 0, na.rm = FALSE, ...):

* x: mandatory argument
* trim: optional argument, default is 0
* na.rm: optional argument, default is FALSE

The arguments of a function have a defined order and each argument has its own unique name.

In [396]:
mean(x = Insurance$Claims, na.rm = TRUE)

You can either use the names of the arguments, or place the values in the correct order (or a mix of both):

In [408]:
mean(Insurance$Claims, , TRUE)  # trim is left empty and keeps its default
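That named and positional arguments lead to the same result can be checked directly. A small sketch, using the first six Claims values from head(Insurance) as a plain vector (the names claims, by_name, by_position and reordered are chosen here for illustration):

```r
# First six Claims values, copied from head(Insurance)
claims <- c(38, 35, 20, 156, 63, 84)

by_name     <- quantile(x = claims, probs = c(0.75, 0.9))   # all arguments named
by_position <- quantile(claims, c(0.75, 0.9))               # arguments in the defined order
reordered   <- quantile(probs = c(0.75, 0.9), x = claims)   # named arguments in any order

identical(by_name, by_position)   # TRUE
identical(by_name, reordered)     # TRUE
```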

Examples of functions with no mandatory arguments: matrix(), vector(), array(), list(). See ?matrix.

## 8 - Useful functions

Useful functions (look for help by typing ?str):

* str()
* nrow() and ncol()
* dim()
* summary()
* apply()
* head() and tail()

See also the R Reference Card.
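The functions listed above can be tried on a small data frame. The sketch below uses the Holders and Claims values of the first six rows shown by head(Insurance); the names ins and col_sums are illustrative.

```r
# First six rows of Holders and Claims, copied from head(Insurance)
ins <- data.frame(Holders = c(197, 264, 246, 1680, 284, 536),
                  Claims  = c(38, 35, 20, 156, 63, 84))

str(ins)               # compact overview of the structure
dim(ins)               # rows and columns: 6 2
nrow(ins); ncol(ins)   # the same information, one number at a time
summary(ins)           # minimum, quartiles, mean and maximum per column
head(ins, 2)           # first two rows
tail(ins, 2)           # last two rows

col_sums <- apply(ins, 2, sum)   # apply sum() to every column (MARGIN = 2)
col_sums                         # Holders 3207, Claims 396
```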

## 9 - Packages

By default, R only provides a basic set of functions. Additional functions (and datasets) are obtained by loading **add-on packages**.

Install and load the "MASS" package (https://cran.r-project.org/web/packages/MASS/MASS.pdf). There is always a PDF containing information about the package.

In [451]:
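# install.packages("MASS")  # install onto the computer once
library(MASS)   # load the package in every R session
require(MASS)   # alternative to library(); returns FALSE instead of an error if the package is missing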
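A common pattern is to install a package only when it is not yet available. The following is a sketch of that idea, not the only way to do it; pkg is a hypothetical variable name, and requireNamespace() is a base R function that the text above does not cover:

```r
pkg <- "MASS"

# requireNamespace() returns TRUE if the package is installed, without attaching it
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)   # needs an internet connection and a configured CRAN mirror
}

library(pkg, character.only = TRUE)   # attach the package for this session
"package:MASS" %in% search()          # TRUE once the package is attached
```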

Online resources:

* list of all packages: http://cran.r-project.org/web/packages/

* by topic: http://cran.r-project.org/web/views/

* ask Google

## 10 - Further reading

To continue at this level, see
ftp://ess.r-project.org/users/sfs/RKurs/R.Intro/slides.pdf

Google's R Style Guide ("how to write good R code")
https://google.github.io/styleguide/Rguide.xml

RStudio Cheat Sheets (fantastic!)
https://www.rstudio.com/resources/cheatsheets/