# A small introduction to R

## Installation of R packages

R is open-source statistical software. Many people contribute to R by publishing packages that data analysts can use. In this course we will use the packages dplyr and tidyr for data manipulation and ggplot2 for data visualisation. These packages are written by Hadley Wickham. Check his [homepage](http://hadley.nz/) for more packages.



There are many more very useful packages. If you want to carry out a specific kind of analysis, search for a package that supports it. E.g. for panel data analysis, you could use the package "plm".

To install a package, simply type:

install.packages("package name")



In [4]:
install.packages("plm")

Installing package into 'C:/Users/mcmik/Documents/R/win-library/3.3'
(as 'lib' is unspecified)


ERROR: Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror
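The error above says that R does not know which CRAN mirror to download from. A minimal sketch of one way around this, assuming the general-purpose mirror https://cloud.r-project.org is reachable from your machine, is to set the mirror for the session before installing. The install call itself is left as a comment so that the snippet runs without network access.

```r
# Choose a CRAN mirror for the current R session.
# The URL is an assumption; any CRAN mirror near you works as well.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Check which mirror is now configured.
print(getOption("repos"))

# With a mirror set, the install call should no longer fail:
# install.packages("plm")
```

Alternatively, the mirror can be passed directly for a single call with the "repos" argument of install.packages.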


## Basic computations

R can be used as a calculator.


In [13]:
3+8

In [14]:
log(100)

If you need irrational numbers (an irrational number is a real number that cannot be expressed as a ratio of integers, i.e. as a fraction) such as $\pi$ and Euler's number $e$, you can write "pi" and "exp(1)".


In [15]:
exp(1)

In [12]:
pi
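Note that "log" computes the natural logarithm (base $e$), which is easy to overlook. The sketch below compares it with the base-10 logarithm and uses "pi" in a small calculation; the variable names are chosen for illustration only.

```r
# log() is the natural logarithm (base e).
natural_log <- log(100)

# Two equivalent ways to get the base-10 logarithm.
base10_log  <- log10(100)
same_base10 <- log(100, base = 10)

# exp() is the inverse of log(), so this returns the original number.
back_to_100 <- exp(natural_log)

# pi can be used like any other number, e.g. the area of a circle with radius 2.
circle_area <- pi * 2^2

print(c(natural_log, base10_log, same_base10, back_to_100, circle_area))
```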

You can create new variables with the assignment operator "<-":

In [20]:
x <- exp(1) + pi

To get the answer, type x.

In [21]:
x

It is even possible to work with complex numbers:

In [27]:
x <- 2
y <- 5
 
z <- complex(real = x, imaginary = y)

In [26]:
z
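Once a complex number exists, base R offers functions to take it apart again. The following sketch rebuilds the same number as above and shows its real part, imaginary part, modulus and conjugate.

```r
# Build the complex number 2 + 5i, as in the cell above.
z <- complex(real = 2, imaginary = 5)

real_part <- Re(z)    # real part
imag_part <- Im(z)    # imaginary part
modulus   <- Mod(z)   # distance from the origin, sqrt(2^2 + 5^2)
conjugate <- Conj(z)  # same real part, imaginary part with opposite sign

print(real_part)
print(imag_part)
print(modulus)
print(conjugate)

# sqrt() of a negative number only works if the input is complex.
i_unit <- sqrt(as.complex(-1))
print(i_unit)
```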

## Data

The five most commonly used basic data types in R are:

1. Character
2. Numeric (Real Numbers)
3. Integer (Whole Numbers)
4. Complex
5. Logical (True / False)
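The function "class" reports the type of a value. As a small illustration with made-up values, one per type in the list above:

```r
# One example value for each of the five types.
types <- c(
  character = class("hello"),
  numeric   = class(3.14),
  integer   = class(3L),      # the L suffix marks an integer literal
  complex   = class(2 + 5i),
  logical   = class(TRUE)
)

print(types)
```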

It is possible to convert one data type into another.


For example, we can create a vector: 



In [None]:
a <- c(1,2,3,4,5)

We can check whether the vector is numeric with "str":

In [31]:
str(a)

 num [1:5] 1 2 3 4 5


Suppose we want to change this vector into a character vector. We can do that with:

In [32]:
a <- as.character(a)

Again, we can check the result with:

In [33]:
str(a)

 chr [1:5] "1" "2" "3" "4" "5"
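The conversion also works in the other direction, as long as every element looks like a number. The sketch below converts the character vector back and shows what happens when an element cannot be converted; the vector "mixed" is made up for illustration.

```r
a <- c(1, 2, 3, 4, 5)

# Numeric to character and back again.
a_chr <- as.character(a)
a_num <- as.numeric(a_chr)

print(is.numeric(a_chr))  # the character version is no longer numeric
print(is.numeric(a_num))  # the round trip restores a numeric vector

# Elements that do not look like numbers become NA.
# Without suppressWarnings(), R would also print a warning here.
mixed <- suppressWarnings(as.numeric(c("1", "two", "3")))
print(mixed)
```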


In this course we store data in data frames. The data frame is the most commonly used structure for storing data in R. A data frame looks like a matrix, with columns and rows. However, it is NOT a matrix. In a matrix, every element has the same class. In an R data frame, each column is a vector that can have its own class. E.g. the first column could contain characters (e.g. names), the second column could contain a factor (e.g. gender) and the third column could contain a numeric variable (e.g. salary).
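To make the difference with a matrix concrete, here is a small sketch of a data frame with the three kinds of columns mentioned above. The names and salaries are invented for illustration.

```r
# A data frame with a character, a factor and a numeric column.
staff <- data.frame(
  name   = c("Ann", "Bob", "Carla"),
  gender = factor(c("female", "male", "female")),
  salary = c(3200, 2900, 4100),
  stringsAsFactors = FALSE  # keep "name" as character in older R versions too
)

str(staff)

# Each column keeps its own class.
print(sapply(staff, class))

# Forcing the same data into a matrix converts everything to one class.
m <- as.matrix(staff)
print(class(m[1, "salary"]))
```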

It is possible to make your own data frames, but normally you will read in an existing dataset. R offers a wide range of packages for importing data in many formats, such as .xls, .txt, .csv, .json, .sql etc. It is also possible to read in data from programs like Stata and SPSS. Please google if you need to read in a specific data type. For now we concentrate on reading in a csv file.

We are going to read in some data about natural parks. This could be done with a hard-coded path, but that is not my preferred way. I suggest storing your data file and the notebook in the same directory. Then the following command will do:

In [49]:
NP <- read.csv2("NaturalPark.csv")

We can now look at the first rows of the data with:

In [50]:
head(NP)

X.bid1.bidh.bidl.answers.age.sex.income
"1,6,18,3, yy ,1, female ,2"
"2,48,120,24, yn ,2, male ,1"
"3,48,120,24, yn ,2, female ,3"
"4,24,48,12, nn ,5, female ,1"
"5,24,48,12, ny ,6, female ,2"
"6,12,24,6, nn ,4, male ,2"


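As you can see, something strange has happened: all values ended up in a single column. This has to do with the character that separates the fields. The function read.csv2 expects a semicolon as separator (and a comma as decimal mark), which is common in many European countries. In English-language data files, a comma is often used as the separator instead. We can read the data again, telling R that the separator is a comma: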

In [53]:
NP1 <- read.csv2("NaturalPark.csv", sep = ",")
head(NP1)

X,bid1,bidh,bidl,answers,age,sex,income
1,6,18,3,yy,1,female,2
2,48,120,24,yn,2,male,1
3,48,120,24,yn,2,female,3
4,24,48,12,nn,5,female,1
5,24,48,12,ny,6,female,2
6,12,24,6,nn,4,male,2
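The effect of the separator can be reproduced without the course file. The sketch below writes a tiny comma-separated file to a temporary location (the three rows are made up) and reads it with both read.csv2 and read.csv. The function read.csv uses a comma as its default separator, so it is an alternative to passing sep = "," to read.csv2.

```r
# Write a small comma-separated file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("X,bid1,sex", "1,6,female", "2,48,male"), tmp)

# read.csv2 expects semicolons, so each line ends up in one column.
wrong <- read.csv2(tmp)
print(ncol(wrong))
print(names(wrong))

# read.csv expects commas, so the three columns are recognised.
right <- read.csv(tmp)
print(ncol(right))
print(names(right))

# Clean up the temporary file.
unlink(tmp)
```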


Now we want to inspect the data. If you want to see the whole data frame, just type "NP1". For now, we are interested in the structure of the data.

In [56]:
str(NP1)

'data.frame':	312 obs. of  8 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ bid1   : int  6 48 48 24 24 12 6 12 24 6 ...
 $ bidh   : int  18 120 120 48 48 24 18 24 48 18 ...
 $ bidl   : int  3 24 24 12 12 6 3 6 12 3 ...
 $ answers: Factor w/ 4 levels " nn "," ny ",..: 4 3 3 1 2 1 4 3 3 4 ...
 $ age    : int  1 2 2 5 6 4 2 3 2 3 ...
 $ sex    : Factor w/ 2 levels " female "," male ": 1 2 1 1 1 2 1 2 1 2 ...
 $ income : int  2 1 3 1 2 2 3 2 2 3 ...


Now we can see that some columns are classified as int (integers) and two as factors (categorical variables with levels). We can change this. Suppose we want to make bid1 a numeric variable. We can use the following command ("$" selects a column of the data frame):

In [58]:
NP1$bid1 <- as.numeric(NP1$bid1)
str(NP1)

'data.frame':	312 obs. of  8 variables:
 $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ bid1   : num  6 48 48 24 24 12 6 12 24 6 ...
 $ bidh   : int  18 120 120 48 48 24 18 24 48 18 ...
 $ bidl   : int  3 24 24 12 12 6 3 6 12 3 ...
 $ answers: Factor w/ 4 levels " nn "," ny ",..: 4 3 3 1 2 1 4 3 3 4 ...
 $ age    : int  1 2 2 5 6 4 2 3 2 3 ...
 $ sex    : Factor w/ 2 levels " female "," male ": 1 2 1 1 1 2 1 2 1 2 ...
 $ income : int  2 1 3 1 2 2 3 2 2 3 ...
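The same "$" syntax works for any column. The sketch below uses a small made-up data frame (not the course data) to show the integer-to-numeric conversion, and to turn a factor whose levels carry extra spaces, like " female " above, into clean text. Note that in R 4.0 and later, text columns are read in as character rather than factor by default, so your own str output may show "chr" where the output above shows "Factor".

```r
# A toy data frame that mimics two columns of the course data.
toy <- data.frame(
  bid1 = c(6L, 48L, 48L),
  sex  = factor(c(" female ", " male ", " female "))
)

print(class(toy$bid1))  # integer before the conversion

# Convert the integer column to numeric (double).
toy$bid1 <- as.numeric(toy$bid1)
print(class(toy$bid1))

# Convert the factor to character and strip the surrounding spaces.
toy$sex <- trimws(as.character(toy$sex))
print(toy$sex)

str(toy)
```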


In the next lecture we will start to manipulate data.