# Introduction to R

To see the currently attached packages, you can use the command `print(.packages())`.

In [5]:
print(.packages())

[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
[7] "base"     


In [7]:
loadedNamespaces()

In [8]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.3.0
✔ tibble  2.0.1     ✔ dplyr   0.7.8
✔ tidyr   0.8.2     ✔ stringr 1.3.1
✔ readr   1.3.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [9]:
print(.packages())

 [1] "forcats"   "stringr"   "dplyr"     "purrr"     "readr"     "tidyr"    
 [7] "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics"  "grDevices"
[13] "utils"     "datasets"  "methods"   "base"     


## Creating variables

In [10]:
number <- 1
print(number)
class(number)
typeof(number)
length(number)
vector <- c(1, 2)
print(vector)
class(vector)
typeof(vector)
length(vector)
char_vector <- c(1, "2")
print(char_vector)
class(char_vector)
typeof(char_vector)

[1] 1


[1] 1 2


[1] "1" "2"


Similarly, you can list the variables in the workspace with the command `ls()`.

In [11]:
ls()

## Tibble

In [12]:
df_voters <- data.frame(first_name = c("Jane", "Joe", "Bob"),
                        last_name = c(NA, NA, "Smith"),
                        sex = c("W", "M", "M"),
                        identitification = c(FALSE, FALSE, TRUE),
                        age = c(NA, 45, 60),
                        recycled_variable = 2
                        )
print(df_voters)
cat("\nClass of df_voters is:")
class(df_voters)


In [13]:
tbl_voters <- tibble(first_name = c("Jane", "Joe", "Bob"),
                     last_name = c(NA, NA, "Smith"),
                     sex = c("W", "M", "M"),
                     identitification = c(FALSE, FALSE, TRUE),
                     age = c(NA, 45, 60),
                     recycled_variable = as.integer(2)
                    )
tbl_voters
cat("\nClass of tbl_voters is:")
class(tbl_voters)

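| first_name | last_name | sex | identitification | age | recycled_variable |
| ---------- | --------- | --- | ---------------- | --- | ----------------- |
| Jane | NA | W | FALSE | NA | 2 |
| Joe | NA | M | FALSE | 45 | 2 |
| Bob | Smith | M | TRUE | 60 | 2 |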



Class of tbl_voters is:

## Functions

In [14]:
square_number <- function(x){
    x^2
}

square_number(4)
square_number("a")

ERROR: Error in x^2: non-numeric argument to binary operator


In [20]:
problem <- function(tbl_voters){
    tbl_voters <- "test"
    tbl_voters
}

problem(tbl_voters)
tbl_voters

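| first_name | last_name | sex | identitification | age | recycled_variable |
| ---------- | --------- | --- | ---------------- | --- | ----------------- |
| Jane | NA | W | FALSE | NA | 2 |
| Joe | NA | M | FALSE | 45 | 2 |
| Bob | Smith | M | TRUE | 60 | 2 |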
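
The cell above shows that assigning to an argument inside a function changes only the function's local copy; the object in the calling environment stays as it was. To keep a change, return the modified object and assign the result. The following is a minimal sketch in base R; the names `add_flag`, `voters`, and `flag` are hypothetical and used only for illustration.

```r
# Hypothetical helper: adds a column to a data frame and returns the result.
add_flag <- function(df) {
    df$flag <- TRUE   # modifies only the function's local copy of df
    df
}

voters <- data.frame(first_name = c("Jane", "Joe"))

add_flag(voters)            # the result is printed but not stored
names(voters)               # still only "first_name"

voters <- add_flag(voters)  # assign the return value to keep the change
names(voters)               # now includes "flag"
```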


# Data import

R packages provide many functions for importing standard file formats:

| _Type_       | _Package_ | _Command_ |
| ------------ |:---------:|:----------|
| Text (tibble) | readr | `read_csv()` |
| Text (better performance) | data.table | `fread()` |
|Excel       | readxl | `read_excel()` |
|SPSS        | haven | `read_sav()` |
|SAS         | haven | `read_sas()` |
|Stata       | haven | `read_stata()` |
|XML | xml2 | `read_xml()` |
|Matlab | R.matlab | `readMat()`|

For some of these formats, there is a dedicated **Import Dataset** button in the *RStudio IDE*.

In this course we will use only the `readr::read_csv()` command to read the text data into tibbles.

In [14]:
write_csv(mtcars, path = "mtcars.csv")
tbl_mtcars <- read_csv("mtcars.csv")

Parsed with column specification:
cols(
  mpg = col_double(),
  cyl = col_double(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_double(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)


In [23]:
tbl_mtcars2 <- read_csv("mtcars.csv",
                      col_types = list(
                      mpg = col_double(),
                      cyl = col_double(),
                      disp = col_double(),
                      hp = col_double(),
                      drat = col_double(),
                      wt = col_double(),
                      qsec = col_double(),
                      vs = col_double(),
                      am = col_double(),
                      gear = col_double(),
                      carb = col_double()
                      )
                )

In [24]:
print(tbl_mtcars)
print(tbl_mtcars2)

# A tibble: 32 x 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
 2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
 3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
 4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
 6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
 7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
 8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
 9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
# … with 22 more rows
# A tibble: 32 x 11
     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <d

# Tidying data

## Data structures

## Tidy data advantages

## Missing and extreme values
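
As a minimal sketch of how missing values behave in base R, the example below reuses the `age` values from `tbl_voters` above. The variable names (`n_missing`, `mean_all`, `mean_known`, `age_range`) are assumptions made for illustration.

```r
# Ages from the voters example; NA marks a missing value.
age <- c(NA, 45, 60)

is.na(age)                              # logical vector flagging missing entries
n_missing <- sum(is.na(age))            # number of missing values

mean_all   <- mean(age)                 # NA: missing values propagate by default
mean_known <- mean(age, na.rm = TRUE)   # drop missing values before averaging

# A quick check for extreme values: smallest and largest observed value.
age_range <- range(age, na.rm = TRUE)

print(n_missing)
print(mean_known)
print(age_range)
```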

# Data exploration

Note that dbplyr is the dplyr backend for databases.

Filtering

Mutate

Aggregating, grouping, summarizing
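
The three steps listed above can be sketched with dplyr on the built-in `mtcars` data. This is only an illustration: the threshold of 100 horsepower and the names `powerful`, `with_ratio`, `hp_per_wt`, and `by_cyl` are assumptions, not part of the course material.

```r
library(dplyr)

# Filtering: keep only the rows that satisfy a condition.
powerful <- filter(mtcars, hp > 100)

# Mutate: add a derived column (horsepower per 1000 lbs of weight).
with_ratio <- mutate(powerful, hp_per_wt = hp / wt)

# Aggregating, grouping, summarizing: one summary row per cylinder count.
by_cyl <- with_ratio %>%
    group_by(cyl) %>%
    summarise(n = n(), mean_hp_per_wt = mean(hp_per_wt))

print(by_cyl)
```

The pipe `%>%` passes the result of each step as the first argument of the next, so the grouped summary reads top to bottom in the order the steps are applied.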

# Visualization

## ggplot2 basics

### Data

### Geoms

### Scales

### Legend
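
The four building blocks named in the headings above (data, geoms, scales, legend) can be seen together in one small plot. The sketch below uses the built-in `mtcars` data; the choice of columns, axis labels, and legend position is an assumption made for illustration.

```r
library(ggplot2)

# Data: the data frame plus the mapping from columns to aesthetics.
p <- ggplot(data = mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    # Geoms: how the observations are drawn (here, as points).
    geom_point() +
    # Scales: how data values are mapped to axes, including axis titles.
    scale_x_continuous(name = "Weight (1000 lbs)") +
    scale_y_continuous(name = "Miles per gallon") +
    # Legend: title and placement of the colour legend.
    labs(colour = "Cylinders") +
    theme(legend.position = "bottom")

print(p)
```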

# Final remarks

You might want to create an `.Rprofile` file in your HOME directory that loads the tidyverse package when an R session starts. R sources this file at startup unless an `.Rprofile` file is present in the current working directory, in which case that file is used instead.

A simple `.Rprofile` file that loads tidyverse looks like this:

```
# This function is run at the start of the R session
.First <- function(){
    library(tidyverse)
}
```
More on customizing the R session startup can be found here:
https://www.statmethods.net/interface/customizing.html

or run `?Startup` in the R console for a detailed technical explanation.