# Introduction to R Programming

R is a programming language and software environment for statistical analysis, graphics representation and reporting.

- The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions.
- R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency.

R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems such as Linux, Windows and macOS.

# Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. R made its first appearance in 1993.

- A large group of individuals has contributed to R by sending code and bug reports.
- Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive.

# Features of R

As stated earlier, R is a programming language and software environment for statistical analysis, graphics representation and reporting. The following are the important features of R −

- R is a well-developed, simple and effective programming language which includes conditionals, loops, user defined recursive functions and input and output facilities.
- R has an effective data handling and storage facility.
- R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
- R provides a large, coherent and integrated collection of tools for data analysis.
- R provides graphical facilities for data analysis and display, either directly on screen or in print.

In short, R is one of the most widely used programming languages for statistics.

**Note**
After installing R on your platform, you can install _packages in R_ with the following command at the R prompt.

`install.packages("package_name")`

By convention, we will start learning R programming by writing a "Hello, World!" program.
Depending on your needs, you can program either at the R command prompt or in an R script file. Let's look at both in turn.

# R Command Prompt

Once you have the R environment set up, you can start the R command prompt by typing the following command at your system's command prompt −

`R`

This launches the R interpreter and gives you a prompt `>` where you can start typing your program as follows −


In [None]:
myString <- "Hello, World!"
print(myString)

Here the first statement defines a string variable myString and assigns it the string "Hello, World!". The next statement uses print() to print the value stored in myString.

# R Script File

Usually, you will write your programs in script files and then execute those scripts at your command prompt with **Rscript**, the command-line front end to the R interpreter.

Start by writing the Hello World code from the previous section in a text file called test.R.

> After writing the code, follow the procedure below.

Save the code in the file test.R and execute it at the Linux command prompt as shown below. The syntax is the same on Windows and other systems.

`Rscript test.R`
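
As a minimal sketch, assuming `test.R` simply holds the Hello World statements from the command prompt section, the file could look like this:

```r
# test.R: assumed contents, reusing the earlier Hello World example
myString <- "Hello, World!"
print(myString)
```

Running `Rscript test.R` on this file should print `[1] "Hello, World!"`.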

# Comments

Comments are explanatory text in your R program; the interpreter ignores them when executing the program. A single-line comment starts with # as follows −

`# My first program in R Programming`

R **does not support multi-line comments**, but you can work around this with a trick such as the following −

In [None]:
if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a 
      single OR double quote"
}

myString <- "Hello, World!"
print(myString)

**The block above is parsed by the R interpreter but never executed, because the condition is FALSE, so it will not interfere with your actual program.** Put such comments inside either single or double quotes.

# Data Types

In any programming language, you need variables to store information. Variables are names that refer to values stored in memory.

> This means that when you create a variable, some space in memory is reserved for its value.

You may want to store information of various data types such as character, integer, floating point, Boolean etc. In many languages, the declared data type of a variable determines how much memory is allocated and what can be stored in it.

- In contrast to other programming languages like C and Java, variables in R are not declared with a data type.
- The **variables are assigned R-objects**, and the **data type of the R-object becomes the data type of the variable**.
- There are many types of R-objects. The frequently used ones are −
  - Vectors
  - Lists
  - Matrices
  - Arrays
  - Factors
  - Data Frames
  
The simplest of these objects is the **vector object**. There are six data types of atomic vectors, also called the six classes of vectors. The other R-objects are built upon the atomic vectors.

<table style='font-size: 16px'>
    <tr>
        <th>Data Type</th>
        <th>Example</th>
    </tr>
    <tr>
        <td>Logical</td>
        <td>TRUE, FALSE</td>
    </tr>
    <tr>
        <td>Numeric</td>
        <td>12.5, 5, 999</td>
    </tr>
    <tr>
        <td>Integer</td>
        <td>2L, 34L, 0L</td>
    </tr>
    <tr>
        <td>Complex</td>
        <td>3 + 2i</td>
    </tr>
    <tr>
        <td>Character</td>
        <td>'a', "good", "True", '23.4'</td>
    </tr>
    <tr>
        <td>Raw</td>
        <td>"Hello" is stored as 48 65 6c 6c 6f</td>
    </tr>
</table>

In [None]:
# R-Objects

a <- TRUE
print(class(a))

b <- 23.5
print(class(b))

c <- 2L
print(class(c))

d <- 2+5i
print(class(d))

e <- "Hello World"
print(class(e))

f <- charToRaw("Hello")
print(class(f))

- In R programming, the most basic data types are the R-objects called vectors, which hold elements of one of the classes shown above.
- **Please note** that in R the number of classes is not confined to the above six types.
- **_For example,_** we can combine atomic vectors into an array, whose class is array.
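
The point about classes beyond the six atomic ones can be sketched as follows. The variable names are only illustrative; `class()` reports how R treats the object, while `typeof()` reports the underlying atomic type.

```r
# Atomic vectors keep their basic class
v <- c(1.5, 2.5, 3.5)
class_v <- class(v)          # "numeric"

# Giving the same kind of data a dim attribute changes the class
arr <- array(1:8, dim = c(2, 2, 2))
class_arr <- class(arr)      # "array"
type_arr <- typeof(arr)      # "integer": the underlying atomic type is unchanged

print(class_v)
print(class_arr)
print(type_arr)
```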

# Vector

When you want to **create a vector with more than one element**, use the `c()` function, which **combines the elements into a vector.**


In [None]:
# Create a vector
apple <- c('red', 'green', 'yellow')
print(apple)

# Get the class of the vector.
print(class(apple))
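
Because an atomic vector holds a single data type, `c()` coerces mixed inputs to a common type. A small sketch of this behaviour, together with 1-based indexing (the variable names are illustrative):

```r
# Mixing types in c() coerces everything to the most flexible type
mixed <- c(1, "a", TRUE)
print(mixed)          # all three elements become character strings
print(class(mixed))   # [1] "character"

# A single value is already a vector of length one
single <- 42
print(length(single)) # [1] 1

# Elements are accessed with 1-based indexing
apple <- c('red', 'green', 'yellow')
print(apple[2])       # [1] "green"
```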

# List

A list is an R-object which can **contain many different types of elements**, such as _vectors,_ _functions_ and _even another list_.



In [None]:
# Create a list
list_1 <- list(c(1,2,3), 21.3, sin)

print(list_1)
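
Elements of a list can be pulled out again with double brackets, while single brackets return a smaller list. A short sketch using the same list as above (the extra variable names are illustrative):

```r
list_1 <- list(c(1, 2, 3), 21.3, sin)

first <- list_1[[1]]        # the numeric vector 1 2 3
sub_list <- list_1[1]       # still a list, of length one
f <- list_1[[3]]            # the sin function itself

print(class(first))         # [1] "numeric"
print(class(sub_list))      # [1] "list"
print(f(0))                 # [1] 0
```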

# Matrices

A matrix is a 2-dimensional array with m rows and n columns. In other words, a matrix is a combination of two or more vectors of the same data type.

It can be created using a vector input to the matrix function.

**Note -** It is also possible to create arrays with more than two dimensions in R.

We can create a matrix with the function `matrix()`. Its most commonly used arguments are:
`matrix(data, nrow, ncol, byrow = FALSE)`

**Arguments:** 

- **data:** The collection of elements that R will arrange into the rows and columns of the matrix.
- **nrow:** Number of rows 
- **ncol:** Number of columns 
- **byrow:** Whether the matrix is filled by rows. With `byrow = TRUE`, each row is filled from left to right. With `byrow = FALSE` (the default), the matrix is filled by columns, i.e. the values are filled top to bottom.

![Matrix](./images/matrix.png)

In [None]:
# Create a matrix

matrix_1 = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(matrix_1)
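
To see the effect of `byrow`, it helps to fill two matrices from the same data. This is a small sketch with illustrative names, using the integers 1 to 6 so the fill order is easy to follow:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: byrow = FALSE
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```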

# Array

- While matrices are confined to two dimensions, arrays can be of any number of dimensions.
- **The array function takes a dim argument, a vector giving the size of each dimension.**

In [None]:
# Create an array

a <- array(c('green', 'yellow'), dim = c(3,3,2))
print(a)
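
A numeric example makes the role of `dim` easier to trace. The sketch below (names are illustrative) builds a 2 x 3 x 4 array, indexes into it, and shows that shorter data is recycled to fill the array, as in the cell above:

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

print(dim(arr))        # [1] 2 3 4
print(arr[1, 1, 2])    # [1] 7   (first element of the second layer)
print(arr[2, 3, 4])    # [1] 24  (last element)

# Shorter data is recycled until all 3 * 3 * 2 = 18 cells are filled
recycled <- array(c('green', 'yellow'), dim = c(3, 3, 2))
print(length(recycled))   # [1] 18
print(recycled[1:3, 1, 1])
```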

# Factors

- Factors are R-objects which are created from a vector.
- A factor stores the **vector along with the distinct values of its elements as labels.**
- The <u>labels are always character</u>, irrespective of whether the input vector is numeric, character, Boolean etc.
- They are useful in <u>statistical modeling</u>.

- Factors are created using the `factor()` function.
- The `nlevels()` function gives the count of levels.

In [None]:
# Create a vector
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')

# Create a factor object
factor_apple <- factor(apple_colors)

# Print the factor
print(factor_apple)
print(nlevels(factor_apple))
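
Internally, a factor stores integer codes that point into its character levels. A brief sketch, reusing the vector from the cell above:

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

print(levels(factor_apple))      # [1] "green"  "red"    "yellow"
print(as.integer(factor_apple))  # [1] 1 1 3 2 2 2 1
print(table(factor_apple))       # counts per level: green 3, red 3, yellow 1
```

By default the levels are the sorted distinct values, which is why `green` gets code 1 here.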

# Data Frames

- **Data frames are tabular data objects**.
- Unlike in a matrix, **in a data frame each column can contain a different mode of data**.
- For example, the first column can be numeric while the second column can be character and the third column can be logical.
- A data frame is a <u>list of vectors</u> of equal length.

In [None]:
# Create a data frame

BMI <- data.frame(
	gender = c('Male', 'Female', 'Male'),
	height = c(152, 171.1, 165),
	weight = c(81, 55, 65),
	Age = c(25, 32,45)
)

print(BMI)
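
The per-column types and the list-of-vectors structure can be inspected directly. This sketch rebuilds the same data frame and sets `stringsAsFactors = FALSE` explicitly, an addition made here because the default differs in R versions before 4.0:

```r
BMI <- data.frame(
  gender = c('Male', 'Female', 'Male'),
  height = c(152, 171.1, 165),
  weight = c(81, 55, 65),
  Age = c(25, 32, 45),
  stringsAsFactors = FALSE   # explicit, since the default differs before R 4.0
)

print(sapply(BMI, class))    # one class per column
print(dim(BMI))              # [1] 3 4
print(BMI$height)            # [1] 152.0 171.1 165.0
print(BMI[BMI$weight > 60, "gender"])  # [1] "Male" "Male"
```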

# Variables

A variable provides us with named storage that our programs can manipulate. <u>A variable in R can store an atomic vector, a group of atomic vectors or a combination of many R-objects.</u> A valid variable name consists of *letters*, *numbers* and *the dot* or *underscore* characters. **The variable name starts with a letter, or with a dot not followed by a number.**

<table style='font-size: 16px'>
    <tr>
        <th>Variable Name</th>
        <th>Validity</th>
        <th>Reason</th>
    </tr>
    <tr>
        <td>var_name2.</td>
        <td>valid</td>
        <td>Has letters, numbers, dot and underscore</td>
    </tr>
    <tr>
        <td>var_name%</td>
        <td>invalid</td>
        <td>Has the character '%'. Only dot(.) and underscore allowed.</td>
    </tr>
    <tr>
        <td>2var_name</td>
        <td>invalid</td>
        <td>Starts with a number</td>
    </tr>
    <tr>
        <td>.var_name, var.name</td>
        <td>valid</td>
        <td>Can start with a dot (.), but the dot must not be followed by a number.</td>
    </tr>
    <tr>
        <td>.2var_name</td>
        <td>invalid</td>
        <td>The starting dot is followed by a number making it invalid.</td>
    </tr>
    <tr>
        <td>_var_name</td>
        <td>invalid</td>
        <td>Starts with _ which is not valid</td>
    </tr>
</table>
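
The rules in the table can be checked programmatically. The sketch below relies on base R's `make.names()`, which rewrites a string into a syntactically valid name; a name that comes back unchanged was already valid. The helper `is_valid_name` is hypothetical and written only for this illustration.

```r
# Hypothetical helper: a name is syntactically valid if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")
validity <- is_valid_name(candidates)
print(data.frame(name = candidates, valid = validity))
```

The `valid` column should agree with the Validity column of the table above.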

# Variable Assignment

- **The variables can be assigned values using leftward, rightward and equal to operator.**

- The values of the variables can be printed using `print()` or `cat()` function.

- The `cat()` function <u>combines multiple items into a continuous print output.</u>

In [None]:
# Assignment using equal operator
var.1 = c(0,1,2,3)

# Assignment using leftward operator
var.2 <- c(9,8,7,6)

# Assignment using rightward operator
c(TRUE, 1) -> var.3

print(var.1)
cat('var.1 is', var.1, '\n')
cat('var.2 is', var.2, '\n')
cat('var.3 is', var.3, '\n')


**Note -** The vector c(TRUE, 1) mixes the logical and numeric classes, so the logical value is coerced to numeric, turning TRUE into 1.

# Data Type of a Variable

In R, a variable itself is not declared with any data type; rather, it gets the data type of the R-object assigned to it. So **R is called a dynamically typed language**, which means that we can change the data type of the same variable again and again in a program.

In [None]:
'hello' -> var_x
cat('The class of var_x is', class(var_x), '\n')

var_x = 34.4
cat("  Now the class of var_x is",class(var_x),"\n")

var_x <- 27L
cat("   Next the class of var_x becomes",class(var_x),"\n")


# Finding Variables

To list all the variables currently available in the workspace, we use the `ls()` function.

In [None]:
print(ls())

**Note −** The output depends on which variables are defined in your environment.

The `ls()` function can use patterns to match the variable names.

In [None]:
# List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))

Variables starting with a dot (.) are **hidden**; they can be listed by passing the `all.names = TRUE` argument to the `ls()` function.

In [None]:
print(ls(all.names = TRUE))
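
The `pattern` argument is a regular expression, so `"var"` matches anywhere in a name and `"^var"` only at the start. The sketch below uses a separate environment (`env` and the variable names in it are illustrative) so the result does not depend on what is in your workspace:

```r
# Use a separate environment so the result does not depend on your workspace
env <- new.env()
assign("var_a", 1, envir = env)
assign("my_var", 2, envir = env)
assign("other", 3, envir = env)
assign(".hidden", 4, envir = env)

visible <- ls(env)                          # names not starting with a dot
containing <- ls(env, pattern = "var")      # "var" anywhere in the name
starting <- ls(env, pattern = "^var")       # "var" at the start only
everything <- ls(env, all.names = TRUE)     # includes .hidden

print(containing)
print(starting)
print(everything)
```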

# Deleting Variables

Variables can be deleted by using the `rm()` function.

Below we delete the variable `var_1`. Printing the variable afterwards throws an error.

In [None]:
var_1 <- 'Hello world'
print(var_1)

rm(var_1)
print(var_1)

**NOTE -** All the variables can be deleted by using the `rm()` and `ls()` function together.

In [None]:
rm(list = ls())
print(ls())
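
If a script needs to continue after a variable has been removed, the deletion can be checked with `exists()` and the error handled with `tryCatch()`. This is a sketch; `tmp_value` is a hypothetical variable name assumed not to be defined anywhere else.

```r
tmp_value <- 'Hello world'
print(exists("tmp_value"))    # [1] TRUE

rm(tmp_value)
print(exists("tmp_value"))    # [1] FALSE

# Reading a removed variable raises an error, which tryCatch() can handle
result <- tryCatch(
  get("tmp_value"),
  error = function(e) "tmp_value no longer exists"
)
print(result)
```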

# Operators

An operator is a symbol that tells the interpreter to perform a specific mathematical or logical manipulation. R is rich in built-in operators and provides the following types.

## Types of Operators

We have the following types of operators in R programming −

- Arithmetic Operators
- Relational Operators
- Logical Operators
- Assignment Operators
- Miscellaneous Operators

## Arithmetic Operators

The following table shows the arithmetic operators supported by R. The operators act on each element of the vector.

<table style='font-size: 16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><b>+</b></td>
        <td>Adds two vectors</td>
    </tr>
    <tr>
        <td><b>-</b></td>
        <td>Subtracts second vector from the first</td>
    </tr>
    <tr>
        <td><b>*</b></td>
        <td>Multiplies both vectors</td>
    </tr>
    <tr>
        <td><b>/</b></td>
        <td>Divides the first vector by the second</td>
    </tr>
    <tr>
        <td><b>%%</b></td>
        <td>Gives the remainder of dividing the first vector by the second</td>
    </tr>
    <tr>
        <td><b>%/%</b></td>
        <td>Gives the integer quotient of dividing the first vector by the second</td>
    </tr>
    <tr>
        <td><b>^</b></td>
        <td>The first vector raised to the exponent of second vector</td>
    </tr>
</table>

In [2]:
# Arithmetic Operators

a <- c(2, 5.5, 6)
b <- c(8, 3, 4)

cat('Addition of a and b vector: ', a + b, '\n')
cat('Subtraction of a and b vector: ', a - b, '\n')
cat('Multiplication of a and b vector: ', a * b, '\n')
cat('Division of a and b vector: ', a / b, '\n')
cat('Remainder of a and b vector: ', a %% b, '\n')
cat('Quotient of a and b vector: ', a %/% b, '\n')
cat('exponent of a and b vector: ', a ^ b, '\n')

Addition of a and b vector:  10 8.5 10 
Subtraction of a and b vector:  -6 2.5 2 
Multiplication of a and b vector:  16 16.5 24 
Division of a and b vector:  0.25 1.833333 1.5 
Remainder of a and b vector:  2 2.5 2 
Quotient of a and b vector:  0 1 1 
exponent of a and b vector:  256 166.375 1296 
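
Two properties of element-wise arithmetic are worth sketching: the quotient and remainder together rebuild the original values, and a shorter vector is recycled to match a longer one. The variable names below are illustrative.

```r
a <- c(2, 5.5, 6)
b <- c(8, 3, 4)

# Quotient and remainder together reconstruct the first vector
rebuilt <- (a %/% b) * b + (a %% b)
print(rebuilt)                    # [1] 2.0 5.5 6.0

# A shorter vector is recycled to match the longer one
recycled_sum <- c(1, 2, 3, 4) + c(10, 20)
print(recycled_sum)               # [1] 11 22 13 24

# A single number is a length-one vector, so it applies to every element
print(a * 2)                      # [1]  4 11 12
```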


## Relational Operators

The following table shows the relational operators supported by R. Each element of the first vector is compared with the corresponding element of the second vector. The result of each comparison is a Boolean value.

<table style='font-size: 16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><b>></b></td>
        <td>Checks if each element of the first vector is greater than the corresponding element of the second vector.</td>
    </tr>
    <tr>
        <td><b><</b></td>
        <td>Checks if each element of the first vector is less than the corresponding element of the second vector.</td>
    </tr>
    <tr>
        <td><b>==</b></td>
        <td>Checks if each element of the first vector is equal to the corresponding element of the second vector.</td>
    </tr>
    <tr>
        <td><b><=</b></td>
        <td>Checks if each element of the first vector is less than or equal to the corresponding element of the second vector.</td>
    </tr>
    <tr>
        <td><b>>=</b></td>
        <td>Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector.</td>
    </tr>
    <tr>
        <td><b>!=</b></td>
        <td>Checks if each element of the first vector is unequal to the corresponding element of the second vector.</td>
    </tr>
</table>

In [3]:
# Relational Operators

a <- c(2, 5.5, 6, 9)
b <- c(8, 2.5, 14, 9)

cat('Greater than', a > b , '\n')
cat('Less than', a < b , '\n')
cat('Equal to', a == b , '\n')
cat('Less than or equal to', a <= b , '\n')
cat('Greater than or equal to', a >= b , '\n')
cat('Not equal to', a != b , '\n')

Greater than FALSE TRUE FALSE FALSE 
Less than TRUE FALSE TRUE FALSE 
Equal to FALSE FALSE FALSE TRUE 
Less than or equal to TRUE FALSE TRUE TRUE 
Greater than or equal to FALSE TRUE FALSE TRUE 
Not equal to TRUE TRUE TRUE FALSE 
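
The logical vector produced by a comparison can be used to select elements. The sketch below also shows a common pitfall: `==` on floating-point results can be FALSE where equality is expected, and base R's `all.equal()` compares with a tolerance instead.

```r
a <- c(2, 5.5, 6, 9)
b <- c(8, 2.5, 14, 9)

# A logical vector from a comparison can select elements
print(a[a >= b])                          # [1] 5.5 9.0

# == on floating-point results can surprise; all.equal() compares with a tolerance
print(0.1 + 0.2 == 0.3)                   # [1] FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))  # [1] TRUE
```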


## Logical Operators

The following table shows the logical operators supported by R.

- They are applicable only to vectors of type logical, numeric or complex.
- All **non-zero numbers are treated as the logical value TRUE**, and zero as FALSE.
- Each element of the first vector is combined with the corresponding element of the second vector.
- The result of each operation is a Boolean value.

<table style='font-size: 16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><b>&</b></td>
        <td>It is called the element-wise Logical AND operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if both elements are TRUE.</td>
    </tr>
    <tr>
        <td><b>|</b></td>
        <td>It is called the element-wise Logical OR operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of the elements is TRUE.</td>
    </tr>
    <tr>
        <td><b>!</b></td>
        <td>It is called Logical NOT operator. Takes each element of the vector and gives the opposite logical value.</td>
    </tr>
</table>

In [6]:
# Logical Operators

a <- c(3, 1, TRUE, 2+3i)
b <- c(4, 1, FALSE, 2+3i)

cat('AND OPERATOR', a & b , '\n')
cat('OR OPERATOR', a | b , '\n')
cat('NOT OPERATOR', !b , '\n')

AND OPERATOR TRUE TRUE FALSE TRUE 
OR OPERATOR TRUE TRUE TRUE TRUE 
NOT OPERATOR FALSE FALSE TRUE FALSE 
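
The rule for treating numbers as logical values can be checked with `as.logical()`: zero becomes FALSE and every other number, including negatives and fractions, becomes TRUE. A small sketch with illustrative values:

```r
values <- c(-2, 0, 0.5, 1, 3)
print(as.logical(values))        # [1]  TRUE FALSE  TRUE  TRUE  TRUE

# The same rule applies inside & and |
print(values & TRUE)             # [1]  TRUE FALSE  TRUE  TRUE  TRUE
print(!values)                   # [1] FALSE  TRUE FALSE FALSE FALSE
```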


The logical operators && and || work on single values and return a single logical value. Older versions of R used only the first element of longer vectors; since R 4.3.0, passing a vector with more than one element is an error, so the example below selects the first elements explicitly.

<table style='font-size: 16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><b>&&</b></td>
        <td>Called the Logical AND operator. Takes a single value on each side and gives TRUE only if both are TRUE.</td>
    </tr>
    <tr>
        <td><b>||</b></td>
        <td>Called the Logical OR operator. Takes a single value on each side and gives TRUE if at least one of them is TRUE.</td>
    </tr>
</table>

In [8]:
# Logical Operators

a <- c(3, 0, TRUE, 2+2i)
b <- c(1, 3, TRUE, 2+3i)

print(a[1] && b[1])
print(a[1] || b[1])

[1] TRUE
[1] TRUE
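
Unlike `&` and `|`, the operators `&&` and `||` evaluate their right-hand side only when needed, which makes them suited to guard conditions. To reduce a whole vector to a single value, `all()` or `any()` can be combined with the element-wise operators. A sketch with illustrative values:

```r
x <- NULL

# && stops at the first FALSE, so `x > 0` is never evaluated here
safe <- !is.null(x) && x > 0
print(safe)                      # [1] FALSE

# To reduce a whole vector to one value, use all() or any() with & or |
a <- c(3, 0, TRUE)
b <- c(1, 3, TRUE)
print(all(a & b))                # [1] FALSE
print(any(a & b))                # [1] TRUE
```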


## Assignment Operators

These operators are used to assign values to variables.

<table style='font-size:16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>
            <b>&lt;-</b><br />
            <b>or</b><br />
            <b>=</b><br />
            <b>or</b><br />
            <b>&lt;&lt;-</b><br />
        </td>
        <td>Called Left Assignment</td>
    </tr>
    <tr>
        <td>
            <b>-&gt;</b><br />
            <b>or</b><br />
            <b>-&gt;&gt;</b><br />
        </td>
        <td>Called Right Assignment</td>
    </tr>
</table>
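
The table groups `<<-` with the left assignment operators, but it behaves differently from `<-` inside a function: it assigns to a variable in an enclosing environment rather than creating a local one. The sketch below uses a hypothetical `make_counter` function to show this, followed by a rightward assignment.

```r
# Hypothetical example: a counter whose state lives in the enclosing function
make_counter <- function() {
  count <- 0                 # local to make_counter()
  function() {
    count <<- count + 1      # modifies count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()
third <- counter()
print(third)                 # [1] 3

5 -> y                       # rightward assignment
print(y)                     # [1] 5
```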

## Miscellaneous Operators

These operators are used for specific purposes rather than general mathematical or logical computation.

<table style='font-size: 16px'>
    <tr>
        <th>Operator</th>
        <th>Description</th>
    </tr>
    <tr>
        <td><b>:</b></td>
        <td>Colon operator. It creates the series of numbers in sequence for a vector.</td>
    </tr>
    <tr>
        <td><b>%in%</b></td>
        <td>This operator is used to identify if an element belongs to a vector.</td>
    </tr>
    <tr>
        <td><b>%*%</b></td>
        <td>This operator performs matrix multiplication, for example multiplying a matrix with its transpose.</td>
    </tr>
</table>

In [11]:
# Miscellaneous Operators

a <- 2:8

print(a)
print(4 %in% a)

M <- matrix(c(2, 6, 5, 1, 10, 4), nrow = 2, ncol = 3, byrow = TRUE)
# Avoid naming the result `t`, which would shadow the transpose function t()
product <- M %*% t(M)

print(product)

[1] 2 3 4 5 6 7 8
[1] TRUE
     [,1] [,2]
[1,]   65   82
[2,]   82  117
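
These operators are not limited to the cases shown above: the colon operator can count downwards, `%in%` accepts a vector on its left side, and `%*%` multiplies any two matrices with compatible dimensions. A short sketch with illustrative matrices:

```r
print(8:2)                         # [1] 8 7 6 5 4 3 2
print(c(4, 9) %in% 2:8)            # [1]  TRUE FALSE

A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
B <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2
AB <- A %*% B                          # (2 x 3) %*% (3 x 2) gives 2 x 2
print(dim(AB))                     # [1] 2 2
print(AB)
```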
