# Syllabus
- Introduction
- R installation and basic syntax
- Data Types – Vectors, Lists, Matrices, Arrays, Factors, Data Frames
- Variables – Variable assignment, Data Type of a variable, finding variables, Deleting variables
- Operators in R
- Creating and manipulating objects
- Importing/Exporting data
- Data Distribution
- Data manipulation and extracting components
- Data Shaping and Transformation


# INTRODUCTION

- R is a leading tool for machine learning, statistics, and data analysis.
- Created by Ross Ihaka and Robert Gentleman at the University of Auckland.
- Platform-independent and open-source, accessible on all operating systems.
- Integrates with languages like C and C++ for broader functionality.
- Popular in the Data Science job market with a growing user community.
- Originated from the S programming language with influences from Scheme.
- First stable version (1.0.0) released in 2000.

# WHY USE R? [FEATURES]

- Ideal for data-driven research with a vast array of statistical techniques.
- Rich ecosystem for advanced data manipulation, visualization, and machine learning.
- Strong Data Visualization: Powerful tools like ggplot2 and plotly for detailed, aesthetically pleasing graphs.
- It is open source and free; accessible without licensing costs.
- Runs on Windows, macOS, and Linux; platform independence.
- Seamless interaction with C, C++, Python, and Java.
- Large, active community with extensive resources.
- Highly sought-after language in the Data Science job market.


# ADVANTAGES
- R is one of the most comprehensive statistical analysis packages; new techniques and concepts often appear first in R.
- R is open source, so you can run it anywhere and at any time without a license.
- R is cross-platform: it runs on GNU/Linux, Windows, and macOS.
- In R, everyone is welcome to provide new packages, bug fixes, and code enhancements.


# DISADVANTAGES

- In R, the quality of some packages is less than perfect.
- R commands give little thought to memory management, so R may consume all available memory.
- There is no official vendor support, so there is nobody to complain to if something doesn't work.
- R can be much slower than other languages such as Python and MATLAB, especially when code is not vectorized.

# APPLICATIONS
- R for Data Science: Offers a wide range of libraries for statistics and a comprehensive environment for statistical computing.
- Used by Quantitative Analysts: Helps with data importing, cleaning, and analysis.
- Prevalence in Data Analysis: Widely used by data analysts and research programmers, especially in finance.
- Adopted by Tech Giants: Companies like Google, Facebook, Bing, Twitter, Accenture, and Wipro use R.

![image.png](attachment:28c0c444-2f23-47b2-9877-90bc1a50b653.png)

# BASIC SYNTAX

In [1]:
"Hello World!"

In [2]:
print("#WinningAtUni")

[1] "#WinningAtUni"


In [3]:
cat("To write without quotes")

To write without quotes

In [5]:
# Assignment

var1 = "Simple Assignment"
var2 <- "Left Assignment"
"Right Assignment" -> var3

cat(var1,"\n")
cat(var2,"\n")
cat(var3)

Simple Assignment 
Left Assignment 
Right Assignment

In [6]:
print(var1)
print(var2)
print(var3)

[1] "Simple Assignment"
[1] "Left Assignment"
[1] "Right Assignment"


In [7]:
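# Single-line comment

# R has no multi-line comment syntax. A common workaround is to wrap
# the text in a block that never runs:
if (FALSE) {
    "This is a
    multiline comment."
}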

# Keywords

![image.png](attachment:f2b9ee4d-c20b-45ed-b59e-803a87a03bd6.png)

# Variables and Constants

## Rules for writing the identifiers
- Identifiers can be a combination of letters, digits, period (.) and underscore (_).
- It must start with a letter or a period. If it starts with a period, it cannot be followed by a digit.
- Reserved words in R cannot be used as identifiers.

## Valid identifiers
total, Sum, .fine.with.dot, this_is_acceptable, Number5 

## Invalid identifiers
tot@l, 5um, _fine, TRUE, .0ne 
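
One way to check these rules, sketched here as an assumption and not as an official validity test, is base R's `make.names()`, which rewrites any string into a syntactically valid name. A string that comes back unchanged was already valid. The helper name `is_valid_identifier` is hypothetical.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_identifier <- function(x) make.names(x) == x

valid   <- c("total", "Sum", ".fine.with.dot", "this_is_acceptable", "Number5")
invalid <- c("tot@l", "5um", "_fine", "TRUE", ".0ne")

print(is_valid_identifier(valid))     # every entry passes
print(is_valid_identifier(invalid))   # every entry fails
```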

## Best practices
- Earlier versions of R used underscore (_) as an assignment operator, so the period (.) was used extensively in variable names having multiple words.
- Current versions of R support underscore as a valid identifier character, but it is good practice to use the period as a word separator.
- For example, a.variable.name is preferred over a_variable_name; alternatively, we could use camel case, as in aVariableName.

# Constants

In [18]:
# Numeric Constants
print(typeof(8))
print(typeof(8L))
print(typeof(8i))

[1] "double"
[1] "integer"
[1] "complex"


In [17]:
# Numeric constants preceded by 0x or 0X are interpreted as hexadecimal numbers. 
0xff
0XF + 1

In [19]:
# Character Constants
'example'
typeof('8')

In [20]:
# Built-in constants
LETTERS
letters
pi

In [21]:
month.name
month.abb

In [22]:
# But it is not good to rely on these, as they are implemented as variables whose values can be changed. 
pi
pi <- 85
pi
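
If a built-in constant has been masked this way, the original is still reachable. A small sketch, assuming the assignment happened in the current workspace as above: `base::pi` reads the untouched value from the base package, and `rm(pi)` deletes the masking copy.

```r
pi <- 85            # masks the built-in constant in the workspace
print(pi)           # the masking value
print(base::pi)     # the original is still available in package base

rm(pi)              # remove the masking variable
print(pi)           # the built-in value is visible again
```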

# Data Types

Variables are nothing but *reserved memory locations to store values*.

There are many types of R objects. The frequently used ones are:
- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames 

In [23]:
v <- TRUE
class(v)

In [24]:
v <- charToRaw("Hello")
v
class(v)

[1] 48 65 6c 6c 6f

### Type Verification
`is.data_type(object)`

In [64]:
# Logical
print(is.logical(TRUE))
 
# Integer
print(is.integer(3L))
 
# Numeric
print(is.numeric(10.5))
 
# Complex
print(is.complex(1+2i))
 
# Character
print(is.character("12-04-2020"))
 
print(is.integer("a"))
 
print(is.numeric(2+3i))

[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE


### Converting data type of one object to another
`as.data_type(object)`

In [65]:
# Logical
print(as.numeric(TRUE))
 
# Integer
print(as.complex(3L))
 
# Numeric
print(as.logical(10.5))
 
# Complex
print(as.character(1+2i))
 
# Not possible: returns NA with a warning
print(as.numeric("12-04-2020"))

[1] 1
[1] 3+0i
[1] TRUE
[1] "1+2i"


"NAs introduced by coercion"

[1] NA
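
Because a failed conversion yields `NA` instead of stopping, it can help to check for it explicitly. The following is a minimal sketch with made-up input values; `suppressWarnings()` is used only to keep the output quiet.

```r
# Hypothetical input mixing convertible and non-convertible strings.
x <- c("10.5", "12-04-2020", "3")

# suppressWarnings() hides the "NAs introduced by coercion" warning.
nums <- suppressWarnings(as.numeric(x))

print(nums)
print(is.na(nums))   # TRUE marks the entries that could not be converted
```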


## Vector

A vector is the most common and basic data structure in R and is pretty much the workhorse of R. Vectors can be of two types: 
- atomic vectors
- lists 

In [26]:
# Atomic vectors: a vector can hold characters, logicals, integers or numerics.
x <- vector()             # empty vector (logical by default)
x
x <- vector(length = 3)   # with a pre-defined length
# with a type and a length
vector("character", length = 3)
vector("numeric", length = 3)
vector("integer", length = 3)
vector("logical", length = 3)

In [28]:
# The general pattern is vector(class of object, length). You can also create vectors by concatenating them using the c() function. 
z <- c("Alec", "Dan", "Rob", "Rich") 
typeof(z) 
length(z) 
class(z) 
str(z)

 chr [1:4] "Alec" "Dan" "Rob" "Rich"


In [30]:
# You can also create vectors as sequence of numbers 
series <- 1:5 
series
seq(5) 
seq(1, 5, by = 0.5) 

In [32]:
1/0
1/Inf

In [33]:
0/0

The cell below shows implicit coercion. The coercion rule goes `logical -> integer -> numeric -> complex -> character`.

In [34]:
xx <- c(1.7, "a") 
xx
xx <- c(TRUE, 2) 
xx
xx <- c("a", TRUE) 
xx
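
To make the outcome of the cell above explicit, the following sketch repeats the three combinations and inspects the resulting type with `class()`. The variable names are illustrative only.

```r
mixed_num_chr <- c(1.7, "a")    # numeric + character -> character
mixed_lgl_num <- c(TRUE, 2)     # logical + numeric   -> numeric (TRUE becomes 1)
mixed_chr_lgl <- c("a", TRUE)   # character + logical -> character

print(class(mixed_num_chr))
print(class(mixed_lgl_num))
print(class(mixed_chr_lgl))

print(mixed_lgl_num)
print(mixed_chr_lgl)
```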

You can also coerce vectors explicitly using the `as.<class_name>` functions. For example:

In [35]:
x <- 0:6 
as.numeric(x) 
as.logical(x) 
as.character(x) 
as.complex(x) 

In [36]:
x <- c("a", "b", "c") 
as.numeric(x) 
as.logical(x) 
# neither conversion works: every element becomes NA (as.numeric() also warns)

"NAs introduced by coercion"

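In [37]:
# When a number is compared with a string, the number is coerced to
# character, so these are string comparisons.
1 < '2'
'1' > 2
1 < 'a'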

## Matrix

Matrices are a special kind of vector in R. They are not a separate type of object, but simply a vector with a dimension attribute added to it. Matrices have rows and columns.

In [38]:
m <- matrix(nrow = 2, ncol = 2) 
m 
dim(m) 
attributes(m)

     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA


In [44]:
m <- matrix(1:6, nrow=3, ncol =2)
m

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6


In [43]:
m <- 1:10 
dim(m) <- c(5,2) 
m

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10


In [45]:
x <- 1:3 
y <- 10:12 
cbind(x,y)

     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12


In [46]:
rbind(x,y) 

  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
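
Since a matrix is a vector with dimensions, its elements can be reached either by row and column or by their position in the underlying vector. The sketch below illustrates this, along with the `byrow` argument of `matrix()` for filling row by row; the variable names are illustrative.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)   # filled column by column

print(attributes(m))   # only a dim attribute separates it from 1:6
print(m[2, 2])         # row 2, column 2
print(m[5])            # same element through plain vector indexing

# byrow = TRUE fills row by row instead.
m_by_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)
print(m_by_row)
```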


## List

In R, lists act as containers. Unlike atomic vectors, their contents are not restricted to a single mode and can encompass any data type. Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.

Lists are extremely useful inside functions. You can "staple" together lots of different kinds of results into a single object that a function can return. A list doesn't print out like a vector: each element is printed on its own new line.
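
As a rough illustration of "stapling" results together, the sketch below defines a hypothetical helper, `describe_vector()`, that returns several values of different types in one list. The function and its element names are assumptions made for this example.

```r
# Hypothetical helper that bundles several results of different types.
describe_vector <- function(x) {
  list(
    n      = length(x),        # integer
    mean   = mean(x),          # double
    range  = range(x),         # double vector of length 2
    sorted = !is.unsorted(x)   # logical
  )
}

result <- describe_vector(c(2, 5, 3))
print(result$mean)         # access an element by name with $
print(result[["range"]])   # or with [[ ]]
str(result)                # compact overview of the whole list
```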

In [48]:
x <- list(1, "a", TRUE, 1+4i) 
x
x <- 1:5 
x <- as.list(x) 
x
length(x) 

In [49]:
x[1]

In [51]:
xlist <- list(a = "Rich FitzJohn", b = 1:10, data = head(iris)) 
xlist

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa


In [55]:
length(xlist)

In [56]:
temp <- list(list(list(list()))) 
temp 
is.recursive(temp) 

In [61]:
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")



## Factors

Factors are special vectors that represent categorical data. Factors can be ordered or unordered.

Factors can only contain pre-defined values. 

Factors are pretty much integers that have labels on them.

In [57]:
x <- factor(c("yes", "no", "no", "yes", "yes")) 
x 

In [58]:
table(x)     # will return a frequency table. 

x
 no yes 
  2   3 

In [59]:
unclass(x)      # strips out the class information. 

In [60]:
x <- factor(c("yes", "no", "yes"), levels = c("yes", "no")) 
x
table(x)
unclass(x)

x
yes  no 
  2   1 
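
The factors above are unordered. For an ordered factor, a minimal sketch might look like the following, assuming three made-up levels; `ordered = TRUE` tells R that the order of `levels` is meaningful, which enables comparisons.

```r
sizes <- factor(
  c("low", "high", "medium", "low"),
  levels  = c("low", "medium", "high"),
  ordered = TRUE
)

print(sizes)
print(sizes < "high")      # comparisons follow the level order
print(max(sizes))          # largest level present
print(as.integer(sizes))   # the underlying integer codes
```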

## Array

While matrices are confined to two dimensions, arrays can have any number of dimensions. The array() function takes a dim argument that sets the size of each dimension.

In [62]:
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

, , 1

     [,1]     [,2]     [,3]    
[1,] "green"  "yellow" "green" 
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green" 

, , 2

     [,1]     [,2]     [,3]    
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green" 
[3,] "yellow" "green"  "yellow"
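
Elements of an array are addressed with one index per dimension, and leaving an index empty selects everything along that dimension. A short sketch using the same array as above:

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

print(dim(a))        # size of each dimension
print(a[1, 2, 1])    # row 1, column 2 of the first layer
print(a[, , 2])      # the whole second layer, itself a 3 x 3 matrix
```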



## Data Frame

Useful functions 
1. head() - see the first 6 rows (by default)
2. tail() - see the last 6 rows (by default)
3. dim() - see dimensions
4. nrow() - number of rows
5. ncol() - number of columns
6. str() - structure of each column
7. names() - list the column names of a data.frame (or the names of any named object)

In [63]:
# Create the data frame.
BMI <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)
print(BMI)

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
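
The helper functions listed above can be tried on this data frame. The sketch below rebuilds `BMI` so that it runs on its own and calls a few of them.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age    = c(42, 38, 26)
)

print(dim(BMI))       # rows, then columns
print(names(BMI))     # column names
str(BMI)              # type and first values of each column
print(head(BMI, 2))   # first two rows instead of the default six
```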


# Operators

In [66]:
# Arithmetic Operators

a = c(2, 5.3)    # c() combines values into a vector
b <- c(9, 3.06)

cat("\nSum:",a+b)
cat("\nDifference:",a-b)
cat("\nProduct:",a*b)
cat("\nQuotient:",a/b)
cat("\nInteger Quotient:",a%/%b)
cat("\nRemainder:",a%%b)
cat("\nPower:",a^b)


Sum: 11 8.36
Difference: -7 2.24
Product: 18 16.218
Quotient: 0.2222222 1.732026
Integer Quotient: 0 1
Remainder: 2 2.24
Power: 512 164.5448

In [71]:
# Logical Operators

lst1 <- c(TRUE, 0.1)
lst2 <- c(0, 0+9i)

cat("\nAND:", lst1&lst2)
cat("\nOR:", lst1|lst2)
cat("\nNEGATION:", !lst2)                   # A unary operator that negates each element of the operand.
cat("\nLOGICAL AND:", lst1[1]&&lst2[1])     # && takes single values (longer vectors are an error since R 4.3.0); TRUE if both are TRUE.
cat("\nLOGICAL OR:", lst1[1]||lst2[1])      # || takes single values; TRUE if either is TRUE.


AND: FALSE TRUE
OR: TRUE TRUE
NEGATION: TRUE FALSE
LOGICAL AND: FALSE
LOGICAL OR: TRUE
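
The difference between the element-wise operators (`&`, `|`) and the single-value operators (`&&`, `||`) can be sketched as follows. The vectors are made up for illustration, and `stop()` is used only to show that the right-hand side is skipped.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

print(x & y)    # element-wise: one result per position
print(x | y)

# && and || work on single values and stop as soon as the answer is known,
# so the right-hand side below is never evaluated.
print(FALSE && stop("not reached"))
print(TRUE  || stop("not reached"))

# any() and all() reduce a whole logical vector to one value.
print(any(x & y))
print(all(x | y))
```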

In [72]:
# Relational Operators

a <- 56
b <- 9

a<b
a>b
a<=b
a>=b
a==b
a!=b

In [73]:
# Assignment Operators

a = c(9, "a")    # Left Assignment (<- or <<- or =)
87 ->> b         # Right Assignment (-> or ->>)

a
b

In [75]:
# Misc. Operators

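# %in% tests whether a value occurs in a vector
val <- 0.1
list1 <- c(TRUE, 0.1, "apple")    # coerced to character: "TRUE", "0.1", "apple"
print(val %in% list1)

# %*% is matrix multiplication
mat = matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
print(mat)
print(t(mat))    # transpose
pro = mat %*% t(mat)
print(pro)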

[1] TRUE
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
     [,1] [,2]
[1,]   35   44
[2,]   44   56


# Important Methods for R Variables

In [76]:
# class() function - This built-in function is used to determine the data type of the variable provided to it.
var1 = "hello"
print(class(var1))

[1] "character"


In [77]:
# ls() function - This built-in function is used to know all the present variables in the workspace.

var1 = "hello"
var2 <- "hello"
"hello" -> var3
print(ls())

 [1] "a"      "b"      "BMI"    "list1"  "lst1"   "lst2"   "m"      "mat"   
 [9] "pi"     "pro"    "series" "temp"   "v"      "val"    "var1"   "var2"  
[17] "var3"   "x"      "xlist"  "xx"     "y"      "z"     


In [78]:
# rm() function - This is again a built-in function used to delete an unwanted variable within your workspace.
rm(var3)
print(var3)

ERROR: Error in print(var3): object 'var3' not found


| **Aspect**          | **Global Variables**                                           | **Local Variables**                                               |
|---------------------|---------------------------------------------------------------|-------------------------------------------------------------------|
| **Scope**           | Defined outside of any function; accessible from anywhere.     | Defined inside a function; only accessible within that function.  |
| **Lifetime**        | Remains in memory until the program finishes or is deleted.    | Exists only during the function's execution; destroyed afterward. |
| **Naming Conflicts**| Can cause conflicts if used with the same name elsewhere.      | Less likely to cause conflicts as they are function-specific.     |
| **Memory Usage**    | Uses more memory since it persists throughout program execution. | Uses less memory as they are created and destroyed when necessary. |


## Global Variables

- They are available throughout the lifetime of a program.
- They are declared anywhere in the program outside all of the functions or blocks.

In [79]:
# global variable
global = 5
 
# global variable accessed from within a function
display = function(){
    print(global)
}
display()
 
# changing value of global variable
global = 10
display()

[1] 5
[1] 10


## Local Variables
- Local variables are variables that exist only within a certain part of a program, such as a function, and are released when the function call ends.
- Local variables do not exist outside the function in which they are created, i.e. they cannot be accessed or used outside that function.

In [80]:
func = function(){
    # this variable is local to the function func() and cannot be accessed outside this function
    age = 18
    print(age)
}
 
cat("Age is:\n")
func()

Age is:
[1] 18
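
The example above does not show what happens outside the function. The following sketch, with hypothetical names (`counter`, `local_age`, `set_local`, `increment_global`), checks that a local variable disappears after the call and shows how `<<-` can modify a variable in the enclosing environment instead.

```r
counter <- 0                 # global variable

set_local <- function() {
  local_age <- 18            # local: exists only while the function runs
  counter <- 99              # creates a new local 'counter'; the global one is untouched
  invisible(NULL)
}

increment_global <- function() {
  counter <<- counter + 1    # <<- assigns in the enclosing (here global) environment
  invisible(NULL)
}

set_local()
print(exists("local_age"))   # the local variable is gone after the call
print(counter)               # still the global value

increment_global()
print(counter)               # the global value was changed by <<-
```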


# Creating and Manipulating Objects

# Importing / Exporting Data

# Data Distribution

# Data Manipulation and Extracting Components

# Data Shaping and Transformation