
# **Data Types & Data Structures in 'R'**

## **Everything in R is an object**

In [None]:
# Create an R object
obj = 69

# **Basic data types in R**
1. Character
2. Numeric (real or decimal)
3. Integer
4. Logical
5. Complex
6. Raw



In [None]:
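# Create objects of simple data types
my_char = 'a'       # character: text in single or double quotes
print(my_char)

num = 10L           # integer: the L suffix stores 10 as an integer
print(num)

my_numeric = 25     # numeric: numbers without L are stored as doubles
print(my_numeric)

my_logic = TRUE     # logical: TRUE or FALSE
print(my_logic)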

[1] "a"
[1] 10
[1] 25
[1] TRUE
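The functions `class()` and `typeof()` show which basic type an object actually has. The sketch below recreates the objects from the cell above so it runs on its own. It also adds a complex number, which that cell does not cover. The name `my_complex` is illustrative.

```r
# Objects of each basic type (recreated so this block is self-contained)
my_char    = 'a'     # character
num        = 10L     # integer: the L suffix forces an integer
my_numeric = 25      # numeric: plain numbers are stored as doubles
my_logic   = TRUE    # logical
my_complex = 3 + 2i  # complex: the i suffix marks the imaginary part

print(class(my_char))     # "character"
print(class(num))         # "integer"
print(class(my_numeric))  # "numeric"
print(typeof(my_numeric)) # "double"
print(class(my_logic))    # "logical"
print(class(my_complex))  # "complex"
```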


In [None]:
if (1 == 1) {
  print(my_logic)
}

[1] TRUE


# **Data Structures in R**
1. Atomic Vectors
2. List
3. Matrix
4. Data frame
5. Factor

The `c()` function is the most common way to create vectors. Its name stands for "combine" (or "concatenate").

## Vector
An atomic vector is an ordered collection of elements that all share the same data type.

In [2]:
# Create a Vector
my_vec1 = c(1L, 2L, 3L)
my_vec2 = 1:4
print(my_vec1)
print(my_vec2)

[1] 1 2 3
[1] 1 2 3 4
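An atomic vector can hold only one type. Because of this, `c()` silently converts mixed inputs to the most flexible type present. The order is logical, then integer, double, complex, and finally character. A short sketch follows; the names `v1` to `v3` are illustrative.

```r
# Mixing types inside c() coerces everything to one common type
v1 = c(1L, 2.5)       # integer + double     -> double
v2 = c(TRUE, 2L)      # logical + integer    -> integer (TRUE becomes 1L)
v3 = c(1, 'a', TRUE)  # anything + character -> character

print(typeof(v1))  # "double"
print(v2)          # 1 2
print(typeof(v2))  # "integer"
print(v3)          # "1" "a" "TRUE"
```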


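### Sequence Function
The `seq()` function generates a regular sequence of numbers. For example, `seq(from = 1, to = 5, by = 1)` returns `1 2 3 4 5`.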

In [4]:
my_vec_seq = seq(from = 1, to = 10, by = 2)
print(my_vec_seq)

[1] 1 3 5 7 9
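Besides `by`, `seq()` also accepts `length.out`, which requests a fixed number of evenly spaced values. The related `rep()` function repeats values. This is a minimal sketch; the variable names are illustrative.

```r
# length.out: ask for 4 evenly spaced values between 1 and 10
s1 = seq(from = 1, to = 10, length.out = 4)
print(s1)   # 1 4 7 10

# A negative step counts downwards
s2 = seq(from = 10, to = 1, by = -3)
print(s2)   # 10 7 4 1

# rep() repeats a whole vector (times) or each element (each)
r1 = rep(c(1, 2), times = 2)
print(r1)   # 1 2 1 2
r2 = rep(c(1, 2), each = 2)
print(r2)   # 1 1 2 2
```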


### Functions on Vector Objects

In [None]:
# length
length(my_vec_seq)

In [7]:
# Class
class(my_vec1)
class(my_vec2)
class(my_vec_seq)
typeof(my_vec_seq)
str(my_vec_seq)

 num [1:5] 1 3 5 7 9
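The class of a vector depends on how it was built. Both `1:4` and `L`-suffixed values produce integer vectors. In contrast, `seq()` with plain numeric arguments produces a double vector, which `str()` reports as `num`. The sketch below rebuilds the three vectors from above so it runs on its own.

```r
my_vec1    = c(1L, 2L, 3L)                  # explicit integers
my_vec2    = 1:4                            # the colon operator yields integers
my_vec_seq = seq(from = 1, to = 10, by = 2) # double arguments yield doubles

print(class(my_vec1))     # "integer"
print(class(my_vec2))     # "integer"
print(class(my_vec_seq))  # "numeric"
print(typeof(my_vec_seq)) # "double"
print(length(my_vec_seq)) # 5
```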


### **Access and Modify Elements of a Vector**
In R, indexing starts from one. This differs from programming languages such as Python, where indexing starts from zero.

For example, the first element is in position one, the second element is in position two, and so on. Square brackets are used both to read an element and to assign a new value to it.

In [12]:
vec_index = seq(from = 2, to = 12, by = 2)
vec_index[1] = 200
print(vec_index)


print(vec_index[1])

[1] 200   4   6   8  10  12
[1] 200


In [14]:
start = 2
end = 20
skip = 2

vec_index = seq(from = start, to = end, by = skip)
vec_index[1] = 200
print(vec_index)


print(vec_index[1])

 [1] 200   4   6   8  10  12  14  16  18  20
[1] 200
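Indexing is not limited to a single position. The sketch below shows the other common forms. It rebuilds `vec_index` from the cell above so it runs on its own.

```r
vec_index = seq(from = 2, to = 20, by = 2)
vec_index[1] = 200   # 200 4 6 8 10 12 14 16 18 20

print(vec_index[c(2, 4)])            # several positions at once: 4 8
print(vec_index[2:4])                # a range of positions: 4 6 8
print(vec_index[-1])                 # a negative index drops position 1
print(vec_index[vec_index > 15])     # a logical condition: 200 16 18 20
print(vec_index[length(vec_index)])  # the last element: 20
```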


### Identifying and Handling Missing Data

One of R's distinctive features is its built-in support for identifying and handling missing values. `NA` is a special symbol that represents a missing value. When data is read from an external source, empty or missing entries are typically stored as `NA`. A vector can also be initialised with `NA` directly. Missing values are detected with the functions `is.na()` and `anyNA()`, as the code below shows.

In [15]:
# Initialize a vector with a missing value.
# The special symbol NA represents a missing value.
missing = c(1, 2, NA, 4)
print(missing)
# is.na() is a built-in function that tests each element and returns a
# logical vector: TRUE where the element is NA, FALSE otherwise
is.na(missing)
# anyNA() returns a single TRUE if at least one element is missing
anyNA(missing)

[1]  1  2 NA  4
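Many summary functions return `NA` when their input contains a missing value. The argument `na.rm = TRUE` tells them to skip it. Missing values can also be dropped explicitly by indexing with `!is.na()`. Here is a short sketch on the same `missing` vector; the name `complete` is illustrative.

```r
missing = c(1, 2, NA, 4)

print(sum(missing))                # NA: one unknown value makes the total unknown
print(sum(missing, na.rm = TRUE))  # 7: the NA is skipped
print(mean(missing, na.rm = TRUE)) # 2.333333

# Keep only the non-missing elements
complete = missing[!is.na(missing)]
print(complete)                    # 1 2 4
```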


### Special Symbols

`Inf` and `NaN` are special symbols in R that represent infinity and "not a number", respectively. They typically arise from undefined or overflowing arithmetic. They also propagate through later computations, so such values need to be checked for and handled explicitly.

In the example below, the code `1/0` displays the output `Inf`, and `0/0` displays the output `NaN`.

In [16]:
# Display the result of 1 divided by 0
1/0
# Display the result of 0 divided by 0
0/0
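`Inf` and `NaN` can be detected with `is.infinite()` and `is.nan()`. Note that `is.na()` also returns `TRUE` for `NaN`, whereas `is.nan()` returns `FALSE` for a plain `NA`. A sketch follows; the vector `x` is illustrative.

```r
x = c(1, 1/0, -1/0, 0/0, NA)
print(x)               # 1 Inf -Inf NaN NA

print(is.infinite(x))  # FALSE TRUE TRUE FALSE FALSE
print(is.nan(x))       # FALSE FALSE FALSE TRUE FALSE
print(is.na(x))        # FALSE FALSE FALSE TRUE TRUE
print(is.finite(x))    # TRUE FALSE FALSE FALSE FALSE
```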

## List
A list is a collection of objects. Unlike an atomic vector, a list can hold elements of different data types, and an element can itself be a vector or another list.


### Creation of List

In [35]:
# Create a list with a single element
firstList = list(2)
print(firstList)

# Create a list with values
valuesList = list(2, 'string', c('a', 5, 'c'), NA)
print(valuesList)
anyNA(valuesList)

[[1]]
[1] 2

[[1]]
[1] 2

[[2]]
[1] "string"

[[3]]
[1] "a" "5" "c"

[[4]]
[1] NA



In [36]:
# Assign names to slots of list
names(valuesList) = c('first', 'second', 'third', 'fourth')
str(valuesList)

List of 4
 $ first : num 2
 $ second: chr "string"
 $ third : chr [1:3] "a" "5" "c"
 $ fourth: logi NA


In [37]:
# Access elements of a list
valuesList[3]
valuesList$fourth
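Single brackets return a smaller list. Double brackets, or `$`, extract the element itself. The difference matters when the result is used in further computation. The sketch below rebuilds the named `valuesList` so it runs on its own; `sub` and `elem` are illustrative names.

```r
valuesList = list(first = 2, second = 'string', third = c('a', 5, 'c'), fourth = NA)

sub  = valuesList[3]    # a list of length 1 that contains the vector
elem = valuesList[[3]]  # the character vector itself

print(class(sub))              # "list"
print(class(elem))             # "character"
print(elem[2])                 # "5"
print(valuesList[['fourth']])  # NA, same as valuesList$fourth
```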


In [42]:
# Assigning to position 6 of a 4-element list extends the list;
# the unused position 5 is filled with NULL
valuesList[6] = 10
print(valuesList)
print(valuesList[5])

$first
[1] 2

$second
[1] "string"

$third
[1] "a" "5" "c"

$fourth
[1] NA

[[5]]
NULL

[[6]]
[1] 10

[[1]]
NULL
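Assigning to an existing position or name replaces that element. Assigning `NULL` with `$` or double brackets removes the element entirely. The sketch below works on a fresh copy of the list. The element name `new` is illustrative.

```r
valuesList = list(first = 2, second = 'string', third = c('a', 5, 'c'), fourth = NA)

valuesList$first = 100      # replace an existing element by name
valuesList[['new']] = TRUE  # add a new named element at the end
valuesList$second = NULL    # remove an element entirely

print(names(valuesList))    # "first" "third" "fourth" "new"
print(length(valuesList))   # 4
print(valuesList$first)     # 100
```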



## Matrix
A matrix is an atomic vector with a dimension attribute of exactly two dimensions (rows and columns).
This data structure is a tabular arrangement of elements that all share the same data type.

### Creation of Matrix

In [56]:
# Create a matrix
# Filled column by column (the default)
myMatrixCol = matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
print(myMatrixCol)
print('-----------------------------------')
# Filled row by row with byrow = TRUE
myMatrixRow = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(myMatrixRow)
print('-----------------------------------')
# Assign Row & Column names
rownames(myMatrixRow) = c('row-1', 'row-2')
colnames(myMatrixRow) = c('col-1', 'col-2', 'col-3')
str(myMatrixRow)
print('-----------------------------------')
print(myMatrixRow)

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
[1] "-----------------------------------"
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[1] "-----------------------------------"
 num [1:2, 1:3] 1 4 2 5 3 6
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "row-1" "row-2"
  ..$ : chr [1:3] "col-1" "col-2" "col-3"
[1] "-----------------------------------"
      col-1 col-2 col-3
row-1     1     2     3
row-2     4     5     6


In [57]:
# Access elements of a matrix
myMatrixRow[1,2]
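Matrix elements are addressed as `[row, column]`. Leaving one index empty selects a whole row or column, and row and column names can stand in for positions. The sketch below rebuilds `myMatrixRow` so it runs on its own.

```r
myMatrixRow = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
rownames(myMatrixRow) = c('row-1', 'row-2')
colnames(myMatrixRow) = c('col-1', 'col-2', 'col-3')

print(myMatrixRow[1, 2])              # 2
print(myMatrixRow['row-2', 'col-3'])  # 6, an element selected by names
print(myMatrixRow[2, ])               # whole second row: 4 5 6
print(myMatrixRow[, 'col-1'])         # whole first column: 1 4
print(dim(myMatrixRow))               # 2 3
print(t(myMatrixRow))                 # transpose: 3 rows, 2 columns
```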

## Data Frame

A data frame is a list of vectors (its columns) that all have the same length, so it forms a rectangular table. Different columns can hold different data types, but all elements within one column share a type. A data frame is typically used to store data read from text/CSV files, and it retains the underlying structure such as row names and column names. A data frame can also be created manually.


### Creating Data Frame

In [66]:
# Create a dataframe manually
ID = c('A', 'B', 'C')
Age = c(18, 19, 20)
Height = c(150, 170, 160)
sData = data.frame(ID, Age, Height)

# Assign names to the rows and columns of the data frame
rownames(sData) = c('Zetsu', 'Deva', 'Animal')
colnames(sData) = c('ID', 'Age', 'Height')
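Column names and row names can also be set in a single `data.frame()` call, which avoids the separate `rownames()` and `colnames()` assignments. This is a sketch that should produce the same `sData` as the cell above.

```r
sData = data.frame(
  ID        = c('A', 'B', 'C'),
  Age       = c(18, 19, 20),
  Height    = c(150, 170, 160),
  row.names = c('Zetsu', 'Deva', 'Animal')  # row names set at creation
)
print(sData)
```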


### In-built functions on Data Frame

In [67]:
# Structure of the data frame
str(sData)

'data.frame':	3 obs. of  3 variables:
 $ ID    : chr  "A" "B" "C"
 $ Age   : num  18 19 20
 $ Height: num  150 170 160


In [68]:
# First 2 rows (head() returns 6 rows by default)
head(sData, 2)

      ID Age Height
Zetsu  A  18    150
Deva   B  19    170


In [69]:
# Last 2 rows (tail() returns 6 rows by default)
tail(sData, 2)

       ID Age Height
Deva    B  19    170
Animal  C  20    160


In [70]:
# Get the dimension of the data frame
dim(sData)

In [71]:
# Number of Rows in Data Frame
nrow(sData)
# Number of Columns in Data Frame
ncol(sData)

### Accessing Elements of a Data Frame
A column of a data frame can be accessed with the `$` operator, or with double square brackets containing the column name in quotes; both return the column as a vector. Single square brackets can also be used, but in this case the result is a data frame.

In [74]:
# Access a Particular column
sData$Age
sData[['Age']]
sData['Age']

       Age
Zetsu   18
Deva    19
Animal  20
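The three access forms differ in what they return. `$` and `[[` give the column as a plain vector, while single brackets keep the data-frame structure. The sketch below rebuilds `sData` so it runs on its own and checks each form.

```r
sData = data.frame(ID = c('A', 'B', 'C'), Age = c(18, 19, 20),
                   Height = c(150, 170, 160),
                   row.names = c('Zetsu', 'Deva', 'Animal'))

print(class(sData$Age))       # "numeric"
print(class(sData[['Age']]))  # "numeric"
print(class(sData['Age']))    # "data.frame"
print(mean(sData$Age))        # 19: vector functions work on the extracted column
```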


In [85]:
# Access a particular Row
sData['Deva', ]

# Access multiple columns
sData[c('ID', 'Age')]

# Access selected columns of a specific row
sData['Animal', c('ID', 'Age')]

     ID Age Height
Deva  B  19    170

       ID Age
Zetsu   A  18
Deva    B  19
Animal  C  20

       ID Age
Animal  C  20
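Rows can also be selected with a logical condition on a column, which is the usual way to filter a data frame. A minimal sketch follows. It rebuilds `sData`; the name `older` is illustrative.

```r
sData = data.frame(ID = c('A', 'B', 'C'), Age = c(18, 19, 20),
                   Height = c(150, 170, 160),
                   row.names = c('Zetsu', 'Deva', 'Animal'))

# Rows where Age is above 18, all columns
older = sData[sData$Age > 18, ]
print(older)

# Same condition, but only the Height column (returned as a vector)
print(sData[sData$Age > 18, 'Height'])  # 170 160
```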


## Factor

A factor is a vector that can contain only predefined values, called levels, and is used to store categorical data.

In [87]:
# Create a factor for storing a list of genders
gender = factor(c('Male', 'Male', 'Female', 'Female'))
print(gender)

[1] Male   Male   Female Female
Levels: Female Male


In [88]:
# In-built functions on factors
levels(gender)

In [89]:
# Modify a gender
gender[1] = 'Female'
print(gender)

[1] Female Male   Female Female
Levels: Female Male


A factor can be initialised using the function `factor()`. Built-in functions are available that operate on factor objects. For example, `levels()` returns the unique categories stored in the factor. Elements of a factor can also be modified by index, as long as the new value is one of the existing levels. Factors are important when working with categorical data such as gender, education level, or blood type.
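When the allowed categories are known in advance, they can be passed explicitly through the `levels` argument, which also fixes their order. Assigning a value that is not one of the levels produces `NA` with a warning. The `table()` function counts how often each level occurs. The sketch below uses hypothetical blood-type data (the name `blood` is illustrative).

```r
# Hypothetical blood-type data with all four categories declared up front
blood = factor(c('A', 'O', 'O', 'B'), levels = c('A', 'B', 'AB', 'O'))

print(levels(blood))   # "A" "B" "AB" "O" (order as given, not alphabetical)
print(nlevels(blood))  # 4
print(table(blood))    # counts per level, including AB with 0

# A value outside the levels cannot be stored: it becomes NA with a warning
blood[1] = 'C'
print(blood)
```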