**This is a practice notebook for the R101 course on the website:** ***[cognitiveclass.ai](https://cognitiveclass.ai/courses/r-101/)***

# Data Structures in R

## Array

An array is a structure that **contains data of the same type**, whether that's character strings, numbers, or logical values.
Arrays can be **multi-dimensional** as well, so the data can be contained in **multiple rows and columns**.

A vector is converted to an array using the **array()** function. We specify the dimension of the array using a **dim=** argument within the function.

In [279]:
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash", "Star Wars", "The Ring", "The Artist", "Jumanji")

In [280]:
movie_array <- array(movie_vector, dim = c(4, 3))
movie_array

0,1,2
Akira,Whiplash,Jumanji
Toy Story,Star Wars,Akira
Room,The Ring,Toy Story
The Wave,The Artist,Room


The output above shows that R repeats the first 3 elements of the vector. This is because the array has 4 x 3 = 12 cells, which is more than the 9 elements of the vector, so R recycles the vector from the start to fill the remaining cells.
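
The recycling can be checked directly. The following is a minimal sketch that redefines `movie_vector` and `movie_array` so that it runs on its own; it compares the number of cells with the vector length and looks at the cells that were filled by recycling.

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")
movie_array <- array(movie_vector, dim = c(4, 3))

length(movie_vector)     # 9 elements in the vector
prod(dim(movie_array))   # 12 cells in the array (4 * 3)

# Arrays are filled column by column, so the last 3 cells of column 3
# hold the recycled first 3 elements of the vector
movie_array[2:4, 3]
identical(movie_array[2:4, 3], movie_vector[1:3])   # TRUE
```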

To extract any element of the array, **specify the row and column index of that element within square brackets**.

In [281]:
movie_array[3, 2] #to retrieve 1 element

In [282]:
movie_array[, 2] #to retrieve 2nd column

In [283]:
movie_array[3, ] #to retrieve 3rd row

In [284]:
movie_array[1:3, 1:2] #to retrieve row 1 to 3, and column 1 to 2.

0,1
Akira,Whiplash
Toy Story,Star Wars
Room,The Ring
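
The examples so far use two dimensions, but **dim=** accepts any number of them. As a small sketch (the name `movie_cube` is made up for illustration), the first 8 movies can be arranged into a 2 x 2 x 2 array and indexed with three subscripts:

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")

# 8 elements fill a 2 x 2 x 2 array exactly, so nothing is recycled
movie_cube <- array(movie_vector[1:8], dim = c(2, 2, 2))
dim(movie_cube)

movie_cube[1, 2, 2]   # row 1, column 2 of the second slice: "The Ring"
movie_cube[, , 1]     # the whole first slice, returned as a 2 x 2 matrix
```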


## Matrix

A matrix is similar in structure to an array. A matrix must be **two dimensional**. 

A matrix is created from a vector using the **matrix()** function.
Here we pass it 3 arguments: the **vector**, the number of rows (**nrow =**) and the number of columns (**ncol =**).

In [285]:
movie_matrix <- matrix(movie_vector, nrow = 3, ncol = 3)
movie_matrix

0,1,2
Akira,The Wave,The Ring
Toy Story,Whiplash,The Artist
Room,Star Wars,Jumanji


By default the matrix is filled column by column. We can fill it row by row instead by adding the parameter **byrow = TRUE**.

In [286]:
movie_matrix <- matrix(movie_vector, nrow = 3, ncol = 3, byrow = TRUE)
movie_matrix

0,1,2
Akira,Toy Story,Room
The Wave,Whiplash,Star Wars
The Ring,The Artist,Jumanji
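
One way to see the relation between the two fill orders is that, for this square example, the row-filled matrix is the transpose of the column-filled one. A short sketch, where `by_col` and `by_row` are names chosen only for this illustration:

```r
movie_vector <- c("Akira", "Toy Story", "Room", "The Wave", "Whiplash",
                  "Star Wars", "The Ring", "The Artist", "Jumanji")

by_col <- matrix(movie_vector, nrow = 3, ncol = 3)                # default fill
by_row <- matrix(movie_vector, nrow = 3, ncol = 3, byrow = TRUE)  # row-wise fill

by_col[1, ]   # first row when filled by column
by_row[1, ]   # first row when filled by row

# t() transposes a matrix, swapping rows and columns
identical(by_row, t(by_col))   # TRUE
```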


A subset of the matrix can also be accessed by specifying row and column range.

In [287]:
movie_matrix[1:3, 1:2]

0,1
Akira,Toy Story
The Wave,Whiplash
The Ring,The Artist


## List

A list is a collection of elements. A list may contain **elements of different data types**.

In [288]:
movie_list <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))
movie_list

Single or multiple elements can be retrieved by specifying the element index or index range within **square brackets []**. The result is itself a list.

In [289]:
movie_list[1]

In [290]:
movie_list[2:3]
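
It helps to compare single brackets with double brackets here. The sketch below redefines `movie_list` so it is self-contained: single brackets keep the list wrapper, while double brackets return the element itself.

```r
movie_list <- list("Toy Story", 1995, c("Animation", "Adventure", "Comedy"))

class(movie_list[2])     # "list": a list of length 1
class(movie_list[[2]])   # "numeric": the value stored in the list

# Arithmetic needs the value, not the one-element list
movie_list[[2]] + 1      # 1996

# Double brackets can be chained with vector indexing
movie_list[[3]][2]       # "Adventure"
```

Note that `movie_list[2] + 1` would fail, because arithmetic is not defined for a list.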

### Named List

A named list is a list where each element has been named or given a category.

In [291]:
movie_named_list <- list(name = "Toy Story",
                         year = 1995,
                         genre = c("Animation", "Adventure", "Comedy"))

In [292]:
movie_named_list

We can retrieve elements from individual categories using the **listName\$categoryName** syntax, where categoryName is the name of the category within the list. **Note the dollar sign operator "\$" between the list name and the category name.**

In [293]:
movie_named_list$genre

In [294]:
movie_named_list$name

Category elements can also be retrieved by specifying the category name within the **listName[" "]** syntax.

In [295]:
movie_named_list["name"]

In [296]:
movie_named_list[c("year", "genre")]
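
The access styles differ in what they return. As a brief sketch (the variable `key` is introduced only for this example): `$` and `[[ ]]` return the element itself, `[ ]` returns a list, and `[[ ]]` also accepts a name stored in a variable.

```r
movie_named_list <- list(name = "Toy Story",
                         year = 1995,
                         genre = c("Animation", "Adventure", "Comedy"))

names(movie_named_list)   # the category names

# $ and [[ ]] return the same value
identical(movie_named_list$year, movie_named_list[["year"]])   # TRUE

# Single brackets return a (named) list instead
class(movie_named_list["year"])   # "list"

# [[ ]] works with a name held in a variable; $ does not
key <- "genre"
movie_named_list[[key]]
```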

The **class()** function can be used to determine the type of an object.

In [297]:
class(movie_named_list)
class(movie_array)
class(movie_matrix)

In [298]:
class(movie_named_list$year)
class(movie_named_list$genre)
class(movie_named_list$release) #category "release" does not exist in the named list
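
Because a missing category yields `NULL` rather than an error, it can be worth checking for it explicitly. A minimal sketch of two ways to do that:

```r
movie_named_list <- list(name = "Toy Story",
                         year = 1995,
                         genre = c("Animation", "Adventure", "Comedy"))

# A category that does not exist evaluates to NULL
is.null(movie_named_list$release)            # TRUE

# Checking the names is more explicit
"release" %in% names(movie_named_list)       # FALSE
"year" %in% names(movie_named_list)          # TRUE
```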

Elements can be added or updated with the **double square brackets [[ ]]** operator, which references a list member directly so that its content can be changed.

In [299]:
movie_named_list[["age"]] <- 5
movie_named_list

In [300]:
movie_named_list[["age"]] <- c(5, 6, 7)
movie_named_list

Elements can be removed by assigning the **NULL** value to a category within the named list.

In [301]:
movie_named_list[["age"]] <- NULL
movie_named_list
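
The add, update and remove steps can be followed by watching the length and names of the list. This sketch starts from a shortened copy of the list so that it runs on its own:

```r
movie_named_list <- list(name = "Toy Story", year = 1995)

movie_named_list[["age"]] <- 5            # add a new element
movie_named_list[["age"]] <- c(5, 6, 7)   # update it (same name, new content)
length(movie_named_list)                  # 3

movie_named_list[["age"]] <- NULL         # remove it again
names(movie_named_list)                   # "name" "year"

# To store NULL as a value instead of deleting the element,
# assign list(NULL) with single brackets
movie_named_list["age"] <- list(NULL)
length(movie_named_list)                  # 3
```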

Two lists can be concatenated by passing them to the **c()** function.

In [302]:
# We split our previous list in two sublists
movie_part1 <- list(name = "Toy Story")
movie_part2 <- list(year = 1995, genre = c("Animation", "Adventure", "Comedy"))

# Now we call the function c() to put everything together again
movie_concatenated <- c(movie_part1, movie_part2)

# Check it out
movie_concatenated
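
To see why **c()** is used here rather than **list()**, compare the two results. In this sketch, `flat` and `nested` are names chosen only for illustration:

```r
movie_part1 <- list(name = "Toy Story")
movie_part2 <- list(year = 1995, genre = c("Animation", "Adventure", "Comedy"))

flat   <- c(movie_part1, movie_part2)      # one list with 3 elements
nested <- list(movie_part1, movie_part2)   # a list holding the 2 sublists

length(flat)     # 3
length(nested)   # 2
names(flat)      # "name" "year" "genre"
```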

## Data Frame

A data frame is a **named list of vectors of the same length**. We create a data frame by using the **data.frame()** function and passing named vectors to it.

In [303]:
movie_df <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist",
                                "Modern Times", "Fight Club", "City of God", "The Untouchables"),
                       year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987),
                       stringsAsFactors = FALSE)
movie_df

name,year
Toy Story,1995
Akira,1998
The Breakfast Club,1985
The Artist,2011
Modern Times,1936
Fight Club,1999
City of God,2002
The Untouchables,1987
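
The "list of equal-length vectors" description can be verified on a small example. The sketch below uses a shortened two-row data frame, and the object `result` is introduced only to capture the outcome of a deliberately invalid call:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira"),
                       year = c(1995, 1998),
                       stringsAsFactors = FALSE)

is.list(movie_df)         # TRUE: a data frame is a list underneath
length(movie_df)          # 2: one list element per column
nrow(movie_df)            # 2 rows
sapply(movie_df, class)   # each column keeps its own type

# Vectors of 3 and 2 elements cannot form a data frame
result <- tryCatch(
  data.frame(name = c("A", "B", "C"), year = c(1995, 1998)),
  error = function(e) "error"
)
result   # "error"
```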


Data frame elements can be accessed in the same way as list elements.

In [304]:
movie_df$name

In [305]:
movie_df["year"]

year
1995
1998
1985
2011
1936
1999
2002
1987


In [306]:
movie_df[1]

name
Toy Story
Akira
The Breakfast Club
The Artist
Modern Times
Fight Club
City of God
The Untouchables


In [307]:
movie_df[9,]  #row 9 does not exist, so a row of NA values is returned

,name,year
NA,NA,NA
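
As with lists, the access style determines the type of the result. The following sketch uses a shortened three-row data frame to compare them and to confirm what an out-of-range row index returns:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club"),
                       year = c(1995, 1998, 1985),
                       stringsAsFactors = FALSE)

class(movie_df$year)        # "numeric": the column as a vector
class(movie_df[["year"]])   # "numeric": same as $
class(movie_df["year"])     # "data.frame": a one-column data frame

# Asking for a row that does not exist gives one row of NA values
missing_row <- movie_df[9, ]
nrow(missing_row)           # 1
all(is.na(missing_row))     # TRUE
```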


The **str()** function displays a compact textual summary of the structure of an object.

In [308]:
str(movie_df)
str(movie_named_list)
str(movie_list)
str(movie_matrix)

'data.frame':	8 obs. of  2 variables:
 $ name: chr  "Toy Story" "Akira" "The Breakfast Club" "The Artist" ...
 $ year: num  1995 1998 1985 2011 1936 ...
List of 3
 $ name : chr "Toy Story"
 $ year : num 1995
 $ genre: chr [1:3] "Animation" "Adventure" "Comedy"
List of 3
 $ : chr "Toy Story"
 $ : num 1995
 $ : chr [1:3] "Animation" "Adventure" "Comedy"
 chr [1:3, 1:3] "Akira" "The Wave" "The Ring" "Toy Story" "Whiplash" ...


In [309]:
class(movie_df$year)

In [310]:
movie_df[1,2]  #row 1, column 2

The **head()** function **displays the first 6 rows** of a data frame (or the first 6 elements of a vector or list) by default. Similarly, the **tail()** function **displays the last 6 rows**.

In [311]:
head(movie_df)

name,year
Toy Story,1995
Akira,1998
The Breakfast Club,1985
The Artist,2011
Modern Times,1936
Fight Club,1999


In [312]:
tail(movie_df)

,name,year
3,The Breakfast Club,1985
4,The Artist,2011
5,Modern Times,1936
6,Fight Club,1999
7,City of God,2002
8,The Untouchables,1987
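
The number of rows shown is controlled by the `n` argument of both functions. A short sketch, redefining `movie_df` so it runs on its own:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist",
                                "Modern Times", "Fight Club", "City of God", "The Untouchables"),
                       year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987),
                       stringsAsFactors = FALSE)

head(movie_df, n = 3)    # first 3 rows
tail(movie_df, n = 2)    # last 2 rows

# A negative n means "all rows except that many"
head(movie_df, n = -2)   # everything except the last 2 rows
```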


New columns can be added by specifying the column name within **square brackets []** and assigning a vector to it.

In [313]:
movie_df['length'] <- c(81, 125, 97, 100, 87, 139, 130, 119)
movie_df

name,year,length
Toy Story,1995,81
Akira,1998,125
The Breakfast Club,1985,97
The Artist,2011,100
Modern Times,1936,87
Fight Club,1999,139
City of God,2002,130
The Untouchables,1987,119
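
A new column can also be computed from an existing one. In the sketch below, which uses a shortened data frame, the column `length_hours` is a hypothetical addition that is not part of the original notebook:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira"),
                       year = c(1995, 1998),
                       stringsAsFactors = FALSE)

# The $ operator adds a column in the same way as square brackets
movie_df$length <- c(81, 125)

# Derive a second column from the first (minutes converted to hours)
movie_df[["length_hours"]] <- round(movie_df$length / 60, 2)
movie_df
```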


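New rows can be added using the **rbind()** function. The new row is passed as a list so that each value keeps its own type; a row built with c() would be coerced to character.

In [314]:
movie_df <- rbind(movie_df, list(name = "Dr. Strangelove", year = 1964, length = 94))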
movie_df

name,year,length
Toy Story,1995,81
Akira,1998,125
The Breakfast Club,1985,97
The Artist,2011,100
Modern Times,1936,87
Fight Club,1999,139
City of God,2002,130
The Untouchables,1987,119
Dr. Strangelove,1964,94
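
The printed table looks the same whether the new row is built with c() or with list(), but the column types can differ. The sketch below uses a shortened data frame, and the names `coerced` and `kept` are chosen only for illustration:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira"),
                       year = c(1995, 1998),
                       length = c(81, 125),
                       stringsAsFactors = FALSE)

# c() builds an atomic vector, so mixed values all become character
new_row_vec <- c(name = "Dr. Strangelove", year = 1964, length = 94)
class(new_row_vec)    # "character"

coerced <- rbind(movie_df, new_row_vec)
class(coerced$year)   # "character": the numeric column was converted

# A list keeps each value's own type, so the columns stay numeric
kept <- rbind(movie_df, list(name = "Dr. Strangelove", year = 1964, length = 94))
class(kept$year)      # "numeric"
```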


To remove a row we specify its **negative index followed by a comma within square brackets []**.

In [315]:
movie_df <- movie_df[-9,]
movie_df

name,year,length
Toy Story,1995,81
Akira,1998,125
The Breakfast Club,1985,97
The Artist,2011,100
Modern Times,1936,87
Fight Club,1999,139
City of God,2002,130
The Untouchables,1987,119
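
The same bracket syntax can drop several rows at once or keep only the rows that match a condition. A minimal sketch, with `movie_df` redefined so the block is self-contained and `without_first_two` and `from_1990` as illustrative names:

```r
movie_df <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist",
                                "Modern Times", "Fight Club", "City of God", "The Untouchables"),
                       year = c(1995, 1998, 1985, 2011, 1936, 1999, 2002, 1987),
                       stringsAsFactors = FALSE)

# A vector of negative indices removes several rows
without_first_two <- movie_df[-c(1, 2), ]
nrow(without_first_two)   # 6

# A logical condition before the comma keeps only the matching rows
from_1990 <- movie_df[movie_df$year >= 1990, ]
from_1990$name
```

Neither call changes `movie_df` itself; the result has to be assigned back, as in the cell above, for the removal to persist.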


To remove a column we assign the **NULL** value to it.

In [316]:
movie_df[["length"]] <- NULL
movie_df

name,year
Toy Story,1995
Akira,1998
The Breakfast Club,1985
The Artist,2011
Modern Times,1936
Fight Club,1999
City of God,2002
The Untouchables,1987
