# Data Frames

A data frame is a table-like structure that stores related values in named columns of equal length, where each column can hold a different type of data. For example, a data frame is a natural fit for storing movie titles along with their corresponding release years.

- We need to use the data.frame() function to create a data frame.

```R
movies <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club", "The Artist", "Modern Times"),
                     year = c(1995, 1998, 1985, 2011, 1936),
                     stringsAsFactors = FALSE)
```

- The variables (columns) of a data frame can be accessed using the “dollar sign” symbol ($).

```R
movies$name

#[1] "Toy Story"         "Akira"             "The Breakfast Club" "The Artist"        "Modern Times"     
```

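- We can also access the variables of a data frame using the square bracket notation. Double brackets return the column as a vector, single brackets return a one-column data frame, and a row and column pair returns a single value.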

```R
movies[["name"]]

#[1] "Toy Story"         "Akira"             "The Breakfast Club" "The Artist"        "Modern Times"     

movies[1]

#                name
#1          Toy Story
#2              Akira
#3 The Breakfast Club
#4         The Artist
#5       Modern Times

movies[1, 1]

#[1] "Toy Story"
```
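
The access forms above return different kinds of objects, which matters when the result is passed to other functions. The following is a minimal sketch on a small illustrative copy of the data; the name films is hypothetical and is used only so that the movies data frame is left untouched.

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club"),
                    year = c(1995, 1998, 1985),
                    stringsAsFactors = FALSE)

class(films$name)       # a character vector
class(films[["name"]])  # the same character vector as with $
class(films["name"])    # a data frame with one column
class(films[1, 1])      # a single character value

# Rows and columns can be selected together; an empty index means "all"
second_row <- films[2, ]        # second row, all columns (still a data frame)
all_years  <- films[, "year"]   # all rows of one column, returned as a vector

second_row
all_years
```

As a rule of thumb, reach for $ or [[ ]] when a plain vector is needed, and for [ ] when the result should stay a data frame.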

- To get some information about the data frame’s structure, you can pass the data frame as an argument to the “str” function.

```R
str(movies)

#'data.frame':	5 obs. of  2 variables:
# $ name: chr  "Toy Story" "Akira" "The Breakfast Club" "The Artist" ...
# $ year: num  1995 1998 1985 2011 1936
```

- The “head” and “tail” functions can be used to look at the beginning and end of a data frame, respectively. By default, “head” displays the first six rows and “tail” displays the last six rows. Because movies has only five rows, both calls print all of them.

```R
head(movies)

#                name year
#1          Toy Story 1995
#2              Akira 1998
#3 The Breakfast Club 1985
#4         The Artist 2011
#5       Modern Times 1936

tail(movies)

#                name year
#1          Toy Story 1995
#2              Akira 1998
#3 The Breakfast Club 1985
#4         The Artist 2011
#5       Modern Times 1936
```
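
Both functions accept an n argument for the number of rows to show, which is handy when six rows are too many or too few. A short sketch, again on an illustrative data frame with the hypothetical name films:

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club",
                             "The Artist", "Modern Times"),
                    year = c(1995, 1998, 1985, 2011, 1936),
                    stringsAsFactors = FALSE)

first_two <- head(films, n = 2)  # rows 1 and 2
last_two  <- tail(films, n = 2)  # rows 4 and 5; the original row numbers are kept

first_two
last_two
```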

- We can also use the “summary” function to get some summary statistics about the data frame.

```R
summary(movies)

#      name               year
#Length:5           Min.   :1936
#Class :character   1st Qu.:1985
#Mode  :character   Median :1995
#                   Mean   :1985
#                   3rd Qu.:1998
#                   Max.   :2011
```

- We can also use the “nrow” and “ncol” functions to get the number of rows and columns in a data frame, respectively.

```R
nrow(movies)

#[1] 5

ncol(movies)

#[1] 2
```

- We can also use the “dim” function to get the dimensions of a data frame.

```R
dim(movies)

#[1] 5 2
```

- We can also use the “names” function to get the names of the variables in a data frame.

```R
names(movies)

#[1] "name" "year"
```

- Adding a new variable to a data frame is very similar to adding a new element to a list. We can use the “dollar sign” symbol to add a new variable to a data frame.

```R
movies$rating <- c(3.9, 4.0, 4.5, 3.0, 4.0)
# equivalent: movies[["rating"]] <- c(3.9, 4.0, 4.5, 3.0, 4.0)
movies

#                name year rating
#1          Toy Story 1995    3.9
#2              Akira 1998    4.0
#3 The Breakfast Club 1985    4.5
#4         The Artist 2011    3.0
#5       Modern Times 1936    4.0
```
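
A new variable does not have to be typed in by hand; it can also be computed from existing columns. The sketch below assumes an illustrative data frame named films (a hypothetical name) and also shows what happens when the length of the new values does not match the number of rows.

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club"),
                    year = c(1995, 1998, 1985),
                    stringsAsFactors = FALSE)

# Derive a new column from an existing one (vectorised arithmetic)
films$decade <- floor(films$year / 10) * 10

# A single value is recycled to every row
films$seen <- TRUE

# A vector whose length does not fit the number of rows is rejected
outcome <- tryCatch({
  films$bad <- c(1, 2)
  "assigned"
}, error = function(e) "rejected")

films
outcome
```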

- To insert a new row into a data frame, we can use the “rbind” function. Pass the new row as a one-row data frame so that each column keeps its type; a vector built with c() that mixes text and numbers is coerced to character, which would turn the year and rating columns into text.

```R
movies <- rbind(movies,
                data.frame(name = "The Deer Hunter", year = 1978, rating = 4.0,
                           stringsAsFactors = FALSE))
movies

#                name year rating
#1          Toy Story 1995    3.9
#2              Akira 1998    4.0
#3 The Breakfast Club 1985    4.5
#4         The Artist 2011    3.0
#5       Modern Times 1936    4.0
#6    The Deer Hunter 1978    4.0
```
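
Column types deserve attention when adding rows, because c() can hold only one type and silently converts numbers to text when they are mixed with strings. The following sketch, on an illustrative data frame with the hypothetical name films, compares appending a plain vector with appending a one-row data frame.

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira"),
                    year = c(1995, 1998),
                    stringsAsFactors = FALSE)

# Mixing text and numbers in c() produces a character vector
new_row <- c("The Deer Hunter", 1978)
class(new_row)

# Appending that vector turns the numeric year column into character
coerced <- rbind(films, new_row)
class(coerced$year)

# Appending a one-row data frame keeps year numeric
kept <- rbind(films,
              data.frame(name = "The Deer Hunter", year = 1978,
                         stringsAsFactors = FALSE))
class(kept$year)
```

Checking the result with str after an rbind call is a cheap way to catch this kind of silent conversion.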

- We can also use the “cbind” function to insert a new column into a data frame. Naming the argument (here, country) sets the name of the new column; an unnamed vector would get an auto-generated name taken from the expression itself.

```R
movies <- cbind(movies, country = c("USA", "Japan", "USA", "France", "USA", "USA"))

movies

#                name year rating country
#1          Toy Story 1995    3.9     USA
#2              Akira 1998    4.0   Japan
#3 The Breakfast Club 1985    4.5     USA
#4         The Artist 2011    3.0  France
#5       Modern Times 1936    4.0     USA
#6    The Deer Hunter 1978    4.0     USA
```

- We can also use the “colnames” function to set the names of the columns in a data frame.

```R
colnames(movies) <- c("name", "year", "rating", "country")
movies

#                name year rating country
#1          Toy Story 1995    3.9     USA
#2              Akira 1998    4.0   Japan
#3 The Breakfast Club 1985    4.5     USA
#4         The Artist 2011    3.0  France
#5       Modern Times 1936    4.0     USA
#6    The Deer Hunter 1978    4.0     USA
```
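
Assigning a full vector of names replaces every column name at once. When only one column needs a new name, it can be selected by its current name instead. A small sketch, assuming an illustrative data frame named films (a hypothetical name); for data frames, names and colnames refer to the same column names.

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira"),
                    year = c(1995, 1998),
                    stringsAsFactors = FALSE)

# Rename only the column currently called "name"; the others are untouched
names(films)[names(films) == "name"] <- "title"

names(films)
```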

- We can also use the “rownames” function to set the names of the rows in a data frame.

```R
rownames(movies) <- c("m1", "m2", "m3", "m4", "m5", "m6")
movies

#                 name year rating country
#m1          Toy Story 1995    3.9     USA
#m2              Akira 1998    4.0   Japan
#m3 The Breakfast Club 1985    4.5     USA
#m4         The Artist 2011    3.0  France
#m5       Modern Times 1936    4.0     USA
#m6    The Deer Hunter 1978    4.0     USA
```

- To delete a variable from a data frame, we can assign NULL to it using the “dollar sign” symbol.

```R
movies$country <- NULL
movies

#                 name year rating
#m1          Toy Story 1995    3.9
#m2              Akira 1998    4.0
#m3 The Breakfast Club 1985    4.5
#m4         The Artist 2011    3.0
#m5       Modern Times 1936    4.0
#m6    The Deer Hunter 1978    4.0
```

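- To delete rows from a data frame, we can use negative indices inside the square brackets. Negative indexing works with row positions only, not with row names; to drop rows by name, filter on “rownames” instead. The remaining ratings are all whole numbers, so R prints them without decimals.

```R
movies <- movies[-c(1, 3), ]
# or, by row name: movies <- movies[!(rownames(movies) %in% c("m1", "m3")), ]
movies

#              name year rating
#m2           Akira 1998      4
#m4      The Artist 2011      3
#m5    Modern Times 1936      4
#m6 The Deer Hunter 1978      4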
```
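
Rows are often removed by a condition rather than by position. The sketch below keeps or drops rows with a logical vector; it assumes an illustrative data frame named films (a hypothetical name) with made-up row names in the same style as above.

```R
# Illustrative data frame ('films' is a made-up name for this sketch)
films <- data.frame(name = c("Toy Story", "Akira", "The Breakfast Club",
                             "Modern Times"),
                    year = c(1995, 1998, 1985, 1936),
                    stringsAsFactors = FALSE)
rownames(films) <- c("m1", "m2", "m3", "m4")

# Drop rows by name: keep every row whose name is NOT in the given set
without_m1_m3 <- films[!(rownames(films) %in% c("m1", "m3")), ]

# Keep rows by condition: only films released in 1990 or later
from_1990 <- films[films$year >= 1990, ]

without_m1_m3
from_1990
```

The trailing comma inside the brackets is what tells R to filter rows and keep all columns.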

