The_Lack_of_Trouble_with_Tibbles.R
# Data Frames versus Tibbles!
# Load the tibble package, which provides tibble() and as_tibble() used below
library(tibble)
## First let's look at strings as factors versus strings as strings
# Make a data frame (not a tibble) named "test1" where the first column is the names of three of your friends and the second column is their ages. This exercise relies on strings becoming factors, which was the default before R 4.0.0; in R 4.0.0 and later, pass stringsAsFactors = TRUE to data.frame()
# Make a tibble named "test2" where the first column is the names of three of your friends and the second column is their ages
# Print test1
# Print test2
# Even with these tiny data frames, what do you notice is different in how the generic data frame and the tibble are printed?
# paste is a function that allows you to combine strings. Type each of the following commands
paste(test1[1, ], collapse = " ")
paste(test2[1, ], collapse = " ")
# What happened when paste tried to combine the factor column with the age column? Feel free to play around with functions like class and typeof to develop your answer
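# A minimal sketch of what to look for, using a hypothetical data frame (sketch_df and
# its contents are made up for illustration and are not part of the exercise).
# stringsAsFactors = TRUE is set explicitly so the result does not depend on the R version.
sketch_df <- data.frame(name = c("Ann", "Bob"), age = c(30, 41), stringsAsFactors = TRUE)
class(sketch_df$name)    # "factor"
typeof(sketch_df$name)   # "integer": a factor stores integer codes plus a set of labels
sketch_row <- paste(sketch_df[1, ], collapse = " ")
sketch_row               # "1 30": paste used the factor's integer code, not the label "Ann"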
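# Now make the following dataset (stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0, which this exercise relies on)
test3 <- data.frame(numbers_as_numbers = c(5, 3, -7, 1), numbers_as_strings = c("5", "3", "-7", "1"), stringsAsFactors = TRUE)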
# Look at your data frame and examine it with str
# Print the column named "numbers_as_strings"
# Now, convert the column "numbers_as_strings" to an (additional) numeric column using the following command
test3$numbers_from_strings <- as.numeric(test3$numbers_as_strings)
# Print your data frame and examine it with str. What happened when you converted the numbers_as_strings column to a numeric column? Are the numbers what you expected?
# Try the above "test3" but this time make a tibble instead of a data frame. Does it work better?
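# If you ever do need the numbers back out of a factor, one common approach is to
# convert to character first. A sketch with a hypothetical factor (sketch_factor is
# illustrative only and not part of the exercise):
sketch_factor <- factor(c("5", "3", "-7", "1"))
as.numeric(sketch_factor)                                # the integer codes of the levels, not the original values
sketch_values <- as.numeric(as.character(sketch_factor))
sketch_values                                            # 5 3 -7 1: the values the labels spell out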
## Now let's look at simplification versus preservation
# First set up a (generic) data frame and a tibble using the following code
tester <- data.frame(numbers = 1:10, fives = rep(5, 10), letters = LETTERS[1:10])
tester2 <- tibble(numbers = 1:10, fives = rep(5, 10), letters = LETTERS[1:10])
# Print and examine both new objects
# Save the first column of each data frame as a subset using the following code
test_nums <- tester[ , 1]
test_nums2 <- tester2[ , 1]
# Print and examine both test_nums and test_nums2
# Run the following line of code as though you were trying to select the second row from a new data frame
test_nums[2, ]
# What happens, and why? Now try it with test_nums2 instead of test_nums. Does it work? Why?
# There are several commands and operators in R that won't work on atomic vectors. Try the following line of code
test_nums[test_nums$numbers > 5, ]
# What happens, and why? Try to remember this error message and why it happened; it is an extremely common bug in R! Now run the same code, but using test_nums2 instead of test_nums. Make sure you add the "2" everywhere you need to!
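# If you want a generic data frame to keep its shape when you select one column, one
# option is drop = FALSE, which tells `[` not to simplify the result to a vector.
# A sketch on a hypothetical data frame (the sketch_ names are illustrative only):
sketch_frame <- data.frame(numbers = 1:10, fives = rep(5, 10))
sketch_vec  <- sketch_frame[ , 1]                  # simplified to an integer vector
sketch_kept <- sketch_frame[ , 1, drop = FALSE]    # still a one-column data frame
is.data.frame(sketch_vec)                          # FALSE
is.data.frame(sketch_kept)                         # TRUE
sketch_rows <- sketch_kept[sketch_kept$numbers > 5, , drop = FALSE]
sketch_rows                                        # rows 6 to 10, still a data frame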
## Now let's examine row name behavior in data frames and tibbles
# Let's check on our friend the iris dataset, which is a (generic) data frame. Look at it and examine it with str
# Now make a tibble named iris2 that is generated by making the iris dataset into a tibble. Print it and examine it with str
# Check the row names on the iris dataset and the new iris2 dataset
# Now take a subset of the iris dataset named veri_iris that takes all columns, and only the rows where the species is "versicolor". What are the row names on this new dataset?
# Make a subset of the iris2 dataset named veri_iris_2 that takes all the columns, and only the rows where the species is "versicolor". What are the row names on this new dataset?
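# A sketch of one way to compare row names after subsetting, shown on a small
# hypothetical dataset rather than iris (the sketch_ names are illustrative only;
# this assumes the tibble package is installed):
library(tibble)
sketch_small  <- data.frame(x = 1:4, g = c("a", "b", "a", "b"))
sketch_small2 <- as_tibble(sketch_small)
sketch_sub  <- sketch_small[sketch_small$g == "b", ]
sketch_sub2 <- sketch_small2[sketch_small2$g == "b", ]
rownames(sketch_sub)    # "2" "4": the data frame keeps the original row labels
rownames(sketch_sub2)   # "1" "2": the tibble renumbers its rows from 1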
# Print the mtcars dataset. Print the row names for the mtcars dataset.
# Convert the mtcars dataset to a tibble and print it. Then print the row names of the new mtcars tibble.
# Take a subset of rows (preserving all columns) of your mtcars tibble (you can use whatever criteria you like, so long as it's a subset). Now what happens to the row names?
# What are two differences you notice (based on the previous exercises) about how tibbles and data frames deal with row names?
# Are there advantages to how tibbles deal with row names? Are there advantages to how (generic) data frames deal with row names?
# You can set the row names on a data frame with the following syntax
row.names(test1) <- c("cube1", "cube2", "office")
# What happens if you try to do this to your tibble, test2?
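# Because tibbles do not keep meaningful row names, a common approach is to store the
# labels in an ordinary column instead. A sketch using the rownames argument of
# as_tibble() ("model" is just a column name chosen for illustration, and sketch_cars
# is a hypothetical object name):
library(tibble)
sketch_cars <- as_tibble(mtcars, rownames = "model")
head(sketch_cars$model)   # the car names, now a regular character column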
## Look back at the iris (generic) data frame, the iris tibble, the mtcars (generic) data frame, and the mtcars tibble. What are the differences in how tibbles and data frames are displayed? Which do you prefer? Why?