
# From Python to R
### Brendan Shea, PhD
This chapter will provide a brief introduction to **R**, a programming language and environment designed specifically for statistical computing and graphics. It is widely used by statisticians and data analysts both to develop statistical software and to analyze data. In the context of data science, R provides an extensive array of libraries and built-in functions for complex data analysis and graphical models, enabling detailed statistical modeling and visualization.

Differences between R and Python in data science primarily revolve around their origins and design philosophies:

1. R is rooted in statistical analysis, with a design that inherently understands the needs of data modeling and visualization. Python, while versatile and powerful in data science, is a general-purpose language that has been adapted for data analysis through libraries like Pandas and SciPy.

2. R has a slight edge in the complexity and variety of statistical models available, and its graphics packages, such as `ggplot2`, are widely considered more capable than their Python counterparts for creating advanced statistical plots.

3. R's community is traditionally composed of statisticians and academics, leading to a wealth of packages for nearly every statistical test or model imaginable. Python's data science community is broader, attracting professionals from various backgrounds including software engineering, leading to a robust set of tools that are often seen as more user-friendly.

4. R's syntax is thought to have a somewhat steeper learning curve for those not already familiar with statistical software, whereas Python's syntax is often praised for being more intuitive and easier for beginners to grasp.

In the end, however, both Python and R are widely used in data science, and it's beneficial to have exposure to both languages. Google Colab supports both Python and R (and, at the time of writing, these were the *only* languages it supported).

## Data Types in R
**Vectors** are the most basic data structure in R. They hold elements of the same type. Let's say we're exploring a story where characters collect gems of different values. A vector could represent the values of gems that a character has collected:

In [None]:
gem_values <- c(50, 100, 200, 150)
gem_values # display values

Here, `gem_values` is a numeric vector containing four elements. The `c()` function combines values into a vector.

Moving on to **matrices**: imagine our characters are in a grid-based world, and we want to represent the number of gems in different grid locations. A matrix, which is a two-dimensional grid of values that all share the same type, could represent this:

In [None]:
grid_gems <- matrix(c(1, 0, 3, 4, 2, 0, 5, 1, 3), nrow = 3, ncol = 3)
grid_gems # display matrix

| | `[,1]` | `[,2]` | `[,3]` |
|---|---|---|---|
| `[1,]` | 1 | 4 | 5 |
| `[2,]` | 0 | 2 | 1 |
| `[3,]` | 3 | 0 | 3 |


This matrix `grid_gems` has 3 rows and 3 columns, showing the count of gems in a 3x3 grid.
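
By default, `matrix()` fills its values column by column, which is why the first three values (1, 0, 3) end up in the first column. The short sketch below rebuilds the grid with `byrow = TRUE` for comparison (the name `grid_by_row` is just illustrative) and shows how to pick out a row or a single cell with `[row, column]` indexing.

```r
# Default: the nine values fill the matrix column by column
grid_gems <- matrix(c(1, 0, 3, 4, 2, 0, 5, 1, 3), nrow = 3, ncol = 3)

# byrow = TRUE places the same values row by row instead
grid_by_row <- matrix(c(1, 0, 3, 4, 2, 0, 5, 1, 3), nrow = 3, ncol = 3, byrow = TRUE)

grid_gems[1, ]    # first row of the default matrix: 1 4 5
grid_by_row[1, ]  # first row of the byrow matrix: 1 0 3

# A single cell is addressed as [row, column]
grid_gems[2, 3]   # row 2, column 3: 1
```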

**Data Frames** are akin to datasets or tables in other software. They can have columns of different types. Suppose we're tracking different types of items (e.g., gems, keys) found by each character:

In [None]:
characters <- c("Aragem", "Rubella", "Topazia")
gems <- c(5, 3, 8)
keys <- c(2, 2, 4)
inventory <- data.frame(characters, gems, keys)

inventory # display data frame

| characters | gems | keys |
|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` |
| Aragem | 5 | 2 |
| Rubella | 3 | 2 |
| Topazia | 8 | 4 |


The `inventory` data frame has three columns: character names, number of gems, and number of keys.

**Lists** in R can contain different types and sizes of elements. For instance, a character's profile including their name, age, and the types of gems they like could be a list:

In [None]:
character_profile <-
  list(name = "Aragem", age = 42, favorite_gems = c("Emerald", "Sapphire"))

character_profile

Here, `character_profile` includes a string, a numeric value, and a vector of strings.

**Factors** are used to represent categorical data. If we want to record each character's role in our story, we could use a factor to categorize them:

In [None]:
roles <- factor(c("Warrior", "Mage", "Archer", "Mage"))
roles
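
Internally, R stores a factor as integer codes plus a set of labels called *levels*. This minimal sketch reuses the `roles` factor. It lists the levels (sorted alphabetically by default), counts how many characters hold each role with `table()`, and reveals the underlying codes.

```r
roles <- factor(c("Warrior", "Mage", "Archer", "Mage"))

levels(roles)      # the distinct categories: "Archer" "Mage" "Warrior"
table(roles)       # counts per role: Archer 1, Mage 2, Warrior 1
as.integer(roles)  # codes pointing into levels(roles): 3 2 1 2
```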

**Logical types** are straightforward: they represent boolean values. We could track whether a character has completed a quest:

In [None]:
quest_completed <- c(TRUE, FALSE, TRUE)
quest_completed

The vector `quest_completed` tells us which characters (perhaps in the same order as our `characters` vector) have completed a quest.
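
Because R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, logical vectors are convenient for counting. Here is a quick sketch using the `quest_completed` vector:

```r
quest_completed <- c(TRUE, FALSE, TRUE)

sum(quest_completed)   # how many quests are completed: 2
mean(quest_completed)  # share of quests completed: about 0.667
!quest_completed       # flip every value: FALSE TRUE FALSE
```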

**Numeric** (also called *double*) and **Integer** types both store numbers. Numeric values can include decimals, while integers are whole numbers only and are written with an `L` suffix. Suppose each character has a certain amount of gold:

In [None]:
gold <- c(100.5, 200, 150) # Numeric
gold

In [None]:
steps_walked <- c(500L, 700L, 450L) # Integer, denoted by 'L'
steps_walked

`gold` is a numeric vector with decimals, whereas `steps_walked` is an integer vector showing how many steps each character has walked.
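
To check how R actually stores a vector, you can call `typeof()`. This sketch also shows one detail worth knowing: a whole number typed without the `L` suffix, such as `200`, is still stored as a double.

```r
gold <- c(100.5, 200, 150)
steps_walked <- c(500L, 700L, 450L)

typeof(gold)          # "double"
typeof(steps_walked)  # "integer"
typeof(200)           # "double": no L suffix, so not an integer
typeof(200L)          # "integer"
```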

Lastly, **Character** types are text strings. If we wanted to note the title given to each character after an achievement:

In [None]:
titles <- c("The Brave", "The Wise", "The Swift")
titles

The `titles` vector holds these honorary titles as character strings.
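
Character vectors work with R's string functions, many of which operate on every element at once. Here is a small sketch that reuses the character names and titles from above (`full_names` is an illustrative name):

```r
characters <- c("Aragem", "Rubella", "Topazia")
titles <- c("The Brave", "The Wise", "The Swift")

# paste() joins strings element by element, with a space between by default
full_names <- paste(characters, titles)
full_names          # "Aragem The Brave" "Rubella The Wise" "Topazia The Swift"

nchar(titles)       # number of characters in each title: 9 8 9
toupper(titles[1])  # "THE BRAVE"
```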

## Printing with `cat()`
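The `cat()` function in R concatenates its arguments and prints them to the console. It is particularly useful for building custom-formatted messages, and it automatically converts numbers, logical values, and other basic vectors to text. It cannot print a whole data frame in a single call, which is why the examples below pull out individual columns and list elements.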
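
By default `cat()` puts a single space between its arguments, and you can change that separator with the `sep` argument. Here is a minimal sketch:

```r
# Default separator: a single space
cat("Gems:", 50, 100, "\n")

# A custom separator between the character names
cat("Aragem", "Rubella", "Topazia", sep = ", ")
cat("\n")  # finish the line explicitly
```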

To see how `cat()` can print and format variables, let's reuse the variables from our gem-collecting story to display some informative messages.

#### Printing a simple message with a vector value:

In [None]:
cat("The values of gems collected are:", gem_values, "\n")

The values of gems collected are: 50 100 200 150 


#### Custom message for a data frame's content:

In [None]:
cat("Character inventory:\n",
  "Names:", inventory$characters, "\n",
  "- Gems:", inventory$gems, "\n",
  "- Keys:", inventory$keys, "\n")

Character inventory:
 Names: Aragem Rubella Topazia 
 - Gems: 5 3 8 
 - Keys: 2 2 4 


Here, we use `$` to access each column of the data frame `inventory`.

#### Printing lists:

For lists, since they can contain different types of elements, we can print each element with a custom message:

In [None]:
cat("Character profile:\n",
  "Name:", character_profile$name, "- Age:", character_profile$age, "\n",
  "Favorite gems:", toString(character_profile$favorite_gems), "\n")

Character profile:
 Name: Aragem - Age: 42 
 Favorite gems: Emerald, Sapphire 


`toString()` is used to collapse the elements of the `favorite_gems` vector into a single, comma-separated string.

#### To display a factor with custom formatting:

In [None]:
cat("Character roles are:", levels(roles), "\n")

Character roles are: Archer Mage Warrior 


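This prints the distinct levels of the factor `roles`, which R sorts alphabetically by default. Calling `levels()` matters here. `cat()` treats a factor as its underlying integer codes, so `cat(roles)` would print `3 2 1 2` instead of the role names.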

#### Printing logical values with an explanatory message:

In [None]:
cat("Quest completion status:",
  ifelse(quest_completed, "Completed", "Not completed"),
  "\n")

Quest completion status: Completed Not completed Completed 


The `ifelse()` function returns "Completed" or "Not completed" for each element, depending on whether the corresponding value in `quest_completed` is `TRUE` or `FALSE`.

For numeric and integer vectors, you might want to format the output to control the number of decimal places:

In [None]:
cat("Gold amounts:", format(gold, nsmall = 2), "\n") # nsmall = 2 shows at least two decimal places
cat("Steps walked:", steps_walked, "\n")

Gold amounts: 100.50 200.00 150.00 
Steps walked: 500 700 450 


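As you can see, `cat()` is roughly R's counterpart to Python's `print()` for building custom messages. It writes its arguments to the console separated by spaces, but unlike `print()` it does not add a line break at the end, which is why the examples above end with `"\n"`. Combined with helpers like `format()`, `toString()`, and `ifelse()`, it is a flexible tool for displaying data (and you shouldn't expect to master it all right away!).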
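
Another common way to control decimal places is `sprintf()`, which uses C-style format codes such as `%.2f` (two decimals) and `%d` (an integer). This sketch formats the gold amounts and builds one message per character (`messages` is an illustrative name):

```r
gold <- c(100.5, 200, 150)

sprintf("%.2f", gold)  # "100.50" "200.00" "150.00"

# sprintf() is vectorized, so it returns one string per element
messages <- sprintf("Character %d has %.2f gold", 1:3, gold)
cat(messages, sep = "\n")
```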

## Understanding Vectors

**Vectors** are fundamental in R, as they are the simplest type of data structure. Unlike Python, where lists are the go-to linear data structure and can contain elements of different types, R vectors are **homogeneous**, meaning all elements must be of the same type. When you attempt to mix types, such as combining numbers and strings, R will coerce the elements to the same type, following a set of hierarchy rules (e.g., numeric to character).

There are six basic data types that vectors can hold in R:

-   logical
-   integer
-   double (often called numeric)
-   complex
-   character
-   raw

#### Creation
You can create vectors using the `c()` function, which stands for 'concatenate' or 'combine':

In [None]:
numbers <- c(1, 2, 3, 4, 5)  # Numeric vector
characters <- c("a", "b", "c")  # Character vector
booleans <- c(TRUE, FALSE, TRUE)  # Logical vector

### Basic Operations and Characteristics
R is **vectorized**, which means that operations are applied to each element of the vector without the need for explicit looping. For instance:

In [None]:
numbers
numbers * 2  # Multiplies each element of the vector by 2

Elements in a vector are **indexed** starting with 1 (not 0 as in Python). You can access elements with square brackets:

In [None]:
characters[2] # second element (not third!)
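
Two more indexing rules differ from Python. A range like `2:3` includes both endpoints, and a negative index *removes* elements instead of counting from the end. Here is a quick sketch using the `characters` vector:

```r
characters <- c("a", "b", "c")

characters[2:3]       # positions 2 through 3: "b" "c"
characters[-1]        # drop the first element: "b" "c"
characters[c(1, 3)]   # several positions at once: "a" "c"
characters[c(TRUE, FALSE, TRUE)]  # keep positions marked TRUE: "a" "c"
```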

The `length()` function gives you the number of elements in a vector:

In [None]:
length(numbers)

If you try to combine different types, R will **coerce** them into one type, with a hierarchy that generally goes from less to more informative (logical < integer < numeric < complex < character):

In [None]:
mixed <- c(1, "a")
str(mixed)  # Will show that all elements are now of character type

 chr [1:2] "1" "a"
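
The same hierarchy applies to every combination. This sketch checks a few of them with `typeof()`. It also shows that converting back with `as.numeric()` turns anything that cannot be read as a number into `NA`.

```r
typeof(c(TRUE, 2L))  # logical + integer   -> "integer" (TRUE becomes 1L)
typeof(c(1L, 2.5))   # integer + double    -> "double"
typeof(c(2.5, "a"))  # double + character  -> "character"

# "gem" cannot be read as a number, so it becomes NA.
# R normally warns "NAs introduced by coercion"; suppressWarnings() hides that here.
suppressWarnings(as.numeric(c("1", "2.5", "gem")))  # 1.0 2.5 NA
```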


### Basic Mathematical Operations

Vectors support arithmetic operations, which are performed element-wise. This means that if you add two vectors together, R will add the first element of the first vector to the first element of the second vector, the second element to the second element, and so on:

In [None]:
v1 <- c(10, 20, 30)
v2 <- c(1, 2, 3)
total <- v1 + v2  # Results in c(11, 22, 33); avoid the name `sum`, which is also a built-in function

cat("The sum is:", total)

The sum is: 11 22 33

Similarly, subtraction, multiplication, and division are also done element-wise:

In [None]:
difference <- v1 - v2
product <- v1 * v2
quotient <- v1 / v2

cat("Difference: ", difference,
  "\nProduct: ", product,
  "\nQuotient: ", quotient)


Difference:  9 18 27 
Product:  10 40 90 
Quotient:  10 10 10

You can also perform operations between a vector and a single number (**scalar**), where the operation is applied to each element:

In [None]:
doubled <- v1 * 2
doubled
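
A single number in R is really a vector of length one, so the scalar case above is one instance of a general *recycling rule*: when two vectors differ in length, R reuses the shorter one from the start. Here is a minimal sketch:

```r
v1 <- c(10, 20, 30)
v1 * 2                     # 2 is recycled for every element: 20 40 60

c(1, 2, 3, 4) + c(10, 20)  # c(10, 20) is reused: 11 22 13 24

# If the longer length is not a multiple of the shorter one, R still returns
# a result but warns, e.g. c(1, 2, 3) + c(10, 20) gives 11 22 13 plus a warning.
```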


## Table: R Code to Know

| R Code Example | Description |
| --- | --- |
| `vec <- c(1, 2, 3)` | Create a numeric vector with the elements 1, 2, and 3. |
| `char_vec <- c("a", "b", "c")` | Create a character vector with the elements "a", "b", and "c". |
| `logic_vec <- c(TRUE, FALSE, TRUE)` | Create a logical vector with the elements TRUE, FALSE, and TRUE. |
| `vec[2]` | Access the second element of a vector (R indexing starts at 1). |
| `length(vec)` | Get the number of elements in a vector. |
| `names(vec) <- c("first", "second", "third")` | Assign names to the elements of a vector. |
| `sum(vec)` | Calculate the sum of the elements in a vector. |
| `mean(vec)` | Calculate the mean of the elements in a vector. |
| `vec * 2` | Multiply each element of a vector by 2. |
| `cat("The value is:", vec[1])` | Print a message followed by the first element of a vector. |
| `df$column` | Access a specific column in a data frame named `df`. |
| `vec > 2` | Test whether each element of a vector is greater than 2 (returns a logical vector). |
| `!logic_vec` | Negate each element of a logical vector. |
| `vec1 + vec2` | Add two vectors element-wise. |
| `c(vec, 4)` | Return a new vector with 4 appended to the end (use `vec <- c(vec, 4)` to update `vec` itself). |

## Exercises
### Exercise 1: Creating Numeric Vectors

-   Create a numeric vector with the numbers from 1 to 5.
-   Hint: Use the `c()` function to combine values into a vector.

### Exercise 2: Creating Character Vectors

-   Construct a character vector with the names of the seven days of the week.
-   Hint: Remember that character strings must be enclosed in quotes.

### Exercise 3: Mixing Types in Vectors

-   Try to create a vector that contains both numbers and characters. What happens?
-   Hint: R will coerce the data types to be consistent. Observe the result.

### Exercise 4: Vector Arithmetic

-   Create two numeric vectors, `a` and `b`, each with 5 elements, and calculate their sum.
-   Hint: Use `+` to add vectors of the same length.

### Exercise 5: Accessing Vector Elements

-   Create a vector with 10 elements and access the 4th element.
-   Hint: Use the square brackets `[]` with the index of the element you want to access.

In [None]:
# Exercise 1

In [None]:
# Exercise 2

In [None]:
# Exercise 3

In [None]:
# Exercise 4

In [None]:
# Exercise 5

### Exercise 6: Basic Statistical Operations

-   Calculate the mean and standard deviation of a numeric vector with at least 5 elements.
-   Hint: Use the `mean()` and `sd()` functions.

### Exercise 7: Printing with Formatting

-   Use the `cat()` function to print out a statement that includes elements from a vector. For example, print "The sum of the vector is: " followed by the actual sum.
-   Hint: You'll need to perform a calculation within the `cat()` function and use commas to separate text and calculation.

### Exercise 8 (Challenge): Understanding Logical Vectors

-   Logical vectors in R represent sequences of TRUE and FALSE values. They are the result of logical operations and are very useful for subsetting and conditional testing.
-   Generate a logical vector that signifies whether the numbers 1 through 5 are greater than 3.
-   To do this, compare a numeric vector (of the numbers 1 to 5) to the number 3; R will perform the comparison element-wise and produce a new "logical vector".
-   Hint: Use a comparison operator like `>` with the `c()` function to create your numeric vector.

### Exercise 9 (Challenge): Exploring Named Vectors

-   In R, you can assign names to the elements of a vector, which can be especially helpful for readability and referencing elements by name instead of by position.
-   Create a numeric vector with three elements. Assign names to each element so that you have a named vector where each element corresponds to "first", "second", and "third".
-   Hint: Use the `names()` function to assign names to your vector after you've created it with the `c()` function, like so: `names(my_vector) <- c("name1", "name2", "name3")`.

In [None]:
# Exercise 6

In [None]:
# Exercise 7

In [None]:
# Exercise 8

In [None]:
# Exercise 9

# Working With Data Frames in R
In R, a **data frame** is the main structure you'll work with for storing tabular data. Each column can be of a different data type, and each row represents a single record. To work with a data frame, you don't always have to load data from an external file; R has several built-in datasets for practice, like `mtcars`. (We saw this dataset earlier in the context of Python and Pandas.)

To see what's inside `mtcars`, you can use the `head()` function. This will display the first few rows of the data frame, giving you a glimpse into the types of data it contains. Each row in `mtcars` represents a car, and each column details attributes like fuel efficiency and horsepower.

Here's how you can view the first few rows of the `mtcars` dataset:

In [3]:
# This command shows the first six rows of the mtcars data frame.
head(mtcars)

# To see more rows, for example, the first ten, just add a number as an argument:
# head(mtcars, 10)


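| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |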


The `head()` function in R is used to return the first part of the dataset. By default, it returns the first six rows, but you can specify the number of rows you wish to view by providing a second argument, as shown above.
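
The companion function `tail()` works the same way from the other end of the data frame. For example, this sketch shows the last three cars:

```r
tail(mtcars, 3)  # the last three rows: Ferrari Dino, Maserati Bora, Volvo 142E
```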

Each row in the `mtcars` dataset corresponds to a different car model, and the columns contain various specifications and measurements relevant to each car, such as miles per gallon (mpg), number of cylinders (cyl), displacement (disp), and horsepower (hp), among others. Viewing the first few rows allows students to quickly discern the types of variables included and begin to consider the kinds of analyses that might be meaningful.

## Accessing Rows and Columns
Accessing specific parts of a data frame is a foundational skill in R. The `mtcars` data frame, like any in R, allows you to retrieve rows, columns, and individual data points using indexing with square brackets `[]`. Here's how it works:

-   To access all rows or all columns, leave the position before the comma (rows) or after the comma (columns) empty.
-   To access specific rows or columns, you specify their positions as numbers.
-   To access individual data points, you specify both the row and the column position.

Let's break this down:

1.  **Accessing Rows:** If you want to access the 3rd row of `mtcars`, you'd write `mtcars[3, ]`. Leaving the column position after the comma empty means you want all columns for the 3rd row.

2.  **Accessing Columns:** To access the `mpg` column, you'd write `mtcars[, 'mpg']`. Alternatively, using column numbers, which for `mpg` is the first column, it's `mtcars[, 1]`.

3.  **Accessing Individual Data Points:** For a specific element, say the mpg value for the 3rd car, you'd write `mtcars[3, 'mpg']` or `mtcars[3, 1]`.

Here's the code for accessing a range and a mix of rows and columns:

In [4]:
# Access rows 3 to 5
mtcars[3:5, ]

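| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |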


In [5]:
# Access mpg and hp columns for the first six cars
mtcars[1:6, c('mpg', 'hp')]

| | mpg | hp |
|---|---|---|
| | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 110 |
| Mazda RX4 Wag | 21.0 | 110 |
| Datsun 710 | 22.8 | 93 |
| Hornet 4 Drive | 21.4 | 110 |
| Hornet Sportabout | 18.7 | 175 |
| Valiant | 18.1 | 105 |


In [6]:
# Access the mpg value for the 4th car
mtcars[4, 'mpg']
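
One detail to watch: selecting a single column with `[ , ]` returns a plain vector, not a one-column data frame. The sketch below shows how `drop = FALSE` keeps the data-frame shape, and that rows can also be selected by their row names (the variable names are illustrative):

```r
mpg_vec <- mtcars[, "mpg"]                # a single column becomes a numeric vector
mpg_df  <- mtcars[, "mpg", drop = FALSE]  # drop = FALSE keeps it as a data frame

class(mpg_vec)  # "numeric"
class(mpg_df)   # "data.frame"

mtcars["Mazda RX4", "hp"]  # select by row name and column name: 110
```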

## Basic Statistics With R

Obtaining summary statistics and structural information about a data frame is pivotal for initial data analysis. In R, the mtcars dataset, like any data frame, can be interrogated using several functions. In this section, we'll look at some of the primary functions you'll use.

### The `summary()` Function
This function gives you a quick statistical summary of each column in the data frame, which includes the minimum, maximum, median, mean, and the 1st and 3rd quartiles for numerical data. For categorical data, it will give you a count of each level in the factor.

In [7]:
# Summarize the entire mtcars data frame
summary(mtcars)

      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000  
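
All columns of `mtcars`, including `cyl`, are stored as numbers, so `summary()` reports quartiles even for what is really a category. Converting a column to a factor first, as in this sketch, produces per-level counts instead:

```r
cyl_factor <- factor(mtcars$cyl)  # treat the number of cylinders as a category
summary(cyl_factor)               # counts per level: 4 -> 11, 6 -> 7, 8 -> 14
```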

### The `str()` Function
The structure function `str()` provides a compact, human-readable summary of the internal structure of an R object. In the case of a data frame, it tells you the number of observations (rows), the number of variables (columns), the data type of each column, and the first few entries in each column.

In [8]:
# Examine the structure of mtcars
str(mtcars)

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


### The `dim()` Function
This function returns the dimensions of an object. For a data frame, that means the number of rows followed by the number of columns.

In [9]:
# Get the dimensions of mtcars
dim(mtcars)

### The `colMeans()` and `rowMeans()` Functions
These functions calculate the mean of each column or row, respectively, for datasets containing numerical data only.

In [10]:
# Average of each column
colMeans(mtcars)

In [11]:
# Average of each row (first 6 for brevity)
rowMeans(head(mtcars))
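
If a data frame also contains non-numeric columns, `colMeans()` stops with an error. A common workaround, sketched here on a re-created copy of the earlier `inventory` data frame, is to pick out the numeric columns first with `sapply()` (covered in the next subsection) and `is.numeric()`:

```r
inventory <- data.frame(
  characters = c("Aragem", "Rubella", "Topazia"),
  gems = c(5, 3, 8),
  keys = c(2, 2, 4)
)

# colMeans(inventory) would fail because the characters column holds text
numeric_cols <- sapply(inventory, is.numeric)  # FALSE, TRUE, TRUE
colMeans(inventory[, numeric_cols])            # gems about 5.33, keys about 2.67
```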

### The `sapply()` Function
This function can be used to apply a summary function to each column individually. For example, if you wanted to get the median of each column:

In [12]:
# Apply the median function to each column of mtcars
sapply(mtcars, median)
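
`sapply()` also accepts a function you write yourself. In this sketch, an anonymous function computes the spread (maximum minus minimum) of two selected columns; `spread` is an illustrative name:

```r
spread <- sapply(mtcars[, c("mpg", "hp")], function(col) max(col) - min(col))
spread  # mpg: 23.5, hp: 283
```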

Together, these functions offer a powerful toolkit for understanding the structure and statistics of your data frame before you proceed to more detailed analysis or data manipulation.

## Filtering and Sorting Data
Filtering and sorting data within a data frame are key operations in data analysis, allowing you to view and organize your data according to specific criteria.  To filter the dataset, you can use logical operators within the square brackets `[]`.

Basic concepts in filtering include:

- `$` Operator: This is used to select a single column from a data frame, which returns the column as a vector. For example, `mtcars$mpg` gives you the miles per gallon for all cars in the `mtcars` data frame.

- Square Brackets `[]`: These are used for indexing. They allow you to extract specific portions of a data frame.

    -   `mtcars[1, ]` would return the first row of `mtcars` (all columns).
    -   `mtcars[, 1]` would return the first column of `mtcars` (all rows).
    -   `mtcars[, "mpg"]` would return the `mpg` column, similar to using `$`.
- Comparison and Logical Operators: Within the square brackets, you can use comparison operators (`<`, `>`, `<=`, `>=`, `==`, `!=`) and the logical operators `&` (and), `|` (or), and `!` (not) to filter rows based on conditions.

### Filtering Syntax

When filtering, you combine these components to specify exactly what you want to see. For example:

-   `mtcars[mtcars$mpg > 25, ]` will return all rows where the `mpg` value is greater than 25.
-   `mtcars[mtcars$cyl == 4 & mtcars$hp > 100, ]` gives you rows where the car has 4 cylinders and horsepower greater than 100.

These operations can be read as:

-   "Give me all rows from `mtcars` where `mpg` is greater than 25."
-   "Give me all rows from `mtcars` where `cyl` is 4 AND `hp` is greater than 100."

The result of these expressions is a new data frame that meets your specified conditions. This new data frame can be used as is, or you might assign it to a new variable for further analysis.
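
Two other tools often help when building filters. `%in%` tests whether each value belongs to a set, and `which()` converts a logical vector into the positions that are `TRUE`. A short sketch (`small_engines` is an illustrative name):

```r
# Keep cars with either 4 or 6 cylinders
small_engines <- mtcars[mtcars$cyl %in% c(4, 6), ]
nrow(small_engines)     # 18

# Row positions of the cars with more than 30 mpg
which(mtcars$mpg > 30)  # 18 19 20 28
```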


### Basic Filtering
To select rows that meet certain conditions:

In [14]:
# Filter mtcars for cars with more than 25 mpg
mtcars[mtcars$mpg > 25, ]

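| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.9 | 1 | 1 | 4 | 1 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.9 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |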


### Multiple Conditions
Use the `&` (and) and `|` (or) operators to filter with multiple conditions:

In [15]:
# Cars with more than 20 mpg and 6 cylinders
mtcars[mtcars$mpg > 20 & mtcars$cyl == 6, ]

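| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |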
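
The `|` operator works the same way but keeps a row when *either* condition holds. For instance, this sketch selects cars that are very economical or very powerful and shows just two columns (`extremes` is an illustrative name):

```r
# mpg above 30 OR horsepower above 250
extremes <- mtcars[mtcars$mpg > 30 | mtcars$hp > 250, c("mpg", "hp")]
extremes  # Fiat 128, Honda Civic, Toyota Corolla, Lotus Europa, Ford Pantera L, Maserati Bora
```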


### Subset Function
The `subset()` function can make filtering more readable, since inside it you can refer to columns by name without repeating `mtcars$`:

In [16]:
# The same filter using subset()
subset(mtcars, mpg > 20 & cyl == 6)

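| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |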
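
`subset()` also takes a `select` argument for choosing columns, again by bare name. Here is a minimal sketch (`efficient_sixes` is an illustrative name):

```r
# Same rows as above, but only the mpg and hp columns
efficient_sixes <- subset(mtcars, mpg > 20 & cyl == 6, select = c(mpg, hp))
efficient_sixes
```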


### Sorting the `mtcars` Dataset Using `order()`

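Sorting involves arranging the rows in ascending or descending order based on one or more columns. One way to sort is with the `order()` function. It returns the positions of its argument's elements in ascending order: the position of the smallest value first, then the next smallest, and so on. Using those positions as the row index inside `[ , ]` rearranges the data frame accordingly: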

In [17]:
# Sort mtcars by mpg in ascending order
mtcars[order(mtcars$mpg), ]


| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.25 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.0 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.84 | 15.41 | 0 | 0 | 3 | 4 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.57 | 15.84 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.57 | 14.6 | 0 | 1 | 5 | 8 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.78 | 18.0 | 0 | 0 | 3 | 3 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.3 | 0 | 0 | 3 | 2 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.52 | 16.87 | 0 | 0 | 3 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.17 | 14.5 | 0 | 1 | 5 | 4 |


In [18]:
# For descending order
mtcars[order(-mtcars$mpg), ]


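| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.9 | 1 | 1 | 4 | 1 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.2 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.9 | 1 | 1 | 5 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.9 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.14 | 16.7 | 0 | 1 | 5 | 2 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.19 | 20.0 | 1 | 0 | 4 | 2 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.15 | 22.9 | 1 | 0 | 4 | 2 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.7 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |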
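
To see what `order()` returns on its own, it helps to try it on a short vector. It also accepts several sort keys and a `decreasing` argument. This sketch reuses the `gem_values` vector from earlier (`by_cyl_mpg` and `top_mpg` are illustrative names):

```r
gem_values <- c(50, 100, 200, 150)
order(gem_values)              # positions from smallest to largest: 1 2 4 3
gem_values[order(gem_values)]  # 50 100 150 200

# Sort by cylinders (ascending), breaking ties by mpg (descending)
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), c("cyl", "mpg")]
head(by_cyl_mpg, 3)            # Toyota Corolla, Fiat 128, Honda Civic

# decreasing = TRUE is an alternative to negating a numeric column
top_mpg <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]
rownames(top_mpg)[1]           # "Toyota Corolla"
```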
