In this file, we'll get a more in-depth look at manipulating vectors for data analysis:

- Working with a subset of values in a vector.
- Assigning names to elements of a vector.
- Using comparison operators to answer questions about data stored in vectors.

We'll begin by investigating how our grades in STEM (science, technology, engineering, and math) classes compare with those in non-STEM classes.

`final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)`

We can **index** vectors to select a subset of the elements they contain. Within a vector, every element has a **position**. R is a **1-indexed** programming language, which means that the first element in a vector is assigned a position of one.

We can extract values from the vector by specifying their position in brackets (`[]`). To return the value in the third position of `final_scores`, we can write:

`final_scores[3]`

which returns:

`86`

We can also extract multiple values of a vector by specifying more than one position. We may want to extract a **range** of vector elements, which we can specify using a colon (:). To select the first through fourth elements of the `final_scores` vector, we would write:

`final_scores[1:4]`

`88.00000 87.66667 86.00000 91.33333`

If we want to select vector elements that are not next to each other, we can specify them using `c()`. To select elements in the first, third, and seventh position of the `final_scores` vector, we would write:

`final_scores[c(1,3,7)]`

`88.00000 86.00000 89.33333`
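The three forms of positional indexing described above can be tried together. This is a minimal sketch that reuses the `final_scores` vector from this file; the variable names `third`, `first_four`, and `picked` are only illustrative and do not appear elsewhere in the lesson.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)

# A single position returns one element (R counts from 1).
third <- final_scores[3]

# A colon builds a range of consecutive positions.
first_four <- final_scores[1:4]

# c() collects positions that are not next to each other.
picked <- final_scores[c(1, 3, 7)]

print(third)
print(first_four)
print(picked)
```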

# Assigning names to elements

When we indexed the `final_scores` vector by position, was it difficult to match the element positions to the classes?

Since there are only seven classes in the data set, we could simply refer to the table of data. However, if there were grades for more classes stored in `final_scores`, indexing by position would become tedious.

Assigning **names** to elements of a vector can make indexing easier.

# Types of data

As we delve into naming elements of vectors, let's take a moment to talk about the **types** of data contained in the vectors we're working with. In R, there are several main types of data. Understanding these different types is key to making the best use of the R language. In this file, we'll work with **numeric** and **character data**.

### 1. Numeric Data

In this file, we've been working with class grade data; the data has consisted entirely of **numbers**. In R, this data type is referred to as **numeric**. Numeric data may include **integer** data, or whole numbers, and **double** data, or decimals (`87.66667`). Note that R stores a number typed without a decimal point, such as `88`, as a double unless we ask for an integer explicitly (`88L`).

As we begin working with multiple data types, the type of data we're working with won't always be clear. Because some operations we'll perform are only useful for some data types, we need to be sure of the type we're working with.

To display the data type of a vector, we'll use the `typeof()` function. Let's display the data type of final_scores.

`typeof(final_scores)`

This will display:

`"double"`
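The difference between integer and double data can be checked directly with `typeof()`. The following is a small sketch; the variable names are illustrative, and the `L` suffix is standard R syntax for writing an integer literal.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)

scores_type <- typeof(final_scores)  # "double"
plain_type <- typeof(88)             # "double": a bare number is stored as a double
integer_type <- typeof(88L)          # "integer": the L suffix requests an integer

# Integers and doubles both count as numeric data.
both_numeric <- is.numeric(88) && is.numeric(88L)  # TRUE

print(c(scores_type, plain_type, integer_type))
print(both_numeric)
```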

### 2. Character Data

In R, **"characters"** refer to all symbols that are used to make up a language, including letters, special characters like "%", "&", or "$", and numbers. Functions that perform arithmetic, such as `sum()` and `mean()`, will not work on character data.

To create a vector containing character data, we can use the `c()` function as we did when we added grades to the `final_scores` vector. However, we need to specify that the elements we are including consist of characters by surrounding them with quotation marks (either `''` or `""`).

To create a vector containing the names of two of our classes, we would write:

`math_chemistry <- c("math", "chemistry")`

To check that the vector we've created contains character data, we'll use the typeof() function, which returns the data type of a vector:

`typeof(math_chemistry)`

`"character"`
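As a quick check of the points above, the sketch below shows that single and double quotation marks produce the same character data, and that a number wrapped in quotes is stored as a character. The variable names are hypothetical.

```r
single_quoted <- c('math', 'chemistry')
double_quoted <- c("math", "chemistry")

# Both quoting styles create the same character vector.
same_vector <- identical(single_quoted, double_quoted)  # TRUE

# A number inside quotation marks is character data, not numeric data.
quoted_number <- "88"
quoted_type <- typeof(quoted_number)  # "character"

print(same_vector)
print(quoted_type)
```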

Now that we've learned about character and numeric data types, let's return to assigning names to elements of the `final_scores` vector. As a first step, let's create a character vector of the names of our classes.

`class_names <- c("math","chemistry","writing","art","history","music","physical_education")`
                 

We have now created two vectors:

- `final_scores`, containing our class grades (numeric data).
- `class_names`, containing the class names (character data).

In R, vectors may have **attributes** assigned to them. Attributes provide information, such as names, about the values stored in the vector. To assign names to vector elements, we can use the `names()` function.

To illustrate how to use the `names()` function, let's create two vectors.

One vector contains the math and chemistry grades:

`math_chemistry <- c(88, 87.66667)`

The other contains the names of the two classes:

`class_names <- c("math", "chemistry")`

To assign the values stored in `class_names` as attributes of the grade values contained in the `math_chemistry` vector, we would write:

`names(math_chemistry) <- class_names`

Now, if we type `math_chemistry`, R returns:

![image.png](attachment:image.png)
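For readers following along without the screenshot, the steps above can be reproduced as a short sketch. The call to `attributes()` is an addition for illustration; it shows that the names are stored as an attribute of the vector.

```r
math_chemistry <- c(88, 87.66667)
class_names <- c("math", "chemistry")
names(math_chemistry) <- class_names

# Each grade is now printed underneath its class name.
print(math_chemistry)

# The names live in the vector's "names" attribute.
print(attributes(math_chemistry))
```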

We can also use the `names()` function to return the names of elements in a vector. If we type:

`names(math_chemistry)`

R returns the names that we assigned to the elements of `math_chemistry`:

`"math"   "chemistry"`

If we try to apply the `names()` function to a vector with no names assigned to its elements, R will return `NULL`.
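A minimal sketch of this behavior, using a hypothetical vector called `unnamed_scores`:

```r
unnamed_scores <- c(88, 87.66667)

# No names have been assigned yet, so names() returns NULL.
before <- names(unnamed_scores)
print(is.null(before))  # TRUE

# After assignment, names() returns the character vector of names.
names(unnamed_scores) <- c("math", "chemistry")
after <- names(unnamed_scores)
print(after)
```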

Earlier in this file, we learned to index vectors using the positions of elements we wanted to extract. Now that we have assigned names to elements of the vector, we can index using the names.

Let's return to the `math_chemistry` vector. We assigned names to the grades contained in this vector:

![image.png](attachment:image.png)

If we want to return the score in chemistry class, we can index math_chemistry by the class name chemistry:

![image.png](attachment:image.png)

Remember, since the names attribute consists of character data, "chemistry" needs to be in quotes.

Indexing the `math_chemistry` vector by the name `chemistry` returns the grade (`87.66667`). We'd get the same result if we indexed by position.

![image.png](attachment:image.png)

As with indexing by position, we can index by name to return multiple elements using `c()`:

![image.png](attachment:image.png)
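The indexing steps described above can be sketched in code as follows. The variable names `by_name`, `by_position`, and `both` are illustrative only.

```r
math_chemistry <- c(88, 87.66667)
names(math_chemistry) <- c("math", "chemistry")

# Index by name: the name is character data, so it goes in quotes.
by_name <- math_chemistry["chemistry"]

# Index by position: chemistry is the second element.
by_position <- math_chemistry[2]

# Both approaches return the same named element.
print(identical(by_name, by_position))  # TRUE

# c() lets us request several names at once.
both <- math_chemistry[c("math", "chemistry")]
print(both)
```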

Earlier in this file, we calculated the averages of grades in STEM and non-STEM classes. Now, we're interested in comparing the average grades in our fine arts (art, music) and liberal arts (writing, history) classes.

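Indexing by name requires that names are assigned to `final_scores` first:

`names(final_scores) <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")`

`liberal_arts <- final_scores[c("writing", "history")]`

`fine_arts <- final_scores[c("art", "music")]`

`mean(liberal_arts)`

`mean(fine_arts)`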

# Comparison operators

Instead of visually comparing pairs of grades, we can write code using **comparison operators** to compare values based on specific conditions, such as "greater than," "less than," or "equal to."

When we compare two values using a comparison operator, if the values satisfy the condition, the R interpreter will return `TRUE`. If the values do not satisfy the condition, the R interpreter will return `FALSE`.

Below, we illustrate a comparison of our math final grade (`88`) against our chemistry grade (`87.66667`) using all the comparison operators:


![image.png](attachment:image.png)
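In code form, the six comparisons look like the sketch below. The variables `math` and `chemistry` are introduced here only to hold the two grades.

```r
math <- 88
chemistry <- 87.66667

greater <- math > chemistry            # TRUE
greater_or_equal <- math >= chemistry  # TRUE
less <- math < chemistry               # FALSE
less_or_equal <- math <= chemistry     # FALSE
equal <- math == chemistry             # FALSE
not_equal <- math != chemistry         # TRUE

print(c(greater, greater_or_equal, less, less_or_equal, equal, not_equal))
```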

These `TRUE` and `FALSE` values are of another data type in R: **boolean**, or **logical**. The logical data type can only consist of two values, `TRUE` and `FALSE`.

To answer the question, "Did I get a better grade in chemistry than I did in math?" we could write:

`math_chemistry["chemistry"] > math_chemistry["math"]`

This expression returns `FALSE`, since our chemistry grade is actually lower than our math grade.

Now, let's ask a different question: "Is the final math grade higher than the grades in my other classes?"

R's syntax makes this comparison straightforward to express:

`final_scores["math"] > final_scores`

The output consists of comparisons of the math grade with each other class grade:

![image.png](attachment:image.png)

To understand why this code results in the output shown above, we need to understand how R works with vectors of different lengths. When comparing the math grade (a vector containing a single value) to a vector containing all grades, R replicates the shorter vector until it is the same length as the longer vector. Then it performs the operation, as illustrated below:

![image.png](attachment:image.png)
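One way to see this replication is to write it out by hand with `rep()` and confirm that the result matches what R does on its own. This is a sketch for illustration; `math_repeated`, `implicit`, and `explicit` are hypothetical names, and the class names are assigned here so the block runs by itself.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- c("math", "chemistry", "writing", "art",
                         "history", "music", "physical_education")

# What R does implicitly: the single math grade is reused for every position.
implicit <- final_scores["math"] > final_scores

# The same comparison with the repetition written out by hand.
math_repeated <- rep(unname(final_scores["math"]), times = length(final_scores))
explicit <- math_repeated > final_scores

print(implicit)
print(all(implicit == explicit))  # TRUE
```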

Like numeric and character data, logical data can be stored in vectors. If we want to store the results of comparing the math grade with the other grades as a variable called math_comparison, we can write:

`math_comparison <- final_scores["math"] > final_scores`

If we then use the typeof() function to check the data type of math_comparison, the output tells us that the data type is logical:


`typeof(math_comparison)`

`"logical"`


**Task**

1. Use the `mean()` function to calculate the grade point average from `final_scores`. Store this in a variable named `gpa`.
2. Compare `final_scores` to `gpa` to see whether the grade in each class is higher than the gpa. Store the logical output in a vector named `above_average`.

**Answer**

1. `gpa <- mean(final_scores)`
2. `above_average <- final_scores > gpa`

We've now created a logical vector, above_average, that tells us whether or not each of the grades is higher than the gpa:

![image.png](attachment:image.png)
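The task can be reproduced end to end with the sketch below, which assigns the class names so that the printed result is labeled. Rounding the GPA with `round()` is only for display.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- c("math", "chemistry", "writing", "art",
                         "history", "music", "physical_education")

gpa <- mean(final_scores)
print(round(gpa, 2))  # 88.19

# TRUE for art, music, and physical_education; FALSE for the rest.
above_average <- final_scores > gpa
print(above_average)
```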

# Logical indexing

The art, music, and physical education grades were higher than our grade point average. What if we want to create a new vector containing only grades from those classes in which our grade was higher than our gpa?

Above, we indexed by position and by name. Here, we'll introduce a new type of indexing called **logical indexing**.

Logical indexing pairs each value in a target vector with the value at the same position in a logical vector:

* If the corresponding value is `TRUE`, the resulting vector will contain that value.
* If the corresponding value is `FALSE`, the resulting vector will not contain that value.

Above, we compared the math grade with grades in other classes to see if it was higher. Let's store the result of this comparison in a vector of logical values:

`logical_vector <- final_scores["math"] > final_scores`

![image.png](attachment:image.png)

We can now index `final_scores` using `logical_vector`. This will allow us to create a new vector containing only class grades that are lower than the math grade:

`final_scores[logical_vector]`
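Putting the pieces together, a self-contained sketch of logical indexing might look like this. The name `lower_than_math` is hypothetical; the class names are assigned so that the surviving elements can be identified.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- c("math", "chemistry", "writing", "art",
                         "history", "music", "physical_education")

# TRUE wherever the math grade is higher than the grade at that position.
logical_vector <- final_scores["math"] > final_scores

# Only the positions marked TRUE are kept.
lower_than_math <- final_scores[logical_vector]
print(lower_than_math)  # chemistry, writing, and history remain
```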

# Multiple Vectors

We have now learned to perform operations on single vectors. We'll frequently use single-vector operations, such as calculating the average of values (or a subset of values) in a vector, as we analyze data.

For the rest of this file, we'll learn to make use of a very powerful feature of R: the ability to perform arithmetic operations on every element of multiple vectors at once.

Let's consider an example.

We've been making great progress learning to use R to write programs to analyze our grades. Our friends have noticed our good work and have expressed interest in using our program to calculate their final grades, too.

Our friend, Noman, who has the same classes as we do this year, emailed us all of his average test, homework, and project grades in the following format:

`Tests: 76, 89, 78, 88, 79, 93, 89
 Homework: 85, 90, 88, 79, 88, 95, 74
 Projects: 77, 93, 87, 90, 77, 82, 80`

Noman is a bit disorganized, but he assures us the grades are listed in the same order for each assignment category:

- math, chemistry, writing, art, history, music, physical_education

We start by creating three vectors, one for each assignment category, to work with:

`tests <- c(76, 89, 78, 88, 79, 93, 89)`

`homework <- c(85, 90, 88, 79, 88, 95, 74)`

`projects <- c(77, 93, 87, 90, 77, 82, 80)`

First, Noman would like help calculating the final scores for each class. We could calculate each class grade individually:

`math <- (76 + 85 + 77) / 3`

`chemistry <- (89 + 90 + 93) / 3 # etc.`

However, we're learning to use R with larger data sets. Instead of calculating each final grade, we can use **vector arithmetic** to perform these calculations.

Vector arithmetic is similar to the arithmetic we performed to make calculations using individual values earlier. When performing arithmetic on vectors, operations are performed between values in order of position.

To illustrate how adding two vectors together works, let's add Noman's tests and homework vectors and save the output as a new vector called sum:

`sum <- tests + homework`

The operation and the resulting vector would look like:

![image.png](attachment:image.png)
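The element-by-element addition can be checked with a short sketch. It stores the result under the hypothetical name `tests_plus_homework` rather than `sum`, simply to avoid reusing the name of R's built-in `sum()` function in this standalone example.

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90, 88, 79, 88, 95, 74)

# Values are added position by position: 76 + 85, 89 + 90, and so on.
tests_plus_homework <- tests + homework
print(tests_plus_homework)  # 161 179 166 167 167 188 163

# The first element is the sum of the two first elements.
print(tests_plus_homework[1] == tests[1] + homework[1])  # TRUE
```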

**Task**

* Calculate Noman's average scores for each class by adding the `tests`, `homework`, and `projects` vectors and dividing by `3`.
* Store the resulting vector in a variable named `Noman_scores`.
* Use the `mean()` function to calculate Noman's grade point average from `Noman_scores`.

**Answer**

* `tests <- c(76, 89, 78, 88, 79, 93, 89)`
* `homework <- c(85, 90, 88, 79, 88, 95, 74)`
* `projects <- c(77, 93, 87, 90, 77, 82, 80)`
* `Noman_scores <- (tests + homework + projects)/3`
* `mean(Noman_scores)`

In the above example, we calculated Noman's scores by performing vector arithmetic. In that scenario, the three vectors we worked with were of the same length. Each had seven values for the seven classes Noman took. This isn't always the case, though. For example, what if Noman forgot to give us a homework grade for one of the classes?

![image.png](attachment:image.png)

Whenever there's a mismatch in the length of two vectors that we're combining in an operation, the shorter vector is **recycled** (or repeated) until it matches the length of the longer one.

To illustrate how R's recycling behavior works when we perform operations on vectors of different lengths, let's shorten our homework vector to only two values:

![image.png](attachment:image.png)

The R interpreter will determine that the homework vector is shorter than the tests vector and will automatically recycle the values in the homework vector until the two vectors are the same length:

![image.png](attachment:image.png)

Once the vector lengths match, the R interpreter will perform the specified arithmetic operation.
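Recycling can be made visible by building the repeated vector explicitly with `rep_len()` and comparing the two results. This sketch assumes the shortened homework vector holds the first two homework grades (85 and 90); `recycled_homework`, `implicit`, and `explicit` are hypothetical names. The warning is suppressed here only to keep the output focused on the values.

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90)  # assumed two-value version of the homework vector

# Repeat the short vector until it is as long as tests.
recycled_homework <- rep_len(homework, length(tests))
print(recycled_homework)  # 85 90 85 90 85 90 85

# R's automatic recycling gives the same result as the explicit version.
implicit <- suppressWarnings(tests + homework)
explicit <- tests + recycled_homework
print(identical(implicit, explicit))  # TRUE
```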

When the length of the longer vector is not a multiple of the length of the shorter one, as is the case here, R displays the following warning message:

`Warning message:
In tests + homework :
  longer object length is not a multiple of shorter object length`

R will still perform the calculation. The warning message is intended to alert us to the possibility that the different vector lengths were not intended.
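The sketch below illustrates when the warning appears. It relies on a small hypothetical helper, `gives_warning()`, which is not part of R or of this lesson; it uses `tryCatch()` to report whether evaluating an expression raised a warning.

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)  # length 7

# Hypothetical helper: TRUE if evaluating expr raises a warning, FALSE otherwise.
gives_warning <- function(expr) {
  tryCatch({
    expr   # the expression is evaluated here
    FALSE
  }, warning = function(w) TRUE)
}

# 7 is not a multiple of 2, so R warns about the partial recycling.
uneven <- gives_warning(tests + c(85, 90))

# A single value fits evenly into any length, so there is no warning.
even <- gives_warning(tests + 5)

print(uneven)  # TRUE
print(even)    # FALSE
```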

Let's return to our scenario of writing programs to calculate our friends' grades. We have a very disorganized friend, Naima, who provides us with incomplete data: She is missing test averages for four of her classes. Let's see what will happen if we try to calculate her grades.

**Task**

* Here are Naima's test, homework and project grades:
 * Tests: 76, 89, 78
 * Homework: 85, 90, 88, 79, 88, 95, 74
 * Projects: 77, 93, 87, 90, 77, 82, 80

* Calculate the sum of Naima's test, homework, and project grades and store the resulting vector as a variable named `recycling`. Note the resulting warning message.

**Answer**

`tests <- c(76, 89, 78)`

`homework <- c(85, 90, 88, 79, 88, 95, 74)`

`projects <- c(77, 93, 87, 90, 77, 82, 80)`

`recycling <- tests + homework + projects`

Above, we calculated the sum of Naima's test, homework, and project grade vectors despite knowing that the data she gave us was incomplete. Although R recycled the incomplete `tests` vector and calculated a total for each class, these totals do not accurately reflect Naima's grades.

Luckily, while cleaning her room, Naima found the four tests she was missing and was able to give us her grades:

* 88
* 79
* 93
* 89

Rather than re-typing the tests vector, we can append, or add, Naima's test scores to it.

We used the `c()` function to create vectors:

`tests <- c(76, 89, 78)`

To add additional elements to a vector, we can use `c()` to create a new vector consisting of the existing vector plus the new elements we want to add to it. For example, with two placeholder scores:

`tests <- c(tests, 99, 67)`
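A short sketch of how appending behaves, using the same placeholder values (99 and 67 are not Naima's actual scores). The name `longer` is hypothetical and shows that `c()` returns a new vector rather than modifying the original.

```r
tests <- c(76, 89, 78)

# c() returns a new vector; tests itself is unchanged until we reassign it.
longer <- c(tests, 99, 67)
print(length(tests))   # 3
print(length(longer))  # 5
print(longer)          # 76 89 78 99 67

# Reassigning the result to the same name is what "appends" in practice.
tests <- c(tests, 99, 67)
print(length(tests))   # 5
```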

Naima has asked us to help her figure out which of her classes were her weakest so that she can improve. Let's add Naima's test scores to the `tests` vector and help her figure out which of her classes need more effort.

`class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")`

`tests <- c(tests, 88, 79, 93, 89)`

`naima_grades <- (tests + homework + projects)/3`

`names(naima_grades) <- class_names`

`naima_gpa <- mean(naima_grades)`

`lower_than_GPA <- naima_grades < naima_gpa`

`naima_low_grades <- naima_grades[lower_than_GPA]`
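To see which classes end up in `naima_low_grades`, the whole calculation can be run as one self-contained sketch. It starts from the original three-value `tests` vector and appends the four recovered scores, so it assumes no placeholder values were added to `tests` beforehand.

```r
class_names <- c("math", "chemistry", "writing", "art",
                 "history", "music", "physical_education")

tests <- c(76, 89, 78)
tests <- c(tests, 88, 79, 93, 89)  # append the four recovered test scores
homework <- c(85, 90, 88, 79, 88, 95, 74)
projects <- c(77, 93, 87, 90, 77, 82, 80)

# Average the three categories for each class, then label the results.
naima_grades <- (tests + homework + projects) / 3
names(naima_grades) <- class_names

naima_gpa <- mean(naima_grades)
lower_than_GPA <- naima_grades < naima_gpa
naima_low_grades <- naima_grades[lower_than_GPA]

print(round(naima_gpa, 2))      # 84.62
print(names(naima_low_grades))  # "math" "writing" "history" "physical_education"
```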