# Introduction to Vectors

## Indexing a Vector By Position

In the previous mission, you learned about the `min()` function. Let's try applying `min()` to our `final_scores` vector:

```r
print(min(final_scores))

[1] 84
```

Calling `min()` on our `final_scores` vector doesn't give us the name of the class, like `math` or `chemistry`. It only prints the score. We'd like the R interpreter to print both the name and the score, like this:

```r
math
88
```

Using `min()` alone won't work as our final solution. We'll return to `min()` later in this mission. In the meantime, let's find another method that can print both the name and the score.

One method is to use **indexing**. Indexing means selecting a subset of values for use. Every value in a vector has a position. R is a **1-indexed** programming language.

### This means that the first value has a position of 1.

![r-1-indexing-lang](https://s3.amazonaws.com/dq-content/181/position_v3.svg)

We'll go over what `names()` is in a later screen. Let's use our `math_chemistry` vector and return the value in the second position. In order to return a specific value from our vector, follow these steps:

![](https://s3.amazonaws.com/dq-content/181/vector_index.svg)

When you index into a vector, the interpreter displays the selected value, along with its name if the values have been named.

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
third <- final_scores[3]
print(final_scores[3])

[1] 86
```
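
As a small sketch of positional indexing on the `math_chemistry` vector mentioned above (the variable names `second_score` and `empty` are only illustrative), selecting the second position might look like this:

```r
# Two scores: math is in position 1, chemistry is in position 2.
math_chemistry <- c(88, 87.66667)

# R counts positions from 1, so [2] selects the chemistry score.
second_score <- math_chemistry[2]
print(second_score)

# Position 0 holds no value, so indexing with 0 returns an empty vector.
empty <- math_chemistry[0]
print(length(empty))
```

Because R is 1-indexed, `math_chemistry[1]` is the first score, not `math_chemistry[0]`.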


## Understanding Numeric Data Types

Whenever you store or return a number, the value is a **numeric** data type. Numeric types include both whole numbers and decimal numbers (`88`, `87.66667`). However, not all values stored in R are numeric. We'll introduce a different data type, the character type, later in this mission. First, let's figure out how to check a data type.

To display the data type of a vector, use the `class()` function. The `class()` function works similarly to the `mean()` function: place the vector between the parentheses.

Let's display the data type of `math_chemistry`.

```r
math_chemistry <- c(88,87.66667)
class(math_chemistry)

[1] "numeric"
```

The displayed result is `"numeric"`. We'll go over what the quotation marks (`" "`) mean in the next screen. Within the numeric data type, there are further distinctions between kinds of numbers, which you can read about [here](http://uc-r.github.io/integer_double/) if you'd like. Let's put the `class()` function to use!

```r
class(final_scores)
```
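
The following sketch shows `class()` on a few values, including one of the finer distinctions within numbers mentioned above. The variable names are illustrative, and the `L` suffix is standard R syntax for requesting an integer:

```r
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)

# A vector of decimal numbers is numeric.
scores_class <- class(final_scores)
print(scores_class)

# A whole number written without a suffix is stored as numeric too.
whole_class <- class(88)
print(whole_class)

# Adding the L suffix asks R for an integer instead.
integer_class <- class(88L)
print(integer_class)
```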

## Understanding Character Data Types

In the previous screen, we introduced numeric data types. However, we're not looking to return a number like `86`; we're looking to return a class name (`math`, `chemistry`, etc.). In addition, asking a question like "what was my score in position 3?" doesn't have much real-world utility.

As a result, there are a few more steps we need to take to return both a final score and class name. To return a name, let's learn about the `character` data type. The `character` data type is a common non-numerical data type. To represent the `character` data type, we'll surround our text with double quotes (`"`). `"math"` would be an example of a `character` data type.

```r
print("math")

[1] "math"
```

Unlike variable names, you can include special characters within character types. Similar to assigning a number to a variable, you can also assign a character to a variable:

```r
class_name <- "math"
print(class_name)

[1] "math"
```

Let's return to our `math_chemistry` vector. Within this vector, we have our math score `88` and our chemistry score `87.66667`. We can store the matching class names in a character vector and check its data type:

```r
math_chemistry <- c(88,87.66667)
class_names <- c("math","chemistry")
class(class_names)

[1] "character"
```

A common mistake is to forget the quotes (`" "`). Let's see what happens when we try to create a vector of text with `math` and `chemistry` without them:

```r
class_names <- c(math, chemistry)
```

This will return an error:

```
Error: object 'math' not found
```

Without quotes surrounding math and chemistry, the R interpreter will treat these two values as variables. **This will return an error because we didn't create these variables in this program**.
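
To make the difference concrete, here is a minimal sketch. It assumes we define variables named `math` and `chemistry` ourselves (they are not part of the mission's program), so the unquoted version runs, but it produces a numeric vector rather than a vector of names:

```r
# With quotes, R treats the text as character values.
class_names <- c("math", "chemistry")
print(class(class_names))

# Without quotes, R looks for variables with those names.
# Here we create them first (illustrative values), so no error occurs.
math <- 88
chemistry <- 87.66667
scores_from_variables <- c(math, chemistry)
print(class(scores_from_variables))
```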

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
print(class(class_names))

[1] "character"
```


## Naming Values

To allow our program to return `math`, `chemistry`, etc., let's find a way to label our scores in `final_scores` with our `class_names`. That way, when we display the classes we're struggling with, it'll **return both the score and the class name**.

Vectors have a feature where you can name the values within the vector. You'll use the `names()` function to assign these names. However, the `names()` function works differently from `mean()` and the other functions we've used.

When you look at a vector, you'll only see its values. Take `math_chemistry <- c(88,87.66667)`, for example. However, **vectors have an additional feature that can store the names of every value**. This feature is called an `attribute`. Attributes are labeled values we can attach to our vector.

To access this attribute, use the `names()` function on your vector: `names(math_chemistry)`. A function that accesses an attribute of a vector is called an **accessor** function. By default, the names attribute is empty, so calling `names(math_chemistry)` will return `NULL`:

![vector-attr-names](https://s3.amazonaws.com/dq-content/181/names_v2.svg)

While functions like `mean()` and `min()` accept input values and return a new, computed value, functions like `print()` and `names()` return information about the value that's passed in.

To name the values in a vector, there are three steps:

1. Create a vector of names.
2. Call the `names()` function on the original vector (not the vector of names).
3. Assign the vector of names to that `names()` call using `<-`.
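
The three steps above can be sketched with the smaller `math_chemistry` vector. The names `names_before` and `two_class_names` are illustrative and not part of the mission:

```r
math_chemistry <- c(88, 87.66667)

# Before naming, the names attribute is empty.
names_before <- names(math_chemistry)
print(is.null(names_before))

# Step 1: create a vector of names.
two_class_names <- c("math", "chemistry")

# Steps 2 and 3: call names() on the scores and assign the names to it.
names(math_chemistry) <- two_class_names

print(names(math_chemistry))
print(math_chemistry)
```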

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)

names(final_scores) <- class_names
named_final_scores <- final_scores

print(named_final_scores)

              math          chemistry            writing                art 
          88.00000           87.66667           86.00000           91.33333 
           history              music physical_education 
          84.00000           91.00000           89.33333 
```


## Indexing Vectors using Names

Now that we've named our `final_scores` vector, we can use a **named index** to answer the question "what was my score in math class?"

Before we use a named index, there is one key requirement when indexing by name:

#### To index by name, you must have already named the values in your vector using the `names()` function.

By default, the names attribute is empty, so calling `names(vector)` returns `NULL`. If you index an unnamed vector by name, you get `NA` instead of a value. This is why you need to name your values first.

Indexing by name is similar to indexing by position, except we're accessing the value by name. To answer our question, "what was my score in math class?", index into the vector with `"math"`.

![indexing-vectors-using-names](https://s3.amazonaws.com/dq-content/181/named_indexing.svg)

Let's index our final scores vector by name!

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

history <- final_scores['history']
print(history)

art <- final_scores['art']
print(art)

music <- final_scores['music']
print(music)

history 
     84 
     art 
91.33333 
music 
   91 
```
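
As a further sketch, a named index can also take several names at once, and a name that isn't in the vector returns `NA`. The class `"biology"` below is made up purely for illustration:

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

# A character vector of several names selects several scores at once.
math_and_art <- final_scores[c("math", "art")]
print(math_and_art)

# A name that is not in the vector returns NA instead of a score.
# "biology" is a hypothetical class used only for this example.
missing_class <- final_scores["biology"]
print(is.na(missing_class))
```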


## Comparing Values and Logical Data Types

Suppose we want to know whether we scored higher in math than in chemistry. While you could answer this question by indexing with `final_scores["math"]` and `final_scores["chemistry"]` and then visually comparing the results, `88` and `87.66667`, **this method is not robust if we had thousands of data points**.

Rather than eyeballing the comparison, you can represent it in code by using a comparison operator. A comparison operator compares two values based on a specific condition, such as `greater than` or `less than`. Here, the condition compares `88` against `87.66667`. See the table below for the most common conditions.

After you use a comparison operator, if the values satisfy the condition, the R interpreter will return `TRUE`. If the values do not satisfy the condition, the R interpreter will return `FALSE`. `TRUE` and `FALSE` are not numeric or character data types; they're called **boolean** or **logical** data types. The logical data type can only take on two values, `TRUE` or `FALSE`.

Let's compare `88` against `87.66667` using all the comparison operators:

![r-boolean-logical](https://s3.amazonaws.com/dq-content/181/comparison_v3.svg)
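
The same comparisons can be written out in code. This is a minimal sketch; the variable names on the left are illustrative:

```r
math <- 88
chemistry <- 87.66667

greater_than     <- math > chemistry   # TRUE
greater_or_equal <- math >= chemistry  # TRUE
less_than        <- math < chemistry   # FALSE
less_or_equal    <- math <= chemistry  # FALSE
equal_to         <- math == chemistry  # FALSE
not_equal_to     <- math != chemistry  # TRUE

# Every result is a logical value.
print(class(greater_than))
```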




```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

history_math <- (final_scores['history'] > final_scores['math'])
writing_art <- (final_scores['writing'] < final_scores['art'])
music_chem <- (final_scores['music'] == final_scores['chemistry'])

print(history_math)
print(writing_art)
print(music_chem)

history 
  FALSE 
writing 
   TRUE 
music 
FALSE 
```


## Comparing Single Values against Vectors

In the previous section, we compared `88` against `87.66667`, a single value against another single value. Returning to our original question, if we wanted to display the class you're struggling in, one method would be to compare every value against every other value:

```r
final_scores["math"] > final_scores["chemistry"]

final_scores["math"] > final_scores["writing"]

final_scores["math"] > final_scores["art"]

final_scores["math"] > final_scores["history"]
.............
```

Going through each class you took and comparing every value against every other value can work. However, with seven classes there are `21` distinct pairs, so you would have to write `21` different expressions.

Luckily, comparison operators do not limit you to comparing single values against single values. In fact, you can compare a single value against an entire vector. Let's compare your math score against your `final_scores` vector and see which classes math scored higher than.

Writing this comparison would look like this:

```r
final_scores["math"] > final_scores
```

This displays:

```r
              math          chemistry            writing                art 
             FALSE               TRUE               TRUE              FALSE 
           history              music physical_education 
              TRUE              FALSE              FALSE 
```

Let's dig into how this comparison works. Here's the comparison in visual form:

![comparing-single-vals-against-vectors](https://s3.amazonaws.com/dq-content/181/single_val_vector_v4.svg)

The single value is compared against every value in the vector. The reason behind this is the recycling rule, which we'll dive into later in this mission.

Let's look at a visual diagram to better solidify this concept.

![comparing-single-vals-against-vectors-2](https://s3.amazonaws.com/dq-content/181/single_value_vec_comp_v2.svg)

Let's store this expression in a vector:

```r
logical_vector <- (final_scores["math"] > final_scores)
```
Recall that the operations within the parentheses are performed first. In this case, that is the comparison. `logical_vector` would then be a vector of boolean values:

```r
logical_vector <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
```
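
As a quick check, the sketch below suggests that one comparison against the whole vector gives the same answers as comparing against each class one at a time. The names `math_vs_chemistry` and `math_vs_art` are illustrative:

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

# One comparison against the whole vector.
logical_vector <- (final_scores["math"] > final_scores)

# The same comparisons written one class at a time.
math_vs_chemistry <- (final_scores["math"] > final_scores["chemistry"])
math_vs_art <- (final_scores["math"] > final_scores["art"])

# Each single comparison matches the corresponding entry of logical_vector.
print(logical_vector["chemistry"])
print(math_vs_chemistry)
print(logical_vector["art"])
print(math_vs_art)
```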

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

lowest_score <- min(final_scores)
lowest_logical <- (lowest_score == final_scores)

print(lowest_logical)

              math          chemistry            writing                art 
             FALSE              FALSE              FALSE              FALSE 
           history              music physical_education 
              TRUE              FALSE              FALSE 
```


## Indexing using Logical Data Types

Now that we've created a vector that tells us whether each value is the lowest score, the problem is that it's in `logical` form. A vector of `TRUE` and `FALSE` values doesn't explicitly tell us which classes we're struggling with.

In the previous exercises, you learned how to index by position and by name. Let's introduce a new type of indexing called **logical indexing**. Logical indexing will help us figure out which classes you're struggling with.

Logical indexing checks each value of your target vector against the corresponding value in the logical vector. If the corresponding value is `TRUE`, the resulting slice will contain that value. If the corresponding value is `FALSE`, the resulting slice will not contain that value.

In the previous section, we compared our math score against our score vector using the greater than operator. The comparison `final_scores["math"] > final_scores` created the logical vector `c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)`.

If we index into `final_scores` using that logical vector, the result tells us explicitly which classes and scores are lower than math:

![indexing-using-logical-data-types](https://s3.amazonaws.com/dq-content/181/logical_index_v4.svg)

In review, here are the steps we took over the last two screens to display the classes math scored higher than:

![indexing-using-logical-data-types2](https://s3.amazonaws.com/dq-content/181/logical_index_v2.svg)
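
In code, the steps above might look like the following sketch. The names `math_higher` and `classes_below_math` are illustrative:

```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

# Step 1: compare the math score against every score.
math_higher <- (final_scores["math"] > final_scores)

# Step 2: keep only the values whose logical entry is TRUE.
# This keeps chemistry, writing, and history.
classes_below_math <- final_scores[math_higher]
print(classes_below_math)
```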



```r
class_names <- c("math", "chemistry", "writing", "art", "history", "music", "physical_education")
final_scores <- c(88, 87.66667, 86, 91.33333, 84, 91, 89.33333)
names(final_scores) <- class_names

lowest_score <- min(final_scores)
lowest_logical <- lowest_score == final_scores

# Use lowest_logical to index into final_scores.
# Store this in lowest_class.

lowest_class <- final_scores[lowest_logical]
print(lowest_class)

history 
     84 
```


## Performing Arithmetic with a Vector

You've figured out the class you're struggling with! As you've been writing this program, your friends have expressed interest in using your program to calculate their grades.

Your friend Johnny has given you all his exam, homework, and project scores for the same seven courses. However, Johnny is a bit disorganized, so you'll need to reorganize the scores to make the calculation. Here are Johnny's scores:

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90, 88, 79, 88, 95, 74)
projects <- c(77, 93, 87, 90, 77, 82, 80)
```

First, Johnny wants you to calculate the final scores for each class. Assuming that each value corresponds to the same class, we'll use **vector arithmetic** to make these calculations.

Vector arithmetic is similar to the arithmetic in your first mission. However, we're now performing these operations on vectors. Vector arithmetic is performed member-by-member. This means that the operation is performed between the values of each vector, *by position*.

Suppose we took Johnny's tests and homework score vectors, added them together, and stored the result in `sum`:

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90, 88, 79, 88, 95, 74)
sum <- tests + homework
```

The operation and the resulting vector would look like:

![vector-arithmetic](https://s3.amazonaws.com/dq-content/181/vector_arithmetic_v4.svg)
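
Here is a small sketch of the same member-by-member behavior. It uses the illustrative names `tests_plus_homework` and `score_gap`, and shows that subtraction pairs values by position just as addition does:

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90, 88, 79, 88, 95, 74)

# Addition pairs up values by position: 76 + 85, 89 + 90, and so on.
tests_plus_homework <- tests + homework
print(tests_plus_homework)
# [1] 161 179 166 167 167 188 163

# Subtraction works position by position as well: 76 - 85, 89 - 90, ...
score_gap <- tests - homework
print(score_gap)
# [1]  -9  -1 -10   9  -9  -2  15
```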






```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90, 88, 79, 88, 95, 74)
sum <- tests + homework

projects <- c(77, 93, 87, 90, 77, 82, 80)

# Create a johnny_scores vector by adding the projects vector to sum
# and dividing the resulting vector by 3.

johnny_scores <- (sum + projects)/3

# Use the mean() function on johnny_scores to calculate Johnny's overall score.
# Store this in johnny_overall.

johnny_overall <- mean(johnny_scores)
```

## Vector Recycling Rule

In the previous screen, you calculated Johnny's scores by performing vector arithmetic between his scores. In that scenario, all the vectors had the same length. However, what happens if Johnny forgets to give us a value? So instead of:

![vector-recycle1](https://s3.amazonaws.com/dq-content/181/vector_arithmetic_v4.svg)

He forgets to give us a homework value:

![vector-recycle2](https://s3.amazonaws.com/dq-content/181/vector_recycle3.svg)

Earlier in this mission, we learned that when comparing a single value with an entire vector, the R interpreter compares the single value against every value in the vector. Whenever there's a mismatch in the length of two objects that are compared, the shorter object is recycled (or repeated) until it matches the longer one.

When we perform an arithmetic operation on two vectors that have different lengths, the R interpreter will also reuse the values of the shorter vector.

This recycling behavior is called the **recycling rule**. The recycling rule states that when performing an operation between two vectors of unequal length, the R interpreter will automatically recycle the shorter one until it's long enough to match the longer one.

Here's what happens when the recycling rule applies. Let's shorten our homework vector to only two values:

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90)
```

And then we'll add `tests` with `homework` to see the recycling rule in action. In the diagram, we've color coded the values in homework to show how the values recycle:

![vector-recycle3](https://s3.amazonaws.com/dq-content/181/vector_r_1_v5.svg)

The R interpreter will see that the homework vector is shorter than the tests vector. As a result, it'll automatically recycle the homework values, starting with the first element:

![vector-recycle4](https://s3.amazonaws.com/dq-content/181/vector_r_2_v3.svg)

Once the first value is used, the interpreter will look at the next value:

![vector-recycle5](https://s3.amazonaws.com/dq-content/181/vector_r_3_v4.svg)

And then, it'll keep recycling the values, in this pattern, until it matches the length of the longer vector:

![vector-recycle6](https://s3.amazonaws.com/dq-content/181/vector_r_4_v4.svg)

Once the vector lengths match, the R interpreter will perform the specified arithmetic operation.

In the section where you compared `final_scores["math"] > final_scores`, you compared a single value (math) against the entire vector. In this scenario, the R interpreter used the recycling rule by re-using the single value, **until it matched the length of `final_scores`**.
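
To see the rule in code, the sketch below performs the recycling by hand with base R's `rep()` and checks that the result matches what R does automatically. The names `recycled_sum`, `homework_recycled`, and `manual_sum` are illustrative:

```r
tests <- c(76, 89, 78, 88, 79, 93, 89)
homework <- c(85, 90)

# R recycles homework automatically. Because 7 is not a multiple of 2,
# it also raises a warning, which we silence here to keep the output short.
recycled_sum <- suppressWarnings(tests + homework)

# Doing the recycling by hand: repeat homework until it has 7 values.
homework_recycled <- rep(homework, length.out = length(tests))
print(homework_recycled)
# [1] 85 90 85 90 85 90 85

# Adding the hand-recycled vector gives the same result.
manual_sum <- tests + homework_recycled
print(identical(recycled_sum, manual_sum))
```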


```r
tests <- c(76, 89, 78)
homework <- c(85, 90, 88, 79, 88, 95, 74)

recycling <- tests + homework
print(recycling)

Warning message:
In tests + homework :
  longer object length is not a multiple of shorter object length

[1] 161 179 166 155 177 173 150
```


## Appending Data to a Vector

In the previous section, you learned about the vector recycling rule by adding `tests` and `homework` together. Recycling the three test scores (`76`, `89`, `78`) to look like `c(76, 89, 78, 76, 89, 78, 76)` would allow us to perform an operation. However, recycling Johnny's scores from three classes won't give him an accurate score for each class. Johnny didn't score `76, 89, 78, 76` in his last four classes. To fix this, let's append Johnny's real scores to our shorter vector.

To see how appending works, let's start by appending one value to `tests`:

![append-data-to-vec](https://s3.amazonaws.com/dq-content/181/append_data.svg)

You can use the same method to append multiple values. To append another value, follow the same steps, but add another value to the expression:

![append-data-to-vec2](https://s3.amazonaws.com/dq-content/181/append_multiple_data.svg)

We've added two values to the `tests` vector. Let's finish adding the rest of Johnny's scores!



```r
tests <- c(76, 89, 78)

tests <- c(tests, 88, 79, 93, 89)
print(tests)

[1] 76 89 78 88 79 93 89
```
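
As an aside, base R also provides an `append()` function that adds values to the end of a vector by default. The sketch below, with illustrative variable names, suggests it gives the same result as `c()` here, and that the lengths now match `homework` so no recycling is needed:

```r
tests <- c(76, 89, 78)
homework <- c(85, 90, 88, 79, 88, 95, 74)

# c() combines the existing vector with the new values.
tests_with_c <- c(tests, 88, 79, 93, 89)

# append() adds the new values after the last existing one.
tests_with_append <- append(tests, c(88, 79, 93, 89))

print(identical(tests_with_c, tests_with_append))

# With matching lengths, arithmetic pairs every value by position.
print(length(tests_with_c) == length(homework))
```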


## Next Steps

In the next mission, we'll dive into a real-world university rankings dataset. We'll expand our R capabilities by introducing the two-dimensional version of a vector, matrices:

university|world_rank|quality_of_education|influence|broad_impact|patents
---|---|---|---|---|---
Harvard|1|1|1|1|3
Stanford|2|9|3|4
MIT|3|3|2|2
Cambridge|4|2|6|13
Oxford|5|7|12|9
Columbia|6|13|13|12

