# Introduction to Matrices


## Recommending Universities

As a high school senior, you've calculated your grades and received stellar marks. The next step on your journey is to decide which college to attend. You've been accepted into the top universities in the world, but you need help choosing one.<br>

When choosing your schools, you've decided that you want to maximize the quality of your education while minimizing the price. In the previous mission, you learned how to use a vector to answer your questions. In this mission, we'll add another dimension by using a matrix to answer our question.<br>

We'll be using the [Times Higher Education World University Ranking dataset](https://www.kaggle.com/mylesoneill/world-university-rankings) to find a university recommendation. We've removed many columns for the purpose of this mission. Each row corresponds to a specific university. Here's a description of each column in the dataset:

* `University`: The name of the university
* `world_rank`: The world rank for the university
* `quality_of_education`: Rank for quality of education
* `influence`: Rank for amount of influence university has on big issues
* `broad_impact`: Rank for impact university has on the world
* `patents`: Rank for number of patents created

Let's take a vector of data on Harvard University's world ranking in different categories:

```r
harvard <- c(1,1,1,1,3)
names(harvard) <- c("world_rank","quality_of_education","influence","broad_impact","patents")
```

 |world_rank|quality_of_education
---|---|---
Harvard|1|1

Let's say we wanted to compare `harvard` to `stanford`:

```r
stanford <- c(2,9,3,4,10)
names(stanford) <- c("world_rank","quality_of_education","influence","broad_impact","patents")
```

While using a one-dimensional vector to store our data is useful, we're limited in a few ways:
* We need to create a new vector every time we want to add a university to our data.
* Comparing values between the two vectors isn't intuitive. You need to visually match the categories and rankings together.
* If we were analyzing hundreds of universities, writing out hundreds of lines of vectors might not be the most efficient way of performing an analysis.

Rather than create a vector for each school, we could use a **matrix** to hold all our university data in one place. A matrix is a collection of data values arranged in a two-dimensional, rectangular layout. A matrix is also a two-dimensional vector.<br>

Let's compare `harvard` and `stanford`, but instead, in matrix form:

 |world_rank|quality_of_education
---|---|---
Harvard|1|1
Stanford|2|9

In matrix form, it's much easier to see which school has a higher education rank. You can also access all your data in one location, so you don't need to store 20 different vectors. In this mission, we'll figure out a university recommendation for you by learning how to manipulate matrices.<br>

Here's what the data looks like in this mission:

 |world_rank|quality_of_education|influence|broad_impact|patents
---|---|---|---|---|---
Harvard|1|1|1|1|3
Stanford|2|9|3|4|10
MIT|3|3|2|2|1
Cambridge|4|2|6|13|48
Oxford|5|7|12|9|15
Columbia|6|13|13|12|4

## Combine Vectors

In the previous screen, you saw how a matrix might be a more effective way of holding more data points. In our previous scenario, we had two vectors: `harvard` and `stanford`. To realize the benefits of a matrix, let's combine these two vectors into a matrix.<br>

To transform multiple vectors into one matrix, the first thing we'll do is combine all our separate vectors into one vector. When we introduce the `matrix()` function, we'll dive deeper into why this step is needed.<br>

As review, in the previous mission, you helped your friend Johnny calculate his scores by appending `88` to the end of the `tests` vector:

```r
tests <- c(76, 89, 78)
tests <- c(tests, 88)
```

Combining multiple vectors into one vector follows a similar process to appending data to a vector. Instead of appending one data point, we're appending multiple data points. Let's see this in action by combining our `harvard` vector and `stanford` vector into one vector.<br>

Let's take our two vectors:

```r
harvard <- c(1,1,1,1,3)
stanford <- c(2,9,3,4,10)
```

And then combine these into a single vector called `harv_stan`:

```r
harv_stan <- c(harvard, stanford)
```

If you'd like to combine more than two vectors, you can perform the same steps. Include all the vectors you want to combine within the `c()` function.<br>
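As a quick check, here is a minimal sketch of combining three vectors. The name `harv_stan_mit` is only an illustrative choice, not part of the mission. It shows that `c()` keeps the values in the order the vectors are listed, and that the combined length is the sum of the individual lengths.

```r
harvard <- c(1, 1, 1, 1, 3)
stanford <- c(2, 9, 3, 4, 10)
MIT <- c(3, 3, 2, 2, 1)

# c() flattens its arguments into one vector, in the order given
harv_stan_mit <- c(harvard, stanford, MIT)  # hypothetical name for illustration

length(harv_stan_mit)   # 5 + 5 + 5 values
harv_stan_mit[6:10]     # the second block of five values is stanford's
```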

We've combined two of the university vectors. Let's combine the rest!


* Combine the following university vectors into one vector called `uni_vector`:

```r
harvard <- c(1,1,1,1,3)
stanford <- c(2,9,3,4,10)
MIT <- c(3,3,2,2,1)
cambridge <- c(4,2,6,13,48)
oxford <- c(5,7,12,9,15)
columbia <- c(6,13,13,12,4)

uni_vector <- c(harvard, stanford, MIT, cambridge, oxford, columbia)
```

## Creating a Matrix

In the previous section, you learned how to combine multiple vectors into one vector. Now our data is in a format that we can transform into a **matrix**.<br>

A matrix is a collection of values of the same data type arranged in a two-dimensional rectangular shape. To create a matrix, you'll use the `matrix()` function. Within the `matrix()` function, there are a few arguments you'll enter. An argument is an input a function needs to give you an output. Here is the `matrix()` function and its arguments:

```r
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE)
```

* **data**: This is the data that will be transformed into a matrix. This data must be in vector form. This is why you combined vectors in the earlier screen.
* **nrow**: This is the number of rows you want your matrix to hold.
* **ncol**: This is the number of columns you want your matrix to hold.
* **byrow**: This is a logical value of either `TRUE` or `FALSE`. 
  * If `TRUE`, the matrix will be filled *by rows*. 
  * If `FALSE`, *by columns*. 
  * We'll dive deeper into how this argument works later in this mission.
  
Let's fill in these arguments by transforming our `harvard` and `stanford` vectors into a matrix. In the previous screen, we combined our two vectors into one vector:

![creating-a-matrix1](https://s3.amazonaws.com/dq-content/182/harv_stan.svg)

Since we're only dealing with two universities with five different categories each, we'll need to create a matrix with `2` rows and `5` columns. Let's use the `matrix()` function to create an empty matrix with the specified dimensions. We're creating an empty matrix, because later on, we're going to fill our matrix with the data from `harv_stan`:

![creating-a-matrix2](https://s3.amazonaws.com/dq-content/182/empty.svg)
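To make the "empty matrix" idea concrete, here is a small sketch. The variable name `empty` is an assumption for illustration. Calling `matrix()` with only dimensions fills every cell with the default `data = NA`.

```r
# A 2 x 5 matrix with no data yet: every cell holds NA
empty <- matrix(NA, nrow = 2, ncol = 5)  # hypothetical name for illustration

print(empty)
dim(empty)   # rows first, then columns
```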

Let's fill our matrix with the data from `harv_stan`. However, there are multiple ways to fill our matrix. This is why we have the `byrow` parameter. If we set `byrow = TRUE`, the interpreter will fill the matrix by row:

![creating-a-matrix3](https://s3.amazonaws.com/dq-content/182/fill_row.svg)

By default, if we do not specify anything, `byrow` is set to `FALSE`, and the matrix is filled by column. Setting `byrow = FALSE` explicitly has the same effect. As a result, the matrix will use the vector to fill in a different direction:

![creating-a-matrix4](https://s3.amazonaws.com/dq-content/182/fill_col.svg)
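The two fill directions can be compared side by side with a short sketch. The names `by_row` and `by_col` are assumptions made for this example. Note that in the `matrix()` signature shown earlier, `byrow` defaults to `FALSE`, so omitting it fills by column.

```r
harv_stan <- c(1, 1, 1, 1, 3, 2, 9, 3, 4, 10)

# byrow = TRUE: the first five values become row 1, the next five row 2
by_row <- matrix(harv_stan, nrow = 2, ncol = 5, byrow = TRUE)

# byrow left at its default (FALSE): values run down each column first
by_col <- matrix(harv_stan, nrow = 2, ncol = 5)

print(by_row)
print(by_col)
```

Only `by_row` keeps each university's five rankings together on one row, which is why the exercise below sets `byrow` to `TRUE`.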

Let's use the `matrix()` function to transform our combined vector into a matrix!

* Transform this `uni_vector` into a six row by five column matrix. Name this `uni_matrix`.
* Fill this matrix setting `byrow` to `TRUE`.

```r
uni_matrix <- matrix(uni_vector, nrow = 6, ncol = 5, byrow = TRUE)
print(uni_matrix)
```

```r
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    3
[2,]    2    9    3    4   10
[3,]    3    3    2    2    1
[4,]    4    2    6   13   48
[5,]    5    7   12    9   15
[6,]    6   13   13   12    4
```


## Vector & Matrix Data Types

So far, we've only worked with vectors that contain values of a single type. In previous missions, we never created a vector or matrix that contains multiple data types. This is because a vector or matrix can only store one data type. When you used `names()` to name your vector, you were setting something called an attribute, which is not considered a value in the object.<br>

If you try to create a vector or matrix containing multiple data types, the R interpreter will not return an error. Instead, it will coerce all your data to a single common data type, choosing the most flexible type present (for example, numbers become character strings when mixed with text).<br>

For example, let's say we tried to store the following vector of Harvard and Stanford values with both `numeric` and `character` data types:

```r
harv_stan_dtype <- c("harvard",1,1,1,1,3,"stanford",2,9,3,4,10)
```

If we display `harv_stan_dtype` using the `print()` function, we see:

```r
 [1] "harvard"  "1"        "1"        "1"        "1"        "3"
 [7] "stanford" "2"        "9"        "3"        "4"        "10"
```

In the previous screen, you used a vector as an argument for a matrix. Because matrices are two-dimensional vectors, the same rule applies when you create a matrix with multiple data types: the R interpreter picks a single data type and converts all the data to it.<br>
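The sketch below illustrates this conversion on a few small, made-up inputs. The variable names `mixed`, `num_lgl`, and `mixed_matrix` are assumptions for illustration. It uses `class()` and `typeof()` to inspect the resulting type.

```r
# Numbers mixed with a string are all coerced to character
mixed <- c("harvard", 1, 3)
class(mixed)

# Numbers mixed with logicals are coerced to numeric (TRUE becomes 1)
num_lgl <- c(1, TRUE, FALSE)
class(num_lgl)

# The same rule applies once the values are shaped into a matrix
mixed_matrix <- matrix(c("harvard", 1, "stanford", 2), nrow = 2, byrow = TRUE)
typeof(mixed_matrix)
```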

Let's try creating a vector of multiple data types to see this in action!

* Create a vector with two data types, using the following values:
```
"columbia",6,13,13,12,4
```
* Store this vector in `columbia_types`.
* Use the `class()` function on this vector to check its data type. Store this in `type`.
* Print `type`.

```r
columbia_types <- c("columbia", 6, 13, 13, 12, 4)
type <- class(columbia_types)
print(type)
```

```r
[1] "character"
```


## Naming Rows and Columns

Now that you've created a matrix, let's look at what our current matrix looks like:

```r
print(uni_matrix)
```

```r
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    3
[2,]    2    9    3    4   10
[3,]    3    3    2    2    1
[4,]    4    2    6   13   48
[5,]    5    7   12    9   15
[6,]    6   13   13   12    4
```


We've stored all our values, but we don't know which row corresponds to `harvard` or `stanford`, or which column corresponds to `world_rank`. To come up with a university recommendation, we need to add labels to our rows and columns.<br>

Naming the rows and columns of our matrix is similar to naming a vector, except we need to name both our rows and columns. In a previous mission, you named the values in a vector by using the accessor function `names()`. Matrices have their own accessor functions for rows and columns:

* **rows**: `rownames(matrix)`
* **columns**: `colnames(matrix)`

When we wanted to name our `harv_stan` vector, we wrote the following:

```r
universities <- c("harvard","stanford")
names(harv_stan) <- universities
```

To name the rows and columns of a matrix, follow the same steps, but use `rownames()` and `colnames()` instead of `names()`:

```r
# Assumes harv_stan has already been reshaped into a 2 x 5 matrix with matrix()
rownames(harv_stan) <- c("harvard","stanford")

colnames(harv_stan) <- c("world_rank","quality_of_education","influence","broad_impact","patents")
```
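For a version that runs on its own, the sketch below first builds the 2 x 5 `harv_stan` matrix (an assumption, since the mission only showed it as a combined vector) and then labels it. Calling `rownames()` or `colnames()` without an assignment returns the current labels.

```r
# Build the matrix first; rownames() and colnames() need two dimensions
harv_stan <- matrix(c(1, 1, 1, 1, 3, 2, 9, 3, 4, 10),
                    nrow = 2, ncol = 5, byrow = TRUE)

rownames(harv_stan) <- c("harvard", "stanford")
colnames(harv_stan) <- c("world_rank", "quality_of_education",
                         "influence", "broad_impact", "patents")

# Read the labels back to confirm they were set
rownames(harv_stan)
colnames(harv_stan)

print(harv_stan)
```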

* Name the columns and rows of our `uni_matrix` with the following vectors:

```r
categories <- c("world_rank","quality_of_education","influence","broad_impact","patents")

universities <- c("Harvard","Stanford","MIT","Cambridge","Oxford","Columbia")
```

* Store the resulting `uni_matrix` in `named_uni_matrix`.

```r
categories <- c("world_rank","quality_of_education","influence","broad_impact","patents")
universities <- c("Harvard","Stanford","MIT","Cambridge","Oxford","Columbia")

colnames(uni_matrix) <- categories
rownames(uni_matrix) <- universities
named_uni_matrix <- uni_matrix
print(named_uni_matrix)
```

```r
          world_rank quality_of_education influence broad_impact patents
Harvard            1                    1         1            1       3
Stanford           2                    9         3            4      10
MIT                3                    3         2            2       1
Cambridge          4                    2         6           13      48
Oxford             5                    7        12            9      15
Columbia           6                   13        13           12       4
```


## Finding the Dimensions of the Matrix

Here's what `named_uni_matrix` looks like:

```r
named_uni_matrix
```

 |world_rank|quality_of_education|influence|broad_impact|patents
---|---|---|---|---|---
Harvard|1|1|1|1|3
Stanford|2|9|3|4|10
MIT|3|3|2|2|1
Cambridge|4|2|6|13|48
Oxford|5|7|12|9|15
Columbia|6|13|13|12|4


To find which universities offer the best education at the lowest cost, we need cost data, which our dataset is missing. To fill the gap, you did some additional research on the tuition for each university:<br>

```r
tuition <- c(43280,45000,45016,49350,28450,55161)
```

We'd like to add this vector to the current dataset to compare costs between schools. Before adding a new column or row to a matrix, we first want to make sure the additional vector matches the dimensions of the matrix. To do this, we'll find the length of the vector and see if it matches the number of rows or columns.<br>

When adding a new column, the length of the vector must match the number of rows:

![matrix-dimension1](https://s3.amazonaws.com/dq-content/182/add_rows_v2.svg)

When adding a new row, the length of the vector must match the number of columns:

![matrix-dimension2](https://s3.amazonaws.com/dq-content/182/add_cols.svg)

It's good practice to check that the dimensions match. If they do not match, the R interpreter will not throw an error. Instead, the *recycling rule* comes into play: R repeats (or truncates) the vector's values to fit the matrix, at most issuing a warning. In many cases, the recycled values aren't useful in our analysis.<br>
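Here is a minimal sketch of the recycling rule, using made-up placeholder values. The names `demo_matrix`, `short_vec`, and `recycled` are assumptions for illustration, and `cbind()` is the column-binding function covered in the next section. The vector has three values but the matrix has six rows, so the values are repeated.

```r
# Placeholder 6 x 2 matrix; the values themselves don't matter here
demo_matrix <- matrix(1:12, nrow = 6, ncol = 2)

# Only three values for six rows
short_vec <- c(100, 200, 300)

# No error: short_vec is repeated to fill all six rows
recycled <- cbind(demo_matrix, short_vec)
print(recycled)
```

Because six is a multiple of three, R recycles silently here, which is exactly why checking the dimensions beforehand is worthwhile.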

To find the dimensions of a matrix, you can use the `dim()` function, like this:

```r
dim(harv_stan)
```

The dimensions of the `harv_stan` matrix are:

```r
[1] 2 5
```

Keep in mind that `dim(harv_stan)` returns a vector containing the number of rows followed by the number of columns, in that order. In our example, `2 5` says we have two rows and five columns.<br>

To find the length of a vector, you can use the `length()` function. Let's use this function on two tuition values: 

```r
harv_stan_tuition <- c(43280,45000)
length(harv_stan_tuition)
```

The R interpreter will return:

```r
[1] 2
```

Once you have both the dimensions of the matrix and the length of the vector you want to add, make sure the length of the vector matches the relevant dimension of the matrix, depending on whether you're adding a new row or column. To check if they match, you can write the following:

```r
dim(harv_stan)[1] == length(harv_stan_tuition)
```

This should return:

```r
[1] TRUE
```

We index `dim(harv_stan)` with `[1]` because `dim()` returns a vector of two values: the number of rows and the number of columns. Since we're adding a new column, we want the length of the vector to match the number of rows, and `[1]` gives us the number of rows.<br>
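As a side note, base R also provides `nrow()` and `ncol()`, which return the same values as indexing into `dim()`. The sketch below rebuilds `harv_stan` as a 2 x 5 matrix (an assumption, so the example runs on its own) and compares the two approaches.

```r
harv_stan <- matrix(c(1, 1, 1, 1, 3, 2, 9, 3, 4, 10),
                    nrow = 2, ncol = 5, byrow = TRUE)
harv_stan_tuition <- c(43280, 45000)

dim(harv_stan)[1]   # number of rows
nrow(harv_stan)     # same value, via the base R shortcut
ncol(harv_stan)     # equivalent to dim(harv_stan)[2]

# A new column needs one value per row
nrow(harv_stan) == length(harv_stan_tuition)
```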

Let's use the `dim()` and `length()` functions to make sure the tuition vector can fit with our matrix.





* Find the dimensions of the matrix `uni_matrix`. Display them using `print()`.
* Find the length of the `tuition` vector. Display it using `print()`.
* Use a comparison to see if the number of rows equals the length of the vector. Store the result in `equality`. Display it using `print()`.

```r
tuition <- c(43280,45000,45016,49350,28450,55161)

print(dim(uni_matrix))
print(length(tuition))

equality <- ((dim(uni_matrix)[1]) == (length(tuition)))
print(equality)
```

```r
[1] 6 5
[1] 6
[1] TRUE
```


## Creating New Columns and Rows

In the previous section, we learned how to find the dimensions of our matrix to see if our tuition vector fits our matrix. Now that we've confirmed it fits, let's attach `tuition` to `uni_matrix`. To attach a vector to our matrix, we'll need to learn how to add new columns and rows.<br>

Let's first look at adding new columns:

![new-cols-and-rows1](https://s3.amazonaws.com/dq-content/182/add_rows_v2.svg)

To add the `tuition` column, you'll use the `cbind()` function in the following format: `matrix <- cbind(matrix, new_column)`. Using `cbind()` to add the two-value `harv_stan_tuition` vector to our `harv_stan` matrix looks like:

```r
harv_stan <- cbind(harv_stan, harv_stan_tuition)
```

Now that we've added a new column, let's look at adding new rows:

![new-cols-and-rows2](https://s3.amazonaws.com/dq-content/182/add_cols.svg)

To add rows, you'll use the `rbind()` function. The `rbind()` function follows the same format as `cbind()`, except we're adding a new row: `matrix <- rbind(matrix, new_row)`. Let's see this in action by adding a new row called `MIT` to the `harv_stan` matrix. Because `harv_stan` now has six columns (the five rankings plus tuition), the new row needs six values:

```r
MIT <- c(3,3,2,2,1,45016)  # five rankings plus MIT's tuition
harv_stan <- rbind(harv_stan, MIT)
```
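Putting both steps together, the following self-contained sketch rebuilds a small `harv_stan` matrix (an assumption, so the example runs on its own), adds a tuition column, and then adds a row. Passing the vector as `tuition = ...` is one way to control the new column's label; the row takes its label from the variable name `MIT`.

```r
harv_stan <- matrix(c(1, 1, 1, 1, 3, 2, 9, 3, 4, 10),
                    nrow = 2, ncol = 5, byrow = TRUE)
rownames(harv_stan) <- c("harvard", "stanford")

# Add a column: one tuition value per existing row
harv_stan_tuition <- c(43280, 45000)
harv_stan <- cbind(harv_stan, tuition = harv_stan_tuition)
dim(harv_stan)

# Add a row: one value per column, including the new tuition column
MIT <- c(3, 3, 2, 2, 1, 45016)
harv_stan <- rbind(harv_stan, MIT)
dim(harv_stan)

print(harv_stan)
```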

Let's try adding the full tuition vector to our original matrix!


* Add the `tuition` vector to `uni_matrix`:

```r
tuition <- c(43280,45000,45016,49350,28450,55161)
```

```r
tuition <- c(43280,45000,45016,49350,28450,55161)
uni_matrix <- cbind(uni_matrix, tuition)
uni_matrix
```

 |world_rank|quality_of_education|influence|broad_impact|patents|tuition
---|---|---|---|---|---|---
Harvard|1|1|1|1|3|43280
Stanford|2|9|3|4|10|45000
MIT|3|3|2|2|1|45016
Cambridge|4|2|6|13|48|49350
Oxford|5|7|12|9|15|28450
Columbia|6|13|13|12|4|55161


## Subsetting and Indexing a Matrix by Element

Now that we've added labels to our matrix and added a new column, we can pull individual values from our matrix. First, we'll learn how to select individual values from a matrix. Then we'll pull entire rows or columns.<br>

Let's look at `uni_matrix` again:

