R provides an impressive toolset to support the data analysis workflow. These tools are typically functions grouped in packages. Many of the packages we use here, such as readr and ggplot2, are part of the **tidyverse** collection of packages.

To install an R package, we use the function `install.packages()`.

`install.packages("readr")`

This function installs the requested package from the [Comprehensive R Archive Network (CRAN) repository](https://cran.r-project.org/). Note that we need to wrap the package name in quotation marks, as with any character string. We only need to install a package once.

However, we need to load the packages we want to work with at the beginning of each new session. To load a package, we use the function `library()`:

`library(readr)`

We do not need to surround package names with quotation marks to load them using the `library()` function.
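Because `install.packages()` downloads a package again even when it is already installed, a common pattern is to install only the missing packages and then load them. The sketch below assumes an internet connection to CRAN. `requireNamespace()` checks whether a package can be loaded without attaching it, and when the package name is stored in a variable, `library()` needs `character.only = TRUE` to treat it as a string.

```r
# Packages used in this walkthrough
pkgs <- c("readr", "ggplot2")

for (pkg in pkgs) {
  # requireNamespace() quietly returns FALSE if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # one-time installation from CRAN
  }
  # pkg holds a string, so tell library() not to treat it as a bare name
  library(pkg, character.only = TRUE)
}
```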

In this file, we will answer the following question: **Since many data science job offers are published daily, which ones should we focus on? How can we filter the desirable ones?**

To answer this question, we found a dataset representing one day of job posts published on the [Monster website](https://www.monster.com/jobs) in the United States.

From this dataset, we extracted all job posts that require data science and related skills.

The resulting dataset contains 86 rows and eight columns.

![image.png](attachment:image.png)

The columns `job_id`, `salary_min`, and `salary_max` are of **Numeric** data type, and the rest of the columns are of **Character** data type.
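To check these types in R, we can ask for the class of every column. The sketch below uses a small hypothetical data frame with four of the dataset's columns (`job_id`, `salary_min`, `salary_max`, `job_type`); the values are made up for illustration and are not taken from the Monster data.

```r
# Hypothetical sample rows, not the Monster dataset
jobs <- data.frame(
  job_id = c(1, 2, 3),
  salary_min = c(60000, 85000, 40000),
  salary_max = c(90000, 120000, 55000),
  job_type = c("Full Time", "Full Time", "Contract"),
  stringsAsFactors = FALSE
)

# sapply() applies class() to each column and returns a named character vector
column_types <- sapply(jobs, class)
column_types
```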

The readr package is part of the tidyverse collection of packages. It provides functions for importing data into R.

The readr package contains a function, `read_csv()`, that's specifically for importing data in CSV format into R.

When we use the `read_csv()` function, it returns a dataframe (more precisely, a tibble) containing our dataset, guessing the data type of each column. At the end of the process, it displays a message listing the data type chosen for each column. This message is neither an error nor a warning. It is worth reading it to learn about the columns in the dataframe and to make sure that each column has the right type.
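If we already know the column types, we can pass them to `read_csv()` through its `col_types` argument instead of letting readr guess; this also suppresses the column-specification message. A minimal sketch, assuming readr 2.0 or later (for `I()` literal data). The three rows and four columns below are hypothetical stand-ins for `monster_jobs_clean.csv`.

```r
library(readr)

# Hypothetical CSV content; I() tells read_csv() this is literal data, not a file path
csv_text <- I("job_id,salary_min,salary_max,job_type
1,60000,90000,Full Time
2,85000,120000,Full Time
3,40000,55000,Contract")

# Declare each column's type explicitly instead of relying on guessing
jobs <- read_csv(
  csv_text,
  col_types = cols(
    job_id = col_double(),
    salary_min = col_double(),
    salary_max = col_double(),
    job_type = col_character()
  )
)

# spec() shows the column specification that was applied
spec(jobs)
```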

**Task**

* Use the `read_csv()` function to import the file `monster_jobs_clean.csv` into R.

**Answer**

`monster_jobs_clean <- read_csv("monster_jobs_clean.csv")`

To get a first overview of the dataframe, we can check its number of rows, its number of columns, and its column names. In R, there is a function for each of these operations:

* To determine the number of rows, we use the `nrow()` function, which returns an integer.
* To determine the number of columns, we use the `ncol()` function, which returns an integer.
* To determine the column names, we use the `colnames()` function, which returns a character vector of column names.

We use all these functions in the same way, by passing the dataframe as an argument. We can optionally store the outputs in variables.

```r
n_cols_clean <- ncol(monster_jobs_clean)
n_rows_clean <- nrow(monster_jobs_clean)
names_clean <- colnames(monster_jobs_clean)
```
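On a small hypothetical data frame, the outputs of these functions look like this. Base R's `dim()` also returns both counts at once, as a vector of rows then columns.

```r
# Hypothetical sample data, not the Monster dataset
jobs <- data.frame(
  job_id = c(1, 2, 3),
  salary_min = c(60000, 85000, 40000),
  salary_max = c(90000, 120000, 55000),
  job_type = c("Full Time", "Full Time", "Contract"),
  stringsAsFactors = FALSE
)

n_rows <- nrow(jobs)         # 3
n_cols <- ncol(jobs)         # 4
col_names <- colnames(jobs)  # "job_id" "salary_min" "salary_max" "job_type"
dims <- dim(jobs)            # 3 4 (rows, then columns)
```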

We can explore the data by viewing the first six rows of the dataframe as a table using the `head()` function.

`monster_jobs_clean_head <- head(monster_jobs_clean)`

Similar to the function `head()`, we can use the function `tail()` to look at the last rows (six by default) of a dataset.
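Both `head()` and `tail()` return six rows by default, and their `n` argument changes that number. A sketch on a hypothetical ten-row data frame with made-up salaries:

```r
# Hypothetical data: ten job posts with made-up maximum salaries
jobs <- data.frame(
  job_id = 1:10,
  salary_max = seq(50000, 95000, by = 5000)
)

first_rows <- head(jobs)          # rows 1-6 (default n = 6)
first_three <- head(jobs, n = 3)  # rows 1-3
last_two <- tail(jobs, n = 2)     # rows 9-10
```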

So far, we have explored our dataset by looking at its first and last rows. This gives us only a partial view of the data. Visualizing the dataset gives us a global view, which helps us answer our question.

To visualize our dataset, we'll use a function from the `ggplot2` package, so we need to load it first.

`library(ggplot2)`

The `qplot()` ("quick plot") function from the `ggplot2` package lets us create plots quickly. Basically, this function takes the following arguments:

* The dataset column represented on the x-axis (argument `x`).
* The dataset column represented on the y-axis (argument `y`).
* The dataset itself (argument `data`).

**Task**

* Visualize the maximum salary (`salary_max`) for all the job posts using the `qplot()` function.

**Answer**

```r
library(ggplot2)

salary_max_viz <- qplot(x = job_id,
                        y = salary_max,
                        data = monster_jobs_clean)

salary_max_viz
```
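`qplot()` has been deprecated since ggplot2 3.4.0, so newer code usually builds the same scatterplot with `ggplot()`, `aes()`, and `geom_point()`. Below is a sketch of the equivalent call, shown on a hypothetical data frame so it runs without the CSV file; the name `salary_max_ggplot` is illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for monster_jobs_clean
jobs <- data.frame(
  job_id = c(1, 2, 3),
  salary_max = c(90000, 120000, 55000)
)

# Same mapping as qplot(x = job_id, y = salary_max, data = ...):
# aes() maps columns to axes, geom_point() draws one dot per row
salary_max_ggplot <- ggplot(jobs, aes(x = job_id, y = salary_max)) +
  geom_point()

salary_max_ggplot
```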

Next, let's differentiate these salaries by job type using colors. To do so, we use the same function, `qplot()`, and add the `color` argument to specify how R should color the points of our scatterplot.

The result is a scatterplot in which each dot represents the maximum salary for a given job identifier, colored according to the job type.

```r
salary_max_with_color_viz <- qplot(x = job_id,
                                   y = salary_max,
                                   color = job_type,
                                   data = monster_jobs_clean)

salary_max_with_color_viz
```
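With `ggplot()`, the `color` mapping goes inside `aes()` because it refers to a column. A sketch on hypothetical data with two made-up job types; `labs()` is optional and only adds readable axis and legend titles, and the name `salary_max_color_ggplot` is illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for monster_jobs_clean with two job types
jobs <- data.frame(
  job_id = c(1, 2, 3, 4),
  salary_max = c(90000, 120000, 55000, 70000),
  job_type = c("Full Time", "Full Time", "Contract", "Contract"),
  stringsAsFactors = FALSE
)

# color = job_type gives each job type its own color and adds a legend
salary_max_color_ggplot <- ggplot(jobs,
                                  aes(x = job_id, y = salary_max, color = job_type)) +
  geom_point() +
  labs(x = "Job ID", y = "Maximum salary", color = "Job type")

salary_max_color_ggplot
```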