# Who is Ryan Wade

Ryan is an experienced advanced data analytics professional. His education, technical skills, and business acumen enable him to approach problems from a technical, analytical, and business viewpoint. He has the ability to present complex data in an intuitive way using sound analytical and visualization methods. He has an advanced understanding of **R**, **Python**, **DAX**, **SQL**, **VBA**, and **M**. He knows how to effectively leverage those programming languages for on-prem and cloud-based data analytics solutions using the Microsoft Data Platform. He is also the author of ***Advanced Analytics in Power BI with R and Python***.

![Book](https://raw.githubusercontent.com/DieselAnalytics/TDWI2020_RForDataAnalysis/main/Images/BookCover.jpg)

# Session Overview

The R programming language is a domain-specific programming language built for data analytics. It has built-in features that facilitate all aspects of data analytics. In addition to the built-in features, you have access to more than 17,000 packages via the ***CRAN*** package repository that make tasks such as data visualization, data analysis, and machine learning relatively easy to do.

The purpose of this workshop is to gently introduce you to the R programming language. We will:

- Configure your environment to use R
- Install packages from **MRAN**
- Introduce the ***layered grammar of graphics*** concept
- Show how to install R packages
- Introduce R data structures
- Leverage the ***ggplot2*** package for **Data Visualization**
- Leverage the ***dplyr*** and ***tidyr*** packages for **Data Wrangling**
- Leverage base R for **Machine Learning**

As stated earlier, R users have many built-in functions and thousands of packages that they can leverage. Given that, there are often many ways to skin the proverbial cat when performing a task in R. That can make R overwhelming to many new users. Fortunately, a meta-package named ***tidyverse*** was created that helps mitigate that problem. ***tidyverse*** is composed of a group of packages that were built using a similar framework so they work well together. Together, those packages make doing data science in R much easier. This course will introduce you to R via ***tidyverse***. The packages we will focus on are ***ggplot2*** for data visualization and ***dplyr***, ***readr***, and ***scales*** for data wrangling and data analysis.
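To make the "many ways" point concrete, here is a small sketch that computes the same result, average miles per gallon by cylinder count, three ways. It uses the `mtcars` data set that ships with R, which is chosen here for illustration and is not part of the workshop materials. The ***dplyr*** version runs only if that package is already installed.

```r
# Base R, option 1: the formula interface of aggregate()
agg_result <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Base R, option 2: tapply() returns the same means as a named vector
tapply_result <- tapply(mtcars$mpg, mtcars$cyl, mean)

# tidyverse option: dplyr verbs chained with the pipe
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  dplyr_result <- mtcars %>%
    group_by(cyl) %>%
    summarise(mpg = mean(mpg))
  print(dplyr_result)
}

print(agg_result)
print(tapply_result)
```

All three approaches return one mean per cylinder group; they differ mainly in syntax and in the shape of the object they return.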

# How to Configure Your Environment to use R

The R programming language is open source and is free to install and use. The most popular distribution of R is the ***CRAN*** distribution, which is available on the CRAN site. The site hosts installers for the current and previous releases of R. The CRAN distribution has a few limitations. One of the most notable is that it is single-threaded. Fortunately, Microsoft solved that problem.

Microsoft created an enhanced distribution of R named ***Microsoft R Open (MRO)***. It is fully compatible with R, which means you get all the functionality you have in base R plus some major improvements. Probably the most notable improvement is multi-threaded functionality via the Intel ***Math Kernel Library (MKL)***. Pictured below is a visual that compares the R distribution from CRAN to ***MRO***:

![Benchmark](https://raw.githubusercontent.com/DieselAnalytics/TDWI2020_RForDataAnalysis/main/Images/Benchmark.png)
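If you want a rough feel for this kind of comparison on your own machine, the sketch below times a matrix cross-product, an operation that R hands off to its linear algebra library. The matrix size and seed are arbitrary choices for illustration, and this is not the benchmark used to produce the chart above. Timings vary by hardware, so no expected numbers are given.

```r
# Build a square matrix of random numbers (size chosen arbitrarily)
set.seed(42)
n <- 1000
m <- matrix(rnorm(n * n), nrow = n, ncol = n)

# Time t(m) %*% m; system.time() reports user, system, and elapsed seconds
timing <- system.time(cp <- crossprod(m))
print(timing)

# Report which linear algebra libraries this R session is using.
# extSoftVersion() includes a "BLAS" entry in recent versions of R;
# on older versions this simply prints NA.
print(La_version())
print(extSoftVersion()["BLAS"])
```

Running the same script under two different R installations gives a simple, informal comparison.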

Like the distribution of R from CRAN, the MRO distribution is free. You can download it from this site: https://mran.microsoft.com/download.

After you install MRO, you are able to do your R development via the R console. That experience is very limiting and not ideal. A better option is to use an IDE that is optimized for R development. The most popular IDE for R development is ***RStudio***, and it is freely available at https://rstudio.com/products/rstudio/download/.

Make sure to install ***MRO*** before you install ***RStudio***. When you launch ***RStudio*** for the first time, you will notice that it detects the distribution of R you have installed on your machine. Note that you can have multiple versions and distributions of R installed on your machine. To keep things simple, it is recommended that you have just one.
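Once the IDE is open, you can confirm which R installation it picked up by running a few commands in the console. The sketch below uses only functions that ship with R; the output depends on your machine.

```r
# Version string of the running R session
print(R.version.string)

# Folder where this R installation lives; useful when several are installed
print(R.home())

# Folders where installed packages are looked up
print(.libPaths())
```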


# How to Install R Packages

Packages are easy to install in R using the ***install.packages()*** function. You have the option to install one or more packages at a time using this function. To install one package, you wrap the name in double or single quotes and then pass it to the ***install.packages()*** function. So, you can install the R package named ***MASS*** with the following code: `install.packages("MASS")`. You also have the option to install multiple packages at once by passing a character vector of package names. Below is the code snippet you would use to install the following R packages at once: ***tidyverse***, ***MASS***, ***ggthemes***, and ***Metrics***:

```r
pkgs <- c("tidyverse", "MASS", "ggthemes", "Metrics")
install.packages(pkgs)
```
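Calling ***install.packages()*** downloads and installs a package even when it is already present. One common pattern, sketched here as a suggestion and not a workshop requirement, is to install only what is missing. The helper names `find_missing()` and `install_if_missing()` are hypothetical and are defined in the block itself.

```r
# Return the subset of `pkgs` that is not yet installed
find_missing <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Install only the packages that are missing, then return their names invisibly
install_if_missing <- function(pkgs) {
  missing_pkgs <- find_missing(pkgs)
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)
}

pkgs <- c("tidyverse", "MASS", "ggthemes", "Metrics")
print(find_missing(pkgs))  # lists what install_if_missing(pkgs) would install
```

Calling `install_if_missing(pkgs)` would then download only the packages reported by `find_missing(pkgs)`.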

The majority of the tasks in this workshop will be handled using an R meta-package named ***tidyverse***, a collection of R packages that were designed to work together to make doing data science in R easier. You can install all the packages that make up ***tidyverse*** at once by passing ***tidyverse*** to ***install.packages()***. Doing so will install the packages listed below:

### Core tidyverse packages
- ggplot2 - data visualization
- dplyr - data manipulation
- tidyr - data tidying
- readr - data import
- purrr - functional programming
- tibble - tibbles, a modern re-imagining of data frames
- stringr - string manipulation
- forcats - working with factors

### Non-Core tidyverse packages
- hms - working with times
- lubridate - working with date/times
- feather - sharing with Python and other languages
- haven - importing SPSS, SAS, and Stata files
- httr - working with web APIs
- jsonlite - working with JSON
- readxl - working with MS Excel files
- rvest - for web scraping
- xml2 - for working with XML
- modelr - for modeling within a pipeline
- broom - for turning models into tidy data

### Additional Packages
- ggthemes
- MASS
- ROCR
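
After installation, loading the meta-package with `library(tidyverse)` attaches only the core packages; the non-core ones are installed but must be loaded individually. A minimal sketch, assuming ***tidyverse*** is already installed (the block does nothing otherwise). The exact package list printed depends on the installed version.

```r
if (requireNamespace("tidyverse", quietly = TRUE)) {
  # Attaches the core packages (ggplot2, dplyr, tidyr, ...) in one call
  library(tidyverse)

  # List every package that the installed tidyverse version bundles
  print(tidyverse_packages())

  # Non-core packages are loaded on their own when needed
  library(readxl)
}
```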

# How to use R Projects

R projects keep all of the resources for a project in one location. An R project also sets the working directory for R to the location of the R project folder. To create an R project from scratch:

1. Make sure to have a folder in your file system named ***R Projects***. This will be the root location where you will store your R projects.
2. Launch ***RStudio***
3. Click ***File*** > ***New Project***
4. After a short moment a dialogue box will appear like the one shown below:

   ![image](https://raw.githubusercontent.com/DieselAnalytics/TDWI2020_RForDataAnalysis/main/Images/RStudioProjects.png)

   Select ***New Directory*** > ***New Project***
5. Set the subdirectory to the ***R Projects*** folder referenced in ***Step 1***. Put the name that you want to use for the project in the ***Directory name:*** textbox.
6. Click ***Create Project***

The steps above create two assets: a file with an ***Rproj*** extension and a ***\.Rproj.user*** folder. Both items hold information needed by RStudio and should not be edited by hand.

The ***R Project*** feature offered in ***RStudio*** should be used to store the data and code assets related to a project when working with R. ***R Projects*** help you keep those assets self-contained in one location, making it easy to programmatically access the resources you need during your analysis.
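
Because opening a project sets the working directory to the project folder, files can be referenced with paths relative to that folder. The sketch below illustrates the idea; the `Data` folder and `example.csv` file name are hypothetical placeholders, not files from this workshop.

```r
# When run inside an open project, this is the project folder
project_dir <- getwd()
print(project_dir)

# Build a path relative to the project folder (hypothetical folder and file)
data_path <- file.path("Data", "example.csv")
print(data_path)

# Read the file only if it actually exists
if (file.exists(data_path)) {
  example_data <- read.csv(data_path)
  print(head(example_data))
}
```

Relative paths like this keep working when the whole project folder is moved or shared, which is the main practical benefit of keeping assets inside the project.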