# Introduction to R

<img src="https://i.imgur.com/HeSzToG.png" width=400 align="center">

# Contents
1. CRISP-DM
2. What is R used for? 
3. Downloading Anaconda 
4. Comparison between Python & R 
5. Basics of Jupyter Notebook
6. File Types 
7. Data Terminology
8. Installing and adding packages 
9. Reading documentation 
10. Importing Data 
11. Displaying Data 

# 1. CRISP-DM


<img src='https://st3.ning.com/topology/rest/1.0/file/get/2808314343?profile=original' width=400 align = 'center'>

<p style="text-align:center;"><font size="2"> <i>A useful codification of the data mining process is given by the Cross Industry Standard Process for Data Mining
(CRISP-DM; Shearer, 2000)</i> </font> </p>

- <b>Business Understanding</b> <br> 
 - Framing a business problem in terms of expected value can allow us to systematically decompose it into data mining tasks. 
- <b>Data Understanding</b> <br> 
 - Understanding the strengths and limitations of the data because there is rarely an exact match with the problem <br>
 - Purchasing or obtaining data <br>
 - Identifying and understanding variables to use 
- <b>Data Preparation</b> <br> 
 - Removing missing values
 - Sorting, replacing, filtering, grouping, mapping data
 - Adding variables from other datasets 
- <b>Modeling</b> <br>
 - Machine Learning
- <b>Evaluation</b> <br> 
 - Assessing the data mining results rigorously to gain confidence that they are valid and reliable
- <b>Deployment</b> <br> 
 - Implementing the model into the business process

In [None]:
# Load the built-in datasets package
library(datasets)
# List the datasets it provides
library(help = "datasets")

In [None]:
str(iris)

# 2. What is R used for? 

In [None]:
# Notes 
# Data cleaning 
# Modeling 
# Comparison to Python 


# 3. Downloading Anaconda

- Downloading Anaconda <br> 
https://www.anaconda.com/distribution/ <br>
- Adding R into Anaconda <br> 
https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/

# 4. Comparison between Python & R

| Parameter | R | Python | 
|------|:------|:------|
| Objective | Data analysis and statistical modeling | Data science, web development, embedded systems |
| Workability | Consists of many easy-to-use packages | Can easily perform matrix computation and optimization | 
| Integration | Programs are typically run locally | Programs integrate with web apps for easy deployment | 
| Large Dataset Handling | Can struggle with large datasets | Generally handles large datasets more easily | 
| IDE | RStudio, R GUI | Spyder, IPython, Jupyter Notebook | 
| Essential Packages and Libraries | ggplot2, tidyverse, caret | NumPy, pandas, SciPy, scikit-learn, TensorFlow | 

# 5. Basics of Jupyter Notebook

- <b>Restarting Kernel </b> 
 - Kernel > Restart 
- <b>Running cells </b>
 - Cell > Run All 
- <b>Code vs. Markdown </b>
 - Markdown is converted to HTML, and HTML tags can be used inside it 
 - Markdown is for formatting and commenting 

| Shortcut | Description |
|------|:------|
| Tab | Code Completion |
| a | Insert cell above |
| b | Insert cell below |
| Enter | Edit cell |
| d, d (press d twice) | Delete cell | 
| Shift + Enter | Run individual cell | 
| Esc + M | Change to Markdown | 
| Ctrl + Shift + ← | Highlight group of words | 
| Ctrl + Backspace | Delete word by word | 
| Ctrl + / | Make comment |

In [None]:
# Test box 

# Header 1 
## Header 2 
### Header 3 
#### Header 4 
<b> bold text </b> <br> 
<i> italicized text </i> <br> 
<br> newline <br>
- bullet point <br> 

<p> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. In ante metus dictum at tempor commodo ullamcorper a lacus. Lorem mollis aliquam ut porttitor leo a diam. Neque aliquam vestibulum morbi blandit cursus. Nunc faucibus a pellentesque sit amet porttitor. Id semper risus in hendrerit gravida rutrum quisque non tellus. Proin nibh nisl condimentum id venenatis a condimentum vitae sapien. Aliquet nec ullamcorper sit amet risus nullam eget. Augue eget arcu dictum varius. Donec ac odio tempor orci dapibus ultrices in iaculis. Diam sit amet nisl suscipit adipiscing bibendum est ultricies integer. </p> <p> Leo duis ut diam quam nulla. Tellus at urna condimentum mattis pellentesque id nibh. Proin fermentum leo vel orci. Mi sit amet mauris commodo quis imperdiet massa tincidunt nunc. Venenatis cras sed felis eget velit aliquet sagittis id consectetur. Fringilla urna porttitor rhoncus dolor purus non enim praesent. Consectetur purus ut faucibus pulvinar elementum integer enim neque. Ullamcorper malesuada proin libero nunc consequat interdum varius sit amet. Nibh praesent tristique magna sit amet purus. Egestas maecenas pharetra convallis posuere. Sed faucibus turpis in eu mi bibendum neque egestas. Pharetra diam sit amet nisl suscipit. Sed pulvinar proin gravida hendrerit. Risus at ultrices mi tempus imperdiet nulla malesuada pellentesque elit. Malesuada bibendum arcu vitae elementum curabitur. Cursus mattis molestie a iaculis at erat pellentesque. Pellentesque sit amet porttitor eget dolor morbi non. Lacus suspendisse faucibus interdum posuere lorem ipsum dolor sit. Tempor nec feugiat nisl pretium fusce id velit. Viverra justo nec ultrices dui sapien eget mi proin. </p> 

## Exercise 1: Using Markdown 

Format the text below using Markdown. 

In [None]:
RStudio is an integrated development environment (IDE) that streamlines the R programming workflow into an easy-to-read layout. RStudio also includes useful tools (referred to as packages) for data manipulation (dplyr), cleaning (tidyr), visualizations (ggplot2), report writing (rmarkdown & knitr), and publishing to the web (shiny & ggvis).

# 6. File Types

- Comma-separated values (.csv)
- R data file (.RData)
- Microsoft Excel Open XML Spreadsheet (.xlsx)
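
As a rough illustration of the first two file types, the sketch below writes a small data frame to a `.csv` file and to an `.RData` file, then reads both back. The data frame `scores` and the temporary file paths are made up for this example. Reading `.xlsx` files needs an extra package such as readxl, so it is not shown here.

```r
# Hypothetical example data: a small data frame built in place
scores <- data.frame(name = c("Ann", "Ben"), score = c(90, 85))

# Temporary file paths so nothing is written to the working directory
csv_path   <- tempfile(fileext = ".csv")
rdata_path <- tempfile(fileext = ".RData")

# .csv: plain text, readable by most tools
write.csv(scores, csv_path, row.names = FALSE)
scores_from_csv <- read.csv(csv_path)

# .RData: R's binary format; save() stores objects under their names
save(scores, file = rdata_path)
rm(scores)        # remove the object...
load(rdata_path)  # ...and load() restores it as `scores`
```

A `.csv` file stores only the table itself, while an `.RData` file can hold several R objects of any kind at once.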

# 7. Data Terminology

https://swcarpentry.github.io/r-novice-inflammation/13-supp-data-structures/

- Data > Dataset > Database 
- Vector > List > DataFrame  
- Observations & Variables
- Levels 
- Libraries / Packages 
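
The terms above can be sketched in a few lines of base R. All object names and values here (`heights`, `person`, `people`) are invented for illustration only.

```r
# Vector: an ordered collection of values of a single type
heights <- c(150.5, 162.0, 171.3)

# List: elements may have different types and lengths
person <- list(name = "Ann", age = 30, scores = c(90, 85))

# Data frame: a table where rows are observations and columns are variables
people <- data.frame(
  name  = c("Ann", "Ben", "Cat"),
  group = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Factor: a categorical variable; its distinct categories are its levels
people$group <- factor(people$group)

class(heights)        # type of the vector
length(person)        # number of list elements
nrow(people)          # number of observations
ncol(people)          # number of variables
levels(people$group)  # categories of the factor
```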

## Exercise 2: Creating vectors 

Create a vector with two elements, A and B, and store it in ```grade``` 

In [None]:
# Answer here
# After part 10 then back here

# 8. Installing and adding packages

In [None]:
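# Replace 'something' with the name of the package to install (one-time step)
install.packages('something')

In [None]:
# Load the installed package; this is needed in every new session
library(something)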
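
One common pattern, sketched here as a suggestion rather than a required step, is to install a package only when it is missing. The package name `"stats"` is a stand-in chosen because it ships with R, so the sketch runs without downloading anything. Swap in the package you actually need.

```r
# "stats" ships with R and is used here only so the sketch runs anywhere;
# replace it with the package you actually need, e.g. "dplyr"
pkg <- "stats"

# Install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() take the name from a variable
library(pkg, character.only = TRUE)
```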

## Exercise 3: Importing packages 

Load dplyr, tidyr, and ggplot2 

In [None]:
# Answer here

In [None]:
library(dplyr)

# 9. Reading documentation

https://www.rdocumentation.org/packages/readxl/versions/1.3.1/topics/read_excel

- Default values 
- Checking the package name 
- Importing package
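
The same three checks can also be done from within R. The sketch below uses `read.csv()` because it is part of base R. The same calls should work for `read_excel()` once readxl is loaded, although that is not run here.

```r
# In an interactive session, ?read.csv or help("read.csv") opens the help page

# Show the argument list, including default values
args(read.csv)

# Look up a single default: does read.csv() assume a header row?
formals(read.csv)$header

# Find which package a function comes from
environmentName(environment(read.csv))
```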

In [None]:
# Import relevant package here 

# 10. Importing Data 

| Function | What it does | Example | 
|------|:------|:------|
| read.csv() | Read a CSV file (base R) | read.csv('states.csv') |
| read_excel() | Read an Excel file (readxl package) | read_excel('filename.xlsx', sheet = 'Sheet 2') | 
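
As a minimal sketch of `read.csv()`, the cell below first writes a tiny CSV to a temporary location so that it runs without any external file. The file contents and the Windows-style paths in the comments are hypothetical.

```r
# Hypothetical stand-in for a real file: write a tiny CSV to a temp location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("state,population", "Ohio,11.8", "Texas,30.0"), csv_path)

# Read it into a data frame; the first line supplies the column names
states <- read.csv(csv_path)

# File paths: use forward slashes or doubled backslashes on Windows,
# because a single backslash starts an escape sequence in R strings,
# e.g. "C:/data/states.csv" or "C:\\data\\states.csv" (hypothetical paths)

head(states)
```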

## Exercise 4: Import csv file

Import ```loans-25k.csv``` and store in ```loans_df```

In [None]:
# Explain location and backslash 
# Store in variable 

# Answer here

Display the head of ```loans_df```

## Exercise 5: Import excel file 

In [None]:
# Answer here 

# 11. Displaying Data

Checking the data type of each column and the shape of the data frame

In [None]:
# str

```glimpse()``` from dplyr, a more compact alternative to ```str()```

In [None]:
# glimpse

Summary of dataset 

In [None]:
# summary

Checking shape of dataframe

In [None]:
# dim
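
The functions named in the cells above can be tried on the built-in ```iris``` dataset. This is only an illustration on a dataset that ships with R, not the answer to any exercise.

```r
# Structure: column names, types, and the first few values
str(iris)

# Per-column summary statistics (counts for factor columns)
summary(iris)

# Shape: number of rows (observations) and columns (variables)
dim(iris)
```

If dplyr is installed, ```dplyr::glimpse(iris)``` gives an overview similar to ```str(iris)```.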

Showing the full dataset

In [None]:
# Raise the print limit so that no rows are cut off
options(max.print=25000) 
print.data.frame(iris)

Showing the first and last few rows of the dataset

In [None]:
# head and tail
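
A short sketch of both functions on the built-in ```iris``` dataset. The choice of ```n = 3``` is arbitrary.

```r
head(iris)         # first 6 rows by default
tail(iris, n = 3)  # last 3 rows
```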

Looking at range of data 

In [None]:
# max

# min

# unique

# n_distinct
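
These functions work on a single column, selected with ```$```. The sketch below uses columns of the built-in ```iris``` dataset as an example, and it counts distinct values with base R so that it runs without dplyr.

```r
max(iris$Sepal.Length)    # largest value
min(iris$Sepal.Length)    # smallest value
range(iris$Sepal.Length)  # smallest and largest together

unique(iris$Species)          # distinct values
length(unique(iris$Species))  # how many distinct values (base R)
# dplyr::n_distinct(iris$Species) gives the same count if dplyr is installed
```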

## Exercise 6: Looking at loans dataframe

In [None]:
# head(loans_df)

a. Find the range of annual income

In [None]:
# Answer here 

b. Find the unique labels for ```purpose``` variable

In [None]:
# Answer here 

c. Get the summary of ```loans_df```

In [None]:
# Answer here 

d. Get the total number of observations and variables 

In [None]:
# Answer here 

e. How many grades are there in total? 

In [None]:
# Answer here 

## [Back to top](#Contents) 