# R Fundamentals

**Table of Contents**
- [Syntax and Data Types](#chapter1)
- [Control Flow and Functions](#chapter2)
- [Data Visualization and EDA](#chapter3)

<a id='chapter1'></a>
## Syntax and Data Types

- [I. Characters](#character)
- [II. Numerics](#numeric)
- [III. Logicals](#logic)
- [IV. Data Type Conversion](#conversion)
- [V. Vectors](#vector)
- [VI. Seq](#seq)
- [VII. Rep](#rep)
- [VIII. Any, All, Which](#any_all_which)
- [IX. `subset()`: Conditional Slicing](#subset)
- [X. Arrays and Slicing](#matrix_cut)
- [XI. Array Operations](#matrix_calc)
- [XII. DataFrame](#df)

<a id='character'></a>
**I. Characters**

In [22]:
class("Hello")

In [23]:
class("world")

In [24]:
class('')

<a id='numeric'></a>
**II. Numerics**

In [19]:
class(1.2)

In [20]:
class(5)

In [21]:
class(-5.5)

<a id='logic'></a>
**III. Logicals**

In [25]:
class(TRUE)

In [26]:
class(FALSE)

In [27]:
class(True)

ERROR: Error in eval(expr, envir, enclos): object 'True' not found


<a id='conversion'></a>
**IV. Data Type Conversion**

In [29]:
as.logical(0)

In [32]:
# any nonzero number converts to TRUE
as.logical(5)

In [33]:
as.logical('hello')

In [37]:
a = "1.2"
class(a)

In [38]:
b = as.integer(a)
class(b)

In [39]:
as.integer(2.2)
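
The conversions above can be collected into one short sketch. The variable names are illustrative only and do not come from the original notebook; the commented values are what base R returns for these calls.

```r
# Illustrative names; not part of the original notebook
zero_lgl <- as.logical(0)        # FALSE: zero maps to FALSE
five_lgl <- as.logical(5)        # TRUE: any nonzero number maps to TRUE
text_lgl <- as.logical("hello")  # NA: not a recognised logical string
str_int  <- as.integer("1.2")    # 1: parsed as a number, then truncated
dbl_int  <- as.integer(2.2)      # 2: fractional part is dropped, not rounded
neg_int  <- as.integer(-2.7)     # -2: truncation is toward zero
```

Note that `as.integer()` truncates rather than rounds; `round()` is the function to use when the nearest whole number is wanted.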

<a id='vector'></a>
**V. Vectors**

syntax:

```
vector(mode=, length=)
```

In [40]:
vector('logical', 2)

In [41]:
vector('numeric', 3)

In [42]:
vector('character', 5)

In [45]:
c('blue', 'green', 'brown', 'brown')

In [46]:
c(67, 74, 63, 'yes')
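
The last call mixes numbers with a string. An atomic vector holds a single type, so R coerces every element to the most general type present. A minimal sketch of that behaviour, with `mixed` and `nums` as illustrative names:

```r
# Numbers mixed with a string: everything becomes character
mixed <- c(67, 74, 63, "yes")
class(mixed)   # "character"
mixed          # "67" "74" "63" "yes"

# Numbers mixed with a logical: TRUE is coerced to 1
nums <- c(67, 74, 63, TRUE)
class(nums)    # "numeric"
```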

<a id='seq'></a>
**VI. seq**

In [43]:
seq(1, 10, 2)

Use the `length.out` argument to split the interval between two numbers into equally spaced points.

In [49]:
# similar to np.linspace() in Python
seq(1, 10, length.out=8)

In [52]:
seq(0, 1, length.out=11)
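
With `length.out = n`, the spacing between points works out to `(to - from) / (n - 1)`. The sketch below checks this for the last call; `n`, `by_step`, and `pts` are illustrative names.

```r
n <- 11
by_step <- (1 - 0) / (n - 1)          # spacing implied by length.out: 0.1
pts <- seq(0, 1, length.out = n)      # 11 equally spaced points from 0 to 1

length(pts)                                  # 11
all.equal(pts, seq(0, 1, by = by_step))      # TRUE, up to floating-point tolerance
```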

<a id='rep'></a>
**VII. rep**

In [44]:
c(rep(2, 3), rep(1, 2))

In [50]:
vector('numeric', 5)

<a id='any_all_which'></a>
**VIII. `any`, `all` and `which`**

In [54]:
# any
heights = c(67, 74, 63, 70)
any(heights>70)

In [55]:
# all
heights = c(67, 74, 63, 70)
all(heights>65)

In [58]:
# which
heights = c(67, 74, 63, 70)
which(heights>65)

In [59]:
# negation: indices where the condition is FALSE
heights = c(67, 74, 63, 70)
which(!heights>65)
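
All three functions operate on the logical vector produced by the comparison. The sketch below makes that intermediate vector explicit and uses the indices from `which()` to pull out the matching values; `tall`, `idx`, and `short_idx` are illustrative names.

```r
heights <- c(67, 74, 63, 70)
tall <- heights > 65               # TRUE TRUE FALSE TRUE

any(tall)                          # TRUE: at least one height exceeds 65
all(tall)                          # FALSE: 63 does not exceed 65
idx <- which(tall)                 # 1 2 4: positions where the test is TRUE
heights[idx]                       # 67 74 70

# Parentheses make the negation explicit
short_idx <- which(!(heights > 65))   # 3
```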

<a id='subset'></a>
**IX. `subset()`: Conditional Slicing**

In [62]:
# four people with 1st person eye and height in index 1
# 2nd person has eye and height in index 2
# and so on...

eye.colors = c('blue', 'green', 'brown', 'brown')
heights = c(67, 74, 63, 70)
subset(eye.colors, heights > 70)

<a id='matrix_cut'></a>
**X. Arrays and Slicing**

In [63]:
my.array = array(seq(1,20,1), dim=c(4,5))
my.array

0,1,2,3,4
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20


In [64]:
my.array[4, 2] # 4th row, 2nd column

In [65]:
my.array[1,] # first row

In [66]:
my.array[,2] # second column

In [67]:
my.array[c(1,3), c(4,5)]

0,1
13,17
15,19


In [68]:
my.array[1:3, 3:5]

0,1,2
9,13,17
10,14,18
11,15,19


In [69]:
my.array[-1,]

0,1,2,3,4
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20


In [73]:
w = seq(0, 1, 0.1)
subset(w, w<.5)

In [75]:
# equivalent to subset()
w[w<.5]

In [74]:
which(w<.5)

In [76]:
# zero out components
w[w<.5]=0
w

<a id='matrix_calc'></a>
**XI. Array Operations**

- [Element-wise Arithmetic](#array1)
- [Transpose `t()`](#array2)
- [Matrix Product (Dot Product) `%*%`](#array3)
- [Row and Column Binding `rbind()` & `cbind()`](#array4)

<a id='array1'></a>
**Element-wise Arithmetic**

**Example 1**

In [77]:
my.array = array(seq(1,4,1), dim=c(2,2))
my.array

0,1
1,3
2,4


In [78]:
my.array + 10

0,1
11,13
12,14


**Example 2**

In [109]:
y = array(c(1,2,5,6), dim=c(2,2))
y

0,1
1,5
2,6


In [110]:
2 * y + 1

0,1
3,11
5,13


In [111]:
y %*% y

0,1
11,35
14,46


In [114]:
# check each entry of y %*% y by hand (row of y times column of y)
1*1 + 5*2
1*5 + 5*6
2*1 + 6*2
2*5 + 6*6
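
Because this subsection is about element-wise arithmetic, it helps to contrast `*` with `%*%` on the same array. A minimal sketch, with `elementwise` and `matprod` as illustrative names:

```r
y <- array(c(1, 2, 5, 6), dim = c(2, 2))

# `*` multiplies matching entries: each entry is squared
elementwise <- y * y
#      [,1] [,2]
# [1,]    1   25
# [2,]    4   36

# `%*%` is the matrix product: rows of the left times columns of the right
matprod <- y %*% y
#      [,1] [,2]
# [1,]   11   35
# [2,]   14   46
```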

<a id='array2'></a>
**Transpose `t()`**

In [80]:
my.array1 = array(c(2, -10, 18, 4, 5, -7, -1, 11, 6), dim=c(3,3))
my.array1

0,1,2
2,4,-1
-10,5,11
18,-7,6


In [81]:
t(my.array1)

0,1,2
2,-10,18
4,5,-7
-1,11,6


<a id='array3'></a>
**Matrix Product (Dot Product) `%*%`**

___NOTICE:___
1. The number of columns in the first matrix must equal the number of rows in the second matrix.


2. The value in the final matrix for any position is equal to the dot product of the row in the first matrix with the column in the second matrix. 
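
Both rules can be checked on a small non-square case. This is a sketch with made-up matrices `A` and `B` that are not part of the original notebook:

```r
A <- array(1:6, dim = c(2, 3))   # 2 x 3
B <- array(1:6, dim = c(3, 2))   # 3 x 2

# Rule 1: columns of A (3) match rows of B (3), so the product exists
AB <- A %*% B
dim(AB)                          # 2 2

# Rule 2: entry [1, 1] is the dot product of row 1 of A and column 1 of B
manual <- sum(A[1, ] * B[, 1])   # 1*1 + 3*2 + 5*3 = 22
AB[1, 1] == manual               # TRUE

# A %*% A breaks rule 1 (3 columns vs 2 rows), so R raises an error
failed <- inherits(try(A %*% A, silent = TRUE), "try-error")   # TRUE
```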

**Example 1**

In [83]:
my.array1 = array(c(1, 3, 2, 4), dim=c(2,2))
my.array2 = array(c(5, 0, 6, 7), dim=c(2,2))
my.array1
my.array2

0,1
1,2
3,4


0,1
5,6
0,7


In [84]:
my.array1 %*% my.array2

0,1
5,20
15,46


**Example 2**

In [115]:
x <- array(seq(1, 20), dim=c(4, 5))
x

0,1,2,3,4
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20


In [116]:
x[1, ] %*% x[1, ]

0
565


In [117]:
# test
1*1 + 5*5 + 9*9 + 13*13 + 17*17

In [118]:
# transpose
t(x)

0,1,2,3
1,2,3,4
5,6,7,8
9,10,11,12
13,14,15,16
17,18,19,20


<a id='array4'></a>
**Row and Column Binding `rbind()` & `cbind()`**

**Example 1**

In [87]:
x = 1:3
y = 10:12
x
y

In [88]:
cbind(x, y)

x,y
1,10
2,11
3,12


In [89]:
rbind(x, y)

0,1,2,3
x,1,2,3
y,10,11,12


**Example 2**

In [92]:
array1 <- array(c(1, -3, 0, 2), dim=c(2,2))
array1

0,1
1,0
-3,2


In [93]:
array1[1, ] # pulls the first row

In [94]:
array1[, 1]

In [96]:
cbind(array1[ ,1], array1[1, ])

0,1
1,1
-3,0


In [97]:
rbind(array1[ ,1], array1[1, ])

0,1
1,-3
1,0


<a id='df'></a>
**XII. DataFrame**

**Commonly Used Commands:**
1. `head(iris, 7)` - returns top 7 rows of the dataframe


2. `tail(iris, 3)` - returns last 3 rows of the dataframe


3. `dim(iris)` - returns the number of rows and columns in the dataframe


4. `summary(iris)` - returns all the summary statistics for every column in the dataframe


5. `iris$Sepal.Length` - returns the Sepal.Length column (we can do this with any column and the `$`)


6. `min(iris$Sepal.Length)` - returns the min value in the `Sepal.Length` column
    
    There are similar functions `mean()`, `max()`, `median()`, `sd()`, etc.


7. `subset(iris, Species=='virginica' & Petal.Length > 5)` - returns rows meeting these criteria


8. `names(iris)` - show column names


9. `colMeans(iris[, 1:4])` - Means of all four numeric columns

In [127]:
head(iris)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa


In [128]:
tail(iris)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
145,6.7,3.3,5.7,2.5,virginica
146,6.7,3.0,5.2,2.3,virginica
147,6.3,2.5,5.0,1.9,virginica
148,6.5,3.0,5.2,2.0,virginica
149,6.2,3.4,5.4,2.3,virginica
150,5.9,3.0,5.1,1.8,virginica


In [129]:
head(iris, 10)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa


In [130]:
names(iris)

In [131]:
iris$Sepal.Width

In [132]:
mean(iris$Sepal.Width)

In [133]:
median(iris$Sepal.Width)

In [134]:
sd(iris$Sepal.Width)

In [135]:
virginicas = subset(iris, Species=='virginica')

In [136]:
dim(virginicas)

In [137]:
dim(iris)

In [149]:
subset(iris, Species=='virginica' & Petal.Length > 5)

,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
101,6.3,3.3,6.0,2.5,virginica
102,5.8,2.7,5.1,1.9,virginica
103,7.1,3.0,5.9,2.1,virginica
104,6.3,2.9,5.6,1.8,virginica
105,6.5,3.0,5.8,2.2,virginica
106,7.6,3.0,6.6,2.1,virginica
108,7.3,2.9,6.3,1.8,virginica
109,6.7,2.5,5.8,1.8,virginica
110,7.2,3.6,6.1,2.5,virginica
111,6.5,3.2,5.1,2.0,virginica


In [139]:
dim(subset(iris, Species=='virginica' & Petal.Length > 5))

In [140]:
summary(iris)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

In [141]:
iris[1, 3:4]

Petal.Length,Petal.Width
1.4,0.2


In [150]:
names(iris)

In [151]:
class(iris)

In [152]:
colMeans(iris[, 1:4])

In [153]:
iris[c(1,3,5), c(3,4)]

,Petal.Length,Petal.Width
1,1.4,0.2
3,1.3,0.2
5,1.4,0.2


<a id='chapter2'></a>
## Control Flow and Functions

- [If/Else Statement](#if_else)
- [Loops](#loop)
- [Functions](#function)

<a id='if_else'></a>
### If/Else Statement

**Example 1**

In [181]:
a = 10; b = 10; c = 1
if (a < b) {
    d = 1
} else if (a == b) {
    d = 2
} else {
    d = 3
}

print(d)

[1] 2


**Example 2**

Suppose you are preheating an oven before baking. The oven must be set to exactly 425 degrees. If the temperature is below 425 degrees, print `oven is not hot enough`; if it is above 425 degrees, print `oven is too hot`; if it is exactly 425 degrees, print `oven is just right`.

In [189]:
oven.temp = 425
is.it.ready.result1 = "oven is just right"
is.it.ready.result2 = "oven is too hot"
is.it.ready.result3 = "oven is not hot enough"

if (oven.temp == 425){
    print(is.it.ready.result1)
} else if (oven.temp > 425) {
    print(is.it.ready.result2)
} else {
    print(is.it.ready.result3)
}

[1] "oven is just right"


**Example 3**

Using the `where.are.you` variable: if this variable is `"here"`, print `"Come on in"`. If the variable is `"on my way"`, print `"Hurry Up!"`. If the variable is `"I got lost"`, print `"Oh no! You're lost?!"`.

In [191]:
where.are.you1 = 'here'
where.are.you2 = 'on my way'
where.are.you3 = 'I got lost'

result1 = "Come on in"
result2 = 'Hurry Up!'
result3 = "Oh no! You're lost?!"

where.are.you = 'here'

if (where.are.you == where.are.you1){
    print(result1)
} else if (where.are.you == where.are.you2) {
    print(result2)
} else {
    print(result3)
}

[1] "Come on in"


**Example 4**

Use the modulo operator (`%%`) to print whether the number stored in the variable `y` is divisible by 4. For example, if `y = 12`, then 12 is divisible by 4, so you would print `"y is divisible by 4"`. If instead `y = 5`, dividing by 4 leaves a remainder, so you would print `"y is not divisible by 4"`.

In [193]:
# print if y is divisible by 4 or not
y = 20  # some number

# write your solution code here:
res1 = 'y is divisible by 4'
res2 = 'y is not divisible by 4'


if (y %% 4 == 0) {
    print(res1)
} else {
    print(res2)
}

[1] "y is divisible by 4"
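
The `if` statement above tests a single value. As an aside that goes beyond the original exercise, base R's `ifelse()` applies the same modulo test to every element of a vector; `ys` and `labels` are hypothetical names used only for illustration.

```r
# Hypothetical input vector; not part of the original exercise
ys <- c(12, 5, 20, 7)

# ifelse() evaluates the condition element by element
labels <- ifelse(ys %% 4 == 0, "divisible by 4", "not divisible by 4")
labels
# "divisible by 4" "not divisible by 4" "divisible by 4" "not divisible by 4"
```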


<a id='loop'></a>
### Loops

<a id='function'></a>
### Functions

<a id='chapter3'></a>
## Data Visualization and EDA