**SELECTED EXERCISES FOR A REVIEW BEFORE THE FINAL EXAM**

# split()

## Quick word count

Let's have a sentence with many repeated words:

In [None]:
sentence <- "Peter Piper picked a peck of pickled peppers A peck of pickled peppers Peter Piper picked If Peter Piper picked a peck of pickled peppers Where is the peck of pickled peppers Peter Piper picked?"

In [None]:
sentence

Let's first convert the sentence to lower case and split it into words:

In [None]:
words <- unlist(strsplit(tolower(sentence), split = " "))

In [None]:
str(words)

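The easiest way to create a word list is to use the split() function. Here it groups the word positions by word, so each list element holds the positions where that word occurs: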

In [None]:
words_list <- split(1:length(words), words)

In [None]:
words_list
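
Since each list element holds the positions of one word, the word count is simply the length of each element. A minimal sketch on a shorter, made-up sentence (the names demo_words, demo_list and demo_counts are introduced only for this illustration):

```R
# hypothetical short sentence, used only for illustration
demo_words <- unlist(strsplit(tolower("A peck of pickled peppers a peck"), split = " "))

# split() the positions by word, as above
demo_list <- split(seq_along(demo_words), demo_words)

# the number of positions per word is the word count
demo_counts <- lengths(demo_list)
sort(demo_counts, decreasing = TRUE)

# table() gives the same counts directly
table(demo_words)
```

lengths() returns the length of every list element in one call, so no explicit loop is needed.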

## iris data frame

First split the data frame into a list across species:

In [None]:
iris_split <- split(iris[,-5], iris[,5])
iris_split

In [None]:
str(iris_split)
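
Once the data frame is split, a function can be applied to each piece. As a small sketch (iris_parts and species_means are illustrative names), colMeans() can be run on every species subset with sapply():

```R
# split the four numeric columns by species, as above
iris_parts <- split(iris[, -5], iris[, 5])

# apply colMeans() to each piece; sapply() binds the results into a matrix
species_means <- sapply(iris_parts, colMeans)
species_means
```

The result has one row per measurement and one column per species.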

## countries by income group

In [None]:
load("~/file/weo_subset2.RData")

Now let's see which countries are in which group:

In [None]:
with(weo_subset2, split(Country, income))

# rowSums(), colSums()

## Functions on matrices

In [None]:
m <- matrix( 1:9, nrow=3 )
m 

In [None]:
rowSums(m)

In [None]:
rowMeans(m)

In [None]:
colSums(m)

In [None]:
colMeans(m)

# apply()

## \*plying through matrices

Let's first create a random matrix:

In [None]:
mat_r <- matrix(sample(20), nrow = 4)
mat_r

### applying on row margin

Let's get the row minimums:

In [None]:
apply(mat_r, 1, min)

And get the minimums and maximums in one go:

In [None]:
apply(mat_r, 1, function(x) c(min(x), max(x)))

### applying on column margin

Let's get the column minimums:

In [None]:
apply(mat_r, 2, min)

And the minimums and maximums in one go:

In [None]:
apply(mat_r, 2, function(x) c(min(x), max(x)))
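
When the applied function returns more than one value, apply() stacks the results as columns, whichever margin was used. A minimal sketch on a fixed matrix (m_demo and row_range are illustrative names), so that the shape is easy to check by eye:

```R
m_demo <- matrix(1:9, nrow = 3) # rows are (1,4,7), (2,5,8), (3,6,9)

# naming the returned values labels the rows of the result
row_range <- apply(m_demo, 1, function(x) c(min = min(x), max = max(x)))
row_range

# one column per input row; transpose to get one row per input row
t(row_range)

# for sums, apply() agrees with the dedicated function
all(apply(m_demo, 1, sum) == rowSums(m_demo))
```

rowSums(), rowMeans(), colSums() and colMeans() cover the common cases; apply() is the general tool for any other function.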

## NEIGHBOUR COUNTS

In [None]:
load("~/file/distance2.RData")

Now, use the distance2 object (the matrix of distances between province centers).

Write a function named **neigh_count** that takes two arguments:

- radius
- dist, with the distance2 object as its default value

The function should return, for each city, the number of neighbouring cities within the given radius, sorted in descending order.

Hint: Use the apply() function

In [None]:
neigh_count <- function(radius, dist = distance2)
{
    # count the cities within the radius in each row; subtract 1 for the city itself (distance 0)
    counts <- apply(dist, 1, function(x) sum(x <= radius) - 1)
    return(sort(counts, decreasing = TRUE))
}
                    
neigh_count(100)
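
To see why 1 is subtracted, the same counting logic can be tried on a tiny, made-up distance matrix (toy_dist and its values are hypothetical, not taken from distance2):

```R
# hypothetical distances (km) between three made-up places
toy_dist <- matrix(c(  0,  50, 200,
                      50,   0, 120,
                     200, 120,   0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# logical matrix: the diagonal is TRUE too, since every place is 0 km from itself
toy_dist <= 150

# row-wise count of TRUE values, minus 1 for the place itself
toy_counts <- apply(toy_dist, 1, function(x) sum(x <= 150) - 1)
sort(toy_counts, decreasing = TRUE)
```

Here b has two neighbours within 150 km, while a and c have one each.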

# which.min()

<a id='travelsales'></a>
## TRAVELLING SALESMAN (also check for "while()")

We will implement a simple "greedy" solution to the travelling salesman problem: finding the shortest route that visits all cities along a path.

We will compare the solution to a brute force approach (trying all permutations) and see whether the fast, simple approach is as good as the slow brute force approach.

Of course, we will take advantage of the expressiveness of the R language: do much with little.

First let's download the distance matrix for 81 city centers again: we have an 81x81 matrix of straight-line ("as the crow flies") distances in km between the 81 province centers in Turkey. To retrieve this matrix, please follow the link below to download the file distance2.RData:

[distance2.RData](../file/distance2.RData)

Load the data:

In [None]:
load("~/file/distance2.RData")

See what is inside:

In [None]:
distance2[1:5, 1:5]

Now let's have a sample of the cities:

In [None]:
cities <- c("istanbul", "adana", "ankara", "van", "mugla", "artvin")
#cities <- c("istanbul", "adana", "ankara", "van", "mugla", "artvin", "kayseri", "usak", "erzincan")
#cities <- c("istanbul", "adana", "ankara", "van", "mugla", "artvin", "kayseri", "usak", "erzincan", "sinop")
cities

**Exercise 1:**

Write a function shortest() using the template below (just fill in the ...'s)

The logic is:

- We select a starting position out of the cities
- We keep track of visited and unvisited cities and the last visited location
- In each iteration, we select the next city with the shortest distance to the current one
- We continue until we run out of unvisited cities
- We report both the distance and the vector of cities in visited order as a list

```R

shortest <- function(path = cities, start = "istanbul", dist = distance2)
{
    dist2 <- dist[path, path] # subset the larger matrix (the dist argument) to the cities in path
    diag(dist2) <- Inf # set the diagonal to Inf so that min() does not return 0's

    # we keep track of two city vectors: path is for unvisited cities, path2 is for visited ones
    path <- setdiff(path, start) # delete the starting city
    path2 <- start # append the start city to visited

    location <- start # current location is the start city
    distance <- 0 # initiate the cumulative distance
    
    while(length(path) > 0) # as long as unvisited cities exist
    {
        rowx <- ... # get the row for current location, columns for unvisited cities
        nextind <- ... # get the index of the minimum distance to next city, use which.min
        distance <- ... # add the minimum distance to the cumulative
        location <- ... # get the next location
        path <- ... # delete the next city from unvisited ones, use setdiff
        path2 <- ... # append the next city to visited ones
    }
    
    return(list(..., ...)) # report the cumulative distance and cities in visited order as a list
}

```

**Solution:**

In [None]:
shortest <- function(path = cities, start = "istanbul", dist = distance2)
{
    dist2 <- dist[path, path] # subset the larger matrix (the dist argument) to the cities in path
    diag(dist2) <- Inf # set the diagonal to Inf so that min() does not return 0's

    # we keep track of two city vectors: path is for unvisited cities, path2 is for visited ones
    path <- setdiff(path, start) # delete the starting city
    path2 <- start # append the start city to visited

    location <- start # current location is the start city
    distance <- 0 # initiate the cumulative distance
    
    while(length(path) > 0) # as long as unvisited cities exist
    {
        rowx <- dist2[location,path] # get the row for current location, columns for unvisited cities
        nextind <- which.min(rowx) # get the index of the minimum distance to next city
        distance <- distance + min(rowx) # add the minimum distance to the cumulative
        location <- path[nextind] # get the next location
        path <- setdiff(path, location) # delete the next city from unvisited ones
        path2 <- c(path2, location) # append the next city to visited ones
    }
    
    return(list(distance, path2)) # report the cumulative distance and cities in visited order as a list
}

Now let's check for different starting cities:

In [None]:
shortest(start = "istanbul")
shortest(start = "ankara")

Note that the total distance depends on the starting city. So for the best greedy solution, we should also find the best starting city, along with the path for that starting city.

**Exercise 2:**

What if we try the function for all possible starting cities as such (fill in the ... part):

```R
alternatives <- lapply(cities, function(x) ...)
```

**Solution 2:**

In [None]:
alternatives <- lapply(cities, function(x) shortest(cities, x))

Now we list the greedy paths for all alternative starting cities:

In [None]:
alternatives

**Exercise 3:**

Now let's choose the shortest one programmatically as such:

```R
dist1 <- sapply(alternatives, ...)
alternatives[[...]]

[[1]]
[1] 2219

[[2]]
[1] "mugla"    "istanbul" "ankara"   "adana"    "van"      "artvin"
```

**Solution 3:**

In [None]:
dist1 <- sapply(alternatives, "[[", 1)
dist1
alternatives[[which.min(dist1)]]
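
The call sapply(alternatives, "[[", 1) works because "[[" is an ordinary function in R, so it can be handed to sapply() with the index as an extra argument. A small sketch on made-up results (demo_alt and its values are hypothetical, only shaped like the output of shortest()):

```R
# hypothetical results in the same shape as the output of shortest()
demo_alt <- list(list(2500, c("x", "y", "z")),
                 list(2100, c("y", "x", "z")),
                 list(2300, c("z", "y", "x")))

# `[[`(el, 1) is the same as el[[1]], applied to every element
demo_dist <- sapply(demo_alt, "[[", 1)
demo_dist

# which.min() returns the position of the smallest value
best <- which.min(demo_dist)
demo_alt[[best]]
```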

For a given vector of n cities, the greedy approach constructs and tests only n paths (one per starting city), whereas the brute force (exhaustive) approach has to test all n! permutations.
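
The exhaustive comparison itself is not shown in this section. As a rough sketch of what it could look like, the code below enumerates every ordering of a small, made-up distance matrix and keeps the shortest open path. The matrix toy_tsp and the helpers perms(), path_length() and brute_force() are hypothetical names introduced only for this illustration:

```R
# hypothetical symmetric distance matrix for four made-up places
toy_tsp <- matrix(c( 0,  2,  9, 10,
                     2,  0,  6,  4,
                     9,  6,  0,  3,
                    10,  4,  3,  0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))

# all orderings of a vector, built recursively
perms <- function(v)
{
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v))
    {
        for (p in perms(v[-i]))
        {
            out[[length(out) + 1]] <- c(v[i], p)
        }
    }
    return(out)
}

# total length of an open path: sum of the distances between consecutive cities
path_length <- function(p, dist)
{
    sum(dist[cbind(p[-length(p)], p[-1])])
}

# try every ordering and keep the first one with the minimum length
brute_force <- function(path, dist)
{
    all_paths <- perms(path)
    lens <- sapply(all_paths, path_length, dist = dist)
    best <- which.min(lens)
    return(list(lens[best], all_paths[[best]]))
}

toy_best <- brute_force(c("a", "b", "c", "d"), toy_tsp)
toy_best
```

With 6 cities there are 720 orderings and with 10 cities already 3628800, which is why the greedy shortcut is attractive even though it is not guaranteed to find the shortest path.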

# merge()

## Merge Operations

Let's first review four types of merges:

In [None]:
df_weights <- data.frame(names = c("ahmet", "ayse"), weights = c(75, 52))
df_weights

In [None]:
df_heights <- data.frame(names = c("ali", "ayse"), heights = c(176, 165))
df_heights

### Left (outer) join

Join the two data frames on the names field, keeping all categories in the LEFT df:

In [None]:
merge(df_weights, df_heights, by.x = "names", by.y = "names", all.x = TRUE)

### Right (outer) join

Join the two data frames on the names field, keeping all categories in the RIGHT df:

In [None]:
merge(df_weights, df_heights, by.x = "names", by.y = "names", all.y = TRUE)

### Full outer join

Join the two data frames on the names field, keeping all categories in either df:

In [None]:
merge(df_weights, df_heights, by.x = "names", by.y = "names", all = TRUE)

### Inner join

Join the two data frames on the names field, keeping only the categories common to both df's:

In [None]:
merge(df_weights, df_heights, by.x = "names", by.y = "names", all = FALSE)

### Flights

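Let's merge the flights and dep_delay_date_origin2 data frames into flights2, so that every row also carries the average departure delay for its date and origin airport pair.

If you don't have the flights and dep_delay_date_origin2 data frames, you can load them as such:

In [None]:
flights <- read.csv("~/file/flights14.csv")
# the merge below needs a dates column, built from year, month and day
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))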

In [None]:
load("~/file/dep_delay_date_origin2.RData")

In [None]:
flights2 <- merge(flights,
                  dep_delay_date_origin2,
                  by.x = c("dates", "origin"),
                  by.y = c("dates", "origin"),
                  all = T)

In [None]:
head(flights2)

## merge types

Now let's create a small sample: just the first row of each species:

In [None]:
sample1 <- aggregate(iris[,-5], by = list(iris[,5]), FUN = head, 1)
sample1

Let's delete the virginica row from sample1:

In [None]:
sample2 <- sample1[-3,]
sample2

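And from the averages df (the mean of each measurement per species, computed with aggregate()), let's delete the setosa row:

In [None]:
averages <- aggregate(iris[,-5], by = list(iris[,5]), FUN = mean)
averages2 <- averages[-1,]
averages2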

### Left join

Join the sepal lengths on species, keeping all species of the LEFT df:

In [None]:
merge(sample2[,1:2], averages2[,1:2], by = "Group.1", all.x = T)

### Right join

Now keep all species on the RIGHT df:

In [None]:
merge(sample2[,1:2], averages2[,1:2], by = "Group.1", all.y = T)

### Full outer join

Take the union of species on either df:

In [None]:
merge(sample2[,1:2], averages2[,1:2], by = "Group.1", all = T)

### Inner join

Keep only the common species:

In [None]:
merge(sample2[,1:2], averages2[,1:2], by = "Group.1", all = F)

## Exercise 2: Merge

You can load the necessary objects if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_subset2.RData")
load("~/file/gdp_agg.RData")

Now, based on the common column "income", merge the data frames weo_subset2 and gdp_agg into weo_merged, so that each country row also carries the median growth of its income group.

**Solution:**

In [None]:
weo_merged <- merge(weo_subset2, gdp_agg, by = "income")
weo_merged

## MERGE LEFT

Let's have two data frames as such:

```R
RNGversion("3.3.1")
set.seed(40)
select <- sample(letters, 15)
select1 <- sample(select, 10)
select2 <- sample(select, 10)
datf1 <- data.frame(label = select1, data1 = round(rnorm(10, 5, 3), 1))
datf2 <- data.frame(label = select2, data2 = round(rnorm(10, 5, 3), 1))

datf1

   label data1
1  k     1.2  
2  j     7.0  
3  p     3.5  
4  v     2.2  
5  y     4.3  
6  e     2.5  
7  q     5.1  
8  r     5.6  
9  w     3.5  
10 b     6.1  

datf2

   label data2
1  w     7.8  
2  e     5.4  
3  s     3.0  
4  g     3.2  
5  k     5.1  
6  b     5.1  
7  q     3.3  
8  j     8.7  
9  r     7.5  
10 c     6.5 
```

Please write a function left(df1, df2) such that all categories in the label column of df1 are kept (whether or not they have a matching label in df2), so the function should make a left join as such:

```R
left(df1 = datf1, df2 = datf2)

   label data1 data2
1  b     6.1   5.1  
2  e     2.5   5.4  
3  j     7.0   8.7  
4  k     1.2   5.1  
5  p     3.5    NA  
6  q     5.1   3.3  
7  r     5.6   7.5  
8  v     2.2    NA  
9  w     3.5   7.8  
10 y     4.3    NA  
```

**Hint:** You should use the merge() function

**Solution:**

In [None]:
RNGversion("3.3.1")
set.seed(40)
select <- sample(letters, 15)
select1 <- sample(select, 10)
select2 <- sample(select, 10)
datf1 <- data.frame(label = select1, data1 = round(rnorm(10, 5, 3), 1))
datf2 <- data.frame(label = select2, data2 = round(rnorm(10, 5, 3), 1))
datf1
datf2

left <- function(df1, df2)
{
    merge(df1, df2, by = "label", all.x = T)
}

left(df1 = datf1, df2 = datf2)

## MERGE INNER

Let's have two data frames as such:

```R
RNGversion("3.3.1")
set.seed(40)
select <- sample(letters, 15)
select1 <- sample(select, 10)
select2 <- sample(select, 10)
datf1 <- data.frame(label = select1, data1 = round(rnorm(10, 5, 3), 1))
datf2 <- data.frame(label = select2, data2 = round(rnorm(10, 5, 3), 1))

datf1

   label data1
1  k     1.2  
2  j     7.0  
3  p     3.5  
4  v     2.2  
5  y     4.3  
6  e     2.5  
7  q     5.1  
8  r     5.6  
9  w     3.5  
10 b     6.1  

datf2

   label data2
1  w     7.8  
2  e     5.4  
3  s     3.0  
4  g     3.2  
5  k     5.1  
6  b     5.1  
7  q     3.3  
8  j     8.7  
9  r     7.5  
10 c     6.5  
```

Please write a function inner(df1, df2) such that only the label categories common to both data frames are kept, so the function should make an inner join as such:

```R
inner(df1 = datf1, df2 = datf2)

  label data1 data2
1 b     6.1   5.1  
2 e     2.5   5.4  
3 j     7.0   8.7  
4 k     1.2   5.1  
5 q     5.1   3.3  
6 r     5.6   7.5  
7 w     3.5   7.8  
```

**Hint:** You should use the merge() function

**Solution:**

In [None]:
RNGversion("3.3.1")
set.seed(40)
select <- sample(letters, 15)
select1 <- sample(select, 10)
select2 <- sample(select, 10)
datf1 <- data.frame(label = select1, data1 = round(rnorm(10, 5, 3), 1))
datf2 <- data.frame(label = select2, data2 = round(rnorm(10, 5, 3), 1))
datf1
datf2

inner <- function(df1, df2)
{
    merge(df1, df2, by = "label", all = F)
}

inner(df1 = datf1, df2 = datf2)

# aggregate()

## Summarizing a data frame with aggregate()

iris is a famous dataset that is built into R:

In [None]:
iris

Info on iris:

In [None]:
?iris

```
Format
iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.

iris3 gives the same data arranged as a 3-dimensional array of size 50 by 4 by 3, as represented by S-PLUS. The first dimension gives the case number within the species subsample, the second the measurements with names Sepal L., Sepal W., Petal L., and Petal W., and the third the species.
```

See the unique values of species:

In [None]:
unique(iris$Species)

And let's get the average

- Sepal.Length
- Sepal.Width
- Petal.Length
- Petal.Width

values of each of the species

Now let's do this with the aggregate function:

In [None]:
?aggregate

```
## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)

Arguments
x	
an R object.

by	
a list of grouping elements, each as long as the variables in the data frame x. The elements are coerced to factors before use.

FUN	
a function to compute the summary statistics which can be applied to all data subsets.

simplify	
a logical indicating whether results should be simplified to a vector or matrix if possible.

drop	
a logical indicating whether to drop unused combinations of grouping values. The non-default case drop=FALSE has been amended for R 3.5.0 to drop unused combinations.

formula	
a formula, such as y ~ x or cbind(y1, y2) ~ x1 + x2, where the y variables are numeric data to be split into groups according to the grouping x variables (usually factors).

data	
a data frame (or list) from which the variables in formula should be taken.
```

In [None]:
averages <- aggregate(iris[,-5], by = list(iris[,5]), FUN = mean)

In [None]:
averages
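
The help text above also lists the formula and data arguments. As a short sketch of that interface (averages_f and averages_by are illustrative names), the same summary can be written with a formula, which keeps the grouping column's own name instead of Group.1:

```R
# formula interface: "." stands for all remaining columns of data
averages_f <- aggregate(. ~ Species, data = iris, FUN = mean)
averages_f

# same numbers as the by = list(...) version; only the grouping column name differs
averages_by <- aggregate(iris[, -5], by = list(iris[, 5]), FUN = mean)
all.equal(averages_f$Sepal.Length, averages_by$Sepal.Length)
```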

## Aggregate data

In [None]:
flights <- read.csv("~/file/flights14.csv")

Now the separate year, month and day information can be combined into a single united field:

In [None]:
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))

In [None]:
names(flights)

In [None]:
head(flights)

And the weekdays:

In [None]:
flights$weekdays <- weekdays(flights$dates, abbreviate = T)

In [None]:
head(flights)

We can have the average departure delay by date as such:

In [None]:
dep_delay_date <- with(flights, aggregate(dep_delay,
                                          by = list(dates),
                                          FUN = mean))

View the data:

In [None]:
dep_delay_date

And better, visualize data:

In [None]:
with(dep_delay_date, plot(Group.1, x, type = "l"))

## Exercise 1: Aggregate data

You can load the necessary object if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_subset2.RData")

Remember the aggregate function:

```R
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
```

Now please get the **median** growth rate (NGDP_RPCH) for each income category and save into gdp_agg. The column names should be income and gdpg as such:

```R
  income gdpg 
1 low    3.969
2 middle 2.849
3 high   2.197
```

Note that the by argument takes a list object. You may use the with() and median() functions along with aggregate(), and also the names() function

**Solution:**

In [None]:
aggregate(weo_subset2$NGDP_RPCH,
          by = list(weo_subset2$income),
          FUN = median)

In [None]:
gdp_agg <- with(weo_subset2,
                aggregate(NGDP_RPCH,
                          by = list(income),
                          FUN = median))
names(gdp_agg) <- c("income", "gdpg")
gdp_agg

As you see, low income countries have a higher median growth rate than middle and high income countries

## GROUP DATA

Let's have a data.frame "cox" as such:

```R
cox <- CO2
RNGversion("3.3.1")
set.seed(20)
cox$uptake <- rnorm(nrow(cox),
                    mean = runif(1, 20, 100),
                    sd = runif(1, 2, 10))

head(cox)

  Plant Type   Treatment  conc uptake   
1 Qn1   Quebec nonchilled  95   85.42744
2 Qn1   Quebec nonchilled 175  104.75015
3 Qn1   Quebec nonchilled 250   79.34338
4 Qn1   Quebec nonchilled 350   86.56297
5 Qn1   Quebec nonchilled 500   94.84301
6 Qn1   Quebec nonchilled 675   66.65552
```

Write a function groups(dat) that takes a single argument dat, a data frame object, and returns the **median** **uptake** value of each respective **Plant** and **Treatment** category as such:

```R
groups(dat = cox)

   Group.1 Group.2    x       
1  Qn1     nonchilled 85.42744
2  Qn2     nonchilled 86.43963
3  Qn3     nonchilled 90.43168
4  Mn3     nonchilled 87.06613
5  Mn2     nonchilled 96.15445
6  Mn1     nonchilled 90.34985
7  Qc1     chilled    85.34404
8  Qc3     chilled    90.91757
9  Qc2     chilled    88.43915
10 Mc2     chilled    88.97322
11 Mc3     chilled    90.95314
12 Mc1     chilled    95.54233
```

**Hint:**
- You should use aggregate() function
- You may also use with() function to simplify the code (not necessary)
- Note that the "by" argument to aggregate(), a list object, should incorporate two categoric variables (Plant and Treatment), not a single one. So it should be a list of two!

**Solution:**

In [None]:
cox <- CO2
RNGversion("3.3.1")
set.seed(20)
cox$uptake <- rnorm(nrow(cox),
                    mean = runif(1, 20, 100),
                    sd = runif(1, 2, 10))

head(cox)

groups <- function(dat = cox)
{
    with(dat, aggregate(uptake, # use the dat argument, not the global cox
                        by = list(Plant, Treatment),
                        FUN = median))
}

groups(dat = cox)

## GROUP DATA

Let's have a data.frame "cox" as such:

```R
cox <- CO2
RNGversion("3.3.1")
set.seed(20)
cox$uptake <- rnorm(nrow(cox),
                    mean = runif(1, 20, 100),
                    sd = runif(1, 2, 10))

head(cox)

  Plant Type   Treatment  conc uptake   
1 Qn1   Quebec nonchilled  95   85.42744
2 Qn1   Quebec nonchilled 175  104.75015
3 Qn1   Quebec nonchilled 250   79.34338
4 Qn1   Quebec nonchilled 350   86.56297
5 Qn1   Quebec nonchilled 500   94.84301
6 Qn1   Quebec nonchilled 675   66.65552
```

Write a function groups(dat) that takes a single argument dat, a data frame object, and returns the **mean** **uptake** value of each respective **Type** and **Treatment** category as such:

```R
groups(dat = cox)

  Group.1     Group.2    x       
1 Quebec      nonchilled 88.32927
2 Mississippi nonchilled 91.59208
3 Quebec      chilled    88.13191
4 Mississippi chilled    90.62369
```

**Hint:**
- You should use aggregate() function
- You may also use with() function to simplify the code (not necessary)
- Note that the "by" argument to aggregate(), a list object, should incorporate two categoric variables (Type and Treatment), not a single one. So it should be a list of two!

**Solution:**

In [None]:
cox <- CO2
RNGversion("3.3.1")
set.seed(20)
cox$uptake <- rnorm(nrow(cox),
                    mean = runif(1, 20, 100),
                    sd = runif(1, 2, 10))

head(cox)

groups <- function(dat = cox)
{
    with(dat, aggregate(uptake, # use the dat argument, not the global cox
                        by = list(Type, Treatment),
                        FUN = mean))
}

groups(dat = cox)

# which()

## EXCLUDE MISSING CASES

Read the data into R as such:

In [None]:
weo_data <- read.csv("~/file/weo_2016_wide_2.csv")
weo_desc <- read.csv("~/file/weo_description.csv")

Let's first subset the relevant columns:

In [None]:
weo_subset <- weo_data[,c("Country", "NGDP_RPCH", "PPPPC")]
weo_subset

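Now we will find the rows with missing information so that they can be excluded. complete.cases() flags the complete rows (na.omit() would drop the incomplete ones directly), and which() gives the indices of the incomplete rows: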

In [None]:
missing <- which(!(complete.cases(weo_subset)))
missing
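
Once the incomplete rows are known, they can be dropped. A minimal sketch on a made-up data frame (demo_df and its values are hypothetical), showing the index of the incomplete rows and two equivalent ways to exclude them:

```R
# hypothetical data frame with some missing values
demo_df <- data.frame(Country = c("A", "B", "C", "D"),
                      growth  = c(2.1, NA, 3.4, 1.0),
                      income  = c(10, 20, NA, 40))

# row indices with at least one NA
demo_missing <- which(!complete.cases(demo_df))
demo_missing

# keep the complete rows with logical indexing ...
demo_clean1 <- demo_df[complete.cases(demo_df), ]

# ... or let na.omit() do the same in one step
demo_clean2 <- na.omit(demo_df)
demo_clean2
```

Logical indexing is safer than negative indexing with the which() result: if no row were incomplete, demo_df[-demo_missing, ] would return zero rows instead of all of them.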

## PLATES

Load the data object to your R environment as such:

```R
load("~/file/distance2.RData")
```

Check that you have the distance2 object now:

```R
distance2[c("istanbul", "ankara", "izmir"), c("istanbul", "ankara", "izmir")]

         istanbul ankara izmir
istanbul   0      349    327  
ankara   349        0    520  
izmir    327      520      0   
```

The numbers show the straight-line ("as the crow flies") distances between the cities in the rows and columns (e.g. the distance between İstanbul and Ankara is 349 km). The distance between a city and itself is 0, of course!

Please write a function **plates** that takes two arguments:
- **cities** is a vector of city names (each name inside quotes)
- **dist** is a distance matrix and the default value is our distance2 object

The function should return the plate codes of the cities as such:

```R
plates(cities = c("istanbul", "izmir", "ankara"))

istanbul    izmir   ankara 
      34       35        6 

plates(c("adana"), dist = distance2)

adana 
    1 

plates(c("edirne", "ardahan")) 

edirne ardahan 
    22      75 
```

**Note that the returned vector carries the city names as its names attribute, along with the plate code values. Your code should return exactly the same output for the given argument values.**


**Hints:**
- You may use which() function and the fact that the distance of a city to itself is 0
- You can initiate an empty vector object and then construct a for loop across the city names to accumulate the values in that vector object
- distance2 is the global object, dist is the argument to the function. distance2 is passed to the function as the default value of dist argument
- You can also use the apply() function alternatively, without a loop. However the looped version is easier for you to construct
- All city names in row and column names are lower case and feature only ascii characters such as corum, sanliurfa, agri and gumushane.

**Solution:**

In [None]:
load("~/file/distance2.RData")
distance2[c("istanbul", "ankara", "izmir"), c("istanbul", "ankara", "izmir")]

plates <- function(cities, dist = distance2)
{
    plate <- NULL
    for (i in cities)
    {
        plate <- c(plate, which(dist[i,] == 0))
    }
    
    return(plate)
}

plates(cities = c("istanbul", "izmir", "ankara"))
plates(c("adana"), dist = distance2)
plates(c("edirne", "ardahan"))
plates(c('bursa', 'izmir'))

In [None]:
apply(distance2[c("istanbul", "izmir", "ankara"),], 1, function(x) which(x == 0))
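
Both versions rely on two properties of which(): it returns the position of the TRUE values and it keeps their names. Judging from the expected output, the columns of distance2 appear to be ordered by plate code, so the position of the 0 in a city's row is its plate code. A small sketch on a made-up row (demo_row is hypothetical):

```R
# hypothetical row of a distance matrix for the second of three places
demo_row <- c(first = 120, second = 0, third = 75)

# which() returns the positions of TRUE values and keeps their names
demo_plate <- which(demo_row == 0)
demo_plate
```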

# cumsum()

## get sums and cumulative sums with sum(), cumsum()

Create a vector of values as a sequence from 1 to 10 and save into seq10 vector

In [None]:
seq10 <- 1:10
seq10

Now get the sum of the values in the vector:

In [None]:
sum(seq10)

Note that the sum of a vector is a vector with a single value.

What if we want the cumulative sum of the values, that is, for each position n, the sum of all terms from the first term to the nth term?

In [None]:
seq10c <- cumsum(seq10)
seq10c
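
A few quick checks tie sum() and cumsum() together. This is only a sketch (x and cs are illustrative names): the last cumulative sum equals the total, which for 1 to n is n(n + 1)/2, and diff() recovers the original values:

```R
x <- 1:10
cs <- cumsum(x)

# the last cumulative sum equals the total, and both match n * (n + 1) / 2
tail(cs, 1) == sum(x)
sum(x) == 10 * (10 + 1) / 2

# diff() undoes cumsum() once a leading 0 is put back
diff(c(0, cs))
```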

# if()

## Joint conditions

Let's write a function that accepts three inputs val, range_min, range_max. The function will return whether the value is within the range range_min to range_max

In [None]:
is_within <- function(val, range_min, range_max)
{
    if (val >= range_min && val <= range_max) # && compares single values, as if() expects
    {
        return(T)
    }
    else
    {
        return(F)
    }
}

In [None]:
is_within(4, 1, 10)

In [None]:
is_within(12, 1, 10)

Note that range_min should be smaller than range_max in this simple example; the code does not handle the reversed case.

Now we will test whether the value is outside the range

In [None]:
is_outside <- function(val, range_min, range_max)
{
    if (val < range_min || val > range_max) # || compares single values, as if() expects
    {
        return(T)
    }
    else
    {
        return(F)
    }
}

In [None]:
is_outside(4, 1, 10)

In [None]:
is_outside(12, 1, 10)
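
R has two forms of the logical operators, and the difference matters for joint conditions. The sketch below (vals, val, within and outside are illustrative names) shows the element-wise forms & and |, the single-value form && that suits if(), and that "outside" is just the negation of "within":

```R
# & works element by element and returns one value per element
vals <- c(4, 12, 7)
within <- vals >= 1 & vals <= 10
within

# && expects a single value on each side, which is what if() needs
val <- 4
val >= 1 && val <= 10

# De Morgan: "outside" is the negation of "within"
outside <- vals < 1 | vals > 10
all(outside == !within)
```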

## Exercise

A famous anonymous Chinese, Arabic or Persian proverb:

```
He who knows not, and knows not that he knows not, is a fool... shun him.
He who knows not, and knows that he knows not, is willing... teach him.
He who knows, and knows not that he knows, is asleep... awaken him.
He who knows, and knows that he knows, is wise... follow him.
```

We will convert this proverb into a function called "is_wise". It takes two inputs, each either "yes" or "no": know_real (for the first part) and know_himself (for the second part).

The output will be a two-item character vector as such: "fool", "shun"

Use regular if conditions, not ifelse()

You can follow any order provided that you cover all cases

**Solution:**

In [None]:
is_wise <- function(know_real, know_himself)
{
    if (know_real == "no") # if1, "He who knows not"
    {
        if (know_himself == "no") # if2, "and knows not that he knows not" 
        {
            return(c("fool", "shun"))
        }
        else # else1, "and knows that he knows not"
        {
            return(c("willing", "teach"))
        }
    }
    else # else2, "He who knows"
    {
        if (know_himself == "no") # if3, "and knows not that he knows"
        {
            return(c("asleep", "awaken"))
        }
        else # else3, "and knows that he knows"
        {
            return(c("wise", "follow"))

        } # close else3
    } # close else2
}

In [None]:
is_wise(know_real = "no", know_himself = "no")

In [None]:
is_wise(know_real = "no", know_himself = "yes")

In [None]:
is_wise(know_real = "yes", know_himself = "no")

In [None]:
is_wise(know_real = "yes", know_himself = "yes")

# for()

## "for" loop: when you know the # of iterations

The basic syntax for a "for" loop is as follows:

```R
for(var in a_vector)
{
    some_code_that_uses_var
}
```

- Here "var" is a new variable created for the "for" loop
- a_vector is a vector that var iterates through

In each iteration of the loop, var takes the next value from the vector and the code inside the loop is executed again with the new value of var

### Iterate through a vector

Let's see in action:

First create a vector of random numbers and iterate through it to print each value:

In [None]:
vec_1 <- sample(seq(-10,10), 10)

for (i in vec_1)
{
    print(i)
}

Let's put this code inside a function so that we can step through it with debug():

In [None]:
vec_1 <- sample(seq(-10,10), 10)


for_1 <- function(vecc = vec_1)
{

    for (i in vecc)
    {
        print(i)
    }

}

### Iterate through the indices of a vector

Let's say we have two vectors, height and weight, and want to iterate through both simultaneously.

Instead of iterating through the vectors themselves, we may iterate through indices:

In [None]:
set.seed(12345)
height1 <- rnorm(5, mean = 170, sd = 8)
weight1 <- rnorm(5, mean = 75, sd = 11)

height1
weight1

height_and_weight1 <- function(heights = height1, weights = weight1)
{

    indices <- seq_along(heights)
    
    for (i in indices)
    {
        text1 <- sprintf("For the person %s, height is %s, weight is %s",
               i,
               heights[i],
               weights[i])
        print(text1)
    }

}

height_and_weight1()

### Collect values by appending to a vector

Now, instead of printing the values, let's create a character vector, append the text values to it and return the vector:

In [None]:
set.seed(12345)
height1 <- rnorm(5, mean = 170, sd = 8)
weight1 <- rnorm(5, mean = 75, sd = 11)

height1
weight1

height_and_weight2 <- function(heights = height1, weights = weight1)
{

    indices <- seq_along(heights)
    character_vector <- NULL
    
    for (i in indices)
    {
        text1 <- sprintf("For the person %s, height is %s, weight is %s",
               i,
               heights[i],
               weights[i])
        character_vector <- c(character_vector, text1)
    }

    return(character_vector)
    
}

height_and_weight2()

### Collect values by creating an empty vector of size n

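Appending to a vector is OK for a small number of iterations. As the number of iterations goes up, such as 1e6, it may perform poorly, because the vector is copied every time it grows.

Creating a vector of the final size first and filling it by index is another option: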

In [None]:
set.seed(12345)
height1 <- rnorm(5, mean = 170, sd = 8)
weight1 <- rnorm(5, mean = 75, sd = 11)

height1
weight1

height_and_weight3 <- function(heights = height1, weights = weight1)
{

    indices <- seq_along(heights)
    character_vector <- character(length = length(heights))
    
    for (i in indices)
    {
        text1 <- sprintf("For the person %s, height is %s, weight is %s",
               i,
               heights[i],
               weights[i])
        character_vector[i] <- text1
    }

    return(character_vector)
    
}

height_and_weight3()
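
For this particular task the loop can also be avoided altogether, because sprintf() is vectorized over its arguments. A sketch under that observation (h, w, looped and vectorized are illustrative names), comparing the loop with a single call:

```R
set.seed(12345)
h <- rnorm(5, mean = 170, sd = 8)
w <- rnorm(5, mean = 75, sd = 11)

# loop version with a preallocated result, as above
looped <- character(length(h))
for (i in seq_along(h))
{
    looped[i] <- sprintf("For the person %s, height is %s, weight is %s", i, h[i], w[i])
}

# sprintf() is vectorized, so the whole vector can be built in one call
vectorized <- sprintf("For the person %s, height is %s, weight is %s", seq_along(h), h, w)

identical(looped, vectorized)
```

The loop versions remain useful when each iteration depends on the previous one, which a vectorized call cannot express.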

### Conditional inside a for loop

Now let's say we report a message when the height of a person is above average:

In [None]:
set.seed(12345)
height1 <- rnorm(5, mean = 170, sd = 8)

height1

height_and_average <- function(heights = height1)
{

    indices <- seq_along(heights)
    character_vector <- NULL
    av_height <- mean(heights)
    
    for (i in indices)
    {
        height_i <- heights[i]
        
        if (height_i > av_height)
        {
            text_x <- sprintf("Height of person %s of %s is above average", i, height_i)
            character_vector <- c(character_vector, text_x)
        }
        
    }

    return(character_vector)
    
}


height_and_average()

## Factorial with for

Factorial of n (n!) is the product of all integers from 1 to n. Hence: 5! = 120

Write a function facto_for that takes an argument n, calculates the factorial of n using a for loop as such:

```R
facto_for(5)
facto_for(10)

[1] 120
[1] 3628800
```

**Hint: Initiate a separate variable inside the function BEFORE the loop, setting it to 1. Update this variable by multiplying with the value that the iterator variable of the loop (such as "i") gets** 

**Solution 2:**

In [None]:
facto_for <- function(n)
{
    value <- 1
    for (i in seq_len(n)) # seq_len(0) is empty, so facto_for(0) returns 1
    {
        value <- value * i
    }
    
    return(value)
}

facto_for(5)
facto_for(10)
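
The loop can be cross-checked against the built-in factorial() and against prod(). The sketch below uses a separate, hypothetical function name (facto_demo) and seq_len(n) for the loop range, so that n = 0 gives 1:

```R
facto_demo <- function(n)
{
    value <- 1
    for (i in seq_len(n)) # seq_len(0) is empty, so the loop is skipped for n = 0
    {
        value <- value * i
    }
    return(value)
}

facto_demo(5) == factorial(5)     # built-in factorial()
facto_demo(5) == prod(seq_len(5)) # product of 1..n
facto_demo(0)                     # 0! is 1

1:0 # counts down (1 0), which is why a 1:n range misbehaves for n = 0
```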

## Divisors

Write a function divisors that takes an argument n, and collects and returns the divisors of the number

**Hint:**
- Initiate an empty vector (vec <- c() or vec <- integer(0) or vec <- NULL)
- Iterate through all numbers from 1 to n with a for loop
- Check the divisibility of n with the iterator using modulo and append the number if divisible
- Return the vector

as such:

```R
divisors(60)
divisors(23)
divisors(100)

[1]  1  2  3  4  5  6 10 12 15 20 30 60
[1]  1 23
[1]   1   2   4   5  10  20  25  50 100
```

**Solution 4:**

In [None]:
divisors <- function(n)
{
    vec <- integer(0)
    
    for (i in 1:n)
    {
        if (n %% i == 0)
        {
            vec <- c(vec, i)            
        }
    }
    
    return(vec)
}

divisors(60)
divisors(23)
divisors(100)
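
Since the modulo operator is vectorized, the same idea can be sketched without a loop by testing all candidates at once. The name `divisors_vec` is hypothetical and the sketch assumes n is a positive whole number:

```R
divisors_vec <- function(n)
{
    candidates <- seq_len(n)          # 1, 2, ..., n
    candidates[n %% candidates == 0]  # keep those that divide n exactly
}

divisors_vec(60)
divisors_vec(23)
divisors_vec(100)
```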

# while()

## "while loop": when you don't know the number of iterations

Let's say we are rolling two dice. We will continue until we get a "duses" (a double six, "6 + 6"). The return value will be the iteration count at that moment.

**NOTE THAT ANY VARIABLES IN THE WHILE CONDITION MUST ALREADY HAVE BEEN CREATED!**

**IN FOR LOOP, THE ITERATOR VARIABLE IS CREATED AT THAT MOMENT**

In [None]:
duses_count <- function()
{
    dice_vals <- 1:6 # possible outcomes
    sum_val <- -Inf # initialize the sum; any value != 12 is ok
    iter <- 0 # initialize the iteration counter to return
    
    # as long as the sum is not 12, continue; note that sum_val is already initialized
    while (sum_val != 12)
    {
        iter <- iter + 1 # increment the iteration counter
        new_roll <- sample(dice_vals, 2, replace = TRUE) # roll the two dice
        sum_val <- sum(new_roll) # get the sum
    }
    
    return_text <- sprintf("At iteration %s, we got duses", iter)
    return(return_text)
}

duses_count()
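
Each roll of two fair dice shows a double six with probability 1/36, so the number of rolls needed follows a geometric distribution with mean 36. A rough simulation can illustrate this. Here `rolls_until_double_six` is a hypothetical variant of the function above that returns the count as a number instead of a text, and the seed and the number of repetitions are arbitrary choices:

```R
rolls_until_double_six <- function()
{
    sum_val <- -Inf # must exist before the while condition is evaluated
    iter <- 0

    while (sum_val != 12)
    {
        iter <- iter + 1
        sum_val <- sum(sample(1:6, 2, replace = TRUE))
    }

    return(iter)
}

set.seed(2024) # arbitrary seed, only for reproducibility
roll_counts <- replicate(2000, rolls_until_double_six())
mean(roll_counts) # should be in the neighbourhood of 36
```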

## BINARY

Write a function named **binary** that takes an argument n, an integer value, and returns the binary representation (base 2) as a vector of 1 and 0 values

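- The function should initialize a vector with a single value of 1, the leading (most significant) bit, assuming n is at least 1
- A while loop should check whether n is above 1
- Inside the while loop, if n is even, insert 0 right after the leading 1 and divide n by 2
- Otherwise, insert 1 right after the leading 1, decrement n and divide by 2
- Each new bit is more significant than the ones found before it, which is why it goes in front of them; simply appending would give 1 0 1 instead of 1 1 0 for n = 6

as such:


```R
binary(15)
binary(10)
binary(16)

[1] 1 1 1 1
[1] 1 0 1 0
[1] 1 0 0 0 0
```

**Note: In decimal (base 10), each digit position corresponds to a power of 10, starting with 10<sup>0</sup> at the least significant (rightmost) digit. In binary, each digit position corresponds to a power of 2, starting with 2<sup>0</sup> at the least significant (rightmost) digit.**

**Solution 5:**

In [None]:
binary <- function(n)
{
    vec <- 1 # leading (most significant) bit, valid for n >= 1
    
    while(n > 1)
    {
        if (n %% 2 == 0)
        {
            # n is even: the next bit is 0, placed right after the leading 1
            vec <- c(vec[1], 0, vec[-1])
            n <- n / 2        
        }
        else
        {
            # n is odd: the next bit is 1, placed right after the leading 1
            vec <- c(vec[1], 1, vec[-1])
            n <- (n - 1) / 2
        }        
    }
    
    return(vec)
}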

binary(15)
binary(10)
binary(16)
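
One way to check a binary vector is to convert it back to decimal, following the note above: multiply each bit by its power of 2 and add up. A minimal sketch, where `bits_to_decimal` is a hypothetical helper name and the vector is assumed to list the most significant bit first:

```R
bits_to_decimal <- function(bits)
{
    powers <- rev(seq_along(bits)) - 1 # exponents: length - 1 down to 0
    sum(bits * 2^powers)
}

bits_to_decimal(c(1, 1, 1, 1))
bits_to_decimal(c(1, 0, 1, 0))
bits_to_decimal(c(1, 0, 0, 0, 0))
bits_to_decimal(c(1, 1, 0))
```

These four calls should give 15, 10, 16 and 6.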

## Reversing a number

Using simple arithmetic we can reverse the order of the digits of a number.

So 123 becomes 321.

Let's write a function for this:

First of all, remember the integer division (`%/%`) and modulo (`%%`) operators:

In [None]:
123 %/% 10

In [None]:
123 %% 10

Now the function:

In [None]:
number_reverse <- function(num)
{
    rev_num <- 0 # create an object to collect reversed digits
    
    while(num > 0) # as long as we have more digits
    {    
        last_digit <- num %% 10 # extract the last digit
        rev_num <- rev_num * 10 + last_digit # update the reversed number
        num <- num %/% 10 # delete the rightmost digit from the original number
    }
    
    return(rev_num)

}

Now let's check:

In [None]:
number_reverse(123)
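
For comparison, the digits could also be reversed by treating the number as text. This is only a sketch of an alternative, not the arithmetic method of the exercise; `number_reverse_chr` is a hypothetical name and the input is assumed to be a non-negative whole number:

```R
number_reverse_chr <- function(num)
{
    # split the number into single characters, e.g. "1" "2" "3"
    digits <- strsplit(format(num, scientific = FALSE), split = "")[[1]]
    # reverse, glue together and convert back to a number
    as.numeric(paste(rev(digits), collapse = ""))
}

number_reverse_chr(123)
number_reverse_chr(1200) # leading zeros vanish after conversion back to a number
```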

## TRAVELLING SALESMAN

[Travelling Salesman](#travelsales)

# par(mfrow, mfcol)

## Figure arrays

Sometimes we want several plots in one figure. We can achieve this with the `par()` function.

In [None]:
options(repr.plot.width=6, repr.plot.height=4)

In [None]:
normal1 <- rnorm(1000)
par(mfrow=c(1,2))
plot(normal1)
hist(normal1)

Here `mfrow=c(1,2)` specifies that the plots should be arranged in one row and two columns, and that the figures should be placed row by row.

Alternatively, the `mfcol` argument would force placement column by column. In this particular example, it gives an identical result.
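
The difference between the two arguments only shows up with more than one row and more than one column. A small sketch with a 2 x 2 layout, where `x_demo` and `old_par` are names made up for this illustration:

```R
x_demo <- rnorm(1000)

old_par <- par(mfcol = c(2, 2)) # fill column by column; keep the old setting
plot(x_demo, main = "1st plot")       # top left
hist(x_demo, main = "2nd plot")       # bottom left (top right with mfrow)
plot(sort(x_demo), main = "3rd plot") # top right (bottom left with mfrow)
hist(x_demo, 30, main = "4th plot")   # bottom right
par(old_par) # restore the previous layout
```

Replacing `mfcol` with `mfrow` in the `par()` call swaps the positions of the second and third plots.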

Now generate normally distributed random numbers with twice the standard deviation (that is, four times the variance) and compare the plots.

In [None]:
options(repr.plot.width=8,repr.plot.height=8)

In [None]:
normal2 <- rnorm(1000, sd = 2)
par(mfrow=c(2,2))
plot(normal1)
hist(normal1)
plot(normal2)
hist(normal2)
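
The `sd` argument of `rnorm()` is a standard deviation, so `sd = 2` doubles the spread and quadruples the variance. A quick numerical check, as a sketch with an arbitrary seed and made-up names `sample_sd1` and `sample_sd2`:

```R
set.seed(1) # arbitrary seed, only for reproducibility
sample_sd1 <- rnorm(1000)         # sd = 1 is the default
sample_sd2 <- rnorm(1000, sd = 2)

sd(sample_sd2) / sd(sample_sd1)   # close to 2
var(sample_sd2) / var(sample_sd1) # close to 4
```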

Match the axis scales for better comparison.

In [None]:
par(mfrow=c(2,2))
plot(normal1, ylim = c(-6,6),pch=4, col="blue")
hist(normal1, xlim = c(-6,6), col="red")
plot(normal2, ylim = c(-6,6),pch=4, col="blue")
hist(normal2, xlim = c(-6,6), col="red")

# hist()

## Histograms

Create normally distributed numbers:

In [None]:
vec_norm <- rnorm(100, 10, 2)
vec_norm

Create a histogram:

In [None]:
hist(vec_norm)

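For a sample like this one, the default breaks typically fall on whole numbers, for example from 6 to 14; the exact range depends on the random draw.

We can ask for fewer or more bins by passing a bin count. R treats this number only as a suggestion, because the break points are set to "pretty" values: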

In [None]:
hist(vec_norm, 5)

In [None]:
hist(vec_norm, 20)

Or explicitly specify the break points of the bins (they must cover the whole range of the data):

In [None]:
hist(vec_norm, seq(2, 20, by = 0.5))
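
To see which break points and counts a histogram actually uses, `hist()` can be called with `plot = FALSE`, which returns the computed bins without drawing. A minimal sketch; the seed and the names `x_hist`, `h_default`, `my_breaks` and `h_manual` are assumptions made for this illustration:

```R
set.seed(1) # arbitrary seed, only for reproducibility
x_hist <- rnorm(100, 10, 2)

h_default <- hist(x_hist, plot = FALSE) # compute the bins without drawing
h_default$breaks # the break points R chose
h_default$counts # number of observations in each bin

# explicit break points derived from the data, so they always cover its range
my_breaks <- seq(floor(min(x_hist)), ceiling(max(x_hist)), by = 0.5)
h_manual <- hist(x_hist, breaks = my_breaks, plot = FALSE)
length(h_manual$counts) == length(my_breaks) - 1 # one count per bin
```

Deriving the break points from `min()` and `max()` avoids the error that `hist()` raises when fixed breaks do not span all values.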

# boxplot()

## Box plots

A _box-and-whisker_ plot provides a graphical summary of the distribution of data points.

Let's generate some random numbers and create box plots with them.

In [None]:
randnums <- rnorm(1000)

We can get some summary statistics about the data vector using the `summary()` function.

In [None]:
summary(randnums)

In [None]:
options(repr.plot.width=6,repr.plot.height=3)

The boxplot is a visual summary of the data:

In [None]:
options(repr.plot.width=3,repr.plot.height=5)
boxplot(randnums)
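
The numbers behind the box plot can be inspected with `boxplot.stats()`. By default the whiskers extend to the most extreme data points that lie within 1.5 times the box length from the box, and anything beyond is drawn as an individual point. A sketch with an arbitrary seed and made-up names `box_data` and `box_info`:

```R
set.seed(1) # arbitrary seed, only for reproducibility
box_data <- rnorm(1000)

box_info <- boxplot.stats(box_data)
box_info$stats # lower whisker, lower hinge, median, upper hinge, upper whisker
box_info$out   # points drawn individually beyond the whiskers

fivenum(box_data) # min, lower hinge, median, upper hinge, max for comparison
```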