# Exercises on Basic Routines in R
* Author: Johannes Maucher
* Last Update: 12.09.2017, a few modifications by OK in 2019
* Corresponding lecture notebook: [02DataTypes](../01Basics/02DataTypes.ipynb)

## Solve the tasks ...

Your solution should contain 
* the implemented code in code cells, 
* the output of this code,
* answers to questions in markdown cells,
* and optionally your remarks, discussion, and comments on the solution in markdown cells.

Send me the resulting Jupyter notebook.

## Tasks
1. At time $t=0$ a vehicle has speed $v_0$ (in $m/s$). The vehicle's acceleration is $a$ (in $m/s^2$). Then the distance $s$ (in meters) traveled after an arbitrary time $t$ (in seconds) can be calculated as 
$$s=\frac{1}{2}a t^2 +v_0 t.$$ 
In a lab experiment, the traveled distance is measured to be $s_1=16m$ after time $t_1=2s$ and $s_2=44m$ after time $t_2=4s$. Calculate the vehicle's acceleration $a$ and initial speed $v_0$ by solving the corresponding system of linear equations in R.


2. Assign the constant `letters` to the variable `x` and `letters` in reverse order to the variable `y`. What is the result of `x > y`? Repeat this experiment but now `letters` and the reverse ordering of `letters` shall be represented as *factors*. What is now the result of `x > y`?


3. A company wants to store the following data of its employees:
    1. ID
    2. Name
    3. Age
    4. Salary
    
   For each employee this data shall be stored in an R list. Create such lists for 4 arbitrary sample persons and assign all of these lists to another list `employeelist`. 
   1. Access and display all data of a single employee.  
   2. Access and display the salary of a single employee.
   3. Define a list for a new person and insert this new list at the third position of the `employeelist`.
   4. Remove the list of an arbitrary person from `employeelist`.


4. Read the [energy data file](../Lecture/data/EnergyMixGeoClust.csv) into a dataframe `energyData` like in the lecture notebook. 
    1. Determine the number of observations (rows) and features (columns) of this dataframe.
    2. Create a dataframe `energyDataRed`, which contains all data of `energyData`, except the 4 countries with the highest coal-consumption.
    3. Calculate the mean and the median of the coal-consumption for both dataframes `energyData` and `energyDataRed`.
    4. What do you conclude from this experiment regarding the quality of the statistics *mean* and *median*?
    
To order rows you can use the function `arrange()` - see [arrange()](https://dplyr.tidyverse.org/reference/arrange.html) and [examples](https://r4ds.had.co.nz/transform.html#arrange-rows-with-arrange) from **tidyverse** (more precisely, from the dplyr package).

In [100]:
library(tidyverse)

Task 1: Calculate the vehicle's acceleration $a$ and initial speed $v_0$ by solving the corresponding system of linear equations in R.

In [101]:
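# Coefficient matrix: each row is (t^2 / 2, t) for one measurement,
# taken from s = (1/2) * a * t^2 + v0 * t with t1 = 2 and t2 = 4
(C <- matrix(c(2, 2,
               8, 4), 
             nrow=2, ncol=2, byrow = TRUE))

# Measured distances s1 and s2 (right-hand side)
s <- c(16, 44)
as.data.frame(s)  #different display

# Solve C %*% x = s for x = (a, v0)
x <- solve(C, s)
as.data.frame(x)  #different display



2,2
8,4


s
16
44


x
3
5


The vehicle's acceleration $a$ is 3 m/s^2 and the initial speed $v_0$ is 5 m/s.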
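
As a cross-check, the coefficient matrix can be derived directly from $s=\frac{1}{2}a t^2 + v_0 t$: each measurement $(t_i, s_i)$ contributes one row $(\frac{1}{2}t_i^2,\ t_i)$ for the unknowns $(a, v_0)$. The following is only a minimal sketch; the names `t_meas`, `s_meas`, `A`, `sol` and `s_check` are chosen here for illustration and are not part of the original solution.

```r
# Measured times (s) and distances (m) from the task statement
t_meas <- c(2, 4)
s_meas <- c(16, 44)

# One row per measurement: s = (1/2) * a * t^2 + v0 * t
A <- cbind(0.5 * t_meas^2, t_meas)

# Solve A %*% c(a, v0) = s_meas
sol <- unname(solve(A, s_meas))
a_hat  <- sol[1]  # acceleration in m/s^2
v0_hat <- sol[2]  # initial speed in m/s
print(c(a = a_hat, v0 = v0_hat))

# Plug the solution back into the formula to reproduce the measurements
s_check <- 0.5 * a_hat * t_meas^2 + v0_hat * t_meas
print(all.equal(s_check, s_meas))
```

Substituting the solution back into the formula is a cheap way to detect a wrongly assembled coefficient matrix.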

Task 2: Assign the constant `letters` to the variable `x` and `letters` in reverse order to the variable `y`. What is the result of `x > y`? Repeat this experiment, but now `letters` and the reverse ordering of `letters` shall be represented as *factors*. What is the result of `x > y` now?

In [102]:
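x <- letters       # built-in constant: "a", "b", ..., "z"
y <- rev(letters)  # the same letters in reverse order

x>y

xFact <- factor(x) #convert vector into factor
yFact <- factor(y) #convert vector into factor

xFact>yFact

"'>' not meaningful for factors"

For character vectors, `x > y` is evaluated element by element in alphabetical order, so the result is `FALSE` for the first 13 positions and `TRUE` for the last 13. For factors, '>' is not meaningful because they represent categorical values; R issues the warning above and returns `NA` for every element.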
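
The difference between character vectors, unordered factors and ordered factors can be sketched as follows. This is an illustrative example, not part of the original solution; the names `cmp_chr`, `cmp_fct`, `x_ord`, `y_ord` and `cmp_ord` are assumptions made here, and the ordered-factor variant goes slightly beyond what the task asks for.

```r
x <- letters       # "a" ... "z"
y <- rev(letters)  # "z" ... "a"

# Character vectors: element-wise comparison in collation order
cmp_chr <- x > y
print(cmp_chr)

# Unordered factors: '>' only yields a warning and NA values
cmp_fct <- suppressWarnings(factor(x) > factor(y))
print(all(is.na(cmp_fct)))

# Ordered factors define a ranking of their levels, so '>' is meaningful again
x_ord <- factor(x, levels = letters, ordered = TRUE)
y_ord <- factor(y, levels = letters, ordered = TRUE)
cmp_ord <- x_ord > y_ord
print(sum(cmp_ord))
```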

Task 3: Create such lists for 4 arbitrary sample persons and assign all of these lists to another list `employeelist`.

In [104]:
person1 <- list(ID=0, name='Aaron', age=22, salary=2100)
person2 <- list(ID=1, name='Barek', age=23, salary=2200)
person3 <- list(ID=2, name='Carmen', age=24, salary=2300)
person4 <- list(ID=3, name='Derek', age=25, salary=2400)

employeelist <- list(person1, person2, person3, person4)
employeelist

Task 3A: Access and display all data of a single employee.

In [105]:
employeelist[1]

Task 3B: Access and display the salary of a single employee.

In [106]:
employeelist[[1]]$salary
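
Single brackets and double brackets behave differently on lists: `[` returns a sub-list, while `[[` returns the element itself. A minimal sketch, using a hypothetical two-person list `staff` in place of `employeelist`:

```r
# Two hypothetical employees, mirroring the structure used above
staff <- list(
  list(ID = 0, name = "Aaron", age = 22, salary = 2100),
  list(ID = 1, name = "Barek", age = 23, salary = 2200)
)

sub_list <- staff[1]    # list of length 1 that wraps the employee
employee <- staff[[1]]  # the employee's own list with 4 fields

print(length(sub_list))
print(length(employee))
print(employee$salary)

# Collect one field across all employees
salaries <- sapply(staff, function(p) p$salary)
print(salaries)
```

This is why `$salary` has to be applied to `staff[[1]]` rather than to `staff[1]`.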

Task 3C: Define a list for a new person and insert this new list at the third position of the employeelist

In [107]:
person5 <- list(ID=4, name='Enar', age=26, salary=2500)
employeelist <- append(employeelist, list(person5), after=2)
employeelist


Task 3D: Remove the list of an arbitrary person from employeelist.


In [108]:
employeelist[3] <- NULL
employeelist
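
The positional behaviour of `append()` and of assigning `NULL` can be checked on a small stand-in list. In this sketch the hypothetical list `staff` stores only names, so that the positions are easy to read:

```r
# Hypothetical stand-in for employeelist: only the names are kept
staff <- list("Aaron", "Barek", "Carmen", "Derek")

# after = 2 places the new element at position 3
staff <- append(staff, list("Enar"), after = 2)
after_insert <- unlist(staff)
print(after_insert)

# Assigning NULL to a position deletes it and shifts the rest forward
staff[3] <- NULL
after_removal <- unlist(staff)
print(after_removal)
```

Note that removing position 3 right after inserting at position 3 deletes exactly the element that was just added.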

Task 4: Read the energy data file into a dataframe `energyData` like in the lecture notebook.

In [135]:
energyData <- read.csv(file="../data/EnergyMixGeoClust.csv", header=TRUE, 
                       sep=",",row.names=1)
glimpse(energyData)

Observations: 65
Variables: 11
$ Country   <fct> US, Canada, Mexico, Argentina, Brazil, Chile, Colombia, E...
$ Oil       <dbl> 842.9, 97.0, 85.6, 22.3, 104.3, 15.4, 8.8, 9.9, 8.5, 27.4...
$ Gas       <dbl> 588.7, 85.2, 62.7, 38.8, 18.3, 3.0, 7.8, 0.4, 3.1, 26.8, ...
$ Coal      <dbl> 498.0, 26.5, 6.8, 1.1, 11.7, 4.1, 3.1, 0.0, 0.5, 0.0, 2.3...
$ Nuclear   <dbl> 190.2, 20.3, 2.2, 1.8, 2.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...
$ Hydro     <dbl> 62.2, 90.2, 6.0, 9.2, 88.5, 5.6, 9.3, 2.1, 4.5, 19.5, 8.3...
$ Total2009 <dbl> 2182.0, 319.2, 163.2, 73.3, 225.7, 28.1, 29.0, 12.4, 16.6...
$ CO2Emm    <dbl> 5941.9, 602.7, 436.8, 164.2, 409.4, 70.3, 57.9, 31.3, 35....
$ Lat       <dbl> 37.090240, 56.130366, 23.634501, -38.416097, -14.235004, ...
$ Long      <dbl> -95.712891, -106.346771, -102.552784, -63.616672, -51.925...
$ Cluster   <int> 6, 5, 6, 4, 5, 5, 5, 5, 5, 5, 5, 4, 4, 6, 2, 2, 6, 6, 3, ...


Task 4A: Determine the number of observations (rows) and features (columns) of this dataframe.

In [136]:
print(paste0("number of observations (rows) : ", nrow(energyData)))
print(paste0("number of features (columns) : ", ncol(energyData)))

[1] "number of observations (rows) : 65"
[1] "number of features (columns) : 11"
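
As an alternative to calling `nrow()` and `ncol()` separately, `dim()` returns both numbers at once. A minimal sketch on a hypothetical toy dataframe `toy`, whose three rows are copied from the coal-sorted table shown in this notebook:

```r
# Toy dataframe with three rows copied from the coal-sorted table
toy <- data.frame(
  Country = c("China", "US", "India"),
  Coal    = c(1537.4, 498.0, 245.8)
)

shape <- dim(toy)  # integer vector: c(rows, columns)
print(shape)
```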


Task 4B: Create a dataframe energyDataRed, which contains all data of energyData, except the 4 countries with the highest coal-consumption.


In [137]:
#sort by Coal in descending order
energyDataRed <- energyData %>% arrange(desc(Coal))
#drop the first 4 rows (the 4 largest coal consumers)
energyDataRed <- energyDataRed[-(1:4), ]
print(paste0("number of observations energyDataRed(rows) : ", nrow(energyDataRed)))
print(paste0("number of features energyDataRed(columns) : ", ncol(energyDataRed)))
glimpse(energyDataRed)

[1] "number of observations energyDataRed(rows) : 61"
[1] "number of features energyDataRed(columns) : 11"
Observations: 61
Variables: 11
$ Country   <fct> South_Africa, Russian_Federation, Germany, South_Korea, P...
$ Oil       <dbl> 24.3, 124.9, 113.9, 104.3, 25.5, 42.7, 46.6, 14.1, 12.0, ...
$ Gas       <dbl> 0.0, 350.7, 70.2, 30.4, 12.3, 23.1, 10.2, 42.3, 17.7, 33....
$ Coal      <dbl> 99.4, 82.9, 71.0, 68.6, 53.9, 50.8, 38.7, 35.0, 33.0, 30....
$ Nuclear   <dbl> 2.7, 37.0, 30.5, 33.4, 0.0, 0.0, 9.4, 18.6, 0.0, 0.0, 15....
$ Hydro     <dbl> 0.2, 39.8, 4.2, 0.7, 0.7, 2.6, 0.8, 2.7, 1.7, 2.7, 1.2, 8...
$ Total2009 <dbl> 126.8, 635.3, 289.8, 237.5, 92.3, 119.2, 105.7, 112.5, 64...
$ CO2Emm    <dbl> 468.6, 1535.3, 795.6, 663.3, 320.4, 386.6, 320.3, 280.8, ...
$ Lat       <dbl> -30.559482, 61.524010, 51.165691, 35.907757, 51.919438, -...
$ Long      <dbl> 22.937506, 105.318756, 10.451526, 127.766922, 19.145136, ...
$ Cluster   <int> 2, 4, 6, 6, 2, 2, 6, 4, 2, 6, 4, 2, 5, 2, 6, 6, 5, 6, 
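
The same selection can also be written in base R without the tidyverse. The sketch below assumes a hypothetical toy dataframe `toy` with seven rows copied from the coal-sorted table in this notebook; `idx` and `toy_red` are names chosen here for illustration.

```r
# Seven rows copied from the coal-sorted table, deliberately shuffled
toy <- data.frame(
  Country = c("Germany", "China", "Japan", "US",
              "South_Africa", "India", "Russian_Federation"),
  Coal    = c(71.0, 1537.4, 108.8, 498.0, 99.4, 245.8, 82.9)
)

# Row indices sorted by decreasing coal consumption
idx <- order(toy$Coal, decreasing = TRUE)

# Drop the four largest consumers, keep the rest in sorted order
toy_red <- toy[idx[-(1:4)], ]
print(toy_red)
```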

Task 4C: Calculate the mean and the median of the coal-consumption for both dataframes energyData and energyDataRed.

In [138]:
# Get Median of the column by column name
print(paste0("Median of energyDataRed : ", median(energyDataRed$Coal)))
print(paste0("Median of energyData : ", median(energyData$Coal)))

# Get Mean of the column by column name
print(paste0("Mean of energyDataRed : ", mean(energyDataRed$Coal)))
print(paste0("Mean of energyData : ", mean(energyData$Coal)))

[1] "Median of energyDataRed : 4"
[1] "Median of energyData : 4.1"
[1] "Mean of energyDataRed : 13.5098360655738"
[1] "Mean of energyData : 49.4476923076923"
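
To quantify how strongly each statistic reacts to removing the four countries, the relative change can be computed from the printed values. This is a small sketch; the helper `rel_change()` is a hypothetical function defined here, not part of any package.

```r
# Values copied from the printed output above
mean_full   <- 49.4476923076923
mean_red    <- 13.5098360655738
median_full <- 4.1
median_red  <- 4

# Relative change in percent: (new - old) / old
rel_change <- function(old, new) 100 * (new - old) / old

mean_change   <- rel_change(mean_full, mean_red)
median_change <- rel_change(median_full, median_red)
print(round(c(mean = mean_change, median = median_change), 1))
```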


Task 4D: What do you conclude from this experiment regarding the quality of the statistics mean and median?
To order rows you can use the function `arrange()` from **tidyverse** (more precisely, from the dplyr package).

In [140]:
energyData %>% arrange(desc(Coal))

Country,Oil,Gas,Coal,Nuclear,Hydro,Total2009,CO2Emm,Lat,Long,Cluster
China,404.6,79.8,1537.4,15.9,139.3,2177.0,7518.5,35.861660,104.195397,2
US,842.9,588.7,498.0,190.2,62.2,2182.0,5941.9,37.090240,-95.712891,6
India,148.5,46.7,245.8,3.8,24.0,468.9,1539.1,20.593684,78.962880,2
Japan,197.6,78.7,108.8,62.1,16.7,463.9,1222.1,36.204824,138.252924,6
South_Africa,24.3,0.0,99.4,2.7,0.2,126.8,468.6,-30.559482,22.937506,2
Russian_Federation,124.9,350.7,82.9,37.0,39.8,635.3,1535.3,61.524010,105.318756,4
Germany,113.9,70.2,71.0,30.5,4.2,289.8,795.6,51.165691,10.451526,6
South_Korea,104.3,30.4,68.6,33.4,0.7,237.5,663.3,35.907757,127.766922,6
Poland,25.5,12.3,53.9,0.0,0.7,92.3,320.4,51.919438,19.145136,2
Australia,42.7,23.1,50.8,0.0,2.6,119.2,386.6,-25.274398,133.775136,2


Removing the four largest coal consumers reduces the mean by more than 70%, while the median remains almost unchanged. It can therefore be assumed that the four largest coal consumers account for a large share of the total coal consumption. The dataframe sorted by coal consumption confirms this. The mean is thus sensitive to a few extreme values, whereas the median is robust against them.
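
The share of the four largest consumers can be estimated from numbers already shown in this notebook. The sketch below reconstructs the total from the printed mean and the 65 rows, which is an assumption that holds only if the `Coal` column contains no missing values; `top4`, `total` and `share` are names chosen here.

```r
# Coal consumption of the four largest consumers, from the sorted table
top4 <- c(China = 1537.4, US = 498.0, India = 245.8, Japan = 108.8)

# Total consumption reconstructed from the printed mean and the 65 rows
total <- 49.4476923076923 * 65

# Fraction of the total that the four countries account for
share <- sum(top4) / total
print(round(100 * share, 1))
```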