# Solutions of R Exercises


## Exercise 1 Basic Routines
1. Create a vector `r` with integers from 1 to 10
2. Access and display the
    1. first element
    2. the fifth element 
    3. elements at positions 6-9
3. Assume that the elements of vector `r` are radiuses of circles. 
    1. Create a vector `c`, whose elements are the circumferences of circles with radiuses from `r`.
    2. Create a vector `a`, whose elements are the surface areas of circles with radiuses from `r`.
    
   Apply `help()` and `apropos()` in order to find out how to use $\pi$ in R.
  
    

In [1]:
(r<-1:10)

In [2]:
(c<-2*pi*r)

In [11]:
(a<-pi*r**2)

## Exercise 2 Data Types
1. At time $t=0$ a vehicle has speed $v_0$ (in $m/s$). The vehicle's accelaration is $a$ (in $m/s^2$). Then the traveled distance $s$ (in meters) after an arbitrary time $t$ (in seconds) can be calculated as 
$$s=\frac{1}{2}a t^2 +v_0 t.$$ 
In a lab experiment after time $t_1=2s$ the traveled distance is measured to be $s_1=16m$ and after time $t_2=4s$ the traveled distance is measured to be $s_2=44m$. Calculate the vehicle's accelaration $a$ and initial speed $v_0$ by solving the corresponding system of linear equations in R.
2. Assign the constant `letters` to the variable `x` and `letters` in reverse order to the variable `y`. What is the result of `x > y`? Repeat this experiment but now `letters` and the reverse ordering of `letters` shall be represented as *factors*. What is now the result of `x > y`?
3. A company likes to store the following data of it's employees:

    1. ID
    2. Name
    3. Age
    4. Salary
    
   For each employee this data shall be stored in a R - list. Create such lists for 4 arbitrary sample persons and assign all of these lists to another list `employeelist`. 
   
   1. Access and display all data of a single employee.  
   2. Access and display the salary of a single employee.
   3. Define a list for a new person and insert this new list at the third position of the `employeelist`.
   4. Remove the list of an arbitrary person from `employeelist`.

4. Read the [energy data file](./data/EnergyMixGeoClust.csv) into a dataframe `energyData` like in the lecture notebook. 
    1. Determine the number of observations (rows) and features (columns) of this dataframe.
    2. Create a dataframe `energyDataRed`, which contains all data of `energyData`, except the 4 countries with the highest coal-consumption.
    3. Calculate the mean and the median of the coal-consumption for both dataframes `energyData` and `energyDataRed`.
    4. What do you conclude from this experiment regarding the quality of the statistics *mean* and *median*?

Task 1 solution:

In [19]:
(t<-c(2,4))
(s<-c(16,44))

In [20]:
(S<-matrix(s,nrow=2,ncol=1))

0
16
44


In [21]:
(tSer<-c(0.5*t[1]**2,0.5*t[2]**2,t[1],t[2]))

In [22]:
(tMat<-matrix(tSer,nrow=2,ncol=2))

0,1
2,2
8,4


In [23]:
(av<-solve(tMat,S))

0
3
5


The first component of vector av is the accelaration $a=3 m/s^2$, the second component is the initial speed $v_0=5 m/s$

Task 2 solution:

In [3]:
(x<-letters)
print(class(x))
(y<-rev(x))
(x>y)

[1] "character"


In [5]:
(xfac<-factor(x))
(yfac<-factor(y))

In [6]:
xfac>yfac

"'>' not meaningful for factors"

Task 3 solution:

In [42]:
e1<-list(id=1,name="bob",age=23,salary=45600)
e2<-list(id=2,name="mary",age=29,salary=67800)
e3<-list(id=3,name="tim",age=39,salary=98000)
e4<-list(id=4,name="anne",age=25,salary=31400)
employees<-list(e1,e2,e3,e4)
employees

In [43]:
#employees[[3]]
e5<-list(id=5,name="ian",age=45,salary=110000)
employees<-append(employees,list(e5),2)
employees

In [44]:
employees[[5]]<-NULL
employees

Task 4 Solution:

In [48]:
energyData=read.csv(file="./data/EnergyMixGeoClust.csv", header=TRUE, sep=",",row.names=1)
numObs<-dim(energyData)[1]
numFeats<-dim(energyData)[2]
cat("Number of observations: ",numObs)
cat("\nNumber of features: ",numFeats)

Number of observations:  65
Number of features:  11

In [49]:
sortEnergyData<-energyData[order(energyData$Coal),]
sortEnergyData

Unnamed: 0,Country,Oil,Gas,Coal,Nuclear,Hydro,Total2009,CO2Emm,Lat,Long,Cluster
7,Ecuador,9.9,0.4,0.0,0.0,2.1,12.4,31.3,-1.831239,-78.183406,5
9,Venezuela,27.4,26.8,0.0,0.0,19.5,73.6,147.0,6.423750,-66.589730,5
11,Azerbaijan,2.8,6.9,0.0,0.0,0.5,10.2,24.8,40.143105,47.576927,4
12,Belarus,9.3,14.5,0.0,0.0,0.0,23.9,62.9,53.709807,27.953389,4
38,Turkmenistan,5.2,17.8,0.0,0.0,0.0,23.0,57.9,38.969719,59.556278,4
43,Kuwait,19.2,12.1,0.0,0.0,0.0,31.3,87.2,29.311660,47.481766,6
44,Qatar,8.2,19.0,0.0,0.0,0.0,27.2,69.8,25.354826,51.183884,4
45,Saudi_Arabia,121.8,69.7,0.0,0.0,0.0,191.5,537.6,23.885942,45.079162,6
46,United_Arab_Emirates,21.8,53.2,0.0,0.0,0.0,75.0,191.9,23.424076,53.847818,4
61,Singapore,52.1,8.7,0.0,0.0,0.0,60.8,180.2,1.352083,103.819836,6


In [50]:
sortEnergyDataRed<-sortEnergyData[1:(numObs-4),]
sortEnergyDataRed
dim(sortEnergyDataRed)

Unnamed: 0,Country,Oil,Gas,Coal,Nuclear,Hydro,Total2009,CO2Emm,Lat,Long,Cluster
7,Ecuador,9.9,0.4,0.0,0.0,2.1,12.4,31.3,-1.831239,-78.183406,5
9,Venezuela,27.4,26.8,0.0,0.0,19.5,73.6,147.0,6.423750,-66.589730,5
11,Azerbaijan,2.8,6.9,0.0,0.0,0.5,10.2,24.8,40.143105,47.576927,4
12,Belarus,9.3,14.5,0.0,0.0,0.0,23.9,62.9,53.709807,27.953389,4
38,Turkmenistan,5.2,17.8,0.0,0.0,0.0,23.0,57.9,38.969719,59.556278,4
43,Kuwait,19.2,12.1,0.0,0.0,0.0,31.3,87.2,29.311660,47.481766,6
44,Qatar,8.2,19.0,0.0,0.0,0.0,27.2,69.8,25.354826,51.183884,4
45,Saudi_Arabia,121.8,69.7,0.0,0.0,0.0,191.5,537.6,23.885942,45.079162,6
46,United_Arab_Emirates,21.8,53.2,0.0,0.0,0.0,75.0,191.9,23.424076,53.847818,4
61,Singapore,52.1,8.7,0.0,0.0,0.0,60.8,180.2,1.352083,103.819836,6


In [52]:
cat("Mean value of coal consumption all: ",mean(sortEnergyData$Coal))
cat("\nMean value of coal consumption reduced: ",mean(sortEnergyDataRed$Coal))

Mean value of coal consumption all:  49.44769
Mean value of coal consumption reduced:  13.50984

In [53]:
cat("Median value of coal consumption all: ",median(sortEnergyData$Coal))
cat("\nMedian value of coal consumption reduced: ",median(sortEnergyDataRed$Coal))

Median value of coal consumption all:  4.1
Median value of coal consumption reduced:  4

As can be seen in this experiment, the median is a much more robust statistic than the arithmetic mean value.

### Next Task

In [8]:
shop1Sales<-c(235000,278234,567890,456890,345123,398000)
shop2Sales<-c(335300,468000,588810,745895,666123,503980)

In [16]:
salesList<-list(shop1=shop1Sales,shop2=shop2Sales)
salesList

In [17]:
salesDF<-stack(salesList)
salesDF

values,ind
235000,shop1
278234,shop1
567890,shop1
456890,shop1
345123,shop1
398000,shop1
335300,shop2
468000,shop2
588810,shop2
745895,shop2


In [37]:
table(salesDF)

        ind
values   shop1 shop2
  235000     1     0
  278234     1     0
  335300     0     1
  345123     1     0
  398000     1     0
  456890     1     0
  468000     0     1
  503980     0     1
  567890     1     0
  588810     0     1
  666123     0     1
  745895     0     1

In [34]:
class(sales)
class(sales['values'])

In [35]:
aov(values ~ ind, data = sales)

"argument is not numeric or logical: returning NA"

[1] NA