# Course Overview 

This course is an introduction to data analysis using the programming language **R** and the editor RStudio.

**Objective of the course** 

Review basic methods and models from econometrics and implement them in **R**. 

**Goal: Being able to evaluate different methods based on their finite sample properties**

## Data

We will mostly use simulated data. Simulated data

- allow us to control the data generating process (DGP): normality, homoskedasticity, linearity, etc.
- let us manipulate properties of the DGP to evaluate different estimators.
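
A minimal sketch of what controlling the DGP can look like. The linear model, the coefficient values, the sample size and all variable names are illustrative assumptions, not part of the course material; lm() is R's built-in function for linear regression.

```r
set.seed(1)                      # fix the random numbers so the draw can be replicated
n <- 200                         # sample size (assumed value)
x <- rnorm(n)                    # regressor
e_hom <- rnorm(n, sd = 1)        # homoskedastic errors: constant standard deviation
e_het <- rnorm(n, sd = abs(x))   # heteroskedastic errors: standard deviation depends on x
y_hom <- 1 + 2 * x + e_hom       # true intercept 1 and true slope 2 (assumed values)
y_het <- 1 + 2 * x + e_het
coef(lm(y_hom ~ x))              # OLS estimates under homoskedasticity
coef(lm(y_het ~ x))              # OLS estimates under heteroskedasticity
```

Because the true coefficients are known, the estimates can be compared directly with the values used in the simulation.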

## Tools

To follow along, you need to download the newest version of **R** and the editor **RStudio**:

- https://cran.r-project.org
- https://rstudio.com/products/rstudio/

**An Example: Density estimation**

Suppose that $X \sim \mathcal{N}(0,1)$. Compare the estimated density of a random sample drawn from this distribution with the theoretical density of the normal distribution.

In [None]:
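N=10000# Number of observations in the draw
x<-rnorm(N, mean=0, sd=1)# random sample from N(0,1)
z=seq(min(x),max(x),length.out=100)# grid of 100 evaluation points
hx<-dnorm(z)# theoretical density on the grid
plot(density(x),main="",ylim=c(0,0.45))# estimated density of the sample
lines(z,hx, col="red")# theoretical density in red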

**Let's get started!**

![image_google2.png](attachment:image_google2.png)

# R Introduction



# Operators

R can be used as a simple calculator

The basic arithmetic operations are 

+, -, *, /

Decimal numbers are written with a point, not a comma: 1.8, not 1,8.

Exponents are declared with ^, i.e. $2^3$ is written as  

2^3

Only round parentheses () can be used to group terms in mathematical calculations.

Frequently used mathematical functions and constants are


- sqrt()   square root
- exp()   exponential function
- log()   natural logarithm
- log(...,10)   logarithm with base 10
- abs()   absolute value
- round(...,x)   rounding to x decimals
- pi   the constant $\pi$
- exp(1)   Euler's number
- sin(), cos(), tan()   trigonometric functions
- min(), max()   return the lowest/highest value of a vector/matrix

- 1.8+2
- 1.8-2
- 1.8*2
- 1.8/2
- 2+2*3
- (2+2)*3
- 2^3
- 8^(1/3)
- 3^2
- 9^0.5
- 2^2*2+2
- 2^(2*((0.2+0.3)*(1+2)))+4
- sqrt(2)
- exp(1)
- exp(2)
- log(7.389056)
- log(exp(3))
- log(100,10)
- abs(1.8-2)
- round(sqrt(2),2)
- round(sqrt(2),4)
- pi
- sin(pi/2)
- sin(pi/2)
- sin(pi)
- tan(pi)
- x=2+3i
- y=4+1i
- x
- y
- x+y
- x*y

In [None]:
1.8+2
 1.8-2
 1.8*2
 1.8/2
 2+2*3
 (2+2)*3
 2^3
 8^(1/3)
 3^2
 9^0.5
 2^2*2+2
 2^(2*((0.2+0.3)*(1+2)))+4
 sqrt(2)
 exp(1)
 exp(2)
 log(7.389056)
 log(exp(3))
 log(100,10)
 abs(1.8-2)
 round(sqrt(2),2)
 round(sqrt(2),4)
 pi
 sin(pi/2)
 sin(pi/2)
 sin(pi)
 tan(pi)
 x=2+3i
 y=4+1i
 x
 y
 x+y
 x*y

**Attention:**

 1.224606e-16 is scientific notation for $1.224606 \times 10^{-16}$. Such a value often stands for an exact zero (e.g. the result of sin(pi)). The tiny deviation is a rounding error: R stores numbers in floating-point format with about 15 to 16 significant digits.
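
A short sketch of how such rounding errors can be handled in comparisons. It only uses base R functions; all.equal() compares two numbers up to a small numerical tolerance.

```r
sin(pi)                          # a tiny number close to zero, not exactly 0
sin(pi) == 0                     # the exact comparison fails
isTRUE(all.equal(sin(pi), 0))    # comparison with a numerical tolerance
round(sin(pi), 10)               # rounding removes the numerical noise
```

For this reason, computed decimal numbers are usually compared with all.equal() rather than with ==.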

# Data structures: vectors, matrices, lists

## Numerical and non-numeric objects

Non-numeric data (character strings) in R are enclosed in quotation marks "" or '' and can also be identified by these in the output.

Enter the following: 

- x=2
- x
- x+1
- x="2"
- x
- x+2
- x=z
- x="z"
- x
- x+2

In [None]:
x=2
x

In [None]:
x+1

In [None]:
x="2"
x


In [None]:
x+1

In [None]:
is.numeric(x)

In [None]:
x=2
x
x+1
x="2"
x
x+2
x=z
x="z"
x
x+2

**A note on good practices: Always comment your code!**


![image_commenting.jpg](attachment:image_commenting.jpg)

In [None]:
hello

In [None]:
# hello, this is my first comment

In [None]:
y=1#here i define y as 1

In [None]:
y

## Vectors

If you want to store several numbers or names (strings) under one object name, a vector is a good choice. Vectors are created using the command c(), where "c" stands for "combine". Run the following commands for vectors:

In [None]:
a<-c(1,2,3)
a

In [None]:
b<-c(2,3,4)
b

In [None]:
a+b

In [None]:
a*b

In [None]:
a%*%b

In [None]:
b%*%a

In [None]:
v1<-c(a,b)
v1
v2<-c(1,2,3)
v2
v3=c(v1,4,5)
v3
v4=c(c(4,2,3),v3,c(3,8),1,2,c(v1,v2,v3))
v4

In [None]:
Name=c("Anton","Berta","Caesar","Dora","Emil")
Gender=c("m","w","m","w","m")
Height=c(182,174,189,165,180)
Weight=c(80,68,92,55,78)
Name 
Gender
Height
Weight

In [None]:
0:20

In [None]:
seq(2,30,by=4)

In [None]:
seq(2,30,le=7)

In [None]:
rep(1,3)

In [None]:
v1
rep(v1,each=3)

In practice, the values of a data vector often follow a basic pattern. For example, values follow each other at fixed intervals or repeat. Here the commands seq() and rep() are very helpful. The operator : generates sequences with step size 1. If the end value cannot be reached exactly with the given step size, the sequence stops at the last value that does not pass the end value. Try the following calculation:

In [None]:
v1=seq(1,30,le=10)
v1


The individual components of a vector x are numbered from 1 to n, where n is the number of components (the vector length). The vector length can be queried with length(x). Square brackets give access to specific parts of a vector.

In [None]:
length(v1)
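
A brief sketch of the most common ways to access parts of a vector with square brackets. The name v is chosen for illustration; it holds the same values as v1 above.

```r
v <- seq(1, 30, length.out = 10)   # same values as v1
v[2]                # second element
v[c(1, 3)]          # first and third element
v[-3]               # all elements except the third
v[v > 20]           # all elements larger than 20
```

A negative index removes elements, and a logical condition inside the brackets keeps only the elements for which it is TRUE.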

When performing calculations with vectors, a vector is commonly combined
either with a scalar or with a vector of the same length. In the first
case, each individual component is combined with the scalar in the same way.
For example, the same value is added to each component. In the second case, computations are performed component-wise.
For example, corresponding components are added.
If two vectors of unequal length are added and the longer length is not a multiple of the shorter one, R issues a warning. R will still perform the operation, though!

In [None]:
a<-c(1,2,3,4,5)
b<-c(1,2,3)
a+b


How does this result come about? R follows the principle of cyclic extension (recycling), i.e. with two objects of different lengths, the shorter object is extended to the length of the longer one. Its components are simply appended again, starting with the first, and this is repeated until the lengths of the two vectors match. In this example, the numbers 1 and 2 are appended to c(1,2,3), so that in effect the two vectors c(1,2,3,4,5) and c(1,2,3,1,2) are added.
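
The cyclic extension can be reproduced by hand. The following sketch uses the base R function rep_len(), which repeats a vector until a given length is reached; the name b_long is chosen for illustration.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(1, 2, 3)
b_long <- rep_len(b, length(a))   # extend b by hand: 1 2 3 1 2
b_long
a + b_long                        # same values as a + b, but without a warning
```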

In [None]:
s1<-c("hello","goodbye")
s1[1]

**R-Exercises**
Sequences of Numbers, Vectors




- Generate x=3 6 8
- Calculate $x/2$, $x^2$, $\sqrt{x}$.
- Display the second element of $x$ 
- Create a vector with the first and the third element of $x$.
- Remove the third element from $x$.
- Generate y= 2 5 1
- Calculate $x-y$, $x*y$ and x'%*%y and x%*%y'.
- Display elements of x, for which y is larger than 1.5.
- Display the element(s) of y for which the element in x is equal to 6.
- Generate a vector of integers from 4 to 10.
- Using seq and rep, (a) generate a vector ranging from 2 to 3 in steps of 0.1, (b) generate a vector that repeats each element of $x$ 4 times, i.e. the first four elements of the new vector are four times the first element of $x$, etc.



## Matrices

In the following, we discuss how matrices are constructed in R and how to work with them.

The basic command for constructing matrices is matrix()

In most cases, the input of three arguments is sufficient:

- a data vector from which a matrix is constructed,
- the number of rows,
- the number of columns.



As a rule, the product of the row and column numbers should equal the vector length. If the matrix has more cells than the vector has components, the cyclic extension principle is used again. If the vector contains more components than the matrix has cells, only the first components are used and the rest are dropped (R may issue a warning). By default, the matrix is filled column by column, i.e. from top to bottom and then from left to right. This can be changed with the option byrow: with byrow=FALSE (the default) the matrix is filled column by column, with byrow=TRUE it is filled row by row, i.e. from left to right and then from top to bottom.

With the commands rbind() and cbind() we can combine vectors into a matrix, either by row ("r" for "row") or by column ("c" for "column").
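
A small sketch of the two filling orders and of rbind() and cbind(). The object names M1 and M2 are chosen for illustration.

```r
M1 <- matrix(1:6, nrow = 2, ncol = 3)                 # filled column by column (default)
M2 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row
M1
M2
rbind(c(1, 2, 3), c(4, 5, 6))   # the vectors become the rows of a 2 x 3 matrix
cbind(c(1, 2, 3), c(4, 5, 6))   # the vectors become the columns of a 3 x 2 matrix
```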

In [None]:
x<-(1:5)
a0<-sample(x,2)
a1<-sample(x,2)
A <-rbind(a0,a1)
A


In [None]:
a<-dim(A)
A[,1]

In [None]:
solve(A)#inverse of A; stops with an error if the randomly drawn A happens to be singular

As with vectors, access to individual components is obtained by using square brackets. Two arguments are given:

- a scalar/vector for the row(s) of interest
- a scalar/vector for the column(s) of interest

A comma without a number value represents all rows or all column values. Negative signs cause all rows / columns except the specified ones to be returned. dim() queries the dimensions of a matrix (rows, columns).

With the command solve(), matrices can be inverted. In the mathematical sense, the multiplication of a matrix with its inverse results in the identity matrix.

As with vectors, matrices can be combined element by element with an arithmetic operation (+, -, *, /). Alternatively, a matrix multiplication can be performed using %*%.  

**An example**
$$
 \left(\begin{matrix}1&3\\ 4&0 \end{matrix} \right)\left(\begin{matrix}1&3\\ 2&2 
\end{matrix} \right)=\left(\begin{matrix}1\cdot 1 +3 \cdot 2&1 \cdot 3+3 \cdot 2\\ 4\cdot 1+0\cdot 2&4\cdot 3+0\cdot 2 \end{matrix} \right)=\left(\begin{matrix}7&9\\ 4&12 \end{matrix} \right)
$$
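
The example can be checked in R. In this sketch, the names A_ex and B_ex are chosen for illustration so that no other object is overwritten.

```r
A_ex <- matrix(c(1, 3, 4, 0), nrow = 2, byrow = TRUE)
B_ex <- matrix(c(1, 3, 2, 2), nrow = 2, byrow = TRUE)
A_ex %*% B_ex          # matrix product, as in the example
A_ex * B_ex            # elementwise product, a different result
A_ex[2, ]              # second row of A_ex
A_ex[, -1]             # all columns except the first
solve(A_ex) %*% A_ex   # identity matrix, up to rounding
```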


**R-Exercises**

- Generate a matrix A


$$ \left(\begin{matrix}
 1   & 2 &   5\\
 4  &  7 &   3
 \end{matrix}\right)$$
  

- Check the dimensions of A.
- Generate a matrix B


$$ \left(\begin{matrix}
    1  &  4  &  2\\
    7  &  5   & 3
 \end{matrix}\right)$$
 
- Check the dimensions of B.
- Generate a square matrix C:
$$ \left(\begin{matrix}
    1   &-1\\
  -1  &  3
 \end{matrix}\right)$$
- Extract the diagonal elements of C.
- Calculate $(C'C)^{-1}$



## Lists 

Lists are the most general object type in R. A list is essentially an object of objects: within this data structure, objects of very different kinds and sizes (numeric and non-numeric) can be organized. These different objects then form the components of a list. Lists are very useful in functions, since a function can only return one object at the end. The basic command for constructing lists is list().

Individual components of a list are accessed via double square brackets [[ ]]. The number of components in a list can be queried with length(). Access to elements of list components is hierarchical, i.e. objects[[1]][2] returns the second element of the first list component.

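In [None]:
v=c("hello","goodbye")#example character vector (assumed, v is not defined in the original)
m=matrix(1:6,nrow=2)#example 2x3 matrix (assumed, m is not defined in the original)
objects=list(Greetings=v,Datamatrix=m)
objects$Greetings
objects$Datamatrix
names(objects)

In [None]:
objects[[1]][2]#second element of the first component
objects[[2]]#the whole second component
length(objects)#number of components

In [None]:
objects[[2]][,2]#second column of the matrix in the second component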

## Data Frames
In R, data sets are usually stored as "data frames". These are similar to matrices but have certain additional properties: unlike a matrix, a data frame can contain both numeric and non-numeric columns. The basic command is data.frame(), with the names of the vectors separated by commas:

In [None]:
Name 
Gender
Height
Weight
#cbind(Name, Gender, Height, Weight)
Persons=data.frame(Name, Gender, Height, Weight)
summary(Persons)
Persons[1,2:3]
Persons$Name



The summary() function adapts its calculations to the type of each column. Components of a data frame can be accessed as with a list or a matrix.

In [None]:
Persons$Gender

If we only want to view/process part of the data frame, we can use subset().
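
A short sketch of subset(), using the same data as above. The cut-off of 175 and the names tall and women are chosen for illustration.

```r
Persons <- data.frame(
  Name   = c("Anton", "Berta", "Caesar", "Dora", "Emil"),
  Gender = c("m", "w", "m", "w", "m"),
  Height = c(182, 174, 189, 165, 180),
  Weight = c(80, 68, 92, 55, 78)
)
tall <- subset(Persons, Height > 175)                              # keep rows by condition
women <- subset(Persons, Gender == "w", select = c(Name, Weight))  # keep rows and columns
tall
women
```

The first argument is the data frame, the second a logical condition on the rows, and select picks the columns to keep.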

## Working with logical operators 

The values TRUE and FALSE are generated by logical operations, i.e. operations which check whether individual data fulfill a particular property. In arithmetic, TRUE counts as 1 and FALSE as 0, so sum() applied to a logical vector counts the number of TRUE results. The comparison operators are == (equal), != (not equal), >, <, >= and <=. Conditions can be combined with & (and) and | (or).

In [None]:
x=c(8,5,5,1,6,9,7,4)


In [None]:
x<6
sum(x<6)
sum(x>3)
x==5
x!=5

In [None]:
2<x&x<8


In [None]:
x<4|x>6
sum(x<4|x>6)


**R-Exercises**

Logical Operators

Generate a vector x=c(1:20). For each of the following statements, create a logical vector indicating


- a. which elements of x are smaller than 15,
- b. which elements of x are smaller than 10 and larger than 5.
- c. Store the number of elements for which each of the two statements is true in a vector k. 




# Ordering of data sets

Data can be ordered by size or alphabetically (ascending or descending) with the commands sort(), rank() and order().


With sort() the values are sorted by size; the default is ascending order. rank() returns the rank of each value within the sorted data. order() does practically the reverse: it returns the positions of the values in sorted order, i.e. its first element is the position of the smallest value of x, etc.


In [None]:
x=c(4,6,2,7,9,1,5,4)
x

In [None]:
x.sort=sort(x)
x
x.sort
sort(x,decreasing=T)

In [None]:

rank(x)

In [None]:
order(x)
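
The relation between the three commands can be seen in a short sketch. The name o is chosen for illustration.

```r
x <- c(4, 6, 2, 7, 9, 1, 5, 4)
o <- order(x)        # positions of the values from smallest to largest
o
x[o]                 # indexing with order() reproduces sort(x)
rank(x)              # tied values (the two 4s) share the average rank 3.5
```

order() is especially useful for sorting one vector, or the rows of a data frame, according to the values of another vector.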

# Statistical functions

This section gives an overview over how statistical functions are constructed in **R**, how to apply them to data and how to write these functions yourself. 

## Sums and frequencies

The simple sum is given by sum(x) and the cumulative sum by cumsum(x).
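
A small sketch of both commands. The vector s is chosen for illustration.

```r
s <- c(1, 2, 3, 4)
sum(s)        # 1 + 2 + 3 + 4
cumsum(s)     # running totals: 1, 1+2, 1+2+3, 1+2+3+4
```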

In [None]:
x=c(1,2,3)

In [None]:
x=c(1,1,5,2,1,5,2,3,10,2)
n=length(x)

## Frequencies
Frequency counts (absolute and relative) can be made using the functions:

table() and hist()

In [None]:
table(x)
x

In [None]:
hist(x)$counts

In [None]:
#Frequencies- one-dimensional


table() can also be used for analyzing multi-dimensional frequencies.

In [None]:
x=c(1,1,1,2,1,3,3,2,1,3)
 y=c(0,0,0,1,0,0,1,1,0,1)
 z=c("a","b","b","a","b","b","a","b","a","a")
data.frame(x,y,z)

In [None]:
table(x,y,z)


prop.table() converts a table into relative frequencies. With the argument margin it returns the conditional distributions within the rows (margin=1) or within the columns (margin=2).
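
A brief sketch with the vectors x and y from above. The name tab is chosen for illustration.

```r
x <- c(1, 1, 1, 2, 1, 3, 3, 2, 1, 3)
y <- c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1)
tab <- table(x, y)
prop.table(tab)               # relative frequencies, all cells sum to 1
prop.table(tab, margin = 1)   # distribution of y for each value of x (rows sum to 1)
prop.table(tab, margin = 2)   # distribution of x for each value of y (columns sum to 1)
```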

## Exercises: Sums and Frequencies and using functions

$$d=(77 \, 93 \, 92\, 68\, 88\, 75\, 100)$$

- Calculate the mean of $d$, i.e. $\bar{d}$. 
- Sort all elements of $d$ in ascending/descending order.
- What are the smallest and the largest element of $d$?

## Statistical Distributions
 
In the context of statistical distributions of discrete or continuous random variables, four types of calculations/procedures are of particular interest, namely
 

- the calculation of values of the probability function or density function,
- the calculation of probabilities for certain events,
- the calculation of quantiles,
- Simulation, i.e. drawing random numbers from a given distribution.

 
In principle, the command syntax in R is uniform, i.e. the commands are similar for all distributions.
 
### Normal distribution
The commands for the above calculations and for simulation from the normal distribution are

dnorm(), pnorm(), qnorm() and rnorm().

 
Here "d" stands for density and evaluates the density function at a specified point, "p" stands for probability and evaluates the distribution function, and "q" stands for quantile and evaluates the quantile function. "r" stands for random and draws random numbers from the normal distribution.
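
A short sketch of the four commands for the standard normal distribution, which is the default when no mean and standard deviation are given. The evaluation points are chosen for illustration.

```r
dnorm(0)                 # density of N(0,1) at 0
pnorm(1.96)              # P(X <= 1.96), about 0.975
qnorm(0.975)             # 0.975 quantile, about 1.96
qnorm(pnorm(1.5))        # qnorm() inverts pnorm() and returns 1.5
rnorm(3)                 # three random draws from N(0,1)
```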

**Example**

 $X \sim \mathcal{N}(1,4)$, i.e. mean 1 and variance 4 (standard deviation 2)

In [None]:
mean(rnorm(n=1000,mean=1,sd=2))
sd(rnorm(n=1000,mean=1,sd=2))
x=seq(-5,7,le=100)
z=dnorm(x,1,2)
plot(x,z,type="l")


Due to the symmetry of the normal distribution, the value of the distribution function at the expected value is always 0.5:

In [None]:
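pnorm(0)#N(0,1) at its expected value 0: gives 0.5
pnorm(0,1,2)#N(1,4) at 0, below its expected value: less than 0.5
pnorm(1,1,2)#N(1,4) at its expected value 1: gives 0.5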


How do we calculate the probability of the interval $[2,3]$ for a $\mathcal{N}(1,4)$-distributed random variable $X$, i.e. the probability $P(2\leq X \leq 3)$? 

In [None]:
pnorm(3,1,2)-pnorm(2,1,2)#mean 1, standard deviation 2

What is the 0.6 quantile of the $\mathcal{N}(1,4)$ distribution?

In [None]:
qnorm(0.6,1,2)#0.6 quantile of N(1,4)

What are the 0.25, 0.5, and 0.75 quantiles of the standard normal distribution?

In [None]:
qnorm(c(0.25,0.5,0.75),0,1)#standard normal: mean 0, standard deviation 1

Suppose we want to draw 1000 random numbers from $\mathcal{N}(1,4)$ and visualize their distribution in a histogram:

In [None]:
x=rnorm(1000,1,2)
?hist
hist(x)

### Binomial distribution
The basic commands are
dbinom(), pbinom(), qbinom() and rbinom().

**Example**
Let $X$ be binomially distributed with $N = 10$ and $\pi = 0.5$. This corresponds to tossing a fair coin 10 times. The random variable $X$ is the number of "heads" in 10 tosses.

What is the probability of "exactly 8 heads", i.e. $P(X=8)$?

In [None]:
dbinom(8,10,0.5)



What is the probability for $P(2<X\leq 6)$?

In [None]:
pbinom(6,10,0.5)-pbinom(2,10,0.5)#P(X<=6)-P(X<=2)

In this case we have to distinguish between $<$ and $\leq$. Why?

What are the 0.25, 0.5 and 0.75 quantiles of the $\mathcal{B}(10,0.5)$ distribution?

In [None]:
qbinom(c(0.25,0.5,0.75),10,0.5)

How can we simulate 20 coin tosses?

In [None]:
rbinom(20,1,0.5)

 This corresponds to 20 repetitions of a Bernoulli experiment, i.e. 20 draws from a $\mathcal{B}(1,0.5)$ distribution. The number of ones is then $\mathcal{B}(20,0.5)$-distributed. The probability function of this distribution can be plotted with

In [None]:
x=0:20
y=dbinom(x,20,0.5)
plot(x,y,type="h")

In R, there are similar commands for many other distributions, e.g. dgamma(), dexp(), dpois(), dunif() etc. 

## Multiplication of a random variable with a constant

 For a random variable $X$ with $\mathbb{E}[X] <\infty$ and $\mathbb{E}[X^2] <\infty$, the mean and the variance exist, and for all $c \in \mathbb{R}$: $\mathbb{E}[c\cdot X]=c \cdot  \mathbb{E}[X]$ and $Var[c\cdot X]=c^2\cdot Var[X]$.
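
The sample mean and the sample variance follow the same rules, which can be checked with a short sketch. The constant is called k here, since c is already the name of the combine function; the sample size and the value k=3 are assumptions for illustration.

```r
x <- rnorm(1000, mean = 1, sd = 2)   # draws from N(1, 4)
k <- 3                               # the constant (c in the formula above)
mean(k * x)                          # mean of the scaled sample
k * mean(x)                          # the same value
var(k * x)                           # variance of the scaled sample
k^2 * var(x)                         # the same value
```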
 
 R's random number generator usually draws new numbers for each sample, so the results differ slightly when a draw is repeated.

 
With the command set.seed() the starting point of the random number generator can be fixed, so that a particular sample can be replicated.

In [None]:
set.seed(10)
rbinom(10,1,0.5)
set.seed(10)
rbinom(10,1,0.5)

 But this is different from the following, where the second draw continues the random number sequence instead of restarting it:

In [None]:
set.seed(10)
rbinom(10,1,0.5)
rbinom(10,1,0.5)

## R-Exercises: Statistical Distributions

### Exercise 1


Consider this multiple choice test:


- 12 Questions
- 5 possible answers, 1 is correct.



If the answers are chosen randomly


- What is the probability of exactly four correct answers? 
- What is the probability of four or fewer correct answers?

### Exercise 2

Assume that the results of an exam ($x$ ) are normally distributed with $\mu=72$ and $\sigma=15.2$

- What is the expected proportion of students with 84 or more points?
- 1000 students take the exam. Simulate the results, calculate $\bar{x}$ and $\hat{\sigma}$ (with sd()).


### Exercise 3

Generate 1000 random numbers on the interval between a=0 and b=1 with density

$$f(x) =\begin{cases}\frac{1}{b-a} \quad \mbox{if} \quad a\leq x \leq b\\
0 \quad \mbox{if} \quad x<a \quad \mbox{or} \quad x>b \end{cases}$$ 

# Simple programming

Very often we want to repeat a particular operation many times: either over different elements of a given data structure (columns of a matrix, individual elements of a vector), or until a satisfactory result is reached (run an algorithm until the value converges to a prespecified level). We also sometimes have to check whether a condition about the data is fulfilled, e.g. whether elements of a vector are equal to 0, and perform different operations depending on the result.

## Using for-loops

A for-loop repeats a particular routine while a loop variable takes, one after another, the values of a given data set (usually a numeric vector).

In [None]:
x=0
for (i in 1:3) {
  print(i)#show the current value of i
  x=x+i#add i to x
}
x

We create a scalar x with the value 0 and then begin a for-loop where the variable i passes through the values 1, 2 and 3. In the first run of the loop, i has the value 1, i.e. x is overwritten and is given the value 1 = 0 + 1. In the second run, i has the value 2, i.e. x is overwritten and has the value 3 = 1 + 2. In the third run, i has the value 3, i.e. x is overwritten and has the value 6 = 3 + 3.
  

In [None]:
z=matrix(NA,3,2)
z
x=c(0,0)
#z=c()
for (i in 1:3)
   {
     x=x+i
     z[i,]=x
     }
x
z

In the next example, we carry out a simulation study on the law of large numbers. It states that "large" deviations of the sample mean from the expected value become increasingly improbable as the sample size grows. We repeatedly draw samples from a Bernoulli distribution, i.e. from a $\mathcal{B}(1,0.5)$ distribution, which simulates tossing a fair coin. The sample mean of the zeros and ones is in this case the relative proportion of ones. This proportion converges in probability to 0.5, the expected value of a $\mathcal{B}(1,0.5)$ distribution, as the sample size grows.
 

In [None]:
rel_share=c() ###we initialize an "empty vector"
 N=500#Number of draws
 for (n in 1:N)#initialize the loop
   {
    Sample=rbinom(n,1,0.5)#draw a vector with sample size n
     rel_share[n]=mean(Sample)#calculate the proportion of the ones
     }
plot(1:N,rel_share,#plot the results of all samples
        main="B(1,0.5)-Distribution: Law of large numbers",
        xlab="n",ylab="Relative proportion",pch=16,
        cex.axis=1.5,cex.lab=1.5,cex.main=1.5)
 abline(h=0.5,lwd=2,col=2)#We add a line for the expected value

## The if-Condition

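Within a for-loop, it may be useful to perform operations only under certain conditions. The basic syntax for this is

if (condition){
     statement  
} else {
     alternative
}

where the else part can be omitted. Note that else must follow the closing brace } on the same line when the statement is entered at the top level; otherwise R reports a syntax error.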

In [None]:
y<-c(1,4,5,2,7,8,2,4)
N <- length(y)
y.sq <- numeric(N)#creates a numeric vector of length N filled with zeros
for(i in 1:N){#start of the for-loop
 y.sq[i] <- y[i]^2#every element of y.sq is replaced by the respective squared value
    #the result is printed only in the last iteration of the loop
    if(i==N){
        print(y.sq)
    }else{print("not finished")}
}

 
In shorter form the ifelse() command is an alternative:

ifelse(test, yes, no)

In [None]:
y<-c(1,4,5,2,7,8,2,4)
N <- length(y)
y.sq <- numeric(N)#creates a numeric vector of length N filled with zeros
for(i in 1:N){#start of the for-loop
 y.sq[i] <- y[i]^2#every element of y.sq is replaced by the respective squared value
    #the result is printed only in the last iteration of the loop
    ifelse(i==N ,print(y.sq),print("not finished"))
}
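
ifelse() is vectorized, i.e. it checks the condition for every element of a vector and returns one result per element, so it can often replace a loop altogether. A minimal sketch; the threshold 3 and the labels are chosen for illustration.

```r
y <- c(1, 4, 5, 2, 7, 8, 2, 4)
size <- ifelse(y > 3, "large", "small")   # one result per element of y
size
```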

## The while-loop
 
 The while-loop executes a block of code ("statement") as long as a certain condition ("test_expression") is met. The basic syntax is

 while (test_expression)
{
    statement
   #increment, e.g. i = i+1
}


Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.

i <- 1
results<-c()
while (i < 6) {
   print(i)
  x=i
  results[i]<-i
   i = i+1
}
x
results

In the above example, i is initialized to 1.

Here, the test_expression is i < 6, which evaluates to TRUE since 1 is less than 6. So the body of the loop is entered, i is printed and then incremented.

Incrementing i is important, as this eventually leads to the exit condition. Failing to do so results in an infinite loop.

In the next iteration, the value of i is 2 and the loop continues.

This will continue until i takes the value 6. The condition 6 < 6 will give FALSE and the while loop finally exits.
 

There is one other loop that can be used in R: repeat. It has no test expression and runs until it is stopped explicitly with break.
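
A minimal sketch of a repeat loop that prints the numbers 1 to 5, analogous to the while example. The loop only ends through the break statement.

```r
i <- 1
repeat {
  print(i)
  i <- i + 1
  if (i >= 6) break    # exit condition, checked at the end of each pass
}
i
```

Unlike while, the body of a repeat loop is always executed at least once, because the condition is only checked inside the body.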
 

## Loops and functions


### Exercise 1 

Using the following variables:

- x=1
- y=40
- i=c(1:10)

For this exercise, write a for() loop that increments x by three and decreases y by two, for each i.

### Exercise 2

Using the following variables:

- a=15:10
- b=20:15

For this exercise, write a while() loop that computes a vector x=225 224 221 216 209 200, such that

- x[1]=a[1]*b[6]
- x[2]=a[2]*b[5]
- x[3]=a[3]*b[4]
- .
- .
- x[6]=a[6]*b[1]

# Construction of functions
 
## Simple functions
Functions allow us to bundle certain computing operations and then only call them when needed.


The basic syntax is 

fun_name<-function(<arguments>)
{
##Do something
}


Call this function: 

fun_name(<argument value(s)>)##Option 1, this calls the function directly.
d=fun_name(<argument value(s)>)###This assigns the value of the calculation inside the function to the object d.



Functions return the last evaluated value. To return multiple values, collect them in a single object, e.g. a list.

Function arguments can be entered with and without a default value.
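
A small sketch of an argument with a default value. The function name raise_to and its arguments are hypothetical and chosen for illustration.

```r
raise_to <- function(x, p = 2) {   # p has the default value 2
  x^p                              # the last evaluated expression is returned
}
raise_to(3)          # uses the default: 3^2
raise_to(3, p = 3)   # overrides the default: 3^3
```

Arguments without a default value, such as x, must always be supplied when the function is called.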

As an example we will look at the variance (implemented in R as var()).
 
 $$s^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2$$
 
For the sake of simplicity, we do not use the correction factor $\frac{1}{n-1}$, but instead $\frac{1}{n}$, i.e. we estimate
 
  $$\tilde{s}^2=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2=\frac{1}{n}
  \sum_{i=1}^nx_i^2-\bar{x}^2$$
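
The second expression follows from expanding the square and using $\frac{1}{n}\sum_{i=1}^nx_i=\bar{x}$:

$$\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2=\frac{1}{n}\sum_{i=1}^nx_i^2-2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^nx_i+\bar{x}^2=\frac{1}{n}\sum_{i=1}^nx_i^2-2\bar{x}^2+\bar{x}^2=\frac{1}{n}\sum_{i=1}^nx_i^2-\bar{x}^2$$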

In [None]:
vartilde=function(x){
     mean(x^2)-mean(x)^2
}

In [None]:
x<-rnorm(10000)
vartilde(x)
var(x)
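
The two results differ only by the factor $\frac{n-1}{n}$, which a short sketch can confirm. The vector u and its length are chosen for illustration; vartilde is defined again so that the sketch runs on its own.

```r
vartilde <- function(x) mean(x^2) - mean(x)^2   # same definition as above
u <- rnorm(50)
n_u <- length(u)
vartilde(u)
var(u) * (n_u - 1) / n_u     # rescaling var() reproduces vartilde()
```

For large samples the factor is close to 1, which is why the two values above are nearly identical.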

In [None]:
vartilde=function(x){
    var=mean(x^2)-mean(x)^2
    a="hello"
    return(list(variance=var,writing=a))
}

In [None]:
vartilde(rnorm(100))

In [None]:
vartilde(x)

## Functions with multiple arguments
 
 We can also write functions with multiple arguments. As an example, we write the simulation of the law of large numbers from the last section as a function.

In [None]:
GGZGraph=function(N,pi)
{
 rel_share=c()
  for (n in 1:N)
  {
    sample=rbinom(n,1,pi)
    rel_share[n]=mean(sample)
  }
  plot(1:N,rel_share,
       main="Law of large numbers",
       xlab="n",ylab="Relative share",pch=16,
       cex.axis=1,cex.lab=1,cex.main=1)
  abline(h=pi,lwd=2,col=2)
}


GGZGraph(N=100,pi=0.4)
GGZGraph(N=500,pi=0.4)
GGZGraph(N=1000,pi=0.4)

## Functions with multiple output objects
 
 If the function is to return an object that can also be assigned (list, vector, matrix), this object must be the last expression evaluated in the function body, or be passed to return(). Suppose we want the previous simulation program to return all calculated relative frequencies, as well as the fraction of all sample means in a specified interval (a, b).

In [None]:
GGZSim=function(N=100,pi=0.5,a,b)
{
  rel_share=c()
  for (n in 1:N)
  {
    sample=rbinom(n,1,pi)
    rel_share[n]=mean(sample)
  }
  IntervallAnt=sum(a<rel_share&rel_share<b)/N
   return(list(rel=rel_share,Interval=IntervallAnt))
}

In [None]:
result=GGZSim(100,pi=0.6,a=0.4,b=0.6)
Interval<-result$Interval

dummy<-function(Interval)
    {
    #returns "yes" if the share passed in Interval exceeds 0.45, otherwise "no"
    ifelse(Interval>0.45,"yes","no")
    
}


In [None]:
dummy(Interval)

  Functions can themselves be passed as arguments to other functions.
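
A minimal sketch of a function that receives another function as an argument. The name apply_stat, its arguments and the example vector are hypothetical and chosen for illustration.

```r
apply_stat <- function(x, f) {   # f is expected to be a function
  f(x)                           # apply f to the data x
}
w <- c(4, 6, 2, 7)
apply_stat(w, mean)                          # pass an existing function
apply_stat(w, max)
apply_stat(w, function(v) max(v) - min(v))   # pass an anonymous function
```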

### Exercise 3 

- Write a function that takes one argument: "name". 
- The function should return the first element of "name" together with a second object containing the sentence "My first function".

### Exercise 4 

- Draw a vector $u$ of length 30 from $\mathcal{N}(0,1)$. 
- Calculate $u^2$ for the first k=10 values in u.
- Calculate $u^2$ for the first k=10 values in u only when $|u|>l$ with $l=0.8$; otherwise the original value is kept.
- Write a function that takes three arguments ($u,k,l$) and returns two results: a vector of length $k$ with the results of the calculation ($u^2$ or $u$, respectively) and a scalar indicating the number of elements of $u$ for which the condition is fulfilled. 
