
# Definition of a model

Before anything else, it is important to define what a "model" actually is. 
In the lecture it was defined as follows: 
<div class="alert alert-block alert-info"> 
"It’s a human-made (abstract) simplification/representation of (observational) reality that is used to understand, define, quantify, visualize, or simulate a part or feature of reality"
</div>


A simple example of a model is a map of the world. The following picture shows the whole Earth as it appears in reality: 


![TheEarth.png](attachment:TheEarth.png)


Although this picture shows the real world, many things cannot be recognized: a large part of the surface is hidden by clouds, for example, and no national borders are visible.

The following model makes the land area and the borders of each country recognizable:


![Worldmodel.png](attachment:Worldmodel.png)

The model is thus just a human-made simplification of the real world that clearly visualises the earth's surface, country borders, and country names.

## Example of determining the distribution of human height by a model

This subchapter describes, in five steps, an experiment that was carried out with the students of the course. In this experiment, a model describing the distribution of human height was tested with real-life data.

>Step 1 - Derivation of the model: When the teacher asked for a model that describes human height, a student answered that human height follows a normal distribution. Therefore, the normal distribution was tested.


>Step 2 - Execution: To verify this, the heights of the students were collected and plotted with the following Python [code](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1099891). 

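In [None]:
import matplotlib.pyplot as plt # Imports the plotting module under the alias plt
heights = [175,170,205,190,190,200,185,190,185,165,170,185,185,180,195,175,190,190,160,180] # List of the heights (in cm) of all students
plt.hist(heights, bins='auto')  # arguments are passed to np.histogram
plt.show() # Displays the histogram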

<div class="alert alert-block alert-info"> 
Used functions and keywords: [list](https://docs.python.org/3/tutorial/datastructures.html), [.hist()](https://plot.ly/matplotlib/histograms/)</div>

>Step 3 - Result: Running the code shows the distribution of the students' heights. The histogram makes clear that height in the course is not normally distributed. 

>Step 4 - Interpretations:
>>Interpretations of students: 
>>>a. Sample too small

>>>b. Not i.i.d., e.g. only 8 women out of 20 students in total

>> Correct answer: Human height is not determined by the normal distribution but by other factors, such as the height of the parents, diet, and childhood diseases, as well as still-unknown variables, which can be treated as random variables. 

>Step 5 - Conclusion: A model of human height, like many other theoretical models, often only fits the data observed in reality. It is not the data-generating process itself, because it does not produce people's heights from the relevant factors. The true generating process usually cannot be reproduced exactly, since not all of its factors and data are known. Many models (human constructs) can fit the data on human height, but only certain factors actually generate it. The normal distribution only fits the data and does not generate it, which is why data drawn from it differs from the data observed here.
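
The difference between fitting and generating can be illustrated with a minimal sketch. It assumes only the class heights listed above; the names `mu`, `sigma`, and `simulated` and the seed are illustrative choices, not part of the lecture code. The normal model is fitted by estimating its mean and standard deviation, and new heights are then drawn from the fitted model.

```python
import numpy as np

# Heights of the 20 students from the experiment above (in cm)
heights = np.array([175, 170, 205, 190, 190, 200, 185, 190, 185, 165,
                    170, 185, 185, 180, 195, 175, 190, 190, 160, 180])

# "Fitting" the normal model: estimate its two parameters from the data
mu = heights.mean()
sigma = heights.std(ddof=1)  # sample standard deviation

# "Generating" from the fitted model: draw 20 new heights
# (the seed is an arbitrary, illustrative choice)
rng = np.random.default_rng(0)
simulated = rng.normal(mu, sigma, size=heights.size)

print(f"fitted mean = {mu:.2f} cm, fitted std = {sigma:.2f} cm")
print("simulated heights:", np.round(simulated, 1))
```

The simulated values come from a random number generator, not from parents' height, diet, or health. The fitted model can describe the class data, but it is not the process that produced it.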



<div class="alert alert-block alert-info"> 
To reproduce the experiment with your own group, the [code](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1099891) is provided again here:
</div>

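In [None]:
import matplotlib.pyplot as plt # Imports the plotting module under the alias plt
heights = [123,456,789] # Placeholder values: replace them with the heights of your own group
plt.hist(heights, bins='auto')  # arguments are passed to np.histogram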

<div class="alert alert-block alert-info"> 
Used functions and keywords: [list](https://docs.python.org/3/tutorial/datastructures.html), [.hist()](https://plot.ly/matplotlib/histograms/)</div>


# Introduction to AR(1) Processes


The autoregressive (AR) model is a statistical model for representing a random process. It may fit the evolution of some variables over time well, but that does **NOT** imply that the real-life data is generated by this model. The model is typically estimated by ordinary least squares (OLS). 
The formula of the model is: 
![Bildschirmfoto%202018-03-27%20um%2017.57.30.png](attachment:Bildschirmfoto%202018-03-27%20um%2017.57.30.png)

Line (4) shows the main formula: $x_t$ is the value of the dependent variable under investigation, $c$ is a constant, $p x_{t-1}$ is a fraction $p$ of the value of the dependent variable in the previous period, and $e_t$ is a random error term. Line (5) states that the error term is normally distributed. Line (6) states the restriction that $p$ has to be smaller than 1. 
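
The OLS estimation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: the function names `simulate_ar1` and `estimate_ar1_ols`, the start value $x_0 = 0$, the unit error variance, and the parameter values $c = 1$ and $p = 0.8$ are all chosen here for demonstration. The sketch simulates a series from the formula and then regresses $x_t$ on a constant and $x_{t-1}$.

```python
import numpy as np

def simulate_ar1(c, p, n, rng):
    """Simulate x_t = c + p * x_{t-1} + e_t with e_t ~ N(0, 1), starting at x_0 = 0."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = c + p * x[t - 1] + rng.standard_normal()
    return x

def estimate_ar1_ols(x):
    """Regress x_t on a constant and x_{t-1}; returns (c_hat, p_hat)."""
    y = x[1:]                                        # left-hand side: x_t
    X = np.column_stack([np.ones(len(y)), x[:-1]])   # regressors: 1 and x_{t-1}
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares solution
    return coef[0], coef[1]

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x = simulate_ar1(c=1.0, p=0.8, n=5000, rng=rng)
c_hat, p_hat = estimate_ar1_ols(x)
print(f"c_hat = {c_hat:.3f}, p_hat = {p_hat:.3f}")
```

With a long simulated series, the estimates should land close to the values used in the simulation. On real data there are no "true" values to compare with, which is exactly the caveat about fitting versus generating.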

**Example for the AR model**:
The GDP of a country can be forecast with the AR model. The chart below shows the actual GDP per capita of a country, and the dotted line shows the forecast GDP per capita. 

![Bildschirmfoto%202018-03-27%20um%2018.25.29.png](attachment:Bildschirmfoto%202018-03-27%20um%2018.25.29.png)

The forecast is fairly accurate. However, as in the human height example, the model only matches the GDP data; it is not the process that actually generates GDP per capita.


## Repetition of last week's homework

In the following, exercises five and six are solved and explained through #comments next to each line. In relation to the topic of this lecture, they show how data can be generated from a model (Exercise 6) and how it can be displayed (Exercise 5).

<div class="alert alert-block alert-info"> **Exercise 5**: 
The files [06_Apple.txt](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1078952), [06_Microsoft.txt](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1078958), and [06_Tesla.txt](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1078950) contain fake data on the stock market value of these three companies for every day of the last year. Load these data into Python and plot the time series in one plot.
</div>

The following [code](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1078954) loads the data of ```06_Apple.txt```, ```06_Microsoft.txt```, and ```06_Tesla.txt``` into Python and plots the time series in one plot:


In [None]:
# Loads data from txt files and plots it
import matplotlib.pyplot as plt # Imports the matplotlib.pyplot module under the alias plt
import numpy as np # Imports the numpy module under the alias np

# List of the firms Apple, Microsoft and Tesla
firms = ["Apple","Microsoft","Tesla"] 


for f in firms: # Iterates over the three firms
    myname = "06_" + f + ".txt" # Builds the file name, e.g. "06_Apple.txt"
    data = np.loadtxt(myname) # Loads the data from the text file into an array
    plt.plot(data, label = f) # Adds the firm's time series to the plot, labelled with the firm name

# Adds the legend to the plot and displays it
plt.legend() 
plt.show() 

<div class="alert alert-block alert-info"> 
Used functions and keywords: [import](https://www.programiz.com/python-programming/keyword-list#from_import), [matplotlib.pyplot](https://matplotlib.org/users/pyplot_tutorial.html), [numpy](https://wiki.python.org/moin/NumPy), [as](https://www.programiz.com/python-programming/keyword-list#as), [list](https://docs.python.org/3/tutorial/datastructures.html), [for loop](https://wiki.python.org/moin/ForLoop), [.plot() and .show()](https://stackoverflow.com/questions/8575062/how-to-show-matplotlib-plots-in-python), [.legend()](https://matplotlib.org/users/legend_guide.html)</div>

The plot shows the time series of ```06_Apple.txt```, ```06_Microsoft.txt```, and ```06_Tesla.txt```. The code could be made more general: because all text files sit in one folder, there are functions that find the files in that folder automatically, so the file names do not have to be written out by hand. This goes one level further and is therefore not covered in depth in the course.
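
One possible way to do this is sketched below. The lecture does not name the function it has in mind, so the use of the standard-library `glob` module is an assumption, and the helper `load_series` and its parameters `folder` and `pattern` are hypothetical names introduced for illustration.

```python
import glob
import os

import numpy as np

def load_series(folder=".", pattern="06_*.txt"):
    """Load every text file in `folder` that matches `pattern` into a dict of arrays."""
    series = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        # Derive a label from the file name: "06_Apple.txt" -> "Apple"
        stem = os.path.splitext(os.path.basename(path))[0]
        name = stem.split("_", 1)[-1]
        series[name] = np.loadtxt(path)
    return series

# Looks in the current folder; the dict is empty if no matching file exists
series = load_series()
for name, data in series.items():
    print(name, len(data))
```

Each array can then be passed to `plt.plot(data, label=name)` exactly as in the loop above, without listing the firms by hand.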


<div class="alert alert-block alert-info"> **Exercise 6**:
The fake data has been created using an autoregressive process of order 1 (AR(1)):
$x_{t+1} = \alpha x_t + \epsilon_{t+1}$, (1)
where $x_0 = 0$, $0 \le \alpha < 1$ is the persistence parameter, and $\epsilon_t$ is the innovation shock which is a draw from a standard normal (i.i.d.). Choose three values for the persistence parameter $\alpha$ (one for each company), generate fake data for 365 days, and output these data in three different text files.
</div>

The following [code](https://fronter.com/unisg/links/link.phtml?idesc=1&iid=1078954) chooses three values for $\alpha$, generates fake data for 365 days, and writes this data to three different text files.

In [None]:
# Generates realizations of AR(1) processes and writes them to text files
import numpy as np # Imports the numpy module under the alias np

# Set seed for innovation
np.random.seed(0) # Fixes the random seed so that every run produces the same random draws

α = {'Microsoft': 0.0, 'Apple': 0.5, 'Tesla': 0.99} # Dictionary that maps each firm to its persistence parameter alpha
ts_length = 365 # Length of each time series (number of days)


for firm in α: # Iterates over the three firms
    # Open file where to store realisations
    myname = "06_" + firm + ".txt" # Builds the file name, e.g. "06_Apple.txt"
    myfile = open(myname, 'w') # Opens the file in writing mode
    
    # Initialise current value
    current_x = 0 # The process starts at x_0 = 0
    for i in range(ts_length): # Iterates once per day
        current_x = (α[firm] * current_x) + np.random.randn() # New value: alpha of the firm times the previous value plus a standard normal shock
        myfile.write("%s\n" % current_x) # Writes the new value on its own line
    myfile.close() # Closes the file and frees up any system resources taken up by it
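
To see what the persistence parameter does, the following minimal sketch generates one series for each of the three values of α used above and compares how strongly each value depends on the previous one. The helper names `generate_ar1` and `lag1_autocorrelation` and the seed are illustrative assumptions, not part of the exercise solution.

```python
import numpy as np

def generate_ar1(alpha, n, rng):
    """Return n values of x_{t+1} = alpha * x_t + eps_{t+1}, with x_0 = 0 and eps ~ N(0, 1)."""
    x = np.empty(n)
    current_x = 0.0
    for t in range(n):
        current_x = alpha * current_x + rng.standard_normal()
        x[t] = current_x
    return x

def lag1_autocorrelation(x):
    """Sample correlation between consecutive values x_t and x_{t+1}."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
for alpha in (0.0, 0.5, 0.99):
    x = generate_ar1(alpha, 365, rng)
    print(f"alpha = {alpha:4.2f}: lag-1 autocorrelation = {lag1_autocorrelation(x):.2f}")
```

A larger α means that each value carries over more of the previous one, so the series with α = 0.99 should look far smoother and more trending than the pure noise obtained with α = 0.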

Before the break we looked at exercise 5; in this session we will therefore solve exercises 6 and 7. Let us start with exercise 6. 


**Task**:

The fake data has been created using an autoregressive process of order 1 (AR(1)):

$x_{t+1} = \alpha x_t + \epsilon_{t+1}$

where $x_0 = 0$, $0 \le \alpha < 1$ is the persistence parameter, and $\epsilon_t$ is the innovation shock which is a draw from a standard normal (i.i.d.). Choose three values for the persistence parameter $\alpha$ (one for each company), generate fake data for 365 days, and output these data in three different text files.



Exercise 6 asks us to generate fake data according to an AR(1) process and to save it in three different text files. The solution for this exercise is found in the file '07_AR1_Calibration'.