# Chapter 1 Foundation of Probability Theory Part I

#### *Zhuo Jianchao* 

Feb 2, 2020 *Rev 1*

## Random Experiments

To investigate an economic phenomenon, it is scientifically convenient to describe it as a series of **repeated experiments**. When forecasting a country's GDP growth, for example, the experiment is repeated each year, and each repetition yields a GDP value for that year. We know exactly what the GDP value was for any year *prior* to the current date, whereas GDP for any later year must be **predicted** until that year arrives and yields an *outcome*.

Because of the randomness associated with it, such a repeated experiment is also called a **random experiment**.

### Definition 1 Random Experiment

We have already mentioned the key features of an **experiment**:

> it has a set of possible outcomes, such as GDP values;

> which outcome actually occurs is unknown in advance;

> the only way to find out which outcome occurs is to **do** the experiment and observe it.

## Basic Concepts of Probability

Econometrics is built on **probability theory**, so we begin by introducing some of its basic concepts.

### Definition 2 Sample Space

What we observe in one experiment is called an **outcome**, like your final grade in the course.

Suppose the grade for this course is one of *A*, *B* and *C*. If you end up with an *A*, then *A* is the **outcome**. Before your grade is realized, however, you do not know which one it will be; it may be any of *A*, *B* and *C*. Together these form the **sample space**, denoted by $S$ or, by convention, $\Omega$. We use `{}` to denote sets, that is
$$S=\{A,B,C\}$$

In other words, **sample space** is the *set* of all possible outcomes in an experiment.

Each possible outcome in a sample space is called an **element**, or **sample point**. 
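
As a minimal sketch of these definitions, the grading sample space can be written as a Julia `Set`. The helper name `is_sample_point` is an assumption made for illustration only.

```julia
# Hypothetical illustration: the grading sample space as a Julia Set.
S = Set(['A', 'B', 'C'])

# Each element of S is a sample point; membership is tested with `in`.
is_sample_point(x) = x in S

println(is_sample_point('A'))   # true
println(is_sample_point('D'))   # false
println(length(S))              # 3
```

A set ignores both order and repetition, so `Set(['C', 'A', 'B', 'A'])` describes the same sample space.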

We want to draw samples from the sample space and use them to infer properties of interest of the probability associated with each possible outcome in the sample space.

### Example 2.1

Assume your grade can be *A*, *B* or *C*, and that you consider these three grades equally likely. Then every run of the cell below yields an outcome.

In [None]:
S = ['A', 'B', 'C']
your_final_grade = rand(S, 1)

When you run the cell, one and only one of the possible outcomes in the sample space gets **realized**, and it is assigned (as a one-element array) to the variable named `your_final_grade`.

The process that turns an unknown outcome into a realized outcome is the experiment.
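
To see what "equally likely" means in practice, the experiment can be repeated many times and the realized grades tabulated. The following is only a sketch; `grade_frequencies` is a hypothetical helper, not part of the original cell.

```julia
# Sketch: repeat the grade experiment n times and tabulate relative frequencies.
# `grade_frequencies` is a hypothetical helper introduced for illustration.
function grade_frequencies(n)
    S = ['A', 'B', 'C']
    draws = rand(S, n)    # n independent draws, each grade equally likely
    Dict(g => count(d -> d == g, draws) / n for g in S)
end

freq = grade_frequencies(10_000)
println(freq)             # each value should be close to 1/3
```

The exact frequencies change from run to run, since every draw is itself a random experiment.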

Be aware that an outcome doesn't have to consist of a single component.

### Example 2.2

For example, suppose you toss two coins at once, and write $H$ for heads and $T$ for tails. If we **ignore the order** in which heads and tails occur, a head and a tail count as the same result whichever coin shows which, so *HT* and *TH* are identical and both are denoted *HT* for simplicity. The possible outcomes of this experiment are then
$$S=\{HH, HT, TT\}$$

If, this time, we **consider the order** of occurrence, so that *HT* and *TH* are two distinct outcomes, then the possible outcomes are
$$S=\{HH, HT, TH, TT\}$$

Let's create two functions, each an experiment simulating the toss of two coins, one ignoring the order and one considering it.

In [9]:
"""
Even if we do not consider the order of heads and tails,
a head and a tail in two tosses can arise in two ways, HT and TH, not one.
If we simply dropped one of them,
the probability would be spread evenly over three outcomes instead of the correct four.
"""

function roll2_ignore_order()
    S = ["HH", "HT", "HT", "TT"]
    # that's the reason we put two HT in S
    rand(S, 1)
end


function roll2_consider_order()
    S = ["HH", "HT", "TH", "TT"]
    rand(S, 1)
end

roll2_consider_order (generic function with 1 method)

Calling either of the functions we have just defined yields one possible outcome under the corresponding convention.

In [7]:
roll2_ignore_order()

1-element Array{String,1}:
 "HT"

In this case of tossing two coins, each possible outcome of the experiment is determined by the results of both coins. In other words, the two individual results combine into one outcome of the experiment, which is why each possible outcome has two components.
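
The comment in the `roll2_ignore_order` cell argues that *HT* must be listed twice when order is ignored. A rough empirical check is sketched below; the sampler names `toss2_weighted` and `toss2_naive` and the helper `ht_frequency` are assumptions made for this illustration.

```julia
# Hypothetical samplers for comparison; neither is part of the original notebook.
toss2_weighted() = rand(["HH", "HT", "HT", "TT"])   # HT listed twice
toss2_naive()    = rand(["HH", "HT", "TT"])         # HT listed once

# Relative frequency of the outcome "HT" over n repetitions of `experiment`.
ht_frequency(experiment, n) = count(i -> experiment() == "HT", 1:n) / n

n = 100_000
println("HT listed twice: ", ht_frequency(toss2_weighted, n))   # close to 1/2
println("HT listed once:  ", ht_frequency(toss2_naive, n))      # close to 1/3
```

Two fair coins show one head and one tail about half of the time, which only the first sampler reproduces.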

> **the number of components in a single possible outcome is the number of detectors we put in a single experiment.**

In Example 2.2, we put two detectors, one installed on each coin, each of which tells whether its coin turns out to be a head or a tail.
Reading the result of a detector is called an **observation**.

In [25]:
"""
we can actually incorporate detector into rolling each coin.
remember, a detector always tells us whether the coin is H or T, that is,
the possible outcome is H or T.
"""

function det_roll2()
    rolling_a_coin = ['H', 'T']
    S₁ = rand(rolling_a_coin, 1)
    S₂ = rand(rolling_a_coin, 1)
    S = [x*y for x in S₁ for y in S₂]
end

det_roll2 (generic function with 1 method)

Calling the function `det_roll2` yields an outcome consisting of two components, each being the result, or **observed value**, detected on one coin.

Because a coin always turns out to be a head or tail, the sample space can be enumerated exhaustively. 

This kind of sample space is called *countable* and *finite*: countable because its elements can be enumerated one by one, and finite because the enumeration is exhaustive after finitely many steps.
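
The exhaustive enumeration can be sketched directly: with one component per coin, the ordered sample space is the Cartesian product of the single-coin sample space with itself. The variable names below are assumptions made for illustration.

```julia
coin = ['H', 'T']    # sample space of a single coin

# One component per coin: pair every result of the first coin with every result of the second.
S_ordered = vec([string(x, y) for x in coin, y in coin])
println(S_ordered)
println(length(S_ordered))    # 4, i.e. length(coin)^2

# Ignoring order amounts to sorting the letters inside each outcome and dropping duplicates.
S_unordered = Set(join(sort(collect(s))) for s in S_ordered)
println(S_unordered)
```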

### Example 2.3

A sample space can be infinite. For instance, let *t* be the duration in minutes of a round of a MOBA game. The bases of both teams are immune to any kind of damage during the first 6 minutes. Also, the game ends in a tie if neither base is destroyed after an hour. Let *T* be the duration of a round that is just starting; then the sample space of *T* is
$$S=\{t\in\mathbb{R}: 6\leq t \leq 60\},$$
where $\mathbb{R}$ denotes the real line.

*T* can take on any real value between 6 and 60 inclusive, which the following function simulates approximately.

In [60]:
"""
Because a decimal number can have arbitrarily many decimal places,
we should tell our function the granularity with which the real line is divided
into values for us to choose from.
The function therefore takes one argument, the granularity:
the larger the granularity, the more precise the outcome will be.
"""

function duration_gameplay(gran)
    # granularity should be non-negative integer
    if gran isa Int64 && gran >= 0
        interval = 1/(10^gran)
        t = rand(6:interval:60)
    else 
        error("granularity should be non-negative integer.")
    end
end

duration_gameplay (generic function with 1 method)

Every call to the function `duration_gameplay` yields a decimal number between 6 and 60, where the argument specifies the number of decimal places of the outcome.

In [59]:
duration_gameplay(2)

15.41

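In this example, the sample space is **uncountable** and **infinite**: it contains every real number between 6 and 60, so its elements cannot be listed one by one. The function only approximates this, because for any fixed granularity it draws from a finite grid of values; since the granularity can be chosen arbitrarily large, the grid can be made as fine as we like.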
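
For a fixed granularity, the number of values the simulation can return follows from simple arithmetic: the interval from 6 to 60 has length 54, a step of $10^{-\text{gran}}$ divides it into $54\cdot 10^{\text{gran}}$ steps, and there is one more grid point than steps. The helper `grid_size` below is a hypothetical name used only for this sketch.

```julia
# Hypothetical helper: number of distinct grid values between 6 and 60 (inclusive)
# when durations are measured with step 10^(-gran).
grid_size(gran::Integer) = 54 * 10^gran + 1

for gran in 0:4
    println("gran = ", gran, ": ", grid_size(gran), " possible values")
end
```

The count grows without bound as `gran` increases, but it is finite for every particular `gran`.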

### Definition 3 Event

The sample space for rolling a six-sided die is
$$S=\{1,2,3,4,5,6\}$$
We can impose additional constraints on it; for example, we may want an even number. In this case, the constrained set becomes
$$A=\{2,4,6\}$$
The constrained set is called an **event set**, because the event that *we get an even number from rolling one die* is realized if and only if one of the numbers in the event set is realized.

The sample space contains all the elements, or basic outcomes, of a random experiment. The event set contains only some of the elements of the sample space; in other words, <u>the event set is a subset of the sample space</u>.
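
The subset relation and the "if and only if" condition can be sketched with Julia sets. The helper `event_occurs` is a hypothetical name introduced for illustration.

```julia
S = Set(1:6)          # sample space of one die
A = Set([2, 4, 6])    # event set: "an even number"

println(issubset(A, S))    # true: the event set is a subset of the sample space

# The event is realized exactly when the realized outcome belongs to the event set.
event_occurs(outcome, event) = outcome in event

outcome = rand(1:6)        # perform the experiment once
println(outcome, " => ", event_occurs(outcome, A))
```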

### Example 3.1

Let's say the developers in Example 2.3 want to test whether the game lasts too long or too short in order to optimize some settings of the gameplay. A single round finished in less than 10 minutes is deemed too short and one lasting more than 45 minutes is regarded as too long.

Recall the sample space of the experiment:
$$S=\{t\in\mathbb{R}: 6\leq t \leq 60\}$$

Here, by our specifications, there are three events,
* A: *the game is too short*;
* B: *the game has a fair duration*;
* C: *the game is too long*

each of which is a subset of the sample space:
$$A=\{t\in\mathbb{R}: 6\leq t<10\}$$
$$B=\{t\in\mathbb{R}: 10\leq t\leq 45\}$$
$$C=\{t\in\mathbb{R}: 45<t\leq 60\}$$
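
The three events can also be written as membership tests. The sketch below assumes that the boundary durations 10 and 45 count as fair, which matches how the function further down classifies them; the predicate names `in_A`, `in_B` and `in_C` are hypothetical.

```julia
# Assumption: t == 10 and t == 45 belong to B (a fair duration).
in_A(t) = 6 <= t < 10      # too short
in_B(t) = 10 <= t <= 45    # fair
in_C(t) = 45 < t <= 60     # too long

# Check on a grid of durations that every point lies in exactly one event.
grid = 6:0.25:60
is_partition = all(t -> in_A(t) + in_B(t) + in_C(t) == 1, grid)
println(is_partition)      # true
```

Under this assumption the events are mutually exclusive and together cover the whole sample space.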

We can extend the function `duration_gameplay` so that it returns not the duration of a single round but an evaluation of it: too long, too short, or just fair.

In [7]:
"""
Even though we already have benchmarks for evaluating the duration, namely 10 min and 45 min,
we still need to specify how precisely we want to measure the duration; the granularity defaults to 2.
"""

function duration_gameplay(gran=2)
    if gran isa Int64 && gran >= 0
        interval = 1/(10^gran)
        t = rand(6:interval:60)
        if t < 10
            return 'A'
        elseif t > 45
            return 'C'
        else
            return 'B'
        end
    else 
        error("granularity should be non-negative integer.")
    end
end

duration_gameplay (generic function with 2 methods)

Calling the function generates the duration of a round, compares it with the benchmarks 10 and 45, and returns which event happens in the experiment.

In [8]:
duration_gameplay()

'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)