
# <font color="darkblue"> Two Intentions

In our earlier notes we stated that theoretical distributions represent real-world scenarios. Such distributions serve two intentions, a mathematical one and a statistical one. Each gives a different sense of a random sample (data).

We understand a real-world random phenomenon by observing what the process yields as a **sample** of data. Hence, there are two aspects: one is the **outcome of the process** and the other is the **quantity which is responsible for generating the outcome**.

In modeling such processes, one approach uses the distributional form of the underlying random variable to summarize the data (for better understanding), assuming that the underlying process is known.

The other approach is **estimating** the parameter (the quantity responsible for generating the outcome) when we have realized values (data) of the random variable $X$.

# <font color="maroon"> Mathematical Tasks

In the language of $f(x|\theta)$, we gain **knowledge** about the random variable $X$ when the parameter $\theta$ of the underlying process is known.

There are two main ways to do this:

1. quantities to summarize $X$

2. computing probabilities of certain events involving $X$


<font color="red"> Tossing a coin

Assume a **fair** coin is tossed, that is, $\theta = 0.5$. If the coin is tossed 10 times, then we can **calculate** both kinds of quantities:

1. What is the outcome on **average**? What could we **expect**?

1. What is the **probability** of getting 6 heads?

1. What is the **probability** of getting at least 3 heads, at most 5 heads, or between 5 and 8 heads?

<font color="red"> We can construct similar questions if the **interest** is the amount of rainfall
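Returning to the fair coin tossed 10 times, the questions above can be answered with a short calculation. The sketch below assumes the number of heads follows a Binomial distribution with $n = 10$ and $\theta = 0.5$; the helper name `binomial_pmf` is ours, used only for illustration, and "between 5 and 8" is read as inclusive at both ends.

```python
from math import comb

N_TOSSES = 10   # number of tosses
THETA = 0.5     # probability of heads for a fair coin


def binomial_pmf(k, n=N_TOSSES, theta=THETA):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Expected number of heads: sum of x * p(x) over all possible outcomes
expected_heads = sum(k * binomial_pmf(k) for k in range(N_TOSSES + 1))

# Probabilities of the events listed in the questions
p_exactly_6 = binomial_pmf(6)
p_at_least_3 = sum(binomial_pmf(k) for k in range(3, N_TOSSES + 1))
p_at_most_5 = sum(binomial_pmf(k) for k in range(0, 6))
p_between_5_and_8 = sum(binomial_pmf(k) for k in range(5, 9))

print(f"E[X]           = {expected_heads:.4f}")
print(f"P(X = 6)       = {p_exactly_6:.4f}")
print(f"P(X >= 3)      = {p_at_least_3:.4f}")
print(f"P(X <= 5)      = {p_at_most_5:.4f}")
print(f"P(5 <= X <= 8) = {p_between_5_and_8:.4f}")
```

Every quantity here is computed from a known $\theta$; nothing is estimated from data.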


## <font color="darkblue"> Summaries of a distribution

Assume the variable is quantitative, so that its range is an interval in the real line $\mathscr R=(-\infty, +\infty)$.
Let $f(x)$ be the pdf of $X$ on the range $\mathscr A \subset \mathscr R$.

Further, consider any function of $X$, say $g(X)$. We define the expected value (or expectation) of $g(X)$ as

$$E[g(X)] = \int_\mathscr A {g(x)f(x)} dx$$  

Specifically, if $g(X) = X$ we get the mean

$$E[X] = \int_\mathscr A {xf(x)} dx$$

With $g(X)=X^2$ we get

$$E[X^2] = \int_\mathscr A {x^2f(x)} dx$$

We then use this to get the variance of $X$ as

$$V(X)=E[X^2]-(E[X])^2$$
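As a rough numerical check of these definitions, the sketch below approximates the integrals with a midpoint rule. The exponential pdf with rate 2 is an assumption made only for illustration (its mean is $1/2$ and its variance is $1/4$), and the helper `integrate` and the truncation point `UPPER` are ours.

```python
import math


def integrate(func, lower, upper, steps=100_000):
    """Approximate the integral of func over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))


RATE = 2.0    # assumed rate of an exponential pdf (illustration only)
UPPER = 40.0  # truncation point; the probability beyond it is negligible


def pdf(x):
    """Exponential density f(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x)


# E[g(X)] with g(x) = x and g(x) = x^2
mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)
second_moment = integrate(lambda x: x**2 * pdf(x), 0.0, UPPER)

# V(X) = E[X^2] - (E[X])^2
variance = second_moment - mean**2

print(f"E[X]   = {mean:.4f}")
print(f"E[X^2] = {second_moment:.4f}")
print(f"V(X)   = {variance:.4f}")
```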

### <span style="color:blue"> Percentiles (Quantiles)

These summaries are points that divide a distribution. They are scaled from 0 (minimum) to 100 (maximum); equivalently, a 0 to 1 scale can be used. A percentile indicates the percentage of values (of the variable) that fall below a particular value. For some widespread applications, see [here](https://www.princetonreview.com/grad-school-advice/good-gre-scores) and [here](https://engineering.careers360.com/articles/jee-main-marks-vs-percentile).

A percentile is more informative than a raw percentage because it gives the relative position of a value (of the variable). A percentile is a value $x_\alpha$ of the variable $X$ such that $X$ is less than or equal to $x_\alpha$ with probability $\alpha$. Mathematically, the $100\times\alpha^{th}$ percentile $x_\alpha$ satisfies

$$p[X \le x_{\alpha}]=\alpha$$ where $0 < \alpha<1$

For example, suppose income follows a Gamma distribution with shape 100 and scale 0.1, and we want the cut-off value for the top income group, say the top 10%. Then we need the $90^{th}$ percentile $x_{\alpha}$ with $\alpha = 0.9$: 90% of incomes lie at or below $x_{\alpha}$; equivalently, 10% lie above it.
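The income example can be sketched numerically with only the Python standard library. The code below is one possible approach, not the only one: it integrates the Gamma pdf with a midpoint rule to get $p[X \le b]$ and then bisects on $b$ until that probability reaches $\alpha = 0.9$. The function names, the search interval $[0, 50]$ and the step counts are our assumptions.

```python
import math

SHAPE, SCALE = 100.0, 0.1  # Gamma parameters from the income example


def gamma_pdf(x):
    """Gamma density, evaluated on the log scale to avoid overflow."""
    if x <= 0:
        return 0.0
    log_density = ((SHAPE - 1) * math.log(x) - x / SCALE
                   - math.lgamma(SHAPE) - SHAPE * math.log(SCALE))
    return math.exp(log_density)


def gamma_cdf(b, steps=4000):
    """p[X <= b], approximated by a midpoint rule on [0, b]."""
    if b <= 0:
        return 0.0
    width = b / steps
    return width * sum(gamma_pdf((k + 0.5) * width) for k in range(steps))


def gamma_quantile(alpha, lower=0.0, upper=50.0, tol=1e-8):
    """Find x_alpha with p[X <= x_alpha] = alpha by bisection."""
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if gamma_cdf(mid) < alpha:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2


top_cutoff = gamma_quantile(0.9)  # 90th percentile
print(f"90th percentile: {top_cutoff:.3f}")
print(f"check p[X <= x]: {gamma_cdf(top_cutoff):.4f}")
```

Statistical libraries usually provide a ready-made quantile (inverse distribution) function for this; the manual version is shown only to make the definition concrete.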

## Probability computations

The idea behind percentiles extends easily to finding probabilities involving the random variable $X$ and one or two of its values ($a, b$). Recall our assumption that $X$ is a quantitative (continuous) variable. Here $L$ and $U$ denote the lower and upper limits of the range $\mathscr A$ of $X$.

- $p(X\le b)=\int_L^bf(x)~dx$

- $p(X\ge a)=\int_a^Uf(x)~dx$

- $p(a\le X\le b)=\int_a^bf(x)~dx$

**<font color="darkred"> These formulas help to distinguish a probability, which is computed by integrating the pdf $f(x|\theta)$, from the value of the function $f(x|\theta)$ at a point**

Many computing platforms provide a **density function**; it computes the latter (the value of the pdf at a point) and should not be confused with a probability.
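A small illustration of this distinction, using `statistics.NormalDist` from the Python standard library. The Normal distribution with mean 0 and standard deviation 0.1 is our assumption, chosen only because its density exceeds 1 near the mean, which a probability never can.

```python
from statistics import NormalDist

# Assumed distribution for illustration: Normal(mean 0, sd 0.1)
dist = NormalDist(mu=0.0, sigma=0.1)

# Value of the density function at a single point (not a probability)
density_at_zero = dist.pdf(0.0)

# Probability of an interval: integral of the pdf, obtained here as a
# difference of two distribution-function values
prob_interval = dist.cdf(0.1) - dist.cdf(-0.1)

print(f"f(0)               = {density_at_zero:.4f}")
print(f"p(-0.1 <= X <= 0.1) = {prob_interval:.4f}")
```

The first number is larger than 1, while the second stays between 0 and 1.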

## Distribution Function

Alternatively, we can compute probabilities involving $X$ using the ***Distribution Function***, which is the first of the three probabilities above: $F_X(b)=p(X\le b) = \int_L^b f(x)~dx$.

This function can be understood as the inverse of the percentile relation: a percentile finds $b$ when the probability $\alpha$ is given, whereas the distribution function computes $\alpha$ when $b$ is given, and $b$ can be any real value. The formula still gives a sensible probability if $b$ lies outside $\mathscr A$, the range of $X$: $F_X(b)=0$ when $b$ is below the range and $F_X(b)=1$ when $b$ is above it.
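The inverse relation between the distribution function and percentiles is easy to see when both have closed forms. The sketch below assumes an exponential distribution with rate 2, whose range is $[0, \infty)$; the distribution and the function names `cdf` and `quantile` are chosen only for illustration.

```python
import math

RATE = 2.0  # assumed rate parameter; the range of X is [0, infinity)


def cdf(b):
    """Distribution function F(b) = p(X <= b), defined for any real b."""
    if b < 0:
        return 0.0  # b below the range of X
    return 1.0 - math.exp(-RATE * b)


def quantile(alpha):
    """Percentile x_alpha with F(x_alpha) = alpha, for 0 < alpha < 1."""
    return -math.log(1.0 - alpha) / RATE


b = quantile(0.9)        # probability given, value returned
alpha_back = cdf(b)      # value given, probability returned
below_range = cdf(-3.0)  # b outside the range of X

print(f"x_0.9      = {b:.4f}")
print(f"F(x_0.9)   = {alpha_back:.4f}")
print(f"F(-3)      = {below_range:.4f}")
```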


## <font color="darkred"> Remarks:

1. **For a categorical variable, represented by a discrete distribution, all the calculations are the same, except that $\int$ is replaced by $\sum$**

1. However, a **mass** function yields the probability that $X$ takes a specific value $x$; this differs from a continuous distribution, where the density at a point is not a probability


# <font color="maroon"> Statistical Tasks

## Meaning of  random sample (data)

Let $X$ be a random variable. Assume that we obtain $n$ outcomes of $X$ independently, in the sense that the $i^{th}$ value in no way depends on the $(i-1)^{th}$ value or on any earlier value. A [classical text](https://www.amazon.com/Facts-Figures-Penguin-Press-Science/dp/0140135405) provides a simple treatment of independence, and of course any mathematical statistics text deals with this notion.

Secondly, we assume that the underlying parameter of the process is the same for all $n$ points. Together these constitute a random sample from $X$ parameterized by $\theta$.

We shall write a random sample as $$X_1, X_2,\ldots,X_n \stackrel{iid}{\sim}f(x|\theta)$$ where *iid* stands for independent and identically distributed.


Also, by independence, the joint distribution of $X_1, X_2,\ldots,X_n$ is $$f(x_1, x_2,\ldots,x_n) = f(x_1)\times f(x_2)\times f(x_3)\times\cdots\times f(x_n)=\prod_{i=1}^{n} f(x_i)$$ or, writing the parameter explicitly, $$f(x_1, x_2,\ldots,x_n|\theta)=\prod_{i=1}^{n} f(x_i|\theta)$$

**<font color="darkgreen">The parameter $\theta$ is the same in each term of this product**  
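The product form can be made concrete with a tiny example. The sketch below assumes a binary (coin-type) variable with success probability $\theta$; the four-value sample and $\theta = 0.3$ are hypothetical, chosen only to show that the same $\theta$ enters every factor.

```python
from math import prod


def bernoulli_pmf(x, theta):
    """f(x | theta) for a binary outcome: theta if x == 1, else 1 - theta."""
    return theta if x == 1 else 1.0 - theta


def joint_pmf(sample, theta):
    """Joint probability of an iid sample: product of the individual terms."""
    return prod(bernoulli_pmf(x, theta) for x in sample)


sample = [1, 0, 0, 1]  # hypothetical realized values
theta = 0.3            # hypothetical parameter, shared by all terms

joint = joint_pmf(sample, theta)
print(f"f(x_1, ..., x_4 | theta) = {joint:.4f}")
```

Because the terms are multiplied, reordering the sample does not change the joint probability.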


## <font color="maroon"> An Example

Assume that we have access to data on the final purchase decision of e-commerce customers: whether each customer made a purchase (Yes or No). For simplicity, let us code the decision of a customer as 0: Yes and 1: No. Then the sample is a sequence of 1's and 0's such as

<font color="blue"> $X=c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0,1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0)$

The outcome of the random variable $X$ (decision) is binary (Yes or No); so we model $X\sim\mathrm{Bernoulli}(\theta)$, where $\theta$ is the proportion of success outcomes.

**Here, the buying decision "No" is considered a success. These 50 values are assumed to be *realized* from $X$**

We denote them as $X_1, X_2,\ldots,X_{50}$, so that each $X_i$ has the **same probability mass** function $f(x|\theta)$.

Another assumption is that the $i^{th}$ customer's decision (0 or 1) in no way depends on the $(i-1)^{th}$ value; so, the values are **<span style="color:blue"> independent** .

Collectively, in mathematical notation, $$X_1, X_2,\ldots,X_{50} \stackrel{iid}{\sim}\mathrm{Bernoulli}(\theta)$$ Observe that $\theta$ does not depend on the sample size.

<font color="darkgreen"> So the data are the information from customers on whether they made a purchase or not.

However, the **objective** is to know (estimate) the proportion of customers who do not make a purchase (No, coded as 1). The problem then becomes learning about $\theta$, the proportion of "No" (the success event).

In other words, given a random sample, the likelihood function of $\theta$ is $\mathcal{L}(\theta|X)=\prod_{i=1}^{n} f(x_i|\theta)$.


This expression makes sense if you read it as, "the likelihood function of $\theta$ **given** $X$ is equal to ...". (Recall the mathematical aspect of $f(X|\theta)$.)

<font color="darkred">This is the same expression that earlier played the role of the pdf; the mathematical form is the same, but the intention changes. In fact, you should be able to appreciate $$\mathcal{L}(\theta|X)=f(X|\theta)$$
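To see the change of intention in practice, the sketch below treats the 50 customer decisions as fixed and evaluates the Bernoulli likelihood over a grid of candidate values of $\theta$. The grid search is only a simple illustration, not a recommended estimation method; the function name `log_likelihood` and the grid spacing are our assumptions, and the logarithm is used because the raw product of 50 small numbers is tiny.

```python
import math

# The 50 observed decisions from the example (1 = "No", 0 = "Yes")
x = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1,
     1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0,
     0, 1, 1, 1, 0, 1, 1, 0, 0, 0]


def log_likelihood(theta, data):
    """log L(theta | data) = sum of log f(x_i | theta) for Bernoulli data."""
    ones = sum(data)
    zeros = len(data) - ones
    return ones * math.log(theta) + zeros * math.log(1.0 - theta)


# The data are held fixed; theta is the quantity that varies
grid = [k / 100 for k in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, x))

sample_proportion = sum(x) / len(x)

print(f"theta maximizing the likelihood on the grid: {theta_hat:.2f}")
print(f"sample proportion of 'No':                   {sample_proportion:.2f}")
```

The same function $f(x|\theta)$ is being evaluated as in the mathematical task; only what is held fixed and what varies has changed.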

# <font color="darkblue"> Final Remarks


These notes aim to list two important aspects of a theoretical distribution. The conventional notation for the probability density/mass function $f(x|\theta)$ (parameter $\theta$ and random variable $X$) is discussed.

<font color="maroon">Intention I Mathematical task

1.  list the summaries of a distribution using statistical quantities

1.  illustrate Three important functions associated with a random variable

1.  relate Standard Functions associated with any distribution

1.  explore how computing platforms define / handle these quantities



<font color="maroon">Intention II Statistical task

1.  Statistical (Uncertain) sense of distribution

1.  **$\theta$** is the key term (parameter) to be estimated

1.  Data is the set of information collected for the **estimation** of the quantity of interest

# <font color="darkblue"> Suggested list

**Appreciate the two intentions of a theoretical distribution**

These notes point out reading trajectories through the keywords and important aspects of a theoretical distribution. The following list of distributions could be a good starting point and a useful building block for the realm of distributions

1. Poisson

1. Geometric

1. Normal

1. Cauchy

1. t

1. Log normal

1. Logistic

1. Gamma

1. Exponential

1. Beta