---
### Q (Google, Hard):
What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.


### A:
We can think of Estimator Bias as the following:
- Archer: estimator $\hat\theta(X_1,...,X_n)$
- Bow and arrow: data $x_1,...,x_n$
- Arrow in target: estimate, $\hat\theta(x_1,...,x_n)$
- Bullseye: Parameter of interest $\theta$

**Bias**, $B$, of an estimator, $\hat\theta$, is $B=E(\hat\theta)-\theta$.

An estimator is **unbiased** if and only if $B=0$ or equivalently, $E(\hat\theta)=\theta$.

An estimator is **consistent** if it converges in probability to the true value of the parameter.

<br/><br/>
**Biased & Consistent Estimator:**

An example of an estimator that is *consistent* and *biased* is the uncorrected sample variance (divisor $n$ rather than $n-1$):
$$S^2_n=\frac{1}{n}\sum^n_{i=1}(X_i-\bar X)^2$$
It is easy to show that $E(S^2_n)=\frac{n-1}{n}\sigma^2$. Hence the estimator is *biased*.

Assuming finite variance $\sigma^2$, the bias converges to zero as $n$ goes to infinity because
$$E(S^2_n)-\sigma^2 = -\frac{1}{n}\sigma^2$$
It can also be shown that the variance of the estimator tends to zero (given a finite fourth moment), so the estimator converges in mean square. Hence, it also converges in probability, i.e. it is *consistent*.
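
The shrinking bias can be checked numerically. The following is a minimal sketch, assuming standard normal data (so $\sigma^2=1$) and using hypothetical helper names `biased_variance` and `average_estimate`; it approximates $E(S^2_n)$ by simulation and compares it with $\frac{n-1}{n}\sigma^2$.

```python
import random


def biased_variance(xs):
    """Variance with a 1/n divisor (the S_n^2 defined above)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def average_estimate(n, trials, rng):
    """Monte Carlo approximation of E[S_n^2] for N(0, 1) samples of size n."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += biased_variance(sample)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {n: average_estimate(n, 2000, rng) for n in (2, 10, 100)}
for n, avg in results.items():
    # The simulated mean should sit near (n-1)/n and approach 1 as n grows.
    print(f"n={n:>3}  mean of S_n^2 = {avg:.3f}  theory (n-1)/n = {(n - 1) / n:.3f}")
```

The simulated averages should track $\frac{n-1}{n}$, so the gap to the true value $1$ closes as $n$ grows.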

<br/><br/>
**Unbiased & Not consistent estimator:**

Consider an iid sample $\{x_1,...,x_n\}$ and use $T_n(X)=x_n$ as an estimator of the mean $E(X)$. The sampling distribution of $T_n$ is the same as the underlying distribution for any $n$, since it ignores all points but the last. So $E[T_n(X)]=E(X)$ and the estimator is unbiased, but it does not converge in probability to any value (unless the underlying distribution is degenerate).
> Suppose your sample was drawn from a distribution with mean $\mu$ and variance $\sigma^2$. Your estimator $\tilde x=x_1$ is unbiased as $E(\tilde x) = E(x_1) = \mu$ implies the expected value of the estimator equals the population mean. Your estimator is on the other hand *inconsistent*, since $\tilde x$ is fixed at $x_1$ and will not change with the changing sample size, i.e. will not converge in probability to $\mu$.

However, if a sequence of estimators is unbiased and its variance tends to zero, then it is consistent: it converges in mean square, and the limit must be the true value of the parameter.
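
To make the contrast concrete, here is a small simulation sketch. It assumes normal data with illustrative values $\mu=5$ and $\sigma=2$, and the helper names (`last_observation`, `sample_mean`, `spread`) are hypothetical. Both estimators average out to $\mu$, but only the sample mean's spread shrinks with $n$.

```python
import random


def last_observation(sample):
    """T_n: ignore everything except the final data point."""
    return sample[-1]


def sample_mean(sample):
    return sum(sample) / len(sample)


def spread(estimator, n, trials, rng, mu=5.0, sigma=2.0):
    """Return (mean, variance) of an estimator over repeated samples of size n."""
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    center = sum(estimates) / trials
    var = sum((e - center) ** 2 for e in estimates) / trials
    return center, var


rng = random.Random(1)  # fixed seed so the run is repeatable
summary = {}
for n in (5, 50, 200):
    summary[n] = {
        "last": spread(last_observation, n, 1000, rng),
        "mean": spread(sample_mean, n, 1000, rng),
    }
    (lm, lv), (mm, mv) = summary[n]["last"], summary[n]["mean"]
    print(f"n={n:>3}  last obs: mean={lm:.2f} var={lv:.2f}   sample mean: mean={mm:.2f} var={mv:.3f}")
```

The variance of the last-observation estimator stays near $\sigma^2$ regardless of $n$, which is why it cannot converge in probability to $\mu$.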


Source:
- https://en.wikipedia.org/wiki/Consistent_estimator
- https://stats.stackexchange.com/questions/174137/an-example-of-a-consistent-and-biased-estimator
- https://math.stackexchange.com/questions/119461/problem-with-unbiased-but-not-consistent-estimator

---
### Q:
Assume we have a classifier that produces a score between 0 and 1 for the probability of a particular loan 
application being fraudulent. In this scenario: 

a) what are false positives 

b) what are false negatives

c) what are the trade-offs between them in terms of dollars and how should the model be weighted accordingly?

### A:
Let's first review what <strong>False positive</strong> and <strong>False negative</strong> mean.

<strong>False positive</strong>: Also known as <strong>Type I</strong> error, is when you test yourself for pregnancy and get a positive result, but in reality you are not. <mark>In our scenario, a false positive means that the loan application is classified as fraudulent, but in reality it is not.</mark>

<strong>False negative</strong>: Also known as <strong> Type II</strong> error, is when you get a negative result, but in reality you are pregnant. <mark>In our scenario, the classifier classifies the loan application as not fraudulent, but in reality it is actually fraudulent.</mark>

<strong>Sensitivity (TPR/Recall)</strong>: is the ability of a classifier to detect whether a loan application is fraudulent.
<br/><br/>
$$\text{Sensitivity} = \frac{\text{number of true positives (TP)}}{\text{number of true positives (TP)} + \text{number of false negatives (FN)}}=\frac{\text{correctly classified fraudulent applications}}{\text{total number of fraudulent applications}}$$

A negative result from a classifier with high sensitivity is reliable, since the classifier rarely misses fraudulent applications. A positive result from such a classifier is not necessarily useful. For example, a classifier that flags every application as fraudulent has 100% sensitivity, because sensitivity by definition does not take false positives into account, yet it is useless in practice.

<strong>Specificity</strong>: is the ability of the classifier to correctly detect whether a loan application is not fraudulent.
<br/><br/>
$$\text{Specificity} = \frac{\text{number of true negatives (TN)}}{\text{number of true negatives (TN)} + \text{number of false positives (FP)}} = \frac{\text{legitimate applications classified as not fraudulent}}{\text{total number of legitimate applications}}$$

A positive result from a classifier with high specificity is useful for ruling an application in as fraudulent. Such a classifier rarely gives positive results for applications that are not fraudulent, so a positive result is strong evidence that the application is fraudulent.

<strong>False Positive Rate (FPR)</strong>: $\frac{FP}{FP+TN} = 1-\text{Specificity}$
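
As a minimal sketch of these definitions, the snippet below counts TP, FP, TN and FN at a single threshold and derives sensitivity, specificity and FPR (taken here as $\frac{FP}{FP+TN}$, i.e. $1-$ specificity). The scores, labels, threshold of 0.5 and function names are made up for illustration; label 1 means fraudulent.

```python
def confusion_counts(scores, labels, threshold):
    """Count (TP, FP, TN, FN) when score >= threshold is flagged as fraud."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (sensitivity, specificity, FPR); assumes both classes are present."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)  # identical to 1 - specificity
    return sensitivity, specificity, fpr


# Illustrative toy data, not from any real loan portfolio.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.5)
sensitivity, specificity, fpr = rates(tp, fp, tn, fn)
print(tp, fp, tn, fn)
print(sensitivity, specificity, fpr)
```

On this toy data one fraudulent application scores below the threshold (a false negative) and one legitimate application scores above it (a false positive).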

<strong>AUC-ROC Curve</strong>: is a performance measurement for classification problems at various threshold settings. ROC (Receiver Operating Characteristic) is a probability curve that plots TPR against FPR, and AUC (Area Under the Curve) represents the degree or measure of separability. It tells us how capable the model is of distinguishing between classes. The higher the AUC, the better the model is at predicting 0's as 0's and 1's as 1's. By analogy, the higher the AUC, the better the model is at distinguishing between patients with disease and no disease.
- Speculating model performance: A perfect model has AUC equal to 1. On the other hand, a poor model has AUC near 0 (this means that it is classifying 0s as 1s and 1s as 0s). Lastly, when AUC is 0.5, the model has no class separation capacity.
<br/><br/>
<strong><center>'An Ideal Model (AUC=1)'</center></strong>
![img](assets/auc_1.png)
When AUC = 1, the model is perfectly able to distinguish between positive class and negative class.
<br/><br/>
<strong><center>'AUC = 0.7'</center></strong>
![img](assets/auc_0.5.png)
When AUC = 0.7, there is a 70% chance that the model will score a randomly chosen positive example higher than a randomly chosen negative example.
<br/><br/>
<strong><center>'AUC = 0.5'</center></strong>
![img](assets/auc_0.5_real.png)
When AUC = 0.5, the model is unable to distinguish between positive class and negative class.
<br/><br/>
<strong><center>'AUC = 0'</center></strong>
![img](assets/auc_0.png)
When AUC = 0, the model is inverting the classes: it predicts every positive as negative and every negative as positive.
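
One way to build intuition for these values is the ranking interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below computes that fraction directly; the function name `auc_pairwise` and the toy data are illustrative assumptions, and the quadratic loop is only meant for small examples.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Perfect separation, perfectly inverted scores, and a mixed toy example.
auc_perfect = auc_pairwise([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
auc_inverted = auc_pairwise([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
auc_mixed = auc_pairwise(
    [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05],
    [1, 1, 0, 1, 0, 0, 0, 0],
)
print(auc_perfect, auc_inverted, auc_mixed)
```

The first two cases correspond to the AUC = 1 and AUC = 0 pictures above; the mixed case lands in between because one fraudulent application is scored below a legitimate one.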

<strong>Relationship between Sensitivity, Specificity, FPR and Threshold</strong>: For a given model, sensitivity and specificity trade off against each other as the threshold moves. So when we increase sensitivity, specificity decreases and vice-versa.
<br/><br/>
<center>Sensitivity⬆️, Specificity⬇️ and Sensitivity⬇️, Specificity⬆️</center>
<br/><br/>
Similarly, when we increase the threshold, more applications are classified as negative, so we get higher specificity and lower sensitivity. Since FPR is 1 - specificity, when we increase TPR, FPR also increases and vice-versa.
<br/><br/>
<center>TPR⬆️, FPR⬆️ and TPR⬇️, FPR⬇️</center>
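
The threshold trade-off, and its link to part (c) of the question, can be sketched by sweeping the threshold and pricing each error type. The dollar figures below (`cost_fp` for wrongly rejecting a legitimate applicant, `cost_fn` for approving a fraudulent loan) are purely hypothetical placeholders, as are the toy scores and the function name `sweep`; in practice they would come from the lender's own loss data.

```python
def sweep(scores, labels, thresholds, cost_fp, cost_fn):
    """For each threshold, report TPR, FPR and total dollar cost of the errors."""
    rows = []
    for t in thresholds:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        tn = sum(1 for s, y in pairs if s < t and y == 0)
        rows.append({
            "threshold": t,
            "tpr": tp / (tp + fn),
            "fpr": fp / (fp + tn),
            "cost": fp * cost_fp + fn * cost_fn,
        })
    return rows


# Illustrative toy data; label 1 means fraudulent.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

# Hypothetical costs: a missed fraud is assumed to be 20x as costly as a false alarm.
rows = sweep(scores, labels, [0.1, 0.3, 0.5, 0.7, 0.9], cost_fp=50, cost_fn=1000)
for row in rows:
    print(row)

best = min(rows, key=lambda r: r["cost"])
print("lowest-cost threshold:", best["threshold"])
```

Raising the threshold lowers both TPR and FPR. When false negatives are assumed to be much more expensive than false positives, as here, the lowest-cost threshold sits low, accepting more false alarms to avoid missed fraud.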


Source:
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
- https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5

---
### Q:
Given that $X \sim Unif(0,1)$ and $Y \sim Unif(0,1)$ are independent, what is the expected value of the minimum of $X$ and $Y$?

### A:
#### Motivation:
Let $Z=min(X,Y)$. 

Then:
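$$1 - F_Z(z) = P(min(X,Y) > z) = P(X>z,Y>z)$$

$$= 1-P(X\leq z) - P(Y \leq z) + P(X\leq z, Y \leq z)$$

$$= 1 - F_X(z) - F_Y(z) + F_X(z)F_Y(z)$$

so that $F_Z(z) = F_X(z) + F_Y(z) - F_X(z)F_Y(z)$.

Here the second equality is by the <strong>inclusion-exclusion principle</strong>, $P(A\cup B) = P(A) + P(B) - P(A\cap B)$, applied to $A=\{X\leq z\}$ and $B=\{Y\leq z\}$, and the third uses the independence of $X$ and $Y$.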

#### Solution:
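Note: more generally, let $X_1,...,X_n$ be iid $Unif(a,b)$ and let $Y=min(X_1,...,X_n)$ (here $Y$ denotes the minimum, not the $Y$ from the question). Then $F(y) = 1 - P(Y>y) = 1 - P(min(X_1,...,X_n)>y)$, where $min(X_1,...,X_n) > y$ exactly when $X_i > y$ for all $i$.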

Hence, $F(y)=1-P(X_1>y)P(X_2>y)...P(X_n>y)= 1 - P(X_1>y)^n$.

And since $P(X>y)=\frac{b-y}{b-a}$ by definition (note, $P(X<y)=\frac{y-a}{b-a}$), we have that $1-P(X_1>y)^n = 1-(\frac{b-y}{b-a})^n$.

Taking the derivative w.r.t. $y$, we get $f(y)=\frac{n}{b-a}(\frac{b-y}{b-a})^{n-1}$

Since $E(Y) = \int_{a}^{b} yf(y)dy$, we get that $E(Y)=\frac{b+na}{n+1}$, which equals $\frac{1}{3}$ for $a=0$, $b=1$, $n=2$.
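
For completeness, one way to carry out the integral is the substitution $u=\frac{b-y}{b-a}$ (so $y=b-(b-a)u$ and $dy=-(b-a)du$). This step is not spelled out in the original and is offered only as a sketch:

$$E(Y) = \int_a^b y\,\frac{n}{b-a}\left(\frac{b-y}{b-a}\right)^{n-1}dy = \int_0^1 \big(b-(b-a)u\big)\,n\,u^{n-1}\,du$$

$$= b - (b-a)\frac{n}{n+1} = \frac{b+na}{n+1}$$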

#### Another solution:
Let $X_1,...,X_n$ be iid random variables from $Unif(0,1)$ and let $Z_n = min(X_1,...,X_n)$. Since $Z_n$ is non-negative, $E[Z_n]=\int_0^1P(Z_n>x)dx=\int_0^1P(X_1>x)^ndx=\int_0^1(1-x)^ndx=\frac{1}{n+1}$, which gives $\frac{1}{3}$ for $n=2$.
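
A quick Monte Carlo check of the $\frac{1}{n+1}$ result, as a sketch using only the standard library. The function name `mean_of_minimum`, the seed and the number of trials are arbitrary choices.

```python
import random


def mean_of_minimum(n, trials, rng):
    """Average of min(X_1, ..., X_n) over repeated draws of iid Unif(0, 1)."""
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() for _ in range(n))
    return total / trials


rng = random.Random(42)  # fixed seed so the run is repeatable
estimates = {n: mean_of_minimum(n, 100_000, rng) for n in (1, 2, 5)}
for n, est in estimates.items():
    print(f"n={n}  simulated E[min] = {est:.4f}  theory 1/(n+1) = {1 / (n + 1):.4f}")
```

For $n=2$ the simulated value should be close to $\frac{1}{3}$, matching both solutions above.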

Source:
- https://tsourakakis.com/2015/12/14/expectation-of-minimum-of-n-i-i-d-uniform-random-variables/