Commit 910faae

Author: Amogh Singhal
Update interview_prep.md
1 parent 1ee7330 commit 910faae

1 file changed

+23
-1
lines changed

interview_prep.md

Lines changed: 23 additions & 1 deletion
@@ -63,7 +63,29 @@ The IQR is also used to determine outliers to the data set. This is in conjuctio
6363
| Based upon (a type of distribution) | Based on Normal distribution. | Based on Student-t distribution. |
6464

6565
### 6. Why do we take n-1 when calculating sample variance? Why is it useful?
66-
Read about Besel correction
66+
Read about Bessel's correction for a more technical definition.
67+
68+
##### Intuitive explanation
69+
70+
If you are computing the variance (or standard deviation) of an entire population rather than a sample, you do divide by n. For a sample, however, the denominator is not the number of observations but the degrees of freedom, which is n-1, because one degree of freedom is used up by estimating the mean from the same sample. To understand degrees of freedom, I would recommend this example using hats.
71+
72+
Basically, you divide by the number of things you need to 'know' before you can fill in the blanks yourself. If you are using an entire population, you need every single value, because you can't fill in any blanks. But if you have a sample whose mean is already fixed, once you know all but the last value you can fill in the last one yourself.
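
The effect of the divisor can be checked with a small simulation. The following is a minimal sketch, not part of the original notes: the function name `average_variance_estimates`, the sample size, the number of trials, and the seed are all assumptions chosen for illustration. It draws many small samples from a normal distribution with a known variance and averages the two estimates.

```python
import random

def average_variance_estimates(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the n-divisor and (n-1)-divisor variance estimates over many samples.

    All parameter values are illustrative assumptions, not from the original text.
    """
    rng = random.Random(seed)
    biased_sum = 0.0
    unbiased_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the *sample* mean
        ss = sum((x - mean) ** 2 for x in sample)
        biased_sum += ss / n          # population-style divisor
        unbiased_sum += ss / (n - 1)  # Bessel's correction
    return biased_sum / trials, unbiased_sum / trials

biased_avg, unbiased_avg = average_variance_estimates()
print(f"true variance:            {2.0 ** 2:.3f}")
print(f"average with n divisor:   {biased_avg:.3f}")
print(f"average with n-1 divisor: {unbiased_avg:.3f}")
```

Under these assumptions, the n divisor should come out around (n-1)/n times the true variance on average, while the n-1 divisor should land close to the true variance.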
73+
74+
##### Example
75+
76+
![](https://ae01.alicdn.com/kf/HTB1XFW0JXXXXXcKXFXXq6xXFXXX1/225440714/HTB1XFW0JXXXXXcKXFXXq6xXFXXX1.jpg)
77+
78+
Imagine you have a huge bookshelf. You measure the total thickness of the first 6 books and it turns out to be 158 mm. This means that the mean thickness of a book, based on these first 6 books, is about 26.3 mm.
79+
Now you take out the first book and measure its thickness (one degree of freedom), and find that it is 22 mm. This means that the remaining 5 books must have a total thickness of 136 mm.
80+
Now you measure the second book (second degree of freedom) and find it to be 28 mm. So you know that the remaining 4 books must have a total thickness of 108 mm.
81+
.
82+
.
83+
In this way, by the time you have measured the thickness of the 5th book individually (5th degree of freedom), you automatically know the thickness of the 1 remaining book.
84+
85+
This means that you automatically know the thickness of the 6th book even though you have measured only 5. Extrapolating this concept: in a sample of size n whose total (or mean) is known, you know the value of the nth observation after taking only (n-1) measurements, i.e., the opportunity to vary has been taken away from the nth observation.
86+
87+
This means that once you have measured (n-1) objects, the nth object has no freedom to vary. Therefore, the degrees of freedom are only (n-1) and not n.
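
The bookshelf arithmetic can be traced in a few lines of Python. This is only a sketch: the total of 158 mm and the first two measurements (22 mm and 28 mm) come from the example above, while the thicknesses of books 3 to 5 and the helper name `remaining_thickness` are hypothetical, made up to complete the illustration.

```python
def remaining_thickness(total_mm, measured_mm):
    """Return the total thickness left over after each individual measurement."""
    remaining = total_mm
    steps = []
    for thickness in measured_mm:
        remaining -= thickness
        steps.append(remaining)
    return steps

total_mm = 158  # total thickness of the 6 books (from the example)
# Books 1 and 2 are from the example; books 3 to 5 are hypothetical values.
measured_mm = [22, 28, 30, 25, 27]

steps = remaining_thickness(total_mm, measured_mm)
print(steps)      # [136, 108, 78, 53, 26]
print(steps[-1])  # thickness of the 6th book, fixed without measuring it
```

Only 5 of the 6 values were free to vary; with the total known, the last one is determined, which is the n-1 in the denominator.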
88+
6789
### 7. What are the assumptions of the normal distribution? Why is it useful?
6890
### 8. What are the different approaches to outlier detection? How will you handle the outliers? Why is it useful?
6991
### 9. Where is RMSE a bad case? How do we solve this?
