interview_prep.md: 23 additions & 1 deletion
@@ -63,7 +63,29 @@ The IQR is also used to determine outliers to the data set. This is in conjuctio
| Based upon (a type of distribution) | Based on Normal distribution. | Based on Student-t distribution. |
### 6. Why do we take n-1 when calculating the sample variance? Why is it useful?
-Read about Besel correction
+Read about Bessel's correction for a more technical definition.
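For reference, here is a short sketch of how the correction is usually written. The symbols ($N$, $n$, $\mu$, $\bar{x}$, $\sigma^2$, $s^2$) are conventional notation and are an assumption here; they do not appear in the original notes.

```latex
% Population variance: every member is known, so divide by N.
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Sample variance with Bessel's correction: the mean is estimated
% from the same n observations, so divide by n - 1.
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

For independent draws from the population, dividing by $n-1$ makes $s^2$ an unbiased estimate of $\sigma^2$.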
+
+##### Intuitive explanation
+
+If you are computing the variance (or standard deviation) of an entire population rather than a sample, you do divide by n. For a sample, however, the denominator is not the number of observations but the number of degrees of freedom, which is n-1. To understand degrees of freedom, I would recommend this example using hats.
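A minimal Python sketch of the two denominators. The data values are hypothetical, chosen only so the arithmetic is easy to check by hand; `statistics.pvariance` and `statistics.variance` are the standard-library functions for the population and sample cases.

```python
import math
import statistics

# Hypothetical data, picked only so the numbers are easy to verify by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

# Treat the data as the whole population: divide by n.
population_variance = sum_sq_dev / n

# Treat the data as a sample: divide by the degrees of freedom, n - 1.
sample_variance = sum_sq_dev / (n - 1)

# The standard library draws the same distinction.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(population_variance)         # 4.0
print(round(sample_variance, 3))   # 4.571
```

The sample figure is always the larger of the two, and the gap shrinks as n grows.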
+
+Basically, you divide by the number of things you need to 'know' before you can fill in the remaining blanks yourself. If you are using an entire population, you need every single observation, because you cannot fill in any blanks. But if you have a sample and its mean, you only need to know all but the last observation; the last one can then be filled in.
+Imagine you have a huge bookshelf. You measure the total thickness of the first 6 books and it turns out to be 158 mm. This means that the mean thickness of a book, based on this sample of 6, is about 26.3 mm.
+Now you take out the first book and measure its thickness (one degree of freedom), and find that it is 22 mm. This means that the remaining 5 books must have a total thickness of 136 mm.
+Now you measure the second book (second degree of freedom) and find it to be 28 mm. So you know that the remaining 4 books must have a total thickness of 108 mm.
+.
+.
+In this way, by the time you have measured the thickness of the 5th book individually (5th degree of freedom), you automatically know the thickness of the 1 remaining book.
+
+This means that you automatically know the thickness of the 6th book even though you have measured only 5 individually. Extrapolating this concept: in a sample of size n with a known total (or mean), you know the value of the nth observation even though you have only taken (n-1) individual measurements, i.e., the opportunity to vary has been taken away from the nth observation.
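The bookshelf walk-through can be sketched in a few lines of Python. Only the 158 mm total and the first two measurements (22 mm and 28 mm) come from the text; the thicknesses of books 3 to 5 are hypothetical values made up for illustration, and `remaining_after_each` is just an illustrative helper name.

```python
def remaining_after_each(total, measured):
    """Return the thickness still unaccounted for after each measurement."""
    remaining = []
    left = total
    for value in measured:
        left -= value
        remaining.append(left)
    return remaining


total_thickness_mm = 158  # from the text: total for the 6 books
n_books = 6

# 22 and 28 are from the text; 25, 30 and 27 are hypothetical.
measured_mm = [22, 28, 25, 30, 27]

left_over = remaining_after_each(total_thickness_mm, measured_mm)
print(left_over)  # [136, 108, 83, 53, 26]

# After 5 free measurements, the 6th book is forced: no freedom left.
sixth_book_mm = left_over[-1]
degrees_of_freedom = len(measured_mm)
print(sixth_book_mm, degrees_of_freedom)  # 26 5
```

Whatever values you pick for the first five books, the sixth is always the total minus their sum, so only n-1 of the n values are free to vary.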
+
+This means that once you have measured (n-1) objects, the nth object has no freedom to vary. Therefore, the number of degrees of freedom is only (n-1) and not n.
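As for why this is useful, here is a small sketch that checks the effect exactly rather than by simulation. It assumes a hypothetical population (the six faces of a fair die) and independent draws with replacement, lists every possible sample of size 3, and averages the two variance estimates. None of these names or numbers come from the original notes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
true_variance = sum((x - mu) ** 2 for x in population) / N  # divide by N


def variance_estimate(sample, denominator):
    """Sum of squared deviations from the sample mean, over `denominator`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / denominator


n = 3
# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(variance_estimate(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance_estimate(s, n - 1) for s in samples) / len(samples)

print(true_variance)        # 35/12
print(avg_div_n)            # 35/18
print(avg_div_n_minus_1)    # 35/12
```

Under these assumptions, dividing by n underestimates the population variance on average (by a factor of (n-1)/n), while dividing by n-1 recovers it exactly.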
+
### 7. What are the assumptions of the normal distribution? Why is it useful?
### 8. What are the different approaches to outlier detection? How will you handle the outliers? Why is it useful?
### 9. Where is RMSE a bad choice? How do we solve this?