### Part 1. Decision trees

#### Question 1. Which of these problems does not fall into 3 main types of ML tasks: classification, regression, and clustering?

- Identifying a topic of a live-chat with a customer.  
Classification.


- Grouping news into topics.  
Clustering.


- Predicting LTV (Life-Time Value) - the amount of money spent by a customer in a certain large period of time.  
Regression: the target is a continuous amount of money.


- **Listing top products that a user is prone to buy (based on his/her click history).**  
Ranking (recommendation), which is none of the three types.

___

#### Question 2. Maximal possible entropy is achieved when all states are equally probable (prove it yourself for a system with 2 states with probabilities p and 1−p). What's the maximal possible entropy of a system with N states? (here all logs are with base 2)

- $N \log N$


- $-\log N$


- $\log N$ **+**


- $-N \log N$

___

$$ \Large S = -\sum_{i=1}^{N}p_i \log_2{p_i} $$

Entropy is maximal when all $N$ states are equally probable, as with $N$ distinct objects. The probability of each state is then $\large p_i = p = \frac{1}{N}$, which gives:
$$ S = -\sum_{i=1}^{N}p_i \log_2{p_i} = -\sum_{i=1}^{N} \frac{1}{N} \log_2{\frac{1}{N}} = - \frac{N}{N} \log_2{\frac{1}{N}} =  \log_2{N}$$
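
As a quick numerical sanity check of this result, the sketch below evaluates the entropy formula directly. The helper `entropy`, the grid over $p$, and the choice $N = 8$ are illustrative assumptions, not part of the quiz.

```python
import numpy as np


def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log2(probs)))


# Two states: scan p over a grid and find where the entropy peaks.
grid = np.linspace(0.01, 0.99, 99)
best_p = grid[np.argmax([entropy([p, 1 - p]) for p in grid])]
print(best_p)

# N equally probable states: the entropy should match log2(N).
N = 8  # illustrative choice
uniform_entropy = entropy(np.full(N, 1 / N))
print(uniform_entropy, np.log2(N))

# A skewed distribution over the same N states has lower entropy.
skewed = np.array([0.65] + [0.05] * (N - 1))
print(entropy(skewed))
```

The two-state scan peaks at $p = 0.5$, which matches the "prove it yourself" hint in the question.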

___

#### Question 3. In Topic 3 article, toy example with 20 balls, what's the information gain of splitting 20 balls in 2 groups based on the condition X <= 8?

- ~ 0.1


- ~ 0.01


- ~ 0.001


- ~ **0.0001 +**

<img src="https://nbviewer.jupyter.org/github/Yorko/mlcourse.ai/blob/master/img/topic3_entropy_balls1.png" />




$$ S_0 = -\frac{9}{20}\log_2{\frac{9}{20}}-\frac{11}{20}\log_2{\frac{11}{20}} \approx 1 $$

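$$ \Large IG(Q) = S_0 - \sum_{i=1}^{q}\frac{N_i}{N}S_i, $$
$$  S_i = -\sum_{k=1}^{K}p_{ik} \log_2{p_{ik}} $$

Here $q$ is the number of groups after the split, $N_i$ is the number of balls in group $i$, and $p_{ik}$ is the fraction of balls of color $k$ in group $i$ (with $K = 2$ colors in this example).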

**X <= 8:**
- First group: 4 blue and 5 orange


- Second group: 5 blue and 6 orange

In [1]:
import numpy as np

S_0 = - ( 9/20 * np.log2(9/20)  + 11/20 * np.log2(11/20) )
S_0

0.9927744539878083

In [2]:
S_l = - ( 4/9 * np.log2(4/9) + 5/9 * np.log2(5/9) )
print(S_l)

S_r = - ( 5/11 * np.log2(5/11) + 6/11 * np.log2(6/11) )
print(S_r)

IG = S_0 - 9/20 * S_l  - 11/20 * S_r
print(round(IG, 4))

0.9910760598382222
0.9940302114769565
0.0001
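
The same calculation can be wrapped into a reusable function that takes class counts instead of hand-typed fractions. This is only a sketch: the names `entropy_from_counts` and `information_gain` are made up for illustration, and the counts are the ones listed above for the split X <= 8.

```python
import numpy as np


def entropy_from_counts(counts):
    """Shannon entropy (bits) of a group described by its per-class counts."""
    counts = np.asarray(counts, dtype=float)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(parent_counts, groups_counts):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n_total = float(np.sum(parent_counts))
    weighted = sum(
        np.sum(group) / n_total * entropy_from_counts(group)
        for group in groups_counts
    )
    return entropy_from_counts(parent_counts) - weighted


# 9 blue and 11 orange balls, split into (4, 5) and (5, 6) by X <= 8.
ig = information_gain([9, 11], [[4, 5], [5, 6]])
print(round(ig, 4))
```

A split that separates the colors perfectly, such as `[[9, 0], [0, 11]]`, would return the full parent entropy, which is a handy check that the function behaves as expected.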


___

#### Question 4. In a toy binary classification task, there are $d$ features $x_1 \ldots x_d$, but the target $y$ depends only on $x_1$ and $x_2$: $y = [\frac{x_1^2}{4} + \frac{x_2^2}{9} \leq 16]$, where $[\cdot]$ is an indicator function. All the features $x_3 \ldots x_d$ are noisy, i.e. they do not influence the target at all. Obviously, machine learning algorithms should perform almost perfectly in this task, where the target is a simple function of the input features.

If we train sklearn's DecisionTreeClassifier for this task, which parameters have a crucial effect on accuracy (crucial meaning that if these parameters are set incorrectly, accuracy can drop significantly)? Select all that apply (to get credit, you need to select all that apply; there are no partially correct answers).


- max_features


- criterion


- **min_samples_leaf +**


- **max_depth +**
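
One way to probe the marked parameters is a small synthetic experiment. Everything specific below is an assumption made for illustration: the sample size, the number of noisy features, the sampling range [-15, 15], and the deliberately poor values `max_depth=1` and `min_samples_leaf=500`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(17)
n_samples, n_features = 3000, 10  # assumed sizes; only x_1 and x_2 matter

# All features are uniform on [-15, 15]; the target is the ellipse indicator.
X = rng.uniform(-15, 15, size=(n_samples, n_features))
y = (X[:, 0] ** 2 / 4 + X[:, 1] ** 2 / 9 <= 16).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=17
)

# Default tree versus two deliberately restrictive settings.
settings = {
    "defaults": {},
    "max_depth=1": {"max_depth": 1},
    "min_samples_leaf=500": {"min_samples_leaf": 500},
}

scores = {}
for name, params in settings.items():
    clf = DecisionTreeClassifier(random_state=17, **params)
    scores[name] = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Both restrictive settings leave the tree with too few leaves to trace the elliptical boundary, so their hold-out accuracy should fall well below that of the default tree.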

___

#### Question 5.
- Load iris data with sklearn.datasets.load_iris. 


- Train a decision tree with this data, specifying params max_depth=4 and random_state=17 (all other arguments shall be left unchanged).


- Use all available 150 instances to train a tree (do not perform train/validation split). 


- Visualize the fitted decision tree, see topic 3 for examples. 


- Let's call a leaf in a tree pure if it contains instances of only one class. How many pure leaves are there in this tree?


    - 6
    - **7 +**
    - 8
    - 9

In [3]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import pydot

In [4]:
feature_names = load_iris()['feature_names']
feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

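In [13]:
# Load the dataset once and fit on all 150 instances (no train/validation split)
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=4,
                              random_state=17).fit(iris.data, iris.target)

# Export the fitted tree to Graphviz DOT format
export_graphviz(tree,
                feature_names=feature_names,
                out_file="some.dot", filled=True
               )

# Render the DOT file to a PNG image
(graph,) = pydot.graph_from_dot_file("some.dot")
graph.write_png('somefile.png')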

<img src="somefile.png">
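
Besides reading the picture, the pure leaves can be counted from the fitted estimator itself. The sketch below assumes that a leaf is a node with `children_left == -1` in the `tree_` attribute and that a pure node has zero Gini impurity; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=4, random_state=17)
clf.fit(iris.data, iris.target)

fitted = clf.tree_
# A node is a leaf when it has no left child (marked by -1).
is_leaf = fitted.children_left == -1
# A node is pure when its impurity is zero, i.e. it holds a single class.
is_pure = np.isclose(fitted.impurity, 0.0)

n_leaves = int(is_leaf.sum())
n_pure_leaves = int((is_leaf & is_pure).sum())
print(f"leaves: {n_leaves}, pure leaves: {n_pure_leaves}")
```

Counting this way avoids misreading small boxes in the rendered image, and the printed number can be compared against the option marked above.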

___

### Part 2. Ensembles and Random Forest

#### Question 6. 
- There are 7 jurors in the courtroom. Each of them individually can correctly determine whether the defendant is guilty or not with 80% probability. 

How likely is it that the jury reaches the correct verdict jointly if the decision is made by majority voting?

- 20.97%
- 80.00%
- 83.70%
- **96.66% +**

In [30]:
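from scipy.special import binom

n = 7
# The majority is still correct when at most n // 2 = 3 jurors are wrong
negative = n // 2

p_neg = 0.2  # probability that a single juror is wrong
p_pos = 0.8  # probability that a single juror is right

# Sum the binomial probabilities of exactly k wrong jurors for k = 0..3
prob = 0
for k in range(negative + 1):
    prob += binom(n, k) * np.power(p_neg, k) * np.power(p_pos, n - k)

print(prob)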

0.9666560000000004
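
The same number can be cross-checked in two independent ways: with the binomial survival function from `scipy.stats`, and with a Monte Carlo simulation of many juries. The number of simulated trials and the seed are arbitrary choices for this sketch.

```python
import numpy as np
from scipy.stats import binom

n_jurors, p_correct = 7, 0.8
majority = n_jurors // 2 + 1  # at least 4 of 7 jurors must be right

# Closed form: P(correct votes >= 4) = P(correct votes > 3).
exact = binom.sf(majority - 1, n_jurors, p_correct)
print(exact)

# Simulation: draw the number of correct jurors for many independent juries.
rng = np.random.default_rng(17)
n_trials = 200_000  # arbitrary, large enough for about two stable decimals
correct_votes = rng.binomial(n_jurors, p_correct, size=n_trials)
simulated = np.mean(correct_votes >= majority)
print(simulated)
```

The simulated share should agree with the closed-form value up to sampling noise.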


___

**Question 7.** In [Topic 5, part 2](https://mlcourse.ai/articles/topic5-part2-rf/), section 2. "Comparison with Decision Trees and Bagging" we show how bagging and Random Forest improve classification accuracy as compared to a single decision tree. Which of the following is a better explanation of the visual difference between decision boundaries built by a single decision tree and those built by ensemble models?

 1. Ensembles ignore some of the features. Thus picking only important ones, they build a smoother decision boundary 
 
 
 2. **Some of the classification rules built by a decision tree can be applied only to a small number of training instances +**
 
 
 3. When fitting a decision tree, if two potential splits are equally good in terms of information criterion, then a random split is chosen. This leads to some randomness in building a decision tree. Therefore its decision boundary is so jagged
 
 ___
 
 

**Question 8.** Random Forest learns a coefficient for each input feature, which shows how much this feature influences the target feature. True/False?

I did not understand the question.

 1. True
 2. **False +**
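
A short sketch may make the distinction concrete: a fitted linear model exposes learned coefficients in `coef_`, while a fitted Random Forest has no such attribute and only offers `feature_importances_`. The iris dataset and the hyperparameters here are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

iris = load_iris()
forest = RandomForestClassifier(n_estimators=50, random_state=17)
forest.fit(iris.data, iris.target)
linear = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# The linear model has per-feature coefficients, the forest does not.
print(hasattr(linear, "coef_"), hasattr(forest, "coef_"))

# The forest reports impurity-based importances instead.
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

The importances are non-negative and sum to 1, so unlike coefficients they carry no sign and do not say in which direction a feature moves the prediction.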
 ___

**Question 9.** Suppose we fit `RandomForestRegressor` to predict age of a customer (a real task actually, good for targeting ads), and the maximal age seen in the dataset is 98 years. Is it possible that for some customer in future the model predicts his/her age to be 105 years?

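A forest cannot extrapolate: each tree predicts an average of training targets from one of its leaves, so the averaged prediction never exceeds the maximal age seen in training (98 here).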

 1. Yes
 
 
 2. **No +**
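
The lack of extrapolation is easy to observe on synthetic data. In this sketch the single feature, the linear relation between the feature and age, and the "future" inputs far outside the training range are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(17)

# Hypothetical training set: one feature on [0, 1], age grows from 18 to 98.
x_train = rng.uniform(0, 1, size=(500, 1))
age_train = 18 + 80 * x_train.ravel()

forest = RandomForestRegressor(n_estimators=100, random_state=17)
forest.fit(x_train, age_train)

# "Future" customers whose feature lies far beyond the training range.
x_future = np.array([[1.5], [3.0], [10.0]])
pred = forest.predict(x_future)

print("max age in training data:", age_train.max())
print("predictions outside the range:", pred)
```

Even though the underlying trend would give ages far above 98 for these inputs, every prediction stays at or below the training maximum.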
 
 ___

**Question 10.** Select all statements supporting advantages of Random Forest over decision trees (some statements might be true but not about Random Forest's pros, don't select those).

 1. Random Forest is easier to train in terms of computational resources
 
 
 2. Random Forest typically requires more RAM than a single decision tree (true, but this is a drawback of Random Forest)
 
 
 3. **Random Forest typically achieves better metrics in classification/regression tasks +**
 
 
 4. A single decision tree's prediction is much easier to interpret (true, as the lecture notes, but this favors the single tree)
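
Statements 2 and 3 can be probed empirically. The sketch below compares cross-validated accuracy and the total number of stored tree nodes, used here as a rough proxy for memory. The breast cancer dataset and all hyperparameters are assumptions, and the accuracy gap will differ from dataset to dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=17)
forest = RandomForestClassifier(n_estimators=100, random_state=17)

# Mean 5-fold cross-validated accuracy of each model.
tree_acc = cross_val_score(tree, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"tree accuracy: {tree_acc:.3f}, forest accuracy: {forest_acc:.3f}")

# Node counts after fitting on the full data: a rough proxy for memory use.
tree_nodes = tree.fit(X, y).tree_.node_count
forest_nodes = sum(est.tree_.node_count for est in forest.fit(X, y).estimators_)
print(f"tree nodes: {tree_nodes}, forest nodes: {forest_nodes}")
```

The forest stores one tree per estimator, so its node count is far larger than that of the single tree.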