#### 1. What is the approximate depth of a Decision Tree trained (without restrictions) on a training set with 1 million instances?

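The depth will not get anywhere near 1 million. A trained Decision Tree is generally close to balanced, so if it grows one leaf per training instance its depth is roughly log2(m) = log2(10^6) ≈ 20 (in practice a bit more, since the tree won't be perfectly balanced). It will still overfit the training instances until its leaves are pure (0 Gini impurity), and will not generalize well at all.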
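As a quick sanity check on the size of that number, here is a minimal sketch that assumes a perfectly balanced binary tree with one leaf per training instance. Real trees are not perfectly balanced, so this is a lower-bound style estimate rather than what Scikit-Learn would actually produce.

```python
import math

# Assumption: a perfectly balanced binary tree with one leaf per instance.
m = 1_000_000

# A balanced binary tree of depth d has at most 2**d leaves,
# so we need the smallest d with 2**d >= m.
balanced_depth = math.ceil(math.log2(m))

print(balanced_depth)
```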

#### 2. Is a node's Gini impurity generally lower or greater than its parent's? Is it *generally* lower/greater or *always* lower/greater?

Gini impurity is generally lower in a child node than in its parent, because CART picks the split that minimizes the weighted sum of the children's impurities. It is not always lower, though: the cost function only looks at that weighted sum, so one child can end up with a higher impurity than its parent as long as the other child's decrease more than makes up for it. The CART algorithm "stops recursing once it reaches the maximum depth... or it cannot find a split that will reduce impurity," and in that case the node does not split further and stays a leaf.
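A small worked example makes the "generally, not always" distinction concrete. The sketch below uses a hypothetical node holding four instances of class A and one of class B, laid out along one feature as A, B, A, A, A, and splits it after the second instance. The `gini` helper is my own, not a Scikit-Learn function.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Hypothetical node: instances ordered along a single feature.
parent = ["A", "B", "A", "A", "A"]
left, right = parent[:2], parent[2:]  # split after the second instance

parent_gini = gini(parent)
left_gini = gini(left)
right_gini = gini(right)

# CART compares the size-weighted sum of the children's impurities.
weighted = (len(left) * left_gini + len(right) * right_gini) / len(parent)

print(parent_gini, left_gini, right_gini, weighted)
```

The left child is more impure than the parent, yet the split is still worthwhile because the weighted impurity of the two children is lower than the parent's.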

#### 3. If a Decision Tree is overfitting the training set, is it a good idea to try decreasing `max_depth`?
Yes. `max_depth` acts much like the degree in Polynomial Regression: it limits how complex the model can get. Reducing it constrains (regularizes) the tree and can reduce overfitting. Other hyperparameters can help as well, for example increasing `min_samples_split` or `min_samples_leaf`, or decreasing `max_leaf_nodes`.

#### 4. If a Decision Tree is underfitting the training set, is it a good idea to try scaling the input features?

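No. Decision Trees don't care whether the input features are scaled or centered, so scaling won't change the fit. It's one of the benefits of using Decision Trees, in that you don't need to preprocess your data much at all. To fix underfitting, loosen the model's constraints instead (for example, increase `max_depth`).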
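To see why scaling makes no difference, here is a minimal sketch of a single-feature split search. It is a simplified stand-in for CART, not Scikit-Learn's implementation, and the `gini` and `best_split` helpers and the toy data are my own. Rescaling the feature moves the threshold, but the instances sent left and right stay the same.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, left_indices) minimizing the weighted Gini impurity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or cost < best[0]:
            best = (cost, (lo + hi) / 2, frozenset(order[:k]))
    return best[1], best[2]

# Toy data (hypothetical): one feature, two classes.
values = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
labels = [0, 0, 0, 1, 1, 1]
scaled = [1000 * v + 5 for v in values]  # rescale and shift the feature

t_raw, left_raw = best_split(values, labels)
t_scaled, left_scaled = best_split(scaled, labels)

print(t_raw, sorted(left_raw))
print(t_scaled, sorted(left_scaled))
```

Because a split only depends on the ordering of the feature values, any rescaling that preserves that ordering yields the same partition of the instances.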

#### 5. If it takes one hour to train a Decision Tree on a training set containing 1 million instances, roughly how much time will it take another Decision Tree on a training set containing 10 million instances?

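The training algorithm has a complexity of O(n * m log(m)), where n is the number of features and m is the number of instances. n is the same for both training sets, so it cancels out of the ratio. Using base-10 logs, the cost of training on 1 million instances is 1e6 * log(1e6) = 6,000,000, and the cost of training on 10 million instances is 1e7 * log(1e7) = 70,000,000. (The base of the log doesn't affect the ratio.)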

70,000,000 / 6,000,000 = 11.66667

So training on 10 million instances will take roughly 11.67 times as long, or about 11.67 hours. Yikes.
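The same ratio can be checked in a couple of lines. This is a rough sketch that assumes the O(n * m log(m)) cost model holds exactly and that the number of features is the same for both sets; `training_time_ratio` is a hypothetical helper name.

```python
import math

def training_time_ratio(m_small, m_large):
    """Ratio of m*log(m) costs; the feature count n cancels out."""
    return (m_large * math.log(m_large)) / (m_small * math.log(m_small))

ratio = training_time_ratio(1_000_000, 10_000_000)
estimated_hours = 1.0 * ratio  # the smaller set took one hour

print(round(ratio, 2), round(estimated_hours, 2))
```

The natural log is used here instead of base 10, which gives the same ratio because the change-of-base factor cancels.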

#### 6. If your training set contains 100,000 instances, will setting `presort=True` speed up training?

"For small training sets (less than a few thousand instances), Scikit-Learn can speed up training by presorting the data (set `presort=True`), but this slows down training considerably for larger training sets."

No, it will not speed it up, and in fact, will make it worse.

#### 7. Train and fine-tune a Decision Tree for the moons dataset.

a. Generate a moons dataset using `make_moons(n_samples=10000, noise=0.4)`.

In [2]:
from sklearn.datasets import make_moons
x, y = make_moons(n_samples=10000, noise=0.4)

b. Split into a training set and a test set using `train_test_split()`.

In [3]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)

c. Use grid search with cross-validation (with the help of the `GridSearchCV` class) to find good hyperparameter values for a `DecisionTreeClassifier`. Hint: try various values for `max_leaf_nodes`.

In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


param_grid = [
    {'max_leaf_nodes': [4, 10, 20], 'max_depth': [None, 2, 4]}
]
moon_clf = DecisionTreeClassifier()

# Score with accuracy (this is a classifier) and search on the training set only
grid_search = GridSearchCV(moon_clf, param_grid, cv=5, scoring='accuracy')
grid_search.fit(x_train, y_train)

GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=DecisionTreeClassifier(class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None,
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort=False, random_state=None,
                                              splitter='best'),
             iid='warn', n_jobs=None,
             param_grid=[{'max_depth': [None, 2, 4],
                          'max_leaf_nodes': [4, 10, 20]}],
             pre_dispatc

In [7]:
grid_search.best_params_

{'max_depth': None, 'max_leaf_nodes': 4}

In [8]:
moon_clf = grid_search.best_estimator_
# Train on the training set only, so the test set stays unseen
moon_clf.fit(x_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=4,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')

In [None]:
from sklearn.metrics import accuracy_score

# Evaluate on the held-out test set
pred = moon_clf.predict(x_test)
accuracy_score(y_test, pred)