### 1. What is the estimated depth of a Decision Tree trained (unrestricted) on a one million instance training set?

Estimated Depth of a Decision Tree:
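An unrestricted Decision Tree keeps splitting until every leaf is pure, so on noisy data it ends up with roughly one
training instance per leaf. A well-balanced binary tree with m leaves has a depth of about log2(m), and log2(10^6) ≈ 20,
so the tree will be roughly 20 levels deep, and in practice somewhat deeper because it will generally not be perfectly
balanced. The exact depth depends on the complexity of the data, not on the number of features: CART can split on the
same feature again at deeper levels, so it never "runs out" of features.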
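As a rough sketch (assuming scikit-learn 0.21 or later, which provides get_depth() and get_n_leaves()), the snippet below computes log2(10^6) and then checks the balanced-tree bound on a smaller, seeded 10,000-instance moons dataset used purely as a stand-in for the million-instance one. A binary tree of depth d has at most 2^d leaves, so its depth can never be lower than log2 of its leaf count.

```python
import math

from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Depth of a perfectly balanced binary tree with one leaf per instance
m = 1_000_000
ideal_depth = math.log2(m)
print(f"log2({m:,}) = {ideal_depth:.2f}")  # log2(1,000,000) = 19.93

# Sanity check on a smaller stand-in dataset: an unrestricted tree is never
# shallower than a balanced tree with the same number of leaves
X, y = make_moons(n_samples=10_000, noise=0.4, random_state=42)
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
n_leaves = tree.get_n_leaves()
depth = tree.get_depth()
print(f"leaves: {n_leaves}, depth: {depth}, "
      f"balanced-tree depth: {math.ceil(math.log2(n_leaves))}")
```

Comparing the fitted depth with the balanced-tree figure shows how far the tree is from perfect balance.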

### 2. Is the Gini impurity of a node usually lower or higher than that of its parent? Is it always lower/greater, or is it usually lower/greater?

Gini Impurity Comparison:
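A node's Gini impurity is generally lower than its parent's, because the CART training algorithm chooses each split to
minimize the weighted sum of its children's Gini impurities. It is not always lower, though: one child can have a higher
impurity than its parent, as long as this is more than compensated by a decrease in the other child's impurity.
For example, a parent node with four class-A instances and one class-B instance has a Gini impurity of
1 - (4/5)^2 - (1/5)^2 = 0.32. If it is split into a child holding one A and one B (impurity 0.5) and a child holding
three A instances (impurity 0), the first child is more impure than its parent, yet the weighted impurity of the
children, (2/5)(0.5) + (3/5)(0) = 0.2, is still lower than 0.32.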
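The worked example above can be checked numerically. This is a minimal sketch in plain Python; the helper gini() is an illustrative name, not a library function, and it implements the usual definition 1 - sum(p_k^2).

```python
def gini(class_counts):
    """Gini impurity 1 - sum(p_k^2) of a node, given its per-class counts (illustrative helper)."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

parent = [4, 1]   # 4 instances of class A, 1 of class B
left = [1, 1]     # child with one A and one B
right = [3, 0]    # child with three A

g_parent, g_left, g_right = gini(parent), gini(left), gini(right)

# CART's split cost: each child's impurity weighted by its share of the parent's instances
n = sum(parent)
weighted = sum(left) / n * g_left + sum(right) / n * g_right

print(f"parent={g_parent:.2f} left={g_left:.2f} right={g_right:.2f} weighted={weighted:.2f}")
# parent=0.32 left=0.50 right=0.00 weighted=0.20
```

The left child's impurity rises above the parent's, but the weighted total still drops, which is all CART requires of a split.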

### 3. Is it a good idea to reduce max_depth if a Decision Tree is overfitting the training set? Explain.

Reducing Max Depth to Prevent Overfitting:
Yes. If a Decision Tree is overfitting the training set, reducing max_depth is a good idea: it constrains (regularizes)
the model, which keeps it from fitting the noise in the training data and usually helps it generalize better to new,
unseen instances.
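A minimal sketch of this effect on a seeded moons dataset (the same generator used in exercise 7). The cap max_depth=5 is an arbitrary illustrative value, not a tuned one.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Seeded moons data so the comparison is reproducible
X, y = make_moons(n_samples=10_000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scores = {}
# max_depth=None grows the tree until its leaves are pure; 5 is an illustrative cap
for max_depth in (None, 5):
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    clf.fit(X_train, y_train)
    scores[max_depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"max_depth={max_depth}: train={scores[max_depth][0]:.3f}, "
          f"test={scores[max_depth][1]:.3f}")
```

The unrestricted tree should fit the training set perfectly but score lower on the test set than the depth-limited one. In practice max_depth would be tuned with cross-validation rather than picked by hand.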

### 4. Is it a good idea to try scaling the input features if a Decision Tree underfits the training set? Explain.

Scaling Input Features for Decision Trees:
Scaling the input features will not help. Decision Trees are insensitive to feature scaling and centering, because each
split only compares one feature against a threshold, and any transformation that preserves the order of the values yields
the same splits. If a Decision Tree underfits the training set, reduce its regularization instead: increase max_depth,
decrease min_samples_split or min_samples_leaf, or increase max_leaf_nodes.
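A small sketch of this invariance, assuming scikit-learn's StandardScaler as the scaling step. Two trees with the same hyperparameters and seed are trained on raw and standardized copies of a seeded moons dataset; max_depth=3 is an illustrative choice that keeps predictions from trivially matching the labels.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

# Identical hyperparameters and seed; only the scale of the inputs differs.
# max_depth=3 keeps the trees small so their predictions are not simply equal to y.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_scaled, y)

# Same feature tested at every node, and same predictions on the training points
same_features = np.array_equal(tree_raw.tree_.feature, tree_scaled.tree_.feature)
same_predictions = bool((tree_raw.predict(X) == tree_scaled.predict(X_scaled)).all())
print("same split features:", same_features)
print("same predictions:", same_predictions)
```

Only the threshold values change (they are expressed in the scaled units); the partitions of the data, and hence the predictions, stay the same.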

### 5. How much time will it take to train another Decision Tree on a training set of 10 million instances if it takes an hour to train a Decision Tree on a training set with 1 million instances?

Training Time Estimate for Larger Dataset:
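The computational complexity of training a Decision Tree is O(n × m log(m)), where n is the number of features and m is
the number of instances. Multiplying the training set size by 10 therefore multiplies the training time by
K = (n × 10m × log(10m)) / (n × m × log(m)) = 10 × log(10m) / log(m) (the base of the logarithm cancels out).
With m = 10^6, K = 10 × 7/6 ≈ 11.7, so if 1 million instances take 1 hour, 10 million instances should take roughly
11.7 hours, not just 10. The exact time will still vary with the implementation and system resources.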
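The arithmetic above as a short sketch; training_time_ratio() is a hypothetical helper that simply evaluates the ratio of the O(n × m log(m)) costs, assuming the number of features n stays the same.

```python
import math

def training_time_ratio(m_old, m_new):
    """Ratio of O(n * m * log(m)) training costs; n (number of features) cancels out."""
    return (m_new * math.log(m_new)) / (m_old * math.log(m_old))

ratio = training_time_ratio(1_000_000, 10_000_000)
hours_for_1m = 1.0  # measured time for 1 million instances, from the question
print(f"ratio = {ratio:.2f}, estimated time = {hours_for_1m * ratio:.1f} hours")
# ratio = 11.67, estimated time = 11.7 hours
```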

### 6. Will setting presort=True speed up training if your training set has 100,000 instances?

Effect of presort=True on Training Speed:
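Presorting the data speeds up training only for small training sets (less than a few thousand instances). With
100,000 instances, setting presort=True will considerably slow down training, because the cost of presorting the data
outweighs the time it saves when searching for the best splits. Note that presort was deprecated in scikit-learn 0.22
and removed in 0.24, so recent versions no longer accept this parameter.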



### 7. Follow these steps to train and fine-tune a Decision Tree for the moons dataset:

a. Generate a moons dataset using make_moons(n_samples=10000, noise=0.4).

b. Split the dataset into a training set and a test set using train_test_split().

c. Use grid search with cross-validation (with the GridSearchCV class) to find good hyperparameter values for a
   DecisionTreeClassifier. Try various values for max_leaf_nodes.
   
d. Train the model on the full training set using these hyperparameters, and measure its performance on the test set.
   You should get roughly 85% to 87% accuracy.


```python
# Training and Fine-Tuning a Decision Tree:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# a. Create the moons dataset (no random_state, so exact scores vary between runs)
X, y = make_moons(n_samples=10000, noise=0.4)

# b. Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# c. Grid search with 5-fold cross-validation over max_leaf_nodes
param_grid = {'max_leaf_nodes': [10, 20, 30, 40, 50]}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# d. With refit=True (the default), GridSearchCV has already retrained the best
#    model on the entire training set; evaluate it on the test set
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)
test_accuracy  # Expected accuracy between 85% and 87%
```

0.865

### 8. Follow these steps to grow a forest:

a. Continuing the previous exercise, generate 1,000 subsets of the training set, each containing 100 instances
   selected at random. You can do this with Scikit-Learn's ShuffleSplit class.
   
b. Train one Decision Tree on each subset, using the best hyperparameter values found in the previous exercise, and
   evaluate these 1,000 Decision Trees on the test set. Since they were trained on much smaller sets, these trees will
   likely perform worse than the first Decision Tree, achieving only about 80% accuracy.
   
c. Now comes the magic. For each test set instance, generate the predictions of the 1,000 Decision Trees and keep only
   the most frequent prediction (you can use SciPy's mode() function for this). This gives you majority-vote
   predictions over the test set.
   
d. Evaluate these predictions on the test set: you should obtain a slightly higher accuracy than the first model
   (about 0.5 to 1.5 percentage points higher). You've successfully trained a Random Forest classifier!



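```python
# Growing a Forest:

from sklearn.model_selection import ShuffleSplit
from scipy.stats import mode
import numpy as np

# a. Create 1,000 random subsets of the training set, 100 instances each
n_subsets = 1000
n_instances = 100
ss = ShuffleSplit(n_splits=n_subsets, train_size=n_instances, random_state=42)

subsets = []
for train_index, _ in ss.split(X_train):
    subsets.append((X_train[train_index], y_train[train_index]))

# b. Train one Decision Tree per subset with the best max_leaf_nodes from the grid search,
#    and evaluate each tree on the test set
best_max_leaf_nodes = grid_search.best_params_['max_leaf_nodes']
all_trees_predictions = []
tree_accuracies = []
for X_subset, y_subset in subsets:
    tree_clf = DecisionTreeClassifier(max_leaf_nodes=best_max_leaf_nodes, random_state=42)
    tree_clf.fit(X_subset, y_subset)
    predictions = tree_clf.predict(X_test)
    all_trees_predictions.append(predictions)
    tree_accuracies.append(np.mean(predictions == y_test))
mean_tree_accuracy = np.mean(tree_accuracies)  # Expected to be around 80%

# c. Majority vote: most frequent prediction per test instance (axis=0 runs over the trees).
#    keepdims=False requires SciPy >= 1.9 and avoids its FutureWarning about the default.
majority_votes, _ = mode(np.array(all_trees_predictions), axis=0, keepdims=False)

# d. Evaluate the majority-vote predictions
forest_accuracy = np.mean(majority_votes.ravel() == y_test)
forest_accuracy  # Expected to be slightly higher than the single Decision Tree's accuracy
```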


0.868