1. What is the estimated depth of a Decision Tree trained (unrestricted) on a training set with one million instances?

ANS-
An unrestricted Decision Tree keeps splitting until every leaf is pure. In the extreme case, that means as few as one training instance per leaf. Each level of a roughly balanced binary tree halves the number of instances per node. A tree whose m leaves each hold a single instance is therefore about log2(m) levels deep. With m = 1,000,000, log2(10^6) ≈ 20, so the tree should be around 20 levels deep. Real trees are rarely perfectly balanced, so the actual depth is usually somewhat greater.

Without any restrictions, such a tree will very likely overfit the training data and generalize poorly to new, unseen data. It is therefore common to regularize it. Typical options are limiting max_depth or max_leaf_nodes, requiring a minimum number of samples per leaf, or pruning the tree after training.
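The log2 estimate takes one line to compute. The sketch below assumes a roughly balanced tree with one instance per leaf. Its second half fits an unrestricted tree on a 10,000-instance moons dataset, an illustrative stand-in chosen here rather than part of the question. A binary tree with L leaves always has depth of at least ceil(log2(L)), and an unbalanced tree ends up deeper than that bound.

```python
import math

from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Balanced-tree estimate: one instance per leaf, each level halves the node size
m = 1_000_000
balanced_depth = math.log2(m)
print(f"log2({m:,}) = {balanced_depth:.1f}")

# Empirical check on a smaller, noisy dataset (illustrative assumption)
X, y = make_moons(n_samples=10_000, noise=0.4, random_state=42)
tree = DecisionTreeClassifier(random_state=42).fit(X, y)
n_leaves = tree.get_n_leaves()
print("leaves:", n_leaves)
print("balanced lower bound on depth:", math.ceil(math.log2(n_leaves)))
print("actual depth:", tree.get_depth())
```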

2. Is the Gini impurity of a node usually lower or higher than that of its parent? Is it always lower/greater, or only usually lower/greater?

ANS-
A node's Gini impurity is generally lower than its parent's. This follows from the CART training algorithm's cost function. At each node, the algorithm picks the split that minimizes the weighted sum of the child nodes' Gini impurities, and that weighted sum is never greater than the parent's impurity.

However, it is not always lower. A child node can have a higher Gini impurity than its parent, as long as the increase is more than compensated for by a decrease in the other child's impurity.

For example, consider a node containing four instances of class A and one instance of class B. Its Gini impurity is 1 - (1/5)^2 - (4/5)^2 = 0.32. Suppose the data is one-dimensional and the instances are ordered A, B, A, A, A. The best split is then after the second instance, producing two children:

- A, B, with Gini impurity 1 - (1/2)^2 - (1/2)^2 = 0.5, which is higher than the parent's.
- A, A, A, with Gini impurity 0.

The weighted impurity of the split is 2/5 × 0.5 + 3/5 × 0 = 0.2, which is lower than the parent's 0.32.

So a child's Gini impurity is usually lower than its parent's, but not always. The quantity that never increases is the weighted average over the children.
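The worked example can be checked in a few lines of Python. This is a minimal sketch: `gini` and `weighted_gini` are hypothetical helpers written for illustration, not scikit-learn functions. The code tries every split position along the single feature and keeps the one with the lowest weighted impurity, the way CART would.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted average of the children's impurities (CART's split cost)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Instances ordered along a single feature, as in the example above
parent = ["A", "B", "A", "A", "A"]

# Try every split position and keep the one with the lowest weighted impurity
best_pos = min(range(1, len(parent)), key=lambda i: weighted_gini(parent[:i], parent[i:]))
left, right = parent[:best_pos], parent[best_pos:]

print(f"parent Gini:  {gini(parent):.3f}")
print(f"left child:   {gini(left):.3f}")
print(f"right child:  {gini(right):.3f}")
print(f"weighted sum: {weighted_gini(left, right):.3f}")
```

The left child (0.500) ends up more impure than the parent (0.320). The split is still chosen because its weighted sum (0.200) is the lowest available.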

3. Is it a good idea to reduce max_depth if a Decision Tree is overfitting the training set? Explain.

ANS-
If a decision tree is overfitting the training set, it can be a good idea to reduce the maximum depth of the tree to prevent further overfitting and improve the model's generalization performance on new, unseen data.

When a decision tree is overfitting, it means that the tree has learned the noise in the training data, and the model's performance on new data is likely to be worse. Reducing the maximum depth of the tree can help to simplify the model and remove unnecessary complexity, which can help to reduce overfitting and improve generalization performance.

By reducing the maximum depth of the tree, we can limit the number of splits and prevent the tree from becoming too deep and complex. This can help to ensure that the model captures the most important patterns in the data without memorizing the noise in the training set.

However, it's worth noting that reducing the maximum depth too much can lead to underfitting, where the model is too simple and unable to capture the underlying patterns in the data. Therefore, it's important to find the right balance between model complexity and generalization performance by tuning the model's hyperparameters, such as the maximum depth of the tree, on a validation set.
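The tuning described above can be sketched as a loop over candidate max_depth values, comparing training and validation accuracy. The moons dataset, the 80/20 split, and the specific depth values are illustrative assumptions, not prescribed settings. A large gap between training and validation accuracy signals overfitting. Once the depth is too small, both scores drop, which is underfitting.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data: a noisy two-class problem
X, y = make_moons(n_samples=10_000, noise=0.4, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for depth in [None, 16, 8, 4, 2]:  # None = unrestricted
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    # (training accuracy, validation accuracy, actual depth reached)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_val, y_val), tree.get_depth())
    train_acc, val_acc, actual_depth = results[depth]
    print(f"max_depth={depth}: train={train_acc:.3f}, val={val_acc:.3f}, depth={actual_depth}")
```

In practice, GridSearchCV or cross_val_score would replace the single validation split used here.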

4. Is it a good idea to try scaling the input features if a Decision Tree underfits the training set? Explain.

ANS-
Scaling the input features will not help, because Decision Trees are insensitive to feature scaling and centering. Each split compares a single feature against a threshold. Rescaling or shifting a feature simply moves that threshold and produces the same partition of the training instances. Unlike many other machine learning algorithms, Decision Trees make no assumptions about the distribution or scale of the input features.

If a Decision Tree underfits the training set, the model is too constrained to capture the patterns in the data. Instead of scaling, reduce the regularization: for example, increase max_depth or max_leaf_nodes, or decrease min_samples_split or min_samples_leaf.
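Both points can be checked empirically. The sketch below trains the same tree on raw and on standardized features and measures how often the two trees agree on the test set. It then relaxes max_depth to show what actually reduces underfitting. The moons dataset and the depths 2 and 8 are illustrative assumptions. Agreement is measured rather than asserted to be exact, because scikit-learn stores features as 32-bit floats and rounding could in principle move a point that sits almost exactly on a threshold.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10_000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same (deliberately shallow) tree, trained on raw and on standardized features
scaler = StandardScaler().fit(X_train)
raw_tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(
    scaler.transform(X_train), y_train
)

# Fraction of test instances on which both trees predict the same class
agreement = np.mean(raw_tree.predict(X_test) == scaled_tree.predict(scaler.transform(X_test)))
print(f"Prediction agreement after scaling: {agreement:.3f}")

# What does help with underfitting: relaxing the constraint on tree size
shallow_train_acc = raw_tree.score(X_train, y_train)
deep_tree = DecisionTreeClassifier(max_depth=8, random_state=42).fit(X_train, y_train)
deep_train_acc = deep_tree.score(X_train, y_train)
print(f"Training accuracy: max_depth=2 -> {shallow_train_acc:.3f}, max_depth=8 -> {deep_train_acc:.3f}")
```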

5. How much time will it take to train another Decision Tree on a training set of 10 million instances if it takes an hour to train a Decision Tree on a training set with 1 million instances?

ANS-
Exact timings depend on the data, the implementation, and the hardware. The computational complexity of training a Decision Tree still gives a useful estimate. Training costs roughly O(n × m log2(m)), where n is the number of features and m is the number of training instances. Multiplying the training set size by 10 therefore multiplies the training time by

K = (n × 10m × log(10m)) / (n × m × log(m)) = 10 × log(10m) / log(m)

For m = 10^6, K = 10 × log(10^7) / log(10^6) = 10 × 7/6 ≈ 11.7. Training should therefore take roughly 11.7 hours. That is somewhat more than the 10 hours a simple linear extrapolation would suggest, because the log(m) factor also grows with the dataset.
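The same arithmetic as a small helper. This is a sketch: `training_time_ratio` is a hypothetical name, and it relies on the O(n × m log(m)) cost model described above. The number of features n cancels out, and the base of the logarithm does not matter because it also cancels.

```python
import math


def training_time_ratio(m_old, m_new):
    """Ratio of training times under an n * m * log(m) cost model (n cancels out)."""
    return (m_new * math.log(m_new)) / (m_old * math.log(m_old))


ratio = training_time_ratio(1_000_000, 10_000_000)
hours_for_1m = 1.0
print(f"Scaling factor: {ratio:.2f}")
print(f"Estimated training time: {hours_for_1m * ratio:.1f} hours")
```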

6. Will setting presort=True speed up training if your training set has 100,000 instances?

ANS-
No. Presorting the data speeds up training only for small training sets of up to a few thousand instances. With 100,000 instances, setting presort=True would considerably slow down training.

When presort=True, the algorithm sorts the training instances along each feature once, before it starts searching for splits, instead of sorting the instances at every node. For small datasets this saves time. For larger datasets, the overhead of presorting outweighs the savings.

Note that the presort parameter was deprecated in scikit-learn 0.22 and removed in 0.24, so it is not available in current versions. To reduce training time on large datasets, you can limit the size of the tree (e.g., with max_depth or max_leaf_nodes), train on a subsample, or reduce the number of features.

7. Follow these steps to train and fine-tune a Decision Tree for the moons dataset:

a. Generate a moons dataset using make_moons(n_samples=10000, noise=0.4).

b. Split the dataset into a training set and a test set using train_test_split().

c. Use grid search with cross-validation (with the GridSearchCV class) to find good hyperparameter values for a DecisionTreeClassifier. Try various values for max_leaf_nodes.

d. Train the model on the full training set using these hyperparameters, and measure its performance on the test set. You should get roughly 85% to 87% accuracy.



In [9]:
# Build the moons dataset with 10,000 samples and a noise level of 0.4:

from sklearn.datasets import make_moons

X, y = make_moons(n_samples=10000, noise=0.4)


In [10]:
#Divide the dataset into a training and a test set using train_test_split():

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [11]:
#Use GridSearchCV to find good hyperparameters for a DecisionTreeClassifier by trying different values for the max_leaf_nodes parameter:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'max_leaf_nodes': [2, 4, 8, 16, 32, 64]}

tree_clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(tree_clf, param_grid, cv=5)
grid_search.fit(X_train, y_train)


In [12]:
#Once you have found the best hyperparameters using GridSearchCV, you can train the model on the entire training set and evaluate its performance on the test set:

best_params = grid_search.best_params_
tree_clf = DecisionTreeClassifier(max_leaf_nodes=best_params['max_leaf_nodes'], random_state=42)
tree_clf.fit(X_train, y_train)

accuracy = tree_clf.score(X_test, y_test)
print("Test set accuracy: {:.2f}".format(accuracy))


Test set accuracy: 0.85
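Step d can be shortened. GridSearchCV uses refit=True by default, so after fitting it has already retrained the best configuration on the whole training set and exposes it as best_estimator_. The self-contained sketch below shows this. It widens the grid to every max_leaf_nodes value from 2 to 99 and uses cv=3; both the wider grid and random_state=42 for make_moons are assumptions, not the settings used in the cells above. The accuracy it prints will therefore not necessarily match the output shown.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wider grid than above (assumption): every max_leaf_nodes value from 2 to 99
param_grid = {"max_leaf_nodes": list(range(2, 100))}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=3)
grid_search.fit(X_train, y_train)

# refit=True (the default) already retrained the best model on the full training set
best_tree = grid_search.best_estimator_
print(grid_search.best_params_)
print("Test set accuracy: {:.3f}".format(best_tree.score(X_test, y_test)))
```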


8. Follow these steps to grow a forest:

a. Continuing the previous exercise, generate 1,000 subsets of the training set, each containing 100 instances selected at random. You can use Scikit-Learn's ShuffleSplit class for this.

b. Train one Decision Tree on each subset, using the best hyperparameter values found in the previous exercise. Evaluate these 1,000 Decision Trees on the test set. Since they were trained on much smaller sets, these Decision Trees will likely perform worse than the first Decision Tree, achieving only about 80% accuracy.

c. Now comes the magic. For each test set instance, generate the predictions of the 1,000 Decision Trees, and keep only the most frequent prediction (you can use SciPy's mode() function for this). This gives you majority-vote predictions over the test set.

d. Evaluate these predictions on the test set. You should obtain a slightly higher accuracy than your first model (about 0.5 to 1.5 percentage points higher). Congratulations, you have trained a Random Forest classifier!

In [None]:
# Create 1,000 subsets of the training set, each containing 100 instances chosen at random, using Scikit-Learn's ShuffleSplit class:

from sklearn.model_selection import ShuffleSplit

n_trees = 1000
n_instances = 100

# Each of the n_trees splits keeps n_instances random training instances in its "train" part;
# the remaining instances (the "test" part of each split) are simply ignored
shuffle_split = ShuffleSplit(n_splits=n_trees, test_size=len(X_train) - n_instances, random_state=42)

subsets = []
for subset_idx, _ in shuffle_split.split(X_train):
    subsets.append((X_train[subset_idx], y_train[subset_idx]))


In [None]:
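# Train one Decision Tree on each subset using the best hyperparameter values found in the previous exercise,
# then evaluate each tree on the test set:

import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

# clone() returns an unfitted copy with the same hyperparameters as the best grid-search model
trees = [clone(grid_search.best_estimator_) for _ in range(n_trees)]

accuracy_scores = []
for tree, (X_subset, y_subset) in zip(trees, subsets):
    tree.fit(X_subset, y_subset)
    accuracy_scores.append(accuracy_score(y_test, tree.predict(X_test)))

print("Mean accuracy of the individual trees: {:.3f}".format(np.mean(accuracy_scores)))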


In [None]:
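# Make 1,000 Decision Tree predictions for each test set instance and keep only the most common one using SciPy's mode() function:

import numpy as np
from scipy.stats import mode

# One row of predictions per tree, one column per test instance
Y_pred = np.empty([n_trees, len(X_test)], dtype=np.uint8)
for i, tree in enumerate(trees):
    Y_pred[i] = tree.predict(X_test)

# Most frequent prediction along axis 0 (i.e., per test instance) and how many trees voted for it
y_pred_majority_votes, n_votes = mode(Y_pred, axis=0)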


In [None]:
#Evaluate these predictions on the test set:

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred_majority_votes.reshape([-1]))
print("Test set accuracy: {:.2f}".format(accuracy))
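Scikit-Learn can build a similar ensemble in a few lines with BaggingClassifier. Setting bootstrap=False samples without replacement (pasting), like the ShuffleSplit subsets above, and max_samples=100 gives each tree 100 instances. This is a sketch under stated assumptions:

- max_leaf_nodes=16 is a placeholder for whatever value your grid search selected.
- The data is regenerated with random_state=42.
- BaggingClassifier averages the trees' predicted class probabilities when they are available, rather than counting hard votes, so its accuracy may differ slightly from the manual majority vote.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Placeholder hyperparameter (assumption): use the value your grid search found
ensemble = BaggingClassifier(
    DecisionTreeClassifier(max_leaf_nodes=16, random_state=42),
    n_estimators=1000,   # 1,000 trees, as in step a
    max_samples=100,     # 100 instances per tree
    bootstrap=False,     # sample without replacement, like the ShuffleSplit subsets
    random_state=42,
)
ensemble.fit(X_train, y_train)
print("Ensemble test set accuracy: {:.3f}".format(ensemble.score(X_test, y_test)))
```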
