# TP5: Decision trees & random forests
The aim of this tutorial is to get familiar with the use of decision trees and their generalizations on simple examples using `scikit-learn` tools.

## Completing your installation first
You will need to install the `graphviz` Python package first (named `python-graphviz` on conda). If needed, uncomment the `pip` command below:

In [None]:
# If needed, uncomment the line below:
# pip install graphviz
# import os
# os.environ["PATH"] += os.pathsep + 'C:\\Program Files\\Graphviz\\bin\\'

In [None]:
# Load numpy
import numpy as np

# Load matplotlib
import matplotlib.pyplot as plt

# Load the library with the iris dataset
from sklearn.datasets import load_iris, load_wine

# Load scikit's decision tree classifier
from sklearn import tree

# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# To visualize trees
import graphviz 

# Load pandas to manipulate data frames (Excel like)
import pandas as pd

# Load seaborn
import seaborn as sns

# Set random seed
np.random.seed(0)

The data for this tutorial is famous. Called **the iris dataset**, it contains four variables measuring various parts of iris flowers of three related species, and a fifth variable with the species name. It is famous in the machine learning and statistics communities because it requires very little preprocessing (no missing values, all features are floating-point numbers, etc.).

In [None]:
iris = load_iris()

## Step 1: explore the data set
1. What is the structure of the object `iris`?

2. Plot this dataset in a well-chosen set of representations to explore the data.

## Using `pandas` to manipulate the data
Pandas is well suited to manipulating tabular data in a spreadsheet-like way.

In [None]:
import pandas as pd

# Create a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# View the top 5 rows
df.head()

In [None]:
# Add a new column with the species names, this is what we are going to try to predict
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# View the top 5 rows
df.head()

### Question 1

`iris` is a `sklearn.utils.Bunch` object, which is an extension of a dictionary. It contains the following key/value pairs:
- `data`: a `numpy.ndarray` of `float` containing the different attributes for each sample of the Iris dataset.
- `feature_names`: a `list` explaining what each value of a sample corresponds to. Here, they correspond to various characteristics of the flower, given in cm.
- `target`: a `numpy.ndarray` of `int` containing the class label of each sample.
- `target_names`: a `numpy.ndarray` of `str` explaining what species of flower each class label corresponds to.

There are also key/value pairs containing metadata about the dataset that we won't be using here.
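
As a quick check of the structure described above, the following minimal sketch prints the main fields of the `Bunch`. It only relies on `load_iris` and on the keys listed in this answer.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# A Bunch behaves like a dictionary whose keys are also attributes
print(sorted(iris.keys()))
print(iris.data.shape)      # (n_samples, n_features)
print(iris.feature_names)   # meaning of each column of `data`
print(iris.target_names)    # species name for each class label
print(iris.target[:5])      # integer labels of the first five samples
```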

### Question 2

In [None]:
df.species.value_counts()

The dataset contains 150 samples, equally distributed between the 3 species (50 of each).

Let's represent this dataset using a `seaborn.pairplot`, which plots pairwise relationships between the features. It allows us to have a visual representation of these attributes even though the dimension of data is greater than 2.

In [None]:
sns.pairplot(df, hue="species")

We can observe that in this pairwise representation, the `setosa` species seems to always be linearly separable from the other two species. However, there is more overlap between the `versicolor` and `virginica` species, which could make the prediction more challenging for these two classes.

We can also observe that the separation between the 3 classes is clearer for the `petal width` and `petal length` attributes, which means they will probably be decisive attributes for the classification.

Let's use one last representation to confirm our observations: a `boxplot`, which shows the distribution of the data for a single attribute. Let's compare the `boxplots` of one attribute for which the separation between classes is clear (`petal length`), and one for which it isn't (`sepal length`).

In [None]:
plt.figure(figsize=(15,7))
plt.subplot(121)
sns.boxplot(x="species", y="petal length (cm)", data=df)

plt.subplot(122)
sns.boxplot(x="species", y="sepal length (cm)", data=df)

plt.show()

As expected, the distribution for `petal length` is very concentrated within each class, even though there is a bit of overlap between `versicolor` and `virginica`. For `sepal length`, the distributions are more spread out, which causes overlap between the 3 classes.

## Step 2: create training and test sets

Create a new column that draws, for each row, a random number between 0 and 1, and stores `True` if that number is less than or equal to 0.75 and `False` otherwise. This is a quick and dirty way of randomly assigning about 75% of the rows to the training data and the rest to the test data.

In [None]:
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75

# View the top 5 rows
df.head()

In [None]:
# Create two new dataframes, one with the training rows, one with the test rows
train, test = df[df['is_train']], df[~df['is_train']]

In [None]:
# Show the number of observations for the test and training dataframes
print('Number of observations in the training data:', len(train))
print('Number of observations in the test data:',len(test))
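
The random-column approach above gives a split whose size and class balance vary from run to run. As an alternative sketch (not the method used in the rest of this tutorial), `train_test_split` from `scikit-learn` can produce a split of fixed size that keeps the class proportions. The variable names `X_tr`, `X_te`, `y_tr`, `y_te` are chosen here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target keeps the three species balanced in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target,
    test_size=0.25, stratify=iris.target, random_state=0)

print("Training samples:", len(X_tr))
print("Test samples:", len(X_te))
```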

In [None]:
# Create a list of the feature column names
features = df.columns[:4]

# View features
features

In [None]:
# train['species'] contains the actual species names. Before we can use it,
# we need to convert each species name into a digit. So, in this case there
# are three species, which have been coded as 0, 1, or 2.
y = pd.factorize(train['species'])[0]
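
One caveat is worth illustrating: `pd.factorize` numbers labels by order of first appearance. The codes match `iris.target` here only because the rows are sorted by species. The toy labels below are made up for the illustration, and the comparison with `pd.Categorical(...).codes` is a suggestion rather than what the tutorial uses.

```python
import pandas as pd

# Toy labels that do NOT start with the first species
labels = pd.Series(["virginica", "setosa", "virginica"])

# factorize: codes follow the order of first appearance
codes_by_appearance, uniques = pd.factorize(labels)

# Categorical codes: codes follow a fixed, explicit category order
species_order = ["setosa", "versicolor", "virginica"]
codes_by_category = pd.Categorical(labels, categories=species_order).codes

print(list(codes_by_appearance))
print(list(codes_by_category))
```

With a fixed category order, training and test labels are guaranteed to share the same encoding whatever the row order.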

## Step 3: decision trees for the iris dataset
The class `tree.DecisionTreeClassifier()` from `scikit-learn` builds decision tree objects as follows:

In [None]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(train[features], y)

# Using the whole dataset you may use directly:
#clf = clf.fit(iris.data, iris.target)

The `export_graphviz` exporter supports a variety of aesthetic options, including coloring nodes by their class (or value for regression) and using explicit variable and class names if desired. Jupyter notebooks also render these plots inline automatically:

In [None]:
dot_data = tree.export_graphviz(clf, out_file=None, 
                         feature_names=iris.feature_names,  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)  
graph = graphviz.Source(dot_data)  
graph 

We can also export the tree in Graphviz format and save the resulting graph in an output file `iris.pdf`:

In [None]:
dot_data = tree.export_graphviz(clf, out_file=None) 
graph = graphviz.Source(dot_data) 
graph.render("iris") 

After being fitted, **the model can then be used to predict the class of samples**:

In [None]:
class_pred = clf.predict(iris.data[:1, :])
class_pred

## Exercise 1
1. Train the decision tree on the iris dataset and explain how one should read the blocks in the `graphviz` representation of the tree.

2. Plot the decision regions with the points of the training set superimposed.

*Hint: you may find the function `plt.contourf` useful.*

### Question 1

The decision tree was trained in the above cells. The blocks in `graphviz` give information about every node, including:
- the attribute used to split the node as well as the threshold used to route samples;
- the Gini index, which is a measure of impurity: the closer it is to 0, the purer the node;
- the number of samples in this part of the tree (reaching this node);
- the distribution of these samples across the 3 classes;
- the dominant class in this node.
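
To make the impurity value shown in each block concrete, here is a small sketch that recomputes the Gini index from the class counts of a node, using the usual definition $1 - \sum_k p_k^2$. The helper name `gini` and the example counts are ours, not part of `scikit-learn`.

```python
import numpy as np

def gini(counts):
    """Gini impurity of a node, given the number of samples per class."""
    counts = np.asarray(counts, dtype=float)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

print(gini([50, 50, 50]))  # perfectly mixed node with three classes
print(gini([0, 49, 5]))    # mostly one class
print(gini([50, 0, 0]))    # pure node
```

A pure node has impurity 0, while a node with three equally represented classes reaches the maximum value of 2/3.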

### Question 2

The code to display pairwise decision regions was found on the [scikit-learn documentation website](https://scikit-learn.org/0.15/auto_examples/tree/plot_iris.html). We slightly changed the way predictions are displayed for clarity's sake.

In [None]:
# Parameters
n_classes = 3
plot_colors = "brg"
plot_step = 0.02

plt.figure(figsize=(20,10))

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Shuffle
    idx = np.arange(X.shape[0])
    np.random.seed(13)
    np.random.shuffle(idx)
    X = X[idx]
    y = y[idx]

    # Standardize
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    X = (X - mean) / std

    # Train
    clf_display = tree.DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))

    Z = clf_display.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, colors=['b', 'y', 'g', 'r', 'purple'], alpha = 0.3)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])
    plt.axis("tight")

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        # A single color per class is given, so no colormap is needed
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i])

    plt.axis("tight")

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend()
plt.show()

As we expected, some pairs of attributes are more easily separable than others, especially those involving the `petal length` and `petal width` features.

We also confirmed that `setosa` is more easily predicted than `versicolor` and `virginica`. However, even in the regions where there is a lot of overlap between them, the tree is able to separate them fairly well.

We can also notice that there may be some overfitting, since some regions contain single isolated points of the training set. This is a known weakness of decision trees and will be mitigated by using random forests.
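
Before moving to random forests, one simple way to observe this overfitting is to compare an unconstrained tree with a depth-limited one on training and test data. The sketch below assumes a stratified split made with `train_test_split` (not the split used elsewhere in this tutorial) and an arbitrary `max_depth=2`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target,
    test_size=0.25, stratify=iris.target, random_state=0)

# One fully grown tree and one tree limited to two levels of splits
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)

for name, model in [("unconstrained", deep), ("max_depth=2", shallow)]:
    print(f"{name}: depth={model.get_depth()}, "
          f"train accuracy={model.score(X_tr, y_tr):.3f}, "
          f"test accuracy={model.score(X_te, y_te):.3f}")
```

A large gap between training and test accuracy for the unconstrained tree would be a sign of overfitting.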

## Exercise 2
1. Build 2 different trees, one based on sepal features only (sepal length, sepal width) and one based on petal features only (petal length, petal width): which features are the most discriminant?

2. Compare performances with those obtained using all features.

3. Try the same as above using the various splitting criteria available: Gini's index, classification error or cross-entropy. Comment on your results.

### Question 1

In [None]:
## Tree based on sepal features
features_sepal = features[0:2]
y = pd.factorize(train['species'])[0]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(train[features_sepal], y)

dot_data = tree.export_graphviz(clf, out_file=None, 
                         feature_names=iris.feature_names[0:2],  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)  
graph = graphviz.Source(dot_data)  
graph 

In [None]:
## Tree based on petal features
features_petal = features[2:4]
y = pd.factorize(train['species'])[0]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(train[features_petal], y)

dot_data = tree.export_graphviz(clf, out_file=None, 
                         feature_names=iris.feature_names[2:4],  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)  
graph = graphviz.Source(dot_data)  
graph 

As we could already tell from the regions of decision, the petal features seem to be much more discriminant than the sepal features. Indeed, the tree generated from sepal features is much more complex and deep than the one generated with petal features: it shows that each node struggles to distinguish classes based on sepal features only.

### Question 2

To compare the performances between the trees, we will compare the scores and confusion matrices they obtain on the test set.

In [None]:
# Training
y = pd.factorize(train['species'])[0]
y_test = pd.factorize(test['species'])[0]

clf_full = tree.DecisionTreeClassifier()
clf_full = clf_full.fit(train[features], y)

clf_petal= tree.DecisionTreeClassifier()
clf_petal = clf_petal.fit(train[features_petal], y)

clf_sepal = tree.DecisionTreeClassifier()
clf_sepal = clf_sepal.fit(train[features_sepal], y)

In [None]:
from sklearn.metrics import accuracy_score

# Scores
pred_full = clf_full.predict(test[features])
score_full = accuracy_score(y_test, pred_full)
print(f"The score obtained with all features is: {score_full}")

pred_petal = clf_petal.predict(test[features_petal])
score_petal = accuracy_score(y_test, pred_petal)
print(f"The score obtained with petal features only is: {score_petal}")

pred_sepal = clf_sepal.predict(test[features_sepal])
score_sepal= accuracy_score(y_test, pred_sepal)
print(f"The score obtained with sepal features only is: {score_sepal}")

In [None]:
# Confusion matrices
pd.crosstab(test['species'], pred_full, rownames=['Actual Species'], colnames=['Predicted Species (all features)'])

In [None]:
pd.crosstab(test['species'], pred_petal, rownames=['Actual Species'], colnames=['Predicted Species (petal features)'])

In [None]:
pd.crosstab(test['species'], pred_sepal, rownames=['Actual Species'], colnames=['Predicted Species (sepal features)'])

We can observe that:
- sepal features give a very poor score, which was to be expected;
- using petal features gives an even better score than using all features: it suggests that sepal features are not only weakly discriminant, but can even mislead the classifier, which lowers the score;
- the errors mostly occur with confusion between `versicolor` and `virginica`, further illustrating previous observations.

### Question 3

Let's compare the 3 scores we just obtained with Gini index to the results with cross entropy.

In [None]:
# Training with the cross-entropy criterion
y = pd.factorize(train['species'])[0]
y_test = pd.factorize(test['species'])[0]

clf_full = tree.DecisionTreeClassifier(criterion="entropy")
clf_full = clf_full.fit(train[features], y)

clf_petal= tree.DecisionTreeClassifier(criterion="entropy")
clf_petal = clf_petal.fit(train[features_petal], y)

clf_sepal = tree.DecisionTreeClassifier(criterion="entropy")
clf_sepal = clf_sepal.fit(train[features_sepal], y)

# Scores
pred_full = clf_full.predict(test[features])
score_full = accuracy_score(y_test, pred_full)
print(f"The score obtained with all features is: {score_full}")

pred_petal = clf_petal.predict(test[features_petal])
score_petal = accuracy_score(y_test, pred_petal)
print(f"The score obtained with petal features only is: {score_petal}")

pred_sepal = clf_sepal.predict(test[features_sepal])
score_sepal= accuracy_score(y_test, pred_sepal)
print(f"The score obtained with sepal features only is: {score_sepal}")

We can observe that, on this train/test split, the accuracy scores are better using cross entropy. In class we saw that the choice of a criterion is an empirical process, since it depends on the problem at hand.
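
Since a single train/test split on 150 samples is noisy, a cross-validated comparison may give a more stable picture. The following is a minimal sketch using `cross_val_score` with 5 folds on the whole dataset. The dictionary `cv_means` is our own name, and the number of folds is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

cv_means = {}
for criterion in ("gini", "entropy"):
    model = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # Mean accuracy over 5 stratified folds
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    cv_means[criterion] = scores.mean()
    print(f"{criterion}: mean accuracy = {scores.mean():.3f} "
          f"(std = {scores.std():.3f})")
```

If the two means differ by less than the standard deviation across folds, the difference between criteria is probably not meaningful on this dataset.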

### Going further ahead (not mandatory) 
Try the same approach adapted to another toy dataset from `scikit-learn` described at:
http://scikit-learn.org/stable/datasets/index.html

Play with another dataset available at:
http://archive.ics.uci.edu/ml/datasets.html

## Step 4: Random forests
Go to 

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html 

for the documentation of the `RandomForestClassifier` provided by `scikit-learn`.

We first convert the species names into integer codes, as below, so that predicted codes can later be mapped back to species names.

In [None]:
# train['species'] contains the actual species names. Before we can use it,
# we need to convert each species name into a digit. So, in this case there
# are three species, which have been coded as 0, 1, or 2.
y = pd.factorize(train['species'])[0]

# View target
y

In [None]:
# Create a random forest classifier (named rf here)
rf = RandomForestClassifier(n_jobs=2, random_state=0)

# Train the Classifier to take the training features and learn how they relate
# to the training y (the species)
rf.fit(train[features], y)

**Make predictions** and map each predicted class code back to the English species name:

In [None]:
preds = rf.predict(test[features])
preds_names = pd.Categorical.from_codes(preds, iris.target_names)
preds_names

### Create a confusion matrix

In [None]:
# Create confusion matrix using pandas:
pd.crosstab(test['species'], preds, rownames=['Actual Species'], colnames=['Predicted Species'])

## Feature selection using random forests byproducts

One of the interesting use cases for random forest is feature selection. One of the byproducts of trying lots of decision tree variations is that you can examine which variables are working best/worst in each tree.

When a certain tree uses one variable and another doesn't, you can compare the value lost or gained from the inclusion/exclusion of that variable. The good random forest implementations are going to do that for you, so all you need to do is know which method or variable to look at.

### View feature importance
While we don't get regression coefficients like with ordinary least squares (OLS), we do get a score telling us how important each feature was in classifying. This is one of the most powerful parts of random forests, because we can clearly see that petal width was more important in classification than sepal width.

In [None]:
# View a list of the features and their importance scores
list(zip(train[features], rf.feature_importances_))
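
For readability, the importances can also be put in a labelled, sorted `pandas` Series. The sketch below refits a forest on the whole dataset to stay self-contained, so its values will differ slightly from those of `rf` above. The names `forest` and `importances` are ours.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)

# Impurity-based importances are normalized to sum to 1
importances = pd.Series(forest.feature_importances_,
                        index=iris.feature_names).sort_values(ascending=False)
print(importances)
```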

## Exercise 3
1. Comment on the feature importances with respect to your previous observations on decision trees above.

2. Extract and visualize 5 trees belonging to the random forest using the attribute `estimators_` of the trained random forest classifier. Compare them. *Note that you may code a loop on extracted trees.*

3. Study the influence of parameters like `max_depth`, `min_samples_leaf` and `min_samples_split`. Try to optimize them and explain your approach and choices.

4. How is the prediction error of a random forest estimated?
*Hint: have a look at the parameter `oob_score`.*
What are out-of-bag samples?

5. What should you do when classes are not balanced in the dataset (that is, when there are many more examples of one class than another)?

### Question 1

The feature importances confirm the observations we made in the previous questions: petal features are far more important than sepal features.

### Question 2

In [None]:
for estimator in rf.estimators_[:5] :
    dot_data = tree.export_graphviz(estimator, out_file=None, 
                         feature_names=iris.feature_names,  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)  
    graph = graphviz.Source(dot_data)  
    # graph.view() # uncomment to display the 5 graphs in browser

We can observe that the trees have very different structures (chosen features, thresholds...): it illustrates that the decision tree method is very unstable, hence the motivation to use a random forest and average results over the different estimators.

We can also notice that the trees tend to be deeper when the first node uses sepal features to split the dataset, which further confirms their poor separating power.

### Question 3

In [None]:
y = pd.factorize(train['species'])[0]

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Choosing max_depth
best_max_depth = 0
best_score = 0
scores = []
for max_depth in range(1,20):
    rf = RandomForestClassifier(n_jobs=2, random_state=0, max_depth=max_depth, oob_score=True)
    rf.fit(train[features], y)
    score = rf.oob_score_  # out-of-bag accuracy; the test set is not needed here
    scores.append(score)
    if score > best_score :
        best_score = score
        best_max_depth = max_depth
print(f"Best oob score: {best_score} | Best max_depth: {best_max_depth}")

In [None]:
plt.plot(range(1, 20), scores)
plt.title("Evolution of the OOB score depending on max_depth")

In [None]:
# Choosing min_samples_leaf
best_min_samples_leaf = 0
best_score = 0
scores = []
for min_samples_leaf in range(1,20):
    rf = RandomForestClassifier(n_jobs=2, random_state=0, min_samples_leaf=min_samples_leaf, oob_score=True)
    rf.fit(train[features], y)
    score = rf.oob_score_  # out-of-bag accuracy; the test set is not needed here
    scores.append(score)
    if score > best_score :
        best_score = score
        best_min_samples_leaf = min_samples_leaf
print(f"Best score: {best_score} | Best min_samples_leaf: {best_min_samples_leaf}")

In [None]:
plt.plot(range(1, 20), scores)
plt.title("Evolution of the OOB score depending on min_samples_leaf")

In [None]:
# Choosing min_samples_split
best_min_samples_split = 0
best_score = 0
scores = []
for min_samples_split in range(2,20):
    rf = RandomForestClassifier(n_jobs=2, random_state=0, min_samples_split=min_samples_split, oob_score=True)
    rf.fit(train[features], y)
    score = rf.oob_score_  # out-of-bag accuracy; the test set is not needed here
    scores.append(score)
    if score > best_score :
        best_score = score
        best_min_samples_split = min_samples_split
print(f"Best score: {best_score} | Best min_samples_split: {best_min_samples_split}")

In [None]:
plt.plot(range(2, 20), scores)
plt.title("Evolution of the OOB score depending on min_samples_split")

In [None]:
# Trying to train a model with all 3 best parameters
rf = RandomForestClassifier(n_jobs=2, random_state=0, min_samples_split=best_min_samples_split, max_depth=best_max_depth, min_samples_leaf=best_min_samples_leaf)
rf.fit(train[features], y)
pred = rf.predict(test[features])
score = accuracy_score(y_test, pred)
print(f"Accuracy score with best parameters: {score}")

In [None]:
for estimator in rf.estimators_[:5] :
    dot_data = tree.export_graphviz(estimator, out_file=None, 
                         feature_names=iris.feature_names,  
                         class_names=iris.target_names,  
                         filled=True, rounded=True,  
                         special_characters=True)  
    graph = graphviz.Source(dot_data)  
    # graph.view() # uncomment to display the 5 graphs in browser

To choose the best parameters, we tuned them individually by looping over values between 1 and 19 (which seem to be reasonable values given the trees we had obtained earlier). We then combined the 3 values that gave the best results into our optimized set of parameters.

We used the `oob_score` to evaluate the performance of the different estimators (see the next question for explanations).

Here's what we can observe:
- The 3 parameters do influence the `oob_score`, but only slightly (a few percent at most).
- The scores don't seem to follow a simple trend as the parameters vary, hence the need to determine them empirically.
- The test accuracy obtained with the 3 best parameters combined is better than each of the individual best `oob_score` values, which suggests that the method works reasonably well, although the two metrics are not directly comparable.
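
Tuning each parameter separately ignores possible interactions between them. As a point of comparison, a joint search could be sketched with `GridSearchCV`. The grid values, the 3-fold cross-validation and the use of the whole dataset are assumptions made to keep the example short. In practice the search would be run on the training rows only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()

# Small illustrative grid: every combination is evaluated by cross-validation
param_grid = {
    "max_depth": [2, 4, None],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 10],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid, cv=3)
search.fit(iris.data, iris.target)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```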

### Question 4

The prediction error of a random forest is estimated from the mistakes it makes on out-of-bag samples. Each tree is trained on a bootstrap sample of the training set, so the samples that were not drawn for a given tree are "out-of-bag" for that tree and can be used to test it. This evaluates the ability of the estimator to generalize to data it had not "seen" during training.
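
To get a feeling for how many samples end up out-of-bag, the sketch below draws one bootstrap sample with `numpy` and measures the fraction of points that were never selected. The sample size and seed are arbitrary. The theoretical value $(1 - 1/n)^n$ tends to $e^{-1} \approx 0.368$ for large $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Bootstrap: draw n indices with replacement among n samples
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Out-of-bag samples are those whose index was never drawn
oob_mask = np.ones(n_samples, dtype=bool)
oob_mask[bootstrap_idx] = False
oob_fraction = oob_mask.mean()

print(f"Empirical out-of-bag fraction: {oob_fraction:.3f}")
print(f"Theoretical value (1 - 1/n)^n: {(1 - 1 / n_samples) ** n_samples:.3f}")
```

So each tree leaves roughly a third of the training set unused, which is what makes the out-of-bag estimate possible without a separate validation set.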

### Question 5

A solution to mitigate imbalance would be to over-sample the minority class or under-sample the majority class during bootstrapping, so that the resulting dataset becomes balanced. Another solution would be to use class weighting, which means placing a heavier penalty on misclassifying the minority class. 

Both solutions can be implemented using `sklearn` or `imblearn`, as explained in more detail in [this article](https://machinelearningmastery.com/bagging-and-random-forest-for-imbalanced-classification/). 
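
As an illustration of class weighting, the sketch below builds a made-up imbalanced problem (90 samples against 10), shows the weights that the `"balanced"` heuristic assigns, and passes `class_weight="balanced"` to a `RandomForestClassifier`. The toy data and the names `X_imb`, `y_imb` are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Toy imbalanced labels: 90 samples of class 0, 10 samples of class 1
y_imb = np.array([0] * 90 + [1] * 10)

# "balanced" weights are n_samples / (n_classes * class_count)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_imb)
print(dict(zip([0, 1], weights)))

# Toy features: two Gaussian blobs shifted according to the class
rng = np.random.default_rng(0)
X_imb = rng.normal(size=(100, 2)) + y_imb[:, None]

forest = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                                random_state=0)
forest.fit(X_imb, y_imb)
```

Here the minority class receives a weight nine times larger than the majority class, so errors on it cost more during training.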

## Step 5: a small example of regression using random forests
Random forests can learn nonlinear relationships without carefully crafted data transformations. Take the $f(x) = \sin(x)$ function for example.

Create some fake data and add a little noise.

In [None]:
x = np.random.uniform(-2.5, 2.5, 1000)
y = np.sin(x) + np.random.normal(0, .1, 1000)

plt.plot(x,y,'ko',markersize=1,label='data')
plt.plot(np.arange(-2.5,2.5,0.1),np.sin(np.arange(-2.5,2.5,0.1)),'r-',label='ref')
plt.show()

If we try and build a basic linear model to predict y using x we end up with a straight line that sort of bisects the sin(x) function. Whereas if we use a random forest, it does a much better job of approximating the sin(x) curve and we get something that looks much more like the true function.

Based on this example, we will illustrate how the random forest isn't bound by linear constraints.

## Exercise 4
1. Apply random forests on this dataset for regression and compare performances with ordinary least squares regression.
*Note that ordinary least squares regression is available through `from sklearn.linear_model import LinearRegression`.*

2. Comment on your results.

### Question 1

### Hints:
You may use half of the points for training and the others to test predictions. You will then have an idea of how well the random forest predictor fits the sine curve.

To this end, you will need to use the model `RandomForestRegressor`. Note that when only 1 feature `x` is used as input, you need to reshape it with `x.reshape(-1,1)` when using the methods `fit` and `predict`.

In [None]:
from sklearn.linear_model import LinearRegression

models = [RandomForestRegressor(n_estimators=30, max_depth=4),
          LinearRegression()]

plt.figure(figsize=(15,5))

for i, model in enumerate(models):
    # Training regressor
    model.fit(x[0::2].reshape(-1, 1),y[0::2])
    
    # Testing the regressor
    pred = model.predict(x[1::2].reshape(-1, 1))
    plt.subplot(1,len(models), i+1)
    plt.title(f"Regression with {type(model).__name__}")
    plt.plot(x[1::2],pred,'bo',markersize=1,label='data')
    plt.plot(np.arange(-2.5,2.5,0.1),np.sin(np.arange(-2.5,2.5,0.1)),'r-',label='ref')

### Question 2

As expected, the linear regression is only able to learn a linear function and is therefore unable to predict the `sin` curve. The `RandomForestRegressor`, on the other hand, does a good job of approximating the curve, since it isn't bound to a specific parametric family of functions.
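
The visual comparison can be complemented with a number. The sketch below regenerates similar noisy data (with its own seed, so not exactly the points plotted above) and computes the mean squared error of both models on the held-out half. The names `x_demo`, `y_demo` and `mse` are ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x_demo = rng.uniform(-2.5, 2.5, 1000)
y_demo = np.sin(x_demo) + rng.normal(0, 0.1, 1000)

# Even indices for training, odd indices for testing
X_fit, y_fit = x_demo[0::2].reshape(-1, 1), y_demo[0::2]
X_eval, y_eval = x_demo[1::2].reshape(-1, 1), y_demo[1::2]

mse = {}
for model in (RandomForestRegressor(n_estimators=30, max_depth=4, random_state=0),
              LinearRegression()):
    model.fit(X_fit, y_fit)
    mse[type(model).__name__] = mean_squared_error(y_eval, model.predict(X_eval))

for name, value in mse.items():
    print(f"{name}: test MSE = {value:.4f}")
```

The noise has variance 0.01, which is the lowest test error any model could be expected to reach on this data.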

Out of curiosity and to understand how it works, we display one of the trees of the `RandomForestRegressor`.

In [None]:
model = RandomForestRegressor(n_estimators=30, max_depth=4)
model.fit(x[0::2].reshape(-1, 1),y[0::2])

estimator = model.estimators_[0]
dot_data = tree.export_graphviz(estimator, out_file=None,  
                        filled=True, rounded=True,  
                        special_characters=True)  
graph = graphviz.Source(dot_data)  
graph

We note that the criterion used to split nodes is the squared error. The strength of random forests here is that, by averaging the results given by the 30 trees, the model is able to reconstruct a complex function quite well.

## Documentation

### Decision trees
http://scikit-learn.org/stable/modules/tree.html

### Random forests
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

### Plot decision surface : using `plt.contourf`
http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py

## Pruning trees
Older versions of scikit-learn did not implement post-pruning of trees, so you may think of coding your own pruning function, for instance by taking into account the number of samples per leaf as proposed below. Since version 0.22, minimal cost-complexity pruning is also available through the `ccp_alpha` parameter of the tree estimators.

In [None]:
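# Pruning function (experimental: it modifies the fitted tree in place)
def prune(decisiontree, min_samples_leaf=1):
    """Turn every node holding at most `min_samples_leaf` samples into a leaf."""
    if decisiontree.min_samples_leaf >= min_samples_leaf:
        raise ValueError('Tree already more pruned')
    decisiontree.min_samples_leaf = min_samples_leaf
    fitted_tree = decisiontree.tree_  # avoids shadowing the sklearn `tree` module
    for i in range(fitted_tree.node_count):
        if fitted_tree.n_node_samples[i] <= min_samples_leaf:
            # -1 means "no child": the node becomes a leaf
            fitted_tree.children_left[i] = -1
            fitted_tree.children_right[i] = -1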
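
As a point of comparison with the hand-written function above, here is a minimal sketch of the built-in cost-complexity pruning exposed by `DecisionTreeClassifier` through `ccp_alpha` (scikit-learn 0.22 and later). Picking the middle value of the pruning path is an arbitrary choice made for the illustration. In practice `ccp_alpha` would be selected by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Fully grown tree and the candidate alphas along its pruning path
full_tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
path = full_tree.cost_complexity_pruning_path(iris.data, iris.target)

# Arbitrary choice for the illustration: the middle alpha of the path
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
pruned_tree.fit(iris.data, iris.target)

print("Nodes in the full tree:", full_tree.tree_.node_count)
print("Nodes in the pruned tree:", pruned_tree.tree_.node_count)
```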
                