## Exercise 1:
#### The following arrays are given:

- X_train, y_train

- X_test, y_test

#### Using the DecisionTreeClassifier class from the scikit-learn package, create a classification model (set max_depth=6). Train the model on the train set and evaluate it on the test set.

#### In response, print the model accuracy, rounded to four decimal places, to the console as shown below.

```python
import numpy as np

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Seed the global NumPy RNG: train_test_split and the tree below both draw
# from it, because neither is given an explicit random_state
np.random.seed(42)
data, target = make_moons(n_samples=2000, noise=0.25, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(data, target)

dtc = DecisionTreeClassifier(max_depth=6)
dtc.fit(X_train, y_train)

# For classifiers, score() returns the mean accuracy on the given data
score = dtc.score(X_test, y_test)

print(f'Accuracy = {score:.4f}')
```

```
Accuracy = 0.9280
```
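For a classifier, `score` is simply the fraction of test labels the model predicts correctly. The sketch below repeats the data generation and split from the cell above and checks that `score` agrees with `sklearn.metrics.accuracy_score` and with a plain NumPy comparison; the printed numbers depend on your scikit-learn version, so none are quoted here.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data generation and split as in Exercise 1
np.random.seed(42)
data, target = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(data, target)

dtc = DecisionTreeClassifier(max_depth=6)
dtc.fit(X_train, y_train)

# Three ways to compute the same accuracy on the test set
score = dtc.score(X_test, y_test)
y_pred = dtc.predict(X_test)
manual_accuracy = accuracy_score(y_test, y_pred)
manual_fraction = np.mean(y_pred == y_test)  # share of correct predictions

print(f'score()                = {score:.4f}')
print(f'accuracy_score()       = {manual_accuracy:.4f}')
print(f'mean(y_pred == y_test) = {manual_fraction:.4f}')
```

With the default `test_size=0.25`, the test set holds 500 of the 2000 samples, so each misclassified point costs 0.002 of accuracy.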


## Exercise 2:

#### Using the DecisionTreeClassifier class from the scikit-learn package, create a classification model (set max_depth=6 and min_samples_leaf=6). Train the model on the train set and evaluate on the test set.

#### In response, print the model accuracy, rounded to four decimal places, to the console as shown below.

```python
# min_samples_leaf=6: every leaf must contain at least 6 training samples
dtc = DecisionTreeClassifier(max_depth=6, min_samples_leaf=6)
dtc.fit(X_train, y_train)

score = dtc.score(X_test, y_test)

print(f'Accuracy = {score:.4f}')
```

```
Accuracy = 0.9300
```
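The only change from Exercise 1 is `min_samples_leaf=6`, which forbids any split that would leave fewer than 6 training samples in a leaf. The sketch below makes that constraint visible by reading the fitted tree's `tree_` structure (leaves are the nodes whose `children_left` is -1) and reporting the smallest leaf. It assumes explicit `random_state` values (42 for the split, 0 for the trees) instead of the notebook's global seed, so its trees need not be identical to the ones above; `leaf_sizes` is a hypothetical helper written for this example.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: explicit random_state values for a self-contained, repeatable run
data, target = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=42)


def leaf_sizes(model):
    """Hypothetical helper: training-sample count of every leaf in a fitted tree."""
    tree = model.tree_
    is_leaf = tree.children_left == -1  # leaves have no children
    return tree.n_node_samples[is_leaf]


unconstrained = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
constrained = DecisionTreeClassifier(
    max_depth=6, min_samples_leaf=6, random_state=0
).fit(X_train, y_train)

for name, model in [('min_samples_leaf=1', unconstrained),
                    ('min_samples_leaf=6', constrained)]:
    sizes = leaf_sizes(model)
    print(f'{name}: {model.get_n_leaves()} leaves, '
          f'smallest leaf = {sizes.min()} samples, '
          f'test accuracy = {model.score(X_test, y_test):.4f}')
```

Larger leaves average over more points, which smooths the decision boundary on noisy data such as `make_moons(noise=0.25)`; whether that raises test accuracy has to be checked empirically, which is what Exercise 3 does.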


## Exercise 3:

#### Using the DecisionTreeClassifier class and the grid search method (the GridSearchCV class; set scoring='accuracy' and cv=5), find the optimal values of the max_depth and min_samples_leaf arguments. Search over the following parameter values:

- for max_depth -> np.arange(1, 10)

- for min_samples_leaf -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]

#### Train the model on the train set and evaluate it on the test set. In response, print the optimal values of max_depth and min_samples_leaf found by the search to the console.

```python
from sklearn.model_selection import GridSearchCV

dtc = DecisionTreeClassifier()

# Candidate values for each hyperparameter: 9 x 12 = 108 combinations
params = {
    'max_depth': np.arange(1, 10),
    'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20],
}

# Each combination is scored by its mean accuracy over 5 cross-validation folds
grid_search = GridSearchCV(
    dtc, param_grid=params, scoring='accuracy', cv=5
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
```

```
{'max_depth': 6, 'min_samples_leaf': 6}
```
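The task statement also asks for an evaluation on the test set. Because `GridSearchCV` refits the best configuration on the whole training set by default (`refit=True`), that model is available as `best_estimator_` and can be scored directly. A minimal sketch, assuming explicit `random_state` values (42 for the split, 0 for the tree) in place of the notebook's global seed, so the numbers it prints need not match the outputs above:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: explicit random_state values for a self-contained, repeatable run
data, target = make_moons(n_samples=2000, noise=0.25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=42)

params = {
    'max_depth': np.arange(1, 10),
    'min_samples_leaf': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20],
}
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=params, scoring='accuracy', cv=5,
)
grid_search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ was retrained on all of X_train
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

print(grid_search.best_params_)
print(f'Mean CV accuracy = {grid_search.best_score_:.4f}')
print(f'Test accuracy    = {test_accuracy:.4f}')
```

Since `scoring='accuracy'`, calling `grid_search.score(X_test, y_test)` gives the same number as `best_model.score(X_test, y_test)`. The mean CV accuracy is computed on the training folds only, so it is the value used to choose the parameters; the test accuracy is the unbiased check.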


#### Notes:

- GridSearchCV() exhaustively searches a specified grid of hyperparameters, scoring every combination with cross-validation, to find the best model configuration.
    - Main arguments:
        - estimator: The model or pipeline you want to tune (e.g., RandomForestClassifier()).
        - param_grid: Dictionary whose keys are parameter names and whose values are lists (or arrays, such as np.arange(1, 10)) of parameter settings to try.
        - cv: Number of cross-validation folds (e.g., cv=5).
        - scoring: Metric used to evaluate model performance (e.g., scoring='accuracy').
    - Execution: call grid_search.fit(X, y) to run the search.

- grid_search.best_params_: After fitting, this attribute returns the best combination of hyperparameters found during the search.
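To make "exhaustively searches ... using cross-validation" concrete, here is a minimal sketch of the same search written by hand with `ParameterGrid` and `cross_val_score` (both scikit-learn utilities), compared against `GridSearchCV` on the same data. The smaller data set (500 samples), the smaller grid, and `random_state=0` for the trees are illustrative assumptions chosen to keep the run short and deterministic.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=42)

# Assumption: a smaller grid than in Exercise 3, to keep the example fast
params = {'max_depth': [1, 2, 3, 4, 5, 6], 'min_samples_leaf': [1, 5, 10, 20]}
grid = ParameterGrid(params)
print(f'{len(grid)} candidates x 5 folds = {len(grid) * 5} fits')

# Manual exhaustive search: mean 5-fold CV accuracy for every combination
best_score, best_params = -np.inf, None
for candidate in grid:
    model = DecisionTreeClassifier(random_state=0, **candidate)
    mean_score = cross_val_score(model, X, y, scoring='accuracy', cv=5).mean()
    if mean_score > best_score:  # strict '>' keeps the first of any tied candidates
        best_score, best_params = mean_score, candidate

# GridSearchCV runs the same search, then refits the best model on all of X
grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid=params, scoring='accuracy', cv=5,
)
grid_search.fit(X, y)

print('manual:      ', best_params, f'{best_score:.4f}')
print('GridSearchCV:', grid_search.best_params_, f'{grid_search.best_score_:.4f}')
```

Both searches should report the same best parameters: the scoring is identical, the candidates are visited in the same ParameterGrid order, and with an integer cv and a classifier both use the same stratified 5-fold splits. What GridSearchCV adds is the bookkeeping (cv_results_, best_score_, best_params_) and the final refit stored in best_estimator_.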