## Assignment: Classifying the moons dataset with decision trees and a random forest

### Goal

Solve the moons dataset classification problem with a decision tree, then try to improve accuracy with a random forest.

1. Data preparation

In [1]:
# Load packages
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import numpy as np

In [2]:
# Generate the moons data and hold out 20% as a test set
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

2. Decision tree hyperparameter optimization

In [3]:
param_grid = {
    "max_depth": [None, 5, 10, 20],
    "max_leaf_nodes": [None, 10, 20, 50],
    "min_samples_split": [2, 5, 10]
}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
grid_search.fit(X_train, y_train)

In [4]:
# Model trained with the best hyperparameters (refit on the full training set by GridSearchCV)
best_model = grid_search.best_estimator_
y_pred_test = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred_test)

3. Random forest implementation

In [5]:
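n_trees = 100
n_samples = len(X_train) // 2  # size of each bootstrap subset
rng = np.random.default_rng(42)  # seeded generator so the subsets are reproducible
# Sample indices with replacement: each tree gets its own subset of the training set
subset_indices = [rng.choice(len(X_train), n_samples, replace=True) for _ in range(n_trees)]
sub_models = [DecisionTreeClassifier(random_state=42) for _ in range(n_trees)]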

In [6]:
# Train one decision tree on each subset
for indices, model in zip(subset_indices, sub_models):
    model.fit(X_train[indices], y_train[indices])
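The manual loop above is essentially bagging: each tree is fit on a subset drawn with replacement from the training set. As a rough cross-check, the same idea can be sketched with scikit-learn's `BaggingClassifier`. The settings below (`n_estimators=100`, `max_samples=0.5`) are chosen to mirror the cells above and are assumptions, not part of the original assignment. Note that `BaggingClassifier` averages predicted class probabilities when the base estimator supports `predict_proba`, so its result need not match a hard majority vote exactly.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the notebook
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 unconstrained trees, each fit on a bootstrap sample half the size of the training set
bagging = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),
    n_estimators=100,
    max_samples=0.5,
    bootstrap=True,
    random_state=42,
)
bagging.fit(X_train, y_train)

bagging_accuracy = accuracy_score(y_test, bagging.predict(X_test))
print("Bagging Accuracy:", round(bagging_accuracy, 4))
```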

4. Majority-vote ensemble

In [7]:
predictions = np.array([model.predict(X_test) for model in sub_models])
ensemble_predictions = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=0, arr=predictions)
ensemble_accuracy = accuracy_score(y_test, ensemble_predictions)
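To make the voting step easier to follow, here is a minimal sketch of the same `np.bincount(...).argmax()` pattern on a toy array. The helper name `majority_vote` and the toy predictions are hypothetical and only serve as illustration.

```python
import numpy as np

def majority_vote(predictions):
    """Column-wise majority vote over an array of shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    # For each sample (column), count the votes per class and keep the most frequent class
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), axis=0, arr=predictions)

# Three hypothetical models predicting labels for four samples
toy_predictions = np.array([
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])
voted = majority_vote(toy_predictions)
print(voted)  # [0 1 1 0]
```

Because `argmax` returns the first maximum, a tie is resolved in favor of the lowest class label. With an even number of trees (100 here) and two classes, a 50/50 split is therefore counted as class 0.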

In [8]:
# Print the results
print("Best Parameters:", grid_search.best_params_)
print("Single Tree Accuracy:", round(test_accuracy, 4))
average_tree_accuracy = np.mean([accuracy_score(y_test, model.predict(X_test)) for model in sub_models])
print("Average Single Tree Accuracy:", round(average_tree_accuracy, 4))
print("Ensemble Accuracy:", round(ensemble_accuracy, 4))

Best Parameters: {'max_depth': None, 'max_leaf_nodes': 20, 'min_samples_split': 2}
Single Tree Accuracy: 0.87
Average Single Tree Accuracy: 0.8124
Ensemble Accuracy: 0.8555
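In the output above, the ensemble of unconstrained trees scores below the tuned single tree, which suggests that the individual trees overfit. One possible follow-up, sketched here under stated assumptions, is scikit-learn's `RandomForestClassifier` with the leaf limit found by the grid search. The values `n_estimators=100`, `max_samples=0.5`, and `max_leaf_nodes=20` are assumptions taken from the cells above, and no result is claimed for this configuration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as in the notebook
X, y = make_moons(n_samples=10000, noise=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each tree sees a bootstrap sample of half the training set and is limited to 20 leaves
forest = RandomForestClassifier(
    n_estimators=100,
    max_samples=0.5,
    max_leaf_nodes=20,
    random_state=42,
)
forest.fit(X_train, y_train)

rf_accuracy = accuracy_score(y_test, forest.predict(X_test))
print("Random Forest Accuracy:", round(rf_accuracy, 4))
```

Unlike the hand-built ensemble, `RandomForestClassifier` also samples a subset of features at each split, so the comparison is indicative rather than like-for-like.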
