<a href="https://colab.research.google.com/github/drstannwoji2019/ML_Projects/blob/main/RandomForests_SyntheticDataSet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [27]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from scipy.stats import randint

# Step 1: Create a Synthetic Dataset
X, y = make_classification(
    n_samples=1000,          # Number of samples
    n_features=20,           # Number of features
    n_informative=15,        # Number of informative features
    n_redundant=5,           # Number of redundant features
    n_classes=2,             # Number of classes
    random_state=42
)

# Step 2: Split the Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Define the Random Forest Model and Parameter Distributions
rf_model = RandomForestClassifier(random_state=42)
param_dist = {
    'n_estimators': randint(100, 500),
    'max_depth': randint(5, 20),
    'max_features': ['auto', 'sqrt', 'log2'],
    'min_samples_split': randint(2, 10),
    'min_samples_leaf': randint(1, 5)
}

# Step 4: Run Randomized Search with Cross-Validation
random_search = RandomizedSearchCV(
    estimator=rf_model,
    param_distributions=param_dist,
    n_iter=10,            # Number of random combinations to try
    cv=3,                 # 3-fold cross-validation
    verbose=1,            # Print progress
    random_state=42,
    n_jobs=-1             # Use all available cores
)

# Fit the Randomized Search model
random_search.fit(X_train, y_train)

# Get the best parameters and best model
best_params = random_search.best_params_
best_rf_model = random_search.best_estimator_
print(f'Best Parameters: {best_params}')

# Step 5: Evaluate the Tuned Model on Test Data
y_pred = best_rf_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy on test data: {accuracy:.2f}')
print(classification_report(y_test, y_pred))


Fitting 3 folds for each of 10 candidates, totalling 30 fits


12 fits failed out of a total of 30.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
12 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 1466, in wrapper
    estimator._validate_params()
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "/usr/local/lib/python3.10/dist-packages/sklearn/utils/_param_validation.py", line 95, in validate_parameter_constraints
    raise InvalidParameterError(
sklea

Best Parameters: {'max_depth': 12, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 3, 'n_estimators': 291}
Accuracy on test data: 0.88
              precision    recall  f1-score   support

           0       0.89      0.88      0.89       106
           1       0.86      0.88      0.87        94

    accuracy                           0.88       200
   macro avg       0.88      0.88      0.88       200
weighted avg       0.88      0.88      0.88       200

