# Exploring ML Classification Techniques: Parameter Tuning and Model Quality

This notebook demonstrates how tweaking parameters of a classification algorithm can impact model performance. We'll use the Random Forest Classifier from scikit-learn, retrain the model with different parameter values, and explain the effects of these changes.


## 1. Load Data

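```python
%pip install scikit-learn

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 30% of the samples for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```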


## 2. Baseline Model: Default Parameters

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Train a forest with default hyperparameters; random_state fixes the bootstrap sampling
rf_default = RandomForestClassifier(random_state=42)
rf_default.fit(X_train, y_train)

# Score the baseline on the held-out test set
y_pred_default = rf_default.predict(X_test)
acc_default = accuracy_score(y_test, y_pred_default)
print(f"Default Accuracy: {acc_default:.4f}")
```





### Baseline Model Explanation

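The Random Forest Classifier with default parameters provides a solid starting point. It builds 100 trees (`n_estimators=100`), considers a random subset of about the square root of the number of features at each split (`max_features='sqrt'`), grows each tree without a depth limit (`max_depth=None`), and uses Gini impurity as the split criterion. The baseline accuracy serves as the reference for the experiments that follow.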
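
To check these defaults against the installed scikit-learn version rather than take them on trust, the estimator's `get_params()` method can be queried. This is a minimal sketch; the variable name `defaults` is only illustrative, and older releases may report different values for some entries.

```python
from sklearn.ensemble import RandomForestClassifier

# Read the hyperparameters of an unfitted forest
defaults = RandomForestClassifier().get_params()

# Print only the parameters discussed in this notebook
for name in ("n_estimators", "criterion", "max_features", "max_depth", "min_samples_split"):
    print(f"{name}: {defaults[name]}")
```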

---



## 3. Tweaking `n_estimators`: Number of Trees

```python
# Double the number of trees from the default of 100 to 200
rf_more_trees = RandomForestClassifier(n_estimators=200, random_state=42)
rf_more_trees.fit(X_train, y_train)
y_pred_more_trees = rf_more_trees.predict(X_test)
acc_more_trees = accuracy_score(y_test, y_pred_more_trees)
print(f"Accuracy with 200 trees: {acc_more_trees:.4f}")
```
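
## 4. Tweaking `max_depth`: Maximum Tree Depth

The summary table in section 6 refers to `acc_shallow` and a "Max Depth=2" model, but the cell that computes it does not appear in this notebook. The sketch below assumes it mirrors the other experiments. The names `rf_shallow` and `y_pred_shallow` are guesses, and the imports and split are repeated so the cell runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as in section 1
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Assumed setup: cap every tree at depth 2, leave all other parameters at their defaults
rf_shallow = RandomForestClassifier(max_depth=2, random_state=42)
rf_shallow.fit(X_train, y_train)
y_pred_shallow = rf_shallow.predict(X_test)
acc_shallow = accuracy_score(y_test, y_pred_shallow)
print(f"Accuracy with max_depth=2: {acc_shallow:.4f}")
```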





### Effect of Limiting `max_depth`

Limiting tree depth can reduce overfitting, especially on small datasets. Trees that are too shallow, however, may underfit: they cannot represent the patterns that separate the classes, so accuracy drops on both training and test data.
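
One way to see this trade-off is to compare training and test accuracy across several depth limits. The following is a rough sketch rather than part of the original experiments; the depth values and the `depth_scores` dictionary are chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

depth_scores = {}
for depth in (1, 2, 3, None):  # None places no limit on tree depth
    model = RandomForestClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    depth_scores[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.4f}, test={test_acc:.4f}")
```

A large gap between training and test accuracy points toward overfitting, while low accuracy on both points toward underfitting.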
 

## 5. Tweaking `min_samples_split`: Minimum Samples to Split

```python
# Require at least 10 samples in a node before it may be split (the default is 2)
rf_split = RandomForestClassifier(min_samples_split=10, random_state=42)
rf_split.fit(X_train, y_train)
y_pred_split = rf_split.predict(X_test)
acc_split = accuracy_score(y_test, y_pred_split)
print(f"Accuracy with min_samples_split=10: {acc_split:.4f}")
```





### Effect of Increasing `min_samples_split`

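Increasing `min_samples_split` raises the number of samples a node must contain before it can be split (the default is 2). Nodes below that threshold become leaves, so the trees end up smaller. This can reduce overfitting, but it may also miss subtle patterns and lower accuracy.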
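
The effect on tree size can be observed directly by counting nodes in the fitted trees. This is an illustrative sketch only; the threshold values and the `node_counts` dictionary are assumptions made for the example, not part of the original experiments.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

node_counts = {}
for min_split in (2, 10, 40):  # 2 is the scikit-learn default
    model = RandomForestClassifier(min_samples_split=min_split, random_state=42)
    model.fit(X_train, y_train)
    # Average number of nodes (internal nodes plus leaves) per tree in the forest
    total_nodes = sum(tree.tree_.node_count for tree in model.estimators_)
    node_counts[min_split] = total_nodes / len(model.estimators_)
    print(f"min_samples_split={min_split}: {node_counts[min_split]:.1f} nodes per tree, "
          f"test accuracy={model.score(X_test, y_test):.4f}")
```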


## 6. Summary Table

```python
import pandas as pd

results = pd.DataFrame({
    'Model': ['Default', '200 Trees', 'Max Depth=2', 'Min Samples Split=10'],
    'Accuracy': [acc_default, acc_more_trees, acc_shallow, acc_split]
})
results
```



## 7. Conclusion

Parameter tuning matters for classifier performance. Adding trees (`n_estimators`) can improve accuracy up to a point, at the cost of longer training, while limiting depth (`max_depth`) and raising the minimum number of samples required to split (`min_samples_split`) can help control overfitting. Excessive constraints, however, cause underfitting and degrade model quality. Always validate changes on held-out data. With only 45 test samples in this split, small accuracy differences may be noise, so cross-validation gives a more reliable comparison than a single split.
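
As one possible way to validate such comparisons, the configurations can be scored with k-fold cross-validation instead of a single split. This is a minimal sketch, assuming 5 folds and the same settings used earlier; the `candidates` and `cv_means` names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Hypothetical set of configurations mirroring the experiments above
candidates = {
    "Default": {},
    "Max Depth=2": {"max_depth": 2},
    "Min Samples Split=10": {"min_samples_split": 10},
}

cv_means = {}
for label, params in candidates.items():
    model = RandomForestClassifier(random_state=42, **params)
    # For classifiers, an integer cv uses stratified folds and scores by accuracy
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[label] = scores.mean()
    print(f"{label}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

The standard deviation across folds gives a rough sense of how much of the difference between configurations is attributable to the choice of split.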

---

This notebook demonstrates how thoughtful parameter tuning can enhance or degrade model quality. Experimentation and validation are key to building robust classifiers.