# Model Comparison and Final Conclusions – Software Defect Prediction

Objective: To compare the performance of the evaluated models and select the most suitable approach for software defect prediction based on performance metrics and domain priorities.

In [1]:
import pandas as pd

comparison = pd.DataFrame({
    "Model": ["Logistic Regression", 
              "Random Forest (0.5 threshold)", 
              "Random Forest (0.3 threshold)"],
    "Recall": [0.65, 0.35, 0.62],
    "Precision": [0.35, 0.62, 0.46],
    "F1-score": [0.45, 0.45, 0.53],
    "ROC-AUC": [0.80, 0.81, 0.81]
})

comparison

| Model | Recall | Precision | F1-score | ROC-AUC |
|---|---|---|---|---|
| Logistic Regression | 0.65 | 0.35 | 0.45 | 0.80 |
| Random Forest (0.5 threshold) | 0.35 | 0.62 | 0.45 | 0.81 |
| Random Forest (0.3 threshold) | 0.62 | 0.46 | 0.53 | 0.81 |


# Performance Comparison

Logistic Regression achieved the highest recall (0.65) but the lowest precision (0.35): roughly two of every three modules it flagged were not defective.

With the default threshold (0.5), the Random Forest model raised precision to 0.62, but recall fell to 0.35, so about 65% of defective modules went undetected. That is too conservative for defect detection.

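After lowering the classification threshold to 0.3, Random Forest reached a better balance: recall of 0.62, precision of 0.46, and the highest F1-score (0.53). Its ROC-AUC stays at 0.81 because ROC-AUC is computed from the predicted scores and does not depend on the chosen threshold; it is only marginally above Logistic Regression's 0.80.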
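The effect of lowering the threshold can be sketched in a few lines of plain Python. The labels and scores below are invented toy values, not the study's data, and `metrics_at_threshold` is a hypothetical helper. With a fitted scikit-learn classifier, the scores would typically come from `predict_proba(X)[:, 1]`, which is an assumption here because the training code is not shown in this notebook.

```python
def metrics_at_threshold(y_true, y_prob, threshold):
    """Return (precision, recall, f1) for the positive class at a threshold."""
    # A module is predicted defective when its score reaches the threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels (1 = defective) and scores, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.35, 0.45, 0.32, 0.2, 0.1, 0.05, 0.02]

for threshold in (0.5, 0.3):
    p, r, f = metrics_at_threshold(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

On this toy data, lowering the threshold trades precision for recall, which is the same direction of change reported in the comparison table.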

# Model Selection

Since defect detection prioritizes identifying defective modules, recall and F1-score are considered more important than accuracy alone.
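One way to make a recall-first priority explicit is the F-beta score, which weights recall more heavily than precision when beta is greater than 1. The sketch below is illustrative only: F2 is not a metric reported in this study, `f_beta` is a hypothetical helper, and the inputs are the rounded precision and recall values from the comparison table, so the results are approximate.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# (precision, recall) pairs copied from the comparison table (rounded values).
reported = {
    "Logistic Regression": (0.35, 0.65),
    "Random Forest (0.5 threshold)": (0.62, 0.35),
    "Random Forest (0.3 threshold)": (0.46, 0.62),
}

for model, (precision, recall) in reported.items():
    f1 = f_beta(precision, recall, beta=1.0)  # should match the F1 column
    f2 = f_beta(precision, recall, beta=2.0)  # recall-weighted variant
    print(f"{model}: F1={f1:.3f} F2={f2:.3f}")
```

With `beta=1.0` the function reproduces the F1 column to within rounding. With `beta=2.0`, Random Forest at the 0.3 threshold still ranks first, narrowly ahead of Logistic Regression, though a margin this small computed from rounded inputs should be read with caution.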

The Random Forest model with an adjusted threshold of 0.3 provides:
- Recall comparable to Logistic Regression (0.62 vs. 0.65)
- Higher precision than Logistic Regression (0.46 vs. 0.35)
- The highest F1-score (0.53)
- Marginally higher ROC-AUC than Logistic Regression (0.81 vs. 0.80)

Therefore, Random Forest with threshold adjustment is selected as the preferred model.
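When reading the ROC-AUC figures, it helps to recall that ROC-AUC measures how well the scores rank defective modules above non-defective ones, so no threshold enters the computation. The sketch below uses the pairwise (rank-based) definition in plain Python. `roc_auc` is a hypothetical helper and the data are invented toy values, not the study's predictions.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the chance a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels (1 = defective) and scores, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.35, 0.45, 0.32, 0.2, 0.1, 0.05, 0.02]

auc = roc_auc(y_true, y_score)
print(f"ROC-AUC = {auc:.3f}")
```

Because the function takes only labels and scores, the same model yields the same ROC-AUC at any threshold, which is why both Random Forest rows in the table share the value 0.81.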

# Final Conclusion

This study demonstrated that both linear and ensemble methods can capture predictive signals in static software metrics. While Logistic Regression provides a strong and interpretable baseline, Random Forest with an adjusted decision threshold achieves a better balance between recall and precision.

The results suggest that software module size and complexity metrics play a significant role in defect prediction. Overall, ensemble-based approaches appear better suited for capturing nonlinear relationships in defect datasets.