This project uses a dataset that provides insight into the factors influencing defect rates in a manufacturing environment. The goal of this report is to build a machine learning model that predicts whether a production process will have high or low defect occurrences.
Let's begin with a description of the columns in the dataset:
| Column | Description | Data Type | Range |
|---|---|---|---|
| ProductionVolume | Number of units produced per day | Integer | 100 to 1000 units/day |
| ProductionCost | Cost incurred for production per day | Float | $5000 to $20000 |
| SupplierQuality | Quality ratings of suppliers | Float (%) | 80% to 100% |
| DeliveryDelay | Average delay in delivery | Integer (days) | 0 to 5 days |
| DefectRate | Defects per thousand units produced | Float | 0.5 to 5.0 defects |
| QualityScore | Overall quality assessment | Float (%) | 60% to 100% |
| MaintenanceHours | Hours spent on maintenance per week | Integer | 0 to 24 hours |
| DowntimePercentage | Percentage of production downtime | Float (%) | 0% to 5% |
| InventoryTurnover | Ratio of inventory turnover | Float | 2 to 10 |
| StockoutRate | Rate of inventory stockouts | Float (%) | 0% to 10% |
| WorkerProductivity | Productivity level of the workforce | Float (%) | 80% to 100% |
| SafetyIncidents | Number of safety incidents per month | Integer | 0 to 10 incidents |
| EnergyConsumption | Energy consumed in kWh | Float | 1000 to 5000 kWh |
| EnergyEfficiency | Efficiency factor of energy usage | Float | 0.1 to 0.5 |
| AdditiveProcessTime | Time taken for additive manufacturing | Float (hours) | 1 to 10 hours |
| AdditiveMaterialCost | Cost of additive materials per unit | Float ($) | $100 to $500 |
| DefectStatus | Predicted defect status | Binary (0 for Low Defects, 1 for High Defects) | 0 or 1 |
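
The documented ranges can double as a quick sanity check before modeling. The sketch below is only an illustration: the helper `out_of_range` and the sample row are hypothetical, and it assumes percentage columns are stored on the 0 to 100 scale used in the descriptions above (if the file stores them as fractions, the bounds would need rescaling).

```python
# Documented (min, max) ranges, copied from the column descriptions above.
# Assumption: percentage columns are stored on a 0-100 scale.
RANGES = {
    "ProductionVolume": (100, 1000),
    "ProductionCost": (5000, 20000),
    "SupplierQuality": (80, 100),
    "DeliveryDelay": (0, 5),
    "DefectRate": (0.5, 5.0),
    "QualityScore": (60, 100),
    "MaintenanceHours": (0, 24),
    "DowntimePercentage": (0, 5),
    "InventoryTurnover": (2, 10),
    "StockoutRate": (0, 10),
    "WorkerProductivity": (80, 100),
    "SafetyIncidents": (0, 10),
    "EnergyConsumption": (1000, 5000),
    "EnergyEfficiency": (0.1, 0.5),
    "AdditiveProcessTime": (1, 10),
    "AdditiveMaterialCost": (100, 500),
}


def out_of_range(row):
    """Return the columns that are missing or outside their documented range."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = row.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    # The target must be binary: 0 (Low Defects) or 1 (High Defects).
    if row.get("DefectStatus") not in (0, 1):
        problems.append("DefectStatus")
    return problems


# Hypothetical row, made up for illustration (not taken from the dataset).
example_row = {
    "ProductionVolume": 850, "ProductionCost": 12000.0, "SupplierQuality": 92.5,
    "DeliveryDelay": 2, "DefectRate": 3.1, "QualityScore": 75.0,
    "MaintenanceHours": 12, "DowntimePercentage": 2.4, "InventoryTurnover": 6.0,
    "StockoutRate": 4.0, "WorkerProductivity": 90.0, "SafetyIncidents": 3,
    "EnergyConsumption": 3200.0, "EnergyEfficiency": 0.3,
    "AdditiveProcessTime": 5.5, "AdditiveMaterialCost": 250.0, "DefectStatus": 1,
}
print(out_of_range(example_row))
```

An empty list means every value in the row falls inside its documented range.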
According to the correlation matrix:
The target variable DefectStatus is strongly correlated with the variables MaintenanceHours, DefectRate and ProductionVolume. The charts of these variables show the following behavior:
- The probability of producing more defects increases when the production volume is above 800 units per day.
- The probability of producing defects increases when maintenance exceeds 10 hours per week.
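
The two observations above combine a correlation with a threshold comparison. A minimal sketch of both checks follows, using only the standard library. The data is a made-up toy sample, not the real dataset, and the helper names `pearson` and `high_defect_share` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def high_defect_share(values, labels, threshold):
    """Share of high-defect rows (label 1) at or below, and above, a threshold."""
    below = [label for v, label in zip(values, labels) if v <= threshold]
    above = [label for v, label in zip(values, labels) if v > threshold]

    def share(group):
        return sum(group) / len(group) if group else float("nan")

    return share(below), share(above)


# Toy values invented for illustration only.
production_volume = [200, 450, 600, 750, 820, 900, 950, 990]
defect_status = [0, 0, 1, 0, 1, 1, 1, 1]

r = pearson(production_volume, defect_status)
below, above = high_defect_share(production_volume, defect_status, threshold=800)
print(f"correlation={r:.2f}, share<=800={below:.2f}, share>800={above:.2f}")
```

The same two functions could be applied to MaintenanceHours with a threshold of 10 to examine the second observation.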
For this project we trained 3 models:
- Logistic Regression
- Random Forest
- Gradient Boosting
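
As a rough illustration of how three such classifiers can be trained side by side, here is a minimal scikit-learn sketch. It runs on synthetic data from `make_classification` as a stand-in for the real feature columns, and the split ratio, scaling step, and hyperparameters are assumptions made for this example, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 16 feature columns and the DefectStatus target.
X, y = make_classification(
    n_samples=500, n_features=16, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed settings; logistic regression is scaled because it is sensitive
# to feature magnitudes, while the tree-based models are not.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every model on the same training split and predict on the same test split.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 2))
```

Using one shared split keeps the comparison between models fair, since each one sees exactly the same training and test rows.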
We evaluated them with the following metrics:
- Accuracy
- Precision
- Recall
- F1
- ROC_AUC
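
For reference, the five metrics can be computed from true labels, predicted labels, and predicted scores as sketched below. This is a plain-Python illustration with invented toy values. The function name `classification_metrics` is hypothetical, and the sketch assumes both classes are present in `y_true`.

```python
def classification_metrics(y_true, y_pred, y_score):
    """Compute the five metrics for a binary problem where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )

    # ROC AUC as the probability that a randomly chosen positive row scores
    # higher than a randomly chosen negative row (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
        "ROC_AUC": roc_auc,
    }


# Toy labels and scores invented for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that ROC_AUC depends on the predicted scores rather than the thresholded labels, which is why it can rank models differently from accuracy or F1.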
The results for the models are the following:
| Model | Accuracy | Precision | Recall | F1 | ROC_AUC |
|---|---|---|---|---|---|
| Logistic Regression | 0.75 | 0.93 | 0.75 | 0.83 | 0.84 |
| Random Forest | 0.95 | 0.95 | 0.99 | 0.97 | 0.88 |
| Gradient Boosting | 0.94 | 0.95 | 0.98 | 0.96 | 0.91 |
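
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short sketch below does this for the three result rows above, in the order they appear. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each result row, in document order.
reported = [
    (0.93, 0.75, 0.83),
    (0.95, 0.99, 0.97),
    (0.95, 0.98, 0.96),
]

recomputed = [round(f1_from(p, r), 2) for p, r, _ in reported]
for (p, r, f1), value in zip(reported, recomputed):
    print(f"precision={p}, recall={r}, reported F1={f1}, recomputed F1={value}")
```

The recomputed values match the reported F1 scores to two decimal places.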
As the results show, the most reliable models are "Random Forest" and "Gradient Boosting". Their scores are close, but the latter achieves the higher ROC_AUC (0.91 versus 0.88), so we choose this option.
Based on the results of this project, we can conclude the following:
- The probability of producing more defects increases when the production volume is above 800 units per day.
- The probability of producing defects increases when maintenance exceeds 10 hours per week.
- The best-performing model for predicting manufacturing defects is "Gradient Boosting".
The data used in this project was provided by Rabie El Kharoua.
Rabie El Kharoua. (2024). 🏭 Predicting Manufacturing Defects Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/8715500



