
## 8. Conclusion & Recommendations

From the evaluation metrics (R² and RMSE), the performance summary is as follows:

| Model | R² (↑ better) | RMSE (↓ better) | Remarks |
|:------|:--------------:|:---------------:|:--------|
| **Linear Regression** | ~0.852 | ~1403.6 | Simple and interpretable baseline model |
| **Polynomial Regression (degree=2)** | ~0.910 | ~1095.7 | Best predictive performance, but higher model complexity |
| **Ridge Regression** | ~0.852 | ~1403.7 | Similar to Linear; helps with multicollinearity |
| **Lasso Regression** | ~0.852 | ~1403.6 | Similar to Ridge; adds feature selection (sparse coefficients) |

---

### Interpretation

- **Polynomial Regression** achieved the **highest R² (≈0.91)** and the **lowest RMSE (≈1096)**, which indicates that the degree-2 terms capture non-linear relationships that the linear models miss.
- **Linear Regression** is simpler yet still explains about 85% of the variance in price (R² ≈ 0.852), which makes it a reasonable choice when interpretability is the priority.
- **Ridge and Lasso** matched Linear Regression to three decimal places (R² ≈ 0.852), so regularization brought no measurable gain at the settings used. This suggests the linear model is not overfitting and that multicollinearity is not materially hurting its predictions.
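
The comparison above can be reproduced in miniature. The following is a minimal sketch on synthetic data (not the Toyota dataset), assuming a single made-up `age` feature and a curved price relationship invented for illustration. It fits a linear model and a degree-2 polynomial model and reports R² and RMSE for each on a held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the Toyota dataset): price falls with age
# along a curve, so a straight line cannot fit it exactly.
rng = np.random.default_rng(42)
age = rng.uniform(1, 80, size=500)  # hypothetical age in months
price = 20000 - 300 * age + 1.8 * age**2 + rng.normal(0, 500, size=500)
X = age.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "poly2": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # RMSE is the square root of the mean squared error (same units as price).
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    scores[name] = {"r2": float(r2_score(y_test, pred)), "rmse": rmse}
    print(f"{name:>6}: R2 = {scores[name]['r2']:.3f}, RMSE = {rmse:.1f}")
```

On data with real curvature, the polynomial pipeline should show the higher R² and the lower RMSE, which mirrors the pattern in the table. The exact numbers depend on the invented data and say nothing about the Toyota results.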

---

### Final Recommendation

- For **predictive accuracy**, **Polynomial Regression** is the most suitable model.  
- For **interpretability and production simplicity**, **Linear Regression** remains a solid choice.  
- Ridge and Lasso are beneficial for regularization but not strictly necessary here.
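
To make the difference between the two regularizers concrete, here is a small sketch on synthetic data (not the Toyota dataset). It assumes six made-up features of which only two affect the target, and the `alpha` values are arbitrary illustrations rather than tuned settings. Ridge shrinks all coefficients, while Lasso can set some of them exactly to zero, which is what "sparse coefficients" refers to.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (NOT the Toyota dataset): 6 features, only the first 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 1.0, size=300)
X_scaled = StandardScaler().fit_transform(X)

# alpha values are arbitrary illustrations, not tuned settings.
fitted = {
    "linear": LinearRegression().fit(X_scaled, y),
    "ridge": Ridge(alpha=1.0).fit(X_scaled, y),
    "lasso": Lasso(alpha=0.5).fit(X_scaled, y),
}

for name, model in fitted.items():
    print(f"{name:>6}: {np.round(model.coef_, 2)}")

# Count coefficients that were driven exactly to zero by each penalty.
n_zero_lasso = int(np.sum(fitted["lasso"].coef_ == 0.0))
n_zero_ridge = int(np.sum(fitted["ridge"].coef_ == 0.0))
print("Zeroed by Lasso:", n_zero_lasso, "| zeroed by Ridge:", n_zero_ridge)
```

When all features carry signal and the sample is large relative to the feature count, as appears to be the case here, both penalties tend to leave the fit close to plain Linear Regression.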

---

**In summary:**  
> The price of used Toyota cars is most strongly influenced by **Age, Weight, KM, and Horsepower**.  
> Polynomial relationships improve prediction accuracy, suggesting that depreciation and wear effects are **non-linear** in nature.



## 9. Conclusion & Next Steps

### Summary of Findings

This project focused on predicting **used Toyota Corolla car prices** with several regression models.  
After cleaning, encoding, and scaling the dataset, we evaluated four models:

| Model | R² (↑ better) | RMSE (↓ better) | Remarks |
|:------|:--------------:|:---------------:|:--------|
| **Linear Regression** | ~0.852 | ~1403.6 | Simple, transparent, and interpretable baseline |
| **Polynomial Regression (degree=2)** | ~0.910 | ~1095.7 | Best overall performance, captures non-linearity effectively |
| **Ridge Regression** | ~0.852 | ~1403.7 | Stabilizes coefficients; no measurable improvement over Linear |
| **Lasso Regression** | ~0.852 | ~1403.6 | Adds feature selection; performs similarly to Ridge |

**Key Influencing Factors:**  
- **Age_08_04 (Car Age)** — strongest negative impact on price (older cars depreciate).  
- **Weight** — positively correlated with price (heavier, often higher-end models).  
- **KM** — higher mileage reduces price significantly.  
- **HP** — higher horsepower slightly increases resale value.
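
One common way to compare feature influence is to fit a linear model on standardized features and rank the coefficients by magnitude. The sketch below assumes that approach; it is not necessarily how the ranking above was produced. The rows are synthetic, the effect sizes in `true_effects` are invented so that the signs match the findings above, and only the column names come from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in rows (NOT the real dataset); only the column names
# Age_08_04, KM, HP, and Weight come from the analysis above.
rng = np.random.default_rng(1)
n = 400
feature_names = ["Age_08_04", "KM", "HP", "Weight"]
X = np.column_stack([
    rng.uniform(1, 80, n),           # age
    rng.uniform(1_000, 200_000, n),  # mileage
    rng.uniform(70, 190, n),         # horsepower
    rng.uniform(1000, 1300, n),      # weight
])

# Made-up effect sizes, chosen only so the signs match the findings above.
true_effects = np.array([-120.0, -0.02, 10.0, 15.0])
price = 12000 + X @ true_effects + rng.normal(0, 800, n)

X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, price)

# After standardization, each coefficient is the price change per one
# standard deviation of that feature, so the magnitudes are comparable.
order = np.argsort(-np.abs(model.coef_))
ranked = [(feature_names[i], float(model.coef_[i])) for i in order]
for name, coef in ranked:
    print(f"{name:>10}: {coef:+.1f} per standard deviation")
```

Raw (unscaled) coefficients are not comparable in this way, because `KM` is measured in the tens of thousands while `HP` spans roughly a hundred units.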

---

### Next Steps / Future Improvements

To further refine the model and make it deployment-ready:

1. **Hyperparameter Tuning:**  
   Use `GridSearchCV` or `RandomizedSearchCV` to tune the Ridge/Lasso α (`alpha`) values and the polynomial degree, scoring each candidate with k-fold cross-validation.

2. **Cross-Validation:**  
   Use `KFold` with `cross_val_score` to estimate how well each model generalizes and to detect overfitting that a single train/test split could hide.

3. **Pipeline Creation:**  
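   Build a `sklearn.pipeline.Pipeline` that chains preprocessing, scaling, and model training, so the same steps run at training and prediction time and the scaler is fit only on the training folds during cross-validation.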

4. **Feature Engineering:**  
   - Introduce interaction terms (e.g., `Age × KM`) for nuanced relationships.  
   - Test log-transforms for skewed variables like `Price` or `KM`.

5. **Model Export:**  
   Save the trained model using `joblib` or `pickle`:
   ```python
   import joblib

   # best_model is assumed to be the fitted estimator (ideally the full pipeline).
   joblib.dump(best_model, 'toyota_price_model.pkl')
   ```
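
Steps 1 to 3 can be combined in a single object. The following is a minimal sketch under stated assumptions: the data is synthetic (not the Toyota dataset), the step names `poly`, `scale`, and `model` are arbitrary labels, and the search grid is a placeholder rather than a recommendation. It wraps polynomial expansion, scaling, and Ridge in a `Pipeline`, then tunes the degree and `alpha` with `GridSearchCV` over a shuffled 5-fold `KFold`.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the Toyota dataset).
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(300, 3))
y = (
    10 * X[:, 0] ** 2
    - 4 * X[:, 1]
    + 2 * X[:, 0] * X[:, 2]
    + rng.normal(0, 0.3, size=300)
)

# Preprocessing and model in one estimator, so every CV fold refits the
# polynomial expansion and the scaler on its own training portion only.
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Hypothetical search grid; the ranges are placeholders, not recommendations.
param_grid = {
    "poly__degree": [1, 2, 3],
    "model__alpha": [0.01, 0.1, 1.0, 10.0],
}

cv = KFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    pipe, param_grid, cv=cv, scoring="neg_root_mean_squared_error"
)
search.fit(X, y)

# best_estimator_ is the pipeline refit on all of X, y with the best settings.
best_model = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Cross-validated RMSE:", round(-search.best_score_, 3))
```

Because `best_model` here is the whole pipeline, saving it with `joblib.dump` as in step 5 would store the preprocessing together with the regressor, so new data could be passed to `predict` without repeating the scaling by hand.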

---

### Final Note

The results show that **car depreciation is not purely linear**: adding degree-2 polynomial terms raised R² from about 0.852 to 0.910 and cut RMSE from about 1404 to 1096, a reduction of roughly 22%.  
With cross-validation and hyperparameter tuning, this model could serve as the basis for a **used car price prediction system**.
