# Model Performance Analysis

## Overview

This README summarizes a model performance analysis that compares six machine learning classifiers on a mobile phone dataset. The goal was to evaluate how well each model predicts the `price_range` of a mobile phone from its features.

## Problem Formulation

You are starting a revolutionary mobile phone startup and need to understand the pricing of mobile phones. To make informed decisions, you need to predict the price range of mobile phones based on their features, rather than the exact price. Fortunately, a mobile dataset from Kaggle provides the necessary data to learn about the price ranges of mobiles based on their attributes such as Wi-Fi and Bluetooth support.

## Dataset Information

The dataset describes each mobile phone with the following features:

- **battery_power:** Total energy the battery can store, measured in mAh.
- **blue:** Whether the phone supports Bluetooth (1 for Yes, 0 for No).
- **clock_speed:** Speed at which the microprocessor executes instructions.
- **dual_sim:** Whether the phone supports dual SIM cards (1 for Yes, 0 for No).
- **fc:** Front camera megapixels.
- **four_g:** Whether the phone supports 4G (1 for Yes, 0 for No).
- **int_memory:** Internal memory of the phone in gigabytes.
- **m_dep:** Mobile depth (thickness) in centimeters.
- **mobile_wt:** Weight of the mobile phone in grams.
- **n_cores:** Number of processor cores.
- **pc:** Primary camera megapixels.
- **px_height:** Pixel resolution height.
- **px_width:** Pixel resolution width.
- **ram:** Random Access Memory in megabytes.
- **sc_h:** Screen height of the mobile in centimeters.
- **sc_w:** Screen width of the mobile in centimeters.
- **talk_time:** Longest time a single battery charge lasts during a call.
- **three_g:** Whether the phone supports 3G (1 for Yes, 0 for No).
- **touch_screen:** Whether the phone has a touch screen (1 for Yes, 0 for No).
- **wifi:** Whether the phone supports Wi-Fi (1 for Yes, 0 for No).
- **price_range:** Target variable with four price ranges:
  - 0: Low cost
  - 1: Medium cost
  - 2: High cost
  - 3: Very high cost
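
The schema above can be turned into a quick sanity check before any modeling. The following is a minimal sketch: the column names come from the list above, while `validate_row` and the sample record are hypothetical and invented purely for illustration.

```python
# Column names are taken from the feature list above; the checks and the
# sample record are illustrative assumptions, not part of the original analysis.
BINARY_COLUMNS = {"blue", "dual_sim", "four_g", "three_g", "touch_screen", "wifi"}
NUMERIC_COLUMNS = {
    "battery_power", "clock_speed", "fc", "int_memory", "m_dep", "mobile_wt",
    "n_cores", "pc", "px_height", "px_width", "ram", "sc_h", "sc_w", "talk_time",
}
TARGET = "price_range"
PRICE_LABELS = {0: "Low cost", 1: "Medium cost", 2: "High cost", 3: "Very high cost"}


def validate_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    expected = BINARY_COLUMNS | NUMERIC_COLUMNS | {TARGET}
    missing = expected - set(row)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    # Binary flags must be exactly 0 or 1.
    for name in sorted(BINARY_COLUMNS & set(row)):
        if row[name] not in (0, 1):
            problems.append(f"{name} must be 0 or 1, got {row[name]!r}")
    # The target must be one of the four documented classes.
    if TARGET in row and row[TARGET] not in PRICE_LABELS:
        problems.append(f"{TARGET} must be one of {sorted(PRICE_LABELS)}")
    return problems


# Hypothetical record; the values are made up.
sample = {
    "battery_power": 1500, "blue": 1, "clock_speed": 2.0, "dual_sim": 0,
    "fc": 5, "four_g": 1, "int_memory": 32, "m_dep": 0.6, "mobile_wt": 150,
    "n_cores": 4, "pc": 12, "px_height": 1280, "px_width": 720, "ram": 2048,
    "sc_h": 14, "sc_w": 7, "talk_time": 15, "three_g": 1, "touch_screen": 1,
    "wifi": 1, "price_range": 2,
}
print(validate_row(sample))
```

A record that passes returns an empty list; each violated rule adds one message.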

## Models Evaluated

The following machine learning models were evaluated:

1. **Logistic Regression**
2. **Stochastic Gradient Descent (SGD) Classifier**
3. **Decision Tree Classifier**
4. **K-Nearest Neighbors (KNN)**
5. **Random Forest Classifier**
6. **Naive Bayes Classifier**
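
For reference, the six models could be set up as follows. This is a sketch that assumes scikit-learn was the library used; the README does not state the library, the hyperparameters, or the Naive Bayes variant, so `GaussianNB`, `max_iter=1000`, and `random_state=0` are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# One estimator per model in the list above. All settings are assumptions;
# the original analysis may have used different ones.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SGD Classifier": SGDClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),  # assumed variant
}

for name, estimator in models.items():
    print(f"{name}: {type(estimator).__name__}")
```

Keeping the estimators in one dictionary makes it straightforward to loop over them with the same fitting and scoring code.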

## Evaluation Metrics

- **Accuracy Score:** The proportion of correct predictions made by the model.
- **Cross-Validation Score:** Evaluates model performance by splitting the data into multiple folds and assessing the model's ability to generalize to unseen data.
- **Classification Report:** Includes precision, recall, and F1-score for each class.
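
The three metrics above can be computed along these lines. This sketch assumes scikit-learn and uses synthetic four-class data from `make_classification` as a stand-in for the real dataset, so its numbers are unrelated to the results reported below. The 5-fold setting is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: 4 classes, 20 features.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)
y_pred = model.predict(X)

# Accuracy on the data the model was fitted on (training accuracy).
train_accuracy = accuracy_score(y, y_pred)

# 5-fold cross-validation: each fold is scored by a model that never saw it.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

# Per-class precision, recall, and F1-score.
report = classification_report(y, y_pred)

print(f"Training accuracy: {train_accuracy:.3f}")
print(f"Cross-validation mean: {cv_scores.mean():.3f}")
print(report)
```

Comparing the training accuracy with the cross-validation mean is what exposes overfitting: a large gap means the model fits the training data better than it generalizes.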

## Results

### Logistic Regression
- **Training Accuracy:** 0.947
- **Cross-Validation Mean Score:** 0.925

### Stochastic Gradient Descent (SGD) Classifier
- **Training Accuracy:** 0.769
- **Cross-Validation Mean Score:** 0.767

### Decision Tree Classifier
- **Training Accuracy:** 1.0
- **Cross-Validation Mean Score:** 0.834

### K-Nearest Neighbors (KNN)
- **Accuracy:** 0.6305
- **Cross-Validation Mean Score:** 0.4005
- **Classification Report:** 
  - Precision, Recall, and F1-Score vary significantly across classes.

### Random Forest Classifier
- **Accuracy:** 0.8855
- **Cross-Validation Mean Score:** 0.8845
- **Classification Report:** 
  - High precision and recall, particularly for classes 0 and 3.

### Naive Bayes Classifier
- **Accuracy:** 0.812
- **Cross-Validation Mean Score:** 0.812
- **Classification Report:** 
  - Balanced performance with good precision and recall across classes.

## Key Findings

- **Best Model:** The Random Forest classifier combined high accuracy (0.8855) with a nearly identical cross-validation mean (0.8845), making it the recommended model for this dataset.
- **Second Best:** Logistic Regression also performed well, with a training accuracy of 0.947 and a cross-validation mean of 0.925.
- **Underperforming Models:** The KNN model performed poorly, with an accuracy of 0.6305 and a cross-validation mean of 0.4005. The Decision Tree classifier reached a training accuracy of 1.0 but a cross-validation mean of only 0.834, a sign of overfitting.
- **Naive Bayes:** Provided good results (0.812 for both accuracy and cross-validation mean) but was less accurate than the Random Forest (0.8855).
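
The gap between each model's reported accuracy and its cross-validation mean gives a rough view of generalization. The short calculation below uses only the figures from the Results section. Note that Results labels the first three models' scores "Training Accuracy" and the last three "Accuracy" without saying how the latter were measured, so the gaps are indicative only.

```python
# (accuracy, cross-validation mean) as reported in the Results section.
reported = {
    "Logistic Regression": (0.947, 0.925),
    "SGD Classifier": (0.769, 0.767),
    "Decision Tree": (1.0, 0.834),
    "KNN": (0.6305, 0.4005),
    "Random Forest": (0.8855, 0.8845),
    "Naive Bayes": (0.812, 0.812),
}

# Gap between the reported accuracy and the cross-validation mean.
gaps = {name: round(acc - cv, 4) for name, (acc, cv) in reported.items()}

# Print the models with the largest gap first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} gap = {gap:.4f}")
```

On these figures the two largest gaps belong to KNN (0.23) and the Decision Tree (0.166).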

## Conclusions

1. **Model Choice:** The Random Forest classifier is recommended for its combination of accuracy and stable cross-validation performance. Logistic Regression is also a strong candidate, especially when model simplicity is a priority.
2. **Overfitting:** The Decision Tree classifier's perfect training accuracy, set against a cross-validation mean of 0.834, suggests overfitting. Further evaluation and adjustments, such as limiting tree depth or pruning, are needed to improve its generalization.
3. **Data and Complexity:** Additional data collection and feature engineering may enhance model performance, particularly for complex models.
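
One way to investigate the Decision Tree overfitting noted in item 2 is to compare an unconstrained tree with a depth-limited one. The sketch below assumes scikit-learn and uses synthetic data, so it only demonstrates the procedure. The choice of `max_depth=5` is arbitrary, and whether limiting depth helps on the real dataset would need to be checked.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 4 classes, 20 features.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)

comparison = {}
for label, depth in [("unconstrained", None), ("max_depth=5", 5)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Accuracy on the data the tree was fitted on.
    train_acc = tree.fit(X, y).score(X, y)
    # cross_val_score clones the estimator, so each fold is fitted from scratch.
    cv_mean = cross_val_score(tree, X, y, cv=5).mean()
    comparison[label] = {"train": train_acc, "cv_mean": cv_mean}

for label, scores in comparison.items():
    print(f"{label:<14} train = {scores['train']:.3f}  cv mean = {scores['cv_mean']:.3f}")
```

An unconstrained tree keeps splitting until every leaf is pure, which is why it can memorize the training set. The depth limit trades some training accuracy for a simpler model.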

## Future Work

1. **Data Expansion:** Collect more data to improve the performance of complex models like Random Forests, and consider evaluating models not covered here, such as Support Vector Machines.
2. **Hyperparameter Tuning:** Continue experimenting with hyperparameters to optimize model performance.
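
Hyperparameter tuning could start with a small grid search. This is a sketch assuming scikit-learn's `GridSearchCV`. The grid values, the 3-fold setting, and the synthetic data are placeholders chosen to keep the example fast, not recommendations derived from the analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data: 4 classes, 20 features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=4, random_state=0
)

# Deliberately small grid (assumed values) so the search runs quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validation accuracy: {search.best_score_:.3f}")
```

Because the grid is scored by cross-validation, the selected parameters are less likely to reflect a lucky fit to the training set.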

