# Machine Learning Project Report

In this project, we explored various approaches to building a linear regression model. Our goal was to see how different data manipulations would affect the model’s performance. Below is a summary of our experiments and findings.

---

## Version 1 (v1)
- **Method**: 
  - We started by training a linear regression model on the raw data.
  - The data was split into training and test sets without any preprocessing or feature manipulation.
  - This model was saved as **v1** in our `ML/saved_model` directory.
- **Result**:
  - **R Squared**: 0.61637  
  - **MSE**: 0.0229
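
The report does not say how the two metrics were computed. As a minimal sketch, assuming the standard definitions (MSE as the mean of squared residuals, R squared as one minus the ratio of residual to total sum of squares), they could be calculated like this. The function names and the toy values are illustrative and are not taken from the project.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R squared is undefined for a constant target")
    return 1 - ss_res / ss_tot


# Toy values for illustration only; these are not the project's data.
y_test = [0.2, 0.4, 0.6, 0.8]
y_hat = [0.25, 0.35, 0.65, 0.75]

print("MSE:", mean_squared_error(y_test, y_hat))
print("R squared:", r_squared(y_test, y_hat))
```

Both metrics are evaluated on the held-out test set, so a higher R squared and a lower MSE in later versions indicate better generalization rather than a better fit to the training data.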

---

## Version 2 (v2)
- **Method**: 
  - We dropped one column from each pair of symmetrical columns before training.
  - We then retrained the linear regression model.
- **Result**:
  - **R Squared**: 0.61700  
  - **MSE**: 0.0228
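
The column-dropping step in v2 might look roughly like the sketch below. The report does not name the symmetrical columns, so `SYMMETRIC_PAIRS`, the column names, and the choice to drop the second column of each pair are all assumptions made for illustration.

```python
def drop_second_of_each_pair(rows, pairs):
    """Return copies of the rows without the second column of each pair."""
    to_drop = {second for _, second in pairs}
    return [
        {name: value for name, value in row.items() if name not in to_drop}
        for row in rows
    ]


# Hypothetical column names; the real dataset's columns are not listed in the report.
SYMMETRIC_PAIRS = [("pair1_a", "pair1_b"), ("pair2_a", "pair2_b")]

rows = [
    {"pair1_a": 2, "pair1_b": 3, "pair2_a": 1, "pair2_b": 1, "target": 0.5},
    {"pair1_a": 1, "pair1_b": 2, "pair2_a": 3, "pair2_b": 2, "target": 0.7},
]

trimmed = drop_second_of_each_pair(rows, SYMMETRIC_PAIRS)
print(trimmed[0])
```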

---

## Version 3 (v3)
- **Method**:
  - We computed the correlation matrix for our features and dropped the columns least correlated with the target variable.
  - We also tested combinations that dropped symmetrical columns alongside the least correlated ones.
  - The best outcome in this version came from dropping only the FSM column.
- **Result**:
  - **R Squared**: 0.62206  
  - **MSE**: 0.0225
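
To illustrate the selection step, the sketch below ranks features by the absolute value of their Pearson correlation with the target and removes the weakest one. It assumes Pearson correlation was the measure used, which the report does not state. Only the name `FSM` comes from the report; the other column name and all values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def least_correlated(features, target):
    """Name of the feature with the smallest absolute correlation to the target."""
    return min(features, key=lambda name: abs(pearson(features[name], target)))


# Toy columns for illustration; the values are not from the project's dataset.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "FSM": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]

weakest = least_correlated(features, target)
reduced = {name: col for name, col in features.items() if name != weakest}
print("Dropped:", weakest)
```

Ranking by absolute value matters here: a strongly negative correlation is still informative for a linear model, so only values near zero mark a column as a candidate for removal.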

---

## Version 4 (v4) – *Champion Model*
- **Method**:
  - We multiplied the symmetrical columns to create a new feature and added it to the dataset.
  - We also tried creating an average of the symmetrical columns and dropping the originals, but that gave worse results.
  - The best-performing approach in v4 was to keep the original columns and add the new product feature.
- **Result**:
  - **R Squared**: 0.64677  
  - **MSE**: 0.0210
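
The feature-engineering step in v4 can be sketched as follows. This assumes one product column is created per symmetrical pair and that the original columns are kept. The pair list, the column names, and the `_times_` naming scheme are hypothetical.

```python
def add_product_features(rows, pairs):
    """Return copies of the rows with one product column added per pair."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for first, second in pairs:
            new_row[f"{first}_times_{second}"] = row[first] * row[second]
        enriched.append(new_row)
    return enriched


# Hypothetical column names; the report does not list the symmetrical columns.
SYMMETRIC_PAIRS = [("pair1_a", "pair1_b")]

rows = [
    {"pair1_a": 2, "pair1_b": 3, "target": 0.5},
    {"pair1_a": 1, "pair1_b": 2, "target": 0.7},
]

enriched = add_product_features(rows, SYMMETRIC_PAIRS)
print(enriched[0])
```

A product term lets a linear model capture an interaction between the two columns that it cannot express from the individual columns alone, which is one plausible reason this version outperformed the others.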

---

## Additional Notes
- We tried weighting some of the features based on guidelines provided by the organization, but this did not improve model accuracy.
- Overall, **v4** gave us the highest R² and the lowest MSE, making it our **champion model**.

---

## Summary of Results

| **Version** | **Method**                                             | **R Squared** | **MSE**    |
|-------------|--------------------------------------------------------|--------------:|-----------:|
| v1          | Linear regression on raw data                          | 0.61637       | 0.0229     |
| v2          | Dropped one column from each symmetrical pair          | 0.61700       | 0.0228     |
| v3          | Dropped the least-correlated column (FSM)              | 0.62206       | 0.0225     |
| v4          | Added a new column by multiplying symmetrical features | **0.64677**   | **0.0210** |

---


## Frontend and Backend

## Frontend Implementation
The frontend is built with **React** and provides a user-friendly interface for interacting with the backend.

### Key Features
1. **Make Prediction**
   - Allows users to upload a file.
   - Provides a dropdown to select a regression model.
   - Sends the data to the backend for predictions.

2. **Prediction Results**
   - Displays prediction scores and corresponding risk categories.
   - Includes a **speedometer visualization** for easy interpretation.

3. **Health System**
   - Monitors the system status.
   - Displays whether the system is in a **healthy** state.

4. **Model Management**
   - Allows users to update and refresh available models.
   - Ensures the latest models are accessible for predictions.

## Integration
- The frontend communicates with the backend through its RESTful API endpoints.

## Backend Implementation
The backend is built with **FastAPI** and handles machine learning predictions, model management, and health monitoring. It consists of:

### Machine Learning Models
- Supports multiple pre-trained models that can be selected by the user.
- Predictions return a **performance score** categorized into four risk levels:
  - **Bad (0-39):** Poor performance or high risk.
  - **Good (40-69):** Acceptable performance or moderate risk.
  - **Great (70-89):** Strong performance or low risk.
  - **Excellent (90-100):** Exceptional performance or minimal risk.
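
The banding above can be expressed as a small lookup function. This is a sketch rather than the backend's actual code. The function name `risk_category` is hypothetical, and because the report lists integer ranges, the handling of fractional scores (for example, 39.5 falling into **Bad**) is an assumption.

```python
def risk_category(score):
    """Map a performance score in [0, 100] to one of the four risk levels."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 40:
        return "Bad"
    if score < 70:
        return "Good"
    if score < 90:
        return "Great"
    return "Excellent"


# Boundary values taken from the ranges listed in the report.
for value in (0, 39, 40, 69, 70, 89, 90, 100):
    print(value, risk_category(value))
```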

### API Endpoints
The backend exposes RESTful API endpoints for:
- **Prediction**: Accepts a file and model selection, then returns predictions.
- **Health Status**: Reports whether the system is in a healthy state.
- **Model Management**: Allows updating and refreshing available models.



## Conclusion
This project integrates a **machine learning backend** with a **React frontend**. The system is designed to be scalable and easy to use, and it returns model predictions in real time with clear risk assessments.


## MLflow Implementation

At the end of this development cycle, we began setting up **MLflow** for experiment tracking. Here’s a brief overview:

1. **MLflow Server Setup**  
   - We installed and configured the MLflow server to run locally.
   - We verified that the UI is accessible and that models, metrics, and parameters can be logged.

2. **Initial Tests**  
   - We performed a few sample runs to confirm that logging of artifacts and metrics works as expected.
   - We verified that each new run is recorded correctly, including version tags for easy comparison.

3. **Next Steps**  
   - Starting next week, we will integrate MLflow into all new experiments.
   - We will ensure that metrics, hyperparameters, and model versions are tracked systematically.
   - This will streamline comparison between model iterations.

With the MLflow server now operational, we’re positioned to maintain detailed experiment records and improve reproducibility across our machine learning workflows.
