# <center> Data Science in Healthcare: Breast Cancer Detection </center>
<center> <b>DLBDSME01</b> - Model Engineering </center>
<center> IU International University of Applied Sciences </center>

Hello, in this project, our objective is _(State it here)_

## List of contents:
1. Overview
2. Creating models
3. Training/testing
4. Evaluation
    * 4.1. _Actual vs. Predicted values Plot_
    * 4.2. _Residuals Plots_
    * 4.3. _Metrics_
5. Monitoring
6. Summary

We start by importing the required libraries:

In [1]:
# Importing the required libraries
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score, root_mean_squared_error, accuracy_score, classification_report
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from pathlib import Path
import pandas as pd
import warnings
import mlflow

# Avoid unnecessary warnings
warnings.filterwarnings("ignore")

### 1. Overview
The data used here is fictitious and was created to reflect the typical operating conditions of manufacturing machines during regular operation. These conditions lie roughly within the following ranges:
* __Sound__       : Between 60 dB and 85 dB.
* __Temperature__ : Between 68°F and 86°F.
* __Humidity__    : Between 40% and 60% RH.
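
As a rough sketch, the ranges above can be written down as a lookup table and used to flag readings that fall outside them. The keys `sound`, `temperature`, and `humidity` match the column names used in the training cell; the helper `out_of_range` and the two sample readings are hypothetical and are not taken from the dataset. Since the data only lies roughly within these ranges, some real rows may well be flagged.

```python
# Typical operating ranges as stated above (units: dB, °F, % RH).
TYPICAL_RANGES = {
    "sound": (60.0, 85.0),
    "temperature": (68.0, 86.0),
    "humidity": (40.0, 60.0),
}


def out_of_range(reading):
    """Return the names of the measurements outside their typical range."""
    return [
        name
        for name, (low, high) in TYPICAL_RANGES.items()
        if not low <= reading[name] <= high
    ]


# Hypothetical readings, for illustration only.
normal = {"sound": 72.0, "temperature": 75.0, "humidity": 50.0}
noisy = {"sound": 91.5, "temperature": 75.0, "humidity": 38.0}

print(out_of_range(normal))  # no measurement is flagged
print(out_of_range(noisy))   # sound and humidity are flagged
```

The same function could be applied row by row to the loaded data, for example over `data.to_dict("records")`.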

We start by loading the data:

In [None]:
# Project root: the parent of the current working directory
path = Path.cwd().parent

# Loading the data
data = pd.read_csv(path / 'data' / 'Data for Task 1.csv')

# Display a sample of the data
data.sample(15)

In [None]:
# Displaying the data's description
data.describe()

### 2. Creating models
Next, the following regression models are selected as candidates:
* Linear Regression.
* Decision Tree.
* Random Forest.
* Support Vector Machine.

In [None]:
# Dictionary to store models
models = {
    'Linear Regression': LinearRegression(),
    'Decision Tree': DecisionTreeRegressor(),
    'Random Forest': RandomForestRegressor(),
    'Support Vector Machine': SVR()
}

### 3. Training/Testing
Next, each candidate model is trained on 70% of the data and tested on the remaining 30%.
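
To make the 70/30 proportion concrete, here is a minimal sketch of a shuffled split using only the standard library. It is not how scikit-learn implements `train_test_split`; the helper `split_indices` and its `seed` argument are hypothetical and serve only as an illustration of the idea.

```python
import random


def split_indices(n_rows, test_size=0.3, seed=0):
    """Shuffle the row indices and split them into train and test parts."""
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle, and so the split, reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))

# The two parts never share a row, and together they cover every row.
print(sorted(train_idx + test_idx) == list(range(10)))
```

`train_test_split` accepts a `random_state` argument that plays the role of `seed` here when a reproducible split is wanted.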

In [None]:
# Specifying the train/test data
X_train, X_test, y_train, y_test = train_test_split(
    data[['sound', 'temperature', 'humidity']], data['score'],
    test_size=0.3
)

# Setting a default tracking directory
mlflow.set_tracking_uri(f"file:{path}/mlruns")

# Selecting the experiment once, before the loop.
# set_experiment creates the experiment if it does not exist yet,
# so re-running this cell does not fail on an already existing experiment.
mlflow.set_experiment("Training prospected models")

for model_name, model_instance in models.items():

    # Starting a run
    with mlflow.start_run(run_name=f'Training {model_name}'):
        # Training the model
        model_instance.fit(X_train, y_train)

        # Predicting on the test set
        y_estimate = model_instance.predict(X_test)

        # Computing error metrics on the test set
        mse = mean_squared_error(y_test, y_estimate)
        rmse = root_mean_squared_error(y_test, y_estimate)
        mae = mean_absolute_error(y_test, y_estimate)
        r2 = r2_score(y_test, y_estimate)

        # Logging the model
        mlflow.sklearn.log_model(sk_model=model_instance, artifact_path="model",
                                 registered_model_name=model_name)

        # Logging metrics
        mlflow.log_metrics({"MSE": mse, "RMSE": rmse, "MAE": mae, "R-squared": r2})
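
For reference, the four logged metrics can be reproduced by hand from their standard definitions. The sketch below uses only the standard library; the function name `regression_metrics` and the toy values are hypothetical and unrelated to the dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n    # mean squared error
    mae = sum(abs(e) for e in errors) / n   # mean absolute error

    # R-squared: 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "R-squared": 1 - ss_res / ss_tot,
    }


# Toy values, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Lower MSE, RMSE, and MAE indicate smaller errors, while an R-squared closer to 1 indicates that the model explains more of the variance in the target.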

### 5. Monitoring

For better illustration, a scatter plot is used to provide a visual comparison of the models.

### 6. Summary
In this project, four models were trained on fictitious data: _Linear Regression_, _Decision Tree_, _Random Forest_, and _Support Vector Machine_.
The evaluation showed that __Linear Regression__ was the most efficient model for this dataset, and the preceding visualization illustrated the differences among the models.

## Author
<a href="https://www.linkedin.com/in/ab0858s/">Abdelali BARIR</a> is a veteran of the Moroccan Royal Armed Forces and a self-taught Python programmer, currently enrolled in the B.Sc. Data Science program at __IU International University of Applied Sciences__.

## Change Log

| Date         | Version   | Changed By       | Change Description        |
|--------------|-----------|------------------|---------------------------|
| 2024-07-10   | 1.0       | Abdelali Barir   | Modified markdown         |
