<a href="https://colab.research.google.com/github/ArifAygun/Iron-Ore-Froth-Flotation-Quality-Prediction/blob/main/AA_Graduate_Project_3_XGBoost.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Quality Prediction of Iron Ore Mining Flotation Process - Part:3**

# **Machine Learning Models**

### **Import Libraries and Modules**

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import math
import random
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor


### **Import and Divide Dataset**

In [2]:
from google.colab import drive
drive.mount('/content/drive/')
%cd /content/drive/My Drive/Flotation/

flotation_conditions = pd.read_csv('flotation_conditions.csv')
concentrates = pd.read_csv('concentrates.csv')
iron_concentrate = pd.read_csv('iron_concentrate.csv')
silica_concentrate = pd.read_csv('silica_concentrate.csv')

Mounted at /content/drive/
/content/drive/My Drive/Flotation


## **9. Multi Linear Regressor**

In [3]:
#train test split for regression training and testing 
X_train, X_test, y_train, y_test = train_test_split(flotation_conditions, concentrates,
                                                    test_size=0.20, random_state=0)

In [4]:
mlr = LinearRegression()
mlr.fit(X_train, y_train)

y_pred_mlr = mlr.predict(X_test)
r2_score_mlr = r2_score(y_test, y_pred_mlr)
mse_mlr = mean_squared_error(y_test, y_pred_mlr)

print('R2 Score of Multi Linear Regression:', r2_score_mlr)
print('Mean Squared Error (MSE) of Multi Linear Regression:', mse_mlr)


R2 Score of Multi Linear Regression: 0.18662737342875502
Mean Squared Error (MSE) of Multi Linear Regression: 1.0400152994837892


- R2 Score: R2 score of the Multi Linear Regression model is 0.187. This indicates that approximately 18.7% of the variance in the target variable is explained by the independent variables included in the model.

- Mean Squared Error: MSE of the Multi Linear Regression model is 1.040. A lower MSE value indicates better accuracy, meaning that the predicted values are closer to the actual values on average.

Considering the R2 score of 0.187, it suggests that the Multi Linear Regression model might not be a very good fit for the data. This means that the independent variables included in the model might not have a strong linear relationship with the target variable, or there might be other factors that the model is not capturing. Additionally, the relatively high MSE value of 1.040 indicates that the model's predictions have a relatively high average squared difference from the actual values.

## **10. Random Forest Regressor**

In [5]:
rf1 = RandomForestRegressor(random_state = 0, n_estimators = 100)  
rf1.fit(X_train,y_train) 
y_pred_rf = rf1.predict(X_test)

In [6]:
rf2 = RandomForestRegressor(random_state = 0, n_estimators = 100)  
rf2.fit(flotation_conditions,concentrates)
y_pred_rf2 = rf1.predict(flotation_conditions)

In [7]:
print('R2 Score of Random Forest Regression with Train-Test Split',r2_score(y_test,y_pred_rf))
print('R2 Score of Random Forest Regression with Whole Data',r2_score(concentrates,y_pred_rf2))

R2 Score of Random Forest Regression with Train-Test Split 0.326715411986887
R2 Score of Random Forest Regression with Whole Data 0.7883449853154547


The Random Forest Regression model achieves a higher R2 score compared to the Multi Linear Regression model, indicating better performance in capturing the underlying patterns and relationships in the data. The R2 score of 0.3267 for the train-test split suggests that the model can explain approximately 32.67% of the variability in the target variable, while the R2 score of 0.7883 for the whole dataset suggests that the model can explain approximately 78.83% of the variability.

### **10.1. Random Forest Model for Iron Concentrate**

In [8]:
regressor_Fe = RandomForestRegressor(random_state=0, n_estimators=100)
regressor_Fe.fit(flotation_conditions, iron_concentrate.values.ravel())
y_pred_Fe = regressor_Fe.predict(flotation_conditions)

r2_score_Fe = r2_score(iron_concentrate, y_pred_Fe)
mse_Fe = mean_squared_error(iron_concentrate, y_pred_Fe)

print('R2 Score of Random Forest Regression for Iron Concentrate:', r2_score_Fe)
print('Mean Squared Error (MSE) of Random Forest Regression for Iron Concentrate:', mse_Fe)


R2 Score of Random Forest Regression for Iron Concentrate: 0.9089028341021771
Mean Squared Error (MSE) of Random Forest Regression for Iron Concentrate: 0.11385778613603315


The Random Forest Regression model for predicting the Iron Concentrate has achieved a high R2 score of 0.909. This indicates that the model can explain approximately 91% of the variability in the Iron Concentrate based on the provided features. A high R2 score suggests that the model is able to effectively capture the patterns and relationships in the data, resulting in accurate predictions.

The MSE of 0.1139 suggests that, on average, the squared difference between the predicted and actual iron concentrate values is relatively small. This indicates that the model's predictions are generally close to the actual values, indicating good accuracy.

### **10.2. Random Forest Model for Silica Concentrate**

In [9]:
regressor_Si = RandomForestRegressor(random_state=0, n_estimators=100)
regressor_Si.fit(flotation_conditions, silica_concentrate.values.ravel())
y_pred_Si = regressor_Si.predict(flotation_conditions)

r2_score_Si = r2_score(silica_concentrate, y_pred_Si)
mse_Si = mean_squared_error(silica_concentrate, y_pred_Si)

print('R2 Score of Random Forest Regression for Silica Concentrate:', r2_score_Si)
print('Mean Squared Error (MSE) of Random Forest Regression for Silica Concentrate:', mse_Si)


R2 Score of Random Forest Regression for Silica Concentrate: 0.9100704799723327
Mean Squared Error (MSE) of Random Forest Regression for Silica Concentrate: 0.11374115782077435


The R2 score of 0.91 for the Random Forest Regression model predicting the Silica Concentrate suggests that approximately 91% of the variance in the Silica Concentrate can be explained by the given flotation conditions. This indicates a strong correlation between the predictor variables and the Silica Concentrate.

Similarly, the Mean Squared Error (MSE) of 0.1137 indicates that, on average, the squared difference between the predicted and actual Silica Concentrate values is relatively small. This suggests that the model's predictions are generally accurate and close to the actual values.

Both the R2 score and MSE indicate that the Random Forest Regression model performs well in predicting the Silica Concentrate based on the provided flotation conditions. These results suggest that the model captures the underlying patterns and relationships effectively.

## **11. XGBoost Regression**

### **11.1. XGBoost Regression For Iron Concentrate**

In [10]:
# Create an XGBoost regression model
regressor = xgb.XGBRegressor(objective='reg:squarederror', random_state=0)
# You can adjust other hyperparameters of XGBoost if needed

# Fit the XGBoost regression model
regressor.fit(flotation_conditions, iron_concentrate.values.ravel())

# Generate predictions
y_pred = regressor.predict(flotation_conditions)

# Calculate evaluation metrics
r2_score_XGB = r2_score(iron_concentrate, y_pred)
mse_XGB = mean_squared_error(iron_concentrate, y_pred)

print('R2 Score of XGBoost Regression for Iron Concentrate:', r2_score_XGB)
print('Mean Squared Error (MSE) of XGBoost Regression for Iron Concentrate:', mse_XGB)

R2 Score of XGBoost Regression for Iron Concentrate: 0.9419023057944114
Mean Squared Error (MSE) of XGBoost Regression for Iron Concentrate: 0.07261339885453721


The XGBoost Regression model achieved an R2 score of 0.942, indicating that it explains approximately 94.2% of the variance in the Iron Concentrate data. A higher R2 score suggests that the model fits the data well.

The Mean Squared Error (MSE) for the XGBoost Regression model is 0.073, which represents the average squared difference between the predicted and actual Iron Concentrate values. A lower MSE indicates better predictive accuracy.


### **11.2. XGBoost Regression For Silica Concentrate**

In [11]:
# Create an XGBoost regression model
regressor = xgb.XGBRegressor(objective='reg:squarederror', random_state=0)
# You can adjust other hyperparameters of XGBoost if needed

# Fit the XGBoost regression model
regressor.fit(flotation_conditions, silica_concentrate.values.ravel())

# Generate predictions
y_pred = regressor.predict(flotation_conditions)

# Calculate evaluation metrics
r2_score_XGB = r2_score(silica_concentrate, y_pred)
mse_XGB = mean_squared_error(silica_concentrate, y_pred)

print('R2 Score of XGBoost Regression for Silica Concentrate:', r2_score_XGB)
print('Mean Squared Error (MSE) of XGBoost Regression for Silica Concentrate:', mse_XGB)

R2 Score of XGBoost Regression for Silica Concentrate: 0.9474644776615195
Mean Squared Error (MSE) of XGBoost Regression for Silica Concentrate: 0.06644593605814353


The XGBoost Regression model achieved an R2 score of 0.947 for the Silica Concentrate, indicating that it explains approximately 94.7% of the variance in the Silica Concentrate data. This high R2 score suggests that the model fits the data very well and can effectively predict the Silica Concentrate values.

The Mean Squared Error (MSE) for the XGBoost Regression model is 0.066, which represents the average squared difference between the predicted and actual Silica Concentrate values. The lower MSE indicates that the model has good predictive accuracy and is able to provide accurate predictions of the Silica Concentrate values.

In [12]:
data = {'Model': ['Random Forest', 'XGBoost'],
        'Concentrate': ['Iron Concentrate', 'Silica Concentrate'],
        'R2 Score': [0.909, 0.942],
        'MSE': [0.114, 0.073]}
df = pd.DataFrame(data)
print(df.to_string(index=False))

        Model        Concentrate  R2 Score   MSE
Random Forest   Iron Concentrate     0.909 0.114
      XGBoost Silica Concentrate     0.942 0.073


- R2 Score: Higher R2 scores (closer to 1) indicate a better fit of the model to the data. The R2 scores obtained for the respective regression models are 0.909 for "Random Forest" and 0.942 for "XGBoost".

- MSE: Lower MSE values indicate better accuracy of the regression models. The MSE values obtained for the respective regression models are 0.114 for "Random Forest" and 0.073 for "XGBoost".
- The results indicate that both regression models (Random Forest and XGBoost) perform well in predicting the concentrations of iron and silica. The XGBoost model shows slightly higher R2 scores and lower MSE values, suggesting that it may provide better predictive performance for both types of concentrate. 