<a href="https://colab.research.google.com/github/ArifAygun/Iron-Ore-Froth-Flotation-Quality-Prediction/blob/main/AA_Graduate_Project_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Quality Prediction of Iron Ore Mining Flotation Process - Part:3**

# **Machine Learning Models**

### **Import Libraries and Modules**

In [45]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import math
import random
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor

### **Import and Divide Dataset**

In [46]:
from google.colab import drive
drive.mount('/content/drive/')
%cd /content/drive/My Drive/Flotation/

flotation_conditions = pd.read_csv('flotation_conditions.csv')
concentrates = pd.read_csv('concentrates.csv')
iron_concentrate = pd.read_csv('iron_concentrate.csv')
silica_concentrate = pd.read_csv('silica_concentrate.csv')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
/content/drive/My Drive/Flotation


## **9. Multi Linear Regressor**

In [47]:
#train test split for regression training and testing 
x_train, x_test, y_train, y_test = train_test_split(flotation_conditions, concentrates,
                                                    test_size=0.20, random_state=0)

In [48]:
mlr = LinearRegression()
mlr.fit(x_train,y_train)

y_pred_mlr = mlr.predict(x_test)
print('R2 Score of Multi Linear Regression',r2_score(y_test,y_pred_mlr))

R2 Score of Multi Linear Regression 0.18662737342875502


The R2 score of 0.1866 for the Multi Linear Regression model suggests that the model explains only around 18.66% of the variability in the target variables. This indicates that the model has limited predictive power in this case. It is important to consider other factors and evaluation metrics to assess the model's effectiveness. A low R2 score may indicate weak correlation between the independent variables and the target variables or the presence of unaccounted factors affecting the target variable. Further analysis and model improvements may be needed.

## **10. Random Forest Regressor**

In [49]:
rf1 = RandomForestRegressor(random_state = 0, n_estimators = 100)  
rf1.fit(x_train,y_train) 
y_pred_rf = rf1.predict(x_test)

In [50]:
rf2 = RandomForestRegressor(random_state = 0, n_estimators = 100)  
rf2.fit(flotation_conditions,concentrates)
y_pred_rf2 = rf1.predict(flotation_conditions)

In [51]:
print('R2 Score of Random Forest Regression with Train-Test Split',r2_score(y_test,y_pred_rf))
print('R2 Score of Random Forest Regression with Whole Data',r2_score(concentrates,y_pred_rf2))

R2 Score of Random Forest Regression with Train-Test Split 0.326715411986887
R2 Score of Random Forest Regression with Whole Data 0.7883449853154547


The Random Forest Regression model achieves a higher R2 score compared to the Multi Linear Regression model, indicating better performance in capturing the underlying patterns and relationships in the data. The R2 score of 0.3267 for the train-test split suggests that the model can explain approximately 32.67% of the variability in the target variable, while the R2 score of 0.7883 for the whole dataset suggests that the model can explain approximately 78.83% of the variability.

### **10.1. Random Forest Model for Iron Concentrate**

In [52]:
regressor_Fe = RandomForestRegressor(random_state=0, n_estimators=100)
regressor_Fe.fit(flotation_conditions, iron_concentrate.values.ravel())
y_pred_Fe = regressor_Fe.predict(flotation_conditions)

r2_score_Fe = r2_score(iron_concentrate, y_pred_Fe)
mse_Fe = mean_squared_error(iron_concentrate, y_pred_Fe)

print('R2 Score of Random Forest Regression for Iron Concentrate:', r2_score_Fe)
print('Mean Squared Error (MSE) of Random Forest Regression for Iron Concentrate:', mse_Fe)


R2 Score of Random Forest Regression for Iron Concentrate: 0.9089028341021771
Mean Squared Error (MSE) of Random Forest Regression for Iron Concentrate: 0.11385778613603315


The Random Forest Regression model for predicting the Iron Concentrate has achieved a high R2 score of 0.909. This indicates that the model can explain approximately 91% of the variability in the Iron Concentrate based on the provided features. A high R2 score suggests that the model is able to effectively capture the patterns and relationships in the data, resulting in accurate predictions.

The MSE of 0.1139 suggests that, on average, the squared difference between the predicted and actual iron concentrate values is relatively small. This indicates that the model's predictions are generally close to the actual values, indicating good accuracy.

### **10.2. Random Forest Model for Silica Concentrate**

In [53]:
regressor_Si = RandomForestRegressor(random_state=0, n_estimators=100)
regressor_Si.fit(flotation_conditions, silica_concentrate.values.ravel())
y_pred_Si = regressor_Si.predict(flotation_conditions)

r2_score_Si = r2_score(silica_concentrate, y_pred_Si)
mse_Si = mean_squared_error(silica_concentrate, y_pred_Si)

print('R2 Score of Random Forest Regression for Silica Concentrate:', r2_score_Si)
print('Mean Squared Error (MSE) of Random Forest Regression for Silica Concentrate:', mse_Si)


R2 Score of Random Forest Regression for Silica Concentrate: 0.9100704799723327
Mean Squared Error (MSE) of Random Forest Regression for Silica Concentrate: 0.11374115782077435


The R2 score of 0.91 for the Random Forest Regression model predicting the Silica Concentrate suggests that approximately 91% of the variance in the Silica Concentrate can be explained by the given flotation conditions. This indicates a strong correlation between the predictor variables and the Silica Concentrate.

Similarly, the Mean Squared Error (MSE) of 0.1137 indicates that, on average, the squared difference between the predicted and actual Silica Concentrate values is relatively small. This suggests that the model's predictions are generally accurate and close to the actual values.

Both the R2 score and MSE indicate that the Random Forest Regression model performs well in predicting the Silica Concentrate based on the provided flotation conditions. These results suggest that the model captures the underlying patterns and relationships effectively.