# Grind Size Classification Model

The primary job of a mineral processing plant is to crush and grind the crude ore to a targeted grind size, then use the flotation process to separate the valuable metals from the tailings (waste).

Most of the plant's process data is monitored online, but some variables are still sampled and measured manually because of technological or economic limitations. For instance, after the grinding process, the fine ore's particle size distribution (grind size) is measured only once a day with a manual sampling scoop. The grind size is a crucial parameter: if it is off target, it affects the downstream flotation kinetics. In general, when the size is too coarse, valuable metal recovery is reduced; when it is too fine, too much waste is recovered, which lowers metal purity. When a process problem occurs, the process engineer cannot immediately verify the grind size, because sampling and measuring the grind size distribution manually takes several hours.


The purpose of this model is to predict whether the output of the mineral processing plant's grinding circuit is too coarse, on target, or too fine, using the plant's online data for various process parameters (e.g. mill power output, mill tonnage, the minerals' chemical compositions, type of ore processed, cyclone pressure, number of cyclones online). The online data (features) are monitored continuously with OSIsoft's PI System and downloaded to Excel with PI DataLink. Most features are numerical; a few are categorical.
The daily grind size data (label) has been recorded in an Excel spreadsheet since 2013.

As the residence time of the grinding circuit is around 30-60 min, each instance in the dataset consists of the grind size of one daily sample plus the average of the online data over the hour before that sample was taken.
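The pairing of each daily sample with the preceding hour of online data can be sketched in pandas. This is only an illustration on synthetic data: the helper `features_before_sample`, the 10-minute logging interval, and the toy signal are assumptions, and the actual averaging for this dataset may well have been done on the PI side before export.

```python
import pandas as pd

def features_before_sample(online, sample_times, window="1h"):
    """Average each online signal over the window ending at each sample time.

    Hypothetical helper: `online` is a DataFrame indexed by timestamp.
    """
    online = online.sort_index()
    rows = {}
    for t in pd.to_datetime(list(sample_times)):
        # Half-open window (t - window, t]: readings in the hour before the sample
        mask = (online.index > t - pd.Timedelta(window)) & (online.index <= t)
        rows[t] = online.loc[mask].mean(numeric_only=True)
    return pd.DataFrame.from_dict(rows, orient="index")

# Synthetic signal logged every 10 minutes (assumed interval), values 0..12
idx = pd.date_range("2013-01-01 04:00", "2013-01-01 06:00", freq="10min")
online = pd.DataFrame({"Rod Mill Tonnage tph": range(len(idx))}, index=idx)

# One feature row for the 06:00 sample, averaged over 05:00-06:00
features = features_before_sample(online, ["2013-01-01 06:00"])
print(features)
```

A one-hour window is used because it covers the 30-60 min residence time, so the averaged readings describe the ore that ended up in the sample.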

For confidentiality reasons, the plant name and the raw dataset cannot be shared.

## Data Upload

In [None]:
### Import libraries that will be used throughout the code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

In [None]:
### Import raw data file
from google.colab import files
uploaded = files.upload()

Saving GrindSize Raw Data.xlsx to GrindSize Raw Data (1).xlsx


In [None]:
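import io
# files.upload() returns a dict of {file name: file contents as bytes};
# wrap the bytes in a file-like object so pandas can parse the workbook
raw_data = pd.read_excel(io.BytesIO(uploaded['GrindSize Raw Data.xlsx']))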

In [None]:
raw_data.head(20)

Unnamed: 0,Date,Rod Mill Tonnage tph,Rod Mill Water Flow Usgpm,Pump Box ReCycle Water cmph,Pump Box Process Water cmph,Ball Mill Process Water cmph,Rod Mill kW,Ball Mill kW,# of Cyclone,Cyclone Pressure psi,Cyclone Overflow %Sol,Ni%,Cu%,Fe%,S%,Feed Type,NickelCopper,% Passing 200M
0,2013-01-01 06:00:00,159.996931,119.166428,65.05579,34.940079,36.823228,566.927778,1306.875225,2.0,9.928975,46.751768,0.630952,4.247876,8.287141,4.758753,Cu Fraser,Copper,56.105263
1,2013-01-02 06:00:00,155.968999,121.627277,74.73137,40.253819,40.466375,536.924948,1304.684595,2.0,10.232744,47.024574,2.106935,5.88698,13.828072,9.168528,Cu Nickel Rim,Copper,58.994365
2,2013-01-03 06:00:00,155.982331,121.755537,93.620943,23.396572,40.399002,557.222612,1293.50969,2.0,10.390768,46.977012,2.125585,6.191957,13.448156,9.57534,Cu Nickel Rim,Copper,58.305085
3,2013-01-04 06:00:00,0.061037,0.030519,2.785809,[-11059] No Good Data For Calculation,0.0,1.245155,0.0,2.0,0.262459,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,Cu Nickel Rim,Copper,0
4,2013-01-05 06:00:00,0.061037,0.030519,2.880947,[-11059] No Good Data For Calculation,0.0,1.538133,0.0,2.0,0.325022,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,Cu Nickel Rim,Copper,61.59861
5,2013-01-06 06:00:00,155.95563,121.648648,82.69795,20.708904,35.307017,545.498283,1295.078341,2.0,10.545744,46.145758,1.773198,5.007279,11.94587,7.564536,Cu Nickel Rim,Copper,59.474482
6,2013-01-07 06:00:00,156.005002,121.422319,82.138866,20.540448,39.00011,565.583596,1304.806346,2.0,10.266011,46.051834,1.895707,5.327316,12.673051,8.151054,Cu Nickel Rim,Copper,57.165272
7,2013-01-08 06:00:00,0.061037,0.627933,3.37203,[-11059] No Good Data For Calculation,0.0,4.816641,0.0,1.0,0.23957,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,Cu Nickel Rim,Copper,Down
8,2013-01-09 06:00:00,0.061037,0.038148,3.251203,[-11059] No Good Data For Calculation,0.0,1.977599,0.0,1.0,0.347912,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,Cu Nickel Rim,Copper,Down
9,2013-01-10 06:00:00,0.022889,0.038148,3.173297,[-11059] No Good Data For Calculation,0.0,1.977599,0.0,1.0,0.302896,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,[-11059] No Good Data For Calculation,Cu Nickel Rim,Copper,0


## Data Cleaning

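Note that the dataset is not purely numeric. It contains error messages (for example, `[-11059] No Good Data For Calculation`) that appear when the plant is down or when the measurement sensors are offline, and the label column contains entries such as `Down`.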
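One common way to handle such entries is to coerce the affected columns to numbers and drop the rows that fail to parse. The sketch below uses a small toy frame that mimics the raw export. It is not necessarily the cleaning procedure used later in this notebook, and the choice of columns is an assumption.

```python
import pandas as pd

# Toy frame mimicking the raw export: numbers mixed with error strings
toy = pd.DataFrame({
    "Rod Mill kW": [566.9, 1.2, 545.5],
    "Ni%": [0.63, "[-11059] No Good Data For Calculation", 1.77],
    "% Passing 200M": [56.1, "Down", 59.5],
    "Feed Type": ["Cu Fraser", "Cu Nickel Rim", "Cu Nickel Rim"],
})

# Columns expected to be numeric (assumed for this example)
numeric_cols = ["Rod Mill kW", "Ni%", "% Passing 200M"]

cleaned = toy.copy()
# errors="coerce" turns any value that cannot be parsed as a number into NaN
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric, errors="coerce")
# Drop rows where any of the numeric columns is missing
cleaned = cleaned.dropna(subset=numeric_cols)
print(cleaned)
```

Coercion alone does not catch readings that are numeric but meaningless, such as a grind size of 0 recorded while the mills draw almost no power; those would need a separate filter.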

In [None]:
raw_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2998 entries, 0 to 2997
Data columns (total 18 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Date                          2998 non-null   object 
 1   Rod Mill Tonnage tph          2998 non-null   object 
 2   Rod Mill Water Flow Usgpm     2998 non-null   object 
 3   Pump Box ReCycle Water cmph   2998 non-null   object 
 4   Pump Box Process Water cmph   2998 non-null   object 
 5   Ball Mill Process Water cmph  2998 non-null   object 
 6   Rod Mill kW                   2998 non-null   object 
 7   Ball Mill kW                  2998 non-null   object 
 8   # of Cyclone                  2996 non-null   float64
 9   Cyclone Pressure psi          2998 non-null   object 
 10  Cyclone Overflow %Sol         2998 non-null   object 
 11  Ni%                           2996 non-null   object 
 12  Cu%                           2996 non-null   object 
 13  Fe%

One way to characterize a particle size distribution is to measure the weight percentage of a sample that passes through a sieve with a fixed opening size. Here, the plant measures "%Passing 200M", the percentage passing a 200-mesh sieve (75 micron openings). In general, the particle size distribution is considered on target when "%Passing 200M" is around 55%. In other words, the plant wants roughly 55% of the particles by weight to be smaller than 75 microns. A lower percentage indicates that the grind is too coarse, and a higher percentage indicates that it is too fine.
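Turning the measured percentage into the three classes could look like the sketch below. The 55% target comes from the text above, but the width of the "on target" band is not stated in this section, so `TOLERANCE` is a purely hypothetical value, and `grind_class` is an illustrative helper name.

```python
import numpy as np
import pandas as pd

TARGET = 55.0     # on-target %Passing 200M, from the text
TOLERANCE = 3.0   # hypothetical half-width of the on-target band (assumption)

def grind_class(pct_passing, target=TARGET, tol=TOLERANCE):
    """Label each %Passing 200M value as coarse, on target, or fine."""
    conditions = [
        pct_passing < target - tol,   # too little material passes: too coarse
        pct_passing > target + tol,   # too much material passes: too fine
    ]
    labels = np.select(conditions, ["coarse", "fine"], default="on target")
    # Keep missing measurements missing instead of labelling them
    return pd.Series(labels, index=pct_passing.index).where(pct_passing.notna())

example = pd.Series([50.0, 55.0, 60.0])
example_labels = grind_class(example)
print(example_labels.tolist())
```

The band width directly controls the class balance, so it should be set from the plant's actual operating limits rather than from this placeholder.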