- This example is for the clastic rock group. 
- In this example, some logs are not recorded at some depths. Before using the models for prediction, these missing values have to be estimated, for example by filling in the mean or median, or with the KNN imputer.
- The input data are RHOB, PHIN and Vp. Because all three logs are available, XGBoost model number 12 is used.

In [1]:
import pandas as pd
import numpy as np
from joblib import load

In [4]:
data = pd.read_csv('Carbonates.csv').drop('Unnamed: 0', axis=1)
data.head()

Unnamed: 0,RHOB,PHIN,Vp
0,2.719,0.078,5.995204
1,,0.0975,5.943536
2,,0.117,5.892752
3,,0.1365,5.842828
4,,0.156,5.793743


- With isna().sum(), you can count the missing (unrecorded) values in each log.
- According to the output, the RHOB log, for example, is missing at 19 depths, and these values have to be estimated.
- Two approaches are used below: SimpleImputer and KNNImputer.

In [10]:
data.isna().sum()

RHOB    19
PHIN    16
Vp      12
dtype: int64
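
The same check can be extended to show what share of each log is missing and which depths are affected. The following is a minimal sketch on a hypothetical toy frame (the values and the names `toy`, `missing_share` and `rows_with_gaps` are illustrative assumptions, not part of the dataset above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy logs; values are illustrative, not taken from the real file.
toy = pd.DataFrame({
    "RHOB": [2.719, np.nan, np.nan, 2.611],
    "PHIN": [0.078, 0.0975, np.nan, 0.053],
    "Vp":   [5.995, 5.944, 5.893, 5.833],
})

missing_count = toy.isna().sum()              # missing samples per log
missing_share = toy.isna().mean() * 100       # same, as a percentage of rows
rows_with_gaps = toy[toy.isna().any(axis=1)]  # depths with at least one gap

print(missing_count)
print(missing_share)
print(rows_with_gaps)
```

Looking at `rows_with_gaps` before imputing helps to judge whether the gaps are scattered or form long continuous intervals, which may affect how trustworthy any imputed value is.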

# Approach 1 - SimpleImputer
For more information: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html

In [20]:
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
data_with_out_NAN = pd.DataFrame(imp_mean.fit_transform(data), columns=data.columns)

In [30]:
print('Missing values:')
print(data_with_out_NAN.isna().sum())
data_with_out_NAN

Missing values:
RHOB    0
PHIN    0
Vp      0
dtype: int64


Unnamed: 0,RHOB,PHIN,Vp
0,2.719000,0.0780,5.995204
1,2.632599,0.0975,5.943536
2,2.632599,0.1170,5.892752
3,2.632599,0.1365,5.842828
4,2.632599,0.1560,5.793743
...,...,...,...
95,2.611000,0.0530,5.832604
96,2.599500,0.0725,5.783690
97,2.588000,0.0920,5.735589
98,2.576500,0.1115,5.688282
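
Note that rows 1 to 4 all received the same RHOB value (2.632599): with strategy='mean', every gap in a column is filled with that column's mean. The sketch below illustrates this on a hypothetical one-column toy frame and also shows the 'median' strategy mentioned at the start; the toy values are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy column with one gap; observed values are 2, 3 and 7.
toy = pd.DataFrame({"RHOB": [2.0, np.nan, 3.0, 7.0]})

# Mean of the observed values: (2 + 3 + 7) / 3 = 4
mean_filled = SimpleImputer(strategy="mean").fit_transform(toy)

# Median of the observed values: 3
median_filled = SimpleImputer(strategy="median").fit_transform(toy)

print(mean_filled[1, 0], median_filled[1, 0])
```

The median is less sensitive to outliers (here the value 7), which can matter when a log contains spikes.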


In [31]:
model = load('12.joblib')

In [32]:
PC_predicted = model.predict(data_with_out_NAN)
PC_predicted

array([853.2812 , 863.6663 , 880.69   , 881.41144, 902.5431 , 914.5154 ,
       925.1025 , 840.6163 , 862.0619 , 862.86224, 867.2737 , 843.7783 ,
       848.6025 , 842.9161 , 862.3669 , 855.79   , 857.8212 , 868.93835,
       880.6438 , 892.50275, 904.2699 , 822.60333, 837.9035 , 855.4887 ,
       859.1827 , 853.4801 , 850.33997, 845.69635, 865.2829 , 858.4325 ,
       839.4088 , 851.7682 , 862.64435, 873.3781 , 887.2294 , 813.1881 ,
       830.29645, 842.7314 , 854.53723, 857.91376, 877.9623 , 821.3658 ,
       834.19037, 845.3136 , 855.40594, 869.3879 , 823.8073 , 835.75305,
       847.85547, 860.60223, 826.69476, 840.2569 , 851.4484 , 830.0755 ,
       842.82135, 833.95746, 843.1307 , 853.4048 , 865.9311 , 877.7714 ,
       889.35596, 901.2615 , 910.27136, 830.8535 , 845.62915, 857.30304,
       869.14484, 880.79694, 892.33356, 904.55365, 820.98456, 837.0594 ,
       848.74567, 860.15985, 871.39233, 884.76917, 895.37305, 814.33124,
       818.0288 , 829.9551 , 851.23456, 864.9729 , 

# Approach 2 - KNNImputer
For more information: https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html

In [33]:
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=2)
data_with_out_NAN = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)

In [34]:
print('Missing values:')
print(data_with_out_NAN.isna().sum())
data_with_out_NAN

Missing values:
RHOB    0
PHIN    0
Vp      0
dtype: int64


Unnamed: 0,RHOB,PHIN,Vp
0,2.71900,0.0780,5.995204
1,2.67000,0.0975,5.943536
2,2.68175,0.1170,5.892752
3,2.65450,0.1365,5.842828
4,2.64300,0.1560,5.793743
...,...,...,...
95,2.61100,0.0530,5.832604
96,2.59950,0.0725,5.783690
97,2.58800,0.0920,5.735589
98,2.57650,0.1115,5.688282
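
Unlike the mean strategy, KNNImputer gives each gap its own value: the missing entry is replaced by the average of that column over the n_neighbors most similar rows, where similarity is measured on the columns that are present. A minimal sketch on hypothetical toy data (the column names `x` and `y` and all values are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data: y is missing in the third row.
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 8.0],
    "y": [10.0, 20.0, np.nan, 80.0],
})

# The two rows closest to x = 3 are x = 2 and x = 1,
# so the gap is filled with the mean of their y values: (20 + 10) / 2 = 15.
filled = KNNImputer(n_neighbors=2).fit_transform(toy)

print(filled[2, 1])
```

Because the neighbours are found with a Euclidean-type distance, logs with larger numeric ranges weigh more heavily in the neighbour search; whether scaling the logs first would help here is not something this example establishes.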


In [35]:
model = load('12.joblib')

In [36]:
PC_predicted = model.predict(data_with_out_NAN)
PC_predicted

array([853.2812 , 860.5459 , 872.96515, 894.7734 , 897.6668 , 913.9954 ,
       925.2927 , 836.3218 , 841.4907 , 856.33044, 981.32764, 908.3473 ,
       900.45917, 911.72925, 825.0011 , 840.0376 , 857.8212 , 868.93835,
       880.6438 , 892.50275, 904.2699 , 822.60333, 847.39874, 837.83185,
       848.2253 , 862.474  , 883.7555 , 893.5941 , 852.2699 , 866.0544 ,
       839.4088 , 851.7682 , 862.64435, 873.3781 , 887.2294 , 813.1881 ,
       830.29645, 842.7314 , 854.53723, 857.91376, 877.9623 , 821.3658 ,
       834.19037, 845.3136 , 855.40594, 869.3879 , 823.8073 , 835.75305,
       847.85547, 860.60223, 826.69476, 840.2569 , 851.4484 , 830.0755 ,
       842.82135, 833.95746, 843.1307 , 853.4048 , 865.9311 , 877.7714 ,
       889.35596, 901.2615 , 910.27136, 830.8535 , 845.62915, 857.30304,
       869.14484, 880.79694, 892.33356, 904.55365, 820.98456, 837.0594 ,
       848.74567, 860.15985, 871.39233, 884.76917, 895.37305, 814.33124,
       824.0424 , 839.6789 , 847.58514, 867.1133 , 
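
Many entries in the two prediction arrays are identical (for example the first one, 853.2812). This is expected for rows that had no missing values, since neither imputer changes them. The sketch below shows one way to check this. It assumes hypothetical toy data and uses a stand-in function `stand_in_predict` in place of `model.predict`; it is not the XGBoost model from '12.joblib'.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical toy logs with two gaps; values are illustrative only.
toy = pd.DataFrame({
    "RHOB": [2.70, np.nan, 2.60, 2.55],
    "PHIN": [0.08, 0.10, np.nan, 0.15],
    "Vp":   [6.0, 5.9, 5.8, 5.7],
})

# True for rows where at least one log was missing before imputation.
had_gap = toy.isna().any(axis=1).to_numpy()

mean_filled = SimpleImputer(strategy="mean").fit_transform(toy)
knn_filled = KNNImputer(n_neighbors=2).fit_transform(toy)

def stand_in_predict(X):
    # Hypothetical placeholder for model.predict; NOT the real model.
    return X @ np.array([100.0, -50.0, 10.0])

# Absolute difference between predictions under the two imputations.
diff = np.abs(stand_in_predict(mean_filled) - stand_in_predict(knn_filled))

print(pd.DataFrame({"had_gap": had_gap, "abs_diff": diff}))
```

With the real data, the same mask could be built from `data.isna().any(axis=1)` to see which predictions depend on the choice of imputer.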