<span style='color:gray'> <span style="font-size:25px;"> **Development of "Machine Learning Models"  (Workflow)**
    
In this Notebook, the machine learning model will be created and then the data from well-logs DLIS file [after preprocessing, sorting and finalizing the data] is loaded as input for Machine Learning model (ML); 
* Random Forest Regressor
* Gradient Boosting Regressor
    
    
For the prediction of petrophysical properties, such as porosity, permeability and water saturation, these two Regressor models **Random Forest Regressor** and **Gradient Boosting Regressor** are suitable.

They are Ensemble Based Tree Methods; they are based on the generation of Decision Trees.

We use Regression Models since we want to predict a continuous variable.

**Advantages** of the 2 regression models, since they are based on Decision Trees:

* They do not need the normalization or scaling of the original dataset;
* They are not sensitive to outliers, thus, outliers detection and removal are not required.

**==================================================================================================================**
    
In well-log machine learning models, the choice between regression and classification (Supervised ML) depends on the nature of the problem you are trying to solve and the type of data you have. Let's break down the reasons why regression is often preferred over classification in this context:

**Continuous Output**: Well-log data often involves continuous measurements such as porosity, permeability, resistivity, and other geological properties. Regression is well-suited for predicting and modeling continuous numerical values. Classification, on the other hand, is typically used when the output is categorical or discrete, like classifying lithology or rock types.

**Data Distribution**: Well-log data tends to have a wide range of continuous values. Using classification would require discretizing this data into bins or classes, which can lead to loss of information and potentially introduce biases. Regression models can capture the nuances and variations present in the continuous data more effectively.

**Evaluation Metrics**: Regression models are evaluated using metrics such as mean squared error (MSE), root mean squared error (RMSE), or mean absolute error (MAE). These metrics are well-suited for measuring the accuracy of predictions involving continuous values. Classification models, on the other hand, use metrics like accuracy, precision, recall, and F1-score, which are designed for categorical predictions.

**Feature Importance**: Well-log data analysis often involves understanding the relationships between different geological features and the target property. Regression models can provide insights into the quantitative impact of each feature on the predicted values, aiding in geological interpretation.


<span style='color:gray'> <span style="font-size:20px;"> 
**Importing Libraries, Regressors, and Required Dependencies**

In [1]:
%pip install --quiet --upgrade scikit-learn==1.2.2
%pip install --quiet qbstyles


# Importing the dependencies
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

from qbstyles import mpl_style
mpl_style(dark=False)  # Set light matplotlib style

import matplotlib.patches as mpatches  # To create a legend with a color box
import pickle

# Importing the models 
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neural_network import MLPRegressor
                                         
from sklearn.model_selection import RandomizedSearchCV

# train_test_split is a function 
# cross_val_score and KFold are functions

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, KFold 

# Regression metrics
from sklearn.metrics import r2_score, mean_squared_error, mean_squared_error, mean_absolute_error

# The package "Matplotlib Inline Back-end" provides support for Matplotlib to display figures directly inline
# "svg" stands for "scalable vector graphic". The plot can be scaled without compromising its quality
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


<span style='color:brown'> <span style="font-size:20px;"> **=+=+=+=+=++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+**</span> </span>

<span style='color:red'> <span style="font-size:20px;"> **3-BRSA-944A-RJS**:</span> </span>

<span style='color:blue'> <span style="font-size:15px;"> **Well-log data**:</span> </span>

In [22]:
file_path = '/Users/amirhosseinakhondzadeh/CODE_WELLLOGS/Petrobras Well-log Analysis/Processed Data (out put of preprocessing == Input of ML)/Output of well data, [unique dataframe creation]/df_944_ML.csv'
df_944_ML =  pd.read_csv(file_path)
df_944_ML.rename(columns={'CMFF': 'NMRFF', 'CMRP_3MS': 'NMREFF', 'TCMR': 'NMRTOT'}, inplace=True)
df_944_ML.drop(columns = ['AT10', 'AT30'], inplace=True) # Replace with the actual column names you want to remove
df_944_ML.reset_index(drop=True, inplace=True)                       # Reset the index
df_944_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5488.6860,18.344227,152.231600,2.493285,0.181833,67.179180,6.066652,0.109875,0.143338,0.143341
1,5488.8384,15.589806,96.225930,2.427294,0.209300,66.437190,5.704009,0.138243,0.169972,0.169975
2,5488.9907,15.148575,64.343010,2.410696,0.214533,65.967220,5.369229,0.134214,0.163915,0.165086
3,5489.1430,14.718891,34.749203,2.413879,0.210541,64.888306,5.229071,0.117416,0.150267,0.155419
4,5489.2954,14.587625,34.080303,2.422581,0.214654,64.874230,5.184336,0.097082,0.134775,0.139856
...,...,...,...,...,...,...,...,...,...,...
2011,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
2012,5795.3150,22.939886,3.046565,2.452042,0.147132,66.271866,5.288734,0.001352,0.001640,0.001685
2013,5795.4673,23.912481,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685
2014,5795.6196,22.685303,3.046565,2.448614,0.170256,66.684540,5.314323,0.001450,0.001748,0.001813


<span style='color:blue'> <span style="font-size:15px;"> **Trajectory data**:</span> </span>

In [3]:
def remove_lines_from_file(input_file, output_file, lines_to_remove):
    with open(input_file, 'r') as file:
        lines = file.readlines()

    # Remove specified lines
    lines = [line for i, line in enumerate(lines, 1) if i not in lines_to_remove]

    with open(output_file, 'w') as file:
        file.writelines(lines)

# Specify the input file, output file, and line numbers to remove
input_file = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-944A-RJS/Dados Direcionais/3-brsa-944a-rjs_direcionais.txt'
output_file = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-944A-RJS/Dados Direcionais/output.txt'
lines_to_remove = list(range(0, 25))     # remove to line 

# Call the function to remove the specified lines
remove_lines_from_file(input_file, output_file, lines_to_remove)

#======================== loading new file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-944A-RJS/Dados Direcionais/output.txt'

# Read the file into a Pandas DataFrame using space as the delimiter
df = pd.read_csv(file_path, delim_whitespace=True)   
df = df.iloc[1:-1]    # Remove the and first last row

#======================== loading new file
df = df[['PROFUNDIDADE', 'PROFUNDIDADE.1', 'INCLINACAO', 'AZIMUTE']]
df = df.rename(columns={'PROFUNDIDADE': 'MD'})
df = df.rename(columns={'PROFUNDIDADE.1': 'TVD'})
df = df.rename(columns={'INCLINACAO': 'INC'})
df = df.rename(columns={'AZIMUTE': 'AZI'})
df.reset_index(drop=True, inplace=True)                       # Reset the index
df

Unnamed: 0,MD,TVD,INC,AZI
0,1886.00,1886.00,0.00,0.00
1,1906.00,1906.00,0.20,234.75
2,1935.00,1935.00,0.69,227.27
3,1994.00,1994.00,0.55,236.54
4,2022.00,2021.99,0.57,244.28
...,...,...,...,...
147,5875.00,5873.75,1.23,238.80
148,5903.00,5901.75,1.32,244.26
149,5932.00,5930.74,1.33,247.59
150,5959.00,5957.73,1.38,249.95


<span style='color:blue'> <span style="font-size:15px;"> **Petrophysics data**:</span> </span>

In [4]:
# Replace 'your_file.xlsx' with the path to your Excel file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-944A-RJS/Dados de Rochas e Fluidos/PETROGRAFIA_BASICA/3BRSA944ARJSPetrofisica_2.xls'
df_Ptroph_944 = pd.read_excel(file_path)

# Extract the desired columns and sort by "Profundidade" while dropping NaN values
df_Ptroph_944 = df_Ptroph_944[["Profundidade", "Permeab. Long. (mD)", "Porosidade %"]].sort_values("Profundidade", ascending=True).dropna()

#df_Ptroph = df_Ptroph.iloc[102:164]  # Extract rows A to B (inclusive)
df_Ptroph_944 = df_Ptroph_944.reset_index(drop=True)  # Reset the index after sorting

"""
# 1) Filter out rows with '-' in the "Porosidade%" column
# 2) Now, you can replace ',' with '.' and convert the "Porosidade%" column to float
df_Ptroph = df_Ptroph[df_Ptroph["Porosidade%"] != '-'] # 1
df_Ptroph["Porosidade%"] = df_Ptroph["Porosidade%"].str.replace(',', '.').astype(float) # 2
"""
#=====
"""
# Continue with your code
DEPTH_lab = df_Ptroph_944["Profundidade"]
K_lab = df_Ptroph_944["Permeab. Long. (mD)"]
PHI_lab = df_Ptroph_944["Porosidade %"]
"""

df_Ptroph_944 = df_Ptroph_944.rename(columns={'Profundidade': 'MD', 'Permeab. Long. (mD)': 'k', 'Porosidade %': 'phi'})
df_Ptroph_944

Unnamed: 0,MD,k,phi
0,5715.15,0.036,6.6
1,5715.45,0.516,8.1
2,5715.55,0.898,10.6
3,5715.95,0.011,5.6
4,5716.30,0.016,6.4
...,...,...,...
315,5795.15,19.800,19.1
316,5795.20,13.200,14.8
317,5795.40,10.300,17.6
318,5795.70,3.790,13.0


<span style='color:blue'> <span style="font-size:15px;"> **Build a well-log dataframe based on Petrophysical dataframe**:</span> </span>

In [5]:
depth_range = (df_Ptroph_944['MD'].min(), df_Ptroph_944['MD'].max())

# Convert to strings, replace commas with dots, and convert to floats
initial_depth = float(str(depth_range[0]).replace(',', '.'))
final_depth = float(str(depth_range[1]).replace(',', '.'))

print("Initial Depth:", initial_depth)
print("Final Depth:", final_depth)


Initial Depth: 5715.15
Final Depth: 5795.75


Limit the well-log data [defining the upper and lower limit] for well-log data based on petrophyscial data

In [6]:
# Filter the DataFrame to include rows with 'MD' within the depth range
df_944_ML = df_944_ML[(df_944_ML['DEPTH'] >= depth_range[0]) & (df_944_ML['DEPTH'] <= depth_range[1])]
df_944_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
1486,5715.1523,54.626095,45.572468,2.584553,0.068567,57.281326,4.242327,0.000685,0.000940,0.000940
1487,5715.3047,52.651684,27.635782,2.600371,0.043168,55.913692,4.422918,0.000486,0.000676,0.000676
1488,5715.4570,52.230045,20.734453,2.609256,0.053917,55.044290,4.669455,0.000486,0.000676,0.000676
1489,5715.6100,53.026020,18.841827,2.626682,0.067139,55.927204,4.801651,0.000440,0.000601,0.000601
1490,5715.7620,50.906727,20.888021,2.652325,0.077850,56.057182,4.782407,0.000530,0.000664,0.000665
...,...,...,...,...,...,...,...,...,...,...
2010,5795.0103,19.937273,3.057777,2.473910,0.138260,63.659725,5.412995,0.001136,0.001360,0.001360
2011,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
2012,5795.3150,22.939886,3.046565,2.452042,0.147132,66.271866,5.288734,0.001352,0.001640,0.001685
2013,5795.4673,23.912481,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685


we filtered the well-log data corresponding to the DEPTH in Petrophysical data, 

In [7]:
# Convert the 'MD' columns to NumPy arrays for faster calculations
depth_values = df_Ptroph_944['MD'].values
depth_944_values = df_944_ML['DEPTH'].values

selected_rows = []   # Create an empty list to store the selected rows

for depth in depth_values:
    absolute_diff = np.abs(depth_944_values - depth)              # Calculate the absolute differences between 'MD' values
                                                                  # in df_Ptroph_944 and 'DEPTH' values in df_944_ML

    
    nearest_depth_index = np.argmin(absolute_diff)                # Find the index of the minimum absolute difference
    
    selected_rows.append(df_944_ML.iloc[nearest_depth_index])     # Append the nearest row to the list

df_944_ML = pd.DataFrame(selected_rows)                              # Create a DataFrame from the selected rows
df_944_ML.reset_index(drop=True, inplace=True)                       # Reset the index
df_944_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5715.1523,54.626095,45.572468,2.584553,0.068567,57.281326,4.242327,0.000685,0.000940,0.000940
1,5715.4570,52.230045,20.734453,2.609256,0.053917,55.044290,4.669455,0.000486,0.000676,0.000676
2,5715.6100,53.026020,18.841827,2.626682,0.067139,55.927204,4.801651,0.000440,0.000601,0.000601
3,5715.9146,47.765880,22.193060,2.666580,0.077615,55.133550,4.737260,0.000558,0.000675,0.000816
4,5716.3716,48.292297,26.896774,2.668222,0.061159,54.197830,4.733544,0.000405,0.000617,0.000617
...,...,...,...,...,...,...,...,...,...,...
315,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
316,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
317,5795.4673,23.912481,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685
318,5795.6196,22.685303,3.046565,2.448614,0.170256,66.684540,5.314323,0.001450,0.001748,0.001813


In [8]:
# Sort both dataframes by 'MD' and 'DEPTH' respectively
df_944_ML = df_944_ML.sort_values(by='DEPTH')
df_Ptroph_944 = df_Ptroph_944.sort_values(by='MD')

# Merge the dataframes using 'merge_asof' to find the nearest depth in df_944_ML for each depth in df_Ptroph
df_944_PM = pd.merge_asof(df_Ptroph_944, df_944_ML, left_on='MD', right_on='DEPTH', direction='nearest')

df_944_PM = df_944_PM.drop('MD', axis=1)

# The 'result' dataframe now contains the rows from df_944_ML corresponding to depths in df_Ptroph
df_944_PM

Unnamed: 0,k,phi,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,0.036,6.6,5715.1523,54.626095,45.572468,2.584553,0.068567,57.281326,4.242327,0.000685,0.000940,0.000940
1,0.516,8.1,5715.4570,52.230045,20.734453,2.609256,0.053917,55.044290,4.669455,0.000486,0.000676,0.000676
2,0.898,10.6,5715.6100,53.026020,18.841827,2.626682,0.067139,55.927204,4.801651,0.000440,0.000601,0.000601
3,0.011,5.6,5715.9146,47.765880,22.193060,2.666580,0.077615,55.133550,4.737260,0.000558,0.000675,0.000816
4,0.016,6.4,5716.3716,48.292297,26.896774,2.668222,0.061159,54.197830,4.733544,0.000405,0.000617,0.000617
...,...,...,...,...,...,...,...,...,...,...,...,...
315,19.800,19.1,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
316,13.200,14.8,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
317,10.300,17.6,5795.4673,23.912481,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685
318,3.790,13.0,5795.6196,22.685303,3.046565,2.448614,0.170256,66.684540,5.314323,0.001450,0.001748,0.001813


In [9]:
df_944_PM = df_944_PM.drop_duplicates(subset=['DEPTH', 'GR', 'AT90', 'RHOZ', 'NPHI', 'DTCO', 'PEFZ', 'NMRFF', 'NMREFF', 'NMRTOT'])
df_944_PM.reset_index(drop=True, inplace=True)                       # Reset the index
df_944_PM

Unnamed: 0,k,phi,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,0.036,6.6,5715.1523,54.626095,45.572468,2.584553,0.068567,57.281326,4.242327,0.000685,0.000940,0.000940
1,0.516,8.1,5715.4570,52.230045,20.734453,2.609256,0.053917,55.044290,4.669455,0.000486,0.000676,0.000676
2,0.898,10.6,5715.6100,53.026020,18.841827,2.626682,0.067139,55.927204,4.801651,0.000440,0.000601,0.000601
3,0.011,5.6,5715.9146,47.765880,22.193060,2.666580,0.077615,55.133550,4.737260,0.000558,0.000675,0.000816
4,0.016,6.4,5716.3716,48.292297,26.896774,2.668222,0.061159,54.197830,4.733544,0.000405,0.000617,0.000617
...,...,...,...,...,...,...,...,...,...,...,...,...
280,22.500,18.1,5794.4004,19.129719,6.142383,2.559285,0.109517,61.002037,6.030134,0.000894,0.001053,0.001053
281,15.400,18.5,5794.8574,19.666580,3.009704,2.474878,0.141756,63.389927,5.481481,0.001111,0.001326,0.001327
282,19.800,19.1,5795.1626,20.780230,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
283,10.300,17.6,5795.4673,23.912481,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685


<span style='color:brown'> <span style="font-size:20px;"> **=+=+=+=+=++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+**</span> </span>

<span style='color:red'> <span style="font-size:20px;"> **3-BRSA-1215-RJS**:</span> </span>

<span style='color:blue'> <span style="font-size:15px;"> **Well-log data**:</span> </span>

In [10]:
file_path = '/Users/amirhosseinakhondzadeh/CODE_WELLLOGS/Petrobras Well-log Analysis/Processed Data (out put of preprocessing == Input of ML)/Output of well data, [unique dataframe creation]/df_1215_ML.csv'
df_1215_ML =  pd.read_csv(file_path)
df_1215_ML.rename(columns={'CMFF': 'NMRFF', 'CMRP_3MS': 'NMREFF', 'TCMR': 'NMRTOT'}, inplace=True)
df_1215_ML.drop(columns = ['AT10', 'AT30'], inplace=True) # Replace with the actual column names you want to remove
df_1215_ML.reset_index(drop=True, inplace=True)                       # Reset the index
df_1215_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5417.6676,21.410461,15.038757,2.621732,12.225975,62.967560,4.949074,0.008995,0.030131,0.050489
1,5417.7438,21.198140,13.604523,2.622316,12.180438,62.562557,4.973447,0.008995,0.030131,0.050489
2,5417.8200,20.987156,12.510310,2.623430,12.112868,62.562557,4.992749,0.012004,0.037343,0.054942
3,5417.8962,20.760265,11.831970,2.624798,12.056069,62.226500,5.012835,0.012004,0.037343,0.054942
4,5417.9724,20.520035,11.554049,2.625881,12.036470,62.226500,5.021335,0.012773,0.026898,0.047561
...,...,...,...,...,...,...,...,...,...,...
3056,5650.5348,50.979370,2.985158,2.567846,20.989720,67.835724,4.236481,0.007976,0.015516,0.055404
3057,5650.6110,52.180763,2.985158,2.565697,20.989720,67.835724,4.231709,0.007976,0.015516,0.055404
3058,5650.6872,53.384766,2.985158,2.558831,20.989720,69.624330,4.249410,0.007976,0.015516,0.055404
3059,5650.7634,54.232105,2.985158,2.558831,20.989720,69.624330,4.249410,0.007976,0.015516,0.055404


<span style='color:blue'> <span style="font-size:15px;"> **Trajectory data**:</span> </span>

In [11]:
def remove_lines_from_file(input_file, output_file, lines_to_remove):
    with open(input_file, 'r') as file:
        lines = file.readlines()

    # Remove specified lines
    lines = [line for i, line in enumerate(lines, 1) if i not in lines_to_remove]

    with open(output_file, 'w') as file:
        file.writelines(lines)

# Specify the input file, output file, and line numbers to remove
input_file = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-1215-RJS/Dados Direcionais/3-brsa-1215-rjs_direcionais.txt'
output_file = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-1215-RJS/Dados Direcionais/output.txt'
lines_to_remove = list(range(0, 23))     # remove to line 

# Call the function to remove the specified lines
remove_lines_from_file(input_file, output_file, lines_to_remove)

#======================== loading new file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-1215-RJS/Dados Direcionais/output.txt'

# Read the file into a Pandas DataFrame using space as the delimiter
df = pd.read_csv(file_path, delim_whitespace=True)   
df = df.iloc[1:-1]    # Remove the and first last row
df
#======================== loading new file
df = df[['PROFUNDIDADE', 'PROFUNDIDADE.1', 'INCLINACAO', 'AZIMUTE']]
df = df.rename(columns={'PROFUNDIDADE': 'MD'})
df = df.rename(columns={'PROFUNDIDADE.1': 'TVD'})
df = df.rename(columns={'INCLINACAO': 'INC'})
df = df.rename(columns={'AZIMUTE': 'AZI'})
df

Unnamed: 0,MD,TVD,INC,AZI
1,1997.00,1997.00,0.00,0.00
2,2056.00,2056.00,0.25,175.60
3,2203.00,2203.00,0.53,350.02
4,2319.00,2319.00,0.15,287.78
5,2389.00,2389.00,0.32,272.43
...,...,...,...,...
67,5428.00,5427.42,0.47,123.09
68,5466.00,5465.42,0.27,129.19
69,5505.00,5504.42,0.17,147.41
70,5544.00,5543.42,0.24,221.73


<span style='color:blue'> <span style="font-size:15px;"> **Petrophysics data**:</span> </span>

In [12]:
# Replace 'your_file.xlsx' with the path to your Excel file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/3-BRSA-1215-RJS/Dados de Rochas e Fluidos/PETROFISICABASICA/3BRSA1215RJS_Petrofisica_Basica.xlsx'
df_Ptroph_1215 = pd.read_excel(file_path)

# Extract the desired columns and sort by "Profundidade" while dropping NaN values
df_Ptroph_1215 = df_Ptroph_1215[["Profundidade", "Permeabilidade Long. (mD)", "Porosidade%"]].sort_values("Profundidade", ascending=True).dropna()

#df_Ptroph = df_Ptroph.iloc[102:164]  # Extract rows A to B (inclusive)
df_Ptroph_1215 = df_Ptroph_1215.reset_index(drop=True)  # Reset the index after sorting

# Replace ',' with '.' and convert to float for the 'k' and 'phi' columns
df_Ptroph_1215['Permeabilidade Long. (mD)'] = df_Ptroph_1215['Permeabilidade Long. (mD)'].str.replace(',', '.').astype(float)
df_Ptroph_1215['Porosidade%'] = df_Ptroph_1215['Porosidade%'].str.replace(',', '.').astype(float)
df_Ptroph_1215['Profundidade'] = df_Ptroph_1215['Profundidade'].str.replace(',', '.').astype(float)

df_Ptroph_1215 = df_Ptroph_1215.rename(columns={'Profundidade': 'MD', 'Permeabilidade Long. (mD)': 'k', 'Porosidade%': 'phi'})
df_Ptroph_1215.reset_index(drop=True, inplace=True)                       # Reset the index
df_Ptroph_1215

Unnamed: 0,MD,k,phi
0,5413.4,0.0,0.01
1,5418.6,0.0,1.4
2,5424.0,27.6,15.2
3,5429.8,0.001,8.5
4,5450.0,2.79,12.2
5,5453.0,43.1,16.2
6,5458.0,772.0,20.0
7,5459.0,482.0,21.1
8,5460.8,96.7,16.8
9,5462.7,765.0,16.5


<span style='color:blue'> <span style="font-size:15px;"> **Build a well-log dataframe based on Petrophysical dataframe**:</span> </span>

In [13]:
depth_range = (df_Ptroph_1215['MD'].min(), df_Ptroph_1215['MD'].max())

# Replace commas with dots (periods) and convert to floats
initial_depth = float(str(depth_range[0]).replace(',', '.'))
final_depth = float(str(depth_range[1]).replace(',', '.'))

print("Initial Depth:", initial_depth)
print("Final Depth:", final_depth)

Initial Depth: 5413.4
Final Depth: 5677.3


In [14]:

# Filter the DataFrame to include rows with 'MD' within the depth range
df_1215_ML = df_1215_ML[(df_1215_ML['DEPTH'] >= depth_range[0]) & (df_1215_ML['DEPTH'] <= depth_range[1])]
df_1215_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5417.6676,21.410461,15.038757,2.621732,12.225975,62.967560,4.949074,0.008995,0.030131,0.050489
1,5417.7438,21.198140,13.604523,2.622316,12.180438,62.562557,4.973447,0.008995,0.030131,0.050489
2,5417.8200,20.987156,12.510310,2.623430,12.112868,62.562557,4.992749,0.012004,0.037343,0.054942
3,5417.8962,20.760265,11.831970,2.624798,12.056069,62.226500,5.012835,0.012004,0.037343,0.054942
4,5417.9724,20.520035,11.554049,2.625881,12.036470,62.226500,5.021335,0.012773,0.026898,0.047561
...,...,...,...,...,...,...,...,...,...,...
3056,5650.5348,50.979370,2.985158,2.567846,20.989720,67.835724,4.236481,0.007976,0.015516,0.055404
3057,5650.6110,52.180763,2.985158,2.565697,20.989720,67.835724,4.231709,0.007976,0.015516,0.055404
3058,5650.6872,53.384766,2.985158,2.558831,20.989720,69.624330,4.249410,0.007976,0.015516,0.055404
3059,5650.7634,54.232105,2.985158,2.558831,20.989720,69.624330,4.249410,0.007976,0.015516,0.055404


In [15]:
"""# Filter the DataFrame to include rows with 'MD' within the depth range
df_1215_ML = df_1215_ML[(df_1215_ML['DEPTH'] >= float(depth_range[0])) & (df_1215_ML['DEPTH'] <= float(depth_range[1]))]
df_1215_ML"""

"# Filter the DataFrame to include rows with 'MD' within the depth range\ndf_1215_ML = df_1215_ML[(df_1215_ML['DEPTH'] >= float(depth_range[0])) & (df_1215_ML['DEPTH'] <= float(depth_range[1]))]\ndf_1215_ML"

In [16]:
# Convert the 'MD' columns to NumPy arrays for faster calculations
depth_values = df_Ptroph_1215['MD'].values
depth_1215_values = df_1215_ML['DEPTH'].values

selected_rows = []   # Create an empty list to store the selected rows

for depth in depth_values:
    absolute_diff = np.abs(depth_1215_values - depth)              # Calculate the absolute differences between 'MD' values
                                                                  # in df_Ptroph_944 and 'DEPTH' values in df_944_ML

    
    nearest_depth_index = np.argmin(absolute_diff)                # Find the index of the minimum absolute difference
    
    selected_rows.append(df_1215_ML.iloc[nearest_depth_index])     # Append the nearest row to the list

df_1215_ML = pd.DataFrame(selected_rows)                              # Create a DataFrame from the selected rows
df_1215_ML.reset_index(drop=True, inplace=True)                       # Reset the index
df_1215_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5417.6676,21.410461,15.038757,2.621732,12.225975,62.96756,4.949074,0.008995,0.030131,0.050489
1,5418.582,20.519266,12.024226,2.641977,12.942612,62.64548,4.558427,0.013441,0.035606,0.067882
2,5423.9922,40.42212,5.905,2.527786,21.469666,62.395718,3.152332,0.000275,0.020538,0.048193
3,5429.7834,29.282438,12.040865,2.587283,14.456285,67.21927,5.37447,0.00164,0.019711,0.061156
4,5449.9764,22.537579,918.8996,2.477144,17.886215,65.86409,4.3116,0.163715,0.200007,0.200414
5,5453.0244,34.25385,1396.3385,2.362352,23.711674,71.64694,4.578998,0.122532,0.139118,0.14221
6,5457.9774,21.608595,1293.351,2.365811,21.804495,69.42828,4.903101,0.155981,0.192525,0.192541
7,5458.968,18.998669,741.41583,2.480622,17.817705,62.82063,4.384272,0.134886,0.181051,0.181051
8,5460.7968,19.865223,1417.8202,2.368061,23.345491,70.70588,5.075385,0.139051,0.167281,0.16729
9,5462.7018,20.158842,105.34858,2.438635,18.199104,66.10057,4.676818,0.14695,0.176516,0.176517


In [17]:
# Sort both dataframes by 'MD' and 'DEPTH' respectively
df_1215_ML = df_1215_ML.sort_values(by='DEPTH')
df_Ptroph_1215 = df_Ptroph_1215.sort_values(by='MD')

# Merge the dataframes using 'merge_asof' to find the nearest depth in df_944_ML for each depth in df_Ptroph
df_1215_PM = pd.merge_asof(df_Ptroph_1215, df_1215_ML, left_on='MD', right_on='DEPTH', direction='nearest')

df_1215_PM = df_1215_PM.drop('MD', axis=1)

# The 'result' dataframe now contains the rows from df_944_ML corresponding to depths in df_Ptroph
df_1215_PM.reset_index(drop=True, inplace=True)                       # Reset the index
df_1215_PM

Unnamed: 0,k,phi,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,0.0,0.01,5417.6676,21.410461,15.038757,2.621732,12.225975,62.96756,4.949074,0.008995,0.030131,0.050489
1,0.0,1.4,5418.582,20.519266,12.024226,2.641977,12.942612,62.64548,4.558427,0.013441,0.035606,0.067882
2,27.6,15.2,5423.9922,40.42212,5.905,2.527786,21.469666,62.395718,3.152332,0.000275,0.020538,0.048193
3,0.001,8.5,5429.7834,29.282438,12.040865,2.587283,14.456285,67.21927,5.37447,0.00164,0.019711,0.061156
4,2.79,12.2,5449.9764,22.537579,918.8996,2.477144,17.886215,65.86409,4.3116,0.163715,0.200007,0.200414
5,43.1,16.2,5453.0244,34.25385,1396.3385,2.362352,23.711674,71.64694,4.578998,0.122532,0.139118,0.14221
6,772.0,20.0,5457.9774,21.608595,1293.351,2.365811,21.804495,69.42828,4.903101,0.155981,0.192525,0.192541
7,482.0,21.1,5458.968,18.998669,741.41583,2.480622,17.817705,62.82063,4.384272,0.134886,0.181051,0.181051
8,96.7,16.8,5460.7968,19.865223,1417.8202,2.368061,23.345491,70.70588,5.075385,0.139051,0.167281,0.16729
9,765.0,16.5,5462.7018,20.158842,105.34858,2.438635,18.199104,66.10057,4.676818,0.14695,0.176516,0.176517


In [18]:
df_1215_PM = df_1215_PM.drop_duplicates(subset=['DEPTH', 'GR', 'AT90', 'RHOZ', 'NPHI', 'DTCO', 'PEFZ', 'NMRFF', 'NMREFF', 'NMRTOT'])
df_1215_PM.reset_index(drop=True, inplace=True)                       # Reset the index
df_1215_PM

Unnamed: 0,k,phi,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,0.0,0.01,5417.6676,21.410461,15.038757,2.621732,12.225975,62.96756,4.949074,0.008995,0.030131,0.050489
1,0.0,1.4,5418.582,20.519266,12.024226,2.641977,12.942612,62.64548,4.558427,0.013441,0.035606,0.067882
2,27.6,15.2,5423.9922,40.42212,5.905,2.527786,21.469666,62.395718,3.152332,0.000275,0.020538,0.048193
3,0.001,8.5,5429.7834,29.282438,12.040865,2.587283,14.456285,67.21927,5.37447,0.00164,0.019711,0.061156
4,2.79,12.2,5449.9764,22.537579,918.8996,2.477144,17.886215,65.86409,4.3116,0.163715,0.200007,0.200414
5,43.1,16.2,5453.0244,34.25385,1396.3385,2.362352,23.711674,71.64694,4.578998,0.122532,0.139118,0.14221
6,772.0,20.0,5457.9774,21.608595,1293.351,2.365811,21.804495,69.42828,4.903101,0.155981,0.192525,0.192541
7,482.0,21.1,5458.968,18.998669,741.41583,2.480622,17.817705,62.82063,4.384272,0.134886,0.181051,0.181051
8,96.7,16.8,5460.7968,19.865223,1417.8202,2.368061,23.345491,70.70588,5.075385,0.139051,0.167281,0.16729
9,765.0,16.5,5462.7018,20.158842,105.34858,2.438635,18.199104,66.10057,4.676818,0.14695,0.176516,0.176517


<span style='color:brown'> <span style="font-size:20px;"> **=+=+=+=+=++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+**</span> </span>

<span style='color:Red'> <span style="font-size:20px;"> 
**1-BRSA-1116-RJS**

<span style='color:blue'> <span style="font-size:15px;"> **Well-log data**:</span> </span>

In [21]:
file_path = '/Users/amirhosseinakhondzadeh/CODE_WELLLOGS/Petrobras Well-log Analysis/Processed Data (out put of preprocessing == Input of ML)/Output of well data, [unique dataframe creation]/df_1116_ML.csv'
df_1116_ML =  pd.read_csv(file_path)
df_1116_ML.rename(columns={'CMFF': 'NMRFF', 'CMRP_3MS': 'NMREFF', 'TCMR': 'NMRTOT'}, inplace=True)
df_1116_ML.drop(columns = ['AT10', 'AT30'], inplace=True) # Replace with the actual column names you want to remove
df_1116_ML.reset_index(drop=True, inplace=True)                       # Reset the index
df_1116_ML

Unnamed: 0,DEPTH,GR,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5350.7640,29.228598,65.436070,1.911232,0.462064,66.147160,10.000000,0.141804,0.399998,0.675131
1,5350.9165,28.384375,45.512573,1.897772,0.493337,66.147160,10.000000,0.141804,0.399998,0.675131
2,5351.0690,25.366892,35.453316,1.883895,0.503673,68.560080,10.000000,0.146237,0.430924,0.676864
3,5351.2210,21.658792,48.546840,1.892580,0.383231,70.085560,10.000000,0.148821,0.464980,0.681370
4,5351.3735,20.227568,42.771717,1.915861,0.341551,70.682304,9.996842,0.143266,0.485577,0.667513
...,...,...,...,...,...,...,...,...,...,...
2900,5793.9434,44.209950,10.082510,2.714987,0.070175,66.917780,5.404206,0.011153,0.042941,0.049551
2901,5794.0957,44.209950,10.135521,2.714987,0.070175,66.917780,5.404206,0.011153,0.042941,0.049551
2902,5794.2480,44.209950,10.204328,2.714987,0.070175,66.917780,5.404206,0.011153,0.042941,0.049551
2903,5794.4004,44.209950,10.170754,2.714987,0.070175,66.917780,5.404206,0.011153,0.042941,0.049551


<span style='color:blue'> <span style="font-size:15px;"> **Trajectory data**:</span> </span>

In [34]:
def remove_lines_from_file(input_file, output_file, lines_to_remove):
    with open(input_file, 'r') as file:
        lines = file.readlines()

    # Remove specified lines
    lines = [line for i, line in enumerate(lines, 1) if i not in lines_to_remove]

    with open(output_file, 'w') as file:
        file.writelines(lines)

# Specify the input file, output file, and line numbers to remove
input_file = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1116-RJS/Dados Direcionais/1-brsa-1116-rjs_direcionais.txt'
output_file = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1116-RJS/Dados Direcionais/output.txt'
lines_to_remove = list(range(0, 25))     # remove to line 

# Call the function to remove the specified lines
remove_lines_from_file(input_file, output_file, lines_to_remove)

#======================== loading new file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1116-RJS/Dados Direcionais/output.txt'

# Read the file into a Pandas DataFrame using space as the delimiter
df = pd.read_csv(file_path, delim_whitespace=True)   
df = df.iloc[1:-1]    # Remove the and first last row

#======================== loading new file
df = df[['PROFUNDIDADE', 'PROFUNDIDADE.1', 'INCLINACAO', 'AZIMUTE']]
df = df.rename(columns={'PROFUNDIDADE': 'MD'})
df = df.rename(columns={'PROFUNDIDADE.1': 'TVD'})
df = df.rename(columns={'INCLINACAO': 'INC'})
df = df.rename(columns={'AZIMUTE': 'AZI'})
df.reset_index(drop=True, inplace=True)                       # Reset the index
df

Unnamed: 0,MD,TVD,INC,AZI
0,2040.00,2040.00,0.00,0.00
1,2054.00,2054.00,0.03,328.39
2,2090.00,2090.00,0.09,148.26
3,2142.00,2142.00,0.03,279.54
4,2179.00,2179.00,0.10,189.95
...,...,...,...,...
74,5771.00,5770.97,0.72,67.15
75,5810.00,5809.95,1.87,69.05
76,5848.00,5847.93,2.25,70.53
77,5921.00,5920.87,2.42,74.33


<span style='color:blue'> <span style="font-size:15px;"> **Petrophysics data**:</span> </span>

In [42]:
# Replace 'your_file.xlsx' with the path to your Excel file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1116-RJS/Dados de Rochas e Fluidos/PETROFISICABASICA/1BRSA1116RJS_Petrofisicabasica_3.xlsx'
df_Ptroph_1116 = pd.read_excel(file_path)

# Extract the desired columns and sort by "Profundidade" while dropping NaN values
df_Ptroph_1116 = df_Ptroph_1116[["Profundidade", "Permeab. Long. (mD)", "Porosidade %"]].sort_values("Profundidade", ascending=True).dropna()

#df_Ptroph = df_Ptroph.iloc[102:164]  # Extract rows A to B (inclusive)
df_Ptroph_1116 = df_Ptroph_1116.reset_index(drop=True)  # Reset the index after sorting


# 1) Filter out rows with '-' in the "Porosidade%" column
# 2) Now, you can replace ',' with '.' and convert the "Porosidade%" column to float
#df_Ptroph = df_Ptroph[df_Ptroph["Porosidade%"] != '-'] # 1
df_Ptroph_1116["Porosidade %"] = df_Ptroph_1116["Porosidade %"].str.replace(',', '.').astype(float) # 2
df_Ptroph_1116["Permeab. Long. (mD)"] = df_Ptroph_1116["Permeab. Long. (mD)"].str.replace(',', '.').astype(float) # 2

#=====
"""
# Continue with your code
DEPTH_lab = df_Ptroph_944["Profundidade"]
K_lab = df_Ptroph_944["Permeab. Long. (mD)"]
PHI_lab = df_Ptroph_944["Porosidade %"]
"""

df_Ptroph_1116 = df_Ptroph_1116.rename(columns={'Profundidade': 'MD', 'Permeab. Long. (mD)': 'k', 'Porosidade %': 'phi'})
df_Ptroph_1116

Unnamed: 0,MD,k,phi
0,5479,6.88,12.8
1,5502,0.07,4.9
2,5517,0.011,4.8
3,5535,3.21,8.9
4,5538,141.0,13.5
5,5544,1.28,11.7
6,5547,43.1,15.6
7,5550,0.456,12.3
8,5556,115.0,18.5
9,5559,54.0,15.1


<span style='color:blue'> <span style="font-size:15px;"> **Build a well-log dataframe based on Petrophysical dataframe**:</span> </span>

In [46]:
depth_range = (df_Ptroph_1116['MD'].min(), df_Ptroph_1116['MD'].max())

# Convert to strings, replace commas with dots, and convert to floats
initial_depth = float(str(depth_range[0]).replace(',', '.'))
final_depth = float(str(depth_range[1]).replace(',', '.'))

print("Initial Depth:", initial_depth)
print("Final Depth:", final_depth)


Initial Depth: 5479.0
Final Depth: 5936.0


Limit the well-log data [defining the upper and lower limit] for well-log data based on petrophyscial data

In [45]:
# Filter the DataFrame to include rows with 'MD' within the depth range
df_1116_ML = df_1116_ML[(df_1116_ML['DEPTH'] >= depth_range[0]) & (df_1116_ML['DEPTH'] <= depth_range[1])]
df_1116_ML

TypeError: Invalid comparison between dtype=float64 and str

In [None]:
# Calculate the depth range from df_1116_ML
min_depth_df = df_1116_ML['DEPTH'].min()
max_depth_df = df_1116_ML['DEPTH'].max()

# Check if the minimum depth in df_1116_ML is higher than depth_range[0]
if min_depth_df > depth_range[0]:
    initial_depth = min_depth_df

# Check if the maximum depth in df_1116_ML is lower than depth_range[1]
if max_depth_df < depth_range[1]:
    final_depth = max_depth_df

print("Initial Depth:", initial_depth)
print("Final Depth:", final_depth)

# Filter the DataFrame to include rows with 'MD' within the updated depth range
df_1116_ML = df_1116_ML[(df_1116_ML['DEPTH'] >= initial_depth) & (df_1116_ML['DEPTH'] <= final_depth)]
df_1116_ML


================================================================================================================================================================================================================================================

In [98]:
def remove_lines_from_file(input_file, output_file, lines_to_remove):
    with open(input_file, 'r') as file:
        lines = file.readlines()

    # Remove specified lines
    lines = [line for i, line in enumerate(lines, 1) if i not in lines_to_remove]

    with open(output_file, 'w') as file:
        file.writelines(lines)

# Specify the input file, output file, and line numbers to remove
input_file = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1115-RJS/Dados Direcionais/1-BRSA-1115-RJS_direcionais.txt'
output_file = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1115-RJS/Dados Direcionais/output.txt'
lines_to_remove = list(range(1, 25))     # remove to line 25

# Call the function to remove the specified lines
remove_lines_from_file(input_file, output_file, lines_to_remove)

#======================== loading new file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1115-RJS/Dados Direcionais/output.txt'

# Read the file into a Pandas DataFrame using space as the delimiter
df = pd.read_csv(file_path, delim_whitespace=True)
df = df = df.drop(index=list(range(0, 1))).iloc[:-1]      # Remove the first row AND Remove the last row
df

Unnamed: 0,PROFUNDIDADE,INCLINACAO,PROFUNDIDADE.1,AZIMUTE,RUMO,AFAST,N/S,AFAST.1,E/W,LATITUDE,LONGITUDE,UTM,NORTE,UTM.1,ESTE
1,1369.00,0.00,1369.00,0.00,(N00.00E),0.00,0.00,-24:13:32.126,-42:26:40.072,7318411.84,759529.56,,,,
2,1395.00,0.17,1395.00,15.67,(N15.67E),0.04,0.01,-24:13:32.124,-42:26:40.072,7318411.88,759529.57,,,,
3,1417.00,0.25,1417.00,101.54,(S78.46E),0.06,0.07,-24:13:32.124,-42:26:40.070,7318411.90,759529.63,,,,
4,1452.00,0.65,1452.00,82.13,(N82.13E),0.07,0.34,-24:13:32.123,-42:26:40.060,7318411.91,759529.90,,,,
5,1477.00,0.62,1477.00,66.70,(N66.70E),0.14,0.60,-24:13:32.121,-42:26:40.051,7318411.98,759530.16,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
171,6367.00,4.12,6364.92,146.80,(S33.20E),-48.56,45.42,-24:13:33.676,-42:26:38.432,7318363.28,759574.98,,,,
172,6395.00,4.30,6392.84,147.52,(S32.48E),-50.29,46.53,-24:13:33.732,-42:26:38.391,7318361.55,759576.09,,,,
173,6423.00,4.12,6420.77,148.83,(S31.17E),-52.04,47.62,-24:13:33.788,-42:26:38.352,7318359.80,759577.18,,,,
174,6452.00,4.46,6449.69,144.78,(S35.22E),-53.85,48.81,-24:13:33.846,-42:26:38.308,7318357.99,759578.37,,,,


In [99]:
df = df[['PROFUNDIDADE.1', 'INCLINACAO', 'AZIMUTE']]
df = df.rename(columns={'PROFUNDIDADE.1': 'TVD'})
df = df.rename(columns={'INCLINACAO': 'INC'})
df = df.rename(columns={'AZIMUTE': 'AZI'})
df

Unnamed: 0,TVD,INC,AZI
1,1369.00,0.00,0.00
2,1395.00,0.17,15.67
3,1417.00,0.25,101.54
4,1452.00,0.65,82.13
5,1477.00,0.62,66.70
...,...,...,...
171,6364.92,4.12,146.80
172,6392.84,4.30,147.52
173,6420.77,4.12,148.83
174,6449.69,4.46,144.78



<span style='color:blue'> <span style="font-size:15px;"> **Well-log data**:</span> </span>

In [100]:
import pandas as pd

# Replace 'your_file.xlsx' with the path to your Excel file
file_path = '/Users/amirhosseinakhondzadeh/@ DATA/1-BRSA-1115-RJS/Dados de Rochas e Fluidos/PETROFISICABASICA/1-BRSA-1115-RJS_PETROFISICABASICA.xls'
df_Ptroph = pd.read_excel(file_path)

# Extract the desired columns and sort by "Profundidade" while dropping NaN values
df_Ptroph = df_Ptroph[["Profundidade", "Permeab. Long. (mD)", "Porosidade %"]].sort_values("Profundidade", ascending=True).dropna()

# Replace '-' with NaN in the entire DataFrame
df_Ptroph = df_Ptroph.replace('-', float('nan'))

# Convert the "Porosidade %" column to strings
df_Ptroph["Porosidade %"] = df_Ptroph["Porosidade %"].astype(str)

# 2) Now, you can replace ',' with '.' and convert the "Porosidade%" column to float
df_Ptroph["Porosidade %"] = df_Ptroph["Porosidade %"].str.replace(',', '.').astype(float)

# Continue with your code
DEPTH_lab = df_Ptroph["Profundidade"]
K_lab = df_Ptroph["Permeab. Long. (mD)"]
PHI_lab = df_Ptroph["Porosidade %"]

df_Ptroph = df_Ptroph.rename(columns={'Profundidade': 'MD', 'Permeab. Long. (mD)': 'k', 'Porosidade %': 'phi'})

df_Ptroph

Unnamed: 0,MD,k,phi
2,4781.2,0.0,4.47
16,5482.0,0.0,2.61
6,5500.8,0.0,1.18
12,5579.0,0.0,0.38
0,6099.8,0.0,0.5
5,6122.3,0.0,0.25
1,6170.5,0.0,0.98
10,6188.0,0.013,4.6
11,6194.0,0.088,12.3
14,6249.4,0.0,0.3


<span style='color:blue'> <span style="font-size:15px;"> **Well-log data**:</span> </span>

In [101]:
file_path = '/Users/amirhosseinakhondzadeh/CODE_WELLLOGS/Petrobras Well-log Analysis/Processed Data (out put of preprocessing == Input of ML)/Output of well data, [unique dataframe creation]/df_944_ML.csv'
df_944_ML =  pd.read_csv(file_path)
df_944_ML.rename(columns={'CMFF': 'NMRFF', 'CMRP_3MS': 'NMREFF', 'TCMR': 'NMRTOT'}, inplace=True)
df_944_ML

Unnamed: 0,DEPTH,GR,AT10,AT30,AT90,RHOZ,NPHI,DTCO,PEFZ,NMRFF,NMREFF,NMRTOT
0,5488.6860,18.344227,229.749070,73.139640,152.231600,2.493285,0.181833,67.179180,6.066652,0.109875,0.143338,0.143341
1,5488.8384,15.589806,183.014630,83.015100,96.225930,2.427294,0.209300,66.437190,5.704009,0.138243,0.169972,0.169975
2,5488.9907,15.148575,128.323290,60.429356,64.343010,2.410696,0.214533,65.967220,5.369229,0.134214,0.163915,0.165086
3,5489.1430,14.718891,76.603134,43.963387,34.749203,2.413879,0.210541,64.888306,5.229071,0.117416,0.150267,0.155419
4,5489.2954,14.587625,72.992805,45.506030,34.080303,2.422581,0.214654,64.874230,5.184336,0.097082,0.134775,0.139856
...,...,...,...,...,...,...,...,...,...,...,...,...
2011,5795.1626,20.780230,6.563786,3.358152,3.046565,2.463385,0.139730,64.736520,5.335359,0.001222,0.001480,0.001480
2012,5795.3150,22.939886,6.563786,3.358152,3.046565,2.452042,0.147132,66.271866,5.288734,0.001352,0.001640,0.001685
2013,5795.4673,23.912481,6.563786,3.358152,3.046565,2.450901,0.161174,66.790855,5.273507,0.001352,0.001640,0.001685
2014,5795.6196,22.685303,6.563786,3.358152,3.046565,2.448614,0.170256,66.684540,5.314323,0.001450,0.001748,0.001813


======

<span style='color:blue'> <span style="font-size:15px;"> **Print the best hyperparameters**:</span> </span>

Combination that gives **the highest accuracy (coefficient of determination) during the cross-validation**

## Finalized GB Model

Create the tuned Gradient Boosting 

<span style='color:gray'> <span style="font-size:20px;">**Evaluation of the Tuned Models and Visualization of Results**</span> </span>

We consider the Test Dataset or, also called, the Hold-Out Dataset (20% of the Original Dataset) and we perform the prediction on this Dataset which is the "unseen" Dataset.


<span style='color:gray'> <span style="font-size:30px;">**PLOTS**</span> </span>