<a class="anchor" id="up-bullet"></a>

# Generation of Synthetic Sonic Log Data Using Random Forest Algorithm at the Lagoa Parda Field in Espirito Santo, Brazil

* __Author__: Gabriel Senra
* __Date__: 10/02/2022


- [2. Imports](#second-bullet)
- [3. Methods](#thrid-bullet)
- [4. Split Datasets](#fourth-bullet)
- [5. Data Cleaning](#fifth-bullet)
- [6. Build Machine Learning Models](#sixth-bullet)
- [7. References](#seventh-bullet)

<a class="anchor" id="first1-bullet"></a>

## 1.3. Data Description

- Curve: DEPT, Units: M, Description: Measured Depth
- Curve: CALI, Units: in, Description: CALIPER
- Curve: DT, Units: us/ft, Description: DELTA-T (ALSO CALLED SLOWNESS OR INTERVAL TRANSIT TIME)
- Curve: GR, Units: gAPI, Description: GAMMA RAY
- Curve: RHOB, Units: g/cm3, Description: BULK DENSITY
- Curve: NPHI, Units: %, Description: THERMAL NEUTRON POROSITY (ORIGINAL RATIO METHOD) IN SELECTED LITHOLOGY
- Curve: ILD, Units: ohm.m, Description: INDUCTION DEEP RESISTIVITY

<a class="anchor" id="first4-bullet"></a>

## 1.4. Evaluation Metric

The predictions are evaluated with two metrics: the Root Mean Squared Error (RMSE) and the coefficient of determination (R²).

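### The RMSE is calculated as:

### RMSE = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}{(d_i - f_i)^2}}$

Where:

- $d_i$ is the predicted value of the DT curve for sample $i$
- $f_i$ is the true (measured) DT value used for evaluation
- $n$ is the number of samples

This is the quantity computed by `np.sqrt(mean_squared_error(y_real, y_predict))` in `result_plot`.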

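### R² (Variance Explained) is calculated as:

## $R^2 = 1 - \frac{{SS}_{residual}}{{SS}_{total}} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}$

Where $y_i$ is the true DT value, $\hat{y}_i$ is the predicted DT value (the $f_i$ and $d_i$ of the RMSE formula, respectively), and $\bar{y}$ is the mean of the true values.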
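
As a quick check of the two metrics, here is a minimal sketch in plain Python. The function names `rmse` and `r2_manual` and the four sample values are made up for illustration; they are not taken from the Lagoa Parda data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_manual(y_true, y_pred):
    """R² as 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative DT values in us/ft (hypothetical numbers)
dt_true = [80.0, 90.0, 100.0, 110.0]
dt_pred = [82.0, 88.0, 101.0, 109.0]

print("RMSE:", rmse(dt_true, dt_pred))      # sqrt(10 / 4), about 1.581
print("R²:", r2_manual(dt_true, dt_pred))   # 1 - 10 / 500 = 0.98
```

A lower RMSE and an R² closer to 1 both indicate a synthetic DT curve that follows the measured one more closely.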

<a class="anchor" id="second-bullet"></a>

[Up](#up-bullet)

# 2. Imports

In [1]:
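# import libraries
import pandas as pd
import numpy as np
import seaborn as sb
import plotly.express as px
import missingno as msno
import matplotlib.pyplot as plt
import lasio
import os
# scikit-learn pieces used by grid_search() and result_plot() below
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error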

<a class="anchor" id="thrid-bullet"></a>

[Up](#up-bullet)

# 3. Methods

In [2]:
# Select the wells that contain every mnemonic in the list of interest
def selectMinemonico(listaDB, minemonicos, selectedTrainingList):
    for well_df in listaDB:
        # Keep the well only if all requested mnemonics are among its columns
        if all(m in well_df.columns for m in minemonicos):
            selectedTrainingList.append(well_df)
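
The selection rule can be checked without loading any LAS file. The sketch below applies the same "all mnemonics present" test to plain lists of column names; the helper `has_all_mnemonics` and the well names are hypothetical.

```python
def has_all_mnemonics(columns, required):
    """True when every required mnemonic appears among the columns."""
    return set(required).issubset(columns)

# Hypothetical wells, described only by their column names
wells_demo = {
    "WELL-A": ["DEPT", "GR", "CALI", "DT", "RHOB", "NPHI", "ILD"],
    "WELL-B": ["DEPT", "GR", "DT"],
}
required_demo = ["GR", "DT", "RHOB"]

selected = [name for name, cols in wells_demo.items()
            if has_all_mnemonics(cols, required_demo)]
print(selected)  # ['WELL-A']
```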

In [3]:
# Standardize the column names; takes two parallel lists (current names and ideal names).
def standartazeColumns(atual, ideal):
    # Works on the module-level `well` DataFrame set by the loading loop
    global well
    for j in range (len(well.columns)):
        for s in range(len(atual)):
            if well.columns[j] == atual[s]:
                well = well.rename(columns={atual[s]: ideal[s]})
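
One possible alternative that avoids the `global` statement is to build the rename mapping once and apply it to any list of column names. This is only a sketch: `standardize_names` is a hypothetical helper, while the `atual` and `ideal` lists are the ones defined later in this notebook.

```python
def standardize_names(columns, atual, ideal):
    """Return the column names with each alias replaced by its standard name."""
    rename_map = dict(zip(atual, ideal))
    return [rename_map.get(name, name) for name in columns]

atual = ["MSFL", "RXOZ", "MDT", "RHOZ", "CAL", "HCAL", "DEPTH"]
ideal = ["RXO", "RXO", "DT", "RHOB", "CALI", "CALI", "DEPT"]

raw_columns = ["DEPTH", "GR", "HCAL", "MDT", "RHOZ"]
print(standardize_names(raw_columns, atual, ideal))
# ['DEPT', 'GR', 'CALI', 'DT', 'RHOB']
```

With a DataFrame, the same mapping can be passed to `well.rename(columns=rename_map)`, which returns a new frame. Note that two aliases of the same curve (for example `CAL` and `HCAL`) present in one file would both become `CALI` and produce duplicate column names.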

In [4]:
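# Tune the predictive model with a cross-validated grid search
def grid_search(clf, param_grid, X_train, y_train):

    grid = GridSearchCV(estimator=clf,
                        param_grid=param_grid,
                        scoring='r2',
                        cv=5)
    # np.ravel flattens y to 1-D whether it is a NumPy array or a pandas object
    grid.fit(X_train, np.ravel(y_train))
    print("R²:")
    print(grid.best_score_)  # mean cross-validated R² of the best parameter set

    return grid.best_estimator_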
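
To show what this search does end to end, here is a minimal self-contained sketch. It assumes a `RandomForestRegressor` (the algorithm named in the title), uses synthetic stand-in data instead of the well logs, and a deliberately small, hypothetical parameter grid; none of these values come from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (NOT Lagoa Parda logs): 5 feature columns, 1 target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=120)

# Hypothetical, small grid so the search runs quickly
param_grid_demo = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid_demo,
    scoring="r2",  # same scoring as grid_search() above
    cv=5,          # 5-fold cross-validation
)
search.fit(X_demo, y_demo)

best_model = search.best_estimator_  # refit on all of X_demo with the best parameters
print("Best parameters:", search.best_params_)
print("Mean cross-validated R2:", round(search.best_score_, 3))
```

Each of the four parameter combinations is scored on five folds, and the combination with the highest mean R² is refit on the full training set.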

In [5]:
# Data visualization

def result_plot(y_predict, y_real):
    # Report the RMSE of the prediction and plot predicted against true values
    print('Root Mean Square Error is:', '{:.5f}'.format(np.sqrt(mean_squared_error(y_real, y_predict))))
    # plt.figure avoids the extra empty axes that plt.subplots would create
    plt.figure(figsize=(42,12))
    plt.subplot(2, 2, 1)
    plt.plot(y_real[:])
    plt.plot(y_predict[:])
    plt.legend(['True', 'Predicted'])
    plt.xlabel('Sample')
    plt.ylabel('DT')
    plt.title('DT Prediction Comparison')
    
    plt.subplot(2, 2, 3)
    plt.scatter(y_real[:], y_predict[:])
    plt.xlabel('Real Value')
    plt.ylabel('Predicted Value')
    plt.title('DT Prediction Comparison')
    
    plt.show()

def wellLogPlot(df):
    # plt.figure avoids an unused background axes; the tracks are created below
    fig = plt.figure(figsize=(24,42))

    #Set up the plot axes
    ax1 = plt.subplot2grid((1,4), (0,0), rowspan=1, colspan = 1)
    ax2 = plt.subplot2grid((1,4), (0,1), rowspan=1, colspan = 1)
    ax3 = plt.subplot2grid((1,4), (0,2), rowspan=1, colspan = 1)
    ax4 = plt.subplot2grid((1,4), (0,3), rowspan=1, colspan = 1)
    
    # As our curve scales will be detached from the top of the track,
    # this code adds the top border back in without dealing with spines
    ax8 = ax1.twiny()
    ax8.xaxis.set_visible(False)
    ax9 = ax2.twiny()
    ax9.xaxis.set_visible(False)
    ax10 = ax3.twiny()
    ax10.xaxis.set_visible(False)
    ax11 = ax4.twiny()
    ax11.xaxis.set_visible(False)

    # Gamma Ray track
    ax1.plot("GR", "DEPT", data = df, color = "green",linewidth=3)
    ax1.set_xlabel("Gamma",size = 24)
    ax1.xaxis.label.set_color("green")
    ax1.set_ylabel("Depth (m)",size = 24)
    ax1.tick_params(axis='x', colors="green",size = 24)
    ax1.spines["top"].set_edgecolor("green")
    ax1.title.set_color('green')
    ax1.set_xticks([0, 50, 100, 150, 200])

    # Deep induction resistivity (ILD) track
    ax4.plot("ILD", "DEPT", data = df, color = "red",linewidth=3)
    ax4.set_xlabel("ILD",size = 24)
    ax4.xaxis.label.set_color("red")
    ax4.tick_params(axis='x', colors="red",size = 24)
    ax4.spines["top"].set_edgecolor("red")

    # Sonic track
    ax3.plot("DT", "DEPT", data = df, color = "purple",linewidth=3)
    ax3.set_xlabel("Sonic",size = 24)
    ax3.xaxis.label.set_color("purple")
    ax3.tick_params(axis='x', colors="purple",size = 24)
    ax3.spines["top"].set_edgecolor("purple")

    # Caliper track
    ax2.plot("CALI", "DEPT", data = df, color = "darkgray",linewidth=3)
    ax2.set_xlabel("Caliper",size = 24)
    ax2.xaxis.label.set_color("darkgray")
    ax2.tick_params(axis='x', colors="darkgray",size = 24)
    ax2.spines["top"].set_edgecolor("darkgray")

    # Common functions for setting up the plot can be extracted into
    # a for loop. This saves repeating code.
    for ax in [ax1, ax2, ax3, ax4]:
        ax.grid(which='major', color='lightgrey')
        ax.xaxis.set_ticks_position("top")
        ax.xaxis.set_label_position("top")
        ax.spines["top"].set_position(("axes", 1.02))


    plt.tight_layout()
    plt.show()

<a class="anchor" id="fourth-bullet"></a>

# 4. Split Datasets

To synthesize the sonic curve, we need to split our dataset between train and test sets. We do this before any substantial visualization so that we avoid the biases inherent to the visualization process.

- In the code below, I identify which wells have DT and which need to be predicted
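
The partition itself is simple to sketch on plain column lists. The well names below are hypothetical; the only rule taken from this notebook is that a well belongs to the training group when it has a `DT` column.

```python
# Hypothetical wells, described only by their (already standardized) column names
wells_demo = {
    "WELL-A": ["DEPT", "GR", "DT"],
    "WELL-B": ["DEPT", "GR", "RHOB"],
    "WELL-C": ["DEPT", "DT", "ILD"],
}

# Wells with a measured DT can train the model; the rest need a synthetic DT
with_dt = [name for name, cols in wells_demo.items() if "DT" in cols]
without_dt = [name for name in wells_demo if name not in with_dt]

print("Training candidates:", with_dt)     # ['WELL-A', 'WELL-C']
print("To be predicted:", without_dt)      # ['WELL-B']
```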

In [6]:
%cd "C:\Users\James Bond\Desktop\AI\LP Updated"

# All available database files
lasList = os.listdir()
lasListWell = []

# Wells to be used for training (those with DT)
training_list = []

# Possible column names to standardize, and the standardized name for each
atual = ["MSFL","RXOZ","MDT","RHOZ","CAL","HCAL","DEPTH"]
ideal = ["RXO","RXO","DT","RHOB","CALI","CALI","DEPT"]

# Load and pre-process each file
for i in range (len(lasList)):

    # Read the LAS file and its mnemonics
    las = lasio.read(lasList[i])
    well = (las.df()).reset_index()
    well['wellName'] = las.well.WELL.value
    
    # Standardize the column names
    standartazeColumns(atual, ideal)

    lasListWell.append(well)

    # Separate the wells with DT from those without
    for j in range(len(well.columns)):
        if well.columns[j] == "DT":
            training_list.append(lasListWell[i])

print("There are", len(lasList), "well logs in Lagoa Parda, of which", len(training_list),"have registered DT, other", len(lasList)-len(training_list),"wells do not have DT.")

C:\Users\James Bond\Desktop\AI\LP Updated
There are 88 well logs in Lagoa Parda, of which 29 have registered DT, other 59 wells do not have DT.


<a class="anchor" id="fifth-bullet"></a>

[Up](#up-bullet)

# 5. Data Cleaning

- Select the mnemonics of interest and handle null and inconsistent values
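
Before the real cleaning cell, here is a small sketch of the filtering idea on a toy table. The `DT < 170` threshold is the one used in this notebook; the sample values and the optional `dropna()` step are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy log samples (hypothetical values), with one null GR, one DT spike and one null DT
toy = pd.DataFrame({
    "DEPT": [1000.0, 1000.2, 1000.4, 1000.6],
    "GR":   [55.0, np.nan, 80.0, 62.0],
    "DT":   [85.0, 90.0, 250.0, np.nan],
})

# Keep only samples with DT below 170 us/ft.
# Rows where DT is NaN are dropped as well, because NaN < 170 evaluates to False.
cleaned = toy[toy["DT"] < 170].reset_index(drop=True)

# Optional stricter step: also drop rows with a null in any remaining curve
cleaned_no_nulls = cleaned.dropna().reset_index(drop=True)

print(len(toy), len(cleaned), len(cleaned_no_nulls))  # 4 2 1
```

The threshold filter alone leaves nulls in the other curves untouched, so whether to add the stricter `dropna()` step depends on how the model handles missing inputs.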

In [11]:
# Select the wells that contain the mnemonics of interest
minemonicos = ['NPHI', 'GR', 'CALI', 'DT', 'DEPT', 'RHOB', 'ILD', 'wellName']
selectedTrainingListWell = []

selectMinemonico(training_list, minemonicos, selectedTrainingListWell)

# Handle missing and inconsistent values and keep only the mnemonics of interest
df = []
for i in range(len(selectedTrainingListWell)):
    #selectedTrainingListWell[i] = (selectedTrainingListWell[i].dropna())
    #selectedTrainingListWell[i] = selectedTrainingListWell[i][minemonicos]
    # Keep samples with DT < 170 us/ft (rows with a null DT are dropped too)
    selectedTrainingListWell[i] = selectedTrainingListWell[i][(selectedTrainingListWell[i]['DT'] < 170)].reset_index(drop = True)
    if len(selectedTrainingListWell[i]) != 0:
        df.append(selectedTrainingListWell[i])

# A Python list has no .concat() method; pd.concat builds the combined table.
# `df` stays a list of per-well DataFrames because the cells below index it
# (df[1], df[s]); the name `df_all` is introduced here only for the combined frame.
df_all = pd.concat(df, ignore_index=True)

<a class="anchor" id="sixth-bullet"></a>

[Up](#up-bullet)

In [None]:
msno.bar(df[1])

In [None]:
msno.matrix(df[1])

In [None]:
msno.heatmap(df[1])

In [None]:
df[1].info()

In [None]:
df[1].describe()

In [None]:
#Scatterplot matrix
fig = px.scatter_matrix(df[1], dimensions=['NPHI', 'GR', 'CALI', 'DT', 'RHOB', 'ILD'],
                        labels={col: col.replace('_', ' ') for col in df[1].columns},
                        height=900, color="DEPT",
                        color_continuous_scale=px.colors.diverging.Tealrose)
fig.show()

In [None]:
df[1][['ILD','GR','RHOB','DT','CALI','NPHI']].hist(bins=40, figsize=(20, 15))

In [None]:
fig = px.box(df[1], x="DT",
color_discrete_sequence=px.colors.qualitative.Dark24,
labels={col:col.replace('_', ' ') for col in df[1].columns},
category_orders={})
fig.update_layout(legend=dict(orientation="h", yanchor="bottom",
y=1.02, xanchor="right", x=1))
fig.show()

In [None]:
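# numeric_only skips the text column 'wellName', which recent pandas versions refuse to correlate
df[1].corr(numeric_only=True)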

In [None]:
import seaborn as sns

corr = df[1][['NPHI', 'GR', 'CALI','DT', 'RHOB', 'ILD']].corr()

mask = np.triu(np.ones_like(corr, dtype=bool))

f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})

In [None]:
for s in range(len(df)):
    
    # .iloc[0] takes the first row by position, regardless of the index labels
    poco = df[s]['wellName'].iloc[0]
    print('Well:', poco)
    
    wellLogPlot(df[s])
    
    ### Standardizing data for next iteration ###
    df[s]['wellName'] = poco

<a class="anchor" id="seventh-bullet"></a>

[Up](#up-bullet)

# 7. References

- https://github.com/pddasig/Machine-Learning-Competition-2020/blob/master/Synthetic%20Sonic%20Log%20Generation%20Starter_Yu%202_27_2020.ipynb
- https://github.com/andymcdgeo/Petrophysics-Python-Series/blob/master/14%20-%20Displaying%20Lithology%20Data.ipynb
- https://github.com/andymcdgeo/Petrophysics-Python-Series/blob/master/05%20-%20Petrophysical%20Calculations.ipynb
- https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html