# Hydrocarbon-prospective well log analysis and curve reconstruction with Python/Sklearn

By: Miguel La Cruz.

```python
# Import libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import functions  # project helper functions (separate module)

warnings.filterwarnings("ignore")
```

```python
# Import well data (semicolon-separated CSV files)
Well_1 = pd.read_csv("../data/raw/Well_1_mincols.csv", sep=";")
Well_3 = pd.read_csv("../data/processed/Well_3_modeldata.csv", sep=";")
```

***Machine learning*** has a wide range of applications and can be used to build effective solutions to diverse problems.

In petrophysics it is increasingly useful, with applications such as cluster analysis to determine sedimentary facies and the generation of synthetic well curves.

This analysis explores the possibility of repairing low-quality curves (logs) from a well, whether they are degraded by washouts or by other factors that disturb the tool response.

We will evaluate two wells ("Well_1", "Well_3").

The logs used are:

* **DEPTH**: Depth (ft)
* **CAL**: Caliper, borehole diameter (in)
* **NPHI**: Neutron porosity (fraction)
* **DT**: Sonic transit time (µs/ft)
* **GR**: Gamma ray (API units)
* **RHOB**: Bulk density (g/cm3)
* **RLLD**: Deep laterolog resistivity (ohm·m)
* **RT**: Resistivity (ohm·m)
* **BS**: Bit size (in)

**Columns (Well_1)**

<img src="img/excel_img_2.png" alt="eximg_1" width="700"/>

* #### **Step one: Data from "object" to "float"**

The data is delivered in LAS format. Converting it to CSV makes it easier to work with, but the column types change in the process (pandas reads them as `object`), so the columns must be cast to `float` before we can operate on them.
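A minimal sketch of this cast, assuming the `object` columns hold numbers written with a decimal comma (common in `;`-separated exports). `to_float` is a hypothetical helper, not part of the notebook's `functions` module; anything that still cannot be parsed becomes `NaN`.

```python
import pandas as pd


def to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Cast non-numeric columns to float; entries that cannot be parsed become NaN."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # already numeric, leave untouched
        # Assumption: decimal commas ("0,25"); swap them for dots before parsing
        cleaned = out[col].astype(str).str.strip().str.replace(",", ".", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Usage on a tiny hypothetical frame
raw = pd.DataFrame({"DEPTH": ["5000,5", "5001,0"], "NPHI": ["0,25", "n/a"]})
clean = to_float(raw)
print(clean.dtypes)
```

If every numeric field in the file uses decimal commas, passing `decimal=","` to `pd.read_csv` avoids the conversion altogether.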

<br>

* #### **Step two: Repair errors caused by data transforming**

Converting the data to CSV also introduces errors, such as values inflated by extra zeros, which must be corrected.
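One possible correction, sketched under the assumption that an inflated value is the true value multiplied by a power of ten: divide it by 10 until it falls below a plausible maximum for that log. The helper `fix_inflated` and the 3.0 g/cm3 ceiling used for RHOB are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def fix_inflated(series: pd.Series, max_valid: float) -> pd.Series:
    """Divide values above max_valid by 10 until they are within range.

    Assumes the error is purely extra trailing zeros (power-of-ten inflation).
    NaN and values already <= max_valid are left as they are.
    """
    if max_valid <= 0:
        raise ValueError("max_valid must be positive")
    values = series.to_numpy(dtype=float, copy=True)
    for i, v in enumerate(values):
        while np.isfinite(v) and v > max_valid:
            v /= 10
        values[i] = v
    return pd.Series(values, index=series.index, name=series.name)


# Hypothetical RHOB readings where two samples picked up extra zeros
rhob = pd.Series([2.45, 245.0, 2450.0, np.nan], name="RHOB")
fixed = fix_inflated(rhob, max_valid=3.0)
print(fixed.tolist())
```

The ceiling has to be chosen per curve: a log with a wide legitimate range, such as resistivity, would be wrongly rescaled by a tight limit.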

<br>

* #### **Step three: Preliminary visualization**

A brief look at the state of the data to check for anomalies and null values.
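Before plotting, a per-column summary already shows where nulls and out-of-range values sit. This sketch assumes that missing samples may also be encoded with the common LAS null value -999.25; `null_report` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

LAS_NULL = -999.25  # assumed null sentinel carried over from the LAS files


def null_report(df: pd.DataFrame, null_value: float = LAS_NULL) -> pd.DataFrame:
    """Count nulls (NaN or the LAS sentinel) and show the range of each column."""
    data = df.replace(null_value, np.nan)
    report = pd.DataFrame({
        "nulls": data.isna().sum(),
        "pct_null": data.isna().mean() * 100,
        "min": data.min(),
        "max": data.max(),
    })
    return report.sort_values("nulls", ascending=False)


# Tiny hypothetical frame
demo = pd.DataFrame({
    "NPHI": [0.20, -999.25, np.nan, 0.30],
    "GR": [50.0, 60.0, 70.0, 80.0],
})
report = null_report(demo)
print(report)
```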

<br>

* #### **Step four: Drop nulls**

We drop the null values, since there are only a few of them.
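A minimal sketch of this step, assuming nulls are already `NaN`. The hypothetical `drop_nulls` helper prints how many rows were removed, which makes it easy to confirm that the loss really is small, and resets the index so later positional operations stay aligned.

```python
import numpy as np
import pandas as pd


def drop_nulls(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    """Drop rows with a null in any of `columns` (all columns if None)."""
    out = df.dropna(subset=columns).reset_index(drop=True)
    print(f"Dropped {len(df) - len(out)} of {len(df)} rows")
    return out


# Tiny hypothetical frame
demo = pd.DataFrame({
    "DEPTH": [5000.0, 5000.5, 5001.0, 5001.5],
    "NPHI": [0.21, np.nan, 0.24, 0.19],
    "GR": [55.0, 60.0, np.nan, 72.0],
})
only_nphi = drop_nulls(demo, columns=["NPHI"])
all_cols = drop_nulls(demo)
```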

<br>

* #### **Step five: Curve visualization on tracks**

We plot the curves on depth tracks to visualize each feature.
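Track plots can be built with matplotlib subplots that share the depth axis. This is a sketch, not the notebook's own plotting code: `plot_tracks`, the curve selection, and the synthetic demo data are assumptions. Resistivity is drawn on a logarithmic scale, as is customary for resistivity tracks.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_tracks(df, curves, depth_col="DEPTH", log_curves=()):
    """Plot each curve on its own track, sharing an inverted depth axis."""
    fig, axes = plt.subplots(1, len(curves), sharey=True,
                             figsize=(2.5 * len(curves), 8), squeeze=False)
    axes = axes[0]  # a single row of tracks
    for ax, curve in zip(axes, curves):
        ax.plot(df[curve], df[depth_col], linewidth=0.8)
        ax.set_xlabel(curve)
        ax.xaxis.set_label_position("top")
        if curve in log_curves:
            ax.set_xscale("log")
        ax.grid(True, alpha=0.3)
    axes[0].set_ylabel(depth_col)
    axes[0].invert_yaxis()  # depth increases downward; shared by every track
    fig.tight_layout()
    return fig


# Synthetic demo data (not real well values)
depth = np.arange(5000.0, 5100.0, 0.5)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "DEPTH": depth,
    "GR": rng.uniform(20, 150, depth.size),
    "NPHI": rng.uniform(0.05, 0.30, depth.size),
    "RT": rng.uniform(1, 200, depth.size),
})
fig = plot_tracks(demo, ["GR", "NPHI", "RT"], log_curves=("RT",))
```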

<br>

* #### **Step six: Creating flags**

We create flags where NPHI is above 0.30 (porosity too high) or below 0 (nulls or other invalid values). The flags show where these values occur, help determine their causes, and indicate whether they should be removed as anomalies caused by faulty data.
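The flag rule above can be expressed as a boolean mask. This sketch uses a hypothetical `flag_nphi` helper and a 0/1 `NPHI_FLAG` column (the column name is an assumption); printing the flagged depths shows where the anomalies cluster.

```python
import pandas as pd


def flag_nphi(df: pd.DataFrame, lower: float = 0.0, upper: float = 0.30) -> pd.DataFrame:
    """Add NPHI_FLAG = 1 where NPHI is above `upper` or below `lower`, else 0."""
    out = df.copy()
    out["NPHI_FLAG"] = ((out["NPHI"] > upper) | (out["NPHI"] < lower)).astype(int)
    return out


# Tiny hypothetical frame
demo = pd.DataFrame({"DEPTH": [5000.0, 5000.5, 5001.0, 5001.5],
                     "NPHI": [0.22, 0.45, -0.05, 0.30]})
flagged = flag_nphi(demo)
# Depths of flagged samples, to inspect where the anomalies occur
print(flagged.loc[flagged["NPHI_FLAG"] == 1, "DEPTH"].tolist())
```

A value of exactly 0.30 is not flagged, since the rule only catches porosities strictly above the threshold.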

<br>

* #### **Step seven: Drop abnormal values**

We remove the abnormal (flagged) values.
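A minimal sketch of the removal, keeping only samples inside the same 0 to 0.30 window used for the flags. `drop_abnormal` is a hypothetical helper; `Series.between` is inclusive on both ends by default, and it also drops `NaN` rows.

```python
import numpy as np
import pandas as pd


def drop_abnormal(df: pd.DataFrame, col: str = "NPHI",
                  lower: float = 0.0, upper: float = 0.30) -> pd.DataFrame:
    """Keep only rows whose `col` lies in [lower, upper]; NaN rows are dropped too."""
    keep = df[col].between(lower, upper)
    return df.loc[keep].reset_index(drop=True)


# Tiny hypothetical frame
demo = pd.DataFrame({"DEPTH": [5000.0, 5000.5, 5001.0, 5001.5, 5002.0],
                     "NPHI": [0.22, 0.45, -0.05, 0.30, np.nan]})
clean = drop_abnormal(demo)
print(clean)
```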

<br>

* #### **Step eight: Selecting models**

We try several models and select the one that best reconstructs the NPHI curve without overfitting, using mean absolute error (MAE) as the error metric.
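The comparison can be organized as a table of train and test MAE per model. The section does not say which models were tried, so the three candidates, the predictor logs in `FEATURES`, and the synthetic data are all assumptions. A test MAE well above the train MAE is the overfitting signal.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["GR", "RHOB", "DT", "RT"]  # assumed predictor logs


def compare_models(df, features=FEATURES, target="NPHI", seed=0):
    """Fit candidate regressors and report train/test MAE for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=0.25, random_state=seed)
    candidates = {
        "linear": LinearRegression(),
        "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    rows = []
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        rows.append({
            "model": name,
            "train_mae": mean_absolute_error(y_train, model.predict(X_train)),
            "test_mae": mean_absolute_error(y_test, model.predict(X_test)),
        })
    report = pd.DataFrame(rows).set_index("model")
    # A test MAE far above the train MAE suggests overfitting
    report["gap"] = report["test_mae"] - report["train_mae"]
    return report.sort_values("test_mae")


# Synthetic demo data with a known linear relation plus noise
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "GR": rng.uniform(20, 150, n),
    "RHOB": rng.uniform(2.0, 2.7, n),
    "DT": rng.uniform(60, 120, n),
    "RT": rng.uniform(1, 100, n),
})
demo["NPHI"] = 0.9 - 0.3 * demo["RHOB"] + 0.0005 * demo["GR"] + rng.normal(0, 0.005, n)
report = compare_models(demo)
print(report)
```

Log samples are strongly correlated with their neighbors in depth, so a random split can make test MAE look optimistic; holding out a contiguous depth interval gives a stricter check.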

<br>

* #### **Step nine: Using the best model to predict NPHI in both wells**

We use the best model to predict NPHI in both wells, having previously determined that a single model can work for both: they are wells from the same field, with similar values and data distributions.
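Applying one fitted model to several wells can be sketched as below. `predict_nphi`, the two-log `FEATURES` list, the `NPHI_PRED` column name, and the synthetic training relation are hypothetical. Rows missing any predictor get `NaN` instead of a prediction.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["GR", "RHOB"]  # hypothetical predictor logs


def predict_nphi(model, wells, features=FEATURES):
    """Return a copy of each well with an NPHI_PRED column from one shared model."""
    results = {}
    for name, df in wells.items():
        out = df.copy()
        usable = out[features].notna().all(axis=1)
        out["NPHI_PRED"] = np.nan
        if usable.any():
            out.loc[usable, "NPHI_PRED"] = model.predict(out.loc[usable, features])
        results[name] = out
    return results


# Synthetic demo: fit once, then apply the same model to two wells
rng = np.random.default_rng(1)
train = pd.DataFrame({"GR": rng.uniform(20, 150, 200), "RHOB": rng.uniform(2.0, 2.7, 200)})
train["NPHI"] = 0.9 - 0.3 * train["RHOB"] + 0.0005 * train["GR"]
model = LinearRegression().fit(train[FEATURES], train["NPHI"])

wells = {
    "Well_1": pd.DataFrame({"GR": [40.0, 90.0], "RHOB": [2.5, 2.3]}),
    "Well_3": pd.DataFrame({"GR": [60.0, np.nan], "RHOB": [2.4, 2.2]}),
}
predicted = predict_nphi(model, wells)
```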

<br>

* #### **Step ten: Curve repairing interpretation**

We interpret the repaired curves and consider what would be needed to improve the results.
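One way to assemble and check a repaired curve, offered as an assumption rather than the notebook's procedure: keep the original NPHI where it lies in the 0 to 0.30 window, substitute the prediction where it is flagged or missing, and measure MAE between the two curves over the valid samples. The helper `splice_repaired` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def splice_repaired(df: pd.DataFrame, lower: float = 0.0, upper: float = 0.30):
    """Build NPHI_REPAIRED: original NPHI where valid, NPHI_PRED elsewhere.

    Also returns the MAE between NPHI and NPHI_PRED over the valid samples
    (NaN if there are none), as a rough check of agreement with trusted data.
    """
    out = df.copy()
    invalid = out["NPHI"].isna() | (out["NPHI"] > upper) | (out["NPHI"] < lower)
    out["NPHI_REPAIRED"] = out["NPHI"].where(~invalid, out["NPHI_PRED"])
    valid = ~invalid
    mae_valid = (out.loc[valid, "NPHI"] - out.loc[valid, "NPHI_PRED"]).abs().mean()
    return out, float(mae_valid)


# Tiny hypothetical frame
demo = pd.DataFrame({"NPHI": [0.20, 0.50, -0.10, np.nan],
                     "NPHI_PRED": [0.21, 0.25, 0.05, 0.18]})
repaired, mae = splice_repaired(demo)
print(repaired, mae)
```

If the model was trained on these same valid samples, this MAE is an in-sample figure and will understate the error in the repaired intervals.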
